Question

Why Hadoop and Why not RDBMS for managing Big Data? Elaborate with suitable 
example. [5]

Accepted Answer

The following are some of the reasons why Hadoop should be used in
managing big data over RDBMS

IN TERMS OF DATA VOLUME

Volume means the quantity of data which could be comfortably stored
and effectively processed. Relational databases surely work better
when the load is low, probably gigabytes of data. This was the case
for so long in information technology applications, but when the data
size has grown to Terabytes or Petabytes, RDBMS isn’t competent to
ensure the desired results.

On the other hand, considering Hadoop is the right approach when the
need is to handle a bigger data size. Hadoop can be used to process a
huge volume of data effectively compared to the traditional relational
database management systems.

DATABASE ARCHITECTURE

Considering the database architecture, as we have seen above Hadoop
works on the components as:

HDFS, which is the distributed file system of the Hadoop ecosystem.

MapReduce, which is a programming model that help process huge data
sets.

Hadoop YARN, which helps in managing the computing resources in
multiple clusters.

However, the traditional RDBMS will possess data based on the ACID
properties, i.e., Atomicity, Consistency, Isolation, and Durability,
which are used to maintain integrity and accuracy in data
transactions. Such transactions would be of any sectors like banking
systems, telecommunication, e-commerce, manufacturing, or education,
etc.

THROUGHPUT

It is the total data volume process over a specific time period so
that the output could be optimized. Relational database management
systems are found to be a failure in terms of achieving a higher
throughput if the data volume is high, whereas Apache Hadoop Framework
does an appreciable job in this regard. This is one major reason why
there is an increasing usage of Hadoop in the modern-day data
applications than RDBMS.

DATA DIVERSITY

The diversity of data refers to various types of data processed. There
are structures, unstructured, and semi-structured data available now.
Hadoop possesses a significant ability to store and process data of
all the above-mentioned types and prepare it for processing. When it
comes to processing big volume unstructured data, Hadoop is now the
best-known solution.

However, traditional relational databases could only be used to manage
structured or semi-structured data, in a limited volume. RDBMS fails
in managing unstructured data. However, it is very difficult to fit in
data from various sources to any proper structure. So, we can see that
Hadoop is the apt solution in handling data diversity than RDBMS.

Question #243657

Expert's answer