Big Data is when the size of the data itself becomes part of the problem. But there’s more to Big Data than merely being “big”.
The ‘Three Vs’ of Big Data:
Volume – Enterprises across all industries will need to find ways to handle the ever-increasing data volumes being created on a daily basis.
Velocity – Real-time decisions require real-time data. Velocity refers to the speed with which data must be generated, captured, shared and responded to.
Variety – Big Data encompasses all data types, structured and unstructured, such as text, sensor data, audio, video, click streams and log files. Analyzing these sources together offers insights that siloed data cannot provide.
What is HADOOP?
Hadoop is an open source framework designed to address the Three Vs of Big Data. It enables applications to run across thousands of independent computers and process petabytes of data. Hadoop’s design was derived from Google’s published work on MapReduce and the Google File System.
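To make the MapReduce idea concrete, here is a minimal single-machine sketch of a word count. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. On a real cluster the framework would run the map and reduce steps on many nodes and do the grouping itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently of the others; this independence is
    # what would let a cluster spread the map calls over many machines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop handles big data"]
    print(word_count(sample))
```

The only parts a programmer would normally supply are the map and reduce functions; the grouping step in between stands in for work the framework does.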
Why HADOOP for BIG DATA?
• HADOOP handles petabytes of data and most forms of unstructured data
• The velocity challenge of Big Data can be addressed by integrating Hadoop with appropriate tools in its wider ecosystem, such as Vertica, HANA, etc.
Advantages of HADOOP
1) Data and computation are distributed, and running each computation locally, on the node that holds the data, prevents network overload.
2) Tasks are independent, therefore –
- Partial failure is easy to handle, i.e. entire nodes can fail and restart
- Avoids the crawling horrors of failure-tolerant synchronous distributed systems
- Speculative execution is available to work around stragglers
3) Linear scaling that utilizes cheap, commodity hardware
4) Simple programming model. The end-user programmer writes only MapReduce tasks.
5) Hadoop Distributed File System (HDFS) has a simple and robust coherency model.
6) Data is stored reliably.
7) HDFS is scalable without compromising fast access to information.
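The point about independent tasks and partial failure can be illustrated with a small simulation. This is only a sketch under stated assumptions: run_with_retries and flaky_worker are hypothetical names, the "node lost" failure is simulated, and nothing here reflects how Hadoop's scheduler is actually implemented. It shows only that when tasks share no state, a failed task can be rerun on its own without redoing the others.

```python
def run_with_retries(tasks, worker, max_attempts=3):
    """Run each task independently and retry only the tasks that fail.

    Returns (results, attempts). A task that fails on every attempt is
    simply absent from results.
    """
    results = {}
    attempts = {}
    for task_id, payload in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[task_id] = attempt
            try:
                results[task_id] = worker(task_id, payload, attempt)
                break  # this task is done; other tasks are unaffected
            except RuntimeError:
                continue  # rerun just this task
    return results, attempts


def flaky_worker(task_id, payload, attempt):
    # Simulated failure (an assumption for the demo): the node handling
    # "split-2" is lost on its first attempt and succeeds on the retry.
    if task_id == "split-2" and attempt == 1:
        raise RuntimeError("node lost")
    return sum(payload)


if __name__ == "__main__":
    splits = {"split-1": [1, 2, 3], "split-2": [4, 5], "split-3": [6]}
    results, attempts = run_with_retries(splits, flaky_worker)
    print(results)
    print(attempts)
```

In this run only split-2 is executed twice; split-1 and split-3 complete on their first attempt and are never touched again.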
Traditional vs. HADOOP
Phani K Reddy is a Big Data Architect with Bodhtree, a leader in Data Analytics, Business Intelligence, and Big Data services. Bodhtree provides Hadoop implementation and maintenance services as an end-to-end service to solve specific business challenges.