We live in the data age. One of the biggest challenges in the industry today is the flood of data, and that flood comes from many sources:
- Social media: a single large social media site produces about 10 billion pieces of data per day.
- Stock exchanges: one of the largest stock exchanges produces about one terabyte of trade records per day.
- And many more…
This large volume of data creates a problem for analysis. In the early 1990s a typical hard disk held about 1 GB and transferred data at 4.4 MB/s, so reading the entire disk took roughly five minutes. Today one terabyte of storage per server is common, but transfer rates are only around 100 MB/s, so reading a full terabyte takes about two and a half hours. That is not an acceptable time just to read the data, and writing takes even longer than reading.
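As a rough back-of-the-envelope check of the figures above (treating 1 GB as 1000 MB for simplicity):

```python
# Sequential scan time for the two disks described above.
def read_time_seconds(capacity_gb, transfer_mb_per_s):
    """Time to read an entire disk sequentially, in seconds."""
    return capacity_gb * 1000 / transfer_mb_per_s

old = read_time_seconds(1, 4.4)      # early-1990s disk: 1 GB at 4.4 MB/s
new = read_time_seconds(1000, 100)   # modern disk: 1 TB at 100 MB/s

print(f"1 GB at 4.4 MB/s: {old / 60:.1f} minutes")   # ~3.8 minutes
print(f"1 TB at 100 MB/s: {new / 3600:.1f} hours")   # ~2.8 hours
```

So although capacity grew about a thousandfold, transfer rates grew only about twentyfold, which is why scan time ballooned from minutes to hours.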
The obvious way to reduce this time is to read from multiple disks at once by splitting the data across several disks. For example, spreading one terabyte across 100 hard disks allows all 100 to be read in parallel for quick retrieval. The problem is that most analyses need to combine the data from the different disks after retrieval. Hadoop provides the MapReduce programming model, which abstracts the problem away from raw disk reads and writes and turns it into a computation over keys and values.
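To make the key/value idea concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. This is an illustration only, not Hadoop's actual API (which is Java and distributes these phases across machines); the word-count mapper and reducer are the classic introductory example.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Each string stands in for one input split (e.g. one disk's chunk).
splits = ["big data flood", "data flood data"]
mapper = lambda line: ((word, 1) for word in line.split())
reducer = lambda key, values: sum(values)

counts = reduce_phase(shuffle(map_phase(splits, mapper)), reducer)
print(counts)  # {'big': 1, 'data': 3, 'flood': 2}
```

The point is that the analyst writes only the mapper and reducer; reading the splits in parallel and combining the intermediate results by key is the framework's job.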
Another question arises here: why can't we use an RDBMS with lots of disks instead of MapReduce? The problem is that an RDBMS depends on disk seeks to find the required data, which becomes the bottleneck when a job must read a large fraction of the dataset, whereas MapReduce streams through the data sequentially. The two are also complementary: MapReduce can be used as a component alongside an RDBMS.
In the next article we will write more about MapReduce and the features of Hadoop.
SPEC INDIA has taken the initiative on Hadoop and has started work on it with a dedicated team.