Apache Hadoop is an open-source software framework for building data processing applications that run in a distributed computing environment.
Hadoop lets you store Big Data across a distributed cluster and then process it in parallel.
Hadoop was created by Doug Cutting and his team, building on Google's GFS and MapReduce papers; in 2006 Cutting joined Yahoo, which drove much of Hadoop's early development. In January 2008, Hadoop became a top-level open source project at the Apache Software Foundation.
The Hadoop framework itself is mostly written in the Java programming language, with some native code in C. It consists of four core modules:
- Hadoop Common – Libraries and utilities required by other Hadoop modules
- Hadoop Distributed File System (Storage) – HDFS stores any kind of data as replicated blocks spread across the cluster, offering high aggregate read/write bandwidth
- YARN (Processing) – Manages cluster resources and schedules the parallel processing of data stored in HDFS
- MapReduce – A computational model and software framework for writing applications that run on Hadoop; see the sketch after this list
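
To make the MapReduce model concrete, here is the classic WordCount job, written against Hadoop's `org.apache.hadoop.mapreduce` Java API (a minimal sketch adapted from the standard tutorial example): the mapper emits a `(word, 1)` pair for every token in its input split, and the reducer sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

You would package this into a JAR and submit it with `hadoop jar wordcount.jar WordCount <input> <output>`; YARN then schedules the map and reduce tasks across the cluster.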
Hadoop has a Master-Slave architecture in which HDFS handles distributed data storage and MapReduce handles distributed data processing.
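
As a sketch of the storage side, the snippet below writes a small file to HDFS and reads it back through the `FileSystem` Java API. The `hdfs://localhost:9000` address and the `/user/demo/hello.txt` path are assumptions for a single-node setup, not fixed values; HDFS splits files into blocks and replicates them across slave DataNodes, while the master NameNode tracks where each block lives.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    // fs.defaultFS is normally read from core-site.xml; the address
    // below is an assumed single-node setup.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:9000");
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/hello.txt");  // hypothetical path

    // Write: the NameNode picks DataNodes and the blocks are replicated automatically
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back from the cluster
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(reader.readLine());
    }

    fs.close();
  }
}
```

Key advantages of Hadoop include: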
- Suitable for Big Data analytics
- Fault tolerance
- Flexibility and scalability in data processing
- Reliability and availability
- Distributed processing