Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. – Wikipedia
The primary objective of data mining is to extract patterns and knowledge from large data sources. Its roots lie in manual techniques such as regression analysis, used when data sets were small enough to handle by hand. Modern data is far larger and more widely distributed, hence the need for modern data mining tools and techniques.
Components
The key components of a data mining system are the data source, data warehouse server, data mining engine, pattern evaluation module, graphical user interface, and knowledge base.
Techniques
Today’s data mining techniques are automated, fast, and effective. The major activities are:
- Detecting irregularities (anomaly detection)
- Dependency modeling (association rule learning)
- Discovering groups through clustering
- Generalizing known structure to new data with classification
- Approximating relationships with regression
- Representing data sets compactly with summarization
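To make three of these activities concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration under stated assumptions, not how any particular tool implements them: the function names, the z-score threshold of 2.0, and the sample numbers are all made up for this example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` standard
    deviations away from the mean (threshold is an assumed default)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def summarize(values):
    """Summarization: compress a data set into a few descriptive numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical daily sales figures with one obvious irregularity.
daily_sales = [10, 12, 11, 13, 12, 11, 95]
print(zscore_outliers(daily_sales))
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
print(summarize(daily_sales))
```

Real projects would normally reach for a library rather than hand-written functions, but the underlying ideas are the same.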
Data Mining Steps
- Extracting, transforming, and loading (ETL) data into a data warehouse
- Storing and managing data in a multidimensional database
- Offering data access to business specialists with application software
- Presenting analyzed data in an easily understood form, e.g. graphs
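The steps above can be sketched end to end in a few lines of Python. This is only a toy walk-through under several assumptions: an in-memory SQLite table stands in for the data warehouse (it is relational, not a true multidimensional database), the `sales` table, the sample rows, and all function names are hypothetical, and a text bar chart stands in for a real graph.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
RAW_ROWS = [
    ("north", "2024-01-05", " 120.50"),
    ("south", "2024-01-05", "80"),
    ("north", "2024-01-06", "99.5 "),
    ("south", "2024-01-06", ""),  # missing amount, dropped during transform
]


def transform(rows):
    """Transform: strip whitespace, drop rows without an amount, parse floats."""
    cleaned = []
    for region, day, amount in rows:
        amount = amount.strip()
        if amount:
            cleaned.append((region, day, float(amount)))
    return cleaned


def load(conn, rows):
    """Load: store the cleaned rows in a table (our stand-in warehouse)."""
    conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def totals_by_region(conn):
    """Access: the kind of query an analyst's application might run."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


def text_chart(totals, unit=20):
    """Present: one '#' per `unit` of sales, as a crude bar chart."""
    return [
        f"{region:<6} {'#' * int(total // unit)} {total:.2f}"
        for region, total in totals.items()
    ]


conn = sqlite3.connect(":memory:")
load(conn, transform(RAW_ROWS))
totals = totals_by_region(conn)
for line in text_chart(totals):
    print(line)
```

Each function corresponds to one step in the list, which keeps the stages separate and easy to swap out, for example replacing SQLite with a real warehouse.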
Popular Tools
Oracle Data Mining, R, Python, Orange, RapidMiner, Weka, Apache Mahout, Rattle, etc.