Big Data Analytics helps organizations make better predictions.
Many mathematical and statistical models are available for prediction and decision making. Applying them to large data sets lets us mine the data and generate results that support an organization's forecasts.
We can leverage our Big Data skills to apply these models to future predictions.
Today, we will discuss Spearman’s Rank Correlation, a non-parametric measure of the strength and direction of association between two ranked variables.
For example, we could use it to measure how closely the price movements of two stocks are associated.
The following are the basic requirements for calculating Spearman’s Rank Correlation:
- The scale of measurement must be ordinal (or interval or ratio).
- The data must be in the form of matched pairs.
- The association must be monotonic (i.e., the variables increase in value together, or one increases while the other decreases).
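Because the test works on ranks rather than raw values, each variable in a matched pair is ranked separately before anything else is computed. A minimal Python sketch of that step follows. The function names `average_ranks` and `rank_matched_pairs` are hypothetical, and giving tied values the mean of the ranks they span is the usual convention, assumed here.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their positions."""
    first, last = {}, {}
    # Walk the sorted values and record the first and last 1-based position of each value.
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    # Ties occupy consecutive positions, so their mean rank is (first + last) / 2.
    return [(first[v] + last[v]) / 2 for v in values]


def rank_matched_pairs(x, y):
    """Rank each variable separately while keeping pair i as (x[i], y[i])."""
    if len(x) != len(y):
        raise ValueError("matched pairs required: x and y must have the same length")
    return average_ranks(x), average_ranks(y)


print(average_ranks([10.5, 12.0, 10.5, 9.0]))  # [2.5, 4.0, 2.5, 1.0]
```

The two equal values (10.5) occupy positions 2 and 3, so each receives rank 2.5.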
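For data without tied ranks, Spearman’s Rank Correlation Coefficient is:

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where
d = The difference between the two ranks of each variable pair
n = The number of pairs of data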
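For data without ties, Spearman’s coefficient is ρ = 1 − 6Σd² / (n(n² − 1)), with d and n as defined above. The Python sketch below implements that shortcut formula directly. `spearman_rho` is a hypothetical helper name; because the formula is exact only when there are no ties, the sketch rejects tied input rather than silently approximating. The sample numbers are illustrative only.

```python
def spearman_rho(x, y):
    """Spearman's rho via rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).

    Assumes no tied values, since the shortcut formula is only exact without ties.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must be matched pairs of equal length")
    if n < 2:
        raise ValueError("need at least two pairs")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this shortcut formula assumes no tied values")

    def ranks(values):
        # Rank 1 goes to the smallest value, rank n to the largest.
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for position, i in enumerate(order, start=1):
            r[i] = position
        return r

    rx, ry = ranks(x), ranks(y)
    # d is the difference between the two ranks of each pair.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(spearman_rho([35, 23, 47, 17, 10], [30, 33, 45, 23, 8]))  # 0.9
```

Here the ranks differ only in the first two pairs (d = 1 and d = −1), so Σd² = 2 and ρ = 1 − 12/120 = 0.9.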
Correlation standardizes the measure of interdependence between two variables; consequently, it tells us how closely the two variables move together.
The correlation measurement, called a Correlation Coefficient, always takes a value between –1 and 1.
There are three possible outcomes:
1) Correlation Coefficient is positive
- If the correlation coefficient is 1, the variables have a perfect positive correlation.
- For Spearman’s coefficient, this means the ranks agree exactly: whenever one variable increases, the other also increases (not necessarily by a proportional amount).
- A positive correlation coefficient less than 1 indicates a less than perfect positive correlation; the correlation grows stronger as the value approaches 1.
2) Correlation Coefficient is 0
- If the correlation coefficient is 0, no monotonic relationship exists between the variables.
- Knowing that one variable has moved tells you nothing about the direction in which the other will move; their ranks are uncorrelated.
3) Correlation Coefficient is negative
- If the correlation coefficient is –1, the variables are perfectly negatively (or inversely) correlated and move in opposite directions.
- Whenever one variable increases, the other decreases (again, not necessarily proportionally).
- A negative correlation coefficient greater than –1 indicates a less than perfect negative correlation; the correlation grows stronger as the value approaches –1.
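The three outcomes, and the fact that Spearman’s coefficient responds to monotonic rather than proportional movement, can be seen in a short Python sketch. To cope with ties it swaps in the general definition of Spearman’s rho, the Pearson correlation computed on the ranks, which agrees with the shortcut formula when there are no ties. The function names and inputs are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


x = [1, 2, 3, 4, 5]
print(spearman(x, [v ** 3 for v in x]))    # 1.0: monotonic increase, though not linear
print(spearman(x, [50, 40, 30, 20, 10]))   # -1.0: always moves in the opposite direction
print(spearman(x, [2, 5, 3, 1, 4]))        # 0.0: no monotonic trend
```

Note the first case: the cubes grow far faster than x, yet the ranks match exactly, so ρ is 1.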
The primary goal of organizations that use Hadoop is to derive accurate results that support correct decisions. Planning, risk analysis and Return on Investment (ROI) are also among their primary goals.
Let us take a real-world example using stock prices.
In this example we will calculate Spearman’s Rank Correlation Coefficient using Pig.
The following data sets are used:
1. NYSE Daily 1970-2010 Open, Close, High, Low and Volume
2. NASDAQ Daily 1970-2010 Open, Close, High, Low and Volume
3. AMEX Daily 1970-2010 Open, Close, High, Low and Volume
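The Pig job itself is not shown in this section. As a rough local illustration of the steps such a job would perform (load the daily records, filter two symbols, join them on date, rank the closing prices, compute ρ), here is a small Python sketch. The column layout in `FIELDS`, the symbols `AAA` and `BBB`, and the sample rows are all assumptions made for illustration, not taken from the actual data sets; check the header of your copy of the data before relying on this layout.

```python
import csv
import io
from statistics import mean

# ASSUMPTION: hypothetical column order for the daily files; verify against your data.
FIELDS = ["exchange", "stock_symbol", "date", "stock_price_open", "stock_price_high",
          "stock_price_low", "stock_price_close", "stock_volume", "stock_price_adj_close"]


def load_closes(csv_text, symbol):
    """LOAD + FILTER + FOREACH: map date -> closing price for one symbol."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS)
    # The filter runs before float(), so a header row (symbol "stock_symbol") is skipped.
    return {row["date"]: float(row["stock_price_close"])
            for row in reader if row["stock_symbol"] == symbol}


def paired_closes(csv_text, symbol_a, symbol_b):
    """JOIN on date: keep only trading days present for both symbols."""
    a = load_closes(csv_text, symbol_a)
    b = load_closes(csv_text, symbol_b)
    dates = sorted(a.keys() & b.keys())
    return [a[d] for d in dates], [b[d] for d in dates]


def average_ranks(values):
    """Rank values from 1; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one series is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical rows for two made-up symbols; this is NOT real market data.
SAMPLE = """NYSE,AAA,2010-01-04,10.0,11.0,9.0,10.5,1000,10.5
NYSE,BBB,2010-01-04,20.0,21.0,19.0,20.0,500,20.0
NYSE,AAA,2010-01-05,10.5,12.0,10.0,11.0,1200,11.0
NYSE,BBB,2010-01-05,20.0,22.0,20.0,21.5,600,21.5
NYSE,AAA,2010-01-06,11.0,11.0,9.0,9.8,900,9.8
NYSE,BBB,2010-01-06,21.0,21.0,18.0,19.0,700,19.0
NYSE,AAA,2010-01-07,9.8,10.0,9.0,9.9,800,9.9
NYSE,BBB,2010-01-07,19.0,20.0,18.0,19.5,650,19.5
"""

closes_a, closes_b = paired_closes(SAMPLE, "AAA", "BBB")
print(closes_a, closes_b, spearman(closes_a, closes_b))
```

Each function corresponds roughly to one Pig step; on the full NYSE, NASDAQ and AMEX files those steps would run as Pig operators on the cluster rather than in memory.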