About the Customer
Industry: Finance
The client is a leading Australian news agency. The product is a first-of-its-kind Global Stock Ranking tool that can search foreign exchanges, sectors, and company value ranges, and order companies by quality and value for investment. It can use predefined formulas or let users enter their own criteria, and it produces an attractive, easy-to-interpret graphical display.

Key Challenges
The key challenge the client faced was consolidating data from various sources into a single data source. The client needed a system that could rank companies from stock exchanges all over the world. Each company is ranked on its cost of accounts and its past performance, so that a user can search any foreign exchange, sector, and company value range and order the companies by value and quality for investment. The hurdles were:
- Designing a data warehouse for a very large system that receives new data every second.
- Handling data from multiple sources: TXT, CSV, XML, and XLSX files, web services, and a legacy database system.
- Processing 40 years of historical data (from 1970) for 60K companies from all over the world.
- Building an ETL process to download multiple zip files of about 3 GB each from the Reuters web API, extract them, and process approximately 140K XML files of about 1 MB each.
- Handling a total XML volume of approximately 30 GB in each run.
- Extracting live data for each stock from the web API and loading it into the data warehouse.
- Designing a data warehouse that can serve clients on the fly with the necessary analytics.
- Validating XML files against an XSD schema and cleaning the data before loading the warehouse.
- Downloading large files from web APIs over a slow network, with resume capability.
- Calculating probabilities for each company stock using complex calculations.
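The "resume capability" hurdle can be illustrated with a minimal sketch. It is not the project's implementation (which used a third-party downloader with Pentaho Data Integration); it only shows the general idea of resuming with an HTTP `Range` request, assuming the server supports range requests. The function names, the `opener` parameter, and the chunk size are illustrative assumptions.

```python
import os
import urllib.request

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (illustrative value)


def resume_headers(dest_path):
    """Return (offset, headers) for continuing a partial download."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    return offset, headers


def download_with_resume(url, dest_path, opener=urllib.request.urlopen):
    """Download url to dest_path, continuing from bytes already on disk.

    `opener` is a hypothetical hook so the function can be exercised
    without a network; by default it is urllib.request.urlopen.
    """
    offset, headers = resume_headers(dest_path)
    request = urllib.request.Request(url, headers=headers)
    with opener(request) as response:
        # 206 means the server honoured the Range header; any other
        # success status means it is sending the whole file again.
        resumed = offset > 0 and response.status == 206
        mode = "ab" if resumed else "wb"
        with open(dest_path, mode) as out:
            while True:
                chunk = response.read(CHUNK_SIZE)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest_path)
```

Calling `download_with_resume(url, "data.zip")` again after an interrupted transfer requests only the missing bytes. A production version would also need to handle the case where the file on disk is already complete (servers typically answer such a request with status 416) and verify the finished file, for example with a checksum.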

Our Solution
SPEC INDIA developed a Data Warehouse and Analytics Solution that helped the customer perform effective data analysis. The solution covered the following features:
- Consolidated Data Warehouse
  - Enterprise data warehouse designed using the Kimball bus architecture.
  - Global stock market data consolidated for web-based analysis by users.
- Forecasting Stock Prices
  - Algorithms applied to predict stock prices and provide analytics to the end user.
  - Identification and forecasting of stocks to purchase.
- Robust ETL Process
  - Zip files are downloaded in chunks to speed up the download, using a third-party downloader integrated with Pentaho Data Integration.
  - 160K yearly and interim files are processed in each cycle by a highly distributed, robust process that runs on a Pentaho Kettle cluster using Carte servers.
  - A single XML file of about 1 MB with a deep data hierarchy typically takes 5-6 seconds to process, but our team brought this down to 500-800 milliseconds per file using a custom architecture.
  - Each XML file contains data about the company, issues, COAs, issue periods, statements, company contacts, company officers, COA values, issue prices, and their calculated aggregates over the period.
  - An ETL process runs every hour and loads interim stock price changes from the web API.
  - The ETL process is highly flexible in its exception handling, notifications, and error logging.
  - An ETL job can be resumed from any point of failure in the community edition.
  - Various aggregates are designed and processed in a nightly cycle.
- Visualization Using Open Source
  - For meaningful analysis and comparison of companies by sector, market cap, and issue, we implemented Lobster JS.
  - Lobster JS is an open source JS library that provides interactive graphics and enables users to perform meaningful analysis.
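
The per-file processing and error handling described for the ETL process above ran on a Pentaho Kettle cluster. As a rough illustration of the same fan-out idea in plain Python (a different technique, not the project's code), the sketch below parses many XML files concurrently and collects failures instead of aborting the run. The element names `Name` and `COAValue` and both function names are hypothetical, since the real feed's schema is not described here.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_company_file(path):
    """Extract a flat record from one company XML file.

    Element names are hypothetical placeholders for the real schema.
    """
    root = ET.parse(path).getroot()
    values = [float(v.text) for v in root.iter("COAValue")]
    return {
        "company": root.findtext("Name"),
        "coa_count": len(values),
        "coa_total": sum(values),
    }


def process_files(paths, max_workers=8):
    """Parse files concurrently; record failures rather than stopping."""
    records, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(parse_company_file, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                records.append(future.result())
            except (ET.ParseError, ValueError, TypeError, OSError) as exc:
                # Keep the path so the file can be retried or reported.
                failures.append((path, repr(exc)))
    return records, failures
```

Threads mainly overlap file I/O here. For CPU-bound parsing, `ProcessPoolExecutor` from the same module could be swapped in, and the `failures` list is what a retry or notification step would consume.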

Business Benefits
The solution implemented by SPEC INDIA helps the client's users carry out meaningful analysis before purchasing stock on exchanges around the globe.
It benefited them in the following ways:
- Clear, strong analytical representation of a large set of historical data.
- Support for end users' stock purchase decisions.
- Rich data visualization that helps identify performing and non-performing companies and compare them before investing.
- Prediction and forecasting of stock prices.
- It gave answers to the following questions:
  - How does the company make money?
  - How promising is the overall economic environment for the company?
  - What does the company do?
  - How fast is the company growing?
  - How profitable is it?
  - Is it worth the price?