The key challenge the client faced was consolidating data from various sources into a single data store. They needed a system that could rank companies from stock exchanges all over the world, each company ranked on its chart of accounts (COA) data and past performance, so that users could search by foreign exchange, sector, and company value range, and order companies by value and quality for investment.
Design a data warehouse for a system of this scale, with data arriving every second.
Consolidate data from multiple source formats: TXT, CSV, XML, XLSX, web services, and a legacy database system.
Process 40 years of historical data (from 1970) for 60K companies worldwide.
Build an ETL process to download multiple ZIP files (each approximately 3 GB) from the Reuters web API, extract them, and process approximately 140K XML files of roughly 1 MB each.
The total size of the XML files in each run was approximately 30 GB.
Extract live data for each stock from the web API and load it into the data warehouse.
Design a data warehouse that can serve the client with the necessary analytics on the fly.
Validate the XML files against an XSD schema and clean the data before loading it into the warehouse.
Download large files from web APIs over a slow network, with resume capability.
Calculate probability scores for each company's stock using complex calculations.
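The resumable-download requirement above can be met with HTTP Range requests. The sketch below is a minimal Python illustration of the technique, assuming the server honors Range headers; it is not the actual downloader integrated with Pentaho in this project.

```python
import os
import urllib.request

def download_with_resume(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Download `url` to `dest`, resuming from any partial file on disk."""
    # If a partial file exists, continue from its current size.
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if start:
        # Ask the server to send only the bytes we are still missing.
        req.add_header("Range", f"bytes={start}-")
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

On an interrupted transfer, simply calling the function again appends the remaining bytes instead of restarting the multi-gigabyte download from zero.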
SPEC INDIA developed a Data Warehouse and Analytics Solution for the customer that enabled effective data analysis. The solution covered the following features:
Consolidated Data Warehouse: Designed an enterprise data warehouse using the Kimball bus architecture.
Consolidated global stock market data for web-based analysis by users.
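A Kimball bus architecture organizes the warehouse around conformed dimensions shared by the fact tables. As a hedged illustration only (the table and column names here are ours, not the client's actual model), a simplified star schema for daily stock prices could look like this:

```python
import sqlite3

# In-memory database for illustration; the real warehouse ran on a
# full enterprise RDBMS fed by Pentaho Data Integration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conformed dimensions, shared across fact tables (Kimball bus).
    CREATE TABLE dim_company (
        company_key INTEGER PRIMARY KEY,
        name        TEXT,
        sector      TEXT,
        exchange    TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,   -- e.g. 19700102 for 1970-01-02
        year     INTEGER,
        quarter  INTEGER
    );
    -- Fact table: one row per company per trading day.
    CREATE TABLE fact_stock_price (
        company_key INTEGER REFERENCES dim_company(company_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        close_price REAL,
        volume      INTEGER
    );
""")

# Tiny usage example: load one row and query by sector.
conn.execute("INSERT INTO dim_company VALUES (1, 'Acme', 'Tech', 'NYSE')")
conn.execute("INSERT INTO dim_date VALUES (19700102, 1970, 1)")
conn.execute("INSERT INTO fact_stock_price VALUES (1, 19700102, 10.5, 1000)")
rows = conn.execute("""
    SELECT c.name, f.close_price
    FROM fact_stock_price f JOIN dim_company c USING (company_key)
    WHERE c.sector = 'Tech'
""").fetchall()
```

Because the dimensions are conformed, the same dim_company and dim_date tables can later serve additional fact tables (issues, statements, COA values) without remodeling.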
Forecasting Stock Prices: Applied algorithms to predict stock prices, providing analytics to the end user.
Identified and forecast the stocks to be purchased.
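The actual forecasting algorithms are proprietary and not described here, so as a stand-in, the sketch below uses a simple moving-average forecast with a naive momentum rule for flagging purchase candidates. Both function names and the rule itself are our own illustrative assumptions.

```python
def moving_average_forecast(prices: list[float], window: int = 3) -> float:
    """Forecast the next price as the mean of the last `window` prices.
    A deliberately simple stand-in for the project's real models."""
    if len(prices) < window:
        raise ValueError("not enough price history for the chosen window")
    return sum(prices[-window:]) / window

def is_buy_candidate(prices: list[float], window: int = 3) -> bool:
    """Flag a stock when the forecast exceeds the latest observed price."""
    return moving_average_forecast(prices, window) > prices[-1]
```

For example, a stock whose price has fallen from 10 to 8 over three days gets a forecast of 9.0, above its latest price, so this naive rule would flag it as a candidate.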
Robust ETL Process: Downloaded ZIP files in chunks to speed up downloads, integrating a third-party downloader with Pentaho Data Integration.
Processed 160K yearly and interim files in each cycle, with a highly distributed and robust process developed and executed over a Pentaho Kettle cluster using the Carte server.
A single XML file of roughly 1 MB with deep data hierarchies generally takes 5 to 6 seconds to process, but our team brought the per-file processing time down to 500-800 milliseconds using a custom architecture.
Each XML file contains data about the company, issues, COAs, issue periods, statements, company contacts, company officers, COA values, issue prices, and their aggregates calculated over the period.
An ETL process executes every hour and populates stock prices for interim price changes from the web API.
The ETL process is highly flexible, with exception handling, notification sending, and error logging.
ETL jobs can be resumed from any failure point, even in the Community Edition.
Designed various aggregates that are processed in a nightly cycle.
Visualization Using Open Source: To enable meaningful analysis and to compare companies across sector, market cap, and issue, we implemented Lobster JS.
Lobster JS is an open-source JavaScript library that provides interactive graphics and enables users to perform meaningful analysis.
Our detailed and accurate research, analysis, and refinement lead to a comprehensive study that describes the requirements, functions, and roles in a transparent manner.
We have a team of creative design experts who are adept at producing sleek designs of system components with modern layouts.
Our programmers are well versed in the latest programming languages, tools, and techniques, and can effectively translate analysis and design into code.
Quality is at the helm of our projects. We leave no stone unturned in ensuring superior excellence and assurance in all our solutions and services.
We have well-defined, robust, and secure launch criteria that give us a successful implementation, backed by detailed testing, customer acceptance, and satisfaction.