About the Customer
In a typical enterprise application, product data is distributed across multiple systems – sales, purchase, POS, CRM, billing, warehouse, reporting, etc. Since product data is central to the enterprise and mission-critical for every type of workflow, most systems maintain a local copy of it. Any change to product data therefore requires an update in all downstream systems.
With the growing volume of product data changes driven by ever-changing business needs, and the requirement to create multiple test environments as part of agile adoption, it becomes significantly difficult to support delivery of product changes on a daily basis. The majority of the data management teams’ time is spent supporting product data distribution, leaving little focus for innovation. The end result is that the product data distribution and deployment process becomes a bottleneck – impacting business agility and competitiveness.
The existing data distribution process was decades old and had several drawbacks:
- Product data distribution and deployment happened only once a day. Any subsequent change to product data required a 24-hour window to reflect in all systems.
- As data volume grew over time, systems often faced stability issues under high load.
- It typically took 5-6 hours to complete data distribution and deployment across all downstream systems – impacting end-to-end integration testing.
- There were many manual touch points that required coordination among multiple dev and testing teams.
Solution Design Goals
As part of their DevOps journey, the teams decided to work closely on improving the process, aiming to achieve the following design goals:
- Multiple, smaller, on-demand data deployments rather than one large scheduled deployment
- Delta refresh rather than full refresh
- Automated, one-click data deployment rather than manual touch points
- Near-zero downtime in test environments
Tools & Technologies
There were several architectural challenges in meeting the above design goals. Out-of-the-box thinking and the use of open-source tools allowed the teams to meet the stated objectives. Some of the key architecture changes are:
- Download & Delta Refresh: In the existing process, every system refreshed its local copy of the data completely, irrespective of how much had changed. With the new design, an intermediate layer was introduced that enables every system to apply only a delta refresh. This greatly reduced the time needed to download data from the source and refresh it in the local system.
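The delta-refresh idea can be sketched as a diff between the previous snapshot and the latest one, so each system applies only inserts, updates, and deletes. The record shape and SKU keys below are illustrative assumptions, not the customer's actual schema:

```python
# Minimal sketch of a delta refresh: compare the previous product
# snapshot with the latest one and apply only the differences.
# Record fields and keys are illustrative, not the real data model.

def compute_delta(old: dict, new: dict) -> dict:
    """Return the inserts, updates, and deletes needed to move old -> new."""
    inserts = {k: v for k, v in new.items() if k not in old}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    deletes = [k for k in old if k not in new]
    return {"inserts": inserts, "updates": updates, "deletes": deletes}

def apply_delta(local: dict, delta: dict) -> None:
    """Apply a delta to a system's local copy in place."""
    local.update(delta["inserts"])
    local.update(delta["updates"])
    for key in delta["deletes"]:
        del local[key]

local = {"sku-1": {"price": 10}, "sku-2": {"price": 20}}
source = {"sku-1": {"price": 12}, "sku-3": {"price": 30}}

delta = compute_delta(local, source)
apply_delta(local, delta)
# local now matches source without a full re-download
```

Only the delta crosses the wire, which is what cuts the download-and-refresh time when most of the catalog is unchanged.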
- Synchronize Data Refresh: With multiple systems involved – each having its own data storage, format, and ETL process – the major question was: ‘how do we synchronize the data refresh across all impacted systems?’ Since each system takes its own time to refresh, one system finishing while another is still in progress leaves the systems out of sync.
- To solve this problem, a notification process was implemented using an open-source message broker, RabbitMQ. Every time product data changes, a notification message is sent to each downstream system via RabbitMQ so that every consumer can react to the event.
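The broker-based fan-out can be sketched as follows. To keep the example self-contained it simulates the broker with in-process queues; a real deployment would publish to a RabbitMQ fanout exchange (e.g. via the `pika` client) with one queue per downstream system. System names and the message shape are illustrative assumptions:

```python
# Sketch of the notification fan-out, simulated with in-process queues.
# In production this role is played by a RabbitMQ fanout exchange with
# one bound queue per downstream system.
import queue

class FanoutExchange:
    """Delivers every published message to all bound consumer queues."""
    def __init__(self):
        self.queues = {}

    def bind(self, consumer: str) -> queue.Queue:
        q = queue.Queue()
        self.queues[consumer] = q
        return q

    def publish(self, message: dict) -> None:
        for q in self.queues.values():
            q.put(message)

exchange = FanoutExchange()
systems = ["sales", "pos", "billing"]
inboxes = {name: exchange.bind(name) for name in systems}

# A product-data change triggers one notification to every system.
exchange.publish({"event": "product_data_changed", "version": 42})

received = {name: inboxes[name].get_nowait() for name in systems}
# every consumer sees the same event and can start its own refresh
```

Because every system reacts to the same event, no system refreshes ahead of or behind the others without the coordinator knowing.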
- Zero downtime: To ensure that a data deployment does not impact ongoing testing, every system first deploys the data to an offline database. Once all systems confirm that the latest data is available, a notification is sent to each system to flip from the offline to the online database.
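This two-phase flip can be sketched as a coordinator that collects readiness confirmations before telling every system to switch. The class and system names are illustrative assumptions, not the actual implementation:

```python
# Sketch of the offline/online database flip: each system loads new data
# into an offline copy while tests keep hitting the online copy; only
# after every system confirms readiness does the coordinator flip them all.

class System:
    def __init__(self, name: str):
        self.name = name
        self.online = {"version": 1}   # serving ongoing tests
        self.offline = None            # staging area for new data

    def deploy_offline(self, data: dict) -> bool:
        self.offline = data            # no impact on self.online
        return True                    # confirm readiness to the coordinator

    def flip(self) -> None:
        self.online, self.offline = self.offline, self.online

systems = [System("sales"), System("warehouse")]
new_data = {"version": 2}

# Phase 1: deploy offline everywhere and collect confirmations.
ready = all(s.deploy_offline(new_data) for s in systems)

# Phase 2: flip only once every system is ready, so none go out of sync.
if ready:
    for s in systems:
        s.flip()
```

Since the flip is just a pointer swap after all the loading is done, the window in which testers see a mix of old and new data is near zero.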