In today's data-driven landscape, businesses are constantly seeking innovative solutions to efficiently manage and analyse vast amounts of data streaming in from various sources.
As the data architect at Bajaj Technology Services (BTS), I am thrilled to share our experience in building a modern data warehouse on a hybrid model, tailored to the dynamic needs of our clients, including a leading Non-Banking Financial Company (NBFC) in India.
EMBRACING NEAR REAL-TIME DATA STREAMS
At BTS, we understand the importance of harnessing near real-time data streams to gain actionable insights quickly. Using the open-source Postgres database, we ingest a continuous inflow of data so that our clients always have the latest information at their fingertips. Transformations then curate denormalized tables that are optimized for performance and tailored to specific business requirements. These tables form the foundation of our speed layer: by minimizing joins, they provide the near real-time views that critical decision-making depends on.
COLLECTING DATA FROM MULTIPLE SOURCES
Our data warehouse journey begins with collecting data from multiple OLTP (Online Transaction Processing) source systems. We identify the deltas, that is, the rows that have changed since the last load, so that only relevant updates are processed and the load on our systems stays low. This is achieved through a combination of change data capture (CDC) techniques and event-driven architectures. Data is pre-joined in Postgres to streamline the transformation process before it is moved to an MPP columnar database such as Amazon Redshift.
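One simple way to identify deltas is a timestamp watermark: remember the latest change already loaded and, on the next run, pull only rows modified after it. The sketch below assumes each source row carries an `updated_at` column; the function name `extract_deltas` and the sample loan rows are hypothetical, and this polling approach is a stand-in for illustration, not necessarily the CDC mechanism used in production.

```python
from datetime import datetime
from typing import Any

Row = dict[str, Any]  # hypothetical row shape: must include an "updated_at" timestamp


def extract_deltas(rows: list[Row], last_watermark: datetime) -> tuple[list[Row], datetime]:
    """Return the rows changed after last_watermark and the advanced watermark."""
    deltas = [r for r in rows if r["updated_at"] > last_watermark]
    # Move the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((r["updated_at"] for r in deltas), default=last_watermark)
    return deltas, new_watermark


# Illustrative source rows (hypothetical data).
source = [
    {"loan_id": 1, "status": "applied", "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"loan_id": 2, "status": "approved", "updated_at": datetime(2024, 1, 1, 11, 30)},
    {"loan_id": 3, "status": "disbursed", "updated_at": datetime(2024, 1, 1, 12, 15)},
]
changed, watermark = extract_deltas(source, datetime(2024, 1, 1, 10, 0))
print([r["loan_id"] for r in changed], watermark)  # [2, 3] 2024-01-01 12:15:00
```

A timestamp watermark cannot see hard deletes and can miss rows committed late with an older timestamp; log-based CDC, which reads the database's change log, avoids both, which is one reason to combine techniques as described above.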
DATA TRANSFORMATION AND LOADING
The data transformation process is pivotal to our data warehousing strategy. We transform and denormalize data to create tables optimized for specific business needs. This involves aggregating data, performing calculations and applying business logic so that the data is ready for analysis. We use ETL (Extract, Transform, Load) tools to automate and streamline this process, ensuring data consistency and accuracy.
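To make the aggregate, calculate and apply-business-logic steps concrete, here is a minimal sketch in plain Python rather than an ETL tool. It rolls loan-level rows up into one denormalized row per customer. The column names and the 30-days-past-due delinquency rule are hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict


def build_customer_summary(loans: list[dict]) -> list[dict]:
    """Aggregate loan-level rows into one denormalized row per customer."""
    totals = defaultdict(lambda: {"loan_count": 0, "total_outstanding": 0, "max_days_past_due": 0})
    for loan in loans:
        t = totals[loan["customer_id"]]
        t["loan_count"] += 1                             # aggregation
        t["total_outstanding"] += loan["outstanding"]    # calculation
        t["max_days_past_due"] = max(t["max_days_past_due"], loan["days_past_due"])
    summary = []
    for customer_id, t in sorted(totals.items()):
        # Hypothetical business rule: delinquent if any loan is more than 30 days past due.
        summary.append({"customer_id": customer_id, **t, "is_delinquent": t["max_days_past_due"] > 30})
    return summary


# Illustrative loan rows (hypothetical data; amounts in whole rupees).
loans = [
    {"loan_id": 101, "customer_id": "C1", "outstanding": 50_000, "days_past_due": 0},
    {"loan_id": 102, "customer_id": "C1", "outstanding": 20_000, "days_past_due": 45},
    {"loan_id": 103, "customer_id": "C2", "outstanding": 75_000, "days_past_due": 10},
]
for row in build_customer_summary(loans):
    print(row)
```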
Once transformed, the data is loaded into our MPP columnar database, where it is stored in wide tables. These tables are designed to minimize joins and maximize query performance. The use of an MPP database allows us to distribute the data across multiple nodes, enabling parallel processing and significantly improving query response times.
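The sketch below illustrates the idea of distributing data across nodes: rows are assigned to a node by hashing a distribution key, each node aggregates only its own slice, and the partial results are combined. The helper names, the choice of `customer_id` as the key and the four-node cluster are hypothetical; an MPP engine does this internally and its own hash function differs from the one used here.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Map a distribution key to a node using a stable hash."""
    # hashlib rather than hash(): Python salts str hashes per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(rows: list[dict], key_column: str, num_nodes: int) -> list[list[dict]]:
    """Split rows into per-node slices by hashing key_column."""
    slices: list[list[dict]] = [[] for _ in range(num_nodes)]
    for row in rows:
        slices[node_for(str(row[key_column]), num_nodes)].append(row)
    return slices


def parallel_sum(slices: list[list[dict]], column: str) -> int:
    """Each node sums its own slice; a leader then combines the partial sums."""
    partials = [sum(r[column] for r in s) for s in slices]
    return sum(partials)


# Illustrative rows (hypothetical data).
rows = [{"customer_id": f"C{i % 5}", "amount": i * 100} for i in range(20)]
slices = distribute(rows, "customer_id", num_nodes=4)
print([len(s) for s in slices], parallel_sum(slices, "amount"))
```

Because every row with the same key lands on the same node, joins and aggregations on that key need no data movement between nodes; this is the idea behind Amazon Redshift's KEY distribution style.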
MAINTAINING PERFORMANCE WITH PURGING LOGIC
To keep our near real-time data processing efficient, we implement purging logic that retains only the most recent data relevant to immediate use cases. Reducing data volume in this way improves query performance and keeps the insights derived from our data warehouse timely.
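A minimal sketch of retention-based purging, assuming a hypothetical `speed_layer_events` table and a hypothetical 7-day retention window. SQLite from Python's standard library stands in for Postgres so the example is self-contained; in Postgres the same idea is a `DELETE` with a cutoff computed from the current time.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # hypothetical retention window for the speed layer


def purge_old_rows(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete speed-layer rows older than the retention window; return rows removed."""
    cutoff = (now - RETENTION).isoformat(sep=" ")
    # Timestamps are stored as ISO-8601 text, so string comparison follows time order.
    cur = conn.execute("DELETE FROM speed_layer_events WHERE event_ts < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE speed_layer_events (id INTEGER PRIMARY KEY, event_ts TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO speed_layer_events (event_ts) VALUES (?)",
    [(datetime(2024, 1, d).isoformat(sep=" "),) for d in (1, 5, 9)],
)
removed = purge_old_rows(conn, now=datetime(2024, 1, 10))
remaining = conn.execute("SELECT COUNT(*) FROM speed_layer_events").fetchone()[0]
print(removed, remaining)  # 1 2
```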
EMPOWERING ANALYTICS WITH DOMAIN-DRIVEN DESIGN
To deliver deeper data insights, we incorporate domain-driven design principles into our architecture. By organizing wide tables on a Massively Parallel Processing (MPP) columnar database, we give our clients near real-time views, typically within a 2-hour window. This lets them draw insights for advanced analytics, visualization and strategic decision-making.
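The 2-hour window can be monitored with a simple freshness check. The sketch below flags wide tables whose most recent load is older than the window. The table names, grouped by business domain as a nod to domain-driven design, and their load times are hypothetical; in practice the load times would come from pipeline metadata.

```python
from datetime import datetime, timedelta

FRESHNESS_WINDOW = timedelta(hours=2)  # the typical window described above


def stale_tables(last_loaded: dict[str, datetime], now: datetime) -> list[str]:
    """Return wide tables whose latest load is older than the freshness window."""
    return sorted(name for name, ts in last_loaded.items() if now - ts > FRESHNESS_WINDOW)


# Hypothetical wide tables, named by business domain, with their last load times.
last_loaded = {
    "lending.loan_wide": datetime(2024, 1, 1, 11, 0),
    "collections.repayment_wide": datetime(2024, 1, 1, 9, 30),
    "sales.lead_wide": datetime(2024, 1, 1, 10, 15),
}
print(stale_tables(last_loaded, now=datetime(2024, 1, 1, 12, 0)))  # ['collections.repayment_wide']
```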
OUR SUCCESS STORY: NBFC TRANSFORMATION
Our hybrid data warehouse model has been successfully deployed for a leading NBFC in India, transforming its data management and analytics capabilities. By integrating near real-time data streams, optimizing performance through denormalized tables and applying domain-driven design principles, we have helped our client stay ahead in a competitive market.
DETAILED IMPLEMENTATION
DATA COLLECTION
We begin by collecting data from various sources, such as transactional databases, CRM systems and external APIs. The data is first ingested into our OLTP systems, where we identify deltas so that only the changes are captured. This reduces the volume of data that needs to be processed and keeps updates timely.
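Sources such as external APIs often return full snapshots without change timestamps. One common way to identify deltas there is to compare the new snapshot with the previous one by primary key and row hash. This is a minimal sketch of that snapshot-diff technique; the function names and the `lead_id` key are hypothetical.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Hash a row's canonical JSON form so key order does not matter."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


def diff_snapshots(previous: list[dict], current: list[dict], key: str):
    """Compare two full snapshots and return (inserts, updates, deleted keys)."""
    prev_hashes = {r[key]: row_hash(r) for r in previous}
    curr_rows = {r[key]: r for r in current}
    inserts = [r for k, r in curr_rows.items() if k not in prev_hashes]
    updates = [r for k, r in curr_rows.items() if k in prev_hashes and row_hash(r) != prev_hashes[k]]
    deletes = [k for k in prev_hashes if k not in curr_rows]
    return inserts, updates, deletes


# Illustrative snapshots (hypothetical data).
previous = [{"lead_id": 1, "stage": "new"}, {"lead_id": 2, "stage": "contacted"}, {"lead_id": 3, "stage": "new"}]
current = [{"lead_id": 1, "stage": "new"}, {"lead_id": 2, "stage": "qualified"}, {"lead_id": 4, "stage": "new"}]
inserts, updates, deletes = diff_snapshots(previous, current, key="lead_id")
print(inserts, updates, deletes)
# [{'lead_id': 4, 'stage': 'new'}] [{'lead_id': 2, 'stage': 'qualified'}] [3]
```

Only the inserted, updated and deleted rows move downstream; the unchanged lead 1 is skipped.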
PRE-JOINING AND TRANSFORMATION
Data is pre-joined in the Postgres database to simplify the transformation process: related data from different tables is combined into a single denormalized table. We use ETL tools to automate this process, ensuring consistency and accuracy. The transformed data is then loaded into our MPP columnar database.
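The pre-join can be expressed as a `CREATE TABLE ... AS SELECT` that materializes the join once, so downstream queries read a single wide table. In this minimal sketch SQLite stands in for Postgres to keep it self-contained; the `customers`, `loans` and `loan_wide` tables and their rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Postgres; the CREATE TABLE ... AS SELECT pattern is the point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Mumbai');
    INSERT INTO loans VALUES (10, 1, 'two-wheeler', 90000), (11, 1, 'personal', 150000), (12, 2, 'gold', 60000);
""")


def build_loan_wide(conn: sqlite3.Connection) -> None:
    """Rebuild a denormalized table that pre-joins loans with customer attributes."""
    conn.executescript("""
        DROP TABLE IF EXISTS loan_wide;
        CREATE TABLE loan_wide AS
        SELECT l.loan_id, l.product, l.amount,
               c.customer_id, c.name AS customer_name, c.city
        FROM loans AS l
        JOIN customers AS c ON c.customer_id = l.customer_id;
    """)


build_loan_wide(conn)
# Downstream queries read one table with no joins.
rows = conn.execute("SELECT loan_id, customer_name, city FROM loan_wide ORDER BY loan_id").fetchall()
print(rows)  # [(10, 'Asha', 'Pune'), (11, 'Asha', 'Pune'), (12, 'Ravi', 'Mumbai')]
```

The inner join drops loans with no matching customer; a `LEFT JOIN` would keep them with empty customer columns, which may be preferable when source systems load at different times.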
DATA MODELLING
We employ an OBT (One Big Table) model, in which data is stored in wide tables. This model minimizes the need for complex joins, improving query performance. We also maintain historical data using time-stamped records, which lets us track changes over time and perform trend analysis.
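Time-stamped records mean a change is appended as a new version rather than overwriting the old value, so the state at any past moment can be reconstructed. This minimal sketch keeps versions per loan sorted by their `valid_from` time and answers "as of" questions with a binary search. The structure and function names are hypothetical illustrations of the technique, not the production schema.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional

# Hypothetical history: loan_id -> list of (valid_from, status), sorted by valid_from.
History = dict[str, list[tuple[datetime, str]]]


def record_change(history: History, loan_id: str, ts: datetime, status: str) -> None:
    """Append a new time-stamped version instead of overwriting the old one."""
    versions = history.setdefault(loan_id, [])
    versions.append((ts, status))
    versions.sort(key=lambda v: v[0])  # keeps order even if a change arrives late


def status_as_of(history: History, loan_id: str, as_of: datetime) -> Optional[str]:
    """Return the status in effect at as_of, or None if the loan did not exist yet."""
    versions = history.get(loan_id, [])
    i = bisect_right([ts for ts, _ in versions], as_of)
    return versions[i - 1][1] if i else None


h: History = {}
record_change(h, "L100", datetime(2024, 1, 1), "applied")
record_change(h, "L100", datetime(2024, 1, 9), "disbursed")
record_change(h, "L100", datetime(2024, 1, 5), "approved")  # late-arriving change
print(status_as_of(h, "L100", datetime(2024, 1, 6)))  # approved
```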
CONCEPTUAL DIAGRAM: DATA FLOW
REPORTS AND ANALYTICS
Our data warehouse supports various reports, such as lead-to-customer funnel reports, financial performance dashboards and customer behaviour analysis. The OBT model simplifies report generation, allowing business users to quickly access and analyse data without complex joins.
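As an example of how a wide table simplifies reporting, a lead-to-customer funnel can be computed from a single table in which each row is one lead and a column records the furthest stage it reached. The stage names and the `furthest_stage` column are hypothetical assumptions for this sketch.

```python
STAGES = ["lead", "application", "approval", "disbursement"]  # hypothetical funnel stages


def funnel(rows: list[dict]) -> list[tuple]:
    """Count leads reaching each stage and the conversion rate from the previous stage."""
    counts = dict.fromkeys(STAGES, 0)
    for row in rows:
        # A lead that reached a stage has also passed every earlier stage.
        for stage in STAGES[: STAGES.index(row["furthest_stage"]) + 1]:
            counts[stage] += 1
    report, previous = [], None
    for stage in STAGES:
        rate = counts[stage] / previous if previous else None  # None for the first stage
        report.append((stage, counts[stage], rate))
        previous = counts[stage]
    return report


# Illustrative rows from a hypothetical wide lead table.
rows = [
    {"lead_id": i, "furthest_stage": s}
    for i, s in enumerate(["lead", "application", "application", "approval", "disbursement"])
]
for stage, count, rate in funnel(rows):
    print(stage, count, rate)
```

Every figure comes from one table scan with no joins, which is the benefit of the OBT model described above.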
CONCLUSION
Adopting a hybrid data warehouse model that blends near real-time processing with domain-driven design principles marks a significant milestone in modern data architecture. At BTS, we remain committed to delivering solutions that enable our clients to unlock the full potential of their data, driving innovation and sustainable growth. Join us on this journey towards data-driven excellence.
By sharing our experiences and learnings from building a data warehouse for an NBFC, we aim to provide useful insights into the challenges and solutions in modern data warehousing. Our approach demonstrates the importance of integrating near real-time data streams, optimizing performance through denormalized tables and applying domain-driven design principles to give businesses actionable insights and support strategic decision-making.