
BAJAJ TECHNOLOGY SERVICES

Empowering Data Pipelines with DRIFT Accelerators: Part 1

Discover DRIFT Accelerators, a powerful solution for seamless, scalable data pipelines.
Oct 21, 2024 | 3 min read

The Need for Innovation

Our journey began with a clear understanding of the challenges that data teams encounter:

  1. SQL Proficiency: Most team members are proficient in SQL, so the solution had to build on that skill set.
  2. Tight Deadlines: In a fast-paced environment, meeting deadlines is crucial.
  3. Centralized Job Management: Job creation and maintenance needed to be simple and consistent across the organization.
  4. Quick Turnaround Time (TAT): Rapid deployment of data pipelines is essential.
  5. Reduced Maintenance Overhead: We wanted to move away from the traditional single-code, single-job structure to a more streamlined solution.
  6. Cost-Effectiveness: The solution had to be both efficient and cost-effective.

Exploring Alternatives

Before developing DRIFT Accelerators, we explored existing solutions such as Apache Airflow and dbt. While these tools offer valuable features, they presented challenges in terms of complexity, setup time, and integration with our existing infrastructure. Apache Airflow, for instance, excels at orchestration but falls short when it comes to handling business logic. Similarly, dbt, though powerful, required extensive setup and posed challenges in cost control and enterprise readiness.

Our Solutioning Approach

Driven by the need for a user-friendly, scalable, and cloud-agnostic solution, we embarked on developing DRIFT Accelerators. Here's our approach:

  1. Configuration-Driven Approach: Defining jobs through configuration rather than code to simplify job creation and maintenance.
  2. YAML Format: Choosing YAML for its user-friendly syntax and flexibility.
  3. Scalability and Cost Management: Ensuring the solution can scale across multiple instances while managing costs effectively.
  4. Cloud Agnostic: Designing the solution to be compatible with any cloud environment.
  5. Historical Run Maintenance: Incorporating a persisted store (PostgreSQL) for tracking and evaluating historical job runs.
  6. Spark Compatibility: Supporting Spark jobs for enhanced processing capabilities.
  7. Empowering SQL Users: Enabling SQL users to create data pipelines efficiently.
  8. Versatile Data Sources: Supporting various data sources, including ODBC-supported databases, S3, SFTP, and email (as a target only).
  9. Notification System: Implementing a robust notification system for managing job notifications.
  10. Ad Hoc Data Requests: Facilitating ad hoc data requests in CSV format for offline analysis and outbound calling.
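
To make the configuration-driven idea concrete, here is a minimal sketch of how a job defined purely as configuration plus SQL might be executed and recorded in a run-history store. It is an illustration under stated assumptions, not DRIFT's actual implementation: the job dictionary stands in for what a YAML file would parse into, SQLite stands in for PostgreSQL so the example runs anywhere, and the names JOB, run_job, and job_runs are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical job definition, shaped like the dict a YAML file would parse into.
JOB = {
    "name": "daily_order_totals",
    "sql": (
        "SELECT customer, SUM(amount) AS total "
        "FROM orders GROUP BY customer ORDER BY customer"
    ),
}


def run_job(conn, job):
    """Run the job's SQL and record the outcome in a run-history table."""
    # Assumed history table; SQLite is only a stand-in for PostgreSQL here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_runs ("
        "job_name TEXT, started_at TEXT, status TEXT, row_count INTEGER)"
    )
    started_at = datetime.now(timezone.utc).isoformat()
    try:
        rows = conn.execute(job["sql"]).fetchall()
        status = "SUCCESS"
    except sqlite3.Error:
        rows = []
        status = "FAILED"
    # Every run is logged, whether it succeeded or failed.
    conn.execute(
        "INSERT INTO job_runs VALUES (?, ?, ?, ?)",
        (job["name"], started_at, status, len(rows)),
    )
    conn.commit()
    return rows


# Demo data so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10.0), ("a", 5.0), ("b", 7.5)],
)
result = run_job(conn, JOB)
history = conn.execute(
    "SELECT job_name, status, row_count FROM job_runs"
).fetchall()
```

The point of the sketch is that a SQL user only edits the job definition; the runner and the history table stay the same for every job.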

Initial Goals for Version 1

For the initial release of DRIFT Accelerators, our focus was on delivering key functionalities:

  1. Metadata tables for tracking job progress and data lineage.
  2. Support for ELT workloads and S3 as a data source.
  3. DAG orchestration for workflow management.
  4. Support for ad hoc emails with CSV data attachments.
  5. Data loading onto PostgreSQL and AWS Redshift for data warehousing.
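
As a rough illustration of the DAG orchestration goal, the sketch below orders a set of dependent jobs using Python's standard-library graphlib module. The job names and the execution_batches helper are hypothetical, and DRIFT's own orchestrator may work quite differently; the sketch only shows the general technique of resolving job dependencies into an execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps to the set of jobs it depends on.
DEPENDENCIES = {
    "load_s3_orders": set(),
    "load_s3_customers": set(),
    "transform_orders": {"load_s3_orders", "load_s3_customers"},
    "load_redshift": {"transform_orders"},
    "email_csv_report": {"load_redshift"},
}


def execution_batches(dependencies):
    """Group jobs into batches; jobs in one batch do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        # Jobs whose dependencies have all completed are ready to run.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(DEPENDENCIES)
```

Here the two S3 loads land in the first batch and could run in parallel, followed by the transformation, the Redshift load, and the email report in that order.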

Design Considerations

In designing DRIFT Accelerators, we paid careful attention to the following:

  1. YAML Structure: A structured YAML format for defining inputs, outputs, job options, and notifications.
  2. Centralized Credentials: Storing credentials centrally in AWS Secrets Manager for enhanced security.
  3. Customizable Logging: Providing flexibility in defining log levels at the job level.
  4. Grouping Jobs: Introducing the concept of job groups for better organization.
  5. Load Types: Supporting various load types, including transformations and updates.
  6. Data Modelling: Implementing robust data modelling for job-related tables, including job master, snapshot master, and job logs.
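
The considerations above can be sketched as a small validation step over a parsed job definition. This is a hedged illustration only: the field names (group, options, log_level, load_type, secret), the allowed values, and the JobConfig and parse_job names are assumptions made for the example, not DRIFT's actual schema. The input dictionary represents what a YAML job file might look like once parsed, and credentials appear only as a reference to a centrally stored secret rather than as inline values.

```python
from dataclasses import dataclass, field

# Assumed allowed values; the real set of log levels and load types may differ.
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}
VALID_LOAD_TYPES = {"transform", "update"}


@dataclass
class JobConfig:
    name: str
    group: str
    load_type: str
    inputs: list
    outputs: list
    log_level: str = "INFO"
    notifications: dict = field(default_factory=dict)


def parse_job(raw):
    """Validate a parsed job definition and return a typed JobConfig."""
    options = raw.get("options", {})
    job = JobConfig(
        name=raw["name"],
        group=raw.get("group", "default"),
        load_type=options.get("load_type", "transform"),
        inputs=raw.get("inputs", []),
        outputs=raw.get("outputs", []),
        log_level=options.get("log_level", "INFO").upper(),
        notifications=raw.get("notifications", {}),
    )
    if job.log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"unknown log level: {job.log_level}")
    if job.load_type not in VALID_LOAD_TYPES:
        raise ValueError(f"unknown load type: {job.load_type}")
    if not job.outputs:
        raise ValueError(f"job {job.name!r} defines no outputs")
    return job


# Hypothetical job definition, as a YAML file might parse into.
RAW_JOB = {
    "name": "orders_to_redshift",
    "group": "sales",
    "inputs": [{"type": "s3", "path": "s3://example-bucket/orders/"}],
    "outputs": [{"type": "redshift", "table": "orders", "secret": "example/redshift"}],
    "options": {"log_level": "debug", "load_type": "update"},
    "notifications": {"on_failure": ["data-team@example.com"]},
}
job = parse_job(RAW_JOB)
```

Validating the definition up front means a misconfigured job fails before any data is touched, which matters when many teams author jobs through configuration alone.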

Stay Tuned for Part 2

Part 2 of this series will delve deeper into the technical architecture of DRIFT Accelerators Version 1 and highlight the new features and enhancements introduced in DRIFT Version 2. Join us as we explore how BTS Organization is revolutionizing data integration and flow technology.

[End of Part 1]

[Note: Part 2 of this article series will be published soon, providing further insights into DRIFT Accelerators Version 1 and Version 2, including new features and enhancements.]

Written by

Biswajit Mukhopadhyay
Head - Data and AI