
BAJAJ TECHNOLOGY SERVICES

Website Indexers and Crons for a leading E-commerce Platform

This case study provides an in-depth look at how the indexers and crons work, highlighting their business purpose, technical details, and performance metrics.
Oct 22, 2024

Introduction

An e-commerce website uses a complex system of indexers and crons to manage product data, update the catalog, and provide a seamless user experience. The primary objective of the indexers and crons is to enhance the browsing experience for customers and keep the website up to date with the latest products and offers. They also automate repetitive tasks, making the website's operations more efficient.

Business Challenge

The e-commerce website faced a significant challenge in managing its vast product catalog and ensuring data accuracy. The website hosted:

  • 120+ product categories
  • 92,000+ product catalogues
  • 550+ brands
  • 28,000+ local sellers

It received data from various channels, including the catalog team, which created product categories and catalogues on Magento along with the respective seller-dealer mapping. The major challenge was to extract, transform, and load this large dataset into the database while accounting for the frequency of changes and the need for a scalable, stable solution.

Solution

To keep the website up to date with the latest product information, we designed a solution built around a centralized AWS S3 bucket that receives dealer details, catalog data, inventory, and dealer master data from the business team. The data arrives three times a day in the format agreed for ETL, and the daily files are processed through an automated ETL pipeline that transforms the data, loads it into the database, and then runs it through the indexers and crons listed below.
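The extract, transform, and load stages described above can be sketched as follows. This is a minimal illustration, not the platform's actual pipeline: it assumes the delivered files are CSV, the column names (`sku`, `dealer_id`, `stock`) and the `inventory` table are hypothetical, SQLite stands in for the real database, and fetching the file from S3 is left out.

```python
import csv
import io
import sqlite3


def extract(raw_csv: str) -> list[dict]:
    """Parse one delivered file (assumed here to be CSV text)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[tuple]:
    """Normalise fields and skip rows that fail basic validation."""
    clean = []
    for row in rows:
        sku = (row.get("sku") or "").strip().upper()
        dealer = (row.get("dealer_id") or "").strip()
        try:
            stock = int(row.get("stock") or "")
        except ValueError:
            continue  # non-numeric or missing stock value
        if sku and dealer and stock >= 0:
            clean.append((sku, dealer, stock))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Upsert rows so that re-processing the same file is harmless."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory ("
        "sku TEXT, dealer_id TEXT, stock INTEGER, "
        "PRIMARY KEY (sku, dealer_id))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO inventory (sku, dealer_id, stock) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Keying the load on `(sku, dealer_id)` is one way to make three deliveries a day safe to replay: a later file simply overwrites the earlier value for the same pair.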

Indexers

  • Bulk Indexer: Responsible for precooking product data, including SKU, model, scheme, dealer, and city details, and storing it in MongoDB.
  • Express Bulk Indexer: Processes incremental data, updating the catalog with new or modified products.
  • Elasticsearch Indexer: Pushes data from MongoDB and Magento DB to Elasticsearch, enabling fast and accurate search results.
  • Doc Indexer: Validates and updates offer data from CSV files, ensuring data consistency and accuracy.
  • Offer Indexer: Activates or deactivates products based on offer data, updating the catalog accordingly.
  • Configurable Product Indexer: Maps similar products based on group identifiers, enabling product variations.
  • Special Offer Indexer: Pushes special offers to MongoDB, enhancing the user experience.
  • Two-wheeler Indexer: Updates two-wheeler product data, ensuring accurate and up-to-date information.
  • Delete Offer Indexer: Deletes duplicate records from the indexer_seller_offers table.
  • Dealer Coordinate Indexer: Pushes logistic details to PM Mongo, enhancing the delivery experience.
  • Logistic Category Indexer: Updates logistic data in PM Mongo DB, ensuring accurate delivery information.
  • FAQ Indexer: Updates FAQ data in Elasticsearch, improving the user experience.
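Two of the indexers above lend themselves to a short illustration: grouping products by a shared group identifier (the Configurable Product Indexer) and dropping duplicate offer records (the Delete Offer Indexer). The sketch below works on in-memory dictionaries. The field names `group_id`, `dealer_id`, and `offer_code`, and the choice of duplicate key, are assumptions made for the example, not the platform's real schema.

```python
from collections import defaultdict


def group_variants(products: list[dict]) -> dict[str, list[str]]:
    """Map each group identifier to the SKUs that share it."""
    groups = defaultdict(list)
    for product in products:
        group_id = product.get("group_id")
        if group_id:  # products without a group are standalone
            groups[group_id].append(product["sku"])
    return dict(groups)


def dedupe_offers(offers: list[dict]) -> list[dict]:
    """Keep the first offer seen for each (sku, dealer_id, offer_code)."""
    seen = set()
    unique = []
    for offer in offers:
        key = (offer["sku"], offer["dealer_id"], offer["offer_code"])
        if key not in seen:
            seen.add(key)
            unique.append(offer)
    return unique
```

In a real table, the same deduplication would more likely run as a database query over the chosen key. The in-memory version only shows the rule being applied.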

Crons

  • Catalog Cron Magento: Creates a JSON file of the entire SKU catalog, pushing it to AWS S3.
  • Catalog Cron AEM: Creates PDP pages based on the catalog.json file, ensuring accurate and up-to-date product information.
  • Category Cron Magento: Creates a JSON file of categories, pushing it to AWS S3.
  • Category Cron AEM: Deletes and recreates PLP pages based on the category.json file.
  • Breadcrumb Cron Magento: Generates breadcrumbs for each PDP page, enhancing navigation.
  • Order Dump Cron: Generates order-related data in xlsx format for reporting purposes.
  • Catalog Product CSV Cron: Generates a complete catalog dump, including all product details.
  • One Minute Cron: Pushes order data from SFDC, ensuring timely order processing.
  • Online DP PG Push Cron: Pushes online down payment cases to SFDC, ensuring accurate payment processing.
  • PG Retry Cron: Adds a RETRY CTA to the "my order" section, enhancing the user experience.
  • Google Feed Cron: Generates a JSON file for indexing on Google, improving search engine optimization.
  • Unbxd Cron: Generates a large JSON file for UNBXD, enabling fast and accurate search results.
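As a rough illustration of the catalog and breadcrumb crons above, the sketch below derives a breadcrumb trail from a category tree and serialises a small catalog document to JSON. The category structure (`name`, `parent`), the product fields, and the output layout are hypothetical, and the upload to AWS S3 is left out.

```python
import json


def build_breadcrumb(category_id, categories: dict) -> list[str]:
    """Walk parent links from a category up to the root."""
    trail = []
    seen = set()  # guards against accidental cycles in the tree
    current = category_id
    while current is not None and current not in seen:
        seen.add(current)
        node = categories[current]
        trail.append(node["name"])
        current = node.get("parent")
    return list(reversed(trail))


def build_catalog_json(products: list[dict], categories: dict) -> str:
    """Serialise every SKU together with its breadcrumb trail."""
    entries = [
        {
            "sku": product["sku"],
            "name": product["name"],
            "breadcrumb": build_breadcrumb(product["category_id"], categories),
        }
        for product in products
    ]
    return json.dumps({"products": entries}, indent=2, sort_keys=True)
```

A downstream job, such as the one that builds PDP pages, could then read this file rather than querying the catalog directly. That is one plausible reason for the JSON hand-off described in the list.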

Specific Actions Taken

Set Up Data Sharing Process: We established a process for the business team to share data in a specific format and aligned the frequency of their deliveries with our ETL pipeline.

Aligned Business Team: We worked closely with the business team to ensure they understood the data sharing process and frequency, minimizing manual intervention and ensuring a smooth process.

Impact

Improved Data Syncing: The website's database is now synced with the latest data, ensuring accuracy and reducing manual intervention.

Increased Efficiency: Automated processing of large datasets has improved efficiency, reducing the time and effort required to update the website.

Enhanced User Experience: With the latest data available, users can now search and find products more easily, enhancing their overall experience on the website.

Business KPIs

Monthly Indexer Performance Report: Tracks the performance of each indexer, including success and failure rates.

Indexer Dashboard: Provides a detailed report of all indexers, enabling performance analysis and optimization.
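The success and failure rates tracked in the monthly report can be computed from a log of indexer runs. The sketch below assumes each run is recorded as an `(indexer_name, succeeded)` pair, which is a hypothetical format chosen for the example.

```python
from collections import Counter


def indexer_rates(runs: list[tuple[str, bool]]) -> dict[str, dict]:
    """Aggregate per-indexer run counts into success and failure rates."""
    totals = Counter()
    successes = Counter()
    for name, succeeded in runs:
        totals[name] += 1
        if succeeded:
            successes[name] += 1
    report = {}
    for name, total in totals.items():
        ok = successes[name]
        report[name] = {
            "runs": total,
            "success_rate": ok / total,
            "failure_rate": (total - ok) / total,
        }
    return report
```

For example, an indexer with three successful runs out of four would report a success rate of 0.75 and a failure rate of 0.25.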


Written by

Dhiraj Jha
Head - commerce & experience