Introduction
An e-commerce website uses a complex system of indexers and crons to manage product data, update the catalog, and provide a seamless user experience. The primary objective of the indexers and crons is to improve the browsing experience for customers and keep the website up to date with the latest products and offers. They also automate repetitive tasks, making the website's operations more efficient.
Business Challenge
The e-commerce website faced a significant challenge in managing its vast product catalog and ensuring data accuracy. The website hosted:
- 120+ product categories
- 92,000+ product catalogues
- 550+ brands
- 28,000+ local sellers
The website received data from various channels, including the catalog team, which created product categories and catalogues on Magento along with the respective seller-dealer mappings. The major challenge was to extract, transform, and load this huge dataset into the database while accounting for the frequency of changes and the need for a scalable, stable solution.
Solution
To keep the website up to date with the latest product information, we designed a solution that uses a centralized AWS S3 bucket to receive dealer details, catalog data, inventory, and dealer master data from the business team. The data arrives in an ETL-ready format three times a day, and the daily files are processed through an automated ETL pipeline that transforms the data and loads it into the database. The loaded data is then processed by the indexers and crons described below.
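The extract-transform-load flow described above can be sketched roughly as follows. This is a minimal illustration and not the production pipeline: it assumes the incoming file is a CSV with hypothetical columns sku, dealer_id, and stock, it uses an in-memory SQLite table named inventory as a stand-in for the real database, and it leaves out the download from the S3 bucket entirely.

```python
import csv
import io
import sqlite3


def transform(row):
    """Normalize one raw row; return None if it fails basic validation."""
    # Column names are assumptions made for this sketch.
    sku = (row.get("sku") or "").strip().upper()
    dealer_id = (row.get("dealer_id") or "").strip()
    if not sku or not dealer_id:
        return None
    try:
        stock = int(row.get("stock") or "")
    except ValueError:
        return None
    return (sku, dealer_id, stock)


def run_etl(csv_text, conn):
    """Extract rows from CSV text, transform them, and upsert them into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory ("
        "sku TEXT, dealer_id TEXT, stock INTEGER, "
        "PRIMARY KEY (sku, dealer_id))"
    )
    loaded, rejected = 0, 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = transform(row)
        if record is None:
            rejected += 1  # counted so bad rows can be reported back
            continue
        # A later row for the same (sku, dealer_id) replaces the earlier one.
        conn.execute("INSERT OR REPLACE INTO inventory VALUES (?, ?, ?)", record)
        loaded += 1
    conn.commit()
    return loaded, rejected
```

Keying the table on (sku, dealer_id) makes each load idempotent, so re-running the same file, or receiving an updated file later in the day, overwrites rows instead of duplicating them.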
Indexers
- Bulk Indexer: Responsible for precooking product data, including SKU, model, scheme, dealer, and city details, and storing it in MongoDB.
- Express Bulk Indexer: Processes incremental data, updating the catalog with new or modified products.
- Elasticsearch Indexer: Pushes data from MongoDB and Magento DB to Elasticsearch, enabling fast and accurate search results.
- Doc Indexer: Validates and updates offer data from CSV files, ensuring data consistency and accuracy.
- Offer Indexer: Activates or deactivates products based on offer data, updating the catalog accordingly.
- Configurable Product Indexer: Maps similar products based on group identifiers, enabling product variations.
- Special Offer Indexer: Pushes special offers to MongoDB, enhancing the user experience.
- Two-wheeler Indexer: Updates two-wheeler product data, ensuring accurate and up-to-date information.
- Delete Offer Indexer: Deletes duplicate records from the indexer_seller_offers table.
- Dealer Coordinate Indexer: Pushes logistic details to PM Mongo, enhancing the delivery experience.
- Logistic Category Indexer: Updates logistic data in PM Mongo DB, ensuring accurate delivery information.
- FAQ Indexer: Updates FAQ data in Elasticsearch, improving the user experience.
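Two of the indexers above, the Configurable Product Indexer and the Delete Offer Indexer, lend themselves to a short sketch. The code below is only an illustration under stated assumptions: the field names group_id, sku, dealer_id, and offer_id are hypothetical, the data is held in plain Python dictionaries instead of MongoDB or the indexer_seller_offers table, and "duplicate" is assumed to mean an identical (sku, dealer_id, offer_id) combination.

```python
from collections import defaultdict


def group_variants(products):
    """Map each group identifier to the SKUs that share it."""
    groups = defaultdict(list)
    for product in products:
        group_id = product.get("group_id")
        if group_id:  # products without a group identifier stay standalone
            groups[group_id].append(product["sku"])
    return dict(groups)


def drop_duplicate_offers(offers):
    """Keep the first occurrence of each (sku, dealer_id, offer_id) combination."""
    seen = set()
    unique = []
    for offer in offers:
        key = (offer["sku"], offer["dealer_id"], offer["offer_id"])
        if key not in seen:
            seen.add(key)
            unique.append(offer)
    return unique
```

In a real deployment the deduplication would more likely be done in the database itself; the in-memory version is shown only to make the rule explicit.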
Crons
- Catalog Cron Magento: Creates a JSON file of the entire SKU catalog and pushes it to AWS S3.
- Catalog Cron AEM: Creates PDP pages based on the catalog.json file, ensuring accurate and up-to-date product information.
- Category Cron Magento: Creates a JSON file of categories and pushes it to AWS S3.
- Category Cron AEM: Deletes and recreates PLP pages based on the category.json file.
- Breadcrumb Cron Magento: Generates breadcrumbs for each PDP page, enhancing navigation.
- Order Dump Cron: Generates order-related data in xlsx format for reporting purposes.
- Catalog Product CSV Cron: Generates a complete catalog dump, including all product details.
- One Minute Cron: Pushes order data from SFDC, ensuring timely order processing.
- Online DP PG Push Cron: Pushes online down payment cases to SFDC, ensuring accurate payment processing.
- PG Retry Cron: Adds a RETRY CTA to the "my order" section, enhancing the user experience.
- Google Feed Cron: Generates a JSON file for indexing on Google, improving search engine optimization.
- Unbxd Cron: Generates a large JSON file for UNBXD, enabling fast and accurate search results.
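As an illustration of what the catalog and breadcrumb crons above produce, here is a minimal sketch that builds a breadcrumb for each product and serializes the catalog to JSON. It makes several assumptions that are not in the original description: categories are stored as a dictionary keyed by ID with hypothetical name and parent_id fields, products carry a category_id, and the output layout is invented for the example. The upload to AWS S3 is not shown.

```python
import json


def build_breadcrumb(category_id, categories):
    """Walk parent links up to the root and return category names top-down."""
    trail = []
    seen = set()  # guards against accidental cycles in the category tree
    current = category_id
    while current is not None and current not in seen:
        seen.add(current)
        node = categories[current]
        trail.append(node["name"])
        current = node.get("parent_id")
    return list(reversed(trail))


def build_catalog_json(products, categories):
    """Serialize the catalog, attaching a breadcrumb to each SKU."""
    entries = [
        {
            "sku": p["sku"],
            "name": p["name"],
            "breadcrumb": build_breadcrumb(p["category_id"], categories),
        }
        for p in products
    ]
    return json.dumps({"products": entries}, sort_keys=True)
```

A downstream consumer, such as the AEM cron that builds PDP pages, could then read the file without needing access to the category tree.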
Specific Actions Taken
Set Up Data Sharing Process: We established a process for the business team to share data in a specific format, and aligned how often they send data with the schedule of our ETL pipeline.
Aligned Business Team: We worked closely with the business team to ensure they understood the data sharing process and frequency, minimizing manual intervention and ensuring a smooth process.
Impact
Improved Data Syncing: The website's database is now synced with the latest data, ensuring accuracy and reducing manual intervention.
Increased Efficiency: Automated processing of large datasets has improved efficiency, reducing the time and effort required to update the website.
Enhanced User Experience: With the latest data available, users can now search and find products more easily, enhancing their overall experience on the website.
Business KPIs
Monthly Indexer Performance Report: Tracks the performance of each indexer, including success and failure rates.
Indexer Dashboard: Provides a detailed report of all indexers, enabling performance analysis and optimization.
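The success and failure rates tracked in the performance report could be computed along these lines. This is a hedged sketch, not the actual reporting code: it assumes each indexer run is logged as a hypothetical (indexer_name, status) pair where status is either "success" or "failure".

```python
from collections import Counter


def summarize_runs(runs):
    """Return per-indexer run counts, failures, and success rate."""
    totals = Counter()
    successes = Counter()
    for indexer, status in runs:
        totals[indexer] += 1
        if status == "success":
            successes[indexer] += 1
    return {
        name: {
            "runs": totals[name],
            "failures": totals[name] - successes[name],
            "success_rate": successes[name] / totals[name],
        }
        for name in totals
    }
```

Feeding a month of run records into summarize_runs would give one row per indexer, which is the kind of figure a monthly report or dashboard can display directly.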