Deriving Insights from Adobe Data Feed

Unlock deeper user insights with Adobe Data Feed. Learn setup, processing, and how to leverage granular interaction data for custom analytics and advanced modeling.

Oct 21, 2024 | 4 min read

Why Use Adobe Data Feed?

Although Adobe Analytics provides a robust set of pre-built dashboards, Adobe Data Feed complements them by enabling:

Custom Analytics: For more granular, customized analysis.
Integration with Transactional Data: Seamlessly combine Adobe data with data warehouses for a unified view.
Analytical Modeling: Use data feeds to build advanced machine learning models.

Key Challenges in Using Adobe Data Feed

Despite its benefits, using Adobe Data Feed presents several challenges:

Data Complexity: With around 1,200 attributes, Adobe Data Feed can be difficult to navigate, even for technical users.
Integration Issues: Loading and integrating Adobe Data Feed with existing systems (like CRMs or data warehouses) requires data engineering skills.
Performance Considerations: Large datasets, often in gigabytes, demand efficient processing and retrieval strategies.
Granularity: The data is provided at the hit level and requires aggregation to the page, visit, or visitor level depending on the requirement.
Documentation and Support: Although Adobe offers documentation, understanding and applying it to derive business insights can be challenging.

Case Study: Implementation for an Online Financial Marketplace

Here's a step-by-step breakdown of our implementation process for an online financial marketplace:

Step 1: Configuring Adobe Data Feed

To set up Adobe Data Feed in Adobe Analytics:

Navigate to Admin Settings:
Adobe Analytics → Admin → Data Feeds → Add (New Data Feed)
Data Feed Configuration:
- Name: Assign a name to the data feed.
- Report Suite: Select the appropriate Adobe report suite.
- Email Notification: Provide an email for completion notifications.
- Feed Interval: Choose Hourly or Daily as per your requirements.
- Start & End Dates: Specify start and end dates or opt for continuous feed.
- Continuous Feed: Select this option to enable the feed to run indefinitely.
- Processing Delay: Set a delay for receiving the data feed files if needed.

Step 2: Destination Setup (S3/FTP/SFTP)

Delivery Method: Adobe recommends using Amazon S3 for scalable, cloud-native integration, although FTP and SFTP are also supported.
Sample Configuration for S3:
- Account Name: ExampleAccount
- Account Description: Adobe Analytics Feed
- Account Type: Amazon S3
- Access Key: XXXXXXX
- Secret Key: XXXXXXX
- S3 Path: s3://Adobe_Analytics/Lakehouse/Bronze/active/

You can also configure other attributes, such as the compression format (zip or gzip), and choose to include a manifest file (for AWS S3).

Step 3: Processing Adobe Data Feed

Given the large size of the Adobe dataset, efficient processing and storage are essential. Our data warehouse was built on Amazon Redshift, and we used AWS Glue (for ETL) and Amazon S3 (for storage) to process and store the data.

Data Processing Workflow:
- Raw Data Ingestion: Data is received in the S3 "Bronze" bucket (raw format).
- Data Transformation: Data is processed and stored in the S3 "Silver" bucket (processed format). Processing includes:
  - Converting data to Parquet format for efficient consumption.
  - Adding derived columns (e.g., UTM parameters, hit_date derived from date_time).
  - Partitioning data by hit_date.
- Querying with Redshift Spectrum: Adobe data, stored as Parquet files in S3, is accessed via Redshift Spectrum as external tables. This allows the data to be queried using SQL and easily joined with transactional data for reporting or dashboarding.
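
The derived-column and partitioning steps can be illustrated with a minimal Python sketch. It assumes date_time arrives as a "YYYY-MM-DD HH:MM:SS" string and that the Silver layer uses Hive-style partition folders (hit_date=...); the function names and the folder layout are illustrative assumptions, and the Parquet conversion itself is left out.

```python
from collections import defaultdict
from datetime import datetime


def add_hit_date(row: dict) -> dict:
    """Return a copy of the row with hit_date (YYYY-MM-DD) derived from date_time."""
    parsed = datetime.strptime(row["date_time"], "%Y-%m-%d %H:%M:%S")
    return {**row, "hit_date": parsed.date().isoformat()}


def group_by_partition(rows: list) -> dict:
    """Group rows under a Hive-style partition key such as 'hit_date=2024-10-21'."""
    partitions = defaultdict(list)
    for row in rows:
        enriched = add_hit_date(row)
        partitions[f"hit_date={enriched['hit_date']}"].append(enriched)
    return dict(partitions)


# Made-up sample rows for illustration only.
sample = [
    {"date_time": "2024-10-21 09:15:00", "pagename": "home"},
    {"date_time": "2024-10-21 23:59:59", "pagename": "loans"},
    {"date_time": "2024-10-22 00:00:01", "pagename": "cards"},
]
partitions = group_by_partition(sample)
print(sorted(partitions))
```

Partitioning on hit_date lets queries that filter on a date range skip the folders they do not need.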

Adobe Data Feed File Structure

Adobe Data Feed is delivered as a compressed archive (zip or gzip, depending on the configured compression format) containing:

Raw Data Feed File: Logs of user interactions.
Column Header File: Metadata for the raw data columns.
Dimension Files: Tables for enriching the raw data with contextual information.
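
Because the raw data file carries no header row, the column names have to be attached from the header file. The sketch below shows one way to do that in Python, assuming both files are tab-separated and already decompressed into strings; read_hits is a hypothetical helper, the sample values are made up, and escaped tabs or newlines inside values are not handled.

```python
import csv
import io


def read_hits(column_headers_text: str, hit_data_text: str) -> list:
    """Pair each tab-separated hit row with the names from the header file."""
    header_reader = csv.reader(
        io.StringIO(column_headers_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    columns = next(header_reader)  # the header file holds a single line of names
    data_reader = csv.reader(
        io.StringIO(hit_data_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [dict(zip(columns, row)) for row in data_reader if row]


# Made-up sample content standing in for the two delivered files.
column_headers = "hit_time_gmt\tpagename\texclude_hit\thit_source\n"
hit_data = "1729502100\thome\t0\t1\n1729502160\tloans\t0\t1\n"

hits = read_hits(column_headers, hit_data)
print(hits[0])
```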

Pre-Processing and Filtering

Before deriving insights, apply the following filters:

Exclude Unnecessary Hits: Keep only rows where exclude_hit = 0.
Filter Hit Sources: Exclude rows with hit_source values of 5, 7, 8, or 9, which correspond to data source uploads and legacy summary rows rather than individual user hits.
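
A minimal sketch of these two filters in Python follows. It assumes each hit is a dict of string values, as read from the tab-separated file, and that the exclusion flag lives in a column named exclude_hit (the name used in Adobe's column reference); keep_hit and filter_hits are hypothetical helper names.

```python
# hit_source values to drop, kept as strings because TSV values are read as text.
EXCLUDED_HIT_SOURCES = {"5", "7", "8", "9"}


def keep_hit(row: dict) -> bool:
    """True when the row is not flagged for exclusion and has an allowed hit source."""
    return (
        row.get("exclude_hit") == "0"
        and row.get("hit_source") not in EXCLUDED_HIT_SOURCES
    )


def filter_hits(rows: list) -> list:
    """Return only the rows that pass both filters."""
    return [row for row in rows if keep_hit(row)]


# Made-up sample rows for illustration only.
sample = [
    {"pagename": "home", "exclude_hit": "0", "hit_source": "1"},
    {"pagename": "bot", "exclude_hit": "1", "hit_source": "1"},
    {"pagename": "upload", "exclude_hit": "0", "hit_source": "5"},
]
print(filter_hits(sample))
```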

Deriving Key Insights from Adobe Data Feed

UTM Attributes:
- utm_source: Derived from the first value in the post_campaign field.
- utm_medium: Second value in post_campaign.
- utm_campaign: Third value in post_campaign.
- utm_term: Fifth value in post_campaign.
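
The positional mapping above can be sketched as follows. The delimiter inside post_campaign is not stated here, so the colon used below is purely an assumption to be replaced with whatever separator your tracking codes use; parse_utm and the sample value are hypothetical.

```python
from typing import Optional

# Zero-based positions of each UTM attribute inside post_campaign.
UTM_POSITIONS = {"utm_source": 0, "utm_medium": 1, "utm_campaign": 2, "utm_term": 4}


def parse_utm(post_campaign: str, sep: str = ":") -> dict:
    """Split post_campaign on sep (assumed delimiter) and pick values by position."""
    parts = post_campaign.split(sep) if post_campaign else []

    def value_at(index: int) -> Optional[str]:
        # Missing or empty positions are returned as None.
        return parts[index] if index < len(parts) and parts[index] else None

    return {name: value_at(index) for name, index in UTM_POSITIONS.items()}


print(parse_utm("google:cpc:spring_sale:banner:personal_loans"))
```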
Product: Extract the product name from the second field of product_list, a semicolon-delimited string.
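
As a rough illustration, the product name can be pulled out as shown below. The sketch assumes that a hit with several products separates them with commas, following Adobe's product string syntax; product_names and the sample values are hypothetical.

```python
def product_names(product_list: str) -> list:
    """Return the second semicolon-delimited field of every product entry."""
    names = []
    for entry in product_list.split(","):  # assumed: one entry per product
        fields = entry.split(";")
        if len(fields) > 1 and fields[1]:
            names.append(fields[1])
    return names


print(product_names(";Home Loan;1;0,;Credit Card;1;0"))
```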
Event List: Split the comma-separated event_list value into individual event IDs, then join each ID with the event dimension (lookup) table to obtain the corresponding event name.
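
A small sketch of that lookup in Python is shown below. It assumes event_list is a comma-separated string, that the event lookup file has one "id, tab, name" pair per line, and that an ID may carry an "=value" suffix, which is dropped here; the helper names and the sample lookup rows are made up for illustration.

```python
def load_event_lookup(event_tsv_text: str) -> dict:
    """Build an id -> name mapping from tab-separated lookup text."""
    lookup = {}
    for line in event_tsv_text.splitlines():
        if not line.strip():
            continue
        event_id, _, name = line.partition("\t")
        lookup[event_id] = name
    return lookup


def event_names(event_list: str, lookup: dict) -> list:
    """Translate each event ID in event_list to its name, keeping unknown IDs visible."""
    names = []
    for token in event_list.split(","):
        event_id = token.split("=", 1)[0].strip()  # drop any "=value" suffix
        if event_id:
            names.append(lookup.get(event_id, f"unknown({event_id})"))
    return names


# Made-up lookup rows for illustration only.
lookup = load_event_lookup("1\tPurchase\n2\tProduct View\n200\tCustom Event 1\n")
print(event_names("2,200=5,999", lookup))
```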
Unique Visit and Visitor Count: Concatenate post_visid_high, post_visid_low, visit_num, and visit_start_time_gmt to generate a unique visit ID, which allows sessions to be counted accurately. For the visitor count, concatenate post_visid_high and post_visid_low only.
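
The ID construction can be sketched in a few lines of Python. The underscore separator and the helper names are arbitrary choices made for this example, and the sample hits are made up.

```python
VISITOR_KEYS = ("post_visid_high", "post_visid_low")
VISIT_KEYS = VISITOR_KEYS + ("visit_num", "visit_start_time_gmt")


def make_id(hit: dict, keys: tuple) -> str:
    """Concatenate the given columns into a single identifier string."""
    return "_".join(str(hit[key]) for key in keys)


def count_visits_and_visitors(hits: list) -> tuple:
    """Return (unique visits, unique visitors) for a list of hit dicts."""
    visits = {make_id(hit, VISIT_KEYS) for hit in hits}
    visitors = {make_id(hit, VISITOR_KEYS) for hit in hits}
    return len(visits), len(visitors)


# Made-up hits: visitor A has two visits (the first with two hits), visitor B has one.
sample = [
    {"post_visid_high": "11", "post_visid_low": "22", "visit_num": "1", "visit_start_time_gmt": "1729502100"},
    {"post_visid_high": "11", "post_visid_low": "22", "visit_num": "1", "visit_start_time_gmt": "1729502100"},
    {"post_visid_high": "11", "post_visid_low": "22", "visit_num": "2", "visit_start_time_gmt": "1729588500"},
    {"post_visid_high": "33", "post_visid_low": "44", "visit_num": "1", "visit_start_time_gmt": "1729502200"},
]
print(count_visits_and_visitors(sample))
```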
Time Spent on Pages: Calculate the time spent on each page by subtracting the timestamp of the first hit on the current page from the timestamp of the first hit on the next page within the same visit, using a SQL window function such as LEAD.
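
To make the window-function approach concrete, the sketch below runs the query against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). It is an illustration rather than our production Redshift query: the hits table, the precomputed visit_id column, and the sample rows are assumptions, with hit_time_gmt taken to be a Unix timestamp in seconds.

```python
import sqlite3

QUERY = """
WITH marked AS (
    SELECT visit_id, pagename, hit_time_gmt,
           LAG(pagename) OVER (PARTITION BY visit_id ORDER BY hit_time_gmt) AS prev_page
    FROM hits
),
page_starts AS (
    -- keep only the first hit of each consecutive run on the same page
    SELECT visit_id, pagename, hit_time_gmt
    FROM marked
    WHERE prev_page IS NULL OR prev_page <> pagename
)
SELECT visit_id, pagename, hit_time_gmt AS page_start,
       LEAD(hit_time_gmt) OVER (PARTITION BY visit_id ORDER BY hit_time_gmt)
           - hit_time_gmt AS seconds_on_page
FROM page_starts
ORDER BY visit_id, page_start
"""


def time_on_pages(hits: list) -> list:
    """Return (visit_id, pagename, page_start, seconds_on_page) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (visit_id TEXT, pagename TEXT, hit_time_gmt INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", hits)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


# Made-up hits for a single visit.
sample = [
    ("v1", "home", 100),
    ("v1", "home", 110),
    ("v1", "products", 130),
    ("v1", "checkout", 190),
]
for row in time_on_pages(sample):
    print(row)
```

The last page of a visit has no following hit, so its duration comes back as NULL (None in Python) and has to be handled separately in reporting.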

Summary

While using Adobe Data Feed to complement Adobe Analytics can be complex, it provides valuable insights into user behavior, such as visit patterns and time spent on pages, which help infer user intent. By following the steps outlined in this blog, businesses can leverage Adobe Data Feed for deeper, more granular analysis.

Written by

Biswajit Mukhopadhyay

Head - Data and AI

https://www.linkedin.com/in/biswajit-mukhopadhyay-ba173826/
