Artificial Intelligence, Machine Learning and Data Science Hubspot

Unlock the Power of Artificial Intelligence, Machine Learning, and Data Science with our Blog. Discover the latest insights, trends, and innovations in Artificial Intelligence (AI), Machine Learning (ML), and Data Science through our informative and engaging Hubspot blog. Gain a deep understanding of how these transformative technologies are shaping industries and revolutionizing the way we work. Stay updated with cutting-edge advancements, practical applications, and real-world use.

Monday 5 August 2024

Python code for Multi-Touch Attribution (MTA) & Marketing Mix Models (MMM) based on Google Analytics 4 (GA4) BigQuery data

To implement Multi-Touch Attribution (MTA) and Marketing Mix Models (MMM) on Google Analytics 4 (GA4) data, you first need to query that data from BigQuery. Below are simplified Python examples of both MTA and MMM. They assume you have access to BigQuery and that your GA4 export is stored there.

Pre-requisites

Google Cloud Account: Ensure you have access to Google Cloud and BigQuery.
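Google Cloud SDK: Install and configure the Google Cloud SDK, and authenticate (for example with gcloud auth application-default login) so that bigquery.Client() can find your credentials.
Python Libraries: Install the required Python libraries (google-cloud-bigquery, db-dtypes, pandas, numpy, scikit-learn, statsmodels, matplotlib). Recent google-cloud-bigquery releases need db-dtypes for to_dataframe(), and statsmodels is used in the second half of this post.

bash
pip install google-cloud-bigquery db-dtypes pandas numpy scikit-learn statsmodels matplotlib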

Step 1: Set Up BigQuery Client

python
from google.cloud import bigquery
import pandas as pd

# Set up BigQuery client
client = bigquery.Client()

# Query GA4 BigQuery data
def query_bigquery(query):
    query_job = client.query(query)
    return query_job.result().to_dataframe()

# Example GA4 query to get event data
query = """
SELECT
    event_date,
    event_name,
    traffic_source.source AS source,
    traffic_source.medium AS medium,
    traffic_source.name AS campaign,  -- the GA4 export stores the campaign name in traffic_source.name
    COUNT(*) AS events
FROM
    `your_project.your_dataset.your_table`
WHERE
    event_name IN ('page_view', 'purchase')
GROUP BY
    event_date, event_name, source, medium, campaign
ORDER BY
    event_date
"""

df = query_bigquery(query)
print(df.head())

Step 2: Multi-Touch Attribution (MTA) using Python

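For simplicity, we'll use a linear attribution model, which assigns equal credit to each touchpoint in the conversion path. The query in Step 1 is aggregated by day rather than by user, so the code treats every source/medium/campaign combination seen on a given day as one touchpoint and splits that day's purchases equally among them. Treat the result as a rough proxy: true linear attribution needs user-level paths.

python
def linear_attribution(df):
    keys = ['source', 'medium', 'campaign']

    # Total conversions (purchase events) per day
    conversions = (
        df[df['event_name'] == 'purchase']
        .groupby('event_date', as_index=False)['events'].sum()
        .rename(columns={'events': 'conversions'})
    )

    # Touchpoints: every non-purchase row, one per day and channel combination
    touchpoints = df[df['event_name'] != 'purchase'].copy()
    # groupby drops rows with missing keys, so label them explicitly
    touchpoints[keys] = touchpoints[keys].fillna('(not set)')

    # Join each touchpoint with the conversions recorded on the same day
    attribution = touchpoints.merge(conversions, on='event_date')

    # Assign equal credit: split the day's conversions across its touchpoints
    touchpoints_per_day = attribution.groupby('event_date')['events'].transform('count')
    attribution['credit'] = attribution['conversions'] / touchpoints_per_day

    # Aggregate credit by source, medium, and campaign
    result = attribution.groupby(keys)['credit'].sum().reset_index()
    return result

attribution_results = linear_attribution(df)
print(attribution_results)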

Step 3: Marketing Mix Models (MMM) using Python

For MMM, we'll use a simple linear regression model to estimate the impact of different marketing channels on conversions.

python
from sklearn.linear_model import LinearRegression

def marketing_mix_model(df):
    # Model conversions only: mixing page_view and purchase counts
    # in one target would blur what the coefficients mean
    data = df[df['event_name'] == 'purchase'].copy()

    # Prepare data
    X = data[['source', 'medium', 'campaign']]
    y = data['events']

    # Convert categorical variables to dummy/indicator variables
    X = pd.get_dummies(X, drop_first=True, dtype=float)

    # Linear Regression
    model = LinearRegression()
    model.fit(X, y)

    # Predictions (written to the copy, so the input df is left unchanged)
    data['predicted_events'] = model.predict(X)

    # Print coefficients
    coefficients = pd.DataFrame(model.coef_, X.columns, columns=['Coefficient'])
    print("Coefficients:\n", coefficients)

    return data

mmm_results = marketing_mix_model(df)
print(mmm_results.head())

Notes:

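Data Preparation: Your GA4 export may differ from these examples. GA4 writes one events_YYYYMMDD table per day, and the traffic_source record describes the source that first acquired the user rather than the source of each session. Adjust the SQL queries and data preparation steps to your data structure and to the attribution question you are asking.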
Advanced Attribution: The linear attribution model is simple. Advanced models like time decay, position-based, or custom models might be more appropriate for specific needs.
MMM Complexity: Marketing Mix Models can be complex, often requiring additional features and more sophisticated modeling techniques, including regularization and cross-validation.
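
The time decay and position-based models mentioned in the notes can be sketched for a single conversion path as follows. This is a minimal illustration, not a standard implementation: the function names, the 7-day half-life, and the 40/20/40 split are assumptions chosen for the example, and the path is made-up data.

```python
from collections import defaultdict

def time_decay_credit(path, half_life_days=7.0):
    """path: list of (channel, days_before_conversion) tuples.
    A touchpoint's weight halves every `half_life_days`; weights are
    normalized so the credit for one conversion sums to 1."""
    weights = [0.5 ** (days / half_life_days) for _, days in path]
    total = sum(weights)
    credit = defaultdict(float)
    for (channel, _), weight in zip(path, weights):
        credit[channel] += weight / total
    return dict(credit)

def position_based_credit(path, first=0.4, last=0.4):
    """path: ordered list of channels, earliest first.
    The first and last touchpoints get fixed shares; the remainder
    is split equally among the middle touchpoints."""
    n = len(path)
    credit = defaultdict(float)
    if n == 0:
        return {}
    if n == 1:
        credit[path[0]] = 1.0
    elif n == 2:
        # No middle touchpoints: rescale the two fixed shares to sum to 1
        credit[path[0]] += first / (first + last)
        credit[path[1]] += last / (first + last)
    else:
        middle_share = (1.0 - first - last) / (n - 2)
        credit[path[0]] += first
        credit[path[-1]] += last
        for channel in path[1:-1]:
            credit[channel] += middle_share
    return dict(credit)

# Made-up path: email 14 days before the purchase, paid search 7 days before,
# and a direct visit on the day of the purchase
example_path = [("email", 14), ("paid_search", 7), ("direct", 0)]
print(time_decay_credit(example_path))
print(position_based_credit([channel for channel, _ in example_path]))
```

Both functions return credit for one conversion; summing the results over all converting paths gives channel-level totals.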

Further Steps

Validation and Testing: Always validate your models with a separate test dataset.
Feature Engineering: Enhance models with more features and interactions.
Visualization: Use libraries like matplotlib or seaborn to visualize results.
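
As one possible way to combine validation with regularization, the sketch below fits a ridge regression and scores it with time-ordered cross-validation, so each fold trains on earlier weeks and tests on later ones. The data is synthetic and the three channels, their effects, and the alpha values are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic weekly data standing in for real spend and conversions (assumption)
rng = np.random.default_rng(0)
n_weeks = 104
spend = rng.uniform(0, 100, size=(n_weeks, 3))  # three hypothetical channels
true_effect = np.array([0.5, 0.2, 0.0])
conversions = 50 + spend @ true_effect + rng.normal(0, 5, n_weeks)

# TimeSeriesSplit never lets a fold train on data that comes after its test weeks
cv = TimeSeriesSplit(n_splits=5)
for alpha in (0.1, 10.0, 1000.0):
    scores = cross_val_score(Ridge(alpha=alpha), spend, conversions, cv=cv, scoring="r2")
    print(f"alpha={alpha}: mean out-of-sample R^2 = {scores.mean():.3f}")

# Refit on all weeks with one of the candidate alphas and inspect the coefficients
model = Ridge(alpha=10.0).fit(spend, conversions)
print("coefficients:", model.coef_.round(3))
```

A shuffled train/test split would leak future weeks into training, which is why a time-ordered splitter is used here.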

Adjust and extend these examples as needed for your specific use case and data.

-------------------------------

Building a Robust MTA and MMM Framework Using GA4 and BigQuery

Understanding the Data

Before diving into code, it's crucial to understand the structure of your GA4 data in BigQuery. This typically involves:

Event Data: Detailed information about user interactions with your website or app.
User Properties: Attributes associated with users (e.g., demographics).
User Identifiers: Unique identifiers for users across platforms.

Data Extraction and Preparation

Python
import pandas as pd
from google.cloud import bigquery

# Replace with your project ID and dataset name
client = bigquery.Client(project='your-project-id')

# Sample query to extract event data
query = """
SELECT
  event_timestamp,
  event_name,
  user_id,          -- only populated if your site sets a User-ID
  user_pseudo_id,   -- device-level identifier, available without User-ID
  event_params
FROM
  `your-project-id.your-dataset.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
"""

# Execute the query
df = client.query(query).to_dataframe()

Data Transformation and Feature Engineering

Event Parsing: Extract relevant information from the event_params column.
User Journey Creation: Create user journeys based on event sequences.
Feature Engineering: Create features like time spent on site, number of pageviews, and custom metrics.
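
The three steps above might look like the following sketch. It assumes event_params arrives as a list of key/value records, as in the GA4 export, and that event_timestamp is in microseconds. It groups on user_id because that is a column the query selects; if user_id is empty in your export, group on another user key such as user_pseudo_id. The helper names and the toy rows are hypothetical.

```python
import pandas as pd

VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")

def get_param(params, key):
    """Return the first non-null typed value stored under `key`, or None."""
    for param in params:
        if param["key"] == key:
            value = param["value"]
            for field in VALUE_FIELDS:
                if value.get(field) is not None:
                    return value[field]
    return None

def page(url):
    # Builds one event_params entry for the toy data below
    return [{"key": "page_location", "value": {"string_value": url, "int_value": None}}]

# Toy rows shaped like the query result (illustrative data only)
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u1"],
    "event_timestamp": [61_000_000, 1_000_000, 5_000_000, 31_000_000],
    "event_name": ["purchase", "page_view", "page_view", "page_view"],
    "event_params": [page("/checkout"), page("/home"), page("/home"), page("/pricing")],
})

# Event parsing: pull one parameter out of the nested column
events["page_location"] = events["event_params"].apply(
    lambda params: get_param(params, "page_location")
)

# User journey creation: order each user's events in time
ordered = events.sort_values(["user_id", "event_timestamp"])
grouped = ordered.groupby("user_id")

# Feature engineering: path, pageview count, and time between first and last event
journeys = pd.DataFrame({
    "path": grouped["event_name"].agg(list),
    "pageviews": grouped["event_name"].agg(lambda s: int((s == "page_view").sum())),
    "duration_s": (grouped["event_timestamp"].max() - grouped["event_timestamp"].min()) / 1_000_000,
})
print(journeys)
```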

MTA Model Implementation

Rule-Based Attribution: Assign credit based on predefined rules (e.g., last click, first click, linear).
Data-Driven Attribution: Use statistical models (e.g., Markov chains, probabilistic models) to assign credit based on user behavior.
Machine Learning Attribution: Employ machine learning algorithms (e.g., random forest, gradient boosting) to learn complex patterns in user journeys.
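
To make the data-driven option more concrete, here is a minimal sketch of first-order Markov chain attribution using the removal effect: estimate transition probabilities from observed paths, then measure how much the overall conversion probability drops when a channel is removed. The function names, state labels, and sample paths are assumptions for illustration, and production implementations usually add higher-order chains and more careful handling of sparse data.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(paths):
    """paths: list of (channel_list, converted_bool) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = [START, *channels, CONV if converted else NULL]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for state, following in counts.items()
    }

def conversion_prob(probs, removed=None, iters=1000, tol=1e-12):
    """Probability of reaching CONV from START, found by fixed-point iteration.
    A removed channel keeps probability 0, so paths through it never convert."""
    p = defaultdict(float)
    p[CONV] = 1.0
    for _ in range(iters):
        delta = 0.0
        for state, following in probs.items():
            if state == removed:
                continue
            new = sum(weight * p[nxt] for nxt, weight in following.items())
            delta = max(delta, abs(new - p[state]))
            p[state] = new
        if delta < tol:
            break
    return p[START]

def markov_attribution(paths):
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = {channel for chans, _ in paths for channel in chans}
    if base == 0:
        return {channel: 0.0 for channel in channels}
    # Removal effect: share of conversions lost when the channel disappears
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in paths if converted)
    return {c: conversions * effect / total_effect for c, effect in effects.items()}

# Made-up journeys: (ordered channels, did the journey convert?)
sample_paths = [
    (["search", "email"], True),
    (["search"], False),
    (["email"], True),
    (["social", "search"], False),
]
print(markov_attribution(sample_paths))
```

The credited conversions sum to the number of converting paths, so the output can be compared directly with rule-based results.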

MMM Model Implementation

Data Aggregation: Aggregate data to the channel or campaign level.
Model Selection: Choose appropriate statistical models (e.g., time series, regression) based on data and business objectives.
Model Training: Train the model on historical data to estimate the impact of marketing channels on sales or conversions.
Evaluation: Assess model performance using metrics like R-squared, RMSE, and lift.
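
The aggregation, training, and evaluation steps can be sketched end to end as below. Everything here is an assumption for illustration: the spend and sales figures are synthetic, the two channel names are hypothetical, and ordinary least squares via NumPy stands in for whichever model you select. R-squared and RMSE are computed on a held-out final 30 days.

```python
import numpy as np
import pandas as pd

# Synthetic long-format spend data: one row per day and channel (assumption)
rng = np.random.default_rng(42)
dates = pd.date_range("2024-01-01", periods=120, freq="D")
long_df = pd.DataFrame({
    "date": dates.repeat(2),
    "channel": ["search", "social"] * len(dates),
    "spend": rng.uniform(10, 100, size=2 * len(dates)),
})

# Data aggregation: one row per day, one spend column per channel
wide = long_df.pivot_table(
    index="date", columns="channel", values="spend", aggfunc="sum"
).fillna(0.0)

# Synthetic sales with known channel effects plus noise
noise = rng.normal(0, 20, size=len(wide))
sales = 200 + 3.0 * wide["search"].to_numpy() + 1.0 * wide["social"].to_numpy() + noise

# Model training: least squares with an intercept, fitted on the first 90 days
X = np.column_stack([np.ones(len(wide)), wide.to_numpy()])
split = 90
coef, *_ = np.linalg.lstsq(X[:split], sales[:split], rcond=None)

# Evaluation: RMSE and R-squared on the held-out last 30 days
pred = X[split:] @ coef
resid = sales[split:] - pred
rmse = float(np.sqrt(np.mean(resid ** 2)))
ss_total = float(np.sum((sales[split:] - sales[split:].mean()) ** 2))
r2 = 1.0 - float(np.sum(resid ** 2)) / ss_total

print(dict(zip(["intercept", *wide.columns], coef.round(2))))
print(f"holdout RMSE = {rmse:.1f}, holdout R^2 = {r2:.3f}")
```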

Code Example (Simplified)

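Python
import statsmodels.api as sm

# Assuming you have prepared dataframes for MTA and MMM
# ...

# MTA example (rule-based)
def last_click_attribution(df):
  # Assign credit to the last touchpoint
  # ...
  raise NotImplementedError("Fill in for your own journey data")

# MMM example (linear regression)
# List every channel spend column in your prepared dataframe
spend_columns = ['channel1_spend', 'channel2_spend']
X = sm.add_constant(df[spend_columns])  # intercept captures baseline sales
y = df['sales']
model = sm.OLS(y, X).fit()
print(model.summary())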
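
One way the last-click rule could be filled in is sketched below. It assumes two prepared frames that are not defined in this post: touchpoints with user_id, event_timestamp, and channel columns, and conversions with user_id and event_timestamp. The function name and the sample rows are hypothetical. pandas.merge_asof matches each conversion to the same user's most recent touchpoint at or before it.

```python
import pandas as pd

def last_click_attribution_sketch(touchpoints, conversions):
    # merge_asof requires both frames to be sorted by the "on" column
    touch = touchpoints.sort_values("event_timestamp")
    conv = conversions.sort_values("event_timestamp")

    # For each conversion, take the user's latest touchpoint at or before it
    matched = pd.merge_asof(
        conv, touch, on="event_timestamp", by="user_id", direction="backward"
    )

    # Conversions with no earlier touchpoint get an explicit label
    matched["channel"] = matched["channel"].fillna("(unattributed)")
    return matched.groupby("channel").size().reset_index(name="conversions")

# Made-up data: u3 converts before any recorded touchpoint
touchpoints = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "event_timestamp": [1, 5, 2, 10],
    "channel": ["email", "search", "social", "email"],
})
conversions = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "event_timestamp": [6, 3, 4],
})
print(last_click_attribution_sketch(touchpoints, conversions))
```

Switching direction to "forward" would not give first-click attribution; for that, take each user's earliest touchpoint before the conversion instead.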

Additional Considerations

Data Quality: Ensure data accuracy and completeness for reliable results.
Experimentation: Test different models and parameters to find the optimal approach.
Visualization: Use visualization tools to understand model outputs and communicate insights effectively.
Integration with Business Tools: Integrate attribution models with marketing and sales tools for actionable insights.

Challenges and Limitations

Data Privacy: Handle user data responsibly and comply with privacy regulations.
Model Complexity: Building sophisticated attribution models requires expertise in statistics and machine learning.
Causality vs. Correlation: Attribution models can identify correlations but not necessarily causation.
Data Availability: Sufficient data is essential for accurate modeling.

By following these steps and addressing the challenges, you can build a robust MTA and MMM framework to optimize your marketing efforts.


-------------------------------------------