BrightFunnel Files Patent on Dynamic Attribution Models

We currently offer fully customizable static attribution models within the BrightFunnel platform. We are excited to announce that we have also filed a patent (“Dynamic Attribution Models” by Nisheeth Ranjan, Ranjan Bagchi, and Nadim Hossain) on how to create dynamic attribution models. Before we dive into static and dynamic models, let’s define the attribution problem.

The Attribution Problem

B2B marketing teams everywhere are familiar with the revenue attribution problem: How do you credit revenue back to the marketing campaigns that influenced a deal? Similarly, the pipeline attribution problem is: How do you credit pipeline dollars back to marketing campaigns that influenced an opportunity?

Both attribution problems are important to solve because we need to measure the revenue and pipeline impact of a campaign before we can calculate an ROI (return on investment) for the campaign.

Static Attribution Models

BrightFunnel’s marketing analytics platform solves the attribution problem by calculating multiple customizable attribution models (or rules for crediting revenue or pipeline dollars back to campaigns) in parallel many times a day. Four examples of attribution models used by our customers are:

  1. First Touch: Attribute 100% of the revenue to the first campaign that influenced the opportunity.
  2. Last Touch: Attribute 100% of the revenue to the last campaign that influenced the opportunity.
  3. Evenly Weighted: Attribute 1/N of the revenue to each campaign that influenced the opportunity, where N is the total number of influencing campaigns.
  4. X-Y-X: Attribute X% of the revenue to the first campaign, X% to the last campaign, and split the remaining Y% over all the influencing campaigns between the first and the last. The constraint that needs to be satisfied in this model is that (X + Y + X) should equal 100.

All four models above are static attribution models because the rules of attribution are spelled out in advance.
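
To make these rules concrete, here is a minimal sketch in Python of how the four models could assign revenue shares to an ordered list of campaign touches. The function name and the even split of the middle Y% in the X-Y-X model are our own illustrative choices, not BrightFunnel’s actual implementation:

```python
def attribution_weights(n, model="first_touch", x=40):
    """Return each campaign's share of an opportunity's revenue (shares sum
    to 1.0) for the n campaigns that touched it, in chronological order."""
    if n == 1:
        return [1.0]
    if model == "first_touch":
        return [1.0] + [0.0] * (n - 1)
    if model == "last_touch":
        return [0.0] * (n - 1) + [1.0]
    if model == "evenly_weighted":
        return [1.0 / n] * n
    if model == "x_y_x":
        # X% to the first and last touches; the remaining Y% = 100 - 2X is
        # split (here, evenly) over the middle touches. Assumes n >= 3.
        middle = (100 - 2 * x) / 100 / (n - 2)
        return [x / 100] + [middle] * (n - 2) + [x / 100]
    raise ValueError("unknown model: " + model)

# Example: 4 touches under X-Y-X with X = 40 -> [0.4, 0.1, 0.1, 0.4]
print(attribution_weights(4, model="x_y_x", x=40))
```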

In dynamic attribution models, the rules of attribution are learned automatically (via machine learning approaches) and adjusted constantly.

Dynamic Attribution Models

For a BrightFunnel customer, the main steps to follow in order to create a dynamic attribution model are:

  1. Divide up the customer’s past data into a training set and a test set.
  2. Start with a default attribution model and use it to credit pipeline/revenue to campaigns in the training set.
  3. Use the training set (which now links campaigns to pipeline/revenue) to calculate metrics like lead-to-opportunity conversion rate (LTO%), opportunity-to-deal conversion rate (OTD%), lead-to-opportunity velocity (LTOV), opportunity-to-deal velocity (OTDV), average opportunity amount (OA), and average deal amount (DA) for each campaign group.
  4. Calculate actual revenue (AR) generated by the leads in the test set.
  5. Use the metrics in Step 3 to calculate predicted revenue (PR) generated by the leads in the test set.
  6. Calculate the test set error as the absolute value of the difference between AR (calculated in step 4) and PR (calculated in Step 5).
  7. Change the attribution model parameters and repeat steps 2 through 6 above until the test set error is minimized.
  8. The attribution model that minimizes the test set error is the dynamic attribution model personalized to the customer.
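
Here is a minimal sketch of that search loop in Python. The helper functions passed in (credit_revenue, campaign_metrics, predict_revenue) are hypothetical stand-ins for steps 2, 3, and 5, and we assume each test lead records the revenue it eventually generated; this is not BrightFunnel’s actual code:

```python
def find_dynamic_model(training_set, test_set, candidate_models,
                       credit_revenue, campaign_metrics, predict_revenue):
    """Search candidate attribution models and return the one that
    minimizes the test set error (steps 2 through 8 above). The three
    function arguments are hypothetical stand-ins for the crediting,
    metric-calculation, and prediction steps."""
    # Step 4: actual revenue, assuming each test lead exposes the revenue
    # it eventually generated.
    actual_revenue = sum(lead.revenue for lead in test_set)
    best_model, best_error = None, float("inf")
    for model in candidate_models:                              # Step 7: vary parameters
        credited = credit_revenue(training_set, model)          # Step 2
        metrics = campaign_metrics(credited)                    # Step 3: LTO%, OTD%, ...
        predicted_revenue = predict_revenue(test_set, metrics)  # Step 5
        error = abs(actual_revenue - predicted_revenue)         # Step 6
        if error < best_error:
            best_model, best_error = model, error
    return best_model                                           # Step 8
```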

The above process automatically selects an attribution model that best predicts revenue based on the past data of a particular customer. As time passes and more data is generated by the customer, the above process can be re-run periodically to continually optimize the model.

The dynamic attribution model approach creates a custom attribution strategy tailored to a particular customer’s past market behavior. The model is automatically fine-tuned and improved as the BrightFunnel platform ingests more data from the customer over time. The test set error is a key metric that shows the customer how the attribution model is improving.

Conclusion

There is an exciting roadmap ahead for BrightFunnel’s attribution platform. We already have the capability to calculate multiple complex customizable static attribution models many times a day. Soon, we will make our attribution platform even more powerful and differentiated by adding dynamic attribution modeling.

Improved Prediction of Marketing Metrics via Machine Learning

This is an engineering blog post written for engineers and technical managers interested in applying machine learning to improve predictive analytics within their organizations.  Connect with the author: @n_ranjan or LinkedIn.

Summary:

We used regularized linear regression to learn parameters for a prediction model that takes 11 lead characteristics as input and outputs the lead-to-opportunity velocity (LTOV: the number of days a lead takes to convert to an opportunity). Our model predicted LTOV values with 23.64% less error than the “average method” described in more detail below.

Context:

A key question we are trying to answer for marketing departments is: given a set of leads, how much revenue will result from them and how will it be distributed across future months?  For example, here is how we forecast how much revenue will result in April 2014 from webinar leads created in February 2014 (similarly, we can also forecast how much revenue will result from the February leads in March or May):

R = L * LTO% * OTD% * DA, where

R = Revenue in April 2014 from Feb 2014 webinar Leads

L = Number of webinar Leads created in Feb 2014 such that (Created Date + LTOV + OTDV) is a date in April 2014

LTOV = Average number of days (Velocity) for webinar Leads to convert to Opportunities

OTDV = Average number of days (Velocity) for webinar Opportunities to convert to Deals

LTO% = Lead to Opportunity conversion rate for webinar Leads

OTD% = Opportunity to Deal conversion rate for webinar Opportunities

DA = Average webinar Deal Amount
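
As a worked example with made-up numbers (purely illustrative, not real customer data), the formula composes like this:

```python
# Illustrative numbers only, not real customer data.
L   = 200      # webinar leads from Feb 2014 whose (Created Date + LTOV + OTDV) falls in April 2014
lto = 0.10     # LTO%: lead-to-opportunity conversion rate for webinar leads
otd = 0.25     # OTD%: opportunity-to-deal conversion rate for webinar opportunities
da  = 20000.0  # DA: average webinar deal amount, in dollars

R = L * lto * otd * da
print(R)  # 200 * 0.10 * 0.25 * 20000 = 100000.0 dollars of April revenue
```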

For example, the LTOV value is calculated by averaging the LTOV values for all webinar leads created in the past 6 months.  So, we assume that the average value of LTOV in the past is a good predictor of the LTOV value in the future.  This is a good first step, but it predicts the same LTOV value (the average from the past 6 months) for all webinar leads in the future and thus ignores any webinar lead characteristics that might influence the predicted LTOV value.

We wanted to use machine learning techniques to see if we could use past data and relevant lead characteristics to predict future values of LTOV, OTDV, LTO%, OTD%, and DA better than the average method.

We chose to focus on LTOV values first and this article presents the results of our efforts.

Method:

Given a lead L, we assume that its LTOV value is:

LTOV(L) = (P0 * X0) + (P1 * X1) + (P2 * X2) + (P3 * X3) + … + (P14 * X14)

where

P0, P1, … P14 are parameters of the model,

X1, X2, … X11 are lead characteristics of L (number of campaign touches, number of campaign responses, etc.),

X0 = 1,

X12 = X1^2,

X13 = X2^2,

X14 = X3^2.

We can think of X0 to X14 as the lead’s 15-element feature vector.
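
A minimal sketch of assembling this feature vector (Python with NumPy; the function name is ours):

```python
import numpy as np

def feature_vector(raw):
    """Build the 15-element feature vector [X0, X1, ..., X14] from the 11
    raw lead characteristics: a bias term X0 = 1, the 11 raw values as
    X1..X11, and the squares of the first three as X12, X13, X14."""
    x = np.asarray(raw, dtype=float)
    assert x.shape == (11,)
    return np.concatenate(([1.0], x, x[:3] ** 2))
```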

We are given a data set of leads that have converted to opportunities in the past 9 months.

We divide the data set into two equally sized randomly sampled sets to create our training set and test set.

m = number of leads in the training set = number of leads in the test set.

From the training set:

1) We create a matrix X_train with m rows and 15 columns such that the i’th row of the matrix, X_train(i), contains the 15-element feature vector for the i’th lead, L(i).

2) We create a vector y_train whose i’th element is the LTOV value of the i’th lead, L(i).

Similarly, we create the matrix X_test and the vector y_test from the test set.

We use the normal equations method to compute the parameter vector, [P0, P1, …, P14], that minimizes the following cost function, J, on the training set:

J = 1 / (2 * m) * { Sum(i=1 to m) [ (LTOV(X_train(i)) - y_train(i))^2 ] + lambda * (P1^2 + P2^2 + … + P14^2) }

where

lambda = the regularization parameter.

Using the normal equations method:

[P0, P1, …, P14] = (X_train^T * X_train + (lambda * D))^-1 * X_train^T * y_train

where

D = 15 x 15 identity matrix with the (1, 1) element replaced with 0.
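
A minimal sketch of this closed-form fit (Python with NumPy; the function name is ours, and we solve the linear system rather than forming the explicit inverse):

```python
import numpy as np

def fit_regularized(X_train, y_train, lam):
    """Solve the regularized normal equations for [P0, P1, ..., P14].

    X_train: (m, 15) matrix whose first column is all ones (X0 = 1).
    y_train: (m,) vector of observed LTOV values.
    lam:     the regularization parameter lambda.
    """
    n = X_train.shape[1]
    D = np.eye(n)
    D[0, 0] = 0.0  # do not regularize the bias parameter P0
    # Solving the linear system is equivalent to, and more numerically
    # stable than, computing the explicit inverse in the formula above.
    return np.linalg.solve(X_train.T @ X_train + lam * D, X_train.T @ y_train)
```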

Using the P0 to P14 parameters, we calculate the training set error, J_train, and the test set error, J_test:

J_train = 1 / (2 * m) * Sum(i=1 to m) [ (LTOV(X_train(i)) - y_train(i))^2 ],

J_test = 1 / (2 * m) * Sum(i=1 to m) [ (LTOV(X_test(i)) - y_test(i))^2 ]
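
Both errors come from the same unregularized squared-error function; here is a sketch with our own names, assuming NumPy arrays:

```python
def squared_error(X, y, P):
    """J = 1 / (2m) * sum((X @ P - y)^2), with X and y as NumPy arrays.
    The regularization term is deliberately excluded when reporting
    training and test set errors."""
    residuals = X @ P - y
    return residuals @ residuals / (2 * len(y))

# J_train = squared_error(X_train, y_train, P)
# J_test  = squared_error(X_test, y_test, P)
```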

We use the following 16 values for lambda: 30000, 15000, 11000, 10000, 9000, 8000, 3000, 1000, 300, 100, 50, 30, 20, 10, 3, 1.  As shown in the table below, for each value of lambda we calculate the training set error, J_train, and the test set error, J_test.  We choose the lambda value (9000) that minimizes the test set error, at 882.07.

Lambda   Training Set Error   Test Set Error
30,000   801.43               905.16
15,000   784.28               885.43
11,000   778.22               882.53
10,000   776.53               882.20
9,000    774.76               882.07
8,000    772.89               882.19
3,000    760.31               888.56
1,000    746.91               893.52
300      723.68               888.36
100      698.77               886.42
50       687.60               892.84
30       682.18               899.92
20       679.12               905.71
10       675.63               914.44
3        673.02               924.47
1        672.51               928.72
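
A sketch of the sweep, reusing the hypothetical fit_regularized and squared_error helpers from the snippets above:

```python
lambdas = [30000, 15000, 11000, 10000, 9000, 8000, 3000, 1000,
           300, 100, 50, 30, 20, 10, 3, 1]

# Fit on the training set for each lambda, score on the test set, and
# keep the lambda with the lowest test set error (9000 in our run).
results = []
for lam in lambdas:
    P = fit_regularized(X_train, y_train, lam)
    results.append((squared_error(X_test, y_test, P), lam, P))
best_test_error, best_lambda, best_P = min(results, key=lambda r: r[0])
```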

Next we compare our LTOV model’s performance with the average method.

Average LTOV in training set = 30.8229 days

Using the average LTOV value from the training set, we compute the test set error, J_test_with_avg:

J_test_with_avg = 1 / (2 * m) * Sum(i=1 to m) [ (30.8229 - y_test(i))^2 ] = 1155.2
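
Given the arrays from the earlier sketches, the baseline takes two lines:

```python
# Baseline: predict the training set average LTOV (30.8229 days in our
# data) for every lead in the test set.
avg_ltov = y_train.mean()
j_test_with_avg = ((avg_ltov - y_test) ** 2).sum() / (2 * len(y_test))
```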

So, our model achieves a test set error of 882.07, while the average method’s test set error is 1155.2.

This shows that our model predicts LTOV values with 23.64% (((1155.2 – 882.07) / 1155.2) * 100%) less error than the average method.

Limitations:

1) We tested 25 different combinations of manually chosen lead characteristics and then chose the set of characteristics that yielded the lowest test set error.  More automated feature selection methods could search the space of lead characteristic combinations more exhaustively; it is possible that we are missing some other set of lead characteristics that would yield an even lower test set error.

2) Our data set was relatively small and specific to a single customer’s data.  The set of lead characteristics we chose for this customer might not generalize well to other customers.

3) We did not use a cross-validation set when we tried various values of lambda.  Before deploying this model in production, we should divide the dataset into a training set (50%), a cross-validation set (25%), and a test set (25%), choose the lambda value that yields the minimum cross-validation set error, and use the parameter vector calculated from that lambda value to compute the model’s final test set error.
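
A sketch of that three-way split, assuming a feature matrix X of shape (n, 15) and target vector y of shape (n,) as NumPy arrays:

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed for reproducibility
idx = rng.permutation(len(X))                  # shuffle lead indices
n_train, n_cv = len(X) // 2, len(X) // 4

train, cv, test = np.split(idx, [n_train, n_train + n_cv])
X_train, y_train = X[train], y[train]          # 50%: fit the parameters
X_cv, y_cv = X[cv], y[cv]                      # 25%: choose lambda
X_test, y_test = X[test], y[test]              # 25%: final error estimate
```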

Conclusion:

We were able to verify our intuition that machine learning techniques (specifically, regularized linear regression) can do better than the average method for predicting LTOV values.  A reduction in test set error of 23.64% is a significant improvement and encourages us to apply a machine learning based approach to predictions of the other variables (OTDV, LTO%, OTD%, and DA) that comprise the revenue prediction formula.

Next Steps:

We can apply the above regression approach to predict OTDV and DA values.  We can apply classification approaches like logistic regression, neural networks, and support vector machines to identify leads with a high probability of conversion to opportunities and, consequently, predict LTO% values.

The same classification approaches can help us identify opportunities with a high probability of conversion to deals and, thus, predict OTD% values.   Then, we can plug the values of these variables (LTOV, OTDV, LTO%, OTD%, DA) into the revenue prediction formula above and compute the predicted revenue from a particular set of leads.