Data-Driven Success: How to Improve the Accuracy of Lifetime Value Predictions

Magdalena Andreeva

Marketers and ad platforms are increasingly shifting their focus from third-party data (3PD) to first-party data (1PD) as they look to measure the impact of their advertising more accurately. With 1PD, advertisers can build a more complete picture of their customers, including their preferences, behaviors, and purchase history, and use it to predict customer lifetime value (LTV) with a much higher degree of accuracy.

What is this degree of accuracy, and how do we raise it? Keep that question in mind throughout the entire process of converting the 1PD you have gathered into an actionable predictive LTV (pLTV).

To measure business results more accurately, marketers and agencies can use a variety of methods, such as online surveys, customer feedback, and, last but not least, machine learning. They can also use first-party data to fuel growth. This information can be used to measure customer satisfaction, brand loyalty, and the effectiveness of advertising campaigns. Advertisers can also track KPIs such as conversion rate, cost per acquisition, and revenue to determine the success of their advertising efforts. That being said, all of those measurements of success come down to one word: value. Knowing how much lifetime value (LTV) a customer brings is the best knowledge any advertiser could ask for.

Yet knowing the value in retrospect is different from predicting it within a few days instead of waiting one or two years. Predicting it that early requires historical data and modeling.

How could we possibly predict the value of a new customer within days of their subscribing to a product? It turns out that machine learning is a tremendously helpful tool in a situation like this. It is one of the ways data science is transforming digital marketing agencies right now. Here is an outline of the steps you need to take to make your pLTV predictions as accurate as they can be.

1. Establish baseline LTV buckets for existing customers

To start, since our end goal is to establish a pLTV for each incoming customer, we need to assign an LTV to each customer already in our database. This can be done in several ways: grouping customers by month of subscription, clustering them with an appropriate machine learning algorithm for SaaS marketing, or using decay functions to estimate future unrealized LTV. All of these, and more, are reasonable options, as long as we end up with several cohorts of customers whose LTV values are clearly different and well spread out. After all, you do not want to jump through hoops only to reach a result akin to “50% of my customers yield $200, the other 50% yield $200.50”, which would tell you no more than your usual averaged LTV calculation. This is part of building a data strategy for SaaS marketing.
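One simple way to form such cohorts is to split existing customers into quantile buckets by their realized LTV and then check that the bucket averages are clearly separated. The sketch below is only an illustration under assumptions: the function names, the customer IDs, and the LTV figures are hypothetical, and quantile bucketing is one option among those listed above, not necessarily the method used in our projects.

```python
from statistics import mean, quantiles


def assign_ltv_buckets(ltv_by_customer, n_buckets=4):
    """Assign each customer a bucket index 0..n_buckets-1 by LTV quantile."""
    cut_points = quantiles(ltv_by_customer.values(), n=n_buckets)
    buckets = {}
    for customer_id, ltv in ltv_by_customer.items():
        # The bucket index is the number of cut points the LTV exceeds.
        buckets[customer_id] = sum(ltv > cut for cut in cut_points)
    return buckets


def bucket_means(ltv_by_customer, buckets):
    """Average LTV per bucket, used to check that buckets are well separated."""
    grouped = {}
    for customer_id, bucket in buckets.items():
        grouped.setdefault(bucket, []).append(ltv_by_customer[customer_id])
    return {bucket: mean(values) for bucket, values in sorted(grouped.items())}


# Hypothetical realized LTV per existing customer (illustrative values only).
ltv_by_customer = {
    "c1": 20.0, "c2": 35.0, "c3": 50.0, "c4": 80.0,
    "c5": 120.0, "c6": 200.0, "c7": 350.0, "c8": 600.0,
}

buckets = assign_ltv_buckets(ltv_by_customer, n_buckets=4)
means = bucket_means(ltv_by_customer, buckets)
for bucket, average in means.items():
    print(f"bucket {bucket}: average LTV {average:.2f}")
```

If the printed averages sit close together, the buckets carry little more information than a single averaged LTV, and a different grouping is worth trying.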

2. Extract data for an appropriate ML algorithm

The next step is to provide enough data to run the ML model of your choice. You need a decent number of attributes, ideally 20 or more, chosen as the most likely indicators of differences in value among your customers. For example, you would want to dedicate a few attributes to measuring a customer’s activity in the first few days after signing up for your product, while the domain of their email address, if you have it, is probably of lesser importance. The right choice of data points to feed your model is vital to the accuracy of your pLTV prediction.
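As an illustration of the early-activity attributes mentioned above, here is a minimal sketch that turns a raw event log into a few per-customer features for the first days after signup. The data layout (a dict of signup timestamps and a list of `(customer_id, timestamp)` events), the seven-day window, and the feature names are all assumptions made for the example; a real feature set would be much wider.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def early_activity_features(signups, events, window_days=7):
    """Build per-customer features from activity in the first window_days days."""
    window = timedelta(days=window_days)
    counts = defaultdict(int)
    active_days = defaultdict(set)
    for customer_id, timestamp in events:
        signup = signups.get(customer_id)
        if signup is None or not (signup <= timestamp < signup + window):
            continue  # skip unknown customers and events outside the window
        counts[customer_id] += 1
        # Day offset since signup: 0 is the signup day itself.
        active_days[customer_id].add((timestamp - signup).days)
    return {
        customer_id: {
            "events_in_window": counts[customer_id],
            "active_days_in_window": len(active_days[customer_id]),
            "active_on_signup_day": 0 in active_days[customer_id],
        }
        for customer_id in signups
    }


# Hypothetical signup times and activity events (illustrative only).
signups = {"c1": datetime(2024, 1, 1), "c2": datetime(2024, 1, 3)}
events = [
    ("c1", datetime(2024, 1, 1, 10)),
    ("c1", datetime(2024, 1, 1, 15)),
    ("c1", datetime(2024, 1, 4, 9)),
    ("c1", datetime(2024, 1, 20, 9)),  # outside the 7-day window
    ("c2", datetime(2024, 1, 5, 8)),
]

features = early_activity_features(signups, events)
for customer_id, row in features.items():
    print(customer_id, row)
```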

3. Choose and run an ML model

For predictive LTV in particular, there are several suitable models to choose from, depending on the data you have available: decision trees, linear regression, and random forests, to name a few. Some of them suit big data and others suit low to medium volumes; the same goes for the other qualities of the data, such as completeness, consistency, and timeliness. Needless to say, the choice of model is instrumental in maximizing your pLTV accuracy. In our experience, random forest is the jack of all trades: it is applicable in almost any situation and leads to high prediction accuracy. You can also learn more about the Markov Chain Attribution Model for Performance Marketing.
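To make this step concrete, the sketch below trains a random forest to predict a customer's LTV bucket from early-activity features, using scikit-learn's `RandomForestClassifier`. Everything about the data is an assumption: the features and buckets are randomly generated stand-ins, not real customer data, and treating pLTV as bucket classification (rather than regression on a dollar value) is just one reasonable framing.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_customers = 600

# Synthetic stand-ins for early-activity features (NOT real customer data).
X = rng.normal(size=(n_customers, 5))

# Synthetic LTV bucket (0, 1 or 2), driven mostly by the first two features.
signal = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n_customers)
y = np.digitize(signal, bins=[-0.5, 0.5])

# Hold out a quarter of the customers to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
predicted_buckets = model.predict(X_test)
print(f"hold-out accuracy: {accuracy:.3f}")
```

Measuring accuracy on held-out customers, rather than on the training set, is what makes the number comparable across models and tuning runs.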

4. Tune your model

Running your model is one thing, but optimizing it often gains you a few precious percentage points, which can make quite a difference in total value. For example, when you raise a model’s accuracy from 95% to 96%, that one percentage point might not look like much, but you are actually cutting the error rate by 20% (from 5% to 4%). This is achieved through hyperparameter tuning, a machine learning technique that can be likened to comparing your fishing gear options before you go out to the pond. Some sets of hyperparameters will always suit your data better than others, just as some baits work better in springtime. To maximize the accuracy of your model, you need to know the data like the back of your hand and choose the right hyperparameters for the job, either by tuning them by hand or by using something like grid search, a technique that trains and evaluates your model on every combination in a grid of candidate values and picks the best-performing set for you.
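The sketch below shows both ideas from this step: the arithmetic behind the 95% to 96% example, and a small grid search using scikit-learn's `GridSearchCV` around a random forest. The data is synthetic, and the grid itself (two values each for `max_depth` and `min_samples_leaf`) is a deliberately tiny, hypothetical choice; real grids are chosen from knowledge of the data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def relative_error_reduction(old_accuracy, new_accuracy):
    """Share of the old error rate removed by the accuracy improvement."""
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    return (old_error - new_error) / old_error


# 95% -> 96% accuracy removes about one fifth of the errors.
print(f"error reduction: {relative_error_reduction(0.95, 0.96):.0%}")

# Synthetic stand-in data (NOT real customer data): binary high/low value.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Every combination in this grid is trained and scored with 3-fold CV.
param_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_cv_accuracy = search.best_score_
print("best hyperparameters:", best_params)
print(f"best cross-validated accuracy: {best_cv_accuracy:.3f}")
```

The cost grows with the product of the grid sizes (here 2 x 2 combinations times 3 folds), so larger grids are usually narrowed by hand first.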

5. Analyze results

Once you have deployed your high-accuracy model and given the platform you are using time to learn from the brand-new pLTV numbers you are feeding it, it is time to wait and see how successful your model is.

The screenshots above show the immediate improvement in CAC and return on ad spend (ROAS) just one month after we deployed our random forest model to a client’s Google Ads account. Since these are only the baby steps of your model’s deployment, you should monitor the real-life predictions it generates and check whether they make sense in your setting.
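For reference, the two metrics tracked here can be computed as in the sketch below, which compares a month before and a month after deployment. The figures are made up purely for illustration and are not the results from our deployment; the function and field names are likewise hypothetical.

```python
def cac(ad_spend, new_customers):
    """Customer acquisition cost: ad spend per newly acquired customer."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return ad_spend / new_customers


def roas(attributed_revenue, ad_spend):
    """Return on ad spend: attributed revenue per unit of ad spend."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return attributed_revenue / ad_spend


def relative_change(before, after):
    """Relative change from before to after (negative means a decrease)."""
    return (after - before) / before


# Made-up monthly figures for illustration only (NOT real campaign results).
before = {"ad_spend": 10_000.0, "new_customers": 200, "revenue": 25_000.0}
after = {"ad_spend": 10_000.0, "new_customers": 250, "revenue": 32_000.0}

cac_change = relative_change(
    cac(before["ad_spend"], before["new_customers"]),
    cac(after["ad_spend"], after["new_customers"]),
)
roas_change = relative_change(
    roas(before["revenue"], before["ad_spend"]),
    roas(after["revenue"], after["ad_spend"]),
)
print(f"CAC change: {cac_change:+.0%}, ROAS change: {roas_change:+.0%}")
```

A falling CAC together with a rising ROAS is the pattern to look for; if the two move in opposite directions, the predictions deserve a closer look.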

6. Maintain and upgrade the model

Machine learning is extremely dynamic right now: every month brings model improvements, theorems, Python libraries, and much more. Whenever your data gets an “upgrade”, it is worth toying around with other types of models, just to see whether your chosen algorithm is still the optimal one. You also need to watch for new developments and state-of-the-art improvements to the model you have chosen. There may always be a technique or an improvement out there that you can capitalize on to increase the accuracy of your machine learning pLTV model.

These six steps, just like the machine learning models themselves, are subject to constant development and improvement. That being said, following them is a dependable way to bring your ROAS to the next level and squeeze the maximum out of your ad spend.

If you want to dive deeper into the topic, check out other ways you can leverage machine learning and stay ahead of the curve!

Nikolay Stefanov

Data Scientist

With the abundance of data nowadays, the possibilities are endless! I am here to combine machine learning and digital marketing into a spicy mix.
