Million Ads Machine: How Nexxen Runs an Accurate, High-Performance Ad Scoring Platform

By: Himanshu Kachalia, Engineering Manager

In the fast-paced world of digital advertising, the ability to efficiently rank and score millions of ads in real time is paramount to success. As advertisers increasingly demand measurable results tailored to their unique campaign objectives, the challenge for Demand-Side Platforms (DSPs) intensifies. How do we strike the right balance between speed, accuracy, and scalability in a landscape marked by fluctuating demand and diverse goals?

Feature Engineering 

Effective feature engineering is crucial for extracting actionable insights from complex, sparse datasets. At Nexxen, we distill thousands of candidate features into a few hundred high-impact, signal-based features, which is key to optimizing model performance and enhancing prediction accuracy.

Our data scientists use a proprietary query engine to execute a variety of Spark and MapReduce jobs, deconstructing data across multiple dimensions. We’ve developed a framework that detects feature discrepancies, improving training data quality. This system combines automated pipelines with on-demand SparkSQL queries to compare training data against incoming ad requests, so that the models remain aligned with the latest data distributions.
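To make the idea of a feature discrepancy check concrete, here is a minimal sketch in plain Python. It is not Nexxen's framework: it assumes the comparison is done with the Population Stability Index (PSI), a common drift measure, and the function name psi and the sample data are hypothetical.

```python
import math
from collections import Counter


def psi(train_values, serving_values, eps=1e-6):
    """Population Stability Index for one categorical feature.

    Assumption: PSI stands in for whatever discrepancy measure the
    real framework uses. 0.0 means identical distributions; larger
    values mean the serving data has drifted further from training.
    """
    train_counts = Counter(train_values)
    serving_counts = Counter(serving_values)
    total = 0.0
    for category in set(train_counts) | set(serving_counts):
        # Floor each share at eps so unseen categories do not produce log(0).
        p = max(train_counts[category] / len(train_values), eps)
        q = max(serving_counts[category] / len(serving_values), eps)
        total += (q - p) * math.log(q / p)
    return total


# Hypothetical samples of a "device" feature.
training_sample = ["mobile"] * 90 + ["desktop"] * 10
request_sample = ["mobile"] * 50 + ["desktop"] * 50
drift = psi(training_sample, request_sample)
```

A pipeline could compute a score like this per feature on a schedule and flag features whose score crosses a chosen threshold for review or retraining.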

Ad Scoring and Ranking Using Machine-Learning Models 

Before scoring and ranking, a series of services handle pre-processing steps such as filtering and discarding based on specific rules. These services leverage in-memory caches and distributed key-value stores for fast retrieval of metadata from relational databases and object stores. These lookups complete in milliseconds, which is crucial for real-time performance.
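The pattern described above, a fast cache in front of a slower store plus rule-based filtering, can be sketched as follows. This is an illustrative example only: TTLCache, eligible_ads, the metadata fields (active, countries), and the two rules are all hypothetical and not taken from Nexxen's services.

```python
import time


class TTLCache:
    """Read-through in-memory cache with a per-entry time to live."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a key-value store lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def eligible_ads(request, ad_ids, metadata_cache):
    """Keep only the ads that pass every (hypothetical) filtering rule."""
    kept = []
    for ad_id in ad_ids:
        meta = metadata_cache.get(ad_id)
        if not meta["active"]:
            continue  # rule 1: ad is paused
        if request["country"] not in meta["countries"]:
            continue  # rule 2: geo targeting does not match
        kept.append(ad_id)
    return kept
```

Only the first lookup for an ad reaches the backing store; repeated requests within the time to live are served from memory, which is what keeps the filtering step within a millisecond-level budget.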

When an impression request arrives, the scoring system uses the deserialized trained models loaded into memory for immediate scoring and ranking. Requests are transformed into feature vectors, which are then scored using a Directed Acyclic Graph (DAG), where each machine learning model acts as a node in the DAG. The DAG structure allows for dependency-based execution, optimizing for various KPIs like CPA (Cost Per Action), CPC (Cost Per Click), or Viewability. 
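As a rough illustration of dependency-based execution, the sketch below runs toy models in topological order using Python's standard graphlib module. The node names (p_click, p_action, bid_value), the lambda "models", and the feature names are hypothetical placeholders, not Nexxen's actual models or features.

```python
from graphlib import TopologicalSorter


def score_request(features, models, dependencies):
    """Run every model after the models it depends on.

    dependencies maps a node name to the names of its upstream nodes.
    Each model is a callable taking (features, upstream_outputs).
    """
    outputs = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: outputs[dep] for dep in dependencies.get(name, ())}
        outputs[name] = models[name](features, upstream)
    return outputs


# Hypothetical toy models standing in for trained models held in memory.
models = {
    "p_click": lambda f, up: 0.02 if f["device"] == "mobile" else 0.01,
    "p_action": lambda f, up: up["p_click"] * 0.1,
    "bid_value": lambda f, up: up["p_action"] * f["cpa_goal"],
}
dependencies = {
    "p_click": [],
    "p_action": ["p_click"],
    "bid_value": ["p_action"],
}

scores = score_request({"device": "mobile", "cpa_goal": 50.0}, models, dependencies)
```

Because each node sees only the outputs of its upstream nodes, a different KPI can be served by swapping the final node or adding a branch, without touching the shared upstream models.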

The complete bidding workflow—including selection, filtering, and scoring—occurs within a few milliseconds, enabling Nexxen to handle millions of requests per second while maintaining high throughput and minimal latency. 

Below is a high-level design of such a system: 

A/B Testing and Custom Bidding Strategies 

The scoring platform provides A/B testing through user-split methodologies to quantify campaign lift while minimizing the expense commonly associated with control group impressions. Distinct machine learning model versions can be assigned unique budget caps, facilitating performance benchmarking and adaptive budget allocation.
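One common way to implement a user split is deterministic hashing, so that the same user always lands in the same group without storing any assignment. The sketch below assumes that approach; the function assign_variant, the experiment name, and the 10/90 split are hypothetical, and budget caps are left out.

```python
import hashlib


def assign_variant(user_id, experiment, variants):
    """Deterministically map a user to a variant.

    variants is a list of (name, weight) pairs whose weights sum to 1.
    Hashing the experiment name together with the user ID keeps
    assignments independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a point in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding


# Hypothetical split: a small control group and a new model version.
split = [("control", 0.1), ("model_v2", 0.9)]
variant = assign_variant("user-123", "cpa-model-test", split)
```

Keeping the control weight small is one way to limit the cost of control group impressions while still measuring lift.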

Advertisers can further optimize their bidding algorithms by applying bid multipliers across various targeting vectors, providing increased flexibility to maximize campaign effectiveness. 
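A bid multiplier scheme of this kind might look like the following sketch. The dimension names (device, region), the multiplier values, and the optional max_bid cap are assumptions made for illustration.

```python
def apply_bid_multipliers(base_bid, request, multipliers, max_bid=None):
    """Scale a base bid by one multiplier per targeting dimension.

    multipliers maps a dimension (e.g. "device") to a table of
    value -> multiplier. Values without an entry leave the bid unchanged.
    """
    bid = base_bid
    for dimension, table in multipliers.items():
        bid *= table.get(request.get(dimension), 1.0)
    if max_bid is not None:
        bid = min(bid, max_bid)  # hypothetical safety cap
    return bid


# Hypothetical advertiser configuration.
multipliers = {
    "device": {"mobile": 1.5},
    "region": {"CA": 0.5},
}
bid = apply_bid_multipliers(2.0, {"device": "mobile", "region": "CA"}, multipliers)
```

Multiplying per dimension keeps each adjustment independent, so an advertiser can tune one targeting vector without revisiting the others.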

Observability 

Our observability infrastructure is divided into two core components: 

1. Model Generation and Training: Tracks the success rates of data collection, training, and model distribution.

2. Model Performance: Monitors real-time performance metrics such as latency, throughput, and accuracy for each deployed model.

We leverage a time-series database to collect high-resolution metrics and generate dynamic dashboards. These dashboards allow us to separate signal from noise, providing insights into true performance anomalies. 
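As one example of separating signal from noise, the sketch below flags a metric value only when it deviates strongly from a rolling window of recent values. It assumes a simple z-score rule and is not a description of Nexxen's dashboards; the class name, window size, and threshold are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class AnomalyDetector:
    """Flag values that sit far outside the recent rolling window."""

    def __init__(self, window=60, threshold=3.0, min_samples=5):
        self._values = deque(maxlen=window)
        self._threshold = threshold      # allowed deviation, in standard deviations
        self._min_samples = min_samples  # warm-up period before flagging anything

    def observe(self, value):
        """Return True if value is anomalous relative to recent history."""
        is_anomaly = False
        if len(self._values) >= self._min_samples:
            mu = mean(self._values)
            sigma = stdev(self._values)
            if sigma > 0 and abs(value - mu) > self._threshold * sigma:
                is_anomaly = True
        if not is_anomaly:
            # Anomalies are kept out of the window so they do not skew the baseline.
            self._values.append(value)
        return is_anomaly


# Hypothetical p99 latency samples in milliseconds for one model.
detector = AnomalyDetector()
flags = [detector.observe(v) for v in [10, 11, 9, 10, 11, 9, 10, 50, 10]]
```

Ordinary jitter stays below the threshold, while the spike to 50 ms is the only sample flagged.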

Model Release and Versioning 

Our CI/CD pipeline integrates GitLab and Jenkins for version control, build automation, and deployment. This setup enables seamless rollout of new machine learning models or rollback to previous versions based on real-time performance metrics, ensuring both agility and reliability in model deployment. 
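The rollback decision itself can be reduced to a comparison between a newly released model and the version it replaced. The sketch below is a hypothetical gate, independent of GitLab or Jenkins; the metric fields and thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMetrics:
    p99_latency_ms: float
    error_rate: float


def should_roll_back(candidate, baseline,
                     max_latency_ratio=1.2,
                     max_error_rate_increase=0.01):
    """Return True if the candidate regresses beyond the allowed margins."""
    latency_regressed = (
        candidate.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio
    )
    errors_regressed = (
        candidate.error_rate - baseline.error_rate > max_error_rate_increase
    )
    return latency_regressed or errors_regressed


# Hypothetical metrics for the previous and the newly deployed version.
baseline = ModelMetrics(p99_latency_ms=5.0, error_rate=0.001)
candidate = ModelMetrics(p99_latency_ms=7.0, error_rate=0.001)
roll_back = should_roll_back(candidate, baseline)
```

A pipeline job could evaluate a check like this after each rollout and redeploy the previous model version when it returns True.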

Looking Ahead 

As Nexxen looks to the future, a key focus will be leveraging larger and more sophisticated AI models to tackle two challenges: AI-driven ad fraud, and the potential for AI-powered algorithms to perpetuate biases present in training data, which can lead to discriminatory ad targeting and reinforce existing inequalities.
 
By continuously refining its technology and methodologies, Nexxen is committed to developing strategies, and to further expanding customization on its scoring platform, that address these issues while ensuring efficient predictions and maintaining high performance.
