Let’s take a look at the Optiver Realized Volatility Prediction competition, a strong example of combining financial knowledge with data science techniques.

Participants approached feature engineering from several perspectives, each aimed at capturing a specific financial insight to improve model predictions. Here’s a breakdown of these perspectives, their implementations, and the business rationale behind them:

  1. Price Aggregation and Stability (WAP and Log Returns)

Participants used the weighted average price (WAP) as a foundational metric. WAP combines the best bid and ask prices, weighting each by the size quoted on the opposite side of the book, so the estimate leans toward the side with less resting size. They then calculated log returns of WAP, which express price changes in relative rather than absolute terms and keep moves comparable across stocks and over time.
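As a minimal sketch of the two calculations above, the snippet below computes WAP from the top of the book and then takes log returns within each time bucket. The column names (`time_id`, `bid_price1`, `ask_price1`, `bid_size1`, `ask_size1`) are assumed to follow the competition's order book schema, the helper names are hypothetical, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def add_wap(book: pd.DataFrame) -> pd.DataFrame:
    """Add a top-of-book weighted average price (hypothetical helper)."""
    out = book.copy()
    # Each price is weighted by the size quoted on the opposite side.
    out["wap"] = (
        out["bid_price1"] * out["ask_size1"] + out["ask_price1"] * out["bid_size1"]
    ) / (out["bid_size1"] + out["ask_size1"])
    return out


def add_log_return(book: pd.DataFrame) -> pd.DataFrame:
    """Add log returns of WAP, computed separately inside each time_id."""
    out = book.copy()
    # diff() within each bucket, so the first row of every bucket is NaN.
    out["log_return"] = np.log(out["wap"]).groupby(out["time_id"]).diff()
    return out


# Made-up sample rows (assumed column names).
book = pd.DataFrame(
    {
        "time_id": [5, 5, 5],
        "seconds_in_bucket": [0, 1, 2],
        "bid_price1": [0.99, 1.00, 1.00],
        "ask_price1": [1.01, 1.02, 1.01],
        "bid_size1": [100, 100, 300],
        "ask_size1": [100, 300, 100],
    }
)
book = add_log_return(add_wap(book))
print(book[["seconds_in_bucket", "wap", "log_return"]])
```

Grouping by `time_id` before differencing keeps a return from being computed across the boundary between two unrelated buckets.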

Business Rationale: WAP approximates the fair price implied by the current order book, giving a more realistic snapshot of market consensus than the raw mid-price. Log returns, on the other hand, measure the price’s relative change, which is crucial in financial markets where volatility (the target in this competition) reflects not price levels but how much they fluctuate. This allows trading models to focus on changes that significantly impact trading decisions and is particularly useful in options pricing, where volatility is a primary determinant of value.

  2. Merging Order Book and Trade Data

To capture a complete view of the market, competitors joined order book data (which includes bid/ask prices and sizes) with trade data (actual trades). This was achieved by creating a time-based composite index, allowing competitors to fill in gaps with forward-filled data for continuous analysis.
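The join described above could look roughly like the following sketch. It assumes both tables share the keys `time_id` and `seconds_in_bucket` and that the trade table has `price` and `size` columns; the sample values are invented. Forward-filling quotes and filling missing trade size with zero are illustrative choices, not a confirmed recipe from any particular solution.

```python
import pandas as pd

keys = ["time_id", "seconds_in_bucket"]

# Invented sample data: the book updates at seconds 0, 1, 5, 6.
book = pd.DataFrame(
    {
        "time_id": [5, 5, 5, 5],
        "seconds_in_bucket": [0, 1, 5, 6],
        "bid_price1": [0.999, 1.000, 1.001, 1.001],
        "ask_price1": [1.001, 1.002, 1.003, 1.002],
    }
)
# Trades happen at seconds 1 and 3.
trade = pd.DataFrame(
    {
        "time_id": [5, 5],
        "seconds_in_bucket": [1, 3],
        "price": [1.001, 1.002],
        "size": [10, 20],
    }
)

# An outer join keeps every second seen in either table,
# then the two keys become a composite (time_id, second) index.
merged = (
    book.merge(trade, on=keys, how="outer")
    .sort_values(keys)
    .set_index(keys)
)

# A quote stays valid until the next update, so carry it forward per bucket.
book_cols = ["bid_price1", "ask_price1"]
merged[book_cols] = merged.groupby(level="time_id")[book_cols].ffill()

# No trade row means nothing traded in that second.
merged["size"] = merged["size"].fillna(0)

print(merged)
```

The trade `price` column is left as NaN where no trade occurred, since carrying an old trade price forward is a separate modeling decision.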

Business Rationale: Order book data shows market intent (what traders are willing to pay or accept), while trade data reflects actual market actions. By combining these, participants created a dataset that accurately represents market sentiment and actual market movements. This approach is critical in environments like high-frequency trading, where understanding both latent demand and actual transactions helps predict price direction and volatility.

  3. Volatility and Momentum Analysis

Competitors calculated realized volatility as the square root of the sum of squared log returns over various windows and added these metrics as features. They also used lag features to capture momentum, reflecting recent price trends that may carry over into the next period.
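A rough sketch of these features for a single time bucket follows. The window start points (0, 300, and 450 seconds), the helper names, and the sample returns are assumptions chosen for illustration; the lag feature here is simply the previous log return within the bucket.

```python
import numpy as np
import pandas as pd


def realized_volatility(log_returns: pd.Series) -> float:
    """Square root of the sum of squared log returns (NaNs ignored)."""
    return float(np.sqrt(np.sum(np.square(log_returns.dropna()))))


def window_features(bucket: pd.DataFrame, starts=(0, 300, 450)) -> dict:
    """Realized volatility from each start second to the end of the bucket."""
    feats = {}
    for start in starts:
        tail = bucket.loc[bucket["seconds_in_bucket"] >= start, "log_return"]
        feats[f"rv_from_{start}"] = realized_volatility(tail)
    return feats


# Invented log returns for one bucket (assumed column names).
bucket = pd.DataFrame(
    {
        "seconds_in_bucket": [0, 100, 200, 300, 400, 500],
        "log_return": [np.nan, 0.003, -0.004, 0.0, 0.006, -0.008],
    }
)

features = window_features(bucket)

# A one-step lag of the return, as a simple momentum-style input.
bucket["log_return_lag1"] = bucket["log_return"].shift(1)

print(features)
```

Windows that cover only the end of the bucket give the model a view of the most recent volatility, which is often assumed to be more informative about the period that follows.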

Business Rationale: Realized volatility captures market fluctuations, giving insights into how prices have varied over short or medium periods, which can be indicators of future trends. Momentum, on the other hand, reflects the persistence of price movements. In financial contexts, this is valuable as momentum can imply trend continuation, helping traders make decisions on timing trades to maximize returns while minimizing risk.

  4. Handling Missing Data and Imbalance

Participants addressed gaps in trading data by forward-filling missing values for continuous analysis and carefully selected features that minimized noise. They also made adjustments for imbalanced data by focusing on normalization and reindexing by timestamp.
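One way to sketch the reindex-and-forward-fill step is shown below. It assumes a bucket is identified by a `seconds_in_bucket` column and spans 600 seconds by default; `fill_bucket` is a hypothetical helper, and the example uses a 6-second bucket with invented values so the result is easy to read.

```python
import pandas as pd


def fill_bucket(bucket: pd.DataFrame, n_seconds: int = 600) -> pd.DataFrame:
    """Reindex one bucket to every second and carry the last value forward."""
    full_index = pd.RangeIndex(n_seconds)
    return (
        bucket.set_index("seconds_in_bucket")
        .reindex(full_index)  # inserts NaN rows for the missing seconds
        .rename_axis("seconds_in_bucket")
        .ffill()  # last known value stands in until the next update
        .reset_index()
    )


# Invented sparse bucket: updates only at seconds 0, 2, and 5.
sparse = pd.DataFrame(
    {
        "seconds_in_bucket": [0, 2, 5],
        "wap": [1.00, 1.01, 1.02],
    }
)

filled = fill_bucket(sparse, n_seconds=6)
print(filled)
```

If a bucket has no row at second 0, the leading rows stay NaN after the forward fill, so a separate rule is needed for them.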

Business Rationale: Financial data can be sparse, particularly during low trading periods, yet maintaining continuity is vital for real-time volatility prediction. Forward-filling values allows models to handle data in a way that reflects what would be available in actual trading scenarios, where recent prices are typically the best indicators in the absence of new trades. This practice aligns with real-world trading, where smooth and timely data streams are necessary for accurate modeling.

  5. Feature Engineering for Liquidity Measures

Some top competitors created liquidity measures, such as order flow imbalance and average trade sizes, to reflect the market’s buy/sell pressure. These features were then used to understand trading behaviors that might lead to volatility spikes.
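The sketch below illustrates a few liquidity-style features. Note that it uses a simple top-of-book size imbalance as a stand-in, not a full order flow imbalance calculation, and adds a relative bid-ask spread as a further example. The column names (`bid_size1`, `ask_size1`, `bid_price1`, `ask_price1`, `size`, `order_count`) are assumed from the competition's schema, the helper names are hypothetical, and the sample values are invented.

```python
import pandas as pd


def book_liquidity(book: pd.DataFrame) -> pd.DataFrame:
    """Add simple top-of-book liquidity features (hypothetical helper)."""
    out = book.copy()
    total_size = out["bid_size1"] + out["ask_size1"]
    # Positive values mean more size resting on the bid than on the ask.
    out["size_imbalance"] = (out["bid_size1"] - out["ask_size1"]) / total_size
    # Spread relative to the mid-price, so it is comparable across stocks.
    mid = (out["ask_price1"] + out["bid_price1"]) / 2
    out["rel_spread"] = (out["ask_price1"] - out["bid_price1"]) / mid
    return out


def trade_liquidity(trade: pd.DataFrame) -> pd.DataFrame:
    """Add the average size per order for each trade row."""
    out = trade.copy()
    out["avg_trade_size"] = out["size"] / out["order_count"]
    return out


# Invented sample rows.
book = pd.DataFrame(
    {
        "bid_price1": [0.99, 1.00],
        "ask_price1": [1.01, 1.02],
        "bid_size1": [300, 100],
        "ask_size1": [100, 100],
    }
)
trade = pd.DataFrame({"size": [100, 30], "order_count": [4, 3]})

book_feats = book_liquidity(book)
trade_feats = trade_liquidity(trade)
print(book_feats[["size_imbalance", "rel_spread"]])
print(trade_feats["avg_trade_size"])
```

In practice such row-level values would typically be aggregated per time bucket (for example by mean or standard deviation) before being fed to a model.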

Business Rationale: Liquidity is a crucial component in market microstructure. Understanding buy/sell pressures allows for predicting how susceptible a stock is to rapid price changes. In the context of options trading (as relevant to Optiver), such insights help in setting fair prices that account for possible fluctuations, directly impacting profitability and risk management in trading strategies.

These perspectives demonstrate how participants in the competition combined financial knowledge with data science techniques to create robust predictive features. By focusing on metrics like WAP, realized volatility, and liquidity, competitors were able to craft models that closely align with the realities and demands of financial markets.