In the Google Analytics Customer Revenue Prediction competition, competitors applied a range of feature engineering techniques to improve their revenue predictions.
Here are some key perspectives and the business rationale behind them:
- Customer Interaction Features
Competitors extracted features from session-level Google Analytics data, such as the number of pageviews, session duration, bounce rate, and time since the last visit. They also calculated metrics like total revenue per user or per session and revenue per pageview.
These features are essential for understanding customer engagement and each user’s journey through the website. Higher engagement (for example, more pageviews or longer sessions) often correlates with a greater likelihood of conversion. By quantifying these interactions, competitors could predict which users are more likely to generate higher revenue, enabling more targeted marketing strategies.
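A minimal sketch of how such per-user engagement features might be computed, written in plain Python so it runs without extra dependencies. The `Session` record, its field names, and the `as_of` cutoff used to measure time since the last visit are hypothetical stand-ins, not the competition's actual schema; in practice these aggregations would more likely be done with a dataframe library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical flattened session record; field names are assumptions.
    visitor_id: str
    start_time: int    # visit start, epoch seconds
    pageviews: int
    duration_s: float  # session duration in seconds
    bounced: bool      # True for a single-page session
    revenue: float     # transaction revenue for the session (0.0 if none)


def interaction_features(sessions, as_of):
    """Aggregate per-user engagement features from a list of Session records."""
    by_user = defaultdict(list)
    for s in sessions:
        by_user[s.visitor_id].append(s)

    features = {}
    for uid, user_sessions in by_user.items():
        n = len(user_sessions)
        total_pv = sum(s.pageviews for s in user_sessions)
        total_rev = sum(s.revenue for s in user_sessions)
        last_visit = max(s.start_time for s in user_sessions)
        features[uid] = {
            "sessions": n,
            "total_pageviews": total_pv,
            "mean_duration_s": sum(s.duration_s for s in user_sessions) / n,
            "bounce_rate": sum(s.bounced for s in user_sessions) / n,
            "revenue_per_session": total_rev / n,
            "revenue_per_pageview": total_rev / total_pv if total_pv else 0.0,
            # Recency: seconds between the user's latest visit and the cutoff
            "time_since_last_visit_s": as_of - last_visit,
        }
    return features
```

The function returns one feature dictionary per `visitor_id`, which could then be joined to that user's training row.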
- Categorical Encoding of Demographics
Features such as device type, browser type, country, and operating system were transformed using one-hot encoding, which suits low-cardinality fields, and target encoding, which scales better to high-cardinality fields such as country. This allowed competitors to analyze how different customer segments behaved on the website.
Demographic segmentation helps businesses tailor their marketing strategies to specific audiences. By encoding these categories, the models could identify patterns such as whether users on certain devices are more likely to make purchases or whether customers from certain regions tend to generate higher revenue, thus enhancing customer profiling and targeting.
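The sketch below illustrates both encodings. `one_hot` and `target_encode` are hypothetical helpers, and the `smoothing` parameter (which blends each category's mean target with the global mean so that rare categories are not over-trusted) is a common choice assumed here, not something taken from a specific solution. Target encoding fit on the same rows a model trains on leaks the target, so it is usually computed out of fold.

```python
from collections import defaultdict


def one_hot(values, categories=None):
    """One-hot encode category strings into lists of 0/1 ints."""
    cats = sorted(set(values)) if categories is None else list(categories)
    index = {c: i for i, c in enumerate(cats)}
    rows = []
    for v in values:
        row = [0] * len(cats)
        if v in index:  # unseen categories map to an all-zero row
            row[index[v]] = 1
        rows.append(row)
    return cats, rows


def target_encode(values, targets, smoothing=10.0):
    """Smoothed mean-target encoding (hypothetical helper).

    Each category maps to (sum_of_targets + smoothing * global_mean)
    / (count + smoothing), i.e. its mean pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, y in zip(values, targets):
        sums[v] += y
        counts[v] += 1
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, global_mean
```

To encode new rows, `[mapping.get(c, global_mean) for c in new_values]` falls back to the global mean for categories not seen during fitting.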
- Time-Based Features
Competitors engineered features from session timestamps, such as day of the week, hour of the day, and month. They also used lag features to capture how a user’s recent activity affects their likelihood of generating revenue.
Time-based features help in capturing seasonal trends and daily behavioral patterns. For example, if sales are consistently higher on weekends or during specific months, marketing and promotions can be optimized accordingly. This insight is crucial for e-commerce platforms in planning sales events and customer engagement strategies.
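Calendar and lag features for one user's sessions might look like the following sketch. The `(start_time, pageviews)` tuple input, the `time_features` name, and the use of UTC when converting epoch seconds are assumptions; local time is arguably more meaningful for hour-of-day effects.

```python
from datetime import datetime, timezone


def time_features(visits):
    """Calendar and lag features for ONE user's sessions.

    visits: list of (start_time_epoch_seconds, pageviews) tuples, any order.
    Returns one feature dict per session, in chronological order.
    """
    rows = []
    prev_time = prev_pv = None
    for t, pv in sorted(visits):
        dt = datetime.fromtimestamp(t, tz=timezone.utc)  # UTC is an assumption
        rows.append({
            "day_of_week": dt.weekday(),  # Monday=0 ... Sunday=6
            "hour": dt.hour,
            "month": dt.month,
            "is_weekend": dt.weekday() >= 5,
            # Lag features describe the previous session (None for the first one)
            "secs_since_prev": None if prev_time is None else t - prev_time,
            "prev_pageviews": prev_pv,
        })
        prev_time, prev_pv = t, pv
    return rows
```

Calendar fields let a model pick up weekly and seasonal patterns, while the lag fields carry the user's most recent behavior into each row.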
- Aggregated Customer History Metrics
Many participants aggregated session data over multiple visits to create features that reflect a user’s purchase history, such as total lifetime revenue, average purchase amount, and frequency of visits. These metrics often included rolling averages and cumulative sums.
Customer lifetime value is a significant metric for predicting long-term profitability. By aggregating historical metrics, models could better identify high-value customers and those with a higher likelihood of repeat purchases. This helps in optimizing customer retention strategies and focusing efforts on more profitable customer segments.
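One way to build such history aggregates is a single chronological pass per user, sketched below. The helper name, the `window` size, and the rule that each row sees only earlier sessions are assumptions; the past-only rule is a common guard against target leakage rather than a documented detail of any particular solution.

```python
from collections import deque


def history_features(revenues, window=3):
    """Per-session history features for one user's chronologically ordered revenues.

    Each row is built only from sessions BEFORE the current one, so a model
    trained on these rows never sees the revenue it is asked to predict.
    """
    rows = []
    cum_rev = 0.0
    purchases = 0
    recent = deque(maxlen=window)  # revenues of the last `window` prior sessions
    for i, rev in enumerate(revenues):
        rows.append({
            "prior_visits": i,                 # visit frequency so far
            "lifetime_revenue": cum_rev,       # cumulative sum
            "avg_purchase": cum_rev / purchases if purchases else 0.0,
            "rolling_mean_revenue": sum(recent) / len(recent) if recent else 0.0,
        })
        # Update state only after emitting the row (strictly past-only features)
        cum_rev += rev
        if rev > 0:
            purchases += 1
        recent.append(rev)
    return rows
```

The `deque` with `maxlen` drops the oldest value automatically, which keeps the rolling mean bounded to the most recent sessions.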
- Feature Interactions
Some advanced solutions involved creating interaction features, such as combining device type with country or day of the week with session duration to capture more nuanced behavioral trends.
Interaction features help reveal complex relationships within the data. For instance, mobile users from specific countries may exhibit distinct purchasing behaviors on certain days. By understanding these interactions, businesses can deliver more personalized and relevant experiences, increasing the likelihood of conversion.
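Interaction features like these can be built as simple derived columns. In the sketch below, the input keys (`device`, `country`, `day_of_week`, `duration_s`) and the weekend/weekday split of session duration are illustrative assumptions; a product term or a per-day split would be equally valid. Tree ensembles can learn some interactions implicitly through successive splits, so explicit crosses matter most for models that cannot, such as linear ones.

```python
def add_interactions(rows):
    """Return copies of session dicts with hypothetical interaction features added."""
    out = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not mutated
        # Categorical cross: device and country combined into one new category
        r["device_x_country"] = f"{r['device']}|{r['country']}"
        # Day-of-week x duration: separate columns let a linear model
        # learn a different duration effect on weekends vs weekdays
        is_weekend = r["day_of_week"] >= 5  # Monday=0 ... Sunday=6
        r["weekend_duration_s"] = r["duration_s"] if is_weekend else 0.0
        r["weekday_duration_s"] = 0.0 if is_weekend else r["duration_s"]
        out.append(r)
    return out
```

The crossed `device_x_country` column can then be passed through either encoding from the categorical section like any other category.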
These approaches combined customer interaction insights with demographic, temporal, and historical data, resulting in more predictive models for e-commerce revenue.