As data science has matured, the importance of feature engineering has become ever more apparent. To explore innovative approaches, I’ve been looking for exceptional work on Kaggle.

Kaggle’s competitions have spanned various business sectors, predominantly focusing on:

  • Financial Services: Predictive modeling tasks such as market movement prediction and fraud detection are popular due to their high industry relevance.
  • Healthcare: Competitions involving medical imaging, disease prediction, and genomic data analysis are common, underscoring the growing role of AI in healthcare.
  • Retail and E-commerce: Competitions often involve sales forecasting, customer satisfaction prediction, and personalization, reflecting industry needs for predictive insights.
  • Agriculture and Environmental Science: Challenges like crop disease detection or satellite image analysis for environmental monitoring are typical in this category.
  • Social Media and Marketing: NLP tasks such as sentiment analysis, customer review classification, and engagement prediction serve the needs of social media platforms and marketing teams.

Judging by participation and community engagement, five popular Kaggle competitions in financial services, healthcare, and e-commerce are:

  • Jane Street Market Prediction: Focused on predicting market movements, this competition is rooted in proprietary trading and calls for time-series modeling and feature engineering on financial data.
  • Optiver Realized Volatility Prediction: This competition focused on forecasting the short-term realized volatility of stocks, a key input to options pricing, using financial time-series data from order books and trades.
  • IEEE-CIS Fraud Detection: Participants built models to detect fraudulent online card transactions in a highly imbalanced dataset.
  • RSNA-MICCAI Brain Tumor Radiogenomic Classification: This healthcare competition focused on predicting a genetic biomarker of brain tumors (MGMT promoter methylation status) from MRI scans.
  • Google Analytics Customer Revenue Prediction: Entrants predicted per-customer revenue for the Google Merchandise Store from Google Analytics session data.

In the follow-up posts, I’ll work through each of these competitions to highlight their standout feature engineering.