Marketing Science · Cohort Analysis · Survival Modelling
NBA Draft Class Cohort Analysis — 30 Years of Retention and LTV (1996–2026)
Every growth team builds the same table: users by signup month, % still active at month 1, 3, 6, 12. I built one for 30 years of NBA drafts. Each draft year is a cohort. Each career year is a time period. The cell value is the % of that class still playing meaningful minutes. Same logic, different labels.
ANOVA across 23 cohorts is significant at p < 0.001 — draft class differences are statistically real. Average retention drops to 49% by career year 5 and 30% by year 10. Top-5 picks reach year 10 at 2.8x the rate of second-rounders (55.7% vs 19.9%). The 2003 class — LeBron, Carmelo, Wade, Bosh — is a genuine outlier at 1.4x the 19-class average LTV.
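The cohort table described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the input shape (one `(draft_year, seasons_played)` pair per player) and the function name `retention_table` are assumptions, and the "meaningful minutes" filter is assumed to have been applied before counting seasons. The toy data is invented.

```python
from collections import defaultdict


def retention_table(careers, max_year=3):
    """Build a cohort retention table.

    careers: iterable of (draft_year, seasons_played) pairs, one per player.
    Returns {draft_year: [% of the class still active in career year 1..max_year]}.
    """
    by_cohort = defaultdict(list)
    for draft_year, seasons in careers:
        by_cohort[draft_year].append(seasons)

    table = {}
    for draft_year, seasons_list in sorted(by_cohort.items()):
        size = len(seasons_list)
        # A player counts as retained in career year N if they logged at least N seasons.
        table[draft_year] = [
            round(100 * sum(s >= year for s in seasons_list) / size, 1)
            for year in range(1, max_year + 1)
        ]
    return table


# Toy data, invented for illustration only.
toy = [(2001, 1), (2001, 3), (2001, 5), (2001, 2), (2002, 4), (2002, 1)]
table = retention_table(toy)
for draft_year, row in table.items():
    print(draft_year, row)
```

Each row is a draft class and each column a career year, which is the same layout as a signup-month retention table.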
Read the full write-up →
Python
Cohort Analysis
Survival Analysis
ANOVA
Pandas
nba_api
Marketing Science · Funnel Analysis · Statistical Testing
Premier League Fan Funnel — Instagram to Season Tickets (2023-24)
Where do sports fans drop off on their journey from digital follower to committed season ticket holder? A three-stage funnel built from citable published sources only: 20 clubs, Instagram following from official profiles, PL official attendance figures, and season ticket holder counts from Companies House annual reports. No proxies, no assumptions.
Conversion from Instagram follower to season ticket holder (STH) is strongly negatively correlated with follower count (Pearson r = -0.631, p = 0.003). Non-Big-Six clubs convert 9.2x more of their digital audience into season ticket holders than the Big Six. The gap is a supply constraint, not weak demand. Sheffield United converts 2.86% of its Instagram followers into STH. Liverpool converts 0.07%. That is a 41x difference.
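The two calculations behind these figures, conversion rate and Pearson correlation, can be sketched as follows. This is an illustrative sketch only: the helper names and the four toy clubs are invented, and only the 2.86% and 0.07% figures come from the text above.

```python
import math


def conversion_pct(season_ticket_holders, followers):
    """Share of Instagram followers who are season ticket holders, in percent."""
    return 100 * season_ticket_holders / followers


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy clubs, invented for illustration: (followers, season ticket holders).
toy_clubs = [(1_000_000, 20_000), (5_000_000, 25_000),
             (20_000_000, 30_000), (40_000_000, 30_000)]
followers = [f for f, _ in toy_clubs]
conversion = [conversion_pct(sth, f) for f, sth in toy_clubs]
r = pearson_r(followers, conversion)

# Ratio of the two conversion rates quoted in the text.
gap = 2.86 / 0.07
print(f"r = {r:.3f}, gap = {gap:.1f}x")
```

Stadium capacity caps the numerator while follower counts keep growing, so conversion falls as the audience grows, which is the supply-constraint reading.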
Read the full write-up →
Python
Funnel Analysis
Pearson Correlation
Chi-square
Plotly
Pandas
Marketing Science · Customer Segmentation · Dimensionality Reduction
NBA Player Archetype Segmentation — Three Eras (2003–2026)
Do traditional basketball positions capture how players actually play, or are there natural statistical archetypes that tell a different story? K-Means clustering is run on NBA advanced stats (USG%, AST%, REB%, TS%, and more) across three era windows, using the same behavioral segmentation techniques applied to customer data in marketing.
K was selected independently per era using silhouette scoring. The modern era required K=5: the model carved out a Pass-First Guard cluster (high AST, low USG) that did not exist in either prior era. The archetypes are listed below.
Read the full write-up →
3-and-D / Role Player
Ball-Dominant Playmaker
Glass Anchor
Scoring Big / PF
Pass-First Guard (Modern)
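Selecting K by silhouette score can be sketched with scikit-learn as below. This is a hedged illustration rather than the project's pipeline: `pick_k`, the candidate range, and the synthetic three-blob data are assumptions standing in for one era's standardized player stats. Running it once per era window gives an independent K for each.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def pick_k(features, candidate_ks=range(2, 8), seed=0):
    """Fit K-Means for each candidate K and return the K with the best silhouette."""
    scores = {}
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for one era's feature matrix (rows = players, columns = stats).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
X = StandardScaler().fit_transform(X)  # put every stat on the same scale first

best_k, scores = pick_k(X)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```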
Python
K-Means
UMAP
PCA
Scikit-learn
nba_api
Sports Analytics · Predictive Modeling
End-to-end ML pipeline for rugby match prediction across 25,000+ games (1893–2026). Features are engineered from Elo differentials, form, momentum, head-to-head history, rest days, margin form, and venue-specific win rates. XGBoost and Logistic Regression are trained in a walk-forward backtest: train on all data before year T, then predict year T.
Backtested against real bookmaker odds (Pinnacle/bet365). The model achieves 71% accuracy vs a 69% Elo baseline, but real-odds ROI is negative, consistent with an efficient market that has already priced in publicly available information.
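The walk-forward loop and a flat-stake ROI check can be sketched as follows. Everything here is illustrative: the function names, the record layout, and the majority-class stand-in model are assumptions, the toy matches and odds are invented, and the real pipeline uses XGBoost and Logistic Regression in place of the stand-in.

```python
from collections import Counter


def walk_forward(matches, fit, predict, first_test_year):
    """For each test year T, train on matches before T and score predictions for T."""
    accuracy_by_year = {}
    test_years = sorted({m["year"] for m in matches if m["year"] >= first_test_year})
    for year in test_years:
        train = [m for m in matches if m["year"] < year]  # strictly earlier seasons only
        test = [m for m in matches if m["year"] == year]
        if not train:
            continue
        model = fit(train)
        hits = sum(predict(model, m) == m["home_win"] for m in test)
        accuracy_by_year[year] = hits / len(test)
    return accuracy_by_year


def flat_stake_roi(bets):
    """bets: (decimal_odds, won) pairs with one unit staked on each."""
    profit = sum((odds - 1.0) if won else -1.0 for odds, won in bets)
    return profit / len(bets)


# Stand-in model (assumption): always predict the most common past outcome.
def fit_majority(train):
    return Counter(m["home_win"] for m in train).most_common(1)[0][0]


def predict_majority(model, match):
    return model


toy_matches = [
    {"year": 2020, "home_win": True}, {"year": 2020, "home_win": True},
    {"year": 2020, "home_win": False},
    {"year": 2021, "home_win": True}, {"year": 2021, "home_win": False},
    {"year": 2022, "home_win": True}, {"year": 2022, "home_win": True},
]
accuracy = walk_forward(toy_matches, fit_majority, predict_majority, first_test_year=2021)
roi = flat_stake_roi([(1.5, True), (1.5, True), (2.2, False), (1.8, False)])
print(accuracy, roi)
```

The toy bets show how accuracy and ROI can diverge: two winners out of four at short odds still lose money.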
Read the full write-up →
Python
XGBoost
Logistic Regression
Backtesting
Plotly
Feature Engineering
Sports Analytics · Published Research · Interactive Dashboard
Elo-based rating system for international rugby union. Processes 25,000+ international matches from 1893 to present with adaptive K-factors, home advantage corrections, and recency weighting. Produces pre-match win probabilities calibrated against market odds.
Extended into a Streamlit dashboard with six analytical views: current rankings, historical Elo trajectories, era dominance across five periods (pre-WW1 through modern), greatest upsets by upset probability, a live match predictor, and expected vs actual wins. Notable finding: New Zealand 2013 outperformed expected wins by 6.7.
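The core Elo mechanics, and the expected-vs-actual-wins comparison built on them, can be sketched as below. This uses the standard Elo expectation formula with a fixed K and a fixed home-advantage offset. Both values are assumptions chosen for illustration, since the system described above uses adaptive K-factors and recency weighting, which are not reproduced here. The fixtures are invented.

```python
def expected_score(rating_a, rating_b, home_advantage=0.0):
    """Win probability for side A under the standard Elo logistic curve."""
    return 1 / (1 + 10 ** ((rating_b - (rating_a + home_advantage)) / 400))


def update(rating_home, rating_away, home_result, k=20.0, home_advantage=50.0):
    """Return new (home, away) ratings. home_result: 1 win, 0.5 draw, 0 loss."""
    exp_home = expected_score(rating_home, rating_away, home_advantage)
    delta = k * (home_result - exp_home)
    return rating_home + delta, rating_away - delta


# Expected vs actual wins over a toy season at neutral venues:
# (team rating, opponent rating, result for the team).
fixtures = [(1600, 1500, 1), (1600, 1700, 1), (1600, 1600, 0)]
expected_wins = sum(expected_score(a, b) for a, b, _ in fixtures)
actual_wins = sum(result for _, _, result in fixtures)
surplus = actual_wins - expected_wins
print(round(expected_wins, 2), actual_wins, round(surplus, 2))
```

A positive surplus means the team won more often than its pre-match probabilities implied, which is the quantity behind the "outperformed expected wins" finding.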
Python
Elo Ratings
Statistical Modeling
Streamlit
Plotly
Published Research
Computer Vision · Reinforcement Learning
Six-stage pipeline processing match footage into structured game data. YOLOv8 detects players and ball frame-by-frame; K-means clusters jersey colors to auto-assign team membership; homography calibration maps pixel positions to field coordinates. Persistent player IDs are tracked across frames, with velocity and possession computed between detections.
Labeled events — passes, carries, kicks, tries, turnovers — are used to assign rewards and train a PyTorch actor network for in-game decision classification.
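The homography step, mapping pixel positions to field coordinates, can be sketched with NumPy using the direct linear transform. This is a minimal sketch under stated assumptions, not the pipeline's calibration code: the function names, the four pixel landmarks, and the 100 m x 70 m field rectangle are all invented for illustration, and a production system would typically use more landmarks and a robust estimator.

```python
import numpy as np


def fit_homography(pixel_pts, field_pts):
    """Estimate the 3x3 homography from at least 4 pixel-to-field correspondences (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(pixel_pts, field_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def to_field(H, pixel_xy):
    """Project one pixel position onto the field plane."""
    x, y = pixel_xy
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w


# Hypothetical landmarks: the field appears as a trapezoid in the camera image.
pixel_corners = [(100, 400), (540, 400), (440, 100), (200, 100)]
field_corners = [(0, 0), (100, 0), (100, 70), (0, 70)]  # metres, assumed field size

H = fit_homography(pixel_corners, field_corners)
player_on_field = to_field(H, (320, 400))  # midpoint of the near touchline in the image
print(player_on_field)
```

Once positions are in field coordinates, velocity between detections is a difference of positions divided by the frame interval.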
Python
YOLOv8
PyTorch
Computer Vision
Reinforcement Learning
OpenCV