From Machine Learning

Built an ML model that predicts UFC fights and uses AI to explain why [P]

I've been a UFC fan for years and recently had some spare time, so as a learning project I built a web app that uses machine learning to predict which fighter will win.

Surprisingly, it does a good job: accuracy is around 71.6%. I also tweaked the algorithm so it factors in each division's dominant fighting style and gives a slight edge to fighters who match it.

The ML pipeline, for those interested:
Model: XGBoost binary classifier (binary:logistic) — predicts whether fighter A wins (1) or fighter B wins (0). Draws/no-contests are excluded from training.

Features (26 total)

The key design choice is differential features — all stats are computed as fighter_A − fighter_B rather than absolute values. This makes SHAP values directly interpretable as relative advantages:

- Striking: SLpM diff, sig. strike accuracy diff, strikes absorbed diff, strike defense diff

- Grappling: TD avg diff, TD accuracy/defense diff, submission avg diff

- Physical: height diff, reach diff

- Record: win rate diff, finish rate diff, KO/TKO rate diff, sub rate diff

- Recent form: wins_last_5 diff, losses_last_5 diff, and a recency-weighted momentum score (most recent fight = 5×, oldest = 1×, normalized to [−1, +1])

- Experience: total fights diff, avg strikes/TDs/control time diff

- Style context: style_win_rate_in_division diff — historical win rate of each fighter's style (striker/wrestler/etc.) in that weight class

- Non-differenced: style encoding for both fighters (kept as a pair to capture matchup interaction), weight class delta (whether each fighter is cutting down or moving up from their natural division)
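
A minimal sketch of two of the ideas in the feature list: A − B differencing and the recency-weighted momentum score. The function names, the stat keys, and the choices to score a win as +1, a loss as −1, anything else as 0, and to normalize by the sum of the weights actually used are all assumptions for illustration; the post only states the 5× to 1× weighting and the [−1, +1] range.

```python
from typing import Dict, Mapping, Sequence


def differential_features(
    a: Mapping[str, float], b: Mapping[str, float], keys: Sequence[str]
) -> Dict[str, float]:
    """Return fighter_A minus fighter_B for each stat in `keys`."""
    return {f"{k}_diff": a[k] - b[k] for k in keys}


def momentum(results: Sequence[int], window: int = 5) -> float:
    """Recency-weighted form score in [-1, +1].

    `results` is ordered oldest -> newest: +1 win, -1 loss, 0 otherwise
    (assumed encoding). The newest fight gets weight `window`, and each
    older fight gets one less, down to 1 for the oldest in the window.
    """
    recent = list(results)[-window:]
    if not recent:
        return 0.0  # assumption: debutants get a neutral score
    newest_first = recent[::-1]
    weights = [window - i for i in range(len(newest_first))]
    weighted = sum(w * r for w, r in zip(weights, newest_first))
    return weighted / sum(weights)


if __name__ == "__main__":
    a = {"slpm": 4.5, "reach_cm": 188.0}  # hypothetical stats
    b = {"slpm": 3.0, "reach_cm": 183.0}
    print(differential_features(a, b, ["slpm", "reach_cm"]))
    print(momentum([-1, -1, -1, -1, 1]))  # one recent win after four losses
```

Swapping A and B flips the sign of every differenced feature, which is what lets a positive SHAP value on a `_diff` feature read as "this gap favors fighter A".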

Leakage prevention
For each training sample, stats for both fighters are computed using only fights that occurred strictly before that fight's date. No future data bleeds in. All fights are sorted by date; for fight N, only fights 1..N-1 contribute to the feature vector.
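
One way to get this "strictly before" behavior is to walk the fights in date order, read the running stats to build each row, and only afterwards update the stats with that date's results. The sketch below does this for two illustrative features; the `Fight` record, the function name, and the 0.0 win rate for debutants are assumptions, not the app's actual code.

```python
from collections import defaultdict
from datetime import date
from itertools import groupby
from typing import Dict, Iterable, List, NamedTuple


class Fight(NamedTuple):
    day: date
    fighter_a: str
    fighter_b: str
    a_won: bool


def point_in_time_rows(fights: Iterable[Fight]) -> List[Dict[str, float]]:
    """Build one feature row per fight using only strictly earlier fights."""
    wins: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)

    def win_rate(name: str) -> float:
        return wins[name] / total[name] if total[name] else 0.0

    rows: List[Dict[str, float]] = []
    ordered = sorted(fights, key=lambda f: f.day)
    # Group by date so same-day results never leak into each other.
    for _, group in groupby(ordered, key=lambda f: f.day):
        batch = list(group)
        for f in batch:  # 1) read stats: only earlier dates are in them
            rows.append({
                "win_rate_diff": win_rate(f.fighter_a) - win_rate(f.fighter_b),
                "total_fights_diff": total[f.fighter_a] - total[f.fighter_b],
                "label": int(f.a_won),
            })
        for f in batch:  # 2) only then record this date's outcomes
            total[f.fighter_a] += 1
            total[f.fighter_b] += 1
            wins[f.fighter_a if f.a_won else f.fighter_b] += 1
    return rows
```

Rows come back in chronological order, so the first fight of any fighter always sees empty history for that fighter.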

Training
- XGBoost: n_estimators=500, max_depth=5, learning_rate=0.05, subsample=0.8, colsample_bytree=0.8, min_child_weight=5, early stopping with a patience of 50 rounds

- 80/20 stratified train/test split

- 5-fold stratified CV to validate generalization

- Metrics: accuracy, ROC-AUC, log-loss
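
The setup above might look roughly like this with the scikit-learn wrapper from xgboost. This is a sketch, not the project's training script: it assumes xgboost 1.6 or newer (where `early_stopping_rounds` is a constructor argument), and the validation slice carved out of the training set for early stopping, the `logloss` eval metric, the seed, and the function name are all assumptions. The 5-fold CV step is left out for brevity.

```python
# Hyperparameters as listed in the post; eval_metric is an assumption.
XGB_PARAMS = {
    "objective": "binary:logistic",
    "n_estimators": 500,
    "max_depth": 5,
    "learning_rate": 0.05,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "min_child_weight": 5,
    "early_stopping_rounds": 50,
    "eval_metric": "logloss",
}


def train_and_evaluate(X, y, seed: int = 42):
    """Fit on an 80/20 stratified split and report the three metrics."""
    # Imported lazily so this module loads even without the ML stack.
    from sklearn.metrics import accuracy_score, log_loss, roc_auc_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Assumption: early stopping watches a slice of the training data,
    # so the held-out test set is never used to pick the tree count.
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.2, stratify=y_train, random_state=seed
    )

    model = XGBClassifier(**XGB_PARAMS)
    model.fit(X_fit, y_fit, eval_set=[(X_val, y_val)], verbose=False)

    proba = model.predict_proba(X_test)[:, 1]  # P(fighter A wins)
    metrics = {
        "accuracy": accuracy_score(y_test, (proba >= 0.5).astype(int)),
        "roc_auc": roc_auc_score(y_test, proba),
        "log_loss": log_loss(y_test, proba),
    }
    return model, metrics
```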

TreeSHAP generates per-feature contribution values for each prediction. These are passed to Claude (claude-sonnet-4-6) alongside the raw stats to generate a natural-language explanation. Claude narrates, never calculates: all numbers come from the deterministic pipeline.
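
To illustrate the "narrates, never calculates" split, here is a hedged sketch of how precomputed SHAP values and raw stats could be packed into a prompt. The function names, prompt wording, and feature names are hypothetical, and the actual API call to Claude is deliberately left out; the SHAP values themselves would come from something like `shap.TreeExplainer` upstream.

```python
from typing import Dict, List, Tuple


def top_contributions(
    shap_values: Dict[str, float], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k features with the largest absolute SHAP value."""
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]


def build_prompt(
    fighter_a: str,
    fighter_b: str,
    win_prob_a: float,
    shap_values: Dict[str, float],
    raw_stats: Dict[str, float],
    k: int = 3,
) -> str:
    """Format already-computed numbers as text for the language model."""
    lines = [
        f"Matchup: {fighter_a} vs {fighter_b}",
        f"Model probability that {fighter_a} wins: {win_prob_a:.1%}",
        "Top feature contributions (SHAP, positive favors fighter A):",
    ]
    for name, value in top_contributions(shap_values, k):
        lines.append(
            f"- {name}: shap={value:+.3f}, raw_diff={raw_stats[name]:+.2f}"
        )
    lines.append(
        "Explain this prediction in plain language. Use only the numbers "
        "above; do not compute or estimate new ones."
    )
    return "\n".join(lines)
```

Because every figure is formatted before the model sees it, the explanation can be checked against the pipeline output line by line.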

https://ufc-fight-predictor-v1.vercel.app/

Disclaimer: This is a personal learning project, not financial or betting advice. Predictions are based on historical stats and will be wrong plenty of times.

submitted by /u/albaneso