Machine Learning Data Science Python OpenPowerlifting

Powerlifting Strength Analytics

Using competitive powerlifting data to answer a deceptively simple question: how strong is strong?

Strength Percentile Calculator

How strong are you?

Enter your squat / bench / deadlift → get your percentile against competitive lifters
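
For readers curious how such a lookup could work, here is a minimal sketch rather than the calculator's actual code. It assumes a list of reference totals for the lifter's comparison group; `strength_percentile` and the sample numbers are hypothetical.

```python
from bisect import bisect_right

def strength_percentile(total_kg, reference_totals):
    """Percent of reference lifters whose total is at or below total_kg."""
    ordered = sorted(reference_totals)
    if not ordered:
        raise ValueError("reference_totals must not be empty")
    # bisect_right counts how many reference totals are <= total_kg.
    rank = bisect_right(ordered, total_kg)
    return 100.0 * rank / len(ordered)

# Hypothetical reference totals (kg) for one comparison group.
reference = [300.0, 350.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0]

squat, bench, deadlift = 180.0, 120.0, 220.0
my_percentile = strength_percentile(squat + bench + deadlift, reference)
print(f"Total {squat + bench + deadlift:.0f} kg -> percentile {my_percentile:.1f}")
```

In practice the reference list would presumably be restricted to the same sex and weight class before the lookup.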

Why & How?

Powerlifting is a sport built around the "Big 3" lifts: squat, bench press and deadlift. A lifter's total is the sum of the heaviest successful attempt in each of the three lifts. Competitors are grouped into bodyweight, sex and age classes to keep the playing field level.

After falling down the powerlifting rabbit hole, I discovered OpenPowerlifting: an open-source database which tracks literally EVERY SINGLE national & international powerlifting competition from the 1970s to today. It is updated daily. (Browse it & you will see why it's one of the best sports to analyse!) I initially downloaded it to do some basic trend analysis in Excel for fun, only to watch my PC lag and crash while trying to load over a million rows.

So I turned to Python, and what started as idle curiosity evolved into what is *likely* the most rigorous statistical and ML-based breakdown of this dataset on the internet to date.

Approach

The analysis uses the OpenPowerlifting dataset, filtering over one million raw entries down to 433,376 drug-tested, full-power competition records to establish a clean baseline for raw strength.
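
A filter of this kind can be sketched as follows. This is an illustration, not the project's actual pipeline: the column names (`Event`, `Equipment`, `Tested`, `TotalKg`) and their values are assumed to follow the OpenPowerlifting CSV export, the exact criteria used in the analysis are not stated, and the sample rows are made up.

```python
import csv
import io

def filter_raw_tested_full_power(rows):
    """Keep raw, drug-tested, full-power (squat-bench-deadlift) entries with a valid total."""
    kept = []
    for row in rows:
        if row.get("Event") != "SBD":       # assumed code for full-power meets
            continue
        if row.get("Equipment") != "Raw":   # assumed label for unequipped lifting
            continue
        if row.get("Tested") != "Yes":      # assumed flag for drug-tested entries
            continue
        try:
            total = float(row.get("TotalKg") or 0)
        except ValueError:
            continue
        if total <= 0:                      # missing or failed totals
            continue
        kept.append(row)
    return kept

# Made-up sample rows in the assumed column layout.
SAMPLE_CSV = """Name,Sex,Event,Equipment,Tested,BodyweightKg,TotalKg
A,M,SBD,Raw,Yes,82.5,600
B,F,SBD,Wraps,Yes,63.0,350
C,M,B,Raw,Yes,90.0,150
D,F,SBD,Raw,,57.0,320
E,M,SBD,Raw,Yes,100.0,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean = filter_raw_tested_full_power(rows)
print(f"kept {len(clean)} of {len(rows)} rows")
```

On the full dataset the same conditions would more naturally be written as a boolean mask in pandas; the standard-library version is used here only to keep the sketch dependency-free.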

The methodology combines exploratory data analysis with statistical modeling: log-log linear regression to estimate allometric scaling exponents (how strength scales with bodyweight), and bootstrap confidence intervals to locate the age of peak performance.

To move beyond static observations, Kaplan-Meier survival analysis (a technique borrowed from medical research) was applied to quantify athlete retention and churn. Finally, an XGBoost model evaluated on a temporal train/test split and unsupervised K-Means clustering were used to predict strength totals and to identify distinct biomechanical archetypes.
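
The Kaplan-Meier estimator itself is compact enough to sketch by hand. The version below is an illustration under assumptions: a career's duration is measured in days, an "event" means the lifter stopped competing, and lifters who are still active are treated as censored. The sample durations are made up.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where at least one event occurred.

    durations: time under observation for each lifter (e.g. days)
    observed:  True if the career ended (event), False if still active (censored)
    """
    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group everyone who leaves the risk set at time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if events:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Made-up career lengths in days; False marks lifters who are still competing.
durations = [100, 100, 200, 365, 400, 700]
observed = [True, True, False, True, False, True]

curve = kaplan_meier(durations, observed)
median_days = median_survival(curve)
print(curve, median_days)
```

Censoring is what separates this from a plain median of career lengths: lifters who are still active contribute to the at-risk count up to their last observation without being counted as having quit.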

Outcomes & Findings

Sex-Specific Scaling Bias: Log-log regression estimated that male strength scales with bodyweight at an exponent of 0.672 (close to the 2/3 expected from geometric scaling), while female strength diverges heavily at 0.460. This indicates that standard coefficients like Wilks systematically penalize heavier women.
The Biological Peak: Bootstrap analysis placed the peak age for raw lifters at 31.0 years for males and 30.0 years for females. The performance curve is remarkably flat across the prime competitive window, pointing to a gradual decline rather than a sudden "athletic cliff".
High-Churn Sport: Survival analysis revealed a median career duration of just 98 to 119 days, suggesting that powerlifting functions largely as a high-churn "tourist" sport. However, athletes who compete three or more times in their first year double their five-year retention probability.
Machine Learning Predictions: An XGBoost model evaluated on a temporal split predicted raw totals from baseline demographic inputs alone (Age, Sex, Bodyweight, Year) with an $R^2$ of 0.722, giving an estimate of a lifter's expected total without any knowledge of their training history.
Biomechanical Archetypes: K-Means clustering identified 5 distinct lifter profiles. The clusters are consistent with the "Old Man Bench" phenomenon (aging lifters shifting to upper-body reliance) and suggest that a "Balanced Push-Pull" profile reaches the highest elite totals, rather than extreme specialization.

GitHub Repository →