The most rigorous multidimensional analysis of 50+ years of historical powerlifting data on the internet, through the lens of Machine Learning & Statistics, with practical, real-life insights for gym-goers.
Do not worry about your individual potential. Potential is only the expression of a possibility, something that can be assessed accurately only in retrospect. IN OTHER WORDS, YOU'LL NEVER KNOW HOW GOOD YOU CAN BECOME AT SOMETHING UNLESS YOU TRY.
[Powerlifting is a sport in which you lift the heaviest weight you can in the “Big 3” lifts: Squat, Bench Press & Deadlift. Your total is the sum of your best successful attempt in each of the 3 lifts. Lifters compete in bodyweight, gender and age classes to keep the playing field level.]
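To make the scoring concrete, here is a minimal sketch of how a total is computed from the three attempts per lift. It assumes attempts are plain numbers in kg with a failed attempt stored as a negative value (an assumption about the data convention); `best_attempt` and `total` are hypothetical helper names.

```python
from typing import Optional, Sequence


def best_attempt(attempts: Sequence[float]) -> Optional[float]:
    """Heaviest successful attempt; failed attempts are assumed to be stored as negatives."""
    made = [a for a in attempts if a > 0]
    return max(made) if made else None


def total(squat: Sequence[float], bench: Sequence[float], deadlift: Sequence[float]) -> Optional[float]:
    """Sum of the best successful squat, bench and deadlift.

    Returns None if any lift has no successful attempt (a "bomb-out", so no total).
    """
    bests = [best_attempt(lift) for lift in (squat, bench, deadlift)]
    if any(b is None for b in bests):
        return None
    return sum(bests)


# Third squat missed (-210), second bench missed (-125): 200 + 125 + 250
print(total([190, 200, -210], [115, -125, 125], [230, 242.5, 250]))  # → 575
```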
After falling down the powerlifting rabbit hole, I discovered OpenPowerlifting: a database that tracks literally EVERY SINGLE national & international powerlifting competition from the 1970s to today, and is updated daily. (Check out OpenPowerlifting & you will see why powerlifting is one of the best sports to analyse!) I initially downloaded it to do some basic trend analysis in Excel for fun, only to watch my PC lag and crash when trying to load over a million rows.
So I resorted to Python, and what started as idle curiosity evolved into what is *likely* the most rigorous statistical and ML-based breakdown of this dataset on the internet to date.
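For reference, a file of this size loads comfortably in pandas if you read only the columns you need with compact dtypes. A minimal sketch, assuming OpenPowerlifting-style column names (`Name`, `Sex`, `BodyweightKg`, `TotalKg`, …); the file path and the `load_meets` helper are hypothetical.

```python
import pandas as pd

# Hypothetical column subset; OpenPowerlifting-style names are assumed.
NUMERIC = ["Age", "BodyweightKg", "Best3SquatKg", "Best3BenchKg",
           "Best3DeadliftKg", "TotalKg", "Wilks"]
COLUMNS = ["Name", "Sex", "Date"] + NUMERIC


def load_meets(path: str) -> pd.DataFrame:
    """Read only the needed columns with compact dtypes; drop entries without a total."""
    df = pd.read_csv(
        path,
        usecols=COLUMNS,
        dtype={"Name": "string", "Sex": "category", **{c: "float32" for c in NUMERIC}},
        parse_dates=["Date"],
    )
    return df.dropna(subset=["TotalKg"]).reset_index(drop=True)


# df = load_meets("openpowerlifting.csv")  # hypothetical path to the bulk CSV export
```

Using `float32` instead of the default `float64` halves the memory taken by the numeric columns, and `category` stores the highly repetitive `Sex` values as small integer codes.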
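Geometric scaling predicts $\text{Strength} \propto \text{BW}^{2/3} \approx \text{BW}^{0.667}$: muscle force grows with cross-sectional area (length²) while bodyweight grows with volume (length³). The fitted male exponent of 0.672 matches this almost perfectly, but the female exponent lags far behind at 0.460. In other words, female SBD (squat + bench + deadlift) totals grow much more slowly per kg of added mass than geometry predicts, attributable to the higher proportion of fat tissue in female mass gain.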
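The exponent comes from fitting a power law, which is a straight line on log-log axes: if $\text{Total} = a \cdot \text{BW}^{b}$, then $\log \text{Total} = \log a + b \log \text{BW}$. A minimal NumPy sketch of that fit follows; it assumes ordinary least squares on log-transformed values (the analysis may have used a different estimator), and the synthetic lifters are purely illustrative.

```python
import numpy as np


def scaling_exponent(bodyweight_kg, total_kg):
    """Fit Total = a * BW**b by least squares on log-log axes; return (a, b)."""
    log_bw = np.log(np.asarray(bodyweight_kg, dtype=float))
    log_total = np.log(np.asarray(total_kg, dtype=float))
    b, log_a = np.polyfit(log_bw, log_total, deg=1)  # polyfit returns slope first, then intercept
    return float(np.exp(log_a)), float(b)


# Synthetic lifters that follow the 2/3 law exactly, plus multiplicative noise
rng = np.random.default_rng(0)
bw = rng.uniform(50, 140, size=5_000)
total = 35 * bw ** (2 / 3) * rng.lognormal(0.0, 0.1, size=bw.size)
a, b = scaling_exponent(bw, total)
print(f"a = {a:.1f}, b = {b:.3f}")  # b should land close to 0.667
```

Running the same fit separately on male and female rows gives one exponent per sex, which is the comparison described above.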
Wilks is a bodyweight-adjusted score meant to let lifters of different sizes be compared, so for a fair formula its correlation with bodyweight should ideally be 0. For males it is +0.09, nearly ideal. For females it is -0.17, meaning the formula penalises heavier female lifters regardless of their absolute raw strength.
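The fairness check itself is one correlation per sex. A minimal sketch, assuming a DataFrame with OpenPowerlifting-style `Sex`, `BodyweightKg` and `Wilks` columns; it uses Spearman rank correlation (computed as the Pearson correlation of ranks, so SciPy isn't needed), which may differ from the exact coefficient used in the analysis. Function names are hypothetical.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho computed as the Pearson correlation of ranks (ties get average ranks)."""
    return x.rank().corr(y.rank())


def score_bodyweight_correlation(df: pd.DataFrame, score: str = "Wilks") -> pd.Series:
    """Per-sex rank correlation between a bodyweight-normalised score and bodyweight.

    A fair normalising formula should give values close to 0 for every sex.
    """
    clean = df.dropna(subset=["Sex", "BodyweightKg", score])
    return pd.Series({sex: spearman(g[score], g["BodyweightKg"])
                      for sex, g in clean.groupby("Sex", observed=True)})
```

Passing a different `score` column (OpenPowerlifting also publishes `Dots` and `Goodlift`) runs the same check on alternative formulas.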
For males, bench as a share of total rises monotonically with bodyweight, from 22.1% (under 59 kg) to 24.1% (120+ kg). For females, the ratio stays essentially flat at ~20% across all weight classes. This is quantitative evidence that added mass gives female lifters almost no relative advantage in upper-body force transfer.
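Bench share is simply the best bench divided by the total, summarised per bodyweight bin. A minimal sketch, assuming OpenPowerlifting-style columns (`Sex`, `BodyweightKg`, `Best3BenchKg`, `TotalKg`); the bin edges below are illustrative, not the exact classes used in the analysis, and `bench_share_by_class` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Illustrative bodyweight bins in kg: [0, 59), [59, 74), ..., [120, inf)
BW_EDGES = [0, 59, 74, 93, 120, np.inf]
BW_LABELS = ["<59", "59-74", "74-93", "93-120", "120+"]


def bench_share_by_class(df: pd.DataFrame) -> pd.DataFrame:
    """Median bench share of total (%) per bodyweight bin (rows) and sex (columns)."""
    d = df.dropna(subset=["Sex", "BodyweightKg", "Best3BenchKg", "TotalKg"])
    d = d[d["TotalKg"] > 0].assign(
        bench_pct=lambda x: 100 * x["Best3BenchKg"] / x["TotalKg"],
        bw_bin=lambda x: pd.cut(x["BodyweightKg"], bins=BW_EDGES, labels=BW_LABELS, right=False),
    )
    return d.pivot_table(index="bw_bin", columns="Sex", values="bench_pct",
                         aggfunc="median", observed=True)
```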
Lifters who compete only once in their first year have a ~15% 2-year survival rate (i.e. only ~15% are still competing two years later). Two competitions in year one lift this to ≈30%. Three or more in year one produce a fundamentally different survival curve: ≈50% at 2 years (more than three times the one-meet rate) and ≈25% at 5 years, nearly double the one-meet group. This has direct implications for federations' athlete-retention strategies.
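Before any survival curve can be drawn, each lifter's meet history has to be collapsed into one row: how long the career lasted, whether it has ended, and how many meets fell in the first year. A minimal sketch, assuming one row per lifter per meet with `Name` and `Date` columns; the 365-day "year one" window and the two-years-inactive rule for calling a career finished are assumptions, not necessarily the definitions used in the analysis, and `career_table` is a hypothetical name.

```python
import pandas as pd


def career_table(meets: pd.DataFrame, cutoff: str, inactive_days: int = 730) -> pd.DataFrame:
    """Collapse one-row-per-meet data into one row per lifter."""
    m = meets.dropna(subset=["Name", "Date"]).assign(Date=lambda x: pd.to_datetime(x["Date"]))
    debut = m.groupby("Name")["Date"].transform("min")
    # Meets within 365 days of the debut count as "year one" (an assumed definition)
    m = m.assign(in_year_one=(m["Date"] - debut) < pd.Timedelta(days=365))
    lifters = m.groupby("Name").agg(
        first_meet=("Date", "min"),
        last_meet=("Date", "max"),
        year_one_meets=("in_year_one", "sum"),
    )
    lifters["duration_days"] = (lifters["last_meet"] - lifters["first_meet"]).dt.days
    # Career "ended" if inactive for longer than inactive_days before the data cut-off;
    # otherwise it is still running and is treated as right-censored.
    lifters["ended"] = (pd.Timestamp(cutoff) - lifters["last_meet"]).dt.days > inactive_days
    lifters["year_one_group"] = pd.cut(lifters["year_one_meets"], bins=[0, 1, 2, float("inf")],
                                       labels=["1", "2", "3+"])
    return lifters
```

Grouping by `year_one_group` and feeding each group's `duration_days` and `ended` columns into a Kaplan-Meier estimator yields one survival curve per group.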
Applying the Kaplan-Meier estimator reveals that most competitors never return after their debut. For both sexes, the median career lasts about as long as a single competition prep block (~12 weeks): most people prepare for one meet and never compete again.
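The Kaplan-Meier estimator accounts for careers that are still in progress (right-censored): $S(t) = \prod_{t_i \le t} \left(1 - d_i / n_i\right)$, where $d_i$ is the number of careers ending at time $t_i$ and $n_i$ the number still at risk just before it. A minimal vectorised NumPy sketch follows; in practice a tested library such as `lifelines` (its `KaplanMeierFitter`) is the safer choice, and the function names here are hypothetical.

```python
import numpy as np


def kaplan_meier(durations, ended):
    """Kaplan-Meier survival curve.

    durations: observed career length per lifter (e.g. days from first to last meet)
    ended:     True if the career finished, False if still active (right-censored)
    Returns (event_times, survival just after each event time).
    """
    durations = np.asarray(durations, dtype=float)
    ended = np.asarray(ended, dtype=bool)
    # d_i: number of careers that end at each distinct event time t_i
    event_times, deaths = np.unique(durations[ended], return_counts=True)
    # n_i: careers still at risk just before t_i (observed for at least t_i)
    at_risk = durations.size - np.searchsorted(np.sort(durations), event_times, side="left")
    survival = np.cumprod(1.0 - deaths / at_risk)
    return event_times, survival


def median_survival(event_times, survival):
    """Smallest event time at which survival drops to 0.5 or below (None if it never does)."""
    below = np.nonzero(survival <= 0.5)[0]
    return float(event_times[below[0]]) if below.size else None
```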
K-Means clustering on lift ratios alone (with no sex variable) separated the population largely along sex lines on its own. The “Anti-Bencher” cluster is 66.8% female; the “Deadlift Specialist” cluster is 62% female. The two high-total clusters are 83–93% male. Clustering purely on lift proportions independently reconstructed the biological sex divide: data-driven confirmation that male and female lift-composition profiles are structurally different phenotypes.
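The clustering step can be sketched as follows: express each lift as a share of the total, run K-Means on those shares, and only afterwards cross-tabulate clusters against sex. This is a minimal sketch with a small NumPy implementation of Lloyd's algorithm standing in for a library such as scikit-learn's `KMeans`; the number of clusters, the column names and the helper names are assumptions.

```python
import numpy as np
import pandas as pd


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it in place if the cluster is empty)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def cluster_lift_profiles(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Cluster lifters on each lift's share of total, then check sex composition afterwards."""
    lifts = ["Best3SquatKg", "Best3BenchKg", "Best3DeadliftKg"]
    d = df.dropna(subset=lifts + ["TotalKg", "Sex"])
    d = d[d["TotalKg"] > 0]
    # Shares of total: sex is deliberately NOT part of the feature matrix
    shares = d[lifts].to_numpy(dtype=float) / d["TotalKg"].to_numpy(dtype=float)[:, None]
    labels, _ = kmeans(shares, k)
    # Row-normalised cross-tab: fraction of each cluster that is F / M
    return pd.crosstab(labels, d["Sex"].to_numpy(), rownames=["cluster"], colnames=["Sex"],
                       normalize="index")
```

Lloyd's algorithm depends on its random starting centroids; scikit-learn's `KMeans` mitigates this with k-means++ initialisation and several restarts (`n_init`). The pairwise-distance array also holds n × k × 3 floats, which is fine for a few million rows but worth knowing.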
The correlation between age and performance is near zero for women (ρ = 0.04) and weak for men (ρ = 0.19). The performance curve is genuinely flat from age 27 through 35: a 27-year-old and a 34-year-old are statistically indistinguishable. The median 45-year-old posts a total matching the median 20-year-old's. Only past 50 does a real decline appear. It's not too late to start, so go hit the gym and pick up the barbell!
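Both kinds of numbers in this paragraph come from simple summaries: a rank correlation between age and total within each sex, and the median total per age band. A minimal sketch, assuming OpenPowerlifting-style `Sex`, `Age` and `TotalKg` columns; the 5-year bands and the `age_profile` name are illustrative.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho: Pearson correlation of the ranks (average ranks for ties)."""
    return x.rank().corr(y.rank())


def age_profile(df: pd.DataFrame, band: int = 5):
    """Per-sex Spearman rho (age vs total) and median total per age band and sex."""
    d = df.dropna(subset=["Sex", "Age", "TotalKg"])
    rho = {sex: spearman(g["Age"], g["TotalKg"]) for sex, g in d.groupby("Sex", observed=True)}
    # Band label = lower edge, e.g. ages 25.0-29.5 fall in band 25
    d = d.assign(age_band=(d["Age"] // band * band).astype(int))
    medians = d.pivot_table(index="age_band", columns="Sex", values="TotalKg",
                            aggfunc="median", observed=True)
    return pd.Series(rho), medians
```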
For men, gaining bodyweight translates meaningfully into more strength (ρ = 0.55). For women, the same relationship is much weaker (ρ = 0.39), and the R² of bodyweight predicting total is barely half that of men (17% vs 32%). When a woman gains weight to move up a class, a larger fraction of that weight tends to be fat rather than muscle, so the strength gain is less predictable and more variable.
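For a single predictor fitted by ordinary least squares, R² equals the squared Pearson correlation, so the comparison takes only a few lines. A minimal sketch, assuming `Sex`, `BodyweightKg` and `TotalKg` columns and a straight-line fit of total on bodyweight (the analysis may have used a different model); function names are hypothetical.

```python
import numpy as np
import pandas as pd


def r_squared(x, y) -> float:
    """R^2 of an OLS straight-line fit of y on x (equals Pearson r squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()


def bodyweight_r2_by_sex(df: pd.DataFrame) -> pd.Series:
    """Share of variance in total explained by bodyweight, per sex."""
    d = df.dropna(subset=["Sex", "BodyweightKg", "TotalKg"])
    return pd.Series({sex: r_squared(g["BodyweightKg"], g["TotalKg"])
                      for sex, g in d.groupby("Sex", observed=True)})
```

As a sanity check on the quoted figures, √0.32 ≈ 0.57 and √0.17 ≈ 0.41, close to the rank correlations of 0.55 and 0.39.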
Of the three lifts, the squat correlates most strongly with total (ρ = 0.963). Squat and deadlift are also positively correlated, so lifters who squat more tend to deadlift more too. Squat-dominant lifters post totals 12.5% higher than deadlift-dominant ones (605 kg vs 538 kg), and the squat's edge holds even after controlling for bodyweight. Across the sport, the squat's share of total has risen year-on-year since 2012: elite lifters are squatting more, and so should you.
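One simple way to control for bodyweight is to regress total on bodyweight plus a squat-dominance indicator; the indicator's coefficient is then the gap between the groups at equal bodyweight. A minimal sketch with NumPy least squares, assuming `BodyweightKg`, `Best3SquatKg`, `Best3DeadliftKg` and `TotalKg` columns and defining "squat-dominant" simply as squat > deadlift; both the definition and the linear model are assumptions, not necessarily what the analysis did, and `squat_dominance_gap` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def squat_dominance_gap(df: pd.DataFrame) -> dict:
    """Raw and bodyweight-adjusted total gap: squat-dominant minus deadlift-dominant lifters."""
    d = df.dropna(subset=["BodyweightKg", "Best3SquatKg", "Best3DeadliftKg", "TotalKg"])
    d = d[d["Best3SquatKg"] != d["Best3DeadliftKg"]]  # drop exact ties
    squat_dom = (d["Best3SquatKg"] > d["Best3DeadliftKg"]).to_numpy(dtype=float)
    total = d["TotalKg"].to_numpy(dtype=float)
    bw = d["BodyweightKg"].to_numpy(dtype=float)

    raw_gap = np.median(total[squat_dom == 1]) - np.median(total[squat_dom == 0])
    # Model: Total = b0 + b1 * BW + b2 * squat_dominant; b2 is the gap at equal bodyweight
    X = np.column_stack([np.ones_like(bw), bw, squat_dom])
    coef, *_ = np.linalg.lstsq(X, total, rcond=None)
    return {"raw_median_gap_kg": float(raw_gap), "bw_adjusted_gap_kg": float(coef[2])}
```

If squat-dominant lifters also tend to be heavier, the raw gap overstates the effect; the adjusted coefficient removes the part explained by bodyweight alone.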
The “Balanced Squat-Deadlift” archetype (squat ≈ deadlift) claims the highest median total (570 kg) and the highest relative strength score as well (Wilks = 377). The next best group was the balanced archetype. The “Deadlift Specialist” cluster posts the worst numbers of all. A freakishly high deadlift ratio usually means a weak squat, not an elite deadlift.
Rapid social-media transformations belong to beginners, outliers or steroid users. The average male lifter adds 4.3 kg to his total per year; the average female, 3.2 kg. Lifters with 7+ years of experience post a Wilks score of ~407, vs ~361 for first-year lifters. And to be an average 83 kg male today, your total needs to be 36 kg higher than that of the average 83 kg male from a decade ago. The sport's baseline keeps rising.
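A kg-of-total-per-year figure can be estimated by fitting a straight line to each lifter's totals against time and then averaging the slopes. A minimal sketch, assuming one row per lifter per meet with `Name`, `Date` and `TotalKg` columns; requiring at least a year between first and last meet is an assumed filter to stop very short histories from producing extreme slopes, and `yearly_progress` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def yearly_progress(meets: pd.DataFrame, min_span_years: float = 1.0) -> pd.Series:
    """Least-squares slope of total vs time (kg per year) for each lifter."""
    d = meets.dropna(subset=["Name", "Date", "TotalKg"]).assign(
        Date=lambda x: pd.to_datetime(x["Date"]))

    def slope(g: pd.DataFrame) -> float:
        years = (g["Date"] - g["Date"].min()).dt.days / 365.25
        if years.max() < min_span_years:  # too short a history to estimate a trend
            return np.nan
        return np.polyfit(years.to_numpy(), g["TotalKg"].to_numpy(dtype=float), deg=1)[0]

    return d.groupby("Name")[["Date", "TotalKg"]].apply(slope).dropna()
```

Calling `.mean()` or `.median()` on the result (optionally after splitting by sex) gives a single typical progression rate.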
If you want to dive deeper into this analysis, check out the code, plots & full report in the GitHub repository below.