Krok Odds

How to Analyse Sports Statistics for Betting: A Practical Framework

Most punters use stats wrong — they look at averages, ignore sample size, and confuse descriptive with predictive. Here is a practical framework for analysing sports data that actually surfaces betting edges.

Daniel Pham
Quantitative Strategy Lead
11 min read · Published 24 Jan 2026

Every punter looks at statistics. Almost every punter looks at them wrong. The AFL website shows a team averaging 92 points per game over the season and the punter bets the over on total points. The NRL stats page shows a player averaging 120 run metres and the punter backs the over on their metres line. This is not analysis. It is pattern recognition without a framework — and the bookmaker's framework is better than yours.

This piece covers what actually matters when analysing sports statistics for betting. Which stats predict future outcomes and which only describe past ones. How to avoid the six most common statistical traps. Where to find reliable Australian sports data. And how to build a simple analytical workflow that surfaces genuine edges rather than confirming what you already wanted to bet.

The difference between descriptive and predictive statistics

Most publicly available sports statistics are descriptive — they tell you what happened. The AFL ladder tells you who won. The NRL season stats tell you who scored the most tries. Averages tell you what the central tendency was over a given period.

Predictive statistics tell you what is likely to happen next. They are almost never published directly by sports leagues or bookmakers because they are the analytical work product that creates an edge. Converting descriptive data into predictive signals is the entire work of sports betting analysis.

The difference in practice:

  • Descriptive: "Collingwood has won 7 of their last 10 games." This tells you what happened. It does not tell you whether Collingwood played well or got lucky, whether the opponents were strong or weak, or whether the wins were by 1 point or 60 points.
  • Predictive: "Collingwood's pressure rating over the last 5 games is 15% above league average, and teams facing above-average pressure concede 12% more turnovers in the following week." This tells you something about what is likely to happen next, based on a metric that has demonstrated predictive power across a large sample.

The habit to develop: whenever you read a stat, ask "does this describe what happened, or does it predict what will happen?" If it describes, ask "what predictive signal might be hiding inside this descriptive number?" The surface stat is the starting point for analysis, not the conclusion.

The six most common statistical traps in betting analysis

1. Small sample theatre. "Player X has averaged 28 points over the last 3 games." Three games is not a sample. It is an anecdote. NBA players have hot and cold stretches of 3-5 games constantly — the variance in basketball scoring is high enough that a 3-game sample tells you almost nothing about the next game. The minimum useful sample for most sports statistics is 10-15 games. Below that, you are looking at noise, not signal.
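
A rough way to see why three games is an anecdote: the uncertainty around an average shrinks only with the square root of the number of games. The sketch below assumes a game-to-game scoring standard deviation of 8 points, which is a made-up figure for illustration, and uses a normal approximation that flatters very small samples.

```python
import math

def mean_margin_of_error(std_dev: float, n_games: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an average taken over n_games.

    Assumes roughly independent games and a normal approximation. At very
    small n the true interval is wider than this suggests.
    """
    return z * std_dev / math.sqrt(n_games)

# Hypothetical: a player whose scoring varies by about 8 points game to game.
ASSUMED_STD_DEV = 8.0

for n in (3, 5, 10, 15, 30):
    moe = mean_margin_of_error(ASSUMED_STD_DEV, n)
    print(f"{n:>2} games: average pinned down to about +/- {moe:.1f} points")
```

On that assumption, a 28-point average over 3 games carries a margin of about 9 points either side, so it is consistent with a true scoring level anywhere from the high teens to the high 30s.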

2. The average trap. Averages hide distribution shape. A team averaging 85 points per game might score 100 one week and 70 the next, or it might score 82-88 every week. The average is the same. The betting implications are completely different — the high-variance team is more likely to cover a large spread and more likely to go under a low total. Always look at the distribution (standard deviation, range, recent trend) alongside the average. An average without variance information is half a statistic.
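
To make the point concrete, here is a small sketch with two invented score lists that share the same 85-point average. The scores, the 75.5 and 94.5 lines, and the normal-distribution model are all assumptions chosen for illustration, not fitted to any real team.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical scores, invented for illustration: both teams average 85.
steady = [82, 88, 84, 86, 83, 87, 85, 85]
volatile = [100, 70, 95, 75, 102, 68, 90, 80]

def tail_probs(scores, low_line, high_line):
    """Chance of finishing under low_line or over high_line.

    Models scores as normally distributed around the sample mean, which is
    a simplifying assumption rather than a fitted scoring model.
    """
    dist = NormalDist(mean(scores), stdev(scores))
    return dist.cdf(low_line), 1 - dist.cdf(high_line)

for name, scores in (("steady", steady), ("volatile", volatile)):
    under, over = tail_probs(scores, low_line=75.5, high_line=94.5)
    print(f"{name:>8}: mean {mean(scores):.1f}, sd {stdev(scores):.1f}, "
          f"P(<75.5) {under:.3f}, P(>94.5) {over:.3f}")
```

Same average, very different tails. The volatile team lands beyond either line far more often, which is exactly the information the average throws away.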

3. Recency bias. "They have won 5 in a row, so they are in form." Winning streaks happen to average teams. For a 50-50 team, any given run of 5 games is all wins about 3% of the time (0.5^5 ≈ 3.1%), and a season contains many such runs. There are 18 AFL teams, each playing 23 games, so streaks are bound to appear somewhere. The question is not whether a team is "in form". It is whether the underlying performance metrics during the streak are different from the season baseline, and whether those metrics predict continuation. Usually, they do not. Streaks are mostly variance.
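
The chance of a streak in one specific 5-game window understates how often streaks show up over a season. A minimal sketch, assuming independent games and a constant 50% win probability (a pure coin-flip team, which no real team is):

```python
def prob_streak(n_games: int, streak_len: int, p_win: float = 0.5) -> float:
    """Probability of at least one run of streak_len straight wins in n_games.

    Tracks the current winning run game by game. Assumes independent games
    with a constant win probability.
    """
    # state[r] = probability the current run is r wins and no streak has happened yet
    state = [0.0] * streak_len
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_games):
        nxt = [0.0] * streak_len
        for run, prob in enumerate(state):
            nxt[0] += prob * (1 - p_win)      # a loss resets the run
            if run + 1 == streak_len:
                hit += prob * p_win           # this win completes the streak
            else:
                nxt[run + 1] += prob * p_win
        state = nxt
    return hit

print(f"One specific 5-game window: {0.5 ** 5:.1%}")
print(f"At least one 5-win streak in 23 games: {prob_streak(23, 5):.1%}")
```

Under those assumptions the season-long figure comes out a little under 29% for a single coin-flip team. With 18 teams in the competition, a 5-game winning streak somewhere is the expected outcome, not evidence of form.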

4. The narrative override. "Team X has a terrible record at this venue." This is a narrative, not a statistic — unless the sample is large enough to be statistically meaningful and the effect size is large enough to be betting-relevant. Most venue-based narratives involve samples of 5-15 games over multiple seasons with different teams. The signal-to-noise ratio is terrible. The narrative feels true because it is repeated often. The data usually shows no predictive power beyond what is explained by team strength differential.
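
One quick test of a venue narrative is to ask how often a record that bad would turn up by chance, given the team's general strength. The sketch below uses an exact binomial tail. The record (3 wins from 12 visits) and the 45% baseline win probability are hypothetical numbers, and treating every visit as the same probability is itself a simplification.

```python
from math import comb

def prob_at_most(wins: int, games: int, p_win: float) -> float:
    """Exact binomial probability of `wins` or fewer wins in `games`."""
    return sum(
        comb(games, k) * p_win**k * (1 - p_win) ** (games - k)
        for k in range(wins + 1)
    )

# Hypothetical: 3 wins from 12 visits, when team strength alone implied 45% per game.
p = prob_at_most(wins=3, games=12, p_win=0.45)
print(f"Chance of a record this bad or worse with no venue effect: {p:.1%}")
```

On these made-up numbers the answer is roughly 13%. Across 18 clubs and a dozen or so venues, several "terrible records" of that kind will exist with no venue effect at all.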

5. Confirmation bias in stat selection. You want to bet on Collingwood. You look for stats that support Collingwood winning. You find them — Collingwood has a good record at the MCG, their midfield is ranked top 4 for clearances, the opponent is missing a key defender. You ignore the stats that cut the other way — Collingwood's defensive efficiency has dropped 15% in the last month, the opponent's forward line is the most efficient in the league, Collingwood has lost 3 of their last 4 as a favourite. This is not analysis. It is rationalisation. The discipline of writing down the case against your bet before placing it is the simplest corrective.

6. Ignoring market efficiency. "The stats say Team X should be favourite." The market already knows the stats. The bookmaker's model incorporates everything you have looked at and more — historical data across decades, player-level tracking data, weather forecasts, injury impacts, betting market movements. If the stats say Team X should be $1.80 but the market has them at $2.10, the market is probably pricing in something you have missed, not making an obvious error. The analytical question is not "what do the stats say?" It is "what does the market know that I do not?" Finding the answer to that question — the information asymmetry — is where edges actually live.
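
It helps to put the disagreement in probability terms before asking what the market knows. A minimal sketch: convert prices to implied probabilities and strip the bookmaker margin by proportional scaling. The opponent's $1.75 price is a hypothetical added to complete a two-way market, and proportional scaling is one common convention among several.

```python
def implied_probabilities(decimal_odds):
    """Return (margin-free probabilities, bookmaker margin) for one market.

    Uses proportional scaling to strip the margin, chosen here for simplicity.
    """
    raw = [1 / price for price in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw], total - 1

MODEL_PRICE = 1.80      # what "the stats say" in the example above
MARKET_PRICE = 2.10     # what the market offers on Team X
OPPONENT_PRICE = 1.75   # hypothetical: not given in the example

probs, margin = implied_probabilities([MARKET_PRICE, OPPONENT_PRICE])
model_p = 1 / MODEL_PRICE

print(f"Bookmaker margin: {margin:.1%}")
print(f"Market probability for Team X: {probs[0]:.1%}")
print(f"Your probability for Team X:   {model_p:.1%}")
print(f"Gap to explain: {model_p - probs[0]:+.1%}")
```

A gap of that size is a large claim about the market being wrong. The productive next step is listing what could justify the market's number, not placing the bet.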

Stats that actually predict outcomes, by sport

AFL. Pressure rating (Champion Data) is the single best predictive metric. Teams facing high pressure generate fewer inside 50s and score less efficiently. Contested possession differential predicts scoring shot production. Inside 50 differential predicts total score better than any other volume metric. Clearance efficiency (scoring from clearance) is more predictive than clearance volume. Expected score (xScore, from shot location data) is a better predictor of future scoring than actual score — teams that outperform their xScore regress, teams that underperform improve.

NRL. Post-contact metres predict line break frequency better than total run metres. Tackle efficiency (missed tackle percentage) predicts defensive performance in the following 3-4 weeks — teams with high missed tackle counts concede more points in subsequent games, and the effect persists beyond the immediate matchup. Completion rate at the opponent's end of the field is more predictive than overall completion rate. Penalty differential is mostly random and has near-zero predictive power — do not bet on it.

NBA. Net rating (offensive rating minus defensive rating) over the last 10-15 games is the standard predictive metric. It outperforms win-loss record, point differential, and any individual stat. Rest-adjusted performance — teams on 2+ days rest vs teams on back-to-back — shows a measurable edge of approximately 2-3 points. Pace (possessions per game) predicts total points outcomes better than either team's scoring average. Three-point attempt rate correlates with scoring variance — high 3PA teams produce more extreme total points outcomes (both over and under).
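
For readers who want to compute net rating themselves, a minimal sketch follows. It assumes a game log of (points for, points against, possessions) tuples, which is a hypothetical input format, and it pools possessions across the window rather than averaging per-game ratings. The three-game log is invented purely to show the call.

```python
def net_rating(games, last_n=10):
    """Points scored minus points conceded per 100 possessions over the last_n games.

    games: list of (points_for, points_against, possessions) tuples, oldest first.
    Possessions are pooled across the window rather than averaged game by game.
    """
    recent = games[-last_n:]
    points_for = sum(g[0] for g in recent)
    points_against = sum(g[1] for g in recent)
    possessions = sum(g[2] for g in recent)
    offensive = 100 * points_for / possessions
    defensive = 100 * points_against / possessions
    return offensive - defensive

# Hypothetical three-game log, invented for illustration.
log = [(112, 105, 99), (101, 108, 96), (118, 110, 102)]
print(f"Net rating over the window: {net_rating(log, last_n=3):+.1f}")
```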

EPL/A-League. Expected goals (xG) is the most important single metric in football betting. Teams that consistently outperform their xG regress; teams that underperform improve. The effect is measurable across a full season. xG differential (xG for minus xG against) predicts future win-loss record better than actual goal differential. Shot-creating actions (SCA) and goal-creating actions (GCA) identify players whose individual contribution is likely to produce future scoring. See the soccer betting guide for the full framework.
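
A small sketch of how the xG comparison might be set up. The match tuples are invented for illustration, and the field layout (goals for, goals against, xG for, xG against) is an assumption rather than the export format of any particular data source.

```python
def xg_summary(matches):
    """Compare actual results with expected goals over a run of matches.

    matches: list of (goals_for, goals_against, xg_for, xg_against).
    """
    gf, ga, xgf, xga = (sum(col) for col in zip(*matches))
    return {
        "goal_diff": gf - ga,
        "xg_diff": xgf - xga,
        "scored_above_xg": gf - xgf,      # positive = finishing hotter than chances created
        "conceded_below_xg": xga - ga,    # positive = conceding fewer than chances allowed
    }

# Hypothetical five-match run, invented for illustration.
matches = [(2, 1, 1.1, 1.6), (1, 0, 0.7, 1.2), (3, 1, 1.4, 1.3),
           (1, 1, 0.9, 1.5), (2, 0, 1.0, 1.1)]

for key, value in xg_summary(matches).items():
    print(f"{key:>18}: {value:+.1f}")
```

A team with a healthy goal difference but a negative xG differential, as in this made-up run, is the profile the paragraph above describes as a regression candidate.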

Where to find the data

Free sources that are good enough for most analytical work:

  • AFL: AFL Tables (afltables.com — comprehensive historical data, the best free AFL resource), Wheeloratings (expected score data), the AFL app (live stats, basic but current)
  • NRL: NRL.com stats centre, Fox Sports Lab (more detailed but harder to scrape)
  • NBA: Basketball Reference (basketball-reference.com — the gold standard, comprehensive and well-structured), NBA.com/stats (official, good for recent data), Cleaning the Glass (paid, excellent for efficiency metrics)
  • Football: FBref (fbref.com — xG and advanced stats, comprehensive), Understat (xG data, good for European leagues), WhoScored (player-level stats)
  • Cricket: ESPNcricinfo Statsguru (comprehensive historical database, the best free cricket resource)

For most punters, the free sources are sufficient. The edge does not come from having proprietary data. It comes from looking at the same data as everyone else and asking better questions. Paid data sources (Champion Data for AFL, Stats Insider for AU sports, Opta for football) become valuable when you have a specific analytical approach that requires data not available in the free sources — typically player tracking data or proprietary metrics.

Building a simple analytical workflow

A workflow that surfaces edges rather than confirming biases:

  1. Define the question before looking at data. "Does Collingwood's scoring increase by more than 10% when playing under a roof?" is a question. "Find reasons to back Collingwood" is not. Write the question down. The specificity forces analytical honesty.
  2. Collect the data. A spreadsheet and free sources are enough. Use at least 20 data points to run any analysis at all, and treat 30 as the floor before betting on it (see step 4). 50+ is better. Document the source and the date collected.
  3. Calculate the effect size. Not "does this factor matter?" but "how large is the effect, and is it large enough to overcome the vig?" A 2% performance improvement that is statistically significant is not betting-relevant if the vig is 5%.
  4. Check the sample size. If the sample is under 30, the confidence interval is wide. The effect might be real or it might be noise. Do not bet on analyses with fewer than 30 data points. The variance will eat you.
  5. Test out-of-sample. Find the effect in one dataset (say, AFL seasons 2021-2024). Test whether it holds in a different dataset (AFL season 2025 first half). If the effect disappears out-of-sample, it was noise. Most "edges" found in historical data are noise. The out-of-sample test is the only defence.
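  6. Convert to a price. If the analysis suggests Team X has a 58% chance of winning, the fair price is 1/0.58 ≈ $1.72. Compare to the market price. If the market is offering $1.85 or above, there is an edge with some room for error in your estimate. Between $1.72 and $1.85 the apparent edge is small enough that an error of a few points in the 58% figure would erase it. If the market is at $1.70 or below, the market has already priced in your finding and more. Move on.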
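
The price conversion in the final step can be sketched in a few lines. The function names are illustrative, and the sensitivity loop assumes your 58% estimate could be a few points too high.

```python
def fair_price(p_win: float) -> float:
    """Decimal odds at which a bet with win probability p_win breaks even."""
    return 1 / p_win

def expected_value(p_win: float, decimal_odds: float) -> float:
    """Expected profit per $1 staked."""
    return p_win * decimal_odds - 1

estimate = 0.58
print(f"Fair price at {estimate:.0%}: ${fair_price(estimate):.2f}")

for price in (1.70, 1.72, 1.85):
    print(f"Market ${price:.2f}: EV {expected_value(estimate, price):+.3f} per $1")

# How wrong can the estimate be before $1.85 stops being a bet?
for true_p in (0.58, 0.56, 0.54):
    print(f"If the true chance is {true_p:.0%}: EV at $1.85 is "
          f"{expected_value(true_p, 1.85):+.3f}")
```

At $1.85 the break-even probability is about 54%, so the estimate can be roughly four percentage points too optimistic before the edge disappears. That buffer is the practical reason to want a price well above the fair one.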

Most analyses will produce no actionable edge. The market is efficient enough that obvious statistical patterns are already priced in. The value of the workflow is not that it produces bets every time — it is that when it does produce a bet, the bet has a genuine analytical foundation rather than a gut feel dressed up in stats.
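
Step 5 is where most candidate edges die, so it is worth seeing how simple the check is. A minimal sketch, using the roof question from step 1: measure the effect in the seasons where you found it, then again in a season you did not look at. The scores are invented, and the sample is far smaller than the 30-point floor from step 4. It shows the mechanics only.

```python
from statistics import mean

def effect_size(rows):
    """Mean outcome with the factor present minus mean outcome without it.

    rows: iterable of (factor_present, outcome) pairs.
    """
    with_factor = [y for present, y in rows if present]
    without_factor = [y for present, y in rows if not present]
    return mean(with_factor) - mean(without_factor)

def out_of_sample_check(rows_by_season, train_seasons, test_seasons):
    """Return (in-sample effect, out-of-sample effect)."""
    train = [r for s in train_seasons for r in rows_by_season[s]]
    test = [r for s in test_seasons for r in rows_by_season[s]]
    return effect_size(train), effect_size(test)

# Hypothetical (played_under_roof, points_scored) rows, invented for illustration.
rows_by_season = {
    2023: [(True, 98), (True, 94), (False, 84), (False, 88)],
    2024: [(True, 96), (True, 92), (False, 86), (False, 82)],
    2025: [(True, 85), (True, 87), (False, 88), (False, 84)],
}

found, held_out = out_of_sample_check(rows_by_season, [2023, 2024], [2025])
print(f"Effect where it was found: {found:+.1f} points")
print(f"Effect in unseen data:     {held_out:+.1f} points")
```

In this made-up example a 10-point effect vanishes in the held-out season. The discipline that matters is choosing the test seasons before looking at them, not after.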

Frequently asked questions

Do I need to learn to code to analyse sports statistics properly?

No, but it helps. A spreadsheet (Excel or Google Sheets) is sufficient for most analytical work up to a few thousand data points. You can calculate averages, standard deviations, correlations, and basic regression in a spreadsheet. The limitation is data collection — manually entering hundreds of data points is tedious and error-prone. Learning basic Python or R for data scraping and analysis becomes worthwhile if you are doing this regularly. But the analytical thinking — asking good questions, checking sample sizes, testing out-of-sample — is more important than the tool. A sharp analyst with a spreadsheet will outperform a poor analyst with a Python environment every time.
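
For a sense of scale, the spreadsheet calculations mentioned above translate to a few lines of Python. This is a sketch only: the two lists are invented numbers standing in for a metric and the following week's margin, and the correlation and regression are written out by hand rather than taken from a library.

```python
import math
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data: a weekly metric and the next week's margin.
metric = [4, -2, 7, 1, -5, 9, 3, -1]
next_margin = [12, -8, 20, -3, -15, 25, 2, 6]

slope, intercept = fit_line(metric, next_margin)
print(f"mean {mean(metric):.1f}, sd {stdev(metric):.1f}")
print(f"correlation {pearson(metric, next_margin):.2f}")
print(f"fit: margin = {slope:.2f} * metric + {intercept:.2f}")
```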

How do I know if my analysis is actually finding an edge?

Track the results of bets placed based on the analysis. After 100 bets, calculate your return on turnover. If it is positive, you might have an edge — but the confidence interval at 100 bets is wide. After 500 bets, the confidence interval tightens. If you are still positive after 500 bets placed on analyses from your workflow, you probably have a genuine edge. If you are negative, the workflow needs revision or the market is too efficient for this approach. Either way, the tracking tells you the truth. Most punters never track, so they never know.
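
To see how wide "wide" is, here is a sketch of a return-on-turnover estimate with an approximate 95% interval. It assumes flat $1 stakes, independent bets, and a normal approximation. The two records (55 winners from 100 and 275 from 500, all at $1.90) are hypothetical.

```python
import math
from statistics import mean, stdev

def roi_interval(profits_per_unit, z=1.96):
    """Return (roi, low, high): return on turnover and an approximate 95% interval.

    profits_per_unit: net result of each bet per $1 staked, e.g. +0.90 for a
    winner at $1.90 and -1.0 for a loser. Assumes flat stakes and independent bets.
    """
    n = len(profits_per_unit)
    roi = mean(profits_per_unit)
    half_width = z * stdev(profits_per_unit) / math.sqrt(n)
    return roi, roi - half_width, roi + half_width

def record(wins, losses, odds=1.90):
    """Build a hypothetical flat-stake betting record at a single price."""
    return [odds - 1] * wins + [-1.0] * losses

for wins, losses in ((55, 45), (275, 225)):
    roi, low, high = roi_interval(record(wins, losses))
    print(f"{wins + losses} bets: ROI {roi:+.1%}, interval {low:+.1%} to {high:+.1%}")
```

Going from 100 to 500 bets narrows the interval by more than half, but on these made-up records it still does not exclude zero. A positive result after 500 bets is encouraging, and continuing to track keeps tightening the answer.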

About the author
Daniel Pham
Quantitative Strategy Lead

Daniel writes about the maths underneath advantage betting: expected value, Kelly sizing, closing line value, bankroll theory. He translates the theoretical side into practical decisions Australian punters can actually apply.