In the interest of transparency, this post explains each of the metrics I use in my prediction model and why they matter.
To start, the model is a linear regression that attempts to predict future scoring margins and winning teams from the following independent variables:
Home Game
If a game is played at the team’s home arena, the value is 1; if not, the value is 0.
Neutral Game
If a game is played at a neutral site, the value is 1; if not, the value is 0.
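As a minimal sketch, assuming each game's venue is recorded as one of the hypothetical labels "home", "away", or "neutral", the two indicator variables could be derived like this. A road game is simply the case where both flags are 0.

```python
def site_indicators(site: str) -> tuple[int, int]:
    """Return (home_game, neutral_game) flags for a game.

    `site` is a hypothetical label: "home", "away", or "neutral".
    """
    if site not in ("home", "away", "neutral"):
        raise ValueError(f"unknown site: {site!r}")
    home_game = 1 if site == "home" else 0
    neutral_game = 1 if site == "neutral" else 0
    return home_game, neutral_game


print(site_indicators("home"))     # (1, 0)
print(site_indicators("neutral"))  # (0, 1)
print(site_indicators("away"))     # (0, 0)
```

Using two 0/1 flags rather than a single three-valued variable lets a linear regression estimate separate effects for home and neutral sites, with road games as the baseline.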
Previous Season Difference
The difference between the team’s previous-season winning percentage at home (or as the team at a neutral site) and the opponent’s previous-season road winning percentage (or as the opponent at a neutral site).
Career Points Difference
The difference between the two teams’ total career scoring from returning or imported (transfer) players on each roster.
This is discounted using a fractionalization index to smooth out large disparities in player career points.
For example, suppose a roster has 1,000 career points, split among its players as follows:
Player A: 600 (60% of the total)
Player B: 300 (30%)
Player C: 60 (6%)
Player D: 20 (2%)
Player E: 20 (2%)
The fractionalization index is…
1 - (0.6^2 + 0.3^2 + 0.06^2 + 0.02^2 + 0.02^2)
1 - (0.36 + 0.09 + 0.0036 + 0.0004 + 0.0004)
1 - 0.4544 = 0.5456
Higher values (closer to 1) mean scoring is spread more evenly across the roster, so higher is better.
The fractionalization index is then multiplied by the roster career points, so the example roster would have just 545.6 discounted career points.
The advantage here is twofold:
Teams with just a couple of major scorers will be penalized.
A player transferring in with a large amount of scoring will have his contribution reduced to more reasonable levels.
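The discounting above can be sketched in a few lines. This is an illustrative version, not the model's actual code; the function names are hypothetical, and treating a roster with zero career points as contributing nothing is an assumption.

```python
def fractionalization_index(career_points: list[float]) -> float:
    """1 minus the sum of each player's squared share of roster career points."""
    total = sum(career_points)
    if total <= 0:
        raise ValueError("roster needs a positive career-point total")
    return 1 - sum((points / total) ** 2 for points in career_points)


def discounted_career_points(career_points: list[float]) -> float:
    """Roster career points multiplied by the roster's fractionalization index."""
    total = sum(career_points)
    if total == 0:
        return 0.0  # assumption: a roster with no career points contributes nothing
    return fractionalization_index(career_points) * total


def career_points_difference(team: list[float], opponent: list[float]) -> float:
    """Team's discounted career points minus the opponent's."""
    return discounted_career_points(team) - discounted_career_points(opponent)


# The worked example: players A-E with 600, 300, 60, 20 and 20 career points.
example_roster = [600, 300, 60, 20, 20]
print(round(fractionalization_index(example_roster), 4))   # 0.5456
print(round(discounted_career_points(example_roster), 1))  # 545.6
```

The penalty for concentrated scoring shows up clearly by comparison: a roster whose 1,000 points are split evenly among five players has an index of 1 - 5 × 0.2² = 0.8, so it keeps 800 discounted points instead of 545.6.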
Head Coaches’ Win Percentage Difference
The difference between the home winning percentage of the team’s head coach and the road winning percentage of the opponent’s head coach.
New Coach
Flags a head coach who is in their first season with the team.
Conference Advantage
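How far the winning percentage of the team’s conference against the opponent’s conference sits above or below 50%, multiplied by 100. For example, a conference that has won 55% of its games against the opponent’s conference gets a value of +5.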
This metric gives us a way to differentiate between teams in bigger, wealthier conferences like the Big 10 and Big 12 and teams in smaller conferences like the Big Sky and the Northeast Conference.
What about really bad teams in big conferences and really good teams in small conferences?
Really bad teams in big conferences like the Big 10 and Big 12 are still likely to be better than really bad teams in small conferences, sometimes for intangible reasons the model otherwise wouldn’t account for.
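A minimal sketch of the conference advantage calculation, assuming the head-to-head record between the two conferences is available as win and loss counts. Returning 0 when the conferences have not met is an assumption, not something the model is known to do.

```python
def conference_advantage(wins: int, losses: int) -> float:
    """(Win % of the team's conference vs. the opponent's conference - 50%) * 100."""
    games = wins + losses
    if games == 0:
        return 0.0  # assumption: no head-to-head games means no measured edge
    return (wins / games - 0.5) * 100


print(conference_advantage(45, 15))  # 25.0
print(conference_advantage(15, 45))  # -25.0
```

Because one conference's losses are the other's wins, swapping the two conferences flips the sign, which keeps the metric symmetric between the two teams in a matchup.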
Difference in Recruiting Class Average
This is the difference between the team’s three-year average recruiting class rating and its opponent’s. These numbers come from 247Sports. I only factor in high school recruits, not transfer player ratings.
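As an illustration, assuming the three most recent high school class ratings for each team have already been pulled from 247Sports as plain numbers, the difference is just a gap between two averages. The function name and the ratings below are hypothetical.

```python
from statistics import mean


def recruiting_difference(team_classes: list[float], opponent_classes: list[float]) -> float:
    """Team's three-year average class rating minus the opponent's."""
    # Inputs are assumed to be the last three high school recruiting class
    # ratings for each team (transfers excluded).
    if len(team_classes) != 3 or len(opponent_classes) != 3:
        raise ValueError("expected exactly three class ratings per team")
    return mean(team_classes) - mean(opponent_classes)


# Hypothetical ratings, for illustration only.
print(recruiting_difference([90.0, 88.0, 92.0], [80.0, 84.0, 82.0]))  # 8.0
```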
Difference in Effective Field Goal Percentage Margins
The difference between the team’s effective field goal percentage margin against prior opponents (its own effective field goal percentage minus the percentage those opponents shot against it) and its current opponent’s effective field goal percentage margin against its prior opponents.
Effective field goal percentage is calculated by the formula:
EFGP = (Shots Made + 0.5*3-Point Shots Made) / Shots Attempted
The formula gives extra weight to made 3-pointers, which are worth 1.5 times as many points as 2-pointers.
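A short sketch of the formula, where "shots" means field goals (free throws are excluded). The function name is hypothetical. Two teams with identical 25-for-50 shooting show how made 3s raise the number.

```python
def effective_fg_pct(fgm: int, three_pm: int, fga: int) -> float:
    """(Shots Made + 0.5 * 3-Point Shots Made) / Shots Attempted."""
    if fga <= 0:
        raise ValueError("need at least one field goal attempt")
    if not 0 <= three_pm <= fgm:
        raise ValueError("3-pointers made must be between 0 and total shots made")
    return (fgm + 0.5 * three_pm) / fga


# Both lines shoot 25-for-50, but the first includes ten 3-pointers.
print(effective_fg_pct(25, 10, 50))  # 0.6
print(effective_fg_pct(25, 0, 50))   # 0.5
```

Equivalently, effective field goal percentage is field goal points divided by twice the attempts: the first team scores 15 × 2 + 10 × 3 = 60 points on 50 shots, and 60 / 100 = 0.6.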
Difference in Turnovers Per Possession Margins
The difference between a team’s turnovers per possession margin against their prior opponents and their current opponent’s turnovers per possession margin against their prior opponents.
Possessions are not an officially tracked statistic, but can be estimated with the following formula:
Possessions = Shots Attempted - Offensive Rebounds + Turnovers + (0.475*Free Throws Attempted)
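The estimate and the resulting turnover rate can be sketched as follows, using the 0.475 free throw weight from the formula above. The function names and box score numbers are hypothetical.

```python
def estimate_possessions(fga: int, orb: int, tov: int, fta: int) -> float:
    """Shots Attempted - Offensive Rebounds + Turnovers + 0.475 * Free Throws Attempted."""
    return fga - orb + tov + 0.475 * fta


def turnovers_per_possession(fga: int, orb: int, tov: int, fta: int) -> float:
    """Turnovers divided by estimated possessions."""
    possessions = estimate_possessions(fga, orb, tov, fta)
    if possessions <= 0:
        raise ValueError("estimated possessions must be positive")
    return tov / possessions


# Hypothetical box score: 60 FGA, 10 offensive rebounds, 12 turnovers, 20 FTA.
print(estimate_possessions(60, 10, 12, 20))                # 71.5
print(round(turnovers_per_possession(60, 10, 12, 20), 3))  # 0.168
```

Offensive rebounds are subtracted because they extend a possession rather than start a new one, and free throw attempts are weighted below 1 because many come in pairs that together end a single possession.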
Difference in Offensive Rebound Percentage Margins
The difference between a team’s offensive rebounding percentage margin against their prior opponents and their current opponent’s offensive rebounding percentage margin against their prior opponents.
OR Percentage = Team Offensive Rebounds / (Team Offensive Rebounds + Opponent Defensive Rebounds)
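A minimal sketch of the rebounding formula (hypothetical function name). The denominator counts the rebounds available on the team's missed shots, so the result is the share of its own misses the team recovered.

```python
def offensive_rebound_pct(team_orb: int, opp_drb: int) -> float:
    """Team Offensive Rebounds / (Team Offensive Rebounds + Opponent Defensive Rebounds)."""
    chances = team_orb + opp_drb
    if chances == 0:
        raise ValueError("no rebounds available to measure")
    return team_orb / chances


# Hypothetical game: 10 offensive rebounds against 30 opponent defensive rebounds.
print(offensive_rebound_pct(10, 30))  # 0.25
```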
Difference in Free Throws Made Per Shot Attempt Margins
The difference between a team’s free throws made per shot attempt margin against their prior opponents and their current opponent’s free throws made per shot attempt margin against their prior opponents.
Free Throws Made Per Shot Attempt = Free Throws Made / Shots Attempted
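The free throw rate itself is a single division. The step shared by all four of these metrics is turning a rate into a margin and then a difference against the current opponent. The sketch below assumes the margin is a team's own rate minus the rate its prior opponents posted against it; the function names and numbers are hypothetical.

```python
def ft_per_fga(ftm: int, fga: int) -> float:
    """Free Throws Made / Shots Attempted."""
    if fga == 0:
        raise ValueError("no shot attempts")
    return ftm / fga


def margin(own_rate: float, allowed_rate: float) -> float:
    """A team's rate minus the same rate its prior opponents posted against it."""
    return own_rate - allowed_rate


def margin_difference(team_own: float, team_allowed: float,
                      opp_own: float, opp_allowed: float) -> float:
    """Team's margin minus the current opponent's margin."""
    return margin(team_own, team_allowed) - margin(opp_own, opp_allowed)


# Hypothetical season totals for two teams.
team_own, team_allowed = ft_per_fga(20, 40), ft_per_fga(15, 60)  # 0.5, 0.25
opp_own, opp_allowed = ft_per_fga(15, 40), ft_per_fga(10, 40)    # 0.375, 0.25
print(margin_difference(team_own, team_allowed, opp_own, opp_allowed))  # 0.125
```

The same two helper functions work unchanged for the effective field goal, turnover, and rebounding rates. For turnovers, where a lower rate is better, the regression can capture that direction through the sign of the coefficient rather than by flipping the subtraction.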
Each of these metrics contributes to the model’s overall predictive power, though their relative importance varies slightly from matchup to matchup. The last four (effective field goal percentage, turnovers per possession, offensive rebounding, and free throws made per shot attempt) are essentially the “Four Factors” developed by basketball statistician Dean Oliver.
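Because the model is a linear regression, each prediction is an intercept plus a weighted sum of the variables above, and the predicted winner follows from the sign of the predicted margin. A minimal sketch of that final step; the feature names and every coefficient here are hypothetical placeholders, not the model's fitted values.

```python
def predict_margin(features: dict[str, float],
                   coefficients: dict[str, float],
                   intercept: float) -> float:
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    missing = sorted(set(coefficients) - set(features))
    if missing:
        raise KeyError(f"missing feature values: {missing}")
    return intercept + sum(coef * features[name] for name, coef in coefficients.items())


def predicted_winner(margin: float, team: str, opponent: str) -> str:
    """A positive margin favors `team`, a negative one favors `opponent`."""
    if margin > 0:
        return team
    if margin < 0:
        return opponent
    return "toss-up"


# Made-up coefficients and feature values, for illustration only.
coefficients = {"home_game": 3.0, "conference_advantage": 0.25}
features = {"home_game": 1, "conference_advantage": 10.0}
margin = predict_margin(features, coefficients, intercept=0.0)
print(margin)                                        # 5.5
print(predicted_winner(margin, "Team A", "Team B"))  # Team A
```

The coefficients are fixed once the model is fit, but each term's contribution (coefficient times feature value) changes from game to game, which is one way to read the point that the metrics matter differently in different matchups.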
I am constantly experimenting with my model and am open to suggestions for improvement. However, the model has remained essentially unchanged since the beginning of the 2023 season.
Beyond building a highly predictive model, I am also trying to build one that helps explain what wins in basketball. Together, these metrics provide a comprehensive, if imperfect, assessment of the building blocks of a good basketball team:
Home-court advantage
A winning coach
Experienced scorers up and down the roster
A conference capable of shelling out large amounts of money to member schools to pay for athletics staff, facilities, etc.
High-level recruits
The ability to make shots at a high percentage and prevent your opponent from doing the same
Valuing the basketball by not turning it over, and forcing turnovers by your opponents
Offensive rebounding to earn extra shot attempts
Getting to the free throw line to pick up extra points, while keeping your opponents off the line
I hope this has peeled back the curtain a bit more on how my predictions are made and what my model is attempting to do. You can track the model’s performance over the course of the season here: 2024 NCAA Men's Basketball Predictions. As always, comments and suggestions are welcome.