
Cambridge machine learning model finds riskiest and safest bets


Gamblers and alcoholics are given the same advice - gamble responsibly or drink in moderation. But while the alcohol content is stated on bottles, in gambling things are much murkier.
2022-10-27, by Frank Flegg, Director at BNI East Midlands

#Big Data || #AI || #Sport ||


A Python machine learning model, using the Scikit-learn library, for predicting the results of football matches.


I was inspired to write this article by "Machine learning: predicting the 2018 EPL matches". Our machine learning model will train on match statistics from the 2015/2016 to 2018/2019 seasons to predict the results of upcoming games. The data is taken from the football statistics website

The code and data are available on GitHub.

After this article was published, some readers wrote to me suggesting that I improve the algorithm and use it to bet on matches.

Firstly, betting is an evil: it is gambling, and you can't beat the bookie.

Secondly, you can never predict football (or sport in general) with 100% accuracy, because the human factor comes into play. A defender slips and loses the opposing striker, who goes on to score; someone shines a laser into the goalkeeper's eyes and he cannot react; a striker is injured in the first minutes of the match; and so on.

Don't bet. There is no money to be made from betting; you will only lose your money and time.

The purpose of this publication is to demonstrate that machine learning can be used in sports.

Training the model

Let's write a function that will return the training data. It creates a dictionary with the vectors of the teams from all seasons. For each game, the function calculates the difference between the vectors of teams for a particular season and writes it to xTrain. The function then assigns yTrain a value of 1 if the home team wins, and 0 otherwise.
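The steps described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the repository: the function name build_training_data, the layout of team_vectors (season, then team, then feature vector) and the tuple format of matches are all assumptions, and the two teams and their numbers are invented toy values. The returned x_train and y_train correspond to xTrain and yTrain in the text.

```python
import numpy as np


def build_training_data(team_vectors, matches):
    """Return (x_train, y_train) built from per-season team vectors.

    team_vectors: {season: {team_name: sequence of floats}}  (assumed layout)
    matches: iterable of (season, home, away, home_goals, away_goals)  (assumed layout)
    """
    x_train, y_train = [], []
    for season, home, away, home_goals, away_goals in matches:
        home_vec = np.asarray(team_vectors[season][home], dtype=float)
        away_vec = np.asarray(team_vectors[season][away], dtype=float)
        # The feature for a game is the difference between the two team vectors.
        x_train.append(home_vec - away_vec)
        # The label is 1 if the home team won, 0 otherwise (draw or away win).
        y_train.append(1 if home_goals > away_goals else 0)
    return np.array(x_train), np.array(y_train)


# Invented toy data: two hypothetical teams with three statistics each.
team_vectors = {
    "2015/2016": {
        "Team A": [1.7, 0.9, 15.0],
        "Team B": [1.5, 1.4, 13.0],
    }
}
matches = [
    ("2015/2016", "Team A", "Team B", 0, 1),  # home team lost
    ("2015/2016", "Team B", "Team A", 2, 0),  # home team won
]

x_train, y_train = build_training_data(team_vectors, matches)
```

Because each row is a difference of vectors, swapping the home and away teams flips the sign of every feature, which is why the two toy rows are mirror images of each other.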

We get the training data for all seasons from 2015/2016 to 2018/2019.

We will use the LinearRegression algorithm from the Scikit-learn library to estimate the probability of a home win. Let's write a function that will return the predictions. Its output is a score that normally falls between 0 and 1, where 0 means a loss and 1 means a win. Note that linear regression does not strictly bound its output, so values slightly below 0 or above 1 are possible.
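A prediction function along these lines might look like the sketch below. The helper names train_model and predict_home_win are hypothetical, the training rows are invented toy values, and the clipping step is an addition of this sketch (not something the article describes) to keep the score inside the 0 to 1 range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def train_model(x_train, y_train):
    """Fit an ordinary least-squares model on vector differences."""
    model = LinearRegression()
    model.fit(x_train, y_train)
    return model


def predict_home_win(model, home_vec, away_vec):
    """Score a fixture: values near 1 favour a home win, values near 0 do not."""
    diff = np.asarray(home_vec, dtype=float) - np.asarray(away_vec, dtype=float)
    raw_score = model.predict(diff.reshape(1, -1))[0]
    # Assumption of this sketch: clip, because LinearRegression is unbounded.
    return float(np.clip(raw_score, 0.0, 1.0))


# Invented toy training set: each row is (home vector - away vector).
x_train = np.array([[1.0, 0.5], [-1.0, -0.5], [0.5, 1.0], [-0.5, -1.0]])
y_train = np.array([1, 0, 1, 0])

model = train_model(x_train, y_train)
score = predict_home_win(model, home_vec=[2.0, 1.5], away_vec=[1.0, 1.0])
```

If a properly calibrated probability is needed, a classifier such as LogisticRegression is usually a better fit than clipping a regression output.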


Our algorithm is very primitive. It takes into account only the match statistics (just 15 basic parameters), whereas the result of a football match depends on many factors. Even the condition of the pitch or the weather can affect the outcome.

Next, we would like to increase the number of features, create a test sample, try different algorithms, adjust the model and get the most accurate predictions.
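As a rough idea of what a test sample and a different algorithm could look like, here is a sketch that holds out a quarter of the data and fits a LogisticRegression classifier. The data is synthetic (random numbers standing in for 15-parameter vector differences), so the resulting accuracy says nothing about real matches; the split size and random seeds are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real data: 200 games, 15 features per game.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))
# Synthetic label: loosely driven by the first feature plus noise.
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Hold out 25% of the games as a test sample the model never trains on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Share of held-out games whose outcome was predicted correctly.
accuracy = accuracy_score(y_test, clf.predict(X_test))
# Column 1 of predict_proba is the estimated probability of class 1 (home win).
home_win_probability = clf.predict_proba(X_test)[:, 1]
```

Measuring accuracy on held-out games is what makes it possible to compare algorithms and feature sets fairly.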
