As part of an EU-funded cross-border investigation, we built web-scraping, data-compiling algorithms to find out just how big European football's betting problem is. The results were "frightening".

In collaboration with:
Investigative Journalism for Europe &
Josimar Football

Match-fixing has exploded as a direct result of internet gambling - and so has the proliferation of semi-professional and community leagues on the betting market.

Some of these competitions are just a step up from "park soccer". These least visible leagues may also be the most vulnerable to approaches from organised criminals.

We had previously investigated this phenomenon from an Australian perspective for ABC News' flagship radio current affairs program, Background Briefing, in 2017, and then again for The Sydney Morning Herald and The Age in 2021.

This groundbreaking research tracked flows of data around the world, from local soccer pitches to unlicensed Russian betting sites, and showed that the middlemen were the sporting integrity companies that had convinced competitions to pay them to protect their games from match-fixing.

Europe is the heartbeat of world football, so when we were approached by a team of investigators working with an IJ4EU grant to see what we could uncover from a European perspective, we set about taking this research to the next level.

While the relentless stream of gambling advertisements in competitions like the English Premier League is the most visible sign that football has lost control of its relationship with bookmakers, we set about bringing some of the more obscure signs to light.

"It's completely crazy."

We audited four bookmakers across a weekend in March 2022, a period of 72 hours in total.

We then filtered the resulting list of matches down to those played in leagues in the 55 member associations of UEFA, European football's governing body.

This gave us 1600+ games. Of these:


1 in 2 was in a semi-professional, amateur or under-19 competition


1 in 7 came from Spain


1 in 10 was in an under-19 competition


1 in 16 was a German semi-professional match


4 in 5 appeared to have their live data sourced from a company providing sports integrity services.
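
As a rough illustration of how the filtering and the "1 in N" figures above could be produced, here is a minimal Python sketch. The country set is a deliberately incomplete placeholder, the sample rows are made up, and the field names `country` and `level` are assumptions rather than the fields used in the actual audit.

```python
from collections import Counter

# Placeholder subset of UEFA member countries (the full list has 55 entries).
UEFA_COUNTRIES = {"Spain", "Germany", "Austria", "Norway"}

# Made-up sample rows standing in for scraped matches.
matches = [
    {"country": "Spain", "level": "professional"},
    {"country": "Spain", "level": "semi-professional"},
    {"country": "Germany", "level": "semi-professional"},
    {"country": "Germany", "level": "professional"},
    {"country": "Austria", "level": "under-19"},
    {"country": "Norway", "level": "professional"},
    {"country": "Australia", "level": "amateur"},  # outside UEFA, filtered out
]


def filter_uefa(rows):
    """Keep only matches played in a country from the UEFA set."""
    return [row for row in rows if row["country"] in UEFA_COUNTRIES]


def one_in_n(count, total):
    """Express count/total as the N in '1 in N', rounded to a whole number."""
    if count <= 0:
        raise ValueError("count must be positive")
    return round(total / count)


european = filter_uefa(matches)
by_country = Counter(row["country"] for row in european)
spain_n = one_in_n(by_country["Spain"], len(european))
```

With this toy sample, two of the six remaining matches are Spanish, so `spain_n` reads as "1 in 3".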



Methodology

The major technical component of this project was a series of web-scraping, data-compiling algorithms that mapped out the size, scope and depth of the European football betting market.

In our Australian work, we had audited a single bookmaker, an unlicensed Russian/Armenian "white label".

With so many countries in Europe, it was important to audit more bookmakers to ensure a diverse selection of games and licensing statuses.

After casting a wide net, we settled on four sites which, between them, seemed to cover all games on the market and provided consistent and diverse streams of harvestable data.

They ranged from an Austrian-based giant of the industry to an unlicensed and otherwise unknown Iranian site using the same Armenian technology we had seen in our Australian audit.

We experimented with a number of different approaches to scraping and compiling the data, in part to find efficiencies (and bugs) and in part to become familiar with all the data that could be collected.

This familiarity led to one of the most important findings of the project: the discovery that some bookmakers included match ID codes from the sports data provider they were sourcing their data from. This allowed us to see not only the number of low-level games that were available, but also who was making this possible.
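
To give a sense of how provider match IDs can be used once they are spotted, here is a minimal Python sketch. The ID format `<provider>:match:<digits>`, the field name `external_id` and the provider names are all hypothetical; real bookmaker payloads will look different and need their own pattern.

```python
import re
from collections import Counter

# Hypothetical ID format: "<provider>:match:<digits>".
PROVIDER_ID = re.compile(r"^(?P<provider>[a-z]+):match:(?P<match_id>\d+)$")


def provider_of(event):
    """Return the provider prefix of an event's external ID, or None."""
    found = PROVIDER_ID.match(event.get("external_id") or "")
    return found.group("provider") if found else None


# Made-up scraped events; "external_id" is an assumed field name.
events = [
    {"home": "Team A", "away": "Team B", "external_id": "providerx:match:1001"},
    {"home": "Team C", "away": "Team D", "external_id": "providerx:match:1002"},
    {"home": "Team E", "away": "Team F", "external_id": "providery:match:77"},
    {"home": "Team G", "away": "Team H", "external_id": None},
]

# Count how many listed matches trace back to each data provider.
provider_counts = Counter(provider_of(event) for event in events)
```

Events without a recognisable ID are counted under `None`, which keeps the unattributed share visible instead of silently dropping it.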

After weeks of test runs, our algorithm was put to use in the last weekend before March's international break.

We also provided editorial content and support and guided the digital dirt digging behind many other stories in this series, in collaboration with other team members.

"It's completely crazy, because it offers many possibilities for match fixing."


The challenge of standardisation

One of the biggest challenges we faced in this project was also one of the most mundane: how do you standardise team names, when every bookmaker can list them slightly differently?

What one bookmaker calls Inter Milan another might call Internazionale, Inter Milano or simply Inter, and without standardisation, the database of matches would be riddled with duplicate entries.

With some simple regex and a custom 'word replace' database, around 80 per cent of team names could be standardised with minimal fuss.
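
A minimal Python sketch of this kind of regex-plus-lookup cleaning is below. The replacement table and the list of dropped tokens are illustrative assumptions, not the project's actual 'word replace' database.

```python
import re

# Illustrative 'word replace' table: variant token -> canonical token.
WORD_REPLACE = {
    "internazionale": "inter",
    "milano": "milan",
}

# Illustrative club-type tokens that carry no identifying information.
DROP_TOKENS = {"fc", "cf", "sc"}


def standardise(name):
    """Lowercase, strip punctuation, swap known variants, drop filler tokens."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    tokens = [WORD_REPLACE.get(token, token) for token in cleaned.split()]
    tokens = [token for token in tokens if token not in DROP_TOKENS]
    return " ".join(tokens)
```

Here `standardise("Inter Milano")` and `standardise("Inter Milan")` both give `"inter milan"`, while `"FC Internazionale"` becomes `"inter"`. That last result still does not line up with `"inter milan"`, which is the kind of leftover case that token-level rules alone cannot resolve.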

But how to deal with a team that one bookmaker calls FAC Wien and another calls Floridsdorfer? There were hundreds of cases like this, and resolving them required days of tedious data entry.

Future versions of this project would work with one central table for each league, and use machine learning to best match a bookmaker's version of the name to this 'source of truth'.
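
As a very rough stand-in for that idea, the Python sketch below matches names against a central list using plain string similarity from the standard library's `difflib`, not machine learning, with a manual alias table for names that share no spelling. The canonical list, the alias entry and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical 'source of truth' for one league.
CANONICAL = ["Floridsdorfer", "Inter Milan"]

# Manual aliases for names that string similarity cannot connect.
ALIASES = {"fac wien": "Floridsdorfer"}


def match_team(name, cutoff=0.8):
    """Return the canonical name for a bookmaker's spelling, or None."""
    key = name.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    lowered = {canon.lower(): canon for canon in CANONICAL}
    hits = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None
```

`match_team("Inter Milano")` resolves to `"Inter Milan"` on similarity alone, while `match_team("FAC Wien")` only resolves because of the alias entry. Returning `None` for anything below the cutoff leaves uncertain names for a human to review.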

Nevertheless, the manual effort in this project allowed our various dataframes to be merged and a proper analysis of the size of the European football betting market to be carried out.
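
The reason standardised names matter for the merge can be sketched in a few lines of Python. This example uses plain dictionaries in place of dataframes, and the bookmaker labels, dates and team names are placeholders; it assumes names have already been standardised.

```python
from collections import defaultdict


def merge_listings(listings_by_bookmaker):
    """Map each unique match to the set of bookmakers that listed it."""
    merged = defaultdict(set)
    for bookmaker, rows in listings_by_bookmaker.items():
        for row in rows:
            # Identical date and standardised team names mean the same match.
            key = (row["date"], row["home"], row["away"])
            merged[key].add(bookmaker)
    return dict(merged)


# Placeholder listings from two hypothetical bookmakers.
listings = {
    "bookmaker_a": [
        {"date": "day-1", "home": "inter milan", "away": "team x"},
    ],
    "bookmaker_b": [
        {"date": "day-1", "home": "inter milan", "away": "team x"},
        {"date": "day-2", "home": "floridsdorfer", "away": "team y"},
    ],
}

merged = merge_listings(listings)
```

The three listings collapse to two unique matches. Had one bookmaker's row still read "inter milano", the same game would have been counted twice.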


World-leading research

This research was part of a much larger project that investigated the links between organised crime, sports data companies and the sports integrity industry.

This was built on earlier investigations we carried out for ABC Radio National's flagship current affairs program, Background Briefing, and then for The Age and The Sydney Morning Herald.

This groundbreaking body of research showed how international sports data companies were secretly sending "data scouts" to community sports in Australia's suburbs in order to put these matches on the illegal offshore betting market.

This initial investigation led to Basketball Australia introducing a new integrity policy to stop "courtsiding".

When its major men's league subsequently signed a deal with this same company to offer it protection against match-fixing, we were able to provide direct evidence that this new 'integrity partner' was selling this data to poorly regulated bookmakers operating in breach of the country's Interactive Gambling Act.

This investigation was a front-page story in The Age.


We then shifted the focus to football (soccer) and, by compiling betting odds from unlicensed offshore sites, were able to show how Australia was the leading country for games on the betting market. These matches included social matches involving players in their 60s.

Our research has also exposed a number of other issues with offshore websites targeting Australia, and led to the first known blocking of sites by media watchdog ACMA.