Author: Indranil Ghosh
Title: An introduction to hands-on football data analysis in Python
Institute: School of Fundamental Sciences, Massey University
Twitter: @indraghosh314
Website: https://indrag49.github.io/
Date: 03-10-2021
This talk teaches these simple concepts to those who want to start working on football data analysis:
How to get open access event data from statsbomb using statsbombpy,
How to draw a soccer pitch using mplsoccer,
How to visualize a pass network for a particular team in a particular match,
How to use the NetworkX module to analyze the pass network,
How to implement computational geometric concepts like Convex Hulls, Voronoi diagrams, and Delaunay triangulations using the Python package scipy.spatial on football event and tracking data.
How to get open access event data from statsbomb using statsbombpy
We use pip to install statsbombpy with the following command:
pip install statsbombpy
The open data from Statsbomb can be accessed without authentication, but you are advised to read the Terms & Conditions section of their documentation page first.
Next we import sb from the statsbombpy package.
from statsbombpy import sb
We also import the numpy and the pandas packages, which help us manipulate our datasets and perform tasks such as data cleaning and data extraction.
import numpy as np
import pandas as pd
comp = sb.competitions()
credentials were not supplied. open data access only
The first 15 rows of comp look like this:
comp.head(15)
| | competition_id | season_id | country_name | competition_name | competition_gender | season_name | match_updated | match_available |
---|---|---|---|---|---|---|---|---|
0 | 16 | 4 | Europe | Champions League | male | 2018/2019 | 2021-05-19T08:38:06.515138 | 2021-05-19T08:38:06.515138 |
1 | 16 | 1 | Europe | Champions League | male | 2017/2018 | 2021-01-23T21:55:30.425330 | 2021-01-23T21:55:30.425330 |
2 | 16 | 2 | Europe | Champions League | male | 2016/2017 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
3 | 16 | 27 | Europe | Champions League | male | 2015/2016 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
4 | 16 | 26 | Europe | Champions League | male | 2014/2015 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
5 | 16 | 25 | Europe | Champions League | male | 2013/2014 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
6 | 16 | 24 | Europe | Champions League | male | 2012/2013 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
7 | 16 | 23 | Europe | Champions League | male | 2011/2012 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
8 | 16 | 22 | Europe | Champions League | male | 2010/2011 | 2020-07-29T05:00 | 2020-07-29T05:00 |
9 | 16 | 21 | Europe | Champions League | male | 2009/2010 | 2020-07-29T05:00 | 2020-07-29T05:00 |
10 | 16 | 41 | Europe | Champions League | male | 2008/2009 | 2020-08-30T10:18:39.435424 | 2020-08-30T10:18:39.435424 |
11 | 16 | 39 | Europe | Champions League | male | 2006/2007 | 2021-03-31T04:18:30.437060 | 2021-03-31T04:18:30.437060 |
12 | 16 | 37 | Europe | Champions League | male | 2004/2005 | 2021-04-01T06:18:57.459032 | 2021-04-01T06:18:57.459032 |
13 | 16 | 44 | Europe | Champions League | male | 2003/2004 | 2021-04-01T00:34:59.472485 | 2021-04-01T00:34:59.472485 |
14 | 16 | 76 | Europe | Champions League | male | 1999/2000 | 2020-07-29T05:00 | 2020-07-29T05:00 |
Let us look at the column names of comp to understand the dataset better and draw out relevant information. Type the following:
print(comp.columns)
Index(['competition_id', 'season_id', 'country_name', 'competition_name', 'competition_gender', 'season_name', 'match_updated', 'match_available'], dtype='object')
These are the columns of the comp dataset. For example, if we look into the row where the competition_id is 16 and the season_id is 1, we notice that the country_name is Europe, the competition_name is Champions League, the season_name is 2017/2018, and so on. Suppose we are satisfied with this information and want to analyze a game from the 2017/18 Champions League season. We note the competition_id and season_id in that row, which are 16 and 1 respectively. Now we extract the matches dataset by typing the following:
mat = sb.matches(competition_id = 16, season_id = 1)
credentials were not supplied. open data access only
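The lookup of competition_id and season_id described above can also be done programmatically rather than by eye. Below is a minimal sketch, assuming a small hand-made DataFrame (comp_demo, a hypothetical stand-in built from three rows of the competitions table) in place of the real comp:

```python
import pandas as pd

# Hypothetical stand-in for `comp`, copied from three rows of the competitions table
comp_demo = pd.DataFrame({
    "competition_id": [16, 16, 16],
    "season_id": [4, 1, 2],
    "competition_name": ["Champions League"] * 3,
    "season_name": ["2018/2019", "2017/2018", "2016/2017"],
})

# Select the row for the 2017/2018 Champions League season
mask = ((comp_demo["competition_name"] == "Champions League")
        & (comp_demo["season_name"] == "2017/2018"))
row = comp_demo[mask].iloc[0]

competition_id = int(row["competition_id"])
season_id = int(row["season_id"])
print(competition_id, season_id)
```

The two integers obtained this way are the values passed to sb.matches().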
The mat dataset looks like this:
mat
| | match_id | match_date | kick_off | competition | season | home_team | away_team | home_score | away_score | match_status | match_status_360 | last_updated | last_updated_360 | match_week | competition_stage | stadium | referee | data_version | shot_fidelity_version | xy_fidelity_version |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18245 | 2018-05-26 | 20:45:00.000 | Europe - Champions League | 2017/2018 | Real Madrid | Liverpool | 3 | 1 | available | unscheduled | 2021-01-23T21:55:30.425330 | None | 7 | Final | NSK Olimpijs'kyj | M. Mažić | 1.1.0 | 2 | 2 |
The mat dataset gives us the match ids, the match dates, the kick-off times, the home and away teams, the scores, the name of the referee who officiated the match, and so on. Here match_id is the unique id that will help us draw out event data for a particular match from the 2017/18 Champions League season. We see there is only one match available, with match_id = 18245, which was the Champions League final between Real Madrid and Liverpool ⚽ that took place at the Olimpiyskiy National Sports Complex in Kyiv and ended 3-1 in Real Madrid's favor 👀. A great feat to be honest! Let us obtain the event data for this match.
events = sb.events(match_id = 18245)
credentials were not supplied. open data access only
The events dataset, which holds the event data for this match, looks like this:
events
| | 50_50 | ball_receipt_outcome | ball_recovery_recovery_failure | block_offensive | carry_end_location | clearance_aerial_won | clearance_body_part | clearance_head | clearance_left_foot | clearance_right_foot | ... | shot_statsbomb_xg | shot_technique | shot_type | substitution_outcome | substitution_replacement | tactics | team | timestamp | type | under_pressure |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i... | Real Madrid | 00:00:00.000 | Starting XI | NaN |
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | 00:00:00.000 | Starting XI | NaN |
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:00.000 | Half Start | NaN |
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3492 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:42:21.211 | Offside | NaN |
3493 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:48:31.725 | Half End | NaN |
3494 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:48:31.725 | Half End | NaN |
3495 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:48:02.893 | Half End | NaN |
3496 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:48:02.893 | Half End | NaN |
3497 rows × 86 columns
print(events.columns)
Index(['50_50', 'ball_receipt_outcome', 'ball_recovery_recovery_failure', 'block_offensive', 'carry_end_location', 'clearance_aerial_won', 'clearance_body_part', 'clearance_head', 'clearance_left_foot', 'clearance_right_foot', 'counterpress', 'dribble_nutmeg', 'dribble_outcome', 'dribble_overrun', 'duel_outcome', 'duel_type', 'duration', 'foul_committed_advantage', 'foul_committed_card', 'foul_committed_type', 'foul_won_advantage', 'foul_won_defensive', 'goalkeeper_body_part', 'goalkeeper_end_location', 'goalkeeper_outcome', 'goalkeeper_position', 'goalkeeper_punched_out', 'goalkeeper_technique', 'goalkeeper_type', 'id', 'index', 'injury_stoppage_in_chain', 'interception_outcome', 'location', 'match_id', 'minute', 'off_camera', 'out', 'pass_aerial_won', 'pass_angle', 'pass_assisted_shot_id', 'pass_body_part', 'pass_cross', 'pass_cut_back', 'pass_end_location', 'pass_goal_assist', 'pass_height', 'pass_inswinging', 'pass_length', 'pass_miscommunication', 'pass_outcome', 'pass_outswinging', 'pass_recipient', 'pass_shot_assist', 'pass_straight', 'pass_switch', 'pass_technique', 'pass_through_ball', 'pass_type', 'period', 'play_pattern', 'player', 'position', 'possession', 'possession_team', 'related_events', 'second', 'shot_aerial_won', 'shot_body_part', 'shot_end_location', 'shot_first_time', 'shot_freeze_frame', 'shot_key_pass_id', 'shot_one_on_one', 'shot_outcome', 'shot_redirect', 'shot_statsbomb_xg', 'shot_technique', 'shot_type', 'substitution_outcome', 'substitution_replacement', 'tactics', 'team', 'timestamp', 'type', 'under_pressure'], dtype='object')
How to draw a soccer pitch using mplsoccer
If you do not want to recreate a football pitch manually in Python (which would be rather tedious), you can simply use the mplsoccer module. To my knowledge it provides the best functionality for drawing a football pitch. This package is maintained by Anmol Durgapal and Andrew Rowlinson.
Keep in mind that you can do much more advanced visualization with mplsoccer besides drawing a football pitch. We will encounter these features as we move forward with other posts. For now, let us focus on visualizing a pitch in the simplest way possible. We need to pip install the package first:
pip install mplsoccer
Requirement already satisfied: mplsoccer in c:\users\indra\anaconda3\lib\site-packages (0.0.23) Note: you may need to restart the kernel to use updated packages.
mplsoccer uses Python 3.6+. Next we need to import matplotlib and the Pitch class.
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
pitch = Pitch(pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
We set the pitch_color argument to 'grass', giving the impression of a real-life football pitch. Note that any other color can be set, for example 'black' or any color represented by its hex code. Discarding the stripe argument removes the darker stripes that appear on the pitch. The line_color argument is self-explanatory, and the user can change it as needed. By default, the axis, labels and ticks representing the scales are switched off. The user can turn them on by setting the label, axis and tick arguments to True, as in the above pitch. Let us draw a different pitch with its color changed and stripes removed.
pitch = Pitch(pitch_color='black', line_color = 'white', constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
Now let us focus on the axis range for a moment. By default, Pitch() sets the pitch type to statsbomb, where the y-axis is inverted and ranges from 80 to 0. The x-axis ranges from 0 to 120. We will mostly be working with statsbomb data, so these orientations of the axes won't be of much concern. Nevertheless, this information is useful and we must keep it in mind in case we deal with football data from other sources.
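To make the axis conventions concrete, here is a minimal sketch of how a statsbomb coordinate could be mapped onto a metre-based pitch whose y-axis is not inverted. The function name statsbomb_to_metres and the 105 by 68 pitch dimensions are assumptions chosen for illustration, and the plain linear rescaling is only an approximation, because pitch markings do not scale uniformly between data providers:

```python
def statsbomb_to_metres(x, y, length=105.0, width=68.0):
    """Linearly rescale a statsbomb (x, y) point to a length x width pitch.

    statsbomb x runs from 0 to 120 and y runs from 0 (top) to 80 (bottom),
    so y is flipped to obtain an axis that increases upwards.
    """
    x_m = x / 120.0 * length
    y_m = (80.0 - y) / 80.0 * width
    return x_m, y_m


centre = statsbomb_to_metres(60, 40)   # centre spot
corner = statsbomb_to_metres(0, 80)    # bottom-left corner as drawn
print(centre, corner)
```

Whatever the target system, the point to check is where the origin sits and in which direction each axis grows.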
To be precise, mplsoccer provides eight different pitch types: 'statsbomb', 'opta', 'tracab', 'skillcorner', 'wyscout', 'metricasports', 'uefa', and 'custom'. The pitch type can be set using the pitch_type argument of Pitch(). Let us check the orientation of the uefa pitch type:
pitch = Pitch(pitch_color='grass', stripe = True, pitch_type = 'uefa', line_color = 'white', constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
To draw a vertical pitch, we use the orientation argument and set it to 'vertical'.
pitch = Pitch(orientation = 'vertical', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
To draw only one half of the pitch, we set the view argument to 'half'.
pitch = Pitch(view = 'half', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
This covers the basics of mplsoccer. The pitches can be further customized to meet the users' visualization needs. Keep an eye on the mplsoccer documentation to learn more. In the next section, we will learn how to visualize a pass network for a particular team from a match and analyze the network with the help of the NetworkX Python package. This package lets us use basic concepts from the complex network analysis literature to analyze the network and deduce some interesting properties.
How to visualize a pass network for a particular team in a particular match
We will use the NetworkX Python package for the analysis.
lineup_Real = pd.DataFrame.from_dict(dict_Real)
lineup_Real
| | player | position | jersey_number |
---|---|---|---|
0 | {'id': 5597, 'name': 'Keylor Navas Gamboa'} | {'id': 1, 'name': 'Goalkeeper'} | 1 |
1 | {'id': 5721, 'name': 'Daniel Carvajal Ramos'} | {'id': 2, 'name': 'Right Back'} | 2 |
2 | {'id': 5485, 'name': 'Raphaël Varane'} | {'id': 3, 'name': 'Right Center Back'} | 5 |
3 | {'id': 5201, 'name': 'Sergio Ramos García'} | {'id': 5, 'name': 'Left Center Back'} | 4 |
4 | {'id': 5552, 'name': 'Marcelo Vieira da Silva ... | {'id': 6, 'name': 'Left Back'} | 12 |
5 | {'id': 5539, 'name': 'Carlos Henrique Casimiro'} | {'id': 10, 'name': 'Center Defensive Midfield'} | 14 |
6 | {'id': 5463, 'name': 'Luka Modrić'} | {'id': 13, 'name': 'Right Center Midfield'} | 10 |
7 | {'id': 5574, 'name': 'Toni Kroos'} | {'id': 15, 'name': 'Left Center Midfield'} | 8 |
8 | {'id': 4926, 'name': 'Francisco Román Alarcón ... | {'id': 19, 'name': 'Center Attacking Midfield'} | 22 |
9 | {'id': 19677, 'name': 'Karim Benzema'} | {'id': 22, 'name': 'Right Center Forward'} | 9 |
10 | {'id': 5207, 'name': 'Cristiano Ronaldo dos Sa... | {'id': 24, 'name': 'Left Center Forward'} | 7 |
lineup_Liv = pd.DataFrame.from_dict(dict_Liv)
lineup_Liv
| | player | position | jersey_number |
---|---|---|---|
0 | {'id': 3630, 'name': 'Loris Karius'} | {'id': 1, 'name': 'Goalkeeper'} | 1 |
1 | {'id': 3664, 'name': 'Trent Alexander-Arnold'} | {'id': 2, 'name': 'Right Back'} | 66 |
2 | {'id': 3471, 'name': 'Dejan Lovren'} | {'id': 3, 'name': 'Right Center Back'} | 6 |
3 | {'id': 3669, 'name': 'Virgil van Dijk'} | {'id': 5, 'name': 'Left Center Back'} | 4 |
4 | {'id': 3655, 'name': 'Andrew Robertson'} | {'id': 6, 'name': 'Left Back'} | 26 |
5 | {'id': 3532, 'name': 'Jordan Brian Henderson'} | {'id': 10, 'name': 'Center Defensive Midfield'} | 14 |
6 | {'id': 3567, 'name': 'Georginio Wijnaldum'} | {'id': 13, 'name': 'Right Center Midfield'} | 5 |
7 | {'id': 3473, 'name': 'James Philip Milner'} | {'id': 15, 'name': 'Left Center Midfield'} | 7 |
8 | {'id': 3531, 'name': 'Mohamed Salah'} | {'id': 17, 'name': 'Right Wing'} | 11 |
9 | {'id': 3629, 'name': 'Sadio Mané'} | {'id': 21, 'name': 'Left Wing'} | 19 |
10 | {'id': 3535, 'name': 'Roberto Firmino Barbosa ... | {'id': 23, 'name': 'Center Forward'} | 9 |
So, we have collected the names and the jersey numbers of the players (starting 11) from both teams in separate dictionaries named players_Real and players_Liv. These will come in handy later!
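The cell that builds players_Real is not shown here. A minimal sketch of one way to build such a dictionary is given below, assuming the lineup is a list of nested dictionaries shaped like the rows of lineup_Real, and assuming the dictionary maps player names to jersey numbers stored as strings. The helper name lineup_to_jersey_map is hypothetical:

```python
def lineup_to_jersey_map(lineup):
    """Map each player's name to their jersey number (as a string)."""
    return {entry["player"]["name"]: str(entry["jersey_number"])
            for entry in lineup}


# Three entries copied from the Real Madrid lineup table
lineup_demo = [
    {"player": {"id": 5597, "name": "Keylor Navas Gamboa"},
     "position": {"id": 1, "name": "Goalkeeper"}, "jersey_number": 1},
    {"player": {"id": 5463, "name": "Luka Modrić"},
     "position": {"id": 13, "name": "Right Center Midfield"}, "jersey_number": 10},
    {"player": {"id": 5574, "name": "Toni Kroos"},
     "position": {"id": 15, "name": "Left Center Midfield"}, "jersey_number": 8},
]

players_demo = lineup_to_jersey_map(lineup_demo)
print(players_demo)
```

A name-to-jersey mapping of this kind is what lets long player names be replaced by short labels on the plots.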
Now, from the events dataset, we extract the relevant columns for our pass network analysis.
events_pn = events[['minute', 'second', 'team', 'type', 'location', 'pass_end_location', 'pass_outcome', 'player']]
Here are the first 10 rows of the events_pn dataframe:
events_pn.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | Real Madrid | Starting XI | NaN | NaN | NaN | NaN |
1 | 0 | 0 | Liverpool | Starting XI | NaN | NaN | NaN | NaN |
2 | 0 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
3 | 0 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
4 | 45 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
5 | 45 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
For each team, we keep only the rows where type is set to Pass.
events_pn_Real = events_Real[events_Real['type'] == 'Pass']
events_pn_Liv = events_Liv[events_Liv['type'] == 'Pass']
events_pn_Real.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos |
11 | 0 | 15 | Real Madrid | Pass | [36.2, 75.3] | [43.6, 62.0] | Incomplete | Carlos Henrique Casimiro |
16 | 0 | 25 | Real Madrid | Pass | [14.7, 23.2] | [56.7, 6.2] | Incomplete | Sergio Ramos García |
17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior |
18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro |
19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos |
20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane |
21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García |
events_pn_Liv.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
---|---|---|---|---|---|---|---|---|
6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson |
13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané |
14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira |
15 | 0 | 22 | Liverpool | Pass | [92.2, 50.9] | [109.7, 46.4] | Incomplete | Mohamed Salah |
25 | 1 | 7 | Liverpool | Pass | [42.0, 75.9] | [115.6, 59.3] | Incomplete | Trent Alexander-Arnold |
37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk |
38 | 2 | 3 | Liverpool | Pass | [43.2, 2.8] | [50.1, 4.8] | Incomplete | Andrew Robertson |
39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson |
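The filtering step used for these tables can be sketched on a tiny hand-made DataFrame. This is only an illustration: events_demo is a hypothetical stand-in holding four rows copied from events_pn, and splitting by team before filtering by type is an assumption about how events_Real and events_Liv were obtained:

```python
import pandas as pd

# Hypothetical stand-in for `events_pn` with four of its rows
events_demo = pd.DataFrame({
    "team": ["Real Madrid", "Liverpool", "Liverpool", "Real Madrid"],
    "type": ["Starting XI", "Pass", "Pass", "Pass"],
    "player": [None, "James Philip Milner", "Dejan Lovren", "Raphaël Varane"],
    "pass_outcome": [None, None, "Incomplete", None],
})

# Boolean masks: one for the team, one for the event type
is_real = events_demo["team"] == "Real Madrid"
is_pass = events_demo["type"] == "Pass"

passes_real = events_demo[is_real & is_pass]
passes_liv = events_demo[~is_real & is_pass]

# Assumption: a missing pass_outcome marks a completed pass
completed_liv = passes_liv[passes_liv["pass_outcome"].isna()]
print(len(passes_real), len(passes_liv), len(completed_liv))
```

Combining masks with & keeps the original row index, which is why the filtered tables retain index values such as 8, 9 and 10.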
pass_Real_new = pass_Real.replace({"pass_maker": players_Real, "pass_receiver": players_Real})
pass_Real_new
| | index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 14 | 2 | 1 | 60.845455 | 31.836364 | 11 | 64.341667 | 73.875 | 24 |
1 | 6 | 7 | 2 | 3 | 81.580000 | 29.160000 | 10 | 64.341667 | 73.875 | 24 |
2 | 21 | 22 | 2 | 2 | 62.323529 | 27.082353 | 17 | 64.341667 | 73.875 | 24 |
3 | 29 | 9 | 2 | 2 | 65.081818 | 27.936364 | 11 | 64.341667 | 73.875 | 24 |
4 | 39 | 10 | 2 | 10 | 60.604762 | 55.028571 | 21 | 64.341667 | 73.875 | 24 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
73 | 16 | 2 | 1 | 1 | 64.341667 | 73.875000 | 24 | 10.870000 | 41.810 | 10 |
74 | 30 | 9 | 1 | 1 | 65.081818 | 27.936364 | 11 | 10.870000 | 41.810 | 10 |
75 | 57 | 5 | 1 | 2 | 37.436364 | 58.354545 | 22 | 10.870000 | 41.810 | 10 |
76 | 64 | 4 | 1 | 1 | 41.282353 | 24.514706 | 34 | 10.870000 | 41.810 | 10 |
77 | 74 | 8 | 1 | 1 | 51.190000 | 24.275000 | 40 | 10.870000 | 41.810 | 10 |
78 rows × 10 columns
pass_Liv_new = pass_Liv.replace({"pass_maker": players_Liv, "pass_receiver": players_Liv})
pass_Liv_new
| | index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received |
---|---|---|---|---|---|---|---|---|---|---|
0 | 12 | 5 | 26 | 4 | 76.390909 | 28.518182 | 11 | 59.815385 | 6.830769 | 13 |
1 | 18 | 7 | 26 | 1 | 72.353333 | 36.153333 | 15 | 59.815385 | 6.830769 | 13 |
2 | 28 | 14 | 26 | 1 | 61.035294 | 37.152941 | 17 | 59.815385 | 6.830769 | 13 |
3 | 36 | 1 | 26 | 1 | 12.914286 | 40.385714 | 7 | 59.815385 | 6.830769 | 13 |
4 | 54 | 66 | 26 | 1 | 64.666667 | 72.550000 | 12 | 59.815385 | 6.830769 | 13 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
59 | 55 | 66 | 6 | 1 | 64.666667 | 72.550000 | 12 | 41.690909 | 60.172727 | 11 |
60 | 61 | 4 | 6 | 3 | 43.366667 | 25.433333 | 9 | 41.690909 | 60.172727 | 11 |
61 | 25 | 7 | 19 | 2 | 72.353333 | 36.153333 | 15 | 86.275000 | 22.075000 | 4 |
62 | 33 | 14 | 19 | 1 | 61.035294 | 37.152941 | 17 | 86.275000 | 22.075000 | 4 |
63 | 43 | 11 | 19 | 1 | 77.550000 | 64.710000 | 10 | 86.275000 | 22.075000 | 4 |
64 rows × 10 columns
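The cell that builds pass_Real and pass_Liv is not shown here. As a rough sketch of the idea behind such a table, the snippet below counts passes for each (pass_maker, pass_receiver) pair and averages the starting location of each passer. The input format (a list of tuples with passer, receiver and start coordinates) and the helper name summarize_passes are assumptions made for illustration, and the sample passes are invented:

```python
from collections import Counter, defaultdict


def summarize_passes(passes):
    """Return (pair_counts, average_locations) for a list of passes.

    Each pass is a tuple (pass_maker, pass_receiver, x, y).
    """
    pair_counts = Counter((maker, receiver) for maker, receiver, _, _ in passes)

    xs, ys = defaultdict(list), defaultdict(list)
    for maker, _, x, y in passes:
        xs[maker].append(x)
        ys[maker].append(y)
    average_locations = {
        maker: (sum(xs[maker]) / len(xs[maker]), sum(ys[maker]) / len(ys[maker]))
        for maker in xs
    }
    return pair_counts, average_locations


# Invented sample: jersey numbers as strings, statsbomb-style coordinates
passes_demo = [
    ("8", "4", 50.0, 20.0),
    ("8", "4", 52.0, 30.0),
    ("4", "8", 40.0, 24.0),
    ("8", "10", 60.0, 40.0),
]

pair_counts, average_locations = summarize_passes(passes_demo)
print(pair_counts[("8", "4")], average_locations["8"])
```

The pair counts play the role of number_of_passes (the edge weights), while the average locations are where the nodes are drawn on the pitch.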
pitch = Pitch(pitch_color='grass', goal_type = 'box', line_color='white', stripe = True,
constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(pass_Real.pass_maker_x, pass_Real.pass_maker_y,
pass_Real.pass_receiver_x, pass_Real.pass_receiver_y, lw = 5,
color = 'black', zorder = 1, ax=ax)
nodes = pitch.scatter(av_loc_Real.pass_maker_x, av_loc_Real.pass_maker_y,
s=350, color = 'white', edgecolors='black', linewidth=1, alpha = 1, ax = ax)
for index, row in av_loc_Real.iterrows():
    # label each node with the player's jersey number
    pitch.annotate(players_Real[row.name], xy=(row.pass_maker_x, row.pass_maker_y),
                   c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Real Madrid against Liverpool", size = 20)
plt.show()
pitch = Pitch(pitch_color='grass', goal_type = 'box', stripe = True,
line_color='white', constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(120 - pass_Liv.pass_maker_x, pass_Liv.pass_maker_y,
120 - pass_Liv.pass_receiver_x, pass_Liv.pass_receiver_y, lw = 5,
color = 'black', zorder = 1, ax = ax)
nodes = pitch.scatter(120 - av_loc_Liv.pass_maker_x, av_loc_Liv.pass_maker_y,
s=350, color = 'red', edgecolors = 'black', linewidth=1, alpha = 1, ax = ax)
for index, row in av_loc_Liv.iterrows():
    # label each node with the player's jersey number (x is mirrored)
    pitch.annotate(players_Liv[row.name], xy=(120 - row.pass_maker_x, row.pass_maker_y),
                   c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Liverpool against Real Madrid", size = 20)
plt.show()
In Liverpool's pass network visualization, we subtract the x coordinates from 120 just to reverse the x-axis.
Now that we have successfully visualized the pass networks of the teams involved in the game, we will start analyzing our networks using metrics from the literature of complex network analysis.
Note that both of our networks are directed weighted graphs, with the number of passes as the weight of a directed edge.
Let us first build a graph isomorphic to the one we just visualized for Real Madrid, but this time using the networkx package. First we select the relevant columns from the pass_Real_new dataset:
pass_Real_new = pass_Real_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
pass_Real_new
| | pass_maker | pass_receiver | number_of_passes |
---|---|---|---|
0 | 14 | 2 | 1 |
1 | 7 | 2 | 3 |
2 | 22 | 2 | 2 |
3 | 9 | 2 | 2 |
4 | 10 | 2 | 10 |
... | ... | ... | ... |
73 | 2 | 1 | 1 |
74 | 9 | 1 | 1 |
75 | 5 | 1 | 2 |
76 | 4 | 1 | 1 |
77 | 8 | 1 | 1 |
78 rows × 3 columns
We convert pass_Real_new to a list of tuples, where each row is converted to a tuple. We use this list to build the networkx graph.
L_Real = pass_Real_new.apply(tuple, axis=1).tolist()
print(L_Real)
[('14', '2', 1), ('7', '2', 3), ('22', '2', 2), ('9', '2', 2), ('10', '2', 10), ('12', '2', 2), ('5', '2', 3), ('4', '2', 3), ('8', '2', 1), ('14', '10', 1), ('7', '10', 1), ('2', '10', 7), ('22', '10', 1), ('12', '10', 1), ('5', '10', 5), ('4', '10', 2), ('8', '10', 5), ('14', '12', 1), ('7', '12', 4), ('22', '12', 2), ('1', '12', 2), ('10', '12', 1), ('4', '12', 9), ('8', '12', 4), ('14', '5', 1), ('2', '5', 5), ('1', '5', 2), ('10', '5', 3), ('12', '5', 2), ('4', '5', 5), ('8', '5', 4), ('14', '4', 1), ('7', '4', 1), ('22', '4', 5), ('9', '4', 1), ('1', '4', 4), ('10', '4', 1), ('12', '4', 2), ('5', '4', 6), ('8', '4', 10), ('14', '8', 6), ('2', '8', 1), ('22', '8', 4), ('9', '8', 4), ('1', '8', 1), ('10', '8', 4), ('12', '8', 5), ('5', '8', 4), ('4', '8', 9), ('7', '9', 1), ('2', '9', 1), ('22', '9', 1), ('1', '9', 1), ('10', '9', 1), ('12', '9', 3), ('5', '9', 1), ('8', '9', 2), ('2', '14', 2), ('9', '14', 2), ('10', '14', 1), ('12', '14', 2), ('5', '14', 1), ('8', '14', 2), ('2', '7', 2), ('22', '7', 2), ('9', '7', 1), ('12', '7', 2), ('4', '7', 1), ('8', '7', 2), ('2', '22', 3), ('12', '22', 4), ('4', '22', 4), ('8', '22', 8), ('2', '1', 1), ('9', '1', 1), ('5', '1', 2), ('4', '1', 1), ('8', '1', 1)]
import networkx as nx

G_Real = nx.DiGraph()
for i in range(len(L_Real)):
    # edge from pass maker to pass receiver, weighted by the number of passes
    G_Real.add_edge(L_Real[i][0], L_Real[i][1], weight = L_Real[i][2])
edges_Real = G_Real.edges()
weights_Real = [G_Real[u][v]['weight'] for u, v in edges_Real]
nx.draw(G_Real, node_size=800, with_labels=True, node_color='white', width = weights_Real)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.title("Pass network for Real Madrid vs Liverpool", size = 20)
plt.show()
For Liverpool too, let us first clean the pass_Liv_new dataset and then draw the isomorphic weighted directed graph:
pass_Liv_new = pass_Liv_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
pass_Liv_new
| | pass_maker | pass_receiver | number_of_passes |
---|---|---|---|
0 | 5 | 26 | 4 |
1 | 7 | 26 | 1 |
2 | 14 | 26 | 1 |
3 | 1 | 26 | 1 |
4 | 66 | 26 | 1 |
... | ... | ... | ... |
59 | 66 | 6 | 1 |
60 | 4 | 6 | 3 |
61 | 7 | 19 | 2 |
62 | 14 | 19 | 1 |
63 | 11 | 19 | 1 |
64 rows × 3 columns
L_Liv = pass_Liv_new.apply(tuple, axis=1).tolist()
G_Liv = nx.DiGraph()
for i in range(len(L_Liv)):
    # edge from pass maker to pass receiver, weighted by the number of passes
    G_Liv.add_edge(L_Liv[i][0], L_Liv[i][1], weight = L_Liv[i][2])
edges_Liv = G_Liv.edges()
weights_Liv = [G_Liv[u][v]['weight'] for u, v in edges_Liv]
nx.draw(G_Liv, node_size = 800, with_labels = True, node_color = 'red', width = weights_Liv)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.show()
Let us discuss some of the important functions from the networkx package that we have employed for drawing graphs:
DiGraph() is the base class for directed graphs,
add_edge() adds an edge between the two nodes given by the first two arguments, and the weight parameter sets the weight of this edge,
draw() visualizes a networkx graph, and its parameters are self-explanatory.
Let us now understand the degree, indegree and outdegree of a node in a directed graph. The indegree of a node is the number of edges directed towards it, i.e., in our case, the number of distinct teammates from whom a player (node) received passes. Similarly, the outdegree is the number of edges directed outwards from the node, i.e., the number of distinct teammates to whom the player passed. Finally, the degree of a node is the total number of edges connected to it (ignoring the directions of the edges), which is the sum of its indegree and outdegree. Note that these counts ignore the edge weights; to obtain the total number of passes received or given by a player, the edge weights must be summed instead.
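To make these definitions concrete, here is a small pure-Python sketch that computes the indegree and outdegree of a node from a list of weighted edges, both as plain edge counts and as weighted sums. The helper name degree_summary is hypothetical; the five edges are taken from the Real Madrid edge list printed earlier, so the numbers cover only that subset:

```python
def degree_summary(edges, node):
    """Return (indegree, outdegree, weighted_in, weighted_out) for `node`.

    `edges` is a list of (source, target, weight) tuples of a directed graph.
    """
    incoming = [w for _, target, w in edges if target == node]
    outgoing = [w for source, _, w in edges if source == node]
    return len(incoming), len(outgoing), sum(incoming), sum(outgoing)


# A few (pass_maker, pass_receiver, number_of_passes) edges from L_Real
edges_demo = [
    ("10", "2", 10),
    ("2", "10", 7),
    ("8", "4", 10),
    ("4", "8", 9),
    ("4", "2", 3),
]

indeg, outdeg, passes_received, passes_given = degree_summary(edges_demo, "2")
print(indeg, outdeg, passes_received, passes_given)
```

In networkx, the degree, in_degree and out_degree methods accept a weight argument, for example G_Real.in_degree(weight='weight'), which returns the weighted sums instead of the edge counts.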
We will use networkx to find the node degrees from the pass network of Real Madrid.
# Prepare a dictionary with jersey numbers as the node ids,
# i.e, the dictionary keys and degrees as the dictionary values
deg_Real = dict(nx.degree(G_Real))
# convert a dictionary to a pandas dataframe
degree_Real = pd.DataFrame.from_dict(list(deg_Real.items()))
degree_Real.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
degree_Real
| | jersey_number | node_degree |
---|---|---|
0 | 14 | 12 |
1 | 2 | 17 |
2 | 7 | 11 |
3 | 22 | 11 |
4 | 9 | 14 |
5 | 10 | 15 |
6 | 12 | 16 |
7 | 5 | 14 |
8 | 4 | 17 |
9 | 8 | 19 |
10 | 1 | 10 |
Looking at the node degrees for Real Madrid in that game, we notice that the player with jersey number 8 (i.e., Toni Kroos) had the highest degree value of 19. Ranked second are the players with jersey numbers 2 and 4 with degree value 17, i.e., our favorite Spanish defenders 'Daniel Carvajal Ramos' and 'Sergio Ramos García' respectively. Tremendous! Let us use seaborn to visualize the deg_Real dictionary as a bar plot:
import seaborn as sns

X = list(deg_Real.keys())
Y = list(deg_Real.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Real Madrid vs Liverpool", size = 16)
plt.show()
We do the same for Liverpool too:
# Prepare a dictionary with jersey numbers as the node ids,
# i.e, the dictionary keys and degrees as the dictionary values
deg_Liv = dict(nx.degree(G_Liv))
# convert a dictionary to a pandas dataframe
degree_Liv = pd.DataFrame.from_dict(list(deg_Liv.items()))
degree_Liv.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
degree_Liv
| | jersey_number | node_degree |
---|---|---|
0 | 5 | 12 |
1 | 26 | 11 |
2 | 7 | 17 |
3 | 14 | 17 |
4 | 1 | 7 |
5 | 66 | 13 |
6 | 4 | 12 |
7 | 11 | 11 |
8 | 6 | 12 |
9 | 9 | 10 |
10 | 19 | 6 |
For Liverpool, the highest degree value of 17 is shared by the players with jersey numbers 14 and 7, i.e., 'Jordan Brian Henderson' and 'James Philip Milner' respectively. We will visualize the deg_Liv dictionary as a bar plot:
X = list(deg_Liv.keys())
Y = list(deg_Liv.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Liverpool vs Real Madrid", size = 16)
plt.show()
indeg_Real = dict(G_Real.in_degree())
indegree_Real = pd.DataFrame.from_dict(list(indeg_Real.items()))
indegree_Real.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Real.keys())
Y = list(indeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
indeg_Liv = dict(G_Liv.in_degree())
indegree_Liv = pd.DataFrame.from_dict(list(indeg_Liv.items()))
indegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Liv.keys())
Y = list(indeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
outdeg_Real = dict(G_Real.out_degree())
outdegree_Real = pd.DataFrame.from_dict(list(outdeg_Real.items()))
outdegree_Real.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Real.keys())
Y = list(outdeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
outdeg_Liv = dict(G_Liv.out_degree())
outdegree_Liv = pd.DataFrame.from_dict(list(outdeg_Liv.items()))
outdegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Liv.keys())
Y = list(outdeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
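As a small aside, the difference between degree, indegree and outdegree can be checked on a toy directed graph. This is only a sketch: the jersey numbers and pass counts below are invented for illustration and are not taken from the match data.

```python
import networkx as nx

# Toy pass network: (pass_maker, pass_receiver, number_of_passes); all values are invented
toy_passes = [("8", "10", 5), ("10", "8", 3), ("8", "7", 2), ("4", "8", 4)]

G_toy = nx.DiGraph()
for maker, receiver, n in toy_passes:
    G_toy.add_edge(maker, receiver, weight=n)

# Unweighted degrees count distinct passing partners
indeg = dict(G_toy.in_degree())    # how many players passed to each node
outdeg = dict(G_toy.out_degree())  # how many players each node passed to
deg = dict(G_toy.degree())         # indegree + outdegree

# With weight="weight" the pass counts are summed instead of the partners
passes_made = dict(G_toy.out_degree(weight="weight"))

print(indeg, outdeg, deg, passes_made)
```

Here player "8" receives from two players and passes to two players, so the plain degree is 4, while the weighted outdegree adds up the 5 + 2 passes made.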
G_Real and G_Liv graphs:
A_Real = nx.adjacency_matrix(G_Real)
A_Liv = nx.adjacency_matrix(G_Liv)
A_Real = A_Real.todense()
A_Liv = A_Liv.todense()
sns.heatmap(A_Real, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Real Madrid's pass network")
plt.show()
sns.heatmap(A_Liv, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Liverpool's pass network")
plt.show()
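The heatmaps above do not label rows and columns with jersey numbers. A minimal sketch of how the node order can be fixed explicitly is given below; the graph is a made-up toy example, and nx.to_numpy_array is used here as an alternative to nx.adjacency_matrix, not as the author's method.

```python
import networkx as nx

# Toy weighted pass network with invented pass counts
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from([("1", "4", 3), ("4", "8", 6), ("8", "4", 2)])

# Fix the node order explicitly so that rows and columns can be labelled
nodes = ["1", "4", "8"]
A_toy = nx.to_numpy_array(G_toy, nodelist=nodes, weight="weight")

# A_toy[i, j] holds the number of passes from nodes[i] to nodes[j]
print(A_toy)
```

The same nodes list could then be passed to sns.heatmap through its xticklabels and yticklabels arguments so that the axes show jersey numbers.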
r_Real = nx.degree_pearson_correlation_coefficient(G_Real, weight = 'weight')
r_Liv = nx.degree_pearson_correlation_coefficient(G_Liv, weight = 'weight')
print(r_Real, r_Liv)
-0.17983836432860184 -0.24123721966990644
'weight' column in the pass network. Let us create a new graph for Real Madrid:
# .copy() avoids the pandas SettingWithCopyWarning when the new column is added
pass_Real_mod = pass_Real_new[['pass_maker', 'pass_receiver']].copy()
pass_Real_mod['1/nop'] = 1/pass_Real_new['number_of_passes']
pass_Real_mod.head(5)
pass_maker | pass_receiver | 1/nop | |
---|---|---|---|
0 | 14 | 2 | 1.000000 |
1 | 7 | 2 | 0.333333 |
2 | 22 | 2 | 0.500000 |
3 | 9 | 2 | 0.500000 |
4 | 10 | 2 | 0.100000 |
L_Real_mod = pass_Real_mod.apply(tuple, axis=1).tolist()
G_Real_mod = nx.DiGraph()
for i in range(len(L_Real_mod)):
    G_Real_mod.add_edge(L_Real_mod[i][0], L_Real_mod[i][1], weight = L_Real_mod[i][2])
edges_Real_mod = G_Real_mod.edges()
weights_Real_mod = [G_Real_mod[u][v]['weight'] for u, v in edges_Real_mod]
nx.draw(G_Real_mod, node_size=800, with_labels=True, node_color='white', width = weights_Real_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Real Madrid vs Liverpool", size = 20)
plt.show()
Liverpool too:
# .copy() avoids the pandas SettingWithCopyWarning when the new column is added
pass_Liv_mod = pass_Liv_new[['pass_maker', 'pass_receiver']].copy()
pass_Liv_mod['1/nop'] = 1/pass_Liv_new['number_of_passes']
pass_Liv_mod.head(5)
pass_maker | pass_receiver | 1/nop | |
---|---|---|---|
0 | 5 | 26 | 0.25 |
1 | 7 | 26 | 1.00 |
2 | 14 | 26 | 1.00 |
3 | 1 | 26 | 1.00 |
4 | 66 | 26 | 1.00 |
L_Liv_mod = pass_Liv_mod.apply(tuple, axis=1).tolist()
G_Liv_mod = nx.DiGraph()
for i in range(len(L_Liv_mod)):
    G_Liv_mod.add_edge(L_Liv_mod[i][0], L_Liv_mod[i][1], weight = L_Liv_mod[i][2])
edges_Liv_mod = G_Liv_mod.edges()
weights_Liv_mod = [G_Liv_mod[u][v]['weight'] for u, v in edges_Liv_mod]
nx.draw(G_Liv_mod, node_size=800, with_labels=True, node_color='red', width = weights_Liv_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Liverpool vs Real Madrid", size = 20)
plt.show()
Real Madrid:
dis_Real = nx.shortest_path(G_Real_mod, weight = 'weight')
print(dis_Real)
{'14': {'14': ['14'], '2': ['14', '8', '10', '2'], '10': ['14', '8', '10'], '12': ['14', '8', '4', '12'], '5': ['14', '8', '5'], '4': ['14', '8', '4'], '8': ['14', '8'], '9': ['14', '8', '9'], '7': ['14', '8', '7'], '22': ['14', '8', '22'], '1': ['14', '8', '5', '1']}, '2': {'2': ['2'], '10': ['2', '10'], '5': ['2', '5'], '8': ['2', '10', '8'], '9': ['2', '5', '4', '12', '9'], '14': ['2', '14'], '7': ['2', '7'], '22': ['2', '22'], '1': ['2', '5', '1'], '12': ['2', '5', '4', '12'], '4': ['2', '5', '4']}, '7': {'7': ['7'], '2': ['7', '2'], '10': ['7', '2', '10'], '12': ['7', '12'], '4': ['7', '12', '8', '4'], '9': ['7', '12', '9'], '5': ['7', '2', '5'], '8': ['7', '12', '8'], '14': ['7', '12', '14'], '22': ['7', '12', '22'], '1': ['7', '2', '5', '1']}, '22': {'22': ['22'], '2': ['22', '2'], '10': ['22', '8', '10'], '12': ['22', '4', '12'], '4': ['22', '4'], '8': ['22', '8'], '9': ['22', '4', '12', '9'], '7': ['22', '7'], '5': ['22', '4', '5'], '1': ['22', '4', '5', '1'], '14': ['22', '8', '14']}, '9': {'9': ['9'], '2': ['9', '2'], '4': ['9', '8', '4'], '8': ['9', '8'], '14': ['9', '14'], '7': ['9', '8', '7'], '1': ['9', '1'], '10': ['9', '8', '10'], '12': ['9', '8', '4', '12'], '5': ['9', '8', '5'], '22': ['9', '8', '22']}, '10': {'10': ['10'], '2': ['10', '2'], '12': ['10', '8', '4', '12'], '5': ['10', '2', '5'], '4': ['10', '8', '4'], '8': ['10', '8'], '9': ['10', '8', '9'], '14': ['10', '2', '14'], '7': ['10', '2', '7'], '22': ['10', '8', '22'], '1': ['10', '2', '5', '1']}, '12': {'12': ['12'], '2': ['12', '2'], '10': ['12', '8', '10'], '5': ['12', '8', '5'], '4': ['12', '8', '4'], '8': ['12', '8'], '9': ['12', '9'], '14': ['12', '14'], '7': ['12', '7'], '22': ['12', '22'], '1': ['12', '8', '5', '1']}, '5': {'5': ['5'], '2': ['5', '10', '2'], '10': ['5', '10'], '4': ['5', '4'], '8': ['5', '8'], '9': ['5', '4', '12', '9'], '14': ['5', '8', '14'], '1': ['5', '1'], '12': ['5', '4', '12'], '7': ['5', '8', '7'], '22': ['5', '8', '22']}, '4': {'4': ['4'], '2': ['4', 
'2'], '10': ['4', '8', '10'], '12': ['4', '12'], '5': ['4', '5'], '8': ['4', '8'], '7': ['4', '12', '7'], '22': ['4', '8', '22'], '1': ['4', '5', '1'], '9': ['4', '12', '9'], '14': ['4', '12', '14']}, '8': {'8': ['8'], '2': ['8', '10', '2'], '10': ['8', '10'], '12': ['8', '4', '12'], '5': ['8', '5'], '4': ['8', '4'], '9': ['8', '9'], '14': ['8', '14'], '7': ['8', '7'], '22': ['8', '22'], '1': ['8', '5', '1']}, '1': {'1': ['1'], '12': ['1', '4', '12'], '5': ['1', '4', '5'], '4': ['1', '4'], '8': ['1', '4', '8'], '9': ['1', '4', '12', '9'], '2': ['1', '4', '2'], '10': ['1', '4', '8', '10'], '7': ['1', '4', '12', '7'], '22': ['1', '4', '8', '22'], '14': ['1', '4', '12', '14']}}
'Keylor Navas Gamboa' (jersey number 1) to 'Cristiano Ronaldo dos Santos Aveiro' (jersey number 7). We will type the following:
print(dis_Real['1']['7'])
['1', '4', '12', '7']
'Keylor Navas Gamboa' (jersey: 1), to 'Cristiano Ronaldo dos Santos Aveiro' (jersey: 7) was to pass the ball first to 'Sergio Ramos García' (jersey: 4) who would pass to 'Marcelo Vieira da Silva Júnior' (jersey: 12) with him ultimately passing to 'Cristiano Ronaldo dos Santos Aveiro'. This seems like a good post-match analysis tool. I got this idea after discussing with Sarath Babu.
Liverpool:
dis_Liv = nx.shortest_path(G_Liv_mod, weight = 'weight')
print(dis_Liv)
{'5': {'5': ['5'], '26': ['5', '26'], '7': ['5', '26', '7'], '14': ['5', '14'], '4': ['5', '4'], '11': ['5', '11'], '66': ['5', '26', '7', '66'], '9': ['5', '26', '9'], '1': ['5', '14', '1'], '6': ['5', '14', '6'], '19': ['5', '26', '7', '19']}, '26': {'26': ['26'], '5': ['26', '5'], '7': ['26', '7'], '14': ['26', '14'], '9': ['26', '9'], '4': ['26', '4'], '11': ['26', '9', '11'], '66': ['26', '7', '66'], '1': ['26', '14', '1'], '6': ['26', '14', '6'], '19': ['26', '7', '19']}, '7': {'7': ['7'], '26': ['7', '66', '5', '26'], '5': ['7', '66', '5'], '14': ['7', '14'], '9': ['7', '66', '9'], '4': ['7', '4'], '1': ['7', '1'], '11': ['7', '66', '11'], '66': ['7', '66'], '6': ['7', '14', '6'], '19': ['7', '19']}, '14': {'14': ['14'], '26': ['14', '5', '26'], '5': ['14', '5'], '7': ['14', '7'], '4': ['14', '4'], '1': ['14', '1'], '66': ['14', '7', '66'], '6': ['14', '6'], '19': ['14', '7', '19'], '11': ['14', '7', '66', '11'], '9': ['14', '5', '26', '9']}, '1': {'1': ['1'], '26': ['1', '26'], '14': ['1', '14'], '4': ['1', '6', '4'], '6': ['1', '6'], '7': ['1', '6', '7'], '11': ['1', '6', '66', '11'], '66': ['1', '6', '66'], '5': ['1', '6', '66', '5'], '9': ['1', '6', '66', '9'], '19': ['1', '6', '7', '19']}, '66': {'66': ['66'], '26': ['66', '5', '26'], '5': ['66', '5'], '14': ['66', '14'], '9': ['66', '9'], '11': ['66', '11'], '6': ['66', '14', '6'], '7': ['66', '14', '7'], '4': ['66', '5', '4'], '19': ['66', '11', '19'], '1': ['66', '14', '1']}, '4': {'4': ['4'], '26': ['4', '26'], '5': ['4', '26', '5'], '14': ['4', '26', '14'], '66': ['4', '6', '66'], '6': ['4', '6'], '7': ['4', '26', '7'], '9': ['4', '26', '9'], '1': ['4', '6', '1'], '11': ['4', '6', '66', '11'], '19': ['4', '26', '7', '19']}, '11': {'11': ['11'], '5': ['11', '66', '5'], '7': ['11', '9', '7'], '9': ['11', '9'], '4': ['11', '4'], '66': ['11', '66'], '19': ['11', '19'], '14': ['11', '9', '14'], '6': ['11', '9', '14', '6'], '26': ['11', '66', '5', '26'], '1': ['11', '9', '14', '1']}, '6': {'6': ['6'], 
'7': ['6', '7'], '14': ['6', '66', '14'], '4': ['6', '4'], '1': ['6', '1'], '11': ['6', '66', '11'], '66': ['6', '66'], '26': ['6', '4', '26'], '5': ['6', '66', '5'], '9': ['6', '66', '9'], '19': ['6', '7', '19']}, '9': {'9': ['9'], '7': ['9', '7'], '14': ['9', '14'], '11': ['9', '11'], '66': ['9', '11', '66'], '6': ['9', '14', '6'], '5': ['9', '14', '5'], '4': ['9', '14', '4'], '19': ['9', '7', '19'], '26': ['9', '14', '5', '26'], '1': ['9', '14', '1']}, '19': {'19': ['19'], '7': ['19', '7'], '14': ['19', '14'], '9': ['19', '9'], '11': ['19', '9', '11'], '66': ['19', '9', '11', '66'], '6': ['19', '14', '6'], '5': ['19', '14', '5'], '4': ['19', '14', '4'], '26': ['19', '14', '5', '26'], '1': ['19', '14', '1']}}
print(dis_Liv['1']['9'])
['1', '6', '66', '9']
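To see why the inverse of the pass count is used as the edge weight, here is a minimal sketch on a toy network. The pass counts are invented: the goalkeeper "1" passed directly to "7" only once, but frequently via "4". With weights 1/n, frequently used links become short, so the weighted shortest path follows the well-trodden route.

```python
import networkx as nx

# Invented pass counts: (pass_maker, pass_receiver, number_of_passes)
toy_passes = [("1", "7", 1), ("1", "4", 10), ("4", "7", 8)]

G_inv = nx.DiGraph()
for maker, receiver, n in toy_passes:
    # Frequent links get a small "distance" 1/n
    G_inv.add_edge(maker, receiver, weight=1 / n)

path = nx.shortest_path(G_inv, source="1", target="7", weight="weight")
length = nx.shortest_path_length(G_inv, source="1", target="7", weight="weight")
print(path, length)
```

The direct link costs 1/1 = 1.0, while the route through "4" costs 1/10 + 1/8 = 0.225, so the path through "4" is returned.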
The eccentricity of a node p tells us how far the furthest player node from p is positioned in the pass network. Let us calculate the eccentricities for all 11 nodes for Real Madrid.
E_Real = nx.eccentricity(G_Real_mod)
print(E_Real)
{'14': 2, '2': 2, '7': 2, '22': 2, '9': 2, '10': 2, '12': 2, '5': 2, '4': 2, '8': 1, '1': 2}
av_E_Real = sum(list(E_Real.values()))/len(E_Real)
print(av_E_Real)
1.9090909090909092
Liverpool:
E_Liv = nx.eccentricity(G_Liv_mod)
print(E_Liv)
{'5': 2, '26': 2, '7': 1, '14': 2, '1': 2, '66': 2, '4': 2, '11': 2, '6': 2, '9': 2, '19': 2}
av_E_Liv = sum(list(E_Liv.values()))/len(E_Liv)
print(av_E_Liv)
1.9090909090909092
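A minimal sketch of what nx.eccentricity computes, on an invented four-edge directed graph. When called without further arguments, as above, the distance is counted in hops (number of passes), not in edge weights, and the directed graph has to be strongly connected for the eccentricity to be defined.

```python
import networkx as nx

# Directed 3-cycle plus one shortcut; strongly connected, so eccentricity is defined
G_toy = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])

# Eccentricity of a node = largest hop distance from it to any other node
ecc = nx.eccentricity(G_toy)
avg_ecc = sum(ecc.values()) / len(ecc)
print(ecc, avg_ecc)
```

Node "a" reaches both other nodes in one hop, whereas "b" needs two hops to reach "a" and "c" needs two hops to reach "b".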
G_Real (note that this graph should not be the modified version):
cc_Real = nx.average_clustering(G_Real, weight = 'weight')
print(cc_Real)
0.182334851979709
Liverpool:
cc_Liv = nx.average_clustering(G_Liv, weight = 'weight')
print(cc_Liv)
0.2766427842450553
The average clustering coefficient is lower for Real Madrid's pass network, indicating that fewer players passed the ball among each other compared to Liverpool.
We can also compute the centrality (especially the betweenness centrality) for each node in either team's pass network and understand which player was the most important in their pass network. For Real Madrid:
bc_Real = nx.betweenness_centrality(G_Real, weight = 'weight')
print(bc_Real)
{'14': 0.15222222222222223, '2': 0.10685185185185186, '7': 0.05592592592592593, '22': 0.0, '9': 0.14462962962962964, '10': 0.12407407407407407, '12': 0.009259259259259259, '5': 0.007407407407407408, '4': 0.06851851851851852, '8': 0.031481481481481485, '1': 0.11703703703703704}
max_bc_Real = max(bc_Real, key = bc_Real.get)
print(max_bc_Real)
14
Liverpool:
bc_Liv = nx.betweenness_centrality(G_Liv, weight = 'weight')
print(bc_Liv)
max_bc_Liv = max(bc_Liv, key = bc_Liv.get)
print(max_bc_Liv)
{'5': 0.06296296296296296, '26': 0.016666666666666666, '7': 0.2453703703703704, '14': 0.12407407407407407, '1': 0.002777777777777778, '66': 0.075, '4': 0.07222222222222222, '11': 0.05555555555555556, '6': 0.1259259259259259, '9': 0.021296296296296296, '19': 0.03888888888888889}
7
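One detail worth keeping in mind: NetworkX interprets the weight passed to betweenness_centrality as a distance when it searches for shortest paths. The sketch below uses an invented three-node network to show how the choice of weight changes which node lies "between" the others; it is an illustration of the library behaviour, not a re-analysis of the match.

```python
import networkx as nx

# Invented distances: the direct link a -> c is long, the detour via b is short
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from([("a", "c", 5.0), ("a", "b", 1.0), ("b", "c", 1.0)])

# Unweighted: the direct edge a -> c is the shortest path, so b lies on no shortest path
bc_unweighted = nx.betweenness_centrality(G_toy)

# Weighted: a -> b -> c (distance 2) beats a -> c (distance 5), so b becomes a broker
bc_weighted = nx.betweenness_centrality(G_toy, weight="weight")

print(bc_unweighted["b"], bc_weighted["b"])
```

If the edge weights are raw pass counts, a heavily used link is therefore treated as a long one. Running the same call on the inverse-weight graphs (G_Real_mod, G_Liv_mod) would be one way to check how sensitive the ranking is to this choice.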
The players with the highest betweenness centrality are therefore 'Carlos Henrique Casimiro' (jersey: 14) from Real Madrid and 'James Philip Milner' (jersey: 7) from Liverpool. We have been able to compute some interesting results using complex network analysis on our pass networks.
If we have a set of points X, then the convex hull is the smallest convex set that contains X. This will help us get an idea about the optimal field coverage of a player during the match.
We will use the scipy package, which provides us with a collection of modules for working on scientific computation with Python.
In particular, we need the scipy.spatial module, which allows us to work with spatial algorithms and data structures. As we are going to work with convex hulls first, let us import the ConvexHull class from scipy.spatial:
from scipy.spatial import ConvexHull
events dataset:
events_hull = events[['team', 'location', 'type', 'player']]
events_hull.head(10)
team | location | type | player | |
---|---|---|---|---|
0 | Real Madrid | NaN | Starting XI | NaN |
1 | Liverpool | NaN | Starting XI | NaN |
2 | Real Madrid | NaN | Half Start | NaN |
3 | Liverpool | NaN | Half Start | NaN |
4 | Liverpool | NaN | Half Start | NaN |
5 | Real Madrid | NaN | Half Start | NaN |
6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner |
7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren |
8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane |
9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić |
type to Pass or Shot.
events_hull = events_hull[(events_hull['type'] == 'Pass') | (events_hull['type'] == 'Shot')].reset_index()
events_hull.head(10)
index | team | location | type | player | |
---|---|---|---|---|---|
0 | 6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner |
1 | 7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren |
2 | 8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane |
3 | 9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić |
4 | 10 | Real Madrid | [22.3, 76.6] | Pass | Daniel Carvajal Ramos |
5 | 11 | Real Madrid | [36.2, 75.3] | Pass | Carlos Henrique Casimiro |
6 | 12 | Liverpool | [76.5, 18.1] | Pass | Jordan Brian Henderson |
7 | 13 | Liverpool | [84.4, 10.0] | Pass | Sadio Mané |
8 | 14 | Liverpool | [91.6, 21.3] | Pass | Roberto Firmino Barbosa de Oliveira |
9 | 15 | Liverpool | [92.2, 50.9] | Pass | Mohamed Salah |
location column into location_x and location_y columns:
Loc = events_hull['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['location_x', 'location_y'])
events_hull['location_x'] = Loc['location_x']
events_hull['location_y'] = Loc['location_y']
events_hull.head(10)
index | team | location | type | player | location_x | location_y | |
---|---|---|---|---|---|---|---|
0 | 6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner | 60.0 | 40.0 |
1 | 7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren | 35.0 | 40.8 |
2 | 8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane | 27.4 | 60.2 |
3 | 9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić | 35.3 | 75.4 |
4 | 10 | Real Madrid | [22.3, 76.6] | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
5 | 11 | Real Madrid | [36.2, 75.3] | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
6 | 12 | Liverpool | [76.5, 18.1] | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
7 | 13 | Liverpool | [84.4, 10.0] | Pass | Sadio Mané | 84.4 | 10.0 |
8 | 14 | Liverpool | [91.6, 21.3] | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
9 | 15 | Liverpool | [92.2, 50.9] | Pass | Mohamed Salah | 92.2 | 50.9 |
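The split above works because events_hull was given a fresh 0..n-1 index by reset_index() beforehand. As a sketch under the assumption that the index has not been reset, passing the original index to the new DataFrame keeps the coordinates aligned with their rows. The two rows below are copied from the table above; the frame name toy is hypothetical.

```python
import pandas as pd

# Toy frame with a non-default index, as happens after filtering rows
toy = pd.DataFrame({"location": [[60.0, 40.0], [35.0, 40.8]]}, index=[6, 7])

# index=toy.index keeps the new columns aligned with the original rows
xy = pd.DataFrame(toy["location"].tolist(), index=toy.index,
                  columns=["location_x", "location_y"])
toy = toy.join(xy)
print(toy)
```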
location column:
events_hull = events_hull[['team', 'type', 'player', 'location_x', 'location_y']]
events_hull.head(10)
team | type | player | location_x | location_y | |
---|---|---|---|---|---|
0 | Liverpool | Pass | James Philip Milner | 60.0 | 40.0 |
1 | Liverpool | Pass | Dejan Lovren | 35.0 | 40.8 |
2 | Real Madrid | Pass | Raphaël Varane | 27.4 | 60.2 |
3 | Real Madrid | Pass | Luka Modrić | 35.3 | 75.4 |
4 | Real Madrid | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
5 | Real Madrid | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
6 | Liverpool | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
7 | Liverpool | Pass | Sadio Mané | 84.4 | 10.0 |
8 | Liverpool | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
9 | Liverpool | Pass | Mohamed Salah | 92.2 | 50.9 |
Real Madrid and the other for Liverpool:
events_hull_Real = events_hull[events_hull['team'] == 'Real Madrid'].reset_index()
events_hull_Liv = events_hull[events_hull['team'] == 'Liverpool'].reset_index()
events_hull_Real.head(5)
index | team | type | player | location_x | location_y | |
---|---|---|---|---|---|---|
0 | 2 | Real Madrid | Pass | Raphaël Varane | 27.4 | 60.2 |
1 | 3 | Real Madrid | Pass | Luka Modrić | 35.3 | 75.4 |
2 | 4 | Real Madrid | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
3 | 5 | Real Madrid | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
4 | 10 | Real Madrid | Pass | Sergio Ramos García | 14.7 | 23.2 |
events_hull_Liv.head(5)
index | team | type | player | location_x | location_y | |
---|---|---|---|---|---|---|
0 | 0 | Liverpool | Pass | James Philip Milner | 60.0 | 40.0 |
1 | 1 | Liverpool | Pass | Dejan Lovren | 35.0 | 40.8 |
2 | 6 | Liverpool | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
3 | 7 | Liverpool | Pass | Sadio Mané | 84.4 | 10.0 |
4 | 8 | Liverpool | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
players_Real = events_hull_Real.player.unique()
players_Liv = events_hull_Liv.player.unique()
print(players_Real)
print(players_Liv)
['Raphaël Varane' 'Luka Modrić' 'Daniel Carvajal Ramos' 'Carlos Henrique Casimiro' 'Sergio Ramos García' 'Marcelo Vieira da Silva Júnior' 'Toni Kroos' 'Cristiano Ronaldo dos Santos Aveiro' 'Karim Benzema' 'Keylor Navas Gamboa' 'Francisco Román Alarcón Suárez' 'José Ignacio Fernández Iglesias' 'Gareth Frank Bale' 'Marco Asensio Willemsen']
['James Philip Milner' 'Dejan Lovren' 'Jordan Brian Henderson' 'Sadio Mané' 'Roberto Firmino Barbosa de Oliveira' 'Mohamed Salah' 'Trent Alexander-Arnold' 'Virgil van Dijk' 'Andrew Robertson' 'Georginio Wijnaldum' 'Loris Karius' 'Adam David Lallana' 'Emre Can']
events_hull_Real.
events_hull_Toni = events_hull_Real[events_hull_Real['player'] == 'Toni Kroos']
events_hull_Toni
index | team | type | player | location_x | location_y | |
---|---|---|---|---|---|---|
7 | 13 | Real Madrid | Pass | Toni Kroos | 48.8 | 13.9 |
15 | 22 | Real Madrid | Pass | Toni Kroos | 23.4 | 18.6 |
30 | 73 | Real Madrid | Pass | Toni Kroos | 35.0 | 24.9 |
36 | 79 | Real Madrid | Pass | Toni Kroos | 41.7 | 21.7 |
40 | 83 | Real Madrid | Pass | Toni Kroos | 50.6 | 28.3 |
... | ... | ... | ... | ... | ... | ... |
638 | 969 | Real Madrid | Pass | Toni Kroos | 120.0 | 80.0 |
639 | 970 | Real Madrid | Pass | Toni Kroos | 120.0 | 80.0 |
641 | 972 | Real Madrid | Pass | Toni Kroos | 96.8 | 73.1 |
666 | 1020 | Real Madrid | Pass | Toni Kroos | 120.0 | 0.1 |
672 | 1032 | Real Madrid | Pass | Toni Kroos | 56.9 | 41.5 |
92 rows × 6 columns
location_x and location_y from events_hull_Toni and then compute the upper and lower bounds of the data. Any point lying beyond these bounds, i.e., any point lying below the lower bound or above the upper bound, is considered an outlier and is discarded. We use box-and-whisker plots to visualize the interquartile range for the data points:
e_box = pd.DataFrame(data = events_hull_Toni, columns = ["location_x", "location_y"])
boxplot = sns.boxplot(x = "variable", y ="value", data=pd.melt(e_box),
order = ["location_x", "location_y"])
boxplot = sns.stripplot(x = "variable", y = "value", data = pd.melt(e_box), marker="o",
color="red", order = ["location_x", "location_y"])
boxplot.axes.set_title("Boxplot for Toni Kroos's location conditions")
plt.show()
Q1 = np.percentile(events_hull_Toni['location_x'], 25, interpolation='midpoint')
Q3 = np.percentile(events_hull_Toni['location_x'], 75, interpolation='midpoint')
IQR_x = Q3 - Q1
minimum_x = Q1 - 1.5*IQR_x
maximum_x = Q3 + 1.5*IQR_x
Q1, Q3, IQR_x, minimum_x, maximum_x
(47.400000000000006, 67.85, 20.44999999999999, 16.725000000000023, 98.52499999999998)
Q1 = np.percentile(events_hull_Toni['location_y'], 25, interpolation='midpoint')
Q3 = np.percentile(events_hull_Toni['location_y'], 75, interpolation='midpoint')
IQR_y = Q3 - Q1
minimum_y = Q1 - 1.5*IQR_y
maximum_y = Q3 + 1.5*IQR_y
Q1, Q3, IQR_y, minimum_y, maximum_y
(15.0, 41.8, 26.799999999999997, -25.199999999999996, 82.0)
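# Select outliers by index label; np.where returns positions, which drop() would treat as labels
upper = events_hull_Toni[(events_hull_Toni['location_x'] >= maximum_x) & (events_hull_Toni['location_y'] >= maximum_y)].index
lower = events_hull_Toni[(events_hull_Toni['location_x'] <= minimum_x) & (events_hull_Toni['location_y'] <= minimum_y)].index
# Reassign instead of dropping in place, since events_hull_Toni is a slice of events_hull_Real
events_hull_Toni = events_hull_Toni.drop(upper).drop(lower)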
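As an alternative sketch, the IQR bounds can be wrapped in a small helper and applied with a boolean mask. The helper name iqr_bounds and the coordinates are invented for illustration, and np.percentile is used with its default linear interpolation rather than 'midpoint'. Note that this variant is stricter than the filter above: it discards a point when either coordinate falls outside its bounds, whereas the filter above only flags a point when both coordinates are out of bounds.

```python
import numpy as np
import pandas as pd

# Invented coordinates; the last row lies far away on the x axis
toy = pd.DataFrame({"location_x": [40.0, 45.0, 50.0, 55.0, 60.0, 120.0],
                    "location_y": [20.0, 25.0, 30.0, 35.0, 40.0, 30.0]})

def iqr_bounds(values):
    """Return the (lower, upper) bounds Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lo_x, hi_x = iqr_bounds(toy["location_x"])
lo_y, hi_y = iqr_bounds(toy["location_y"])

# Boolean mask: True for rows whose x and y both lie within their bounds
inside = (toy["location_x"].between(lo_x, hi_x) &
          toy["location_y"].between(lo_y, hi_y))
toy_clean = toy[inside].reset_index(drop=True)
print(toy_clean)
```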
events_hull_Toni dataset:
events_hull_Toni = events_hull_Toni.reset_index()
events_hull_Toni = events_hull_Toni[['team', 'type', 'player', 'location_x', 'location_y']]
events_hull_Toni.head(10)
team | type | player | location_x | location_y | |
---|---|---|---|---|---|
0 | Real Madrid | Pass | Toni Kroos | 48.8 | 13.9 |
1 | Real Madrid | Pass | Toni Kroos | 23.4 | 18.6 |
2 | Real Madrid | Pass | Toni Kroos | 35.0 | 24.9 |
3 | Real Madrid | Pass | Toni Kroos | 41.7 | 21.7 |
4 | Real Madrid | Pass | Toni Kroos | 50.6 | 28.3 |
5 | Real Madrid | Pass | Toni Kroos | 42.2 | 11.1 |
6 | Real Madrid | Pass | Toni Kroos | 48.7 | 53.1 |
7 | Real Madrid | Pass | Toni Kroos | 56.7 | 59.6 |
8 | Real Madrid | Pass | Toni Kroos | 56.4 | 15.2 |
9 | Real Madrid | Pass | Toni Kroos | 42.9 | 9.4 |
points_hull = events_hull_Toni[['location_x', 'location_y']].values
ConvexHull class from scipy.spatial:
convex_hull_Toni = ConvexHull(events_hull_Toni[['location_x', 'location_y']])
The vertices attribute consists of the indices of the points in points_hull that make up the convex hull, and the simplices attribute too consists of the indices of the points in points_hull. The simplices are a list of 1-D simplices of a particular length, representing line segments in 2-D. Let us print the indices:
print(convex_hull_Toni.vertices)
[50 41 55 75 84 1 67 51]
print(convex_hull_Toni.simplices)
[[50 41] [67 1] [84 1] [84 75] [55 41] [55 75] [51 50] [51 67]]
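To make the vertices and simplices attributes concrete, here is a minimal sketch on five invented points: the four corners of a square and one point inside it. The interior point does not appear among the hull vertices. For 2-D input, SciPy reports the enclosed area in the volume attribute and the perimeter in the area attribute, which could serve as a rough number for a player's field coverage (an assumption on my part, not something computed in this talk).

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of a 10 x 10 square plus one interior point (invented data)
pts = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0], [5.0, 5.0]])
hull = ConvexHull(pts)

# vertices: indices into pts of the points that form the hull boundary
hull_vertices = sorted(hull.vertices.tolist())

# simplices: one pair of point indices per boundary edge
n_edges = len(hull.simplices)

# For 2-D input, `volume` is the enclosed area and `area` is the perimeter
covered_area = hull.volume
perimeter = hull.area
print(hull_vertices, n_edges, covered_area, perimeter)
```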
pitch = Pitch(pitch_color='grass', stripe = True, line_color='black', goal_type='box',
constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
plt.scatter(events_hull_Toni.location_x, events_hull_Toni.location_y, color='white')
for i in convex_hull_Toni.simplices:
    plt.plot(points_hull[i, 0], points_hull[i, 1], 'black')
plt.fill(points_hull[convex_hull_Toni.vertices, 0], points_hull[convex_hull_Toni.vertices, 1],
c='grey', alpha=0.1)
plt.title("Convex Hull for Toni Kroos's field coverage against Liverpool")
Text(0.5, 1.0, "Convex Hull for Toni Kroos's field coverage against Liverpool")
So, we have been able to compute and visualize the convex hulls for players from a particular game. Next, we will try to understand how to get tracking data from a particular game using the statsbomb API. We need tracking data to compute Delaunay triangulations and Voronoi diagrams.
The match id that we have been working with is 18245.
We first need to import the required names from the mplsoccer.statsbomb module:
from mplsoccer.statsbomb import read_event, EVENT_SLUG
event_json = read_event(f'{EVENT_SLUG}/18245.json', related_event_df = False,
tactics_lineup_df = False, warn = False)
event = event_json['event']
tracking = event_json['shot_freeze_frame']
event and tracking datasets:
event.head(5)
match_id | id | index | period | timestamp_minute | timestamp_second | timestamp_millisecond | minute | second | type_id | ... | injury_stoppage_in_chain | shot_statsbomb_xg | shot_key_pass_id | shot_first_time | shot_one_on_one | shot_redirect | substitution_replacement_id | substitution_replacement_name | tactics_formation | aerial_won | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18245 | 5eee3ffd-f0c0-4532-868b-4a66cbf20cb8 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 35 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 41212.0 | NaN |
1 | 18245 | eaa65a92-02d3-4375-b2b7-7c2f679a620c | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 35 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 433.0 | NaN |
2 | 18245 | 9c82d2e5-ebba-4825-b7f9-b11b04433ed8 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 18245 | b791047a-3eea-452f-b3a9-212bd40cd7cb | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 18245 | 25be91a5-a084-42cb-8cc1-a0fe7b0f52f9 | 5 | 1 | 0 | 0 | 371 | 0 | 0 | 30 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 77 columns
event.tail(5)
match_id | id | index | period | timestamp_minute | timestamp_second | timestamp_millisecond | minute | second | type_id | ... | injury_stoppage_in_chain | shot_statsbomb_xg | shot_key_pass_id | shot_first_time | shot_one_on_one | shot_redirect | substitution_replacement_id | substitution_replacement_name | tactics_formation | aerial_won | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3492 | 18245 | b4258521-d4ec-466d-a90c-e4522692a45b | 3493 | 2 | 47 | 30 | 959 | 92 | 30 | 30 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3493 | 18245 | 37f51448-ebd1-4d67-8d9e-fa4b450111b2 | 3494 | 2 | 47 | 33 | 52 | 92 | 33 | 42 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3494 | 18245 | e9f7bb50-f4fc-45aa-87d3-20bbe9ebd32f | 3495 | 2 | 47 | 39 | 157 | 92 | 39 | 40 | ... | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3495 | 18245 | ce7d446a-e8bf-4631-bcf5-2bd323ba251e | 3496 | 2 | 48 | 2 | 893 | 93 | 2 | 34 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3496 | 18245 | d19b2348-de55-4bbf-9b1f-e44d95aa3a77 | 3497 | 2 | 48 | 2 | 893 | 93 | 2 | 34 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 77 columns
tracking.head(5)
id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | 1 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 98.0 | 48.4 | 18245 |
1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 1 | True | 3535 | Roberto Firmino Barbosa de Oliveira | 23 | Center Forward | 109.0 | 39.9 | 18245 |
2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | 1 | True | 3655 | Andrew Robertson | 6 | Left Back | 102.1 | 2.5 | 18245 |
3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | 1 | True | 4926 | Francisco Román Alarcón Suárez | 19 | Center Attacking Midfield | 100.2 | 11.0 | 18245 |
4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | 1 | True | 3629 | Sadio Mané | 21 | Left Wing | 90.9 | 32.3 | 18245 |
tracking.tail(5)
id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id | |
---|---|---|---|---|---|---|---|---|---|---|
356 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 16 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 99.9 | 19.0 | 18245 |
357 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 17 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 99.2 | 50.3 | 18245 |
358 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 17 | False | 5201 | Sergio Ramos García | 5 | Left Center Back | 114.1 | 42.9 | 18245 |
359 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 18 | False | 5574 | Toni Kroos | 15 | Left Center Midfield | 102.7 | 37.0 | 18245 |
360 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 18 | False | 5485 | Raphaël Varane | 3 | Right Center Back | 114.4 | 37.3 | 18245 |
event and tracking, we understand that the former represents the event data and the latter represents the tracking data. Let us look into the columns of the tracking dataset:
print(tracking.columns)
Index(['id', 'event_freeze_id', 'player_teammate', 'player_id', 'player_name', 'player_position_id', 'player_position_name', 'x', 'y', 'match_id'], dtype='object')
tracking dataset, we understand that the column id represents a unique id for a shot freeze frame, i.e., it gives the unique id for the moment when a particular player was taking a shot along with the information about locations of the other players. Looking at the player_name column, we need to add a column team to the tracking dataset, giving us information about which team each player belongs to.
# Vectorised assignment; avoids chained indexing such as tracking['team'][i] = ...
tracking['team'] = np.where(tracking['player_name'].isin(players_Real), 'Real Madrid', 'Liverpool')
tracking.head(5)
id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id | team | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | 1 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 98.0 | 48.4 | 18245 | Real Madrid |
1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 1 | True | 3535 | Roberto Firmino Barbosa de Oliveira | 23 | Center Forward | 109.0 | 39.9 | 18245 | Liverpool |
2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | 1 | True | 3655 | Andrew Robertson | 6 | Left Back | 102.1 | 2.5 | 18245 | Liverpool |
3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | 1 | True | 4926 | Francisco Román Alarcón Suárez | 19 | Center Attacking Midfield | 100.2 | 11.0 | 18245 | Real Madrid |
4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | 1 | True | 3629 | Sadio Mané | 21 | Left Wing | 90.9 | 32.3 | 18245 | Liverpool |
tracking = tracking[['id', 'player_name', 'x', 'y', 'team']]
tracking.head(5)
id | player_name | x | y | team | |
---|---|---|---|---|---|
0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | Luka Modrić | 98.0 | 48.4 | Real Madrid |
1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | Roberto Firmino Barbosa de Oliveira | 109.0 | 39.9 | Liverpool |
2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | Andrew Robertson | 102.1 | 2.5 | Liverpool |
3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | Francisco Román Alarcón Suárez | 100.2 | 11.0 | Real Madrid |
4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | Sadio Mané | 90.9 | 32.3 | Liverpool |
player_info = sb.lineups(match_id = 18245)
credentials were not supplied. open data access only
player_info has information about both the teams. Let us fetch the information for Real Madrid first:
info_Real = player_info['Real Madrid']
info_Real
player_id | player_name | player_nickname | jersey_number | country | |
---|---|---|---|---|---|
0 | 4926 | Francisco Román Alarcón Suárez | Isco | 22 | Spain |
1 | 5200 | Lucas Vázquez Iglesias | Lucas Vázquez | 17 | Spain |
2 | 5201 | Sergio Ramos García | Sergio Ramos | 4 | Spain |
3 | 5202 | José Ignacio Fernández Iglesias | Nacho | 6 | Spain |
4 | 5207 | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo | 7 | Portugal |
5 | 5456 | Mateo Kovačić | None | 23 | Croatia |
6 | 5463 | Luka Modrić | None | 10 | Croatia |
7 | 5485 | Raphaël Varane | None | 5 | France |
8 | 5539 | Carlos Henrique Casimiro | Casemiro | 14 | Brazil |
9 | 5552 | Marcelo Vieira da Silva Júnior | Marcelo | 12 | Brazil |
10 | 5574 | Toni Kroos | None | 8 | Germany |
11 | 5597 | Keylor Navas Gamboa | Keylor Navas | 1 | Costa Rica |
12 | 5719 | Marco Asensio Willemsen | Marco Asensio | 20 | Spain |
13 | 5721 | Daniel Carvajal Ramos | Daniel Carvajal | 2 | Spain |
14 | 6399 | Gareth Frank Bale | Gareth Bale | 11 | Wales |
15 | 6704 | Theo Bernard François Hernández | Theo Hernández | 15 | France |
16 | 6706 | Francisco Casilla Cortés | Kiko Casilla | 13 | Spain |
17 | 19677 | Karim Benzema | None | 9 | France |
player_name and jersey_number columns and build a dictionary:
info_Real = info_Real[['player_name', 'jersey_number']]
jerseys_Real = {}
for i in range(len(info_Real)):
    jerseys_Real[info_Real.player_name[i]] = str(info_Real.jersey_number[i])
print(jerseys_Real)
{'Francisco Román Alarcón Suárez': '22', 'Lucas Vázquez Iglesias': '17', 'Sergio Ramos García': '4', 'José Ignacio Fernández Iglesias': '6', 'Cristiano Ronaldo dos Santos Aveiro': '7', 'Mateo Kovačić': '23', 'Luka Modrić': '10', 'Raphaël Varane': '5', 'Carlos Henrique Casimiro': '14', 'Marcelo Vieira da Silva Júnior': '12', 'Toni Kroos': '8', 'Keylor Navas Gamboa': '1', 'Marco Asensio Willemsen': '20', 'Daniel Carvajal Ramos': '2', 'Gareth Frank Bale': '11', 'Theo Bernard François Hernández': '15', 'Francisco Casilla Cortés': '13', 'Karim Benzema': '9'}
Liverpool:
info_Liv = player_info['Liverpool']
info_Liv = info_Liv[['player_name', 'jersey_number']]
jerseys_Liv = {}
for i in range(len(info_Liv)):
    jerseys_Liv[info_Liv.player_name[i]] = str(info_Liv.jersey_number[i])
print(jerseys_Liv)
{'Dejan Lovren': '6', 'James Philip Milner': '7', 'Emre Can': '23', 'Alberto Moreno Pérez': '18', 'Mohamed Salah': '11', 'Jordan Brian Henderson': '14', 'Roberto Firmino Barbosa de Oliveira': '9', 'Simon Mignolet': '22', 'Georginio Wijnaldum': '5', 'Dominic Solanke': '29', 'Sadio Mané': '19', 'Loris Karius': '1', 'Andrew Robertson': '26', 'Trent Alexander-Arnold': '66', 'Virgil van Dijk': '4', 'Adam David Lallana': '20', 'Ragnar Klavan': '17', 'Nathaniel Edwin Clyne': '2'}
id from the tracking dataset, representing an instance when a particular shot was taken. We will filter tracking by an id value, which will give us the locations of the players on the pitch at that moment. We can view the unique id values:
tracking.id.unique()
array(['682270cc-4bc4-4952-8f91-d3c5a704a691', '9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9', '399ac143-5f7b-4080-8c0b-3c18435d7fc1', '660d9d98-46b6-4b5e-9c9a-435d63142c93', 'fe6c7f60-2ff0-4077-882e-b045c8abc7c3', 'eda7e108-2479-46f2-9cd0-a0bc2939e352', 'c36dfe04-2f8e-48f0-8df6-1c4d0b93a16e', '3e93f456-9971-4a33-9b10-ee9961410a32', '9def9ed2-52f0-496b-8ae8-f4c5a97c2d8a', '20b934f1-9afa-401d-9a16-f97fea2b80d9', '6711367a-6855-4914-903e-a5e19771429c', 'e8c20962-0eef-4066-97ce-dcaad4f70b52', '02f0755f-76cf-4d30-8062-369dc9509bdd', '6cb4171b-90e6-4473-831e-df7a2da29f28', '93c40040-ab9a-4549-8f0e-46c5c1c8e9cd', '142e18c8-316a-4f9f-a0f8-3c41549ad1c3', '6f994944-70fc-4a30-acca-315e3fede0bb', '7654fe57-734f-45d8-bc83-ab940cd37c45', '30a872eb-fe88-4c46-858b-a4f487cb69e4', '53b73ee0-8c9c-4b64-83c5-69fc453376a1', '804f8c8e-d714-4e6a-9cd1-599665efb8c8', '36687201-f131-4418-9dd0-f632bc9c4257', '650a2dc2-e5bb-4fac-9259-afbc03bdc322', '312f9c86-6a3c-42b1-bdeb-f92cb1b16a48', '222c90b6-8293-409a-ac6d-e2c3c2e69948', 'c7f3935c-23fa-4ddc-a6ee-eb9d0972d034', '05688a6e-37f8-4aa6-a36e-d8151aa75997', '18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80'], dtype=object)
shot_id = '3e93f456-9971-4a33-9b10-ee9961410a32' # select a particular value from the id column
tracking_filtered = tracking[tracking['id'] == shot_id] # filter by the shot_id
event_filtered = event[event['id'] == shot_id]
event_filtered = event_filtered[['id', 'player_name', 'x', 'y', 'team_name']]
event_filtered = event_filtered.rename(columns = {'team_name':'team'})
data_filtered = pd.concat([event_filtered, tracking_filtered])
The data_filtered dataset looks like this:
data_filtered
 | id | player_name | x | y | team |
---|---|---|---|---|---|
747 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 | Real Madrid |
7 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Loris Karius | 118.1 | 45.0 | Liverpool |
35 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 | Liverpool |
63 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Daniel Carvajal Ramos | 100.9 | 50.2 | Real Madrid |
91 | 3e93f456-9971-4a33-9b10-ee9961410a32 | James Philip Milner | 91.3 | 28.4 | Liverpool |
119 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Karim Benzema | 108.9 | 37.9 | Real Madrid |
147 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Georginio Wijnaldum | 105.7 | 56.5 | Liverpool |
175 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Jordan Brian Henderson | 108.0 | 50.0 | Liverpool |
202 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Virgil van Dijk | 111.7 | 54.7 | Liverpool |
228 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Trent Alexander-Arnold | 105.2 | 35.3 | Liverpool |
254 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Dejan Lovren | 111.8 | 41.1 | Liverpool |
280 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Toni Kroos | 91.0 | 30.3 | Real Madrid |
304 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Francisco Román Alarcón Suárez | 102.4 | 40.6 | Real Madrid |
Given a set X consisting of points on a 2-D Euclidean plane, a Delaunay triangulation is a triangulation of X such that no point in X lies inside the circumcircle of any triangle in the triangulation. A representation of a Delaunay triangulation from the same Wikipedia article:
We import Delaunay from scipy.spatial to compute the triangulation:
from scipy.spatial import Delaunay
We split data_filtered into one dataset for each of the teams:
tracking_Real = data_filtered[data_filtered['team'] == 'Real Madrid'].reset_index()
tracking_Liv = data_filtered[data_filtered['team'] == 'Liverpool'].reset_index()
tracking_Real
 | index | id | player_name | x | y | team |
---|---|---|---|---|---|---|
0 | 747 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 | Real Madrid |
1 | 63 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Daniel Carvajal Ramos | 100.9 | 50.2 | Real Madrid |
2 | 119 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Karim Benzema | 108.9 | 37.9 | Real Madrid |
3 | 280 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Toni Kroos | 91.0 | 30.3 | Real Madrid |
4 | 304 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Francisco Román Alarcón Suárez | 102.4 | 40.6 | Real Madrid |
tracking_Liv
 | index | id | player_name | x | y | team |
---|---|---|---|---|---|---|
0 | 7 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Loris Karius | 118.1 | 45.0 | Liverpool |
1 | 35 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 | Liverpool |
2 | 91 | 3e93f456-9971-4a33-9b10-ee9961410a32 | James Philip Milner | 91.3 | 28.4 | Liverpool |
3 | 147 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Georginio Wijnaldum | 105.7 | 56.5 | Liverpool |
4 | 175 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Jordan Brian Henderson | 108.0 | 50.0 | Liverpool |
5 | 202 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Virgil van Dijk | 111.7 | 54.7 | Liverpool |
6 | 228 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Trent Alexander-Arnold | 105.2 | 35.3 | Liverpool |
7 | 254 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Dejan Lovren | 111.8 | 41.1 | Liverpool |
points_Real = tracking_Real[['x', 'y']].values
print(points_Real)
[[111.7 58.7] [100.9 50.2] [108.9 37.9] [ 91. 30.3] [102.4 40.6]]
del_Real = Delaunay(tracking_Real[['x', 'y']])
loc_Real = tracking_Real[['player_name','x', 'y']].reset_index()
loc_Liv = tracking_Liv[['player_name','x', 'y']].reset_index()
loc_Real
 | index | player_name | x | y |
---|---|---|---|---|
0 | 0 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 |
1 | 1 | Daniel Carvajal Ramos | 100.9 | 50.2 |
2 | 2 | Karim Benzema | 108.9 | 37.9 |
3 | 3 | Toni Kroos | 91.0 | 30.3 |
4 | 4 | Francisco Román Alarcón Suárez | 102.4 | 40.6 |
loc_Liv
 | index | player_name | x | y |
---|---|---|---|---|
0 | 0 | Loris Karius | 118.1 | 45.0 |
1 | 1 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 |
2 | 2 | James Philip Milner | 91.3 | 28.4 |
3 | 3 | Georginio Wijnaldum | 105.7 | 56.5 |
4 | 4 | Jordan Brian Henderson | 108.0 | 50.0 |
5 | 5 | Virgil van Dijk | 111.7 | 54.7 |
6 | 6 | Trent Alexander-Arnold | 105.2 | 35.3 |
7 | 7 | Dejan Lovren | 111.8 | 41.1 |
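The defining property of a Delaunay triangulation (no point lies inside the circumcircle of any triangle) can be checked numerically. The sketch below assumes the five Real Madrid positions listed above and uses a hypothetical helper, `circumcircle`, that is not part of scipy; it only illustrates the definition.

```python
import numpy as np
from scipy.spatial import Delaunay

# Real Madrid positions at the selected shot (same values as points_Real)
pts = np.array([[111.7, 58.7], [100.9, 50.2], [108.9, 37.9],
                [91.0, 30.3], [102.4, 40.6]])
tri = Delaunay(pts)

def circumcircle(a, b, c):
    """Return (centre, radius) of the circle through the 2-D points a, b, c."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    ux = ((a @ a) * (b[1] - c[1]) + (b @ b) * (c[1] - a[1]) + (c @ c) * (a[1] - b[1])) / d
    uy = ((a @ a) * (c[0] - b[0]) + (b @ b) * (a[0] - c[0]) + (c @ c) * (b[0] - a[0])) / d
    centre = np.array([ux, uy])
    return centre, np.linalg.norm(centre - a)

delaunay_ok = True
for simplex in tri.simplices:  # each simplex holds the three point indices of one triangle
    centre, radius = circumcircle(*pts[simplex])
    others = np.setdiff1d(np.arange(len(pts)), simplex)
    dist = np.linalg.norm(pts[others] - centre, axis=1)
    # every remaining player must lie on or outside this triangle's circumcircle
    delaunay_ok = delaunay_ok and bool(np.all(dist >= radius - 1e-6))

print(len(tri.simplices), delaunay_ok)
```

The small tolerance guards against floating-point rounding; it is an arbitrary choice for this sketch.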
# half-pitch view of the end where the shot was taken
pitch = Pitch(pitch_color='grass', stripe=True, line_color='white', view = 'half', figsize=(8, 9),
              constrained_layout=True, tight_layout=False, goal_type='box')
fig, ax = pitch.draw()
# player nodes: white for Real Madrid, red for Liverpool
plt.scatter(tracking_Real.x, tracking_Real.y, color='white', s = 400, edgecolors='black', zorder=2)
plt.scatter(tracking_Liv.x, tracking_Liv.y, color='red', edgecolors='black', s = 400)
# edges of Real Madrid's Delaunay triangulation
plt.triplot(points_Real[:, 0], points_Real[:, 1], del_Real.simplices.copy(), 'k-', lw = 4)
# jersey numbers on top of the nodes
for index, row in loc_Real.iterrows():
    pitch.annotate(jerseys_Real[loc_Real['player_name'][row.name]], xy=(row.x, row.y), c ='black',
                   va = 'center', ha = 'center', size = 14, ax = ax)
for index, row in loc_Liv.iterrows():
    pitch.annotate(jerseys_Liv[loc_Liv['player_name'][row.name]], xy=(row.x, row.y), c ='black',
                   va = 'center', ha = 'center', size = 14, ax = ax)
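The edges drawn by triplot can also be listed explicitly as pairs of players, and the distance from an opposition player to the nearest edge gives a rough idea of how close that player is to a potential passing lane. This is only a sketch under stated assumptions: `point_segment_distance` is a hypothetical helper, the positions are copied from the tables above, and Virgil van Dijk is picked arbitrarily as the opposition player.

```python
import numpy as np
from scipy.spatial import Delaunay

real_names = ['Cristiano Ronaldo dos Santos Aveiro', 'Daniel Carvajal Ramos',
              'Karim Benzema', 'Toni Kroos', 'Francisco Román Alarcón Suárez']
real_pts = np.array([[111.7, 58.7], [100.9, 50.2], [108.9, 37.9],
                     [91.0, 30.3], [102.4, 40.6]])
tri = Delaunay(real_pts)

# Collect every undirected edge of the triangulation exactly once
edges = set()
for a, b, c in tri.simplices:
    for i, j in ((a, b), (b, c), (a, c)):
        edges.add((int(min(i, j)), int(max(i, j))))
edges = sorted(edges)

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    ab = b - a
    # projection parameter of p onto the segment, clipped to its end points
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

# One Liverpool player, taken from tracking_Liv: Virgil van Dijk
opponent = np.array([111.7, 54.7])
dists = [point_segment_distance(opponent, real_pts[i], real_pts[j]) for i, j in edges]
i, j = edges[int(np.argmin(dists))]
print(f"closest lane: {real_names[i]} - {real_names[j]}, distance {min(dists):.2f}")
```

Looping over all rows of tracking_Liv instead of a single player would give the same measure for the whole opposition.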
The red nodes indicate Liverpool's players and the white nodes indicate Real Madrid's. The black lines indicate the direct links between the players from a particular team at a particular moment, forming the Delaunay triangulation, also called the pass triangulation. In his book Soccermatics, Dr. Sumpter mentions that these lines have two useful indications: first, they portray the availability of passes among the players from a particular team, and second, they also indicate the "no man's lines" for the players from the opposition team, meaning that if an opposition player is on one of these linking lines, then they are at a disadvantage. Beautiful implementation of computational geometry, isn't it?
For a set X of points, a Voronoi diagram partitions a 2-D Euclidean space into regions, one per point, such that each region contains the locations that are closer to its point than to any other point in X. Look at the image of a Voronoi diagram (taken from here), which is the dual of the Delaunay triangulation that is shown above.
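The duality between the two structures can be illustrated with scipy: two points share a Voronoi boundary (a ridge) exactly when they are joined by an edge in the Delaunay triangulation, provided the points are in general position. The sketch below assumes the five Real Madrid positions from the tables above and compares the two edge sets.

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

pts = np.array([[111.7, 58.7], [100.9, 50.2], [108.9, 37.9],
                [91.0, 30.3], [102.4, 40.6]])
tri = Delaunay(pts)
vor = Voronoi(pts)

# Undirected Delaunay edges, stored as sorted index pairs
delaunay_edges = {
    tuple(sorted((int(s[i]), int(s[j]))))
    for s in tri.simplices
    for i, j in ((0, 1), (1, 2), (0, 2))
}

# ridge_points lists, for every Voronoi ridge, the two input points it separates
voronoi_neighbours = {tuple(sorted((int(p), int(q)))) for p, q in vor.ridge_points}

print(delaunay_edges == voronoi_neighbours)
```

For degenerate inputs (for example four players standing exactly on a common circle) the two sets may differ, so this comparison is a sanity check rather than a guarantee.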
We use the whole data_filtered dataset, because we need the location of all the players on the pitch. We import Voronoi for computing the Voronoi diagrams and voronoi_plot_2d to plot the diagrams on a pitch.
from scipy.spatial import Voronoi, voronoi_plot_2d
We take the player locations from data_filtered, with the y coordinates flipped, and compute the Voronoi diagrams:
data_filtered['y'] = 80 - data_filtered['y']
points = data_filtered[['x', 'y']].values
vor = Voronoi(points)
# half-pitch view of the end where the shot was taken
pitch = Pitch(pitch_color='grass', stripe=True, line_color='white', view = 'half', figsize=(8,9),
              constrained_layout=True, tight_layout=False, goal_type='box')
fig, ax = pitch.draw()
# player nodes with flipped y, matching the points passed to Voronoi
plt.scatter(tracking_Real.x, 80 - tracking_Real.y, color='white', s = 1050, edgecolors='black', zorder=2)
plt.scatter(tracking_Liv.x, 80 - tracking_Liv.y, color='red', edgecolors='black', s = 1050)
# Voronoi cell boundaries for all players
pl = voronoi_plot_2d(vor, ax=ax, show_vertices=False, line_width = 8)
# jersey numbers on top of the nodes
for index, row in loc_Real.iterrows():
    pitch.annotate(jerseys_Real[loc_Real['player_name'][row.name]], xy=(row.x, 80 - row.y),
                   c ='black', va = 'center', ha = 'center', size = 15, ax = ax)
for index, row in loc_Liv.iterrows():
    pitch.annotate(jerseys_Liv[loc_Liv['player_name'][row.name]], xy=(row.x, 80 - row.y),
                   c ='black', va = 'center', ha = 'center', size = 15, ax = ax)
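Since a Voronoi cell is the set of locations nearest to one player, the area each team controls in this sense can be approximated by sampling the pitch on a grid and assigning every grid cell to its nearest player. The following is a rough sketch, assuming a 120 x 80 pitch (consistent with the `80 - y` flip above), a 1 x 1 grid over the half shown in the plot, and the 13 positions from data_filtered; it uses scipy's cKDTree for the nearest-neighbour query.

```python
import numpy as np
from scipy.spatial import cKDTree

# (x, y, team) for the 13 players in data_filtered at the selected shot
players = [
    (111.7, 58.7, 'Real Madrid'), (118.1, 45.0, 'Liverpool'), (100.8, 49.0, 'Liverpool'),
    (100.9, 50.2, 'Real Madrid'), (91.3, 28.4, 'Liverpool'), (108.9, 37.9, 'Real Madrid'),
    (105.7, 56.5, 'Liverpool'), (108.0, 50.0, 'Liverpool'), (111.7, 54.7, 'Liverpool'),
    (105.2, 35.3, 'Liverpool'), (111.8, 41.1, 'Liverpool'), (91.0, 30.3, 'Real Madrid'),
    (102.4, 40.6, 'Real Madrid'),
]
xy = np.array([(x, y) for x, y, _ in players])
teams = np.array([t for _, _, t in players])

# Centres of 1 x 1 grid cells covering the half x in [60, 120], y in [0, 80]
gx, gy = np.meshgrid(np.arange(60.5, 120, 1.0), np.arange(0.5, 80, 1.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])

# Index of the nearest player for each grid cell, i.e. its Voronoi cell
_, nearest = cKDTree(xy).query(grid)

# Number of grid cells per player approximates the clipped Voronoi cell area
area_per_player = np.bincount(nearest, minlength=len(xy))
share = {str(team): float(area_per_player[teams == team].sum() / len(grid))
         for team in np.unique(teams)}
print(share)
```

Only the players recorded at this shot are included, so the shares describe this snapshot rather than the whole match, and a finer grid would give a closer approximation of the true cell areas.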