Title: Football (soccer) data analysis: a pedagogic introduction
Author: Indranil Ghosh
Institute: School of Fundamental Sciences, Massey University
Date: TBD
This talk teaches six simple concepts to those who want to start working on football data analysis:
How to get open-access event data from StatsBomb using statsbombpy,
How to draw a soccer pitch using mplsoccer,
How to visualize a pass network for a particular team in a particular match,
How to use the NetworkX module to analyze the pass network,
How to draw pass maps along with their corresponding heat maps, and
How to apply computational geometry concepts such as convex hulls, Voronoi diagrams and Delaunay triangulations to football event and tracking data using the Python package scipy.spatial.
statsbombpy
Use pip to install statsbombpy with the following command:
pip install statsbombpy
The open data from StatsBomb can be accessed without authentication, but it is always advisable to read the Terms & Conditions section on their documentation page.
Next, import the statsbombpy package:
from statsbombpy import sb
We also import the numpy and pandas packages, which help us manipulate our datasets and perform tasks such as data cleaning and data extraction:
import numpy as np
import pandas as pd
comp = sb.competitions()
credentials were not supplied. open data access only
The first 15 rows of comp look like this:
comp.head(15)
| | competition_id | season_id | country_name | competition_name | competition_gender | season_name | match_updated | match_available |
|---|---|---|---|---|---|---|---|---|
| 0 | 16 | 4 | Europe | Champions League | male | 2018/2019 | 2021-04-19T17:36:05.724116 | 2021-04-19T17:36:05.724116 |
| 1 | 16 | 1 | Europe | Champions League | male | 2017/2018 | 2021-01-23T21:55:30.425330 | 2021-01-23T21:55:30.425330 |
| 2 | 16 | 2 | Europe | Champions League | male | 2016/2017 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
| 3 | 16 | 27 | Europe | Champions League | male | 2015/2016 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
| 4 | 16 | 26 | Europe | Champions League | male | 2014/2015 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
| 5 | 16 | 25 | Europe | Champions League | male | 2013/2014 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
| 6 | 16 | 24 | Europe | Champions League | male | 2012/2013 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
| 7 | 16 | 23 | Europe | Champions League | male | 2011/2012 | 2020-08-26T12:33:15.869622 | 2020-07-29T05:00 |
| 8 | 16 | 22 | Europe | Champions League | male | 2010/2011 | 2020-07-29T05:00 | 2020-07-29T05:00 |
| 9 | 16 | 21 | Europe | Champions League | male | 2009/2010 | 2020-07-29T05:00 | 2020-07-29T05:00 |
| 10 | 16 | 41 | Europe | Champions League | male | 2008/2009 | 2020-08-30T10:18:39.435424 | 2020-08-30T10:18:39.435424 |
| 11 | 16 | 39 | Europe | Champions League | male | 2006/2007 | 2021-03-31T04:18:30.437060 | 2021-03-31T04:18:30.437060 |
| 12 | 16 | 37 | Europe | Champions League | male | 2004/2005 | 2021-04-01T06:18:57.459032 | 2021-04-01T06:18:57.459032 |
| 13 | 16 | 44 | Europe | Champions League | male | 2003/2004 | 2021-04-01T00:34:59.472485 | 2021-04-01T00:34:59.472485 |
| 14 | 16 | 76 | Europe | Champions League | male | 1999/2000 | 2020-07-29T05:00 | 2020-07-29T05:00 |
Let us print the columns of comp to understand the dataset better and draw out relevant information from it:
print(comp.columns)
Index(['competition_id', 'season_id', 'country_name', 'competition_name',
'competition_gender', 'season_name', 'match_updated',
'match_available'],
dtype='object')
Each row of the comp dataset describes one season of one competition. For example, in the row where competition_id is 16 and season_id is 1, the country_name is Europe, the competition_name is Champions League, the season_name is 2017/2018, and so on. Suppose we want to analyze a game from the 2017/18 Champions League season. We note the competition_id and season_id in that row, which are 16 and 1 respectively, and extract the matches dataset by typing the following:
mat = sb.matches(competition_id = 16, season_id = 1)
credentials were not supplied. open data access only
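Instead of reading the ids off the table by eye, we can look them up with a boolean filter. The sketch below uses a tiny stand-in for comp, copied from three rows of the table above. In practice you would run it on the comp returned by sb.competitions(). find_ids is a hypothetical helper and is not part of statsbombpy.

```python
import pandas as pd

# Tiny stand-in for `comp`, copied from three rows of the table above.
comp = pd.DataFrame({
    "competition_id": [16, 16, 16],
    "season_id": [4, 1, 2],
    "competition_name": ["Champions League"] * 3,
    "season_name": ["2018/2019", "2017/2018", "2016/2017"],
})


def find_ids(comp, competition_name, season_name):
    """Return (competition_id, season_id) for one competition season (hypothetical helper)."""
    row = comp[(comp["competition_name"] == competition_name)
               & (comp["season_name"] == season_name)]
    if len(row) != 1:
        raise ValueError(f"expected exactly one matching row, found {len(row)}")
    return int(row["competition_id"].iloc[0]), int(row["season_id"].iloc[0])


competition_id, season_id = find_ids(comp, "Champions League", "2017/2018")
print(competition_id, season_id)  # 16 1
```

The returned ids can then be passed straight to sb.matches(competition_id=..., season_id=...).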
The mat dataset looks like this:
mat
| | match_id | match_date | kick_off | competition | season | home_team | away_team | home_score | away_score | match_status | match_status_360 | last_updated | last_updated_360 | match_week | competition_stage | stadium | referee | data_version | shot_fidelity_version | xy_fidelity_version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18245 | 2018-05-26 | 20:45:00.000 | Europe - Champions League | 2017/2018 | Real Madrid | Liverpool | 3 | 1 | available | unscheduled | 2021-01-23T21:55:30.425330 | None | 7 | Final | NSK Olimpijs'kyj | M. Mažić | 1.1.0 | 2 | 2 |
The mat dataset gives us the match ids, the match dates, the kick-off times, the home and away teams, the scores, the name of the referee who officiated the match, and so on. Here match_id is the unique id that lets us draw out the event data for a particular match from the 2017/18 Champions League season. Only one match is available, with match_id = 18245. It is the Champions League final between Real Madrid and Liverpool ⚽, played at the Olimpiyskiy National Sports Complex in Kyiv, which ended 3-1 in Real Madrid's favor 👀 👀 👀 👀. A great feat, to be honest! Let us obtain the event data for this match:
events = sb.events(match_id = 18245)
credentials were not supplied. open data access only
The events dataset, which holds the event data for this match, looks like this:
events
| | 50_50 | ball_receipt_outcome | ball_recovery_recovery_failure | block_offensive | carry_end_location | clearance_aerial_won | clearance_body_part | clearance_head | clearance_left_foot | clearance_right_foot | ... | shot_statsbomb_xg | shot_technique | shot_type | substitution_outcome | substitution_replacement | tactics | team | timestamp | type | under_pressure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i... | Real Madrid | 00:00:00.000 | Starting XI | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | 00:00:00.000 | Starting XI | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:00.000 | Half Start | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3492 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:42:21.211 | Offside | NaN |
| 3493 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:48:31.725 | Half End | NaN |
| 3494 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:48:31.725 | Half End | NaN |
| 3495 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:48:02.893 | Half End | NaN |
| 3496 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:48:02.893 | Half End | NaN |
3497 rows × 86 columns
print(events.columns)
Index(['50_50', 'ball_receipt_outcome', 'ball_recovery_recovery_failure',
'block_offensive', 'carry_end_location', 'clearance_aerial_won',
'clearance_body_part', 'clearance_head', 'clearance_left_foot',
'clearance_right_foot', 'counterpress', 'dribble_nutmeg',
'dribble_outcome', 'dribble_overrun', 'duel_outcome', 'duel_type',
'duration', 'foul_committed_advantage', 'foul_committed_card',
'foul_committed_type', 'foul_won_advantage', 'foul_won_defensive',
'goalkeeper_body_part', 'goalkeeper_end_location', 'goalkeeper_outcome',
'goalkeeper_position', 'goalkeeper_punched_out', 'goalkeeper_technique',
'goalkeeper_type', 'id', 'index', 'injury_stoppage_in_chain',
'interception_outcome', 'location', 'match_id', 'minute', 'off_camera',
'out', 'pass_aerial_won', 'pass_angle', 'pass_assisted_shot_id',
'pass_body_part', 'pass_cross', 'pass_cut_back', 'pass_end_location',
'pass_goal_assist', 'pass_height', 'pass_inswinging', 'pass_length',
'pass_miscommunication', 'pass_outcome', 'pass_outswinging',
'pass_recipient', 'pass_shot_assist', 'pass_straight', 'pass_switch',
'pass_technique', 'pass_through_ball', 'pass_type', 'period',
'play_pattern', 'player', 'position', 'possession', 'possession_team',
'related_events', 'second', 'shot_aerial_won', 'shot_body_part',
'shot_end_location', 'shot_first_time', 'shot_freeze_frame',
'shot_key_pass_id', 'shot_one_on_one', 'shot_outcome', 'shot_redirect',
'shot_statsbomb_xg', 'shot_technique', 'shot_type',
'substitution_outcome', 'substitution_replacement', 'tactics', 'team',
'timestamp', 'type', 'under_pressure'],
dtype='object')
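With 86 columns, it helps to see which event types occur before going further. The sketch below uses a toy stand-in for events: two of its columns with made-up rows. It applies value_counts() and groupby().size(), and the same two calls work unchanged on the real events dataframe.

```python
import pandas as pd

# Toy stand-in for a handful of rows of `events` (only two of its 86 columns).
events = pd.DataFrame({
    "team": ["Real Madrid", "Liverpool", "Liverpool", "Real Madrid", "Real Madrid"],
    "type": ["Starting XI", "Starting XI", "Pass", "Pass", "Pass"],
})

# How often each event type occurs in the match.
type_counts = events["type"].value_counts()
print(type_counts)

# The same counts broken down by team; the result has a (team, type) MultiIndex.
team_type_counts = events.groupby(["team", "type"]).size()
print(team_type_counts)
```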
Drawing a soccer pitch using mplsoccer
If you do not want to recreate a football pitch manually in Python (which would be rather tedious), you can simply use the mplsoccer package. To my knowledge it provides the best functionality for drawing a football pitch. The package is maintained by Anmol Durgapal and Andrew Rowlinson.
Keep in mind that mplsoccer can produce far more advanced visualizations than just a football pitch, and we will encounter them in later posts. For now, let us focus on drawing a pitch in the simplest way possible. First we need to pip install the package:
pip install mplsoccer
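mplsoccer requires Python 3.6+. The examples here use the pre-1.0 mplsoccer API. In mplsoccer 1.0 and later, the layout arguments (constrained_layout, tight_layout) moved to pitch.draw(), vertical pitches use the VerticalPitch class, and half pitches use half=True. Next we import matplotlib and the Pitch class:
import matplotlib.pyplot as plt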
from mplsoccer.pitch import Pitch
pitch = Pitch(pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
Here we set the pitch_color argument to 'grass', which gives the impression of a real-life football pitch. Any other color can be used instead, for example 'black' or any color given by its hex code. Leaving out the stripe argument removes the darker stripes on the pitch. The line_color argument is self-explanatory, and it can also be changed as needed. By default, the axis, the labels and the ticks showing the scales are switched off. They can be turned on by setting the label, axis and tick arguments to True, as in the pitch above. Let us draw a different pitch with its color changed and the stripes removed:
pitch = Pitch(pitch_color='black', line_color = 'white', constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
Now let us focus on the axis ranges for a moment. By default, Pitch uses the 'statsbomb' pitch type. In this type the x-axis ranges from 0 to 120, and the y-axis is inverted, running from 80 at the bottom to 0 at the top. Since we will mostly be working with StatsBomb data, these axis orientations will rarely be a concern. Nevertheless, this is very useful to keep in mind in case we deal with football data from other sources.
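When combining StatsBomb locations with data on a different scale, the inverted y-axis is easy to forget. Below is a minimal sketch of a linear rescale to metres with the origin at the bottom-left corner. The 105 x 68 m pitch size is an assumption, and statsbomb_to_metres is a hypothetical helper, not an mplsoccer function.

```python
def statsbomb_to_metres(x, y, length=105.0, width=68.0):
    """Linearly rescale a StatsBomb location to metres.

    StatsBomb's x runs 0-120 and its y runs 0-80 from the top of the pitch
    (the inverted y-axis mentioned above). The result puts the origin at the
    bottom-left corner. The 105 x 68 m default is an assumed pitch size; real
    pitches vary, and a linear rescale does not preserve the exact positions
    of the pitch markings.
    """
    x_m = x * length / 120.0
    y_m = (80.0 - y) * width / 80.0  # flip y so it grows upwards
    return x_m, y_m


print(statsbomb_to_metres(60.0, 40.0))   # centre spot -> (52.5, 34.0)
print(statsbomb_to_metres(0.0, 0.0))     # top-left corner -> (0.0, 68.0)
```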
To be precise, mplsoccer provides eight different pitch types: 'statsbomb', 'opta', 'tracab', 'skillcorner', 'wyscout', 'metricasports', 'uefa' and 'custom'. The type is set with the pitch_type argument of Pitch. Let us check the orientation of the uefa pitch type:
pitch = Pitch(pitch_color='grass', stripe = True, pitch_type = 'uefa', line_color = 'white', constrained_layout = True,
tight_layout = False, goal_type = 'box', label = True, axis = True, tick = True)
fig, ax = pitch.draw()
plt.show()
To draw a vertical pitch, we use the orientation argument and set it to 'vertical':
pitch = Pitch(orientation = 'vertical', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
To draw only half of the pitch, we set the view argument to 'half':
pitch = Pitch(view = 'half', pitch_color = 'grass', line_color = 'white', stripe = True, constrained_layout = True,
tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
plt.show()
That covers the basics of drawing a pitch with mplsoccer. The pitches can be customized further to meet the user's visualization needs, and the mplsoccer documentation explains how. In the next section, we will learn how to visualize a pass network for a particular team in a match. We will then analyze that network with the NetworkX Python package, which lets us apply basic concepts from the complex network analysis literature to deduce some interesting properties of the network. Install it with:
pip install networkx
Then import networkx:
import networkx as nx
We also pip install seaborn, a Python package built on matplotlib for producing informative and appealing statistical graphics:
pip install seaborn
and import it too:
import seaborn as sns
Looking at the events dataset, we notice a column named tactics that gives us the lineups, formations, player ids and jersey numbers for both teams. The corresponding value in the type column tells us whether the row describes the starting XI formation, a tactical shift or some other change in the team. Let us build a new dataset containing only the tactics, team and type columns, keeping just the rows where tactics is not NaN:
tact = events[events['tactics'].notna()]
tact = tact[['tactics', 'team', 'type']]
The tact dataset looks like this:
tact
| | tactics | team | type |
|---|---|---|---|
| 0 | {'formation': 41212, 'lineup': [{'player': {'i... | Real Madrid | Starting XI |
| 1 | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | Starting XI |
| 3489 | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | Tactical Shift |
| 3490 | {'formation': 433, 'lineup': [{'player': {'id'... | Real Madrid | Tactical Shift |
| 3491 | {'formation': 433, 'lineup': [{'player': {'id'... | Real Madrid | Tactical Shift |
The first two rows of the type column in tact are set to 'Starting XI', one for each team. Let us keep only those rows and then fetch each team's data separately:
tact = tact[tact['type'] == 'Starting XI']
tact_Real = tact[tact['team'] == 'Real Madrid']
tact_Liv = tact[tact['team'] == 'Liverpool']
tact_Real = tact_Real['tactics']
tact_Liv = tact_Liv['tactics']
After selecting the tactics column, tact_Real and tact_Liv are single-row pandas Series, and each value is a Python dict. For now we are only interested in its 'lineup' key, which holds the information about each team's players. We take the single value by position with iloc[0], which works regardless of the row's index label (0 for Real Madrid, 1 for Liverpool):
dict_Real = tact_Real.iloc[0]['lineup']
dict_Liv = tact_Liv.iloc[0]['lineup']
Each 'lineup' value is a list of dictionaries, one per player. We use pandas' DataFrame.from_dict() to convert it into a dataframe:
lineup_Real = pd.DataFrame.from_dict(dict_Real)
lineup_Real
| | player | position | jersey_number |
|---|---|---|---|
| 0 | {'id': 5597, 'name': 'Keylor Navas Gamboa'} | {'id': 1, 'name': 'Goalkeeper'} | 1 |
| 1 | {'id': 5721, 'name': 'Daniel Carvajal Ramos'} | {'id': 2, 'name': 'Right Back'} | 2 |
| 2 | {'id': 5485, 'name': 'Raphaël Varane'} | {'id': 3, 'name': 'Right Center Back'} | 5 |
| 3 | {'id': 5201, 'name': 'Sergio Ramos García'} | {'id': 5, 'name': 'Left Center Back'} | 4 |
| 4 | {'id': 5552, 'name': 'Marcelo Vieira da Silva ... | {'id': 6, 'name': 'Left Back'} | 12 |
| 5 | {'id': 5539, 'name': 'Carlos Henrique Casimiro'} | {'id': 10, 'name': 'Center Defensive Midfield'} | 14 |
| 6 | {'id': 5463, 'name': 'Luka Modrić'} | {'id': 13, 'name': 'Right Center Midfield'} | 10 |
| 7 | {'id': 5574, 'name': 'Toni Kroos'} | {'id': 15, 'name': 'Left Center Midfield'} | 8 |
| 8 | {'id': 4926, 'name': 'Francisco Román Alarcón ... | {'id': 19, 'name': 'Center Attacking Midfield'} | 22 |
| 9 | {'id': 19677, 'name': 'Karim Benzema'} | {'id': 22, 'name': 'Right Center Forward'} | 9 |
| 10 | {'id': 5207, 'name': 'Cristiano Ronaldo dos Sa... | {'id': 24, 'name': 'Left Center Forward'} | 7 |
lineup_Liv = pd.DataFrame.from_dict(dict_Liv)
lineup_Liv
| | player | position | jersey_number |
|---|---|---|---|
| 0 | {'id': 3630, 'name': 'Loris Karius'} | {'id': 1, 'name': 'Goalkeeper'} | 1 |
| 1 | {'id': 3664, 'name': 'Trent Alexander-Arnold'} | {'id': 2, 'name': 'Right Back'} | 66 |
| 2 | {'id': 3471, 'name': 'Dejan Lovren'} | {'id': 3, 'name': 'Right Center Back'} | 6 |
| 3 | {'id': 3669, 'name': 'Virgil van Dijk'} | {'id': 5, 'name': 'Left Center Back'} | 4 |
| 4 | {'id': 3655, 'name': 'Andrew Robertson'} | {'id': 6, 'name': 'Left Back'} | 26 |
| 5 | {'id': 3532, 'name': 'Jordan Brian Henderson'} | {'id': 10, 'name': 'Center Defensive Midfield'} | 14 |
| 6 | {'id': 3567, 'name': 'Georginio Wijnaldum'} | {'id': 13, 'name': 'Right Center Midfield'} | 5 |
| 7 | {'id': 3473, 'name': 'James Philip Milner'} | {'id': 15, 'name': 'Left Center Midfield'} | 7 |
| 8 | {'id': 3531, 'name': 'Mohamed Salah'} | {'id': 17, 'name': 'Right Wing'} | 11 |
| 9 | {'id': 3629, 'name': 'Sadio Mané'} | {'id': 21, 'name': 'Left Wing'} | 19 |
| 10 | {'id': 3535, 'name': 'Roberto Firmino Barbosa ... | {'id': 23, 'name': 'Center Forward'} | 9 |
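The player and position columns above still hold nested dictionaries. As an alternative, pandas' json_normalize (available as pd.json_normalize since pandas 1.0) flattens the nested keys into dotted column names. This is a sketch of that option, not the approach this talk follows. The two entries below are copied from Real Madrid's lineup:

```python
import pandas as pd

# Two entries in the same shape as the 'lineup' list extracted above.
lineup = [
    {"player": {"id": 5597, "name": "Keylor Navas Gamboa"},
     "position": {"id": 1, "name": "Goalkeeper"}, "jersey_number": 1},
    {"player": {"id": 5721, "name": "Daniel Carvajal Ramos"},
     "position": {"id": 2, "name": "Right Back"}, "jersey_number": 2},
]

# Nested dict keys become dotted column names such as 'player.name'.
flat = pd.json_normalize(lineup)
print(flat[["player.name", "position.name", "jersey_number"]])
```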
players_Real = {}
for i in range(len(lineup_Real)):
key = lineup_Real.player[i]['name']
val = lineup_Real.jersey_number[i]
players_Real[key] = str(val)
print(players_Real)
{'Keylor Navas Gamboa': '1', 'Daniel Carvajal Ramos': '2', 'Raphaël Varane': '5', 'Sergio Ramos García': '4', 'Marcelo Vieira da Silva Júnior': '12', 'Carlos Henrique Casimiro': '14', 'Luka Modrić': '10', 'Toni Kroos': '8', 'Francisco Román Alarcón Suárez': '22', 'Karim Benzema': '9', 'Cristiano Ronaldo dos Santos Aveiro': '7'}
players_Liv = {}
for i in range(len(lineup_Liv)):
key = lineup_Liv.player[i]['name']
val = lineup_Liv.jersey_number[i]
players_Liv[key] = str(val)
print(players_Liv)
{'Loris Karius': '1', 'Trent Alexander-Arnold': '66', 'Dejan Lovren': '6', 'Virgil van Dijk': '4', 'Andrew Robertson': '26', 'Jordan Brian Henderson': '14', 'Georginio Wijnaldum': '5', 'James Philip Milner': '7', 'Mohamed Salah': '11', 'Sadio Mané': '19', 'Roberto Firmino Barbosa de Oliveira': '9'}
So we have collected the names and jersey numbers of each team's starting XI in two dictionaries, players_Real and players_Liv. These will come in handy later!
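The loops that build players_Real and players_Liv can also be written as a single dict comprehension directly over the lineup list (dict_Real or dict_Liv), which skips the intermediate dataframe. A sketch with two entries copied from Liverpool's lineup:

```python
# Same shape as the 'lineup' list extracted from the tactics column.
lineup = [
    {"player": {"id": 3630, "name": "Loris Karius"},
     "position": {"id": 1, "name": "Goalkeeper"}, "jersey_number": 1},
    {"player": {"id": 3664, "name": "Trent Alexander-Arnold"},
     "position": {"id": 2, "name": "Right Back"}, "jersey_number": 66},
]

# Map each player's name to his jersey number, stored as a string like players_Liv.
players = {entry["player"]["name"]: str(entry["jersey_number"]) for entry in lineup}
print(players)  # {'Loris Karius': '1', 'Trent Alexander-Arnold': '66'}
```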
Next, we extract the columns of the events dataset that are relevant to our pass network analysis:
events_pn = events[['minute', 'second', 'team', 'type', 'location', 'pass_end_location', 'pass_outcome', 'player']]
The first 10 rows of the events_pn dataframe:
events_pn.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | Real Madrid | Starting XI | NaN | NaN | NaN | NaN |
| 1 | 0 | 0 | Liverpool | Starting XI | NaN | NaN | NaN | NaN |
| 2 | 0 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
| 3 | 0 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
| 4 | 45 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
| 5 | 45 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
| 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
| 7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
| 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
| 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
The last 10 rows of the events_pn dataframe:
events_pn.tail(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 3487 | 82 | 27 | Liverpool | Substitution | NaN | NaN | NaN | James Philip Milner |
| 3488 | 88 | 21 | Real Madrid | Substitution | NaN | NaN | NaN | Karim Benzema |
| 3489 | 31 | 41 | Liverpool | Tactical Shift | NaN | NaN | NaN | NaN |
| 3490 | 61 | 1 | Real Madrid | Tactical Shift | NaN | NaN | NaN | NaN |
| 3491 | 88 | 34 | Real Madrid | Tactical Shift | NaN | NaN | NaN | NaN |
| 3492 | 42 | 21 | Real Madrid | Offside | [114.8, 41.4] | NaN | NaN | Karim Benzema |
| 3493 | 48 | 31 | Real Madrid | Half End | NaN | NaN | NaN | NaN |
| 3494 | 48 | 31 | Liverpool | Half End | NaN | NaN | NaN | NaN |
| 3495 | 93 | 2 | Liverpool | Half End | NaN | NaN | NaN | NaN |
| 3496 | 93 | 2 | Real Madrid | Half End | NaN | NaN | NaN | NaN |
events_Real = events_pn[events_pn['team'] == 'Real Madrid']
events_Liv = events_pn[events_pn['team'] == 'Liverpool']
Let us view the first 10 rows of both datasets:
events_Real.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | Real Madrid | Starting XI | NaN | NaN | NaN | NaN |
| 2 | 0 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
| 5 | 45 | 0 | Real Madrid | Half Start | NaN | NaN | NaN | NaN |
| 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
| 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
| 10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos |
| 11 | 0 | 15 | Real Madrid | Pass | [36.2, 75.3] | [43.6, 62.0] | Incomplete | Carlos Henrique Casimiro |
| 16 | 0 | 25 | Real Madrid | Pass | [14.7, 23.2] | [56.7, 6.2] | Incomplete | Sergio Ramos García |
| 17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior |
| 18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro |
events_Liv.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | Liverpool | Starting XI | NaN | NaN | NaN | NaN |
| 3 | 0 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
| 4 | 45 | 0 | Liverpool | Half Start | NaN | NaN | NaN | NaN |
| 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
| 7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
| 12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson |
| 13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané |
| 14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira |
| 15 | 0 | 22 | Liverpool | Pass | [92.2, 50.9] | [109.7, 46.4] | Incomplete | Mohamed Salah |
| 25 | 1 | 7 | Liverpool | Pass | [42.0, 75.9] | [115.6, 59.3] | Incomplete | Trent Alexander-Arnold |
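Now we keep only the rows whose type is Pass. Taking a .copy() makes these independent dataframes, so adding columns to them below does not trigger pandas' SettingWithCopyWarning:
events_pn_Real = events_Real[events_Real['type'] == 'Pass'].copy()
events_pn_Liv = events_Liv[events_Liv['type'] == 'Pass'].copy()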
events_pn_Real.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
| 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
| 10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos |
| 11 | 0 | 15 | Real Madrid | Pass | [36.2, 75.3] | [43.6, 62.0] | Incomplete | Carlos Henrique Casimiro |
| 16 | 0 | 25 | Real Madrid | Pass | [14.7, 23.2] | [56.7, 6.2] | Incomplete | Sergio Ramos García |
| 17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior |
| 18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro |
| 19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos |
| 20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane |
| 21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García |
events_pn_Liv.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
| 7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
| 12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson |
| 13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané |
| 14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira |
| 15 | 0 | 22 | Liverpool | Pass | [92.2, 50.9] | [109.7, 46.4] | Incomplete | Mohamed Salah |
| 25 | 1 | 7 | Liverpool | Pass | [42.0, 75.9] | [115.6, 59.3] | Incomplete | Trent Alexander-Arnold |
| 37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk |
| 38 | 2 | 3 | Liverpool | Pass | [43.2, 2.8] | [50.1, 4.8] | Incomplete | Andrew Robertson |
| 39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson |
Look at the second and third rows of the events_pn_Real dataset (index labels 9 and 10). Luka Modrić makes a pass at 0 min 10 s (second row), and Daniel Carvajal Ramos, who receives it, makes the next pass at 0 min 11 s (third row). In other words, the receiver of a pass is usually the player who makes the team's next pass. So in both datasets we add two extra columns, pass_maker and pass_receiver. The pass_maker column is a copy of the player column, and pass_receiver is the player column shifted up by one row, which pandas' shift(-1) does for us. This is an approximation: if a pass is incomplete, or the receiver loses the ball before passing, the team's next pass is made by someone else (see Andrew Robertson apparently passing to himself in the Liverpool table below). We perform this operation on both events_pn_Real and events_pn_Liv:
events_pn_Real['pass_maker'] = events_pn_Real['player']
events_pn_Real['pass_receiver'] = events_pn_Real['player'].shift(-1)
events_pn_Liv['pass_maker'] = events_pn_Liv['player']
events_pn_Liv['pass_receiver'] = events_pn_Liv['player'].shift(-1)
events_pn_Real.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić |
| 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos |
| 10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro |
| 11 | 0 | 15 | Real Madrid | Pass | [36.2, 75.3] | [43.6, 62.0] | Incomplete | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Sergio Ramos García |
| 16 | 0 | 25 | Real Madrid | Pass | [14.7, 23.2] | [56.7, 6.2] | Incomplete | Sergio Ramos García | Sergio Ramos García | Marcelo Vieira da Silva Júnior |
| 17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro |
| 18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos |
| 19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos | Toni Kroos | Raphaël Varane |
| 20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García |
| 21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro |
events_pn_Liv.head(10)
| | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner | James Philip Milner | Dejan Lovren |
| 7 | 0 | 3 | Liverpool | Pass | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren | Dejan Lovren | Jordan Brian Henderson |
| 12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané |
| 13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira |
| 14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah |
| 15 | 0 | 22 | Liverpool | Pass | [92.2, 50.9] | [109.7, 46.4] | Incomplete | Mohamed Salah | Mohamed Salah | Trent Alexander-Arnold |
| 25 | 1 | 7 | Liverpool | Pass | [42.0, 75.9] | [115.6, 59.3] | Incomplete | Trent Alexander-Arnold | Trent Alexander-Arnold | Virgil van Dijk |
| 37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson |
| 38 | 2 | 3 | Liverpool | Pass | [43.2, 2.8] | [50.1, 4.8] | Incomplete | Andrew Robertson | Andrew Robertson | Andrew Robertson |
| 39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner |
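To see what shift(-1) does, and where the next-passer heuristic can go wrong, here is a small sketch on toy data. The names come from the table above, but the outcomes and pass_recipient values are made up for illustration. The events column list shown earlier also contains a pass_recipient column. Comparing it with the shifted column, as below, is one way to check the heuristic on real data.

```python
import pandas as pd

# Toy, made-up sequence of one team's passes in match order.
passes = pd.DataFrame({
    "player": ["Virgil van Dijk", "Andrew Robertson", "Andrew Robertson", "James Philip Milner"],
    "pass_outcome": [None, "Incomplete", None, None],
    "pass_recipient": ["Andrew Robertson", None, "James Philip Milner", "Virgil van Dijk"],
})

# Heuristic from the text: the receiver is whoever makes the team's next pass.
passes["pass_maker"] = passes["player"]
passes["pass_receiver"] = passes["player"].shift(-1)  # last row gets NaN

# Rows where the heuristic disagrees with the recorded recipient:
# the incomplete pass (row 1) and the final pass with no successor (row 3).
mismatch = passes[passes["pass_receiver"] != passes["pass_recipient"]]
print(mismatch[["pass_maker", "pass_receiver", "pass_recipient"]])
```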
The rows where pass_outcome is NaN are the successful passes, because StatsBomb fills pass_outcome only for unsuccessful ones (such as 'Incomplete'). We therefore filter both datasets down to successful passes and reset the index:
events_pn_Real = events_pn_Real[events_pn_Real['pass_outcome'].isnull()].reset_index()
events_pn_Liv = events_pn_Liv[events_pn_Liv['pass_outcome'].isnull()].reset_index()
events_pn_Real.head(10)
| | index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić |
| 1 | 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos |
| 2 | 10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro |
| 3 | 17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro |
| 4 | 18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos |
| 5 | 19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos | Toni Kroos | Raphaël Varane |
| 6 | 20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García |
| 7 | 21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro |
| 8 | 22 | 0 | 58 | Real Madrid | Pass | [64.5, 11.1] | [54.2, 5.6] | NaN | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior |
| 9 | 23 | 0 | 59 | Real Madrid | Pass | [55.3, 5.5] | [83.9, 4.3] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Karim Benzema |
events_pn_Liv.head(10)
| | index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner | James Philip Milner | Dejan Lovren |
| 1 | 12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané |
| 2 | 13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira |
| 3 | 14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah |
| 4 | 37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson |
| 5 | 39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner |
| 6 | 40 | 2 | 10 | Liverpool | Pass | [45.5, 4.0] | [27.4, 16.8] | NaN | James Philip Milner | James Philip Milner | Virgil van Dijk |
| 7 | 41 | 2 | 13 | Liverpool | Pass | [26.7, 19.6] | [27.8, 47.3] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
| 8 | 42 | 2 | 16 | Liverpool | Pass | [28.0, 45.4] | [28.4, 21.4] | NaN | Dejan Lovren | Dejan Lovren | Virgil van Dijk |
| 9 | 43 | 2 | 19 | Liverpool | Pass | [30.4, 25.7] | [30.7, 52.9] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
So we have been able to clean and reshape the datasets. We now focus on building the pass network among the players in each team's starting eleven, so we will discard the pass events that took place after each team's first substitution. Let us find the minute and second of the first substitution for both Real Madrid and Liverpool.
To do so, let us filter the datasets events_Real and events_Liv, keeping only the events whose type is Substitution. This tells us when each team made its first substitution.
substitution_Real = events_Real[events_Real['type'] == 'Substitution']
substitution_Liv = events_Liv[events_Liv['type'] == 'Substitution']
substitution_Real
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 3485 | 36 | 17 | Real Madrid | Substitution | NaN | NaN | NaN | Daniel Carvajal Ramos |
| 3486 | 60 | 56 | Real Madrid | Substitution | NaN | NaN | NaN | Francisco Román Alarcón Suárez |
| 3488 | 88 | 21 | Real Madrid | Substitution | NaN | NaN | NaN | Karim Benzema |
substitution_Liv
| | minute | second | team | type | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 3484 | 29 | 39 | Liverpool | Substitution | NaN | NaN | NaN | Mohamed Salah |
| 3487 | 82 | 27 | Liverpool | Substitution | NaN | NaN | NaN | James Philip Milner |
From the tables above, the first substitution for Real Madrid takes place at minute 36, second 17, whereas for Liverpool it takes place at minute 29, second 39. Let us extract these values with a few lines of Python:
substitution_Real_minute = np.min(substitution_Real['minute'])
substitution_Real_minute_data = substitution_Real[substitution_Real['minute'] == substitution_Real_minute]
substitution_Real_second = np.min(substitution_Real_minute_data['second'])
print("minute =", substitution_Real_minute, "second =", substitution_Real_second)
minute = 36 second = 17
substitution_Liv_minute = np.min(substitution_Liv['minute'])
substitution_Liv_minute_data = substitution_Liv[substitution_Liv['minute'] == substitution_Liv_minute]
substitution_Liv_second = np.min(substitution_Liv_minute_data['second'])
print("minute =", substitution_Liv_minute, "second =", substitution_Liv_second)
minute = 29 second = 39
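# Keep passes up to the end of the minute of each team's first substitution;
# .copy() avoids SettingWithCopyWarning when new columns are added below
events_pn_Real = events_pn_Real[events_pn_Real['minute'] <= substitution_Real_minute].copy()
events_pn_Liv = events_pn_Liv[events_pn_Liv['minute'] <= substitution_Liv_minute].copy()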
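The minute-based filter above keeps every pass up to the end of the substitution minute, so passes made in the remaining seconds of that minute (for example, after 36:17 for Real Madrid) are still included. A stricter cutoff can be sketched by converting each (minute, second) pair into elapsed seconds. The helper name `before_first_substitution` is hypothetical, and the sketch assumes StatsBomb-style integer `minute` and `second` columns; it ignores the `period` column, which is harmless here because both first substitutions came in the first half.

```python
import pandas as pd


def before_first_substitution(passes: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    """Keep only the passes made strictly before a team's first substitution.

    Hypothetical helper: assumes integer 'minute' and 'second' columns in both
    frames and at least one row of type 'Substitution' in `events`.
    """
    subs = events[events['type'] == 'Substitution']
    # Elapsed seconds let a single comparison order (minute, second) pairs
    sub_time = (subs['minute'] * 60 + subs['second']).min()
    pass_time = passes['minute'] * 60 + passes['second']
    # .copy() so that adding columns later does not warn about chained assignment
    return passes[pass_time < sub_time].copy()


# Toy usage: three events in minute 36, with the substitution at 36:17
toy = pd.DataFrame({'type': ['Pass', 'Substitution', 'Pass'],
                    'minute': [36, 36, 36],
                    'second': [10, 17, 50]})
kept = before_first_substitution(toy[toy['type'] == 'Pass'], toy)
print(kept['second'].tolist())  # [10]
```

In the notebook this would be called as `before_first_substitution(events_pn_Real, events_Real)`, and likewise for Liverpool; the averages and counts further down could then shift slightly.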
events_pn_Real.head(10)
| | index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | 0 | 8 | Real Madrid | Pass | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić |
| 1 | 9 | 0 | 10 | Real Madrid | Pass | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos |
| 2 | 10 | 0 | 11 | Real Madrid | Pass | [22.3, 76.6] | [33.4, 68.0] | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro |
| 3 | 17 | 0 | 40 | Real Madrid | Pass | [57.5, 4.6] | [49.2, 15.6] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro |
| 4 | 18 | 0 | 43 | Real Madrid | Pass | [48.8, 18.4] | [49.8, 12.5] | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos |
| 5 | 19 | 0 | 46 | Real Madrid | Pass | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos | Toni Kroos | Raphaël Varane |
| 6 | 20 | 0 | 52 | Real Madrid | Pass | [41.3, 54.8] | [34.4, 40.2] | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García |
| 7 | 21 | 0 | 55 | Real Madrid | Pass | [39.1, 36.5] | [65.4, 13.1] | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro |
| 8 | 22 | 0 | 58 | Real Madrid | Pass | [64.5, 11.1] | [54.2, 5.6] | NaN | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior |
| 9 | 23 | 0 | 59 | Real Madrid | Pass | [55.3, 5.5] | [83.9, 4.3] | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Karim Benzema |
events_pn_Liv.head(10)
| | index | minute | second | team | type | location | pass_end_location | pass_outcome | player | pass_maker | pass_receiver |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 0 | Liverpool | Pass | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner | James Philip Milner | Dejan Lovren |
| 1 | 12 | 0 | 16 | Liverpool | Pass | [76.5, 18.1] | [84.8, 9.5] | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané |
| 2 | 13 | 0 | 18 | Liverpool | Pass | [84.4, 10.0] | [92.5, 19.1] | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira |
| 3 | 14 | 0 | 19 | Liverpool | Pass | [91.6, 21.3] | [90.6, 50.7] | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah |
| 4 | 37 | 2 | 0 | Liverpool | Pass | [9.9, 39.1] | [28.1, 4.2] | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson |
| 5 | 39 | 2 | 7 | Liverpool | Pass | [53.2, 0.1] | [50.0, 4.0] | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner |
| 6 | 40 | 2 | 10 | Liverpool | Pass | [45.5, 4.0] | [27.4, 16.8] | NaN | James Philip Milner | James Philip Milner | Virgil van Dijk |
| 7 | 41 | 2 | 13 | Liverpool | Pass | [26.7, 19.6] | [27.8, 47.3] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
| 8 | 42 | 2 | 16 | Liverpool | Pass | [28.0, 45.4] | [28.4, 21.4] | NaN | Dejan Lovren | Dejan Lovren | Virgil van Dijk |
| 9 | 43 | 2 | 19 | Liverpool | Pass | [30.4, 25.7] | [30.7, 52.9] | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren |
Next, we split each of the location and pass_end_location columns into two coordinate columns: location becomes pass_maker_x and pass_maker_y, and pass_end_location becomes pass_receiver_x and pass_receiver_y.
Let us manipulate the dataset for Real Madrid first:
# Split the [x, y] lists into separate columns; passing index= keeps each
# new row aligned with its original row even if the index has gaps
Loc = events_pn_Real['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['pass_maker_x', 'pass_maker_y'], index=Loc.index)
Loc_end = events_pn_Real['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_receiver_x', 'pass_receiver_y'], index=Loc_end.index)
events_pn_Real['pass_maker_x'] = Loc['pass_maker_x']
events_pn_Real['pass_maker_y'] = Loc['pass_maker_y']
events_pn_Real['pass_receiver_x'] = Loc_end['pass_receiver_x']
events_pn_Real['pass_receiver_y'] = Loc_end['pass_receiver_y']
events_pn_Real = events_pn_Real[['index', 'minute', 'second', 'team', 'type', 'pass_outcome',
                                 'player', 'pass_maker', 'pass_receiver', 'pass_maker_x',
                                 'pass_maker_y', 'pass_receiver_x', 'pass_receiver_y']]
events_pn_Real.head(10)
| | index | minute | second | team | type | pass_outcome | player | pass_maker | pass_receiver | pass_maker_x | pass_maker_y | pass_receiver_x | pass_receiver_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | 0 | 8 | Real Madrid | Pass | NaN | Raphaël Varane | Raphaël Varane | Luka Modrić | 27.4 | 60.2 | 36.1 | 71.6 |
| 1 | 9 | 0 | 10 | Real Madrid | Pass | NaN | Luka Modrić | Luka Modrić | Daniel Carvajal Ramos | 35.3 | 75.4 | 22.4 | 76.6 |
| 2 | 10 | 0 | 11 | Real Madrid | Pass | NaN | Daniel Carvajal Ramos | Daniel Carvajal Ramos | Carlos Henrique Casimiro | 22.3 | 76.6 | 33.4 | 68.0 |
| 3 | 17 | 0 | 40 | Real Madrid | Pass | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Carlos Henrique Casimiro | 57.5 | 4.6 | 49.2 | 15.6 |
| 4 | 18 | 0 | 43 | Real Madrid | Pass | NaN | Carlos Henrique Casimiro | Carlos Henrique Casimiro | Toni Kroos | 48.8 | 18.4 | 49.8 | 12.5 |
| 5 | 19 | 0 | 46 | Real Madrid | Pass | NaN | Toni Kroos | Toni Kroos | Raphaël Varane | 48.8 | 13.9 | 36.1 | 56.3 |
| 6 | 20 | 0 | 52 | Real Madrid | Pass | NaN | Raphaël Varane | Raphaël Varane | Sergio Ramos García | 41.3 | 54.8 | 34.4 | 40.2 |
| 7 | 21 | 0 | 55 | Real Madrid | Pass | NaN | Sergio Ramos García | Sergio Ramos García | Cristiano Ronaldo dos Santos Aveiro | 39.1 | 36.5 | 65.4 | 13.1 |
| 8 | 22 | 0 | 58 | Real Madrid | Pass | NaN | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 64.5 | 11.1 | 54.2 | 5.6 |
| 9 | 23 | 0 | 59 | Real Madrid | Pass | NaN | Marcelo Vieira da Silva Júnior | Marcelo Vieira da Silva Júnior | Karim Benzema | 55.3 | 5.5 | 83.9 | 4.3 |
Now the same steps for Liverpool:
Loc = events_pn_Liv['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['pass_maker_x', 'pass_maker_y'], index=Loc.index)
Loc_end = events_pn_Liv['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_receiver_x', 'pass_receiver_y'], index=Loc_end.index)
events_pn_Liv['pass_maker_x'] = Loc['pass_maker_x']
events_pn_Liv['pass_maker_y'] = Loc['pass_maker_y']
events_pn_Liv['pass_receiver_x'] = Loc_end['pass_receiver_x']
events_pn_Liv['pass_receiver_y'] = Loc_end['pass_receiver_y']
events_pn_Liv = events_pn_Liv[['index', 'minute', 'second', 'team', 'type', 'pass_outcome',
                               'player', 'pass_maker', 'pass_receiver', 'pass_maker_x',
                               'pass_maker_y', 'pass_receiver_x', 'pass_receiver_y']]
events_pn_Liv.head(10)
| | index | minute | second | team | type | pass_outcome | player | pass_maker | pass_receiver | pass_maker_x | pass_maker_y | pass_receiver_x | pass_receiver_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 0 | Liverpool | Pass | NaN | James Philip Milner | James Philip Milner | Dejan Lovren | 60.0 | 40.0 | 32.1 | 41.2 |
| 1 | 12 | 0 | 16 | Liverpool | Pass | NaN | Jordan Brian Henderson | Jordan Brian Henderson | Sadio Mané | 76.5 | 18.1 | 84.8 | 9.5 |
| 2 | 13 | 0 | 18 | Liverpool | Pass | NaN | Sadio Mané | Sadio Mané | Roberto Firmino Barbosa de Oliveira | 84.4 | 10.0 | 92.5 | 19.1 |
| 3 | 14 | 0 | 19 | Liverpool | Pass | NaN | Roberto Firmino Barbosa de Oliveira | Roberto Firmino Barbosa de Oliveira | Mohamed Salah | 91.6 | 21.3 | 90.6 | 50.7 |
| 4 | 37 | 2 | 0 | Liverpool | Pass | NaN | Virgil van Dijk | Virgil van Dijk | Andrew Robertson | 9.9 | 39.1 | 28.1 | 4.2 |
| 5 | 39 | 2 | 7 | Liverpool | Pass | NaN | Andrew Robertson | Andrew Robertson | James Philip Milner | 53.2 | 0.1 | 50.0 | 4.0 |
| 6 | 40 | 2 | 10 | Liverpool | Pass | NaN | James Philip Milner | James Philip Milner | Virgil van Dijk | 45.5 | 4.0 | 27.4 | 16.8 |
| 7 | 41 | 2 | 13 | Liverpool | Pass | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren | 26.7 | 19.6 | 27.8 | 47.3 |
| 8 | 42 | 2 | 16 | Liverpool | Pass | NaN | Dejan Lovren | Dejan Lovren | Virgil van Dijk | 28.0 | 45.4 | 28.4 | 21.4 |
| 9 | 43 | 2 | 19 | Liverpool | Pass | NaN | Virgil van Dijk | Virgil van Dijk | Dejan Lovren | 30.4 | 25.7 | 30.7 | 52.9 |
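Since the same column-splitting steps were repeated for both teams, they can be wrapped in one helper. The sketch below is a hypothetical refactoring (the name `split_coordinates` is not from the original notebook); it builds the coordinate frames with the original row index, so `pd.concat` lines the new columns up with their rows even when earlier filtering has left gaps in the index.

```python
import pandas as pd


def split_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    """Split the [x, y] lists in 'location' and 'pass_end_location' into scalar columns."""
    start = pd.DataFrame(df['location'].tolist(), index=df.index,
                         columns=['pass_maker_x', 'pass_maker_y'])
    end = pd.DataFrame(df['pass_end_location'].tolist(), index=df.index,
                       columns=['pass_receiver_x', 'pass_receiver_y'])
    # concat along columns aligns on the shared index, not on row position
    rest = df.drop(columns=['location', 'pass_end_location'])
    return pd.concat([rest, start, end], axis=1)


# Toy frame with a non-contiguous index, as boolean filtering can leave behind
toy = pd.DataFrame({'player': ['A', 'B'],
                    'location': [[27.4, 60.2], [35.3, 75.4]],
                    'pass_end_location': [[36.1, 71.6], [22.4, 76.6]]},
                   index=[3, 7])
print(split_coordinates(toy))
```

With this helper, `events_pn_Real = split_coordinates(events_pn_Real)` and the Liverpool equivalent would replace the two repeated blocks.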
av_loc_Real = events_pn_Real.groupby('pass_maker').agg({'pass_maker_x':['mean'],
'pass_maker_y':['mean', 'count']})
av_loc_Real
| pass_maker | pass_maker_x (mean) | pass_maker_y (mean) | pass_maker_y (count) |
|---|---|---|---|
| Carlos Henrique Casimiro | 60.845455 | 31.836364 | 11 |
| Cristiano Ronaldo dos Santos Aveiro | 81.580000 | 29.160000 | 10 |
| Daniel Carvajal Ramos | 64.341667 | 73.875000 | 24 |
| Francisco Román Alarcón Suárez | 62.323529 | 27.082353 | 17 |
| Karim Benzema | 65.081818 | 27.936364 | 11 |
| Keylor Navas Gamboa | 10.870000 | 41.810000 | 10 |
| Luka Modrić | 60.604762 | 55.028571 | 21 |
| Marcelo Vieira da Silva Júnior | 59.865217 | 11.130435 | 23 |
| Raphaël Varane | 37.436364 | 58.354545 | 22 |
| Sergio Ramos García | 41.282353 | 24.514706 | 34 |
| Toni Kroos | 51.190000 | 24.275000 | 40 |
The groupby() function from pandas splits events_pn_Real into groups keyed by the pass maker's name, and the agg() function then reduces each group to the mean x and y coordinates of that player's pass locations, along with a count of the passes the player made. This leaves two-level column names, so let us flatten the column names of av_loc_Real:
av_loc_Real.columns = ['pass_maker_x', 'pass_maker_y', 'count']
av_loc_Real
| pass_maker | pass_maker_x | pass_maker_y | count |
|---|---|---|---|
| Carlos Henrique Casimiro | 60.845455 | 31.836364 | 11 |
| Cristiano Ronaldo dos Santos Aveiro | 81.580000 | 29.160000 | 10 |
| Daniel Carvajal Ramos | 64.341667 | 73.875000 | 24 |
| Francisco Román Alarcón Suárez | 62.323529 | 27.082353 | 17 |
| Karim Benzema | 65.081818 | 27.936364 | 11 |
| Keylor Navas Gamboa | 10.870000 | 41.810000 | 10 |
| Luka Modrić | 60.604762 | 55.028571 | 21 |
| Marcelo Vieira da Silva Júnior | 59.865217 | 11.130435 | 23 |
| Raphaël Varane | 37.436364 | 58.354545 | 22 |
| Sergio Ramos García | 41.282353 | 24.514706 | 34 |
| Toni Kroos | 51.190000 | 24.275000 | 40 |
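As an alternative to flattening the two-level column names afterwards, pandas' named aggregation (available since pandas 0.25) produces flat names in one step: each keyword becomes an output column, given as (input column, aggregation). A sketch on a toy frame with illustrative values:

```python
import pandas as pd

# Toy pass data; in the notebook this would be events_pn_Real
passes = pd.DataFrame({'pass_maker': ['Kroos', 'Kroos', 'Modrić'],
                       'pass_maker_x': [50.0, 52.0, 60.0],
                       'pass_maker_y': [20.0, 30.0, 55.0]})

# keyword=(input column, aggregation) gives one flat output column per keyword
av_loc = passes.groupby('pass_maker').agg(
    pass_maker_x=('pass_maker_x', 'mean'),
    pass_maker_y=('pass_maker_y', 'mean'),
    count=('pass_maker_y', 'count'),
)
print(av_loc)
```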
Similarly for Liverpool:
av_loc_Liv = events_pn_Liv.groupby('pass_maker').agg({'pass_maker_x':['mean'],
'pass_maker_y':['mean', 'count']})
av_loc_Liv.columns = ['pass_maker_x', 'pass_maker_y', 'count']
av_loc_Liv
| pass_maker | pass_maker_x | pass_maker_y | count |
|---|---|---|---|
| Andrew Robertson | 59.815385 | 6.830769 | 13 |
| Dejan Lovren | 41.690909 | 60.172727 | 11 |
| Georginio Wijnaldum | 76.390909 | 28.518182 | 11 |
| James Philip Milner | 72.353333 | 36.153333 | 15 |
| Jordan Brian Henderson | 61.035294 | 37.152941 | 17 |
| Loris Karius | 12.914286 | 40.385714 | 7 |
| Mohamed Salah | 77.550000 | 64.710000 | 10 |
| Roberto Firmino Barbosa de Oliveira | 78.250000 | 43.570000 | 10 |
| Sadio Mané | 86.275000 | 22.075000 | 4 |
| Trent Alexander-Arnold | 64.666667 | 72.550000 | 12 |
| Virgil van Dijk | 43.366667 | 25.433333 | 9 |
Next, we count the passes between each ordered pair of players. The order matters because the network is directed (a pass from player A to player B is not identical to a pass from player B to player A). We will use groupby() and count() to count the rows for each (pass_maker, pass_receiver) pair:
pass_Real = events_pn_Real.groupby(['pass_maker', 'pass_receiver'])['index'].count().reset_index()
pass_Real.head(10)
| | pass_maker | pass_receiver | index |
|---|---|---|---|
| 0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 |
| 1 | Carlos Henrique Casimiro | Luka Modrić | 1 |
| 2 | Carlos Henrique Casimiro | Marcelo Vieira da Silva Júnior | 1 |
| 3 | Carlos Henrique Casimiro | Raphaël Varane | 1 |
| 4 | Carlos Henrique Casimiro | Sergio Ramos García | 1 |
| 5 | Carlos Henrique Casimiro | Toni Kroos | 6 |
| 6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 |
| 7 | Cristiano Ronaldo dos Santos Aveiro | Karim Benzema | 1 |
| 8 | Cristiano Ronaldo dos Santos Aveiro | Luka Modrić | 1 |
| 9 | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 4 |
pass_Liv = events_pn_Liv.groupby(['pass_maker', 'pass_receiver'])['index'].count().reset_index()
pass_Liv.head(10)
| | pass_maker | pass_receiver | index |
|---|---|---|---|
| 0 | Andrew Robertson | Andrew Robertson | 1 |
| 1 | Andrew Robertson | Georginio Wijnaldum | 3 |
| 2 | Andrew Robertson | James Philip Milner | 3 |
| 3 | Andrew Robertson | Jordan Brian Henderson | 2 |
| 4 | Andrew Robertson | Roberto Firmino Barbosa de Oliveira | 2 |
| 5 | Andrew Robertson | Virgil van Dijk | 2 |
| 6 | Dejan Lovren | James Philip Milner | 1 |
| 7 | Dejan Lovren | Jordan Brian Henderson | 1 |
| 8 | Dejan Lovren | Loris Karius | 2 |
| 9 | Dejan Lovren | Mohamed Salah | 1 |
Let us rename the index column, which now holds the pass counts, to number_of_passes:
pass_Real.rename(columns = {'index':'number_of_passes'}, inplace = True)
pass_Real.head(10)
| | pass_maker | pass_receiver | number_of_passes |
|---|---|---|---|
| 0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 |
| 1 | Carlos Henrique Casimiro | Luka Modrić | 1 |
| 2 | Carlos Henrique Casimiro | Marcelo Vieira da Silva Júnior | 1 |
| 3 | Carlos Henrique Casimiro | Raphaël Varane | 1 |
| 4 | Carlos Henrique Casimiro | Sergio Ramos García | 1 |
| 5 | Carlos Henrique Casimiro | Toni Kroos | 6 |
| 6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 |
| 7 | Cristiano Ronaldo dos Santos Aveiro | Karim Benzema | 1 |
| 8 | Cristiano Ronaldo dos Santos Aveiro | Luka Modrić | 1 |
| 9 | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 4 |
pass_Liv.rename(columns = {'index':'number_of_passes'}, inplace = True)
pass_Liv.head(10)
| | pass_maker | pass_receiver | number_of_passes |
|---|---|---|---|
| 0 | Andrew Robertson | Andrew Robertson | 1 |
| 1 | Andrew Robertson | Georginio Wijnaldum | 3 |
| 2 | Andrew Robertson | James Philip Milner | 3 |
| 3 | Andrew Robertson | Jordan Brian Henderson | 2 |
| 4 | Andrew Robertson | Roberto Firmino Barbosa de Oliveira | 2 |
| 5 | Andrew Robertson | Virgil van Dijk | 2 |
| 6 | Dejan Lovren | James Philip Milner | 1 |
| 7 | Dejan Lovren | Jordan Brian Henderson | 1 |
| 8 | Dejan Lovren | Loris Karius | 2 |
| 9 | Dejan Lovren | Mohamed Salah | 1 |
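The count-then-rename steps can also be done in a single chain: `groupby(...).size()` counts the rows in each (pass_maker, pass_receiver) group, and `reset_index(name=...)` names the resulting column. Unlike counting a specific column, `size()` counts rows regardless of missing values. A sketch on a toy pass log:

```python
import pandas as pd

# Toy pass log: each row is one pass from pass_maker to pass_receiver
passes = pd.DataFrame({'pass_maker':    ['Kroos', 'Kroos', 'Ramos', 'Kroos'],
                       'pass_receiver': ['Ramos', 'Ramos', 'Kroos', 'Modrić']})

# Ordered pairs: Kroos -> Ramos and Ramos -> Kroos are counted separately
pair_counts = (passes.groupby(['pass_maker', 'pass_receiver'])
                     .size()
                     .reset_index(name='number_of_passes'))
print(pair_counts)
```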
Next, we merge av_loc_Real and pass_Real so that each pair of players also carries the pass maker's average location. We use the merge() function from pandas; in the call below, pass_Real is the left dataframe and av_loc_Real is the right one:
pass_Real = pass_Real.merge(av_loc_Real, left_on = 'pass_maker', right_index = True)
pass_Real.head(10)
| | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count |
|---|---|---|---|---|---|---|
| 0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 | 60.845455 | 31.836364 | 11 |
| 1 | Carlos Henrique Casimiro | Luka Modrić | 1 | 60.845455 | 31.836364 | 11 |
| 2 | Carlos Henrique Casimiro | Marcelo Vieira da Silva Júnior | 1 | 60.845455 | 31.836364 | 11 |
| 3 | Carlos Henrique Casimiro | Raphaël Varane | 1 | 60.845455 | 31.836364 | 11 |
| 4 | Carlos Henrique Casimiro | Sergio Ramos García | 1 | 60.845455 | 31.836364 | 11 |
| 5 | Carlos Henrique Casimiro | Toni Kroos | 6 | 60.845455 | 31.836364 | 11 |
| 6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 | 81.580000 | 29.160000 | 10 |
| 7 | Cristiano Ronaldo dos Santos Aveiro | Karim Benzema | 1 | 81.580000 | 29.160000 | 10 |
| 8 | Cristiano Ronaldo dos Santos Aveiro | Luka Modrić | 1 | 81.580000 | 29.160000 | 10 |
| 9 | Cristiano Ronaldo dos Santos Aveiro | Marcelo Vieira da Silva Júnior | 4 | 81.580000 | 29.160000 | 10 |
The left_on argument names the column of the left dataframe (here pass_maker) whose values serve as the join key, and right_index = True tells pandas to match those values against the index of the right dataframe, which for av_loc_Real holds the player names. Let us do the same operation for the other team:
pass_Liv = pass_Liv.merge(av_loc_Liv, left_on = 'pass_maker', right_index = True)
pass_Liv.head(10)
| | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count |
|---|---|---|---|---|---|---|
| 0 | Andrew Robertson | Andrew Robertson | 1 | 59.815385 | 6.830769 | 13 |
| 1 | Andrew Robertson | Georginio Wijnaldum | 3 | 59.815385 | 6.830769 | 13 |
| 2 | Andrew Robertson | James Philip Milner | 3 | 59.815385 | 6.830769 | 13 |
| 3 | Andrew Robertson | Jordan Brian Henderson | 2 | 59.815385 | 6.830769 | 13 |
| 4 | Andrew Robertson | Roberto Firmino Barbosa de Oliveira | 2 | 59.815385 | 6.830769 | 13 |
| 5 | Andrew Robertson | Virgil van Dijk | 2 | 59.815385 | 6.830769 | 13 |
| 6 | Dejan Lovren | James Philip Milner | 1 | 41.690909 | 60.172727 | 11 |
| 7 | Dejan Lovren | Jordan Brian Henderson | 1 | 41.690909 | 60.172727 | 11 |
| 8 | Dejan Lovren | Loris Karius | 2 | 41.690909 | 60.172727 | 11 |
| 9 | Dejan Lovren | Mohamed Salah | 1 | 41.690909 | 60.172727 | 11 |
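A toy example makes the roles of `left_on` and `right_index` concrete: rows of the left frame are matched on their `pass_maker` column, and rows of the right frame on their index of player names. The values are illustrative only.

```python
import pandas as pd

# Left frame: one row per ordered (maker, receiver) pair
pairs = pd.DataFrame({'pass_maker': ['Kroos', 'Ramos'],
                      'pass_receiver': ['Ramos', 'Kroos'],
                      'number_of_passes': [9, 10]})
# Right frame: average locations indexed by player name, like av_loc_Real
av_loc = pd.DataFrame({'pass_maker_x': [51.2, 41.3],
                       'pass_maker_y': [24.3, 24.5]},
                      index=['Kroos', 'Ramos'])

# Match pairs['pass_maker'] against av_loc's index (default how='inner')
merged = pairs.merge(av_loc, left_on='pass_maker', right_index=True)
print(merged)
```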
pass_Real = pass_Real.merge(av_loc_Real, left_on = 'pass_receiver',
right_index = True, suffixes = ['', '_receipt'])
pass_Real.rename(columns = {'pass_maker_x_receipt':'pass_receiver_x',
'pass_maker_y_receipt':'pass_receiver_y',
'count_receipt':'number_of_passes_received'}, inplace = True)
pass_Real = pass_Real[pass_Real['pass_maker'] != pass_Real['pass_receiver']].reset_index()
pass_Real
| | index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Carlos Henrique Casimiro | Daniel Carvajal Ramos | 1 | 60.845455 | 31.836364 | 11 | 64.341667 | 73.875 | 24 |
| 1 | 6 | Cristiano Ronaldo dos Santos Aveiro | Daniel Carvajal Ramos | 3 | 81.580000 | 29.160000 | 10 | 64.341667 | 73.875 | 24 |
| 2 | 21 | Francisco Román Alarcón Suárez | Daniel Carvajal Ramos | 2 | 62.323529 | 27.082353 | 17 | 64.341667 | 73.875 | 24 |
| 3 | 29 | Karim Benzema | Daniel Carvajal Ramos | 2 | 65.081818 | 27.936364 | 11 | 64.341667 | 73.875 | 24 |
| 4 | 39 | Luka Modrić | Daniel Carvajal Ramos | 10 | 60.604762 | 55.028571 | 21 | 64.341667 | 73.875 | 24 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73 | 16 | Daniel Carvajal Ramos | Keylor Navas Gamboa | 1 | 64.341667 | 73.875000 | 24 | 10.870000 | 41.810 | 10 |
| 74 | 30 | Karim Benzema | Keylor Navas Gamboa | 1 | 65.081818 | 27.936364 | 11 | 10.870000 | 41.810 | 10 |
| 75 | 57 | Raphaël Varane | Keylor Navas Gamboa | 2 | 37.436364 | 58.354545 | 22 | 10.870000 | 41.810 | 10 |
| 76 | 64 | Sergio Ramos García | Keylor Navas Gamboa | 1 | 41.282353 | 24.514706 | 34 | 10.870000 | 41.810 | 10 |
| 77 | 74 | Toni Kroos | Keylor Navas Gamboa | 1 | 51.190000 | 24.275000 | 40 | 10.870000 | 41.810 | 10 |
78 rows × 10 columns
pass_Liv = pass_Liv.merge(av_loc_Liv, left_on = 'pass_receiver',
right_index = True, suffixes = ['', '_receipt'])
pass_Liv.rename(columns = {'pass_maker_x_receipt':'pass_receiver_x',
'pass_maker_y_receipt':'pass_receiver_y',
'count_receipt':'number_of_passes_received'}, inplace = True)
pass_Liv = pass_Liv[pass_Liv['pass_maker'] != pass_Liv['pass_receiver']].reset_index()
pass_Liv
| | index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | Georginio Wijnaldum | Andrew Robertson | 4 | 76.390909 | 28.518182 | 11 | 59.815385 | 6.830769 | 13 |
| 1 | 18 | James Philip Milner | Andrew Robertson | 1 | 72.353333 | 36.153333 | 15 | 59.815385 | 6.830769 | 13 |
| 2 | 28 | Jordan Brian Henderson | Andrew Robertson | 1 | 61.035294 | 37.152941 | 17 | 59.815385 | 6.830769 | 13 |
| 3 | 36 | Loris Karius | Andrew Robertson | 1 | 12.914286 | 40.385714 | 7 | 59.815385 | 6.830769 | 13 |
| 4 | 54 | Trent Alexander-Arnold | Andrew Robertson | 1 | 64.666667 | 72.550000 | 12 | 59.815385 | 6.830769 | 13 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59 | 55 | Trent Alexander-Arnold | Dejan Lovren | 1 | 64.666667 | 72.550000 | 12 | 41.690909 | 60.172727 | 11 |
| 60 | 61 | Virgil van Dijk | Dejan Lovren | 3 | 43.366667 | 25.433333 | 9 | 41.690909 | 60.172727 | 11 |
| 61 | 25 | James Philip Milner | Sadio Mané | 2 | 72.353333 | 36.153333 | 15 | 86.275000 | 22.075000 | 4 |
| 62 | 33 | Jordan Brian Henderson | Sadio Mané | 1 | 61.035294 | 37.152941 | 17 | 86.275000 | 22.075000 | 4 |
| 63 | 43 | Mohamed Salah | Sadio Mané | 1 | 77.550000 | 64.710000 | 10 | 86.275000 | 22.075000 | 4 |
64 rows × 10 columns
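The second merge attaches the receiver's average location. Both merges bring in the same column names from the location table, so `suffixes=('', '_receipt')` leaves the first copy unchanged and tags the second with `_receipt`; the final filter then drops self-passes such as the Andrew Robertson to Andrew Robertson row seen earlier. A toy sketch of both steps, with illustrative values:

```python
import pandas as pd

# Average location and pass count per player, indexed by name (like av_loc_Liv)
av_loc = pd.DataFrame({'pass_maker_x': [59.8, 12.9], 'count': [13, 7]},
                      index=['Robertson', 'Karius'])
# Pass counts per ordered pair, including one self-pass
pairs = pd.DataFrame({'pass_maker': ['Robertson', 'Karius', 'Robertson'],
                      'pass_receiver': ['Karius', 'Robertson', 'Robertson'],
                      'number_of_passes': [1, 1, 1]})

# First merge: the maker's location keeps the plain column names
step1 = pairs.merge(av_loc, left_on='pass_maker', right_index=True)
# Second merge: the receiver's copies of the same columns get the '_receipt' suffix
step2 = step1.merge(av_loc, left_on='pass_receiver', right_index=True,
                    suffixes=('', '_receipt'))
# Drop self-passes, which would otherwise become self-loops in the network
network = step2[step2['pass_maker'] != step2['pass_receiver']].reset_index(drop=True)
print(network)
```

Here `reset_index(drop=True)` discards the old row labels; the notebook's `reset_index()` keeps them as the extra index column visible in the tables above.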
pass_Real_new = pass_Real.replace({"pass_maker": players_Real, "pass_receiver": players_Real})
pass_Real_new
| | index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 14 | 2 | 1 | 60.845455 | 31.836364 | 11 | 64.341667 | 73.875 | 24 |
| 1 | 6 | 7 | 2 | 3 | 81.580000 | 29.160000 | 10 | 64.341667 | 73.875 | 24 |
| 2 | 21 | 22 | 2 | 2 | 62.323529 | 27.082353 | 17 | 64.341667 | 73.875 | 24 |
| 3 | 29 | 9 | 2 | 2 | 65.081818 | 27.936364 | 11 | 64.341667 | 73.875 | 24 |
| 4 | 39 | 10 | 2 | 10 | 60.604762 | 55.028571 | 21 | 64.341667 | 73.875 | 24 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73 | 16 | 2 | 1 | 1 | 64.341667 | 73.875000 | 24 | 10.870000 | 41.810 | 10 |
| 74 | 30 | 9 | 1 | 1 | 65.081818 | 27.936364 | 11 | 10.870000 | 41.810 | 10 |
| 75 | 57 | 5 | 1 | 2 | 37.436364 | 58.354545 | 22 | 10.870000 | 41.810 | 10 |
| 76 | 64 | 4 | 1 | 1 | 41.282353 | 24.514706 | 34 | 10.870000 | 41.810 | 10 |
| 77 | 74 | 8 | 1 | 1 | 51.190000 | 24.275000 | 40 | 10.870000 | 41.810 | 10 |
78 rows × 10 columns
pass_Liv_new = pass_Liv.replace({"pass_maker": players_Liv, "pass_receiver": players_Liv})
pass_Liv_new
| | index | pass_maker | pass_receiver | number_of_passes | pass_maker_x | pass_maker_y | count | pass_receiver_x | pass_receiver_y | number_of_passes_received |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 5 | 26 | 4 | 76.390909 | 28.518182 | 11 | 59.815385 | 6.830769 | 13 |
| 1 | 18 | 7 | 26 | 1 | 72.353333 | 36.153333 | 15 | 59.815385 | 6.830769 | 13 |
| 2 | 28 | 14 | 26 | 1 | 61.035294 | 37.152941 | 17 | 59.815385 | 6.830769 | 13 |
| 3 | 36 | 1 | 26 | 1 | 12.914286 | 40.385714 | 7 | 59.815385 | 6.830769 | 13 |
| 4 | 54 | 66 | 26 | 1 | 64.666667 | 72.550000 | 12 | 59.815385 | 6.830769 | 13 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59 | 55 | 66 | 6 | 1 | 64.666667 | 72.550000 | 12 | 41.690909 | 60.172727 | 11 |
| 60 | 61 | 4 | 6 | 3 | 43.366667 | 25.433333 | 9 | 41.690909 | 60.172727 | 11 |
| 61 | 25 | 7 | 19 | 2 | 72.353333 | 36.153333 | 15 | 86.275000 | 22.075000 | 4 |
| 62 | 33 | 14 | 19 | 1 | 61.035294 | 37.152941 | 17 | 86.275000 | 22.075000 | 4 |
| 63 | 43 | 11 | 19 | 1 | 77.550000 | 64.710000 | 10 | 86.275000 | 22.075000 | 4 |
64 rows × 10 columns
pitch = Pitch(pitch_color='grass', goal_type = 'box', line_color='white', stripe = True,
constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(pass_Real.pass_maker_x, pass_Real.pass_maker_y,
pass_Real.pass_receiver_x, pass_Real.pass_receiver_y, lw = 5,
color = 'black', zorder = 1, ax=ax)
nodes = pitch.scatter(av_loc_Real.pass_maker_x, av_loc_Real.pass_maker_y,
s=350, color = 'white', edgecolors='black', linewidth=1, alpha = 1, ax = ax)
for index, row in av_loc_Real.iterrows():
pitch.annotate(players_Real[row.name], xy=(row.pass_maker_x, row.pass_maker_y),
c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Real Madrid against Liverpool", size = 20)
plt.show()
pitch = Pitch(pitch_color='grass', goal_type = 'box', stripe = True,
line_color='white', constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
arrows = pitch.arrows(120 - pass_Liv.pass_maker_x, pass_Liv.pass_maker_y,
120 - pass_Liv.pass_receiver_x, pass_Liv.pass_receiver_y, lw = 5,
color = 'black', zorder = 1, ax = ax)
nodes = pitch.scatter(120 - av_loc_Liv.pass_maker_x, av_loc_Liv.pass_maker_y,
s=350, color = 'red', edgecolors = 'black', linewidth=1, alpha = 1, ax = ax)
for index, row in av_loc_Liv.iterrows():
pitch.annotate(players_Liv[row.name], xy=(120 - row.pass_maker_x, row.pass_maker_y),
c ='black', va = 'center', ha = 'center', size = 10, ax = ax)
plt.title("Pass network for Liverpool against Real Madrid", size = 20)
plt.show()
In Liverpool's pass network visualization, we subtract the x coordinates from 120 (the length of the StatsBomb pitch) to mirror the x-axis, so that the two teams are drawn attacking in opposite directions.
Now that we have successfully visualized the pass networks of both teams, let us analyze them using metrics from the literature on complex network analysis.
Note that both of our networks are directed weighted graphs, with the number of passes as the weight of each directed edge.
Let us first build a graph isomorphic to the one we just visualized for Real Madrid, this time using the networkx package. First, we keep only the relevant columns of the pass_Real_new dataset:
pass_Real_new = pass_Real_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
pass_Real_new
| | pass_maker | pass_receiver | number_of_passes |
|---|---|---|---|
| 0 | 14 | 2 | 1 |
| 1 | 7 | 2 | 3 |
| 2 | 22 | 2 | 2 |
| 3 | 9 | 2 | 2 |
| 4 | 10 | 2 | 10 |
| ... | ... | ... | ... |
| 73 | 2 | 1 | 1 |
| 74 | 9 | 1 | 1 |
| 75 | 5 | 1 | 2 |
| 76 | 4 | 1 | 1 |
| 77 | 8 | 1 | 1 |
78 rows × 3 columns
Now convert pass_Real_new to a list of tuples, one (pass_maker, pass_receiver, number_of_passes) tuple per row; we will use these tuples to add the weighted edges of a networkx graph:
L_Real = pass_Real_new.apply(tuple, axis=1).tolist()
print(L_Real)
[('14', '2', 1), ('7', '2', 3), ('22', '2', 2), ('9', '2', 2), ('10', '2', 10), ('12', '2', 2), ('5', '2', 3), ('4', '2', 3), ('8', '2', 1), ('14', '10', 1), ('7', '10', 1), ('2', '10', 7), ('22', '10', 1), ('12', '10', 1), ('5', '10', 5), ('4', '10', 2), ('8', '10', 5), ('14', '12', 1), ('7', '12', 4), ('22', '12', 2), ('1', '12', 2), ('10', '12', 1), ('4', '12', 9), ('8', '12', 4), ('14', '5', 1), ('2', '5', 5), ('1', '5', 2), ('10', '5', 3), ('12', '5', 2), ('4', '5', 5), ('8', '5', 4), ('14', '4', 1), ('7', '4', 1), ('22', '4', 5), ('9', '4', 1), ('1', '4', 4), ('10', '4', 1), ('12', '4', 2), ('5', '4', 6), ('8', '4', 10), ('14', '8', 6), ('2', '8', 1), ('22', '8', 4), ('9', '8', 4), ('1', '8', 1), ('10', '8', 4), ('12', '8', 5), ('5', '8', 4), ('4', '8', 9), ('7', '9', 1), ('2', '9', 1), ('22', '9', 1), ('1', '9', 1), ('10', '9', 1), ('12', '9', 3), ('5', '9', 1), ('8', '9', 2), ('2', '14', 2), ('9', '14', 2), ('10', '14', 1), ('12', '14', 2), ('5', '14', 1), ('8', '14', 2), ('2', '7', 2), ('22', '7', 2), ('9', '7', 1), ('12', '7', 2), ('4', '7', 1), ('8', '7', 2), ('2', '22', 3), ('12', '22', 4), ('4', '22', 4), ('8', '22', 8), ('2', '1', 1), ('9', '1', 1), ('5', '1', 2), ('4', '1', 1), ('8', '1', 1)]
G_Real = nx.DiGraph()
# each tuple is (pass_maker, pass_receiver, number_of_passes)
for maker, receiver, n_passes in L_Real:
    G_Real.add_edge(maker, receiver, weight = n_passes)
edges_Real = G_Real.edges()
weights_Real = [G_Real[u][v]['weight'] for u, v in edges_Real]
nx.draw(G_Real, node_size=800, with_labels=True, node_color='white', width = weights_Real)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.title("Pass network for Real Madrid vs Liverpool", size = 20)
plt.show()
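networkx can also build the same directed graph straight from an edge-list DataFrame, skipping the tuple conversion. The sketch below uses `nx.from_pandas_edgelist` with `create_using=nx.DiGraph`; the three rows are a subset of Real Madrid's edge list printed above. The edge attribute takes its column name, so the count column is renamed to `weight` first, so that functions called with `weight='weight'` find it.

```python
import networkx as nx
import pandas as pd

# Subset of the (pass_maker, pass_receiver, number_of_passes) rows printed above
edges = pd.DataFrame({'pass_maker': ['8', '4', '10'],
                      'pass_receiver': ['4', '8', '2'],
                      'number_of_passes': [10, 9, 10]})

# One directed, weighted edge per row
G = nx.from_pandas_edgelist(edges.rename(columns={'number_of_passes': 'weight'}),
                            source='pass_maker', target='pass_receiver',
                            edge_attr='weight', create_using=nx.DiGraph)
print(G['8']['4']['weight'], G['4']['8']['weight'])  # 10 9
```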
For Liverpool too, let us first clean the pass_Liv_new dataset and then draw the isomorphic weighted directed graph:
pass_Liv_new = pass_Liv_new[['pass_maker', 'pass_receiver', 'number_of_passes']]
pass_Liv_new
| | pass_maker | pass_receiver | number_of_passes |
|---|---|---|---|
| 0 | 5 | 26 | 4 |
| 1 | 7 | 26 | 1 |
| 2 | 14 | 26 | 1 |
| 3 | 1 | 26 | 1 |
| 4 | 66 | 26 | 1 |
| ... | ... | ... | ... |
| 59 | 66 | 6 | 1 |
| 60 | 4 | 6 | 3 |
| 61 | 7 | 19 | 2 |
| 62 | 14 | 19 | 1 |
| 63 | 11 | 19 | 1 |
64 rows × 3 columns
L_Liv = pass_Liv_new.apply(tuple, axis=1).tolist()
G_Liv = nx.DiGraph()
# each tuple is (pass_maker, pass_receiver, number_of_passes)
for maker, receiver, n_passes in L_Liv:
    G_Liv.add_edge(maker, receiver, weight = n_passes)
edges_Liv = G_Liv.edges()
weights_Liv = [G_Liv[u][v]['weight'] for u, v in edges_Liv]
nx.draw(G_Liv, node_size = 800, with_labels = True, node_color = 'red', width = weights_Liv)
plt.gca().collections[0].set_edgecolor('black') # sets the edge color of the nodes to black
plt.title("Pass network for Liverpool vs Real Madrid", size = 20)
plt.show()
Let us discuss some of the important functions from the networkx package that we have used to draw the graphs:
nx.DiGraph() creates an empty directed graph (DiGraph is the networkx base class for directed graphs),
add_edge() adds an edge from the node given as its first argument to the node given as its second, and its weight parameter sets the weight of that edge, and
draw() visualizes a networkx graph; its parameters are mostly self-explanatory.
Let us now understand the degree, in-degree and out-degree of a node in a directed weighted graph. The in-degree of a node is the number of edges directed towards it; in our case, that is the number of distinct teammates from whom the player (node) received at least one pass. Similarly, the out-degree is the number of edges directed away from the node, i.e. the number of distinct teammates to whom the player made at least one pass. Finally, the degree of a node is the total number of edges attached to it, ignoring their directions, so the degree of a node is the sum of its in-degree and out-degree. The total numbers of passes received and made correspond instead to the weighted in-degree and weighted out-degree, obtained by summing the weights (numbers of passes) of the incoming and outgoing edges. The nx.degree() call used in this section passes no weight argument, so it returns the unweighted degree, i.e. a count of passing links rather than of passes.
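Because `nx.degree` counts edges rather than passes, the difference between the unweighted and weighted versions is easiest to see on a small graph. The sketch below uses four of Real Madrid's edges from the list printed above (Toni Kroos is node '8'); passing `weight='weight'` makes networkx sum the edge weights, i.e. count passes.

```python
import networkx as nx

G = nx.DiGraph()
# (maker, receiver, number_of_passes): a subset of the Real Madrid edge list
G.add_weighted_edges_from([('8', '4', 10), ('4', '8', 9), ('8', '22', 8), ('2', '8', 1)])

# Unweighted: number of distinct teammates linked by at least one pass
print(G.out_degree('8'), G.in_degree('8'), G.degree('8'))  # 2 2 4
# Weighted: sum of number_of_passes on those edges (passes made, passes received)
print(G.out_degree('8', weight='weight'), G.in_degree('8', weight='weight'))  # 18 10
```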
We will use networkx to find out the node degrees from the pass network of Real Madrid.
# Prepare a dictionary with jersey numbers as the node ids,
# i.e, the dictionary keys and degrees as the dictionary values
deg_Real = dict(nx.degree(G_Real))
# convert a dictionary to a pandas dataframe
degree_Real = pd.DataFrame.from_dict(list(deg_Real.items()))
degree_Real.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
degree_Real
| | jersey_number | node_degree |
|---|---|---|
| 0 | 14 | 12 |
| 1 | 2 | 17 |
| 2 | 7 | 11 |
| 3 | 22 | 11 |
| 4 | 9 | 14 |
| 5 | 10 | 15 |
| 6 | 12 | 16 |
| 7 | 5 | 14 |
| 8 | 4 | 17 |
| 9 | 8 | 19 |
| 10 | 1 | 10 |
Matching these values with the jersey numbers of the Real Madrid players in that game, we notice that the player with jersey number 8 (i.e., Toni Kroos) had the highest degree value of 19. In second place are the players with jersey numbers 2 and 4, with a degree value of 17, i.e., our favorite Spanish defenders 'Daniel Carvajal Ramos' and 'Sergio Ramos García' respectively. Tremendous! Let us use seaborn to visualize the deg_Real dictionary via a bar plot:
X = list(deg_Real.keys())
Y = list(deg_Real.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Real Madrid vs Liverpool", size = 16)
plt.show()
Let us compute the node degrees for Liverpool too:
# Prepare a dictionary with jersey numbers as the node ids,
# i.e, the dictionary keys and degrees as the dictionary values
deg_Liv = dict(nx.degree(G_Liv))
# convert a dictionary to a pandas dataframe
degree_Liv = pd.DataFrame.from_dict(list(deg_Liv.items()))
degree_Liv.rename(columns = {0:'jersey_number', 1: 'node_degree'}, inplace = True)
degree_Liv
| | jersey_number | node_degree |
|---|---|---|
| 0 | 5 | 12 |
| 1 | 26 | 11 |
| 2 | 7 | 17 |
| 3 | 14 | 17 |
| 4 | 1 | 7 |
| 5 | 66 | 13 |
| 6 | 4 | 12 |
| 7 | 11 | 11 |
| 8 | 6 | 12 |
| 9 | 9 | 10 |
| 10 | 19 | 6 |
For Liverpool, the highest degree value of 17 is shared by the players with jersey numbers 14 and 7, i.e., 'Jordan Brian Henderson' and 'James Philip Milner' respectively. We will visualize the deg_Liv dictionary via a bar plot:
X = list(deg_Liv.keys())
Y = list(deg_Liv.values())
sns.barplot(x = Y, y = X, palette = "magma")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("degree")
plt.title("Player pass degrees for Liverpool vs Real Madrid", size = 16)
plt.show()
indeg_Real = dict(G_Real.in_degree())
indegree_Real = pd.DataFrame.from_dict(list(indeg_Real.items()))
indegree_Real.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Real.keys())
Y = list(indeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
indeg_Liv = dict(G_Liv.in_degree())
indegree_Liv = pd.DataFrame.from_dict(list(indeg_Liv.items()))
indegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_indegree'}, inplace = True)
X = list(indeg_Liv.keys())
Y = list(indeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("indegree")
plt.title("Player pass indegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
outdeg_Real = dict(G_Real.out_degree())
outdegree_Real = pd.DataFrame.from_dict(list(outdeg_Real.items()))
outdegree_Real.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Real.keys())
Y = list(outdeg_Real.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Real Madrid vs Liverpool", size = 16)
plt.show()
outdeg_Liv = dict(G_Liv.out_degree())
outdegree_Liv = pd.DataFrame.from_dict(list(outdeg_Liv.items()))
outdegree_Liv.rename(columns = {0:'jersey_number', 1: 'node_outdegree'}, inplace = True)
X = list(outdeg_Liv.keys())
Y = list(outdeg_Liv.values())
sns.barplot(x = Y, y = X, palette = "hls")
plt.xticks(range(0, max(Y)+5, 2))
plt.ylabel("Player Jersey number")
plt.xlabel("outdegree")
plt.title("Player pass outdegrees for Liverpool vs Real Madrid", size = 16)
plt.show()
Next, let us compute the adjacency matrices of the G_Real and G_Liv graphs and visualize them as heat maps:
A_Real = nx.adjacency_matrix(G_Real)
A_Liv = nx.adjacency_matrix(G_Liv)
A_Real = A_Real.todense()
A_Liv = A_Liv.todense()
sns.heatmap(A_Real, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Real Madrid's pass network")
plt.show()
sns.heatmap(A_Liv, annot = True, cmap ='gnuplot')
plt.title("Adjacency matrix for Liverpool's pass network")
plt.show()
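In the heat maps above, rows and columns are numbered 0 to 10, which are positions in G.nodes() rather than jersey numbers. A minimal sketch, on a hypothetical toy graph, of getting a labelled version of the same matrix with nx.to_pandas_adjacency (rows are passers, columns are receivers, entries are the 'weight' values):

```python
import networkx as nx

# Hypothetical toy pass counts between jersey numbers '2', '8' and '10'
G = nx.DiGraph()
G.add_weighted_edges_from([("2", "8", 4), ("8", "2", 3), ("8", "10", 6)])

# Same ordering as nx.adjacency_matrix(G), i.e. G.nodes(), but labelled;
# missing edges are filled with 0
A = nx.to_pandas_adjacency(G, weight="weight")
print(A)
```

Passing such a DataFrame to sns.heatmap(A, annot=True, cmap='gnuplot') should put the jersey numbers on both axes, since seaborn uses the index and columns as tick labels.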
r_Real = nx.degree_pearson_correlation_coefficient(G_Real, weight = 'weight')
r_Liv = nx.degree_pearson_correlation_coefficient(G_Liv, weight = 'weight')
print(r_Real, r_Liv)
-0.17983836432860179 -0.2412372196699064
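The values r_Real and r_Liv are degree assortativity coefficients: the Pearson correlation between the degrees at the two ends of each edge (for a DiGraph, networkx by default pairs the passer's out-degree with the receiver's in-degree). A negative r, as for both teams here, means well-connected players tended to be linked with less-connected teammates. A minimal sketch on an extreme toy case, a star in which one hub is connected to four players who are not connected to each other:

```python
import networkx as nx

# Toy star: node 0 is a hub linked to nodes 1..4, which have degree 1
star = nx.star_graph(4)

# Every edge joins a degree-4 node to a degree-1 node, so the correlation
# between the degrees at the two ends of an edge is perfectly negative
r_star = nx.degree_pearson_correlation_coefficient(star)
print(r_star)
```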
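Shortest paths need a notion of distance, and two players who pass to each other often should count as close. We therefore use the reciprocal of the number of passes, 1/nop, as the 'weight' column in the pass network. Let us create a new graph for Real Madrid:
pass_Real_mod = pass_Real_new[['pass_maker', 'pass_receiver']].copy()  # .copy() avoids pandas' SettingWithCopyWarning on the next line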
pass_Real_mod['1/nop'] = 1/pass_Real_new['number_of_passes']
pass_Real_mod.head(5)
| | pass_maker | pass_receiver | 1/nop |
|---|---|---|---|
| 0 | 14 | 2 | 1.000000 |
| 1 | 7 | 2 | 0.333333 |
| 2 | 22 | 2 | 0.500000 |
| 3 | 9 | 2 | 0.500000 |
| 4 | 10 | 2 | 0.100000 |
L_Real_mod = pass_Real_mod.apply(tuple, axis=1).tolist()
G_Real_mod = nx.DiGraph()
for i in range(len(L_Real_mod)):
    G_Real_mod.add_edge(L_Real_mod[i][0], L_Real_mod[i][1], weight = L_Real_mod[i][2])
edges_Real_mod = G_Real_mod.edges()
weights_Real_mod = [G_Real_mod[u][v]['weight'] for u, v in edges_Real_mod]
nx.draw(G_Real_mod, node_size=800, with_labels=True, node_color='white', width = weights_Real_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Real Madrid vs Liverpool", size = 20)
plt.show()
Let us do the same for Liverpool:
pass_Liv_mod = pass_Liv_new[['pass_maker', 'pass_receiver']].copy()  # .copy() avoids pandas' SettingWithCopyWarning on the next line
pass_Liv_mod['1/nop'] = 1/pass_Liv_new['number_of_passes']
pass_Liv_mod.head(5)
| | pass_maker | pass_receiver | 1/nop |
|---|---|---|---|
| 0 | 5 | 26 | 0.25 |
| 1 | 7 | 26 | 1.00 |
| 2 | 14 | 26 | 1.00 |
| 3 | 1 | 26 | 1.00 |
| 4 | 66 | 26 | 1.00 |
L_Liv_mod = pass_Liv_mod.apply(tuple, axis=1).tolist()
G_Liv_mod = nx.DiGraph()
for i in range(len(L_Liv_mod)):
    G_Liv_mod.add_edge(L_Liv_mod[i][0], L_Liv_mod[i][1], weight = L_Liv_mod[i][2])
edges_Liv_mod = G_Liv_mod.edges()
weights_Liv_mod = [G_Liv_mod[u][v]['weight'] for u, v in edges_Liv_mod]
nx.draw(G_Liv_mod, node_size=800, with_labels=True, node_color='red', width = weights_Liv_mod)
plt.gca().collections[0].set_edgecolor('black')
plt.title("Modified pass network for Liverpool vs Real Madrid", size = 20)
plt.show()
Now let us compute the shortest paths between all pairs of nodes in the modified pass network of Real Madrid:
dis_Real = nx.shortest_path(G_Real_mod, weight = 'weight')
print(dis_Real)
{'14': {'14': ['14'], '2': ['14', '8', '10', '2'], '10': ['14', '8', '10'], '12': ['14', '8', '4', '12'], '5': ['14', '8', '5'], '4': ['14', '8', '4'], '8': ['14', '8'], '9': ['14', '8', '9'], '7': ['14', '8', '7'], '22': ['14', '8', '22'], '1': ['14', '8', '5', '1']}, '2': {'2': ['2'], '10': ['2', '10'], '5': ['2', '5'], '8': ['2', '10', '8'], '9': ['2', '5', '4', '12', '9'], '14': ['2', '14'], '7': ['2', '7'], '22': ['2', '22'], '1': ['2', '5', '1'], '12': ['2', '5', '4', '12'], '4': ['2', '5', '4']}, '7': {'7': ['7'], '2': ['7', '2'], '10': ['7', '2', '10'], '12': ['7', '12'], '4': ['7', '12', '8', '4'], '9': ['7', '12', '9'], '5': ['7', '2', '5'], '8': ['7', '12', '8'], '14': ['7', '12', '14'], '22': ['7', '12', '22'], '1': ['7', '2', '5', '1']}, '22': {'22': ['22'], '2': ['22', '2'], '10': ['22', '8', '10'], '12': ['22', '4', '12'], '4': ['22', '4'], '8': ['22', '8'], '9': ['22', '4', '12', '9'], '7': ['22', '7'], '5': ['22', '4', '5'], '1': ['22', '4', '5', '1'], '14': ['22', '8', '14']}, '9': {'9': ['9'], '2': ['9', '2'], '4': ['9', '8', '4'], '8': ['9', '8'], '14': ['9', '14'], '7': ['9', '8', '7'], '1': ['9', '1'], '10': ['9', '8', '10'], '12': ['9', '8', '4', '12'], '5': ['9', '8', '5'], '22': ['9', '8', '22']}, '10': {'10': ['10'], '2': ['10', '2'], '12': ['10', '8', '4', '12'], '5': ['10', '2', '5'], '4': ['10', '8', '4'], '8': ['10', '8'], '9': ['10', '8', '9'], '14': ['10', '2', '14'], '7': ['10', '2', '7'], '22': ['10', '8', '22'], '1': ['10', '2', '5', '1']}, '12': {'12': ['12'], '2': ['12', '2'], '10': ['12', '8', '10'], '5': ['12', '8', '5'], '4': ['12', '8', '4'], '8': ['12', '8'], '9': ['12', '9'], '14': ['12', '14'], '7': ['12', '7'], '22': ['12', '22'], '1': ['12', '8', '5', '1']}, '5': {'5': ['5'], '2': ['5', '10', '2'], '10': ['5', '10'], '4': ['5', '4'], '8': ['5', '8'], '9': ['5', '4', '12', '9'], '14': ['5', '8', '14'], '1': ['5', '1'], '12': ['5', '4', '12'], '7': ['5', '8', '7'], '22': ['5', '8', '22']}, '4': {'4': ['4'], '2': ['4', 
'2'], '10': ['4', '8', '10'], '12': ['4', '12'], '5': ['4', '5'], '8': ['4', '8'], '7': ['4', '12', '7'], '22': ['4', '8', '22'], '1': ['4', '5', '1'], '9': ['4', '12', '9'], '14': ['4', '12', '14']}, '8': {'8': ['8'], '2': ['8', '10', '2'], '10': ['8', '10'], '12': ['8', '4', '12'], '5': ['8', '5'], '4': ['8', '4'], '9': ['8', '9'], '14': ['8', '14'], '7': ['8', '7'], '22': ['8', '22'], '1': ['8', '5', '1']}, '1': {'1': ['1'], '12': ['1', '4', '12'], '5': ['1', '4', '5'], '4': ['1', '4'], '8': ['1', '4', '8'], '9': ['1', '4', '12', '9'], '2': ['1', '4', '2'], '10': ['1', '4', '8', '10'], '7': ['1', '4', '12', '7'], '22': ['1', '4', '8', '22'], '14': ['1', '4', '12', '14']}}
Suppose we want to find the shortest passing route from 'Keylor Navas Gamboa' (jersey number 1) to 'Cristiano Ronaldo dos Santos Aveiro' (jersey number 7). We will type the following:
print(dis_Real['1']['7'])
['1', '4', '12', '7']
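So the shortest route for the ball to travel from 'Keylor Navas Gamboa' (jersey: 1) to 'Cristiano Ronaldo dos Santos Aveiro' (jersey: 7) was for Navas to pass first to 'Sergio Ramos García' (jersey: 4), who would pass to 'Marcelo Vieira da Silva Júnior' (jersey: 12), who would ultimately pass to 'Cristiano Ronaldo dos Santos Aveiro'. Here "shortest" means the route with the smallest total 1/nop, i.e., the route built from the most frequently used passing links, not necessarily the one with the fewest passes. This seems like a good post-match analysis tool. I got this idea after discussing with Sarath Babu. Let us do the same for Liverpool:
dis_Liv = nx.shortest_path(G_Liv_mod, weight = 'weight')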
print(dis_Liv)
{'5': {'5': ['5'], '26': ['5', '26'], '7': ['5', '26', '7'], '14': ['5', '14'], '4': ['5', '4'], '11': ['5', '11'], '66': ['5', '26', '7', '66'], '9': ['5', '26', '9'], '1': ['5', '14', '1'], '6': ['5', '14', '6'], '19': ['5', '26', '7', '19']}, '26': {'26': ['26'], '5': ['26', '5'], '7': ['26', '7'], '14': ['26', '14'], '9': ['26', '9'], '4': ['26', '4'], '11': ['26', '9', '11'], '66': ['26', '7', '66'], '1': ['26', '14', '1'], '6': ['26', '14', '6'], '19': ['26', '7', '19']}, '7': {'7': ['7'], '26': ['7', '66', '5', '26'], '5': ['7', '66', '5'], '14': ['7', '14'], '9': ['7', '66', '9'], '4': ['7', '4'], '1': ['7', '1'], '11': ['7', '66', '11'], '66': ['7', '66'], '6': ['7', '14', '6'], '19': ['7', '19']}, '14': {'14': ['14'], '26': ['14', '5', '26'], '5': ['14', '5'], '7': ['14', '7'], '4': ['14', '4'], '1': ['14', '1'], '66': ['14', '7', '66'], '6': ['14', '6'], '19': ['14', '7', '19'], '11': ['14', '7', '66', '11'], '9': ['14', '5', '26', '9']}, '1': {'1': ['1'], '26': ['1', '26'], '14': ['1', '14'], '4': ['1', '6', '4'], '6': ['1', '6'], '7': ['1', '6', '7'], '11': ['1', '6', '66', '11'], '66': ['1', '6', '66'], '5': ['1', '6', '66', '5'], '9': ['1', '6', '66', '9'], '19': ['1', '6', '7', '19']}, '66': {'66': ['66'], '26': ['66', '5', '26'], '5': ['66', '5'], '14': ['66', '14'], '9': ['66', '9'], '11': ['66', '11'], '6': ['66', '14', '6'], '7': ['66', '14', '7'], '4': ['66', '5', '4'], '19': ['66', '11', '19'], '1': ['66', '14', '1']}, '4': {'4': ['4'], '26': ['4', '26'], '5': ['4', '26', '5'], '14': ['4', '26', '14'], '66': ['4', '6', '66'], '6': ['4', '6'], '7': ['4', '26', '7'], '9': ['4', '26', '9'], '1': ['4', '6', '1'], '11': ['4', '6', '66', '11'], '19': ['4', '26', '7', '19']}, '11': {'11': ['11'], '5': ['11', '66', '5'], '7': ['11', '9', '7'], '9': ['11', '9'], '4': ['11', '4'], '66': ['11', '66'], '19': ['11', '19'], '14': ['11', '9', '14'], '6': ['11', '9', '14', '6'], '26': ['11', '66', '5', '26'], '1': ['11', '9', '14', '1']}, '6': {'6': ['6'], 
'7': ['6', '7'], '14': ['6', '66', '14'], '4': ['6', '4'], '1': ['6', '1'], '11': ['6', '66', '11'], '66': ['6', '66'], '26': ['6', '4', '26'], '5': ['6', '66', '5'], '9': ['6', '66', '9'], '19': ['6', '7', '19']}, '9': {'9': ['9'], '7': ['9', '7'], '14': ['9', '14'], '11': ['9', '11'], '66': ['9', '11', '66'], '6': ['9', '14', '6'], '5': ['9', '14', '5'], '4': ['9', '14', '4'], '19': ['9', '7', '19'], '26': ['9', '14', '5', '26'], '1': ['9', '14', '1']}, '19': {'19': ['19'], '7': ['19', '7'], '14': ['19', '14'], '9': ['19', '9'], '11': ['19', '9', '11'], '66': ['19', '9', '11', '66'], '6': ['19', '14', '6'], '5': ['19', '14', '5'], '4': ['19', '14', '4'], '26': ['19', '14', '5', '26'], '1': ['19', '14', '1']}}
print(dis_Liv['1']['9'])
['1', '6', '66', '9']
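Why the reciprocal weight matters can be seen on a hypothetical three-player example (the jersey labels and pass counts below are invented). With weight = 1/n, the weighted shortest path prefers the busy two-pass route, while an unweighted search simply takes the fewest passes:

```python
import networkx as nx

# Hypothetical counts: 2->8 and 8->7 are busy links, 2->7 was used only once
passes = [("2", "8", 10), ("8", "7", 8), ("2", "7", 1)]

G_mod = nx.DiGraph()
for passer, receiver, n in passes:
    # Reciprocal weight: frequently used links become "short"
    G_mod.add_edge(passer, receiver, weight=1 / n)

path = nx.shortest_path(G_mod, source="2", target="7", weight="weight")
cost = nx.shortest_path_length(G_mod, source="2", target="7", weight="weight")
hops = nx.shortest_path(G_mod, source="2", target="7")  # ignores weights

print(path, cost)  # ['2', '8', '7'] with total weight 0.1 + 0.125 = 0.225
print(hops)        # ['2', '7']
```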
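The eccentricity of a node p tells us how far the furthest player node from p is positioned in the pass network. Since nx.eccentricity ignores edge weights by default, this distance is measured in number of passes (hops). Let us calculate the eccentricities for all the 11 nodes for Real Madrid:
E_Real = nx.eccentricity(G_Real_mod)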
print(E_Real)
{'14': 2, '2': 2, '7': 2, '22': 2, '9': 2, '10': 2, '12': 2, '5': 2, '4': 2, '8': 1, '1': 2}
av_E_Real = sum(list(E_Real.values()))/len(E_Real)
print(av_E_Real)
1.9090909090909092
And for Liverpool:
E_Liv = nx.eccentricity(G_Liv_mod)
print(E_Liv)
{'5': 2, '26': 2, '7': 1, '14': 2, '1': 2, '66': 2, '4': 2, '11': 2, '6': 2, '9': 2, '19': 2}
av_E_Liv = sum(list(E_Liv.values()))/len(E_Liv)
print(av_E_Liv)
1.9090909090909092
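As a sanity check on the definition, the sketch below computes eccentricities on a small, hypothetical, strongly connected pass network (in a DiGraph, nx.eccentricity requires every node to be reachable from every other node) and compares them with the maximum unweighted shortest-path length from each node:

```python
import networkx as nx

# Hypothetical strongly connected toy pass network
G = nx.DiGraph([("1", "4"), ("4", "8"), ("8", "1"), ("8", "4"), ("4", "1")])

# Eccentricity by definition: the largest number of passes (hops) needed
# to reach any teammate, using unweighted shortest path lengths
ecc_manual = {
    n: max(nx.single_source_shortest_path_length(G, n).values()) for n in G
}
ecc_nx = nx.eccentricity(G)

print(ecc_manual)  # {'1': 2, '4': 1, '8': 1}
print(ecc_nx)      # {'1': 2, '4': 1, '8': 1}
```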
Next, let us compute the average clustering coefficient of G_Real (note that this should be the original graph, not the modified version):
cc_Real = nx.average_clustering(G_Real, weight = 'weight')
print(cc_Real)
0.182334851979709
And for Liverpool:
cc_Liv = nx.average_clustering(G_Liv, weight = 'weight')
print(cc_Liv)
0.27664278424505534
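The clustering coefficient measures how often a player's passing partners also pass among themselves, i.e., how many passing triangles are closed. Two hypothetical extremes illustrate the range. This sketch uses the unweighted version; with weight='weight', networkx additionally scales each triangle by the edge weights normalized by the largest weight.

```python
import networkx as nx

# Three players who all pass to one another: every triangle is closed
closed = nx.complete_graph(3, create_using=nx.DiGraph)
# Three players in a chain A -> B -> C: no triangle at all
chain = nx.DiGraph([("A", "B"), ("B", "C")])

avg_closed = nx.average_clustering(closed)
avg_chain = nx.average_clustering(chain)
print(avg_closed)  # 1.0
print(avg_chain)   # 0.0
```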
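We get a lower average clustering coefficient for Real Madrid's pass network, suggesting that Real Madrid's players formed fewer closed passing triangles among themselves than Liverpool's players did.

Finally, let us compute the centrality (especially the betweenness centrality) of each node in either team's pass network and find out which player was the most important in his team's pass network. Keep in mind that nx.betweenness_centrality interprets the 'weight' attribute as a distance, so with the number of passes as the weight, frequently used passing links count as long; the modified graphs G_Real_mod and G_Liv_mod (weight = 1/nop) would be the consistent choice if busy links should count as short. For Real Madrid:
bc_Real = nx.betweenness_centrality(G_Real, weight = 'weight')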
print(bc_Real)
{'14': 0.15222222222222223, '2': 0.10685185185185186, '7': 0.05592592592592593, '22': 0.0, '9': 0.14462962962962964, '10': 0.12407407407407407, '12': 0.009259259259259259, '5': 0.007407407407407408, '4': 0.06851851851851852, '8': 0.031481481481481485, '1': 0.11703703703703704}
max_bc_Real = max(bc_Real, key = bc_Real.get)
print(max_bc_Real)
14
And for Liverpool:
bc_Liv = nx.betweenness_centrality(G_Liv, weight = 'weight')
print(bc_Liv)
max_bc_Liv = max(bc_Liv, key = bc_Liv.get)
print(max_bc_Liv)
{'5': 0.06296296296296296, '26': 0.016666666666666666, '7': 0.2453703703703704, '14': 0.12407407407407407, '1': 0.002777777777777778, '66': 0.075, '4': 0.07222222222222222, '11': 0.05555555555555556, '6': 0.1259259259259259, '9': 0.021296296296296296, '19': 0.03888888888888889}
7
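Because nx.betweenness_centrality treats 'weight' as a distance, the choice between raw pass counts and reciprocal weights can change who looks central. A minimal sketch with hypothetical pass counts, where B relays a busy route from A to C:

```python
import networkx as nx

# Hypothetical pass counts: A->B and B->C are busy, A->C was used once
counts = [("A", "B", 10), ("B", "C", 10), ("A", "C", 1)]

G_counts = nx.DiGraph()
G_recip = nx.DiGraph()
for u, v, n in counts:
    G_counts.add_edge(u, v, weight=n)     # weight = number of passes
    G_recip.add_edge(u, v, weight=1 / n)  # weight = 1/number of passes

# networkx uses 'weight' as a distance when searching for shortest paths
bc_counts = nx.betweenness_centrality(G_counts, weight="weight")
bc_recip = nx.betweenness_centrality(G_recip, weight="weight")
print(bc_counts["B"], bc_recip["B"])  # 0.0 0.5
```

With raw counts, the single direct pass from A to C is "shorter" (1) than the busy route via B (20), so B gets zero betweenness; with reciprocal weights the route via B costs 0.2 against 1.0, and B becomes the relay.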
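Hence, the players with the highest betweenness centrality are 'Carlos Henrique Casimiro' (jersey: 14) from Real Madrid and 'James Philip Milner' (jersey: 7) from Liverpool. We have been able to compute some interesting results using complex network analysis on our pass networks. This completes the pass network part of my presentation. 😌😌😌😌😌😌😌😌😌

Next, we move on to pass maps and heat maps. Let us first take a look at the events dataset:
events.head(12)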
| | 50_50 | ball_receipt_outcome | ball_recovery_recovery_failure | block_offensive | carry_end_location | clearance_aerial_won | clearance_body_part | clearance_head | clearance_left_foot | clearance_right_foot | ... | shot_statsbomb_xg | shot_technique | shot_type | substitution_outcome | substitution_replacement | tactics | team | timestamp | type | under_pressure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i... | Real Madrid | 00:00:00.000 | Starting XI | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | {'formation': 433, 'lineup': [{'player': {'id'... | Liverpool | 00:00:00.000 | Starting XI | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:00.000 | Half Start | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.000 | Half Start | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:00.000 | Half Start | NaN |
| 6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:00.371 | Pass | NaN |
| 7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Liverpool | 00:00:03.275 | Pass | NaN |
| 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:08.236 | Pass | NaN |
| 9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:10.701 | Pass | True |
| 10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:11.728 | Pass | NaN |
| 11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Real Madrid | 00:00:15.994 | Pass | NaN |
12 rows × 86 columns
For the pass maps, we only need the following columns from the events dataset: 'team', 'type', 'minute', 'location', 'pass_end_location', 'pass_outcome', and 'player'.
events_pass = events[['team', 'type', 'minute', 'location',
'pass_end_location', 'pass_outcome', 'player']]
events_pass.head(10)
| | team | type | minute | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|
| 0 | Real Madrid | Starting XI | 0 | NaN | NaN | NaN | NaN |
| 1 | Liverpool | Starting XI | 0 | NaN | NaN | NaN | NaN |
| 2 | Real Madrid | Half Start | 0 | NaN | NaN | NaN | NaN |
| 3 | Liverpool | Half Start | 0 | NaN | NaN | NaN | NaN |
| 4 | Liverpool | Half Start | 45 | NaN | NaN | NaN | NaN |
| 5 | Real Madrid | Half Start | 45 | NaN | NaN | NaN | NaN |
| 6 | Liverpool | Pass | 0 | [60.0, 40.0] | [32.1, 41.2] | NaN | James Philip Milner |
| 7 | Liverpool | Pass | 0 | [35.0, 40.8] | [92.7, 22.7] | Incomplete | Dejan Lovren |
| 8 | Real Madrid | Pass | 0 | [27.4, 60.2] | [36.1, 71.6] | NaN | Raphaël Varane |
| 9 | Real Madrid | Pass | 0 | [35.3, 75.4] | [22.4, 76.6] | NaN | Luka Modrić |
The player column gives us the names of the players who were associated with different events during the match. Suppose we are only interested in generating the pass map and its corresponding heat map for a particular player, for example, 'Toni Kroos'. For that, we have to clean the events_pass dataset so that we keep only those rows where player == 'Toni Kroos'. Be very careful to use the exact spelling when performing these string comparisons: a misspelled name raises no error, it simply matches no rows, and the resulting empty dataframe leads to confusing errors or empty plots later on. Before filtering, let us collect the names of all the players who were involved in this match.
players = events_pass.player.unique()
print(players)
[nan 'James Philip Milner' 'Dejan Lovren' 'Raphaël Varane' 'Luka Modrić' 'Daniel Carvajal Ramos' 'Carlos Henrique Casimiro' 'Jordan Brian Henderson' 'Sadio Mané' 'Roberto Firmino Barbosa de Oliveira' 'Mohamed Salah' 'Sergio Ramos García' 'Marcelo Vieira da Silva Júnior' 'Toni Kroos' 'Cristiano Ronaldo dos Santos Aveiro' 'Karim Benzema' 'Trent Alexander-Arnold' 'Keylor Navas Gamboa' 'Francisco Román Alarcón Suárez' 'Virgil van Dijk' 'Andrew Robertson' 'Georginio Wijnaldum' 'Loris Karius' 'Adam David Lallana' 'José Ignacio Fernández Iglesias' 'Gareth Frank Bale' 'Emre Can' 'Marco Asensio Willemsen']
The name must match the entry in the data exactly ('Toni Kroos' in our case). One good practice is to simply copy the particular player name from the players list that we just generated and use it according to our needs. This way, spelling errors can be avoided. Filtering with pandas is straightforward:
events_pass_p1 = events_pass[events_pass['player'] == 'Toni Kroos']
events_pass_p1.head(10)
| | team | type | minute | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|
| 19 | Real Madrid | Pass | 0 | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos |
| 28 | Real Madrid | Pass | 1 | [23.4, 18.6] | [14.9, 26.8] | NaN | Toni Kroos |
| 79 | Real Madrid | Pass | 5 | [35.0, 24.9] | [57.1, 6.6] | NaN | Toni Kroos |
| 85 | Real Madrid | Pass | 6 | [41.7, 21.7] | [43.2, 41.2] | NaN | Toni Kroos |
| 89 | Real Madrid | Pass | 6 | [50.6, 28.3] | [49.2, 5.5] | NaN | Toni Kroos |
| 106 | Real Madrid | Pass | 7 | [42.2, 11.1] | [50.6, 13.4] | NaN | Toni Kroos |
| 125 | Real Madrid | Pass | 9 | [48.7, 53.1] | [50.1, 63.3] | NaN | Toni Kroos |
| 126 | Real Madrid | Pass | 9 | [56.7, 59.6] | [48.8, 30.9] | NaN | Toni Kroos |
| 128 | Real Madrid | Pass | 9 | [56.4, 15.2] | [48.8, 28.2] | NaN | Toni Kroos |
| 138 | Real Madrid | Pass | 10 | [42.9, 9.4] | [28.8, 39.2] | NaN | Toni Kroos |
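Since a misspelled name silently returns an empty dataframe, one option is to wrap the filter in a small check. The helper filter_player below is hypothetical (it is not part of pandas or statsbombpy); it uses difflib from the Python standard library to suggest close matches, and a toy dataframe stands in for events_pass:

```python
import difflib

import pandas as pd


def filter_player(df, name):
    """Hypothetical helper: rows of df whose 'player' equals name exactly.

    Raises ValueError with close-match suggestions if name is not found.
    """
    known = df["player"].dropna().unique().tolist()
    if name not in known:
        suggestions = difflib.get_close_matches(name, known, n=3)
        raise ValueError(f"{name!r} not found; did you mean {suggestions}?")
    return df[df["player"] == name]


# Toy stand-in for events_pass
toy = pd.DataFrame({"player": ["Toni Kroos", "Luka Modrić", None, "Toni Kroos"]})
print(len(filter_player(toy, "Toni Kroos")))  # 2

try:
    filter_player(toy, "Toni Kross")  # deliberate typo
except ValueError as err:
    msg = str(err)
    print(msg)
```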
The type column in events_pass_p1 has event types other than passes, which we do not want for now. Thus, we have to clean the dataset again so that we keep only those rows where type == 'Pass'; the other rows can be discarded for now. Before that, let us analyse what event types other than 'Pass' are available for 'Toni Kroos':
print(events_pass_p1.type.unique())
['Pass' 'Ball Receipt*' 'Carry' 'Ball Recovery' 'Pressure' 'Foul Won' 'Foul Committed' 'Dispossessed' 'Duel' 'Dribbled Past' 'Block']
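Let us keep only the passes and reset the index so that it starts from 0 (the old index is kept in the new index column):
events_pass_p1 = events_pass_p1[events_pass_p1['type'] == 'Pass'].reset_index()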
events_pass_p1
| | index | team | type | minute | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 0 | 19 | Real Madrid | Pass | 0 | [48.8, 13.9] | [36.1, 56.3] | NaN | Toni Kroos |
| 1 | 28 | Real Madrid | Pass | 1 | [23.4, 18.6] | [14.9, 26.8] | NaN | Toni Kroos |
| 2 | 79 | Real Madrid | Pass | 5 | [35.0, 24.9] | [57.1, 6.6] | NaN | Toni Kroos |
| 3 | 85 | Real Madrid | Pass | 6 | [41.7, 21.7] | [43.2, 41.2] | NaN | Toni Kroos |
| 4 | 89 | Real Madrid | Pass | 6 | [50.6, 28.3] | [49.2, 5.5] | NaN | Toni Kroos |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 87 | 975 | Real Madrid | Pass | 85 | [120.0, 80.0] | [116.1, 76.6] | NaN | Toni Kroos |
| 88 | 976 | Real Madrid | Pass | 85 | [120.0, 80.0] | [115.9, 77.3] | NaN | Toni Kroos |
| 89 | 978 | Real Madrid | Pass | 85 | [96.8, 73.1] | [75.4, 74.5] | NaN | Toni Kroos |
| 90 | 1026 | Real Madrid | Pass | 91 | [120.0, 0.1] | [91.5, 8.1] | NaN | Toni Kroos |
| 91 | 1038 | Real Madrid | Pass | 92 | [56.9, 41.5] | [84.8, 71.3] | NaN | Toni Kroos |
92 rows × 8 columns
We now have all the passes made by 'Toni Kroos' in the match: 'Toni Kroos' attempted 92 passes. We will later work out his pass success rate. But look at the number. Isn't he a brilliant midfielder for the German national team and Real Madrid to have at their disposal? What a playmaker he is! Let us find out what all his pass outcomes were:
print(events_pass_p1.pass_outcome.unique())
[nan 'Out' 'Incomplete' 'Pass Offside']
Apart from nan, 'Toni Kroos' has Out, Incomplete and Pass Offside as pass outcomes; a missing (nan) pass_outcome marks a completed pass. If we look closely, the events_pass_p1 dataframe has the minute column, which tells us in which minute Kroos made the pass. It also has the location and the pass_end_location columns, giving the coordinates of Kroos when he passed the ball and the coordinates of where the ball ended up after the pass (successful or unsuccessful). Let us manipulate the pass_outcome column by replacing all the nan values with 'Successful' with the help of the fillna() function provided by pandas. This is the simplest way to handle nan values.
events_pass_p1['pass_outcome'] = events_pass_p1['pass_outcome'].fillna('Successful')
events_pass_p1
| | index | team | type | minute | location | pass_end_location | pass_outcome | player |
|---|---|---|---|---|---|---|---|---|
| 0 | 19 | Real Madrid | Pass | 0 | [48.8, 13.9] | [36.1, 56.3] | Successful | Toni Kroos |
| 1 | 28 | Real Madrid | Pass | 1 | [23.4, 18.6] | [14.9, 26.8] | Successful | Toni Kroos |
| 2 | 79 | Real Madrid | Pass | 5 | [35.0, 24.9] | [57.1, 6.6] | Successful | Toni Kroos |
| 3 | 85 | Real Madrid | Pass | 6 | [41.7, 21.7] | [43.2, 41.2] | Successful | Toni Kroos |
| 4 | 89 | Real Madrid | Pass | 6 | [50.6, 28.3] | [49.2, 5.5] | Successful | Toni Kroos |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 87 | 975 | Real Madrid | Pass | 85 | [120.0, 80.0] | [116.1, 76.6] | Successful | Toni Kroos |
| 88 | 976 | Real Madrid | Pass | 85 | [120.0, 80.0] | [115.9, 77.3] | Successful | Toni Kroos |
| 89 | 978 | Real Madrid | Pass | 85 | [96.8, 73.1] | [75.4, 74.5] | Successful | Toni Kroos |
| 90 | 1026 | Real Madrid | Pass | 91 | [120.0, 0.1] | [91.5, 8.1] | Successful | Toni Kroos |
| 91 | 1038 | Real Madrid | Pass | 92 | [56.9, 41.5] | [84.8, 71.3] | Successful | Toni Kroos |
92 rows × 8 columns
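The same two steps, filling the missing outcomes and then turning them into shares, can be sketched on a hypothetical handful of pass outcomes, assuming (as above) that a missing pass_outcome marks a completed pass:

```python
import numpy as np
import pandas as pd

# Hypothetical pass outcomes; NaN stands for a completed pass
outcomes = pd.Series([np.nan, "Incomplete", np.nan, np.nan, "Out"], name="pass_outcome")
outcomes = outcomes.fillna("Successful")

# Percentage of each outcome, and the completion rate on its own
share = outcomes.value_counts(normalize=True).mul(100)
completion_rate = (outcomes == "Successful").mean() * 100
print(share)            # Successful 60.0, Incomplete 20.0, Out 20.0
print(completion_rate)  # 60.0
```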
Loc = events_pass_p1['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['location_x', 'location_y'])
Loc_end = events_pass_p1['pass_end_location']
Loc_end = pd.DataFrame(Loc_end.to_list(), columns=['pass_end_location_x', 'pass_end_location_y'])
events_pass_p1['location_x'] = Loc['location_x']
events_pass_p1['location_y'] = Loc['location_y']
events_pass_p1['pass_end_location_x'] = Loc_end['pass_end_location_x']
events_pass_p1['pass_end_location_y'] = Loc_end['pass_end_location_y']
events_pass_p1 = events_pass_p1[['minute', 'location_x', 'location_y',
'pass_end_location_x', 'pass_end_location_y', 'pass_outcome']]
events_pass_p1.head(8)
| | minute | location_x | location_y | pass_end_location_x | pass_end_location_y | pass_outcome |
|---|---|---|---|---|---|---|
| 0 | 0 | 48.8 | 13.9 | 36.1 | 56.3 | Successful |
| 1 | 1 | 23.4 | 18.6 | 14.9 | 26.8 | Successful |
| 2 | 5 | 35.0 | 24.9 | 57.1 | 6.6 | Successful |
| 3 | 6 | 41.7 | 21.7 | 43.2 | 41.2 | Successful |
| 4 | 6 | 50.6 | 28.3 | 49.2 | 5.5 | Successful |
| 5 | 7 | 42.2 | 11.1 | 50.6 | 13.4 | Successful |
| 6 | 9 | 48.7 | 53.1 | 50.1 | 63.3 | Successful |
| 7 | 9 | 56.7 | 59.6 | 48.8 | 30.9 | Successful |
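The splitting code above relies on events_pass_p1 having a fresh 0-based index after reset_index(), because the new Loc and Loc_end dataframes get a default index. A minimal sketch of a variant that reuses the original index, so the coordinates stay aligned even without resetting it (toy rows taken from two of Kroos's passes in the tables above):

```python
import pandas as pd

# Two passes with their original, non-consecutive event indices
df = pd.DataFrame(
    {"location": [[48.8, 13.9], [23.4, 18.6]],
     "pass_end_location": [[36.1, 56.3], [14.9, 26.8]]},
    index=[19, 28],
)

# Build the x/y columns with index=df.index, then join on that index
start = pd.DataFrame(df["location"].tolist(), index=df.index,
                     columns=["location_x", "location_y"])
end = pd.DataFrame(df["pass_end_location"].tolist(), index=df.index,
                   columns=["pass_end_location_x", "pass_end_location_y"])
df = df.join(start).join(end)

print(df[["location_x", "location_y", "pass_end_location_x", "pass_end_location_y"]])
```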
Now let us draw the pass map of Toni Kroos on a football pitch and also visualize its corresponding heat map.
pitch = Pitch(pitch_color = 'black', line_color = 'white', constrained_layout = True, tight_layout = False, goal_type = 'box')
fig, ax = pitch.draw()
# Heat map code
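# x and y are passed as keywords (required by recent seaborn versions); ax=ax draws on the pitch
res = sns.kdeplot(x = events_pass_p1['location_x'], y = events_pass_p1['location_y'], ax = ax, fill = True,
                  thresh = 0.05, alpha = 0.5, levels = 10, cmap = 'Purples_d')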
# Pass map code
for i in range(len(events_pass_p1)):
    # green arrows and dots for successful passes, red for unsuccessful ones
    if events_pass_p1.pass_outcome[i] == 'Successful':
        pitch.arrows(events_pass_p1.location_x[i], events_pass_p1.location_y[i], events_pass_p1.pass_end_location_x[i],
                     events_pass_p1.pass_end_location_y[i], ax=ax, color='green', width = 3)
        pitch.scatter(events_pass_p1.location_x[i], events_pass_p1.location_y[i], ax = ax, color = 'green')
    else:
        pitch.arrows(events_pass_p1.location_x[i], events_pass_p1.location_y[i], events_pass_p1.pass_end_location_x[i],
                     events_pass_p1.pass_end_location_y[i], ax=ax, color='red', width=3)
        pitch.scatter(events_pass_p1.location_x[i], events_pass_p1.location_y[i], ax = ax, color='red')
plt.title("Toni Kroos pass and heat map")
In the kdeplot() function, the thresh value sets the lowest iso-proportion level at which contour lines are drawn, levels sets the number of contour levels, fill sets whether to fill the area between the contours, alpha sets the transparency of the plot (the default value is 1; values below 1 make the plot more transparent), and cmap sets the color map. To study more about kdeplot(), see the seaborn documentation for seaborn.kdeplot. Finally, for 'Toni Kroos' let us calculate the percentage of successful and unsuccessful passes.
events_pass_p1['pass_outcome'].value_counts(normalize=True).mul(100)
Successful      91.304348
Incomplete       6.521739
Out              1.086957
Pass Offside     1.086957
Name: pass_outcome, dtype: float64
events_pass_p1['pass_outcome'].value_counts(normalize=True).mul(100).plot.bar()
So 'Kroos' completed around 91.3% of his passes. Wild!

Next, we turn to convex hulls. For a set of points X, the convex hull is the smallest convex set that contains X. Applied to the locations of a player's events, it gives us an idea of the area of the pitch that the player covered during the match. We will use the scipy package, which provides a collection of modules for scientific computation with Python. In particular, we need the scipy.spatial module, which lets us work with spatial algorithms and data structures. As we are going to work with convex hulls first, let us import the ConvexHull class from scipy.spatial:
from scipy.spatial import ConvexHull
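A minimal sketch of ConvexHull on toy points (a 4 by 3 rectangle plus one interior point). In two dimensions, scipy reports the enclosed area as .volume and the perimeter as .area, which is easy to mix up:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Toy (x, y) locations: four corners of a 4 x 3 rectangle and one interior point
points = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [2, 1]], dtype=float)
hull = ConvexHull(points)

print(sorted(hull.vertices.tolist()))  # indices of the hull corners; the interior point is excluded
print(hull.volume)                     # in 2-D, .volume is the enclosed area (4 * 3)
print(hull.area)                       # in 2-D, .area is the perimeter (2 * (4 + 3))
```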
To build the convex hulls, we keep the following columns from the events dataset:
events_hull = events[['team', 'location', 'type', 'player']]
events_hull.head(10)
| | team | location | type | player |
|---|---|---|---|---|
| 0 | Real Madrid | NaN | Starting XI | NaN |
| 1 | Liverpool | NaN | Starting XI | NaN |
| 2 | Real Madrid | NaN | Half Start | NaN |
| 3 | Liverpool | NaN | Half Start | NaN |
| 4 | Liverpool | NaN | Half Start | NaN |
| 5 | Real Madrid | NaN | Half Start | NaN |
| 6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner |
| 7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren |
| 8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane |
| 9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić |
We only keep the rows whose type is Pass or Shot:
events_hull = events_hull[(events_hull['type'] == 'Pass') | (events_hull['type'] == 'Shot')].reset_index()
events_hull.head(10)
| | index | team | location | type | player |
|---|---|---|---|---|---|
| 0 | 6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner |
| 1 | 7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren |
| 2 | 8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane |
| 3 | 9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić |
| 4 | 10 | Real Madrid | [22.3, 76.6] | Pass | Daniel Carvajal Ramos |
| 5 | 11 | Real Madrid | [36.2, 75.3] | Pass | Carlos Henrique Casimiro |
| 6 | 12 | Liverpool | [76.5, 18.1] | Pass | Jordan Brian Henderson |
| 7 | 13 | Liverpool | [84.4, 10.0] | Pass | Sadio Mané |
| 8 | 14 | Liverpool | [91.6, 21.3] | Pass | Roberto Firmino Barbosa de Oliveira |
| 9 | 15 | Liverpool | [92.2, 50.9] | Pass | Mohamed Salah |
Next, we split the location column into location_x and location_y columns:
Loc = events_hull['location']
Loc = pd.DataFrame(Loc.to_list(), columns=['location_x', 'location_y'])
events_hull['location_x'] = Loc['location_x']
events_hull['location_y'] = Loc['location_y']
events_hull.head(10)
| | index | team | location | type | player | location_x | location_y |
|---|---|---|---|---|---|---|---|
| 0 | 6 | Liverpool | [60.0, 40.0] | Pass | James Philip Milner | 60.0 | 40.0 |
| 1 | 7 | Liverpool | [35.0, 40.8] | Pass | Dejan Lovren | 35.0 | 40.8 |
| 2 | 8 | Real Madrid | [27.4, 60.2] | Pass | Raphaël Varane | 27.4 | 60.2 |
| 3 | 9 | Real Madrid | [35.3, 75.4] | Pass | Luka Modrić | 35.3 | 75.4 |
| 4 | 10 | Real Madrid | [22.3, 76.6] | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
| 5 | 11 | Real Madrid | [36.2, 75.3] | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
| 6 | 12 | Liverpool | [76.5, 18.1] | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
| 7 | 13 | Liverpool | [84.4, 10.0] | Pass | Sadio Mané | 84.4 | 10.0 |
| 8 | 14 | Liverpool | [91.6, 21.3] | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
| 9 | 15 | Liverpool | [92.2, 50.9] | Pass | Mohamed Salah | 92.2 | 50.9 |
We no longer need the index and location columns, so let us drop them:
events_hull = events_hull[['team', 'type', 'player', 'location_x', 'location_y']]
events_hull.head(10)
| | team | type | player | location_x | location_y |
|---|---|---|---|---|---|
| 0 | Liverpool | Pass | James Philip Milner | 60.0 | 40.0 |
| 1 | Liverpool | Pass | Dejan Lovren | 35.0 | 40.8 |
| 2 | Real Madrid | Pass | Raphaël Varane | 27.4 | 60.2 |
| 3 | Real Madrid | Pass | Luka Modrić | 35.3 | 75.4 |
| 4 | Real Madrid | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
| 5 | Real Madrid | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
| 6 | Liverpool | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
| 7 | Liverpool | Pass | Sadio Mané | 84.4 | 10.0 |
| 8 | Liverpool | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
| 9 | Liverpool | Pass | Mohamed Salah | 92.2 | 50.9 |
We then split events_hull into two datasets, one for Real Madrid and the other for Liverpool:
events_hull_Real = events_hull[events_hull['team'] == 'Real Madrid'].reset_index()
events_hull_Liv = events_hull[events_hull['team'] == 'Liverpool'].reset_index()
events_hull_Real.head(5)
|  | index | team | type | player | location_x | location_y |
|---|---|---|---|---|---|---|
| 0 | 2 | Real Madrid | Pass | Raphaël Varane | 27.4 | 60.2 |
| 1 | 3 | Real Madrid | Pass | Luka Modrić | 35.3 | 75.4 |
| 2 | 4 | Real Madrid | Pass | Daniel Carvajal Ramos | 22.3 | 76.6 |
| 3 | 5 | Real Madrid | Pass | Carlos Henrique Casimiro | 36.2 | 75.3 |
| 4 | 10 | Real Madrid | Pass | Sergio Ramos García | 14.7 | 23.2 |
events_hull_Liv.head(5)
|  | index | team | type | player | location_x | location_y |
|---|---|---|---|---|---|---|
| 0 | 0 | Liverpool | Pass | James Philip Milner | 60.0 | 40.0 |
| 1 | 1 | Liverpool | Pass | Dejan Lovren | 35.0 | 40.8 |
| 2 | 6 | Liverpool | Pass | Jordan Brian Henderson | 76.5 | 18.1 |
| 3 | 7 | Liverpool | Pass | Sadio Mané | 84.4 | 10.0 |
| 4 | 8 | Liverpool | Pass | Roberto Firmino Barbosa de Oliveira | 91.6 | 21.3 |
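When both team subsets are needed, groupby can build them in a single pass. Below is a minimal sketch on a few rows copied from the tables above. The dictionary name by_team is hypothetical. Unlike the plain reset_index() used above, reset_index(drop=True) discards the old index instead of keeping it as an extra index column.

```python
import pandas as pd

# Toy stand-in for events_hull (rows copied from the tables above)
events = pd.DataFrame({
    "team": ["Liverpool", "Liverpool", "Real Madrid", "Real Madrid", "Liverpool"],
    "player": ["James Philip Milner", "Dejan Lovren", "Raphaël Varane",
               "Luka Modrić", "Jordan Brian Henderson"],
    "location_x": [60.0, 35.0, 27.4, 35.3, 76.5],
    "location_y": [40.0, 40.8, 60.2, 75.4, 18.1],
})

# One pass over the data: team name -> that team's events, renumbered 0..n-1.
# groupby keeps the original row order inside each group.
by_team = {team: group.reset_index(drop=True)
           for team, group in events.groupby("team")}

print(sorted(by_team))
print(by_team["Real Madrid"])
```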
players_Real = events_hull_Real.player.unique()
players_Liv = events_hull_Liv.player.unique()
print(players_Real)
print(players_Liv)
['Raphaël Varane' 'Luka Modrić' 'Daniel Carvajal Ramos' 'Carlos Henrique Casimiro' 'Sergio Ramos García' 'Marcelo Vieira da Silva Júnior' 'Toni Kroos' 'Cristiano Ronaldo dos Santos Aveiro' 'Karim Benzema' 'Keylor Navas Gamboa' 'Francisco Román Alarcón Suárez' 'José Ignacio Fernández Iglesias' 'Gareth Frank Bale' 'Marco Asensio Willemsen']
['James Philip Milner' 'Dejan Lovren' 'Jordan Brian Henderson' 'Sadio Mané' 'Roberto Firmino Barbosa de Oliveira' 'Mohamed Salah' 'Trent Alexander-Arnold' 'Virgil van Dijk' 'Andrew Robertson' 'Georginio Wijnaldum' 'Loris Karius' 'Adam David Lallana' 'Emre Can']
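Let us now filter the passes made by Toni Kroos from events_hull_Real:
# .copy() gives an independent DataFrame, so later in-place changes do not trigger SettingWithCopyWarning
events_hull_Toni = events_hull_Real[events_hull_Real['player'] == 'Toni Kroos'].copy()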
events_hull_Toni
|  | index | team | type | player | location_x | location_y |
|---|---|---|---|---|---|---|
| 7 | 13 | Real Madrid | Pass | Toni Kroos | 48.8 | 13.9 |
| 15 | 22 | Real Madrid | Pass | Toni Kroos | 23.4 | 18.6 |
| 30 | 73 | Real Madrid | Pass | Toni Kroos | 35.0 | 24.9 |
| 36 | 79 | Real Madrid | Pass | Toni Kroos | 41.7 | 21.7 |
| 40 | 83 | Real Madrid | Pass | Toni Kroos | 50.6 | 28.3 |
| ... | ... | ... | ... | ... | ... | ... |
| 638 | 969 | Real Madrid | Pass | Toni Kroos | 120.0 | 80.0 |
| 639 | 970 | Real Madrid | Pass | Toni Kroos | 120.0 | 80.0 |
| 641 | 972 | Real Madrid | Pass | Toni Kroos | 96.8 | 73.1 |
| 666 | 1020 | Real Madrid | Pass | Toni Kroos | 120.0 | 0.1 |
| 672 | 1032 | Real Madrid | Pass | Toni Kroos | 56.9 | 41.5 |
92 rows × 6 columns
To remove outliers, we calculate the interquartile range (IQR) of location_x and location_y from events_hull_Toni and then compute the lower and upper bounds of the data, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Any point lying beyond these bounds, i.e. any point lying below the lower bound or above the upper bound, is considered an outlier and discarded. We use box-and-whisker plots to visualize the interquartile range of the data points:
import matplotlib.pyplot as plt
import seaborn as sns
e_box = events_hull_Toni[["location_x", "location_y"]]
long_form = pd.melt(e_box)  # long format: a 'variable' column (location_x / location_y) and a 'value' column
ax = sns.boxplot(x="variable", y="value", data=long_form, order=["location_x", "location_y"])
sns.stripplot(x="variable", y="value", data=long_form, marker="o", color="red",
              order=["location_x", "location_y"], ax=ax)  # overlay the individual passes
ax.set_title("Boxplot for Toni Kroos's pass locations")
plt.show()
# method= replaces the deprecated interpolation= keyword (NumPy >= 1.22)
Q1 = np.percentile(events_hull_Toni['location_x'], 25, method='midpoint')
Q3 = np.percentile(events_hull_Toni['location_x'], 75, method='midpoint')
IQR_x = Q3 - Q1
minimum_x = Q1 - 1.5*IQR_x
maximum_x = Q3 + 1.5*IQR_x
Q1, Q3, IQR_x, minimum_x, maximum_x
(47.400000000000006, 67.85, 20.44999999999999, 16.725000000000023, 98.52499999999998)
Q1 = np.percentile(events_hull_Toni['location_y'], 25, method='midpoint')
Q3 = np.percentile(events_hull_Toni['location_y'], 75, method='midpoint')
IQR_y = Q3 - Q1
minimum_y = Q1 - 1.5*IQR_y
maximum_y = Q3 + 1.5*IQR_y
Q1, Q3, IQR_y, minimum_y, maximum_y
(15.0, 41.8, 26.799999999999997, -25.199999999999996, 82.0)
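Written out, the fences computed by the two cells above are as follows, with the numbers taken from their printed outputs:

$$
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1, \qquad \text{lower} = Q_1 - 1.5\,\mathrm{IQR}, \qquad \text{upper} = Q_3 + 1.5\,\mathrm{IQR},\\
x:\quad \mathrm{IQR}_x &= 67.85 - 47.4 = 20.45, \quad \text{lower}_x = 47.4 - 30.675 = 16.725, \quad \text{upper}_x = 67.85 + 30.675 = 98.525,\\
y:\quad \mathrm{IQR}_y &= 41.8 - 15.0 = 26.8, \quad \text{lower}_y = 15.0 - 40.2 = -25.2, \quad \text{upper}_y = 41.8 + 40.2 = 82.0.
\end{aligned}
$$

Since StatsBomb y coordinates run from 0 to 80 across the pitch, both y fences lie outside the pitch, so no pass can be an outlier in location_y here; only location_x can flag outliers in this match.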
# Keep a pass only if both coordinates lie inside their IQR fences; a point outside
# the fences in either coordinate is an outlier. A boolean mask is used because
# np.where returns positions, whereas drop() expects index labels.
inside = (events_hull_Toni['location_x'].between(minimum_x, maximum_x)
          & events_hull_Toni['location_y'].between(minimum_y, maximum_y))
events_hull_Toni = events_hull_Toni[inside]
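The outlier step can be wrapped in a small reusable function. Below is a minimal sketch, assuming the same 1.5 × IQR rule and midpoint percentiles as above (the method= keyword needs NumPy 1.22 or later). The function name drop_iqr_outliers is hypothetical, and the six toy rows, including a pass from (120.0, 80.0), are copied from the events_hull_Toni table. A row is dropped when either coordinate falls outside its fences. A boolean mask is used, so the result does not depend on the index being 0, 1, 2, ...

```python
import numpy as np
import pandas as pd


def drop_iqr_outliers(df, columns, k=1.5):
    """Return the rows of df whose values lie inside [Q1 - k*IQR, Q3 + k*IQR]
    for every column in `columns` (midpoint percentiles)."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = np.percentile(df[col], [25, 75], method="midpoint")
        iqr = q3 - q1
        # between() is inclusive, so points exactly on a fence are kept
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


# Rows copied from the events_hull_Toni table (original index labels kept)
passes = pd.DataFrame(
    {
        "location_x": [48.8, 23.4, 35.0, 41.7, 50.6, 120.0],
        "location_y": [13.9, 18.6, 24.9, 21.7, 28.3, 80.0],
    },
    index=[7, 15, 30, 36, 40, 638],
)

clean = drop_iqr_outliers(passes, ["location_x", "location_y"])
print(clean.index.tolist())
```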
Finally, we reset the index of the events_hull_Toni dataset and keep only the relevant columns:
events_hull_Toni = events_hull_Toni.reset_index()
events_hull_Toni = events_hull_Toni[['team', 'type', 'player', 'location_x', 'location_y']]
events_hull_Toni.head(10)
|  | team | type | player | location_x | location_y |
|---|---|---|---|---|---|
| 0 | Real Madrid | Pass | Toni Kroos | 48.8 | 13.9 |
| 1 | Real Madrid | Pass | Toni Kroos | 23.4 | 18.6 |
| 2 | Real Madrid | Pass | Toni Kroos | 35.0 | 24.9 |
| 3 | Real Madrid | Pass | Toni Kroos | 41.7 | 21.7 |
| 4 | Real Madrid | Pass | Toni Kroos | 50.6 | 28.3 |
| 5 | Real Madrid | Pass | Toni Kroos | 42.2 | 11.1 |
| 6 | Real Madrid | Pass | Toni Kroos | 48.7 | 53.1 |
| 7 | Real Madrid | Pass | Toni Kroos | 56.7 | 59.6 |
| 8 | Real Madrid | Pass | Toni Kroos | 56.4 | 15.2 |
| 9 | Real Madrid | Pass | Toni Kroos | 42.9 | 9.4 |
points_hull = events_hull_Toni[['location_x', 'location_y']].values
We compute the convex hull with the ConvexHull() function from scipy.spatial:
from scipy.spatial import ConvexHull
convex_hull_Toni = ConvexHull(points_hull)
The vertices attribute contains the indices of the points in points_hull that make up the convex hull (listed in counterclockwise order for 2-D data). The simplices attribute also contains indices into points_hull. Each simplex is a pair of indices, i.e. a line segment (a 1-D simplex) on the boundary of the hull. Let us print the indices:
print(convex_hull_Toni.vertices)
[50 41 55 75 84 1 67 51]
print(convex_hull_Toni.simplices)
[[50 41] [67 1] [84 1] [84 75] [55 41] [55 75] [51 50] [51 67]]
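Because vertices lists the hull points in counterclockwise order for 2-D input, it can be used directly to draw or measure the hull. Below is a small sketch on a toy rectangle (hypothetical points, not match data) that shows what the attributes contain. For 2-D input, SciPy's volume attribute is the enclosed area and area is the perimeter. So convex_hull_Toni.volume would give the size of the region enclosing Kroos's pass locations, in StatsBomb pitch units squared.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Corners of a 4 x 3 rectangle plus one interior point (index 4)
points = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [2.0, 1.0]])
hull = ConvexHull(points)

print(hull.vertices)    # indices of the hull points, counterclockwise in 2-D
print(hull.simplices)   # each row: the two endpoint indices of one boundary edge

# In 2-D, `volume` is the enclosed area and `area` is the perimeter
print(hull.volume, hull.area)

# Repeat the first vertex to close the polygon before drawing it as a line
outline = points[np.append(hull.vertices, hull.vertices[0])]
print(outline)
```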
pitch = Pitch(pitch_color='grass', stripe = True, line_color='black', goal_type='box',
constrained_layout=True, tight_layout=False)
fig, ax = pitch.draw()
plt.scatter(events_hull_Toni.location_x, events_hull_Toni.location_y, color='white')
for i in convex_hull_Toni.simplices:
plt.plot(points_hull[i, 0], points_hull[i, 1], 'black')
plt.fill(points_hull[convex_hull_Toni.vertices, 0], points_hull[convex_hull_Toni.vertices, 1],
c='grey', alpha=0.1)
plt.title("Convex Hull for Toni Kroos's field coverage against Liverpool")
So far, we have computed and visualized the convex hull of a player's pass locations in a particular game. Next, we will see how to get tracking data for a particular game from the StatsBomb open data. Here the tracking data are shot freeze frames, which record the locations of the players at the moment each shot was taken. We need such positional data to compute Delaunay triangulations and Voronoi diagrams.
The match id that we have been working with is 18245.
We first import the read_event function and the EVENT_SLUG constant (the location prefix of the StatsBomb open-data event files) from the mplsoccer.statsbomb module:
from mplsoccer.statsbomb import read_event, EVENT_SLUG
event_json = read_event(f'{EVENT_SLUG}/18245.json', related_event_df = False,
tactics_lineup_df = False, warn = False)
event = event_json['event']
tracking = event_json['shot_freeze_frame']
Let us look at the first and last rows of the event and tracking datasets:
event.head(5)
|  | match_id | id | index | period | timestamp_minute | timestamp_second | timestamp_millisecond | minute | second | type_id | ... | injury_stoppage_in_chain | shot_statsbomb_xg | shot_key_pass_id | shot_first_time | shot_one_on_one | shot_redirect | substitution_replacement_id | substitution_replacement_name | tactics_formation | aerial_won |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18245 | 5eee3ffd-f0c0-4532-868b-4a66cbf20cb8 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 35 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 41212.0 | NaN |
| 1 | 18245 | eaa65a92-02d3-4375-b2b7-7c2f679a620c | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 35 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 433.0 | NaN |
| 2 | 18245 | 9c82d2e5-ebba-4825-b7f9-b11b04433ed8 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 18245 | b791047a-3eea-452f-b3a9-212bd40cd7cb | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 18 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 18245 | 25be91a5-a084-42cb-8cc1-a0fe7b0f52f9 | 5 | 1 | 0 | 0 | 371 | 0 | 0 | 30 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 77 columns
event.tail(5)
|  | match_id | id | index | period | timestamp_minute | timestamp_second | timestamp_millisecond | minute | second | type_id | ... | injury_stoppage_in_chain | shot_statsbomb_xg | shot_key_pass_id | shot_first_time | shot_one_on_one | shot_redirect | substitution_replacement_id | substitution_replacement_name | tactics_formation | aerial_won |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3492 | 18245 | b4258521-d4ec-466d-a90c-e4522692a45b | 3493 | 2 | 47 | 30 | 959 | 92 | 30 | 30 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3493 | 18245 | 37f51448-ebd1-4d67-8d9e-fa4b450111b2 | 3494 | 2 | 47 | 33 | 52 | 92 | 33 | 42 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3494 | 18245 | e9f7bb50-f4fc-45aa-87d3-20bbe9ebd32f | 3495 | 2 | 47 | 39 | 157 | 92 | 39 | 40 | ... | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3495 | 18245 | ce7d446a-e8bf-4631-bcf5-2bd323ba251e | 3496 | 2 | 48 | 2 | 893 | 93 | 2 | 34 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3496 | 18245 | d19b2348-de55-4bbf-9b1f-e44d95aa3a77 | 3497 | 2 | 48 | 2 | 893 | 93 | 2 | 34 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 77 columns
tracking.head(5)
|  | id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | 1 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 98.0 | 48.4 | 18245 |
| 1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 1 | True | 3535 | Roberto Firmino Barbosa de Oliveira | 23 | Center Forward | 109.0 | 39.9 | 18245 |
| 2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | 1 | True | 3655 | Andrew Robertson | 6 | Left Back | 102.1 | 2.5 | 18245 |
| 3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | 1 | True | 4926 | Francisco Román Alarcón Suárez | 19 | Center Attacking Midfield | 100.2 | 11.0 | 18245 |
| 4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | 1 | True | 3629 | Sadio Mané | 21 | Left Wing | 90.9 | 32.3 | 18245 |
tracking.tail(5)
|  | id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 356 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 16 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 99.9 | 19.0 | 18245 |
| 357 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 17 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 99.2 | 50.3 | 18245 |
| 358 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 17 | False | 5201 | Sergio Ramos García | 5 | Left Center Back | 114.1 | 42.9 | 18245 |
| 359 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 18 | False | 5574 | Toni Kroos | 15 | Left Center Midfield | 102.7 | 37.0 | 18245 |
| 360 | 18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80 | 18 | False | 5485 | Raphaël Varane | 3 | Right Center Back | 114.4 | 37.3 | 18245 |
Looking at event and tracking, we understand that the former holds the event data and the latter holds the tracking (shot freeze-frame) data. Let us look at the columns of the tracking dataset:
print(tracking.columns)
Index(['id', 'event_freeze_id', 'player_teammate', 'player_id', 'player_name',
'player_position_id', 'player_position_name', 'x', 'y', 'match_id'],
dtype='object')
Looking at the columns of the tracking dataset, we understand that the column id is a unique identifier for a shot freeze frame. That is, it identifies the moment when a particular player took a shot, and the rows sharing that id give the locations of the other players at that moment. Using the player_name column, we add a column team to the tracking dataset, recording which team each of these players belongs to:
# Names found among Real Madrid's passers (players_Real) are labelled Real Madrid;
# all other names are assumed to be Liverpool players.
# A vectorised assignment avoids chained indexing such as tracking['team'][i] = ...,
# which pandas may apply to a temporary copy instead of to tracking itself.
tracking['team'] = np.where(tracking['player_name'].isin(players_Real), 'Real Madrid', 'Liverpool')
tracking.head(5)
|  | id | event_freeze_id | player_teammate | player_id | player_name | player_position_id | player_position_name | x | y | match_id | team |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | 1 | False | 5463 | Luka Modrić | 13 | Right Center Midfield | 98.0 | 48.4 | 18245 | Real Madrid |
| 1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | 1 | True | 3535 | Roberto Firmino Barbosa de Oliveira | 23 | Center Forward | 109.0 | 39.9 | 18245 | Liverpool |
| 2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | 1 | True | 3655 | Andrew Robertson | 6 | Left Back | 102.1 | 2.5 | 18245 | Liverpool |
| 3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | 1 | True | 4926 | Francisco Román Alarcón Suárez | 19 | Center Attacking Midfield | 100.2 | 11.0 | 18245 | Real Madrid |
| 4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | 1 | True | 3629 | Sadio Mané | 21 | Left Wing | 90.9 | 32.3 | 18245 | Liverpool |
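The team assignment above labels any player who is not in players_Real as a Liverpool player. players_Real is built from events_hull, so it only lists the Real Madrid players who appear in that event subset. An alternative is to look each name up in both team lineups (as returned by sb.lineups, which is used further below) and leave unmatched names as missing instead of guessing. Below is a minimal sketch with a hypothetical subset of each lineup. The dictionary names and the "Unknown Player" row are illustrative only.

```python
import pandas as pd

# Hypothetical subset of the two lineups; in the notebook the full name lists
# would come from sb.lineups(match_id=18245)['Real Madrid'] and ['Liverpool'].
lineups = {
    "Real Madrid": ["Luka Modrić", "Francisco Román Alarcón Suárez", "Toni Kroos"],
    "Liverpool": ["Roberto Firmino Barbosa de Oliveira", "Andrew Robertson", "Sadio Mané"],
}
# Invert to name -> team
name_to_team = {name: team for team, names in lineups.items() for name in names}

frame = pd.DataFrame({"player_name": [
    "Luka Modrić", "Roberto Firmino Barbosa de Oliveira", "Andrew Robertson",
    "Francisco Román Alarcón Suárez", "Sadio Mané", "Unknown Player",
]})

# map() leaves NaN for names that are in neither lineup
frame["team"] = frame["player_name"].map(name_to_team)
print(frame)
```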
tracking = tracking[['id', 'player_name', 'x', 'y', 'team']]
tracking.head(5)
|  | id | player_name | x | y | team |
|---|---|---|---|---|---|
| 0 | 682270cc-4bc4-4952-8f91-d3c5a704a691 | Luka Modrić | 98.0 | 48.4 | Real Madrid |
| 1 | 9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9 | Roberto Firmino Barbosa de Oliveira | 109.0 | 39.9 | Liverpool |
| 2 | 399ac143-5f7b-4080-8c0b-3c18435d7fc1 | Andrew Robertson | 102.1 | 2.5 | Liverpool |
| 3 | 660d9d98-46b6-4b5e-9c9a-435d63142c93 | Francisco Román Alarcón Suárez | 100.2 | 11.0 | Real Madrid |
| 4 | fe6c7f60-2ff0-4077-882e-b045c8abc7c3 | Sadio Mané | 90.9 | 32.3 | Liverpool |
player_info = sb.lineups(match_id = 18245)
credentials were not supplied. open data access only
player_info is a dictionary with one lineup DataFrame per team, keyed by team name. Let us fetch Real Madrid's lineup first:
info_Real = player_info['Real Madrid']
info_Real
|  | player_id | player_name | player_nickname | jersey_number | country |
|---|---|---|---|---|---|
| 0 | 4926 | Francisco Román Alarcón Suárez | Isco | 22 | Spain |
| 1 | 5200 | Lucas Vázquez Iglesias | Lucas Vázquez | 17 | Spain |
| 2 | 5201 | Sergio Ramos García | Sergio Ramos | 4 | Spain |
| 3 | 5202 | José Ignacio Fernández Iglesias | Nacho | 6 | Spain |
| 4 | 5207 | Cristiano Ronaldo dos Santos Aveiro | Cristiano Ronaldo | 7 | Portugal |
| 5 | 5456 | Mateo Kovačić | None | 23 | Croatia |
| 6 | 5463 | Luka Modrić | None | 10 | Croatia |
| 7 | 5485 | Raphaël Varane | None | 5 | France |
| 8 | 5539 | Carlos Henrique Casimiro | Casemiro | 14 | Brazil |
| 9 | 5552 | Marcelo Vieira da Silva Júnior | Marcelo | 12 | Brazil |
| 10 | 5574 | Toni Kroos | None | 8 | Germany |
| 11 | 5597 | Keylor Navas Gamboa | Keylor Navas | 1 | Costa Rica |
| 12 | 5719 | Marco Asensio Willemsen | Marco Asensio | 20 | Spain |
| 13 | 5721 | Daniel Carvajal Ramos | Daniel Carvajal | 2 | Spain |
| 14 | 6399 | Gareth Frank Bale | Gareth Bale | 11 | Wales |
| 15 | 6704 | Theo Bernard François Hernández | Theo Hernández | 15 | France |
| 16 | 6706 | Francisco Casilla Cortés | Kiko Casilla | 13 | Spain |
| 17 | 19677 | Karim Benzema | None | 9 | France |
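We keep only the player_name and jersey_number columns and build a dictionary that maps each player's name to their jersey number (stored as a string, ready to be used as a plot label):
info_Real = info_Real[['player_name', 'jersey_number']]
# name -> jersey number as a string, e.g. 'Toni Kroos' -> '8'
jerseys_Real = dict(zip(info_Real['player_name'], info_Real['jersey_number'].astype(str)))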
print(jerseys_Real)
{'Francisco Román Alarcón Suárez': '22', 'Lucas Vázquez Iglesias': '17', 'Sergio Ramos García': '4', 'José Ignacio Fernández Iglesias': '6', 'Cristiano Ronaldo dos Santos Aveiro': '7', 'Mateo Kovačić': '23', 'Luka Modrić': '10', 'Raphaël Varane': '5', 'Carlos Henrique Casimiro': '14', 'Marcelo Vieira da Silva Júnior': '12', 'Toni Kroos': '8', 'Keylor Navas Gamboa': '1', 'Marco Asensio Willemsen': '20', 'Daniel Carvajal Ramos': '2', 'Gareth Frank Bale': '11', 'Theo Bernard François Hernández': '15', 'Francisco Casilla Cortés': '13', 'Karim Benzema': '9'}
We do the same for Liverpool:
info_Liv = player_info['Liverpool']
info_Liv = info_Liv[['player_name', 'jersey_number']]
jerseys_Liv = dict(zip(info_Liv['player_name'], info_Liv['jersey_number'].astype(str)))
print(jerseys_Liv)
{'Dejan Lovren': '6', 'James Philip Milner': '7', 'Emre Can': '23', 'Alberto Moreno Pérez': '18', 'Mohamed Salah': '11', 'Jordan Brian Henderson': '14', 'Roberto Firmino Barbosa de Oliveira': '9', 'Simon Mignolet': '22', 'Georginio Wijnaldum': '5', 'Dominic Solanke': '29', 'Sadio Mané': '19', 'Loris Karius': '1', 'Andrew Robertson': '26', 'Trent Alexander-Arnold': '66', 'Virgil van Dijk': '4', 'Adam David Lallana': '20', 'Ragnar Klavan': '17', 'Nathaniel Edwin Clyne': '2'}
Now we choose a value of id from the tracking dataset; each id corresponds to the moment a particular shot was taken. Filtering tracking by an id value gives us the locations of the players on the pitch at that moment. We can view the unique id values:
tracking.id.unique()
array(['682270cc-4bc4-4952-8f91-d3c5a704a691',
'9f5aa3eb-3bed-4bc0-97a5-bb8444b235b9',
'399ac143-5f7b-4080-8c0b-3c18435d7fc1',
'660d9d98-46b6-4b5e-9c9a-435d63142c93',
'fe6c7f60-2ff0-4077-882e-b045c8abc7c3',
'eda7e108-2479-46f2-9cd0-a0bc2939e352',
'c36dfe04-2f8e-48f0-8df6-1c4d0b93a16e',
'3e93f456-9971-4a33-9b10-ee9961410a32',
'9def9ed2-52f0-496b-8ae8-f4c5a97c2d8a',
'20b934f1-9afa-401d-9a16-f97fea2b80d9',
'6711367a-6855-4914-903e-a5e19771429c',
'e8c20962-0eef-4066-97ce-dcaad4f70b52',
'02f0755f-76cf-4d30-8062-369dc9509bdd',
'6cb4171b-90e6-4473-831e-df7a2da29f28',
'93c40040-ab9a-4549-8f0e-46c5c1c8e9cd',
'142e18c8-316a-4f9f-a0f8-3c41549ad1c3',
'6f994944-70fc-4a30-acca-315e3fede0bb',
'7654fe57-734f-45d8-bc83-ab940cd37c45',
'30a872eb-fe88-4c46-858b-a4f487cb69e4',
'53b73ee0-8c9c-4b64-83c5-69fc453376a1',
'804f8c8e-d714-4e6a-9cd1-599665efb8c8',
'36687201-f131-4418-9dd0-f632bc9c4257',
'650a2dc2-e5bb-4fac-9259-afbc03bdc322',
'312f9c86-6a3c-42b1-bdeb-f92cb1b16a48',
'222c90b6-8293-409a-ac6d-e2c3c2e69948',
'c7f3935c-23fa-4ddc-a6ee-eb9d0972d034',
'05688a6e-37f8-4aa6-a36e-d8151aa75997',
'18f64bd1-c8a9-4f31-9e58-3ec7a1de0a80'], dtype=object)
shot_id = '3e93f456-9971-4a33-9b10-ee9961410a32' # select a particular value from the id column
tracking_filtered = tracking[tracking['id'] == shot_id] # filter by the shot_id
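# The freeze frame lists the other players, so take the shot taker's own location from the event data
event_filtered = event[event['id'] == shot_id]
event_filtered = event_filtered[['id', 'player_name', 'x', 'y', 'team_name']]
event_filtered = event_filtered.rename(columns = {'team_name':'team'})  # match the tracking column name
data_filtered = pd.concat([event_filtered, tracking_filtered])  # shot taker + freeze-frame players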
The resulting data_filtered dataset looks like this:
data_filtered
|  | id | player_name | x | y | team |
|---|---|---|---|---|---|
| 747 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 | Real Madrid |
| 7 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Loris Karius | 118.1 | 45.0 | Liverpool |
| 35 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 | Liverpool |
| 63 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Daniel Carvajal Ramos | 100.9 | 50.2 | Real Madrid |
| 91 | 3e93f456-9971-4a33-9b10-ee9961410a32 | James Philip Milner | 91.3 | 28.4 | Liverpool |
| 119 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Karim Benzema | 108.9 | 37.9 | Real Madrid |
| 147 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Georginio Wijnaldum | 105.7 | 56.5 | Liverpool |
| 175 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Jordan Brian Henderson | 108.0 | 50.0 | Liverpool |
| 202 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Virgil van Dijk | 111.7 | 54.7 | Liverpool |
| 228 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Trent Alexander-Arnold | 105.2 | 35.3 | Liverpool |
| 254 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Dejan Lovren | 111.8 | 41.1 | Liverpool |
| 280 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Toni Kroos | 91.0 | 30.3 | Real Madrid |
| 304 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Francisco Román Alarcón Suárez | 102.4 | 40.6 | Real Madrid |
Given a set X of points in the 2-D Euclidean plane, a Delaunay triangulation of X is a triangulation such that no point of X lies inside the circumcircle of any triangle in the triangulation. A representation of a Delaunay triangulation from the same Wikipedia article:
We use Delaunay from scipy.spatial to compute the triangulation:
from scipy.spatial import Delaunay
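The defining empty-circumcircle property can be checked directly. Below is a sketch using only NumPy and SciPy. The helper names circumcircle and is_delaunay are hypothetical. It triangulates the five Real Madrid positions from the shot frame analysed in this section and tests whether any player lies strictly inside the circumcircle of any triangle.

```python
import numpy as np
from scipy.spatial import Delaunay

# Real Madrid positions at the chosen shot (same values as points_Real)
points = np.array([[111.7, 58.7], [100.9, 50.2], [108.9, 37.9],
                   [91.0, 30.3], [102.4, 40.6]])
tri = Delaunay(points)


def circumcircle(a, b, c):
    """Centre and radius of the circle through the 2-D points a, b, c."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    ux = ((a @ a) * (b[1] - c[1]) + (b @ b) * (c[1] - a[1]) + (c @ c) * (a[1] - b[1])) / d
    uy = ((a @ a) * (c[0] - b[0]) + (b @ b) * (a[0] - c[0]) + (c @ c) * (b[0] - a[0])) / d
    centre = np.array([ux, uy])
    return centre, np.linalg.norm(a - centre)


def is_delaunay(points, simplices, tol=1e-9):
    """True if no input point lies strictly inside any triangle's circumcircle."""
    for simplex in simplices:
        centre, radius = circumcircle(*points[simplex])
        others = np.delete(points, simplex, axis=0)  # the points not in this triangle
        if np.any(np.linalg.norm(others - centre, axis=1) < radius - tol):
            return False
    return True


print(tri.simplices)
print(is_delaunay(points, tri.simplices))
```

The same helper rejects a non-Delaunay triangulation. For the kite (0, 0), (4, 0), (2, 1), (2, -1), splitting along the long diagonal gives triangles whose circumcircles contain the opposite vertex.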
We split data_filtered by team:
tracking_Real = data_filtered[data_filtered['team'] == 'Real Madrid'].reset_index()
tracking_Liv = data_filtered[data_filtered['team'] == 'Liverpool'].reset_index()
tracking_Real
|  | index | id | player_name | x | y | team |
|---|---|---|---|---|---|---|
| 0 | 747 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 | Real Madrid |
| 1 | 63 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Daniel Carvajal Ramos | 100.9 | 50.2 | Real Madrid |
| 2 | 119 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Karim Benzema | 108.9 | 37.9 | Real Madrid |
| 3 | 280 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Toni Kroos | 91.0 | 30.3 | Real Madrid |
| 4 | 304 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Francisco Román Alarcón Suárez | 102.4 | 40.6 | Real Madrid |
tracking_Liv
|  | index | id | player_name | x | y | team |
|---|---|---|---|---|---|---|
| 0 | 7 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Loris Karius | 118.1 | 45.0 | Liverpool |
| 1 | 35 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 | Liverpool |
| 2 | 91 | 3e93f456-9971-4a33-9b10-ee9961410a32 | James Philip Milner | 91.3 | 28.4 | Liverpool |
| 3 | 147 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Georginio Wijnaldum | 105.7 | 56.5 | Liverpool |
| 4 | 175 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Jordan Brian Henderson | 108.0 | 50.0 | Liverpool |
| 5 | 202 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Virgil van Dijk | 111.7 | 54.7 | Liverpool |
| 6 | 228 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Trent Alexander-Arnold | 105.2 | 35.3 | Liverpool |
| 7 | 254 | 3e93f456-9971-4a33-9b10-ee9961410a32 | Dejan Lovren | 111.8 | 41.1 | Liverpool |
points_Real = tracking_Real[['x', 'y']].values
print(points_Real)
[[111.7 58.7] [100.9 50.2] [108.9 37.9] [ 91. 30.3] [102.4 40.6]]
del_Real = Delaunay(tracking_Real[['x', 'y']])
loc_Real = tracking_Real[['player_name','x', 'y']].reset_index()
loc_Liv = tracking_Liv[['player_name','x', 'y']].reset_index()
loc_Real
|  | index | player_name | x | y |
|---|---|---|---|---|
| 0 | 0 | Cristiano Ronaldo dos Santos Aveiro | 111.7 | 58.7 |
| 1 | 1 | Daniel Carvajal Ramos | 100.9 | 50.2 |
| 2 | 2 | Karim Benzema | 108.9 | 37.9 |
| 3 | 3 | Toni Kroos | 91.0 | 30.3 |
| 4 | 4 | Francisco Román Alarcón Suárez | 102.4 | 40.6 |
loc_Liv
|  | index | player_name | x | y |
|---|---|---|---|---|
| 0 | 0 | Loris Karius | 118.1 | 45.0 |
| 1 | 1 | Roberto Firmino Barbosa de Oliveira | 100.8 | 49.0 |
| 2 | 2 | James Philip Milner | 91.3 | 28.4 |
| 3 | 3 | Georginio Wijnaldum | 105.7 | 56.5 |
| 4 | 4 | Jordan Brian Henderson | 108.0 | 50.0 |
| 5 | 5 | Virgil van Dijk | 111.7 | 54.7 |
| 6 | 6 | Trent Alexander-Arnold | 105.2 | 35.3 |
| 7 | 7 | Dejan Lovren | 111.8 | 41.1 |
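The black lines in the next plot are the edges of the triangles in del_Real.simplices. Below is a minimal sketch that lists each edge once as a pair of player names, using the positions from tracking_Real. The variable names are illustrative, not taken from the notebook.

```python
import numpy as np
from scipy.spatial import Delaunay

# Real Madrid players at the chosen shot, in the row order of tracking_Real
names = ["Cristiano Ronaldo dos Santos Aveiro", "Daniel Carvajal Ramos",
         "Karim Benzema", "Toni Kroos", "Francisco Román Alarcón Suárez"]
points = np.array([[111.7, 58.7], [100.9, 50.2], [108.9, 37.9],
                   [91.0, 30.3], [102.4, 40.6]])
tri = Delaunay(points)

# Each triangle has three edges; an edge shared by two triangles must be
# counted once, so store every edge as a sorted pair of indices in a set.
edges = set()
for i, j, k in tri.simplices:
    for a, b in ((i, j), (j, k), (i, k)):
        edges.add((int(min(a, b)), int(max(a, b))))

for a, b in sorted(edges):
    print(f"{names[a]} <-> {names[b]}")
```

For this frame the triangulation has 4 triangles and 8 links. Ronaldo is linked to Carvajal and Benzema but not directly to Kroos or Isco.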
pitch = Pitch(pitch_color='grass', stripe=True, line_color='white', view = 'half', figsize=(8, 9),
constrained_layout=True, tight_layout=False, goal_type='box')
fig, ax = pitch.draw()
plt.scatter(tracking_Real.x, tracking_Real.y, color='white', s = 400, edgecolors='black', zorder=2)
plt.scatter(tracking_Liv.x, tracking_Liv.y, color='red', edgecolors='black', s = 400)
plt.triplot(points_Real[:, 0], points_Real[:, 1], del_Real.simplices.copy(), 'k-', lw = 4)
for index, row in loc_Real.iterrows():
pitch.annotate(jerseys_Real[loc_Real['player_name'][row.name]], xy=(row.x, row.y), c ='black',
va = 'center', ha = 'center', size = 14, ax = ax)
for index, row in loc_Liv.iterrows():
pitch.annotate(jerseys_Liv[loc_Liv['player_name'][row.name]], xy=(row.x, row.y), c ='black',
va = 'center', ha = 'center', size = 14, ax = ax)
The red nodes indicate the locations of Liverpool's players and the white nodes indicate those of Real Madrid's players. The black lines are the direct links between Real Madrid's players at that moment; they form the Delaunay triangulation, also called the pass triangulation. In his book Soccermatics, Dr. Sumpter mentions that these lines have two useful interpretations. First, they show which passes are available among the players of a team. Second, they indicate the "no man's lines" for the players of the opposition team: an opposition player standing on one of these linking lines is at a disadvantage. A beautiful application of computational geometry, isn't it?
Voronoi diagrams, for a set X of points, partition the 2-D Euclidean plane into regions, one per point: each region consists of all locations that are closer to that point than to any other point in X. Look at the image of a Voronoi diagram (taken from here), which is the dual of the Delaunay triangulation shown above.
To compute the Voronoi diagram, we use the whole data_filtered dataset, because we need the locations of all the players on the pitch. We import Voronoi for computing the Voronoi diagrams and voronoi_plot_2d to plot the diagrams on a pitch:
from scipy.spatial import Voronoi, voronoi_plot_2d
We mirror the y coordinates in data_filtered (y becomes 80 - y) and compute the Voronoi diagram:
data_filtered['y'] = 80 - data_filtered['y']
points = data_filtered[['x', 'y']].values
vor = Voronoi(points)
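The duality mentioned above can be checked numerically. Two players' Voronoi cells share a boundary exactly when the two players are joined by an edge of the Delaunay triangulation of all the points, assuming no four players lie on a common circle. The sketch below uses the 13 positions listed in data_filtered. Mirroring y does not change which cells are neighbours, so the original y values are used. The variable names are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

# All 13 player positions at the chosen shot, in the row order of data_filtered
points = np.array([
    [111.7, 58.7], [118.1, 45.0], [100.8, 49.0], [100.9, 50.2], [91.3, 28.4],
    [108.9, 37.9], [105.7, 56.5], [108.0, 50.0], [111.7, 54.7], [105.2, 35.3],
    [111.8, 41.1], [91.0, 30.3], [102.4, 40.6],
])
vor = Voronoi(points)
tri = Delaunay(points)

# Each Voronoi ridge separates the cells of two players; ridge_points holds that pair
voronoi_neighbours = {tuple(sorted(map(int, pair))) for pair in vor.ridge_points}

# Edges of the Delaunay triangles, each stored once as a sorted pair
delaunay_edges = set()
for simplex in tri.simplices:
    for a in range(3):
        for b in range(a + 1, 3):
            delaunay_edges.add(tuple(sorted((int(simplex[a]), int(simplex[b])))))

print(voronoi_neighbours == delaunay_edges)
print(len(delaunay_edges))
```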
pitch = Pitch(pitch_color='grass', stripe=True, line_color='white', view = 'half', figsize=(8,9),
constrained_layout=True, tight_layout=False, goal_type='box')
fig, ax = pitch.draw()
plt.scatter(tracking_Real.x, 80 - tracking_Real.y, color='white', s = 1050, edgecolors='black', zorder=2)
plt.scatter(tracking_Liv.x, 80 -tracking_Liv.y, color='red', edgecolors='black', s = 1050)
pl = voronoi_plot_2d(vor, ax=ax, show_vertices=False, line_width = 8)
for index, row in loc_Real.iterrows():
pitch.annotate(jerseys_Real[loc_Real['player_name'][row.name]], xy=(row.x, 80 - row.y),
c ='black', va = 'center', ha = 'center', size = 15, ax = ax)
for index, row in loc_Liv.iterrows():
pitch.annotate(jerseys_Liv[loc_Liv['player_name'][row.name]], xy=(row.x, 80 - row.y),
c ='black', va = 'center', ha = 'center', size = 15, ax = ax)
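Each Voronoi cell contains the locations closer to one player than to any other, so one way to read the diagram is as "space control". Below is a rough sketch, assuming a 0.5-unit sampling grid over the attacking half (x from 60 to 120). Instead of clipping the Voronoi polygons to the pitch, it uses a nearest-neighbour query: every grid point is assigned to its nearest player, so the fraction of grid points per team approximates each team's share of that half. The positions and teams are copied from data_filtered (original y values), and the variable names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

# Player positions at the chosen shot, copied from data_filtered (original y values)
points = np.array([
    [111.7, 58.7], [118.1, 45.0], [100.8, 49.0], [100.9, 50.2], [91.3, 28.4],
    [108.9, 37.9], [105.7, 56.5], [108.0, 50.0], [111.7, 54.7], [105.2, 35.3],
    [111.8, 41.1], [91.0, 30.3], [102.4, 40.6],
])
teams = np.array([
    "Real Madrid", "Liverpool", "Liverpool", "Real Madrid", "Liverpool",
    "Real Madrid", "Liverpool", "Liverpool", "Liverpool", "Liverpool",
    "Liverpool", "Real Madrid", "Real Madrid",
])

# Centres of 0.5 x 0.5 cells covering the attacking half (x: 60-120, y: 0-80)
step = 0.5
xs = np.arange(60 + step / 2, 120, step)
ys = np.arange(step / 2, 80, step)
gx, gy = np.meshgrid(xs, ys)
grid = np.column_stack([gx.ravel(), gy.ravel()])

# The nearest player to a sample point owns the Voronoi cell containing it
_, nearest = cKDTree(points).query(grid)

# All cells have equal area, so the fraction of cells approximates the area share
shares = {team: float(np.mean(teams[nearest] == team))
          for team in ("Real Madrid", "Liverpool")}
print(shares)
```

A finer grid gives a closer approximation at the cost of more queries; clipping the exact Voronoi polygons to the pitch boundary would avoid the discretisation altogether.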