Here we will analyze the team Attributes table. There are various fields which I have chosen to help me analyze the data.
# import Statements
import numpy as np
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
print("-------------------->> Imports successful")
#Setting up Path of the Database
path = "data/"
database = path + 'database.sqlite'
#Connection to Database
conn = sqlite3.connect(database)
print("-------------------->>Debug: Database connected")
#Command used to check tables in DB
# tables = pd.read_sql("""SELECT * FROM sqlite_sequence""",conn)
# tables
Here we check what data columns are present in the Team Attributes Table
24 in total here we will try to find relevant ones
#Checking Relevant Columns which are present
df = pd.read_sql("SELECT * from Team_Attributes ",conn)
for i, v in enumerate(df.columns):
print(i, v)
df.describe()
Below we have chosen the following fields for analysis for games won by the team
## This Query Selects the following Data,
## Home API ID, winner (calulated as a diffrence of home minus away team goals),
## and a few factors like PlaySpeed and Defence Stats
attr = pd.read_sql("""
SELECT
Match.home_team_api_id as home_id,
(home_team_goal - away_team_goal) as winner,
Team_Attributes.buildUpPlaySpeed as buildUp_PlaySpeed,
Team_Attributes.buildUpPlayDribbling as buildUp_PlayDribbling,
Team_Attributes.buildUpPlayPassing as buildUp_PlayPassing,
Team_Attributes.defenceAggression as defence_Aggression,
Team_Attributes.defenceTeamWidth as defence_team_width
from Match
JOIN Team_Attributes on Team_Attributes.team_api_id = home_team_api_id
WHERE winner >=1
""",conn)
Here we are plotting graphs for each of the above values.
graph1 =attr.sort_values(by="winner", ascending=False)
graph1.hist(column="buildUp_PlaySpeed");
graph1.hist(column="buildUp_PlayDribbling");
graph1.hist(column="buildUp_PlayPassing");
graph1.hist(column="defence_team_width");
graph1.hist(column="defence_Aggression");
print("Mean Play Speed Value for Winning Teams is",graph1['buildUp_PlaySpeed'].mean())
print("Mean Play Dribbling Value for Winning Teams is",graph1['buildUp_PlayDribbling'].mean())
print("Mean Play Passing Value for Winning Teams is",graph1['buildUp_PlayPassing'].mean())
print("Mean Team Width for Winning Teams is",graph1['defence_team_width'].mean())
print("Mean Defence Agression for Winning Teams is",graph1['defence_Aggression'].mean())
The above is the ideal mean values for winning teams. A team should strive to be within one standard deviation of the mean to have the chance of winning.