Running Analysis

Project - Date (16.10.19)

Using Python, running times for men's 5,000 m (5k) and 10,000 m (10k) were compared. The plot aims to show the trend in results for these two events and, when the user's own times are plotted, to give an insight into whether those times are broadly on track.

The plot is shown below with my own time indicated in red. Note that the plot is interactive: hover over points for more info, pan around, and turn on the magnifying glass to zoom in.

Bokeh Plot

The Plot

Men's 5k and 10k race data was taken from thepowerof10.info and shows British Athletics results for 2019. Only athletes with times in both databases can be plotted. Whilst the 5k database contains nearly 1,400 athletes, the 10k database has only 435, and just 334 athletes appear in both and can be plotted.
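Keeping only athletes who appear in both databases can be sketched as an inner merge in Pandas. This is a minimal illustration rather than the project's actual code: the column names (name, time_5k, time_10k) and the sample rows are made up, and matching on name alone is an assumption that real data might need to refine (for example with a club or athlete ID).

```python
import pandas as pd

# Hypothetical miniature versions of the two ranking tables.
df_5k = pd.DataFrame(
    {"name": ["A. Runner", "B. Jogger", "C. Sprinter"],
     "time_5k": ["14:30", "15:10", "16:02"]}
)
df_10k = pd.DataFrame(
    {"name": ["A. Runner", "C. Sprinter", "D. Walker"],
     "time_10k": ["30:05", "33:40", "35:00"]}
)

# An inner merge keeps only the rows whose name is present in both tables.
both = pd.merge(df_5k, df_10k, on="name", how="inner")

print(both)
print(len(both), "athletes have both a 5k and a 10k time")
```

With the real data, the length of the merged table would correspond to the 334 athletes mentioned above.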

A line of best fit is then applied to the plot. Points where the 5k time was greater than half the 10k time were excluded from the linear regression, to filter out anomalous results. For the above plot, the equation of the fitted line is:

T10k = 0.860 + 2.049 * T5k

The above equation can be tried out on this Repl (online code).
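As a rough offline sketch of the same calculation, the fitted line can be wrapped in a small function. The coefficients come from the equation above; the unit is an assumption, since the write-up does not state it, and minutes are used here. The function names are hypothetical.

```python
# Coefficients from the fitted line: T10k = 0.860 + 2.049 * T5k
INTERCEPT = 0.860
SLOPE = 2.049


def predict_10k(t5k_minutes: float) -> float:
    """Estimate a 10k time from a 5k time (both assumed to be in minutes)."""
    return INTERCEPT + SLOPE * t5k_minutes


def format_minutes(minutes: float) -> str:
    """Render a duration in minutes as m:ss, rounded to the nearest second."""
    total_seconds = round(minutes * 60)
    mins, secs = divmod(total_seconds, 60)
    return f"{mins}:{secs:02d}"


# A 20:00 5k maps to roughly a 41:50 10k under this fit.
print(format_minutes(predict_10k(20.0)))
```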

Colour is then added to the plot based on the 5k rankings, with dark blue being closest to rank 1 and yellow being closer to rank 1,400. Note that some athletes do not qualify for ranking and are shown in grey. The red point marks my own times.

The Code

Full program here.

The program makes use of the following modules:

BeautifulSoup: The data for the 5k and 10k was not available as a CSV or other download, so BeautifulSoup was used to scrape it.
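The scraping step might look something like the sketch below. It parses an inline HTML string rather than fetching a live page, and the table layout (rank, performance, name) is a made-up stand-in: the real markup on thepowerof10.info is not described here and will differ. The function name parse_rows is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a rankings page; the real markup will differ.
HTML = """
<table>
  <tr><th>Rank</th><th>Perf</th><th>Name</th></tr>
  <tr><td>1</td><td>13:25.1</td><td>Runner One</td></tr>
  <tr><td>2</td><td>13:40.7</td><td>Runner Two</td></tr>
</table>
"""


def parse_rows(html: str) -> list:
    """Collect one dict per data row, skipping the header row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3:  # the header row has <th> cells, so it yields none
            rank, perf, name = cells
            rows.append({"rank": int(rank), "time": perf, "name": name})
    return rows


rows = parse_rows(HTML)
print(rows)
```

A list of dicts like this can be passed straight to pandas.DataFrame to build the table.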

Pandas: Pandas was used to order the data into a table.

Datetime: The running time formats needed to be converted from strings into datetime objects so they could be plotted.
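A minimal sketch of that conversion is shown below. The accepted formats (mm:ss, mm:ss.f and h:mm:ss) are assumptions about how the times are written, and the helper names are hypothetical.

```python
from datetime import datetime

# Assumed input formats, tried in order: mm:ss.f, mm:ss, h:mm:ss
FORMATS = ("%M:%S.%f", "%M:%S", "%H:%M:%S")


def parse_time(text: str) -> datetime:
    """Convert a race time string into a datetime (dated 1900-01-01)."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised time format: {text!r}")


def to_minutes(t: datetime) -> float:
    """Express the parsed time as a number of minutes, e.g. for regression."""
    return t.hour * 60 + t.minute + t.second / 60 + t.microsecond / 60_000_000


print(parse_time("14:30"))
print(to_minutes(parse_time("14:30")))
```

Note that "%M" only accepts values from 0 to 59, so a time of an hour or more has to be written as h:mm:ss for this sketch to parse it.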

Bokeh: This was used to produce the interactive plot. More info available here.

SciPy: This was used for the linear regression. More info available here.
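The filter and fit described earlier (dropping points where the 5k time exceeds half the 10k time, then fitting a straight line) could be sketched with scipy.stats.linregress as follows. The sample times are invented and assumed to be in minutes, so the resulting coefficients will not match the ones quoted for the real plot.

```python
from scipy.stats import linregress

# Invented sample data in minutes; the last pair is deliberately anomalous
# (a 20:00 5k paired with a 38:00 10k, i.e. a faster pace over the longer race).
t5k = [14.0, 15.0, 16.0, 17.0, 20.0]
t10k = [29.5, 31.6, 33.7, 35.7, 38.0]

# Keep only points where the 5k time is no greater than half the 10k time.
kept = [(x, y) for x, y in zip(t5k, t10k) if x <= y / 2]
xs, ys = zip(*kept)

# Ordinary least-squares fit of T10k against T5k.
fit = linregress(xs, ys)
print(f"T10k = {fit.intercept:.3f} + {fit.slope:.3f} * T5k")
```

The slope and intercept attributes of the result map directly onto the two coefficients in the fitted equation.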

Next Steps

The plot could be improved in a number of ways, including more data (more amateur runners would be good), more events, more categories (men/women/age), and the ability to enter your own time via a webpage. An interactive example of a Bokeh plot is here.