Comparing Post Positions Using the Chi-Square Distribution

In horse racing, horses line up in starting gates at the start of a race. The racetrack assigns horses and their jockeys to starting gates in advance of a race. We examine the published data from a racetrack to determine whether one starting gate is preferred over another. We compare the number of wins across the post positions for sprint races and for long distance races using two chi-square tests. Given the post positions and the number of wins at the race track, we find that in sprint races the innermost and outermost post positions tend to be preferred over the center positions, while in distance races post position is not significant.


Introduction
This paper answers the question: should a stakeholder in the outcome of a race prefer one post position over another? We examine post position data from the Charles Town Race Track (West Virginia) to determine whether a person placing a wager should prefer one post position to another. The post position is the position of the stall in the starting gate from which a horse starts. On a flat track, a racehorse's post position is numbered by its position relative to the inside barrier of the track.
We begin this analysis by assuming a reasonable distribution for the given data. The two data sets in Section 2 give, for each post position, the number of starts and the number of wins. Because each start either ends in a win or does not, the Bernoulli distribution seems reasonable for the given data sets. The question remains: do all of the post positions have the same distribution?
This paper presents chi-square tests of the wins by post position at Charles Town Race Track for all races run from January 1, 2014 to May 23, 2014.

The Data
On May 24, 2014, the Charles Town Race Track published the data in Table 1 for sprint races. For each post position, the table contains the number of races, the number of wins, and the win percentage. The percentage is simply calculated from the previous two columns. Charles Town Races rounded its win percentages to one decimal place; for analytic purposes we need more precision, and four decimal places should do. Given the data in Tables 1 and 2, should a stakeholder in the outcome of a race prefer one post position to another? Stakeholders include jockeys, racehorse owners, and people placing a wager.
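The rounding issue can be made concrete with a minimal Python sketch. The helper name `win_percentage` is hypothetical, and the counts below are made up for illustration rather than taken from Table 1; the point is that recomputing from the raw counts recovers precision that a one-decimal table discards.

```python
def win_percentage(wins: int, races: int, places: int = 4) -> float:
    """Win percentage for one post position, rounded to `places` decimals."""
    if races <= 0:
        raise ValueError("races must be positive")
    return round(100.0 * wins / races, places)


# Hypothetical counts for a single post position (not taken from Table 1).
wins, races = 83, 601
print(win_percentage(wins, races, places=1))  # published-style precision: 13.8
print(win_percentage(wins, races))            # four decimal places: 13.8103
```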

Wagers
Charles Town Races separates its races into two categories: 1) sprint races and 2) distance races. A person can place a wager on either type of race in the following ways:
- Win: Your horse must win.
- Place: Your horse must finish first or second.
- Show: Your horse must finish first, second, or third.
- Across the board: Wager the same amount on one horse to win, place, and show.
- Exacta: Pick the first two horses to finish, in exact order.
- Trifecta: Pick the first three horses to finish, in exact order.
- Superfecta: Pick the first four horses to finish, in exact order.
- Daily Double: Pick the winners of two consecutive races.
- Pick Three: Pick the winners of three consecutive races.
- Pick Four: Pick the winners of four consecutive races.
Given the descriptions above, assume that the following phrases have the same meaning (apart from plurality):
- Must win
- Must finish first
- Pick the winners
Other phrases describe further ways to win a wager; however, those above are the most commonly used. A horse "wins" a race only by crossing the finish line first without being disqualified.
Other factors can also affect the outcome of a horse race. These include the racetrack number, jockey weight, the type of horses racing, track conditions, horse gender, and so on.

The Models
We define the Bernoulli random variable in Equation (1) as a starting basis for comparing the starting post positions of the horses:

$$X = \begin{cases} 0, & \text{if the horse does not win a race,} \\ 1, & \text{if the horse wins a race.} \end{cases} \tag{1}$$

We add the index $i$ to denote the post position, $i = 1, 2, \ldots, 10$, because there are ten Bernoulli random variables and ten binomial distributions, each with its own expectation:

$$X_i = \begin{cases} 0, & \text{if the horse does not win a race from post position } i, \\ 1, & \text{if the horse wins a race from post position } i, \end{cases} \tag{2}$$

for the $i$th post position, $i = 1, 2, \ldots, 10$. The right-most columns of Tables 1 and 2 give, for each post position, the estimated probability that the Bernoulli random variable in Equation (2) equals 1.
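Equation (2) treats each start as a Bernoulli trial, and summing the indicators for one post gives that post's win count. A short Python sketch of that bookkeeping follows, assuming start-level records of the form (post position, won) that the published tables do not actually provide; the function name and the records are hypothetical.

```python
from collections import defaultdict


def tally_by_post(starts):
    """Sum Bernoulli win indicators by post position.

    `starts` is an iterable of (post_position, won) pairs, one per start,
    where `won` is 1 if the horse won from that post and 0 otherwise.
    Returns {post: (number_of_starts, number_of_wins)}, ordered by post.
    """
    counts = defaultdict(lambda: [0, 0])
    for post, won in starts:
        if won not in (0, 1):
            raise ValueError("each indicator X_i must be 0 or 1")
        counts[post][0] += 1      # one more Bernoulli trial for this post
        counts[post][1] += won    # the sum of the indicators is the win count
    return {post: (n, x) for post, (n, x) in sorted(counts.items())}


# Hypothetical start records (not from Tables 1 or 2).
records = [(1, 1), (1, 0), (2, 0), (2, 0), (2, 1), (3, 0)]
for post, (n, x) in tally_by_post(records).items():
    print(post, n, x, round(x / n, 4))
```

Each post's total is a binomial count, which is the step the next paragraph builds on.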
Summing the Bernoulli random variables in Equation (2) gives ten binomial distributions for sprint races and ten binomial distributions for distance races. From these we can test whether the distributions are all the same or whether one post position is preferred over another (or, conversely, one post position is shunned relative to another). Hogg and Tanis (1993, pages 511-521) and Hogg and Craig (1995, pages 116-123) present the multinomial probability distribution, which has the following assumptions:
- The experiment has $k$ possible outcomes that are mutually exclusive and exhaustive, say $A_1, A_2, \ldots, A_k$.
- $n$ independent trials of this experiment are observed.
- The random variable $X_i$ is equal to the number of times $A_i$ occurs in the $n$ trials, $i = 1, 2, \ldots, k$.
Horses can run from multiple post positions because the data cover a period of nearly five months. The same horse did not start from two or more gates in the same race; rather, the same horse may have started from different gates in different races over that period.
Since we cannot guarantee that the win probabilities across post positions sum to 1, we rule out the multinomial distribution as a plausible model. We can still use chi-square tests to compare the binomial distributions.
We can model the data using quadratic forms of random variables. First, we normalize each test statistic, and then we square it; the square of an approximately $N(0, 1)$ statistic has approximately a $\chi^2(1)$ distribution. The sum of $r$ independent $\chi^2(1)$ random variables has a $\chi^2(r)$ distribution, so we can add $r$ squared, normalized test statistics. We apply this result next.
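The claim that squared, normalized statistics add up to a chi-square variable can be checked numerically. The Python sketch below is a simulation, not part of the original analysis: it draws independent standard normal values directly (standing in for normalized binomial statistics, which are only approximately normal) and compares the sum of $r$ squares with the mean $r$ and variance $2r$ of a $\chi^2(r)$ distribution. The function name is hypothetical.

```python
import random
import statistics


def sum_of_squared_normals(r, n_samples, seed=0):
    """Draw n_samples values of Z_1^2 + ... + Z_r^2 with independent Z_j ~ N(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(r)) for _ in range(n_samples)]


samples = sum_of_squared_normals(r=10, n_samples=20_000)
# A chi-square(r) variable has mean r and variance 2r.
print(statistics.fmean(samples), statistics.variance(samples))
# Fraction of draws at or above the chi-square(10) critical value 18.3 (close to 0.05).
print(sum(q >= 18.3 for q in samples) / len(samples))
```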

Sprint Races
Under a fair assignment of the starting post positions, we would expect each post to account for one-tenth of the wins. However, this does not take into account the number of races run from each post position, which need not be the same for every post. A more accurate measure weights each post by its number of starts. We obtain the fair, self-weights $p'_i$ by dividing the number of starts from post position $i$ by 4,609, the total number of starts over all post positions.
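To see why the one-tenth benchmark is replaced, a minimal Python sketch computes the self-weights from per-post start counts. The counts are hypothetical, chosen only so that the outer posts have fewer starts; with the real data the denominator would be the 4,609 total starts. The function name is also hypothetical.

```python
def self_weights(starts):
    """Fair, self-weights p'_i: each post's share of all starts."""
    total = sum(starts)
    return [n / total for n in starts]


# Hypothetical starts for posts 1..10 (not the Table 1 counts).
starts = [480, 480, 480, 478, 470, 455, 420, 360, 280, 200]
weights = self_weights(starts)
for post, (n, w) in enumerate(zip(starts, weights), start=1):
    print(f"post {post:2d}: starts={n:3d}  p'={w:.4f}  naive=0.1000")
```

Posts with fewer starts get a weight below one-tenth, so they are expected to win fewer races even when post position does not matter.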
Under the null hypothesis $H_0$, we have ten binomial distributions $b_{H_0}(X'_i; n_i, p'_i)$. Under the alternative hypothesis $H_1$, we have ten binomial distributions $b_{H_1}(X_i; n_i, p_i)$. In the statistical model, we wish to test the probabilities $p'_i$ under $H_0$ against the probabilities $p_i$ under $H_1$. We calculate the probabilities under the null hypothesis as

$$p'_i = \frac{\text{number of starts from post position } i}{4609}, \qquad i = 1, 2, \ldots, 10.$$

The expected values under this model are simply the fair, self-weights times the number of races. Because the expected values represent numbers of wins, that is, numbers of horses that crossed the finish line first, we round them to whole numbers. Equation (3) gives the expected number of wins for each post position $i$ under the fair, self-weighting model for sprint races.

$$E(X_i) = p'_i \times (\text{number of sprint races}), \qquad i = 1, 2, \ldots, 10. \tag{3}$$

To test whether these ten binomial distributions are the same, we use the chi-square distribution with 10 degrees of freedom. Hogg and Craig (1995, page 249) discuss random sampling from a binomial distribution $b(1, p)$, and Hogg and Craig (1995, pages 481-485) discuss quadratic forms of random variables, the chi-square distribution, and degrees of freedom. We normalize the ten binomial test statistics. Each normalized test statistic is approximately $N(0, 1)$, so its square has approximately a $\chi^2(1)$ distribution. Adding these ten squared statistics gives the test statistic $Q_{10}$, which has approximately a $\chi^2(10)$ distribution.
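The steps above can be sketched in Python under explicit assumptions that the text does not spell out: the number of races equals the total number of wins (one winner per race, no dead heats), each post's win count is modeled as $b(R, p'_i)$ with $R$ races, and the normalization is $Z_i = (X_i - E(X_i))/\sqrt{R\,p'_i(1 - p'_i)}$. Under a different normalization the numbers would differ, so this is not guaranteed to reproduce the reported $Q_{10}$; the function name and data are hypothetical.

```python
import math


def post_position_statistic(starts, wins):
    """Q = sum of squared normalized binomial statistics, one per post position.

    Assumptions (for illustration only):
      * the number of races R equals the total number of wins;
      * under H0, X_i ~ b(R, p'_i) with p'_i = starts_i / sum(starts);
      * Z_i = (X_i - E(X_i)) / sqrt(R * p'_i * (1 - p'_i)).
    Returns (Q, expected_wins, z_scores).
    """
    if len(starts) != len(wins):
        raise ValueError("starts and wins must have one entry per post position")
    total_starts = sum(starts)
    races = sum(wins)
    expected, z_scores = [], []
    for n_i, x_i in zip(starts, wins):
        p_i = n_i / total_starts                  # fair, self-weight p'_i
        e_i = p_i * races                         # Equation (3)
        sd_i = math.sqrt(races * p_i * (1 - p_i))
        expected.append(e_i)
        z_scores.append((x_i - e_i) / sd_i)
    q = sum(z * z for z in z_scores)              # approximately chi-square
    return q, expected, z_scores


# Hypothetical data for four posts; wins exactly proportional to starts gives Q = 0.
q, expected, z = post_position_statistic([100, 200, 300, 400], [10, 20, 30, 40])
print(q, expected)
```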
Let $\alpha$ denote the desired significance level of the test. We reject the null hypothesis if $Q_{10} \ge \chi^2_{\alpha}(10)$, and we arbitrarily set $\alpha = 0.05$. Using the data in Table 1, we test $H_0$ against $H_1$. The critical value for the hypothesis test is $\chi^2_{0.05}(10) = 18.3$. Since $Q_{10} = 37.08 \ge 18.3$, we reject the null hypothesis: post positions are significant in sprint races. The next section determines which post positions a stakeholder should prefer.
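The decision rule can be checked without statistical tables. The Python sketch below uses the standard closed-form series for the upper tail of a chi-square distribution with an integer number of degrees of freedom and finds the critical value by bisection. The function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2` (its `ppf` and `sf` methods) serves the same purpose.

```python
import math


def chi2_sf(x, df):
    """P(Chi^2(df) >= x) for a positive integer df, via the closed-form series."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^(k-1) / (1*3*...*(2k-1))
    total, term = 0.0, 1.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def chi2_critical(alpha, df):
    """Value x with P(Chi^2(df) >= x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


alpha = 0.05
critical = chi2_critical(alpha, 10)   # about 18.31
q_sprint = 37.08                      # Q_10 reported for sprint races
print(round(critical, 2), q_sprint >= critical, chi2_sf(q_sprint, 10))
```

The same helpers give the one-degree-of-freedom cut-off, `chi2_critical(0.05, 1)`, of about 3.84, and `chi2_sf(6.77, 10)` for the distance races gives a p-value far above 0.05.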

Sprint Race Post Position Analysis
Table 3 shows the individual chi-square test for each post position. Which post positions are preferred? The cut-off value for a chi-square test with one degree of freedom is $\chi^2_{0.05}(1) = 3.84$. Horses starting from post positions 1, 7, 9, and 10 are preferred in sprint races.
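Squaring a normalized statistic discards its sign, so a large $\chi^2(1)$ value can mean either more or fewer wins than expected. The Python sketch below assumes the signed statistics $Z_i$ are kept alongside Table 3 and uses the sign to separate posts that win more often than expected from posts that win less often. The function names and the example values are hypothetical.

```python
import math

CRITICAL_1DF = 3.84  # chi-square(1) cut-off at alpha = 0.05, as in the text


def classify_posts(z_scores):
    """Split posts into preferred / shunned using signed normalized statistics.

    `z_scores` maps post position -> Z_i = (observed - expected) / sd.
    Z_i^2 is the one-degree-of-freedom chi-square statistic; the sign of Z_i
    says whether the post won more (preferred) or fewer (shunned) races than expected.
    """
    preferred, shunned = [], []
    for post, z in sorted(z_scores.items()):
        if z * z >= CRITICAL_1DF:
            (preferred if z > 0 else shunned).append(post)
    return preferred, shunned


def p_value_1df(z):
    """Two-sided p-value of Z_i, equal to P(Chi^2(1) >= Z_i^2)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical signed statistics (not the Table 3 values).
example = {1: 2.4, 2: -0.5, 3: 0.8, 4: -2.2, 5: 1.1}
print(classify_posts(example))  # ([1], [4])
print({post: round(p_value_1df(z), 4) for post, z in example.items()})
```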

Distance Races
We perform an analysis similar to that in Section 5, using the data in Table 2. The critical value is again $\chi^2_{0.05}(10) = 18.3$. Since $Q_{10} = 6.77 < 18.3$, we fail to reject the null hypothesis. The data do not show that post position matters in distance races.

Concluding Remarks
We developed statistical models for both sprint and distance horse races. The multinomial distribution could not be used to model the data because its required assumptions did not hold. We used ten normalized binomial test statistics to fit the data; the square of each normalized statistic has approximately a chi-square distribution with one degree of freedom.
The models show that for sprint races the innermost and outermost post positions tend to be preferred over the middle post positions. For distance races, post position is not a statistically significant factor in winning.

Table 1. Sprint Race Data, January 1 to May 23, 2014
Similarly, on May 24, 2014, the Charles Town Race Track published the data in Table 2 for distance races. As in Table 1, the table contains the number of races, the number of wins, and the win percentage for each post position; the percentage is calculated from the previous two columns. Charles Town Races rounded these percentages to one decimal place, so for analytic purposes we again recompute them to four decimal places.

Table 3. Post Position Chi-Square Tests for Sprint Races

Table 4. Post Position Chi-Square Tests for Distance Races
Equation (…) gives the test statistic for comparing the post-position distributions for the distance races.