The Team Race Protest Leaderboard

Author: SailRank

Date: 2024-04-01

Today we are gonna look at all things related to DSQs in team racing. The first team races uploaded to techscore appear in Fall 2012, so we have roughly 12 years of data to work with here!

SailRank Protest Probability Calculator

Feel free to also check out our NEW Protest Probability Calculator: The SailRank Protest Probability Calculator

The calculator takes inputs such as the rule infraction, venue, judges, and teams involved to give you a perfect estimate of your odds of winning a given protest!


Loading The Dataset

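# Packages: data.table (fread), dplyr (pipes and verbs), stringr (str_count), DT (datatable), ggplot2 (plots)
library(data.table); library(dplyr); library(stringr); library(DT); library(ggplot2)

tr_data = fread("../seasoncsvs/tr_res.csv")
tr_data = tr_data %>% select(regatta_id, t1_school_id, t2_school_id, winning_combinations, losing_combinations, season)
tr_data %>% head() %>% datatable()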

Find The DSQs

tr_data = tr_data %>% mutate(t1_dsq = str_count(winning_combinations, "DSQ"), t2_dsq = str_count(losing_combinations, "DSQ")) 
tr_data %>% filter(t2_dsq !=0) %>% head() %>% datatable()
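
As a quick illustration of the counting step above, here is a minimal sketch of how `str_count()` behaves on a few made-up combination strings. The string format shown is an assumption for illustration only; the real techscore combinations may be laid out differently.

```r
library(stringr)

# Hypothetical combination strings; the real techscore format may differ.
combos <- c("1-2-5", "1-4-DSQ", "DSQ-DSQ-3")

# str_count() returns the number of non-overlapping matches in each element,
# so a race with two disqualified boats on one team counts as 2.
dsq_per_race <- str_count(combos, "DSQ")
print(dsq_per_race)
```

Because the count is per string, the same column supports both the "at least one DSQ" and "more than one DSQ" filters used later in the post.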

All Time DSQs Leaderboard

t2_dsqs = tr_data %>%
  group_by(school_id = t2_school_id) %>%
  summarize(dsq_count = sum(t2_dsq), race_count = n())

t1_dsqs = tr_data %>%
  group_by(school_id = t1_school_id) %>%
  summarize(dsq_count = sum(t1_dsq), race_count = n())

school_dsqs_count = t2_dsqs %>%
  full_join(t1_dsqs, by = "school_id", suffix = c("_t2", "_t1")) %>%
  mutate(
    dsq_count = coalesce(dsq_count_t2, 0) + coalesce(dsq_count_t1, 0),
    race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)
  ) %>%
  select(school_id, dsq_count, race_count) %>%
  distinct() %>%
  mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0)

school_dsqs_count %>% arrange(desc(dsqs_per_hundred)) %>% filter(race_count > 100, dsq_count >= 1) %>% datatable()

URI leads the way with roughly 1.4 DSQs per 100 races! RWU has the highest total at 34.
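
For anyone adapting this, the leaderboard could also be built in a single pass by stacking the winning (t1) and losing (t2) sides into one long table before grouping. This is only a sketch of an alternative, not the code that produced the numbers in this post, and the `toy` data frame below is made up.

```r
library(dplyr)

# Toy stand-in for tr_data with made-up schools and DSQ counts.
toy <- data.frame(
  t1_school_id = c("A", "A", "B"),
  t2_school_id = c("B", "C", "C"),
  t1_dsq = c(0L, 1L, 0L),
  t2_dsq = c(1L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Stack both sides so each row is one team in one race.
long <- bind_rows(
  toy %>% transmute(school_id = t1_school_id, dsq = t1_dsq),
  toy %>% transmute(school_id = t2_school_id, dsq = t2_dsq)
)

# One group_by replaces the two summaries plus the full_join and coalesce.
toy_counts <- long %>%
  group_by(school_id) %>%
  summarize(dsq_count = sum(dsq), race_count = n(), .groups = "drop") %>%
  mutate(dsqs_per_hundred = dsq_count / race_count * 100)

print(toy_counts)
```

The stacked form has no missing values to coalesce, since a school that only ever appears on one side simply contributes rows from that side.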

summary(school_dsqs_count)
##   school_id           dsq_count        race_count     dsqs_per_hundred
##  Length:140         Min.   : 0.000   Min.   :   3.0   Min.   :0.0000  
##  Class :character   1st Qu.: 0.000   1st Qu.:  18.0   1st Qu.:0.0000  
##  Mode  :character   Median : 0.000   Median :  66.5   Median :0.0000  
##                     Mean   : 2.071   Mean   : 378.8   Mean   :0.3014  
##                     3rd Qu.: 1.000   3rd Qu.: 398.8   3rd Qu.:0.3229  
##                     Max.   :34.000   Max.   :2578.0   Max.   :5.7143
school_dsqs_count %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_per_hundred)) + 
  geom_density(color = "darkblue") +
  theme_minimal() +
  labs(title = "Density of DSQs per 100 Races", y = "Density", x = "DSQs per 100 Races", subtitle = "By Team, Fall 2012 - Spring 2024")


Protest Form Frontrunners

We can also find out who is filling out all those protest forms. Here’s who has caused their opponents the most DSQs:

# DSQs picked up by the opponent when the school was the winning side (t1)
t1_forms = tr_data %>%
  group_by(school_id = t1_school_id) %>%
  summarize(caused_dsq_count = sum(t2_dsq), race_count = n())
# ...and when the school was the losing side (t2)
t2_forms = tr_data %>%
  group_by(school_id = t2_school_id) %>%
  summarize(caused_dsq_count = sum(t1_dsq), race_count = n())
# Suffixes follow the join order: t1_forms columns get "_t1", t2_forms columns get "_t2"
schools_dsqs_caused = t1_forms %>%
  full_join(t2_forms, by = "school_id", suffix = c("_t1", "_t2")) %>%
  mutate(
    caused_dsq_count = coalesce(caused_dsq_count_t1, 0) + coalesce(caused_dsq_count_t2, 0),
    race_count = coalesce(race_count_t1, 0) + coalesce(race_count_t2, 0)
  ) %>%
  select(school_id, caused_dsq_count, race_count) %>%
  distinct() %>%
  mutate(dsqs_caused_per_hundred = (as.double(caused_dsq_count) / as.double(race_count)) * 100.0)

schools_dsqs_caused %>% arrange(desc(dsqs_caused_per_hundred)) %>% filter(race_count > 100, caused_dsq_count >= 1) %>% datatable()

When the Bears say protest, they mean it. They take the top spot both in DSQs caused per hundred races and in total DSQs caused!

summary(schools_dsqs_caused)
##   school_id         caused_dsq_count   race_count     dsqs_caused_per_hundred
##  Length:140         Min.   : 0.000   Min.   :   3.0   Min.   :0.0000         
##  Class :character   1st Qu.: 0.000   1st Qu.:  18.0   1st Qu.:0.0000         
##  Mode  :character   Median : 0.000   Median :  66.5   Median :0.0000         
##                     Mean   : 2.071   Mean   : 378.8   Mean   :0.2435         
##                     3rd Qu.: 2.000   3rd Qu.: 398.8   3rd Qu.:0.2844         
##                     Max.   :41.000   Max.   :2578.0   Max.   :5.4054
schools_dsqs_caused %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_caused_per_hundred)) + 
  geom_density(color = "darkblue") +
  theme_minimal() +
  labs(title = "Density of DSQs Caused per 100 Races", y = "Density", x = "DSQs Caused per 100 Races", subtitle = "By Team, Fall 2012 - Spring 2024")


DSQs by Season

seasonal_dsqs = tr_data %>%
  group_by(season) %>%
  summarize(dsq_count = sum(t2_dsq) + sum(t1_dsq), race_count = n()) %>%
    mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0) 
seasonal_dsqs %>%
  arrange(desc(dsqs_per_hundred)) %>% datatable()
summary(seasonal_dsqs)
##     season            dsq_count       race_count   dsqs_per_hundred
##  Length:23          Min.   : 0.00   Min.   :  92   Min.   :0.0000  
##  Class :character   1st Qu.: 1.00   1st Qu.: 177   1st Qu.:0.5422  
##  Mode  :character   Median : 4.00   Median : 743   Median :0.9379  
##                     Mean   :12.61   Mean   :1153   Mean   :0.9416  
##                     3rd Qu.:20.00   3rd Qu.:2068   3rd Qu.:1.4007  
##                     Max.   :60.00   Max.   :3092   Max.   :2.9211
seasonal_dsqs %>% filter(grepl("s", season)) %>% ggplot(aes(x = season, y = dsqs_per_hundred)) +
  geom_line(aes(group = 1), color = "lightblue") +
  geom_point(color="darkblue", size = 2.0) +
  theme_minimal() +
  labs(title = "DSQs Per 100 Races By Season", y="DSQs Per 100 Races", x="Season", subtitle = "Spring 2013 - 2024")

Spring 2014 was certainly a season to remember, with a DSQ rate nearly 53% higher than Spring 2020, the next-highest season on record. Spring 2021 has the lowest rate of any spring season, with just 1 DSQ across the 743 team races sailed. A rare win for the COVID era.
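
For reference, the arithmetic behind these season comparisons can be sketched as follows. The helper names `per_hundred` and `pct_increase` are hypothetical and do not appear in the analysis code; the only real figures used are the 1 DSQ and 743 races quoted for Spring 2021.

```r
# DSQs per 100 races, the same ratio used for dsqs_per_hundred in this post.
per_hundred <- function(dsq_count, race_count) dsq_count / race_count * 100

# Spring 2021: 1 DSQ across 743 team races.
s21_rate <- per_hundred(1, 743)

# Percent increase of one rate over another (hypothetical helper).
pct_increase <- function(new, old) (new - old) / old * 100

# Made-up rates just to show the shape of the calculation.
example_increase <- pct_increase(3, 2)

print(round(s21_rate, 3))
print(example_increase)
```

A comparison like "nearly 53% higher" is a ratio of two per-100 rates, so it does not depend on how many races each season had.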

Just for fun here’s what looking at just the fall team races yields:

seasonal_dsqs %>% filter(grepl("f", season)) %>% ggplot(aes(x = season, y = dsqs_per_hundred)) +
  geom_line(aes(group = 1), color = "lightblue") +
  geom_point(color="darkblue", size = 2.0) +
  theme_minimal() +
  labs(title = "DSQs Per 100 Races By Season", y="DSQs Per 100 Races", x="Season", subtitle = "Fall 2012 - 2023")

Warmer weather means fewer DSQs…


Most DSQs In A Single Season

t2_season_dsqs = tr_data %>%
  group_by(school_id = t2_school_id, season = season) %>%
  summarize(dsq_count = sum(t2_dsq), race_count = n(), .groups = "drop") %>%
  mutate(school_season_id = paste(school_id, season, sep = "_"))

t1_season_dsqs = tr_data %>%
  group_by(school_id = t1_school_id, season = season) %>%
  summarize(dsq_count = sum(t1_dsq), race_count = n(), .groups = "drop") %>%
  mutate(school_season_id = paste(school_id, season, sep = "_"))

# .groups = "drop" above keeps a leftover school_id grouping out of the join and select below
school_season_dsqs_count = t2_season_dsqs %>%
  full_join(t1_season_dsqs, by = "school_season_id", suffix = c("_t2", "_t1")) %>%
  mutate(
    dsq_count = coalesce(dsq_count_t2, 0) + coalesce(dsq_count_t1, 0),
    race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)
  ) %>%
  select(school_season_id, dsq_count, race_count) %>%
  distinct() %>%
  mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0)

school_season_dsqs_count %>% arrange(desc(dsqs_per_hundred)) %>% filter(dsq_count > 1, race_count > 50) %>% datatable()

Does anyone know what was in the water on the BU campus that year?

school_season_dsqs_count %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_per_hundred)) + 
  geom_density(color = "darkblue") +
  theme_minimal() +
  labs(title = "Density of DSQs per 100 Races", y = "Density", x = "DSQs per 100 Races", subtitle = "By Team and Season, Fall 2012 - Spring 2024")


The Pacifists

Some schools have managed to keep their records entirely clean across all 12 years.

school_dsqs_count %>% filter(dsq_count == 0) %>% arrange(desc(race_count)) %>% datatable()

A round of applause for the 92 schools that have yet to receive their first DSQ! Especially to Eckerd, who are nearing 1000 races with no incidents.


Eye For An Eye

Alternatively titled Fight Fire with Fire, here are the 23 races in which both teams landed themselves a DSQ.

eyeforeyes = tr_data %>% filter(t1_dsq >= 1, t2_dsq >= 1)
eyeforeyes %>% arrange(desc(regatta_id)) %>% datatable()

Double Jeopardy

More than one DSQ for a team in one race??? It’s possible, and it has happened 11 times since Fall 2012.

double_dsqs = tr_data %>% filter(t1_dsq > 1 | t2_dsq > 1) 
double_dsqs %>% arrange(desc(regatta_id)) %>% datatable()

Lost Their Rulebooks

We can also find the races with the most DSQs total!

tr_data %>% filter(t1_dsq + t2_dsq >= 2) %>% arrange(desc(t1_dsq + t2_dsq)) %>% datatable()

Shoutout to the Bears and Camels for holding the all-time record with 4 DSQs in a single race in 2022! (It has been noted that this may have been a data-entry mistake in techscore.)

Here’s a link to the rules for those of you who were involved in these: Racing Rules of Sailing


Thanks

Thanks for reading this quick little dive into the world of DSQs! Enjoy the rest of your 04/01 festivities!
