Today we are gonna look at all things related to DSQs (disqualifications) in team racing. The first team races uploaded to Techscore appear in Fall 2012, so we have roughly 12 years of data to work with here!
Feel free to also check out our NEW Protest Probability Calculator: The SailRank Protest Probability Calculator
The calculator takes inputs such as the rule infraction, venue, judges, and teams involved to give you a perfect estimate of your odds of winning a given protest!
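library(data.table)  # fread()
library(dplyr)       # %>%, select(), mutate(), group_by(), summarize(), joins
library(stringr)     # str_count()
library(DT)          # datatable()
library(ggplot2)     # density and line plots
# Load the team race results and keep only the columns used below.
tr_data = fread("../seasoncsvs/tr_res.csv")
tr_data = tr_data %>% select(regatta_id, t1_school_id, t2_school_id, winning_combinations, losing_combinations, season)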
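tr_data %>% head() %>% datatable()
# Count how many times "DSQ" appears in each team's finish combination.
tr_data = tr_data %>% mutate(t1_dsq = str_count(winning_combinations, "DSQ"), t2_dsq = str_count(losing_combinations, "DSQ"))
tr_data %>% filter(t2_dsq !=0) %>% head() %>% datatable()
# DSQs and races per school when listed as team 2, then as team 1.
t2_dsqs = tr_data %>%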
  group_by(school_id = t2_school_id) %>%
  summarize(dsq_count = sum(t2_dsq), race_count = n())
t1_dsqs = tr_data %>%
  group_by(school_id = t1_school_id) %>%
  summarize(dsq_count = sum(t1_dsq), race_count = n())
school_dsqs_count =
  t2_dsqs %>% 
  full_join(t1_dsqs, by = "school_id", suffix = c("_t2", "_t1")) %>% 
  mutate(dsq_count = coalesce(dsq_count_t2, 0) + coalesce(dsq_count_t1, 0), race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)) %>% 
    select(school_id, dsq_count, race_count) %>% 
  distinct() %>% 
  mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0)
school_dsqs_count %>% arrange(desc(dsqs_per_hundred)) %>% filter(race_count > 100, dsq_count >= 1) %>% datatable()
URI leads the way with roughly 1.4 DSQs per 100 races! RWU has the highest total at 34.
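To make the counting step concrete, here is a minimal sketch of the same idea on made-up data. It is not the code used for the tables in this post: it uses base R only, it stacks both sides of each race into one long table instead of joining two summaries, and the `"1-2-DSQ"` combination format and every value in `toy` are assumptions for illustration.

```r
# Toy data: column names mirror tr_data, but all values are invented and the
# "1-2-DSQ" combination format is an assumption.
toy <- data.frame(
  t1_school_id = c("A", "A", "B", "C"),
  t2_school_id = c("B", "C", "C", "A"),
  winning_combinations = c("1-2-3", "1-2-DSQ", "1-3-5", "2-3-4"),
  losing_combinations  = c("4-5-DSQ", "3-4-5", "DSQ-DSQ-6", "1-5-6"),
  stringsAsFactors = FALSE
)

# Count occurrences of the literal string "DSQ" in each element
# (a base R stand-in for stringr::str_count(x, "DSQ")).
count_dsq <- function(x) {
  lengths(regmatches(x, gregexpr("DSQ", x, fixed = TRUE)))
}

toy$t1_dsq <- count_dsq(toy$winning_combinations)
toy$t2_dsq <- count_dsq(toy$losing_combinations)

# One row per team per race, regardless of which side the team was on.
long <- data.frame(
  school_id = c(toy$t1_school_id, toy$t2_school_id),
  dsq       = c(toy$t1_dsq, toy$t2_dsq),
  stringsAsFactors = FALSE
)

# Total DSQs and races per school, then the rate per 100 races.
per_school <- aggregate(dsq ~ school_id, data = long, FUN = sum)
race_counts <- table(long$school_id)
per_school$race_count <- as.vector(race_counts[as.character(per_school$school_id)])
per_school$dsqs_per_hundred <- 100 * per_school$dsq / per_school$race_count

print(per_school)
```

Stacking the two sides first means a school that only ever appears on one side still gets a row, so no zero-filling of missing counts is needed.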
summary(school_dsqs_count)
##   school_id           dsq_count        race_count     dsqs_per_hundred
##  Length:140         Min.   : 0.000   Min.   :   3.0   Min.   :0.0000  
##  Class :character   1st Qu.: 0.000   1st Qu.:  18.0   1st Qu.:0.0000  
##  Mode  :character   Median : 0.000   Median :  66.5   Median :0.0000  
##                     Mean   : 2.071   Mean   : 378.8   Mean   :0.3014  
##                     3rd Qu.: 1.000   3rd Qu.: 398.8   3rd Qu.:0.3229  
##                     Max.   :34.000   Max.   :2578.0   Max.   :5.7143
school_dsqs_count %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_per_hundred)) + 
  geom_density(color = "darkblue") +
  theme_minimal() +
  labs(title = "Density of DSQs per 100 Races", y = "Density", x = "DSQs per 100 Races", subtitle = "By Team Fall 2012 - Spring 2024")
We can also find out who is filling out all those protest forms. Here’s who has caused their opponents the most DSQs:
t1_forms = tr_data %>%
  group_by(school_id = t1_school_id) %>%
  summarize(caused_dsq_count = sum(t2_dsq), race_count = n()) %>%
    mutate(dsqs_caused_per_hundred = (as.double(caused_dsq_count) / as.double(race_count)) * 100.0)
t2_forms = tr_data %>%
  group_by(school_id = t2_school_id) %>%
  summarize(caused_dsq_count = sum(t1_dsq), race_count = n()) %>%
    mutate(dsqs_caused_per_hundred = (as.double(caused_dsq_count) / as.double(race_count)) * 100.0)
schools_dsqs_caused = t1_forms %>% 
  full_join(t2_forms, by = "school_id", suffix = c("_t1", "_t2")) %>% 
  mutate(caused_dsq_count = coalesce(caused_dsq_count_t2, 0) + coalesce(caused_dsq_count_t1, 0), race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)) %>%
  select(school_id, caused_dsq_count, race_count) %>% 
  distinct() %>% 
  mutate(dsqs_caused_per_hundred = (as.double(caused_dsq_count) / as.double(race_count)) * 100.0)
schools_dsqs_caused %>% arrange(desc(dsqs_caused_per_hundred)) %>% filter(race_count > 100, caused_dsq_count >= 1) %>% datatable()
When the Bears say protest, they mean it. They take the top spot both per 100 races and in total!
summary(schools_dsqs_caused)
##   school_id         caused_dsq_count   race_count     dsqs_caused_per_hundred
##  Length:140         Min.   : 0.000   Min.   :   3.0   Min.   :0.0000         
##  Class :character   1st Qu.: 0.000   1st Qu.:  18.0   1st Qu.:0.0000         
##  Mode  :character   Median : 0.000   Median :  66.5   Median :0.0000         
##                     Mean   : 2.071   Mean   : 378.8   Mean   :0.2435         
##                     3rd Qu.: 2.000   3rd Qu.: 398.8   3rd Qu.:0.2844         
##                     Max.   :41.000   Max.   :2578.0   Max.   :5.4054
schools_dsqs_caused %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_caused_per_hundred)) + 
  geom_density(color = "darkblue") +
  theme_minimal() +
  labs(title = "Density of DSQs Caused per 100 Races", y = "Density", x = "DSQs Caused per 100 Races", subtitle = "By Team Fall 2012 - Spring 2024")
# Total DSQs (both teams) and races per season.
seasonal_dsqs = tr_data %>%
  group_by(season) %>%
  summarize(dsq_count = sum(t2_dsq) + sum(t1_dsq), race_count = n()) %>%
    mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0) 
seasonal_dsqs %>%
  arrange(desc(dsqs_per_hundred)) %>% datatable()
summary(seasonal_dsqs)
##     season            dsq_count       race_count   dsqs_per_hundred
##  Length:23          Min.   : 0.00   Min.   :  92   Min.   :0.0000  
##  Class :character   1st Qu.: 1.00   1st Qu.: 177   1st Qu.:0.5422  
##  Mode  :character   Median : 4.00   Median : 743   Median :0.9379  
##                     Mean   :12.61   Mean   :1153   Mean   :0.9416  
##                     3rd Qu.:20.00   3rd Qu.:2068   3rd Qu.:1.4007  
##                     Max.   :60.00   Max.   :3092   Max.   :2.9211
seasonal_dsqs %>% filter(grepl("s", season)) %>% ggplot(aes(x = season, y = dsqs_per_hundred)) +
  geom_line(aes(group = 1), color = "lightblue") +
  geom_point(color="darkblue", size = 2.0) +
  theme_minimal() +
  labs(title = "DSQs Per 100 Races By Season", y="DSQs Per 100 Races", x="Season", subtitle = "Spring 2013 - 2024")
Spring 2014 was certainly a season to remember, with a nearly 53% increase over spring 2020, the next highest season on record. Spring 2021 remains the lowest spring incident rate on record, with just 1 DSQ across the 743 team races sailed. A rare win for the COVID era.
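For anyone who wants to check the arithmetic behind those rates, here is a small base R sketch. The helper names `dsqs_per_hundred()` and `pct_increase()` are made up for this example, and the two rates passed to `pct_increase()` are hypothetical, not the actual spring 2014 and spring 2020 values.

```r
# Rate of DSQs per 100 races.
dsqs_per_hundred <- function(dsq_count, race_count) {
  100 * dsq_count / race_count
}

# Percent increase of one rate over another.
pct_increase <- function(new_rate, old_rate) {
  100 * (new_rate - old_rate) / old_rate
}

# Spring 2021, using the counts quoted above: 1 DSQ in 743 races.
spring_2021 <- dsqs_per_hundred(1, 743)
print(round(spring_2021, 2))  # 0.13

# Hypothetical rates, only to show how a percent increase is computed.
print(pct_increase(3.0, 2.0))  # 50
```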
Just for fun, here’s what we get when looking at only the fall team races:
seasonal_dsqs %>% filter(grepl("f", season)) %>% ggplot(aes(x = season, y = dsqs_per_hundred)) +
  geom_line(aes(group = 1), color = "lightblue") +
  geom_point(color="darkblue", size = 2.0) +
  theme_minimal() +
  labs(title = "DSQs Per 100 Races By Season", y="DSQs Per 100 Races", x="Season", subtitle = "Fall 2012 - 2023")
Warmer weather means fewer DSQs…
t2_season_dsqs = tr_data %>%
  group_by(school_id = t2_school_id, season = season) %>%
  summarize(dsq_count = sum(t2_dsq), race_count = n()) %>%
  mutate(school_season_id = paste(school_id, season, sep = "_", collapse = NULL))
t1_season_dsqs = tr_data %>%
  group_by(school_id = t1_school_id, season = season) %>%
  summarize(dsq_count = sum(t1_dsq), race_count = n()) %>%
  mutate(school_season_id = paste(school_id, season, sep = "_", collapse = NULL))
school_season_dsqs_count =
  t2_season_dsqs %>% 
  full_join(t1_season_dsqs, by = "school_season_id", suffix = c("_t2", "_t1")) %>% 
  mutate(dsq_count = coalesce(dsq_count_t2, 0) + coalesce(dsq_count_t1, 0), race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)) %>% 
    select(school_season_id, dsq_count, race_count) %>% 
  distinct() %>% 
  mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0)
school_season_dsqs_count %>% arrange(desc(dsqs_per_hundred)) %>% filter(dsq_count > 1, race_count > 50) %>% datatable()
Does anyone know what was in the water on the BU campus that year?
school_season_dsqs_count %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_per_hundred)) + 
  geom_density(color = "darkblue") +
  theme_minimal() +
  labs(title = "Density of DSQs per 100 Races", y = "Density", x = "DSQs per 100 Races", subtitle = "By Team and Season Fall 2012 - Spring 2024")
Some schools have managed to keep their records entirely clean across all 12 years.
school_dsqs_count %>% filter(dsq_count == 0) %>% arrange(desc(race_count)) %>% datatable()
A round of applause for the 92 schools that have yet to receive their first DSQ! Especially Eckerd, who are nearing 1,000 races with no incidents.
Alternatively titled Fight Fire with Fire, here are the 23 races in which both teams landed themselves a DSQ.
eyeforeyes = tr_data %>% filter(t1_dsq >= 1, t2_dsq >= 1)
eyeforeyes %>% arrange(desc(regatta_id)) %>% datatable()
More than one DSQ for a team in one race??? It’s possible and has happened 11 times since fall 2012.
double_dsqs = tr_data %>% filter(t1_dsq > 1 | t2_dsq > 1) 
double_dsqs %>% arrange(desc(regatta_id)) %>% datatable()
We can also find the races with the most DSQs total!
tr_data %>% filter(t1_dsq + t2_dsq >= 2) %>% arrange(desc(t1_dsq + t2_dsq)) %>% datatable()
Shoutout to the Bears and Camels for holding the all-time record with 4 DSQs in a single race in 2022! (It has been noted that there may have been a mistake when this was entered into Techscore.)
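The three race-level cuts above (both teams with a DSQ, one team with more than one DSQ, and most DSQs in total) overlap but are not the same thing. Here is a rough base R sketch on invented data that shows the difference; the `toy` table and its `race` column are assumptions for illustration only.

```r
# Invented per-race DSQ counts for team 1 and team 2.
toy <- data.frame(
  race   = 1:5,
  t1_dsq = c(0, 1, 2, 0, 2),
  t2_dsq = c(0, 1, 0, 1, 2)
)

# Both teams picked up at least one DSQ in the same race.
both_teams <- toy[toy$t1_dsq >= 1 & toy$t2_dsq >= 1, ]

# At least one team picked up more than one DSQ in the race.
multi_dsq <- toy[toy$t1_dsq > 1 | toy$t2_dsq > 1, ]

# Races ranked by total DSQs across both teams, highest first.
toy$total_dsq <- toy$t1_dsq + toy$t2_dsq
by_total <- toy[order(-toy$total_dsq), ]

print(both_teams$race)  # 2 5
print(multi_dsq$race)   # 3 5
print(by_total[1, ])
```

Race 3 in the toy data lands in the second group but not the first, which is why the two tables above can have different row counts.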
Here’s a link to the rules for those of you who were involved in these: Racing Rules of Sailing
Thanks for reading this quick little dive into the world of DSQs! Enjoy the rest of your 04/01 festivities!