Today we are gonna look at all things related to DSQs in team racing. The first team races uploaded to techscore appear in Fall 2012, so we have roughly 12 years of data to work with here!
Feel free to also check out our NEW Protest Probability Calculator: The SailRank Protest Probability Calculator
The calculator takes inputs such as the rule infraction, venue, judges, and teams involved to give you a perfect estimate of your odds of winning a given protest!
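library(data.table)  # fread()
library(dplyr)       # select(), mutate(), filter(), group_by(), summarize(), full_join()
library(stringr)     # str_count()
library(DT)          # datatable()
library(ggplot2)     # plots below
tr_data = fread("../seasoncsvs/tr_res.csv")
tr_data = tr_data %>% select(regatta_id, t1_school_id, t2_school_id, winning_combinations, losing_combinations, season)
tr_data %>% head() %>% datatable()
tr_data = tr_data %>% mutate(t1_dsq = str_count(winning_combinations, "DSQ"), t2_dsq = str_count(losing_combinations, "DSQ"))  # t1 = winning team, t2 = losing team
tr_data %>% filter(t2_dsq != 0) %>% head() %>% datatable()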
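`str_count()` returns the number of non-overlapping matches of a pattern in each string, so a combination containing two "DSQ" entries counts as two DSQs. A minimal base-R sketch of the same count, assuming (hypothetically) that a combination string lists each boat's finish separated by hyphens, such as "1-2-DSQ"; the real format of `tr_res.csv` may differ:

```r
# Hypothetical combination strings; the real tr_res.csv format may differ
winning <- c("1-2-3", "1-2-DSQ", "1-3-4")
losing  <- c("4-5-6", "3-4-5", "2-DSQ-DSQ")

# Count non-overlapping "DSQ" matches per string (base-R analogue of stringr::str_count)
count_dsq <- function(x) {
  hits <- gregexpr("DSQ", x, fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count only positive positions
  vapply(hits, function(m) sum(m > 0), integer(1))
}

t1_dsq <- count_dsq(winning)
t2_dsq <- count_dsq(losing)
print(t1_dsq)  # 0 1 0
print(t2_dsq)  # 0 0 2
```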
t2_dsqs = tr_data %>%  # DSQs received as the losing team (t2)
group_by(school_id = t2_school_id) %>%
summarize(dsq_count = sum(t2_dsq), race_count = n())
t1_dsqs = tr_data %>%  # DSQs received as the winning team (t1)
group_by(school_id = t1_school_id) %>%
summarize(dsq_count = sum(t1_dsq), race_count = n())
school_dsqs_count =
t2_dsqs %>%
full_join(t1_dsqs, by = "school_id", suffix = c("_t2", "_t1")) %>%
mutate(dsq_count = coalesce(dsq_count_t2, 0) + coalesce(dsq_count_t1, 0), race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)) %>%
select(school_id, dsq_count, race_count) %>%
distinct() %>%
mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0)
school_dsqs_count %>% arrange(desc(dsqs_per_hundred)) %>% filter(race_count > 100, dsq_count >= 1) %>% datatable()
Among schools with more than 100 races, URI leads the way with roughly 1.4 DSQs per 100 races! RWU has the highest total at 34.
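The `full_join()` above merges each school's winning-side and losing-side tallies and uses `coalesce()` to fill in schools that only appear on one side. An equivalent approach, sketched here in base R on a hypothetical four-race table (school IDs and counts are made up), stacks the two sides into one long table and aggregates once, so no missing-side handling is needed:

```r
# Hypothetical toy races; school IDs and counts are made up for illustration
races <- data.frame(
  t1_school_id = c("A", "A", "B", "C"),
  t2_school_id = c("B", "C", "C", "A"),
  t1_dsq = c(0, 1, 0, 0),
  t2_dsq = c(1, 0, 2, 0)
)

# One row per (race, team): stack the winning and losing sides
long <- rbind(
  data.frame(school_id = races$t1_school_id, dsq = races$t1_dsq),
  data.frame(school_id = races$t2_school_id, dsq = races$t2_dsq)
)

# Total DSQs and races sailed per school
dsq_count  <- tapply(long$dsq, long$school_id, sum)
race_count <- tapply(long$dsq, long$school_id, length)

school_rates <- data.frame(
  school_id = names(dsq_count),
  dsq_count = as.vector(dsq_count),
  race_count = as.vector(race_count),
  dsqs_per_hundred = 100 * as.vector(dsq_count) / as.vector(race_count)
)
print(school_rates)
```

Each school's races are counted once per appearance, whichever side it was on, which matches how `race_count` is built from the two `n()` tallies above.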
summary(school_dsqs_count)
## school_id dsq_count race_count dsqs_per_hundred
## Length:140 Min. : 0.000 Min. : 3.0 Min. :0.0000
## Class :character 1st Qu.: 0.000 1st Qu.: 18.0 1st Qu.:0.0000
## Mode :character Median : 0.000 Median : 66.5 Median :0.0000
## Mean : 2.071 Mean : 378.8 Mean :0.3014
## 3rd Qu.: 1.000 3rd Qu.: 398.8 3rd Qu.:0.3229
## Max. :34.000 Max. :2578.0 Max. :5.7143
school_dsqs_count %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_per_hundred)) +
geom_density(color = "darkblue") +
theme_minimal() +
labs(title = "Density of DSQs per 100 Races", y = "Density", x = "DSQs per 100 Races", subtitle = "By Team, Fall 2012 - Spring 2024")
We can also find out who is filling out all those protest forms. Here’s who has caused their opponents the most DSQs:
t1_forms = tr_data %>%  # DSQs caused: opponent DSQs in races where the school was the winning team (t1)
group_by(school_id = t1_school_id) %>%
summarize(caused_dsq_count = sum(t2_dsq), race_count = n()) %>%
mutate(dsqs_caused_per_hundred = (as.double(caused_dsq_count) / as.double(race_count)) * 100.0)
t2_forms = tr_data %>%  # DSQs caused: opponent DSQs in races where the school was the losing team (t2)
group_by(school_id = t2_school_id) %>%
summarize(caused_dsq_count = sum(t1_dsq), race_count = n()) %>%
mutate(dsqs_caused_per_hundred = (as.double(caused_dsq_count) / as.double(race_count)) * 100.0)
schools_dsqs_caused = t1_forms %>%
full_join(t2_forms, by = "school_id", suffix = c("_t1", "_t2")) %>%  # x = t1_forms, y = t2_forms
mutate(caused_dsq_count = coalesce(caused_dsq_count_t2, 0) + coalesce(caused_dsq_count_t1, 0), race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)) %>%
select(school_id, caused_dsq_count, race_count) %>%
distinct() %>%
mutate(dsqs_caused_per_hundred = (as.double(caused_dsq_count) / as.double(race_count)) * 100.0)
schools_dsqs_caused %>% arrange(desc(dsqs_caused_per_hundred)) %>% filter(race_count > 100, caused_dsq_count >= 1) %>% datatable()
When the Bears say protest, they mean it. They take the top spot in both the per-100-races and the total categories!
summary(schools_dsqs_caused)
## school_id caused_dsq_count race_count dsqs_caused_per_hundred
## Length:140 Min. : 0.000 Min. : 3.0 Min. :0.0000
## Class :character 1st Qu.: 0.000 1st Qu.: 18.0 1st Qu.:0.0000
## Mode :character Median : 0.000 Median : 66.5 Median :0.0000
## Mean : 2.071 Mean : 378.8 Mean :0.2435
## 3rd Qu.: 2.000 3rd Qu.: 398.8 3rd Qu.:0.2844
## Max. :41.000 Max. :2578.0 Max. :5.4054
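Every DSQ is received by one team and, in this framing, caused by its opponent, so across the whole league the total DSQs received must equal the total DSQs caused. That is why both `summary()` tables report the same mean count of 2.071 over the same 140 schools. A small base-R check of that identity, on hypothetical races:

```r
# Hypothetical toy races (same shape as tr_data after the DSQ counts are added)
races <- data.frame(
  t1_school_id = c("A", "A", "B", "C"),
  t2_school_id = c("B", "C", "C", "A"),
  t1_dsq = c(0, 1, 0, 0),
  t2_dsq = c(1, 0, 2, 0)
)

# DSQs received: each team is charged with its own DSQs
received <- rbind(
  data.frame(school_id = races$t1_school_id, n = races$t1_dsq),
  data.frame(school_id = races$t2_school_id, n = races$t2_dsq)
)

# DSQs caused: each team is credited with its opponent's DSQs (columns swapped)
caused <- rbind(
  data.frame(school_id = races$t1_school_id, n = races$t2_dsq),
  data.frame(school_id = races$t2_school_id, n = races$t1_dsq)
)

received_by_school <- tapply(received$n, received$school_id, sum)
caused_by_school   <- tapply(caused$n, caused$school_id, sum)
print(received_by_school)
print(caused_by_school)

# Every DSQ is counted once on each side, so the grand totals agree
stopifnot(sum(received_by_school) == sum(caused_by_school))
```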
schools_dsqs_caused %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_caused_per_hundred)) +
geom_density(color = "darkblue") +
theme_minimal() +
labs(title = "Density of DSQs Caused per 100 Races", y = "Density", x = "DSQs Caused per 100 Races", subtitle = "By Team, Fall 2012 - Spring 2024")
seasonal_dsqs = tr_data %>%  # league-wide DSQ rate per season, counting both teams in every race
group_by(season) %>%
summarize(dsq_count = sum(t2_dsq) + sum(t1_dsq), race_count = n()) %>%
mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0)
seasonal_dsqs %>%
arrange(desc(dsqs_per_hundred)) %>% datatable()
summary(seasonal_dsqs)
## season dsq_count race_count dsqs_per_hundred
## Length:23 Min. : 0.00 Min. : 92 Min. :0.0000
## Class :character 1st Qu.: 1.00 1st Qu.: 177 1st Qu.:0.5422
## Mode :character Median : 4.00 Median : 743 Median :0.9379
## Mean :12.61 Mean :1153 Mean :0.9416
## 3rd Qu.:20.00 3rd Qu.:2068 3rd Qu.:1.4007
## Max. :60.00 Max. :3092 Max. :2.9211
seasonal_dsqs %>% filter(grepl("s", season)) %>% ggplot(aes(x = season, y = dsqs_per_hundred)) +
geom_line(aes(group = 1), color = "lightblue") +
geom_point(color="darkblue", size = 2.0) +
theme_minimal() +
labs(title = "DSQs Per 100 Races By Season", y="DSQs Per 100 Races", x="Season", subtitle = "Spring 2013 - 2024")
Spring 2014 was certainly a season to remember, with a rate nearly 53% higher than Spring 2020, the next highest spring season on record. Spring 2021 has the lowest rate among the spring seasons, with just 1 DSQ across the 743 team races sailed. A rare win for the COVID era.
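For reference, a per-hundred rate is 100 × DSQs / races, and the percentage increase of rate a over rate b is 100 × (a − b) / b. A short sketch of both; the Spring 2021 figures (1 DSQ in 743 races) come from the text, while the two rates passed to `pct_increase()` are hypothetical placeholders, not the actual Spring 2014 and Spring 2020 values:

```r
# DSQs per 100 races
per_hundred <- function(dsqs, races) 100 * dsqs / races

# Percentage increase of rate a relative to rate b
pct_increase <- function(a, b) 100 * (a - b) / b

# Spring 2021, from the text: 1 DSQ across 743 team races
spring_2021 <- per_hundred(1, 743)
print(round(spring_2021, 3))  # 0.135

# Hypothetical rates, for illustration only
print(pct_increase(3.0, 2.0))  # 50
```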
Just for fun, here's what the fall team races alone look like:
seasonal_dsqs %>% filter(grepl("f", season)) %>% ggplot(aes(x = season, y = dsqs_per_hundred)) +
geom_line(aes(group = 1), color = "lightblue") +
geom_point(color="darkblue", size = 2.0) +
theme_minimal() +
labs(title = "DSQs Per 100 Races By Season", y="DSQs Per 100 Races", x="Season", subtitle = "Fall 2012 - 2023")
Warmer weather means fewer DSQs…
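The two charts plot each season's rate separately. To compare spring and fall as a whole, one option (not part of the original analysis) is a pooled rate: total DSQs over total races for each half of the year, so larger seasons weigh more than small ones. A base-R sketch on hypothetical season totals, assuming season codes begin with "s" or "f" as the `grepl()` filters above suggest:

```r
# Hypothetical season totals; codes like "s14"/"f13" are assumed
seasonal <- data.frame(
  season = c("f12", "s13", "f13", "s14"),
  dsq_count = c(2, 10, 6, 30),
  race_count = c(100, 1000, 800, 1500)
)

# Half of the year, taken from the first character of the season code
seasonal$half <- ifelse(startsWith(seasonal$season, "s"), "spring", "fall")

# Pooled rate: sum of DSQs over sum of races, per half
dsqs  <- tapply(seasonal$dsq_count, seasonal$half, sum)
races <- tapply(seasonal$race_count, seasonal$half, sum)
pooled_per_hundred <- 100 * dsqs / races
print(pooled_per_hundred)
```

Averaging the per-season rates instead would let a 92-race season count as much as a 3,092-race one, which is why the pooled version divides totals.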
t2_season_dsqs = tr_data %>%  # per school and season, as the losing team (t2)
group_by(school_id = t2_school_id, season = season) %>%
summarize(dsq_count = sum(t2_dsq), race_count = n()) %>%
mutate(school_season_id = paste(school_id, season, sep = "_", collapse = NULL))
t1_season_dsqs = tr_data %>%  # per school and season, as the winning team (t1)
group_by(school_id = t1_school_id, season = season) %>%
summarize(dsq_count = sum(t1_dsq), race_count = n()) %>%
mutate(school_season_id = paste(school_id, season, sep = "_", collapse = NULL))
school_season_dsqs_count =
t2_season_dsqs %>%
full_join(t1_season_dsqs, by = "school_season_id", suffix = c("_t2", "_t1")) %>%
mutate(dsq_count = coalesce(dsq_count_t2, 0) + coalesce(dsq_count_t1, 0), race_count = coalesce(race_count_t2, 0) + coalesce(race_count_t1, 0)) %>%
select(school_season_id, dsq_count, race_count) %>%
distinct() %>%
mutate(dsqs_per_hundred = (as.double(dsq_count) / as.double(race_count)) * 100.0)
school_season_dsqs_count %>% arrange(desc(dsqs_per_hundred)) %>% filter(dsq_count > 1, race_count > 50) %>% datatable()
Does anyone know what was in the water on the BU campus that year?
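The `school_season_id` key built with `paste()` lets the join match on school and season at once. Joining on both columns directly gives the same match without a synthetic key (dplyr's `full_join()` accepts `by = c("school_id", "season")`). A base-R sketch with `merge()` on hypothetical per-side tallies:

```r
# Hypothetical per-side tallies for each school and season
t1_side <- data.frame(school_id = c("A", "B"), season = c("s14", "s14"),
                      dsq_count = c(1, 0), race_count = c(10, 8))
t2_side <- data.frame(school_id = c("A", "C"), season = c("s14", "s14"),
                      dsq_count = c(2, 1), race_count = c(5, 12))

# Full outer join on both key columns; a missing side becomes NA
both <- merge(t1_side, t2_side, by = c("school_id", "season"),
              all = TRUE, suffixes = c("_t1", "_t2"))

# Treat a missing side as zero, as coalesce(..., 0) does in the original
sum_sides <- function(a, b) ifelse(is.na(a), 0, a) + ifelse(is.na(b), 0, b)
both$dsq_count  <- sum_sides(both$dsq_count_t1, both$dsq_count_t2)
both$race_count <- sum_sides(both$race_count_t1, both$race_count_t2)
both$dsqs_per_hundred <- 100 * both$dsq_count / both$race_count

print(both[, c("school_id", "season", "dsq_count", "race_count", "dsqs_per_hundred")])
```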
school_season_dsqs_count %>% filter(race_count > 100) %>% ggplot(aes(x=dsqs_per_hundred)) +
geom_density(color = "darkblue") +
theme_minimal() +
labs(title = "Density of DSQs per 100 Races", y = "Density", x = "DSQs per 100 Races", subtitle = "By Team and Season, Fall 2012 - Spring 2024")
Some schools have kept their records entirely clean across all 12 years.
school_dsqs_count %>% filter(dsq_count == 0) %>% arrange(desc(race_count)) %>% datatable()
A round of applause for the 92 schools that have yet to receive their first DSQ, and especially for Eckerd, who are nearing 1,000 races without an incident!
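As a quick consistency check, 92 of the 140 schools in `summary(school_dsqs_count)` works out to roughly 66% with a clean record, in line with the median of 0 DSQs reported there. The same share can be computed from any vector of per-school totals; a minimal sketch with a hypothetical helper `share_clean()`:

```r
# Hypothetical helper: share of schools with zero DSQs, given per-school totals
share_clean <- function(dsq_count) mean(dsq_count == 0)

# Hypothetical per-school totals
toy_counts <- c(0, 0, 3, 0, 1)
print(share_clean(toy_counts))  # 0.6

# Figures from the text: 92 of the 140 schools are clean
print(round(100 * 92 / 140, 1))  # 65.7
```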
Alternatively titled "Fight Fire with Fire," here are the 23 races in which both teams landed themselves a DSQ.
eyeforeyes = tr_data %>% filter(t1_dsq >= 1, t2_dsq >= 1)
eyeforeyes %>% arrange(desc(regatta_id)) %>% datatable()
More than one DSQ for a team in one race??? It's possible, and it has happened 11 times since Fall 2012.
double_dsqs = tr_data %>% filter(t1_dsq > 1 | t2_dsq > 1)
double_dsqs %>% arrange(desc(regatta_id)) %>% datatable()
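This filter counts races, so a race in which both teams picked up more than one DSQ would appear only once. Counting team-level occurrences instead means checking each side separately. A base-R sketch on hypothetical rows showing how the two counts can differ:

```r
# Hypothetical races with per-side DSQ counts
races <- data.frame(
  regatta_id = c(201, 202, 203, 204),
  t1_dsq = c(2, 0, 1, 2),
  t2_dsq = c(0, 3, 1, 2)
)

# Races where at least one team had more than one DSQ (what the filter above keeps)
race_level <- sum(races$t1_dsq > 1 | races$t2_dsq > 1)

# Team-race occurrences of more than one DSQ (each side counted separately)
team_level <- sum(races$t1_dsq > 1) + sum(races$t2_dsq > 1)

print(race_level)  # 3
print(team_level)  # 4
```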
We can also find the races with the most DSQs total!
tr_data %>% mutate(total_dsq = t1_dsq + t2_dsq) %>% filter(total_dsq >= 2) %>% arrange(desc(total_dsq)) %>% datatable()
Shoutout to the Bears and Camels for holding the all-time record with 4 DSQs in a single race in 2022! (It has been noted that this may have been a mistake when the result was entered into techscore.)
Here’s a link to the rules for those of you who were involved in these: Racing Rules of Sailing
Thanks for reading this quick little dive into the world of DSQs! Enjoy the rest of your 04/01 festivities!