Vaccines have helped save millions of lives. In the 19th century, before herd immunity was achieved through vaccination programs, deaths from infectious diseases, like smallpox and polio, were common. However, today, despite all the scientific evidence for their importance, vaccination programs have become somewhat controversial.
The controversy began with a paper published in 1998 by Andrew Wakefield claiming there was a link between the administration of the measles, mumps and rubella (MMR) vaccine and the appearance of autism. Despite further research contradicting this finding, sensationalist media reports and fear mongering from conspiracy theorists have led parts of the public to believe that vaccines are harmful. Some parents have stopped vaccinating their children. This practice can be disastrous, given that the Centers for Disease Control and Prevention (CDC) estimates that vaccinations will prevent more than 21 million hospitalizations and 732,000 deaths among children born in the last 20 years (see Benefits from Immunization during the Vaccines for Children Program Era — United States, 1994-2013, MMWR).
Effective communication of data is a strong antidote to misinformation and fear mongering. We are going to prepare a report to show the positive impact vaccines have had for public health.
The data used for these plots were collected, organized and distributed by the Tycho Project. The data from this project are compiled neatly in the dslabs package:
library(dslabs)
data(us_contagious_diseases)
head(us_contagious_diseases)
## disease state year weeks_reporting count population
## 1 Hepatitis A Alabama 1966 50 321 3345787
## 2 Hepatitis A Alabama 1967 49 291 3364130
## 3 Hepatitis A Alabama 1968 52 314 3386068
## 4 Hepatitis A Alabama 1969 49 380 3412450
## 5 Hepatitis A Alabama 1970 51 413 3444165
## 6 Hepatitis A Alabama 1971 51 378 3481798
To get us started, we will use the us_contagious_diseases data frame and dplyr tools to create an object called dat that stores only the Measles data. We will remove Alaska and Hawaii since they only became states in the late 50s. Then, we will compute a yearly measles rate per 100,000 people. Because not every state reported for all 52 weeks of every year, we will take the weeks_reporting column into account by scaling each rate up to a full year.
library(dplyr)
dat <- us_contagious_diseases %>% filter(disease == "Measles" & !(state %in% c("Alaska", "Hawaii"))) %>%
mutate(measles_rate = (count / population * 100000) / (weeks_reporting/52))
head(dat)
## disease state year weeks_reporting count population measles_rate
## 1 Measles Alabama 1928 52 8843 2589923 341.43872
## 2 Measles Alabama 1929 49 2959 2619131 119.89333
## 3 Measles Alabama 1930 52 4156 2646248 157.05255
## 4 Measles Alabama 1931 49 8934 2670818 354.98411
## 5 Measles Alabama 1932 41 270 2693027 12.71577
## 6 Measles Alabama 1933 51 1735 2713243 65.19945
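To see what the weeks_reporting adjustment does, one row can be recomputed by hand. The sketch below uses the Alabama 1929 values from the output above; the helper name annualized_rate is hypothetical and is not part of dslabs or dplyr.

```r
# Hypothetical helper (not part of dslabs or dplyr): cases per 100,000 people,
# scaled up to a full 52-week year when only some weeks were reported.
annualized_rate <- function(count, population, weeks_reporting) {
  raw_rate <- count / population * 100000  # rate over the weeks actually reported
  raw_rate / (weeks_reporting / 52)        # extrapolate to 52 weeks
}

# Alabama, 1929: 2959 cases, population 2619131, 49 weeks reporting
alabama_1929 <- annualized_rate(2959, 2619131, 49)
round(alabama_1929, 5)

# Edge case: zero cases and zero weeks reporting gives 0 / 0 = NaN,
# so such rows need to be dropped or ignored when plotting.
annualized_rate(0, 2619131, 0)
```

The first call should reproduce the measles_rate shown for that row. The second call illustrates why the plotting code in this report asks ggplot2 to drop missing values.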
Next, let’s take a look at just one state. I will plot the Measles disease rate per year for the state of Massachusetts. According to the CDC, the first Measles vaccine was introduced in 1963 by John Enders and colleagues. I will add a vertical line to the plot to show this year.
library(ggplot2)
theme_set(theme_bw())
dat %>% filter(state == "Massachusetts") %>%
ggplot(aes(x=year, y=measles_rate)) +
geom_line(na.rm = TRUE) +
geom_vline(xintercept=1963, color="firebrick1", lty=2) +
# annotate() draws the label once; geom_text(aes(...)) would redraw it for every row
annotate("text", x=1964.5, y=650, label="Vaccine introduced", angle=90, size=3) +
labs(title="Yearly Measles Rate per 100,000 residents in Massachusetts", y="Measles Rate per 100,000", x="Year")
As we can see above, it appears that after the introduction of the measles vaccine in 1963, the measles rate begins to drop drastically and stabilize around 0. However, this is just Massachusetts. Does the pattern hold for other states? Let’s use boxplots to get an idea of the distribution of rates for each year, and see if the pattern holds across states.
dat %>% ggplot(aes(factor(year), measles_rate)) +
geom_boxplot(na.rm = TRUE, fill="dodgerblue") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Box Plots of U.S. States' Measles Rates across the Years", y="Measles Rate", x="Year")
As we can see, the distribution of measles rates becomes less variable across the states over the years, and the rates in all states appear to decrease drastically after the introduction of the measles vaccine in 1963. However, one problem with the boxplot is that it does not let us see state-specific trends. Let’s make a plot showing the trends for all states, including the U.S. average.
# scale each state's count up to a full 52-week year
dat <- dat %>% mutate(count_reporting = (52/weeks_reporting) * count)
# national rate: total adjusted cases over total population, per 100,000
national_rate <- dat %>% group_by(year) %>%
summarize(rate = sum(count_reporting, na.rm = TRUE) / sum(population, na.rm = TRUE) * 100000)
dat %>% ggplot(aes(x=year, y=measles_rate)) +
geom_line(aes(group = state), na.rm = TRUE, alpha=0.1) +
geom_line(data = national_rate, aes(y=rate, color="U.S. Measles Rate"), na.rm = TRUE, size = 1) +
geom_vline(xintercept=1963,lty=2, color = "firebrick1") +
annotate("text", x=1964.5, y=2000, label="Vaccine introduced", angle=90, size=3) +
labs(title="Yearly Measles Rate per 100,000 residents", y="Measles Rate per 100,000", x="Year") +
scale_colour_manual("", values = c("dodgerblue"))
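The national rate above is computed as total cases over total population, which is not the same as averaging the state rates. The following sketch shows the difference on made-up numbers; the toy data frame and its values are purely illustrative and do not come from the Tycho data.

```r
# Toy numbers (invented for illustration, not from the Tycho data):
# one large state with a low rate, one small state with a high rate.
toy <- data.frame(
  state      = c("Large", "Small"),
  count      = c(1000, 500),
  population = c(10000000, 500000)
)
toy$rate <- toy$count / toy$population * 100000  # per-state rate per 100,000

# Pooled (population-weighted) rate: total cases over total population
pooled_rate <- sum(toy$count) / sum(toy$population) * 100000

# Unweighted mean of the state rates gives each state equal weight
mean_of_rates <- mean(toy$rate)

c(pooled = pooled_rate, unweighted = mean_of_rates)
```

The pooled figure stays close to the large state's rate, while the unweighted mean is pulled toward the small state. Pooling is the natural choice when the goal is a rate for the country as a whole.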
Again we can see that all the states, and the U.S. as a whole, appear to follow the same trend. However, one problem with the plot above is that we can’t distinguish states from each other. There are just too many. We have three variables to show: year, state and rate. If we use the two dimensions to show year and state, then we need something other than vertical or horizontal position to show the rates. Let’s try using color. Note that we will apply a square root transformation to the rates before mapping them to color, so that a few very high values do not wash out the rest of the heat map.
ggplot(data = dat, aes(x=year, y=state)) +
geom_tile(aes(fill=sqrt(measles_rate))) +
coord_equal() +
scale_x_continuous(expand=c(0,0)) +
scale_y_discrete(expand=c(0,0)) +
scale_fill_continuous("Square Root of Measles Rate", low = "white", high = "orangered", na.value = "gray95") +
labs(title="Heat Map of Yearly Square Root of Measles Rate across the U.S. States", x="Year", y="State")
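A rough illustration of why the square root helps: on a linear color scale, a handful of very large rates push everything else toward the pale end. The values below are made up to span a wide range and are not taken from the data.

```r
# Illustrative rates per 100,000 (invented values spanning a wide range)
rates <- c(0, 1, 25, 100, 400, 2500)

# Position of each value on a color scale from 0 (lightest) to 1 (darkest)
linear_scale <- rates / max(rates)
sqrt_scale   <- sqrt(rates) / sqrt(max(rates))

data.frame(rate   = rates,
           linear = round(linear_scale, 3),
           sqrt   = round(sqrt_scale, 3))
```

On the linear scale, a rate of 100 sits at only 4% of the color range, while on the square root scale it sits at 20%, so moderate rates remain visibly different from zero.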
All of the plots above provide strong evidence of the benefits of vaccines: as vaccines were introduced, disease rates were reduced. But did autism increase? We use yearly reported autism prevalence data to plot whether it has increased and, if so, whether the increase coincides with the introduction of vaccines.
# autism data taken from CDC (https://www.cdc.gov/ncbddd/autism/documents/ASDPrevalenceDataTable2016.pdf)
# prevalence is given as per 1,000 children
us_years <- c(1967, 1985, 1988, 1996, 1998, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014)
autism_prevalence <- c(0.19, 0.12, 0.4, 4.3, 4.0, 1.1, 6.7, 6.6, 8.0, 9.0, 11.3, 14.7, 14.6, 16.8)
autism_df <- data.frame(year = us_years, prevalence = autism_prevalence)
# prevalence is per 1,000 children, so multiply by 100 to get a rate per 100,000
ggplot(data=autism_df, aes(x=year, y=prevalence * 100)) +
geom_line(aes(color = "Autism Rate")) +
geom_line(data=national_rate, aes(x=year, y=rate, color="Measles Rate"), na.rm = TRUE, alpha = 0.5) +
# vaccine introductions
geom_vline(xintercept=1948,lty=2) +
annotate("text", x=1949, y=800, label="Mumps vaccine introduced", angle=90, size=3) +
geom_vline(xintercept=1963,lty=2) +
annotate("text", x=1964, y=800, label="Measles vaccine introduced", angle=90, size=3) +
geom_vline(xintercept=1969,lty=2) +
annotate("text", x=1970, y=800, label="Rubella vaccine introduced", angle=90, size=3) +
geom_vline(xintercept=1971.5,lty=2) +
annotate("text", x=1972.5, y=800, label="MMR vaccine introduced", angle=90, size=3) +
# DSM edition release dates
geom_vline(xintercept=1980,lty=2) +
annotate("text", x=1981, y=800, label="DSM-III", angle=90, size=3) +
geom_vline(xintercept=1994,lty=2) +
annotate("text", x=1995, y=800, label="DSM-IV", angle=90, size=3) +
geom_vline(xintercept=2013,lty=2) +
annotate("text", x=2014, y=800, label="DSM-V", angle=90, size=3) +
labs(title="Yearly Measles and Autism Rates in the U.S.", x="Year", y="Autism and Measles Rate per 100,000") +
scale_colour_manual("", values = c("dodgerblue", "firebrick1"))
In the above plot, we see the yearly measles rate per 100,000 people in the United States (1928-2003) compared to the yearly autism rate per 100,000 children in the United States (1967-2014). Additionally, we have several vertical lines. The first four lines mark when the Mumps, Measles, Rubella, and MMR vaccines were introduced, in 1948, 1963, 1969, and 1971 respectively. We see that autism rates did not increase with the introduction of the vaccines. In fact, autism rates were essentially stagnant between 1970 and 1990. That means that for almost 20 years after the introduction of the MMR vaccine, we did not see an increase in autism rates. However, we have also plotted the release dates of the editions of the “Diagnostic and Statistical Manual of Mental Disorders” (DSM), which is used to aid physicians in diagnosing psychiatric and developmental disorders (these DSM release dates were found here). We see that autism rates tend to increase after the introduction of each new manual. Therefore, it seems more likely that autism rates have increased because of better diagnostic tools for physicians, and not because of vaccination.
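To put both series on the same axis, the autism prevalence (reported per 1,000 children) has to be rescaled to the per 100,000 denominator used for measles, which is what the multiplication by 100 in the plotting code does. A minimal sketch of that conversion follows; rescale_rate is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: convert a rate from one denominator to another
rescale_rate <- function(rate, from_per, to_per) {
  rate * to_per / from_per
}

# Last value in the autism series above: 16.8 per 1,000 children (2014)
autism_2014 <- rescale_rate(16.8, from_per = 1000, to_per = 100000)
autism_2014

# Equivalent to the shortcut used in the plot
isTRUE(all.equal(autism_2014, 16.8 * 100))
```

Note that the two rates still have different populations in the denominator (children for autism, all residents for measles), so the shared axis is best read for timing and trend rather than for comparing magnitudes.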