Over my blogging “career,” which now stretches back nearly nine years, and my hobby before that of engaging in online “debates” on Usenet newsgroups back before 2004, I developed an interest in the antivaccine movement. Antivaccinationism, “antivax,” or whatever you want to call it, represents a particularly insidious and dangerous form of quackery because it doesn’t just endanger the children whose parents don’t vaccinate them. It also endangers children who are vaccinated, because vaccines are not 100% effective. The best vaccines have effectiveness rates in the 90%-plus range, but that still leaves up to 10% of vaccinated children unprotected. Worse, because herd immunity generally requires approximately 90% or more of the population to be vaccinated against a vaccine-preventable disease to put the damper on outbreaks, it doesn’t take much of a degradation of vaccination rates to put a population in danger of outbreaks. That’s why, even though overall vaccine uptake is high in the US, we still see outbreaks: there are pockets of nonvaccinators and antivaccinationists who drive local vaccine uptake down to dangerous levels. We’ve seen this in California and elsewhere. Other countries have observed even more dramatic examples, the best known being the way that fear of the MMR vaccine, stoked by Andrew Wakefield’s bad science and the fear mongering of the British press, led MMR uptake to plummet. The result? Measles came roaring back in the UK and Europe, from having been considered under control in the 1990s to being endemic again by 2008.

As much as I get chastised by concern trolls for saying this, to antivaccinationists it really is all about the vaccines. Always. They blame autism, other neurodevelopmental conditions, and a wide variety of chronic diseases on vaccines, without evidence that there is even a correlation. They even falsely blame sudden infant death syndrome (SIDS) on vaccines, even though there is no evidence of an association and, indeed, existing evidence suggests that vaccines likely have a protective effect against SIDS more than anything else. No matter what happens, no matter what the evidence says, antivaccinationists will always find a way to blame bad things on vaccines, even going so far as to claim at times that shaken baby syndrome is a misdiagnosis for vaccine injury.

One thing that is often forgotten, however, is that they also do their utmost to downplay the beneficial effects of vaccines. One such tactic is for antivaccinationists to claim that the pertussis vaccine doesn’t work because we are seeing resurgences of pertussis even in the face of high vaccine uptake. Another common trope is what I like to refer to as the “vaccines didn’t save us” or the “vaccines don’t work” gambit, in which it is pointed out that the introduction of vaccines doesn’t correlate tightly with drops in mortality from various diseases. Julian Whitaker even used this gambit when he debated Steve Novella. The fundamental flaw in this trope is that it neglects the contribution of better medical care to the survival of more victims of disease, which decreased mortality. If you look at graphs of disease incidence rather than mortality, you will see a profound and powerful effect of the introduction of vaccines on specific vaccine-preventable diseases. In other words, vaccines work.

Over the Thanksgiving long weekend here in the US, there appeared a study that simply emphasizes once again that vaccines work. More importantly, it estimates how well they work. I’ve frequently said that vaccines are the medical intervention that has saved more lives than any other, and this study by investigators at the University of Pittsburgh’s Graduate School of Public Health, published on Thanksgiving Day in the New England Journal of Medicine (NEJM) and showing up in the news the day before, provides yet more evidence to support my assertion. In one way, it’s a shame that it was published over a long holiday weekend here in the US, when it was unlikely to garner as much attention as it might have at another time. On the other hand, it was Thanksgiving, and if there is anything we should be thankful for it’s that so few children die of vaccine-preventable diseases anymore. This study simply underlines this.

What the authors did was a massive undertaking that involved going back over case reports from before and after times when specific vaccines became commercially available. Boiled down to its essence, the study examined these reports and came up with estimates for cases of a disease prevented based on the drop in cases after the vaccine for that disease came into widespread use, and they did it all the way back to 1888. From the Methods section of the paper:

In an effort to overcome these limitations, we digitized all weekly surveillance reports of nationally notifiable diseases for U.S. cities and states published between 1888 and 2011. This data set, which we have made publicly available, consists of 87,950,807 reported individual cases, each localized in space and time. We used these data to derive a quantitative history of disease reduction in the United States over the past century, focusing particularly on the effect of vaccination programs.

We obtained all tables containing weekly surveillance data on nationally notifiable diseases that were published between 1888 and 2011 in the Morbidity and Mortality Weekly Report and its precursor journals from various online and hard-copy sources.21-24 We digitized all data available in tabular format that listed etiologically defined cases or deaths according to week for locations in the United States. Reported counts (weekly tallies) of cases or deaths and the reporting locations, periods, and diseases were extracted from these data and standardized. Then we selected eight vaccine-preventable contagious diseases for more detailed analysis and computed weekly incidence rates, deriving a quantitative history of each disease.

We estimated the number of cases that have been prevented since the introduction of vaccines for seven of the eight diseases. (Since there were no data from the era before the introduction of the smallpox vaccine in 1800, we could not quantify the number of smallpox cases that were prevented by the vaccine.) We estimated the numbers of cases of polio, measles, rubella, mumps, hepatitis A, diphtheria, and pertussis that were prevented by vaccines by subtracting the reported number of weekly cases after the introduction of vaccines from a simulated counterfactual number of cases that would have occurred in the absence of vaccination, assuming that there were no other changes that would have affected incidence rates. We used the year of vaccine licensure as the cutoff year to separate the prevaccine period from the vaccination period. Counterfactual numbers were estimated by multiplying the median weekly incidence rate from prevaccine years with population estimates for vaccination years.

Yes, you read that right: nearly 88 million reported individual cases. The New York Times news report on the study points out that this massive digitization of data was performed by Digital Divide Data, described as “a social enterprise that provides jobs and technology training to young people in Cambodia, Laos and Kenya.” However, getting the data digitized and organized into spreadsheets was only the first step. Massive databases and spreadsheets are not particularly useful if they aren’t in a form that can be queried to answer research questions. The data thus had to be standardized and sorted in order to allow for that. Once that was done, the investigators were able to estimate how many cases have been prevented since 1924:

Assuming that the difference between incidence rates before and after vaccine licensure for these diseases was attributable solely to vaccination programs, we estimated that a total of 103.1 million cases of these contagious diseases have been prevented since 1924 on the basis of median weekly prevaccine incidence rates. Estimates based on the 10th and 90th percentile of weekly prevaccine incidence rates were 72.3 million and 147.8 million cases, respectively. Of those hypothetical cases, approximately 26 million were prevented in the past decade. Sensitivity analyses that used different methods for imputing missing data and for simulating counterfactual cases resulted in estimates ranging from about 75 million to 106 million prevented cases. The number of cases that were prevented per disease depended on the incidence rate before vaccination and the duration of the vaccination program.
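To make the arithmetic concrete, here is a minimal sketch of the calculation as the Methods passage describes it: take a summary of the prevaccine weekly incidence rate, multiply it by the population in each vaccine-era week to get a counterfactual case count, and subtract the cases actually reported. This is not the authors’ code. The function names, the per-100,000 unit, the use of Python’s "inclusive" quantile method for the 10th and 90th percentiles, and the toy numbers are all my own assumptions for illustration.

```python
from statistics import median, quantiles

# Assumed unit: incidence rates are cases per 100,000 people per week.
PER = 100_000


def prevented_cases(baseline_rate, vaccine_era_weeks):
    """Sum (counterfactual - reported) cases over the vaccination period.

    baseline_rate: a single summary of the prevaccine weekly incidence rate
    vaccine_era_weeks: iterable of (population, reported_cases), one per week
    """
    total = 0.0
    for population, reported in vaccine_era_weeks:
        # Cases expected that week if the prevaccine rate had persisted
        counterfactual = baseline_rate * population / PER
        total += counterfactual - reported
    return total


def estimate_with_range(prevaccine_rates, vaccine_era_weeks):
    """Estimates based on the 10th percentile, median, and 90th percentile."""
    # n=10 yields nine cut points; the first and last are the 10th and 90th
    # percentiles. The quantile method is an assumption, not from the paper.
    deciles = quantiles(prevaccine_rates, n=10, method="inclusive")
    return {
        "p10": prevented_cases(deciles[0], vaccine_era_weeks),
        "median": prevented_cases(median(prevaccine_rates), vaccine_era_weeks),
        "p90": prevented_cases(deciles[-1], vaccine_era_weeks),
    }


# Made-up numbers, purely for illustration (not taken from Project Tycho)
toy_prevaccine_rates = [2.0, 4.0, 6.0, 8.0, 10.0]
toy_vaccine_era_weeks = [(1_000_000, 10), (1_000_000, 5)]

result = estimate_with_range(toy_prevaccine_rates, toy_vaccine_era_weeks)
print(result["median"])  # 105.0
```

With a median prevaccine rate of 6 per 100,000 and a population of one million, the counterfactual is 60 cases per week, so the two toy weeks yield (60 - 10) + (60 - 5) = 105 prevented cases. Note that this simple sum lets a week with more reported cases than the counterfactual count against the total; how the authors handled such weeks is not something the quoted passages say.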

If you delve into the paper, you’ll find a really cool interactive graphic about disease elimination in the US, specifically hepatitis A, measles, mumps, pertussis, polio, rubella, and smallpox. As you move your cursor to different points of the graph, different facts and statistics pop up. You can look at state-level data. If you click on one of the lines indicating when a specific vaccine was first licensed, all the lines representing the other diseases disappear, and you see the data only for that disease. For instance, if you look at when the measles vaccine was first licensed in 1963, you’ll see a brief blip upward in measles incidence, well within the range of random variation, followed by a drop to almost zero by 1968, a mere five years after the vaccine was licensed. The pertussis vaccine took a bit longer; after it was licensed in 1948, it took around eight years before the disease incidence hit bottom. Particularly cool is a set of graphs in Figure 2 that show snapshots of disease elimination in the US for different diseases, with the entire country divided up into ten different areas. It’s a striking and effective way of demonstrating the effect of vaccines on infectious disease.


The investigators were very conservative about their assumptions, as well. The authors point out in the discussion that their estimate of number of cases of diseases prevented is probably an underestimate. The reasons include an inability to include all vaccine-preventable diseases and to correct for underreporting of cases. They note that the underreporting rate was higher in the era before specific vaccines came into use and that they don’t always have the detailed historical demographic data, such as birth rates and age-specific disease incidence rates, that would enable them to make such adjustments. Unfortunately, such data are only available for a small number of locations and for limited periods of time.

One weakness of the study is that the authors could not examine death rates in nearly as much detail as they could study incidence. They could only estimate the effect of various vaccines on death rates. They did not report death rates in the NEJM article because, according to the NYT article, death certificate data became sufficiently reliable and consistent only in the 1960s. They could, however, make a reasonable estimate of three or four million deaths prevented based on the known mortality rates of the diseases studied in the database.

The real accomplishment of this project is not so much the first publication, but rather the open-source Project Tycho™ database, named after Danish scientist Tycho Brahe (1546-1601), who was known for his detailed astronomical and planetary observations. The reason for choosing Tycho Brahe becomes obvious if you know that Tycho could not use all of his data during his lifetime. However, his assistant Johannes Kepler (1571-1630) used his data to derive the laws of planetary motion. As the authors put it:

Similarly, this project aims to advance the availability of large scale public health data to the worldwide community to accelerate advancements in scientific discovery and technological progress.


The database contains three levels of data. Level 1 data were the basis of the NEJM article, and “include different types of counts that have been standardized into a common format for a specific analysis published recently in the NEJM.” Level 2 data are defined thusly:

Level 2 data only includes counts that have been reported in a common format, e.g. diseases reported for a one week period and without disease subcategories. These data can be used immediately for analysis, includes a wide range of diseases and locations but this level does not include data that have not been standardized yet.

Level 3 data, in turn, are defined as follows:

Level 3 data include all the different types of counts ever reported. Although this is the most complete data, the large number of different counts requires extensive standardization and various judgment calls before they can be used for analysis.

All of these data are broken down into diseases, states, and cities, as well as time periods. Level 1 data include eight diseases, 50 states and 122 cities from 1916-2009; Level 2, 47 diseases, 50 states, 1,287 cities from 1888-2013; and Level 3, 56 diseases, 72 disease subcategories, 3,000 cities, etc. from 1888-2013. Any investigator can establish an account to look at Level 1 and Level 2 data, although the University of Pittsburgh won’t give out Level 3 data to anyone, because the database contains “substantial number of counts for which the disease name, time period, or location has not yet been identified from contextual information.” To get an idea of the power of this database, it’s useful to take a look at a couple of short videos:


As you can see, this is a fantastic resource that is likely only to get better with time as raw data are curated, organized, and put into a form that can be mined for correlations. Epidemiologists, vaccinologists, and infectious disease researchers will be able to use this resource to ask questions and look at historical comparisons in a way that they haven’t been able to do before because of the difficulty in reconstructing old disease patterns. No wonder the Bill and Melinda Gates Foundation funded this work!

There is one concern I have about the project, although it does not in any way outweigh the potential usefulness of this database. That concern derives from what I know of bad science generated by antivaccinationists. I can easily see antivaccine “scientists” mining this database to look for correlations that support their agenda, particularly if they get their hands on the raw data, which, according to the authors, need a lot of cleaning:

These data have not been filtered or standardized and cannot be used for analysis. These data include a large variety of data counts and often varying types of information. In this level, multiple types of data counts are often available for one location, disease, and week. In some cases, different counts provide conflicting information on a location and disease. The use of data from this level requires extensive knowledge of the historical U.S. disease surveillance system and data digitization and quality control procedures. We continue to standardize data and will include newly standardized data in the level 2 data section of this website at regular intervals. These level 3 data are provided for those that are interested in contributing to the data standardization process.

Can you imagine what Jake Crosby might do with such a data set? Or Mark and David Geier? Just take what they’ve tried to do with the VAERS database and the Vaccine Safety Datalink and put it on steroids. I rather expect that various antivaccine “scientists” have already registered accounts for Project Tycho™ and are furiously mining ever smaller slices of data trying to see if they can “prove” that vaccines don’t work or linking their work with other databases to try to correlate vaccine uptake with autism.

Still, any database can be abused, as can any scientific tool. If the database is truly open source, then its creators are obligated to provide access to everyone who requests it. The benefits of such a resource far outweigh the risk that Jake Crosby, Mark Geier, Gary Goldman, or other epidemiologist wannabes might use it to produce nonsense. Besides, the correlations between the introduction of various vaccines and plunges in the incidence of the diseases being vaccinated against are so robust that I doubt the antivaccinationists can do any serious harm, other than producing studies with which to preach to the choir. Meanwhile, real scientists will be using the database to do real science and ask important questions about infectious disease and how it can be prevented with vaccines.



Posted by David Gorski

David H. Gorski, MD, PhD, FACS is a surgical oncologist at the Barbara Ann Karmanos Cancer Institute specializing in breast cancer surgery, where he also serves as the American College of Surgeons Committee on Cancer Liaison Physician as well as an Associate Professor of Surgery and member of the faculty of the Graduate Program in Cancer Biology at Wayne State University.