National Security Task Force May 2021 Report*
The first public alerts about a novel coronavirus circulating in China in late 2019 came from open source networks established to conduct surveillance and provide warning of outbreaks of infectious diseases. One was ProMED (Program for Monitoring Emerging Diseases), a worldwide volunteer network of mostly health professionals who collect information from social media chatter, health department announcements, and local media outlets and feed it to a small staff of part-time health professionals, who curate information, circulate reports by email to subscribers worldwide, and post them in near-real time on their web site. The COVID-19 alert was the latest in a successful record for ProMED, which also provided early warning of the SARS, MERS, Ebola, and Zika outbreaks.
On December 30, 2019, a member of the ProMED network in Taiwan who was monitoring Chinese social media sent an email to a ProMED editor in New York calling attention to concerns expressed by medical authorities in Wuhan about an unexplained pneumonia. The ProMED editor looked for further information to validate this message and found corroborating evidence from a known Chinese financial news reporter confirming that Wuhan health authorities were responding to an unexplained outbreak. She then sent an email to the 80,000 ProMED subscribers alerting the world to a pneumonia of unknown cause in Wuhan and seeking further information.
On the same day, an automated system in Boston crawling the internet for information on outbreaks of infectious diseases also found evidence of an outbreak while monitoring Chinese-language local news and social media in Wuhan. HealthMap, a web site run by a team of researchers, epidemiologists, and software developers at Boston Children’s Hospital, uses machine learning to monitor a wide variety of open data sources, including news reports, social media, government reports, internet search queries, and other information streams worldwide for signs of outbreaks of infectious diseases. The automated natural language processing system monitors the internet for information related to infectious diseases in many languages, and prior to COVID-19 had a proven record of capturing, recognizing, and publicly reporting early signals of H1N1 influenza, MERS, and Ebola, well in advance of government alerts.
The HealthMap software continuously collects information from hundreds of thousands of sources across the internet, analyses their content, and draws conclusions about the nature and location of infectious diseases worldwide. Analysts provide feedback, leading to continuous improvement of the machine learning algorithm. The resulting information on emerging diseases and their effects on human and animal health is disseminated online in nine languages and on easily accessible maps for use by governments, health departments, and individuals (including international travelers).
On December 30, 2019, HealthMap’s automated system generated a public alert about unidentified pneumonia cases in Wuhan. In the initial report the system ranked the seriousness of the alert as a 3 on a disease severity scale of 1 to 5, in which serious diseases are 4’s and 5’s. Full recognition of the potential magnitude of the outbreak came later.
On the following day, BlueDot, a Canadian startup that uses a machine learning algorithm to collect and assess data from foreign-language press reports, animal and plant disease networks, public health reports, and airline ticketing records, warned of an outbreak in Wuhan and, using the information on air travel, correctly predicted where it would spread next.
This brief cannot comment on the relative performance or timing of classified Western intelligence services in identifying the threat from an emergent novel coronavirus, nor do we know the extent to which intelligence services integrate open source information into their assessments. But it’s clear that the three open source public alerts on December 30-31, 2019 were all issued a week before the first COVID-19 report from the CDC and 10 days before the first from the WHO.
Human networks versus automated machine learning systems
A question naturally arises about the performance of conventional human-moderated early warning networks versus more nascent automated machine learning systems. Reports from the creators of both approaches suggest that the ProMED email network of health professionals and the automated systems that extract information from the internet and social media in fact complement each other. The human editors at ProMED’s network also have access to the reporting of HealthMap, BlueDot, and other automated systems and use that information to help interpret the email messages they receive and shape their reports.
While HealthMap is based on a customized automated natural language processing system, humans play key roles in its operation, including analysts who provide feedback by correcting errors and members of the public who provide information as “citizen epidemiologists.” HealthMap and ProMED were, furthermore, founding partners in the establishment of EpiCore, a new and more formal worldwide network of health professionals who can be asked to verify evidence of disease outbreaks found by automated systems.
Professional networks and automated systems can work together to accurately identify the signals of a major new outbreak in the noise of everyday activities, and help to avoid the twin problems of too many false alarms or too few early alerts. The information collected by automated systems can help human professionals make judgements on when to sound an alarm about a major potential outbreak.
The open source networks operate on small budgets, with funding and support provided by foundations, corporations, and government agencies in the United States, Canada, and Europe. Some are informal and rely on volunteers and part-time professionals, and on relationships of trust between professionals built up over time. These open source networks are meanwhile evolving with a) new platforms for automated systems and b) more networked health professionals on the ground where outbreaks may occur. They have a proven record of providing early public alerts of new infectious diseases, including for COVID-19, and we can expect that ongoing expansion in technology and geographic coverage will further enhance this capability. Such tools will become even more important in a world where novel diseases cross over into increasingly mobile human populations at higher rates. We can expect, with some confidence, that future early warning alerts will come from such open source networks, providing public health authorities opportunities to take early action to mitigate outbreaks before they become pandemics.
The collection and analysis of open source publicly available information is growing rapidly, driven by the explosive increase in information accessible on the internet (worldwide media, social media, government data, publications, and commercial data) and by advances in data science to mine vast quantities of data to extract useful inferences. Open source assessments are now widely used for national security, law enforcement, and business purposes as well as public health. From a national security perspective, open source information can augment intelligence information collected in other ways, and offer new tools in areas such as monitoring proliferation activities and verification of agreements. Subjects for further research could include assessments of the accuracy, reliability, and credibility of reporting based on open sources, and the potential for generating public alerts in areas other than public health.
*This paper is the result of a collective effort under the auspices of the Hoover Institution’s National Security Task Force. David Fedor and Admiral James Ellis participated in the interviews and the drafting of this report, which also benefited from reviews and comments of other participants in the Task Force.