Big data can be a tool, a weapon or a currency. Now, amid the COVID-19 pandemic, big data has become a life-saving ally for the health care community. This moment in history is unlike any other — and the value of data in ending it resembles nothing we’ve yet seen.
The following is just a sample of the many ways big data and machine learning are helping to flatten the coronavirus curve, study the disease, mount a sensible public response and reopen our communities, economies and countries. Applying big data to the coronavirus may prove just how valuable artificial intelligence and other major technologies can be. Here are a few ways this is possible:
1. Predicting Community and Individual Risk of Infection
Predicting an individual’s risk of infection — and better understanding the risk factors involved — is vital in a situation like this. Whether to help us mount a defense, adjust our personal habits or anticipate where the disease will take the heaviest toll, each of the following risk factors plays a role in analytical models:
- Age, location and socioeconomic status.
- Social and hygiene habits.
- Pre-existing medical conditions.
- Number and frequency of interactions with others.
- Weather and climate.
Researchers in the Netherlands are hard at work studying data related to these risk factors so the medical community has better models for predicting the spread of COVID-19 and other diseases.
Another project by Dave DeCaprio, Joseph Gartner and Carol J. McCall, et al., used similar methodologies to create a “Vulnerability Index.” In their words, “by providing this tool quickly to the health care data science community, widespread adoption will lead to more effective intervention strategies and … help to curtail the worst effects of this pandemic.”
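To make the idea of a risk model concrete, here is a minimal sketch of how risk factors like those listed above could feed a logistic scoring function. It is not the Vulnerability Index or any published model: the feature names, weights and intercept are hypothetical values chosen for illustration, and a real model would learn them from patient data.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
# A real model would estimate these from historical patient data.
WEIGHTS = {
    "age_over_65": 1.2,
    "pre_existing_condition": 0.9,
    "daily_contacts": 0.05,  # contribution per daily contact
    "poor_hand_hygiene": 0.4,
}
INTERCEPT = -3.0


def infection_risk(features):
    """Combine risk factors into a score between 0 and 1 (logistic function)."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


low = infection_risk({"age_over_65": 0, "pre_existing_condition": 0,
                      "daily_contacts": 2, "poor_hand_hygiene": 0})
high = infection_risk({"age_over_65": 1, "pre_existing_condition": 1,
                       "daily_contacts": 20, "poor_hand_hygiene": 1})
print(f"low-risk profile:  {low:.2f}")
print(f"high-risk profile: {high:.2f}")
```

The point of the sketch is only that each factor nudges the score up or down, so the second profile scores higher than the first.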
2. Predicting Patient Treatment Outcomes
The health care data science community has long seen big data as a vital tool in predicting treatment outcomes and understanding the viability of different treatments. This strategy is a form of high-tech triage, where doctors must weigh the potential risks of a course of treatment against its likelihood of success.
Amid the COVID-19 pandemic, doctors, especially in the United States’ underdeveloped health care system, have had to make gut-wrenching decisions about who gets a ventilator and who does not, among other matters. Successfully “predicting life and death” has taken on a new and urgent importance.
Each of the factors named above, like age and pre-existing conditions, plays a role in these analytical models. Professor Dimitris Bertsimas of the MIT Sloan School of Management — plus almost two dozen other researchers — used these data classes to develop what they call the COVIDanalytics platform.
The platform uses machine learning to predict patient outcomes based on each patient’s characteristics and disease progression. The software then recommends a course of treatment, such as hospitalizing the patient or moving them to intensive care.
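As a rough illustration of the last step, turning a model's prediction into a recommendation, the sketch below maps a risk estimate to a coarse care level. This is not how the COVIDanalytics platform works internally; the `Patient` fields, the thresholds and the care levels are all assumptions made for the example, and none of them are clinical guidance.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    predicted_mortality_risk: float  # output of some upstream model, in [0, 1]
    oxygen_saturation: float         # percent


def recommend(patient, icu_risk=0.5, admit_risk=0.2, low_spo2=92.0):
    """Map a risk estimate to a coarse care level (hypothetical thresholds)."""
    if patient.predicted_mortality_risk >= icu_risk:
        return "intensive care"
    if (patient.predicted_mortality_risk >= admit_risk
            or patient.oxygen_saturation < low_spo2):
        return "hospitalize"
    return "monitor at home"


patients = [
    Patient("p1", 0.65, 90.0),
    Patient("p2", 0.10, 89.0),
    Patient("p3", 0.05, 98.0),
]
for p in patients:
    print(p.patient_id, "->", recommend(p))
```

Keeping the thresholds as parameters reflects that any real cutoffs would be set and reviewed by clinicians, not hard-coded.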
3. Monitoring Patients as They Enter Hospitals
The health care community is exploring many options for using big data and machine learning to screen patients, provide intervention quickly and allocate resources. Tampa General Hospital was one of the first facilities to deploy face-scan and AI technology to respond to and classify incoming patients.
With just a quick scan, the system determines who has a fever and who does not — and signals for further examination when the patient shows potential signs of infection. In remote or understaffed health care settings, this is vitally important for making the best use of limited staff and resources.
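The flagging step can be pictured with a short sketch. It assumes the scanner produces several temperature readings per person in degrees Celsius and uses 38.0 °C (100.4 °F), a commonly used fever cutoff, as the threshold. The function name and the sample readings are made up for illustration and do not describe any hospital's actual system.

```python
from statistics import median

FEVER_THRESHOLD_C = 38.0  # 100.4 °F, a commonly used fever cutoff


def needs_further_exam(readings_c, threshold=FEVER_THRESHOLD_C):
    """Flag a person when the median of their scan readings meets the threshold.

    Using the median means a single spurious reading does not trigger a flag.
    """
    return median(readings_c) >= threshold


# Invented sample readings, three scans per arrival.
arrivals = {
    "visitor-1": [36.6, 36.8, 36.7],
    "visitor-2": [38.2, 38.4, 38.1],
    "visitor-3": [37.1, 39.5, 37.0],  # one outlier reading
}
flagged = [name for name, readings in arrivals.items()
           if needs_further_exam(readings)]
print(flagged)
```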
The ability to keep tabs on the number and condition of patients as they enter health care facilities is crucial no matter the circumstances. In a pandemic, knowing at a glance how many patients require further screening, how many hospital beds are available and which hospital is best equipped to deal with incoming casualties or illness can save lives.
These situations are why unified, shared dashboards for health systems — powered by big data — have become so vital. Keeping first responders, hospital staff and dispatch professionals informed with accurate, real-time data about the availability of medical staff and resources is essential for allocating limited supplies effectively. It is also crucial for sending patients to facilities that can see them promptly and responding as quickly as possible to the most severe cases.
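A shared dashboard ultimately supports decisions like "where should this patient go?" The sketch below shows one simple version of that decision: pick the closest hospital that still has a suitable bed. The hospital records, field names and selection rule are hypothetical, and a real dispatch system would weigh many more factors.

```python
# Invented snapshot of what a shared dashboard might report.
hospitals = [
    {"name": "North", "free_beds": 3, "free_icu_beds": 0, "travel_minutes": 5},
    {"name": "Central", "free_beds": 12, "free_icu_beds": 1, "travel_minutes": 14},
    {"name": "South", "free_beds": 4, "free_icu_beds": 3, "travel_minutes": 22},
]


def pick_hospital(hospitals, needs_icu):
    """Return the name of the nearest hospital with a suitable free bed, or None."""
    bed_field = "free_icu_beds" if needs_icu else "free_beds"
    candidates = [h for h in hospitals if h[bed_field] > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h["travel_minutes"])["name"]


print(pick_hospital(hospitals, needs_icu=False))
print(pick_hospital(hospitals, needs_icu=True))
```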
4. Identifying Promising Drug Candidates
When it comes to drug discovery, computers are far better than human researchers at exploring hundreds, thousands or millions of potential chemical combinations.
In the search for effective medications and vaccines, which now includes the frenzied search for a viable COVID-19 vaccine, scientists rely on machine learning to predict how a virus’s proteins will interact with existing or novel drugs. This process is known as drug-target interaction (DTI) prediction.
To do this, scientists train neural networks using vast databases of existing DTI data. This method results in a list of drugs or drug combinations that have the highest potential to bind to and inhibit the actions of virus proteins.
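The ranking step at the end of that process can be sketched in a few lines. The example below does not train a neural network. It assumes a trained model has already produced numeric embeddings for each drug and for the target protein, and uses a dot product passed through a sigmoid as a stand-in scoring function. The drug names and vectors are invented placeholders.

```python
import math

# Hypothetical learned embeddings; real ones would come from a trained model.
DRUG_EMBEDDINGS = {
    "drug_a": [0.9, 0.1, 0.3],
    "drug_b": [-0.2, 0.8, 0.1],
    "drug_c": [0.4, 0.4, -0.6],
}
TARGET_EMBEDDING = [1.0, 0.2, 0.5]  # stands in for a viral protein


def interaction_score(drug_vec, target_vec):
    """Stand-in scorer: sigmoid of the dot product of the two embeddings."""
    dot = sum(d * t for d, t in zip(drug_vec, target_vec))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_candidates(drugs, target):
    """Return (name, score) pairs sorted from most to least promising."""
    scored = [(name, interaction_score(vec, target)) for name, vec in drugs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(DRUG_EMBEDDINGS, TARGET_EMBEDDING)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The output of a real pipeline has the same shape: a short list of candidates, ordered by predicted interaction strength, for laboratory follow-up.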
In March 2020, a team of researchers from Tsinghua University, the Jiangsu Provincial Center for Disease Control and the Shanghai Institute of Materia Medica announced they had found a promising drug candidate for COVID-19.
Their methodology involved a neural network trained on “knowledge graphs” that had previously been successful in finding Baricitinib, one of the earliest examples of a potential treatment candidate for COVID-19.
Scientists build these knowledge graphs using natural language processing and machine learning. The algorithm combs vast medical archives to find connections between, for example, proteins and drugs — or other associations that are meaningful to the medical community.
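A toy version of such a graph helps show what "finding connections" means. The sketch below assumes the language-processing step has already extracted (subject, relation, object) triples, stores them as an undirected graph, and then uses breadth-first search to find which drugs are linked to a virus through a chain of associations. All entity names and triples are invented placeholders.

```python
from collections import defaultdict, deque

# Hypothetical triples, as if extracted from medical literature.
TRIPLES = [
    ("drug_x", "inhibits", "protein_p"),
    ("protein_p", "regulates", "process_q"),
    ("virus_v", "depends_on", "process_q"),
    ("drug_y", "inhibits", "protein_r"),
]


def build_graph(triples):
    """Store each triple as an undirected edge between subject and object."""
    graph = defaultdict(set)
    for subject, _relation, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def hops_from(graph, start):
    """Breadth-first search: number of hops from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


graph = build_graph(TRIPLES)
distances = hops_from(graph, "virus_v")
linked_drugs = sorted(n for n in distances if n.startswith("drug_"))
print(linked_drugs, distances.get("drug_x"))
```

Here `drug_x` is reachable from the virus through a protein and a biological process, while `drug_y` is not connected at all, which is the kind of distinction a knowledge graph makes searchable.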
5. Estimating Real-Time Spread and Forecasting Future Spread
The last example of scientists leveraging big data to flatten the coronavirus curve is potentially the most controversial.
For anybody reading between the headlines, it’s clear the mechanisms under development or already being used to study the pandemic may outlive the crisis and become part of a worldwide surveillance apparatus. The United States’ $2 trillion Coronavirus Relief Bill involves spending $500 million to implement a “public health surveillance and data collection system.”
Jake Laperruque, representing the D.C.-based nonprofit Project on Government Oversight, said he “could definitely see it being used to build out infrastructure for things like location tracking, cell phone tracking tools [or] social media monitoring tools.”
Governments have traditionally worked with health systems to monitor the number of cases as well as the location and timing of infections. Unfortunately, this is a fantastically expensive and labor-intensive process.
This point is where social media enters the picture. Expanding our collection of knowledge in real time is a crucial development in a situation as fast-moving as the COVID-19 pandemic, and multiple research papers have already demonstrated how useful machine learning can be for digesting real-time social media posts and status updates. This technology can also estimate how quickly, and where, a communicable disease might spread next.
Users of Facebook, Twitter and other websites may use language that gives insights into their current or future health. Machine learning equipped with natural language processing can study massive numbers of public messages remarkably quickly. The software can then help researchers come to conclusions about disease spread and ultimately enable public officials to make responsible decisions in the moment.
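In its crudest form, this kind of analysis amounts to counting symptom mentions by region. The sketch below uses simple keyword matching as a stand-in for real natural language processing, which would also need to handle negation, sarcasm and word variants. The keyword list, region labels and posts are invented for illustration.

```python
import re
from collections import Counter

SYMPTOM_TERMS = {"fever", "cough", "chills"}  # hypothetical keyword list

# Invented sample posts as (region, text) pairs.
posts = [
    ("region_a", "Woke up with a fever and a dry cough."),
    ("region_a", "Can't stop coughing, and now chills too"),
    ("region_b", "Great hike today, feeling fine!"),
    ("region_b", "Fever since last night, staying home."),
]


def mentions_symptom(text):
    """True if any whole word in the text matches a symptom keyword."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & SYMPTOM_TERMS)


counts = Counter(region for region, text in posts if mentions_symptom(text))
for region, count in counts.most_common():
    print(region, count)
```

Tracked over time, a rising count in one region is the sort of early signal researchers would then check against confirmed case data.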
COVID-19 and Big Data — And the Future of Pandemic Response
Each of these big data techniques likely has a role to play in flattening the infection curve and helping us mount a practical, effective and humanistic response to the pandemic.
In the right hands, these tools will ensure the world is better prepared should a similar crisis arise in the future.