DIGITAL BIOME

Computing the Biome

Janos Sztipanovits  |  Ethan Jackson

Overview

Individuals, industries, societies, and governments want to stay healthy. They need cost-effective systems to detect biological threats and predict future disease outbreaks as early as possible. COVID-19 acutely and painfully demonstrated the impacts of the unpredicted. The goals of this program, Computing the Biome, are twofold:

  1. demonstrate an extensible data and AI platform that continuously monitors and predicts biothreats in a major U.S. city
  2. create a framework for economic sustainability and global scalability of these results, by empowering businesses and advanced science missions to consume predictions and produce valuable consumer apps and breakthroughs

Description

Predicting biological threats is hard. Earth’s biome is home to hundreds of millions to possibly a billion species ranging from nanometer-sized viruses to kilometer-sized forests. These species are interconnected, co-evolving, and moving at breathtaking scales and speeds. As a result, biological threats such as emerging diseases, invasive species, and agricultural pathogens can appear unexpectedly and quickly harm our societies and ecosystems. They already cause hundreds of billions of dollars per year in economic damages.

Predicting these threats will require:

  1. continuous data streams not yet available today
  2. detailed models harnessing expertise from across the science domains
  3. modern AI platforms that use data and models to compute the biome in real time – just as we continuously compute weather forecasts using real-time data streams and models

Fortunately, revolutions in sensing technology, AI, and consumer demand are about to transform how we compute the biome and predict threats. First, this team will produce and interconnect novel data streams ranging from kilometer-scale hyper-local weather, to autonomously identified disease-transmitting insects (only millimeters in size), to genomically recognized known and novel viruses (only nanometers in size) – demonstrating that cross-cutting continuous data streams for biothreat detection and prediction can be rapidly unlocked.

Next, the team will combine their expertise in ecology, epidemiology, and virology to design new predictive models and anomaly detectors. Our team will develop the first of these high-impact AIs focused on predicting mosquito-borne diseases, which are difficult to control and impact over 600 million people per year. More broadly, the resulting data platform will empower development of new foundational methods for use by the AI community – based on real-world data and grounded in the societal challenges of our age.

Finally, economic sustainability will depend on a vibrant ecosystem where businesses and global missions can consume state-of-the-art models and produce applications and insights that people want to use. Even before COVID-19, the U.S. spent more than $1 billion per year on biothreat mitigation. We want to deliver solutions that benefit these critical efforts.

Differentiators

Our main premise is that only a modern sensor network – one that continuously monitors species at geographic scales across environments – will be capable of predicting complex biothreats early enough to manage risks. This perspective is based on the successes of existing sensor networks and AI models in monitoring and predicting other complex phenomena (e.g., weather systems, smart power grids, and transportation systems).

Today, outbreaks of human disease are usually detected through clinical case data, news reports, and other digital data. WHO’s GOARN system is a global aggregator of many of these data sources. It has successfully detected outbreaks early, but generally not early enough to stop their spread.

On the other hand, efforts like USAID’s PREDICT program preemptively sampled the environment to look for future novel threats, even sampling coronaviruses in bats in China prior to COVID-19. However, these programs rely on manual sampling. We believe new platforms and AI could make these search efforts more efficient and cost-effective.

Road Map

Our first user is Harris County, Texas – home to the city of Houston and 4.7 million people.

First six months (foundations):

  1. a unified data platform housing new biome data streams and tools for simulating biomes
  2. an equitable AI that uses simulations to design fair sensor networks – to be released as a global health planning tool
  3. an announcement and hackathon coinciding with WHO’s World Health Day

First year (protect against known):

  1. socially equitable deployment of a sensor network into urban areas with high risk of West Nile Virus (WNV)
  2. streaming of biome data into continuous predictions
  3. release of public health and clinical risk tools to protect communities

Year 1.5 (detect unknown):

  1. development of biome baselines for detecting anomalies such as insecticide resistance and invasive species
  2. release of upgraded tools that guide insecticide use to reduce dangerous resistance and maximize sustainability
  3. a recommendation to discuss these results at the World Economic Forum, where human health and environmental sustainability are likely to be major intertwined topics
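
As a rough illustration of what a biome baseline for anomaly detection could look like, the sketch below compares each new observation against a trailing window of earlier observations and flags large deviations. It is a minimal Python sketch under stated assumptions, not the team's method: the function name `flag_anomalies`, the eight-week window, the threshold of 3 standard deviations, and the weekly trap counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(counts, baseline_weeks=8, threshold=3.0):
    """Return indices of observations that deviate strongly from a trailing baseline.

    counts: sequence of numeric observations (e.g., weekly mosquito trap counts).
    baseline_weeks and threshold are illustrative defaults, not values from the program.
    """
    flagged = []
    for i in range(baseline_weeks, len(counts)):
        # Baseline = the observations immediately preceding the current one.
        window = counts[i - baseline_weeks:i]
        mu = mean(window)
        sigma = stdev(window)
        if sigma == 0:
            # Perfectly flat baseline: treat any change at all as a deviation.
            if counts[i] != mu:
                flagged.append(i)
            continue
        # Flag the observation if it lies more than `threshold` standard
        # deviations away from the baseline mean.
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up weekly trap counts with one obvious spike at index 10.
weekly_counts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 48, 14]
print(flag_anomalies(weekly_counts))
```

A real baseline would likely need to account for seasonality, weather, and sensor placement; a trailing mean and standard deviation is simply the smallest example of comparing new observations against an expected range.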

Year 2 (sustainability):

  1. real-time biome models going beyond WNV to other threat classes such as emerging human and agricultural pathogens
  2. AI-based biodiversity models
  3. creation of a non-profit to manage infrastructure and support business and science access

Partnerships

  1. Microsoft: sensor nodes, species recognizers, models, and industry leadership
  2. Tomorrow.io: hyperlocal climatic models for habitat suitability, including newly launched satellite-based weather radars
  3. Harris County Public Health: equitable deployment and management of systems over the 1,800 mi² of Harris County, Texas
  4. Vanderbilt University: open-source data platforms and application design studios for the wider community, and academic leadership
  5. Johns Hopkins University: AI-ready disease control policies and coordination with global health missions
  6. University of Pittsburgh: genomic data analytics for microbial threat detection and liaison with biotech stakeholders
  7. University of Washington: AI-enabled epidemiological models and forecasts built on top of the above capabilities

Intellectual Property

Open platforms will be utilized, and arrangements have been made for data and code releases under open data and code licenses. 

2021 NSF Convergence Accelerator Expo

In July 2021, the Computing the Biome team participated in the National Science Foundation Convergence Accelerator Expo. During the two-day event, the team demoed their solution for protecting communities from biological threats with real-time sensing and AI. A short promotional video and a nine-minute demo video were produced for the event and can be found below.