COVID-19 Datasets for Machine Learning

Curated by Sasha Luccioni (Mila)

For ideas and inspiration, check out our recent white paper on AI and the COVID-19 pandemic.

Name | ML Approaches/Applications
nCov2019 location dataset | Epidemiology
COVID-19 Open Research Dataset Challenge (Kaggle) | NLP/IR for finding relevant passages
COVID-19 Open Research Dataset (CORD-19) | Research articles
European Centre for Disease Prevention and Control | Daily Global Statistics Dashboard. Daily situation report summaries and data tables
COVID-19 image data collection | Diagnosis from medical images
CHIME: COVID-19 Hospital Impact Model for Epidemics | Hospital case management + prediction
Genomic epidemiology of hCov-19 | Genomics
COVID-19: The First Public Coronavirus Twitter Dataset | NLP + IR for social media analysis
Protein Data Bank: Covid-19 Coronavirus Resources | Protein modeling
Novel Coronavirus 2019 Dataset | Epidemiology
WHO Database of publications on coronavirus disease (COVID-19) | NLP/IR
Dimensions COVID-19 publications, data sets, clinical trials | Research + validation
Italian Covid-19 Database | Various
Realtime tracking of genetic evolution (tree) of covid-19 across the world | Modeling + Epidemiology
COVID-19 Korea Dataset & Comprehensive Medical Dataset & visualizer | Epidemiology
Sequences of outbreak isolates and records relating to coronavirus biology. | Genomics
COVID-19 BSTI Imaging Database | Diagnosis from medical images
Covid-19 Twitter chatter dataset for scientific use | Twitter NLP source data and preprocessing data
Dataset of Infections in Germany | Includes age and sex information.
API for Coronavirus Data | Dataset

Last updated: March 25th 2020