University of Illinois
Last Reviewed: 12/19/2019
The University of Illinois and Mayo Clinic are leveraging the power of data analytics, artificial intelligence, machine learning, and high-performance computation to advance healthcare discovery.
The Center for Computational Biotechnology and Genomic Medicine (CCBGM) brings together the engineering and genomic biology strengths of the University of Illinois at Urbana-Champaign (UIUC) with the Mayo Clinic’s world-renowned expertise in individualized medicine and clinical research and practice. Together, the partners work on computationally challenging projects that use artificial intelligence, machine learning, and system innovations to advance healthcare discovery.
The goal of the CCBGM is to leverage the power of data analytics, artificial intelligence, machine learning, and high-performance computation to advance healthcare discovery. Accomplishing this requires computational tools for big data analytics and machine learning models for genomic and genetic data analysis, patient-specific data, imaging, compression, encryption, and data transfer. The center will address these needs using biological modeling, algorithm design, interface development, and iterative optimization in direct collaboration with cutting-edge biology researchers analyzing real data.
Companies interested in our center include computer system vendors and developers, pharmaceutical and biotechnology companies, hospitals, and the insurance industry, as well as others focused on pre-competitive technology, tools, and methods.
The CCBGM offers:
The CCBGM’s mission is:
To contribute to the nation’s research infrastructure base by developing long-term partnerships among a diverse group of industry, academic, and government partners.
To leverage NSF funds for the management of CCBGM coordination with industry in support of faculty researchers and graduate students performing industrially relevant research.
To expand the innovation capacity of our nation’s competitive workforce in genomics-focused big data analytics through partnerships between industry and the CCBGM.
CCBGM Research Thrusts
The set of projects in the three thematic components leverages the multidisciplinary capabilities of the CCBGM team and focuses on clinical knowledge in human patients. However, the methods, tools, and algorithms developed as part of these efforts (e.g., microbiome, compression, imaging, genomic security, and acceleration) apply in the broader context of analyzing the sequence data of crops, animals, and other organisms.
The computing and data management component will focus on innovations in storage and compression technologies for genomic data. Such methods are required to process and understand large-scale bioinformatics problems, including epistatic interactions from genome-wide association studies (GWAS) addressed at scale. Traditionally, studies have focused on individual variants, their expression changes, and associated observable phenotypes. However, with the current availability of computing resources and knowledge extracted from genomics big data, we propose to study epistatic interactions, which are the effects of two or more variants on an observed phenotype. Further, growth in genomics data poses a problem in storage and retrieval, and these challenges are still not well addressed. One need from the scientific community analyzing the data is to avoid losing resolution in the data when the compressed data are decompressed. To that end, we will develop compression algorithms and theory for efficient compression techniques such that the users of genomics data and their analyses will not be affected, as though the data never went through a compression process.
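As a minimal sketch of the lossless requirement described above, the following Python example compresses a nucleotide sequence and checks that decompression returns exactly the original bytes. The general-purpose zlib compressor is only a stand-in used for illustration, not the center's algorithm, and the sequence is a made-up placeholder rather than real data.

```python
import zlib

# Hypothetical placeholder sequence; real inputs would be sequencing files.
sequence = ("ACGTTGCAAC" * 1000).encode("ascii")

# zlib is a general-purpose lossless compressor, used here as a stand-in.
compressed = zlib.compress(sequence, 9)
restored = zlib.decompress(compressed)

# Lossless means the restored bytes are identical to the original,
# so downstream analyses see the same data as before compression.
is_lossless = restored == sequence
ratio = len(sequence) / len(compressed)

print(f"lossless round trip: {is_lossless}")
print(f"{len(sequence)} bytes -> {len(compressed)} bytes ({ratio:.1f}x smaller)")
```

Specialized genomic compressors can exploit structure that a generic tool ignores, but the round-trip check stays the same: the decompressed output must equal the input byte for byte.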
The actionable intelligence component will look at the translation of big data to clinical knowledge. The overarching goal is to enhance patient-specific understanding of disease and tailor diagnosis and individualized treatment. Projects in this thematic component will develop technologies to identify and classify genomic variants, genes, and drivers for human disease. Specifically, we will develop algorithms to help merge heterogeneous datasets (e.g., multi-omics, clinical, and microbiome) and identify statistically significant mutations, genes, metabolites, pathways, and networks that are associated with clinical or functional outcomes. With those patient-specific findings, we can potentially identify drugs that are designed to affect those genes, pathways, or metabolites, thereby increasing the chances of a successful treatment and recovery compared with one-size-fits-all drugs that might not work on specific patients. For example, if a metabolite predictive of depressive disorder (for which there is no known drug) is found, pharmaceutical researchers can investigate related regulatory pathways and potentially develop a new drug.
The systems innovation component will address the design and implementation of specialized computer systems to efficiently and accurately execute the algorithms for mining actionable intelligence from genomic data. Application-specific computing systems must have the ability to (1) efficiently handle storage and retrieval of large quantities of data produced in sequencing experiments as well as a corpus of medical information that maintains known correlations between genomic variants, genes, pathways, and human diseases; and (2) efficiently compute complex statistical analyses and machine-learning algorithms on parallel-processing platforms such as GPUs and FPGAs, as well as scale out to utilize large warehouse-scale computers (clouds, supercomputers). The projects will explore the design of a common schema for information exchange by closely studying the shared semantics of several annotation datasets (including dbSNP, 1000 Genomes, and the Human Gene Mutation Database) and their relationships to a variant based on location or ID cross-reference. Our design will also address constant evaluation, monitoring, and quality control of algorithms, workflows, and systems, which will provide the flexibility to incorporate new data, statistical models, and algorithms as they become available.
Currently funded projects include:
Examples of recent project accomplishments include:
Using information-compression algorithms for genomic data storage and transfer to facilitate efficient organization and maintenance of omics databases to allow fast random access, query and search via specialized software solutions.
Artificial intelligence is a catalyst and incubator for bringing individualized medicine technologies into clinical practice. Probing whether psychiatric treatment augmented by machine learning can select patient-specific therapeutics in patients with depression, the Mayo Clinic Department of Psychiatry and engineers from the University of Illinois have developed clinically relevant innovations that combine psychiatrists’ evaluations and patients’ omics information.
Scaling the computation of epistatic interactions in GWAS data to develop fast production-grade software to enable the detection of epistasis in many existing GWAS datasets, in both the biomedical and agricultural fields.
Artificial intelligence and probabilistic modeling help identify epilepsy-causing brain regions. A large-scale evaluation of an artificial-intelligence-based approach for locating seizure origins from non-seizure data.
Improving the accuracy of genomic variant calling through deep learning. New machine-learning implementations will use industry-standard libraries, such as TensorFlow and STL, and target both CPUs and FPGAs.
CCBGM researchers developed an algorithm to help physicians identify individuals who might benefit from genetic testing for a predisposition to certain cancers. Data-driven analytics and machine learning research allowed Mayo Clinic scientists to more closely examine the molecular mechanisms of triple-negative breast cancer, using single-cell techniques.
CCBGM will use the advanced research infrastructure of its partners, the University of Illinois at Urbana-Champaign and the Mayo Clinic. The partners’ facilities, laboratories, and resources are listed below.
University of Illinois at Urbana-Champaign
Beckman Institute for Advanced Science and Technology
Institute for Genomic Biology
Coordinated Science Laboratory
Sequencing Unit of the Carver Biotechnology Center
Merged computer infrastructure of HPCBio, Institute for Genomic Biology, and Carver Biotechnology Center
Organization of Mayo Medical Center
Center for Individualized Medicine
Next Generation DNA sequencing
Bioinformatics Program and Service Lines
Information Technology Program
Collectively, these facilities and laboratories provide research opportunities in the multidisciplinary areas of genomics and computing on our research university campuses. For example, CCBGM offers a structure and forum for multidisciplinary work involving genomic biologists, computer scientists, and engineers from the Center for Individualized Medicine at Mayo and the Institute for Genomic Biology at Illinois. The strength of our facilities and labs allows the Center to assemble a diverse and complementary set of research partners from the universities who are exceptionally qualified to address big challenges in biology, bioinformatics, and computing as they apply to agriculture, health care, energy, and other critical human issues.