Center for Computational Biotechnology and Genomic Medicine (CCBGM)

Mayo Clinic

University of Chicago

University of Illinois

Last Reviewed: 05/22/2017

The Center for Computational Biotechnology and Genomic Medicine applies expertise in petascale computing, software algorithm optimization, and human genome deep sequencing technologies to transform healthcare with genomic medicine.

Center Mission and Rationale

The CCBGM will use computational predictive genomics to address pressing societal challenges, such as enabling patient-specific cancer treatment, understanding and modifying microbiomes, and meeting humanity’s rapidly expanding need for food by improving the efficiency of plant and animal agriculture. The CCBGM brings together two university sites with unique resources in computational and biological sciences, namely the University of Illinois at Urbana-Champaign (UIUC) and Mayo Clinic; the University of Chicago, working as an affiliate institution, contributes expertise in high-performance computing (HPC; through Argonne National Laboratory) and cancer genomics.

The CCBGM offers:

  • An approach to big-data problems in genomic biology that comprehensively spans all of its key elements, from analytics and computing to generation of actionable intelligence.
  • Biological expertise ranging from human genomics to crop and animal sciences combined with expertise in algorithms and computing systems (e.g., HPC, cloud, and special-purpose acceleration).
  • A strong track record of working with industry in the multidisciplinary domains of computing, biotechnology, and life sciences.

The CCBGM’s mission is:

  • To contribute to the nation’s research infrastructure base by developing long-term partnerships among industry, academia, and government.
  • To leverage NSF funds for the management of CCBGM coordination with industry in support of faculty researchers and graduate students performing industrially relevant research.
  • To expand the innovation capacity of our nation’s competitive workforce in genomics-focused big-data analytics through partnerships between industry and the CCBGM.

Research program

CCBGM Research Thrusts

The projects in the three thematic components leverage the multidisciplinary capabilities of the CCBGM team and focus on clinical knowledge in human patients. However, the methods, tools, and algorithms developed as part of these efforts (e.g., microbiome, compression, imaging, and acceleration) apply in the broader context of analyzing the sequence data of crops, animals, and other organisms.

The computing and data management component will focus on innovations in storage and compression technologies for genomic data. Such methods are required to process and understand large-scale bioinformatics problems, including the analysis of epistatic interactions in genome-wide association studies (GWAS) at scale. Traditionally, studies have focused on individual variants, their expression changes, and associated observable phenotypes. However, with the current availability of computing resources and knowledge extracted from genomics big data, we propose to study epistatic interactions, which are the joint effects of two or more variants on an observed phenotype. Further, the growth in genomics data poses storage and retrieval challenges that are still not well addressed. One need of the scientific community analyzing the data is to avoid losing resolution when compressed data are decompressed. To that end, we will develop compression theory and algorithms such that users of genomics data and their analyses are unaffected, as though the data had never been compressed.
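Epistatic interactions, as described above, can be invisible when variants are examined one at a time. The following is a minimal sketch of that idea, not the Center's method: it assumes a hypothetical toy cohort in which the phenotype appears only when exactly one of two variants is present. The cohort, the function name mean_phenotype_by, and the XOR-style phenotype rule are all illustrative assumptions.

```python
from itertools import product
from statistics import mean

# Hypothetical toy cohort (an assumption for illustration, not real data):
# each sample is (variant_a, variant_b, phenotype), with 25 samples per
# genotype combination. The phenotype is 1 only when exactly one of the
# two variants is present (an XOR-style interaction).
samples = [
    (a, b, a ^ b)
    for a, b in product((0, 1), repeat=2)
    for _ in range(25)
]


def mean_phenotype_by(samples, key):
    """Group samples by key(variant_a, variant_b) and average the phenotype."""
    groups = {}
    for a, b, phenotype in samples:
        groups.setdefault(key(a, b), []).append(phenotype)
    return {group: mean(values) for group, values in sorted(groups.items())}


# Single-variant view: carriers and non-carriers have the same mean phenotype.
single_a = mean_phenotype_by(samples, lambda a, b: a)
single_b = mean_phenotype_by(samples, lambda a, b: b)

# Pairwise view: the phenotype depends on the combination of both variants.
pair = mean_phenotype_by(samples, lambda a, b: (a, b))

print("by variant A:", single_a)
print("by variant B:", single_b)
print("by pair (A, B):", pair)
```

In this toy case neither variant shows any effect on its own, while the pairwise grouping separates the phenotype completely. Testing all variant pairs grows quadratically with the number of variants, which is one reason such analyses call for large-scale computing.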

The actionable intelligence component will look at the translation of big data to clinical knowledge. The overarching goal is to enhance patient-specific understanding of disease and to tailor diagnosis and treatment to the individual. Projects in this thematic component will develop technologies to identify and classify genomic variants, genes, and drivers for human disease. Specifically, we will develop algorithms to help merge heterogeneous datasets (e.g., multi-omics, clinical, and microbiome) and identify statistically significant mutations, genes, metabolites, pathways, and networks that are associated with clinical or functional outcomes. With those patient-specific findings, we can potentially identify drugs that are designed to affect those genes, pathways, or metabolites, thereby increasing the chances of a successful treatment and recovery compared with one-size-fits-all drugs that might not work for specific patients. For example, if a metabolite predictive of depressive disorder (for which there is no known drug) is found, pharmaceutical researchers can investigate related regulatory pathways and potentially develop a new drug.

The systems innovation component will address the design and implementation of specialized computer systems to efficiently and accurately execute the algorithms for mining actionable intelligence from genomic data. Application-specific computing systems must be able to (1) efficiently handle storage and retrieval of the large quantities of data produced in sequencing experiments, as well as a corpus of medical information that maintains known correlations between genomic variants, genes, pathways, and human diseases; and (2) efficiently compute complex statistical analyses and machine-learning algorithms on parallel-processing platforms such as GPUs and FPGAs, as well as scale out to utilize large warehouse-scale computers (clouds, supercomputers). The projects will explore the design of a common schema for information exchange by closely studying the shared semantics of several annotation datasets (including dbSNP, 1000 Genomes, and the Human Gene Mutation Database) and their relationships to a variant based on location or ID cross-reference. Our design will also address continuous evaluation, monitoring, and quality control of algorithms, workflows, and systems, which will provide the flexibility to incorporate new data, statistical models, and algorithms as they become available.
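As a rough illustration of relating annotation records to a variant by location or by ID cross-reference, the sketch below indexes records from several sources under one shared record type. It is only a sketch under stated assumptions: the Annotation fields, the AnnotationIndex class, and the records themselves are hypothetical placeholders, not the Center's schema and not real entries from dbSNP, 1000 Genomes, or the Human Gene Mutation Database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical common record shared by all annotation sources."""
    source: str                 # name of the originating dataset
    chrom: str                  # chromosome label
    pos: int                    # position on the chromosome
    variant_id: Optional[str]   # cross-reference ID, if the source has one
    note: str                   # free-text payload for this illustration


class AnnotationIndex:
    """Index annotations by genomic location and by variant ID."""

    def __init__(self, annotations):
        self._by_location = {}
        self._by_id = {}
        for ann in annotations:
            self._by_location.setdefault((ann.chrom, ann.pos), []).append(ann)
            if ann.variant_id is not None:
                self._by_id.setdefault(ann.variant_id, []).append(ann)

    def lookup(self, chrom, pos, variant_id=None):
        """Return location matches first, then any extra ID matches."""
        hits = list(self._by_location.get((chrom, pos), []))
        if variant_id is not None:
            for ann in self._by_id.get(variant_id, []):
                if ann not in hits:
                    hits.append(ann)
        return hits


# Placeholder records invented for this example; they are not real data.
annotations = [
    Annotation("dbSNP", "chr1", 12345, "rsEXAMPLE1", "placeholder record"),
    Annotation("1000 Genomes", "chr1", 12345, None, "placeholder frequency record"),
    Annotation("HGMD", "chr2", 67890, "rsEXAMPLE1", "placeholder record at another coordinate"),
]

index = AnnotationIndex(annotations)
hits = index.lookup("chr1", 12345, variant_id="rsEXAMPLE1")
for hit in hits:
    print(hit.source, hit.chrom, hit.pos, hit.variant_id)
```

Keeping both indexes lets a query recover records that share an ID but were stored under a different coordinate, which a location-only lookup would miss.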

Facilities and Laboratory

CCBGM will utilize the advanced research infrastructure existing at the partners: the University of Illinois at Urbana-Champaign and the Mayo Clinic. The partners' facilities, laboratories, and resources are listed below.

University of Illinois at Urbana-Champaign

Beckman Institute for Advanced Science and Technology
Institute for Genomic Biology
Coordinated Science Laboratory
HPCBio
CompGen
Sequencing Unit of the Carver Biotechnology Center
Merged computer infrastructure of HPCBio, Institute for Genomic Biology, and Carver Biotechnology Center

Mayo Clinic

Organization of Mayo Medical Center
Center for Individualized Medicine
Next Generation DNA sequencing
Bioinformatics Program and Service Lines
Information Technology Program

Collectively, these facilities and laboratories provide research opportunities in the multidisciplinary areas of genomics and computing on our three research university campuses. For example, CCBGM offers a structure and forum for multidisciplinary work involving genomic biologists, computer scientists, and engineers from the Center for Individualized Medicine at Mayo and the Institute for Genomic Biology at Illinois. The strength of our facilities and labs allows the Center to assemble a diverse and complementary set of research partners from the universities who are exceptionally qualified to address big challenges in biology, bioinformatics, and computing as they apply to agriculture, health care, energy, and other critical human issues.

Locations

University of Illinois

1308 West Main Street

Urbana, Illinois, 61801

United States

University of Chicago

900 East 57th Street

Chicago, Illinois, 60637

United States

Mayo Clinic

200 First Street SW

Rochester, Minnesota, 55905

United States