Center for Computational Biotechnology and Genomic Medicine (CCBGM)

Mayo Clinic

University of Illinois

Last Reviewed: 12/19/2019

The University of Illinois and Mayo Clinic are leveraging the power of data analytics, artificial intelligence, machine learning, and high-performance computation to advance healthcare discovery.

Center Mission and Rationale

The Center for Computational Biotechnology and Genomic Medicine (CCBGM) brings together the engineering and genomic biology strengths of the University of Illinois Urbana-Champaign (UIUC) with the Mayo Clinic’s world-renowned expertise in individualized medicine and clinical research and practice. Together, the partners work on computationally challenging projects that use artificial intelligence, machine learning, and system innovations to advance healthcare discovery.

The goal of the CCBGM is to leverage the power of data analytics, artificial intelligence, machine learning, and high-performance computation to advance healthcare discovery. Accomplishing this requires computational tools for big data analytics and machine learning models for genomic and genetic data analysis, patient-specific data, imaging, compression, encryption, and data transfer. The center will address these needs using biological modeling, algorithm design, interface development, and iterative optimization in direct collaboration with cutting-edge biology researchers analyzing real data.

Companies interested in our center include computer system vendors and developers, pharmaceutical and biotechnology companies, hospitals, and the insurance industry, as well as others with a focus on pre-competitive technology, tools, and methods.

The CCBGM offers:

  • An approach to big-data problems in genomic biology that comprehensively spans all of its key elements, from analytics and computing to generation of actionable intelligence.
  • Biological expertise ranging from human genomics to crop and animal sciences combined with expertise in algorithms and computing systems (e.g., HPC, cloud, and special-purpose acceleration).
  • A strong track record of working with industry in the multidisciplinary domains of computing, biotechnology, and life sciences.
  • Access to multidisciplinary faculty, clinicians, and students working in bioinformatics, genomic applications, health sciences and computing systems and algorithms.


The CCBGM’s mission is:

    To contribute to the nation’s research infrastructure base by developing long-term partnerships among and between a diverse group of industry, academia, and government.

    To leverage NSF funds for the management of CCBGM coordination with industry in support of faculty researchers and graduate students performing industrially relevant research.

    To expand the innovation capacity of our nation’s competitive workforce in genomics-focused big data analytics through partnerships between industry and the CCBGM.

Research program

CCBGM Research Thrusts

The set of projects in the three thematic components leverages the multidisciplinary capabilities of the CCBGM team and focuses on clinical knowledge in human patients. However, the methods, tools, and algorithms developed as part of these efforts (e.g., microbiome, compression, imaging, genomic security, and acceleration) apply in the broader context of analyzing the sequence data of crops, animals, and other organisms.

The computing and data management component will focus on innovations in storage and compression technologies for genomic data. Such methods are required to process and understand large-scale bioinformatics problems, including epistatic interactions from genome-wide association studies (GWAS) addressed at scale. Traditionally, studies have focused on individual variants, their expression changes, and associated observable phenotypes. However, with the current availability of computing resources and knowledge extracted from genomics big data, we propose to study epistatic interactions, which are the effects of two or more variants on an observed phenotype. Further, growth in genomics data poses a problem in storage and retrieval, and these challenges are still not well addressed. One need from the scientific community analyzing the data is to avoid losing resolution in the data when the compressed data are decompressed. To that end, we will develop compression algorithms and theory for efficient compression techniques such that the users of genomics data and their analyses will not be affected, as though the data never went through a compression process.
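
To make the idea of an epistatic interaction concrete, the following is a minimal sketch and not the center's method. It assumes each variant is coded as carrier status (0 or 1) and uses a simple interaction contrast on group means rather than a formal significance test; the names `interaction_contrast`, `pairwise_scan`, and the toy variants `v1` to `v3` are hypothetical. It also shows why scale matters: every pair of variants must be scored.

```python
from itertools import combinations
from statistics import mean


def interaction_contrast(a, b, phenotype):
    """Return (m11 - m10) - (m01 - m00) for two binary variants.

    A value of 0 means the two variants act additively on the phenotype;
    a non-zero value means the effect of one depends on the other.
    """
    groups = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for x, y, p in zip(a, b, phenotype):
        groups[(x, y)].append(p)
    if any(not g for g in groups.values()):
        return None  # undefined when a genotype combination is unobserved
    m = {k: mean(v) for k, v in groups.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def pairwise_scan(variants, phenotype):
    """Score every pair of variants; n variants give n*(n-1)/2 pairs."""
    scores = {}
    for (name_a, a), (name_b, b) in combinations(variants.items(), 2):
        scores[(name_a, name_b)] = interaction_contrast(a, b, phenotype)
    return scores


# Toy data (hypothetical): carrier status of three variants in 8 individuals.
variants = {
    "v1": [0, 0, 0, 0, 1, 1, 1, 1],
    "v2": [0, 0, 1, 1, 0, 0, 1, 1],
    "v3": [0, 1, 0, 1, 0, 1, 0, 1],
}
# The phenotype is elevated only when v1 and v2 are both present.
phenotype = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0]

scores = pairwise_scan(variants, phenotype)
```

In this toy data only the pair (`v1`, `v2`) has a non-zero contrast. A real GWAS would involve far more variants than this, so the pair count grows quadratically, which is the motivation for scalable implementations.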

The actionable intelligence component will look at the translation of big data to clinical knowledge. The overarching goal is to enhance patient-specific understanding of disease and tailor diagnosis and individualized treatment. Projects in this thematic component will develop technologies to identify and classify genomic variants, genes, and drivers for human disease. Specifically, we will develop algorithms to help merge heterogeneous datasets (e.g., multi-omics, clinical, and microbiome) and identify statistically significant mutations, genes, metabolites, pathways, and networks that are associated with clinical or functional outcomes. With those patient-specific findings, we can potentially identify drugs that are designed to affect those genes, pathways, or metabolites, thereby increasing the chances of a successful treatment and recovery instead of using generic drugs that might not work on specific patients. For example, if a metabolite predictive of depressive disorder (for which there is no known drug) is found, pharmaceutical researchers can investigate related regulatory pathways and potentially develop a new drug.
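
As a rough illustration of merging heterogeneous datasets and testing for an association with a clinical outcome, here is a minimal sketch under stated assumptions. The per-patient tables, the field names `responded` and `metabolite_x`, and all values are hypothetical, and a permutation test on the difference in group means stands in for whatever statistical methods the projects actually use.

```python
import random
from statistics import mean


def inner_join(*tables):
    """Merge per-patient dicts, keeping only patients present in every table."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    merged = {}
    for pid in sorted(shared):
        row = {}
        for table in tables:
            row.update(table[pid])
        merged[pid] = row
    return merged


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    def diff(lbls):
        g1 = [v for v, flag in zip(values, lbls) if flag]
        g0 = [v for v, flag in zip(values, lbls) if not flag]
        return abs(mean(g1) - mean(g0))

    observed = diff(labels)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling
        if diff(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical clinical table: patients p1..p11, the first five responded.
clinical = {f"p{i}": {"responded": i <= 5} for i in range(1, 12)}
# Hypothetical metabolomics table: p11 has no measurement.
levels = [9.1, 9.4, 8.8, 9.9, 9.0, 1.2, 1.5, 0.9, 1.1, 1.4]
metabolomics = {f"p{i}": {"metabolite_x": v} for i, v in enumerate(levels, start=1)}

merged = inner_join(clinical, metabolomics)
values = [row["metabolite_x"] for row in merged.values()]
labels = [row["responded"] for row in merged.values()]
p_value = permutation_p_value(values, labels)
```

The inner join drops the patient who lacks a metabolomics record, so the test runs only on patients with complete data. With many candidate genes or metabolites, a real analysis would also need to correct for multiple testing, which this sketch omits.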

The systems innovation component will address the design and implementation of specialized computer systems to efficiently and accurately execute the algorithms for mining actionable intelligence from genomic data. Application-specific computing systems must have the ability to (1) efficiently handle storage and retrieval of large quantities of data produced in sequencing experiments as well as a corpus of medical information that maintains known correlations between genomic variants, genes, pathways, and human diseases; and (2) efficiently compute complex statistical analyses and machine-learning algorithms on parallel-processing platforms such as GPUs and FPGAs, as well as scale out to utilize large warehouse-scale computers (clouds, supercomputers). The projects will explore the design of a common schema for information exchange by closely studying the shared semantics of several annotation datasets (including dbSNP, 1000 Genomes, and the Human Gene Mutation Database) and their relationships to a variant based on location or ID cross-reference. Our design will also address constant evaluation, monitoring, and quality control of algorithms, workflows, and systems, which will provide the flexibility to incorporate new data, statistical models, and algorithms as they become available.
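
One way to picture a common schema that relates annotation records to a variant by location or by ID cross-reference is sketched below. This is an illustrative assumption, not the center's design: the `Variant`, `Annotation`, and `AnnotationIndex` types, their fields, and the placeholder identifiers are all hypothetical and do not reflect the actual record formats of any annotation dataset.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based position (an assumption of this sketch)
    ref: str
    alt: str


@dataclass
class Annotation:
    source: str      # name of the annotation dataset
    record_id: str   # identifier as used by that dataset
    chrom: str
    pos: int
    attributes: dict = field(default_factory=dict)


class AnnotationIndex:
    """Index annotations so they can be found by location or by ID."""

    def __init__(self):
        self._by_location = defaultdict(list)
        self._by_id = {}

    def add(self, ann):
        self._by_location[(ann.chrom, ann.pos)].append(ann)
        self._by_id[(ann.source, ann.record_id)] = ann

    def for_variant(self, variant):
        """Location cross-reference: all annotations at the variant's site."""
        return list(self._by_location.get((variant.chrom, variant.pos), []))

    def by_id(self, source, record_id):
        """ID cross-reference: one record from one source, or None."""
        return self._by_id.get((source, record_id))


# Placeholder records; the IDs and attributes are invented for illustration.
index = AnnotationIndex()
index.add(Annotation("source_a", "id-0001", "1", 12345, {"note": "example"}))
index.add(Annotation("source_b", "rec-42", "1", 12345, {"frequency": 0.01}))
index.add(Annotation("source_a", "id-0002", "2", 500))

hits = index.for_variant(Variant("1", 12345, "A", "G"))
```

Keeping source-specific fields in a free-form `attributes` dictionary while sharing only location and ID keys is one possible design choice; it lets new annotation sources be added without changing the shared part of the schema.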

Special Activities

Currently funded projects include:

  • Information-Compression Algorithms for Genomic Data Storage and Transfer - Data compression enables timely exchange and long-term storage of heterogeneous biological and clinical data. To facilitate efficient organization and maintenance of genomic databases and to allow for fast random access, query, and search, specialized software solutions for compression and computing in the compressive domain are being developed.
  • Improving the Accuracy of Genomic Variant Calling Through Deep Learning - This project is developing new deep learning approaches to tackle unsolved problems for variant calling (e.g., SNPs and small indels in low-complexity regions with ambiguity). Newly developed algorithms provide the best variant calling quality and also translate well across different application domains, sequencing methods, and platforms. New machine-learning-based implementations will use industry-standard libraries and target both GPUs and FPGAs for computation acceleration.
  • Algorithms for Experimental Design in Cancer Genomics - Methods for tumor sequencing study design aimed at reducing ambiguity do not currently exist. This project proposes the first computational method, which, given preliminary sequencing data, will suggest follow-up sequencing experiments with the aim of reducing non-uniqueness of solutions. This method will be based on a mathematical model that incorporates a tradeoff between non-uniqueness and cost of different sequencing technologies. This method will lead to better sequencing experiments that improve the understanding of tumorigenesis and provide actionable intelligence for personalized medicine.
  • Secure Access and Sharing of Health Data - The growth in electronic health data and patient-specific biological data provides huge opportunities to transform healthcare delivery. However, it is essential to share data among and between researchers, clinicians, and computing experts, and this creates new challenges in securing data access and sharing in the presence of malicious actors that may want to violate the data integrity, compromise data availability, or misuse the data. This project focuses on securing a genomic data processing pipeline.
  • Machine Learning and Neurodegenerative Diseases - Machine learning can assist with many aspects of dementia research and clinical care; however, its translation has been hindered by the lack of holistic approaches that can support end-to-end health delivery. This project is developing holistic analytic solutions by addressing core analytic challenges in multiple aspects of dementia-related care delivery and research: prevention, early detection, understanding disease etiologies, and personalized care. This project will address these key challenges using artificial intelligence-based analytical techniques and, through their clinical translation, will demonstrate end-to-end care delivery for dementia in clinical settings.
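
The first project in the list above pairs lossless compression with fast random access. A minimal sketch of that combination follows, assuming a plain ASCII nucleotide string and using the general-purpose `zlib` compressor from the Python standard library as a stand-in; the specialized algorithms under development are not described here, and `BLOCK_SIZE`, `compress_blocks`, and `fetch` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 1024  # bases per block; an arbitrary choice for this sketch


def compress_blocks(sequence, block_size=BLOCK_SIZE):
    """Compress a sequence as independent fixed-size blocks."""
    data = sequence.encode("ascii")
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def fetch(blocks, start, end, block_size=BLOCK_SIZE):
    """Return sequence[start:end], decompressing only the overlapping blocks.

    Assumes 0 <= start; an empty string is returned when start >= end.
    """
    if start >= end:
        return ""
    first = start // block_size
    last = (end - 1) // block_size
    chunk = b"".join(zlib.decompress(b) for b in blocks[first:last + 1])
    offset = first * block_size
    return chunk[start - offset:end - offset].decode("ascii")


# Toy sequence (hypothetical), long enough to span several blocks.
sequence = "ACGT" * 5000 + "GATTACA"
blocks = compress_blocks(sequence)

# Random access: only the blocks covering positions 1000..2999 are read.
region = fetch(blocks, 1000, 3000)

# Lossless round trip: decompressing every block restores the input exactly.
restored = "".join(zlib.decompress(b).decode("ascii") for b in blocks)
```

Smaller blocks make range queries cheaper but usually reduce the compression ratio, since each block is compressed without knowledge of the others. That tradeoff is one reason specialized formats are of interest.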


Examples of recent project accomplishments include:

Using information-compression algorithms for genomic data storage and transfer to facilitate efficient organization and maintenance of omics databases to allow fast random access, query and search via specialized software solutions.

Artificial intelligence is a catalyst and incubator for bringing individualized medicine technologies into clinical practice. Probing whether psychiatric treatment augmented by machine learning can select patient-specific therapeutics in patients with depression, the Mayo Clinic Department of Psychiatry and engineers from the University of Illinois have developed clinically relevant innovations that combine psychiatrists’ evaluations and patients’ omics information.

Scaling the computation of epistatic interactions in GWAS data to develop fast, production-grade software that enables the detection of epistasis in many existing GWAS datasets, in both the biomedical and agricultural fields.

Artificial intelligence and probabilistic modeling help identify epilepsy-causing brain regions: a large-scale evaluation of an artificial-intelligence-based approach for locating seizure origins from non-seizure data.

Improving the accuracy of genomic variant calling through deep learning, using new machine-learning implementations that are built on industry-standard libraries, such as TensorFlow and STL, and that target both CPUs and FPGAs.

CCBGM researchers developed an algorithm to help physicians identify individuals who might benefit from genetic testing for a predisposition to certain cancers. Data-driven analytics and machine learning research allowed Mayo Clinic scientists to more closely examine the molecular mechanisms of triple-negative breast cancer, using single-cell techniques.

Facilities and Laboratory

CCBGM will utilize the advanced research infrastructure existing at the partner institutions, the University of Illinois at Urbana-Champaign and the Mayo Clinic. Facilities, laboratories, and resources of the partners are listed below.

University of Illinois at Urbana-Champaign

Beckman Institute for Advanced Science and Technology
Institute for Genomic Biology
Coordinated Science Laboratory
Sequencing Unit of the Carver Biotechnology Center
Merged computer infrastructure of HPCBio, Institute for Genomic Biology, and Carver Biotechnology Center

Mayo Clinic

Organization of Mayo Medical Center
Center for Individualized Medicine
Next-Generation DNA Sequencing
Bioinformatics Program and Service Lines
Information Technology Program

Collectively, these facilities and laboratories provide research opportunities in the multidisciplinary areas of genomics and computing on our research university campuses. For example, CCBGM offers a structure and forum for multidisciplinary work involving both genomic biologists and computer scientists and engineers from the Center for Individualized Medicine at Mayo and the Institute for Genomic Biology at Illinois. The strength of our facilities and labs allows the Center to assemble a diverse and complementary set of research partners from the universities who are exceptionally qualified to address big challenges in biology, bioinformatics, and computing as they apply to agriculture, health care, energy, and other critical human issues.


University of Illinois

1308 West Main Street

Urbana, Illinois, 61801

United States

Mayo Clinic

200 First Street SW

Rochester, Minnesota, 55905

United States