Proceedings of the IEEE - Popular - Leung

Machine Learning in Genomic Medicine: A Review of Computational Problems and Data Sets

Published in January 2016



Michael K. K. Leung, Andrew Delong, Babak Alipanahi, and Brendan J. Frey


In this paper, we provide an introduction to machine learning tasks that address important problems in genomic medicine. One of the goals of genomic medicine is to determine how variations in the DNA of individuals can affect the risk of different diseases, and to find causal explanations so that targeted therapies can be designed. Here we focus on how machine learning can help to model the relationship between DNA and the quantities of key molecules in the cell, with the premise that these quantities, which we refer to as cell variables, may be associated with disease risks. Modern biology allows high-throughput measurement of many such cell variables, including gene expression, splicing, and proteins binding to nucleic acids, which can all be treated as training targets for predictive models. With the growing availability of large-scale data sets and advanced computational techniques such as deep learning, researchers can help to usher in a new era of effective genomic medicine.