Dr Weida Tong

Bioinformatics and Biostatistics, NCTR, Director


Make Genomics Reproducible Again – MAQC and Beyond
Weida Tong, Ph.D
National Center for Toxicological Research, NCTR/FDA

Across clinical medicine, drug development and genomic technology development, reproducibility is the foundation for translation into clinical utility and regulatory application. Human, computational, or technical errors that make biomedical research irreproducible hamper the clinical translation of biomarkers and therapeutics. These concerns pervade the use of high-throughput genomic technologies such as microarrays and next-generation sequencing in both preclinical and clinical studies. This presentation will discuss the FDA-led, community-wide MAQC/SEQC consortium projects, which focus on promoting standardization and quality control to address alarming concerns about the lack of reproducibility in the generation, analysis, and interpretation of genomics data. Among the various issues encountered, computational reproducibility has become increasingly challenging in this field. This is because the data are so massive that manual inspection of data quality and analysis results is often prohibitive; reproducibility is therefore largely at the mercy of the algorithms used, and metrics to assess it have not been established. The presentation will discuss some advances in this area based on data generated by MAQC/SEQC and beyond. Finally, a set of lessons learned and general guidelines will be provided for explicitly considering reproducibility, a fundamental hallmark of good science, in the analysis of transcriptomics data.
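As one concrete illustration of a reproducibility metric (our assumption, not a method taken from this talk), the sketch below computes a POG-style score (percentage of overlapping genes), the kind of cross-site concordance measure used in MAQC-I, between two differentially expressed gene (DEG) lists; a gene counts as shared only if it changes in the same direction in both lists. Selecting genes by fold change with a non-stringent p-value cut-off follows the MAQC-I report that this gave more reproducible lists than p-value ranking alone; the cut-off, the denominator convention and the gene IDs are illustrative.

```python
def top_degs(results, n, p_cut=0.05):
    """Pick the top-n genes by |log2 fold change| among those with p < p_cut.

    results: iterable of (gene, log2_fold_change, p_value) tuples from one site.
    Returns {gene: +1 or -1}, the direction of change.
    """
    passing = [r for r in results if r[2] < p_cut]
    passing.sort(key=lambda r: abs(r[1]), reverse=True)
    return {gene: (1 if fc > 0 else -1) for gene, fc, _ in passing[:n]}


def percentage_overlap(degs_a, degs_b):
    """POG-style concordance: % of genes present in both lists with the same direction.

    The denominator used here is the shorter list; other conventions exist.
    """
    shared = [g for g, d in degs_a.items() if degs_b.get(g) == d]
    denom = min(len(degs_a), len(degs_b))
    return 100.0 * len(shared) / denom if denom else 0.0


# Hypothetical DEG lists from two test sites
site1 = {"G1": 1, "G2": 1, "G3": -1, "G4": 1}
site2 = {"G1": 1, "G2": -1, "G3": -1, "G5": 1}
print(percentage_overlap(site1, site2))  # G1 and G3 agree; G2 flips direction
```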

Prof. Dr Christos Hatzis

Bioinformatics, Director

Yale Cancer Center, U.S.A.

Heterogeneity and its effect on patient selection: progress and remaining challenges

Tumors expressing specific markers become eligible for treatment with targeted therapies, but only a subset of these patients respond to the treatment and have a long-term survival benefit. Identifying responding patients is complicated by the genetic heterogeneity of the tumors and its interaction with the stromal components. We will review the progress in breast cancer and outline remaining challenges.

Prof. Dr Leming Shi

Center for Pharmacogenomics, Director

Fudan University, Shanghai, China

Prediction of Cancer of Unknown Primary (CUP) Using Tissue-Specific Molecular Signatures from GTEx and TCGA Data

Luyao Ren, Jingcheng Yang, Bin Li, Chen Suo, Ying Yu, Yuanting Zheng, Leming Shi
Center for Pharmacogenomics, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai 200438, China. E-mail: lemingshi@fudan.edu.cn

Cancer of unknown primary site (CUP) is a common disease affecting 2-5% of people who suffer from epithelial tumors. Histologic and imaging techniques fail to identify the primary sites of these tumors even after all clinicopathological and laboratory investigations have been completed. Accurate classification of the primary site can benefit patients because site-specific therapies improve survival. Every normal tissue performs its unique biological functions by expressing a set of tissue-specific genes. We hypothesize that the classification of CUP can benefit from the large collection of normal-tissue molecular profiles in the Genotype-Tissue Expression project (GTEx), because tissue-specific expression profiles can be preserved during carcinogenesis. First, we use the GTEx data set to identify genes specifically expressed in each normal tissue type. Second, the expression profiles of these tissue-specific genes are used to develop classification models that predict the origins of normal tissues, primary sites, and metastatic sites using The Cancer Genome Atlas (TCGA) data. Third, we evaluate the prediction performance of models derived from different levels of omics data, including gene expression, somatic mutation, copy number variation, methylation, and protein expression. Fourth, we compare molecular alterations between normal tissues, known primary sites, metastases, and CUP.
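The first two steps can be sketched as follows. The abstract does not say how tissue specificity is measured, so as an assumption the sketch uses the tau index (Yanai et al., 2005) on log-transformed mean expression per tissue, followed by a simple nearest-centroid classifier over the selected genes. The 0.8 cut-off, the classifier and the toy matrices are illustrative, not taken from the study.

```python
import numpy as np


def tau_index(expr):
    """Tau tissue-specificity index per gene (0 = broadly expressed, 1 = single tissue).

    expr: genes x tissues array of mean expression (e.g. TPM from GTEx).
    """
    x = np.log2(np.asarray(expr, dtype=float) + 1.0)
    xmax = x.max(axis=1, keepdims=True)
    xhat = np.divide(x, xmax, out=np.zeros_like(x), where=xmax > 0)
    tau = (1.0 - xhat).sum(axis=1) / (x.shape[1] - 1)
    return np.where(xmax[:, 0] > 0, tau, 0.0)  # genes never expressed get 0


def fit_centroids(X, labels):
    """Mean profile of the selected genes for each site label."""
    y = np.asarray(labels)
    classes = sorted(set(labels))
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])


def predict_site(model, X):
    """Assign each sample to the site whose centroid is nearest (Euclidean)."""
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return [classes[i] for i in d.argmin(axis=1)]


# Toy usage: 4 genes x 3 tissues; keep genes with tau >= 0.8 (illustrative cut-off).
gtex = np.array([[100, 0, 0], [0, 50, 0], [0, 0, 80], [10, 10, 10]])
selected = np.where(tau_index(gtex) >= 0.8)[0]
# Training samples are assumed to be already restricted to the selected genes.
train = np.array([[9, 0, 1], [8, 1, 0], [0, 9, 1], [1, 8, 0], [0, 1, 9], [1, 0, 8]], float)
model = fit_centroids(train, ["breast", "breast", "lung", "lung", "colon", "colon"])
print(predict_site(model, np.array([[7, 1, 1], [0, 0, 6]], float)))
```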

Prof. Dr Thomas Rattei

Division of Computational Systems Biology, Head

University of Vienna, Austria

Deciphering the microbiome: Large-scale prediction of microbial roles and traits

The human microbiome is crucial for our health. Many links between the microbiome, human disease and therapy efficacy have been established in recent years. However, the individuality and dynamics of the human microbiome make it extremely difficult to develop clinical applications based on microbiome diagnostics or alteration. An important task for current research is therefore to better understand the fundamental properties of the human microbiome at the functional level, beyond taxonomy.

The availability of almost complete genome sequences of uncultivable microbial species from human microbiomes is a promising achievement of contemporary metagenomics, but it necessitates computational methods that predict microbial phenotypes solely from genomic data. We have investigated how comparative genomics can be used to predict microbial phenotypes. We have improved and extended the PICA framework, which uses machine learning for phenotypic trait prediction, and demonstrated its applicability to large-scale genome databases and incomplete genome sequences. Most traits can be reliably predicted even in genomes that are only 60-70% complete. So far, 45 models have been developed and made available via our web platform phendb.org. We suggest that the extended PICA framework can be used to automatically annotate phenotypes in near-complete microbial genome sequences from human microbiomes, which current high-resolution metagenomics studies generate in large numbers.
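The basic idea of comparative-genomics trait prediction is to represent each genome by the presence or absence of gene families and to learn which families are associated with a trait. As a stand-in for the learner actually used in PICA, the sketch uses a Bernoulli naive Bayes classifier; family names and training genomes are hypothetical. Note that this simple model treats a missing family as evidence against the trait, which is exactly why behavior on incomplete genomes has to be evaluated explicitly.

```python
import math


def train_bernoulli_nb(profiles, labels, alpha=1.0):
    """profiles: list of sets of gene-family IDs; labels: list of bools (trait present?)."""
    families = sorted(set().union(*profiles))
    model = {}
    for cls in (True, False):
        members = [p for p, lab in zip(profiles, labels) if lab == cls]
        log_prior = math.log((len(members) + alpha) / (len(profiles) + 2 * alpha))
        # Laplace-smoothed probability that each family is present in genomes of this class
        probs = {f: (sum(f in p for p in members) + alpha) / (len(members) + 2 * alpha)
                 for f in families}
        model[cls] = (log_prior, probs)
    return model


def predict_trait(model, profile):
    """Return the class (True/False) with the highest log posterior score."""
    best_cls, best_score = None, -math.inf
    for cls, (log_prior, probs) in model.items():
        score = log_prior + sum(math.log(p if f in profile else 1.0 - p)
                                for f, p in probs.items())
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


train = [{"famA", "famB", "famC", "famX"}, {"famA", "famB", "famC", "famY"}, {"famA", "famB", "famC"},
         {"famD", "famE", "famX"}, {"famD", "famE", "famY"}, {"famD", "famE"}]
labels = [True, True, True, False, False, False]
model = train_bernoulli_nb(train, labels)
incomplete = {"famA", "famC"}  # trait-positive-like genome with famB missing
print(predict_trait(model, incomplete))
```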

Prof. Dr Sepp Hochreiter

Institute of Bioinformatics, Director

Johannes Kepler University, Linz, Austria

Deep Learning in Drug Design

Deep Learning has emerged as one of the most successful fields of machine learning and artificial intelligence, with overwhelming success in industrial speech, language and vision benchmarks. Consequently, it has become the central field of research for IT giants such as Google, Facebook, Microsoft, Baidu, and Amazon. Deep Learning is founded on novel neural network techniques, the recent availability of very fast computers, and massive data sets. At its core, Deep Learning discovers multiple levels of abstract representations of the input.
Using Deep Learning, we won the NIH Tox21 challenge organized by the US agencies NIH, EPA, and FDA, an unprecedented multi-million-dollar effort to test toxicity prediction methods. In collaboration with pharmaceutical companies, Deep Learning has identified previously unknown side effects of drug candidates from their chemical structures after learning from bioassay data. We extended this approach to high-content imaging, where we detect biological effects from images of cell lines to which a compound was added. We deploy Deep Neural Networks for toxicity and target prediction in collaboration with Janssen, Merck, Novartis, AstraZeneca, GSK, and Bayer, together with hardware companies such as Intel, HP, NVIDIA and others.
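A typical multi-task setup for toxicity data trains one network with a shared hidden representation and one output per assay, skipping compound/assay pairs that were never measured. The sketch shows that idea in NumPy with manual backpropagation and a masked binary cross-entropy. The architecture, learning rate and toy data are illustrative assumptions, not the configuration used in the Tox21 challenge.

```python
import numpy as np


def sigmoid(z):
    # tanh form avoids overflow warnings for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def masked_bce(logits, labels):
    """Mean binary cross-entropy over measured (non-NaN) compound/assay pairs."""
    mask = ~np.isnan(labels)
    y = np.where(mask, labels, 0.0)
    loss = np.maximum(logits, 0.0) - logits * y + np.log1p(np.exp(-np.abs(logits)))
    return float((loss * mask).sum() / mask.sum())


def train_multitask(X, Y, hidden=16, lr=0.5, epochs=300, seed=0):
    """X: compounds x descriptor bits; Y: compounds x assays with 1/0/NaN labels."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    mask = ~np.isnan(Y)
    y = np.where(mask, Y, 0.0)
    history = []
    for _ in range(epochs):
        H = np.maximum(X @ W1 + b1, 0.0)               # shared ReLU representation
        logits = H @ W2 + b2                           # one output unit per assay
        history.append(masked_bce(logits, Y))
        G = (sigmoid(logits) - y) * mask / mask.sum()  # gradient w.r.t. logits; unmeasured pairs get 0
        dW2, db2 = H.T @ G, G.sum(axis=0)
        dH = (G @ W2.T) * (H > 0)
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2), history


rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(64, 32)).astype(float)              # toy fingerprints
Y = np.stack([X[:, 0], X[:, 1] * X[:, 2], 1 - X[:, 3]], axis=1)  # three toy assays
Y[rng.random(Y.shape) < 0.3] = np.nan                            # ~30% unmeasured
params, history = train_multitask(X, Y)
print(round(history[0], 3), round(history[-1], 3))
```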

Prof. Dr Tieliu Shi

Shanghai Key Laboratory of Regulatory Biology, Professor

East China Normal University, Shanghai, China

A microRNA based model that associates survival and treatment responses to therapy in lung adenocarcinoma

Yimin Ma, Jiajun Chen, Geng Chen and Tieliu Shi*
The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
*Correspondence: tlshi@bio.ecnu.edu.cn

Lung adenocarcinoma (LUAD), a leading cause of cancer-related death worldwide, is the most common subtype of lung cancer. MicroRNAs (miRNAs) are a class of small non-protein-coding RNAs whose expression is typically negatively correlated with that of their target mRNAs. In this study, we first selected 300 miRNAs that actually negatively regulated mRNAs in the TCGA LUAD cohort and built miRNA-mRNA regulation pairs based on target information and regulatory relationships. We then randomly divided the LUAD patients in the TCGA cohort into a training set and a testing set. Our method identified and refined 27 miRNAs, each individually capable of distinguishing patients with different overall survival times. From these, seven miRNAs (hsa-miR-194-1, hsa-miR-194-2, hsa-miR-548b, hsa-miR-556, hsa-miR-624, hsa-miR-767 and hsa-miR-3136) were selected to build a risk-score model on the training set. We validated this model in the testing set and in an independent dataset, showing that it is a distinct and consistent predictor of overall survival among LUAD patients. The model is an independent predictor of overall survival and also provides useful insight into stage I, stage II and stage III LUAD patients, so it could complement or further subdivide the current TNM staging system in clinical application.
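A miRNA risk-score model of this type usually computes a weighted sum of expression levels, with weights taken from a Cox regression fitted on the training set, and splits patients at a cut-off that is also fixed on the training set and then reused on validation data. The sketch shows only that scoring and stratification step; the coefficients, expression values and the median cut-off are hypothetical.

```python
import statistics


def risk_scores(expr, coefs):
    """expr: {patient: {miRNA: expression}}; coefs: {miRNA: Cox coefficient}."""
    return {pid: sum(coefs[m] * levels[m] for m in coefs) for pid, levels in expr.items()}


def stratify(scores, cutoff=None):
    """Label patients high/low risk; derive the cutoff on training data, then reuse it."""
    if cutoff is None:
        cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}, cutoff


coefs = {"hsa-miR-624": 0.5, "hsa-miR-767": -0.3}  # hypothetical coefficients
train_expr = {
    "P1": {"hsa-miR-624": 2.0, "hsa-miR-767": 1.0},
    "P2": {"hsa-miR-624": 0.0, "hsa-miR-767": 2.0},
    "P3": {"hsa-miR-624": 4.0, "hsa-miR-767": 0.0},
    "P4": {"hsa-miR-624": 1.0, "hsa-miR-767": 1.0},
}
train_groups, cut = stratify(risk_scores(train_expr, coefs))
test_groups, _ = stratify(risk_scores({"T1": {"hsa-miR-624": 1.0, "hsa-miR-767": 0.0}}, coefs), cutoff=cut)
print(train_groups, round(cut, 2), test_groups)
```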

Dr Christoph Bock

Center for Molecular Medicine, PI

Austrian Academy of Sciences, Vienna, Austria

Bioinformatics for Personalized Medicine: Looking Beyond the Genome

CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria
Max Planck Institute for Informatics, Saarbrücken, Germany

The complexity of the human body requires trillions of individual cells to integrate, interact, and strike the right balance between stability and plasticity. Key mechanisms underlying this extraordinary feat of self-organization are encoded in the human genome, yet there are additional levels of regulation that operate on top of the genomic DNA sequence, collectively referred to as the epigenome.
International consortia have mapped the human genome, epigenome, and transcriptome in hundreds of cell types. These maps are now being refined by ongoing single-cell sequencing projects, which will eventually give rise to a comprehensive catalog of all cells in the human body.
Contributing to these consortia and building on their data, we investigate the relevance of the human epigenome for personalized medicine, focusing on better diagnostics, adaptive therapies, and disease modeling.
We have developed bioinformatic methods for analyzing and interpreting DNA methylation data (reviewed in: Bock 2012 Nature Reviews Genetics), which contribute to their use as epigenetic biomarkers. Epigenome analysis will have an important role to play in a forward-looking, prediction-based approach to cancer therapy, inspired by the impact of computational methods on HIV therapy (reviewed in: Bock & Lengauer 2012 Nature Reviews Cancer).
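For bisulfite sequencing data, a basic building block of methylation analysis is the per-CpG methylation level (methylated reads divided by total reads), aggregated over a region and compared between groups of samples. The sketch is a generic illustration, not one of the specific tools referred to above; the minimum coverage of 5 reads and simple averaging are assumptions.

```python
def cpg_level(meth, unmeth, min_cov=5):
    """Fraction of methylated reads at one CpG, or None if coverage is too low."""
    cov = meth + unmeth
    return meth / cov if cov >= min_cov else None


def region_level(cpgs, min_cov=5):
    """Mean methylation over the sufficiently covered CpGs of one region in one sample.

    cpgs: list of (methylated_reads, unmethylated_reads) tuples.
    """
    levels = [lv for lv in (cpg_level(m, u, min_cov) for m, u in cpgs) if lv is not None]
    return sum(levels) / len(levels) if levels else None


def region_difference(group_a, group_b, min_cov=5):
    """Mean region methylation in group A minus group B (uncovered samples skipped)."""
    def group_mean(samples):
        vals = [v for v in (region_level(s, min_cov) for s in samples) if v is not None]
        return sum(vals) / len(vals) if vals else None

    a, b = group_mean(group_a), group_mean(group_b)
    return None if a is None or b is None else a - b


tumour = [[(9, 1), (8, 2), (1, 1)], [(10, 0), (7, 3)]]  # the (1, 1) CpG is below min_cov
normal = [[(1, 9), (2, 8)]]
print(region_level(tumour[0]), region_difference(tumour, normal))
```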
In our ongoing work, we co-develop computational and experimental methods for epigenome, transcriptome, and multi-omics profiling in single cells (reviewed in: Bock et al. 2016 Trends in Biotechnology). We have also established an assay and bioinformatic methods for CRISPR single-cell sequencing, which enables the large-scale functional analysis of regulatory mechanisms (Datlinger et al. 2017 Nature Methods), and we are applying this technology to dissect the role of the human epigenome in cancer and immune diseases.
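In CRISPR single-cell screens, each cell has to be linked to the guide RNA it received before expression can be compared across perturbations. The sketch assigns a guide when it clearly dominates that cell's guide-derived reads; the read and fraction thresholds are hypothetical and this does not reproduce the published pipeline.

```python
from collections import Counter


def assign_guides(guide_reads, min_reads=3, min_fraction=0.75):
    """guide_reads: {cell_barcode: list of gRNA IDs seen in that cell's reads}.

    Returns {cell_barcode: gRNA ID or None}; None marks unassigned or ambiguous cells.
    """
    assignment = {}
    for cell, reads in guide_reads.items():
        counts = Counter(reads)
        if not counts:
            assignment[cell] = None
            continue
        guide, n = counts.most_common(1)[0]
        total = sum(counts.values())
        assignment[cell] = guide if n >= min_reads and n / total >= min_fraction else None
    return assignment


cells = {
    "cellA": ["g1"] * 5 + ["g2"],        # g1 dominates
    "cellB": ["g1"] * 3 + ["g2"] * 3,    # likely doublet: ambiguous
    "cellC": ["g3"] * 2,                 # too few reads
}
print(assign_guides(cells))
```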

Christoph Bock is supported by a New Frontiers Group award of the Austrian Academy of Sciences and by an ERC Starting Grant (n° 679146).

Prof. Dr Wenzhong Xiao

MGH and Stanford Genome Technology Center

Harvard Medical School, Boston and Stanford University, California, U.S.A.

Modeling Human Metabolism for Precision Medicine

A long-term goal of computational biology is reliable in silico simulation of human health and disease. Here I will discuss our recent progress in systems-level modeling of human metabolism by the integration of human metabolic network models, enzyme kinetics, exchange rates of metabolites between tissues, and multi-omics data. We applied this approach of personalized semi-quantitative metabolic modeling to a study of muscle wasting after severe burn injury, and successfully described known metabolic dysfunctions in hospitalized patients. In silico gene knock-out and knock-in analyses predicted key genes in patients as potential targets for intervention. Developing knowledge-based multi-tissue models of human metabolism will be critical for personalized diagnosis and treatments of metabolic diseases.
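In silico knock-outs on metabolic network models are often performed with flux balance analysis (FBA): maximize an objective flux subject to steady state (S v = 0) and flux bounds, then repeat with a reaction's bounds forced to zero. The approach in this talk integrates more than FBA (enzyme kinetics, inter-tissue exchange, multi-omics data), so this is only a minimal sketch of the knock-out idea on a hypothetical four-reaction network, using scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network.
# Columns: uptake (-> A), R1 (A -> B), R2 (A -> B, low capacity), biomass (B ->)
REACTIONS = ["uptake", "R1", "R2", "biomass"]
S = np.array([
    [1, -1, -1, 0],   # metabolite A
    [0, 1, 1, -1],    # metabolite B
], dtype=float)
BOUNDS = [(0, 10), (0, 10), (0, 4), (0, None)]


def max_flux(S, bounds, objective, knockouts=()):
    """FBA: maximise one flux subject to S v = 0 and flux bounds.

    knockouts: indices of reactions forced to carry zero flux (in silico knock-out).
    """
    bounds = [(0, 0) if i in knockouts else b for i, b in enumerate(bounds)]
    c = np.zeros(S.shape[1])
    c[objective] = -1.0  # linprog minimises, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return -res.fun if res.success else None


wild_type = max_flux(S, BOUNDS, objective=3)
for i, name in enumerate(REACTIONS[:3]):
    print(name, "knock-out:", max_flux(S, BOUNDS, objective=3, knockouts=(i,)), "wild type:", wild_type)
```

A knock-in or over-expression can be explored the same way by raising a reaction's upper bound instead of setting it to zero.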

Dr Joaquin Dopazo

Clinical Bioinformatics Area, Director

Fundación Progreso y Salud, Seville, Spain

Use of models of signaling circuits for individualized prognosis and targeted therapeutic interventions

Clinical Bioinformatics Research Area, Fundación Progreso y Salud, Sevilla, Spain

Gene expression measurements or gene mutation data constitute low-informative, decontextualized values of cell activity or integrity that can be related to disease outcomes in an attempt to understand the molecular mechanisms that govern cell behavior or fate. Typically, functional enrichment methods (e.g. Gene Ontology enrichment) are used to help interpret the observed gene expression changes. However, a more comprehensive, systems-based understanding of the way in which genes interact to shape the phenotype is required. Here I will discuss how primary gene expression and gene mutation data can be integrated and transformed, by means of relatively simple models of signaling pathway activity, into mechanism-based biomarkers containing higher-level information on the molecular mechanisms that determine complex phenotypes such as disease outcome or drug effect. Moreover, I will show how these models can be used to propose knowledge-based therapeutic interventions.
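Signaling-circuit models of this kind typically propagate a signal through a pathway graph whose node values come from normalized gene expression. The sketch uses a propagation rule of the form used in published circuit-activity tools such as Hipathia (activating inputs combined as 1 - Π(1 - s), inhibitions applied as Π(1 - s)); treat it as an assumption, not as a description of the exact models in the talk. The circuit, node names and values are hypothetical.

```python
import math


def propagate(nodes, edges, values):
    """Propagate signal through a signaling circuit.

    nodes:  node names in topological order (every parent before its children)
    edges:  (source, target, sign) with sign +1 = activation, -1 = inhibition
    values: node value in [0, 1], e.g. normalised expression of the node's genes
    """
    signal = {}
    for node in nodes:
        act = [signal[s] for s, t, sign in edges if t == node and sign > 0]
        inh = [signal[s] for s, t, sign in edges if t == node and sign < 0]
        # Nodes without activators (circuit entry points) receive a full incoming signal of 1.
        incoming = 1.0 - math.prod(1.0 - a for a in act) if act else 1.0
        s = values[node] * incoming
        for i in inh:
            s *= 1.0 - i  # each inhibitor attenuates the signal
        signal[node] = s
    return signal


# A -> B -> C (effector), with D inhibiting C
sig = propagate(["A", "B", "D", "C"],
                [("A", "B", 1), ("B", "C", 1), ("D", "C", -1)],
                {"A": 0.5, "B": 0.8, "C": 1.0, "D": 0.5})
print(sig["C"])  # circuit activity read at the effector node
```

Comparing effector-node activity between patients, or after setting a node value to 0 to mimic a drug inhibiting it, is how such a model can suggest interventions.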

Prof. Dr Joanna Polańska

Data Mining Division, Head

Silesian University of Technology, Gliwice, Poland

Divisive intelligent k-means algorithm in support of identification of homogeneous clusters in high dimensional data

Application of MALDI imaging mass spectrometry (MSI) in cancer research allows spatial identification of molecular profiles and their heterogeneity within the tumour, but it produces highly complicated datasets of great volume. The data are extremely high-dimensional: a single tissue sample at low spatial and ion resolution yields 100,000 mass channels (features) × 10,000 spectra (observations). Besides the storage and computational load, a feature dimensionality of 100,000 causes additional problems, among which the convergence (concentration) of distances is one of the most important. MSI data contain redundancy, repetition, and non-informative signals. A dimensionality reduction technique should remove as much of this redundancy as possible while losing as little crucial information as possible. Dimensionality reduction techniques can be divided into two main groups: (1) feature selection and (2) feature extraction. Feature extraction transforms high-dimensional MSI data into a space with a much lower number of variables. Feature selection methods focus on choosing the most representative subset of features. In MALDI analyses, features are originally defined by peptides/lipids represented by peaks. Our spectrum modelling by Gaussian mixtures (GM) not only enables peptide abundance to be estimated more accurately, but also allows overlapping peaks to be resolved. GM modelling significantly reduces the number of features, but the problem of spatial redundancy remains unresolved.
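Gaussian mixture modelling of a spectrum can be illustrated with expectation-maximization along the m/z axis, where each m/z bin is weighted by its intensity. This is a minimal sketch under simplifying assumptions (known number of components, user-supplied starting values, a toy two-peak spectrum); the published method uses a signal-partitioning algorithm to make fitting efficient on full spectra, which is not reproduced here.

```python
import numpy as np


def fit_spectrum_gmm(mz, intensity, means, sds, n_iter=500):
    """Intensity-weighted EM for a 1-D Gaussian mixture along the m/z axis.

    mz, intensity: arrays describing one (e.g. average) spectrum
    means, sds:    starting values, one per component (number of peaks assumed known)
    Returns mixing weights, component means and standard deviations.
    """
    mz = np.asarray(mz, float)
    w = np.asarray(intensity, float)
    w = w / w.sum()  # treat intensities as bin weights
    mu, sd = np.array(means, float), np.array(sds, float)
    pi = np.full(len(mu), 1.0 / len(mu))
    for _ in range(n_iter):
        # E-step: responsibility of each component for each m/z bin
        dens = pi * np.exp(-0.5 * ((mz[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: intensity-weighted updates of weights, means and widths
        r = resp * w[:, None]
        nk = r.sum(axis=0)
        pi = nk / nk.sum()
        mu = (r * mz[:, None]).sum(axis=0) / nk
        sd = np.maximum(np.sqrt((r * (mz[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-6)
    return pi, mu, sd


def gaussian(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))


# Toy spectrum: two nearby peaks at m/z 1000 and 1003.
mz = np.linspace(990, 1010, 2001)
intensity = 0.7 * gaussian(mz, 1000, 0.5) + 0.3 * gaussian(mz, 1003, 0.7)
pi, mu, sd = fit_spectrum_gmm(mz, intensity, means=[999, 1004], sds=[1, 1])
print(np.round(pi, 2), np.round(mu, 2), np.round(sd, 2))
```

Each fitted component then serves as one feature (its abundance in a spectrum), which is how GM modelling reduces the number of features.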
We have also developed an intelligent spectra clustering algorithm, named divisive iK-means with region-driven feature selection (diviK), which consists of recursive sub-region splitting performed in a reduced feature domain customized independently for every sub-region to be split. Both of its elements, the iterative character of the algorithm and the sub-region-dependent reduction of the feature domain, help to discover hidden secondary tissue structure. The diviK technique was successfully applied in the search for molecular heterogeneity of tumour tissue among head & neck and thyroid cancer patients.
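The divisive idea can be sketched as recursive two-way splitting in which every sub-region gets its own reduced feature set. As simplifying assumptions, the sketch keeps the highest-variance features within each region, splits with plain 2-means (deterministic farthest-point initialization), and stops at a fixed depth or minimum region size; the actual diviK uses its own feature filtering, iK-means and stopping criteria, which are not reproduced here.

```python
import numpy as np


def two_means(X, n_iter=50):
    """Plain 2-means with deterministic farthest-point initialisation."""
    c0 = X[np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))]
    c1 = X[np.argmax(((X - c0) ** 2).sum(axis=1))]
    centres = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centres[k] = X[labels == k].mean(axis=0)
    return labels


def divisive_split(X, idx=None, depth=0, max_depth=2, min_size=10, keep_frac=0.5):
    """Recursively split spectra; each sub-region uses its own reduced feature set.

    Returns a list of index arrays, one per leaf region.
    """
    if idx is None:
        idx = np.arange(len(X))
    if depth >= max_depth or len(idx) < 2 * min_size:
        return [idx]
    sub = X[idx]
    n_keep = max(1, int(keep_frac * X.shape[1]))
    features = np.argsort(sub.var(axis=0))[::-1][:n_keep]  # region-specific selection
    labels = two_means(sub[:, features])
    if labels.min() == labels.max():  # no split found
        return [idx]
    leaves = []
    for k in range(2):
        leaves += divisive_split(X, idx[labels == k], depth + 1, max_depth, min_size, keep_frac)
    return leaves


# Toy data: 4 groups of 30 "spectra" with 4 features; features 2 and 3 only
# separate sub-groups inside the first-level regions.
rng = np.random.default_rng(0)
centres = [(5, 0, 2, 0), (5, 0, -2, 0), (-5, 0, 0, 2), (-5, 0, 0, -2)]
X = np.vstack([np.array(c, float) + rng.normal(0, 0.1, (30, 4)) for c in centres])
leaves = divisive_split(X)
print([len(leaf) for leaf in leaves])
```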

1. Pietrowska M, Diehl H, Mrukwa G, Kalinowska-Herok M, Gawin M, Chekan M, Elm J, Drazek G, Krawczyk A, Lange D, Meyer HE, Polanska J, Henkel C, Widlak P: Molecular profiles of thyroid cancer subtypes: classification based on features of tissue revealed by mass spectrometry imaging. Biochimica et Biophysica Acta – Proteins and Proteomics, 2016, S1570-9639(16): 30217-5
2. Widlak P, Mrukwa G, Kalinowska M, Pietrowska M, Chekan M, Wierzgon J, Gawin M, Drazek G, Polanska J: Detection of molecular signatures of oral squamous cell carcinoma and normal epithelium - application of a novel methodology for unsupervised segmentation of imaging mass spectrometry data. Proteomics, 2016, 16(11-12): 1613-21
3. Polanski A, Marczyk M, Pietrowska M, Widlak P, Polanska J: Signal Partitioning Algorithm for Highly Efficient Gaussian Mixture Modeling in Mass Spectrometry. PLoS ONE, 2015, 10(7): e0134256

Doz. Dr Peter Sykacek

Boku University Vienna, Austria

Towards malignancy directed subclassifying of complex diseases

Peter Sykacek
Department of Biotechnology
University of Natural Resources and Life Sciences
Muthgasse 18, 1190 Wien

The inherent complexity of diseases like cancer, which emerge in the same organ in various manifestations, renders the provision of optimal treatments a very complicated task. Subclassifying diseases like breast cancer or glioblastoma multiforme is thus an important step towards personalising treatments. Routine use of large-scale high-throughput methods in translational medical research has led to many initiatives trying to subdivide diseases on the basis of their molecular signatures. The current state of the art is to integrate diverse information sources, such as genetic and epigenetic variability, expression signatures at the mRNA, miRNA and protein levels, and sometimes even differences in phosphorylation states, and to perform data-driven clustering to arrive at prototypical signatures of diseases with different molecular manifestations. Established classifications are then used to assign cases to disease subtypes. Investigations of clinical parameters like patient survival are subsequently used to demonstrate the validity of disease classifications, for example by highlighting differences between subtype-specific Kaplan-Meier curves.

From a data-science perspective, such purely molecular-signature-driven classifications are not at all satisfying, as they summarise all case-specific differences, ignoring that the causes of these differences may have no relation to the disease of interest. The proposition of this presentation is to evaluate whether malignancy-directed filtering of molecular signatures can lead to a better understanding of disease subtypes. To this end, we propose using a mixture of GLMs with negative binomial observation models. Model inference is based on a variational Bayesian method. The relation of expression signatures to malignancy is captured by Bayes-factor-driven indicator variables, which provide a weight for subsequent clustering. An application of the proposed approach to a subset of the TCGA-GBM (glioblastoma) project suggests that the proposition is worth pursuing. Further improvements of the model are, however, essential.
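The weighting step can be illustrated in isolation. Given a log Bayes factor per gene for "related to malignancy" versus "not related", the posterior log-odds are the log Bayes factor plus the prior log-odds, and the resulting inclusion probability can down-weight unrelated genes in a distance used for clustering. In the sketch the log Bayes factors are taken as given (in the model described above they would come from variational Bayesian inference of the negative binomial GLM mixture), and the prior inclusion probability of 0.5 is an assumption.

```python
import math


def inclusion_probability(log_bf, prior=0.5):
    """P(gene is malignancy-related | data) from its log Bayes factor and a prior."""
    logit = log_bf + math.log(prior / (1.0 - prior))  # posterior log-odds
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)  # numerically stable branch for very negative log-odds
    return e / (1.0 + e)


def weighted_sq_distance(x, y, weights):
    """Squared Euclidean distance with per-gene weights (e.g. inclusion probabilities)."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))


# Hypothetical log Bayes factors: strongly, weakly and not malignancy-related genes.
weights = [inclusion_probability(lbf) for lbf in (5.0, 0.0, -5.0)]
print([round(w, 3) for w in weights])
print(weighted_sq_distance([1.0, 2.0, 3.0], [2.0, 2.0, 9.0], weights))
```

The large difference in the third gene barely affects the distance because that gene is judged unrelated to malignancy.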

Dr. Chen Suo

Fudan University, Shanghai, China

Integrative analysis of multi-omics data reveals potential driver genes in cancer

Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is common to have multiple omics data measured from each individual. Furthermore, there are rich functional data, such as the predicted impact of genomic alterations on protein coding and gene/protein networks. However, integrating the complex information across different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures.
An analysis pipeline was built to integrate genomic and transcriptomic alterations with functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of genomically altered potential driver genes found within and across patients. A driver-gene score (DGscore) was developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently altered, have a high impact on its RNA expression level, and be functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline was applied to 60 matched tumor and normal samples from the same patients in The Cancer Genome Atlas breast cancer project and to 48 high-risk neuroblastoma samples. In clinical validation, patients with high DGscores had worse survival than those with low scores.
In conclusion, integration of mutation, copy number alteration, expression and functional data allows identification of clinically relevant potential driver genes in cancer.
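The three conditions for contributing to the DGscore can be sketched as filters, with a patient's score counting how many qualifying genes are altered in that patient. The thresholds, the per-gene statistics and the simple count (rather than any weighting) are hypothetical; the abstract does not specify how the pipeline combines the evidence.

```python
def driver_genes(gene_stats, min_freq=0.1, min_impact=1.0, min_de_neighbors=5):
    """Genes meeting all three conditions (thresholds are hypothetical).

    gene_stats: {gene: (alteration_frequency, expression_impact, n_de_neighbors)}
      alteration_frequency: fraction of patients in which the gene is altered
      expression_impact:    e.g. |log2 fold change| of the gene's RNA when altered
      n_de_neighbors:       differentially expressed neighbours in the gene network
    """
    return {g for g, (freq, impact, n_nb) in gene_stats.items()
            if freq >= min_freq and impact >= min_impact and n_nb >= min_de_neighbors}


def dg_scores(altered_by_patient, drivers):
    """Per-patient score: number of qualifying genes altered in that patient."""
    return {p: len(genes & drivers) for p, genes in altered_by_patient.items()}


stats = {  # made-up numbers for illustration
    "GENE1": (0.30, 2.0, 12),
    "GENE2": (0.25, 1.5, 8),
    "GENE3": (0.40, 0.1, 2),   # frequent but no expression impact
    "GENE4": (0.05, 2.5, 9),   # too rarely altered
}
drivers = driver_genes(stats)
scores = dg_scores({"P1": {"GENE1", "GENE3"}, "P2": {"GENE1", "GENE2"}, "P3": {"GENE4"}}, drivers)
print(sorted(drivers), scores)
```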

Prof. Dr Johannes Grillari


Evercyte and Boku University Vienna, Austria

miRNAs as members of the senescence associated secretory phenotype influence cell fate decision

Lucia Terlecki-Zaniewicz, Sylvia Weilner, Ingo Lämmermann, Regina Weinmüllner, Hanna Dellago, Matthias Hackl, Markus Schosserer and Johannes Grillari

Christian-Doppler Laboratory on Biotechnology of Skin Aging, Department of Biotechnology, BOKU-University of Natural Resources and Life Sciences Vienna, Austria

Cellular senescence has evolved from an in vitro model system for studying aging into a multifaceted phenomenon of in vivo importance, since senescent cells have been identified in vivo and their removal delays the onset of age-associated diseases in a mouse model system. In order to understand how senescent cells that accumulate within organisms with age negatively impact organ and tissue function, we have started to characterize miRNAs and RNA-modifying proteins that are differentially expressed in early-passage versus senescent cells, as well as their functional roles in the context of cellular and organismal aging. Thereby, we identified circulating miRNAs as bona fide members of the senescence-associated secretory phenotype (SASP) that are transferred from senescent cells to their microenvironment or even the systemic environment. These miRNAs are transported via extracellular vesicles, and recipient cells that take them up are altered in their cell fate, including altered osteogenic differentiation of mesenchymal stem cells or altered wound healing of skin keratinocytes.
In summary, we present evidence of the importance of specific miRNAs and highlight their potential use as biomarkers of aging and age-associated diseases, or even as therapeutic tools and targets to prevent age-associated diseases.

Prof. Dr Dimitar Vassilev


Sofia University, Bulgaria

An Ontology-Based Approach for Integration of Clinical and Molecular Information for Assistance in Medical Diagnostics

A platform for the integration of clinical and biomedical (genetic, biochemical and metabolic) data is presented. A further aim is to federate different software tools for data and knowledge processing.
The platform is used in medical and healthcare practice and research, in particular for translational purposes in the prevention and diagnosis of rare diseases.
The resulting information and knowledge are also expected to be used for risk assessment in medical insurance and social healthcare.
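One common way an ontology supports integration is by propagating each annotation to all its ancestors (the true-path rule), so that findings recorded at different levels of detail become comparable. The sketch ranks candidate diseases by the Jaccard similarity of ancestor-closed term sets; the toy ontology, term names and scoring are hypothetical and are not taken from the platform described here.

```python
def ancestors(term, parents):
    """The term itself plus all its ancestors in an is-a hierarchy."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def annotate(terms, parents):
    """Ancestor-closed set of terms for a patient or a disease profile."""
    closed = set()
    for t in terms:
        closed |= ancestors(t, parents)
    return closed


def rank_diseases(patient_terms, disease_profiles, parents):
    """Rank diseases by Jaccard similarity of ancestor-closed term sets."""
    p = annotate(patient_terms, parents)
    scores = {}
    for disease, terms in disease_profiles.items():
        q = annotate(terms, parents)
        scores[disease] = len(p & q) / len(p | q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


parents = {  # hypothetical toy ontology (child -> parents)
    "seizure": ["neurological abnormality"],
    "ataxia": ["neurological abnormality"],
    "hepatomegaly": ["abdominal abnormality"],
    "neurological abnormality": ["phenotypic abnormality"],
    "abdominal abnormality": ["phenotypic abnormality"],
}
ranking = rank_diseases({"seizure"}, {"disease X": {"ataxia"}, "disease Y": {"hepatomegaly"}}, parents)
print(ranking)
```

Although the patient's "seizure" never matches "ataxia" directly, the shared ancestor makes disease X the closer candidate.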

Dr Andreas Posch

Managing Director

Ares Genetics GmbH, Vienna, Austria

The Genetic Antibiotic Resistance and Susceptibility Database (GEAR) as a Translational Research Platform for Precision Diagnostics in Infectious Diseases and Antibiotic Stewardship

Antimicrobial drug resistance is a major public health burden. Current reports on emerging drug resistance are usually limited to specific bacteria or small sets of drugs. Here we introduce the Genetic Antibiotic Resistance and Susceptibility Database (GEAR), the most comprehensive collection of pathogen genomes and antibiotic resistance profiles, developed in a translational research partnership with the Chair for Clinical Bioinformatics at Saarland University. GEAR contains the complete DNA sequences of more than 11,000 bacterial strains as well as related susceptibility data for 21 antibiotics. The strains were isolated from patient samples at over 200 sites across the world over the last three decades. GEAR makes it possible to assemble and annotate bacterial genomes from next-generation sequencing raw data, identify genetic variations in those genomes, and correlate them with the response of the respective bacterial strains to antibiotics. This talk will illustrate how GEAR can be used to advance our understanding of the genetic foundations of antibiotic resistance, to monitor and investigate global patterns of emerging antibiotic resistance, and to enable personalized and optimized treatment based on next-generation precision diagnostics.
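In its simplest form, correlating a genetic variant with resistance across strains reduces to a 2×2 contingency table (variant present/absent versus resistant/susceptible). The sketch computes an odds ratio and a two-sided Fisher's exact test in pure Python. It is a generic association test offered as an illustration, not GEAR's actual analysis; real analyses also have to account for population structure and multiple testing. The counts are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: variant present / absent; columns: resistant / susceptible.
    Sums the probabilities of all tables with the same margins that are no more
    likely than the observed one (integer arithmetic avoids rounding ties).
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def weight(x):  # numerator of the hypergeometric probability of a table with top-left x
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, c1)


def odds_ratio(a, b, c, d):
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Made-up counts: 8 of 10 strains with the variant are resistant vs 1 of 10 without it.
print(odds_ratio(8, 2, 1, 9), fisher_exact_two_sided(8, 2, 1, 9))
```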

Dr Michał Okoniewski

Scientific IT Services

ETH Zurich, Switzerland

Personalized Health Genomics Data: Solutions for Collaborative Analytics

The talk covers the experience and findings from personalised medicine projects related to collaborative data management and analytics of big genomic variant data.
Scientific IT Services of ETH performs various tasks in the context of the Swiss Personalised Health Network. In particular, it is carrying out a comparative study of collaborative data management software for genomics-based medicine. The software is expected to provide common data collection and processing mechanisms for the labs generating the data and for the data processing centres. For the data centre in Zurich, a specialised cluster for medical data, Leonhard-Med, has also been deployed.
A complementary big data analytics project at the Warsaw University of Technology aims at benchmarking genomic data warehouses implemented with distributed data storage and querying engines. The recently published results identify the preferred data storage and querying engines for specific queries on population data that reflect real medical questions.
Both projects give insights on various precision medicine data-related challenges and solutions.
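Population-level queries of the kind benchmarked in such studies often come down to aggregations over a genotype table. The sketch uses SQLite from the Python standard library purely for illustration (it is not one of the distributed engines discussed), with a hypothetical schema storing one row per sample and variant (including homozygous-reference genotypes), and computes per-population allele frequencies.

```python
import sqlite3

SCHEMA = """
CREATE TABLE samples (sample_id TEXT PRIMARY KEY, population TEXT);
CREATE TABLE genotypes (sample_id TEXT, chrom TEXT, pos INTEGER, alt TEXT, alt_count INTEGER);
"""

AF_QUERY = """
SELECT s.population, SUM(g.alt_count) * 1.0 / (2 * COUNT(*)) AS allele_freq
FROM genotypes g JOIN samples s ON g.sample_id = s.sample_id
WHERE g.chrom = ? AND g.pos = ? AND g.alt = ?
GROUP BY s.population
ORDER BY s.population
"""


def build_demo_db():
    """In-memory toy warehouse: three diploid samples genotyped at one variant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO samples VALUES (?, ?)",
                     [("S1", "A"), ("S2", "A"), ("S3", "B")])
    conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?, ?, ?)",
                     [("S1", "chr1", 100, "T", 1), ("S2", "chr1", 100, "T", 2),
                      ("S3", "chr1", 100, "T", 0)])
    return conn


def allele_frequency_by_population(conn, chrom, pos, alt):
    return conn.execute(AF_QUERY, (chrom, pos, alt)).fetchall()


conn = build_demo_db()
print(allele_frequency_by_population(conn, "chr1", 100, "T"))
```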

Prof. Dr Witold Rudnicki

Director of University Computing Center, Professor

University of Białystok & ICM, University of Warsaw

Knowledge discovery in data using Multidimensional Feature Selection

Witold R. Rudnicki∗,a,b,c), Krzysztof Mnich b), Szymon Migacz c), Paweł Tabaszewski c), Radosław Piliszek b), Aneta Polewko-Klim a) and W. Lesinski a)

a) Institute of Informatics, University of Bialystok, Bialystok, Poland
b) Computational Centre, University of Bialystok, Bialystok, Poland
c) Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
∗ presenting author, e-mail: w.rudnicki@uwb.edu.pl .

Keywords: All-relevant feature selection, mutual information, multi-dimensional filters.

Identification of informative variables is often the most important step in the analysis of a dataset, but it tends to be performed using quick filtering procedures. Unfortunately, variables that are highly relevant due to synergistic interactions with other variables may be lost when simple one-dimensional filtering criteria are used. Moreover, the relevance ranking does not take interactions with other variables into account.
Here we present MDFS [1], a software library devoted to identifying variables that carry information on the decision variable using a multi-dimensional approach. It implements an algorithm based on conditional mutual information and is provided as an R library with computational engines in C++ and CUDA C. It performs an exhaustive search of all low-dimensional subspaces of the system in reasonable time. To this end, the mutual information between the decision variable and every possible k-tuple of variables (k = 1-5) is computed. Then, for each variable, the maximal information gain due to interactions with other variables is obtained. For non-informative variables, this quantity follows well-known statistical distributions, which allows truly informative variables to be discerned from non-informative ones.
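The difference between one-dimensional filtering and a multi-dimensional search already shows up with two variables. The sketch computes, for discrete variables, the 1D information gain H(Y) - H(Y | X_i) and a 2D gain max_j [H(Y | X_j) - H(Y | X_i, X_j)]. It is a simplified illustration of the idea, not MDFS's implementation, which also handles discretization, pseudocounts and statistical testing. For an XOR-type decision, each relevant variable looks uninformative on its own but carries a full bit jointly with its partner.

```python
import itertools
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def cond_entropy(y, *xs):
    """H(Y | X1, ..., Xk) for discrete variables given as equal-length lists."""
    groups = {}
    for key, label in zip(zip(*xs), y):
        groups.setdefault(key, []).append(label)
    return sum(len(g) / len(y) * entropy(g) for g in groups.values())


def ig_1d(y, columns):
    h = entropy(y)
    return [h - cond_entropy(y, x) for x in columns]


def ig_2d(y, columns):
    """For each variable, the maximal gain from adding it to any one other variable."""
    return [max(cond_entropy(y, xj) - cond_entropy(y, xi, xj)
                for j, xj in enumerate(columns) if j != i)
            for i, xi in enumerate(columns)]


# Toy data: y = x0 XOR x1; x2 is irrelevant noise.
rows = list(itertools.product([0, 1], repeat=3)) * 10
columns = [list(col) for col in zip(*rows)]
y = [r[0] ^ r[1] for r in rows]
print([round(v, 3) for v in ig_1d(y, columns)])  # 1D filter sees nothing
print([round(v, 3) for v in ig_2d(y, columns)])  # 2D search finds x0 and x1
```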
MDFS can be applied to datasets described by millions of variables and containing hundreds of thousands of objects. The exhaustive search for pairwise synergistic effects in gene expression data for 1,000 objects and 20,000 genes takes less than a minute on a single GPU, while the 3D search takes less than 24 hours. Even a 4D analysis can be performed within a week on a medium-sized computational cluster equipped with GPUs.
The application of the library is demonstrated using datasets obtained from neuroblastoma patients, in which we search for genetic markers that carry information on differences in clinical endpoints (survival and death). In this work, we predict patient survival based on gene expression data obtained by RNA-seq and microarrays, described in [2], as well as copy number data described in [3]. To identify relevant variables, we used the standard t-test and MDFS. Random Forest classifiers were then built using the most important variables identified by each method.
In most cases, the MDFS method is more sensitive than the t-test. Moreover, Random Forest [4] models built using the most relevant features obtained from MDFS have lower error than those built using features from the t-test.

Research was supported by the Polish NSC, grant UMO-2013/09/B/ST6/01550.

[1] Mnich K., Rudnicki W.R., arXiv:1705.05756, 2017.
[2] Zhang W. et al., Genome Biol. 16, 2015.
[3] Theissen J. et al., Genes Chromosomes Cancer 53, 2014.
[4] Breiman L., Mach. Learn. 45, 2001.

Dr Serghei Mangul

Fellow of the Institute for Quantitative and Computational Biosciences


Profiling adaptive immune repertoires across 544 individuals from 53 GTEx tissues by RNA Sequencing

Assay-based approaches provide a detailed view of the adaptive immune system by profiling immunoglobulin (Ig) and T cell receptor (TCR) repertoires. However, these methods carry a high cost and lack the scale of standard RNA sequencing (RNA-Seq). Here we report the development of ImReP, a novel computational method for rapid and accurate profiling of adaptive immune repertoires from regular RNA-Seq data. ImReP can also accurately assemble complementarity-determining region 3 (CDR3) sequences, the most variable regions of immune receptors. We applied our method to 8,555 samples across 53 tissues from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. ImReP is able to efficiently extract receptor-derived reads from RNA-Seq data. Using ImReP, we have created a systematic atlas of immune sequences across a broad range of tissue types, most of which have not previously been studied for adaptive immune receptor repertoires. We also compared the GTEx tissues to track the flow of T and B cell clonotypes across immune-related tissues, including secondary lymphoid organs and organs encompassing mucosal, exocrine, and endocrine sites, and we examined the compositional similarities of clonal populations between these tissues. The Atlas of Immune Repertoires (The AIR), freely available at https://smangul1.github.io/TheAIR/, is one of the largest collections of CDR3 sequences and tissue types. We anticipate that this resource will enhance future immunology studies and advance the development of personalized therapies for human diseases. In particular, the B and T cell diversity obtained by ImReP can be used as a universal biomarker for cancer immunotherapy. ImReP is freely available at https://sergheimangul.wordpress.com/imrep/
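Repertoire diversity of the kind mentioned above is commonly summarized from clonotype counts. The sketch treats each distinct CDR3 sequence as one clonotype and reports richness, Shannon entropy and clonality (1 minus entropy normalized by its maximum); these particular summaries are our illustrative choice, and the sequences in the usage example are made up.

```python
import math
from collections import Counter


def repertoire_diversity(cdr3_sequences):
    """Richness, Shannon entropy (nats) and clonality of a list of CDR3 sequences."""
    counts = Counter(cdr3_sequences)
    n = sum(counts.values())
    shannon = -sum(c / n * math.log(c / n) for c in counts.values())
    richness = len(counts)
    # clonality = 1 - Shannon / log(richness): 0 = all clonotypes equally frequent, 1 = monoclonal
    clonality = 1.0 - shannon / math.log(richness) if richness > 1 else 1.0
    return {"richness": richness, "shannon": shannon, "clonality": clonality}


polyclonal = ["CASSLGQGYEQYF", "CASSPGTEAFF", "CAWSVGNTIYF", "CASSQDRGNQPQHF"]
expanded = ["CASSLGQGYEQYF"] * 5
print(repertoire_diversity(polyclonal))
print(repertoire_diversity(expanded))
```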

Prof. Dr Haja Kadarmideen

Head of Department, Professor

Technical University of Denmark, Lyngby, Denmark

Applications of computational and systems genomics methods towards personalized medicine in obesity

Haja N Kadarmideen, Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kongens Lyngby, Denmark

Genome-wide association studies (GWAS) have been used very widely to identify the genetic architecture of complex diseases and traits in human genetics. It has been shown that GWAS is statistically and computationally less attractive than the whole genomic prediction (WGP) method when the aim is to predict the genetic risk of inheriting a disease or to stratify patients for disease treatment. The WGP method fits all genomic variants in one step in a single statistical model and avoids the multiple testing inherent to GWAS. WGP models can be fitted for multiple diseases in the same run and thus readily identify pleiotropic genetic variants. This talk will provide an overview of large-scale meta-GWAS applied to childhood obesity cohorts, followed by the theory and applications of WGP methods. To understand the functional and systems biology of the identified genes, it is often necessary to profile gene expression in multiple tissues relevant to obesity and diabetes. While differential gene expression methods are applied to identify regulatory mechanisms, gene co-expression network (GCN) methods have attractive computational and statistical properties. This talk will cover both the theory and applications of GCN to obesity and metabolic diseases, specifically the WGCNA and PCIT methods. Finding disease-causing genes and pathogenic variants is cumbersome, so we adapt systems genomics methods that integrate genomic data with transcriptomic data to identify both regulatory and causal variants (cis and trans eQTLs/eSNPs); these methods are easily extended to other -omics datasets. This talk will also present a new method (called WISH) that goes beyond additive GWAS models to study genome-wide genetic interactions (pairwise epistasis). The WISH method implements ultra-fast calculations of genome-wide pairwise epistasis, computes epistatic variance components of complex diseases, builds genetic interaction networks based on scale-free topology, and provides visualization tools. Applications of WISH to obesity and metabolic diseases in a pig model of obesity will be discussed.
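The contrast between single-marker GWAS and whole genomic prediction can be illustrated with ridge regression on marker genotypes (SNP-BLUP-style), one common form of WGP. Here every variant's effect is estimated jointly rather than tested one at a time. This is a minimal stand-in, not the WGP implementation from the talk. It assumes the following:

* Genotypes are coded 0/1/2.
* The shrinkage parameter `lam` is fixed and positive.
* The system (Z'Z + λI)b = Z'y_c is solved by Gauss-Seidel iteration, where Z is the column-centred genotype matrix and y_c the centred phenotype.

All names and the toy data are hypothetical.

```python
def ridge_gauss_seidel(X, y, lam, n_iter=200):
    """Jointly estimate all marker effects b from (Z'Z + lam*I) b = Z'y_c.

    Z is the column-centred genotype matrix and y_c the centred phenotype.
    lam must be > 0 so monomorphic markers (all-zero columns) do not divide by zero.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # equals y_c - Z b while all b_j are 0
    zz = [sum(Z[i][j] ** 2 for i in range(n)) for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Z_j' resid + zz_j * b_j = Z_j' y_c minus the other markers' contributions.
            rhs = sum(Z[i][j] * resid[i] for i in range(n)) + zz[j] * b[j]
            new_bj = rhs / (zz[j] + lam)
            delta = new_bj - b[j]
            for i in range(n):  # keep residuals in sync with the updated effect
                resid[i] -= Z[i][j] * delta
            b[j] = new_bj
    return b, means, y_mean


def predict(x_new, b, means, y_mean):
    """Genomic prediction: phenotype mean plus the sum of all marker effects."""
    return y_mean + sum(bj * (xj - mj) for bj, xj, mj in zip(b, x_new, means))


# Hypothetical toy data: 6 individuals x 3 markers, phenotype = 2*m0 - m2.
X = [[0, 1, 2], [1, 0, 1], [2, 1, 0], [1, 2, 1], [0, 0, 1], [2, 2, 2]]
y = [-2.0, 1.0, 4.0, 1.0, -1.0, 2.0]
b, means, y_mean = ridge_gauss_seidel(X, y, lam=0.1)
print([round(v, 2) for v in b])
```

Because shrinkage replaces per-marker significance testing, no multiple-testing correction is applied. The same solver could be rerun per trait, although multi-trait WGP models estimate the traits jointly.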

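For the co-expression network part of the abstract, WGCNA builds an unsigned network in two steps. First, it raises the absolute Pearson correlation between two genes to a soft-thresholding power β, so that a_ij = |cor(x_i, x_j)|^β. Second, it summarizes each gene by its connectivity k_i, the sum of its adjacencies to all other genes. The sketch below reimplements just those two steps in plain Python. It is not the WGCNA R package. β = 6 is only an illustrative choice; in practice β is picked so the network approximately fits scale-free topology. The gene names and expression values are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta (no self-edges)."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            adj[gi][gj] = adj[gj][gi] = a
    return adj


def connectivity(adj):
    """Whole-network connectivity k_i = sum of a_ij over all j != i."""
    return {g: sum(row.values()) for g, row in adj.items()}


# Hypothetical expression of three genes across five samples.
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 3, 4, 1, 2],
}
adj = soft_threshold_adjacency(expr, beta=6)
print({g: round(k, 4) for g, k in connectivity(adj).items()})
# {'geneA': 1.2621, 'geneB': 1.2621, 'geneC': 0.5243}
```

Raising correlations to a power keeps strong co-expression (|r| = 1 stays 1) while shrinking weak links toward zero (|r| = 0.8 becomes about 0.26). This is why soft thresholding avoids picking a hard correlation cutoff.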
Dr Hong Fang

Senior Bioinformatician, Coordinator


Employing FDALabel Database: Extracting Pharmacogenomics Information to Advance the Study of Precision Medicine

Hong Fang1, Ryley Uber1, Zhichao Liu1, Joshua Xu1, Shraddha Thakkar1, Shashi Amur2, Padmaja Mummaneni2, Minjun Chen1, Baitang Ning1, Steve Harris1, Guangxu Zhou1, Leihong Wu1, Paul Howard1, Weida Tong1

1 National Center for Toxicological Research, U.S. Food and Drug Administration (FDA), Jefferson, AR 72079;
2 Office of Translational Science (OTS), Center for Drug Evaluation and Research (CDER), FDA, Silver Spring, MD 20993, USA

Pharmacogenomics (PGx) focuses on how genomics and genetic variants (inherited and acquired) affect drug response. A better understanding of the association between genetic markers and individual phenotypes may improve therapy by enhancing drug efficacy and safety, and may advance precision medicine. The FDALabel database (https://rm2.scinet.fda.gov/druglabel/#simsearch-0) was developed from the FDA's Structured Product Labeling (SPL) repository. It allows users to perform full-text and customizable searches of labeling sections (e.g., Boxed Warning, Warnings and Precautions, and Adverse Reactions (AR)). In this study, 48 known biomarkers were used to query PGx-relevant content from the FDALabel database, including the Indications, Clinical Pharmacology, Clinical Studies, and Use in Specific Populations sections. As a result, we identified 163 of 1,129 small-molecule drugs with PGx biomarker information. Furthermore, statistical analysis, pattern recognition, and network visualization were applied to investigate the association of drug efficacy and severe ARs with PGx biomarkers and subpopulations. The results indicated that these drugs have a higher association with certain ARs in specific patient subpopulations (e.g., a higher association between CYP2D6 poor metabolizers and ARs caused by drugs for the treatment of psychiatric disorders). They also cover a broad range of therapeutic classes (e.g., Psychiatry, Cardiology, Oncology, and Endocrinology). The FDALabel database (freely available to the public) provides a convenient tool to navigate and extract PGx information from FDA-approved drug labeling. The knowledge gained from these drugs and biomarkers in this study will enhance the understanding of PGx to advance precision medicine.
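The abstract does not name the statistical test behind the biomarker-AR associations. For a label-level question such as "do labels mentioning CYP2D6 poor metabolizers report a given AR more often?", one common choice is a one-sided Fisher's exact test on a 2×2 table of drug counts. The sketch below computes that p-value directly from the hypergeometric distribution, together with the sample odds ratio. The function names are hypothetical. The counts in the usage lines are invented for illustration and are not results from this study.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Rows: drug label mentions the biomarker (yes / no).
    Columns: drug label reports the adverse reaction of interest (yes / no).
    X is hypergeometric given the fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # comb() returns 0 when k > n, so impossible tables contribute nothing.
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); returned as inf when b*c is zero (a simplification)."""
    return (a * d) / (b * c) if b * c else float("inf")


table = (8, 4, 10, 60)  # hypothetical drug counts, not study data
print(odds_ratio(*table))  # 12.0
print(fisher_one_sided(*table))
```

When testing many biomarker-AR pairs this way, the p-values would still need a multiple-testing correction. For cross-checking, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided test.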