In the Laboratory of Functional and Structural Genomics we examine how the expression levels of selected genes depend on their location in three-dimensional space. In addition, we use structural information to enrich sequence-based genomic analysis in order to better define the function of selected genomic regions that are important in the context of personalized medicine.

For this purpose, we first develop a variety of large-scale computational tools for analyzing whole-genome sequences, identifying structural variants, and determining the statistical significance of the observed copy number of genomic regions in selected cohorts of patients. Second, we evaluate the uniqueness of the observed changes by comparing them with the typical, natural genomic diversity that has been cataloged, for example, by the 1000 Genomes Project Consortium. Third, we infer the biological function of these genomic regions using publicly available databases. Fourth, we identify the unique local three-dimensional environment of selected sites, e.g. regulatory ones. Fifth, we analyze the impact of structural rearrangements of those local neighborhoods on gene expression profiles, which is related to the presence of transcription factories.
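As an illustration of the first two steps, the sketch below shows one common way to ask whether a copy-number change is over-represented in a patient cohort relative to a reference panel: a one-sided Fisher's exact test on a 2x2 table of carriers and non-carriers. This is a minimal sketch and not necessarily the method used in the laboratory's tools. The function name `fisher_exact_greater` and all counts in the usage lines are hypothetical; only the reference panel size of 2,504 individuals is taken from the 1000 Genomes Project publication listed below.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: carriers among patients        b: non-carriers among patients
    c: carriers in the reference      d: non-carriers in the reference

    Returns P(X >= a) under the hypergeometric null hypothesis that
    carrier status is independent of cohort membership.
    """
    n_patients = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    # Number of ways to draw the patient cohort from everyone.
    denom = comb(n_total, n_patients)
    # Largest possible number of carriers among the patients.
    upper = min(n_patients, n_carriers)
    numer = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_patients - k)
        for k in range(a, upper + 1)
    )
    return numer / denom


# Hypothetical counts: 12 of 200 patients carry the variant,
# compared with 25 of 2,504 reference individuals.
p_value = fisher_exact_greater(12, 200 - 12, 25, 2504 - 25)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value would suggest that the variant is more frequent in the patient cohort than expected from natural diversity alone. In a genome-wide scan, the p-values would also need correction for multiple testing.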


Michał Łaźniewski
researcher
PhD

Miguel Ángel Lermo Jiménez
researcher
PhD student

Abstract: We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

Authors: Poznik GD, Xue Y, Mendez FL, Willems TF, Massaia A, Wilson Sayres MA, Ayub Q, McCarthy SA, Narechania A, Kashin S, Chen Y, Banerjee R, Rodriguez-Flores JL, Cerezo M, Shao H, Gymrek M, Malhotra A, Louzada S, Desalle R, Ritchie GR, Cerveira E, Fitzgerald TW, Garrison E, Marcketta A, Mittelman D, Romanovitch M, Zhang C, Zheng-Bradley X, Abecasis GR, McCarroll SA, Flicek P, Underhill PA, Coin L, Zerbino DR, Yang F, Lee C, Clarke L, Auton A, Erlich Y, Handsaker RE, 1000 Genomes Project Consortium, Bustamante CD, Tyler-Smith C

Note: 'Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences' by Poznik GD, Xue Y, Mendez FL, Willems TF, Massaia A, Wilson Sayres MA, Ayub Q, McCarthy SA, Narechania A, Kashin S, Chen Y, Banerjee R, Rodriguez-Flores JL, Cerezo M, Shao H, Gymrek M, Malhotra A, Louzada S, Desalle R, Ritchie GR, Cerveira E, Fitzgerald TW, Garrison E, Marcketta A, Mittelman D, Romanovitch M, Zhang C, Zheng-Bradley X, Abecasis GR, McCarroll SA, Flicek P, Underhill PA, Coin L, Zerbino DR, Yang F, Lee C, Clarke L, Auton A, Erlich Y, Handsaker RE, 1000 Genomes Project Consortium, Bustamante CD, Tyler-Smith C. Nat Genet. 2016 Apr 25. doi: 10.1038/ng.3559.

Abstract: Protein–protein interactions (PPIs) play a vital role in most biological processes. Hence their comprehension can promote a better understanding of the mechanisms underlying living systems. However, besides the cost and the time limitation involved in the detection of experimentally validated PPIs, the noise in the data is still an important issue to overcome. In the last decade several in silico PPI prediction methods using both structural and genomic information were developed for this purpose. Here we introduce a unique validation approach aimed to collect reliable non interacting proteins (NIPs). Thereafter the most relevant protein/protein-pair related features were selected. Finally, the prepared dataset was used for PPI classification, leveraging the prediction capabilities of well-established machine learning methods. Our best classification procedure displayed specificity and sensitivity values of 96.33% and 98.02%, respectively, surpassing the prediction capabilities of other methods, including those trained on gold standard datasets. We showed that the PPI/NIP predictive performances can be considerably improved by focusing on data preparation.

Authors: Srivastava A, Mazzocco G, Kel A, Wyrwicz LS, Plewczynski D

Note: 'Detecting reliable non interacting proteins (NIPs) significantly enhancing the computational prediction of protein-protein interactions using machine learning methods' by Srivastava A, Mazzocco G, Kel A, Wyrwicz LS, Plewczynski D. Mol Biosyst. 2016 Jan 7.

Authors: Plewczynski D, Gruca S, Szałaj P, Gulik K, de Oliveira SF, Malhotra A

Note: 'Analysis of Structural Chromosome Variants by Next Generation Sequencing Methods' by Plewczynski D, Gruca S, Szałaj P, Gulik K, de Oliveira SF, Malhotra A. Book chapter in 'Clinical Applications for Next-Generation Sequencing', Elsevier, 2015.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Authors: 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR

Note: 'A global reference for human genetic variation' by 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. Nature. 2015 Oct 1;526(7571):68-74. doi: 10.1038/nature15393.

Abstract: Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

Authors: Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M, Konkel MK, Malhotra A, Stütz AM, Shi X, Paolo Casale F, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJ, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HY, Jasmine Mu X, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer EW, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA; 1000 Genomes Project Consortium, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO

Note: 'An integrated map of structural variation in 2,504 human genomes' by Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M, Konkel MK, Malhotra A, Stütz AM, Shi X, Paolo Casale F, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJ, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HY, Jasmine Mu X, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer EW, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA; 1000 Genomes Project Consortium, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO. Nature. 2015 Oct 1;526(7571):75-81. doi: 10.1038/nature15394.

Abstract: Glyceraldehyde-3-phosphate dehydrogenase from human sperm (GAPDHS) provides energy to the sperm flagellum, and is therefore essential for sperm motility and male fertility. This isoform is distinct from somatic GAPDH, not only in being specific for the testis but also because it contains an additional amino-terminal region that encodes a proline-rich motif that is known to bind to the fibrous sheath of the sperm tail. By conducting a large-scale sequence comparison on low-complexity sequences available in databases, we identified a strong similarity between the proline-rich motif from GAPDHS and the proline-rich sequence from Ena/vasodilator-stimulated phosphoprotein-like (EVL), which is known to bind an SH3 domain of dynamin-binding protein (DNMBP). The putative binding partners of the proline-rich GAPDHS motif include SH3 domain-binding protein 4 (SH3BP4) and the IL2-inducible T-cell kinase/tyrosine-protein kinase ITK/TSK (ITK). This result implies that GAPDHS participates in specific signal-transduction pathways. Gene Ontology category-enrichment analysis showed several functional classes shared by both proteins, of which the most interesting ones are related to signal transduction and regulation of hydrolysis. Furthermore, a mutation of one EVL proline to leucine is known to cause colorectal cancer, suggesting that mutation of homologous amino acid residue in the GAPDHS motif may be functionally deleterious.

Authors: Tatjewski M, Gruca A, Plewczynski D, Grynberg M

Note: 'The proline-rich region of glyceraldehyde-3-phosphate dehydrogenase from human sperm may bind SH3 domains, as revealed by a bioinformatic study of low-complexity protein segments' by Tatjewski M, Gruca A, Plewczynski D, Grynberg M. Mol Reprod Dev. 2015 Dec 11. doi: 10.1002/mrd.22606.

Abstract: The aftermath of influenza infection is determined by a complex set of host-pathogen interactions, where genomic variability on both viral and host sides influences the final outcome. Although there exists large body of literature describing influenza virus variability, only a very small fraction covers the issue of host variance. The goal of this review is to explore the variability of host genes responsible for host-pathogen interactions, paying particular attention to genes responsible for the presence of sialylated glycans in the host endothelial membrane, mucus, genes used by viral immune escape mechanisms, and genes particularly expressed after vaccination, since they are more likely to have a direct influence on the infection outcome.

Authors: Arcanjo AC, Mazzocco G, de Oliveira SF, Plewczynski D, Radomski JP

Note: 'Role of the host genetic variability in the influenza A virus susceptibility' by Arcanjo AC, Mazzocco G, de Oliveira SF, Plewczynski D, Radomski JP. Acta Biochim Pol. 2014; 61(3):403-19. Epub 2014 Sep 4.