Monday, July 16
 

09:00 MSK

Registration
Monday July 16, 2018 09:00 - 09:45 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

09:45 MSK

Opening Ceremony
Speakers

Sergey Aplonov

Vice Rector for Research, Saint Petersburg State University


Monday July 16, 2018 09:45 - 09:55 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

09:55 MSK

Remarks
Speakers

Anton Korobeynikov

Associate Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 199034 St Petersburg, Russia

Alla Lapidus

Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 199034 St Petersburg, Russia


Monday July 16, 2018 09:55 - 10:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

10:00 MSK

Tools to Link Human and Environmental Microbiomes for Health
The rapid decline in cost of sequencing technology together with advances in computational techniques has led to the possibility of integrating microbial knowledge across spatial and temporal scales. In this talk, I describe approaches developed for the Human Microbiome Project that allow us to map microbes from birth to death and across the body. I also describe how these human-associated microbial communities relate to those in the environment. Finally, I show how we can integrate chemical and microbial mapping to understand systems like the cystic fibrosis lung, and, ultimately, to take control of our own gut microbiology to improve our health.

Speakers

Rob Knight

Professor, UC San Diego
Ph.D., Professor at the University of California, San Diego, Co-founder of the Earth Microbiome Project


Monday July 16, 2018 10:00 - 11:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:00 MSK

Break
Monday July 16, 2018 11:00 - 11:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:20 MSK

Exploring the microbiome of primates using cell-free DNA
Blood circulates throughout the body and contains molecules drawn from almost every tissue. What can we learn by studying the circulating nucleic acids in it? Using high-throughput, non-targeted shotgun sequencing of circulating cell-free DNA from plasma, we detect, besides sequences from the host, those from bacteria, archaea, eukaryotic parasites and viruses. After careful host subtraction and iterative stages of assembly and annotation, there are thousands of microbial contigs over 1 kbp. The majority of these have predicted coding sequences; however, most have little or no homology to existing sequences. The presence of the novel sequences was validated using independent sequencing experiments and direct PCR amplification. Known sequences support many prior observations of taxa present in primate microbiomes. The structure of the microbiome detected in blood is stable for a few months and correlates with the environment more strongly than with the host species, although viruses have a host-taxa association. Numerous potentially zoonotic taxa can be identified in an unbiased manner, and this, together with the breadth of novel taxa, shows that microbial diversity and the need to monitor environmental reservoirs are greater than previously appreciated.

Speakers

Mark Kowarsky

Stanford University


Monday July 16, 2018 11:20 - 11:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:40 MSK

Soil microbiome as a keystone factor of soil genesis - the insight from the large scale study of heterogeneous soil chronosequences.
As soil is the most complex ecosystem in terms of microbial biodiversity, models that differentiate ecological factors at every stage of soil evolution are strongly needed. The best models are soil chronosequences, where soil formation occurs from the initial stage to the embryonic soil containing the horizons of the zonal soil type. The study characterizes the microbiomes of a set of chronosequences in various climatic zones: mining sites, post-pyrogenic soils, Antarctic moraines and soils of the coastal transgression. In all chronosequences the soil evolutionary stages were described, and replicated samples were taken from the entire soil profile. Amplicon libraries of the 16S rRNA gene were sequenced on the Illumina MiSeq platform. Quantitative approaches were used to estimate the abundance of bacteria, archaea and fungi in the soil samples. Bioinformatics analysis included both traditional and original methods. The profile analysis revealed a relationship between microbiome structure and soil genesis, especially the processes of decomposition and the ongoing mineralization of organic residues. It was also shown that the strength of the ecological factors determining the structure of microbiomes depended on the climatic zone. The main factors were soil pH and vegetation for temperate climates, water regime and deglaciation for polar and semi-polar regions, and the amount of mineralized organic matter for post-pyrogenic soils. The study was funded by RSF № 14-26-00094p (soil sampling) and RSF № 17-16-01030 (sequencing).

Speakers

Elizaveta Pershina

All-Russia Research Institute for Agricultural Microbiology


Monday July 16, 2018 11:40 - 12:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

12:00 MSK

QIIMP: Microbiome metadata made easy
Drawing meaningful conclusions from even the best microbiome data is impossible without accurate and relevant metadata about the samples. However, researchers routinely struggle to record complete, consistent metadata that meet international minimum information standards. The Center for Microbiome Innovation has therefore developed QIIMP, the Quick and Intuitive Interactive Metadata Portal, which guides researchers through the generation of high-quality, standards-compliant metadata files. I will introduce QIIMP and demonstrate how it integrates with and improves upon existing metadata handling approaches.

Speakers

Amanda Birmingham

University of California, San Diego


Monday July 16, 2018 12:00 - 12:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

12:20 MSK

New algorithms and tools for large-scale sequence analysis of metagenomics data
Sequencing costs have dropped much faster than Moore's law in the past decade. The analysis of large metagenomic datasets, not their generation, is now the main time and cost bottleneck. We present three methods that together greatly alleviate the challenges posed by the exploding amount of metagenomics data and that allow us to go from experiment-by-experiment analysis to large-scale analyses of hundreds or thousands of datasets.
MMseqs2 is a protein sequence and profile search method slightly more sensitive than PSI-BLAST and 400 times faster. MMseqs2 can annotate 1.1 billion sequences in 8.3 hours on 28 cores. MMseqs2 offers great potential to increase the fraction of annotatable (meta)genomic sequences. Linclust is a sequence clustering method whose run time scales linearly with the input set size, not nearly quadratically as in conventional algorithms. It can cluster 1.6 billion metagenomic sequence fragments in 10 hours on a single server to 50% sequence identity, >1000 times faster than has been possible previously. PLASS (unpublished) is a metagenomic protein sequence assembler whose runtime and memory scale linearly with dataset size. It can assemble ten times more protein sequences from soil metagenomes, and faster than Megahit and other popular nucleotide-level assemblers.

Speakers

Johannes Soeding

Max Planck Institute for Biophysical Chemistry

Martin Steinegger

Max Planck Institute for Biophysical Chemistry


Monday July 16, 2018 12:20 - 12:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

12:40 MSK

Lunch
Monday July 16, 2018 12:40 - 14:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

14:00 MSK

Culture-free generation of highly contiguous microbial genomes from human and marine microbiomes
There are more than 1,000 species of bacteria, viruses and fungi that live in the human gut. Far from being passive passengers, these organisms strongly interact with host metabolism, the immune system, and more. For all of this interaction, the dynamics between human hosts and bacteria (the microbiome) have only been explored in earnest for the last fifteen to twenty years. Compelling early experiments have shown that intestinal microbiome composition is associated with obesity, cardiovascular diseases, and the effectiveness of certain cancer chemotherapies. Therefore, understanding the impact of microbiome speciation on non-communicable diseases such as cancer, hematological and cardiometabolic disorders is fundamental to our health care. But how does one begin to model the dynamics of >1,000 mostly unsequenced species and strains of bacteria, viruses and fungi? Our translational laboratory develops and applies novel molecular and computational tools to study strain-level dynamics of the microbiome, to understand how microbial genomes change over time, and to predict the functional output of microbiomes. These innovations allow us to better (1) measure the types and functions of microbes in patients with non-communicable diseases, (2) iterate interaction models between microbial genes, gene products, and host cells and (3) test the impact of microbially targeted interventions in clinical trials.

Speakers

Ami Bhatt

Assistant Professor, Stanford University
Ph. D., Assistant Professor, Departments of Medicine and Genetics, Divisions of Hematology and BMT


Monday July 16, 2018 14:00 - 15:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

15:00 MSK

Assembling a (diploid/polyploid) genome to perfection: the case of the bdelloid rotifer Adineta vaga
Genome scientists commonly turn a blind eye to heterozygosity by aiming to reconstruct a non-redundant haploid genome. As a result of this methodological bias, most short-read assemblers available to date are incapable of resolving diploid or polyploid genomes. Theoretically, the use of very long third-generation sequencing reads (such as PacBio and Nanopore reads) and/or the long-distance information provided by chromosome conformation capture (3C) should solve the problem of diploid/polyploid genome assembly and produce chromosome-scale, haplotype-specific assemblies, but fully resolved heterozygous genomes are still extremely rare in the literature. As a test of the potential of these approaches to deliver "perfect" assemblies, we turned to the reasonably sized genome of the bdelloid rotifer Adineta vaga (expected size: 244 Mb). Bdelloid rotifers are famous for their tens of millions of years of evolution in the apparent absence of meiotic sex (only females have ever been observed, and they produce eggs clonally via mitotic parthenogenesis). Such a long evolution without recombination is expected to result in a complex genome structure replete with palindromes and colinearity breakpoints, a hypothesis that can only be tested by assembling all haplotypes separately. In 2013, we published a first diploid (actually, tetraploid) draft genome of Adineta vaga, the assembly of which from 21X 454 reads took six weeks of computation on a 64-core, 256-GB RAM server using MIRA; the final N50s were 47 kb for contigs and 260 kb for scaffolds. To finish this genome and produce a fully resolved diploid assembly, we have now generated 100X 2×250 bp Illumina paired-end reads, 100X PacBio, 100X Nanopore and 150X 3C data from the same clonal lineage.
Despite this plethora of data and our use of a panel of state-of-the-art approaches (including custom-developed ones), generating a perfect, telomere-to-telomere assembly turned out to be more difficult than expected and required a significant amount of manual refinement. We are now developing novel tools to automate and streamline these approaches, with the aim of making perfect assemblies the norm rather than the exception.
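
For readers unfamiliar with the N50 figures quoted in this abstract: N50 is the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly. The following is a minimal Python sketch of that definition. The function name and the contig lengths are illustrative assumptions and are not taken from the Adineta vaga assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (made up for illustration).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about whether the contigs are correct.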

Speakers

Jean-François Flot

Université libre de Bruxelles


Monday July 16, 2018 15:00 - 15:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

15:20 MSK

Main results from the 3,000 rice genomes sequences
Analysis of the 3,000 rice genomes, sequenced by CAAS, BGI and IRRI, revealed an unprecedented amount of genome diversity in Oryza sativa, defined population structure with remarkable precision, identified genes and genome regions with unusual conservation, and discovered the haplotype structure of domestication genes. Combining sequence data with available phenotypic information, we were able to find new trait-locus associations and to confirm previously known associations. Our portal SNP-Seek is commonly used for allele mining and visualization of genome variations.

Speakers

Nickolai Alexandrov

Inari Agriculture, Inc.


Monday July 16, 2018 15:20 - 15:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

15:40 MSK

Protein storage in plant seeds is associated with the amyloid formation
Amyloids are protein fibrils exhibiting an ordered spatial structure known as cross-β. This structure gives amyloids extreme resistance to various chemical and physical influences. Historically, amyloids were generally considered lethal pathogens causing dozens of incurable disorders in humans and animals, but it has recently become clear that amyloids are essential functional quaternary protein structures involved in various biological functions in archaea, bacteria and eukaryotes. Despite their social importance, plants remain the only large group of multicellular organisms in which amyloids have not been found. We performed a large-scale bioinformatics analysis of the distribution of potentially amyloidogenic regions in the proteomes of 75 species of land plants, comprising about 2.9 million proteins. Using two bioinformatics tools, Waltz and SARP, we demonstrated that potentially amyloidogenic proteins are widespread in plant proteomes, in numbers comparable to those in the proteomes of organisms in which amyloids were previously identified, such as humans and various species of fungi. Amyloidogenic proteins of plants tended to be associated with diverse biological processes and functions, including defense against pathogens, transmembrane transport and protein storage in seeds. We analyzed in depth the association between amyloidogenic properties and the seed storage function of such proteins. We found that seed storage proteins containing the conserved β-barrel domain Cupin-1, mainly 7S and 11S globulins, are rich in amyloidogenic regions in most land plant species: 302 storage proteins with a Cupin-1 domain, belonging to 54 of the 75 species analyzed, contained amyloidogenic regions. In addition, we identified 119 seed storage proteins with a Zein domain, 121 proteins with a Gliadin domain, 13 with a Vicilin domain and 7 proteins with high molecular weight Glutenin that were found to be amyloidogenic.
Experimental analysis performed with several storage proteins or their regions confirmed amyloid properties, including formation of unbranched fibrils, binding of amyloid-specific dyes, and formation of detergent-resistant aggregates. Based on these data, we conclude that amyloid formation by seed storage proteins represents a novel molecular mechanism for long-term stabilization of such proteins, protecting them from degradation and misfolding during natural dehydration and unfavorable environmental conditions.

This work was supported by the Russian Science Foundation (Grant No 17-16-01100).

Speakers

Anton A. Nizhnikov

Head of Laboratory, All-Russia Research Institute for Agricultural Microbiology, Saint-Petersburg


Monday July 16, 2018 15:40 - 16:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

16:00 MSK

Search for new amyloidogenic proteins: bioinformatic predictions and verification in the yeast-based assays
Formation of cross-beta fibrous aggregates (amyloids) is implicated in a variety of human diseases, and many amyloids have been shown to possess transmissible (prion) properties. While various algorithms for predicting amyloidogenic properties have been developed, these predictions are difficult to verify in vivo due to the complexity of the human organism. Heritable endogenous amyloids found in yeast cells, termed yeast prions, provide a powerful approach to the investigation of amyloid formation in vivo. We have developed a yeast assay for studying the prion properties of mammalian and human proteins (Chandramowlishwaran et al. 2018 293: 3436-3450). This assay employs chimeric constructs that contain mammalian or human proteins (or domains) fused to the prion domain of the yeast prion protein Sup35. Prion formation by Sup35 is phenotypically detectable, thus enabling monitoring of amyloid nucleation and propagation by mammalian and human amyloidogenic sequences in yeast. We have demonstrated that such chimeric constructs are applicable to studying amyloid properties of various proteins associated with mammalian and human diseases. By using this yeast-based assay, we have verified predictions of the algorithm ArchCandy (Ahmed et al. 2015 Alzh Dementia 11: 681-690) and identified new human proteins and distinct domains with amyloidogenic properties.

This work was supported by the SPbSU project 15.61.2218.2013, RSF grant 14-50-00069 and NIH grant P50AG025688. The authors acknowledge the SPbSU Resource Centers “CHROMAS”, “Molecular and Cell Technologies” and “Biobank” for technical support.

Speakers

Monday July 16, 2018 16:00 - 16:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

16:20 MSK

Break
Monday July 16, 2018 16:20 - 17:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

16:20 MSK

Poster Section
Speakers

Timofei Ermak

BIOCAD / Institute of Cell Biophysics RAS

Poster Section

Aleksandr Arzamasov

SBP Medical Discovery Institute

Alexander Shlemov

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University

Alexey Gurevich

Senior Research Scientist, Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
I am leading the Natural Product Discovery research direction at CAB (http://cab.spbu.ru/research/antibiotics-discovery/). Together with the Center for Computational Mass Spectrometry at UCSD and Mohimani Lab at Carnegie Mellon University, we are creating software for identification of...

Alla Mikheenko

Saint Petersburg State University

Andrei Prjibelski

Saint Petersburg State University

Andrey Slabodkin

Saint Petersburg State University

Azat Tagirdzhanov

Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia; Department of Higher Mathematics, St. Petersburg Electrotechnical University “LETI”, St. Petersburg, Russia

Bert Bogaerts

UGhent – Department of information Technology, IDLab, imec

Daria Zhernakova

Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University

Dmitrii Ostromyshenskii

Institute of Cytology of the Russian Academy of Science

Dmitry Prokopov

junior researcher, Institute of Molecular and Cellular Biology Siberian Branch of the Russian Academy of Sciences

Dmity Rodin

Ariel University

Elena Bushmanova

Saint Petersburg State University

Erik Gandalipov

ITMO University

Ivan Tolstoganov

Saint Petersburg State University

Ksenia Krasheninnikova

Dobzhansky Center for Genome Bioinformatics, St.Petersburg State University

Kseniya Zayulina

Research center of Biotechnology RAS

M.A.Babenko

All-Russia Research Institute for Agricultural Microbiology

Maria Chernigovskaya

Saint Petersburg State University

Oleg Shpynov

JetBrains Research

Olga Kunyavskaya

Saint Petersburg State University

Pavel Shelyakin

VIGG RAS, IITP RAS, Skoltech

Sergey Kazakov

ITMO University

Sergey Nurk

Researcher, Saint-Petersburg State University

Vladimir Ulyantsev

ITMO University

Yaroslav Solovev

ITMO University

Yulia Kondratenko

Saint Petersburg State University

Yulia Yakovleva

Department of Cytology and Histology, Saint Petersburg State University, Bioinformatics Institute
Saint Petersburg State University, Russia

Yuri Gorshkov

ITMO University

Yury Barbitoff

Dept. of Genetics and Biotechnology, Saint-Petersburg State University; Bioinformatics Institute

Yury V. Malovichko

All-Russia Research Institute for Agricultural Microbiology (ARRIAM)


Monday July 16, 2018 16:20 - 17:30 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034
 
Tuesday, July 17
 

09:00 MSK

Towards perfect de novo DNA assembly
We are about to enter an era of DNA sequencing in which one will soon be able to produce, de novo, a reference-quality genome of any living species for 1,000 euros. This ability will revolutionize ecology, evolution, and conservation science and effectively mark the beginning of a new exploration of the natural world.
The technological driver is the advent of long-read sequencers such as the PacBio Sequel and the Oxford Nanopore PromethION. The long reads in effect make assembly easier, and one sees corresponding improvements in the continuity of the results, but the underlying algorithms are effectively the same as those first developed 20 years ago, and repetitions at the scale of read length are still an issue. Indeed, truly better assembly requires finding all artifacts in the reads and resolving repeat families, topics that I don’t think have received sufficient attention and that are particularly critical issues for long reads.
Therefore we are developing algorithms that carefully analyze a long-read shotgun data set before assembly. By efficiently comparing all the data against itself, we have developed a computational approach to accurately determine the quality of any stretch of a PacBio read based only on the sequence data itself. These intrinsic QVs allow us to accurately identify low-quality regions, chimers, and missed adaptamers. Removing these artifacts with a process we call scrubbing leaves one with reads that assemble without the need for base-level error correction. We have further developed a heuristic consensus algorithm that is far more efficient and accurate than previous methods and further identifies potential sites of variation due to haplotypes or repeats. Using this algorithm we further correct reads, typically to Q40 (99.99% accurate). In effect, we have developed a process that takes Q7 reads full of artifacts and produces Q40 artifact-free reads, solving all aspects of the assembly problem save the separation of nearly identical, ubiquitous, and large repeats.
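
The Q7 and Q40 figures in this abstract follow the standard Phred convention, Q = -10 · log10(p), where p is the per-base error probability. The short Python sketch below only illustrates that arithmetic. The function names are assumptions made for this example and are unrelated to the speaker's software.

```python
import math


def phred_to_accuracy(q):
    """Per-base accuracy implied by a Phred-scaled quality value q."""
    return 1.0 - 10 ** (-q / 10)


def error_to_phred(p):
    """Phred-scaled quality value for a per-base error probability p."""
    return -10 * math.log10(p)


# Q40 corresponds to one error in 10,000 bases (99.99% accuracy);
# Q7 corresponds to roughly one error in five bases (about 80% accuracy).
for q in (7, 40):
    print(q, round(phred_to_accuracy(q), 4))
```

Read this way, going from Q7 to Q40 means reducing the per-base error rate from about 20% to 0.01%.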

Speakers

Gene Myers

Managing Director, Max-Planck Institute for Molecular Cell Biology and Genetics
Ph.D., Managing Director, Max-Planck Institute for Molecular Cell Biology and Genetics


Tuesday July 17, 2018 09:00 - 10:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

10:00 MSK

Bwise: a novel accurate, haplotype-specific genome assembler
Assemblers based on the de Bruijn graph (DBG) paradigm usually discard much useful information from short paired-end reads, resulting in fragmented assemblies (particularly in the case of heterozygous genomes). String-graph-based assemblers may be able to use the whole read information but scale poorly to large datasets. To combine these two approaches, we efficiently align reads (paired or not) on the DBG in a new assembler dubbed Bwise (short for “de Bruijn workflow using integrally the information of short paired-end reads”). Bwise maps reads (or read pairs) on the (cleaned, compacted) DBG generated from the same set of reads. Previous work on the short-read corrector BCOOL showed that such mapping provides very accurate sequences. Here we use them as so-called super-reads (i.e. linear paths of unitigs) that are subsequently filtered and assembled into contigs. To improve the initial set of contigs, a new DBG can be constructed with a higher k-mer size to reiterate the assembly process on a simpler and less fragmented graph. As k increases up to the read length (or even beyond it), the contig graph output by Bwise becomes progressively simpler and the statistics of the contig set improve dramatically. Bwise was originally designed for assembling complex diploid or polyploid genomes and has shown great results on them. In the case of the rotifer A. vaga, the MIRA assembler obtained an assembly with a 47 kb N50 in 9 months on a cluster with more than one terabyte of RAM, whereas Bwise produced an N50 of 150 kb in 2 hours on a 20-core cluster. Bwise also performs very well on haploid or metagenomic datasets, often delivering assemblies more continuous and accurate than other state-of-the-art approaches. Bwise is also scalable and is able to assemble a 100X human dataset using less than 100 GB of RAM in two days on a 20-core cluster.
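
To make the terms "compacted DBG" and "unitig" concrete, here is a toy Python sketch that builds a de Bruijn graph from reads and merges non-branching paths into unitigs. It is not Bwise's implementation: the function names are invented for this example, edges are only drawn between k-mers that are adjacent in some read, and reverse complements, error cleaning, paired-end information and isolated cycles are all ignored.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Toy de Bruijn graph: nodes are k-mers, edges link k-mers adjacent in a read."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    nodes = set()
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(kmers)
        for left, right in zip(kmers, kmers[1:]):
            succ[left].add(right)
            pred[right].add(left)
    return nodes, succ, pred


def compact_unitigs(nodes, succ, pred):
    """Merge maximal non-branching paths into unitig strings."""
    def starts_unitig(node):
        # A unitig starts where the path cannot be extended backwards unambiguously.
        if len(pred[node]) != 1:
            return True
        (parent,) = pred[node]
        return len(succ[parent]) != 1

    unitigs = []
    for node in sorted(nodes):
        if not starts_unitig(node):
            continue  # interior node; nodes on an isolated cycle are skipped too
        path = [node]
        current = node
        while len(succ[current]) == 1:
            (nxt,) = succ[current]
            if len(pred[nxt]) != 1 or nxt == node:
                break
            path.append(nxt)
            current = nxt
        # Consecutive k-mers overlap by k-1 bases, so append one base per step.
        unitigs.append(path[0] + "".join(kmer[-1] for kmer in path[1:]))
    return unitigs


# Two hypothetical reads that share a prefix and then diverge.
nodes, succ, pred = build_dbg(["AACCGGT", "AACCGTA"], k=3)
print(compact_unitigs(nodes, succ, pred))
```

In this example the shared prefix collapses into one unitig and each diverging suffix becomes its own unitig, which is the kind of branching that heterozygous sites create in a real graph.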

Speakers

Tuesday July 17, 2018 10:00 - 10:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

10:20 MSK

Break
Tuesday July 17, 2018 10:20 - 10:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

10:40 MSK

Assembling barcoded RNA sequencing data
De novo transcriptome assembly is a valuable alternative to the classic reference-based methods for RNA-Seq analysis. Although multiple approaches and algorithms have been developed, restoring all complete full-length isoforms remains a challenging problem that, in some cases, cannot be resolved using only short paired-end reads or coverage depth. We propose to utilize a recently developed barcoded RNA sequencing protocol that generates reads in which each barcode corresponds to a separate RNA molecule. To enable assembly of data from this protocol, we developed algorithms on top of the existing RNA-Seq assembler rnaSPAdes. In this work we demonstrate that using barcoded data leads to dramatic improvements in de novo transcriptome assembly quality, such as generation of complete transcript sequences even for complex eukaryotic genes with multiple expressed isoforms.

Speakers

Andrei Prjibelski

Saint Petersburg State University


Tuesday July 17, 2018 10:40 - 11:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:00 MSK

BiosyntheticSPAdes: Reconstructing Biosynthetic Gene Clusters From Assembly Graphs
Predicting Biosynthetic Gene Clusters (BGCs) is critically important for the discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains an open problem. The existing BGC prediction tools often assume that each BGC is encoded within a single contig in the genome assembly, a condition that is violated for many sequenced microbial genomes where BGCs are often scattered across several contigs, making it difficult to reconstruct them. The situation is even more severe in shotgun metagenomics, where the contigs are often short and the existing tools fail to predict a large fraction of long BGCs. While it is difficult to predict BGCs spanning multiple contigs, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a novel tool for predicting BGCs in assembly graphs, and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomic datasets.

Speakers

Dmitry Meleshko

Saint Petersburg State University


Tuesday July 17, 2018 11:00 - 11:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:20 MSK

Plasmid detection and assembly in genomic and metagenomic datasets
Although plasmids are important for bacterial survival and adaptation, plasmid detection and assembly from genomic, let alone metagenomic, samples remain challenging. The recently developed plasmidSPAdes assembler addressed some of these challenges in the case of isolate genomes but stopped short of detecting plasmids in metagenomic assemblies. We present the metaplasmidSPAdes tool, which enables plasmid assembly in metagenomic datasets and reduces the false positive rate of plasmid detection compared to state-of-the-art approaches. Applications of plasmidSPAdes and metaplasmidSPAdes to diverse isolate and metagenomic datasets revealed a surprisingly high yield of novel plasmids without significant similarities to known plasmids and of plasmids carrying antibiotic-resistance genes.

Speakers

Tuesday July 17, 2018 11:20 - 11:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:40 MSK

CellPi: unsupervised processing pipeline of mouse and human single-cell RNA-seq data
Single-cell RNA-seq is becoming a standard for cell type characterization. Various single-cell-specific tools have been developed in the last four years. We present a pipeline that combines well-known tools and ideas into a user-friendly R package that provides consistently good clustering results on a wide range of single-cell experiments. It is capable of separating highly homogeneous cell populations of the developing embryo as small as 6-10 cells and of detecting inseparable sub-populations of up to 2000 cells without using any additional annotation.
We propose the use of 1d-tSNE for optimal perplexity selection, which improves PCA+tSNE clustering.

Speakers

Alexey Samosyuk

Skoltech, MIPT, VIGG


Tuesday July 17, 2018 11:40 - 12:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

12:00 MSK

Lunch
Tuesday July 17, 2018 12:00 - 13:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

13:20 MSK

Test tubes, sequencing machines, computers: bioinformatics as a molecular biology tool
Combining comparative genomics approaches with large-scale data analyses allows one to make specific predictions about the function and regulation of particular genes, which can then be validated using standard experimental techniques. Notably, these predictions go well beyond simple similarity-based annotations and often describe novel biological phenomena.
I shall present some recent examples of such studies, including the discovery of the second lactose catabolism pathway and a global regulator of motility in Escherichia coli, and the characterization of the desiccation-rehydration cycle in the midge Polypedilum vanderplanki.

Speakers

Mikhail S. Gelfand

Deputy Director, Institute for Information Transmission Problems
D.Sc. and Ph.D., Member of Academia Europaea, Deputy Director of Institute for Information Transmission Problems


Tuesday July 17, 2018 13:20 - 14:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

14:20 MSK

HEDGE: Highly accurate GPU-powered protein-protein docking pipeline
Speakers

Timofei Ermak

BIOCAD / Institute of Cell Biophysics RAS


Tuesday July 17, 2018 14:20 - 15:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

15:00 MSK

CRISPR: fascinating biology and limitless applications
CRISPR is a new generation of genome editing and regulation tools that has rapidly revolutionized the practice of genome engineering. However, CRISPR is much more than that. It is a system of microbial adaptive immunity whose existence was not suspected until recently and that embodies the Lamarckian principle of evolution by inheritance of acquired characters. The evolutionary history of CRISPR itself is also remarkable, revealing surprising connections between parasitic genetic elements and host defense. I will discuss the biology and evolution of CRISPR-Cas and the molecular features that make it uniquely efficient as a genome editing tool.

Speakers

Eugene V. Koonin

Senior Investigator, National Institutes of Health
PhD, Senior Investigator, National Center for Biotechnology Information, National Library of Medicine, NIH


Tuesday July 17, 2018 15:00 - 16:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

16:00 MSK

Break
Tuesday July 17, 2018 16:00 - 16:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

16:20 MSK

Promoters and enhancers landscape of embryonic development and hibernation in chicken
Cap Analysis of Gene Expression (CAGE) in combination with NGS provides precise mapping of transcription start sites (TSSs) and genome-wide capture of promoter and enhancer activities in differentiated cell populations and tissues of interest. We used the chick model and performed CAGE-based TSS analysis on embryonic samples covering the full 3-week developmental period. In total, 31,863 robust TSS peaks (>1 tag per million [TPM]) were mapped to the latest chicken genome assembly, of which 34% to 46% were active in any given developmental stage. ZENBU, a web-based, open-source platform, was used for interactive data exploration. Our study also uncovered a large set of extremely stable housekeeping TSSs and many novel stage-specific ones. We furthermore demonstrated that TSS mapping could expedite motif-based promoter analysis for regulatory modules associated with stage-specific and housekeeping genes. Finally, using Brachyury as an example, we provide evidence that precise TSS mapping in combination with Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-on technology enables us, for the first time, to efficiently target endogenous avian genes for transcriptional activation. Next, we applied CAGE technology to analyze the transcriptional network underlying the mechanisms of “chicken hibernation” – temperature-dependent embryonic development arrest. Using a set of time points reflecting active vs cooling-induced hibernation at different developmental stages of chicken embryos, we identified several transcriptional patterns specific for the “hibernation”

Speakers

Ruslan Deviatiiarov

Institute of Fundamental Medicine and Biology, Kazan Federal University

Oleg Gusev

Unit Leader, KFU/RIKEN
Comparative Genomics, Biology of Extremophiles, Translational Genomics, Transcriptomics, regulatory elements, anhydrobiosis, hibernation.


Tuesday July 17, 2018 16:20 - 16:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

16:40 MSK

Genome rearrangements in bacteria
Bacterial chromosomes are complex, fast-evolving systems. Genome rearrangements and horizontal gene transfer lead to the genome plasticity that is necessary for adaptation to changes in lifestyle. Genome rearrangements play an important role in bacterial evolution, as they can destroy genes, create new genes and change the copy number of gene transcripts. The accumulation of a large number of fully sequenced bacterial genomes from closely related species makes it possible to study genome rearrangements in an evolutionary context. We reconstructed the evolutionary history of genome rearrangements for bacterial species from diverse ecological niches and with different genome organization. Our results show that rearrangement rates differ dramatically between bacterial species, which is likely to be related to adaptation driven by changes in lifestyle. Meanwhile, for the newly formed pathogens Yersinia pestis and Burkholderia mallei we revealed a correlation between mutation rates and inversion rates. Analysis of contradictions between the evolutionary trees based on alignments of common genes and those based on gene order yielded numerous parallel rearrangements. Numerous gene losses and inversions have likely been caused by a high rate of intragenomic recombination between a limited number of repeated elements such as transposases and 16S-23S rRNA clusters. In Streptococcus pneumoniae and Burkholderia pseudomallei we revealed parallel inversions that may result in phase (antigenic) variation. The reconstructed inter-chromosome translocations in bacterial genomes with multi-chromosome genome organization indicate strong selection against the transfer of large fractions of genes between the leading and the lagging strands.

Speakers

Pavel Shelyakin

VIGG RAS, IITP RAS, Skoltech


Tuesday July 17, 2018 16:40 - 17:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

17:00 MSK

Analysis and visualization of segmental duplications in mammalian genomes
Segmental duplications (SDs) play key roles in gene evolution and genomic diseases. Nevertheless, the real extent of SDs in genomes remains unknown, because SDs represent a significant impediment to accurate human genome assembly and existing assembly tools often collapse highly similar SDs. Thus, tools capable of accurately finding and thoroughly analyzing SDs are extremely important. The recently developed SDquest algorithm for SD detection has shown that SDs account for at least 6% of the human genome. The novel genome assembler Flye has the unique ability to reconstruct the mosaic structure of SDs using the assembly graph. At the same time, the huge number of identified SDs makes their analysis a challenging problem due to the lack of suitable visualization tools. To fill this gap, we developed a novel genome visualizer for accurate assessment and analysis of SDs. It allows users to explore intra- and interchromosomal duplications and analyze their complex mosaic structure. The visualization can ease the process of SD analysis and help to reveal new SD patterns. The tool is available online.

Speakers

Alla Mikheenko

Saint Petersburg State University


Tuesday July 17, 2018 17:00 - 17:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

18:00 MSK

Dinner
Conference Dinner

Tuesday July 17, 2018 18:00 - 21:00 MSK
Pier University Embankment, 13, Sankt-Peterburg, Russia, 199034
 
Wednesday, July 18
 

09:00 MSK

Sequencing genome diversity in fish
Nearly half of vertebrate species are fish, and within them there is enormous genetic and evolutionary diversity.  We have recently been involved in two large scale fish genome sequencing projects.  The first focuses on the hundreds of cichlid fish species in Lake Malawi, which constitute the most extensive recent vertebrate adaptive radiation. We have mapped its genomic diversity by sequencing 134 individuals covering 73 species across all major lineages. Phylogenetic analyses suggest that no single species tree adequately represents all species relationships, with evidence for substantial gene flow at multiple times. Sequencing of related species from East African rivers indicates that the Malawi radiation arose from a hybridisation between at least two previously separated lineages, and that differentially fixed variants contributed from the ancestral lineages have been under adaptive selection within the Malawi radiation. In parallel, we have been generating high quality de novo genome reference sequences for representatives of fish orders, in the context of the international Vertebrate Genomes Project. Using long sequencing reads from single molecule technologies, and related data, we can now generate near chromosomal sequences, and there are exciting prospects for scaling these approaches towards sequencing all accessible species in the coming decade.

Speakers

Richard Durbin

Professor, Dept. of Genetics, University of Cambridge
Ph.D., Honorary Professor of Computational genomics at the University of Cambridge


Wednesday July 18, 2018 09:00 - 10:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

10:00 MSK

The Genome Russia Project – 2018
The Russian Federation spans 11 time zones and is the home of ~146,000,000 people: 80% are ethnic Russians and the remainder identify themselves as one of ~200 indigenous ethnic minorities. Despite the large population size and high ethnic diversity, no centralized reference database of functional and endemic genetic variation has been established to date. The national Genome Russia Project aims to perform high coverage whole genome sequencing and analysis of peoples of the Russian Federation. We shall describe our progress based upon resolving genome-wide variation (SNPs, indels, and copy number variation) from 264 healthy adults, including 60 newly sequenced samples consisting of family trios from three geographic regions: Pskov, Novgorod and Yakutia. People of Russia are shown to carry known and novel genetic variants of adaptive, clinical and functional consequence that in many cases show appreciable occurrence or allele frequency divergence from the neighboring Eurasian populations. Population genetic phylogenetic analyses revealed strong geographic partitions among indigenous ethnicities corresponding to the geographic locales where they have lived. Allele frequency spectra identified strong constraints to gene flow corresponding to the geological barriers (e.g. the Ural Mountains and Verkhoyansk mountain range). These first conclusions of the Genome Russia Project include results important for medical genetics as well as for population natural history studies.

Speakers

Stephen J. O'Brien

Saint Petersburg State University


Wednesday July 18, 2018 10:00 - 10:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

10:40 MSK

Break
Wednesday July 18, 2018 10:40 - 11:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:00 MSK

A Rapid Exact Solution for the Guided Genome Aliquoting Problem
Genome rearrangements are large-scale evolutionary events that shuffle genomic architectures. Since such events are rare, the maximum parsimony assumption implies that the evolutionary distance between genomes can be estimated as the minimum number of genome rearrangements, which further enables reconstruction of ancestral genomes by minimizing the total evolutionary distance along the branches of the evolutionary tree. The basic case of this problem for three given genomes is known as the genome median problem (GMP), which asks for a single ancestral genome (median genome) at the minimum total distance from the given ones. A median genome corresponds to an optimal perfect matching in the breakpoint graph of the given genomes that maximizes the total number of 2-colored alternating cycles. While the GMP is NP-hard (Tannier et al., 2009), one of the prominent exact and practical solutions to the GMP is based on decomposition of the breakpoint graph into adequate subgraphs, i.e., induced subgraphs where any optimal matching can be extended to an optimal matching in the breakpoint graph (Xu et al., 2008). Whole genome duplications (WGDs) represent yet another type of dramatic evolutionary event, which simultaneously duplicates each chromosome of a genome. In particular, WGDs are known to have happened in the evolution of plants and yeasts. A WGD can be viewed as a special case of a whole genome multiplication (WGM), which simultaneously creates m ≥ 2 copies of each chromosome. An analog of the GMP in the presence of a WGM is known as the guided genome aliquoting problem (GGAP). The GGAP for given genomes A and B, where all genes in B are present in a single copy (ordinary genome), while all genes in A are present in m copies, asks for an ordinary ancestral genome R that minimizes the total distance between genomes A and mR (the genome resulting from the WGM of R) and between B and R.
In the present study, we propose a fast exact algorithm for solving the GGAP for m = 2 and m = 3, which is based on an extension of the adequate subgraphs approach. Namely, we identify all simple adequate subgraphs of small size for the GGAP. Our algorithm searches for such subgraphs in the given breakpoint graph and finds optimal matchings in them, which are then combined and extended to an optimal matching (representing a solution R) in the breakpoint graph.

Speakers

Maria Atamanova

ITMO University


Wednesday July 18, 2018 11:00 - 11:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:20 MSK

Bounded-length Smith-Waterman alignment
Given a fixed alignment scoring scheme, the bounded-sum Smith-Waterman alignment problem on a pair of strings of lengths m, n asks for the maximum alignment score across all substring pairs, such that the sum of the substring lengths is above the given threshold w. This problem was introduced by Arslan and Eğecioğlu under the name "local alignment with length threshold". They describe a dynamic programming algorithm solving the problem in time O(mn^2), and also an approximation algorithm running in time O(rmn), where r is a parameter controlling the accuracy of approximation. We introduce the bounded-length Smith-Waterman alignment problem, which is closely related to the bounded-sum problem. We then show that both these problems can be solved exactly in time O(mn), assuming a rational scoring scheme. Our algorithms rely on the techniques of fast window-substring alignment and implicit unit-Monge matrix searching, developed previously by the author.
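
To make the bounded-sum problem statement concrete, here is a brute-force reference sketch in Python. It is not the O(mn) algorithm of the talk, nor the Arslan and Eğecioğlu algorithm: it simply enumerates every substring pair and runs a quadratic-time alignment, so it takes roughly O(m^2 n^2) time and is only suitable for checking small cases. The function name, the scoring values (match +1, mismatch -1, gap -1) and the reading of "above the threshold" as "at least w" are all assumptions made for illustration.

```python
def bounded_sum_alignment(a, b, w, match=1, mismatch=-1, gap=-1):
    """Brute-force reference (hypothetical helper, not the talk's algorithm).

    Returns the best end-to-end alignment score over all substring pairs
    (a[i:i+k], b[j:j+l]) with k + l >= w, or None if no pair qualifies.
    """
    m, n = len(a), len(b)
    best = None
    for i in range(m + 1):          # start of the substring of a
        for j in range(n + 1):      # start of the substring of b
            rows, cols = m - i, n - j
            # d[k][l] = best score aligning a[i:i+k] with b[j:j+l] end to end
            d = [[0] * (cols + 1) for _ in range(rows + 1)]
            for k in range(rows + 1):
                for l in range(cols + 1):
                    if k == 0 and l == 0:
                        score = 0
                    else:
                        options = []
                        if k > 0:
                            options.append(d[k - 1][l] + gap)
                        if l > 0:
                            options.append(d[k][l - 1] + gap)
                        if k > 0 and l > 0:
                            same = a[i + k - 1] == b[j + l - 1]
                            options.append(d[k - 1][l - 1] + (match if same else mismatch))
                        score = max(options)
                    d[k][l] = score
                    # Only substring pairs meeting the length-sum bound count.
                    if k + l >= w and (best is None or score > best):
                        best = score
    return best
```

With w = 0 the bound is vacuous and the result coincides with the ordinary Smith-Waterman local alignment score; raising w forces longer, possibly lower-scoring, alignments.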

Speakers

Alexander Tiskin

University of Warwick


Wednesday July 18, 2018 11:20 - 11:40 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

11:40 MSK

Reconstruction of a Set of Points from the Noise Multiset of Pairwise Distances in n^2 Steps for the Cyclic Sequencing Problem
Motivation and Aim: An important fraction of the bacterial peptidome consists of non-ribosomal peptides (NRPs), a class of secondary peptide metabolites usually produced by bacteria and fungi and having an extremely wide range of biological activity and pharmacological properties. In the majority of cases (64%), NRPs have a cyclic structure. Because they are synthesized via the non-ribosomal pathway, NRPs cannot be identified by the classical methods of bioinformatics and genomics, and identification is carried out only on the basis of mass spectrometry. Mathematically, the sequencing of a cyclic chain from mass spectra reduces to a long-known problem: the recovery of the coordinates of a set of points X from the multiset of pairwise distances between them ∆X (the so-called beltway problem, which has no known polynomial-time algorithm in the general case). The computational complexity of the best algorithm developed to date is O(n^n log n). Despite the many approaches used (the brute force method, graph models, dynamic programming, the divide and conquer method, hidden Markov models, spectral convolution, etc.), attempts to design a polynomial algorithm for the beltway problem have failed. So at present, the possibilities for de novo reconstruction of the structure of cyclic NRPs are limited. Thus, the development of new bioinformatic methods for the reconstruction of bacterial non-ribosomal peptides is very relevant.
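
The input to the beltway problem can be illustrated with a short Python sketch of the forward direction only: given the residue masses of a cyclic chain in order, it lists the masses of all contiguous fragments, which play the role of the multiset ∆X of pairwise distances between cut points on the cycle. This is not the reconstruction method of the talk; the function name and the integer masses are hypothetical, and an error-free spectrum is assumed.

```python
def cyclic_fragment_masses(residue_masses):
    """Sorted multiset of masses of all contiguous fragments (lengths 1..n-1)
    of a cyclic chain, assuming an idealized, error-free spectrum."""
    n = len(residue_masses)
    doubled = residue_masses + residue_masses  # unroll the cycle once
    masses = []
    for start in range(n):
        total = 0
        for length in range(1, n):
            total += doubled[start + length - 1]
            masses.append(total)  # fragment of `length` residues from `start`
    return sorted(masses)


# Illustrative integer masses for a 3-residue cycle (hypothetical values).
print(cyclic_fragment_masses([57, 71, 87]))
# [57, 71, 87, 128, 144, 158]
```

The multiset has n(n-1) entries, which matches the O(n^2) input size mentioned for error-free data. Rotating or reversing the chain yields the same multiset, so any reconstruction is determined only up to these symmetries.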

Methods and Algorithms: We propose a new method to solve the problem. It is based on sequential removal of redundancy from the inputs. For error-free inputs that simulate mass spectra with high accuracy (~10^-3 Da), the size of the inputs decreases from O(n^2) to O(n). In this way, exhaustive search can be almost completely removed from the algorithms, and the number of steps to reconstruct a sequence is n^2, where n is the cardinality of the set X, n=|X|.

Results: We have now generalized this method through the use of integral transforms. We show that the generalized approach can be successfully used for reconstructing the set X not only from a complete and error-free set of pairwise distances ∆X, but also from a set ∆X + f containing a large number of redundant and missing data f (noise), |f| > |∆X|. The proposed method is shown to be highly efficient. The computational complexity of our algorithm is O(n^2), where n is the cardinality of the input set ∆X + f, n=|∆X + f|.

The work was carried out with the support of the Russian Foundation for Basic Research, project No. 17-00-00462.

Speakers

Eduard Fomin

Institute of Cytology and Genetics SB RAS


Wednesday July 18, 2018 11:40 - 12:00 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

12:00 MSK

Bayesian modelling of gene network alterations during tree-like processes: evolution or cells differentiation
A tree is a typical diagrammatic representation of relationships among objects in various biological processes, for instance, phylogenetic trees or trees of cellular differentiation. The problem of predicting characteristics of objects at inner tree nodes is well studied when these characteristics are independent. Here we present the case when these characteristics are non-independent. To be specific, we considered each object to be characterised by a network of interacting genes together with their expression levels and built a model to predict the configurations of the gene network at inner tree nodes: ancestral states in phylogenies or progenitor cells in differentiation.
In our model we assumed that a tree-like process is continuous, i.e. the gene expression covariance matrix together with the coefficients of gene-gene interactions change from the root state to the leaves in agreement with a continuous-state time-homogeneous Markov process, specifically the Wiener process. We also assumed that the gene network topology should be maintained during this process, so that at each inner node and outer leaf of the tree the gene network satisfies the Structural Equation Model (SEM). We used Bayesian inference to construct the scheme for the MCMC parameter optimisation method.
We applied the developed model to the tree-like process of blood differentiation from hematopoietic stem cells through different progenitor states to mature states (monocytes, lymphocytes, neutrophils, etc.). We used gene expression data at the leaves of the tree (microarray Human Map dataset) and optimised all parameters of both the SEM model and the Wiener process by MCMC. We modelled the RAS signalling network, as it is involved in leukaemia development. We predicted the states of this gene network at inner nodes and, using the parameters of the Wiener process, predicted the point on the tree where the cancer cells (T-cells or B-cells) have their own branch. Knowledge of this point can potentially help in leukaemia treatment. We believe the developed methodology can be readily applied to other cell development processes and also to phylogenetic studies.

Speakers

Anna Igolkina

Peter the Great St.Petersburg Polytechnic University


Wednesday July 18, 2018 12:00 - 12:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

12:20 MSK

Lunch
Wednesday July 18, 2018 12:20 - 13:30 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

13:30 MSK

Discovering novel metabolisms via metagenomics
I will demonstrate the power of metagenomics in uncovering the details and the nuances of some major microbially-driven biogeochemical processes, focusing on bacterial methane cycling as an important part of the global carbon turnover on Earth. First, by combining metagenomics with stable isotope probing, we uncover that the major species involved in the methane cycle in lake sediments (the methanotrophs) are not the ones easily cultivated in the laboratory. Second, we uncover specific satellite organisms, associated with the methanotrophs, that appear to also feed on carbon originating from methane. Additionally, we uncover denitrification capabilities in both functional groups, suggesting that methane cycling may be linked to nitrogen cycling in oxic/anoxic interface environments. We further uncover a dependence of methanotrophy on lanthanides, the rare earth elements previously assumed to be biologically inert, and uncover a complex interplay between alternative enzymes relying on common (calcium) versus exotic (lanthanides) metals for both activity and expression. I will further highlight more recent discoveries from combining synthetic ecology approaches with meta-omics, which include further insights into communal metabolism of methane and into novel genes and enzymes, as well as into additional actors in global methane turnover that appear to function in concert with bona fide methanotrophs. Finally, I conclude that metagenomics has had a revolutionary impact on the field of methanotrophy over the past decade, and that the momentum is still going strong.

Speakers

Ludmila Chistoserdova

Senior Scientist, University of Washington
Ph.D., Senior Scientist, Department of Chemical Engineering, University of Washington


Wednesday July 18, 2018 13:30 - 14:30 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

14:30 MSK

ClinCNV: novel method for large-scale CNV and CNA discovery
Germline copy number variants (CNV) and somatic copy number alterations (CNA) are a common source of genomic variation involved in many genomic disorders, such as schizophrenia or cancer. Genomic microarrays, FISH, MLPA, as well as many other technologies, are widely used for detection of CNVs. Whole-genome sequencing (WGS) and whole-exome sequencing (WES) are well established, highly accurate tools for the detection of point mutations and small indels. CNV/CNA detection using WGS/WES data has been emerging as a competitive alternative for interrogating this type of variation, but remains challenging.
We have developed a new method for multi-sample CNV/CNA detection. The ClinCNV method can integrate multiple data types, including signatures derived from various WES, WGS, and microarray protocols. First, the reference genome is divided into non-overlapping windows, and different sources of information such as read depth, hybridization intensity or B-allele frequency ratio are quantified and normalized, taking into account both window- and sample-specific variability. If the same sample was analyzed several times with different experimental techniques (e.g., WES and shallow WGS), which is a common approach in diagnostics, evidence of copy number changes from the different sources is summed into a single matrix of likelihoods of size [number of windows] by [number of states], where states denote distinct copy numbers. Next, ClinCNV recursively identifies segments with the strongest evidence of CNV presence in a two-step manner: common CNVs that have >5% frequency within the studied cohort are identified first, while less frequent variants are detected in the second step. Finally, we use strict filtering to remove spurious results. The algorithm’s computational complexity is linearly dependent on the number of states, allowing simultaneous detection of hundreds of distinct non-discrete copy numbers, which is especially useful for CNA detection in heterogeneous samples with complex clonal structure, as frequently found in cancer.
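
The evidence-combination step described above can be sketched in Python under explicit assumptions. This is not ClinCNV code: the function names and all numbers are hypothetical, the per-source evidence is assumed to be stored as log-likelihoods so that independent sources can be added element-wise, and the final per-window choice of the most likely state is a simplification that stands in for the recursive segmentation described in the abstract.

```python
import math


def combine_evidence(sources):
    """Element-wise sum of per-source log-likelihood matrices.

    Each matrix has shape [number of windows] x [number of states].
    Summing logs assumes the data sources are independent (an assumption).
    """
    n_windows = len(sources[0])
    n_states = len(sources[0][0])
    combined = [[0.0] * n_states for _ in range(n_windows)]
    for matrix in sources:
        for w in range(n_windows):
            for s in range(n_states):
                combined[w][s] += matrix[w][s]
    return combined


def most_likely_states(combined, copy_numbers):
    """Pick the highest-scoring copy number per window (simplified stand-in
    for segmentation)."""
    return [copy_numbers[max(range(len(row)), key=row.__getitem__)]
            for row in combined]


# Hypothetical example: two windows, three candidate copy numbers.
copy_numbers = [1, 2, 3]
wes = [[math.log(0.1), math.log(0.8), math.log(0.1)],
       [math.log(0.5), math.log(0.4), math.log(0.1)]]
wgs = [[math.log(0.2), math.log(0.7), math.log(0.1)],
       [math.log(0.7), math.log(0.2), math.log(0.1)]]
print(most_likely_states(combine_evidence([wes, wgs]), copy_numbers))
# [2, 1]
```

Because the combined matrix has one column per state, the cost of each pass grows linearly with the number of states, which is consistent with the linear dependence stated in the abstract.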
Using the ClinCNV method we analyzed a cohort of 2834 WGS samples from the Pan Cancer Analysis of Whole Genomes (PCAWG) study, 2651 of which passed QC. We detected 16,907, 6,156 and 568 bi-allelic deletions, duplications and mCNV events, respectively, of size greater than 3 kb. FDRs for the three variant types were estimated using the IRS [1] method and available microarray intensity data and were equal to 0.0229, 0.029 and 0.049. We also investigated segments that show non-diploid coverage patterns across the majority of samples, which potentially represent reference assembly errors or highly homologous regions such as segmental duplications. We will furthermore report results for 436 chronic lymphocytic leukemia (CLL) tumor and normal pairs analyzed by WES, and 67 shallow WGS samples with coverage depth from 0.5x to 9x (average 3.8x). Comparisons between different platforms and their power to detect CN changes will be provided.

Speakers

German Demidov

Institute of Medical Genetics and Applied Genomics, Tübingen, Germany


Wednesday July 18, 2018 14:30 - 14:50 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

14:50 MSK

Mathematical modeling of SNP %GC in microbial core genomes
The present talk will address whether the GC content of non-recombinant substituted bases in microbial core genomes (sbGC) exhibits any association with the GC content of the corresponding core genomes (cgGC). The GC content of the substituted bases of the strains comprising each core genome (36 core genomes in total, each representing a separate microbial species consisting of at least 10 strains) was compared with the GC content of the corresponding core genome. We found that sbGC within each core genome showed a non-linear association with cgGC, with a bias towards higher GC content for most core genomes, assuming as a null hypothesis that sbGC should be approximately equal to cgGC. The most GC-rich core genomes (i.e. approximately %GC>60), on the other hand, exhibited slightly less GC-biased sbGC than expected. We present a simple mathematical model that estimates sbGC from cgGC. The model assumes only that the estimated sbGC is a function of cgGC proportional to fixed AT->GC (α) and GC->AT (β) mutation rates. Using non-linear regression to estimate α and β from the empirical data described above, we find that the best fitted model indicates that GC->AT mutation rates β=(1.91±0.13) p<0.001 are approximately (1.91/0.79)=2.42 times higher, on average, than AT->GC α=(-0.79±0.25) p<0.001 mutation rates. Whether the observed sbGC GC-bias for all but the most GC-rich prokaryotic species is due to selection, compensating for the GC->AT mutation bias, and/or selectively neutral processes is currently debated. The residual standard error was found to be σ=0.076, indicating that estimation errors of sbGC are approximately within ±15.2% GC (95% confidence interval) for the strains of all species in the study. Not only did our model give reasonable estimates of sbGC, it also provides further support for previous observations that mutation rates in prokaryotes exhibit a universal GC->AT bias that appears to be remarkably consistent between taxa.
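
The two derived figures quoted in the abstract can be checked with a few lines of Python. The sketch only reproduces the arithmetic from the reported estimates; it assumes the ratio uses the magnitude of α and that the 95% interval is taken as roughly ±2σ, neither of which is stated explicitly in the abstract. The variable names are chosen for illustration.

```python
# Reported estimates from the abstract.
alpha = -0.79   # AT->GC rate estimate (sign as reported)
beta = 1.91     # GC->AT rate estimate
sigma = 0.076   # residual standard error

# Ratio of GC->AT to AT->GC rates, using the magnitude of alpha (assumption).
ratio = beta / abs(alpha)

# Approximate 95% half-width as 2 * sigma (assumption), in % GC.
half_width_percent = 100 * 2 * sigma

print(round(ratio, 2))               # 2.42
print(round(half_width_percent, 1))  # 15.2
```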

Speakers

Jon Bohlin

Scientist - Bioinformatician, Norwegian Institute of Public Health
I work as a bioinformatician at the Norwegian Institute of Public Health. I'm partially responsible for the NGS infrastructure at our domain. My area of expertise lies within comparative microbial genomics.


Wednesday July 18, 2018 14:50 - 15:10 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

15:10 MSK

Break
Wednesday July 18, 2018 15:10 - 15:30 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

15:30 MSK

Building time- and cost-effective bioinformatics pipelines in the Cloud - from bcl to visual analysis with New Genome Browser
The talk covers contemporary approaches to, and practical experience of, building bioinformatics pipelines on cloud technologies, using parallelized data processing and flexible resource orchestration with an eye to the most efficient use of time and money.


Wednesday July 18, 2018 15:30 - 16:10 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034

16:10 MSK

Closing Remarks
Speakers

Alla Lapidus

Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 1990034 St Petersburg, Russia


Wednesday July 18, 2018 16:10 - 16:20 MSK
Main Hall, Saint Petersburg State University Университетская наб., 7/9, Санкт-Петербург, г. Санкт-Петербург, Russia, 199034
 
Thursday, July 19
 

09:30 MSK

QIIME Workshop
Limited Capacity seats available

The QIIME Workshop, run by Dr. Amanda Birmingham and Dr. Rob Knight (UCSD, USA), will include lectures covering basic QIIME usage and theory, and hands-on work with QIIME to perform microbiome analysis from raw sequence data through publication-quality statistics and visualizations.
This workshop will provide the foundation on which students can begin using these tools to advance their own studies of microbial ecology.

We encourage participants to visit the QIIME home page prior to the workshop.

Computer/software requirements: any computer with Internet access and a modern browser (such as Firefox).

Speakers

Amanda Birmingham

University of California, San Diego

Rob Knight

Professor, UC San Diego
Ph.D., Professor at the University of California, San Diego, Co-founder of the Earth Microbiome Project



Thursday July 19, 2018 09:30 - 11:00 MSK
Auditorium 410, Graduate School of Management Volkhovskiy Pereulok, 3, Sankt-Peterburg, Leningradskaya oblast', Russia, 199004

11:00 MSK

Break
Thursday July 19, 2018 11:00 - 11:30 MSK
Auditorium 410, Graduate School of Management Volkhovskiy Pereulok, 3, Sankt-Peterburg, Leningradskaya oblast', Russia, 199004

11:30 MSK

QIIME Workshop
Limited Capacity seats available

The QIIME Workshop, run by Dr. Amanda Birmingham and Dr. Rob Knight (UCSD, USA), will include lectures covering basic QIIME usage and theory, and hands-on work with QIIME to perform microbiome analysis from raw sequence data through publication-quality statistics and visualizations.
This workshop will provide the foundation on which students can begin using these tools to advance their own studies of microbial ecology.

We encourage participants to visit the QIIME home page prior to the workshop.

Computer/software requirements: any computer with Internet access and a modern browser (such as Firefox).

Speakers

Amanda Birmingham

University of California, San Diego

Rob Knight

Professor, UC San Diego
Ph.D., Professor at the University of California, San Diego, Co-founder of the Earth Microbiome Project


Thursday July 19, 2018 11:30 - 13:00 MSK
Auditorium 410, Graduate School of Management Volkhovskiy Pereulok, 3, Sankt-Peterburg, Leningradskaya oblast', Russia, 199004

13:00 MSK

Lunch
Thursday July 19, 2018 13:00 - 14:00 MSK
Auditorium 410, Graduate School of Management Volkhovskiy Pereulok, 3, Sankt-Peterburg, Leningradskaya oblast', Russia, 199004
 