July 19, 2019, Friday

From BIMIB

Difference between revisions of "NGS MM 19"

Revision as of 23:10, 14 April 2019


NGS Milano Meeting 2019

Milan meeting on "Next Generation Sequencing", 15 April 2019, 9:00, Aula Martini (U6-4), Universita` degli Studi di Milano Bicocca, Piazza dell'Ateneo Nuovo 1, 20126 Milano

The NGS Milano Meeting is an informal, day-long forum to discuss experiences and various interesting research and applied projects where NGS technologies play an important role. The current instantiation will focus on "Big Data" handling and the various aspects of "Data Management" for research projects.


Organizers

  • Marco Antoniotti, DISCo, UNIMIB
  • Gianluca della Vedova, DISCo, UNIMIB
  • Alex Graudenzi, IBFM CNR
  • Giancarlo Mauri, DISCo, UNIMIB


Program

  • 8:45 Welcome Address, Marco Antoniotti, DISCo, Universita` degli Studi di Milano-Bicocca
  • 9:00 Davide Cittaro, San Raffaele
TBA
  • 9:30 Mattia Pelizzola, IIT
Dynamics of transcriptional and post-transcriptional regulation
The abundance of premature and mature RNA species, and the responsiveness of their modulation, are set by the kinetic rates of three fundamental steps governing the dynamics of the RNA life-cycle: synthesis, processing and degradation. Experimental and computational methods are being developed that allow the genome-wide quantification of RNA dynamics, revealing previously inaccessible details on the mechanisms regulating gene expression programs. I will discuss the role of RNA dynamics in the context of the transcriptional programs driven by the activation of the MYC transcription factor.
  • 10:00 Raoul Bonnal, INGM
A journey towards Reproducibility: Pipelines, Software, Infrastructure
Reproducing results in a complex organization is a challenging task. Many stakeholders, both internal and external, such as Data Scientists (Bio, Math, Stat, CS), PIs, technicians, departments (IT), collaborators, scientists and funding agencies, could be positively influenced by the adoption of reproducible best practices. The talk gives you an overview of our path towards reproducing bioinformatics analyses, software deployment strategies and computing infrastructure. The talk discusses the technical solutions implemented, possible improvements and the cultural shift introduced by the reproducibility mindset.
  • 10:30 Arnaud Ceol, IEO
Data management at the joint IEO/IIT genomics unit: present and perspectives
Since its inception, the joint genomics unit of the European Institute of Oncology (IEO) and the Center for Genomic Science (an outstation of the Istituto Italiano di Tecnologia) has relied on a series of protocols, mostly based on applications developed in-house, to manage its sequencing workflow and data, from the sequencing request up to the bioinformatics analyses. In the last year, this pipeline has been extended to support new technologies and infrastructures, both on the wet and dry side (NovaSeq sequencing, single-cell analyses, HPC computing). We will present the solution adopted by our units as well as the novel problems that arose in the last year or will arise in the near future, such as the integration of new sequencing technologies or the protection of sensitive data.


  • 11:00 COFFEE BREAK


  • 11:30 Andrea Calabria, HSR
Characterization of Hematopoietic System Reconstitution In-vivo in Metachromatic Leukodystrophy Gene Therapy Patients
Here we report the molecular analysis of hematopoietic reconstitution in 20 patients enrolled in a self-inactivating lentiviral vector-based hematopoietic stem cell (HSC) gene therapy clinical trial for metachromatic leukodystrophy conducted at SR-Tiget (up to 7 years' follow-up) and in 7 additional patients treated in early access programs. We retrieved integration sites (IS) from CD34+, myeloid and lymphoid cells purified at different time points after therapy (at 1, 3, 6, 9 and 12 months in the first year, then every 6 months) from bone marrow and/or peripheral blood using PCR protocols. From each patient, we retrieved from 6,000 to 65,000 IS, many of which persisted long term with multi-lineage potential. Regarding potential implications for the safety of the treatment, we observed no clonal dominance events, no bias toward integration near cancer genes and no common insertion sites generated by genetic selection in any patient. The clonal dynamics of hematopoietic reconstitution of the different lineages showed that circulating lymphoid cells were oligoclonal at early time points and progressively switched to polyclonal after 6 months, whereas myeloid cells were polyclonal from the first time points. Estimates of HSC activity, obtained by mark-and-recapture statistics of IS observed over time in short-lived cells, showed that at earlier time points the population size was >26,000 cells, which then progressively stabilized to ~10,000 from 9 months post-transplantation, suggesting that the initial waves of reconstitution are sustained by short-lived progenitors. Our data indicate that the treatment results in a highly diversified polyclonal and multilineage reconstitution of hematopoiesis without signs of genotoxicity.
Joint work with Giulio Spinozzi, Paola Rancoita, Fabrizio Benedicenti, Daniela Cesana, Serena Acquati, Daniela Redaelli, Vanessa Attanasio, Francesca Fumagalli, Alessandro Aiuti, Alessandra Biffi, Luigi Naldini, Clelia Di Serio, and Eugenio Montini.
  • 12:00 Thalia Vlachou, IEO
Intra-tumoral Heterogeneity and Clonal Evolution in Xenograft Models of Acute Myeloid Leukemia (AML)
Acute myeloid leukemia (AML) is one of the most frequent hematological malignancies in adults, and still represents a disease with an unmet medical need, with 50-60% of patients relapsing within 3 years after diagnosis. AMLs are characterized by a high degree of intra-tumor heterogeneity, both at the biological and the genetic level, which is critical for tumor maintenance and response to treatments. Biologically, AMLs are organized hierarchically, with rare stem-like cells (leukemia stem cells, LSCs) endowed with the unique properties of self-renewal and differentiation. Genetically, AMLs harbor patient-specific combinations of different driver mutations, which are organized within individual cases in sub-clones with distinct growth properties. We hypothesized that tumor maintenance and relapse in AMLs are driven by the selective expansion of quiescent sub-clones within the LSC population, which serve as the genomic and functional reservoir of the tumor. The experimental strategy we employed to test this hypothesis was based on the xenotransplantation of human leukemias, the implementation of an in vivo clonal tracking approach, the functional isolation of leukemic subpopulations with diverse proliferation histories, whole-exome sequencing (WES) and single-cell RNA sequencing (scRNAseq) of bulk and isolated leukemic subpopulations. We identified two functional LSC classes, quiescent and cycling, that are in equilibrium in the tumor and largely share the same clonal architecture. We further observed that genetic leukemic clones appear to consist of a high number of individual LSCs, the majority of which exhaust upon serial transplantation. Finally, by genetic analyses of isolated leukemic subsets, we were able to detect a specific enrichment for rare mutations in the quiescent compartment of patient xenografts, which can be selected under the environmental pressure of chemotherapy. 
Our data indicate that tumor evolution is sustained by the quiescent LSC pool and suggest that long-term survival and propagation of individual LSC clones relies on their ability to preserve a quiescent compartment. Upon selective expansion of quiescent LSCs, minor genetic sub-clones carrying mutations associated with tumor aggressiveness and chemotherapy-resistance can emerge, suggesting a mechanism for the development of refractory relapse tumors.
  • 12:30 Giovanni Crosta, DSAT, Universita` degli Studi di Milano-Bicocca
Putting (single-cell) data into orbit
Data from single-cell mRNA sequencing, made available by leading-edge experimental methods, demand proper representation and understanding. Multivariate statistics and graph theoretic methods represent cells in a suitable feature space, assign to each cell a time label known as "pseudo-time" and display "trajectories" (in fact orbits) in such space. Orbits shall describe a process by which progenitors differentiate into one or more types of adult cells: broncho-alveolar progenitors are, e.g., found to evolve into two distinct pneumocyte types. This work aims at applying the qualitative theory of dynamical systems to describe the differentiation process. Some notions of qualitative theory are presented. The main stages of single-cell data analysis are outlined. Next, a two-dimensional, continuous-time, autonomous dynamical system of polynomial type is looked for, the orbits of which may interpret some sequences of data points in feature (= state) space. An energy function F of two variables, {sigma1,sigma2}, is defined and the autonomous dynamical system is obtained from ∇F, which thus generates a gradient flow. Both F and the gradient flow give rise to a phase portrait with two attractors, A and B, a saddle point, O, and a separatrix. These properties are suggested by data from single-cell sequencing. Initial states of the system correspond to progenitors. Attractors A and B correspond to the two cell types yielded by progenitor differentiation. The separatrix and the saddle point make sure an orbit asymptotically reaches either A or B. Why and how a gradient flow model shall be applied to data from single-cell sequencing is discussed. The application of dynamical system theory presented herewith relies on a heuristic basis, as all population dynamics models do. Nonetheless, placing a given cell on an orbit of its own enables time ordering and compliance with causality, unlike pseudo-time assignment induced by a minimum spanning tree.
An earlier (2009) application in a much simpler context, the evolving morphology of cytoskeletal tubulin, is finally recalled: from cyto-toxicity experiments, epifluorescence images of tubulin filaments were obtained, then analysed and assigned to morphology classes; class centroids formed a sequence in feature (= state) space describing loss of cytoskeletal structure followed by its recovery.
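The phase portrait described in this abstract (two attractors A and B, a saddle point O and a separatrix) can be illustrated with a small numerical sketch. The abstract does not give the energy function, so the polynomial F below, the sign convention ds/dt = -∇F, the step size and the function names are all assumptions chosen for illustration only; this is not the speaker's model.

```python
# Illustrative only: F is a hypothetical polynomial energy, not the one used in the talk.
# F(s1, s2) = (s1**2 - 1)**2 / 4 + s2**2 / 2
# Minima (attractors): A = (1, 0) and B = (-1, 0). Saddle point: O = (0, 0).
# Separatrix: the line s1 = 0, which divides the two basins of attraction.

def grad_F(s1, s2):
    """Gradient of the illustrative energy F."""
    return s1 * (s1 * s1 - 1.0), s2

def flow(s1, s2, dt=0.01, steps=5000):
    """Integrate ds/dt = -grad F(s) with explicit Euler steps and return the final state."""
    for _ in range(steps):
        g1, g2 = grad_F(s1, s2)
        s1, s2 = s1 - dt * g1, s2 - dt * g2
    return s1, s2

def attractor(s1, s2, tol=1e-3):
    """Label the limit of the orbit through (s1, s2): 'A', 'B', or 'O' (orbit on the separatrix)."""
    end1, end2 = flow(s1, s2)
    if abs(end1 - 1.0) < tol and abs(end2) < tol:
        return "A"
    if abs(end1 + 1.0) < tol and abs(end2) < tol:
        return "B"
    return "O"

if __name__ == "__main__":
    # Hypothetical "progenitor" states placed on either side of the separatrix.
    for start in [(0.1, 0.8), (-0.1, 0.8), (0.0, 0.8)]:
        print(start, "->", attractor(*start))
```

In this toy system an initial state slightly to the right of the separatrix flows to A and one slightly to the left flows to B, while a state exactly on the separatrix slides into the saddle point.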


  • 13:00 LUNCH/PRANZO


  • 14:30 Giovanni Cazzaniga, Centro Ricerca Tettamanti, Universita` degli Studi di Milano-Bicocca
The NGS-based diagnostic and prognostic workup of childhood leukemia
Background. Next Generation Sequencing (NGS) methods have contributed to identifying prognostic markers and setting up personalized therapy in a cost- and time-effective manner in several contexts. In the new AIEOP-BFM ALL2017 protocol for childhood acute lymphoblastic leukemia, an early stratification of patients is needed to investigate the efficacy of therapies directed at specific genetic lesions.
Objectives. The main goal of diagnostics and monitoring in the new AIEOP-BFM-ALL 2017 protocol is to provide a rapid and multi-comprehensive strategy for clinical decision making, in particular for actionable lesions and precision medicine.
Methods. NGS was developed within the Euroclonality-NGS consortium to identify clonal IG/TR gene rearrangements for MRD quantification at day +33, with the purpose-built bioinformatics tools ARResT/Interrogate and ViDjil for IG/TR NGS datasets. NGS-digital-MLPA (dMLPA) has been developed to recognize the 'IKZF-plus' patient subgroup by day +33. Moreover, the multiplex RT-PCR used in routine diagnostics for known fusion genes has been improved to detect TCF3-HLF/t(17;19) transcripts, which are associated with poor prognosis and require early intensive therapy. In addition, probe-based RNA Target-Capture NGS has been developed to identify translocations of recurrent genes with any partner gene, using the TruSight RNA Pan Cancer Library Prep targeting 1385 cancer-associated genes (Illumina). A bioinformatics strategy has been applied for fusion gene detection in NGS datasets, employing both a ready-to-use web-based platform (BaseSpace, Illumina Cloud) and an in-house bioinformatics method named BreakingPoint.
Results. Since May 2017, we analyzed 226 patients; IG/TR NGS screening identified more markers than conventional methods. Indeed, 1439 IG/TR rearrangements were identified by NGS, with a mean of 6.37 clones/pt (range 0-15) and a mean response time of 14.6 days from diagnosis (range 7-26 days). In 5 out of 226 (2.2%) cases, no IG/TR markers were identified: 3/5 were very immature T-ALL and 2/5 were BCP-ALL (1 BII in addition to 1 case with 2.5% blasts). A total of 86 samples were analyzed in parallel with conventional MLPA and dMLPA to detect Ikaros-plus patients, obtaining 98.8% concordance (only 1/86 discordant) and overall 85% concordant results on Copy Number Variation analysis. In a cohort of 261 patients, selected by either MRD at TP1 (d33) 5x10^-4 or relapse, RNA-targeted analysis detected 109 fusions, involving recurrent genes such as ETV6, NUP214, BCL9, EBF1, MLL, TCF3 (two cases with TCF3/HLF), ZNF384, PAX5 and JAK2.
Conclusion. NGS allows the identification of translocations as well as IG/TR rearrangements, while digital MLPA detects copy number alterations (CNAs) associated with Ikaros-plus. With these new combined NGS-based methods, in addition to routine diagnostics, it is possible to fine-tune risk stratification and treatment for genetically defined subgroups, for which a specific experimental arm will be available within a controlled clinical protocol.
Joint work with Grazia Fazio, Simona Songia, Andrea Grioni, Silvia Rigamonti, Barbara Buldini, Chiara Palmi, Valentino Conter, Giuseppe Basso, Andrea Biondi.
  • 15:00 Dario Pescini, Statistica, Universita` degli Studi di Milano-Bicocca
Single-cell RNA-Seq and Metabolism Modelling
  • 15:30 Luca Denti, DISCo, Universita` degli Studi di Milano-Bicocca
MALVA: genotyping by Mapping-free ALlele detection of known VAriants
The amount of genetic variation discovered and characterized in human populations is huge and is growing rapidly with the widespread availability of modern sequencing technologies. This wealth of variation data, which accounts for human diversity, leads to various challenging computational tasks, including variant calling and genotyping of newly sequenced individuals. The standard pipelines for addressing these problems include read mapping, which is a computationally expensive procedure. A few mapping-free tools were proposed in recent years to speed up the genotyping process. While such tools have highly efficient run-times, they focus on isolated, bi-allelic SNPs, providing limited support for multi-allelic SNPs, indels, and genomic regions with high variant density.
To address these issues, we introduce MALVA, a fast and lightweight mapping-free method to genotype an individual directly from a sample of reads. MALVA is the first mapping-free tool that is able to genotype multi-allelic SNPs and indels, even in high-density genomic regions, and to effectively handle a huge number of variants such as those provided by the 1000 Genomes Project. An experimental evaluation on whole-genome data shows that MALVA requires one order of magnitude less time to genotype a donor than alignment-based pipelines, while providing similar accuracy. Remarkably, on indels, MALVA provides even better results than the most widely adopted variant discovery tools.
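As a rough illustration of what "mapping-free" genotyping of a known variant can look like, the toy sketch below counts k-mers in the reads and checks which alleles have their allele-specific k-mers present. It is not MALVA's algorithm or interface: the function names, the flanking sequences, the value of k and the min_count threshold are all hypothetical, and the calling rule is deliberately simplistic.

```python
from collections import Counter

def kmers(seq, k):
    """All substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def allele_signature(left, allele, right, k):
    """k-mers overlapping the allele when placed between its flanks (assumes k >= 2)."""
    context = left[-(k - 1):] + allele + right[:k - 1]
    return set(kmers(context, k))

def genotype(reads, left, right, alleles, k=5, min_count=2):
    """Return the sorted alleles whose allele-specific k-mers all occur >= min_count times.

    Toy rule: no read mapping, no sequencing-error model, no genotype likelihoods.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    signatures = {a: allele_signature(left, a, right, k) for a in alleles}
    supported = []
    for allele, sig in signatures.items():
        # Discard k-mers shared with other alleles: they cannot tell alleles apart.
        others = set().union(*(s for b, s in signatures.items() if b != allele))
        unique = sig - others
        if unique and min(counts[km] for km in unique) >= min_count:
            supported.append(allele)
    return sorted(supported)

if __name__ == "__main__":
    left, right = "ACGTTGCA", "GGATCCTA"   # hypothetical flanking sequences
    hap_a = left + "A" + right             # haplotype carrying allele A
    hap_t = left + "T" + right             # haplotype carrying allele T
    print(genotype([hap_a, hap_a, hap_t, hap_t], left, right, ["A", "T"]))
    print(genotype([hap_a] * 4, left, right, ["A", "T"]))
```

Because the signature is built from the allele plus its flanks, the same function accepts more than two alleles and an empty string for a deletion, which hints at how a k-mer approach can extend beyond isolated bi-allelic SNPs.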


  • 16:00 COFFEE BREAK


  • 16:30 Simone Ciccolella, DISCo, Universita` degli Studi di Milano-Bicocca
Inferring Cancer Progression from Single-cell Sequencing while Allowing Mutation Losses
In recent years, the well-known Infinite Sites Assumption (ISA) has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging Single-cell Sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. Still, the established computational methods that infer phylogenies with mutation losses are not yet sufficiently mature. To address this problem, we present SASC (Simulated Annealing Single-Cell inference), a new and robust tool based on simulated annealing for the inference of cancer progression from SCS data sets. In particular, we introduce the Dollo-k model, an extension of the model of evolution in which mutations are only accumulated that also allows a limited amount of mutation loss in the evolutionary history of the tumor.
  • 17:00 Anna Sandionigi, ZooPlantLab, BtBs, Universita` degli Studi di Milano Bicocca
The dark side of biodiversity. Possibilities and limits to the estimation and classification of living organisms
The possibility to increase the available data, the accessibility of new computational resources and the strong reduction of DNA sequencing costs are improving our ability to reduce the margin of error around biodiversity estimation. However, in most cases, the data generated through metagenomic amplicon approaches contain more information than we can eventually use. This is due, very often, to incorrect or missing classification of the entries and a lack of useful associated metadata. New integrated tools that incorporate these pieces of information can lead to better interpretation of DNA metabarcoding data. It is possible that we do not know what is already present in our archives: what we are looking for is a smart way to make it emerge.
  • 17:30 Isabella Castiglioni, IBFM-CNR
Radiomics: Repeatability, Reproducibility, Significance
We present the current and future challenges of the new "Personalized Medicine" made available by Radiomics; in particular we will discuss the quality and quantity criteria for data that must be respected to guarantee the stability of results.
  • 18:00 Day Closing and Final Greetings, Alex Graudenzi, IBFM-CNR



Registration and Coffee!!!!!

Participation is free as well as coffee. Participants will just be asked to provide contact information.

Register here.