May 17, 2012, Thursday

From BIMIB

Jump to: navigation, search

Pirola Yuri

I am a joint post-doctoral fellow at DISCo, University of Milano-Bicocca (Milan) and Parco Tecnologico Padano (Lodi) working on combinatorial problems arising from bioinformatics and computational biology, with a special emphasis on haplotyping and gene structure prediction problems. My main interests involve the study of the computational complexity of optimization problems, the design of exact/approximation/parameterized algorithms for their resolution, and their implementation and experimentation.

I received a PhD in Computer Science in 2010 with a thesis on combinatorial problems in studies of genetic variations under the supervision of Prof. Paola Bonizzoni. I received my master degree in Computer Science in 2006 working on a thesis on problem difficulty measurement in Genetic Programming under the supervision of Leonardo Vanneschi and Giancarlo Mauri.

My Curriculum Vitae provides an up-to-date and detailed overview of my scientific interests and activities (Italian, last update April 12, 2012).

My education includes a summer school organized by CINECA on Parallel and Scientific Computing which I attended in 2006 (information on new editions are available here) and the Lipari International Summer School on Bioinformatics and Computational Biology of June 2008.

I have been a Visiting Scholar at University of California, Riverside invited by Prof. Tao Jiang from February to July 2009.


Contents

Talks

  • A fast and practical approach to genotype phasing and imputation on a pedigree with erroneous and incomplete information, ICCABS 2012 (slides).
    This work proposes the Min-Recombinant Haplotype Configuration with Bounded Errors problem (MRHCE), which extends the original Min-Recombinant Haplotype Configuration formulation by incorporating two common characteristics of real data: errors and missing genotypes (including untyped individuals). We describe a practical algorithm for MRHCE that is based on a reduction to the Satisfiability problem (SAT) and exploits recent advances in the constraint programming literature. An experimental analysis demonstrates the soundness of our model and the effectiveness of the algorithm under several scenarios.
    The analysis on real data and the comparison with state-of-the-art programs reveals that our approach couples better scalability to large and complex pedigrees with the explicit inclusion of genotyping errors into the model.
    The software, released under the GNU General Public License, can be freely downloaded from this page.
  • PIntron: a fast method for gene structure prediction via maximal pairings of a pattern and a text, ICCABS 2011 (slides).
    In this work, we propose a novel pipeline for computational gene-structure prediction based on spliced alignment of expressed sequences (ESTs and mRNAs). This pipeline, called PIntron, is composed by four steps: Firstly, alternative alignments of expressed sequences to a reference genomic sequence are implicitly computed and represented in a graph (called embedding graph) by a novel fast spliced alignment procedure. Secondly, biologically meaningful alignments are extracted. Then, a consensus gene structure induced by the previously computed alignments is determined based on a parsimony principle. Finally, the resulting introns are reconciliated and classified according to general biological criteria.
    The software, released under the GNU Affero General Public License, can be freely downloaded from this page.
  • Haplotype Inference on Pedigrees with Recombinations and Mutations, WABI 2010 (slides).
    The work proposes a new combinatorial formulation for haplotype inference on general pedigrees that generalizes the existing combinatorial ones to a more realistic settings. We prove an approximation-preserving reduction from this problem, called Minimum Change Haplotype Configuration (MCHC), to a well-known coding theory problem, namely the Nearest Codeword Problem. This reduction automatically implies the approximability of MCHC within a factor O(n/log n) and, more importantly, it leads to a new very efficient heuristic algorithm that has been experimentally compared with other state-of-the-art HI methods.
    The software implementing the heuristic is freely available at this page under the GNU General Public License.
  • Minimum Factorization Agreement of Spliced ESTs, WABI 2009 (slides).
    The work presents a new parsimony-based formulation for the problem of choosing the best alignment of an expressed sequence against a genomic sequence exploiting the redundancy of current expressed sequence databases. A preliminary experimental evaluation of the formulation and the related algorithm shows their applicability on real instances.
    The implementation of this approach is integrated in our software PIntron.
  • Pure Parsimony Xor Haplotyping, ISBRA 2009 (slides).
    In this work we addressed the problem of haplotype inference from xor genotypes under the pure parsimony assumption. Exact algorithms for restricted instances, a fixed parameter algorithm, an approximation algorithm, and an effective heuristic have been proposed.
    A prototypical implementation of the heuristic is freely available at this page under the GNU General Public License.


Software

The development of most of the software projects is based on GitHub. Please refer to my GitHub page or to the AlgoLab GitHub profile for a more complete description of the projects, along with their documentation.

Recently, I contributed to the following projects:

  • reHCstar, an exact algorithm for genotype phasing and imputation on a pedigree with erroneous and incomplete information.
  • PIntron, a computational gene-structure prediction pipeline.
  • Heu-MCHC, a fast and accurate heuristic algorithm for the haplotype inference problem on pedigree data with recombinations and mutations.
  • PPXH-GR, a fast heuristic for the Pure-Parsimony Xor-Haplotyping problem.
  • PXH-GA, an implementation of Genetic Algorithms to solve the Pure Parsimony Xor-Haplotyping Problem, written for the PhD course on Evolutionary Algorithms.


Education

PhD

My PhD thesis in Computer Science was titled "Combinatorial Problems in Studies of Genetic Variations: Haplotyping and Transcript Analysis", and is freely available for download from the institutional repository. Notice that the content has been submitted to international conferences and accepted for publication. Please refer to the list of my publications.

The presentation I held for the admission to the final exam is available here (English) and the one for the final PhD thesis defense is available here (English). The progress of my research at the end of the 2nd year is described here (Italian) and the slides that I used to present it at the "Collegio Docenti" are here (English). A draft of my PhD thesis proposal can be found here (Italian). The presentation of the proposal at the "Collegio Docenti" is here (Italian).

A summary of my activities during my first year as a PhD student can be found here (Italian) while the activities of my second year are summarized here (Italian).

The presentations that I held at the end of a few PhD courses are (in reverse chronological order):

  • course: "Biomolecular Computing: Theory and Experiments", teacher: Natasha Jonoska (visiting researcher). The final assignment was the analysis of an article: I prepared a written report and an oral presentation;
  • course: "Teoria e Applicazioni del Calcolo Evoluzionistico", teacher: Leonardo Vanneschi, presentation here (Italian). See below for the source code of the program used for the experimentation;
  • course: "Web Services: concetti, applicazioni e problemi aperti", teacher: Flavio De Paoli, presentation here (Italian).


Master

The title of my master thesis was "Analisi della neutralità degli spazi di ricerca booleani in Programmazione Genetica" (English: "Analysis of Neutrality in GP Boolean Fitness Landscapes"). A copy of the abstract in English and in Italian is available. The presentation that I used to "defend" my thesis can be found here (Italian).


Teaching


Publications

(Preprints of the following papers and articles are available upon request)

Articles

Yuri Pirola, Paola Bonizzoni & Tao Jiang. An Efficient Algorithm for Haplotype Inference on Pedigrees with Recombinations and Mutations. IEEE/ACM Transactions on Computational Biology and Bioinformatics Vol. 9 (1) pp. 9-25 (2012). [url]

Leonardo Vanneschi, Yuri Pirola, Giancarlo Mauri, Marco Tomassini, Philippe Collard & Sebastien Verel. A study of the neutrality of Boolean function landscapes in genetic programming. Theoretical Computer Science Vol. 425 pp. 34-57 (2012). [url]

Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi & Yuri Pirola. Variants of constrained longest common subsequence. Information Processing Letters Vol. 110 (20) pp. 877-881 (Sep, 2010). [arXiv version]   [url]

Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola & Romeo Rizzi. Pure Parsimony Xor Haplotyping. IEEE/ACM Transactions on Computational Biology and Bioinformatics Vol. 7 (4) pp. 598-610 (Oct-Dec, 2010). [arXiv version]   [url]

Gianluca Della Vedova, Riccardo Dondi, Tao Jiang, Giulio Pavesi, Yuri Pirola & Lusheng Wang. Beyond Evolutionary Trees. Natural Computing (Sep, 2009). [url]

Paola Bonizzoni, Giancarlo Mauri, Graziano Pesole, Ernesto Picardi, Yuri Pirola & Raffaella Rizzi. Detecting Alternative Gene Structures from Spliced ESTs: A Computational Approach. Journal of Computational Biology Vol. 16 (1) pp. 43-66 (Jan, 2009). [url]

In Proceedings

Yuri Pirola, Gianluca Della Vedova, Stefano Biffani, Alessandra Stella & Paola Bonizzoni. A fast and practical approach to genotype phasing and imputation on a pedigree with erroneous and incomplete information. In Computational Advances in Bio and medical Sciences, 2nd IEEE Int. Conference, ICCABS 2012, Las Vegas NV, USA, Feb. 23-201225, 2012. IEEE. (2012).

Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi & Yuri Pirola. Parameterized Complexity of k-Anonymity: Hardness and Tractability. In Combinatorial Algorithms, 21st Int. Workshop, IWOCA 2010, London UK, Jul 26-28, Revised selected papers. C. Iliopoulos and W. Smyth (Ed.). Springer. Vol. 6460 pp. 242-255 (2011). [url]

Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola & Raffaella Rizzi. PIntron: a fast method for gene structure prediction via maximal pairings of a pattern and a text. In Computational Advances in Bio and medical Sciences, 1st IEEE Int. Conference, ICCABS 2011, Orlando FL, USA, Feb 3-5, 2011. IEEE. pp. 33-39 (2011). [url]

Yuri Pirola, Paola Bonizzoni & Tao Jiang. Haplotype Inference on Pedigrees with Recombinations and Mutations. In Algorithms in Bioinformatics, 10th Int. Workshop, WABI 2010, Liverpool UK, Sep 6-8, 2010, Proceedings. V. Moulton and M. Singh (Ed.). Springer. Vol. 6293 pp. 148-161 (2010). [url]

Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola & Raffaella Rizzi. Minimum Factorization Agreement of Spliced ESTs. In Algorithms in Bioinformatics, 9th Int. Workshop, WABI 2009, Philadelphia PA, USA, Sep 12-13, 2009, Proceedings. Springer. Vol. 5724 pp. 1-12 (Sep, 2009). [url]

Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola & Romeo Rizzi. Pure Parsimony Xor Haplotyping. In Bioinformatics Research and Applications, 5th Int. Symposium, ISBRA 2009, Fort Lauderdale FL, USA, May 13-16, 2009, Proceedings. Ion I. Mandoiu and Giri Narasimhan and Yanqing Zhang (Ed.). Springer. Vol. 5542 pp. 186-197 (May, 2009). [url]

Leonardo Vanneschi, Marco Tomassini, Philippe Collard, Sebastien Verel, Yuri Pirola & Giancarlo Mauri. A Comprehensive View of Fitness Landscapes with Neutrality and Fitness Clouds. In Genetic Programming, 10th European Conference, Proceedings. Marc Ebner et al. (Ed.). Springer. Vol. 4445 pp. 241-250 (2007). [url]

Leonardo Vanneschi, Yuri Pirola & Philippe Collard. A quantitative study of neutrality in GP boolean landscapes. In Genetic and Evolutionary Computation, 8th Annual Conference, GECCO 2006, Proceedings. ACM Press. pp. 895-902 (2006). [Free official version]   [url]

Phd Thesis

Yuri Pirola. Combinatorial Problems in Studies of Genetic Variations: Haplotyping and Transcript Analysis. Univ. degli Studi di Milano-Bicocca, Milan, Italy. (2010). [url]


Contacts

Address:
Stanza 1004
c/o Edificio U14
Dip. di Informatica Sistemistica e Comunicazione
Viale Sarca, 336
I-20126 Milano

Phone: +39 02 6448 7879

Email: yuri.pirola -as usual- disco.unimib.it


Links


License

This page and all the material here available are licensed under the Creative Commons Attribuzione-Non commerciale-Non opere derivate 2.5 Italia License.