Algorithmic research can have a number of applications. The main fields of applications of the research done in the lab are listed.
Some of the problems we are currently investigating are alternative splicing prediction, haplotyping, and probe selection.
The investigation of genetic differences among humans has given evidence that mutations in DNA sequences are responsible for some genetic diseases. The most common mutation is the one that involves only a single nucleotide of the DNA sequence, which is called a single nucleotide polymorphism (SNP). As a consequence, computing a complete map of all SNPs occurring in the human populations is one of the primary goals of recent studies in human genomics.
The study of microbial communities requires to analyze populations of ribosomal RNA gene (rDNA) clones by hybridization experiments on DNA microarrays. Unlike in the classical SBH (sequencing by hybridization) procedure, where multiple probes are on a DNA chip, in this application we perform a series of experiments, each one consisting of applying a single probe to a DNA microarray containing a large sample of rDNA sequences from the studied population. The overall cost of the analysis is thus roughly proportional to the number of experiments, underscoring the need for minimizing the number of probes. We have developed some efficient algorithms for solving such problem, and our preliminary tests demonstrate that those algorithms are able to find satisfactory probe sets for real rDNA data.
For some problems, decomposing a graph allows for the application of a divide-and-conquer technique to design efficient algorithms. A kind of decomposition that we have studied is called modular decomposition.
Correlation Clustering is a model for clustering data when a binary relationship between data points is known. More precisely, for each pair of points we have two scores measuring the similarity and dissimilarity respectively, of the two points, and we would like to compute an optimal partition where the value of a partition is obtained by summing up scores of pairs involving points from a same cluster and scores of pairs involving points from different clusters. The main difference between Correlation Clustering and, say, k-means is that for the former the number of sets of the solution is not known a priori.
A closely related problem is Consensus Clustering, where we are given a set of partitions and we would like to obtain a partition that best summarizes the input partitions. We have investigate the computational complexity of both problems and we have designed some efficient approximation algorithms.