Over several years of work in bioinformatics I have gained experience in engineering different types of bioinformatics databases (annotation of DNA and protein sequences). I have developed several complete software packages, covering all the relevant aspects of computer science and bioinformatics:
- architecture of data, structure of databases;
- automated processes and iterative algorithms;
- single-record manipulation and bulk analysis, grouping, filtering;
- interactive graphical visualization.
The most notable work I have been involved in is the development of the published "BioinfoTools: a kinase-oriented database integrated with a collection of tools for data manipulation". The project brings together all of the skills listed above, which are described in more detail below.
I work with databases of orthologous sequences from Chimpanzee, Mouse, Rat, C. elegans, Drosophila melanogaster, etc. The data is structured using relational technologies, Oracle, XML, MS Excel, SQL and normal forms. Some of the automated database processes I have developed are:
- conversion between the data formats of different databases (LocusLink, etc.) and between different microarray platforms (Amersham, Affymetrix, etc.);
- development of databases of heterogeneous cross-species annotations, based on multiple alignments of segments of genetic and protein sequences (orthologous as well as paralogous) from various species.
Automated analysis processes include:
- development of algorithms for extracting alignments generated by the BLAST program (and its many variants). As described in the publication "Blast Parser: A new interactive and fully automated tool for parsing BLAST output", the software also performs pre-analysis of the alignments (e.g. warnings on E-value, fragmented alignments, etc.) and presents the data in grids and reports;
- automatic identification of protein sequences and their assignment to the relevant family class (kinase, glycoside hydrolase, etc.);
- iterative algorithms that traverse branched networks looking for special conditions, and construction of new data structures;
- elaboration of statistics based on preprocessed data.
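As an illustration of the kind of pre-analysis mentioned above (E-value warnings, fragmented alignments), here is a minimal sketch in Python. It is not the Blast Parser code: it assumes the input is BLAST tabular output with the 12 standard columns, and the function names, the `max_evalue` threshold and the sample rows are hypothetical. "Fragmented" is taken here to mean more than one aligned segment for the same query/subject pair.

```python
import csv
import io
from collections import defaultdict

# The 12 standard columns of BLAST tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_blast_tabular(text):
    """Parse tab-separated BLAST hits into a list of dicts."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["length"] = int(hit["length"])
        hits.append(hit)
    return hits


def pre_analysis(hits, max_evalue=1e-5):
    """Return warnings for weak E-values and fragmented alignments.

    max_evalue is an illustrative threshold, not a recommended value.
    """
    warnings = []
    segments = defaultdict(int)
    for hit in hits:
        segments[(hit["qseqid"], hit["sseqid"])] += 1
        if hit["evalue"] > max_evalue:
            warnings.append(
                f"weak E-value {hit['evalue']:g}: "
                f"{hit['qseqid']} vs {hit['sseqid']}")
    for (query, subject), count in segments.items():
        if count > 1:
            warnings.append(
                f"fragmented alignment ({count} segments): "
                f"{query} vs {subject}")
    return warnings


# Made-up sample data, for demonstration only.
SAMPLE = (
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t380\n"
    "q1\ts2\t45.0\t60\t33\t0\t10\t69\t5\t64\t0.5\t30.1\n"
    "q2\ts3\t90.0\t80\t8\t0\t1\t80\t1\t80\t1e-20\t150\n"
    "q2\ts3\t88.0\t50\t6\t0\t120\t169\t100\t149\t1e-10\t90\n"
)

for warning in pre_analysis(parse_blast_tabular(SAMPLE)):
    print(warning)
```

The parsed hits are plain dictionaries, so the same rows could feed a grid or a report without further conversion.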
I develop complete user interfaces for graphical representation and interaction with data:
- interactive grids having functions like boolean filtering, clustering, multi-column sorting;
- visualization of phylogenetic structures of protein families by polar and linear graphical representation, with flexibility in the interaction with the tree (annotation, live notes, etc.). This work is described in the publication "PoInTree: A Polar and Interactive Phylogenetic Tree";
- visualization of genome structure, including the 5' upstream regions, showing repeats, transcription factor binding sites, tandem repeats, splice isoforms, promoters, enhancers, insulators and other regulatory modules. These visualization systems use specially designed languages as an intermediary between the biological databases and the visualizer itself;
- graphics based on statistical analyses, e.g. histograms, lines, pies, surfaces, etc.
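The grid functions listed above (boolean filtering, multi-column sorting) can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the grids themselves; the row layout, the column names and the helper functions are hypothetical.

```python
from operator import itemgetter


def filter_rows(rows, predicate):
    """Keep only the rows for which the boolean predicate is true."""
    return [row for row in rows if predicate(row)]


def sort_rows(rows, keys):
    """Sort by several columns.

    keys is a list of (column, descending) pairs, most significant first.
    Python's sort is stable, so sorting by the least significant key first
    and the most significant key last gives the combined ordering.
    """
    result = list(rows)
    for column, descending in reversed(keys):
        result.sort(key=itemgetter(column), reverse=descending)
    return result


# Made-up rows, for demonstration only.
ROWS = [
    {"name": "P1", "family": "kinase", "species": "mouse", "length": 350},
    {"name": "P2", "family": "glycoside hydrolase", "species": "rat", "length": 420},
    {"name": "P3", "family": "kinase", "species": "rat", "length": 510},
    {"name": "P4", "family": "kinase", "species": "mouse", "length": 610},
]

# Boolean filter: kinases that are either from mouse or longer than 500.
selected = filter_rows(
    ROWS,
    lambda r: r["family"] == "kinase"
    and (r["species"] == "mouse" or r["length"] > 500),
)

# Sort by species (ascending), then by length (descending).
for row in sort_rows(selected, [("species", False), ("length", True)]):
    print(row["name"], row["species"], row["length"])
```

Expressing the filter as an ordinary predicate keeps AND, OR and NOT combinations in one place, and the list of `(column, descending)` pairs maps directly onto the order in which a user would pick sort columns in a grid.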
Most of the projects described can be found at: geneproject.altervista.org
My Curriculum Vitae (in Italian), updated in December 2007.
Email: mcarreras [ atsign ] rambler [dot] ru
Tel. (Italy): +39 0384 253579
- DISCo (Dipartimento di Informatica, Sistemistica, Comunicazione) - Università di Milano-Bicocca (UNIMIB) - Milano, Italy
- Laboratory of Bioinformatics - State Research Institute of Genetics And Selection of Industrial Microorganism (GosNIIgenetika) - Moscow, Russia