December 11, 2017, Monday

From BIMIB

Publications/COR2009

SUPPLEMENTARY MATERIAL

Software Documentation

The software is written in the MathWorks MATLAB language, which is platform independent.

It is composed of code (.m) files and GUI (.fig) files. It consists of 4 independent computational modules that implement the described mathematical theories: Functional Enrichment, Kernel Method, Jaccard Method and Contingency. The modules are chained together like the gears of a machine, but each one is standalone and can be run on its own. Each module takes files as input and produces other files as output, so the modules are linked by the flow of files:

  • Gene clusters -> Functional Enrichment
  • Functional Enrichment -> Jaccard/Kernel method
  • clusters, Jaccard/Kernel method -> Contingency
  • Each module is provided with a GUI that allows the user to import the files produced by the preceding module and to set the options described by the mathematical theories

The Functional Enrichment module requires as input one tab-delimited microarray file formatted in two columns: the gene name and the cluster it belongs to (cluster 0 is skipped). Two thresholds can be set in the GUI to act as filters: one discards GO Terms having fewer than 4 genes; the other discards poorly annotated genes, those falling below the threshold. The module produces as output one file with the arbitrary name provided in the GUI.
The Kernel module requires as input multiple Functional Enrichment files. It produces as output a number of window files reporting the connections between clusters of adjacent windows.
The Jaccard module requires as input multiple Functional Enrichment files. It produces as output a number of window files reporting the connections between clusters of adjacent windows, and a file reporting the computed Jaccard coefficients.
The Contingency module requires as input multiple Microarray files and multiple Cluster/Window files; if necessary, one additional file can be supplied listing the GO Terms to be used during the computation (tab delimited and formatted in three columns: the window number, the step number and the GO Term). The module produces as output a number of contingency files reporting the count of genes for each connection and each GO Term.
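
The two-column, tab-delimited input of the Functional Enrichment module can be read as in the minimal sketch below. It is written in Python for illustration only and is not the MATLAB code of the software; the function name `read_clusters` and the sample gene names are hypothetical, and the file is assumed to have no header row.

```python
import csv
import io
from collections import defaultdict


def read_clusters(handle):
    """Map each cluster label to its gene names, skipping cluster 0."""
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # ignore blank or malformed lines
        gene, label = row[0].strip(), row[1].strip()
        if label == "0":
            continue  # the zero cluster is skipped, as stated in the text
        clusters[label].append(gene)
    return dict(clusters)


# Hypothetical sample content: gene name, tab, cluster label.
sample = "YAL001C\t1\nYAL002W\t2\nYAL003W\t0\nYAL005C\t1\n"
clusters = read_clusters(io.StringIO(sample))
print(clusters)
```

In practice the handle would come from `open(path, newline="")` instead of the in-memory string used here.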

System Requirements:

  • MathWorks MATLAB 7+ (base pack) language
    OS: Windows, Linux, Solaris, Mac
    RAM: 300+ MB
    HDD: 50 MB
    CPU: 1.0+ GHz


Functional Enrichment Module

For each cluster, it performs Gene Ontology Functional Enrichment Analysis of its genes

Input: Clusters
Output: Functional Enrichment
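
This page does not state which statistical test the module applies. A hypergeometric tail probability is one common choice for GO enrichment, and the Python sketch below assumes it purely for illustration; it is not the MATLAB implementation, and the function name and parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Probability of seeing at least `hits` annotated genes in a cluster.

    population   -- total number of genes on the array
    annotated    -- genes in the population annotated with the GO term
    cluster_size -- number of genes in the cluster
    hits         -- genes in the cluster annotated with the GO term
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers, not taken from any dataset on this page.
p = enrichment_p_value(population=10, annotated=3, cluster_size=4, hits=2)
print(p)
```

A small p-value would suggest that the GO term is over-represented in the cluster relative to the whole array.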



Distance Module


It computes the similarity between clusters by evaluating the tree distance between the previously calculated Gene Ontology Directed Acyclic Graphs (GO-DAGs), using the Jaccard and Kernel functions.


Input: Functional Enrichment
Output: Cluster pairs in contiguous windows


JACCARD
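
As a rough illustration of the Jaccard coefficient, the Python sketch below compares two clusters under the assumption that each cluster's GO-DAG is summarized by the set of its GO Term identifiers. That representation, the function names and the sample identifiers are assumptions made for this example; the software itself is written in MATLAB.

```python
def jaccard(terms_a, terms_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two GO Term collections."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def pairwise_jaccard(window_a, window_b):
    """Coefficients for every cluster pair across two adjacent windows."""
    return {
        (name_a, name_b): jaccard(terms_a, terms_b)
        for name_a, terms_a in window_a.items()
        for name_b, terms_b in window_b.items()
    }


# Illustrative GO Term sets for one cluster in each of two windows.
window_1 = {"c1": {"GO:0006096", "GO:0006099", "GO:0008152"}}
window_2 = {"c1": {"GO:0006099", "GO:0008152", "GO:0006412"}}
scores = pairwise_jaccard(window_1, window_2)
print(scores)
```

A coefficient of 1 means the two clusters share all their GO Terms, and 0 means they share none.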


KERNEL


GO Prefiltering Module

It selects the Gene Ontology annotations to be used in the subsequent contingency table calculation

Input: Cluster pairs in contiguous windows
Output: Selected GO terms



Contingency Module

It builds the contingency table of the distribution of the GO terms selected in the previous module

Input: Cluster pairs in contiguous windows and selected GO terms
Output: Contingency table



Evaluate Module 


It evaluates the performance of the Kernel and Jaccard methods using the associated entropy

Input: Contingency table
Output: Final Score 
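
How the final score is derived from the entropy is not specified on this page. As a hedged illustration only, the Python sketch below computes the Shannon entropy of a vector of counts, such as one row of a contingency table; the function name is hypothetical and this is not the scoring code of the MATLAB module.

```python
from math import log2


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # convention chosen here for an empty row
    # Zero counts contribute nothing, so they are skipped.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


even = shannon_entropy([2, 2])     # genes split evenly between two cells
skewed = shannon_entropy([4, 0])   # all genes fall in one cell
print(even, skewed)
```

Lower entropy indicates that the genes concentrate in few cells of the table, while higher entropy indicates a more even spread.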
 



Pre-processing and processing data results

Diauxic Shift Dataset

The diauxic shift is the switch from anaerobic to aerobic metabolism after glucose exhaustion. DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used to carry out a comprehensive investigation of the temporal program of gene expression accompanying the metabolic shift from fermentation to respiration.

Seven distinct temporal patterns of induction were observed.
Timecourse Dataset


DIAUXIC-SHIFT DATASET PRE-PROCESSING

Cluster Analysis Results
(method: K-Means; number of clusters: 4)

The K-means method was used. It is a partitioning algorithm with a preset number k of clusters that tries to minimize the sum of within-cluster variances. The algorithm chooses a random sample of k different objects as the initial cluster midpoints, then alternates between two steps until convergence:
1. Assign each object to the closest of the k midpoints with respect to Euclidean distance
2. Calculate k new midpoints as the averages of all points assigned to each of the old midpoints

The number of clusters was fixed at 4.
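
The two alternating steps can be sketched as follows. This is a minimal pure-Python illustration, not the MATLAB routine used for the analysis; the function name, the `seed` and `max_iter` parameters and the toy points are assumptions, and the toy example uses k = 2 instead of the 4 clusters used in the study.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means: returns (labels, midpoints) for a list of points."""
    rng = random.Random(seed)
    # Start from a random sample of k different objects.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each object to its closest midpoint. Squared
        # Euclidean distance gives the same ordering as the distance itself.
        new_labels = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Step 2: recompute each midpoint as the average of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old midpoint
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```

Because the starting midpoints are random, different seeds can give different label numbering and, on less clearly separated data, different partitions.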

Functional Enrichment Results


Yeast Sporulation Dataset

Diploid cells of budding yeast produce haploid cells through the developmental program of sporulation, which consists of meiosis and spore morphogenesis. DNA microarrays containing nearly every yeast gene were used to assay changes in gene expression during sporulation. Seven distinct temporal patterns of induction were observed. In our study, the dataset was used in its original form, without excluding any sample.

YEAST SPORULATION DATASET PRE-PROCESSING

Timecourse Dataset

Cluster analysis Results:
(method: k-means, number of clusters: 4)


The k-means clustering method was also used for this dataset, with the number of clusters set to 4.

Functional Enrichment Results:


DIAUXIC-SHIFT & YEAST SPORULATION PROCESSING

JACCARD METHOD

Jaccard Function Application

Prefiltering GO

Contingency

Evaluation


KERNEL METHOD

Kernel Function Application

Prefiltering GO

Contingency

Diauxic Shift & Yeast Sporulation Kernel Contingency Table_1-1

Evaluation


Spellman Cell Cycle Dataset

PRE-PROCESSING

Explorative Analysis

Timecourse Dataset

Cluster Analysis Results

Functional Enrichment Results