Analysis & Bioinformatics
The Microarray Centre also offers analysis and bioinformatics-related services. Our Bioinformatics Team provides
many services, such as discussion and analysis of your array-based project and experimental design assessments,
complimentary to UHNMAC customers and UHN researchers.
- Take a look through our databases: human CpG, mouse CpG and cDNA data centres
- Contact our Bioinformatics Manager, Carl Virtanen, at
- Current UHNMAC clients can use our secure User Portal to retrieve data and check the status of a project
Visit our Bioinformatics Website
Data analysis for microarray projects
A basic data analysis package is available on a per-project basis. Our data analysis service is also available to customers with microarray data obtained from other facilities. The basic data analysis package includes:
- consultation on experimental design (complimentary)
- extensive quality control, filtering, and normalisation
- statistical analyses such as clustering and T-tests/ANOVA using R/Bioconductor and GeneSpring software packages
- data available via secure online User Portal
Pricing for advanced data analysis is also quoted on a per project basis. Advanced data analysis can include:
- gene ontology
- pathway analysis
- literature searches
- sequence and SNP analysis from raw chromatograms
- BLAST searches
- multiple alignments and trees
- primer design
- antibody epitope prediction
- computer programming
We have also put together some information that you may find helpful. If you have any suggestions or additions, please contact us and let us know!
Look through resources and links to find solutions for your needs
Academic and open source
- Explanations of various terms used in bioinformatics
As with any science experiment, the reliability of your data increases with the number of replicates performed. Of course, if you are working from a limited source of RNA, the number of replicates you can perform will also be limited. Most groups strive for at least 3 replicates to allow for statistical analysis. For UHNMAC service packages that include data analysis, a minimum of 3 replicates is required.
It is possible that you did nothing wrong! When comparing two RNA samples, the majority of genes will not be differentially expressed and thus the majority will have ratios around 1. If you are sure that at least a few genes should be differentially expressed, you may have to repeat the experiment (and do a reciprocal labelling!) to verify the results.
Pre-processing is a step that extracts or enhances meaningful data characteristics and is often performed prior to analyzing or “processing” the data. General pre-processing techniques include log transformations, combining replicates, eliminating outliers, use of control spots, and normalisation.
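The pre-processing steps listed above can be sketched in Python. This is an illustrative toy, not the centre's actual pipeline (which uses R/Bioconductor and GeneSpring); the input layout (a list of replicate ratios for one spot) and the MAD-based outlier cutoff are assumptions made for the example:

```python
import math
import statistics

def preprocess(replicate_ratios, k=2.5):
    """Toy pre-processing for one spot's ratio measurements across
    replicate arrays: log-transform, eliminate outliers, combine replicates."""
    # 1. Log-transform each raw ratio so up- and down-regulation are symmetric.
    logs = [math.log2(r) for r in replicate_ratios]
    # 2. Eliminate outliers: drop values far from the median.
    #    1.4826 scales the MAD to be comparable to a standard deviation.
    med = statistics.median(logs)
    mad = statistics.median(abs(v - med) for v in logs)
    cutoff = k * 1.4826 * mad if mad > 0 else float("inf")
    kept = [v for v in logs if abs(v - med) <= cutoff]
    # 3. Combine the surviving replicates into a single summary value.
    return statistics.mean(kept)

# Three replicates near 2-fold up-regulation, plus one wild outlier.
m = preprocess([1.9, 2.1, 2.0, 40.0])
```

With the outlier dropped, the combined log2 ratio lands near 1 (a 2-fold change) instead of being dragged upward by the bad measurement.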
Normalisation means to adjust microarray data to account for systematic differences across data sets. Most often, normalisation is used to account for the different dye efficiencies in a two-colour experiment.
Global normalisation takes into account all areas of the array during normalisation. Significant local effects can heavily influence this method. Sub-grid normalisation calculates the normalisation factor for each sub-grid independently, thus making this method insensitive to local variations on the array.
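The sub-grid approach can be sketched as follows. This is a minimal illustration (median-centring each sub-grid's log2 ratios independently); the input layout, a dict mapping sub-grid id to a list of log2 ratios, is a hypothetical one chosen for the example:

```python
import statistics

def subgrid_normalise(log_ratios_by_grid):
    """Centre each sub-grid's log2 ratios on zero using that sub-grid's
    own median, so local effects cancel out grid by grid."""
    normalised = {}
    for grid, values in log_ratios_by_grid.items():
        factor = statistics.median(values)  # per-sub-grid normalisation factor
        normalised[grid] = [v - factor for v in values]
    return normalised

# One sub-grid with a strong local bias (+0.5) and one without.
out = subgrid_normalise({"g1": [0.4, 0.5, 0.6], "g2": [-0.1, 0.0, 0.1]})
```

A single global factor would leave g1's local bias in place; the per-grid factor removes it.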
Due to the controversy over what constitutes a housekeeping gene (for a given organism, tissue, condition, etc), we tend not to use housekeeping genes for normalisation.
LOWESS (also known as LOESS) stands for LOcally WEighted Scatterplot Smoothing. The general idea behind this kind of normalisation is to fit a mathematical function through the data to obtain a model of the distortion, and then use this model to adjust the data. The LOWESS function is a curve-fitting equation: it performs a local fit to the data in an intensity-dependent manner. The intensity value for each spot is normalised based on the data distribution in the immediate neighbourhood of the spot's intensity.
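The idea of a local, weighted fit can be sketched in a few lines of Python. This is a simplified toy (a tricube-weighted local straight-line fit), not the production LOWESS algorithm used by packages such as R/Bioconductor; the neighbourhood fraction and kernel are standard choices but the function itself is illustrative:

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Toy LOWESS-style fit: for each point, fit a weighted straight line
    to its nearest neighbours (tricube weights) and evaluate it there."""
    n = len(x)
    k = max(2, int(frac * n))                 # neighbourhood size
    fitted = []
    for xi in x:
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1] or 1e-12             # bandwidth = k-th nearest distance
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        # Weighted least squares line: minimise sum w * (y - a - b*x)^2.
        sw = sum(w)
        sx = sum(wi * xj for wi, xj in zip(w, x))
        sy = sum(wi * yj for wi, yj in zip(w, y))
        sxx = sum(wi * xj * xj for wi, xj in zip(w, x))
        sxy = sum(wi * xj * yj for wi, xj, yj in zip(w, x, y))
        denom = sw * sxx - sx * sx
        b = (sw * sxy - sx * sy) / denom if denom else 0.0
        a = (sy - b * sx) / sw
        fitted.append(a + b * xi)
    return fitted
```

In LOWESS normalisation the fitted curve models the intensity-dependent distortion of M vs A, and subtracting it from each spot's M value yields the normalised data.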
The logarithmic transformation provides values that are more easily interpretable and more biologically meaningful. It is convenient to log transform numbers in order to eliminate misleading disproportion between two relative changes. For example, assume two spots both have intensity values of 1000 in the control sample, and values of 100 and 10,000 in the treated sample. The absolute difference between the control and treated samples is 900 and 9000, respectively, for the two spots. However, from a biological point of view, the phenomenon is the same: a 10-fold change in both genes (a 10-fold increase for one gene, and a 10-fold decrease for the other). By using log transformation, fold changes happening around small intensity values will be comparable to fold changes happening around large intensity values. In this example, using a base-10 logarithm, one gene has a log fold change of 1 and the other of –1.
Log base 2 has the advantage of producing a continuous spectrum of values and treating up- and down-regulated genes in a similar way. A gene up-regulated by a factor of 2 has a log2 (ratio) of 1, a gene down-regulated by a factor of 2 has a log2 (ratio) of –1 and a gene with no change in expression (ratio of 1) has a log2 (ratio) equal to zero. The log base 2 transformations are convenient and make further analysis and data interpretation easier.
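The worked example above can be reproduced directly (control intensity 1000, treated intensities 100 and 10,000, taken from the text):

```python
import math

control = 1000
treated = [100, 10_000]

# On a log2 scale, the 10-fold decrease and 10-fold increase are symmetric.
log2_ratios = [math.log2(t / control) for t in treated]

# And the log2 conventions described above: a 2-fold change maps to +/-1,
# no change maps to 0.
up_twofold = math.log2(2)      # 1.0
down_twofold = math.log2(0.5)  # -1.0
no_change = math.log2(1)       # 0.0
```

The two log2 ratios come out to roughly +3.32 and –3.32, making the equal-magnitude 10-fold changes directly comparable.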
ANOVA stands for Analysis of Variance. The idea behind ANOVA is to study the relationship between the inter-group and within-group variabilities. One-way ANOVA investigates the data by only considering one factor, or in other words, considers only one way of partitioning the data into groups. Two-way ANOVA considers that the data can be grouped by at least two factors.
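The inter-group vs within-group idea can be made concrete with a minimal one-way ANOVA F statistic, written out by hand. This is a sketch for illustration (real analyses here would use R/Bioconductor or GeneSpring, and the example values are invented):

```python
import statistics

def one_way_anova_F(groups):
    """One-way ANOVA F statistic:
    F = between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations
    # Between-group sum of squares: how far each group mean sits
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: spread around each group's own mean.
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Two groups of expression values with clearly separated means -> large F.
F = one_way_anova_F([[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]])
```

A large F means the variability between groups dominates the variability within them, i.e. the grouping factor matters.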
M is defined as log2(LexE/LexR), and A is (log2(LexE × LexR))/2, i.e. the average of the two log2 intensities. This plot, as opposed to a log vs log plot, allows for the rapid identification of skewed data. Data points in a perfectly normalised data set will be centred on the M = 0 axis.
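The two definitions translate directly into code (the function name and example intensities are our own; LexE and LexR are the experimental and reference intensities from the formula above):

```python
import math

def ma_values(exp_intensity, ref_intensity):
    """Compute one spot's (M, A) coordinates for an M-A plot.
    M = log2(exp/ref); A = average of the two log2 intensities."""
    m = math.log2(exp_intensity / ref_intensity)
    a = (math.log2(exp_intensity) + math.log2(ref_intensity)) / 2
    return m, a

m, a = ma_values(2000, 1000)   # a 2-fold up-regulated spot
```

Note that A can equivalently be written log2(LexE × LexR)/2, since the log of a product is the sum of the logs.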
In a 16-bit TIFF file, spots with an intensity value of 65,535 (the maximum value a 16-bit integer can hold) are considered saturated. The true intensity of these spots is unknown, so they are flagged and often excluded from further analysis.
Distance metrics, also known as similarity metrics, are functions that take two points (x and y) in an n-dimensional space and have the following properties: symmetry, positivity, and the triangle inequality. The Euclidean distance is the simplest (shortest, straight-line) distance between x and y, while the Manhattan (city block) distance is one in which movement can only be parallel to the axes.
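The two metrics named above, and the properties they must satisfy, can be checked with a few lines of Python (the example points are arbitrary):

```python
import math

def euclidean(x, y):
    """Straight-line distance between two n-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City-block distance: movement only parallel to the axes."""
    return sum(abs(a - b) for a, b in zip(x, y))

p, q, r = (0, 0), (3, 4), (1, 1)
```

For p = (0, 0) and q = (3, 4), the Euclidean distance is 5 (the hypotenuse) while the Manhattan distance is 7 (3 blocks across plus 4 blocks up), so the two metrics can rank the same pair of points differently.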
Principal Component Analysis (PCA) is a numerical procedure carried out to discover or reduce dimensionality of the data set, identify new meaningful underlying variables, and to magnify the trends in data (increasing separation of poorly correlated elements and bringing highly correlated elements closer together). PCA rotates the data space, aligning the directions of the greatest variability in the data (the first and second principal component) with the x and y axes of the scatter plot.
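For two-dimensional data the rotation PCA performs has a closed form, which makes the idea easy to see in code. This is a sketch for the 2-D case only (real microarray PCA operates on many dimensions and would use a linear algebra library); the function name and sample points are invented:

```python
import math
import statistics

def first_pc_scores(points):
    """PCA sketch for 2-D data: find the direction of greatest variance
    (the first principal component) and project each centred point onto it."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(points)
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    u = (math.cos(theta), math.sin(theta))
    # Score = projection of the centred point onto the first PC direction.
    return [(p[0] - mx) * u[0] + (p[1] - my) * u[1] for p in points]

# Points lying along the line y = x: the first PC is the 45-degree diagonal.
scores = first_pc_scores([(0, 0), (1, 1), (2, 2), (3, 3)])
```

All of the variability in this example lies along the diagonal, so the first principal component captures it completely and the scores are simply the (signed) distances along that diagonal.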
Validation is most often performed by real-time (quantitative) PCR. Validation can also be performed using NanoString assays, Bio-Plex assays, and the Ziplex platform. More information about data validation services offered at the UHNMAC can be found here.
Minimum Information About a Microarray Experiment (MIAME) is a set of guidelines that outlines the minimum information required to interpret unambiguously, and possibly verify, microarray experiments. Visit the Microarray Gene Expression Data Society (MGED) for more information. Brazma, A., et al. Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genetics, 2001, 29(4):365–371.
Gene Ontology (GO) is a controlled vocabulary to describe gene and gene product attributes of any organism. The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The three organising principles of GO are molecular function, biological process and cellular component.