BMC Research Notes
Estimating the similarity of alternative Affymetrix probe sets using transcriptional networks
Michel Bellis [1]
[1] CNRS, CRBM, UMR-5237, 1919 Route de Mende, Montpellier 34293, France
Keywords: Matlab; Python; Alternative probe sets; 3’ Alternative poly-adenylation; Transcriptional networks; Microarrays; Bioinformatics
Others: 1143157; DOI: 10.1186/1756-0500-6-107
Received 2013-02-13; accepted 2013-02-28; published 2013
【 Abstract 】
Background
The usefulness of data from Affymetrix microarray analysis depends largely on the reliability of the files that describe the correspondence between probe sets, genes and transcripts. In particular, when a gene is targeted by several probe sets, these files should indicate how similar the probe sets of each alternative pair are. Transcriptional networks integrate the multiple correlations that exist between all probe sets, and so supply much more information than a single correlation coefficient calculated for two series of signals. In this study, we used PSAWN (Probe Set Assignment With Networks), a programme we developed, to investigate whether similar alternative probe sets share specific properties.
Findings
PSAWNpy delivered a full textual description of each probe set, together with information on the number and properties of its secondary targets. PSAWNml calculated the similarity of each alternative probe set pair and made it possible to relate similarity to the localisation of probes in common transcripts or exons. Similar alternative probe sets had very low negative correlation, high positive correlation and similar neighbourhood overlap. Using these properties, we devised a test that groups similar probe sets in a given network. Considering several networks gave additional information on the reproducibility of similarity, which allowed us to define the actual similarity of alternative probe set pairs. In particular, we calculated the common localisation of probes in exons and in known transcripts, and showed that similarity correlated with both as expected. The information collected on all pairs of alternative probe sets in the most popular 3’ IVT Affymetrix chips is available in tabular form at http://bns.crbm.cnrs.fr/download.html.
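The grouping test described above can be illustrated with a minimal Python sketch. It is not the PSAWNml implementation: the data layout (nested dictionaries of pairwise values on a 0 to 1 scale), the Jaccard definition of neighbourhood overlap, all threshold values and the function names `symmetric`, `neighbours`, `neighbourhood_overlap`, `is_similar`, `group_similar` and `reproducibility` are assumptions made for illustration. A pair is called similar when its positive correlation is high, its negative correlation is very low and the two neighbourhoods overlap; similar pairs are then merged into groups, and the fraction of networks in which a pair passes the test stands in for reproducibility.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric nested dict from {(a, b): value} (hypothetical layout)."""
    table = {}
    for (a, b), value in pairs.items():
        table.setdefault(a, {})[b] = value
        table.setdefault(b, {})[a] = value
    return table


def neighbours(pos, node, threshold):
    """Nodes whose positive correlation with `node` reaches `threshold`."""
    return {other for other, value in pos.get(node, {}).items()
            if other != node and value >= threshold}


def neighbourhood_overlap(pos, a, b, threshold):
    """Jaccard overlap of the two neighbourhoods (an assumed definition)."""
    na = neighbours(pos, a, threshold) - {b}
    nb = neighbours(pos, b, threshold) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def is_similar(pos, neg, a, b, pos_min=0.7, neg_max=0.05,
               overlap_min=0.5, neighbour_threshold=0.5):
    """Apply the three criteria; every threshold here is an illustrative guess."""
    return (pos.get(a, {}).get(b, 0.0) >= pos_min
            and neg.get(a, {}).get(b, 0.0) <= neg_max
            and neighbourhood_overlap(pos, a, b, neighbour_threshold) >= overlap_min)


def group_similar(probe_sets, pos, neg, **thresholds):
    """Merge similar pairs into groups with a small union-find."""
    parent = {p: p for p in probe_sets}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(probe_sets, 2):
        if is_similar(pos, neg, a, b, **thresholds):
            parent[find(a)] = find(b)

    groups = {}
    for p in probe_sets:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(group) for group in groups.values())


def reproducibility(networks, a, b, **thresholds):
    """Fraction of (pos, neg) networks in which the pair passes the test."""
    hits = sum(is_similar(pos, neg, a, b, **thresholds) for pos, neg in networks)
    return hits / len(networks) if networks else 0.0


# Toy network: ps1, ps2 and ps3 target the same gene; g1..g3 are other probe sets.
pos = symmetric({("ps1", "ps2"): 0.9, ("ps1", "g1"): 0.8, ("ps1", "g2"): 0.8,
                 ("ps2", "g1"): 0.8, ("ps2", "g2"): 0.7, ("ps3", "g3"): 0.9,
                 ("ps1", "ps3"): 0.1})
neg = symmetric({("ps1", "ps3"): 0.4, ("ps2", "ps3"): 0.3})

groups = group_similar(["ps1", "ps2", "ps3"], pos, neg)
print(groups)  # [['ps1', 'ps2'], ['ps3']]
```

In this toy network, `ps1` and `ps2` are grouped because they are strongly correlated and share the neighbours `g1` and `g2`, whereas `ps3` stays alone. Merging by connected components is one simple design choice; a stricter variant could require every pair within a group to pass the test.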
Conclusions
These processed data can be used to obtain a finer interpretation when comparing microarray data between biological conditions. They are particularly well suited to searching for 3’ alternative poly-adenylation events and can also be useful for studying the structure of transcriptional networks. The PSAWNpy (in Python) and PSAWNml (in Matlab) programmes are freely available and can be downloaded at http://code.google.com/p/arraymatic. Tutorials and reference manuals are available at BMC Research Notes online (Additional file 1) or from http://bns.crbm.cnrs.fr/softwares.html.
【 License 】
2013 Bellis; licensee BioMed Central Ltd.