Gene expression assays provide a fast, natural way to identify disease markers relevant to clinical trials. In microarray experiments, differentially expressed genes, or discriminator genes, are genes whose expression patterns differ considerably between two user-defined groups. Microarray data typically cover a very large number of genes, and the question is which of those genes are responsible for, or discriminate, a particular disease. Identifying differentially expressed genes across multiple conditions has become an active statistical problem in the analysis of large-scale microarray data. In this perspective, we consider a simulated data set and a real data set (head and neck cancer). This paper uses three statistical methods, the t-test, the Wilcoxon signed-rank sum test and the renewed approach, to detect differential expression of genes between conditions and to find the required number of differentially expressed genes. Additionally, Principal Component Analysis (PCA) is used to visualize outliers, and the largest difference from mean and data methods are used to find outliers numerically. Some artificial outliers are introduced into the simulated and real data sets; these outliers are not affected by or related to the differentially expressed genes. Results reveal that 25, 126 and 385 differentially expressed genes are identified by the t-test, the Wilcoxon rank sum test and the renewed approach, respectively. The three methods have 23 genes in common, which may be responsible for the cancer. This paper shows that the two-sample mean test (t-test) is well suited to identifying differentially expressed genes in microarray data.
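The gene-by-gene testing described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the simulated design (200 genes, 10 samples per group, the first 10 genes shifted by 5 units), the use of the Welch form of the t statistic, and the reading of the Wilcoxon test as a rank-sum test for two independent groups with a normal approximation are all assumptions made for illustration. The renewed approach and the outlier steps are not shown.

```python
import random
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (assumes non-zero sample variances)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Ties receive midranks; the variance is not tie-corrected.
    """
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(a), len(b)
    w = sum(ranks[x] for x in a)  # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical simulated data: genes 0..9 are truly differentially expressed.
random.seed(0)
n_genes, n_per_group, n_de = 200, 10, 10
group1 = [[random.gauss(0, 1) for _ in range(n_per_group)]
          for _ in range(n_genes)]
group2 = [[random.gauss(5 if g < n_de else 0, 1) for _ in range(n_per_group)]
          for g in range(n_genes)]

# One test per gene, then rank genes by the size of the t statistic.
t_stats = [welch_t(a, b) for a, b in zip(group1, group2)]
p_values = [rank_sum_p(a, b) for a, b in zip(group1, group2)]
top_by_t = sorted(range(n_genes), key=lambda g: abs(t_stats[g]),
                  reverse=True)[:n_de]
```

On real data, thousands of genes are tested at once, so the per-gene p-values would normally be adjusted for multiple testing before a list of differentially expressed genes is reported.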
Published in | Biomedical Statistics and Informatics (Volume 2, Issue 4) |
DOI | 10.11648/j.bsi.20170204.16 |
Page(s) | 166-171 |
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright | Copyright © The Author(s), 2017. Published by Science Publishing Group |
Keywords | Microarray Gene Expression Data, T-Test, Renewed Approach, Wilcoxon Signed Rank Test, Differentially Expressed Genes, Outlier |
APA Style
Harun or Rashid, Arefin Mowla, Siddikur Rahman, Siraj-Ud-Doulah, Bipul Hossen. (2017). Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomedical Statistics and Informatics, 2(4), 166-171. https://doi.org/10.11648/j.bsi.20170204.16
ACS Style
Harun or Rashid; Arefin Mowla; Siddikur Rahman; Siraj-Ud-Doulah; Bipul Hossen. Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomed. Stat. Inform. 2017, 2(4), 166-171. doi: 10.11648/j.bsi.20170204.16
@article{10.11648/j.bsi.20170204.16,
  author = {Harun or Rashid and Arefin Mowla and Siddikur Rahman and Siraj-Ud-Doulah and Bipul Hossen},
  title = {Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data},
  journal = {Biomedical Statistics and Informatics},
  volume = {2},
  number = {4},
  pages = {166-171},
  doi = {10.11648/j.bsi.20170204.16},
  url = {https://doi.org/10.11648/j.bsi.20170204.16},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.bsi.20170204.16},
  year = {2017}
}
TY  - JOUR
T1  - Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data
AU  - Harun or Rashid
AU  - Arefin Mowla
AU  - Siddikur Rahman
AU  - Siraj-Ud-Doulah
AU  - Bipul Hossen
Y1  - 2017/10/20
PY  - 2017
N1  - https://doi.org/10.11648/j.bsi.20170204.16
DO  - 10.11648/j.bsi.20170204.16
T2  - Biomedical Statistics and Informatics
JF  - Biomedical Statistics and Informatics
JO  - Biomedical Statistics and Informatics
SP  - 166
EP  - 171
PB  - Science Publishing Group
SN  - 2578-8728
UR  - https://doi.org/10.11648/j.bsi.20170204.16
VL  - 2
IS  - 4
ER  -