| Malta Medical Journal | |
| Exploration and reduction of data using principal component analysis | |
| Anton Buhagiar | |
| Keywords: matrix plots; correlation; correlation matrix; spheres; ellipsoids; rotation of coordinates; principal components; factors; eigenvalues; scree plot; factor loadings for variables; factor scores of cases; uses of principal component analysis such as exploration of data, dimension reduction, regrouping of variables and ordering of data; application to a data set containing national track record times for males and females in various countries | |
| Subject classification: Medicine (general) | |
| Source: University of Malta, Medical School | |
【 Abstract 】
In a data set with only two variables, a scatterplot of one variable against the other is easy to draw and represents the data visually. When the data set contains many variables, however, the data are much harder to represent visually. The method of principal component analysis (PCA) can sometimes represent the data faithfully in a few dimensions (e.g. three or fewer), with little or no loss of information. This reduction in dimensionality works best when the original variables are highly correlated, positively or negatively. In that case, 20 or 30 original variables may well be adequately represented by two or three new variables, called principal components, each of which is a suitable combination of the original ones. The principal components are uncorrelated with one another, so each component describes a different dimension of the data. They can also be arranged in descending order of variance: the first component has the largest variance and is the most important, the second component has the second largest variance, and so on. The scores on the first two components can then be computed for each case in the data set and plotted against each other in a scatterplot, with the first-component score on the horizontal axis and the second-component score on the vertical axis. This scatterplot is a parsimonious two-dimensional picture of the variables and cases in the original data set. We illustrate the method by applying it to simulated data sets and to a data set containing national track record times for males and females in various countries.
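A minimal Python sketch of the procedure summarized above, assuming NumPy is available and the data are arranged with cases in rows and variables in columns. Here the components are computed from the eigenvalues and eigenvectors of the correlation matrix (that is, on standardized variables); this is one common way to carry out PCA and is an assumption, not necessarily the exact computation used in the article. The function name `principal_components` and the simulated data are hypothetical illustrations.

```python
import numpy as np


def principal_components(X):
    """Hypothetical helper: PCA of X (rows = cases, columns = variables)
    via the eigendecomposition of the correlation matrix.

    Returns (eigvals, loadings, scores), with components sorted by
    descending variance.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable (mean 0, sd 1) so the covariance of Z
    # equals the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh is for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                         # component score of every case
    # Loadings: correlation between each variable (row) and each component (column).
    loadings = eigvecs * np.sqrt(eigvals)
    return eigvals, loadings, scores


# Hypothetical simulated data: five variables driven by one hidden factor,
# so they are highly correlated and should collapse to about one dimension.
rng = np.random.default_rng(0)
n_cases = 200
factor = rng.normal(size=n_cases)
X = np.column_stack([factor + 0.2 * rng.normal(size=n_cases) for _ in range(5)])

eigvals, loadings, scores = principal_components(X)

# Share of total variance carried by each component (the scree plot values).
explained = eigvals / eigvals.sum()
print("variance explained:", np.round(explained, 3))

# Component scores are uncorrelated: off-diagonal correlations are ~0.
print(np.round(np.corrcoef(scores, rowvar=False), 3))

# Coordinates for the two-dimensional scatterplot of cases.
pc1, pc2 = scores[:, 0], scores[:, 1]  # PC1 horizontal, PC2 vertical
```

Because all five simulated variables share one underlying factor, the first eigenvalue should account for nearly all of the total variance, so `pc1` alone summarizes each case; the variance of each column of `scores` equals the corresponding eigenvalue.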
【 License 】
Unknown