Title: | Random Forest Cluster Analysis |
---|---|
Description: | Tools to perform random forest consensus clustering of different data types. The package is designed to accept a list of matrices from different assays, typically from high-throughput molecular profiling, so that class discovery can be performed jointly. For references, please see Tao Shi & Steve Horvath (2006) <doi:10.1198/106186006X94072> and Monti et al. (2003) <doi:10.1023/A:1023949509487>. |
Authors: | Ankur Chakravarthy, PhD |
Maintainer: | Ankur Chakravarthy <[email protected]> |
License: | GPL |
Version: | 0.1.2 |
Built: | 2024-10-29 05:46:50 UTC |
Source: | https://github.com/cran/RFclust |
These data serve as an example dataset on which to run RFCluster. They were processed and originally included in the iCluster R package, which is likely to be archived on CRAN, so I have re-included the dataset here to allow the example to be run.
data("gbm")
A list of matrices containing copy number, methylation and expression estimates for 55 glioblastoma multiforme (GBM) samples across 1500-1800 genes. The data were originally generated by The Cancer Genome Atlas (TCGA) project.
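The gbm matrices store samples in rows and features in columns, which is why the RFCluster example transposes them and copies one set of sample names across all three. A minimal sketch of that preparation step as a reusable check follows. `prepare_rfcluster_input()` is a hypothetical helper, not part of RFclust, and it assumes the samples already appear in the same order in every matrix, since it can only verify that the sample counts agree.

```r
# Hypothetical helper (not part of RFclust): reshape a named list of
# samples-in-rows matrices, such as `gbm`, into the features-in-rows,
# samples-in-columns layout that RFCluster expects.
prepare_rfcluster_input <- function(mats) {
  stopifnot(is.list(mats), length(mats) > 0,
            !is.null(names(mats)), all(nzchar(names(mats))),
            !anyDuplicated(names(mats)))
  # Every data type must describe the same samples, so row counts must agree.
  n_samples <- vapply(mats, nrow, integer(1))
  if (length(unique(n_samples)) != 1L) {
    stop("all matrices must have the same number of samples (rows)")
  }
  # Transpose: features become rows and samples become columns.
  out <- lapply(mats, function(m) t(as.matrix(m)))
  # Reuse the first matrix's sample names everywhere, as the RFCluster
  # example does; this ASSUMES the samples share the same order in every matrix.
  ref <- colnames(out[[1]])
  if (is.null(ref)) ref <- paste0("S", seq_len(ncol(out[[1]])))
  for (i in seq_along(out)) colnames(out[[i]]) <- ref
  out
}

# Toy input mimicking the gbm layout (samples in rows).
set.seed(1)
toy <- list(
  expression  = matrix(rnorm(5 * 4), nrow = 5,
                       dimnames = list(paste0("S", 1:5), paste0("g", 1:4))),
  methylation = matrix(runif(5 * 3), nrow = 5)
)
toy.t <- prepare_rfcluster_input(toy)
sapply(toy.t, dim)  # first row: features, second row: samples
```

With the package installed, the same call could be applied to `gbm` in place of the manual `lapply(gbm, t)` and `colnames()` steps, under the same ordering assumption.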
https://doi.org/10.1093/bioinformatics/btp659
data(gbm)
RFCluster takes a named list of matrices of different data types (features in rows, samples in columns) and performs random forest clustering (one-dimensional). When multiple data types are available, this is one way of modelling them together.
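Shi & Horvath (2006) cluster unlabelled data with a random forest by training it to separate the observed samples from a synthetic copy whose features are sampled independently from their marginal distributions. Samples that often land in the same terminal node receive a high proximity, and sqrt(1 - proximity) serves as the dissimilarity for clustering. The base-R sketch below illustrates those pieces under stated assumptions; it is not RFclust's implementation. The helper names are hypothetical, a hand-made terminal-node matrix stands in for a fitted forest, and average-linkage hierarchical clustering stands in for PAM.

```r
# Hypothetical helpers illustrating the unsupervised random forest idea of
# Shi & Horvath (2006); this is NOT RFclust's implementation.

# Step 1: the "real vs synthetic" contrast. Each synthetic feature is drawn
# with replacement from the observed values of that feature, keeping the
# marginals but destroying dependence between features.
# x must have samples in rows (transpose RFCluster-style input with t()).
make_synthetic_contrast <- function(x) {
  x <- as.matrix(x)
  synth <- x
  for (j in seq_len(ncol(x))) {
    synth[, j] <- x[sample.int(nrow(x), nrow(x), replace = TRUE), j]
  }
  rownames(synth) <- paste0("synthetic_", seq_len(nrow(x)))
  list(x = rbind(x, synth),
       y = factor(rep(c("real", "synthetic"), each = nrow(x))))
}

# Step 2: proximity = fraction of trees in which two samples share a terminal
# node. `nodes` is a samples x trees matrix of terminal-node IDs.
proximity_from_nodes <- function(nodes) {
  n <- nrow(nodes)
  prox <- matrix(0, n, n, dimnames = list(rownames(nodes), rownames(nodes)))
  for (tr in seq_len(ncol(nodes))) {
    prox <- prox + outer(nodes[, tr], nodes[, tr], "==")
  }
  prox / ncol(nodes)
}

# Step 3: the random forest dissimilarity, sqrt(1 - proximity).
rf_dissimilarity <- function(prox) as.dist(sqrt(pmax(1 - prox, 0)))

set.seed(42)
x <- matrix(rnorm(6 * 3), nrow = 6, dimnames = list(paste0("S", 1:6), NULL))
contrast <- make_synthetic_contrast(x)
# A forest would be trained on contrast$x / contrast$y; here a hand-made
# terminal-node matrix (6 real samples x 4 trees) stands in for its output.
nodes <- cbind(c(1, 1, 1, 2, 2, 2), c(1, 1, 2, 2, 3, 3),
               c(4, 4, 4, 5, 5, 5), c(1, 1, 1, 1, 2, 2))
rownames(nodes) <- rownames(x)
prox <- proximity_from_nodes(nodes)
clusters <- cutree(hclust(rf_dissimilarity(prox), method = "average"), k = 2)
clusters  # S1-S3 in cluster 1, S4-S6 in cluster 2
```

For a real forest, the randomForest package runs in unsupervised mode when `randomForest()` is called without a response, and `proximity = TRUE` returns the proximity matrix directly; whether RFclust uses that route internally is not stated here.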
RFCluster(Data, ClustAlg = "pam", MaxK, nTrees = 1000, exportFigures = "pdf", ClustReps = 500, ProjectName = "RFCluster", verbose = TRUE, ...)
Argument | Description |
---|---|
Data | Named list of matrices with features in rows and samples in columns. The names of the list should identify the platform or feature type (for example expression, CN, or clin); any names will do as long as they are distinct. Sample (column) names must be identical across the matrices. |
ClustAlg | Clustering algorithm used for consensus clustering (default "pam"). |
MaxK | Maximum number of clusters to search for. |
nTrees | Number of trees grown in the random forest used to generate the proximity matrix (default 1000). |
ProjectName | Name of the project, used to annotate plots and other output. |
ClustReps | Number of resampling replicates for consensus clustering (default 500). |
verbose | Logical; should output be verbose? |
exportFigures | File format in which figures and other results are exported (default "pdf"). |
... | Other optional arguments, passed on to ConsensusClusterPlus; see that package's documentation for the full set. |
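Consensus clustering (Monti et al., 2003) repeatedly subsamples the samples, clusters each subsample, and records how often each pair of samples lands in the same cluster. In the arguments above, ClustReps sets the number of resamples, MaxK the largest number of clusters tried, and ClustAlg the algorithm applied to each subsample; RFCluster hands this step to ConsensusClusterPlus. The base-R sketch below only illustrates the idea. The helper names, the 80% item subsampling rate, the use of average-linkage hierarchical clustering instead of PAM, and the Euclidean distance on toy data (rather than a random forest dissimilarity) are all assumptions.

```r
# Hypothetical helpers sketching consensus clustering (Monti et al., 2003);
# RFCluster itself relies on ConsensusClusterPlus for this step.

# Consensus matrix for one k: the fraction of resamples in which two samples
# fell in the same cluster, among the resamples that drew both of them.
consensus_matrix <- function(d, k, reps = 50, p_item = 0.8, method = "average") {
  d <- as.matrix(d)
  n <- nrow(d)
  together <- matrix(0, n, n, dimnames = dimnames(d))  # co-clustered counts
  sampled  <- matrix(0, n, n, dimnames = dimnames(d))  # co-sampled counts
  for (r in seq_len(reps)) {
    idx <- sample.int(n, size = max(k, floor(p_item * n)))
    # Average-linkage hierarchical clustering stands in for PAM here.
    cl <- cutree(hclust(as.dist(d[idx, idx]), method = method), k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # Pairs never drawn together are NA instead of 0/0.
  ifelse(sampled > 0, together / sampled, NA_real_)
}

# One consensus matrix per k = 2, ..., max_k (the MaxK / ClustReps roles).
consensus_by_k <- function(d, max_k, reps = 50, ...) {
  ks <- 2:max_k
  setNames(lapply(ks, function(k) consensus_matrix(d, k, reps, ...)),
           paste0("k", ks))
}

# Toy data: two well-separated groups of 10 samples each.
set.seed(7)
x <- rbind(matrix(rnorm(10 * 5), nrow = 10),
           matrix(rnorm(10 * 5, mean = 6), nrow = 10))
rownames(x) <- paste0("S", 1:20)
cm <- consensus_by_k(dist(x), max_k = 4, reps = 30)
# Final classes for k = 2: cluster the consensus matrix, using 1 - consensus as a distance.
final_k2 <- cutree(hclust(as.dist(1 - cm$k2), method = "average"), k = 2)
table(final_k2)
```

Stable groups show up as blocks of values near 1 in the consensus matrix; comparing these matrices across k is how consensus clustering helps choose the number of clusters.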
Returns the standard output of a ConsensusClusterPlus run; see that package's documentation for the structure of the returned object.
Ankur Chakravarthy, PhD
Monti, S., Tamayo, P., Mesirov, J. et al. (2003) Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52, 91-118. https://doi.org/10.1023/A:1023949509487
Shi, T. & Horvath, S. (2006) Unsupervised Learning With Random Forest Predictors. Journal of Computational and Graphical Statistics, 15(1), 118-138. https://doi.org/10.1198/106186006X94072
library(RFclust)

# Get GBM example data from the iCluster package, repackaged to maintain CRAN compatibility
data(gbm)

# Transpose so that columns are samples and rows are features
gbm.t <- lapply(gbm, t)

# Make sure the sample names are the same across the matrices for the different
# data types; the code breaks otherwise
colnames(gbm.t[[2]]) <- colnames(gbm.t[[3]]) <- colnames(gbm.t[[1]])

# Run the function on that dataset. These methods are computationally intensive,
# so automatic testing during the package build has been disabled (takes > 5 s).
# Users may test the software by running the code separately, as the example is reproducible.
Test.cluster <- RFCluster(Data = gbm.t, ClustAlg = "pam", MaxK = 5, nTrees = 10,
                          ProjectName = "RFCluster_Test", ClustReps = 50,
                          writeTable = FALSE, plot = NULL)

# Remove the output written under the project name
unlink("RFCluster_Test", recursive = TRUE)