1 TL;DR

This code block is not evaluated. Need a breakdown? Look at the following sections.

# load libraries
suppressWarnings(suppressMessages(require(netDx)))
options(stringsAsFactors = FALSE)

# prepare data
library(curatedTCGAData)
library(MultiAssayExperiment)
curatedTCGAData(diseaseCode="BRCA", assays="*",dry.run=TRUE)
brca <- curatedTCGAData("BRCA",c("mRNAArray"),dry.run=FALSE)

staget <- colData(brca)$pathology_T_stage
st2 <- rep(NA,length(staget))
st2[which(staget %in% c("t1","t1a","t1b","t1c"))] <- 1
st2[which(staget %in% c("t2","t2a","t2b"))] <- 2
st2[which(staget %in% c("t3","t3a"))] <- 3
st2[which(staget %in% c("t4","t4b","t4d"))] <- 4
colData(brca)$STAGE <- st2

pam50 <- colData(brca)$PAM50.mRNA
pam50[which(!pam50 %in% "Luminal A")] <- "notLumA"
pam50[which(pam50 %in% "Luminal A")] <- "LumA"
colData(brca)$pam_mod <- pam50

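# NOTE: pam50 was relabelled above, so the "Normal-like" test matches nothing;
# in effect only patients with a missing stage (NA in st2) are dropped here
idx <- union(which(pam50 == "Normal-like"), which(is.na(st2)))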
pID <- colData(brca)$patientID
tokeep <- setdiff(pID, pID[idx])
brca <- brca[,tokeep,]

smp <- sampleMap(brca)
samps <- smp[which(smp$assay=="BRCA_mRNAArray-20160128"),]
notdup <- samps[which(!duplicated(samps$primary)),"colname"]
brca[[1]] <- brca[[1]][,notdup]

# set "ID" and "STATUS" columns (netDx looks for these). 
pID <- colData(brca)$patientID
colData(brca)$ID <- pID
colData(brca)$STATUS <- colData(brca)$pam_mod

# define features 
groupList <- list()

# genes in mRNA data are grouped by pathways
pathList <- readPathways(getExamplePathways())
groupList[["BRCA_mRNAArray-20160128"]] <- pathList[1:3]
# clinical data is not grouped; each variable is its own feature
groupList[["clinical"]] <- list(
      age="patient.age_at_initial_pathologic_diagnosis",
       stage="STAGE"
)

# define similarity function used to create features
# in this example, pairwise Pearson correlation is used for gene expression
# and normalized difference is used for clinical data
makeNets <- function(dataList, groupList, netDir,...) {
    netList <- c() # initialize before is.null() check
    # make RNA nets (NOTE: the check for is.null() is important!)
    # (Pearson correlation)
    if (!is.null(groupList[["BRCA_mRNAArray-20160128"]])) { 
    netList <- makePSN_NamedMatrix(dataList[["BRCA_mRNAArray-20160128"]],
                rownames(dataList[["BRCA_mRNAArray-20160128"]]),
                groupList[["BRCA_mRNAArray-20160128"]],
                netDir,verbose=FALSE, 
                writeProfiles=TRUE,...) 
    }
    
    # make clinical nets (normalized difference)
    netList2 <- c()
    if (!is.null(groupList[["clinical"]])) {
    netList2 <- makePSN_NamedMatrix(dataList$clinical, 
        rownames(dataList$clinical),
        groupList[["clinical"]],netDir,
        simMetric="custom",customFunc=normDiff, # custom function
        writeProfiles=FALSE,
        sparsify=TRUE,verbose=TRUE,...)
    }
    netList <- c(unlist(netList),unlist(netList2))
    return(netList)
}

# train the model. 
# Here we run two train/test splits (numSplits). In each split, 
# feature selection scores features out of 2, and features that
# score >=1 are used to classify test samples

set.seed(42) # make results reproducible
out <- buildPredictor(dataList=brca,groupList=groupList,
   makeNetFunc=makeNets, ### custom network creation function
   outDir=sprintf("%s/pred_output",tempdir()), ## absolute path
   numCores=1L,featScoreMax=2L, featSelCutoff=1L,numSplits=2L)

# look at results
print(summary(out))

2 Introduction

In this example, we will build a binary classifier from clinical data and gene expression data. We will create pathway-level features for gene expression and use variable-level features for clinical data.

Feature scoring is performed over multiple random splits of the data into train and blind test partitions. Feature-selected networks are those that consistently score highly across the multiple splits (e.g. those that score >=9 out of 10 in >=70% of splits).

Conceptually, this is what the higher-level logic looks like for a cross-validation design. In the pseudocode example below, the predictor runs for 100 train/test splits. Within a split, features are scored from 0 to 10. Features scoring >=9 are used to predict labels on the held-out test set (20%).

(Note: these aren’t real function calls; this block just serves to illustrate the concept of the design for our purposes)

numSplits <- 100      # number of times to split data into train/blind test samples
featScoreMax <- 10    # number of folds for cross-validation; also the max score for a feature
featSelCutoff <- 9    # features scoring >= this value are used for prediction
featScores <- list()  # collect <numSplits> sets of feature scores
topFeat <- list()     # collect <numSplits> sets of feature-selected networks
perf <- list()        # collect <numSplits> sets of test evaluations

for (k in 1:numSplits) {
  parts <- splitData(80:20)  # split data into train/test using RNG seed
  featScores[[k]] <- scoreFeatures(parts$train, featScoreMax)
  topFeat[[k]] <- applyFeatCutoff(featScores[[k]], featSelCutoff)
  perf[[k]] <- collectPerformance(topFeat[[k]], parts$test)
}
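
The bookkeeping of this design can be tried out in base R without netDx. The sketch below is only an illustration: `splitData()` and `scoreFeatures()` are hypothetical stand-ins that split patient IDs at random and assign random scores, so the scores themselves mean nothing. What it shows is how per-split scores are thresholded and then tallied across splits.

```r
set.seed(1)                      # arbitrary seed so the toy run is repeatable
numSplits     <- 5               # kept small for illustration
featScoreMax  <- 10
featSelCutoff <- 9
patients <- sprintf("patient%i", 1:50)   # toy patient IDs
features <- sprintf("feature%i", 1:6)    # toy feature names

# hypothetical stand-in: random split of IDs into train and test
splitData <- function(ids, trainFrac = 0.8) {
  trainIdx <- sample(length(ids), floor(trainFrac * length(ids)))
  list(train = ids[trainIdx], test = ids[-trainIdx])
}

# hypothetical stand-in: a random integer score in 0..maxScore per feature
scoreFeatures <- function(train, featNames, maxScore) {
  setNames(sample(0:maxScore, length(featNames), replace = TRUE), featNames)
}

featScores <- list()
topFeat <- list()
for (k in seq_len(numSplits)) {
  parts <- splitData(patients)
  featScores[[k]] <- scoreFeatures(parts$train, features, featScoreMax)
  # keep the features that reach the cutoff in this split
  topFeat[[k]] <- names(featScores[[k]])[featScores[[k]] >= featSelCutoff]
}

# fraction of splits in which each feature passed the cutoff
passFrac <- table(factor(unlist(topFeat), levels = features)) / numSplits
print(passFrac)
```

A feature would be called consistently high-scoring if its entry in `passFrac` reaches a chosen fraction such as 0.7.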

3 Setup

suppressWarnings(suppressMessages(require(netDx)))

4 Data

In this example, we use curated data from The Cancer Genome Atlas, through the Bioconductor curatedTCGAData package. The goal is to classify a breast tumour as either Luminal A subtype or otherwise (binary). The predictor will integrate clinical variables selected by the user, along with gene expression data.

Here we load the required packages and download clinical and gene expression data.

suppressMessages(library(curatedTCGAData))

## Warning: package 'SummarizedExperiment' was built under R version 3.6.1

## Warning: package 'BiocParallel' was built under R version 3.6.1

suppressMessages(library(MultiAssayExperiment))

This is the data we will use:

curatedTCGAData(diseaseCode="BRCA", assays="*",dry.run=TRUE)

##                                         Title DispatchClass
## 31                       BRCA_CNASeq-20160128           Rda
## 32                       BRCA_CNASNP-20160128           Rda
## 33                       BRCA_CNVSNP-20160128           Rda
## 35             BRCA_GISTIC_AllByGene-20160128           Rda
## 36                 BRCA_GISTIC_Peaks-20160128           Rda
## 37     BRCA_GISTIC_ThresholdedByGene-20160128           Rda
## 39  BRCA_Methylation_methyl27-20160128_assays        H5File
## 40      BRCA_Methylation_methyl27-20160128_se           Rds
## 41 BRCA_Methylation_methyl450-20160128_assays        H5File
## 42     BRCA_Methylation_methyl450-20160128_se           Rds
## 43                 BRCA_miRNASeqGene-20160128           Rda
## 44                    BRCA_mRNAArray-20160128           Rda
## 45                     BRCA_Mutation-20160128           Rda
## 46              BRCA_RNASeq2GeneNorm-20160128           Rda
## 47                   BRCA_RNASeqGene-20160128           Rda
## 48                    BRCA_RPPAArray-20160128           Rda

Let’s fetch and store the data locally:

brca <- curatedTCGAData("BRCA",c("mRNAArray"),dry.run=FALSE)

## snapshotDate(): 2019-04-29

## see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation

## downloading 0 resources

## loading from cache 
##     'EH594 : 594'

## see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation

## downloading 0 resources

## loading from cache 
##     'EH587 : 587'

## see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation

## downloading 0 resources

## loading from cache 
##     'EH590 : 590'

## see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation

## downloading 0 resources

## loading from cache 
##     'EH599 : 599'

## harmonizing input:
##   removing 13783 sampleMap rows not in names(experiments)
##   removing 571 colData rownames not in sampleMap 'primary'

This next code block prepares the TCGA data. In practice you would do this once, and save the data before running netDx, but we run it here to see an end-to-end example.

staget <- colData(brca)$pathology_T_stage
st2 <- rep(NA,length(staget))
st2[which(staget %in% c("t1","t1a","t1b","t1c"))] <- 1
st2[which(staget %in% c("t2","t2a","t2b"))] <- 2
st2[which(staget %in% c("t3","t3a"))] <- 3
st2[which(staget %in% c("t4","t4b","t4d"))] <- 4
colData(brca)$STAGE <- st2

pam50 <- colData(brca)$PAM50.mRNA
pam50[which(!pam50 %in% "Luminal A")] <- "notLumA"
pam50[which(pam50 %in% "Luminal A")] <- "LumA"
colData(brca)$pam_mod <- pam50

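# NOTE: pam50 was relabelled above, so the "Normal-like" test matches nothing;
# in effect only patients with a missing stage (NA in st2) are dropped here
idx <- union(which(pam50 == "Normal-like"), which(is.na(st2)))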
pID <- colData(brca)$patientID
tokeep <- setdiff(pID, pID[idx])
brca <- brca[,tokeep,]

smp <- sampleMap(brca)
samps <- smp[which(smp$assay=="BRCA_mRNAArray-20160128"),]
notdup <- samps[which(!duplicated(samps$primary)),"colname"]
brca[[1]] <- brca[[1]][,notdup]

## harmonizing input:
##   removing 63 sampleMap rows with 'colname' not in colnames of experiments
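
The stage and subtype recoding above can be tried on a handful of made-up values. The sketch below is an illustration rather than the code used for the results in this document. It uses a named lookup vector (`stageMap`, a hypothetical name) in place of repeated `%in%` assignments, and it looks for "Normal-like" tumours before the labels are collapsed, because after collapsing that value no longer occurs. Applying this ordering to the real data would change the sample counts reported later.

```r
# toy values (made up for illustration, not taken from TCGA)
staget <- c("t1c", "t2", "t2a", "t3", "t4b", "tx", NA)
pam50  <- c("Luminal A", "Basal-like", "Normal-like", "Luminal A",
            "Luminal B", "HER2-enriched", NA)

# named lookup vector: substage -> collapsed stage; unlisted values become NA
stageMap <- c(t1 = 1, t1a = 1, t1b = 1, t1c = 1,
              t2 = 2, t2a = 2, t2b = 2,
              t3 = 3, t3a = 3,
              t4 = 4, t4b = 4, t4d = 4)
st2 <- unname(stageMap[staget])

# find exclusions on the ORIGINAL labels, before collapsing them
drop <- which(pam50 == "Normal-like" | is.na(st2))

# collapse to a binary label; NA and every non-"Luminal A" value become "notLumA"
status <- ifelse(pam50 %in% "Luminal A", "LumA", "notLumA")

# after collapsing, "Normal-like" no longer occurs in the labels
print(sum(status == "Normal-like"))

# class counts among the patients that are kept
keep <- setdiff(seq_along(status), drop)
print(table(status[keep]))
```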

The important thing is to create ID and STATUS columns in the sample metadata slot. netDx uses these to get the patient identifiers and labels, respectively.

pID <- colData(brca)$patientID
colData(brca)$ID <- pID
colData(brca)$STATUS <- colData(brca)$pam_mod

5 Design custom patient similarity networks (features)

netDx allows the user to define a custom function that takes patient data and variable groupings as input, and returns a set of patient similarity networks (PSN) as output. The user can customize what datatypes are used, how they are grouped, and what defines patient similarity for a given datatype. When running the predictor (next section), the user simply passes this custom function as an input variable; i.e. the makeNetFunc parameter when calling buildPredictor().

Note: While netDx provides a high degree of flexibility in achieving your design of choice, it is up to you to ensure that the design, i.e. the similarity metric and variable groupings, is appropriate for your application. Domain knowledge is almost certainly required for good design.

netDx requires that this function take some generic parameters as input. These include:

dataList: the patient data
groupList: sets of input data that would correspond to individual networks (e.g. genes grouped into pathways)
netDir: the directory where the resulting PSN would be stored.

This section provides more details on the dataList and groupList variables.

5.1 dataList

This contains the input patient data for the predictor. Each key is a datatype, while each value is the corresponding data matrix. Note that columns are patients and rows are unit names (e.g. genes for RNA, or variable names for clinical data).

Important: The software expects the patient order in the columns to match the row order in the pheno table.

Here is a toy example of a dataList object with expression for 100 genes and 2 clinical variables, for 20 patients:

ids <- sprintf("patient%i",1:20)
mrna <- matrix(rnorm(2000),nrow=100,ncol=20) # 100 genes x 20 patients
rownames(mrna) <- sprintf("gene%i",1:100)
colnames(mrna) <- ids

age <- round(runif(20,min=20,max=35))
important_variable <- c(rep("LOW",10),rep("HIGH",10))
clin <- t(data.frame(age=age,imp_var=important_variable))
colnames(clin) <- ids

dataList <- list(clinical=clin,transcription=mrna)

summary(dataList)

##               Length Class  Mode     
## clinical        40   -none- character
## transcription 2000   -none- numeric
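
Because the column order of each matrix is expected to match the row order of the sample table, it can help to enforce that order explicitly before building a dataList. The following is a minimal base R sketch on toy data; the `pheno` table and its contents are hypothetical, and the check is a suggestion rather than something netDx is known to require in this exact form.

```r
ids <- sprintf("patient%i", 1:5)

# hypothetical sample table: one row per patient, in the reference order
pheno <- data.frame(ID = ids, STATUS = c("A", "B", "A", "B", "A"),
                    stringsAsFactors = FALSE)

# toy data matrix whose columns arrive in the reverse order
mat <- matrix(1:10, nrow = 2,
              dimnames = list(c("gene1", "gene2"), rev(ids)))

# reorder columns so that column i corresponds to row i of pheno
mat <- mat[, match(pheno$ID, colnames(mat)), drop = FALSE]

# fail early if the two orders still disagree
stopifnot(identical(colnames(mat), pheno$ID))
```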

5.2 groupList

This object tells the predictor how to group units when constructing a network. For example, genes may be grouped into a network representing a pathway. This object is a list; the names match those of dataList, while each value is itself a list in which each entry reflects a potential network.

groupList <- list()

# genes in mRNA data are grouped by pathways
pathList <- readPathways(getExamplePathways())

## ---------------------------------------

## File: 14772294097ec_Human_AllPathways_January_24_2016_symbol.gmt

## Read 2760 pathways in total, internal list has 2712 entries

##  FILTER: sets with num genes in [10, 200]

##    => 1006 pathways excluded
##    => 1706 left

groupList[["BRCA_mRNAArray-20160128"]] <- pathList[1:3]
# clinical data is not grouped; each variable is its own feature
groupList[["clinical"]] <- list(
      age="patient.age_at_initial_pathologic_diagnosis",
       stage="STAGE"
)

So the groupList variable has one entry per data layer:

summary(groupList)

##                         Length Class  Mode
## BRCA_mRNAArray-20160128 3      -none- list
## clinical                2      -none- list

Each entry contains a list, with one entry per feature. Here we have three pathway-level features for mRNA and two variable-level features for clinical data.

For example, here are the networks to be created with RNA data. Genes belonging to the same pathway are grouped into a single network, so this groupList creates pathway-level networks:

groupList[["BRCA_mRNAArray-20160128"]][1:3]

## $GUANOSINE_NUCLEOTIDES__I_DE_NOVO__I__BIOSYNTHESIS
##  [1] "NME7"   "NME6"   "RRM2B"  "GMPS"   "NME2"   "NME3"   "NME4"   "NME5"  
##  [9] "RRM2"   "NME1"   "GUK1"   "RRM1"   "IMPDH2" "IMPDH1"
## 
## $RETINOL_BIOSYNTHESIS
##  [1] "RDH10" "DHRS4" "LRAT"  "LIPC"  "CES5A" "DHRS9" "RDH11" "DHRS3" "CES1" 
## [10] "RBP1"  "CES4A" "RBP2"  "PNLIP" "RBP5"  "RBP4"  "CES2" 
## 
## $`MUCIN_CORE_1_AND_CORE_2__I_O__I_-GLYCOSYLATION`
##  [1] "GALNT1"  "GCNT4"   "GALNT7"  "GCNT3"   "GCNT7"   "GALNT6"  "GALNT4" 
##  [8] "GALNT5"  "ST3GAL2" "ST3GAL1" "ST3GAL4" "GALNT10" "GALNT15" "GALNTL6"
## [15] "B3GNT3"  "GALNT16" "GALNT18" "GALNT11" "GALNT12" "GCNT1"   "C1GALT1"
## [22] "GALNT13" "GALNT14" "WBSCR17" "GALNT8"  "GALNT9"  "GALNT2"  "GALNT3"

For clinical data, we want to keep each variable as its own network:

head(groupList[["clinical"]])

## $age
## [1] "patient.age_at_initial_pathologic_diagnosis"
## 
## $stage
## [1] "STAGE"
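
Since a groupList refers to units by name, it is worth checking that those names actually occur in the rows of the corresponding data matrix. The sketch below does this on toy data in base R; the group names and the deliberately absent `gene9` are made up for illustration.

```r
# toy expression matrix: 6 genes x 4 patients (random values)
set.seed(2)
mrna <- matrix(rnorm(24), nrow = 6,
               dimnames = list(sprintf("gene%i", 1:6),
                               sprintf("patient%i", 1:4)))

# hypothetical groupings; "gene9" is deliberately absent from the data
groupList <- list(
  transcription = list(pathA = c("gene1", "gene2", "gene3"),
                       pathB = c("gene4", "gene9"))
)

# number of units per group that are actually measured
measured <- vapply(groupList$transcription,
                   function(g) sum(g %in% rownames(mrna)), integer(1))
print(measured)

# units per group that have no matching row in the data
missing <- lapply(groupList$transcription, setdiff, rownames(mrna))
print(missing)
```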

5.3 Define patient similarity for each network

This function is defined by the user and tells the predictor how to create networks from the provided input data.

This function must take dataList, groupList, and netDir as input variables. The residual ... parameter passes additional variables to makePSN_NamedMatrix(), notably numCores (number of parallel jobs).

In this particular example, the custom similarity function does the following:

Creates pathway-level networks from RNA data using the default Pearson correlation measure: makePSN_NamedMatrix(writeProfiles=TRUE,...).
Creates variable-level networks from clinical data using a custom similarity function of normalized difference: makePSN_NamedMatrix(writeProfiles=FALSE,simMetric="custom",customFunc=normDiff).

makeNets <- function(dataList, groupList, netDir,...) {
    netList <- c() # initialize before is.null() check
    # make RNA nets (NOTE: the check for is.null() is important!)
    # (Pearson correlation)
    if (!is.null(groupList[["BRCA_mRNAArray-20160128"]])) { 
    netList <- makePSN_NamedMatrix(dataList[["BRCA_mRNAArray-20160128"]],
                rownames(dataList[["BRCA_mRNAArray-20160128"]]),
                groupList[["BRCA_mRNAArray-20160128"]],
                netDir,verbose=FALSE, 
                writeProfiles=TRUE,...) 
    }
    
    # make clinical nets (normalized difference)
    netList2 <- c()
    if (!is.null(groupList[["clinical"]])) {
    netList2 <- makePSN_NamedMatrix(dataList$clinical, 
        rownames(dataList$clinical),
        groupList[["clinical"]],netDir,
        simMetric="custom",customFunc=normDiff, # custom function
        writeProfiles=FALSE,
        sparsify=TRUE,verbose=TRUE,...)
    }
    netList <- c(unlist(netList),unlist(netList2))
    return(netList)
}
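
To make the two similarity measures concrete, here is a small base R sketch on toy data. It does not call netDx. `normDiffSketch()` is a hypothetical helper, and the formula it uses (one minus the absolute difference scaled by the variable's range) is an assumption about what "normalized difference" means here, not a copy of the normDiff function passed to makePSN_NamedMatrix() above.

```r
# toy data: 3 genes x 4 patients, and one clinical variable
expr <- matrix(c(1, 2, 3,
                 2, 4, 6,
                 3, 2, 1,
                 1, 3, 2),
               nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), sprintf("p%i", 1:4)))
age <- c(p1 = 40, p2 = 50, p3 = 60, p4 = 80)

# Pearson correlation between patients: cor() correlates columns
simPearson <- cor(expr, method = "pearson")

# assumed form of normalized difference:
# sim(i, j) = 1 - |x_i - x_j| / (max(x) - min(x))
normDiffSketch <- function(x) {
  rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  sim <- 1 - abs(outer(x, x, "-")) / rng
  dimnames(sim) <- list(names(x), names(x))
  sim
}
simAge <- normDiffSketch(age)

print(round(simPearson, 2))
print(simAge)
```

Both results are patient-by-patient matrices. Under this form, the two patients with the most different ages get a similarity of 0, and each patient has a similarity of 1 with itself.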

Note: dataList and groupList are generic containers that can contain whatever object the user requires to create PSN. The custom function gives the user complete flexibility in feature design.

6 Build predictor

Finally we call the function that runs the netDx predictor. We provide:

number of train/test splits (numSplits),
max score for features (featScoreMax, set to 2 in this example),
threshold to call feature-selected networks for each train/test split (featSelCutoff),
and the information to create the PSN, including patient data (dataList), how variables are to be grouped into networks (groupList) and the custom function to generate features (makeNetFunc).

Running the predictor can take a while on full-sized data. Change numCores to match the number of cores available on your machine for parallel processing.

The call below runs 2 train/test splits. Within each split, it:

splits data into train/test using the default split of 80:20
scores networks from 0 to 2 (i.e. featScoreMax=2)
uses networks that score >=1 out of 2 (featSelCutoff) to classify test samples for that split.

These are unrealistically low values set so the example will run fast. In practice a good starting point is featScoreMax=10, featSelCutoff=9 and numSplits=100, but these parameters depend on the sample sizes in the dataset.

set.seed(42) # make results reproducible
outDir <- sprintf("%s/pred_output",tempdir()) # location for intermediate work
# set keepAllData to TRUE to not delete at the end of the predictor run.
# This can be useful for debugging.

out <- buildPredictor(dataList=brca,groupList=groupList,
  makeNetFunc=makeNets,outDir=outDir,
  numSplits=2L,featScoreMax=2L, featSelCutoff=1L,
    numCores=1L)

## Predictor started at:

## 2020-01-27 14:11:34

## -------------------------------

## # patients = 525

## # classes = 2 { LumA,notLumA }

## Sample breakdown by class

## 
##    LumA notLumA 
##     230     295

## 2 train/test splits

## Feature selection cutoff = 1 of 2

## Datapoints:

##  BRCA_mRNAArray-20160128: 17814 units

##  clinical: 2 units

## 
## 
## Custom function to generate input nets:

## function(dataList, groupList, netDir,...) {
##  netList <- c() # initialize before is.null() check
##  # make RNA nets (NOTE: the check for is.null() is important!)
##  # (Pearson correlation)
##  if (!is.null(groupList[["BRCA_mRNAArray-20160128"]])) { 
##  netList <- makePSN_NamedMatrix(dataList[["BRCA_mRNAArray-20160128"]],
##              rownames(dataList[["BRCA_mRNAArray-20160128"]]),
##              groupList[["BRCA_mRNAArray-20160128"]],
##              netDir,verbose=FALSE, 
##              writeProfiles=TRUE,...) 
##  }
##  
##  # make clinical nets (normalized difference)
##  netList2 <- c()
##  if (!is.null(groupList[["clinical"]])) {
##  netList2 <- makePSN_NamedMatrix(dataList$clinical, 
##      rownames(dataList$clinical),
##      groupList[["clinical"]],netDir,
##      simMetric="custom",customFunc=normDiff, # custom function
##      writeProfiles=FALSE,
##      sparsify=TRUE,verbose=TRUE,...)
##  }
##  netList <- c(unlist(netList),unlist(netList2))
##  return(netList)
## }

## -------------------------------

## -------------------------------

## Train/test split # 1

## -------------------------------

##          IS_TRAIN
## STATUS    TRAIN TEST
##   LumA      184   46
##   notLumA   236   59

## # values per feature (training)

##  Group BRCA_mRNAArray-20160128: 17814 values

##  Group clinical: 2 values

## ** Creating features

## Pearson similarity chosen - enforcing min. 5 patients per net.

## ** Compiling features

## 
## ** Running feature selection

##  Class: LumA

## 
##    LumA nonpred    <NA> 
##     184     236       0

##  Scoring features

##  Writing queries:

##      184 IDs; 2 queries (92 sampled, 92 test)

##      Q1: 92 test;  92 query

##      Q2: 92 test;  92 query

## QueryRunner time taken: 2.6 s

##  Compiling feature scores

## GUANOSINE_NUCLEOTIDES__I_DE_NOVO__I__BIOSYNTHESIS.profile 
##                                                         2 
##    MUCIN_CORE_1_AND_CORE_2__I_O__I_-GLYCOSYLATION.profile 
##                                                         2 
##                                            stage_cont.txt 
##                                                         1 
##                              RETINOL_BIOSYNTHESIS.profile 
##                                                         2

##

##  Class: notLumA

## 
## nonpred notLumA    <NA> 
##     184     236       0

##  Scoring features

##  Writing queries:

##      236 IDs; 2 queries (118 sampled, 118 test)

##      Q1: 118 test;  118 query

##      Q2: 118 test;  118 query

## QueryRunner time taken: 1.7 s

##  Compiling feature scores

## GUANOSINE_NUCLEOTIDES__I_DE_NOVO__I__BIOSYNTHESIS.profile 
##                                                         2 
##    MUCIN_CORE_1_AND_CORE_2__I_O__I_-GLYCOSYLATION.profile 
##                                                         2 
##                              RETINOL_BIOSYNTHESIS.profile 
##                                                         1 
##                                              age_cont.txt 
##                                                         1

##

## 
## ** Predicting labels for test

## LumA

##  4 feature(s) selected

##  Create & compile features

##  Filter set provided

##      BRCA_mRNAArray-20160128: 3 of 3 nets pass

##      clinical: 0 of 2 nets pass

## Pearson similarity chosen - enforcing min. 5 patients per net.

##  ** LumA: Compute similarity

## notLumA

##  4 feature(s) selected

##  Create & compile features

##  Filter set provided

##      BRCA_mRNAArray-20160128: 3 of 3 nets pass

##      clinical: 0 of 2 nets pass

## Pearson similarity chosen - enforcing min. 5 patients per net.

##  ** notLumA: Compute similarity

##

## ** Predict labels

## Split 1: ACCURACY (N=105 test) = 80.0%

## 
## ----------------------------------------

## -------------------------------

## Train/test split # 2

## -------------------------------

##          IS_TRAIN
## STATUS    TRAIN TEST
##   LumA      184   46
##   notLumA   236   59

## # values per feature (training)

##  Group BRCA_mRNAArray-20160128: 17814 values

##  Group clinical: 2 values

## ** Creating features

## Pearson similarity chosen - enforcing min. 5 patients per net.

## ** Compiling features

## 
## ** Running feature selection

##  Class: LumA

## 
##    LumA nonpred    <NA> 
##     184     236       0

##  Scoring features

##  Writing queries:

##      184 IDs; 2 queries (92 sampled, 92 test)

##      Q1: 92 test;  92 query

##      Q2: 92 test;  92 query

## QueryRunner time taken: 2.2 s

##  Compiling feature scores

## GUANOSINE_NUCLEOTIDES__I_DE_NOVO__I__BIOSYNTHESIS.profile 
##                                                         2 
##    MUCIN_CORE_1_AND_CORE_2__I_O__I_-GLYCOSYLATION.profile 
##                                                         2 
##                              RETINOL_BIOSYNTHESIS.profile 
##                                                         2 
##                                              age_cont.txt 
##                                                         1

##

##  Class: notLumA

## 
## nonpred notLumA    <NA> 
##     184     236       0

##  Scoring features

##  Writing queries:

##      236 IDs; 2 queries (118 sampled, 118 test)

##      Q1: 118 test;  118 query

##      Q2: 118 test;  118 query

## QueryRunner time taken: 1.9 s

##  Compiling feature scores

##    MUCIN_CORE_1_AND_CORE_2__I_O__I_-GLYCOSYLATION.profile 
##                                                         1 
## GUANOSINE_NUCLEOTIDES__I_DE_NOVO__I__BIOSYNTHESIS.profile 
##                                                         2 
##                              RETINOL_BIOSYNTHESIS.profile 
##                                                         2 
##                                            stage_cont.txt 
##                                                         1

##

## 
## ** Predicting labels for test

## LumA

##  4 feature(s) selected

##  Create & compile features

##  Filter set provided

##      BRCA_mRNAArray-20160128: 3 of 3 nets pass

##      clinical: 0 of 2 nets pass

## Pearson similarity chosen - enforcing min. 5 patients per net.

##  ** LumA: Compute similarity

## notLumA

##  4 feature(s) selected

##  Create & compile features

##  Filter set provided

##      BRCA_mRNAArray-20160128: 3 of 3 nets pass

##      clinical: 0 of 2 nets pass

## Pearson similarity chosen - enforcing min. 5 patients per net.

##  ** notLumA: Compute similarity

##

## ** Predict labels

## Split 2: ACCURACY (N=104 test) = 76.0%

## 
## ----------------------------------------

## Predictor completed at:

## 2020-01-27 14:14:21
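
The train/test counts printed for each split in the log above can be reproduced by hand. The arithmetic below assumes that the 80:20 split is applied within each class; that is consistent with the logged table, but it is an inference from the output rather than a statement about netDx internals.

```r
classSizes <- c(LumA = 230, notLumA = 295)  # class sizes from the run log
trainFrac  <- 0.8                           # the default 80:20 split

# assumes the split is made separately within each class
nTrain <- round(classSizes * trainFrac)
nTest  <- classSizes - nTrain

print(rbind(TRAIN = nTrain, TEST = nTest))
```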

7 Examine output

The results are stored in the list object returned by the buildPredictor() call. This list contains:

inputNets: all input networks that the model started with.
Split<i>: a list with results for each train-test split
- predictions: real and predicted labels for test patients
- accuracy: percent accuracy of predictions
- featureScores: feature scores for each label (list with g entries, where g is number of patient labels). Each entry contains the feature selection scores for the corresponding label.
- featureSelected: vector of features that pass feature selection. List of length g, with one entry per label.

summary(out)

##           Length Class  Mode     
## inputNets 10     -none- character
## Split1     4     -none- list     
## Split2     4     -none- list

summary(out$Split1)

##                 Length Class      Mode   
## featureScores      2   -none-     list   
## featureSelected    2   -none-     list   
## predictions     2692   data.frame list   
## accuracy           1   -none-     numeric

Save results to a file for downstream analysis:

save(out,file=sprintf("%s/results.rda",outDir))

Write prediction results to text files:

numSplits <- 2
st <- unique(colData(brca)$STATUS) # to get similarity scores for each class
for (k in seq_len(numSplits)) {
    pred <- out[[sprintf("Split%i",k)]][["predictions"]]
    oF <- sprintf("%s/Split%i_predictionResults.txt",outDir,k)
    # keep identifiers, labels and the per-class similarity scores
    tmp <- pred[,c("ID","STATUS","TT_STATUS","PRED_CLASS",sprintf("%s_SCORE",st))]
    write.table(tmp,file=oF,sep="\t",col.names=TRUE,row.names=FALSE,quote=FALSE)
}
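
Once the per-split results are available, they can be summarized across splits. The sketch below works on a mock object (`mockOut`, a hypothetical name) shaped like the list described in this section, so that it runs without netDx. The predictions are made up; the two accuracies are copied from the run log and are assumed to be stored as percentages.

```r
# made-up predictions using the ID, STATUS and PRED_CLASS columns
mockPred <- data.frame(
  ID         = sprintf("patient%i", 1:6),
  STATUS     = c("LumA", "LumA", "notLumA", "notLumA", "LumA", "notLumA"),
  PRED_CLASS = c("LumA", "notLumA", "notLumA", "notLumA", "LumA", "LumA"),
  stringsAsFactors = FALSE
)

# mock result list: one entry per split plus inputNets
mockOut <- list(
  inputNets = character(0),
  Split1 = list(predictions = mockPred, accuracy = 80.0),
  Split2 = list(predictions = mockPred, accuracy = 76.0)
)

# pick out the per-split entries by name
splitNames <- grep("^Split", names(mockOut), value = TRUE)

# mean accuracy across splits
acc <- vapply(mockOut[splitNames], function(s) s$accuracy, numeric(1))
print(mean(acc))

# confusion matrix of real vs. predicted labels for one split
pred <- mockOut$Split1$predictions
conf <- table(real = pred$STATUS, predicted = pred$PRED_CLASS)
print(conf)
```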

8 sessionInfo

sessionInfo()

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/C/C/C/C/en_CA.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] curatedTCGAData_1.6.0       MultiAssayExperiment_1.10.4
##  [3] SummarizedExperiment_1.14.1 DelayedArray_0.10.0        
##  [5] BiocParallel_1.18.1         matrixStats_0.55.0         
##  [7] Biobase_2.44.0              GenomicRanges_1.36.0       
##  [9] GenomeInfoDb_1.20.0         IRanges_2.18.0             
## [11] S4Vectors_0.22.0            BiocGenerics_0.30.0        
## [13] netDx_0.99.8                BiocStyle_2.12.0           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6                  bit64_0.9-7                  
##  [3] doParallel_1.0.14             httr_1.4.0                   
##  [5] tools_3.6.0                   backports_1.1.4              
##  [7] R6_2.4.0                      KernSmooth_2.23-15           
##  [9] DBI_1.0.0                     lazyeval_0.2.2               
## [11] colorspace_1.4-1              tidyselect_0.2.5             
## [13] bit_1.1-14                    curl_3.3                     
## [15] compiler_3.6.0                glmnet_2.0-18                
## [17] graph_1.62.0                  bookdown_0.14                
## [19] caTools_1.17.1.2              scales_1.0.0                 
## [21] rappdirs_0.3.1                caroline_0.7.6               
## [23] stringr_1.4.0                 digest_0.6.18                
## [25] rmarkdown_1.12                R.utils_2.8.0                
## [27] XVector_0.24.0                pkgconfig_2.0.2              
## [29] htmltools_0.4.0               fastmap_1.0.1                
## [31] dbplyr_1.4.2                  rlang_0.4.1                  
## [33] RSQLite_2.1.2                 shiny_1.4.0                  
## [35] combinat_0.0-8                gtools_3.8.1                 
## [37] dplyr_0.8.3                   R.oo_1.22.0                  
## [39] RCurl_1.95-4.12               magrittr_1.5                 
## [41] GenomeInfoDbData_1.2.1        Matrix_1.2-17                
## [43] Rcpp_1.0.1                    munsell_0.5.0                
## [45] R.methodsS3_1.7.1             stringi_1.4.3                
## [47] yaml_2.2.0                    RJSONIO_1.3-1.1              
## [49] zlibbioc_1.30.0               AnnotationHub_2.16.1         
## [51] gplots_3.0.1.1                plyr_1.8.4                   
## [53] BiocFileCache_1.8.0           grid_3.6.0                   
## [55] blob_1.2.0                    promises_1.1.0               
## [57] gdata_2.18.0                  ExperimentHub_1.10.0         
## [59] bigmemory.sri_0.1.3           crayon_1.3.4                 
## [61] lattice_0.20-38               zeallot_0.1.0                
## [63] knitr_1.22                    pillar_1.3.1                 
## [65] igraph_1.2.4.1                reshape2_1.4.3               
## [67] codetools_0.2-16              XML_3.98-1.19                
## [69] glue_1.3.1                    evaluate_0.13                
## [71] BiocManager_1.30.4            httpuv_1.5.2                 
## [73] vctrs_0.2.0                   foreach_1.4.4                
## [75] gtable_0.3.0                  purrr_0.3.2                  
## [77] assertthat_0.2.1              ggplot2_3.1.1                
## [79] xfun_0.6                      mime_0.6                     
## [81] xtable_1.8-4                  pracma_2.2.5                 
## [83] later_1.0.0                   RCy3_2.4.6                   
## [85] tibble_2.1.1                  iterators_1.0.10             
## [87] AnnotationDbi_1.46.1          memoise_1.1.0                
## [89] interactiveDisplayBase_1.22.0 bigmemory_4.5.33             
## [91] ROCR_1.0-7

Building binary classifier from clinical and ’omic data

2020-01-27

Package