Title: | Generalized Mass Spectrum Missing Peaks Abundance Imputation |
---|---|
Description: | GMSimpute implements the Two-Step Lasso (TS-Lasso) and compound minimum to recover the abundance of missing peaks in mass spectrum analysis. TS-Lasso is a label-free imputation method that handles various types of missing peaks simultaneously. This package provides the procedure to generate missing peaks (or data) for simulation study, as well as a tool to estimate and visualize the proportion of missing at random. |
Authors: | Qian Li [aut, cre] |
Maintainer: | Qian Li <[email protected]> |
License: | GPL(>=2) |
Version: | 0.0.1.0 |
Built: | 2025-01-25 03:03:21 UTC |
Source: | https://github.com/qianli10000/gmsimpute |
GMS.Lasso recovers the abundance of missing peaks via either TS.Lasso or the minimum abundance per compound.
GMS.Lasso(input_data, alpha = 1, nfolds = 10, log.scale = TRUE, TS.Lasso = TRUE)
input_data |
Raw abundance matrix with missing values, with features in rows and samples in columns. |
alpha |
Weight of the L1 penalty in the Elastic Net. The default and suggested value is alpha=1, which corresponds to the Lasso. |
nfolds |
The number of folds used in parameter (lambda) tuning. |
log.scale |
Whether input_data needs a log-scale transform. The default is log.scale=TRUE, which assumes input_data is the raw abundance matrix. If input_data is already a log abundance matrix, set log.scale=FALSE. |
TS.Lasso |
Logical. If TRUE, impute with TS.Lasso; if FALSE, impute with the minimum abundance per compound. |
imputed.final |
The imputed abundance matrix at the scale of input_data. |
data('tcga.bc')
# tcga.bc contains mass spectrum abundance of 150 metabolites for 30 breast cancer
# tumor and normal tissue samples with missing values.
imputed.compound.min=GMS.Lasso(tcga.bc,log.scale=TRUE,TS.Lasso=FALSE)
# Impute raw abundance matrix tcga.bc with compound minimum
imputed.tslasso=GMS.Lasso(tcga.bc,log.scale=TRUE,TS.Lasso=TRUE)
# Impute raw abundance matrix tcga.bc with TS.Lasso
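To make the compound-minimum option concrete, here is a minimal base R sketch of the idea: each missing peak is replaced by the smallest observed abundance of the same compound (row). The helper name `impute_row_min` and the toy matrix are hypothetical, and this is not the package's internal code; the function called with `TS.Lasso=FALSE` may differ in details such as how the log transform is handled.

```r
# Hypothetical helper (not part of GMSimpute): replace each NA with the
# minimum observed value in its row. Rows with no observed value stay NA.
impute_row_min <- function(x) {
  x <- as.matrix(x)
  for (i in seq_len(nrow(x))) {
    observed <- x[i, !is.na(x[i, ])]
    if (length(observed) > 0) {
      x[i, is.na(x[i, ])] <- min(observed)
    }
  }
  x
}

# Toy abundance matrix: 3 compounds (rows) by 4 samples (columns).
toy <- matrix(c(10, NA, 30, 20,
                 5,  7, NA, NA,
                 8,  9, 11, 12),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("compound", 1:3), paste0("sample", 1:4)))

toy_imputed <- impute_row_min(toy)
```

Observed values are left untouched; only the NA cells change.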
GTS.Lasso recovers the abundance of missing peaks via either TS.Lasso or the minimum abundance per compound.
GTS.Lasso(input_data, alpha = 1, nfolds = 10, log.scale = TRUE, TS.Lasso = TRUE)
input_data |
Raw abundance matrix with missing values, with features in rows and samples in columns. |
alpha |
Weight of the L1 penalty in the Elastic Net. The default and suggested value is alpha=1, which corresponds to the Lasso. |
nfolds |
The number of folds used in parameter (lambda) tuning. |
log.scale |
Whether input_data needs a log-scale transform. The default is log.scale=TRUE, which assumes input_data is the raw abundance matrix. If input_data is already a log abundance matrix, set log.scale=FALSE. |
TS.Lasso |
Logical. If TRUE, impute with TS.Lasso; if FALSE, impute with the minimum abundance per compound. |
imputed.final |
The imputed abundance matrix at the scale of input_data. |
data('tcga.bc')
# tcga.bc contains mass spectrum abundance of 150 metabolites for 30 breast cancer
# tumor and normal tissue samples with missing values.
imputed.compound.min=GTS.Lasso(tcga.bc,log.scale=TRUE,TS.Lasso=FALSE)
# Impute raw abundance matrix tcga.bc with compound minimum
imputed.tslasso=GTS.Lasso(tcga.bc,log.scale=TRUE,TS.Lasso=TRUE)
# Impute raw abundance matrix tcga.bc with TS.Lasso
MAR.est estimates the proportion of peaks missing at random (MAR), that is, missingness caused by preprocessing tools, using two technical replicates per sample.
MAR.est(abundance, sample, log.scale = TRUE, violin.plot = FALSE)
abundance |
The abundance matrix of technical replicates, which may contain missing values (NA), with features in rows and samples in columns. |
sample |
A character or integer vector giving the sample name for each column, so that the two replicates of a sample share the same value. |
log.scale |
Logical, whether abundance needs a log-scale transform. The default is log.scale=TRUE; set log.scale=FALSE if abundance is already on the log scale. |
violin.plot |
Logical, whether to generate violin and box plots to visualize abundance distribution of missing and nonmissing peaks. |
MAR.Proportion |
Estimated MAR proportion |
plot |
Violin and box plots generated by ggplot2 |
data('replicates')
# replicates contains mass spectrum log abundance of 85 peptides
# with missing values for 4 pairs of technical replicates.
MAR=MAR.est(replicates,sample=rep(1:4,each=2),log.scale=FALSE,violin.plot=TRUE)
# Estimate the MAR proportion in the 4 pairs of replicates and output the violin/box plot object.
print(MAR$plot)
# Print violin/box plots
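The `sample` vector is what ties columns together into replicate pairs. As a rough illustration in base R, the sketch below tabulates, for each feature and sample, whether a peak is missing in both replicates or in only one. The toy matrix and the names `toy_rep`, `sample_id`, `both_missing` and `one_missing` are hypothetical, and this tabulation is only an intuitive summary of replicate-level missingness; it should not be read as the estimator that MAR.est actually uses.

```r
# Toy log-abundance matrix: 5 features by 4 columns (2 samples x 2 replicates).
toy_rep <- matrix(c(12, 12.1,  NA, 11.8,
                    NA,   NA,   9,  9.2,
                    10,   NA,  10, 10.1,
                     8,  8.2,   8,   NA,
                     7,  7.1,   7,  7.2),
                  nrow = 5, byrow = TRUE)

# Columns 1-2 belong to sample 1, columns 3-4 to sample 2.
sample_id <- rep(1:2, each = 2)

# For every feature (row) and sample (column of the result), count how many
# of the two replicates are missing.
n_missing <- sapply(unique(sample_id), function(s) {
  rowSums(is.na(toy_rep[, sample_id == s, drop = FALSE]))
})

both_missing <- sum(n_missing == 2)  # peak absent in both replicates
one_missing  <- sum(n_missing == 1)  # peak absent in only one replicate
```

A peak that is observed in one technical replicate but missing in the other is hard to attribute to the compound being absent from the sample, which is why replicate pairs are informative about missingness introduced during preprocessing.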
missing.sim generates various types of missing peaks based on the specified missing proportions.
missing.sim(complete.data, total.missing, random, pct.full, seednum = 365)
complete.data |
The full abundance matrix without missing values, with features in rows and samples in columns. |
total.missing |
A scalar or vector of proportions. It is the total percentage of missing peaks throughout the full matrix. |
random |
A scalar or vector of proportions. It is the percentage of random missing in all the missing peaks. |
pct.full |
A scalar for the percentage of aligned features (metabolites or peptides) without missing peaks. |
seednum |
The seed used when generating the indices of missing peaks. The default is seednum=365. |
simulated.data |
The list of all simulated scenarios |
Labels |
The description for each simulated scenario |
data('tcga.bc.full')
# tcga.bc.full contains mass spectrum abundance of 100 metabolites for 30 breast cancer
# tumor and normal tissue samples without missing values.
simulated.data=missing.sim(tcga.bc.full,total.missing=c(0.2,0.4),random=c(0.3,0.5,0.7),pct.full=0.4)
# Generate missing (NA) values in the full abundance matrix tcga.bc.full, permuting all scenarios
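The example crosses two `total.missing` values with three `random` values. The base R sketch below lays out that grid and the cell counts each scenario implies for a 100 x 30 matrix. It assumes the scenarios are the full cross product of the two vectors, which is how the "permuting all scenarios" comment reads; the column names `n_missing` and `n_random` are hypothetical, and the ordering of scenarios in the list returned by missing.sim may differ.

```r
total.missing <- c(0.2, 0.4)    # share of all cells set to NA
random        <- c(0.3, 0.5, 0.7)  # share of those NA cells that are random
pct.full      <- 0.4            # share of features kept free of missing peaks

# One row per scenario: every total.missing value paired with every random value.
scenarios <- expand.grid(total.missing = total.missing, random = random)

n_features <- 100
n_samples  <- 30
n_cells    <- n_features * n_samples

# Implied counts per scenario (rounded to whole cells).
scenarios$n_missing <- round(scenarios$total.missing * n_cells)
scenarios$n_random  <- round(scenarios$random * scenarios$n_missing)

# Number of features that keep all of their peaks.
n_full_features <- round(pct.full * n_features)
```

With these inputs the grid has six rows, so six simulated data sets would be expected under the cross-product assumption.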
Raw mass spectrum proteomics log abundance for 4 pairs of technical replicates.
replicates
A data frame of 85 rows and 8 columns with missing peaks' abundance as NA.
Raw mass spectrum metabolomics data for TCGA breast cancer study.
tcga.bc
A data frame of 150 rows and 30 columns with missing peaks' abundance as NA.
A subset of mass spectrum metabolomics data for TCGA breast cancer study without missing peaks.
tcga.bc.full
A data frame of 100 rows and 30 columns without missing values (NA).
TS.Lasso recovers the abundance of various types of missing peaks.
TS.Lasso(input_data, alpha = 1, nfolds = 10, log.scale = TRUE)
input_data |
Raw abundance matrix with missing values, with features in rows and samples in columns. |
alpha |
Weight of the L1 penalty in the Elastic Net. The default and suggested value is alpha=1, which corresponds to the Lasso. |
nfolds |
The number of folds used in parameter (lambda) tuning. |
log.scale |
Whether input_data needs a log-scale transform. The default is log.scale=TRUE, which assumes input_data is the raw abundance matrix. If input_data is already a log abundance matrix, set log.scale=FALSE. |
imputed.final |
The imputed abundance matrix at the scale of input_data. |
data('tcga.bc')
# tcga.bc contains mass spectrum abundance of 150 metabolites for 30 breast cancer
# tumor and normal tissue samples with missing values.
imputed=TS.Lasso(tcga.bc,log.scale=TRUE)
# Impute raw abundance matrix tcga.bc
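The `log.scale` argument and the "at the scale of input_data" return value together imply a round trip: transform, impute, transform back. The base R sketch below shows that pattern under two loud assumptions: the natural logarithm is used, and a simple row mean stands in for the imputation model. Neither is confirmed by this manual, the helper `impute_on_log_scale` is hypothetical, and the row-mean step is not TS-Lasso.

```r
# Hypothetical wrapper illustrating the log.scale round trip.
# Assumption: natural log; the row-mean fill is a placeholder, not TS-Lasso.
impute_on_log_scale <- function(x, log.scale = TRUE) {
  x <- as.matrix(x)
  work <- if (log.scale) log(x) else x       # model on the log scale
  for (i in seq_len(nrow(work))) {
    miss <- is.na(work[i, ])
    if (any(miss) && !all(miss)) {
      work[i, miss] <- mean(work[i, !miss])  # placeholder imputation step
    }
  }
  if (log.scale) exp(work) else work         # return at the scale of the input
}

# Raw abundances: pass log.scale = TRUE and get raw-scale values back.
raw <- matrix(c(100,   NA, 400,
                 10, 1000,  NA),
              nrow = 2, byrow = TRUE)
imputed_raw <- impute_on_log_scale(raw, log.scale = TRUE)

# Already-logged abundances: pass log.scale = FALSE and get log-scale values back.
imputed_log <- impute_on_log_scale(log(raw), log.scale = FALSE)
```

Both calls describe the same imputation; only the scale of the input and output differs, so `exp(imputed_log)` matches `imputed_raw` up to floating-point error.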