Title: | Integrative Penalized Conditional Logistic Regression |
---|---|
Description: | Implements L1 and L2 penalized conditional logistic regression with penalty factors allowing for integration of multiple data sources. Implements stability selection for variable selection. |
Authors: | Vera Djordjilović [aut, cre], Erica Ponzi [aut] |
Maintainer: | Vera Djordjilović <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.0.0 |
Built: | 2024-11-25 03:54:28 UTC |
Source: | https://github.com/veradjordjilovic/penalizedclr |
Internal function that performs cross-validation to determine a reasonable default value for the L1 penalty in a conditional logistic regression.
default.lambda(X, Y, stratum, nfolds = 10, alpha = 1)
X |
A matrix of covariates, with the number of rows equaling the number of observations. |
Y |
A binary response variable. |
stratum |
A numeric vector with stratum membership of each observation. |
nfolds |
The number of folds used in cross-validation. Default is 10. |
alpha |
The elastic net mixing parameter, a number between 0 and 1. alpha=0 would give pure ridge; alpha=1 gives lasso. Pure ridge penalty is never obtained in this implementation since alpha must be positive. |
A numeric value for lambda minimizing cross-validated deviance.
Computes a data-adaptive vector of penalty factors for blocks of covariates by fitting a tentative penalized conditional logistic regression model. The penalty for the i-th block is obtained as the inverse of the arithmetic mean of coefficient estimates for its covariates.
default.pf( response, stratum, penalized, unpenalized = NULL, alpha = 1, p = NULL, standardize = TRUE, event, nfolds = 10, type.step1, verbose = F )
response |
The response variable, either a 0/1 vector or a factor with two levels. |
stratum |
A numeric vector with stratum membership of each observation. |
penalized |
A matrix of penalized covariates. |
unpenalized |
A matrix of additional unpenalized covariates. |
alpha |
The elastic net mixing parameter, a number between 0 and 1. alpha=0 would give pure ridge; alpha=1 gives lasso. Pure ridge penalty is never obtained in this implementation since alpha must be positive. |
p |
The sizes of blocks of covariates, a numerical vector of the length equal to the number of blocks, and with the sum equal to the number of penalized covariates. If missing, all covariates are treated the same and a single penalty is applied. |
standardize |
Should the covariates be standardized, a logical value. |
event |
If response is a factor, the level that should be considered a success in the logistic regression. |
nfolds |
The number of folds used in cross-validation. Default is 10. |
type.step1 |
Should the tentative model be fit on all covariates jointly (type.step1 = "comb") or to each block of covariates separately (type.step1 = "sep")? |
verbose |
Logical. Should the message about the obtained penalty factors be printed? |
Blocks that contain covariates with large estimated coefficients will obtain a smaller penalty. If all estimated coefficients pertaining to a block are zero, the function returns a message. A tentative conditional logistic regression model is fit either to each block of covariates separately (type.step1 = "sep") or jointly to all blocks (type.step1 = "comb"). Note that unpenalized = NULL is the only implemented option in this function as of now.
The function returns a list containing the vector of penalty factors corresponding to different blocks.
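The rule described above (a block's penalty factor is the inverse of the mean of its coefficient estimates) can be illustrated with a small base R sketch. The helper block_penalty_factors is hypothetical and is not part of the package. Taking absolute values of the coefficients before averaging is an assumption made here so that positive and negative estimates do not cancel.

```r
# Hypothetical helper for illustration only; not the package implementation.
block_penalty_factors <- function(beta, p) {
  stopifnot(length(beta) == sum(p))
  # Label each coefficient with the index of the block it belongs to.
  block <- rep(seq_along(p), times = p)
  # Mean absolute coefficient per block (the absolute value is an assumption).
  mean_abs <- tapply(abs(beta), block, mean)
  if (any(mean_abs == 0)) {
    message("All estimated coefficients are zero in at least one block.")
  }
  # Inverse of the block mean; an all-zero block yields Inf in this sketch.
  as.numeric(1 / mean_abs)
}

# Tentative coefficient estimates for two blocks of sizes 3 and 5.
beta <- c(0.5, 0, 1.5, 0, 0.2, 0, 0, 0)
pf <- block_penalty_factors(beta, p = c(3, 5))
pf
```

The first block has larger coefficients on average, so it receives the smaller penalty factor.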
Schulze G. (2017) Clinical Outcome Prediction based on Multi-Omics Data: Extension of IPF-LASSO. Master Thesis.
Performs cross validation to determine reasonable values for L1 penalty in a conditional logistic regression.
find.default.lambda( response, stratum, penalized, unpenalized = NULL, alpha = 1, p = NULL, standardize = TRUE, event, pf.list = NULL, nfolds = 10 )
response |
The response variable, either a 0/1 vector or a factor with two levels. |
stratum |
A numeric vector with stratum membership of each observation. |
penalized |
A matrix of penalized covariates. |
unpenalized |
A matrix of additional unpenalized covariates. |
alpha |
The elastic net mixing parameter, a number between 0 and 1. alpha=0 would give pure ridge; alpha=1 gives lasso. Pure ridge penalty is never obtained in this implementation since alpha must be positive. |
p |
The sizes of blocks of covariates, a numerical vector of the length equal to the number of blocks, and with the sum equal to the number of penalized covariates. If missing, all covariates are treated the same and a single penalty is applied. |
standardize |
Should the covariates be standardized, a logical value. |
event |
If response is a factor, the level that should be considered a success in the logistic regression. |
pf.list |
List of vectors of penalty factors. |
nfolds |
The number of folds used in cross-validation. Default is 10. |
The function is based on cross-validation implemented in the clogitL1 package and returns the value of lambda that minimizes cross-validated deviance. In the presence of blocks of covariates, a user specifies a list of candidate vectors of penalty factors. For each candidate vector of penalty factors a single lambda value is obtained. Note that cross-validation includes random data splitting, meaning that obtained values can vary significantly between different runs.
A single numeric value if p and pf.list are missing, or a list of numeric values with L1 penalties for each vector of penalty factors supplied.
    set.seed(123)

    # simulate covariates (pure noise in two blocks of 20 and 80 variables)
    X <- cbind(matrix(rnorm(4000, 0, 1), ncol = 20),
               matrix(rnorm(16000, 2, 0.6), ncol = 80))
    p <- c(20, 80)

    # candidate vectors of penalty factors
    pf.list <- list(c(0.5, 1), c(2, 0.9))

    # stratum membership
    stratum <- sort(rep(1:100, 2))

    # the response
    Y <- rep(c(1, 0), 100)

    # obtain a list of L1 penalties, one for each vector of penalty factors
    lambda.list <- find.default.lambda(response = Y, penalized = X,
                                       stratum = stratum, p = p,
                                       pf.list = pf.list)

    # when `p` and `pf.list` are not provided all covariates are treated as a single block
    lambda <- find.default.lambda(response = Y, penalized = X, stratum = stratum)
Fits conditional logistic regression models with L1 and L2 penalty allowing for different penalties for different blocks of covariates.
penalized.clr( response, stratum, penalized, unpenalized = NULL, lambda, alpha = 1, p = NULL, standardize = TRUE, event )
response |
The response variable, either a 0/1 vector or a factor with two levels. |
stratum |
A numeric vector with stratum membership of each observation. |
penalized |
A matrix of penalized covariates. |
unpenalized |
A matrix of additional unpenalized covariates. |
lambda |
The tuning parameter for L1. Either a single non-negative number, or a numeric vector of the length equal to the number of blocks. See p below. |
alpha |
The elastic net mixing parameter, a number between 0 and 1. alpha=0 would give pure ridge; alpha=1 gives lasso. Pure ridge penalty is never obtained in this implementation since alpha must be positive. |
p |
The sizes of blocks of covariates, a numerical vector of the length equal to the number of blocks, and with the sum equal to the number of penalized covariates. If missing, all covariates are treated the same and a single penalty is applied. |
standardize |
Should the covariates be standardized, a logical value. |
event |
If response is a factor, the level that should be considered a success in the logistic regression. |
The penalized.clr function fits a conditional logistic regression model for a given combination of L1 (lambda) and L2 penalties. The L2 penalty is obtained from lambda and alpha as lambda*(1-alpha)/(2*alpha). Note that lambda is a single number if all covariates are to be penalized equally, and a vector of penalties if predictors are divided into blocks (of sizes provided in p) that are to be penalized differently. The penalized.clr function is based on the Cox model routine available in the penalized package.
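As a small worked illustration of the formula above, the following base R sketch computes the L2 penalty implied by a given lambda and alpha. The helper l2_penalty is hypothetical, and the expansion of per-block penalties to one value per covariate with rep() is only an assumption about how blocks map to covariates, not a description of the package internals.

```r
# Hypothetical helper: L2 penalty implied by lambda and alpha,
# following lambda * (1 - alpha) / (2 * alpha) from the text above.
l2_penalty <- function(lambda, alpha) {
  # alpha must be strictly positive, so pure ridge is excluded.
  stopifnot(all(lambda >= 0), alpha > 0, alpha <= 1)
  lambda * (1 - alpha) / (2 * alpha)
}

lambda <- c(1, 0.3)  # one L1 penalty per block
p <- c(20, 80)       # block sizes

l2_lasso <- l2_penalty(lambda, alpha = 1)    # lasso: no L2 penalty
l2_enet  <- l2_penalty(lambda, alpha = 0.5)  # elastic net

# Assumed mapping of block penalties to individual covariates.
lambda_per_covariate <- rep(lambda, times = p)
```

With alpha = 1 the L2 term vanishes, which corresponds to the lasso case in the description of alpha.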
A list with the following elements:
penalized - Regression coefficients for the penalized covariates.
unpenalized - Regression coefficients for the unpenalized covariates.
converged - Whether the fitting process was judged to have converged.
lambda - The tuning parameter for L1 used.
alpha - The elastic net mixing parameter used.
stable.clr and stable.clr.g for variable selection through stability selection in penalized conditional logistic regression with a single penalty factor and multiple penalty factors, respectively.
    set.seed(123)

    # simulate covariates (pure noise in two blocks of 20 and 80 variables)
    X <- cbind(matrix(rnorm(4000, 0, 1), ncol = 20),
               matrix(rnorm(16000, 2, 0.6), ncol = 80))

    # stratum membership
    stratum <- sort(rep(1:100, 2))

    # the response
    Y <- rep(c(1, 0), 100)

    fit <- penalized.clr(response = Y, stratum = stratum, penalized = X,
                         lambda = c(1, 0.3), p = c(20, 80),
                         standardize = TRUE)
    fit$penalized
    fit$converged
    fit$lambda
Performs stability selection for conditional logistic regression models with L1 and L2 penalty.
stable.clr( response, stratum, penalized, unpenalized = NULL, lambda.seq, alpha = 1, B = 100, parallel = TRUE, standardize = TRUE, event )
response |
The response variable, either a 0/1 vector or a factor with two levels. |
stratum |
A numeric vector with stratum membership of each observation. |
penalized |
A matrix of penalized covariates. |
unpenalized |
A matrix of additional unpenalized covariates. |
lambda.seq |
A sequence of non-negative values to be used as tuning parameters for L1. |
alpha |
The elastic net mixing parameter, a number between 0 and 1. alpha=0 would give pure ridge; alpha=1 gives lasso. Pure ridge penalty is never obtained in this implementation since alpha must be positive. |
B |
A single positive number for the number of subsamples. |
parallel |
Logical. Should the computation be parallelized? |
standardize |
Should the covariates be standardized, a logical value. |
event |
If response is a factor, the level that should be considered a success in the logistic regression. |
A list with a numeric vector Pistab giving selection probabilities for each penalized covariate, and the sequence lambda.seq used.
stable.clr.g for stability selection in penalized conditional logistic regression with multiple penalties for block structured covariates.
    set.seed(123)

    # simulate covariates (pure noise in two blocks of 20 and 80 variables)
    X <- cbind(matrix(rnorm(4000, 0, 1), ncol = 20),
               matrix(rnorm(16000, 2, 0.6), ncol = 80))

    # stratum membership
    stratum <- sort(rep(1:100, 2))

    # the response
    Y <- rep(c(1, 0), 100)

    # default L1 penalty
    lambda <- find.default.lambda(response = Y, penalized = X, stratum = stratum)

    # perform stability selection
    stable1 <- stable.clr(response = Y, penalized = X, stratum = stratum,
                          lambda.seq = lambda)
Performs stability selection for conditional logistic regression models with L1 and L2 penalty allowing for different penalties for different blocks (groups) of covariates (different data sources).
stable.clr.g( response, stratum, penalized, unpenalized = NULL, p = NULL, lambda.list, alpha = 1, B = 100, parallel = TRUE, standardize = TRUE, event )
response |
The response variable, either a 0/1 vector or a factor with two levels. |
stratum |
A numeric vector with stratum membership of each observation. |
penalized |
A matrix of penalized covariates. |
unpenalized |
A matrix of additional unpenalized covariates. |
p |
The sizes of blocks of covariates, a numerical vector of the length equal to the number of blocks, and with the sum equal to the number of penalized covariates. If missing, all covariates are treated the same and a single penalty is applied. |
lambda.list |
A list of vectors of penalties to be applied to different blocks of covariates. Each vector should have the length equal to the number of blocks. |
alpha |
The elastic net mixing parameter, a number between 0 and 1. alpha=0 would give pure ridge; alpha=1 gives lasso. Pure ridge penalty is never obtained in this implementation since alpha must be positive. |
B |
A single positive number for the number of subsamples. |
parallel |
Logical. Should the computation be parallelized? |
standardize |
Should the covariates be standardized, a logical value. |
event |
If response is a factor, the level that should be considered a success in the logistic regression. |
This function implements stability selection (Meinshausen and Bühlmann, 2010) in a conditional logistic regression. The implementation is based on the modification of Shah and Samworth (2013) featuring complementary subsamples. Note that this means that the number of subsamples will be 2B instead of B. The subsampling procedure is repeated 2B times for each vector of per-block penalties, resulting each time in a vector of selection frequencies (the frequency of a non-zero coefficient estimate for each covariate). The final selection probability Pistab is obtained by taking the maximum over all considered vectors of penalties.
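The aggregation step described above can be sketched in base R as follows. The 0/1 selection matrices are made up for illustration (rows are the 2B subsamples with B = 2, columns are covariates); in practice they would come from refitting the model on each subsample. The names sel_a, sel_b and selection_freq are hypothetical.

```r
# Hypothetical selection indicators for two candidate penalty vectors:
# entry [s, j] is 1 if covariate j had a non-zero coefficient in subsample s.
sel_a <- matrix(c(1, 1, 1, 1,
                  0, 0, 0, 1,
                  0, 0, 0, 0), nrow = 4)
sel_b <- matrix(c(1, 0, 1, 0,
                  1, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4)

# Selection frequency of each covariate across the 2B subsamples.
selection_freq <- function(selected) colMeans(selected)

# One frequency vector per candidate penalty vector.
freqs <- lapply(list(sel_a, sel_b), selection_freq)

# Final selection probability: element-wise maximum over penalty vectors.
Pistab <- Reduce(pmax, freqs)
Pistab
```

Taking the maximum means that a covariate counts as stable if it is selected frequently under at least one of the candidate penalty vectors.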
A list containing a numeric vector Pistab, giving selection probabilities for all penalized covariates, and lambda.list and p provided as input arguments.
Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473.
Shah, R. D., & Samworth, R. J. (2013). Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1), 55-80.
    set.seed(123)

    # simulate covariates (pure noise in two blocks of 20 and 80 variables)
    X <- cbind(matrix(rnorm(4000, 0, 1), ncol = 20),
               matrix(rnorm(16000, 2, 0.6), ncol = 80))
    p <- c(20, 80)

    # stratum membership
    stratum <- sort(rep(1:100, 2))

    # the response
    Y <- rep(c(1, 0), 100)

    # list of L1 penalties
    lambda.list <- list(c(0.5, 1), c(2, 0.9))

    # perform stability selection
    stable.g1 <- stable.clr.g(response = Y, penalized = X, stratum = stratum,
                              p = p, lambda.list = lambda.list)
Internal function used by stable.clr and stable.clr.g.
subsample.clr( response, stratum, penalized, unpenalized = NULL, lambda, alpha = 1, B = 100, matB = NULL, return.matB = FALSE, parallel = TRUE, standardize = TRUE )
response |
The response variable, either a 0/1 vector or a factor with two levels. |
stratum |
A numeric vector with stratum membership of each observation. |
penalized |
A matrix of penalized covariates. |
unpenalized |
A matrix of additional unpenalized covariates. |
lambda |
The tuning parameter for L1. Either a single non-negative number, or a numeric vector of the length equal to the number of blocks. See p below. |
alpha |
The elastic net mixing parameter, a number between 0 and 1. alpha=0 would give pure ridge; alpha=1 gives lasso. Pure ridge penalty is never obtained in this implementation since alpha must be positive. |
B |
A single positive number for the number of subsamples. |
matB |
A 2B x ceiling(length(unique(stratum))/2) matrix with the index set of selected strata in each of the 2B subsamples. |
return.matB |
Logical. Should the matrix matB be returned? |
parallel |
Logical. Should the computation be parallelized? |
standardize |
Should the covariates be standardized, a logical value. |
If return.matB is TRUE, a list with two elements: a numeric vector Pistab, giving selection probabilities for each covariate, and a matrix matB; otherwise only Pistab.
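To make the shape of matB concrete, here is a minimal base R sketch that draws complementary pairs of stratum subsamples. The helper complementary_subsamples is hypothetical and may differ from what the package does internally. It assumes an even number of strata so that both halves of each pair have the same size.

```r
# Hypothetical helper: each pair of rows holds a random half of the strata
# and its complement, giving 2B rows in total.
complementary_subsamples <- function(stratum, B) {
  ids <- unique(stratum)
  # Assumption of this sketch: an even number of strata.
  stopifnot(length(ids) %% 2 == 0)
  half <- length(ids) / 2
  matB <- matrix(NA, nrow = 2 * B, ncol = half)
  for (b in seq_len(B)) {
    first <- sample(ids, half)            # random half of the strata
    matB[2 * b - 1, ] <- first
    matB[2 * b, ] <- setdiff(ids, first)  # complementary half
  }
  matB
}

set.seed(123)
stratum <- sort(rep(1:100, 2))
matB <- complementary_subsamples(stratum, B = 5)
dim(matB)
```

Whole strata are sampled rather than individual observations, so every matched set stays intact within a subsample.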