Pretraining and the lasso
Erin Craig, Mert Pilanci, Thomas Le Menestrel, Balasubramanian Narasimhan, Manuel Rivas, Roozbeh Dehghannasiri, Julia Salzman, Jonathan Taylor, Robert Tibshirani
Pretraining is a popular and powerful paradigm in machine learning. For example, suppose we have a modest-sized dataset of images of cats and dogs, and we want to fit a deep neural network to classify them. With pretraining, we start with a neural network trained on a large corpus of images, consisting not just of cats and dogs but of hundreds of other image types. Then we fix all of the network weights except for the top layer(s) and train (or “fine-tune”) those weights on our dataset. This often yields dramatically better performance than training the network on our smaller dataset alone.
Why should this work? By first training our neural network on a large, broad set of images, we learn general features like edges, textures, fur, eyes and so on. Then, during the fine-tuning stage, we learn how to use these features to distinguish between cats and dogs.
Can pretraining help the lasso? Yes!
Here we present a framework for the lasso in which an overall model is fit to a large dataset and then fine-tuned on data for a specific task. The fine-tuning dataset can be a subset of the original dataset, but it need not be. Pretraining the lasso has a wide variety of applications, including stratified models, multinomial targets, multi-response models, conditional average treatment effect estimation and even gradient boosting.
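To give a flavor of what such a two-step fit can look like, here is a small sketch in R built on glmnet: an overall lasso is fit to a large simulated dataset, and a second lasso is then fit to a smaller subset, reusing the overall model through an offset and through reduced penalties on the features the overall model selected. The simulated data, the mixing parameter alpha, and this particular offset/penalty-factor scheme are illustrative assumptions, not the accompanying package's interface or the paper's exact algorithm.

```r
## Illustrative sketch only: a two-step "pretrain, then fine-tune" lasso
## using glmnet's offset and penalty.factor arguments.
library(glmnet)

set.seed(1)
n <- 600; p <- 50
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 5), rep(0, p - 5))
y <- drop(x %*% beta + rnorm(n))

pre  <- 1:500     # large pretraining set
fine <- 501:600   # smaller fine-tuning set (here, a subset of the same data)

## Step 1: fit an overall lasso on the pretraining data.
fit0  <- cv.glmnet(x[pre, ], y[pre])
beta0 <- as.numeric(coef(fit0, s = "lambda.min"))[-1]   # drop intercept
supp  <- which(beta0 != 0)

## Step 2: fine-tune. Here alpha in [0, 1] controls how much the overall
## model is reused (alpha = 1 ignores it; alpha = 0 leans on it fully).
## This alpha is unrelated to glmnet's elastic-net mixing parameter.
alpha <- 0.5
off   <- drop((1 - alpha) * (x[fine, ] %*% beta0))   # carry over overall predictions
pf    <- rep(1, p)
pf[supp] <- alpha                                    # penalize pretrained features less

fit1 <- cv.glmnet(x[fine, ], y[fine], offset = off, penalty.factor = pf)

## Predictions on new data must use the same offset construction.
xnew <- matrix(rnorm(5 * p), 5, p)
predict(fit1, newx = xnew,
        newoffset = drop((1 - alpha) * (xnew %*% beta0)),
        s = "lambda.min")
```

In this sketch the overall model enters the second fit in two ways, through the offset and through smaller penalty factors on its selected features, which is one way to mimic "freezing" the lower layers of a network while fine-tuning the top.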
Software
Software for fitting pretrained lasso models is available as an R package, accompanied by documentation and examples.