push.bayes
push.bayes.ensemble
- class push.bayes.ensemble.Ensemble(mk_nn: Callable, *args: any, num_devices: int = 1, cache_size: int = 4, view_size: int = 4)
Bases: Infer
The Ensemble class. Used for running deep ensembles.
- Parameters:
mk_nn (Callable) – The base model to be ensembled.
*args (any) – Any arguments required for base model to be initialized.
num_devices (int, optional) – The desired number of GPU devices to utilize. Defaults to 1.
cache_size (int, optional) – The size of cache used to store particles. Defaults to 4.
view_size (int, optional) – The number of particles to consider storing in cache. Defaults to 4.
- bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, epochs: int, loss_fn: ~typing.Callable = MSELoss(), lr: float = 0.01, num_ensembles: int = 2, mk_scheduler=<function mk_scheduler>, prior=False, random_seed=False, bootstrap=False, ensemble_entry=<function _deep_ensemble_main>, ensemble_state={}, f_save: bool = False)
Creates particles and launches push distribution training loop.
- Parameters:
dataloader (DataLoader) – Dataloader to use for training.
epochs (int) – Number of epochs to train for.
loss_fn (Callable) – Loss function to be used during training.
num_ensembles (int, optional) – The number of models to be ensembled.
mk_optim (any) – Returns an optimizer.
ensemble_entry (function) – Training loop for deep ensemble.
ensemble_state (dict) – A dictionary to store state variables for ensembled models. For example, in SWAG, we need to know how many SWAG epochs have passed to properly calculate a running average of model weights.
f_save (bool) – Flag to save each particle/model. Requires “particles” folder in the root directory of the script calling train_deep_ensemble.
- Returns:
None
- posterior_pred(data: DataLoader, f_reg=True, mode=['mean']) Tensor
Generate posterior predictions for the given data.
- Parameters:
data (Union[torch.Tensor, DataLoader]) – The input data for which predictions are to be generated. If a torch.Tensor is provided, it is treated as a single input instance. If a DataLoader is provided, predictions are generated for all instances in the DataLoader.
f_reg (bool, optional) – Flag indicating whether this is a regression task. Set to false for classification tasks.
mode (str, optional) – The mode for generating predictions. Options include “mean” for mean predictions, “median” for median predictions, “max” for max predictions, and “min” for min predictions. Defaults to “mean”.
- Returns:
The posterior predictions for the input data.
- Return type:
torch.Tensor
- Raises:
ValueError – If the provided data is not of type torch.Tensor or DataLoader.
Note
This function uses the push_dist module to launch distributed predictions asynchronously. The type of predictions depends on the specified mode.
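Example
The mode options described above can be illustrated with plain PyTorch. This is a minimal sketch, not the library's implementation: aggregate_predictions is a hypothetical helper, and it assumes every model returns a tensor of the same shape for a given batch.

```python
import torch

def aggregate_predictions(per_model_preds, mode="mean"):
    """Combine per-model predictions for one batch (hypothetical helper).

    per_model_preds: list of tensors, one per model, all with the same shape.
    """
    stacked = torch.stack(per_model_preds, dim=0)  # (num_models, batch, ...)
    if mode == "mean":
        return stacked.mean(dim=0)
    if mode == "median":
        return stacked.median(dim=0).values
    if mode == "max":
        return stacked.max(dim=0).values
    if mode == "min":
        return stacked.min(dim=0).values
    raise ValueError(f"unknown mode: {mode}")

# Three models, one batch of two regression outputs each.
preds = [
    torch.tensor([[1.0], [2.0]]),
    torch.tensor([[3.0], [6.0]]),
    torch.tensor([[2.0], [10.0]]),
]
mean_pred = aggregate_predictions(preds, "mean")
median_pred = aggregate_predictions(preds, "median")
max_pred = aggregate_predictions(preds, "max")
min_pred = aggregate_predictions(preds, "min")
```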
- push.bayes.ensemble.create_optimizer(lr)
Create a function that returns Adam optimizer with a specific learning rate.
- Parameters:
lr (float) – Learning rate for the optimizer.
- Returns:
Function that generates Adam optimizer with the specified learning rate.
- Return type:
function
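Example
A function that returns an optimizer factory is typically written as a closure over the learning rate. The sketch below shows that pattern under assumed names (make_adam_factory is hypothetical); the library's own create_optimizer may set additional Adam options.

```python
import torch

def make_adam_factory(lr):
    """Return a function that builds an Adam optimizer with a fixed learning rate."""
    def mk_optim(params):
        # params: an iterable of parameters, e.g. model.parameters()
        return torch.optim.Adam(params, lr=lr)
    return mk_optim

model = torch.nn.Linear(3, 1)
mk_optim = make_adam_factory(0.01)
optimizer = mk_optim(model.parameters())
```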
- push.bayes.ensemble.mk_empty_scheduler(optim)
Helper function to create an empty scheduler.
- Parameters:
optim – Optimizer.
- Returns:
None.
- push.bayes.ensemble.mk_optim(params)
Returns Adam optimizer.
- Parameters:
params – Model parameters.
- Returns:
Adam optimizer.
- Return type:
torch.optim.Adam
- push.bayes.ensemble.mk_scheduler(optim)
Returns a learning-rate scheduler for the given optimizer.
- Parameters:
optim – Optimizer whose learning rate is scheduled.
- Returns:
Learning-rate scheduler.
- push.bayes.ensemble.train_deep_ensemble(dataloader: ~typing.Callable, loss_fn: ~typing.Callable, epochs: int, nn: ~typing.Callable, *args, lr: float = 0.01, num_devices: int = 1, cache_size: int = 4, view_size: int = 4, num_ensembles: int = 2, prior=False, random_seed=False, bootstrap=False, ensemble_entry=<function _deep_ensemble_main>, ensemble_state={}) List[Tensor]
Train a deep ensemble PusH distribution and return a list of particle parameters.
- Parameters:
dataloader (Callable) – Dataloader.
loss_fn (Callable) – Loss function to be used during training.
epochs (int) – Number of epochs to train for.
nn (Callable) – The base model to be ensembled and trained.
*args (any) – Any arguments needed for the model’s initialization.
num_devices (int, optional) – The desired number of GPU devices to be utilized during training. Defaults to 1.
cache_size (int, optional) – The desired size of cache allocated to storing particles. Defaults to 4.
view_size (int, optional) – The number of other particles’ parameters that can be seen by a particle on a single GPU. Defaults to 4.
num_ensembles (int, optional) – The number of models to be ensembled. Defaults to 2.
mk_optim (any, optional) – Returns an optimizer. Defaults to mk_optim.
ensemble_entry (function, optional) – Training loop for deep ensemble. Defaults to _deep_ensemble_main.
ensemble_state (dict, optional) – A dictionary to store state variables for ensembled models. For example, in SWAG, we need to know how many SWAG epochs have passed to properly calculate a running average of model weights. Defaults to {}.
- Returns:
A list of all particles’ parameters.
- Return type:
List[torch.Tensor]
push.bayes.infer
- class push.bayes.infer.Infer(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)
Bases: object
Base Infer class
Creates a PusH distribution with an inference method and return parameters method.
Infer is a base class that should be inherited by a child class that implements a Bayesian inference method.
- Parameters:
mk_nn (Callable) – Function to create base model.
*args (any) – Any arguments required for base model to be initialized.
num_devices (int, optional) – The desired number of GPU devices to utilize. Defaults to 1.
cache_size (int, optional) – The size of cache used to store particles. Defaults to 4.
view_size (int, optional) – The number of particles to consider storing in cache. Defaults to 4.
- bayes_infer(dataloader: DataLoader, epochs: int, **kwargs) None
Bayesian inference method.
This method should be overridden by subclass.
- Parameters:
dataloader (DataLoader) – The dataloader to use for training.
epochs (int) – The number of epochs to train for.
- Raises:
- get_var(outputs: List[List[Tensor]]) List[List[Tensor]]
Calculates the variance of predictions over different models.
- Parameters:
outputs (List[List[torch.Tensor]]) – List of model predictions for each batch.
- Returns:
List of tensors representing the variance of predictions over different models.
- Return type:
List[torch.Tensor]
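Example
The variance over models can be sketched by stacking the per-model predictions of each batch and reducing over the model dimension. This sketch assumes outputs[b][m] is the prediction of model m on batch b and uses the population variance; the layout and the variance convention used by get_var are not confirmed by this page, and variance_over_models is a hypothetical name.

```python
import torch

def variance_over_models(outputs):
    """Per-batch variance across models (hypothetical helper).

    outputs[b][m] is assumed to be the prediction of model m on batch b.
    """
    return [
        torch.stack(batch_preds, dim=0).var(dim=0, unbiased=False)
        for batch_preds in outputs
    ]

# One batch, two models, two outputs per model.
outputs = [[torch.tensor([1.0, 2.0]), torch.tensor([3.0, 2.0])]]
variances = variance_over_models(outputs)
```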
- p_parameters() List[List[Tensor]]
Return parameters of all particles.
- Returns:
List of all particle parameters.
- Return type:
List[List[torch.Tensor]]
- posterior_pred(data: DataLoader, f_reg=True, mode='mean') Tensor
Posterior prediction.
This method should be overridden by subclass.
push.bayes.stein_vgd
- class push.bayes.stein_vgd.SteinVGD(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)
Bases: Infer
SteinVGD Class.
This class extends the ‘Infer’ class and uses Stein Variational Gradient Descent (SteinVGD) for Bayesian inference tasks.
- Parameters:
mk_nn (Callable) – A function that creates the neural network architecture for the model.
*args (any) – Additional arguments that will be passed to the ‘Infer’ class.
num_devices (int) – The number of devices to be used for computation. Default is 1.
cache_size (int) – The size of the cache for storing computed gradients. Default is 4.
view_size (int) – The size of the view for distributed computations. Default is 4.
- bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, epochs: int, prior=False, random_seed=False, bootstrap=False, loss_fn=MSELoss(), num_particles=1, lengthscale=1.0, lr=0.001, svgd_entry=<function _svgd_leader>, svgd_state={})
Perform Bayesian inference using SteinVGD.
- Parameters:
dataloader (DataLoader) – Dataloader for training.
epochs (int) – Number of training epochs.
prior – Prior information for Bayesian inference. Default is False.
loss_fn (Callable) – Loss function to be used during training. Default is torch.nn.MSELoss().
num_particles (int) – Number of particles to use in SVGD. Default is 1.
lengthscale (float) – Characteristic length scale of the SVGD kernel. Default is 1.0.
lr (float) – Learning rate for optimization. Default is 1e-3.
svgd_entry (Callable) – SVGD entry function. Default is _svgd_leader.
svgd_state (dict) – Additional state information for SVGD. Default is {}.
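Example
To make the roles of num_particles, lengthscale, and lr concrete, the sketch below performs one textbook SVGD update on particles stored as rows of a matrix. It is not the library's distributed implementation: svgd_step and grad_log_p are hypothetical names, and the squared exponential kernel is assumed to have the form exp(-||x - y||^2 / (2 * lengthscale^2)).

```python
import torch

def svgd_step(particles, grad_log_p, length_scale, lr):
    """One SVGD update on particles of shape (n, d) (hypothetical helper)."""
    n = particles.shape[0]
    diffs = particles.unsqueeze(1) - particles.unsqueeze(0)  # diffs[j, i] = x_j - x_i
    sq_dists = (diffs ** 2).sum(dim=-1)                      # (n, n)
    k = torch.exp(-0.5 * sq_dists / length_scale ** 2)       # k[j, i] = k(x_j, x_i)
    # Attractive term: sum_j k(x_j, x_i) * grad log p(x_j)
    attract = k.t() @ grad_log_p(particles)                  # (n, d)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i)
    repulse = -(diffs * k.unsqueeze(-1)).sum(dim=0) / length_scale ** 2
    return particles + lr * (attract + repulse) / n

# Target: standard normal, whose log-density has gradient -x.
particles = torch.tensor([[2.0], [-2.0], [0.5]])
updated = svgd_step(particles, lambda x: -x, length_scale=1.0, lr=0.1)
```

The attractive term moves particles toward high-density regions, while the repulsive term keeps them from collapsing onto a single mode.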
- posterior_pred(data: DataLoader, f_reg=True, mode=['mean']) Tensor
Posterior prediction.
This method should be overridden by subclass.
- push.bayes.stein_vgd.mk_empty_optim(params)
Helper function to create an empty optimizer.
- Parameters:
params – Model parameters.
- Returns:
None.
- push.bayes.stein_vgd.mk_empty_scheduler(optim)
Helper function to create an empty scheduler.
- Parameters:
optim – Optimizer.
- Returns:
None.
- push.bayes.stein_vgd.normal_prior(params: Iterable[Tensor]) list[torch.Tensor]
Compute gradients with respect to a normal distribution.
This function calculates the gradients with respect to a normal distribution with mean 0.0 and standard deviation 1.0.
- Parameters:
params (Iterable[torch.Tensor]) – Collection of tensors for which gradients are computed.
- Returns:
List of computed gradients for each parameter.
- Return type:
List[torch.Tensor]
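Example
For a normal distribution with mean 0.0 and standard deviation 1.0, the gradient of the log-density with respect to a parameter tensor p is -p. The sketch below checks this with autograd. Whether normal_prior returns this gradient or a differently signed or scaled quantity is not stated on this page, and standard_normal_log_prob_grads is a hypothetical name.

```python
import torch

def standard_normal_log_prob_grads(params):
    """Gradient of the N(0, 1) log-density for each tensor (hypothetical helper)."""
    grads = []
    for p in params:
        x = p.detach().clone().requires_grad_(True)
        log_prob = torch.distributions.Normal(0.0, 1.0).log_prob(x).sum()
        (g,) = torch.autograd.grad(log_prob, x)
        grads.append(g)
    return grads

params = [torch.tensor([0.5, -1.0]), torch.tensor([[2.0]])]
grads = standard_normal_log_prob_grads(params)
```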
- push.bayes.stein_vgd.torch_squared_exp_kernel(x: Tensor, y: Tensor, length_scale: float) Tensor
Compute the squared exponential kernel between two tensors.
This function calculates the squared exponential kernel value between two tensors x and y. The kernel has a characteristic length scale specified by length_scale.
- Parameters:
x (torch.Tensor) – First input tensor.
y (torch.Tensor) – Second input tensor.
length_scale (float) – Characteristic length scale of the kernel.
- Returns:
Computed squared exponential kernel value.
- Return type:
torch.Tensor
Note
The kernel is commonly used in Gaussian Process regression for modeling smooth functions.
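Example
A common form of the squared exponential kernel is k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)). The sketch below implements that form; the exact scaling used by torch_squared_exp_kernel (for example, the factor 1/2) is an assumption, and squared_exp_kernel is a hypothetical name.

```python
import torch

def squared_exp_kernel(x, y, length_scale):
    """Squared exponential kernel value for two tensors of the same shape."""
    diff = (x - y) / length_scale
    return torch.exp(-0.5 * torch.sum(diff ** 2))

x = torch.tensor([0.0, 0.0])
y = torch.tensor([1.0, 1.0])
k_xy = squared_exp_kernel(x, y, 1.0)  # exp(-1), since ||x - y||^2 = 2
k_xx = squared_exp_kernel(x, x, 1.0)  # 1 for identical inputs
```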
- push.bayes.stein_vgd.torch_squared_exp_kernel_grad(x: Tensor, y: Tensor, length_scale: float) Tensor
Compute the gradient of the squared exponential kernel.
This function calculates the gradient of the squared exponential kernel with respect to its inputs.
- Parameters:
x (torch.Tensor) – First input tensor.
y (torch.Tensor) – Second input tensor.
length_scale (float) – Characteristic length scale of the kernel.
- Returns:
Computed gradient of the squared exponential kernel.
- Return type:
torch.Tensor
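Example
Assuming the kernel form k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)), its gradient with respect to x is -(x - y) / length_scale^2 * k(x, y). The sketch below compares that closed form with autograd. Both helper names are hypothetical, and which input the library differentiates with respect to is not stated on this page.

```python
import torch

def squared_exp_kernel(x, y, length_scale):
    diff = (x - y) / length_scale
    return torch.exp(-0.5 * torch.sum(diff ** 2))

def squared_exp_kernel_grad_x(x, y, length_scale):
    """Closed-form gradient of the kernel with respect to x."""
    return -(x - y) / length_scale ** 2 * squared_exp_kernel(x, y, length_scale)

x = torch.tensor([0.3, -0.7], requires_grad=True)
y = torch.tensor([1.0, 0.5])

analytic = squared_exp_kernel_grad_x(x.detach(), y, 2.0)
(autograd_grad,) = torch.autograd.grad(squared_exp_kernel(x, y, 2.0), x)
```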
- push.bayes.stein_vgd.train_svgd(dataloader: ~torch.utils.data.dataloader.DataLoader, loss_fn: ~typing.Callable, epochs: int, num_particles: int, nn: ~typing.Callable, *args, lengthscale=1.0, lr=0.001, prior=None, num_devices=1, cache_size=4, view_size=4, svgd_entry=<function _svgd_leader>, svgd_state={}) None
Trains a model using Stein Variational Gradient Descent (SVGD).
It initializes a SteinVGD instance and performs Bayesian inference using the provided data loader, loss function, and training parameters.
- Parameters:
dataloader (DataLoader) – The data loader for the training data.
loss_fn (Callable) – The loss function to be used during training.
epochs (int) – The number of training epochs.
num_particles (int) – The number of particles to use in SVGD.
nn (Callable) – A function that creates the neural network architecture for the model.
*args (any) – Additional arguments to be passed to the SteinVGD constructor.
lengthscale (float, optional) – The characteristic length scale of the SVGD kernel. Default is 1.0.
lr (float, optional) – The learning rate for optimization. Default is 1e-3.
prior – Prior information for Bayesian inference. Default is None.
num_devices (int, optional) – The number of devices to be used for computation. Default is 1.
cache_size (int, optional) – The size of the cache for storing computed gradients. Default is 4.
view_size (int, optional) – The size of the view for distributed computations. Default is 4.
svgd_entry (Callable, optional) – The SVGD entry function. Default is _svgd_leader.
svgd_state (dict, optional) – Additional state information for SVGD. Default is {}.
- Returns:
None
Note
The returned parameters can be used for further inference, testing, and analysis.
push.bayes.swag
- class push.bayes.swag.MultiSWAG(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)
Bases: Infer
MultiSWAG class for running MultiSWAG models.
- Parameters:
mk_nn (Callable) – The base model to be ensembled.
*args – Any arguments required for the base model initialization.
num_devices (int) – The desired number of GPU devices to utilize.
cache_size (int) – The size of the cache used to store particles.
view_size (int) – The number of particles to consider storing in the cache.
- bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, pretrain_epochs: int, swag_epochs: int, loss_fn: ~typing.Callable = MSELoss(), lr: float = 0.01, num_models: int = 1, cov_mat_rank: int = 20, prior=False, random_seed=False, bootstrap=False, mswag_entry=<function _mswag_particle>, mswag_state={}, f_save=False, mswag_sample_entry=<function _mswag_sample_entry>, mswag_sample=<function _mswag_sample>)
Perform Bayesian inference using MultiSWAG.
- Parameters:
dataloader (DataLoader) – DataLoader containing the data.
loss_fn (Callable) – Loss function used for training.
num_models (int) – Number of models to be ensembled.
cov_mat_rank (int) – Maximum rank of the low-rank plus diagonal covariance matrix.
lr (float) – Learning rate for training.
pretrain_epochs (int) – Number of epochs for pretraining.
swag_epochs (int) – Number of epochs for SWAG training.
mswag_entry (Callable) – Training loop for deep ensemble.
mswag_state (dict) – State variables for ensembled models.
f_save (bool) – Flag to save each particle/model.
mswag_sample_entry (Callable) – Sampling function.
mswag_sample (Callable) – MultiSWAG sample function.
- Returns:
None
- push.bayes.swag.create_optimizer(lr)
Create a function that returns Adam optimizer with a specific learning rate.
- Parameters:
lr (float) – Learning rate for the optimizer.
- Returns:
Function that generates Adam optimizer with the specified learning rate.
- Return type:
function
- push.bayes.swag.mk_optim(params)
Returns an Adam optimizer.
- Parameters:
params – Parameters for optimization.
- Returns:
Adam optimizer.
- Return type:
torch.optim.Adam
- push.bayes.swag.mk_scheduler(optim)
Returns a learning-rate scheduler for the given optimizer.
- Parameters:
optim – Optimizer whose learning rate is scheduled.
- Returns:
Learning-rate scheduler.
- push.bayes.swag.train_mswag(dataloader: ~torch.utils.data.dataloader.DataLoader, loss_fn: ~typing.Callable, pretrain_epochs: int, swag_epochs: int, nn: ~typing.Callable, *args, lr: float = 0.01, num_devices=1, cache_size: int = 4, view_size: int = 4, num_models: int = 1, cov_mat_rank: int = 20, prior=False, random_seed=False, bootstrap=False, mswag_entry=<function _mswag_particle>, mswag_state={}, f_save=False, mswag_sample_entry=<function _mswag_sample_entry>, mswag_sample=<function _mswag_sample>)
Train a MultiSWAG model.
- Parameters:
dataloader (DataLoader) – DataLoader containing the training data.
loss_fn (Callable) – Loss function used for training.
pretrain_epochs (int) – Number of epochs for pretraining.
swag_epochs (int) – Number of epochs for SWAG training.
num_models (int) – Number of models to use in MultiSWAG.
cov_mat_rank (int) – Maximum rank of the low-rank plus diagonal covariance matrix.
cache_size (int) – Size of the cache for MultiSWAG.
view_size (int) – Size of the view for MultiSWAG.
nn (Callable) – Callable function representing the neural network model.
*args – Additional arguments for the neural network.
num_devices (int) – Number of devices for training (default is 1).
lr (float) – Learning rate for training (default is 0.01).
mswag_entry (Callable) – MultiSWAG entry function (default is _mswag_particle).
mswag_state (dict) – Initial state for MultiSWAG (default is {}).
f_save (bool) – Flag to save the model (default is False).
mswag_sample_entry (Callable) – MultiSWAG sample entry function (default is _mswag_sample_entry).
mswag_sample (Callable) – MultiSWAG sample function (default is _mswag_sample).
- Returns:
Trained MultiSWAG model.
- Return type:
- push.bayes.swag.update_theta(state, state_sq, state_cov_mat_sqrt, param, param_sq, n, cov_mat_rank)
Updates the first and second moments and iterates the number of parameter settings averaged.
- Parameters:
state – First moment.
state_sq – Second moment.
param – Parameters.
param_sq – Squared parameters.
n (int) – Number of iterations.
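Example
The running first and second moments used by SWAG can be sketched as incremental averages over the parameter settings seen so far. This is an illustration of the general technique, not the body of update_theta: update_moments is a hypothetical name, and the low-rank deviation matrix (state_cov_mat_sqrt, cov_mat_rank) is omitted.

```python
import torch

def update_moments(mean, sq_mean, param, n):
    """Fold one parameter setting into running first and second moments."""
    new_mean = (mean * n + param) / (n + 1)
    new_sq_mean = (sq_mean * n + param ** 2) / (n + 1)
    return new_mean, new_sq_mean, n + 1

mean = torch.zeros(2)
sq_mean = torch.zeros(2)
n = 0
for param in [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0])]:
    mean, sq_mean, n = update_moments(mean, sq_mean, param, n)

# Diagonal variance estimate from the two moments.
diag_var = sq_mean - mean ** 2
```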
push.bayes.utils
- push.bayes.utils.flatten(lst)
Flatten a list of tensors into a 1D tensor. Inspired by: https://github.com/wjmaddox/swa_gaussian/blob/master/swag/utils.py
- Parameters:
lst (list) – List of tensors to be flattened.
- Returns:
Flattened 1D tensor.
- Return type:
torch.Tensor
- push.bayes.utils.unflatten_like(vector, likeTensorList)
Unflatten a 1D tensor into a list of tensors shaped like likeTensorList. Inspired by: https://github.com/wjmaddox/swa_gaussian/blob/master/swag/utils.py
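Example
Flattening and unflattening are inverse operations: the first concatenates every tensor into one vector, the second slices that vector back into the original shapes. The sketch below shows the round trip with hypothetical helper names; it assumes a 1D input vector, which may differ from the shape conventions of the library's own functions.

```python
import torch

def flatten_tensors(tensors):
    """Concatenate a list of tensors into a single 1D tensor."""
    return torch.cat([t.reshape(-1) for t in tensors])

def unflatten_like_sketch(vector, like_tensors):
    """Split a 1D tensor into pieces shaped like the tensors in like_tensors."""
    out, offset = [], 0
    for t in like_tensors:
        n = t.numel()
        out.append(vector[offset:offset + n].reshape(t.shape))
        offset += n
    return out

weights = [torch.arange(6.0).reshape(2, 3), torch.tensor([10.0, 20.0])]
flat = flatten_tensors(weights)
restored = unflatten_like_sketch(flat, weights)
```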