push.bayes

push.bayes.ensemble

class push.bayes.ensemble.Ensemble(mk_nn: Callable, *args: any, num_devices: int = 1, cache_size: int = 4, view_size: int = 4)

Bases: Infer

The Ensemble Class. Used for running deep ensembles.

Parameters:

mk_nn (Callable) – The base model to be ensembled.
*args (any) – Any arguments required for base model to be initialized.
num_devices (int, optional) – The desired number of GPU devices that will be utilized. Defaults to 1.
cache_size (int, optional) – The size of cache used to store particles. Defaults to 4.
view_size (int, optional) – The number of particles to consider storing in cache. Defaults to 4.

bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, epochs: int, loss_fn: ~typing.Callable = MSELoss(), lr: float = 0.01, num_ensembles: int = 2, mk_scheduler=<function mk_scheduler>, prior=False, random_seed=False, bootstrap=False, ensemble_entry=<function _deep_ensemble_main>, ensemble_state={}, f_save: bool = False)

Creates particles and launches push distribution training loop.

Parameters:

dataloader (DataLoader) – Dataloader providing the training data.
epochs (int) – Number of epochs to train for.
loss_fn (Callable, optional) – Loss function to be used during training. Defaults to MSELoss().
lr (float, optional) – Learning rate. Defaults to 0.01.
num_ensembles (int, optional) – The number of models to be ensembled. Defaults to 2.
mk_scheduler (function, optional) – Returns a learning-rate scheduler for an optimizer. Defaults to mk_scheduler.
ensemble_entry (function, optional) – Training loop for deep ensemble. Defaults to _deep_ensemble_main.
ensemble_state (dict, optional) – A dictionary to store state variables for ensembled models. For example, in SWAG, we need to know how many SWAG epochs have passed to properly calculate a running average of model weights. Defaults to {}.
f_save (bool, optional) – Flag to save each particle/model. Requires a “particles” folder in the root directory of the script calling train_deep_ensemble. Defaults to False.

Returns:

None

posterior_pred(data: DataLoader, f_reg=True, mode=['mean']) → Tensor

Generate posterior predictions for the given data.

Parameters:

data (Union[torch.Tensor, DataLoader]) – The input data for which predictions are to be generated. If a torch.Tensor is provided, it is treated as a single input instance. If a DataLoader is provided, predictions are generated for all instances in the DataLoader.
f_reg (bool, optional) – Flag indicating whether this is a regression task. Set to False for classification tasks. Defaults to True.
mode (List[str], optional) – The modes for generating predictions. Options include “mean” for mean predictions, “median” for median predictions, “max” for max predictions, and “min” for min predictions. Defaults to [“mean”].

Returns:

The posterior predictions for the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the provided data is not of type torch.Tensor or DataLoader.

Note

This function uses the push_dist module to launch distributed predictions asynchronously. The type of predictions depends on the specified mode.
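To make the mode options concrete, the sketch below reduces per-member predictions for one batch with each of the four reductions. It is a framework-free illustration for the regression case only: plain Python lists stand in for torch tensors, and the helper name aggregate_predictions is hypothetical, not part of push.

```python
import statistics

def aggregate_predictions(member_preds, mode="mean"):
    """Reduce predictions across ensemble members, elementwise.

    member_preds: list of per-member prediction lists, all the same length.
    """
    reducers = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "max": max,
        "min": min,
    }
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode!r}")
    reduce_fn = reducers[mode]
    # zip(*member_preds) groups the i-th prediction of every member together.
    return [reduce_fn(values) for values in zip(*member_preds)]

# Three ensemble members, three inputs each.
preds = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 5.0],
    [6.0, 2.0, 4.0],
]
summary = {m: aggregate_predictions(preds, m) for m in ("mean", "median", "max", "min")}
```

Requesting several modes at once, as the list-valued mode default suggests, amounts to running one reduction per entry over the same member outputs.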

push.bayes.ensemble.create_optimizer(lr)

Create a function that returns Adam optimizer with a specific learning rate.

Parameters: lr (float) – Learning rate for the optimizer.
Returns: Function that generates Adam optimizer with the specified learning rate.
Return type: function

push.bayes.ensemble.mk_empty_scheduler(optim)

Helper function to create an empty scheduler.

Parameters: optim – Optimizer that a scheduler would otherwise be attached to.
Returns: None.

push.bayes.ensemble.mk_optim(params)

Returns Adam optimizer.

Parameters: params – Model parameters.
Returns: Adam optimizer.
Return type: torch.optim.Adam

push.bayes.ensemble.mk_scheduler(optim)

Returns a learning-rate scheduler for the given optimizer.

Parameters: optim – Optimizer to be scheduled.
Returns: Learning-rate scheduler.

push.bayes.ensemble.train_deep_ensemble(dataloader: ~typing.Callable, loss_fn: ~typing.Callable, epochs: int, nn: ~typing.Callable, *args, lr: float = 0.01, num_devices: int = 1, cache_size: int = 4, view_size: int = 4, num_ensembles: int = 2, prior=False, random_seed=False, bootstrap=False, ensemble_entry=<function _deep_ensemble_main>, ensemble_state={}) → List[Tensor]

Train a deep ensemble PusH distribution and return a list of particle parameters.

Parameters:

dataloader (Callable) – Dataloader.
loss_fn (Callable) – Loss function to be used during training.
epochs (int) – Number of epochs to train for.
nn (Callable) – The base model to be ensembled and trained.
*args (any) – Any arguments needed for the model’s initialization.
num_devices (int, optional) – The desired number of GPU devices to be utilized during training. Defaults to 1.
cache_size (int, optional) – The desired size of cache allocated to storing particles. Defaults to 4.
view_size (int, optional) – The number of other particles’ parameters that can be seen by a particle on a single GPU. Defaults to 4.
num_ensembles (int, optional) – The number of models to be ensembled. Defaults to 2.
lr (float, optional) – Learning rate. Defaults to 0.01.
ensemble_entry (function, optional) – Training loop for deep ensemble. Defaults to _deep_ensemble_main.
ensemble_state (dict, optional) – A dictionary to store state variables for ensembled models. For example, in SWAG, we need to know how many SWAG epochs have passed to properly calculate a running average of model weights. Defaults to {}.

Returns:

A list of all particles’ parameters.

Return type:

List[torch.Tensor]

push.bayes.infer

class push.bayes.infer.Infer(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)

Bases: object

Base Infer class

Creates a PusH distribution with an inference method (bayes_infer) and a method that returns the particle parameters (p_parameters).

Infer is a base class that should be inherited by a child class that implements a Bayesian inference method.

Parameters:

mk_nn (Callable) – Function to create base model.
*args (any) – Any arguments required for base model to be initialized.
num_devices (int, optional) – The desired number of GPU devices that will be utilized. Defaults to 1.
cache_size (int, optional) – The size of cache used to store particles. Defaults to 4.
view_size (int, optional) – The number of particles to consider storing in cache. Defaults to 4.

bayes_infer(dataloader: DataLoader, epochs: int, **kwargs) → None

Bayesian inference method.

This method should be overridden by subclass.

Parameters:

dataloader (DataLoader) – The dataloader to use for training.
epochs (int) – The number of epochs to train for.

Raises:

NotImplementedError – If the subclass does not override this method.

get_var(outputs: List[List[Tensor]]) → List[List[Tensor]]

Calculates the variance of predictions over different models.

Parameters: outputs (List[List[torch.Tensor]]) – List of model predictions for each batch.
Returns: List of tensors representing the variance of predictions over different models.
Return type: List[torch.Tensor]
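The variance over models can be pictured with the following framework-free sketch. It rests on two assumptions that the reference does not confirm: that outputs is indexed as outputs[model][batch], and that the population variance is used. Nested Python lists stand in for tensors, and variance_over_models is a hypothetical name.

```python
import statistics

def variance_over_models(outputs):
    """Per-batch, per-element variance across models.

    outputs[m][b] is model m's list of predictions for batch b (assumed layout).
    """
    num_batches = len(outputs[0])
    variances = []
    for b in range(num_batches):
        # Collect batch b from every model, then compare element by element.
        per_model = [model_out[b] for model_out in outputs]
        variances.append([statistics.pvariance(vals) for vals in zip(*per_model)])
    return variances

outputs = [
    [[1.0, 2.0], [3.0]],   # model 0: two batches
    [[3.0, 2.0], [5.0]],   # model 1: the same two batches
]
result = variance_over_models(outputs)
```

Elements on which all models agree get zero variance, so larger values flag inputs where the ensemble members disagree.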

p_parameters() → List[List[Tensor]]

Return parameters of all particles.

Returns: List of all particle parameters.
Return type: List[List[torch.Tensor]]

posterior_pred(data: DataLoader, f_reg=True, mode='mean') → Tensor

Posterior prediction.

This method should be overridden by subclass.

Parameters:

data (DataLoader) – Test data.
f_reg (bool, optional) – Set to True for regression task. Defaults to True.
mode (str, optional) – Type of posterior prediction to use. Defaults to “mean”.

Returns:

Tensor of predictions

Return type:

torch.Tensor

push.bayes.stein_vgd

class push.bayes.stein_vgd.SteinVGD(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)

Bases: Infer

SteinVGD Class.

This class extends the ‘Infer’ class and uses Stein Variational Gradient Descent (SVGD) for Bayesian inference tasks.

Parameters:

mk_nn (Callable) – A function that creates the neural network architecture for the model.
*args (any) – Any arguments required for the base model to be initialized; they are passed on to the ‘Infer’ class.
num_devices (int) – The number of devices to be used for computation. Default is 1.
cache_size (int) – The size of the cache used to store particles. Default is 4.
view_size (int) – The number of particles to consider storing in the cache. Default is 4.

bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, epochs: int, prior=False, random_seed=False, bootstrap=False, loss_fn=MSELoss(), num_particles=1, lengthscale=1.0, lr=0.001, svgd_entry=<function _svgd_leader>, svgd_state={})

Perform Bayesian inference using SteinVGD.

Parameters:

dataloader (DataLoader) – Dataloader for training.
epochs (int) – Number of training epochs.
prior – Prior information for Bayesian inference. Default is False.
loss_fn (Callable) – Loss function to be used during training. Default is torch.nn.MSELoss().
num_particles (int) – Number of particles to use in SVGD. Default is 1.
lengthscale (float) – Characteristic length scale of the SVGD kernel. Default is 1.0.
lr (float) – Learning rate for optimization. Default is 1e-3.
svgd_entry (Callable) – SVGD entry function. Default is _svgd_leader.
svgd_state (dict) – Additional state information for SVGD. Default is {}.

posterior_pred(data: DataLoader, f_reg=True, mode=['mean']) → Tensor

Posterior prediction using the trained SVGD particles.

Parameters:

data (DataLoader) – Test data.
f_reg (bool, optional) – Set to True for regression task. Defaults to True.
mode (List[str], optional) – Types of posterior prediction to use. Defaults to [“mean”].

Returns:

Tensor of predictions

Return type:

torch.Tensor

push.bayes.stein_vgd.mk_empty_optim(params)

Helper function to create an empty optimizer.

Parameters: params – Model parameters.
Returns: None.

push.bayes.stein_vgd.mk_empty_scheduler(optim)

Helper function to create an empty scheduler.

Parameters: optim – Optimizer that a scheduler would otherwise be attached to.
Returns: None.

push.bayes.stein_vgd.normal_prior(params: Iterable[Tensor]) → list[torch.Tensor]

Compute gradients with respect to a normal distribution.

This function calculates the gradients with respect to a normal distribution with mean 0.0 and standard deviation 1.0.

Parameters: params (Iterable[torch.Tensor]) – Collection of tensors for which gradients are computed.
Returns: List of computed gradients for each parameter.
Return type: List[torch.Tensor]

push.bayes.stein_vgd.torch_squared_exp_kernel(x: Tensor, y: Tensor, length_scale: float) → Tensor

Compute the squared exponential kernel between two tensors.

This function calculates the squared exponential kernel value between two tensors x and y. The kernel has a characteristic length scale specified by length_scale.

Parameters:

x (torch.Tensor) – First input tensor.
y (torch.Tensor) – Second input tensor.
length_scale (float) – Characteristic length scale of the kernel.

Returns:

Computed squared exponential kernel value.

Return type:

torch.Tensor

Note

The kernel is commonly used in Gaussian Process regression for modeling smooth functions.
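As a point of reference, the sketch below evaluates the squared exponential kernel in its most common parameterization, k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)). Whether push uses exactly this scaling is an assumption. Plain Python lists stand in for tensors.

```python
import math

def squared_exp_kernel(x, y, length_scale):
    """k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)) for equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

k_same = squared_exp_kernel([1.0, 2.0], [1.0, 2.0], 1.0)   # identical inputs
k_far = squared_exp_kernel([0.0, 0.0], [3.0, 4.0], 1.0)    # Euclidean distance 5
k_wide = squared_exp_kernel([0.0, 0.0], [3.0, 4.0], 10.0)  # same points, longer length scale
```

Identical inputs give a kernel value of 1, and a longer length scale makes distant points look more similar. In SVGD this controls how strongly particles interact.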

push.bayes.stein_vgd.torch_squared_exp_kernel_grad(x: Tensor, y: Tensor, length_scale: float) → Tensor

Compute the gradient of the squared exponential kernel.

This function calculates the gradient of the squared exponential kernel with respect to its inputs.

Parameters:

x (torch.Tensor) – First input tensor.
y (torch.Tensor) – Second input tensor.
length_scale (float) – Characteristic length scale of the kernel.

Returns:

Computed gradient of the squared exponential kernel.

Return type:

torch.Tensor
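For the common parameterization k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)), the gradient with respect to the first input is -(x - y) / length_scale^2 * k(x, y). The sketch below checks that closed form against a central finite difference. It assumes this parameterization and differentiation with respect to x only. The library's convention may differ, and all function names here are hypothetical.

```python
import math

def squared_exp_kernel(x, y, length_scale):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def squared_exp_kernel_grad_x(x, y, length_scale):
    """Analytic gradient of k with respect to x: -(x - y) / length_scale^2 * k(x, y)."""
    k = squared_exp_kernel(x, y, length_scale)
    return [-(a - b) / length_scale ** 2 * k for a, b in zip(x, y)]

def finite_difference_grad_x(x, y, length_scale, eps=1e-6):
    """Central-difference approximation, used only to check the analytic form."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        diff = (squared_exp_kernel(x_plus, y, length_scale)
                - squared_exp_kernel(x_minus, y, length_scale))
        grad.append(diff / (2 * eps))
    return grad

x, y, ell = [0.5, -1.0], [1.0, 0.25], 1.5
analytic = squared_exp_kernel_grad_x(x, y, ell)
numeric = finite_difference_grad_x(x, y, ell)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
```

The gradient points from x toward y, so moving x along it increases the kernel value.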

push.bayes.stein_vgd.train_svgd(dataloader: ~torch.utils.data.dataloader.DataLoader, loss_fn: ~typing.Callable, epochs: int, num_particles: int, nn: ~typing.Callable, *args, lengthscale=1.0, lr=0.001, prior=None, num_devices=1, cache_size=4, view_size=4, svgd_entry=<function _svgd_leader>, svgd_state={}) → None

Trains a model using Stein Variational Gradient Descent (SVGD).

This function initializes a SteinVGD instance and performs Bayesian inference using the provided data loader, loss function, and training parameters.

Parameters:

dataloader (DataLoader) – The data loader for the training data.
loss_fn (Callable) – The loss function to be used during training.
epochs (int) – The number of training epochs.
num_particles (int) – The number of particles to use in SVGD.
nn (Callable) – A function that creates the neural network architecture for the model.
*args (any) – Additional arguments to be passed to the SteinVGD constructor.
lengthscale (float, optional) – The characteristic length scale of the SVGD kernel. Default is 1.0.
lr (float, optional) – The learning rate for optimization. Default is 1e-3.
prior – Prior information for Bayesian inference. Default is None.
num_devices (int, optional) – The number of devices to be used for computation. Default is 1.
cache_size (int, optional) – The size of the cache used to store particles. Default is 4.
view_size (int, optional) – The number of particles to consider storing in the cache. Default is 4.
svgd_entry (Callable, optional) – The SVGD entry function. Default is _svgd_leader.
svgd_state (dict, optional) – Additional state information for SVGD. Default is {}.

Returns:

None

Note

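The signature is annotated as returning None. To obtain particle parameters for further inference, testing, and analysis, construct a SteinVGD instance directly and call p_parameters() after bayes_infer.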

push.bayes.swag

class push.bayes.swag.MultiSWAG(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)

Bases: Infer

MultiSWAG class for running MultiSWAG models.

Parameters:

mk_nn (Callable) – The base model to be ensembled.
*args – Any arguments required for the base model initialization.
num_devices (int) – The desired number of GPU devices to utilize.
cache_size (int) – The size of the cache used to store particles.
view_size (int) – The number of particles to consider storing in the cache.

bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, pretrain_epochs: int, swag_epochs: int, loss_fn: ~typing.Callable = MSELoss(), lr: float = 0.01, num_models: int = 1, cov_mat_rank: int = 20, prior=False, random_seed=False, bootstrap=False, mswag_entry=<function _mswag_particle>, mswag_state={}, f_save=False, mswag_sample_entry=<function _mswag_sample_entry>, mswag_sample=<function _mswag_sample>)

Perform Bayesian inference using MultiSWAG.

Parameters:

dataloader (DataLoader) – DataLoader containing the data.
loss_fn (Callable) – Loss function used for training.
num_models (int) – Number of models to be ensembled.
cov_mat_rank (int) – Maximum rank of low rank plus diagonal covariance matrix
lr (float) – Learning rate for training.
pretrain_epochs (int) – Number of epochs for pretraining.
swag_epochs (int) – Number of epochs for SWAG training.
mswag_entry (Callable) – Training loop for each MultiSWAG particle.
mswag_state (dict) – State variables for ensembled models.
f_save (bool) – Flag to save each particle/model.
mswag_sample_entry (Callable) – Sampling function.
mswag_sample (Callable) – MultiSWAG sample function.

Returns:

None

posterior_pred(data: DataLoader, loss_fn=MSELoss(), num_samples: int = 20, scale: float = 1.0, var_clamp: float = 1e-30, mode: List[str] = ['mean'], f_reg: bool = True)

Generate posterior predictions using MultiSWAG.

Parameters:

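data (DataLoader) – DataLoader containing the data.
loss_fn (Callable) – Loss function used for computing losses.
num_samples (int) – Number of samples to generate.
scale (float) – Scaling factor for the SWAG sample.
var_clamp (float) – Clamping value for the variance.
mode (List[str]) – Types of posterior prediction to use. Defaults to [“mean”].
f_reg (bool) – Set to True for regression task. Defaults to True.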

Returns:

None

push.bayes.swag.create_optimizer(lr)

Create a function that returns Adam optimizer with a specific learning rate.

Parameters: lr (float) – Learning rate for the optimizer.
Returns: Function that generates Adam optimizer with the specified learning rate.
Return type: function

push.bayes.swag.mk_optim(params)

Returns an Adam optimizer.

Parameters: params – Parameters for optimization.
Returns: Adam optimizer.
Return type: torch.optim.Adam

push.bayes.swag.mk_scheduler(optim)

Returns a learning-rate scheduler for the given optimizer.

Parameters: optim – Optimizer to be scheduled.
Returns: Learning-rate scheduler.

push.bayes.swag.train_mswag(dataloader: ~torch.utils.data.dataloader.DataLoader, loss_fn: ~typing.Callable, pretrain_epochs: int, swag_epochs: int, nn: ~typing.Callable, *args, lr: float = 0.01, num_devices=1, cache_size: int = 4, view_size: int = 4, num_models: int = 1, cov_mat_rank: int = 20, prior=False, random_seed=False, bootstrap=False, mswag_entry=<function _mswag_particle>, mswag_state={}, f_save=False, mswag_sample_entry=<function _mswag_sample_entry>, mswag_sample=<function _mswag_sample>)

Train a MultiSWAG model.

Parameters:

dataloader (DataLoader) – DataLoader containing the training data.
loss_fn (Callable) – Loss function used for training.
pretrain_epochs (int) – Number of epochs for pretraining.
swag_epochs (int) – Number of epochs for SWAG training.
num_models (int) – Number of models to use in MultiSWAG.
cov_mat_rank (int) – Maximum rank of low rank plus diagonal covariance matrix
cache_size (int) – Size of the cache for MultiSWAG.
view_size (int) – Size of the view for MultiSWAG.
nn (Callable) – Callable function representing the neural network model.
*args – Additional arguments for the neural network.
num_devices (int) – Number of devices for training (default is 1).
lr (float) – Learning rate for training (default is 0.01).
mswag_entry (Callable) – MultiSWAG entry function (default is _mswag_particle).
mswag_state (dict) – Initial state for MultiSWAG (default is {}).
f_save (bool) – Flag to save the model (default is False).
mswag_sample_entry (Callable) – MultiSWAG sample entry function (default is _mswag_sample_entry).
mswag_sample (Callable) – MultiSWAG sample function (default is _mswag_sample).

Returns:

Trained MultiSWAG model.

Return type:

MultiSWAG

push.bayes.swag.update_theta(state, state_sq, state_cov_mat_sqrt, param, param_sq, n, cov_mat_rank)

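Updates the first and second moments and increments the number of parameter settings averaged.

Parameters:

state – First moment.
state_sq – Second moment.
state_cov_mat_sqrt – Square-root factor of the low-rank covariance matrix.
param – Parameters.
param_sq – Squared parameters.
n (int) – Number of parameter settings averaged so far.
cov_mat_rank (int) – Maximum rank of low rank plus diagonal covariance matrix.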
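The moment update can be sketched as a running average in the style of SWAG. The code below is an illustration under assumptions, not the library's implementation: it uses plain Python lists, computes the squared parameters internally rather than taking param_sq as an argument, and leaves out the low-rank covariance factor. The name update_moments is hypothetical.

```python
def update_moments(state, state_sq, param, n):
    """Fold one new parameter vector into running first and second moments.

    state and state_sq are averages over the n vectors seen so far.
    """
    new_state = [(s * n + p) / (n + 1) for s, p in zip(state, param)]
    new_state_sq = [(s2 * n + p * p) / (n + 1) for s2, p in zip(state_sq, param)]
    return new_state, new_state_sq, n + 1

# Average three weight snapshots of a two-parameter model.
snapshots = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
mean, mean_sq, n = [0.0, 0.0], [0.0, 0.0], 0
for w in snapshots:
    mean, mean_sq, n = update_moments(mean, mean_sq, w, n)

# Diagonal variance estimate: E[w^2] - E[w]^2.
diag_var = [m2 - m * m for m, m2 in zip(mean, mean_sq)]
```

Tracking n alongside the moments is what lets each new snapshot be weighted correctly, which is why the number of SWAG epochs has to be carried in the state.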

push.bayes.utils

push.bayes.utils.flatten(lst)

Flatten a list of tensors into a 1D tensor. Inspired by: https://github.com/wjmaddox/swa_gaussian/blob/master/swag/utils.py

Parameters: lst (list) – List of tensors to be flattened.
Returns: Flattened 1D tensor.
Return type: torch.Tensor

push.bayes.utils.unflatten_like(vector, likeTensorList)

Unflatten a 1D tensor into a list of tensors shaped like likeTensorList. Inspired by: https://github.com/wjmaddox/swa_gaussian/blob/master/swag/utils.py

Parameters:

vector (torch.Tensor) – 1D tensor to be unflattened.
likeTensorList (list) – List of tensors providing the shape for unflattening.

Returns:

List of unflattened tensors.

Return type:

list
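The round trip between flatten and unflatten_like can be illustrated with nested Python lists standing in for tensors. This is a framework-free sketch of the idea only. The real functions operate on torch tensors, and the implementations below are assumptions written for illustration.

```python
def flatten(lst):
    """Concatenate arbitrarily nested lists of numbers into one flat list."""
    flat = []
    for item in lst:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

def unflatten_like(vector, like):
    """Rebuild the nesting of `like` using consecutive values from `vector`."""
    values = iter(vector)

    def rebuild(template):
        if isinstance(template, list):
            return [rebuild(t) for t in template]
        return next(values)

    return rebuild(like)

# A 2x2 "weight" and a length-2 "bias".
params = [[[1.0, 2.0], [3.0, 4.0]], [5.0, 6.0]]
flat = flatten(params)
# Scale the flat vector, then restore the original structure.
restored = unflatten_like([v * 10 for v in flat], params)
```

Working on the flat vector makes it easy to apply a single operation, such as adding a sampled perturbation, to every parameter at once before restoring the per-layer shapes.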