push.bayes

push.bayes.ensemble

class push.bayes.ensemble.Ensemble(mk_nn: Callable, *args: any, num_devices: int = 1, cache_size: int = 4, view_size: int = 4)

Bases: Infer

The Ensemble Class. Used for running deep ensembles.

Parameters:
  • mk_nn (Callable) – The base model to be ensembled.

  • *args (any) – Any arguments required for base model to be initialized.

  • num_devices (int, optional) – The desired number of GPU devices that will be utilized. Defaults to 1.

  • cache_size (int, optional) – The size of cache used to store particles. Defaults to 4.

  • view_size (int, optional) – The number of particles to consider storing in cache. Defaults to 4.

bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, epochs: int, loss_fn: ~typing.Callable = MSELoss(), lr: float = 0.01, num_ensembles: int = 2, mk_scheduler=<function mk_scheduler>, prior=False, random_seed=False, bootstrap=False, ensemble_entry=<function _deep_ensemble_main>, ensemble_state={}, f_save: bool = False)

Creates particles and launches push distribution training loop.

Parameters:
  • dataloader (DataLoader) – Dataloader providing the training data.

  • epochs (int) – Number of epochs to train for.

  • loss_fn (Callable, optional) – Loss function to be used during training. Defaults to MSELoss().

  • num_ensembles (int, optional) – The number of models to be ensembled. Defaults to 2.

  • mk_optim (any) – Returns an optimizer.

  • ensemble_entry (function) – Training loop for deep ensemble.

  • ensemble_state (dict) – A dictionary to store state variables for ensembled models. For example, in SWAG, we need to know how many SWAG epochs have passed to properly calculate a running average of model weights.

  • f_save (bool) – Flag to save each particle/model. Requires “particles” folder in the root directory of the script calling train_deep_ensemble.

Returns:

None

posterior_pred(data: DataLoader, f_reg=True, mode=['mean']) Tensor

Generate posterior predictions for the given data.

Parameters:
  • data (Union[torch.Tensor, DataLoader]) – The input data for which predictions are to be generated. If a torch.Tensor is provided, it is treated as a single input instance. If a DataLoader is provided, predictions are generated for all instances in the DataLoader.

  • f_reg (bool, optional) – Flag indicating whether this is a regression task. Set to False for classification tasks. Defaults to True.

  • mode (List[str], optional) – The modes for generating predictions. Options include “mean” for mean predictions, “median” for median predictions, “max” for max predictions, and “min” for min predictions. Defaults to [“mean”].

Returns:

The posterior predictions for the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the provided data is not of type torch.Tensor or DataLoader.

Note

This function uses the push_dist module to launch distributed predictions asynchronously. The type of predictions depends on the specified mode.
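The mode options listed above can be illustrated without the library. The following is a minimal sketch, assuming each ensemble member produces one scalar prediction for an input and that the modes reduce across members; the helper name aggregate is hypothetical and is not part of push, and the reduction push applies for classification (f_reg=False) may differ.

```python
from statistics import mean, median

# Hypothetical helper, not part of push: reduce the per-member predictions
# for a single input according to each requested mode.
def aggregate(preds, modes=("mean",)):
    reducers = {"mean": mean, "median": median, "max": max, "min": min}
    return {m: reducers[m](preds) for m in modes}

# One scalar prediction per ensemble member (plain floats stand in for tensors).
member_preds = [1.0, 2.0, 6.0]
summary = aggregate(member_preds, modes=["mean", "median", "max", "min"])
print(summary)
```

Requesting several modes at once returns one reduced value per mode, which mirrors the list-valued default of the mode argument in the signature.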

push.bayes.ensemble.create_optimizer(lr)

Create a function that returns Adam optimizer with a specific learning rate.

Parameters:

lr (float) – Learning rate for the optimizer.

Returns:

Function that generates Adam optimizer with the specified learning rate.

Return type:

function

push.bayes.ensemble.mk_empty_scheduler(optim)

Returns Adam optimizer.

Parameters:

optim – The optimizer instance.

Returns:

Adam optimizer.

Return type:

torch.optim.Adam

push.bayes.ensemble.mk_optim(params)

Returns Adam optimizer.

Parameters:

params – Model parameters.

Returns:

Adam optimizer.

Return type:

torch.optim.Adam

push.bayes.ensemble.mk_scheduler(optim)

Returns Adam optimizer.

Parameters:

optim – The optimizer instance.

Returns:

Adam optimizer.

Return type:

torch.optim.Adam

push.bayes.ensemble.train_deep_ensemble(dataloader: ~typing.Callable, loss_fn: ~typing.Callable, epochs: int, nn: ~typing.Callable, *args, lr: float = 0.01, num_devices: int = 1, cache_size: int = 4, view_size: int = 4, num_ensembles: int = 2, prior=False, random_seed=False, bootstrap=False, ensemble_entry=<function _deep_ensemble_main>, ensemble_state={}) List[Tensor]

Train a deep ensemble PusH distribution and return a list of particle parameters.

Parameters:
  • dataloader (Callable) – Dataloader.

  • loss_fn (Callable) – Loss function to be used during training.

  • epochs (int) – Number of epochs to train for.

  • nn (Callable) – The base model to be ensembled and trained.

  • *args (any) – Any arguments needed for the model’s initialization.

  • num_devices (int, optional) – The desired number of GPU devices to be utilized during training. Defaults to 1.

  • cache_size (int, optional) – The desired size of cache allocated to storing particles. Defaults to 4.

  • view_size (int, optional) – The number of other particles’ parameters that can be seen by a particle on a single GPU. Defaults to 4.

  • num_ensembles (int, optional) – The number of models to be ensembled. Defaults to 2.

  • mk_optim (any, optional) – Returns an optimizer. Defaults to mk_optim.

  • ensemble_entry (function, optional) – Training loop for deep ensemble. Defaults to _deep_ensemble_main.

  • ensemble_state (dict, optional) – A dictionary to store state variables for ensembled models. For example, in SWAG, we need to know how many SWAG epochs have passed to properly calculate a running average of model weights. Defaults to {}.

Returns:

A list of all particles’ parameters.

Return type:

List[torch.Tensor]

push.bayes.infer

class push.bayes.infer.Infer(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)

Bases: object

Base Infer class

Creates a PusH distribution with an inference method and a method that returns the particle parameters.

Infer is a base class that should be inherited by a child class that implements a Bayesian inference method.

Parameters:
  • mk_nn (Callable) – Function to create base model.

  • *args (any) – Any arguments required for base model to be initialized.

  • num_devices (int, optional) – The desired number of GPU devices that will be utilized. Defaults to 1.

  • cache_size (int, optional) – The size of cache used to store particles. Defaults to 4.

  • view_size (int, optional) – The number of particles to consider storing in cache. Defaults to 4.

bayes_infer(dataloader: DataLoader, epochs: int, **kwargs) None

Bayesian inference method.

This method should be overridden by subclass.

Parameters:
  • dataloader (DataLoader) – The dataloader to use for training.

  • epochs (int) – The number of epochs to train for.

Raises:

NotImplementedError

get_var(outputs: List[List[Tensor]]) List[List[Tensor]]

Calculates the variance of predictions over different models.

Parameters:

outputs (List[List[torch.Tensor]]) – List of model predictions for each batch.

Returns:

List of tensors representing the variance of predictions over different models.

Return type:

List[torch.Tensor]
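As a rough illustration of what a variance over models looks like, the sketch below computes an elementwise variance across models for each batch. It assumes that outputs[b][m] holds the predictions of model m for batch b and that the population variance is wanted; neither the layout nor the choice of estimator is confirmed by this reference, and the helper name variance_over_models is hypothetical.

```python
from statistics import pvariance

# Assumed layout: outputs[b][m] is model m's predictions for batch b.
# Plain lists stand in for tensors.
outputs = [
    [[1.0, 2.0], [3.0, 2.0]],  # batch 0: two models, two examples each
    [[0.0], [4.0]],            # batch 1: two models, one example
]

def variance_over_models(outputs):
    result = []
    for batch in outputs:
        # zip(*batch) groups the predictions all models made for the same example.
        result.append([pvariance(per_example) for per_example in zip(*batch)])
    return result

variances = variance_over_models(outputs)
print(variances)
```

A large variance for an example indicates that the models disagree on it, which is the usual reading of ensemble variance as an uncertainty estimate.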

p_parameters() List[List[Tensor]]

Return parameters of all particles.

Returns:

List of all particle parameters.

Return type:

List[List[torch.Tensor]]

posterior_pred(data: DataLoader, f_reg=True, mode='mean') Tensor

Posterior prediction.

This method should be overridden by subclass.

Parameters:
  • data (DataLoader) – Test data.

  • f_reg (bool, optional) – Set to True for regression task. Defaults to True.

  • mode (str, optional) – Type of posterior prediction to use. Defaults to “mean”.

Returns:

Tensor of predictions

Return type:

torch.Tensor

push.bayes.stein_vgd

class push.bayes.stein_vgd.SteinVGD(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)

Bases: Infer

SteinVGD Class.

This class extends the ‘Infer’ class and uses Stein Variational Gradient Descent (SteinVGD) for Bayesian inference tasks.

Parameters:
  • mk_nn (Callable) – A function that creates the neural network architecture for the model.

  • *args (any) – Additional arguments that will be passed to the ‘Infer’ class.

  • num_devices (int) – The number of devices to be used for computation. Default is 1.

  • cache_size (int) – The size of the cache used to store particles. Default is 4.

  • view_size (int) – The number of particles to consider storing in the cache. Default is 4.

bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, epochs: int, prior=False, random_seed=False, bootstrap=False, loss_fn=MSELoss(), num_particles=1, lengthscale=1.0, lr=0.001, svgd_entry=<function _svgd_leader>, svgd_state={})

Perform Bayesian inference using SteinVGD.

Parameters:
  • dataloader (DataLoader) – Dataloader for training.

  • epochs (int) – Number of training epochs.

  • prior – Prior information for Bayesian inference. Default is False.

  • loss_fn (Callable) – Loss function to be used during training. Default is torch.nn.MSELoss().

  • num_particles (int) – Number of particles to use in SVGD. Default is 1.

  • lengthscale (float) – Characteristic length scale of the SVGD kernel. Default is 1.0.

  • lr (float) – Learning rate for optimization. Default is 1e-3.

  • svgd_entry (Callable) – SVGD entry function. Default is _svgd_leader.

  • svgd_state (dict) – Additional state information for SVGD. Default is {}.

posterior_pred(data: DataLoader, f_reg=True, mode=['mean']) Tensor

Posterior prediction.

This method should be overridden by subclass.

Parameters:
  • data (DataLoader) – Test data.

  • f_reg (bool, optional) – Set to True for regression task. Defaults to True.

  • mode (str, optional) – Type of posterior prediction to use. Defaults to “mean”.

Returns:

Tensor of predictions

Return type:

torch.Tensor

push.bayes.stein_vgd.mk_empty_optim(params)

Helper function to create an empty optimizer.

Parameters:

params – Model parameters.

Returns:

None.

push.bayes.stein_vgd.mk_empty_scheduler(optim)

Helper function to create an empty scheduler.

Parameters:

optim – The optimizer instance.

Returns:

None.

push.bayes.stein_vgd.normal_prior(params: Iterable[Tensor]) list[torch.Tensor]

Compute gradients with respect to a normal distribution.

This function calculates the gradients with respect to a normal distribution with mean 0.0 and standard deviation 1.0.

Parameters:

params (Iterable[torch.Tensor]) – Collection of tensors for which gradients are computed.

Returns:

List of computed gradients for each parameter.

Return type:

List[torch.Tensor]

push.bayes.stein_vgd.torch_squared_exp_kernel(x: Tensor, y: Tensor, length_scale: float) Tensor

Compute the squared exponential kernel between two tensors.

This function calculates the squared exponential kernel value between two tensors x and y. The kernel has a characteristic length scale specified by length_scale.

Parameters:
  • x (torch.Tensor) – First input tensor.

  • y (torch.Tensor) – Second input tensor.

  • length_scale (float) – Characteristic length scale of the kernel.

Returns:

Computed squared exponential kernel value.

Return type:

torch.Tensor

Note

The kernel is commonly used in Gaussian Process regression for modeling smooth functions.
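For reference, the conventional squared exponential kernel is k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)). The sketch below implements that textbook form on plain Python lists; it is an assumption that push uses the same normalization (some libraries omit the factor of 2), and the function name squared_exp_kernel is illustrative rather than the library's own.

```python
import math

# Textbook squared exponential kernel on plain lists of floats.
# Assumption: the 1 / (2 * length_scale**2) normalization.
def squared_exp_kernel(x, y, length_scale):
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

# Identical inputs give the maximum kernel value.
k_same = squared_exp_kernel([1.0, 2.0], [1.0, 2.0], 1.0)
# Distant inputs give a value close to zero.
k_far = squared_exp_kernel([0.0, 0.0], [3.0, 4.0], 1.0)
print(k_same, k_far)
```

A larger length_scale makes the kernel decay more slowly with distance, so in SVGD more particles interact with one another.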

push.bayes.stein_vgd.torch_squared_exp_kernel_grad(x: Tensor, y: Tensor, length_scale: float) Tensor

Compute the gradient of the squared exponential kernel.

This function calculates the gradient of the squared exponential kernel with respect to its inputs.

Parameters:
  • x (torch.Tensor) – First input tensor.

  • y (torch.Tensor) – Second input tensor.

  • length_scale (float) – Characteristic length scale of the kernel.

Returns:

Computed gradient of the squared exponential kernel.

Return type:

torch.Tensor
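Under the conventional definition k(x, y) = exp(-||x - y||^2 / (2 * length_scale^2)), the gradient with respect to x is -(x - y) / length_scale^2 * k(x, y). The sketch below checks that formula against a central finite difference. Differentiating with respect to the first argument and the kernel normalization are both assumptions, and the function names are illustrative.

```python
import math

def squared_exp_kernel(x, y, length_scale):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

# Analytic gradient with respect to x (assumed convention).
def squared_exp_kernel_grad_x(x, y, length_scale):
    k = squared_exp_kernel(x, y, length_scale)
    return [-(a - b) / length_scale ** 2 * k for a, b in zip(x, y)]

# Central finite difference, used only to sanity-check the formula.
def finite_difference_grad(x, y, length_scale, eps=1e-6):
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        diff = squared_exp_kernel(up, y, length_scale) - squared_exp_kernel(down, y, length_scale)
        grad.append(diff / (2.0 * eps))
    return grad

x, y, ell = [0.5, -1.0], [1.5, 0.0], 2.0
analytic = squared_exp_kernel_grad_x(x, y, ell)
numeric = finite_difference_grad(x, y, ell)
print(analytic, numeric)
```

The gradient points from x toward y, so moving x along it increases the kernel value.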

push.bayes.stein_vgd.train_svgd(dataloader: ~torch.utils.data.dataloader.DataLoader, loss_fn: ~typing.Callable, epochs: int, num_particles: int, nn: ~typing.Callable, *args, lengthscale=1.0, lr=0.001, prior=None, num_devices=1, cache_size=4, view_size=4, svgd_entry=<function _svgd_leader>, svgd_state={}) None

Trains a model using Stein Variational Gradient Descent (SVGD).

This function initializes a SteinVGD instance and performs Bayesian inference using the provided data loader, loss function, and training parameters.

Parameters:
  • dataloader (DataLoader) – The data loader for the training data.

  • loss_fn (Callable) – The loss function to be used during training.

  • epochs (int) – The number of training epochs.

  • num_particles (int) – The number of particles to use in SVGD.

  • nn (Callable) – A function that creates the neural network architecture for the model.

  • *args (any) – Additional arguments to be passed to the SteinVGD constructor.

  • lengthscale (float, optional) – The characteristic length scale of the SVGD kernel. Default is 1.0.

  • lr (float, optional) – The learning rate for optimization. Default is 1e-3.

  • prior – Prior information for Bayesian inference. Default is None.

  • num_devices (int, optional) – The number of devices to be used for computation. Default is 1.

  • cache_size (int, optional) – The size of the cache used to store particles. Default is 4.

  • view_size (int, optional) – The number of particles to consider storing in the cache. Default is 4.

  • svgd_entry (Callable, optional) – The SVGD entry function. Default is _svgd_leader.

  • svgd_state (dict, optional) – Additional state information for SVGD. Default is {}.

Returns:

None

Note

The returned parameters can be used for further inference, testing, and analysis.

push.bayes.swag

class push.bayes.swag.MultiSWAG(mk_nn: Callable, *args: any, num_devices=1, cache_size=4, view_size=4)

Bases: Infer

MultiSWAG class for running MultiSWAG models.

Parameters:
  • mk_nn (Callable) – The base model to be ensembled.

  • *args – Any arguments required for the base model initialization.

  • num_devices (int) – The desired number of GPU devices to utilize.

  • cache_size (int) – The size of the cache used to store particles.

  • view_size (int) – The number of particles to consider storing in the cache.

bayes_infer(dataloader: ~torch.utils.data.dataloader.DataLoader, pretrain_epochs: int, swag_epochs: int, loss_fn: ~typing.Callable = MSELoss(), lr: float = 0.01, num_models: int = 1, cov_mat_rank: int = 20, prior=False, random_seed=False, bootstrap=False, mswag_entry=<function _mswag_particle>, mswag_state={}, f_save=False, mswag_sample_entry=<function _mswag_sample_entry>, mswag_sample=<function _mswag_sample>)

Perform Bayesian inference using MultiSWAG.

Parameters:
  • dataloader (DataLoader) – DataLoader containing the data.

  • loss_fn (Callable) – Loss function used for training.

  • num_models (int) – Number of models to be ensembled.

  • cov_mat_rank (int) – Maximum rank of the low-rank plus diagonal covariance matrix.

  • lr (float) – Learning rate for training.

  • pretrain_epochs (int) – Number of epochs for pretraining.

  • swag_epochs (int) – Number of epochs for SWAG training.

  • mswag_entry (Callable) – Training loop for deep ensemble.

  • mswag_state (dict) – State variables for ensembled models.

  • f_save (bool) – Flag to save each particle/model.

  • mswag_sample_entry (Callable) – Sampling function.

  • mswag_sample (Callable) – MultiSWAG sample function.

Returns:

None

posterior_pred(data: DataLoader, loss_fn=MSELoss(), num_samples: int = 20, scale: float = 1.0, var_clamp: float = 1e-30, mode: List[str] = ['mean'], f_reg: bool = True)

Generate posterior predictions using MultiSWAG.

Parameters:
  • data (DataLoader) – DataLoader containing the data.

  • loss_fn (Callable) – Loss function used for computing losses.

  • num_samples (int) – Number of samples to generate.

  • scale (float) – Scaling factor for the SWAG sample.

  • var_clamp (float) – Clamping value for the variance.

Returns:

None

push.bayes.swag.create_optimizer(lr)

Create a function that returns Adam optimizer with a specific learning rate.

Parameters:

lr (float) – Learning rate for the optimizer.

Returns:

Function that generates Adam optimizer with the specified learning rate.

Return type:

function

push.bayes.swag.mk_optim(params)

Returns an Adam optimizer.

Parameters:

params – Parameters for optimization.

Returns:

Adam optimizer.

Return type:

torch.optim.Adam

push.bayes.swag.mk_scheduler(optim)

Returns Adam optimizer.

Parameters:

optim – The optimizer instance.

Returns:

Adam optimizer.

Return type:

torch.optim.Adam

push.bayes.swag.train_mswag(dataloader: ~torch.utils.data.dataloader.DataLoader, loss_fn: ~typing.Callable, pretrain_epochs: int, swag_epochs: int, nn: ~typing.Callable, *args, lr: float = 0.01, num_devices=1, cache_size: int = 4, view_size: int = 4, num_models: int = 1, cov_mat_rank: int = 20, prior=False, random_seed=False, bootstrap=False, mswag_entry=<function _mswag_particle>, mswag_state={}, f_save=False, mswag_sample_entry=<function _mswag_sample_entry>, mswag_sample=<function _mswag_sample>)

Train a MultiSWAG model.

Parameters:
  • dataloader (DataLoader) – DataLoader containing the training data.

  • loss_fn (Callable) – Loss function used for training.

  • pretrain_epochs (int) – Number of epochs for pretraining.

  • swag_epochs (int) – Number of epochs for SWAG training.

  • num_models (int) – Number of models to use in MultiSWAG.

  • cov_mat_rank (int) – Maximum rank of the low-rank plus diagonal covariance matrix.

  • cache_size (int) – Size of the cache for MultiSWAG.

  • view_size (int) – Size of the view for MultiSWAG.

  • nn (Callable) – Callable function representing the neural network model.

  • *args – Additional arguments for the neural network.

  • num_devices (int) – Number of devices for training (default is 1).

  • lr (float) – Learning rate for training (default is 0.01).

  • mswag_entry (Callable) – MultiSWAG entry function (default is _mswag_particle).

  • mswag_state (dict) – Initial state for MultiSWAG (default is {}).

  • f_save (bool) – Flag to save the model (default is False).

  • mswag_sample_entry (Callable) – MultiSWAG sample entry function (default is _mswag_sample_entry).

  • mswag_sample (Callable) – MultiSWAG sample function (default is _mswag_sample).

Returns:

Trained MultiSWAG model.

Return type:

MultiSWAG

push.bayes.swag.update_theta(state, state_sq, state_cov_mat_sqrt, param, param_sq, n, cov_mat_rank)

Updates the first and second moments and iterates the number of parameter settings averaged.

Parameters:
  • state – First moment.

  • state_sq – Second moment.

  • param – Parameters.

  • param_sq – Squared parameters.

  • n (int) – Number of iterations.
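The running update of first and second moments described above is commonly written as mean <- (n * mean + param) / (n + 1), and likewise for the squared parameters. The sketch below shows that generic running average on plain lists. It is not the library's implementation: it computes the squares internally instead of taking param_sq, and it omits the low-rank deviation matrix that state_cov_mat_sqrt and cov_mat_rank refer to. The helper name update_moments is hypothetical.

```python
# Generic running average of first and second moments (illustrative only).
def update_moments(mean, sq_mean, param, n):
    new_mean = [(m * n + p) / (n + 1) for m, p in zip(mean, param)]
    new_sq_mean = [(s * n + p * p) / (n + 1) for s, p in zip(sq_mean, param)]
    # n counts how many parameter settings have been averaged so far.
    return new_mean, new_sq_mean, n + 1

mean, sq_mean, n = [0.0], [0.0], 0
for theta in ([1.0], [3.0]):
    mean, sq_mean, n = update_moments(mean, sq_mean, theta, n)

# Diagonal variance estimate: E[theta^2] - E[theta]^2.
diag_var = [s - m * m for s, m in zip(sq_mean, mean)]
print(mean, sq_mean, n, diag_var)
```

Keeping only the two running moments and the count means the diagonal variance can be recovered at any time without storing every parameter snapshot.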

push.bayes.utils

push.bayes.utils.flatten(lst)

Flatten a list of tensors into a 1D tensor. Inspired by: https://github.com/wjmaddox/swa_gaussian/blob/master/swag/utils.py

Parameters:

lst (list) – List of tensors to be flattened.

Returns:

Flattened 1D tensor.

Return type:

torch.Tensor

push.bayes.utils.unflatten_like(vector, likeTensorList)

Unflatten a 1D tensor into a list of tensors shaped like likeTensorList. Inspired by: https://github.com/wjmaddox/swa_gaussian/blob/master/swag/utils.py

Parameters:
  • vector (torch.Tensor) – 1D tensor to be unflattened.

  • likeTensorList (list) – List of tensors providing the shape for unflattening.

Returns:

List of unflattened tensors.

Return type:

list
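The round trip between a list of tensors and a single 1D vector comes down to concatenating and then slicing by element count. The sketch below shows that bookkeeping with 1D Python lists standing in for tensors; real tensors would also need to be reshaped to the shape of each reference tensor. The function names flatten_lists and unflatten_like_lists are illustrative and are not the library's functions.

```python
# Concatenate all chunks into one flat list.
def flatten_lists(chunks):
    return [value for chunk in chunks for value in chunk]

# Split a flat vector into pieces whose lengths match the reference chunks.
def unflatten_like_lists(vector, like_chunks):
    out, offset = [], 0
    for like in like_chunks:
        count = len(like)
        out.append(vector[offset:offset + count])
        offset += count  # advance past the elements just consumed
    return out

params = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
flat = flatten_lists(params)
restored = unflatten_like_lists(flat, params)
print(flat, restored)
```

Only the sizes of the reference chunks are used when unflattening, so any vector of the same total length can be reshaped this way, for example a sampled parameter vector.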