Computations

GGN eigenvalues

class vivit.EigvalshComputation(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False)

Provide BackPACK extension and hook to compute GGN eigenvalues.

__init__(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False)

Specify GGN approximations. Use no approximations by default.

Note

The loss function must use reduction = 'mean'.

Parameters
  • subsampling – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples. Defaults to None (no curvature sub-sampling).

  • mc_samples – If 0, don’t Monte-Carlo (MC) approximate the GGN. Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).

  • verbose – Turn on verbose mode. If enabled, this will print what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.

get_extension() → BackpropExtension

Instantiate the BackPACK extension for computing GGN eigenvalues.

Returns

BackPACK extension to compute eigenvalues that should be passed to the with backpack(...) context.

get_extension_hook(param_groups: List[Dict]) → Callable[[Module], None]

Instantiates the BackPACK extension hook to compute GGN eigenvalues.

Parameters

param_groups

Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: Each group must specify the 'params' key which contains a list of the parameters that form a GGN block. Examples:

  • [{'params': list(p for p in model.parameters())}] computes eigenvalues for the full GGN (one block).

  • [{'params': [p]} for p in model.parameters()] computes eigenvalues for each block of a per-parameter block-diagonal GGN approximation.
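The two block structures can be sketched with plain Python lists standing in for torch parameters (no model is assumed; the strings are stand-ins for parameter tensors):

```python
# Stand-ins for the tensors yielded by model.parameters().
params = ["weight1", "bias1", "weight2"]

# One group containing all parameters: the full GGN (one block).
full = [{"params": list(p for p in params)}]

# One group per parameter: a per-parameter block-diagonal GGN approximation.
per_param = [{"params": [p]} for p in params]

print(len(full), len(full[0]["params"]))  # 1 block over 3 parameters
print(len(per_param))                     # 3 blocks of 1 parameter each
```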

Returns

BackPACK extension hook to compute eigenvalues that should be passed to the with backpack(...) context. The hook computes GGN eigenvalues during backpropagation and stores them internally (under self._evals).

get_result(group: Dict) → Tensor

Return eigenvalues of a GGN block after the backward pass.

Parameters

group – Parameter group that defines the GGN block.

Returns

One-dimensional tensor containing the block’s eigenvalues (ascending order).

Raises

KeyError – If there are no results for the group.
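For reference, the quantity being computed is the spectrum of a GGN block G = Jᵀ H J (scaled by 1/N under reduction = 'mean'). A minimal numpy sketch, independent of the BackPACK pipeline, that also illustrates the ascending eigenvalue order of the result:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 8, 3, 2                      # batch size, parameters, model outputs

J = rng.standard_normal((N * C, D))    # stacked per-sample output Jacobians
H = np.eye(N * C)                      # loss Hessian w.r.t. outputs (MSE-like)

G = J.T @ H @ J / N                    # GGN block for reduction='mean'
evals = np.linalg.eigvalsh(G)          # ascending order, like get_result

print(evals.shape)                     # (3,) - one eigenvalue per parameter dim
```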

GGN eigenpairs (eigenvalues + eigenvectors)

class vivit.EighComputation(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False, warn_small_eigvals: float = 0.0001)

Provide BackPACK extension and hook to compute GGN eigenpairs.

__init__(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False, warn_small_eigvals: float = 0.0001)

Specify GGN approximations. Use no approximations by default.

Note

The loss function must use reduction = 'mean'.

Parameters
  • subsampling – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples. Defaults to None (no curvature sub-sampling).

  • mc_samples – If 0, don’t Monte-Carlo (MC) approximate the GGN. Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).

  • verbose – Turn on verbose mode. If enabled, this will print what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.

  • warn_small_eigvals – The eigenvector computation breaks down for numerically small eigenvalues close to zero. This variable triggers a user warning when attempting to compute eigenvectors for eigenvalues whose absolute value is smaller. Defaults to 1e-4. You can disable the warning by setting it to 0 (not recommended).

get_extension() → BackpropExtension

Instantiate the BackPACK extension for computing GGN eigenpairs.

Returns

BackPACK extension to compute eigenpairs that should be passed to the with backpack(...) context.

get_extension_hook(param_groups: List[Dict]) → Callable[[Module], None]

Instantiates the BackPACK extension hook to compute GGN eigenpairs.

Parameters

param_groups

Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: Each group must specify the 'params' key which contains a list of the parameters that form a GGN block, and a 'criterion' entry that specifies a filter function to select eigenvalues for the eigenvector computation (details below).

Examples for 'params':

  • [{'params': list(p for p in model.parameters())}] uses the full GGN (one block).

  • [{'params': [p]} for p in model.parameters()] uses a per-parameter block-diagonal GGN approximation.

The function specified under 'criterion' is a Callable[[Tensor], List[int]]. It receives the eigenvalues (in ascending order) and returns the indices of eigenvalues whose eigenvectors should be computed. Examples:

  • {'criterion': lambda evals: [evals.numel() - 1]} discards all eigenvalues except for the largest.

  • {'criterion': lambda evals: list(range(evals.numel()))} computes eigenvectors for all Gram matrix eigenvalues.
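The criterion interface can be sketched without torch; in this sketch numpy arrays stand in for tensors, with .size playing the role of .numel():

```python
import numpy as np

# Eigenvalues in ascending order, as the criterion receives them.
evals = np.array([0.1, 0.5, 2.0, 7.3])

top1 = lambda ev: [ev.size - 1]            # keep only the largest eigenvalue
keep_all = lambda ev: list(range(ev.size)) # keep every eigenvalue

print(top1(evals))      # [3] -> eigenvector of the largest eigenvalue only
print(keep_all(evals))  # [0, 1, 2, 3]
```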

Returns

BackPACK extension hook to compute eigenpairs that should be passed to the with backpack(...) context. The hook computes GGN eigenpairs during backpropagation and stores them internally (under self._evals and self._evecs).

get_result(group: Dict) → Tuple[Tensor, List[Tensor]]

Return eigenvalues and eigenvectors of a GGN block after the backward pass.

Parameters

group – Parameter group that defines the GGN block.

Returns

Tuple of the block’s eigenvalues (one-dimensional tensor, ascending order) and its eigenvectors in parameter list format with a leading axis for the eigenvectors.

Example: Let evals, evecs denote the returned variables. Let group['params'] = [p1, p2] consist of two parameters. Then, evecs contains tensors of shape [(evals.numel(), *p1.shape), (evals.numel(), *p2.shape)]. For an eigenvalue evals[k], the corresponding eigenvector (in list format) is [vecs[k] for vecs in evecs] and its tensors are of shape [p1.shape, p2.shape].
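The indexing described in the example above can be sketched with numpy arrays standing in for tensors (the shapes are made up for illustration):

```python
import numpy as np

# Two parameters of shapes (2, 3) and (5,), and K = 4 retained eigenpairs.
K = 4
p1_shape, p2_shape = (2, 3), (5,)

evals = np.sort(np.random.default_rng(1).random(K))           # ascending
evecs = [np.zeros((K, *p1_shape)), np.zeros((K, *p2_shape))]  # leading K axis

# Eigenvector for eigenvalue evals[k], in the same list format as the params:
k = 2
vec_k = [vecs[k] for vecs in evecs]
print([v.shape for v in vec_k])  # [(2, 3), (5,)]
```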

Raises

KeyError – If there are no results for the group.

1ˢᵗ- and 2ⁿᵈ-order directional derivatives along GGN eigenvectors

class vivit.DirectionalDerivativesComputation(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)

Provide BackPACK extension and hook for 1ˢᵗ/2ⁿᵈ-order directional derivatives.

The directions are given by the GGN eigenvectors. First-order directional derivatives are denoted by γ, second-order directional derivatives by λ.

__init__(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)

Specify GGN and gradient approximations. Use no approximations by default.

Note

The loss function must use reduction = 'mean'.

Parameters
  • subsampling_grad – Indices of samples used for gradient sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directional gradients. Defaults to None (no gradient sub-sampling).

  • subsampling_ggn – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directions and directional curvatures. Defaults to None (no curvature sub-sampling).

  • mc_samples_ggn – If 0, don’t Monte-Carlo (MC) approximate the GGN; the same samples are then used to compute the directions and the directional curvatures. Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).

  • verbose – Turn on verbose mode. If enabled, this will print what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.

  • warn_small_eigvals – The gamma computation breaks down for numerically small eigenvalues close to zero (we need to divide by their square root). This variable triggers a user warning when attempting to compute directional gradients for eigenvalues whose absolute value is smaller. Defaults to 1e-4. You can disable the warning by setting it to 0 (not recommended).

get_extension_hook(param_groups: List[Dict]) → Callable[[Module], None]

Instantiate BackPACK extension hook to compute GGN directional derivatives.

Parameters

param_groups

Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: Each group must specify the 'params' key which contains a list of the parameters that form a GGN block, and a 'criterion' entry that specifies a filter function to select eigenvalues as directions along which to compute directional derivatives (details below).

Examples for 'params':

  • [{'params': list(p for p in model.parameters())}] uses the full GGN (one block).

  • [{'params': [p]} for p in model.parameters()] uses a per-parameter block-diagonal GGN approximation.

The function specified under 'criterion' is a Callable[[Tensor], List[int]]. It receives the eigenvalues (in ascending order) and returns the indices of eigenvalues whose eigenvectors should be used as directions to evaluate directional derivatives. Examples:

  • {'criterion': lambda evals: [evals.numel() - 1]} discards all directions except for the leading eigenvector.

  • {'criterion': lambda evals: list(range(evals.numel()))} computes directional derivatives along all Gram matrix eigenvectors.

Returns

BackPACK extension hook, to compute directional derivatives, that should be passed to the with backpack(...) context. The hook computes GGN directional derivatives during backpropagation and stores them internally (under self._gammas and self._lambdas).

get_extensions() → List[BackpropExtension]

Instantiate the BackPACK extensions to compute GGN directional derivatives.

Returns

BackPACK extensions, to compute 1ˢᵗ- and 2ⁿᵈ-order directional derivatives along GGN eigenvectors, that should be extracted and passed to the with backpack(...) context.

get_result(group: Dict) → Tuple[Tensor, Tensor]

Return 1ˢᵗ/2ⁿᵈ-order directional derivatives along GGN eigenvectors.

Must be called after the backward pass.

Parameters

group – Parameter group that defines the GGN block.

Returns

1ˢᵗ-order directional derivatives γ as an [N, K] tensor with γ[n, k] the directional gradient of sample n (or subsampling_grad[n]) along direction k. 2ⁿᵈ-order directional derivatives λ as an [N, K] tensor with λ[n, k] the directional curvature of sample n (or subsampling_ggn[n]) along direction k.
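The shapes and definitions above can be checked with a numpy sketch. Here a toy rank-1 per-sample curvature G_n = g_n g_nᵀ stands in for the per-sample GGN, which makes λ[n, k] = γ[n, k]²:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 6, 4, 2                       # samples, parameters, directions

grads = rng.standard_normal((N, D))     # per-sample gradients g_n
G = np.stack([g[:, None] * g[None, :] for g in grads])  # toy curvature g_n g_n^T
dirs = np.linalg.qr(rng.standard_normal((D, K)))[0]     # orthonormal directions e_k

gammas = grads @ dirs                                   # gamma[n, k] = g_n . e_k
lambdas = np.einsum("dk,nde,ek->nk", dirs, G, dirs)     # lambda[n, k] = e_k . G_n e_k

print(gammas.shape, lambdas.shape)  # (6, 2) (6, 2)
```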

Raises

KeyError – If there are no results for the group.

Directionally damped Newton steps

class vivit.DirectionalDampedNewtonComputation(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)

Provides BackPACK extension and hook for directionally damped Newton steps.

Let \(\{e_k\}_{k = 1, \dots, K}\) denote the \(K\) eigenvectors used for the Newton step. Let \(\gamma_k, \lambda_k\) denote the first- and second-order directional derivatives along these eigenvectors. The directionally damped Newton step \(s\) is

\[s = \sum_{k=1}^K \frac{- \gamma_k}{\lambda_k + \delta_k} e_k\]

Here, \(\delta_k\) is the damping along direction \(k\). The exact Newton step has zero damping.
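The formula can be sketched in numpy (made-up directional derivatives and dampings, with canonical basis vectors as directions):

```python
import numpy as np

# Directional quantities for K = 2 directions in a D = 3 parameter space.
gammas = np.array([1.0, -2.0])   # gamma_k: directional gradients
lambdas = np.array([4.0, 1.0])   # lambda_k: directional curvatures
deltas = np.array([1.0, 1.0])    # delta_k: damping per direction
e = np.array([[1.0, 0.0, 0.0],   # e_1
              [0.0, 1.0, 0.0]])  # e_2

# s = sum_k -gamma_k / (lambda_k + delta_k) * e_k
s = (-gammas / (lambdas + deltas)) @ e
print(s)  # s = [-0.2, 1.0, 0.0]
```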

__init__(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)

Specify GGN and gradient approximations. Use no approximations by default.

Note

The loss function must use reduction = 'mean'.

Parameters
  • subsampling_grad – Indices of samples used for gradient sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directional gradients. Defaults to None (no gradient sub-sampling).

  • subsampling_ggn – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directions and directional curvatures. Defaults to None (no curvature sub-sampling).

  • mc_samples_ggn – If 0, don’t Monte-Carlo (MC) approximate the GGN; the same samples are then used to compute the directions and the directional curvatures. Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).

  • verbose – Turn on verbose mode. If enabled, this will print what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.

  • warn_small_eigvals – The gamma computation and the transformation from Gram into parameter space break down for numerically small eigenvalues close to zero (we need to divide by their square root). This variable triggers a user warning when attempting to compute damped Newton steps for eigenvalues whose absolute value is smaller. Defaults to 1e-4. You can disable the warning by setting it to 0 (not recommended).

get_extension_hook(param_groups: List[Dict]) → Callable[[Module], None]

Instantiate BackPACK extension hook to compute a damped Newton step.

Parameters

param_groups

Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: Each group must specify the 'params' key which contains a list of the parameters that form a GGN block, a 'criterion' entry that specifies a filter function to select eigenvalues as directions along which to compute the damped Newton step, and a 'damping' entry that contains a function that produces the directional dampings (details below).

Examples for 'params':

  • [{'params': list(p for p in model.parameters())}] uses the full GGN (one block).

  • [{'params': [p]} for p in model.parameters()] uses a per-parameter block-diagonal GGN approximation.

The function specified under 'criterion' is a Callable[[Tensor], List[int]]. It receives the eigenvalues (in ascending order) and returns the indices of eigenvalues whose eigenvectors should be used as directions for the damped Newton step. Examples:

  • {'criterion': lambda evals: [evals.numel() - 1]} discards all directions except for the leading eigenvector.

  • {'criterion': lambda evals: list(range(evals.numel()))} computes the damped Newton step along all Gram matrix eigenvectors.

The function specified under 'damping' is a Callable[[Tensor, Tensor, Tensor, Tensor], Tensor]. It receives the eigenvalues, Gram eigenvectors, directional gradients and directional curvatures, and returns a tensor that contains the damping values for all directions. Example:

  • {'damping': lambda evals, evecs, gammas, lambdas: ones_like(evals)} corresponds to constant damping \(\delta_k = 1\ \forall k\).
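A damping function can be sketched without torch; here numpy's ones_like stands in for torch's, and the made-up values are for illustration only:

```python
import numpy as np

# A damping function receives (evals, evecs, gammas, lambdas) and returns one
# damping value per direction.
def constant_damping(evals, evecs, gammas, lambdas):
    return np.ones_like(evals)  # delta_k = 1 for all k

evals = np.array([0.5, 2.0, 7.0])
deltas = constant_damping(evals, None, None, None)
print(deltas)  # [1. 1. 1.]
```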

Returns

BackPACK extension hook, to compute damped Newton steps, that should be passed to the with backpack(...) context. The hook computes a damped Newton step during backpropagation and stores it internally (under self._newton_step).

get_extensions() → List[BackpropExtension]

Instantiate the BackPACK extensions to compute a damped Newton step.

Returns

BackPACK extensions, to compute a damped Newton step, that should be extracted and passed to the with backpack(...) context.

get_result(group: Dict) → Tuple[Tensor]

Return a damped Newton step in parameter format.

Must be called after the backward pass.

Parameters

group – Parameter group that defines the GGN block.

Returns

Damped Newton step in parameter format, i.e. the same format as group['params'].

Raises

KeyError – If there are no results for the group.