Computations
GGN eigenvalues
- class vivit.EigvalshComputation(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False)
Provide BackPACK extension and hook to compute GGN eigenvalues.
- __init__(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False)
Specify GGN approximations. Use no approximations by default.
Note
The loss function must use reduction='mean'.
- Parameters:
subsampling – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples. Defaults to None (no curvature sub-sampling).
mc_samples – If 0, don’t Monte-Carlo (MC) approximate the GGN. Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).
verbose – Turn on verbose mode. If enabled, this prints what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.
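As a small illustration of the subsampling semantics above (a pure-Python sketch, independent of vivit itself):

```python
# Sketch of the subsampling semantics: None is documented as equivalent to
# list(range(batch_size)); an explicit index list restricts the computation
# to a subset of the mini-batch.
def effective_indices(subsampling, batch_size):
    """Resolve a subsampling argument to the sample indices that are used."""
    if subsampling is None:
        return list(range(batch_size))
    return list(subsampling)

print(effective_indices(None, 4))    # [0, 1, 2, 3]
print(effective_indices([0, 2], 4))  # [0, 2]
```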
- get_extension() BackpropExtension
Instantiate the BackPACK extension for computing GGN eigenvalues.
- Returns:
BackPACK extension to compute eigenvalues; it should be passed to the with backpack(...) context.
- get_extension_hook(param_groups: List[Dict]) Callable[[Module], None]
Instantiates the BackPACK extension hook to compute GGN eigenvalues.
- Parameters:
param_groups –
Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: each group must specify the 'params' key, which contains a list of the parameters that form a GGN block. Examples:
[{'params': list(p for p in model.parameters())}] computes eigenvalues for the full GGN (one block).
[{'params': [p]} for p in model.parameters()] computes eigenvalues for each block of a per-parameter block-diagonal GGN approximation.
- Returns:
BackPACK extension hook to compute eigenvalues; it should be passed to the with backpack(...) context. The hook computes GGN eigenvalues during backpropagation and stores them internally (under self._evals).
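The two 'params' layouts can be sketched without a real model; the placeholder strings below stand in for parameter tensors (with an actual torch.nn.Module you would use model.parameters() instead):

```python
# Block structure of param_groups, sketched with placeholder objects in
# place of parameter tensors (assumption: with a real model you would use
# model.parameters() instead of this list).
params = ["weight1", "bias1", "weight2"]

# One group containing all parameters -> one block, the full GGN.
full_ggn = [{"params": list(p for p in params)}]
# One group per parameter -> per-parameter block-diagonal GGN approximation.
block_diag = [{"params": [p]} for p in params]

print(len(full_ggn), len(full_ggn[0]["params"]))  # 1 3
print(len(block_diag))                            # 3
```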
GGN eigenpairs (eigenvalues + eigenvectors)
- class vivit.EighComputation(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False, warn_small_eigvals: float = 0.0001)
Provide BackPACK extension and hook to compute GGN eigenpairs.
- __init__(subsampling: Optional[List[int]] = None, mc_samples: int = 0, verbose: bool = False, warn_small_eigvals: float = 0.0001)
Specify GGN approximations. Use no approximations by default.
Note
The loss function must use reduction='mean'.
- Parameters:
subsampling – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples. Defaults to None (no curvature sub-sampling).
mc_samples – If 0, don’t Monte-Carlo (MC) approximate the GGN. Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).
verbose – Turn on verbose mode. If enabled, this prints what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.
warn_small_eigvals – The eigenvector computation breaks down for eigenvalues numerically close to zero. This threshold triggers a user warning when attempting to compute eigenvectors for eigenvalues whose absolute value is smaller. Defaults to 1e-4. You can disable the warning by setting it to 0 (not recommended).
- get_extension() BackpropExtension
Instantiate the BackPACK extension for computing GGN eigenpairs.
- Returns:
BackPACK extension to compute eigenpairs; it should be passed to the with backpack(...) context.
- get_extension_hook(param_groups: List[Dict]) Callable[[Module], None]
Instantiates the BackPACK extension hook to compute GGN eigenpairs.
- Parameters:
param_groups –
Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: each group must specify the 'params' key, which contains a list of the parameters that form a GGN block, and a 'criterion' entry that specifies a filter function to select eigenvalues for the eigenvector computation (details below).
Examples for 'params':
[{'params': list(p for p in model.parameters())}] uses the full GGN (one block).
[{'params': [p]} for p in model.parameters()] uses a per-parameter block-diagonal GGN approximation.
The function specified under 'criterion' is a Callable[[Tensor], List[int]]. It receives the eigenvalues (in ascending order) and returns the indices of eigenvalues whose eigenvectors should be computed. Examples:
{'criterion': lambda evals: [evals.numel() - 1]} discards all eigenvalues except for the largest.
{'criterion': lambda evals: list(range(evals.numel()))} computes eigenvectors for all Gram matrix eigenvalues.
- Returns:
BackPACK extension hook to compute eigenpairs; it should be passed to the with backpack(...) context. The hook computes GGN eigenpairs during backpropagation and stores them internally (under self._evals and self._evecs).
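The two 'criterion' filters can be sketched on a plain ascending list (an assumption for illustration: the real callable receives a torch.Tensor and would use evals.numel() where len(evals) appears here):

```python
# The 'criterion' filters applied to a plain ascending list of eigenvalues.
# On a real torch.Tensor, len(evals) below would be evals.numel().
evals = [1e-6, 0.03, 0.5, 2.4]  # ascending order, as vivit provides them

keep_largest = lambda evals: [len(evals) - 1]        # only the top eigenvalue
keep_all = lambda evals: list(range(len(evals)))     # every eigenvalue

print(keep_largest(evals))  # [3]
print(keep_all(evals))      # [0, 1, 2, 3]
```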
- get_result(group: Dict) Tuple[Tensor, List[Tensor]]
Return eigenvalues and eigenvectors of a GGN block after the backward pass.
- Parameters:
group – Parameter group that defines the GGN block.
- Returns:
One-dimensional tensor containing the block’s eigenvalues, and the eigenvectors in parameter list format with a leading axis for the eigenvectors.
Example: Let evals, evecs denote the returned variables, and let group['params'] = [p1, p2] consist of two parameters. Then evecs contains tensors of shapes [(evals.numel(), *p1.shape), (evals.numel(), *p2.shape)]. For an eigenvalue evals[k], the corresponding eigenvector (in list format) is [vecs[k] for vecs in evecs], and its tensors are of shapes [p1.shape, p2.shape].
- Raises:
KeyError – If there are no results for the group.
1ˢᵗ- and 2ⁿᵈ-order directional derivatives along GGN eigenvectors
- class vivit.DirectionalDerivativesComputation(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)
Provide BackPACK extension and hook for 1ˢᵗ/2ⁿᵈ-order directional derivatives.
The directions are given by the GGN eigenvectors. First-order directional derivatives are denoted γ, second-order directional derivatives λ.
- __init__(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)
Specify GGN and gradient approximations. Use no approximations by default.
Note
The loss function must use reduction='mean'.
- Parameters:
subsampling_grad – Indices of samples used for gradient sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directional gradients. Defaults to None (no gradient sub-sampling).
subsampling_ggn – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directions and directional curvatures. Defaults to None (no curvature sub-sampling).
mc_samples_ggn – If 0, don’t Monte-Carlo (MC) approximate the GGN (the same samples are used to compute the directions and directional curvatures). Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).
verbose – Turn on verbose mode. If enabled, this prints what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.
warn_small_eigvals – The γ computation breaks down for eigenvalues numerically close to zero (their square roots appear in a division). This threshold triggers a user warning when attempting to compute directional gradients for eigenvalues whose absolute value is smaller. Defaults to 1e-4. You can disable the warning by setting it to 0 (not recommended).
- get_extension_hook(param_groups: List[Dict]) Callable[[Module], None]
Instantiate BackPACK extension hook to compute GGN directional derivatives.
- Parameters:
param_groups –
Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: each group must specify the 'params' key, which contains a list of the parameters that form a GGN block, and a 'criterion' entry that specifies a filter function to select eigenvalues whose eigenvectors serve as directions for the directional derivatives (details below).
Examples for 'params':
[{'params': list(p for p in model.parameters())}] uses the full GGN (one block).
[{'params': [p]} for p in model.parameters()] uses a per-parameter block-diagonal GGN approximation.
The function specified under 'criterion' is a Callable[[Tensor], List[int]]. It receives the eigenvalues (in ascending order) and returns the indices of eigenvalues whose eigenvectors should be used as directions to evaluate directional derivatives. Examples:
{'criterion': lambda evals: [evals.numel() - 1]} discards all directions except for the leading eigenvector.
{'criterion': lambda evals: list(range(evals.numel()))} computes directional derivatives along all Gram matrix eigenvectors.
- Returns:
BackPACK extension hook to compute directional derivatives; it should be passed to the with backpack(...) context. The hook computes GGN directional derivatives during backpropagation and stores them internally (under self._gammas and self._lambdas).
- get_extensions() List[BackpropExtension]
Instantiate the BackPACK extensions to compute GGN directional derivatives.
- Returns:
BackPACK extensions to compute 1ˢᵗ- and 2ⁿᵈ-order directional derivatives along GGN eigenvectors; they should be extracted and passed to the with backpack(...) context.
- get_result(group: Dict) Tuple[Tensor, Tensor]
Return 1ˢᵗ/2ⁿᵈ-order directional derivatives along GGN eigenvectors.
Must be called after the backward pass.
- Parameters:
group – Parameter group that defines the GGN block.
- Returns:
1ˢᵗ-order directional derivatives γ as an [N, K] tensor, with γ[n, k] the directional gradient of sample n (or subsampling_grad[n]) along direction k, and 2ⁿᵈ-order directional derivatives λ as an [N, K] tensor, with λ[n, k] the directional curvature of sample n (or subsampling_ggn[n]) along direction k.
- Raises:
KeyError – If there are no results for the group.
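The [N, K] layout can be sketched numerically: conceptually, γ[n, k] pairs sample n’s gradient with direction k via an inner product (a bookkeeping sketch only, not vivit’s internal Gram-space computation):

```python
# Conceptual sketch of the [N, K] layout of first-order directional
# derivatives: gamma[n, k] is the inner product of sample n's gradient
# with direction k. Numbers are made up for illustration.
grads = [[1.0, 0.0], [0.0, 2.0]]  # N = 2 per-sample gradients
dirs = [[1.0, 0.0], [0.0, 1.0]]   # K = 2 orthonormal directions

gamma = [
    [sum(g * e for g, e in zip(g_n, e_k)) for e_k in dirs]
    for g_n in grads
]
print(gamma)  # [[1.0, 0.0], [0.0, 2.0]]
```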
Directionally damped Newton steps
- class vivit.DirectionalDampedNewtonComputation(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)
Provides BackPACK extension and hook for directionally damped Newton steps.
Let \(\{e_k\}_{k = 1, \dots, K}\) denote the \(K\) eigenvectors used for the Newton step, and let \(\gamma_k, \lambda_k\) denote the first- and second-order derivatives along these eigenvectors. The directionally damped Newton step \(s\) is
\[s = \sum_{k=1}^K \frac{- \gamma_k}{\lambda_k + \delta_k} e_k\]
Here, \(\delta_k\) is the damping along direction \(k\). The exact Newton step has zero damping.
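The formula can be evaluated by hand for a toy example (plain lists in place of tensors, all numbers made up for illustration):

```python
# Numeric sketch of the directionally damped Newton step
# s = sum_k -gamma_k / (lambda_k + delta_k) * e_k.
gammas = [0.2, -0.4]              # first-order directional derivatives
lambdas = [1.0, 3.0]              # second-order directional derivatives
deltas = [1.0, 1.0]               # damping along each direction
evecs = [[1.0, 0.0], [0.0, 1.0]]  # orthonormal directions e_k

step = [0.0, 0.0]
for g, lam, d, e in zip(gammas, lambdas, deltas, evecs):
    coeff = -g / (lam + d)  # Newton coefficient for this direction
    step = [s_i + coeff * e_i for s_i, e_i in zip(step, e)]
print(step)  # [-0.1, 0.1]
```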
- __init__(subsampling_grad: Optional[List[int]] = None, subsampling_ggn: Optional[List[int]] = None, mc_samples_ggn: Optional[int] = 0, verbose: Optional[bool] = False, warn_small_eigvals: float = 0.0001)
Specify GGN and gradient approximations. Use no approximations by default.
Note
The loss function must use reduction='mean'.
- Parameters:
subsampling_grad – Indices of samples used for gradient sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directional gradients. Defaults to None (no gradient sub-sampling).
subsampling_ggn – Indices of samples used for GGN curvature sub-sampling. None (equivalent to list(range(batch_size))) uses all mini-batch samples to compute directions and directional curvatures. Defaults to None (no curvature sub-sampling).
mc_samples_ggn – If 0, don’t Monte-Carlo (MC) approximate the GGN (the same samples are used to compute the directions and directional curvatures). Otherwise, specifies the number of MC samples used to approximate the backpropagated loss Hessian. Default: 0 (no MC approximation).
verbose – Turn on verbose mode. If enabled, this prints what’s happening during backpropagation to the command line (consider it a debugging tool). Defaults to False.
warn_small_eigvals – The γ computation and the transformation from Gram into parameter space break down for eigenvalues numerically close to zero (their square roots appear in a division). This threshold triggers a user warning when attempting to compute damped Newton steps for eigenvalues whose absolute value is smaller. Defaults to 1e-4. You can disable the warning by setting it to 0 (not recommended).
- get_extension_hook(param_groups: List[Dict]) Callable[[Module], None]
Instantiate BackPACK extension hook to compute a damped Newton step.
- Parameters:
param_groups –
Parameter groups list as required by a torch.optim.Optimizer. Specifies the block structure: each group must specify the 'params' key, which contains a list of the parameters that form a GGN block, a 'criterion' entry that specifies a filter function to select eigenvalues as directions along which to compute the damped Newton step, and a 'damping' entry that contains a function producing the directional dampings (details below).
Examples for 'params':
[{'params': list(p for p in model.parameters())}] uses the full GGN (one block).
[{'params': [p]} for p in model.parameters()] uses a per-parameter block-diagonal GGN approximation.
The function specified under 'criterion' is a Callable[[Tensor], List[int]]. It receives the eigenvalues (in ascending order) and returns the indices of eigenvalues whose eigenvectors should be used as directions for the damped Newton step. Examples:
{'criterion': lambda evals: [evals.numel() - 1]} discards all directions except for the leading eigenvector.
{'criterion': lambda evals: list(range(evals.numel()))} computes the damped Newton step along all Gram matrix eigenvectors.
The function specified under 'damping' is a Callable[[Tensor, Tensor, Tensor, Tensor], Tensor]. It receives the eigenvalues, Gram eigenvectors, directional gradients, and directional curvatures, and returns a tensor that contains the damping values for all directions. Example:
{'damping': lambda evals, evecs, gammas, lambdas: ones_like(evals)} corresponds to constant damping \(\delta_k = 1\ \forall k\).
- Returns:
BackPACK extension hook to compute damped Newton steps; it should be passed to the with backpack(...) context. The hook computes a damped Newton step during backpropagation and stores it internally (under self._newton_step).
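A constant-damping 'damping' entry can be sketched with lists standing in for the four tensor arguments (on real torch.Tensors, torch.ones_like(evals) achieves the same):

```python
# Constant directional damping delta_k = 1 for every direction, sketched
# with lists in place of the four torch.Tensor arguments.
def constant_damping(evals, evecs, gammas, lambdas):
    """Return damping 1.0 for each of the K selected directions."""
    return [1.0 for _ in evals]

group_entry = {"damping": constant_damping}
print(group_entry["damping"]([0.5, 2.4], None, None, None))  # [1.0, 1.0]
```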
- get_extensions() List[BackpropExtension]
Instantiate the BackPACK extensions to compute a damped Newton step.
- Returns:
BackPACK extensions to compute a damped Newton step; they should be extracted and passed to the with backpack(...) context.
- get_result(group: Dict) Tuple[Tensor]
Return a damped Newton step in parameter format.
Must be called after the backward pass.
- Parameters:
group – Parameter group that defines the GGN block.
- Returns:
Damped Newton step in parameter format, i.e. the same format as group['params'].
- Raises:
KeyError – If there are no results for the group.