pytorch3d.renderer.implicit.harmonic_embedding

harmonic_embedding

class pytorch3d.renderer.implicit.harmonic_embedding.HarmonicEmbedding(n_harmonic_functions: int = 6, omega_0: float = 1.0, logspace: bool = True, append_input: bool = True)[source]

Bases: Module

__init__(n_harmonic_functions: int = 6, omega_0: float = 1.0, logspace: bool = True, append_input: bool = True) → None[source]

The harmonic embedding layer supports the classical Nerf positional encoding described in NeRF and the integrated position encoding in MIP-NeRF.

During the inference you can provide the extra argument diag_cov.

If diag_cov is None, it converts rays parametrized with a ray_bundle to 3D points by extending each ray according to the corresponding length. Then it converts each feature (i.e. vector along the last dimension) in x into a series of harmonic features embedding, where for each i in range(dim) the following are present in embedding[…]:

[
    sin(f_1*x[..., i]),
    sin(f_2*x[..., i]),
    ...
    sin(f_N * x[..., i]),
    cos(f_1*x[..., i]),
    cos(f_2*x[..., i]),
    ...
    cos(f_N * x[..., i]),
    x[..., i],              # only present if append_input is True.
]

where N corresponds to n_harmonic_functions-1, and f_i is a scalar denoting the i-th frequency of the harmonic embedding.

If diag_cov is not None, it approximates conical frustums following a ray bundle as gaussians, defined by x, the means of the gaussians and diag_cov, the diagonal covariances. Then it converts each gaussian into a series of harmonic features embedding, where for each i in range(dim) the following are present in embedding[…]:

[
    sin(f_1*x[..., i]) * exp(0.5 * f_1**2 * diag_cov[..., i,]),
    sin(f_2*x[..., i]) * exp(0.5 * f_2**2 * diag_cov[..., i,]),
    ...
    sin(f_N * x[..., i]) * exp(0.5 * f_N**2 * diag_cov[..., i,]),
    cos(f_1*x[..., i]) * exp(0.5 * f_1**2 * diag_cov[..., i,]),
    cos(f_2*x[..., i]) * exp(0.5 * f_2**2 * diag_cov[..., i,]),,
    ...
    cos(f_N * x[..., i]) * exp(0.5 * f_N**2 * diag_cov[..., i,]),
    x[..., i],              # only present if append_input is True.
]

where N equals n_harmonic_functions-1, and f_i is a scalar denoting the i-th frequency of the harmonic embedding.

If logspace==True, the frequencies [f_1, …, f_N] are powers of 2:

f_1, …, f_N = 2**torch.arange(n_harmonic_functions)

If logspace==False, frequencies are linearly spaced between 1.0 and 2**(n_harmonic_functions-1):

`f_1, …, f_N = torch.linspace(
1.0, 2**(n_harmonic_functions-1), n_harmonic_functions

)`

Note that x is also premultiplied by the base frequency omega_0 before evaluating the harmonic functions.

Parameters:

n_harmonic_functions – int, number of harmonic features
omega_0 – float, base frequency
logspace – bool, Whether to space the frequencies in logspace or linear space
append_input – bool, whether to concat the original input to the harmonic embedding. If true the output is of the form (embed.sin(), embed.cos(), x)

forward(x: Tensor, diag_cov: Tensor | None = None, **kwargs) → Tensor[source]

Parameters:

x – tensor of shape […, dim]
diag_cov – An optional tensor of shape (…, dim) representing the diagonal covariance matrices of our Gaussians, joined with x as means of the Gaussians.

Returns:

embedding – a harmonic embedding of x of shape […, (n_harmonic_functions * 2 + int(append_input)) * num_points_per_ray]

static get_output_dim_static(input_dims: int, n_harmonic_functions: int, append_input: bool) → int[source]

Utility to help predict the shape of the output of forward.

Parameters:

input_dims – length of the last dimension of the input tensor
n_harmonic_functions – number of embedding frequencies
append_input – whether or not to concat the original input to the harmonic embedding

Returns:

int – the length of the last dimension of the output tensor

get_output_dim(input_dims: int = 3) → int[source]: Same as above. The default for input_dims is 3 for 3D applications which use harmonic embedding for positional encoding, so the input might be xyz.