pytorch3d.implicitron.models.implicit_function.decoding_functions

This file contains
  • modules which are used by ImplicitFunction objects to decode an embedding defined in space, e.g. to color or opacity, and

  • DecoderFunctionBase and its subclasses, which wrap some of those modules, providing them as an extension point which an ImplicitFunction object can use.

class pytorch3d.implicitron.models.implicit_function.decoding_functions.DecoderActivation(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

RELU = 'relu'
SOFTPLUS = 'softplus'
SIGMOID = 'sigmoid'
IDENTITY = 'identity'
class pytorch3d.implicitron.models.implicit_function.decoding_functions.DecoderFunctionBase(*args, **kwargs)[source]

Bases: ReplaceableBase, Module

A decoding function is a torch.nn.Module which takes the embedding of a location in space and transforms it into the required quantity (for example density and color).

forward(features: Tensor, z: Tensor | None = None) → Tensor[source]
Parameters:
  • features (torch.Tensor) – tensor of shape (batch, …, num_in_features)

  • z – optional tensor to append to parts of the decoding function

Returns:

decoded_features (torch.Tensor) – tensor of shape (batch, …, num_out_features)
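
For illustration, a minimal sketch of a custom subclass following this contract. The class name LinearDecoder and its fields are hypothetical, and the sketch assumes the standard implicitron config workflow (registry.register plus expand_args_fields from pytorch3d.implicitron.tools.config)::

    import torch
    from typing import Optional

    from pytorch3d.implicitron.models.implicit_function.decoding_functions import (
        DecoderFunctionBase,
    )
    from pytorch3d.implicitron.tools.config import expand_args_fields, registry

    # Hypothetical decoder: one linear layer mapping the embedding to the output.
    @registry.register
    class LinearDecoder(DecoderFunctionBase):
        num_in_features: int = 256
        num_out_features: int = 3

        def __post_init__(self):
            super().__post_init__()  # initializes the torch.nn.Module machinery
            self.linear = torch.nn.Linear(self.num_in_features, self.num_out_features)

        def forward(
            self, features: torch.Tensor, z: Optional[torch.Tensor] = None
        ) -> torch.Tensor:
            # (batch, ..., num_in_features) -> (batch, ..., num_out_features)
            return self.linear(features)

    expand_args_fields(LinearDecoder)
    decoder = LinearDecoder()
    out = decoder(torch.randn(4, 1024, 256))  # shape (4, 1024, 3)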

class pytorch3d.implicitron.models.implicit_function.decoding_functions.ElementwiseDecoder(*args, **kwargs)[source]

Bases: DecoderFunctionBase

Decoding function which scales the input, adds a shift, and then applies relu, softplus, sigmoid, or nothing: result = operation(input * scale + shift)

Members:

scale: a scalar with which the input is multiplied before being shifted. Defaults to 1.

shift: a scalar which is added to the scaled input before performing the operation. Defaults to 0.

operation: which operation to perform on the transformed input. Options are: RELU, SOFTPLUS, SIGMOID or IDENTITY. Defaults to IDENTITY.

scale: float = 1
shift: float = 0
operation: DecoderActivation = 'identity'
forward(features: Tensor, z: Tensor | None = None) → Tensor[source]
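
A minimal usage sketch (assuming the usual implicitron pattern of calling expand_args_fields before instantiating a configurable class)::

    import torch

    from pytorch3d.implicitron.models.implicit_function.decoding_functions import (
        DecoderActivation,
        ElementwiseDecoder,
    )
    from pytorch3d.implicitron.tools.config import expand_args_fields

    expand_args_fields(ElementwiseDecoder)
    decoder = ElementwiseDecoder(
        scale=2.0, shift=0.5, operation=DecoderActivation.SOFTPLUS
    )

    features = torch.randn(4, 1024, 16)
    out = decoder(features)  # softplus(features * 2.0 + 0.5), same shape as input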
class pytorch3d.implicitron.models.implicit_function.decoding_functions.MLPWithInputSkips(*args, **kwargs)[source]

Bases: Configurable, Module

Implements the multi-layer perceptron architecture of the Neural Radiance Field.

As such, MLPWithInputSkips is a multi-layer perceptron consisting of a sequence of linear layers with ReLU activations.

Additionally, for a set of predefined layers input_skips, the forward pass appends a skip tensor z to the output of the preceding layer.

Note that this follows the architecture described in the Supplementary Material (Fig. 7) of [1]; to match it, keep the defaults:

  • last_layer_bias_init set to None

  • last_activation set to “relu”

  • use_xavier_init set to True

If you want to use this as part of the color prediction in the TensoRF model, set:
  • last_layer_bias_init to 0

  • last_activation to “sigmoid”

  • use_xavier_init to False

References

[1] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV 2020.

Members:

n_layers: The number of linear layers of the MLP.

input_dim: The number of channels of the input tensor.

output_dim: The number of channels of the output.

skip_dim: The number of channels of the tensor z appended when evaluating the skip layers.

hidden_dim: The number of hidden units of the MLP.

input_skips: The list of layer indices at which we append the skip tensor z.

last_layer_bias_init: If set then all the biases in the last layer are initialized to that value.

last_activation: Which activation to use in the last layer. Options are: “relu”, “softplus”, “sigmoid” and “identity”. Default is “relu”.

use_xavier_init: If True uses xavier init for all linear layer weights. Otherwise the default PyTorch initialization is used. Default True.

n_layers: int = 8
input_dim: int = 39
output_dim: int = 256
skip_dim: int = 39
hidden_dim: int = 256
input_skips: Tuple[int, ...] = (5,)
skip_affine_trans: bool = False
last_layer_bias_init: float | None = None
last_activation: DecoderActivation = 'relu'
use_xavier_init: bool = True
forward(x: Tensor, z: Tensor | None = None)[source]
Parameters:
  • x – The input tensor of shape (…, input_dim).

  • z – The input skip tensor of shape (…, skip_dim) which is appended to layers whose indices are specified by input_skips.

Returns:

y – The output tensor of shape (…, output_dim).
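
A usage sketch with the NeRF defaults (MLPWithInputSkips is a Configurable, so this assumes the usual expand_args_fields step; the batch shape is illustrative)::

    import torch

    from pytorch3d.implicitron.models.implicit_function.decoding_functions import (
        MLPWithInputSkips,
    )
    from pytorch3d.implicitron.tools.config import expand_args_fields

    expand_args_fields(MLPWithInputSkips)
    # NeRF-style trunk: 39-channel harmonic-embedded points, skip at layer 5.
    mlp = MLPWithInputSkips(
        n_layers=8,
        input_dim=39,
        output_dim=256,
        skip_dim=39,
        hidden_dim=256,
        input_skips=(5,),
    )

    x = torch.randn(2, 1024, 39)  # (batch, n_points, input_dim)
    y = mlp(x, z=x)               # the embedding itself is the skip tensor
    print(y.shape)                # torch.Size([2, 1024, 256])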

class pytorch3d.implicitron.models.implicit_function.decoding_functions.MLPDecoder(*args, **kwargs)[source]

Bases: DecoderFunctionBase

Decoding function which uses MLPWithInputSkips to convert the embedding to output. The network’s input_dim is set from the value of this class’s input_dim member.

Members:

input_dim: dimension of input.

param_groups: dictionary where keys are names of individual parameters or module members, and values are the parameter group to which the parameter/member will be assigned. The “self” key denotes the parameter group of the module itself. No key, including “self”, has to be defined. By default all parameters are put into the “default” parameter group and have the learning rate defined in the optimizer; this can be overridden at the:

  • module level, with the “self” key: all the parameters and child modules’ parameters will be put into that parameter group

  • member level, which is the same as if the param_groups of that member had key=“self” and value equal to that parameter group. This is useful for members which do not have param_groups of their own, for example torch.nn.Linear.

  • parameter level: a parameter with the same name as the key will be put into that parameter group.

network_args: configuration for MLPWithInputSkips

input_dim: int = 3
param_groups: Dict[str, str] = field(default_factory=lambda: {})
network: MLPWithInputSkips
forward(features: Tensor, z: Tensor | None = None) → Tensor[source]
classmethod network_tweak_args(type, args: DictConfig) None[source]

Special method to stop get_default_args from exposing the network member’s input_dim.

create_network_impl(type, args: DictConfig) None[source]

Set the input dimension of the network to the input dimension of the decoding function.
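
A usage sketch tying these pieces together. It assumes the standard config workflow; the parameter-group name “color” is made up here and only takes effect when an optimizer consumes param_groups::

    import torch

    from pytorch3d.implicitron.models.implicit_function.decoding_functions import (
        MLPDecoder,
    )
    from pytorch3d.implicitron.tools.config import expand_args_fields

    expand_args_fields(MLPDecoder)
    decoder = MLPDecoder(
        input_dim=256,                      # forwarded to the network by create_network_impl
        param_groups={"network": "color"},  # member-level override; "color" is a made-up name
    )

    features = torch.randn(2, 1024, 256)
    z = torch.randn(2, 1024, 39)  # skip tensor matching the network's default skip_dim
    out = decoder(features, z)    # shape (2, 1024, 256) with the default network output_dim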

class pytorch3d.implicitron.models.implicit_function.decoding_functions.TransformerWithInputSkips(n_layers: int = 8, input_dim: int = 39, output_dim: int = 256, skip_dim: int = 39, hidden_dim: int = 64, input_skips: Tuple[int, ...] = (5,), dim_down_factor: float = 1)[source]

Bases: Module

__init__(n_layers: int = 8, input_dim: int = 39, output_dim: int = 256, skip_dim: int = 39, hidden_dim: int = 64, input_skips: Tuple[int, ...] = (5,), dim_down_factor: float = 1)[source]
Parameters:
  • n_layers – The number of linear layers of the MLP.

  • input_dim – The number of channels of the input tensor.

  • output_dim – The number of channels of the output.

  • skip_dim – The number of channels of the tensor z appended when evaluating the skip layers.

  • hidden_dim – The number of hidden units of the MLP.

  • input_skips – The list of layer indices at which we append the skip tensor z.

forward(x: Tensor, z: Tensor | None = None)[source]
Parameters:
  • x – The input tensor of shape (minibatch, n_pooled_feats, …, n_ray_pts, input_dim).

  • z – The input skip tensor of shape (minibatch, n_pooled_feats, …, n_ray_pts, skip_dim) which is appended to layers whose indices are specified by input_skips.

Returns:

y – The output tensor of shape (minibatch, 1, …, n_ray_pts, input_dim).
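
A forward-call sketch; unlike the Configurable classes above, this is a plain torch.nn.Module and is constructed directly. Shapes follow the docstring, with illustrative batch sizes::

    import torch

    from pytorch3d.implicitron.models.implicit_function.decoding_functions import (
        TransformerWithInputSkips,
    )

    net = TransformerWithInputSkips(
        n_layers=8, input_dim=39, output_dim=256, skip_dim=39, hidden_dim=64
    )

    x = torch.randn(2, 4, 100, 39)  # (minibatch, n_pooled_feats, n_ray_pts, input_dim)
    y = net(x, z=x)                 # see the Returns section above for the output shape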

class pytorch3d.implicitron.models.implicit_function.decoding_functions.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, d_model_out=-1)[source]

Bases: Module

TransformerEncoderLayer is made up of self-attention and a feedforward network. This standard encoder layer is based on the paper “Attention Is All You Need” (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. In Advances in Neural Information Processing Systems, 2017, pages 6000–6010). Users may modify or implement it in a different way during application.

Parameters:
  • d_model – the number of expected features in the input (required).

  • nhead – the number of heads in the multiheadattention models (required).

  • dim_feedforward – the dimension of the feedforward network model (default=2048).

  • dropout – the dropout value (default=0.1).

  • d_model_out – the number of features in the output; if -1, it stays equal to d_model (default=-1).

Examples::
>>> import torch
>>> encoder_layer = TransformerEncoderLayer(d_model=512, nhead=8)
>>> src = torch.rand(10, 32, 512)
>>> out = encoder_layer(src)
forward(src, src_mask=None, src_key_padding_mask=None)[source]

Pass the input through the encoder layer.

Parameters:
  • src – the sequence to the encoder layer (required).

  • src_mask – the mask for the src sequence (optional).

  • src_key_padding_mask – the mask for the src keys per batch (optional).

Shape:

see the docs in the Transformer class.