pytorch3d.implicitron.models.implicit_function.voxel_grid_implicit_function

voxel_grid_implicit_function

class pytorch3d.implicitron.models.implicit_function.voxel_grid_implicit_function.VoxelGridImplicitFunction(*args, **kwargs)

Bases: ImplicitFunctionBase, Module

This implicit function consists of two streams, one for the density calculation and one for the color calculation. Each of these streams has three main parts:

  1. Voxel grids:

    They take the (x, y, z) position and return the embedding of that point. These components are replaceable: you can write your own or choose one of several options.

  2. Harmonic embeddings:

    Convert each feature into a series of ‘harmonic features’ by passing it through sine and cosine functions. The input has shape [minibatch, …, D] and the output has shape [minibatch, …, (n_harmonic_functions * 2 + int(append_input)) * D]. The input is appended by default. To make the embedding behave like the identity, set n_harmonic_functions=0 and append_input=True.

  3. Decoding functions:

    The decoder is an instance of DecoderFunctionBase and converts the embedding of a spatial location to density/color. Examples are Identity, which returns its input, and the MLP, which uses a fully connected neural network to transform the input. These components are replaceable: you can write your own or choose from several options.
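The output-size formula from the harmonic embedding step can be checked with a small pure-Python sketch. The helper names harmonic_output_dim and harmonic_embed are hypothetical, and both the frequency schedule (powers of two) and the output ordering (sines, cosines, then the raw input) are assumptions made for illustration, not a statement of what HarmonicEmbedder does internally.

```python
import math


def harmonic_output_dim(d, n_harmonic_functions, append_input=True):
    """Output width for an input with d features, per the formula above."""
    return (n_harmonic_functions * 2 + int(append_input)) * d


def harmonic_embed(features, n_harmonic_functions, append_input=True):
    """Embed one feature vector given as a list of floats.

    Assumptions (illustrative only): frequencies are 2**k for
    k in range(n_harmonic_functions), and the output is ordered as
    all sines, then all cosines, then the raw input.
    """
    freqs = [2.0 ** k for k in range(n_harmonic_functions)]
    scaled = [x * f for x in features for f in freqs]
    out = [math.sin(s) for s in scaled] + [math.cos(s) for s in scaled]
    if append_input:
        out += list(features)
    return out
```

With n_harmonic_functions=0 and append_input=True the sketch returns its input unchanged, which matches the identity behaviour described above.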

Calculating density is done in three steps:
  1. Evaluating the voxel grid on points

  2. Embedding the outputs with harmonic embedding

  3. Passing through the Density decoder

Calculating the color requires both the embedding and the viewing direction, and is done in five steps:
  1. Transforming the viewing direction with camera

  2. Evaluating the voxel grid on points

  3. Embedding the outputs with harmonic embedding

  4. Embedding the normalized direction with harmonic embedding

  5. Passing everything through the Color decoder
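The two streams listed above can be sketched as plain function composition. This is a minimal illustration, not the class's implementation: every callable passed in (grid, embed_xyz, embed_dir, decoder, to_camera) is a hypothetical stand-in, and concatenating the point embedding with the direction embedding before the color decoder is an assumption about what "passing everything" means.

```python
import math


def density_stream(points, grid, embed_xyz, decoder):
    """Density: evaluate the grid, embed the features, decode."""
    return [decoder(embed_xyz(grid(p))) for p in points]


def color_stream(points, directions, to_camera, grid, embed_xyz, embed_dir, decoder):
    """Color: the five steps from the list above, one point at a time."""
    colors = []
    for p, d in zip(points, directions):
        d = to_camera(d)                                # 1. transform direction
        feats = embed_xyz(grid(p))                      # 2.-3. grid, then embedding
        norm = math.sqrt(sum(c * c for c in d)) or 1.0  # avoid division by zero
        d_emb = embed_dir([c / norm for c in d])        # 4. embed normalized direction
        colors.append(decoder(list(feats) + list(d_emb)))  # 5. decode (assumed concat)
    return colors
```

Passing the stages in as arguments mirrors the point made above that the voxel grids and decoders are replaceable components.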

If the Implicitron configuration system is used, the input_dim of the decoding functions is set to the output_dim of the harmonic embeddings.

A speed-up comes from using the scaffold, a low-resolution voxel grid. The scaffold is referred to as the “binary occupancy grid mask” in the TensoRF paper and as “AlphaMask” in the official TensoRF implementation. The scaffold is used in:

  1. filtering points in empty space
    • controlled by the scaffold_filter_points boolean. If set to True, points that the scaffold predicts to be in empty space will return 0 density and (0, 0, 0) color.

  2. calculating the bounding box of an object and cropping the voxel grids
    • controlled by volume_cropping_epochs.

    • at those epochs the implicit function will find the bounding box of the object inside it and crop the density and color grids. Cropping the voxel grids means preserving only the voxel values that are inside the bounding box and changing the resolution to match the original, while preserving the new cropped location in world coordinates.
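The bounding-box part of the cropping step can be illustrated on a nested-list occupancy grid. This is only a sketch: occupied_bounding_box and crop_to_box are hypothetical helpers, the grid is indexed as occupancy[x][y][z] by assumption, and the resampling back to the original resolution and the world-coordinate bookkeeping described above are deliberately left out.

```python
def occupied_bounding_box(occupancy):
    """Return inclusive index bounds (lo, hi) of the True voxels, or None if empty."""
    coords = [
        (x, y, z)
        for x, plane in enumerate(occupancy)
        for y, row in enumerate(plane)
        for z, occupied in enumerate(row)
        if occupied
    ]
    if not coords:
        return None
    lo = tuple(min(c[i] for c in coords) for i in range(3))
    hi = tuple(max(c[i] for c in coords) for i in range(3))
    return lo, hi


def crop_to_box(grid, lo, hi):
    """Keep only the voxel values inside the inclusive box [lo, hi]."""
    return [
        [row[lo[2]:hi[2] + 1] for row in plane[lo[1]:hi[1] + 1]]
        for plane in grid[lo[0]:hi[0] + 1]
    ]
```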

The scaffold has to exist before filtering or cropping is attempted, and it is created on scaffold_calculating_epochs. Each voxel in the scaffold is labeled as having density 1 if the point at its center evaluates to a density greater than scaffold_empty_space_threshold. 3D max pooling is performed on the densities of the points. Scaffold features are off by default.
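A minimal sketch of the scaffold labeling, assuming the densities at voxel centers are already available as a nested list. The helper names are hypothetical, and the choices of stride 1 with border clipping (so the output keeps the input shape) and of pooling before thresholding are assumptions; for a binary result, pooling then thresholding gives the same mask as thresholding then pooling.

```python
def max_pool_3d(grid, kernel_size=3):
    """Stride-1 3D max pool; windows are clipped at the borders."""
    r = kernel_size // 2
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                out[x][y][z] = max(
                    grid[i][j][k]
                    for i in range(max(0, x - r), min(nx, x + r + 1))
                    for j in range(max(0, y - r), min(ny, y + r + 1))
                    for k in range(max(0, z - r), min(nz, z + r + 1))
                )
    return out


def build_scaffold(densities, threshold=0.001, kernel_size=3):
    """Mark a voxel occupied if any density in its pooling window exceeds threshold."""
    pooled = max_pool_3d(densities, kernel_size)
    return [[[v > threshold for v in row] for row in plane] for plane in pooled]
```

The pooling dilates the occupied region, so a voxel next to a dense one is kept even if its own center evaluates as empty.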

Members:

voxel_grid_density (VoxelGridBase): voxel grid to use for density estimation

voxel_grid_color (VoxelGridBase): voxel grid to use for color estimation

harmonic_embedder_xyz_density (HarmonicEmbedder): Function to transform the outputs of the voxel_grid_density

harmonic_embedder_xyz_color (HarmonicEmbedder): Function to transform the outputs of the voxel_grid_color

harmonic_embedder_dir_color (HarmonicEmbedder): Function to transform the normalized viewing directions used for color estimation

decoder_density (DecoderFunctionBase): decoder function to use for density estimation

decoder_color (DecoderFunctionBase): decoder function to use for color estimation

use_multiple_streams (bool): set this to True if you want the density and color calculations to run on different CUDA streams. Default True.

xyz_ray_dir_in_camera_coords (bool): True if the directions are given in camera coordinates. Default False.

voxel_grid_scaffold (VoxelGridModule): the voxel grid which holds the scaffold. Its extents and translation are set to those of voxel_grid_density.

scaffold_calculating_epochs (Tuple[int, …]): at which epochs to recalculate the scaffold. (The scaffold will be created automatically at the beginning of the calculation.)

scaffold_resolution (Tuple[int, int, int]): (width, height, depth) of the underlying voxel grid which stores the scaffold

scaffold_empty_space_threshold (float): if self._get_density evaluates to less than this at a point, that point is considered empty space and the scaffold there evaluates as empty.

scaffold_occupancy_chunk_size (int): Number of xy scaffold planes to calculate at the same time. To calculate the scaffold, _get_density() has to be queried at every voxel. This calculation can be split into one calculation per xy plane (as many calculations as the scaffold depth) for the lowest memory usage, done as a single calculation over the whole scaffold with a higher memory footprint, or split into any other number of planes. Setting it to a non-positive number calculates all planes at the same time. Defaults to -1 (calculating all planes).
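The chunking rule can be sketched as a helper that turns the scaffold depth and the chunk size into ranges of xy planes. The name plane_chunks is hypothetical; the sketch only shows the splitting logic described above, not how the densities are queried.

```python
def plane_chunks(depth, chunk_size):
    """Return (start, stop) plane ranges, one per density query.

    A non-positive chunk_size means "all planes at once".
    """
    if depth <= 0:
        return []
    if chunk_size <= 0:
        chunk_size = depth
    return [(s, min(s + chunk_size, depth)) for s in range(0, depth, chunk_size)]
```

With a depth of 128, a chunk size of 1 gives 128 small queries (lowest memory), while -1 gives a single query over the whole scaffold.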

scaffold_max_pool_kernel_size (int): Size of the pooling region to use when calculating the scaffold. Defaults to 3.

scaffold_filter_points (bool): If set to True, the points will be filtered using self.voxel_grid_scaffold. Filtered points will be predicted as having 0 density and (0, 0, 0) color. The points which were not evaluated as empty space will be passed through the steps outlined above.
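The filtering behaviour can be sketched as follows. The function filter_and_evaluate and its two callables are hypothetical: in_occupied_space stands in for a scaffold lookup and evaluate stands in for the density and color steps, returning one (density, color) pair per point it receives.

```python
def filter_and_evaluate(points, in_occupied_space, evaluate):
    """Evaluate only points the scaffold keeps; the rest get zero outputs."""
    keep = [i for i, p in enumerate(points) if in_occupied_space(p)]
    densities = [0.0] * len(points)
    colors = [(0.0, 0.0, 0.0)] * len(points)
    # Only the surviving points are sent through the (expensive) evaluation.
    results = evaluate([points[i] for i in keep]) if keep else []
    for i, (density, color) in zip(keep, results):
        densities[i] = density
        colors[i] = color
    return densities, colors
```

The saving comes from the evaluation being skipped entirely for points in empty space.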

volume_cropping_epochs: on which epochs to crop the voxel grids to fit the object’s bounding box. The scaffold has to be calculated before cropping.

voxel_grid_density: VoxelGridModule
voxel_grid_color: VoxelGridModule
harmonic_embedder_xyz_density_args: DictConfig (default built by get_default_args_field)
harmonic_embedder_xyz_color_args: DictConfig (default built by get_default_args_field)
harmonic_embedder_dir_color_args: DictConfig (default built by get_default_args_field)
decoder_density_class_type: str = 'MLPDecoder'
decoder_density: DecoderFunctionBase
decoder_color_class_type: str = 'MLPDecoder'
decoder_color: DecoderFunctionBase
use_multiple_streams: bool = True
xyz_ray_dir_in_camera_coords: bool = False
scaffold_calculating_epochs: Tuple[int, ...] = ()
scaffold_resolution: Tuple[int, int, int] = (128, 128, 128)
scaffold_empty_space_threshold: float = 0.001
scaffold_occupancy_chunk_size: int = -1
scaffold_max_pool_kernel_size: int = 3
scaffold_filter_points: bool = True
volume_cropping_epochs: Tuple[int, ...] = ()
forward(ray_bundle: ImplicitronRayBundle, fun_viewpool=None, camera: CamerasBase | None = None, global_code=None, **kwargs) -> Tuple[Tensor, Tensor, Dict]

The forward function accepts the parametrizations of 3D points sampled along projection rays. The forward pass is responsible for attaching a 3D vector and a 1D scalar representing the point’s RGB color and opacity respectively.

Parameters:
  • ray_bundle

    An ImplicitronRayBundle object containing the following variables:

    origins: A tensor of shape (minibatch, …, 3) denoting the origins of the sampling rays in world coords.

    directions: A tensor of shape (minibatch, …, 3) containing the direction vectors of sampling rays in world coords.

    lengths: A tensor of shape (minibatch, …, num_points_per_ray) containing the lengths at which the rays are sampled.

  • fun_viewpool

    an optional callback with the signature

    fun_viewpool(points) -> pooled_features

    where points is a [N_TGT x N x 3] tensor of world coords, and pooled_features is a [N_TGT x … x N_SRC x latent_dim] tensor of the features pooled from the context images.

  • camera – A camera model which will be used to transform the viewing directions

Returns:

rays_densities: A tensor of shape (minibatch, …, num_points_per_ray, 1) denoting the opacity of each ray point.

rays_colors: A tensor of shape (minibatch, …, num_points_per_ray, 3) denoting the color of each ray point.
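To make the ray parametrization concrete, the sketch below turns one ray's origin, direction and lengths into the 3D points that receive a density and a color. It assumes the usual parametrization point = origin + length * direction; ray_points is a hypothetical helper working on plain tuples for a single ray, not on batched tensors.

```python
def ray_points(origin, direction, lengths):
    """Return one 3D point per sampled length along a single ray."""
    return [
        tuple(o + t * d for o, d in zip(origin, direction))
        for t in lengths
    ]
```

A ray sampled at num_points_per_ray lengths therefore yields num_points_per_ray points, which is where the second-to-last dimension of the returned tensors comes from.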

static allows_multiple_passes() -> bool

Returns True as this implicit function allows multiple passes. Overridden from ImplicitFunctionBase.

subscribe_to_epochs() -> Tuple[Tuple[int, ...], Callable[[int], bool]]

Method which expresses interest in subscribing to optimization epoch updates. This implicit function subscribes to epochs to calculate the scaffold and to crop voxel grids, so this method combines wanted epochs and wraps their callbacks.

Returns:

A list of epochs on which to call a callable, and the callable to be called on a particular epoch. The callable returns True if a parameter change has happened, else False, and it must be supplied with one argument, epoch.
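One way to combine the two sets of epochs and wrap their callbacks is sketched below. This is an illustration under stated assumptions, not the method's actual body: combine_epoch_subscriptions, calculate_scaffold and crop_grids are hypothetical names, and running the scaffold callback before the cropping callback on a shared epoch follows from the requirement that the scaffold exists before cropping.

```python
from typing import Callable, Tuple


def combine_epoch_subscriptions(
    scaffold_epochs: Tuple[int, ...],
    cropping_epochs: Tuple[int, ...],
    calculate_scaffold: Callable[[int], bool],
    crop_grids: Callable[[int], bool],
) -> Tuple[Tuple[int, ...], Callable[[int], bool]]:
    """Merge both epoch lists and return a single callback for all of them."""
    epochs = tuple(sorted(set(scaffold_epochs) | set(cropping_epochs)))

    def callback(epoch: int) -> bool:
        changed = False
        if epoch in scaffold_epochs:
            changed = bool(calculate_scaffold(epoch)) or changed
        if epoch in cropping_epochs:
            changed = bool(crop_grids(epoch)) or changed
        return changed  # True if any parameter change happened

    return epochs, callback
```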

classmethod decoder_density_tweak_args(type_, args: DictConfig) -> None
create_decoder_density_impl(type_, args: DictConfig) -> None

Decoding functions come after the voxel grid and the harmonic embedding. So that the decoder's input dimension does not have to be worked out by hand in the config file, this function calculates the required input dimension and sets the input dimension of the decoding function to this value.

classmethod decoder_color_tweak_args(type_, args: DictConfig) -> None
create_decoder_color_impl(type_, args: DictConfig) -> None

Decoding functions come after the voxel grid and the harmonic embedding. So that the decoder's input dimension does not have to be worked out by hand in the config file, this function calculates the required input dimension and sets the input dimension of the decoding function to this value.
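The input-dimension calculation can be sketched from the harmonic embedding output formula given earlier on this page. The helper decoder_input_dims is hypothetical, and two points are assumptions rather than documented behaviour: that the color decoder receives the point embedding concatenated with the direction embedding, and that directions are 3-dimensional.

```python
def harmonic_output_dim(d, n_harmonic_functions, append_input):
    """Harmonic embedding output width for d input features."""
    return (n_harmonic_functions * 2 + int(append_input)) * d


def decoder_input_dims(
    grid_density_dim,
    grid_color_dim,
    xyz_density_args,
    xyz_color_args,
    dir_color_args,
    dir_dim=3,  # assumption: viewing directions are 3-vectors
):
    """Return (density_decoder_input_dim, color_decoder_input_dim)."""
    density_in = harmonic_output_dim(grid_density_dim, **xyz_density_args)
    # Assumption: color decoder input = point embedding + direction embedding.
    color_in = harmonic_output_dim(grid_color_dim, **xyz_color_args)
    color_in += harmonic_output_dim(dir_dim, **dir_color_args)
    return density_in, color_in
```

Deriving these values from the grid and embedder settings keeps the config free of numbers that would otherwise have to be updated whenever n_harmonic_functions changes.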