Welcome to PyTorch3D’s documentation!

PyTorch3D is a library of reusable components for Deep Learning with 3D data.

Introduction

PyTorch3D provides efficient, reusable components for 3D Computer Vision research with PyTorch.

Key features include:

  • Data structure for storing and manipulating triangle meshes
  • Efficient operations on triangle meshes (projective transformations, graph convolution, sampling, loss functions)
  • A differentiable mesh renderer

PyTorch3D is designed to integrate smoothly with deep learning methods for predicting and manipulating 3D data. For this reason, all operators in PyTorch3D:

  • Are implemented using PyTorch tensors
  • Can handle minibatches of heterogeneous data
  • Can be differentiated
  • Can utilize GPUs for acceleration

Within FAIR, PyTorch3D has been used to power research projects such as Mesh R-CNN.

Installation

For detailed instructions refer to INSTALL.md.

License

PyTorch3D is released under the BSD License.

Documentation

Learn more about the API by reading the PyTorch3D documentation.

We also have deep dive notes on several API components.

Overview Video

We have created a short (~14 min) video tutorial providing an overview of the PyTorch3D codebase, including several code examples. The video is available on YouTube.

Development

We welcome new contributions to PyTorch3D and we will be actively maintaining this library! Please refer to CONTRIBUTING.md for full instructions on how to run the code, tests and linter, and submit your pull requests.

Development and Compatibility

  • main branch: actively developed, without any guarantees. Anything can be broken at any time.
    • REMARK: this includes nightly builds which are built from main
    • HINT: the commit history can help locate regressions or changes
  • backward-compatibility between releases: no guarantee. Best efforts to communicate breaking changes and facilitate migration of code or data (incl. models).

Contributors

PyTorch3D is written and maintained by the Facebook AI Research Computer Vision Team.

In alphabetical order:

  • Amitav Baruah
  • Steve Branson
  • Luya Gao
  • Georgia Gkioxari
  • Taylor Gordon
  • Justin Johnson
  • Patrick Labatut
  • Christoph Lassner
  • Wan-Yen Lo
  • David Novotny
  • Nikhila Ravi
  • Jeremy Reizenstein
  • Dave Schnizlein
  • Roman Shapovalov
  • Olivia Wiles

Citation

If you find PyTorch3D useful in your research, please cite our tech report:

@article{ravi2020pytorch3d,
    author = {Nikhila Ravi and Jeremy Reizenstein and David Novotny and Taylor Gordon
                  and Wan-Yen Lo and Justin Johnson and Georgia Gkioxari},
    title = {Accelerating 3D Deep Learning with PyTorch3D},
    journal = {arXiv:2007.08501},
    year = {2020},
}

If you are using the pulsar backend for sphere-rendering (the PulsarPointRenderer or pytorch3d.renderer.points.pulsar.Renderer), please cite the tech report:

@article{lassner2020pulsar,
    author = {Christoph Lassner and Michael Zollh\"ofer},
    title = {Pulsar: Efficient Sphere-based Neural Rendering},
    journal = {arXiv:2004.07484},
    year = {2020},
}

News

Please see below for a timeline of the codebase updates in reverse chronological order. We are sharing updates on the releases as well as research projects which are built with PyTorch3D. The changelogs for the releases are available under Releases, and the builds can be installed using conda as per the instructions in INSTALL.md.

[Feb 9th 2021]: PyTorch3D v0.4.0 released with support for implicit functions, volume rendering and a reimplementation of NeRF.

[November 2nd 2020]: PyTorch3D v0.3.0 released, integrating the pulsar backend.

[Aug 28th 2020]: PyTorch3D v0.2.5 released

[July 17th 2020]: PyTorch3D tech report published on ArXiv: https://arxiv.org/abs/2007.08501

[April 24th 2020]: PyTorch3D v0.2.0 released

[March 25th 2020]: SynSin codebase released using PyTorch3D: https://github.com/facebookresearch/synsin

[March 8th 2020]: PyTorch3D v0.1.1 bug fix release

[Jan 23rd 2020]: PyTorch3D v0.1.0 released. Mesh R-CNN codebase released: https://github.com/facebookresearch/meshrcnn

API Documentation

pytorch3d.common

pytorch3d.common.get_device(x, device: Union[str, torch.device, None] = None) → torch.device[source]

Gets the device of the specified variable x if it is a tensor, or falls back to a default CPU device otherwise. Allows overriding by providing an explicit device.

Parameters:
  • x – a torch.Tensor to get the device from or another type
  • device – Device (as str or torch.device) to fall back to
Returns:

A matching torch.device object

pytorch3d.common.make_device(device: Union[str, torch.device]) → torch.device[source]

Makes an actual torch.device object from the device specified as either a string or torch.device object. If the device is cuda without a specific index, the index of the current device is assigned.

Parameters:device – Device (as str or torch.device)
Returns:A matching torch.device object
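
For illustration, a minimal usage sketch of these two helpers (not part of the original reference; everything below runs on CPU):

import torch

from pytorch3d.common import get_device, make_device

# make_device resolves a string into a concrete torch.device.
cpu_device = make_device("cpu")

# get_device reads the device from a tensor, or falls back to the
# explicit device argument for non-tensor inputs.
t = torch.zeros(3, device=cpu_device)
print(get_device(t))                          # cpu (taken from the tensor)
print(get_device([1.0, 2.0], device="cpu"))   # cpu (fallback for non-tensors)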

pytorch3d.structures

class pytorch3d.structures.Meshes(verts=None, faces=None, textures=None, *, verts_normals=None)[source]

This class provides functions for working with batches of triangulated meshes with varying numbers of faces and vertices, and converting between representations.

Within Meshes, there are three different representations of the faces and verts data:

List
  • only used for input as a starting point to convert to other representations.
Padded
  • has specific batch dimension.
Packed
  • no batch dimension.
  • has auxiliary variables used to index into the padded representation.

Example:

Input list of verts V_n = [[V_1], [V_2], … , [V_N]] where V_1, … , V_N are the number of verts in each mesh and N is the number of meshes.

Input list of faces F_n = [[F_1], [F_2], … , [F_N]] where F_1, … , F_N are the number of faces in each mesh.

From the faces, edges are computed and have packed and padded representations with auxiliary variables.

E_n = [[E_1], … , [E_N]] where E_1, … , E_N are the number of unique edges in each mesh. Total number of unique edges = sum(E_n)
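
As an illustrative sketch (not from the original reference), a heterogeneous batch can be built from the list representation and then inspected in its padded and packed forms:

import torch

from pytorch3d.structures import Meshes

# Two meshes with different numbers of vertices and faces.
verts1 = torch.rand(4, 3)                      # 4 vertices
faces1 = torch.tensor([[0, 1, 2], [0, 2, 3]])  # 2 faces
verts2 = torch.rand(3, 3)                      # 3 vertices
faces2 = torch.tensor([[0, 1, 2]])             # 1 face

meshes = Meshes(verts=[verts1, verts2], faces=[faces1, faces2])

print(meshes.verts_packed().shape)        # torch.Size([7, 3])    -> (sum(V_n), 3)
print(meshes.verts_padded().shape)        # torch.Size([2, 4, 3]) -> (N, max(V_n), 3)
print(meshes.faces_padded().shape)        # torch.Size([2, 2, 3]), padded with -1
print(meshes.num_verts_per_mesh())        # tensor([4, 3])
print(meshes.verts_packed_to_mesh_idx())  # tensor([0, 0, 0, 0, 1, 1, 1])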

__init__(verts=None, faces=None, textures=None, *, verts_normals=None) → None[source]
Parameters:
  • verts

    Can be either

    • List where each element is a tensor of shape (num_verts, 3) containing the (x, y, z) coordinates of each vertex.
    • Padded float tensor with shape (num_meshes, max_num_verts, 3). Meshes should be padded with fill value of 0 so they all have the same number of vertices.
  • faces

    Can be either

    • List where each element is a tensor of shape (num_faces, 3) containing the indices of the 3 vertices in the corresponding mesh in verts which form the triangular face.
    • Padded long tensor of shape (num_meshes, max_num_faces, 3). Meshes should be padded with fill value of -1 so they have the same number of faces.
  • textures – Optional instance of the Textures class with mesh texture properties.
  • verts_normals

    Optional. Can be either

    • List where each element is a tensor of shape (num_verts, 3) containing the normals of each vertex.
    • Padded float tensor with shape (num_meshes, max_num_verts, 3). They should be padded with fill value of 0 so they all have the same number of vertices.

    Note that modifying the mesh later, e.g. with offset_verts_, can cause these normals to be forgotten and normals to be recalculated based on the new vertex positions.

Refer to comments above for descriptions of List and Padded representations.

__getitem__(index) → pytorch3d.structures.meshes.Meshes[source]
Parameters:index – Specifying the index of the mesh to retrieve. Can be an int, slice, list of ints or a boolean tensor.
Returns:Meshes object with selected meshes. The mesh tensors are not cloned.
isempty() → bool[source]

Checks whether any mesh is valid.

Returns:bool indicating whether there is any data.
verts_list()[source]

Get the list representation of the vertices.

Returns:list of tensors of vertices of shape (V_n, 3).
faces_list()[source]

Get the list representation of the faces.

Returns:list of tensors of faces of shape (F_n, 3).
verts_packed()[source]

Get the packed representation of the vertices.

Returns:tensor of vertices of shape (sum(V_n), 3).
verts_packed_to_mesh_idx()[source]

Return a 1D tensor with the same first dimension as verts_packed. verts_packed_to_mesh_idx[i] gives the index of the mesh which contains verts_packed[i].

Returns:1D tensor of indices.
mesh_to_verts_packed_first_idx()[source]

Return a 1D tensor x with length equal to the number of meshes such that the first vertex of the ith mesh is verts_packed[x[i]].

Returns:1D tensor of indices of first items.
num_verts_per_mesh()[source]

Return a 1D tensor x with length equal to the number of meshes giving the number of vertices in each mesh.

Returns:1D tensor of sizes.
faces_packed()[source]

Get the packed representation of the faces. Faces are given by the indices of the three vertices in verts_packed.

Returns:tensor of faces of shape (sum(F_n), 3).
faces_packed_to_mesh_idx()[source]

Return a 1D tensor with the same first dimension as faces_packed. faces_packed_to_mesh_idx[i] gives the index of the mesh which contains faces_packed[i].

Returns:1D tensor of indices.
mesh_to_faces_packed_first_idx()[source]

Return a 1D tensor x with length equal to the number of meshes such that the first face of the ith mesh is faces_packed[x[i]].

Returns:1D tensor of indices of first items.
verts_padded()[source]

Get the padded representation of the vertices.

Returns:tensor of vertices of shape (N, max(V_n), 3).
faces_padded()[source]

Get the padded representation of the faces.

Returns:tensor of faces of shape (N, max(F_n), 3).
num_faces_per_mesh()[source]

Return a 1D tensor x with length equal to the number of meshes giving the number of faces in each mesh.

Returns:1D tensor of sizes.
edges_packed()[source]

Get the packed representation of the edges.

Returns:tensor of edges of shape (sum(E_n), 2).
edges_packed_to_mesh_idx()[source]

Return a 1D tensor with the same first dimension as edges_packed. edges_packed_to_mesh_idx[i] gives the index of the mesh which contains edges_packed[i].

Returns:1D tensor of indices.
mesh_to_edges_packed_first_idx()[source]

Return a 1D tensor x with length equal to the number of meshes such that the first edge of the ith mesh is edges_packed[x[i]].

Returns:1D tensor of indices of first items.
faces_packed_to_edges_packed()[source]

Get the packed representation of the faces in terms of edges. Faces are given by the indices of the three edges in the packed representation of the edges.

Returns:tensor of faces of shape (sum(F_n), 3).
num_edges_per_mesh()[source]

Return a 1D tensor x with length equal to the number of meshes giving the number of edges in each mesh.

Returns:1D tensor of sizes.
verts_padded_to_packed_idx()[source]

Return a 1D tensor x with length equal to the total number of vertices such that verts_packed()[i] is element x[i] of the flattened padded representation. The packed representation can be calculated as follows.

p = verts_padded().reshape(-1, 3)
verts_packed = p[x]
Returns:1D tensor of indices.
has_verts_normals() → bool[source]

Check whether vertex normals are already present.

verts_normals_packed()[source]

Get the packed representation of the vertex normals.

Returns:tensor of normals of shape (sum(V_n), 3).
verts_normals_list()[source]

Get the list representation of the vertex normals.

Returns:list of tensors of normals of shape (V_n, 3).
verts_normals_padded()[source]

Get the padded representation of the vertex normals.

Returns:tensor of normals of shape (N, max(V_n), 3).
faces_normals_packed()[source]

Get the packed representation of the face normals.

Returns:tensor of normals of shape (sum(F_n), 3).
faces_normals_list()[source]

Get the list representation of the face normals.

Returns:list of tensors of normals of shape (F_n, 3).
faces_normals_padded()[source]

Get the padded representation of the face normals.

Returns:tensor of normals of shape (N, max(F_n), 3).
faces_areas_packed()[source]

Get the packed representation of the face areas.

Returns:tensor of areas of shape (sum(F_n),).
laplacian_packed()[source]
clone()[source]

Deep copy of Meshes object. All internal tensors are cloned individually.

Returns:new Meshes object.
detach()[source]

Detach Meshes object. All internal tensors are detached individually.

Returns:new Meshes object.
to(device: Union[str, torch.device], copy: bool = False)[source]

Match the functionality of torch.Tensor.to(). If copy = True or the self Tensor is on a different device, the returned tensor is a copy of self with the desired torch.device. If copy = False and the self Tensor already has the correct torch.device, then self is returned.

Parameters:
  • device – Device (as str or torch.device) for the new tensor.
  • copy – Boolean indicator whether or not to clone self. Default False.
Returns:

Meshes object.

cpu()[source]
cuda()[source]
get_mesh_verts_faces(index: int)[source]

Get tensors for a single mesh from the list representation.

Parameters:index – Integer in the range [0, N).
Returns:verts – Tensor of shape (V, 3). faces: LongTensor of shape (F, 3).
split(split_sizes: list)[source]

Splits Meshes object of size N into a list of Meshes objects of size len(split_sizes), where the i-th Meshes object is of size split_sizes[i]. Similar to torch.split().

Parameters:split_sizes – List of integer sizes of Meshes objects to be returned.
Returns:list[Meshes].
offset_verts_(vert_offsets_packed)[source]

Add an offset to the vertices of this Meshes. In place operation. If normals are present they may be recalculated.

Parameters:vert_offsets_packed – A Tensor of shape (3,) or the same shape as self.verts_packed, giving offsets to be added to all vertices.
Returns:self.
offset_verts(vert_offsets_packed)[source]

Out of place offset_verts.

Parameters:vert_offsets_packed – A Tensor of the same shape as self.verts_packed giving offsets to be added to all vertices.
Returns:new Meshes object.
scale_verts_(scale)[source]

Multiply the vertices of this Meshes object by a scalar value. In place operation.

Parameters:scale – A scalar, or a Tensor of shape (N,).
Returns:self.
scale_verts(scale)[source]

Out of place scale_verts.

Parameters:scale – A scalar, or a Tensor of shape (N,).
Returns:new Meshes object.
update_padded(new_verts_padded)[source]

This function allows for an update of verts_padded without having to explicitly convert it to the list representation for heterogeneous batches. Returns a Meshes structure with updated padded tensors and copies of the auxiliary tensors at construction time. It updates self._verts_padded with new_verts_padded, and does a shallow copy of (faces_padded, faces_list, num_verts_per_mesh, num_faces_per_mesh). If packed representations are computed in self, they are updated as well.

Parameters:new_verts_padded – FloatTensor of shape (N, V, 3)
Returns:Meshes with updated padded representations
get_bounding_boxes()[source]

Compute an axis-aligned bounding box for each mesh in this Meshes object.

Returns:bboxes – Tensor of shape (N, 3, 2) where bbox[i, j] gives the min and max values of mesh i along the jth coordinate axis.
extend(N: int)[source]

Create a new Meshes object which contains each input mesh N times.

Parameters:N – number of new copies of each mesh.
Returns:new Meshes object.
sample_textures(fragments)[source]
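
A short sketch of how a few of the vertex transforms above compose (shapes follow the docstrings; the numbers are only illustrative):

import torch

from pytorch3d.structures import Meshes

meshes = Meshes(
    verts=[torch.rand(4, 3), torch.rand(3, 3)],
    faces=[torch.tensor([[0, 1, 2], [0, 2, 3]]), torch.tensor([[0, 1, 2]])],
)

# Out-of-place offset: one offset per packed vertex, shape (sum(V_n), 3).
offsets = torch.full_like(meshes.verts_packed(), 0.1)
shifted = meshes.offset_verts(offsets)

# Out-of-place scale: a scalar, or one value per mesh with shape (N,).
scaled = shifted.scale_verts(torch.tensor([2.0, 0.5]))

# extend(2) repeats each mesh twice -> a batch of 4 meshes.
batch = scaled.extend(2)
print(len(batch.verts_list()))            # 4
print(batch.get_bounding_boxes().shape)   # torch.Size([4, 3, 2])
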
pytorch3d.structures.join_meshes_as_batch(meshes: List[pytorch3d.structures.meshes.Meshes], include_textures: bool = True)[source]

Merge multiple Meshes objects, i.e. concatenate the meshes objects. They must all be on the same device. If include_textures is true, they must all be compatible, either all or none having textures, and all the Textures objects being the same type. If include_textures is False, textures are ignored.

If the textures are TexturesAtlas then being the same type includes having the same resolution. If they are TexturesUV then it includes having the same align_corners and padding_mode.

Parameters:
  • meshes – list of meshes.
  • include_textures – (bool) whether to try to join the textures.
Returns:

new Meshes object containing all the meshes from all the inputs.

pytorch3d.structures.join_meshes_as_scene(meshes: Union[pytorch3d.structures.meshes.Meshes, List[pytorch3d.structures.meshes.Meshes]], include_textures: bool = True) → pytorch3d.structures.meshes.Meshes[source]

Joins a batch of meshes in the form of a Meshes object or a list of Meshes objects as a single mesh. If the input is a list, the Meshes objects in the list must all be on the same device. Unless include_textures is False, the meshes must all have the same type of texture or must all not have textures.

If textures are included, then the textures are joined as a single scene in addition to the meshes. For this, texture types have an appropriate method called join_scene which joins mesh textures into a single texture. If the textures are TexturesAtlas then they must have the same resolution. If they are TexturesUV then they must have the same align_corners and padding_mode. Values in verts_uvs outside [0, 1] will not be respected.

Parameters:
  • meshes – Meshes object that contains a batch of meshes, or a list of Meshes objects.
  • include_textures – (bool) whether to try to join the textures.
Returns:

new Meshes object containing a single mesh

class pytorch3d.structures.Pointclouds(points, normals=None, features=None)[source]

This class provides functions for working with batches of 3d point clouds, and converting between representations.

Within Pointclouds, there are three different representations of the data.

List
  • only used for input as a starting point to convert to other representations.
Padded
  • has specific batch dimension.
Packed
  • no batch dimension.
  • has auxiliary variables used to index into the padded representation.

Example

Input list of points = [[P_1], [P_2], … , [P_N]] where P_1, … , P_N are the number of points in each cloud and N is the number of clouds.
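
A small illustrative sketch (not from the original reference) of building a heterogeneous Pointclouds batch with per-point RGB features:

import torch

from pytorch3d.structures import Pointclouds

points = [torch.rand(10, 3), torch.rand(25, 3)]     # two clouds of different sizes
features = [torch.rand(10, 3), torch.rand(25, 3)]   # e.g. RGB colors, so C = 3

clouds = Pointclouds(points=points, features=features)

print(clouds.points_packed().shape)      # torch.Size([35, 3])    -> (sum(P_n), 3)
print(clouds.points_padded().shape)      # torch.Size([2, 25, 3]) -> (N, max(P_n), 3)
print(clouds.features_padded().shape)    # torch.Size([2, 25, 3]) -> (N, max(P_n), C)
print(clouds.num_points_per_cloud())     # tensor([10, 25])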

__init__(points, normals=None, features=None) → None[source]
Parameters:
  • points

    Can be either

    • List where each element is a tensor of shape (num_points, 3) containing the (x, y, z) coordinates of each point.
    • Padded float tensor with shape (num_clouds, num_points, 3).
  • normals

    Can be either

    • List where each element is a tensor of shape (num_points, 3) containing the normal vector for each point.
    • Padded float tensor of shape (num_clouds, num_points, 3).
  • features

    Can be either

    • List where each element is a tensor of shape (num_points, C) containing the features for the points in the cloud.
    • Padded float tensor of shape (num_clouds, num_points, C).

    where C is the number of channels in the features. For example 3 for RGB color.

Refer to comments above for descriptions of List and Padded representations.

__getitem__(index) → pytorch3d.structures.pointclouds.Pointclouds[source]
Parameters:index – Specifying the index of the cloud to retrieve. Can be an int, slice, list of ints or a boolean tensor.
Returns:Pointclouds object with selected clouds. The tensors are not cloned.
isempty() → bool[source]

Checks whether any cloud is valid.

Returns:bool indicating whether there is any data.
points_list()[source]

Get the list representation of the points.

Returns:list of tensors of points of shape (P_n, 3).
normals_list()[source]

Get the list representation of the normals.

Returns:list of tensors of normals of shape (P_n, 3).
features_list()[source]

Get the list representation of the features.

Returns:list of tensors of features of shape (P_n, C).
points_packed()[source]

Get the packed representation of the points.

Returns:tensor of points of shape (sum(P_n), 3).
normals_packed()[source]

Get the packed representation of the normals.

Returns:tensor of normals of shape (sum(P_n), 3).
features_packed()[source]

Get the packed representation of the features.

Returns:tensor of features of shape (sum(P_n), C).
packed_to_cloud_idx()[source]

Return a 1D tensor x with length equal to the total number of points. packed_to_cloud_idx()[i] gives the index of the cloud which contains points_packed()[i].

Returns:1D tensor of indices.
cloud_to_packed_first_idx()[source]

Return a 1D tensor x with length equal to the number of clouds such that the first point of the ith cloud is points_packed[x[i]].

Returns:1D tensor of indices of first items.
num_points_per_cloud()[source]

Return a 1D tensor x with length equal to the number of clouds giving the number of points in each cloud.

Returns:1D tensor of sizes.
points_padded()[source]

Get the padded representation of the points.

Returns:tensor of points of shape (N, max(P_n), 3).
normals_padded()[source]

Get the padded representation of the normals.

Returns:tensor of normals of shape (N, max(P_n), 3).
features_padded()[source]

Get the padded representation of the features.

Returns:tensor of features of shape (N, max(P_n), C).
padded_to_packed_idx()[source]

Return a 1D tensor x with length equal to the total number of points such that points_packed()[i] is element x[i] of the flattened padded representation. The packed representation can be calculated as follows.

p = points_padded().reshape(-1, 3)
points_packed = p[x]
Returns:1D tensor of indices.
clone()[source]

Deep copy of Pointclouds object. All internal tensors are cloned individually.

Returns:new Pointclouds object.
detach()[source]

Detach Pointclouds object. All internal tensors are detached individually.

Returns:new Pointclouds object.
to(device: Union[str, torch.device], copy: bool = False)[source]

Match the functionality of torch.Tensor.to(). If copy = True or the self Tensor is on a different device, the returned tensor is a copy of self with the desired torch.device. If copy = False and the self Tensor already has the correct torch.device, then self is returned.

Parameters:
  • device – Device (as str or torch.device) for the new tensor.
  • copy – Boolean indicator whether or not to clone self. Default False.
Returns:

Pointclouds object.

cpu()[source]
cuda()[source]
get_cloud(index: int)[source]

Get tensors for a single cloud from the list representation.

Parameters:index – Integer in the range [0, N).
Returns:points – Tensor of shape (P, 3). normals: Tensor of shape (P, 3). features: Tensor of shape (P, C).
split(split_sizes: list)[source]

Splits Pointclouds object of size N into a list of Pointclouds objects of size len(split_sizes), where the i-th Pointclouds object is of size split_sizes[i]. Similar to torch.split().

Parameters:split_sizes – List of integer sizes of Pointclouds objects to be returned.
Returns:list[Pointclouds].

offset_(offsets_packed)[source]

Translate the point clouds by an offset. In place operation.

Parameters:offsets_packed – A Tensor of shape (3,) or the same shape as self.points_packed giving offsets to be added to all points.
Returns:self.
offset(offsets_packed)[source]

Out of place offset.

Parameters:offsets_packed – A Tensor of the same shape as self.points_packed giving offsets to be added to all points.
Returns:new Pointclouds object.
subsample(max_points: Union[int, Sequence[int]]) → pytorch3d.structures.pointclouds.Pointclouds[source]

Subsample each cloud so that it has at most max_points points.

Parameters:max_points – maximum number of points in each cloud.
Returns:new Pointclouds object, or self if nothing to be done.
scale_(scale)[source]

Multiply the coordinates of this object by a scalar value (i.e. enlarge/dilate). In place operation.

Parameters:scale – A scalar, or a Tensor of shape (N,).
Returns:self.
scale(scale)[source]

Out of place scale_.

Parameters:scale – A scalar, or a Tensor of shape (N,).
Returns:new Pointclouds object.
get_bounding_boxes()[source]

Compute an axis-aligned bounding box for each cloud.

Returns:bboxes – Tensor of shape (N, 3, 2) where bbox[i, j] gives the min and max values of cloud i along the jth coordinate axis.
estimate_normals(neighborhood_size: int = 50, disambiguate_directions: bool = True, assign_to_self: bool = False)[source]

Estimates the normals of each point in each cloud and assigns them to the internal tensors self._normals_list and self._normals_padded

The function uses ops.estimate_pointcloud_local_coord_frames to estimate the normals. Please refer to that function for more detailed information about the implemented algorithm.

Parameters:
  • neighborhood_size – The size of the neighborhood used to estimate the geometry around each point.
  • disambiguate_directions – If True, uses the algorithm from [1] to ensure sign consistency of the normals of neighboring points.
  • assign_to_self – If True, assigns the computed normals to the internal buffers, overwriting any previously stored normals.
Returns:normals – A tensor of normals for each input point of shape (minibatch, num_point, 3). If pointclouds are of Pointclouds class, returns a padded tensor.

References

[1] Tombari, Salti, Di Stefano: Unique Signatures of Histograms for Local Surface Description, ECCV 2010.
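
A hedged sketch of calling estimate_normals and storing the result on the structure (random points stand in for real data):

import torch

from pytorch3d.structures import Pointclouds

clouds = Pointclouds(points=[torch.rand(200, 3)])

# Estimate per-point normals from local neighborhoods and write them to the
# internal buffers so normals_padded() / normals_packed() return them.
clouds.estimate_normals(neighborhood_size=50, assign_to_self=True)

print(clouds.normals_padded().shape)   # torch.Size([1, 200, 3])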

extend(N: int)[source]

Create new Pointclouds which contains each cloud N times.

Parameters:N – number of new copies of each cloud.
Returns:new Pointclouds object.
update_padded(new_points_padded, new_normals_padded=None, new_features_padded=None)[source]

Returns a Pointcloud structure with updated padded tensors and copies of the auxiliary tensors. This function allows for an update of points_padded (and normals and features) without having to explicitly convert it to the list representation for heterogeneous batches.

Parameters:
  • new_points_padded – FloatTensor of shape (N, P, 3)
  • new_normals_padded – (optional) FloatTensor of shape (N, P, 3)
  • new_features_padded – (optional) FloatTensor of shape (N, P, C)
Returns:

Pointcloud with updated padded representations

inside_box(box)[source]

Finds the points inside a 3D box.

Parameters:box – FloatTensor of shape (2, 3) or (N, 2, 3) where N is the number of clouds. box[…, 0, :] gives the min x, y & z, and box[…, 1, :] gives the max x, y & z.
Returns:idx – BoolTensor of length sum(P_i) indicating whether the packed points are within the input box.
pytorch3d.structures.list_to_packed(x: List[torch.Tensor])[source]

Transforms a list of N tensors each of shape (Mi, K, …) into a single tensor of shape (sum(Mi), K, …).

Parameters:x – list of tensors.
Returns:4-element tuple containing
  • x_packed: tensor consisting of packed input tensors along the 1st dimension.
  • num_items: tensor of shape N containing Mi for each element in x.
  • item_packed_first_idx: tensor of shape N indicating the index of the first item belonging to the same element in the original list.
  • item_packed_to_list_idx: tensor of shape sum(Mi) containing the index of the element in the list the item belongs to.
pytorch3d.structures.list_to_padded(x: Union[List[torch.Tensor], Tuple[torch.Tensor]], pad_size: Optional[Sequence[int]] = None, pad_value: float = 0.0, equisized: bool = False) → torch.Tensor[source]

Transforms a list of N tensors each of shape (Si_0, Si_1, …, Si_D) into:

  • a single tensor of shape (N, pad_size(0), pad_size(1), …, pad_size(D)) if pad_size is provided
  • or a tensor of shape (N, max(Si_0), max(Si_1), …, max(Si_D)) if pad_size is None.
Parameters:
  • x – list of Tensors
  • pad_size – list(int) specifying the size of the padded tensor. If None (default), the largest size of each dimension is set as the pad_size.
  • pad_value – float value to be used to fill the padded tensor
  • equisized – bool indicating whether the items in x are of equal size (sometimes this is known and if provided saves computation)
Returns:

x_padded – tensor consisting of padded input tensors stored over the newly allocated memory.

pytorch3d.structures.packed_to_list(x: torch.Tensor, split_size: Union[list, int])[source]

Transforms a tensor of shape (sum(Mi), K, L, …) into a list of N tensors of shape (Mi, K, L, …), where the values Mi are defined by split_size.

Parameters:
  • x – tensor
  • split_size – list, tuple or int defining the number of items for each tensor in the output list.
Returns:

x_list – A list of Tensors

pytorch3d.structures.padded_to_list(x: torch.Tensor, split_size: Union[Sequence[int], Sequence[Sequence[int]], None] = None)[source]

Transforms a padded tensor of shape (N, S_1, S_2, …, S_D) into a list of N tensors of shape:

  • (Si_1, Si_2, …, Si_D) where (Si_1, Si_2, …, Si_D) is specified in split_size(i)
  • or (S_1, S_2, …, S_D) if split_size is None
  • or (Si_1, S_2, …, S_D) if split_size(i) is an integer.

Parameters:
  • x – tensor
  • split_size – optional 1D or 2D list/tuple of ints defining the number of items for each tensor.
Returns:

x_list – a list of tensors sharing the memory with the input.
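
A small round-trip sketch of the batching utilities above (list ↔ padded ↔ packed):

import torch

from pytorch3d.structures import list_to_packed, list_to_padded, padded_to_list

x = [torch.rand(2, 3), torch.rand(5, 3)]   # two items of different lengths

padded = list_to_padded(x, pad_value=0.0)             # shape (2, 5, 3)
restored = padded_to_list(padded, split_size=[2, 5])  # original shapes restored

packed, num_items, first_idx, to_list_idx = list_to_packed(x)
print(packed.shape)     # torch.Size([7, 3])
print(num_items)        # tensor([2, 5])
print(first_idx)        # tensor([0, 2])
print(to_list_idx)      # tensor([0, 0, 1, 1, 1, 1, 1])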

class pytorch3d.structures.Volumes(densities: Union[torch.Tensor, List[torch.Tensor], Tuple[torch.Tensor]], features: Union[torch.Tensor, List[torch.Tensor], Tuple[torch.Tensor], None] = None, voxel_size: Union[int, float, torch.Tensor, Tuple[Union[int, float], ...], List[Union[int, float]]] = 1.0, volume_translation: Union[torch.Tensor, Tuple[Union[int, float], ...], List[Union[int, float]]] = (0.0, 0.0, 0.0))[source]

This class provides functions for working with batches of volumetric grids of possibly varying spatial sizes.

VOLUME DENSITIES

The Volumes class can be either constructed from a 5D tensor of densities of size batch x density_dim x depth x height x width or from a list of differently-sized 4D tensors [D_1, …, D_batch], where each D_i is of size [density_dim x depth_i x height_i x width_i].

In case the Volumes object is initialized from the list of densities, the list of tensors is internally converted to a single 5D tensor by zero-padding the relevant dimensions. Both list and padded representations can be accessed with the Volumes.densities() or Volumes.densities_list() getters. The sizes of the individual volumes in the structure can be retrieved with the Volumes.get_grid_sizes() getter.

The Volumes class is immutable. I.e. after generating a Volumes object, one cannot change its properties, such as self._densities or self._features anymore.

VOLUME FEATURES

While the densities field is intended to represent various measures of the “density” of the volume cells (opacity, signed/unsigned distances from the nearest surface, …), one can additionally initialize the object with the features argument. features are either a 5D tensor of shape batch x feature_dim x depth x height x width or a list of differently-sized 4D tensors [F_1, …, F_batch], where each F_i is of size [feature_dim x depth_i x height_i x width_i]. features are intended to describe other properties of volume cells, such as per-voxel 3D vectors of RGB colors that can be later used for rendering the volume.

VOLUME COORDINATES

Additionally, the Volumes class keeps track of the locations of the centers of the volume cells in the local volume coordinates as well as in the world coordinates.

Local coordinates:
  • Represent the locations of the volume cells in the local coordinate frame of the volume.
  • The center of the voxel indexed with [·, ·, 0, 0, 0] in the volume has its 3D local coordinate set to [-1, -1, -1], while the voxel at index [·, ·, depth_i-1, height_i-1, width_i-1] has its 3D local coordinate set to [1, 1, 1].
  • The first/second/third coordinate of each 3D per-voxel XYZ vector denotes the horizontal/vertical/depth-wise position respectively. I.e. the order of the coordinate dimensions in the volume is reversed w.r.t. the order of the 3D coordinate vectors.
  • The intermediate coordinates between [-1, -1, -1] and [1, 1, 1] are linearly interpolated over the spatial dimensions of the volume.
  • Note that the convention is the same as for the 5D version of the torch.nn.functional.grid_sample function called with align_corners==True.
  • Note that the local coordinate convention of Volumes (+X = left to right, +Y = top to bottom, +Z = away from the user) is different from the world coordinate convention of the renderer for Meshes or Pointclouds (+X = right to left, +Y = bottom to top, +Z = away from the user).
World coordinates:
  • These define the locations of the centers of the volume cells in the world coordinates.

  • They are specified with the following mapping that converts points x_local in the local coordinates to points x_world in the world coordinates:

    x_world = (
        x_local * (volume_size - 1) * 0.5 * voxel_size
    ) - volume_translation,

    where voxel_size specifies the size of each voxel of the volume, and volume_translation is the 3D offset of the central voxel of the volume w.r.t. the origin of the world coordinate frame. Both voxel_size and volume_translation are specified in the world coordinate units. volume_size is the spatial size of the volume in the form of a 3D vector [width, height, depth].

  • Given the above definition of x_world, one can derive the inverse mapping from x_world to x_local as follows:

    x_local = (
        (x_world + volume_translation) / (0.5 * voxel_size)
    ) / (volume_size - 1)

  • For a trivial volume with volume_translation==[0, 0, 0] and voxel_size=1, x_world would range from -(volume_size-1)/2 to +(volume_size-1)/2.

Coordinate tensors that denote the locations of each of the volume cells in local / world coordinates (with shape (depth x height x width x 3)) can be retrieved by calling the Volumes.get_coord_grid() getter with the appropriate world_coordinates argument.

Internally, the mapping between x_local and x_world is represented as a Transform3D object Volumes._local_to_world_transform. Users can access the relevant transformations with the Volumes.get_world_to_local_coords_transform() and Volumes.get_local_to_world_coords_transform() functions.

Example coordinate conversion:
  • For a “trivial” volume with voxel_size = 1., volume_translation=[0., 0., 0.], and the spatial size of DxHxW = 5x5x5, the point x_world = (-2, 0, 2) gets mapped to x_local=(-1, 0, 1).

  • For a “trivial” volume v with voxel_size = 1., volume_translation=[0., 0., 0.], the following holds:

    torch.nn.functional.grid_sample(
        v.densities(),
        v.get_coord_grid(world_coordinates=False),
        align_corners=True,
    ) == v.densities(),

    i.e. sampling the volume at trivial local coordinates (no scaling with voxel_size or shift with volume_translation) results in the same volume.
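
The coordinate conversion above can be reproduced with a short sketch (the numbers match the “trivial” 5x5x5 example):

import torch

from pytorch3d.structures import Volumes

# One volume with a single density channel and spatial size D x H x W = 5 x 5 x 5.
densities = torch.rand(1, 1, 5, 5, 5)
volumes = Volumes(densities=densities, voxel_size=1.0)

# For this trivial volume the world point (-2, 0, 2) maps to local (-1, 0, 1).
pts_world = torch.tensor([[[-2.0, 0.0, 2.0]]])        # (minibatch, ..., 3)
print(volumes.world_to_local_coords(pts_world))       # ~ tensor([[[-1., 0., 1.]]])

# The local coordinate grid has shape (minibatch, D, H, W, 3).
print(volumes.get_coord_grid(world_coordinates=False).shape)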

__init__(densities: Union[torch.Tensor, List[torch.Tensor], Tuple[torch.Tensor]], features: Union[torch.Tensor, List[torch.Tensor], Tuple[torch.Tensor], None] = None, voxel_size: Union[int, float, torch.Tensor, Tuple[Union[int, float], ...], List[Union[int, float]]] = 1.0, volume_translation: Union[torch.Tensor, Tuple[Union[int, float], ...], List[Union[int, float]]] = (0.0, 0.0, 0.0)) → None[source]
Parameters:
  • densities – Batch of input feature volume occupancies of shape (minibatch, density_dim, depth, height, width), or a list of 4D tensors [D_1, …, D_minibatch] where each D_i has shape (density_dim, depth_i, height_i, width_i). Typically, each voxel contains a non-negative number corresponding to its opaqueness.
  • features – Batch of input feature volumes of shape (minibatch, feature_dim, depth, height, width), or a list of 4D tensors [F_1, …, F_minibatch] where each F_i has shape (feature_dim, depth_i, height_i, width_i). The field is optional and can be set to None in case features are not required.
  • voxel_size – Denotes the size of each volume voxel in world units. Has to be one of: a) a scalar (square voxels); b) a 3-tuple or a 3-list of scalars; c) a Tensor of shape (3,); d) a Tensor of shape (minibatch, 3); e) a Tensor of shape (minibatch, 1); f) a Tensor of shape (1,) (square voxels).
  • volume_translation – Denotes the 3D translation of the center of the volume in world units. Has to be one of: a) a 3-tuple or a 3-list of scalars; b) a Tensor of shape (3,); c) a Tensor of shape (minibatch, 3); d) a Tensor of shape (1,) (square voxels).
get_coord_grid(world_coordinates: bool = True) → torch.Tensor[source]

Return the 3D coordinate grid of the volumetric grid in local (world_coordinates=False) or world coordinates (world_coordinates=True).

The grid records the location of the center of each volume voxel.

Local coordinates are scaled s.t. the values along one side of the volume are in range [-1, 1].

Parameters:world_coordinates – if True, the method returns the grid in the world coordinates, otherwise in local coordinates.
Returns:coordinate_grid – The grid of coordinates of shape (minibatch, depth, height, width, 3), where minibatch, height, width and depth are the batch size, height, width and depth of the volume features or densities.
get_local_to_world_coords_transform() → pytorch3d.transforms.transform3d.Transform3d[source]

Return a Transform3d object that converts points in the local coordinate frame of the volume to world coordinates. Local volume coordinates are scaled s.t. the coordinates along one side of the volume are in range [-1, 1].

Returns:local_to_world_transform – A Transform3d object converting points from local coordinates to the world coordinates.
get_world_to_local_coords_transform() → pytorch3d.transforms.transform3d.Transform3d[source]

Return a Transform3d object that converts points in the world coordinates to the local coordinate frame of the volume. Local volume coordinates are scaled s.t. the coordinates along one side of the volume are in range [-1, 1].

Returns:world_to_local_transform – A Transform3d object converting points from world coordinates to local coordinates.
world_to_local_coords(points_3d_world: torch.Tensor) → torch.Tensor[source]

Convert a batch of 3D point coordinates points_3d_world of shape (minibatch, …, dim) in the world coordinates to the local coordinate frame of the volume. Local volume coordinates are scaled s.t. the coordinates along one side of the volume are in range [-1, 1].

Parameters:points_3d_world – A tensor of shape (minibatch, …, 3) containing the 3D coordinates of a set of points that will be converted from the world coordinates (defined by the self.center and self.voxel_size parameters) to the local volume coordinates, ranging within [-1, 1].
Returns:points_3d_local – points_3d_world converted to the local volume coordinates of shape (minibatch, …, 3).
local_to_world_coords(points_3d_local: torch.Tensor) → torch.Tensor[source]

Convert a batch of 3D point coordinates points_3d_local of shape (minibatch, …, dim) in the local coordinate frame of the volume to the world coordinates.

Parameters:points_3d_local – A tensor of shape (minibatch, …, 3) containing the 3D coordinates of a set of points that will be converted from the local volume coordinates (ranging within [-1, 1]) to the world coordinates defined by the self.center and self.voxel_size parameters.
Returns:points_3d_world – points_3d_local converted to the world coordinates of the volume of shape (minibatch, …, 3).
__getitem__(index: Union[int, List[int], Tuple[int], slice, torch.Tensor]) → pytorch3d.structures.volumes.Volumes[source]
Parameters:index – Specifying the index of the volume to retrieve. Can be an int, slice, list of ints or a boolean or a long tensor.
Returns:Volumes object with selected volumes. The tensors are not cloned.
features() → Optional[torch.Tensor][source]

Returns the features of the volume.

Returns:features – The tensor of volume features.
densities() → torch.Tensor[source]

Returns the densities of the volume.

Returns:densities – The tensor of volume densities.
densities_list() → List[torch.Tensor][source]

Get the list representation of the densities.

Returns:list of tensors of densities of shape (dim_i, D_i, H_i, W_i).
features_list() → List[torch.Tensor][source]

Get the list representation of the features.

Returns:list of tensors of features of shape (dim_i, D_i, H_i, W_i) or None for feature-less volumes.
get_grid_sizes() → torch.LongTensor[source]

Returns the sizes of individual volumetric grids in the structure.

Returns:grid_sizes – Tensor of spatial sizes of each of the volumes of size (batchsize, 3), where the i-th row holds (D_i, H_i, W_i).
update_padded(new_densities: torch.Tensor, new_features: Optional[torch.Tensor] = None) → pytorch3d.structures.volumes.Volumes[source]

Returns a Volumes structure with updated padded tensors and copies of the auxiliary tensors self._local_to_world_transform, device and self._grid_sizes. This function allows for an update of densities (and features) without having to explicitly convert it to the list representation for heterogeneous batches.

Parameters:
  • new_densities – FloatTensor of shape (N, dim_density, D, H, W)
  • new_features – (optional) FloatTensor of shape (N, dim_feature, D, H, W)
Returns:

Volumes with updated features and densities

clone() → pytorch3d.structures.volumes.Volumes[source]

Deep copy of Volumes object. All internal tensors are cloned individually.

Returns:new Volumes object.
to(device: Union[str, torch.device], copy: bool = False) → pytorch3d.structures.volumes.Volumes[source]

Match the functionality of torch.Tensor.to(). If copy = True or the self Tensor is on a different device, the returned tensor is a copy of self with the desired torch.device. If copy = False and the self Tensor already has the correct torch.device, then self is returned.

Parameters:
  • device – Device (as str or torch.device) for the new tensor.
  • copy – Boolean indicator whether or not to clone self. Default False.
Returns:

Volumes object.

cpu() → pytorch3d.structures.volumes.Volumes[source]
cuda() → pytorch3d.structures.volumes.Volumes[source]

pytorch3d.io

pytorch3d.io.load_obj(f, load_textures=True, create_texture_atlas: bool = False, texture_atlas_size: int = 4, texture_wrap: Optional[str] = 'repeat', device: Union[str, torch.device] = 'cpu', path_manager: Optional[iopath.common.file_io.PathManager] = None)[source]

Load a mesh from a .obj file and optionally textures from a .mtl file. Currently this handles verts, faces, vertex texture uv coordinates, normals, texture images and material reflectivity values.

Note .obj files are 1-indexed. The tensors returned from this function are 0-indexed. OBJ spec reference: http://www.martinreddy.net/gfx/3d/OBJ.spec

Example .obj file format:

# this is a comment
v 1.000000 -1.000000 -1.000000
v 1.000000 -1.000000 1.000000
v -1.000000 -1.000000 1.000000
v -1.000000 -1.000000 -1.000000
v 1.000000 1.000000 -1.000000
vt 0.748573 0.750412
vt 0.749279 0.501284
vt 0.999110 0.501077
vt 0.999455 0.750380
vn 0.000000 0.000000 -1.000000
vn -1.000000 -0.000000 -0.000000
vn -0.000000 -0.000000 1.000000
f 5/2/1 1/2/1 4/3/1
f 5/1/1 4/3/1 2/4/1

The first character of the line denotes the type of input:

- v is a vertex
- vt is the texture coordinate of one vertex
- vn is the normal of one vertex
- f is a face

Faces are interpreted as follows:

5/2/1 describes the first vertex of the first triangle
- 5: index of vertex [1.000000 1.000000 -1.000000]
- 2: index of texture coordinate [0.749279 0.501284]
- 1: index of normal [0.000000 0.000000 -1.000000]

If there are faces with more than 3 vertices they are subdivided into triangles. Polygonal faces are assumed to have vertices ordered counter-clockwise so the (right-handed) normal points out of the screen e.g. a proper rectangular face would be specified like this:

0_________1
|         |
|         |
3 ________2

The face would be split into two triangles: (0, 2, 1) and (0, 3, 2), both of which are also oriented counter-clockwise and have normals pointing out of the screen.

Parameters:
  • f – A file-like object (with methods read, readline, tell, and seek), a pathlib path or a string containing a file name.
  • load_textures – Boolean indicating whether material files are loaded
  • create_texture_atlas – Bool. If True, a per-face texture map is created and a tensor texture_atlas is also returned in aux.
  • texture_atlas_size – Int specifying the resolution of the texture map per face when create_texture_atlas=True. A (texture_size, texture_size, 3) map is created per face.
  • texture_wrap – string, one of ["repeat", "clamp"]. This applies when computing the texture atlas. If texture_wrap="repeat", for uv values outside the range [0, 1] the integer part is ignored and a repeating pattern is formed. If texture_wrap="clamp" the values are clamped to the range [0, 1]. If None, then there is no transformation of the texture values.
  • device – Device (as str or torch.device) on which to return the new tensors.
  • path_manager – optionally a PathManager object to interpret paths.
Returns:

6-element tuple containing

  • verts: FloatTensor of shape (V, 3).

  • faces: NamedTuple with fields:
    • verts_idx: LongTensor of vertex indices, shape (F, 3).
    • normals_idx: (optional) LongTensor of normal indices, shape (F, 3).
    • textures_idx: (optional) LongTensor of texture indices, shape (F, 3). This can be used to index into verts_uvs.
    • materials_idx: (optional) List of indices indicating which material the texture is derived from for each face. If there is no material for a face, the index is -1. This can be used to retrieve the corresponding values in material_colors/texture_images after they have been converted to tensors or Materials/Textures data structures - see textures.py and materials.py for more info.
  • aux: NamedTuple with fields:
    • normals: FloatTensor of shape (N, 3)

    • verts_uvs: FloatTensor of shape (T, 2), giving the uv coordinate per vertex. If a vertex is shared between two faces, it can have a different uv value for each instance. Therefore it is possible that the number of verts_uvs is greater than num verts, i.e. T > V.

    • material_colors: if load_textures=True and the material has associated properties this will be a dict of material names and properties of the form:

      {
          material_name_1:  {
              "ambient_color": tensor of shape (1, 3),
              "diffuse_color": tensor of shape (1, 3),
              "specular_color": tensor of shape (1, 3),
              "shininess": tensor of shape (1)
          },
          material_name_2: {},
          ...
      }
      

      If a material does not have any properties it will have an empty dict. If load_textures=False, material_colors will be None.

    • texture_images: if load_textures=True and the material has a texture map, this will be a dict of the form:

      {
          material_name_1: (H, W, 3) image,
          ...
      }
      

      If load_textures=False, texture_images will be None.

    • texture_atlas: if load_textures=True and create_texture_atlas=True, this will be a FloatTensor of shape (F, texture_size, texture_size, 3). If the material does not have a texture map, then all faces will have a uniform white texture. If load_textures=False or create_texture_atlas=False, texture_atlas will be None.
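
A minimal sketch of using load_obj and wrapping the result in a Meshes object ("model.obj" is a placeholder path):

from pytorch3d.io import load_obj
from pytorch3d.structures import Meshes

# Load geometry only; textures and materials are skipped.
verts, faces, aux = load_obj("model.obj", load_textures=False)

# faces is a NamedTuple; verts_idx holds the (F, 3) vertex indices.
mesh = Meshes(verts=[verts], faces=[faces.verts_idx])
print(verts.shape, faces.verts_idx.shape)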

pytorch3d.io.load_objs_as_meshes(files: list, device: Union[str, torch.device, None] = None, load_textures: bool = True, create_texture_atlas: bool = False, texture_atlas_size: int = 4, texture_wrap: Optional[str] = 'repeat', path_manager: Optional[iopath.common.file_io.PathManager] = None)[source]

Load meshes from a list of .obj files using the load_obj function, and return them as a Meshes object. This only works for meshes which have a single texture image for the whole mesh. See the load_obj function for more details. material_colors and normals are not stored.

Parameters:
  • files – A list of file-like objects (with methods read, readline, tell, and seek), pathlib paths or strings containing file names.
  • device – Desired device of returned Meshes. Default: uses the current device for the default tensor type.
  • load_textures – Boolean indicating whether material files are loaded
  • create_texture_atlas, texture_atlas_size, texture_wrap – as for load_obj.
  • path_manager – optionally a PathManager object to interpret paths.
Returns:

New Meshes object.

pytorch3d.io.save_obj(f: Union[pathlib.Path, str], verts, faces, decimal_places: Optional[int] = None, path_manager: Optional[iopath.common.file_io.PathManager] = None, *, verts_uvs: Optional[torch.Tensor] = None, faces_uvs: Optional[torch.Tensor] = None, texture_map: Optional[torch.Tensor] = None) → None[source]

Save a mesh to an .obj file.

Parameters:
  • f – File (str or path) to which the mesh should be written.
  • verts – FloatTensor of shape (V, 3) giving vertex coordinates.
  • faces – LongTensor of shape (F, 3) giving faces.
  • decimal_places – Number of decimal places for saving.
  • path_manager – Optional PathManager for interpreting f if it is a str.
  • verts_uvs – FloatTensor of shape (V, 2) giving the uv coordinate per vertex.
  • faces_uvs – LongTensor of shape (F, 3) giving the index into verts_uvs for each vertex in the face.
  • texture_map – FloatTensor of shape (H, W, 3) representing the texture map for the mesh which will be saved as an image. The values are expected to be in the range [0, 1].
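
A small sketch of writing a mesh with save_obj ("out.obj" is a placeholder path):

import torch

from pytorch3d.io import save_obj

verts = torch.rand(4, 3)                       # (V, 3) vertex coordinates
faces = torch.tensor([[0, 1, 2], [0, 2, 3]])   # (F, 3) face indices

save_obj("out.obj", verts=verts, faces=faces, decimal_places=6)
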
class pytorch3d.io.IO(include_default_formats: bool = True, path_manager: Optional[iopath.common.file_io.PathManager] = None)[source]

Bases: object

This class is the interface to flexible loading and saving of meshes and point clouds.

In simple cases the user will just initialize an instance of this class as IO() and then use its load and save functions. The arguments of the initializer are not usually needed.

The user can add their own formats for saving and loading by passing their own objects to the register_* functions.

Parameters:
  • include_default_formats – If False, the built-in file formats will not be available. Then only user-registered formats can be used.
  • path_manager – Used to customize how paths given as strings are interpreted.
register_default_formats() → None[source]
register_meshes_format(interpreter: pytorch3d.io.pluggable_formats.MeshFormatInterpreter) → None[source]

Register a new interpreter for a new mesh file format.

Parameters:interpreter – the new interpreter to use, which must be an instance of a class which inherits MeshFormatInterpreter.
register_pointcloud_format(interpreter: pytorch3d.io.pluggable_formats.PointcloudFormatInterpreter) → None[source]

Register a new interpreter for a new point cloud file format.

Parameters:interpreter – the new interpreter to use, which must be an instance of a class which inherits PointcloudFormatInterpreter.
load_mesh(path: Union[str, pathlib.Path], include_textures: bool = True, device: Union[str, torch.device] = 'cpu', **kwargs) → pytorch3d.structures.meshes.Meshes[source]

Attempt to load a mesh from the given file, using a registered format. Materials are not returned. If you have a .obj file with materials you might want to load them with the load_obj function instead.

Parameters:
  • path – file to read
  • include_textures – whether to try to load texture information
  • device – device on which to leave the data.
Returns:

new Meshes object containing one mesh.

save_mesh(data: pytorch3d.structures.meshes.Meshes, path: Union[str, pathlib.Path], binary: Optional[bool] = None, include_textures: bool = True, **kwargs) → None[source]

Attempt to save a mesh to the given file, using a registered format.

Parameters:
  • data – a 1-element Meshes
  • path – file to write
  • binary – If there is a choice, whether to save in a binary format.
  • include_textures – If textures are present, whether to try to save them.
load_pointcloud(path: Union[str, pathlib.Path], device: Union[str, torch.device] = 'cpu', **kwargs) → pytorch3d.structures.pointclouds.Pointclouds[source]

Attempt to load a point cloud from the given file, using a registered format.

Parameters:
  • path – file to read
  • device – Device (as str or torch.device) on which to load the data.
Returns:

new Pointclouds object containing one point cloud.

save_pointcloud(data: pytorch3d.structures.pointclouds.Pointclouds, path: Union[str, pathlib.Path], binary: Optional[bool] = None, **kwargs) → None[source]

Attempt to save a point cloud to the given file, using a registered format.

Parameters:
  • data – a 1-element Pointclouds
  • path – file to write
  • binary – If there is a choice, whether to save in a binary format.
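
An illustrative sketch of the IO interface (file names are placeholders):

from pytorch3d.io import IO

io = IO()

# Load a mesh from any registered format and write it back in another one.
mesh = io.load_mesh("model.ply", device="cpu")
io.save_mesh(mesh, "model_copy.obj")

# The same object handles point clouds.
cloud = io.load_pointcloud("cloud.ply")
io.save_pointcloud(cloud, "cloud_copy.ply")
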
pytorch3d.io.load_ply(f, *, path_manager: Optional[iopath.common.file_io.PathManager] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Load the verts and faces from a .ply file. Note that the preferred way to load data from such a file is to use the IO.load_mesh and IO.load_pointcloud functions, which can read more of the data.

Example .ply file format:

ply
format ascii 1.0           { ascii/binary, format version number }
comment made by Greg Turk  { comments keyword specified, like all lines }
comment this file is a cube
element vertex 8           { define "vertex" element, 8 of them in file }
property float x           { vertex contains float "x" coordinate }
property float y           { y coordinate is also a vertex property }
property float z           { z coordinate, too }
element face 6             { there are 6 "face" elements in the file }
property list uchar int vertex_index { "vertex_indices" is a list of ints }
end_header                 { delimits the end of the header }
0 0 0                      { start of vertex list }
0 0 1
0 1 1
0 1 0
1 0 0
1 0 1
1 1 1
1 1 0
4 0 1 2 3                  { start of face list }
4 7 6 5 4
4 0 4 5 1
4 1 5 6 2
4 2 6 7 3
4 3 7 4 0

Parameters:
  • f – A binary or text file-like object (with methods read, readline, tell and seek), a pathlib path or a string containing a file name. If the ply file is in the binary ply format rather than the text ply format, then a text stream is not supported. It is easiest to use a binary stream in all cases.
  • path_manager – PathManager for loading if f is a str.
Returns:

verts – FloatTensor of shape (V, 3). faces: LongTensor of vertex indices, shape (F, 3).

pytorch3d.io.save_ply(f, verts: torch.Tensor, faces: Optional[torch.LongTensor] = None, verts_normals: Optional[torch.Tensor] = None, ascii: bool = False, decimal_places: Optional[int] = None, path_manager: Optional[iopath.common.file_io.PathManager] = None) → None[source]

Save a mesh to a .ply file.

Parameters:
  • f – File (or path) to which the mesh should be written.
  • verts – FloatTensor of shape (V, 3) giving vertex coordinates.
  • faces – LongTensor of shape (F, 3) giving faces.
  • verts_normals – FloatTensor of shape (V, 3) giving vertex normals.
  • ascii – (bool) whether to use the ascii ply format.
  • decimal_places – Number of decimal places for saving if ascii=True.
  • path_manager – PathManager for interpreting f if it is a str.
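
A short round-trip sketch with the PLY helpers ("mesh.ply" is a placeholder path):

import torch

from pytorch3d.io import load_ply, save_ply

verts = torch.rand(4, 3)
faces = torch.tensor([[0, 1, 2], [0, 2, 3]])

save_ply("mesh.ply", verts=verts, faces=faces, ascii=True, decimal_places=6)
verts_loaded, faces_loaded = load_ply("mesh.ply")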

pytorch3d.loss

Loss functions for meshes and point clouds.

pytorch3d.loss.chamfer_distance(x, y, x_lengths=None, y_lengths=None, x_normals=None, y_normals=None, weights=None, batch_reduction: Optional[str] = 'mean', point_reduction: str = 'mean')[source]

Chamfer distance between two pointclouds x and y.

Parameters:
  • x – FloatTensor of shape (N, P1, D) or a Pointclouds object representing a batch of point clouds with at most P1 points in each batch element, batch size N and feature dimension D.
  • y – FloatTensor of shape (N, P2, D) or a Pointclouds object representing a batch of point clouds with at most P2 points in each batch element, batch size N and feature dimension D.
  • x_lengths – Optional LongTensor of shape (N,) giving the number of points in each cloud in x.
  • y_lengths – Optional LongTensor of shape (N,) giving the number of points in each cloud in y.
  • x_normals – Optional FloatTensor of shape (N, P1, D).
  • y_normals – Optional FloatTensor of shape (N, P2, D).
  • weights – Optional FloatTensor of shape (N,) giving weights for batch elements for reduction operation.
  • batch_reduction – Reduction operation to apply for the loss across the batch, can be one of [“mean”, “sum”] or None.
  • point_reduction – Reduction operation to apply for the loss across the points, can be one of [“mean”, “sum”].
Returns:

2-element tuple containing

  • loss: Tensor giving the reduced distance between the pointclouds in x and the pointclouds in y.
  • loss_normals: Tensor giving the reduced cosine distance of normals between pointclouds in x and pointclouds in y. Returns None if x_normals and y_normals are None.
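
A minimal sketch of the chamfer loss on two random point cloud batches (values are only illustrative):

import torch

from pytorch3d.loss import chamfer_distance

x = torch.rand(4, 100, 3)   # batch of 4 clouds with up to 100 points, D = 3
y = torch.rand(4, 80, 3)    # batch of 4 clouds with up to 80 points, D = 3

loss, loss_normals = chamfer_distance(x, y)
print(loss)           # scalar tensor: mean over points, then mean over the batch
print(loss_normals)   # None, since no normals were passed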

pytorch3d.loss.mesh_edge_loss(meshes, target_length: float = 0.0)[source]

Computes mesh edge length regularization loss averaged across all meshes in a batch. Each mesh contributes equally to the final loss, regardless of the number of edges per mesh in the batch, by weighting each mesh with the inverse of its number of edges. For example, if mesh 3 (out of N) has only E=4 edges, then the loss for each edge in mesh 3 should be multiplied by 1/E to contribute to the final loss.

Parameters:
  • meshes – Meshes object with a batch of meshes.
  • target_length – Resting value for the edge length.
Returns:

loss – Average loss across the batch. Returns 0 if meshes contains no meshes or all empty meshes.

pytorch3d.loss.mesh_laplacian_smoothing(meshes, method: str = 'uniform')[source]

Computes the laplacian smoothing objective for a batch of meshes. This function supports three variants of Laplacian smoothing, namely with uniform weights (“uniform”), with cotangent weights (“cot”), and cotangent curvature (“cotcurv”). For more details read [1, 2].

Parameters:
  • meshes – Meshes object with a batch of meshes.
  • method – str specifying the method for the laplacian.
Returns:

loss – Average laplacian smoothing loss across the batch. Returns 0 if meshes contains no meshes or all empty meshes.

Consider a mesh M = (V, F), with verts of shape Nx3 and faces of shape Mx3. The Laplacian matrix L is a NxN tensor such that LV gives a tensor of vectors: for a uniform Laplacian, LuV[i] points to the centroid of its neighboring vertices, a cotangent Laplacian LcV[i] is known to be an approximation of the surface normal, while the curvature variant LckV[i] scales the normals by the discrete mean curvature. For vertex i, assume S[i] is the set of neighboring vertices to i, a_ij and b_ij are the “outside” angles in the two triangles connecting vertex v_i and its neighboring vertex v_j for j in S[i], as seen in the diagram below.

       a_ij
        /\
       /  \
      /    \
     /      \
v_i /________\ v_j
    \        /
     \      /
      \    /
       \  /
        \/
       b_ij

The definition of the Laplacian is LV[i] = sum_j w_ij (v_j - v_i)
For the uniform variant,    w_ij = 1 / |S[i]|
For the cotangent variant,
    w_ij = (cot a_ij + cot b_ij) / (sum_k cot a_ik + cot b_ik)
For the cotangent curvature, w_ij = (cot a_ij + cot b_ij) / (4 A[i])
where A[i] is the sum of the areas of all triangles containing vertex v_i.

There is a nice trigonometry identity to compute cotangents. Consider a triangle with side lengths A, B, C and angles a, b, c.

       c
      /|\
     / | \
    /  |  \
 B /  H|   \ A
  /    |    \
 /     |     \
/a_____|_____b\
       C

Then cot a = (B^2 + C^2 - A^2) / (4 * area)
We know that area = CH/2, and by the law of cosines we have

A^2 = B^2 + C^2 - 2BC cos a => B^2 + C^2 - A^2 = 2BC cos a

Putting these together, we get:

B^2 + C^2 - A^2     2BC cos a
_______________  =  _________ = (B/H) cos a = cos a / sin a = cot a
   4 * area            2CH

[1] Desbrun et al, “Implicit fairing of irregular meshes using diffusion and curvature flow”, SIGGRAPH 1999.

[2] Nealen et al, “Laplacian Mesh Optimization”, Graphite 2006.

pytorch3d.loss.mesh_normal_consistency(meshes)[source]

Computes the normal consistency of each mesh in meshes. We compute the normal consistency for each pair of neighboring faces. If e = (v0, v1) is the connecting edge of two neighboring faces f0 and f1, then the normal consistency between f0 and f1 is computed from the geometry below.

        a
        /\
       /  \
      / f0 \
     /      \
v0  /____e___\ v1
    \        /
     \      /
      \ f1 /
       \  /
        \/
        b

The normal consistency is

nc(f0, f1) = 1 - cos(n0, n1)

where cos(n0, n1) = n0 · n1 / (||n0|| ||n1||) is the cosine of the angle
between the normals n0 and n1, and

n0 = (v1 - v0) x (a - v0)
n1 = - (v1 - v0) x (b - v0) = (b - v0) x (v1 - v0)

This means that if nc(f0, f1) = 0 then n0 and n1 point in the same direction, while if nc(f0, f1) = 2 then n0 and n1 point in opposite directions.

Note

For well-constructed meshes, the assumption that only two faces share an edge holds. This assumption would make the implementation simpler and faster, but this implementation does not rely on it: all faces sharing e, however many there are, are discovered.

Parameters:meshes – Meshes object with a batch of meshes.
Returns:loss – Average normal consistency across the batch. Returns 0 if meshes contains no meshes or all empty meshes.
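
The mesh regularizers above are often combined into a single objective. A minimal sketch, using an ico_sphere as a stand-in mesh and purely illustrative weights:

```
from pytorch3d.loss import (
    mesh_edge_loss,
    mesh_laplacian_smoothing,
    mesh_normal_consistency,
)
from pytorch3d.utils import ico_sphere

meshes = ico_sphere(level=2)  # a batch containing one sphere mesh

# Illustrative weights; in practice these are tuned per task.
loss = (
    1.0 * mesh_edge_loss(meshes)
    + 0.1 * mesh_laplacian_smoothing(meshes, method="uniform")
    + 0.01 * mesh_normal_consistency(meshes)
)
```
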
pytorch3d.loss.point_mesh_edge_distance(meshes: pytorch3d.structures.meshes.Meshes, pcls: pytorch3d.structures.pointclouds.Pointclouds)[source]

Computes the distance between a pointcloud and a mesh within a batch. Given a pair (mesh, pcl) in the batch, we define the distance to be the sum of two distances, namely point_edge(mesh, pcl) + edge_point(mesh, pcl)

point_edge(mesh, pcl): Computes the squared distance of each point p in pcl
to the closest edge segment in mesh and averages across all points in pcl
edge_point(mesh, pcl): Computes the squared distance of each edge segment in mesh
to the closest point in pcl and averages across all edges in mesh.

The above distance functions are applied for all (mesh, pcl) pairs in the batch and then averaged across the batch.

Parameters:
  • meshes – A Meshes data structure containing N meshes
  • pcls – A Pointclouds data structure containing N pointclouds
Returns:

loss – The point_edge(mesh, pcl) + edge_point(mesh, pcl) distance between all (mesh, pcl) in a batch averaged across the batch.

pytorch3d.loss.point_mesh_face_distance(meshes: pytorch3d.structures.meshes.Meshes, pcls: pytorch3d.structures.pointclouds.Pointclouds)[source]

Computes the distance between a pointcloud and a mesh within a batch. Given a pair (mesh, pcl) in the batch, we define the distance to be the sum of two distances, namely point_face(mesh, pcl) + face_point(mesh, pcl)

point_face(mesh, pcl): Computes the squared distance of each point p in pcl
to the closest triangular face in mesh and averages across all points in pcl
face_point(mesh, pcl): Computes the squared distance of each triangular face in
mesh to the closest point in pcl and averages across all faces in mesh.

The above distance functions are applied for all (mesh, pcl) pairs in the batch and then averaged across the batch.

Parameters:
  • meshes – A Meshes data structure containing N meshes
  • pcls – A Pointclouds data structure containing N pointclouds
Returns:

loss – The point_face(mesh, pcl) + face_point(mesh, pcl) distance between all (mesh, pcl) in a batch averaged across the batch.
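
A minimal sketch pairing an ico_sphere mesh with points sampled from its own surface (illustrative data; the loss is close to zero by construction):

```
from pytorch3d.loss import point_mesh_face_distance
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.structures import Pointclouds
from pytorch3d.utils import ico_sphere

meshes = ico_sphere(level=2)                     # batch of N=1 meshes
points = sample_points_from_meshes(meshes, 500)  # (1, 500, 3)
pcls = Pointclouds(points=points)

# Scalar loss; near zero here since the points lie on the mesh surface.
loss = point_mesh_face_distance(meshes, pcls)
```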

pytorch3d.ops

pytorch3d.ops.ball_query(p1: torch.Tensor, p2: torch.Tensor, lengths1: Optional[torch.Tensor] = None, lengths2: Optional[torch.Tensor] = None, K: int = 500, radius: float = 0.2, return_nn: bool = True)[source]

Ball Query is an alternative to KNN. It can be used to find all points in p2 that are within a specified radius to the query point in p1 (with an upper limit of K neighbors).

The neighbors returned are not necessarily the nearest to the point in p1, just the first K values in p2 which are within the specified radius.

This method is faster than kNN when there are large numbers of points in p2 and the ordering of neighbors is not important compared to the distance being within the radius threshold.

“Ball query’s local neighborhood guarantees a fixed region scale thus making local region features more generalizable across space, which is preferred for tasks requiring local pattern recognition (e.g. semantic point labeling)” [1].

[1] Charles R. Qi et al, “PointNet++: Deep Hierarchical Feature Learning
on Point Sets in a Metric Space”, NeurIPS 2017.
Parameters:
  • p1 – Tensor of shape (N, P1, D) giving a batch of N point clouds, each containing up to P1 points of dimension D. These represent the centers of the ball queries.
  • p2 – Tensor of shape (N, P2, D) giving a batch of N point clouds, each containing up to P2 points of dimension D.
  • lengths1 – LongTensor of shape (N,) of values in the range [0, P1], giving the length of each pointcloud in p1. Or None to indicate that every cloud has length P1.
  • lengths2 – LongTensor of shape (N,) of values in the range [0, P2], giving the length of each pointcloud in p2. Or None to indicate that every cloud has length P2.
  • K – Integer giving the upper bound on the number of samples to take within the radius
  • radius – the radius around each point within which the neighbors need to be located
  • return_nn – If set to True returns the K neighbor points in p2 for each point in p1.
Returns:

dists – Tensor of shape (N, P1, K) giving the squared distances to the neighbors. This is padded with zeros both where a cloud in p2 has fewer than K points and where a cloud in p1 has fewer than P1 points, and also if there are fewer than K points which satisfy the radius threshold.

idx – LongTensor of shape (N, P1, K) giving the indices of the neighbors in p2 for points in p1. Concretely, if p1_idx[n, i, k] = j then p2[n, j] is the k-th neighbor to p1[n, i] in p2[n]. This is padded with -1 both where a cloud in p2 has fewer than K points and where a cloud in p1 has fewer than P1 points, and also if there are fewer than K points which satisfy the radius threshold.

nn – Tensor of shape (N, P1, K, D) giving the K neighbors in p2 for each point in p1. Concretely, p2_nn[n, i, k] gives the k-th neighbor for p1[n, i]. Returned if return_nn is True.
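
A minimal usage sketch on random clouds (the sizes, radius and K below are illustrative):

```
import torch
from pytorch3d.ops import ball_query

p1 = torch.randn(2, 1024, 3)   # query centers
p2 = torch.randn(2, 4096, 3)   # points to search

# Up to K=16 neighbors of each p1 point within a radius of 0.5.
dists, idx, nn = ball_query(p1, p2, K=16, radius=0.5, return_nn=True)
# dists: (2, 1024, 16), idx: (2, 1024, 16) padded with -1, nn: (2, 1024, 16, 3)
```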

pytorch3d.ops.corresponding_cameras_alignment(cameras_src: CamerasBase, cameras_tgt: CamerasBase, estimate_scale: bool = True, mode: str = 'extrinsics', eps: float = 1e-09) → CamerasBase[source]

Warning

The corresponding_cameras_alignment API is experimental and subject to change!

Estimates a single similarity transformation between two sets of cameras cameras_src and cameras_tgt and returns an aligned version of cameras_src.

Given source cameras [(R_1, T_1), (R_2, T_2), …, (R_N, T_N)] and target cameras [(R_1’, T_1’), (R_2’, T_2’), …, (R_N’, T_N’)], where (R_i, T_i) is a 2-tuple of the camera rotation and translation matrix respectively, the algorithm finds a global rotation, translation and scale (R_A, T_A, s_A) which aligns all source cameras with the target cameras such that the following holds:

Under the change of coordinates using a similarity transform (R_A, T_A, s_A) a 3D point X’ is mapped to X with:

` X = (X' R_A + T_A) / s_A `
Then, for all cameras i, we assume that the following holds:
` X R_i + T_i = s' (X' R_i' + T_i'), `

i.e. an adjusted point X’ is mapped by a camera (R_i’, T_i’) to the same point as imaged from camera (R_i, T_i) after resolving the scale ambiguity with a global scalar factor s’.

Substituting for X above gives rise to the following:

```
(X’ R_A + T_A) / s_A R_i + T_i = s’ (X’ R_i’ + T_i’)        // · s_A
(X’ R_A + T_A) R_i + T_i s_A   = (s’ s_A) (X’ R_i’ + T_i’)
s’ := 1 / s_A   # without loss of generality
(X’ R_A + T_A) R_i + T_i s_A   = X’ R_i’ + T_i’
X’ R_A R_i + T_A R_i + T_i s_A = X’ R_i’ + T_i’
   ^^^^^^^   ^^^^^^^^^^^^^^^^^
   ~= R_i’        ~= T_i’
```

i.e. after estimating R_A, T_A, s_A, the aligned source cameras have extrinsics:

cameras_src_align = (R_A R_i, T_A R_i + T_i s_A) ~= (R_i’, T_i’)
We support two ways R_A, T_A, s_A can be estimated:
  1. mode==’centers’

    Estimates the similarity alignment between camera centers using Umeyama’s algorithm (see pytorch3d.ops.corresponding_points_alignment for details) and transforms camera extrinsics accordingly.

  2. mode==’extrinsics’

    Defines the alignment problem as a system of the following equations:

    for all i:
        [ R_A   0 ]  x  [ R_i         0 ]  =  [ R_i' 0 ]
        [ T_A^T 1 ]     [ (s_A T_i^T) 1 ]     [ T_i' 1 ]

    R_A, T_A and s_A are then obtained by solving the system in the least squares sense.

The estimated camera transformation is a true similarity transform, i.e. it cannot be a reflection.

Parameters:
  • cameras_src – N cameras to be aligned.
  • cameras_tgt – N target cameras.
  • estimate_scale – Controls whether the alignment transform is rigid (estimate_scale=False), or a similarity (estimate_scale=True). s_A is set to 1 if estimate_scale==False.
  • mode – Controls the alignment algorithm. Can be either ‘centers’ or ‘extrinsics’. Please refer to the description above for details.
  • eps – A scalar for clamping to avoid dividing by zero. Active when estimate_scale==True.
Returns:

cameras_src_aligned – cameras_src after applying the alignment transform.

pytorch3d.ops.cubify(voxels, thresh, device=None, align: str = 'topleft') → pytorch3d.structures.meshes.Meshes[source]

Converts a voxel grid to a mesh by replacing each occupied voxel with a cube consisting of 12 faces and 8 vertices. Shared vertices are merged, and internal faces are removed.

Parameters:
  • voxels – A FloatTensor of shape (N, D, H, W) containing occupancy probabilities.
  • thresh – A scalar threshold. If a voxel occupancy is larger than thresh, the voxel is considered occupied.
  • device – The device of the output meshes
  • align – Defines the alignment of the mesh vertices and the grid locations. Has to be one of {“topleft”, “corner”, “center”}. See below for explanation. Default is “topleft”.
Returns:

meshes – A Meshes object of the corresponding meshes.

The alignment between the vertices of the cubified mesh and the voxel locations (or pixels) is defined by the choice of align. We support three modes, as shown below for a 2x2 grid:

X---X----      X-------X      ---------
|   |   |      |   |   |      | X | X |
X---X----      ---------      ---------
|   |   |      |   |   |      | X | X |
---------      X-------X      ---------

topleft        corner         center

In the figure, the X marks denote the grid locations and the squares represent the added cuboids. When align=”topleft”, then the top left corner of each cuboid corresponds to the pixel coordinate of the input grid. When align=”corner”, then the corners of the output mesh span the whole grid. When align=”center”, then the grid locations form the center of the cuboids.
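
A minimal sketch converting a random occupancy grid into a batch of meshes (random data, illustrative threshold):

```
import torch
from pytorch3d.ops import cubify

# A batch of 2 voxel grids of resolution 32^3 with random occupancy scores.
voxels = torch.rand(2, 32, 32, 32)

# Voxels with occupancy > 0.5 become cubes; vertices aligned "topleft".
meshes = cubify(voxels, thresh=0.5, align="topleft")
verts_list = meshes.verts_list()   # list of 2 (V_i, 3) tensors
```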

class pytorch3d.ops.GraphConv(input_dim: int, output_dim: int, init: str = 'normal', directed: bool = False)[source]

A single graph convolution layer.

__init__(input_dim: int, output_dim: int, init: str = 'normal', directed: bool = False) → None[source]
Parameters:
  • input_dim – Number of input features per vertex.
  • output_dim – Number of output features per vertex.
  • init – Weight initialization method. Can be one of [‘zero’, ‘normal’].
  • directed – Bool indicating if edges in the graph are directed.
forward(verts, edges)[source]
Parameters:
  • verts – FloatTensor of shape (V, input_dim) where V is the number of vertices and input_dim is the number of input features per vertex. input_dim has to match the input_dim specified in __init__.
  • edges – LongTensor of shape (E, 2) where E is the number of edges where each edge has the indices of the two vertices which form the edge.
Returns:

out – FloatTensor of shape (V, output_dim) where output_dim is the number of output features per vertex.
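
A minimal sketch applying one graph convolution to an ico_sphere, using the vertex positions as input features (the feature sizes are illustrative):

```
from pytorch3d.ops import GraphConv
from pytorch3d.utils import ico_sphere

mesh = ico_sphere(level=1)
verts = mesh.verts_packed()    # (V, 3) vertex positions used as input features
edges = mesh.edges_packed()    # (E, 2)

conv = GraphConv(input_dim=3, output_dim=16)
out = conv(verts, edges)       # (V, 16)
```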

pytorch3d.ops.interpolate_face_attributes(pix_to_face: torch.Tensor, barycentric_coords: torch.Tensor, face_attributes: torch.Tensor) → torch.Tensor[source]

Interpolate arbitrary face attributes using the barycentric coordinates for each pixel in the rasterized output.

Parameters:
  • pix_to_face – LongTensor of shape (…) specifying the indices of the faces (in the packed representation) which overlap each pixel in the image. A value < 0 indicates that the pixel does not overlap any face and should be skipped.
  • barycentric_coords – FloatTensor of shape (N, H, W, K, 3) specifying the barycentric coordinates of each pixel relative to the faces (in the packed representation) which overlap the pixel.
  • face_attributes – packed attributes of shape (total_faces, 3, D), specifying the value of the attribute for each vertex in the face.
Returns:

pixel_vals – tensor of shape (N, H, W, K, D) giving the interpolated value of the face attribute for each pixel.

pytorch3d.ops.box3d_overlap(boxes1: torch.Tensor, boxes2: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Computes the intersection of 3D boxes1 and boxes2.

Inputs boxes1, boxes2 are tensors of shape (B, 8, 3) (where B doesn’t have to be the same for boxes1 and boxes2), containing the 8 corners of the boxes, as follows:

(4) +---------+. (5)
    | ` .     |  ` .
    | (0) +---+-----+ (1)
    |     |   |     |
(7) +-----+---+. (6)|
    ` .   |     ` . |
    (3) ` +---------+ (2)

NOTE: Throughout this implementation, we assume that boxes are defined by their 8 corners exactly in the order specified in the diagram above for the function to give correct results. In addition the vertices on each plane must be coplanar. As an alternative to the diagram, this is a unit bounding box which has the correct vertex ordering:

box_corner_vertices = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
]

Parameters:
  • boxes1 – tensor of shape (N, 8, 3) of the coordinates of the 1st boxes
  • boxes2 – tensor of shape (M, 8, 3) of the coordinates of the 2nd boxes
Returns:

vol – (N, M) tensor of the volume of the intersecting convex shapes
iou – (N, M) tensor of the intersection over union, which is defined as: iou = vol / (vol1 + vol2 - vol)
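
A minimal sketch comparing a unit box with a copy of itself shifted by 0.5 along x, using the corner ordering from the diagram above:

```
import torch
from pytorch3d.ops import box3d_overlap

# Unit box with the required corner ordering.
unit_box = torch.tensor(
    [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
     [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]],
    dtype=torch.float32,
)
boxes1 = unit_box.unsqueeze(0)                                    # (1, 8, 3)
boxes2 = (unit_box + torch.tensor([0.5, 0.0, 0.0])).unsqueeze(0)  # shifted copy

vol, iou = box3d_overlap(boxes1, boxes2)
# vol is ~0.5 (the overlapping half-cube); iou is ~0.5 / (1 + 1 - 0.5) = 1/3
```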

pytorch3d.ops.knn_gather(x: torch.Tensor, idx: torch.Tensor, lengths: Optional[torch.Tensor] = None)[source]

A helper function for knn that allows indexing a tensor x with the indices idx returned by knn_points.

For example, if dists, idx = knn_points(p, x, lengths_p, lengths, K) where p is a tensor of shape (N, L, D) and x a tensor of shape (N, M, D), then one can compute the K nearest neighbors of p with p_nn = knn_gather(x, idx, lengths). It can also be applied for any tensor x of shape (N, M, U) where U != D.

Parameters:
  • x – Tensor of shape (N, M, U) containing U-dimensional features to be gathered.
  • idx – LongTensor of shape (N, L, K) giving the indices returned by knn_points.
  • lengths – LongTensor of shape (N,) of values in the range [0, M], giving the length of each example in the batch in x. Or None to indicate that every example has length M.
Returns:

x_out – Tensor of shape (N, L, K, U) resulting from gathering the elements of x with idx, s.t. x_out[n, l, k] = x[n, idx[n, l, k]]. If k > lengths[n] then x_out[n, l, k] is filled with 0.0.

pytorch3d.ops.knn_points(p1: torch.Tensor, p2: torch.Tensor, lengths1: Optional[torch.Tensor] = None, lengths2: Optional[torch.Tensor] = None, K: int = 1, version: int = -1, return_nn: bool = False, return_sorted: bool = True)[source]

K-Nearest neighbors on point clouds.

Parameters:
  • p1 – Tensor of shape (N, P1, D) giving a batch of N point clouds, each containing up to P1 points of dimension D.
  • p2 – Tensor of shape (N, P2, D) giving a batch of N point clouds, each containing up to P2 points of dimension D.
  • lengths1 – LongTensor of shape (N,) of values in the range [0, P1], giving the length of each pointcloud in p1. Or None to indicate that every cloud has length P1.
  • lengths2 – LongTensor of shape (N,) of values in the range [0, P2], giving the length of each pointcloud in p2. Or None to indicate that every cloud has length P2.
  • K – Integer giving the number of nearest neighbors to return.
  • version – Which KNN implementation to use in the backend. If version=-1, the correct implementation is selected based on the shapes of the inputs.
  • return_nn – If set to True returns the K nearest neighbors in p2 for each point in p1.
  • return_sorted – (bool) whether to return the nearest neighbors sorted in ascending order of distance.
Returns:

dists – Tensor of shape (N, P1, K) giving the squared distances to the nearest neighbors. This is padded with zeros both where a cloud in p2 has fewer than K points and where a cloud in p1 has fewer than P1 points.

idx – LongTensor of shape (N, P1, K) giving the indices of the K nearest neighbors from points in p1 to points in p2. Concretely, if p1_idx[n, i, k] = j then p2[n, j] is the k-th nearest neighbor to p1[n, i] in p2[n]. This is padded with zeros both where a cloud in p2 has fewer than K points and where a cloud in p1 has fewer than P1 points.

nn – Tensor of shape (N, P1, K, D) giving the K nearest neighbors in p2 for each point in p1. Concretely, p2_nn[n, i, k] gives the k-th nearest neighbor for p1[n, i]. Returned if return_nn is True. The nearest neighbors are collected using knn_gather, which is a helper function that allows indexing any tensor of shape (N, P2, U) with the indices p1_idx returned by knn_points. The output is a tensor of shape (N, P1, K, U).
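
A minimal sketch of the knn_points / knn_gather pair on random clouds (sizes and K are illustrative):

```
import torch
from pytorch3d.ops import knn_gather, knn_points

p = torch.randn(2, 100, 3)       # queries
x = torch.randn(2, 500, 3)       # points to search
feats = torch.randn(2, 500, 8)   # per-point features of x (U=8)

dists, idx, _ = knn_points(p, x, K=4)
p_nn = knn_gather(x, idx)          # (2, 100, 4, 3): the K nearest points
feats_nn = knn_gather(feats, idx)  # (2, 100, 4, 8): their features
```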

pytorch3d.ops.cot_laplacian(verts: torch.Tensor, faces: torch.Tensor, eps: float = 1e-12) → Tuple[torch.Tensor, torch.Tensor][source]

Returns the Laplacian matrix with cotangent weights and the inverse of the face areas.

Parameters:
  • verts – tensor of shape (V, 3) containing the vertices of the graph
  • faces – tensor of shape (F, 3) containing the vertex indices of each face
Returns:

2-element tuple containing
  • L: Sparse FloatTensor of shape (V, V) for the Laplacian matrix. Here, L[i, j] = cot a_ij + cot b_ij iff (i, j) is an edge in meshes. See the description above for more clarity.
  • inv_areas: FloatTensor of shape (V,) containing the inverse of the sum of face areas containing each vertex

pytorch3d.ops.laplacian(verts: torch.Tensor, edges: torch.Tensor) → torch.Tensor[source]

Computes the laplacian matrix. The definition of the laplacian is:

L[i, j] =    -1       , if i == j
L[i, j] = 1 / deg(i)  , if (i, j) is an edge
L[i, j] =     0       , otherwise

where deg(i) is the degree of the i-th vertex in the graph.

Parameters:
  • verts – tensor of shape (V, 3) containing the vertices of the graph
  • edges – tensor of shape (E, 2) containing the vertex indices of each edge
Returns:

L – Sparse FloatTensor of shape (V, V)

pytorch3d.ops.norm_laplacian(verts: torch.Tensor, edges: torch.Tensor, eps: float = 1e-12) → torch.Tensor[source]

Norm laplacian computes a variant of the laplacian matrix which weights each affinity with the normalized distance of the neighboring nodes. More concretely, L[i, j] = 1. / wij where wij = ||vi - vj|| if (vi, vj) are neighboring nodes

Parameters:
  • verts – tensor of shape (V, 3) containing the vertices of the graph
  • edges – tensor of shape (E, 2) containing the vertex indices of each edge
Returns:

L – Sparse FloatTensor of shape (V, V)

pytorch3d.ops.mesh_face_areas_normals()
pytorch3d.ops.taubin_smoothing(meshes: pytorch3d.structures.meshes.Meshes, lambd: float = 0.53, mu: float = -0.53, num_iter: int = 10) → pytorch3d.structures.meshes.Meshes[source]

Taubin smoothing [1] is an iterative smoothing operator for meshes. At each iteration

verts := (1 - λ) * verts + λ * L * verts
verts := (1 - μ) * verts + μ * L * verts

This function returns a new mesh with smoothed vertices.

Parameters:
  • meshes – Meshes input to be smoothed
  • lambd, mu – float parameters for Taubin smoothing, lambd > 0, mu < 0
  • num_iter – number of iterations to execute smoothing
Returns:mesh – Smoothed input Meshes

[1] Curve and Surface Smoothing without Shrinkage, Gabriel Taubin, ICCV 1997
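
A minimal sketch smoothing an ico_sphere perturbed with synthetic noise (noise added purely for illustration):

```
import torch
from pytorch3d.ops import taubin_smoothing
from pytorch3d.structures import Meshes
from pytorch3d.utils import ico_sphere

sphere = ico_sphere(level=3)
noisy_verts = sphere.verts_packed() + 0.01 * torch.randn_like(sphere.verts_packed())
noisy = Meshes(verts=[noisy_verts], faces=[sphere.faces_packed()])

smoothed = taubin_smoothing(noisy, num_iter=10)   # a new Meshes object
```
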
pytorch3d.ops.packed_to_padded(inputs, first_idxs, max_size)[source]

Torch wrapper that handles allowed input shapes. See description below.

Parameters:
  • inputs – FloatTensor of shape (F,) or (F, D), representing the packed batch tensor, e.g. areas for faces in a batch of meshes.
  • first_idxs – LongTensor of shape (N,) where N is the number of elements in the batch and first_idxs[i] = f means that the inputs for batch element i begin at inputs[f].
  • max_size – Max length of an element in the batch.
Returns:

inputs_padded – FloatTensor of shape (N, max_size) or (N, max_size, D) where max_size is max of sizes. The values for batch element i which start at inputs[first_idxs[i]] will be copied to inputs_padded[i, :], with zeros padding out the extra inputs.

To handle the allowed input shapes, we convert the inputs tensor of shape (F,) to (F, 1). We reshape the output back to (N, max_size) from (N, max_size, 1).
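
A minimal sketch converting the packed per-face areas of a two-mesh batch into a padded layout (the toy meshes are illustrative):

```
import torch
from pytorch3d.ops import packed_to_padded
from pytorch3d.structures import Meshes

# Two meshes: a single triangle and a quad split into two triangles.
verts1 = torch.tensor([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
faces1 = torch.tensor([[0, 1, 2]])
verts2 = torch.tensor([[0.0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
faces2 = torch.tensor([[0, 1, 2], [0, 2, 3]])
meshes = Meshes(verts=[verts1, verts2], faces=[faces1, faces2])

areas = meshes.faces_areas_packed()                      # packed (F,) = (3,)
first_idxs = meshes.mesh_to_faces_packed_first_idx()     # (N,) = (2,)
padded = packed_to_padded(areas, first_idxs, max_size=2) # (2, 2), zero padded
```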

pytorch3d.ops.padded_to_packed(inputs, first_idxs, num_inputs)[source]

Torch wrapper that handles allowed input shapes. See description below.

Parameters:
  • inputs – FloatTensor of shape (N, max_size) or (N, max_size, D), representing the padded tensor, e.g. areas for faces in a batch of meshes.
  • first_idxs – LongTensor of shape (N,) where N is the number of elements in the batch and first_idxs[i] = f means that the inputs for batch element i begin at inputs_packed[f].
  • num_inputs – Number of packed entries (= F)
Returns:

inputs_packed – FloatTensor of shape (F,) or (F, D) where inputs_packed[first_idx[i]:] = inputs[i, :].

To handle the allowed input shapes, we convert the inputs tensor of shape (N, max_size) to (N, max_size, 1). We reshape the output back to (F,) from (F, 1).

pytorch3d.ops.efficient_pnp(x: torch.Tensor, y: torch.Tensor, weights: Optional[torch.Tensor] = None, skip_quadratic_eq: bool = False) → pytorch3d.ops.perspective_n_points.EpnpSolution[source]

Implements Efficient PnP algorithm [1] for Perspective-n-Points problem: finds a camera position (defined by rotation R and translation T) that minimizes re-projection error between the given 3D points x and the corresponding uncalibrated 2D points y, i.e. solves

y[i] = Proj(x[i] R[i] + T[i])

in the least-squares sense, where i are indices within the batch, and Proj is the perspective projection operator: Proj([x y z]) = [x/z y/z]. In the noise-less case, 4 points are enough to find the solution as long as they are not co-planar.

Parameters:
  • x – Batch of 3-dimensional points of shape (minibatch, num_points, 3).
  • y – Batch of 2-dimensional points of shape (minibatch, num_points, 2).
  • weights – Batch of non-negative weights of shape (minibatch, num_point). None means equal weights.
  • skip_quadratic_eq – If True, assumes the solution space for the linear system is one-dimensional, i.e. takes the scaled eigenvector that corresponds to the smallest eigenvalue as a solution. If False, finds the candidate coordinates in the potentially 4D null space by approximately solving the systems of quadratic equations. The best candidate is chosen by examining the 2D re-projection error. While this option finds a better solution, especially when the number of points is small or perspective distortions are low (the points are far away), it may be more difficult to back-propagate through.
Returns:

EpnpSolution namedtuple containing elements –

x_cam: Batch of transformed points x that is used to find the camera parameters, of shape (minibatch, num_points, 3). In the general (noisy) case, they are not exactly equal to x[i] R[i] + T[i] but are some affine transform of `x[i]`s.

R: Batch of rotation matrices of shape (minibatch, 3, 3).

T: Batch of translation vectors of shape (minibatch, 3).

err_2d: Batch of mean 2D re-projection errors of shape (minibatch,). Specifically, if yhat is the re-projection for the i-th batch element, it returns sum_j norm(yhat_j - y_j) where j iterates over points and norm denotes the L2 norm.

err_3d: Batch of mean algebraic errors of shape (minibatch,). Specifically, those are squared distances between x_world and estimated points on the rays defined by y.

[1] Moreno-Noguer, F., Lepetit, V., & Fua, P. (2009). EPnP: An Accurate O(n) solution to the PnP problem. International Journal of Computer Vision. https://www.epfl.ch/labs/cvlab/software/multi-view-stereo/epnp/

pytorch3d.ops.corresponding_points_alignment(X: Union[torch.Tensor, Pointclouds], Y: Union[torch.Tensor, Pointclouds], weights: Union[torch.Tensor, List[torch.Tensor], None] = None, estimate_scale: bool = False, allow_reflection: bool = False, eps: float = 1e-09) → pytorch3d.ops.points_alignment.SimilarityTransform[source]

Finds a similarity transformation (rotation R, translation T and optionally scale s) between two given sets of corresponding d-dimensional points X and Y such that:

s[i] X[i] R[i] + T[i] = Y[i],

for all batch indexes i in the least squares sense.

The algorithm is also known as Umeyama [1].

Parameters:
  • X – Batch of d-dimensional points of shape (minibatch, num_point, d) or a Pointclouds object.
  • Y – Batch of d-dimensional points of shape (minibatch, num_point, d) or a Pointclouds object.
  • weights – Batch of non-negative weights of shape (minibatch, num_point) or list of minibatch 1-dimensional tensors that may have different shapes; in that case, the length of i-th tensor should be equal to the number of points in X_i and Y_i. Passing None means uniform weights.
  • estimate_scale – If True, also estimates a scaling component s of the transformation. Otherwise assumes an identity scale and returns a tensor of ones.
  • allow_reflection – If True, allows the algorithm to return R which is orthonormal but has determinant==-1.
  • eps – A scalar for clamping to avoid dividing by zero. Active for the code that estimates the output scale s.
Returns:

3-element named tuple SimilarityTransform containing
  • R: Batch of orthonormal matrices of shape (minibatch, d, d).
  • T: Batch of translations of shape (minibatch, d).
  • s: Batch of scaling factors of shape (minibatch,).

References

[1] Shinji Umeyama: Least-Squares Estimation of Transformation Parameters Between Two Point Patterns
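
A minimal sketch recovering a known rigid transform between corresponding clouds (synthetic data; random_rotations is used only to generate a ground-truth rotation):

```
import torch
from pytorch3d.ops import corresponding_points_alignment
from pytorch3d.transforms import random_rotations

X = torch.randn(2, 100, 3)
R_gt = random_rotations(2)          # ground-truth rotations, (2, 3, 3)
T_gt = torch.randn(2, 1, 3)
Y = X @ R_gt + T_gt                 # corresponding points under X R + T

R, T, s = corresponding_points_alignment(X, Y, estimate_scale=False)
# R recovers R_gt, T recovers T_gt.squeeze(1), s is a tensor of ones
```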

pytorch3d.ops.iterative_closest_point(X: Union[torch.Tensor, Pointclouds], Y: Union[torch.Tensor, Pointclouds], init_transform: Optional[pytorch3d.ops.points_alignment.SimilarityTransform] = None, max_iterations: int = 100, relative_rmse_thr: float = 1e-06, estimate_scale: bool = False, allow_reflection: bool = False, verbose: bool = False) → pytorch3d.ops.points_alignment.ICPSolution[source]

Executes the iterative closest point (ICP) algorithm [1, 2] in order to find a similarity transformation (rotation R, translation T, and optionally scale s) between two given differently-sized sets of d-dimensional points X and Y, such that:

s[i] X[i] R[i] + T[i] = Y[NN[i]],

for all batch indices i in the least squares sense. Here, Y[NN[i]] stands for the indices of nearest neighbors from Y to each point in X. Note, however, that the solution is only a local optimum.

Parameters:
  • X – Batch of d-dimensional points of shape (minibatch, num_points_X, d) or a Pointclouds object.
  • Y – Batch of d-dimensional points of shape (minibatch, num_points_Y, d) or a Pointclouds object.
  • init_transform – A named-tuple SimilarityTransform of tensors R, T, s, where R is a batch of orthonormal matrices of shape (minibatch, d, d), T is a batch of translations of shape (minibatch, d) and s is a batch of scaling factors of shape (minibatch,).
  • max_iterations – The maximum number of ICP iterations.
  • relative_rmse_thr – A threshold on the relative root mean squared error used to terminate the algorithm.
  • estimate_scale – If True, also estimates a scaling component s of the transformation. Otherwise assumes the identity scale and returns a tensor of ones.
  • allow_reflection – If True, allows the algorithm to return R which is orthonormal but has determinant==-1.
  • verbose – If True, prints status messages during each ICP iteration.
Returns:

A named tuple ICPSolution with the following fields –

converged: A boolean flag denoting whether the algorithm converged successfully (=True) or not (=False).

rmse: Attained root mean squared error after termination of ICP.

Xt: The point cloud X transformed with the final transformation (R, T, s). If X is a Pointclouds object, returns an instance of Pointclouds, otherwise returns torch.Tensor.

RTs: A named tuple SimilarityTransform containing a batch of similarity transforms with fields:
  R: Batch of orthonormal matrices of shape (minibatch, d, d).
  T: Batch of translations of shape (minibatch, d).
  s: Batch of scaling factors of shape (minibatch,).

t_history: A list of named tuples SimilarityTransform containing the transformation parameters after each ICP iteration.

References

[1] Besl & McKay: A Method for Registration of 3-D Shapes. TPAMI, 1992.
[2] https://en.wikipedia.org/wiki/Iterative_closest_point
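
A minimal sketch aligning a random cloud to a translated copy of itself (synthetic data, illustrative settings):

```
import torch
from pytorch3d.ops import iterative_closest_point

X = torch.randn(1, 200, 3)
Y = X + torch.tensor([0.3, 0.0, 0.0])   # a translated copy of the cloud

sol = iterative_closest_point(X, Y, max_iterations=50)
print(sol.converged, sol.rmse)          # convergence flag and final RMSE
Xt = sol.Xt                             # X after applying the estimated (R, T, s)
```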

pytorch3d.ops.estimate_pointcloud_local_coord_frames(pointclouds: Union[torch.Tensor, Pointclouds], neighborhood_size: int = 50, disambiguate_directions: bool = True) → Tuple[torch.Tensor, torch.Tensor][source]

Estimates the principal directions of curvature (which includes normals) of a batch of pointclouds.

The algorithm first finds neighborhood_size nearest neighbors for each point of the point clouds, followed by obtaining principal vectors of covariance matrices of each of the point neighborhoods. The main principal vector corresponds to the normals, while the other 2 are the direction of the highest curvature and the 2nd highest curvature.

Note that each principal direction is given up to a sign. Hence, the function implements the disambiguate_directions switch that allows ensuring consistency of the sign of neighboring normals. The implementation follows the sign disambiguation from SHOT descriptors [1].

The algorithm also returns the curvature values themselves. These are the eigenvalues of the estimated covariance matrices of each point neighborhood.

Parameters:
  • pointclouds – Batch of 3-dimensional points of shape (minibatch, num_point, 3) or a Pointclouds object.
  • neighborhood_size – The size of the neighborhood used to estimate the geometry around each point.
  • disambiguate_directions – If True, uses the algorithm from [1] to ensure sign consistency of the normals of neighboring points.
Returns:

curvatures – The three principal curvatures of each point of shape (minibatch, num_point, 3). If pointclouds are of Pointclouds class, returns a padded tensor.

local_coord_frames – The three principal directions of the curvature around each point of shape (minibatch, num_point, 3, 3). The principal directions are stored in columns of the output. E.g. local_coord_frames[i, j, :, 0] is the normal of j-th point in the i-th pointcloud. If pointclouds are of Pointclouds class, returns a padded tensor.

References

[1] Tombari, Salti, Di Stefano: Unique Signatures of Histograms for Local Surface Description, ECCV 2010.

pytorch3d.ops.estimate_pointcloud_normals(pointclouds: Union[torch.Tensor, Pointclouds], neighborhood_size: int = 50, disambiguate_directions: bool = True) → torch.Tensor[source]

Estimates the normals of a batch of pointclouds.

The function uses estimate_pointcloud_local_coord_frames to estimate the normals. Please refer to that function for more detailed information.

Parameters:
  • pointclouds – Batch of 3-dimensional points of shape (minibatch, num_point, 3) or a Pointclouds object.
  • neighborhood_size – The size of the neighborhood used to estimate the geometry around each point.
  • disambiguate_directions – If True, uses the algorithm from [1] to ensure sign consistency of the normals of neighboring points.
Returns:

normals – A tensor of normals for each input point of shape (minibatch, num_point, 3). If pointclouds are of Pointclouds class, returns a padded tensor.

References

[1] Tombari, Salti, Di Stefano: Unique Signatures of Histograms for Local Surface Description, ECCV 2010.
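
A minimal sketch estimating normals for random points on the unit sphere (illustrative data and neighborhood size):

```
import torch
from pytorch3d.ops import estimate_pointcloud_normals

# Random points on the unit sphere.
pts = torch.randn(2, 1000, 3)
pts = pts / pts.norm(dim=-1, keepdim=True)

normals = estimate_pointcloud_normals(pts, neighborhood_size=30)
# normals: (2, 1000, 3); for a unit sphere these roughly align (up to sign)
# with the point directions
```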

pytorch3d.ops.add_pointclouds_to_volumes(pointclouds: Pointclouds, initial_volumes: Volumes, mode: str = 'trilinear', min_weight: float = 0.0001, _python: bool = False) → Volumes[source]

Add a batch of point clouds represented with a Pointclouds structure pointclouds to a batch of existing volumes represented with a Volumes structure initial_volumes.

More specifically, the method casts a set of weighted votes (the weights are determined based on mode=”trilinear”|”nearest”) into the pre-initialized features and densities fields of initial_volumes.

The method returns an updated Volumes object that contains a copy of initial_volumes with its features and densities updated with the result of the pointcloud addition.

Example

```
# init a random point cloud
pointclouds = Pointclouds(
    points=torch.randn(4, 100, 3), features=torch.rand(4, 100, 5)
)
# init an empty volume centered around [0.5, 0.5, 0.5] in world coordinates
# with a voxel size of 1.0.
initial_volumes = Volumes(
    features=torch.zeros(4, 5, 25, 25, 25),
    densities=torch.zeros(4, 1, 25, 25, 25),
    volume_translation=[-0.5, -0.5, -0.5],
    voxel_size=1.0,
)
# add the pointcloud to the 'initial_volumes' buffer using
# trilinear splatting
updated_volumes = add_pointclouds_to_volumes(
    pointclouds=pointclouds,
    initial_volumes=initial_volumes,
    mode="trilinear",
)
```
Parameters:
  • pointclouds – Batch of 3D pointclouds represented with a Pointclouds structure. Note that pointclouds.features have to be defined.
  • initial_volumes – Batch of initial Volumes with pre-initialized 1-dimensional densities which contain non-negative numbers corresponding to the opaqueness of each voxel (the higher, the less transparent).
  • mode – The mode of the conversion of individual points into the volume. Set either to nearest or trilinear:
    nearest: Each 3D point is first rounded to the volumetric lattice. Each voxel is then labeled with the average over features that fall into the given voxel. The gradients of nearest neighbor conversion w.r.t. the 3D locations of the points in pointclouds are not defined.
    trilinear: Each 3D point casts 8 weighted votes to the 8-neighborhood of its floating point coordinate. The weights are determined using a trilinear interpolation scheme. Trilinear splatting is fully differentiable w.r.t. all input arguments.
  • min_weight – A scalar controlling the lowest possible total per-voxel weight used to normalize the features accumulated in a voxel. Only active for mode==trilinear.
  • _python – Set to True to use a pure Python implementation, e.g. for test purposes, which requires more memory and may be slower.
Returns:

updated_volumes – Output Volumes structure containing the conversion result.

pytorch3d.ops.add_points_features_to_volume_densities_features(points_3d: torch.Tensor, points_features: torch.Tensor, volume_densities: torch.Tensor, volume_features: Optional[torch.Tensor], mode: str = 'trilinear', min_weight: float = 0.0001, mask: Optional[torch.Tensor] = None, grid_sizes: Optional[torch.LongTensor] = None, _python: bool = False) → Tuple[torch.Tensor, torch.Tensor][source]

Convert a batch of point clouds represented with tensors of per-point 3d coordinates and their features to a batch of volumes represented with tensors of densities and features.

Parameters:
  • points_3d – Batch of 3D point cloud coordinates of shape (minibatch, N, 3) where N is the number of points in each point cloud. Coordinates have to be specified in the local volume coordinates (ranging in [-1, 1]).
  • points_features – Features of shape (minibatch, N, feature_dim) corresponding to the points of the input point clouds pointcloud.
  • volume_densities – Batch of input feature volume densities of shape (minibatch, 1, D, H, W). Each voxel should contain a non-negative number corresponding to its opaqueness (the higher, the less transparent).
  • volume_features – Batch of input feature volumes of shape (minibatch, feature_dim, D, H, W) If set to None, the volume_features will be automatically instantiated with a correct size and filled with 0s.
  • mode – The mode of the conversion of individual points into the volume. Set either to nearest or trilinear:
    nearest: Each 3D point is first rounded to the volumetric lattice. Each voxel is then labeled with the average over features that fall into the given voxel. The gradients of nearest neighbor rounding w.r.t. the input point locations points_3d are not defined.
    trilinear: Each 3D point casts 8 weighted votes to the 8-neighborhood of its floating point coordinate. The weights are determined using a trilinear interpolation scheme. Trilinear splatting is fully differentiable w.r.t. all input arguments.
  • min_weight – A scalar controlling the lowest possible total per-voxel weight used to normalize the features accumulated in a voxel. Only active for mode==trilinear.
  • mask – A binary mask of shape (minibatch, N) determining which 3D points are going to be converted to the resulting volume. Set to None if all points are valid.
  • grid_sizes – LongTensor of shape (minibatch, 3) representing the spatial resolutions of each of the non-flattened volumes tensors, or None to indicate the whole volume is used for every batch element.
  • _python – Set to True to use a pure Python implementation.
Returns:

volume_features – Output volume of shape (minibatch, feature_dim, D, H, W)
volume_densities – Occupancy volume of shape (minibatch, 1, D, H, W) containing the total amount of votes cast to each of the voxels.

pytorch3d.ops.sample_farthest_points(points: torch.Tensor, lengths: Optional[torch.Tensor] = None, K: Union[int, List[T], torch.Tensor] = 50, random_start_point: bool = False) → Tuple[torch.Tensor, torch.Tensor][source]

Iterative farthest point sampling algorithm [1] to subsample a set of K points from a given pointcloud. At each iteration, a point is selected which has the largest nearest neighbor distance to any of the already selected points.

Farthest point sampling provides more uniform coverage of the input point cloud compared to uniform random sampling.

[1] Charles R. Qi et al, “PointNet++: Deep Hierarchical Feature Learning
on Point Sets in a Metric Space”, NeurIPS 2017.
Parameters:
  • points – (N, P, D) array containing the batch of pointclouds
  • lengths – (N,) number of points in each pointcloud (to support heterogeneous batches of pointclouds)
  • K – samples required in each sampled point cloud (this is typically << P). If K is an int then the same number of samples are selected for each pointcloud in the batch. If K is a tensor it should be of length (N,), giving the number of samples to select for each element in the batch
  • random_start_point – bool, if True, a random point is selected as the starting point for iterative sampling.
Returns:

selected_points – (N, K, D) array of selected values from points. If the input K is a tensor, then the shape will be (N, max(K), D), and padded with 0.0 for batch elements where k_i < max(K).

selected_indices – (N, K) array of selected indices. If the input K is a tensor, then the shape will be (N, max(K)), and padded with -1 for batch elements where k_i < max(K).
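
A minimal sketch subsampling K points from random clouds (sizes are illustrative):

```
import torch
from pytorch3d.ops import sample_farthest_points

points = torch.randn(2, 5000, 3)
selected_points, selected_idx = sample_farthest_points(points, K=128)
# selected_points: (2, 128, 3), selected_idx: (2, 128)
```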

pytorch3d.ops.sample_points_from_meshes(meshes, num_samples: int = 10000, return_normals: bool = False, return_textures: bool = False) → Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]][source]

Convert a batch of meshes to a batch of pointclouds by uniformly sampling points on the surface of the mesh with probability proportional to the face area.

Parameters:
  • meshes – A Meshes object with a batch of N meshes.
  • num_samples – Integer giving the number of point samples per mesh.
  • return_normals – If True, return normals for the sampled points.
  • return_textures – If True, return textures for the sampled points.
Returns:

3-element tuple containing

  • samples: FloatTensor of shape (N, num_samples, 3) giving the coordinates of sampled points for each mesh in the batch. For empty meshes the corresponding row in the samples array will be filled with 0.
  • normals: FloatTensor of shape (N, num_samples, 3) giving a normal vector to each sampled point. Only returned if return_normals is True. For empty meshes the corresponding row in the normals array will be filled with 0.
  • textures: FloatTensor of shape (N, num_samples, C) giving a C-dimensional texture vector to each sampled point. Only returned if return_textures is True. For empty meshes the corresponding row in the textures array will be filled with 0.

Note that in a future release, we will replace the 3-element tuple output with a Pointclouds datastructure, as follows

Pointclouds(samples, normals=normals, features=textures)

class pytorch3d.ops.SubdivideMeshes(meshes=None)[source]

Subdivide a triangle mesh by adding a new vertex at the center of each edge and dividing each face into four new faces. Vectors of vertex attributes can also be subdivided by averaging the values of the attributes at the two vertices which form each edge. This implementation preserves face orientation - if the vertices of a face are all ordered counter-clockwise, then the faces in the subdivided meshes will also have their vertices ordered counter-clockwise.

If meshes is provided as an input, the initializer performs the relatively expensive computation of determining the new face indices. This one-time computation can be reused for all meshes with the same face topology but different vertex positions.
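
A minimal sketch subdividing an ico_sphere once (illustrative; the vertex and face counts in the comments apply to this particular input):

```
from pytorch3d.ops import SubdivideMeshes
from pytorch3d.utils import ico_sphere

meshes = ico_sphere(level=1)       # 42 verts, 80 faces
subdivide = SubdivideMeshes()
new_meshes = subdivide(meshes)     # 162 verts, 320 faces
```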

__init__(meshes=None) → None[source]
Parameters:meshes – Meshes object or None. If a meshes object is provided, the first mesh is used to compute the new faces of the subdivided topology which can be reused for meshes with the same input topology.
subdivide_faces(meshes)[source]
Parameters:meshes – a Meshes object.
Returns:subdivided_faces_packed – (4*sum(F_n), 3) shape LongTensor of original and new faces.

Refer to pytorch3d.structures.meshes.py for more details on packed representations of faces.

Each face is split into 4 faces. E.g. input face:

         v0
         /\
        /  \
       /    \
   e1 /      \ e0
     /        \
    /          \
   /            \
  /______________\
v2       e2       v1

faces_packed = [[0, 1, 2]]
faces_packed_to_edges_packed = [[2, 1, 0]]

faces_packed_to_edges_packed is used to represent all the new vertex indices corresponding to the mid-points of edges in the mesh. The actual vertex coordinates will be computed in the forward function. To get the indices of the new vertices, offset faces_packed_to_edges_packed by the total number of vertices.

faces_packed_to_edges_packed = [[2, 1, 0]] + 3 = [[5, 4, 3]]

e.g. subdivided face

        v0
        /\
       /  \
      / f0 \
  v4 /______\ v3
    /\      /\
   /  \ f3 /  \
  / f2 \  / f1 \
 /______\/______\
v2       v5       v1

f0 = [0, 3, 4]
f1 = [1, 5, 3]
f2 = [2, 4, 5]
f3 = [5, 4, 3]
forward(meshes, feats=None)[source]

Subdivide a batch of meshes by adding a new vertex on each edge, and dividing each face into four new faces. New meshes contain two types of vertices:
  1. Vertices that appear in the input meshes. Data for these vertices are copied from the input meshes.
  2. New vertices at the midpoint of each edge. Data for these vertices is the average of the data for the two vertices that make up the edge.
Parameters:
  • meshes – Meshes object representing a batch of meshes.
  • feats – Per-vertex features to be subdivided along with the verts. Should be parallel to the packed vert representation of the input meshes; so it should have shape (V, D) where V is the total number of verts in the input meshes. Default: None.
Returns:

2-element tuple containing

  • new_meshes: Meshes object of a batch of subdivided meshes.
  • new_feats: (optional) Tensor of subdivided feats, parallel to the (packed) vertices of the subdivided meshes. Only returned if feats is not None.

subdivide_homogeneous(meshes, feats=None)[source]

Subdivide verts (and optionally features) of a batch of meshes where each mesh has the same topology of faces. The subdivided faces are precomputed in the initializer.

Parameters:
  • meshes – Meshes object representing a batch of meshes.
  • feats – Per-vertex features to be subdivided along with the verts.
Returns:

2-element tuple containing

  • new_meshes: Meshes object of a batch of subdivided meshes.
  • new_feats: (optional) Tensor of subdivided feats, parallel to the (packed) vertices of the subdivided meshes. Only returned if feats is not None.

subdivide_heterogenerous(meshes, feats=None)[source]

Subdivide faces, verts (and optionally features) of a batch of meshes where each mesh can have different face topologies.

Parameters:
  • meshes – Meshes object representing a batch of meshes.
  • feats – Per-vertex features to be subdivided along with the verts.
Returns:

2-element tuple containing

  • new_meshes: Meshes object of a batch of subdivided meshes.
  • new_feats: (optional) Tensor of subdivided feats, parallel to the (packed) vertices of the subdivided meshes. Only returned if feats is not None.

pytorch3d.ops.convert_pointclouds_to_tensor(pcl: Union[torch.Tensor, Pointclouds])[source]

If type(pcl)==Pointclouds, converts a pcl object to a padded representation and returns it together with the number of points per batch. Otherwise, returns the input itself with the number of points set to the size of the second dimension of pcl.

pytorch3d.ops.eyes(dim: int, N: int, device: Optional[torch.device] = None, dtype: torch.dtype = torch.float32) → torch.Tensor[source]

Generates a batch of N identity matrices of shape (N, dim, dim).

Parameters:
  • dim – The dimensionality of the identity matrices.
  • N – The number of identity matrices.
  • device – The device to be used for allocating the matrices.
  • dtype – The datatype of the matrices.
Returns:

identities – A batch of identity matrices of shape (N, dim, dim).

pytorch3d.ops.get_point_covariances(points_padded: torch.Tensor, num_points_per_cloud: torch.Tensor, neighborhood_size: int) → Tuple[torch.Tensor, torch.Tensor][source]

Computes the per-point covariance matrices of the 3D locations of the K-nearest neighbors of each point.

Parameters:
  • points_padded – Input point clouds as a padded tensor of shape (minibatch, num_points, dim).
  • num_points_per_cloud – Number of points per cloud of shape (minibatch,).
  • neighborhood_size – Number of nearest neighbors for each point used to estimate the covariance matrices.
Returns:

covariances – A batch of per-point covariance matrices of shape (minibatch, dim, dim).

k_nearest_neighbors – A batch of neighborhood_size nearest neighbors for each of the point cloud points of shape (minibatch, num_points, neighborhood_size, dim).

pytorch3d.ops.is_pointclouds(pcl: Union[torch.Tensor, Pointclouds])[source]

Checks whether the input pcl is an instance of Pointclouds by checking the existence of points_padded and num_points_per_cloud functions.

pytorch3d.ops.wmean(x: torch.Tensor, weight: Optional[torch.Tensor] = None, dim: Union[int, Tuple[int]] = -2, keepdim: bool = True, eps: float = 1e-09) → torch.Tensor[source]

Finds the mean of the input tensor across the specified dimension. If the weight argument is provided, computes a weighted mean.

Parameters:
  • x – tensor of shape (*, D), where D is assumed to be spatial;
  • weight – if given, non-negative tensor of shape (*,). It must be broadcastable to x.shape[:-1]. Note that the weights for the last (spatial) dimension are assumed same;
  • dim – dimension(s) in x to average over;
  • keepdim – tells whether to keep the resulting singleton dimension.
  • eps – minimum clamping value in the denominator.
Returns:

the mean tensor

  • if weights is None => mean(x, dim),
  • otherwise => sum(x*w, dim) / max{sum(w, dim), eps}.

pytorch3d.ops.vert_align(feats, verts, return_packed: bool = False, interp_mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: bool = True) → torch.Tensor[source]

Sample vertex features from a feature map. This operation is called “perceptual feature pooling” in [1] or “vert align” in [2].

[1] Wang et al, “Pixel2Mesh: Generating 3D Mesh Models from Single
RGB Images”, ECCV 2018.

[2] Gkioxari et al, “Mesh R-CNN”, ICCV 2019

Parameters:
  • feats – FloatTensor of shape (N, C, H, W) representing image features from which to sample or a list of features each with potentially different C, H or W dimensions.
  • verts – FloatTensor of shape (N, V, 3) or an object (e.g. Meshes or Pointclouds) with verts_padded or points_padded as an attribute giving the (x, y, z) vertex positions for which to sample. (x, y) verts should be normalized such that (-1, -1) corresponds to top-left and (+1, +1) to bottom-right location in the input feature map.
  • return_packed – (bool) Indicates whether to return packed features
  • interp_mode – (str) Specifies how to interpolate features. (‘bilinear’ or ‘nearest’)
  • padding_mode – (str) Specifies how to handle vertices outside of the [-1, 1] range. (‘zeros’, ‘reflection’, or ‘border’)
  • align_corners (bool) – Geometrically, we consider the pixels of the input as squares rather than points. If set to True, the extrema (-1 and 1) are considered as referring to the center points of the input’s corner pixels. If set to False, they are instead considered as referring to the corner points of the input’s corner pixels, making the sampling more resolution agnostic. Default: True
Returns:

feats_sampled – FloatTensor of shape (N, V, C) giving sampled features for each vertex. If feats is a list, we return concatenated features in axis=2 of shape (N, V, sum(C_n)) where C_n = feats[n].shape[1]. If return_packed = True, the features are transformed to a packed representation of shape (sum(V), C)
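
A minimal sketch sampling random image features at vertex locations (illustrative data; it assumes the (x, y) components of verts are already normalized to [-1, 1]):

```
import torch
from pytorch3d.ops import vert_align

feats = torch.randn(2, 64, 32, 32)          # (N, C, H, W) image features
verts = torch.rand(2, 100, 3) * 2.0 - 1.0   # (N, V, 3) with (x, y) in [-1, 1]

feats_sampled = vert_align(feats, verts)    # (2, 100, 64)
```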

pytorch3d.renderer

rasterizer

pytorch3d.renderer.mesh.rasterize_meshes.rasterize_meshes(meshes, image_size: Union[int, List[int], Tuple[int, int]] = 256, blur_radius: float = 0.0, faces_per_pixel: int = 8, bin_size: Optional[int] = None, max_faces_per_bin: Optional[int] = None, perspective_correct: bool = False, clip_barycentric_coords: bool = False, cull_backfaces: bool = False, z_clip_value: Optional[float] = None, cull_to_frustum: bool = False)[source]

Rasterize a batch of meshes given the shape of the desired output image. Each mesh is rasterized onto a separate image of shape (H, W) if image_size is a tuple or (image_size, image_size) if it is an int.

If the desired image size is non square (i.e. a tuple of (H, W) where H != W) the aspect ratio needs special consideration. There are two aspect ratios to be aware of:

  • the aspect ratio of each pixel
  • the aspect ratio of the output image

The camera can be used to set the pixel aspect ratio. In the rasterizer, we assume square pixels, but variable image aspect ratio (i.e. rectangular images).

In most cases you will want to set the camera aspect ratio to 1.0 (i.e. square pixels) and only vary the image_size (i.e. the output image dimensions in pixels).

Parameters:
  • meshes – A Meshes object representing a batch of meshes, batch size N.
  • image_size – Size in pixels of the output image to be rasterized. Can optionally be a tuple of (H, W) in the case of non square images.
  • blur_radius – Float distance in the range [0, 2] used to expand the face bounding boxes for rasterization. Setting blur radius results in blurred edges around the shape instead of a hard boundary. Set to 0 for no blur.
  • faces_per_pixel (Optional) – Number of faces to save per pixel, returning the nearest faces_per_pixel points along the z-axis.
  • bin_size – Size of bins to use for coarse-to-fine rasterization. Setting bin_size=0 uses naive rasterization; setting bin_size=None attempts to set it heuristically based on the shape of the input. This should not affect the output, but can affect the speed of the forward pass.
  • max_faces_per_bin – Only applicable when using coarse-to-fine rasterization (bin_size > 0); this is the maximum number of faces allowed within each bin. This should not affect the output values, but can affect the memory usage in the forward pass.
  • perspective_correct – Bool, Whether to apply perspective correction when computing barycentric coordinates for pixels. This should be set to True if a perspective camera is used.
  • clip_barycentric_coords – Whether, after any perspective correction is applied but before the depth is calculated (e.g. for z clipping), to “correct” a location outside the face (i.e. with a negative barycentric coordinate) to a position on the edge of the face.
  • cull_backfaces – Bool, Whether to only rasterize mesh faces which are visible to the camera. This assumes that vertices of front-facing triangles are ordered in an anti-clockwise fashion, and triangles that face away from the camera are in a clockwise order relative to the current view direction. NOTE: This will only work if the mesh faces are consistently defined with counter-clockwise ordering when viewed from the outside.
  • z_clip_value – if not None, then triangles will be clipped (and possibly subdivided into smaller triangles) such that z >= z_clip_value. This avoids camera projections that go to infinity as z->0. Default is None as clipping affects rasterization speed and should only be turned on if explicitly needed. See clip.py for all the extra computation that is required.
  • cull_to_frustum – if True, triangles outside the view frustum will be culled. Culling involves removing all faces which fall outside view frustum. Default is False so that it is turned on only when needed.
Returns:

4-element tuple containing

  • pix_to_face: LongTensor of shape (N, image_size, image_size, faces_per_pixel) giving the indices of the nearest faces at each pixel, sorted in ascending z-order. Concretely pix_to_face[n, y, x, k] = f means that faces_verts[f] is the kth closest face (in the z-direction) to pixel (y, x). Pixels that are hit by fewer than faces_per_pixel are padded with -1.
  • zbuf: FloatTensor of shape (N, image_size, image_size, faces_per_pixel) giving the NDC z-coordinates of the nearest faces at each pixel, sorted in ascending z-order. Concretely, if pix_to_face[n, y, x, k] = f then zbuf[n, y, x, k] = face_verts[f, 2]. Pixels hit by fewer than faces_per_pixel are padded with -1.
  • barycentric: FloatTensor of shape (N, image_size, image_size, faces_per_pixel, 3) giving the barycentric coordinates in NDC units of the nearest faces at each pixel, sorted in ascending z-order. Concretely, if pix_to_face[n, y, x, k] = f then [w0, w1, w2] = barycentric[n, y, x, k] gives the barycentric coords for pixel (y, x) relative to the face defined by face_verts[f]. Pixels hit by fewer than faces_per_pixel are padded with -1.
  • pix_dists: FloatTensor of shape (N, image_size, image_size, faces_per_pixel) giving the signed Euclidean distance (in NDC units) in the x/y plane of each point closest to the pixel. Concretely if pix_to_face[n, y, x, k] = f then pix_dists[n, y, x, k] is the squared distance between the pixel (y, x) and the face given by vertices face_verts[f]. Pixels hit with fewer than faces_per_pixel are padded with -1.

In the case that image_size is a tuple of (H, W) then the outputs will be of shape (N, H, W, …).
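As an illustration of these outputs, here is a minimal sketch that rasterizes a single triangle whose vertices are assumed to already be in PyTorch3D's NDC space (rasterize_meshes does not apply any camera transform itself); the vertex values are arbitrary:

import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer.mesh.rasterize_meshes import rasterize_meshes

# A single triangle with vertices already in NDC-like coordinates.
verts = torch.tensor([[-0.7, -0.7, 1.0], [0.7, -0.7, 1.0], [0.0, 0.7, 1.0]])
faces = torch.tensor([[0, 1, 2]], dtype=torch.int64)
meshes = Meshes(verts=[verts], faces=[faces])

pix_to_face, zbuf, barycentric, pix_dists = rasterize_meshes(
    meshes, image_size=128, blur_radius=0.0, faces_per_pixel=1
)
print(pix_to_face.shape)  # torch.Size([1, 128, 128, 1])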

pytorch3d.renderer.mesh.rasterize_meshes.non_square_ndc_range(S1, S2)[source]

In the case of non square images, we scale the NDC range to maintain the aspect ratio. The smaller dimension has NDC range of 2.0.

Parameters:
  • S1 – dimension along which the NDC range is needed
  • S2 – the other image dimension
Returns:

ndc_range – NDC range for dimension S1

pytorch3d.renderer.mesh.rasterize_meshes.pix_to_non_square_ndc(i, S1, S2)[source]

The default NDC range is [-1, 1]. However, in the case of non square images, we scale the NDC range to maintain the aspect ratio: the smaller dimension has NDC range [-1, 1] and the larger dimension is scaled up by the ratio of the longer side to the shorter side. E.g. for image size (H, W) = (64, 128):

Height NDC range: [-1, 1]; Width NDC range: [-2, 2]
Parameters:
  • i – pixel position on axis S1
  • S1 – dimension along which i is given
  • S2 – the other image dimension
Returns:

pixel – NDC coordinate of point i for dimension S1
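The scaling rule described above can be summarised in a small standalone sketch (this re-implements the documented convention for illustration and is not the library source):

def ndc_range_sketch(S1, S2):
    # The smaller image dimension keeps the default NDC range of 2.0 ([-1, 1]);
    # the larger dimension is scaled up by the ratio of the two sides.
    return 2.0 * S1 / S2 if S1 > S2 else 2.0

# For an image of size (H, W) = (64, 128):
print(ndc_range_sketch(64, 128))   # 2.0 -> height NDC range [-1, 1]
print(ndc_range_sketch(128, 64))   # 4.0 -> width  NDC range [-2, 2]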

pytorch3d.renderer.mesh.rasterize_meshes.rasterize_meshes_python(meshes, image_size: Union[int, Tuple[int, int]] = 256, blur_radius: float = 0.0, faces_per_pixel: int = 8, perspective_correct: bool = False, clip_barycentric_coords: bool = False, cull_backfaces: bool = False, z_clip_value: Optional[float] = None, cull_to_frustum: bool = True, clipped_faces_neighbor_idx: Optional[torch.Tensor] = None)[source]

Naive PyTorch implementation of mesh rasterization with the same inputs and outputs as the rasterize_meshes function.

This function is not optimized and is implemented as a comparison for the C++/CUDA implementations.

pytorch3d.renderer.mesh.rasterize_meshes.edge_function(p, v0, v1)[source]

Determines whether a point p is on the right side of a 2D line segment given by the end points v0, v1.

Parameters:
  • p – (x, y) Coordinates of a point.
  • v0, v1 – (x, y) coordinates of the end points of the edge.
Returns:

area

The signed area of the parallelogram given by the vectors

B = p - v0
A = v1 - v0

      v1 ________
        /\      /
    A  /  \    /
      /    \  /
  v0 /______\/
        B    p

The area can also be interpreted as the cross product A x B. If the sign of the area is positive, the point p is on the right side of the edge. Negative area indicates the point is on the left side of the edge. i.e. for an edge v1 - v0

         v1
        /
       /
-     /    +
     /
    /
  v0

pytorch3d.renderer.mesh.rasterize_meshes.barycentric_coordinates_clip(bary)[source]

Clip negative barycentric coordinates to 0.0 and renormalize so the barycentric coordinates for a point sum to 1. When the blur_radius is greater than 0, a face will still be recorded as overlapping a pixel if the pixel is outside the face. In this case at least one of the barycentric coordinates for the pixel relative to the face will be negative. Clipping will ensure that the texture and z buffer are interpolated correctly.

Parameters:bary – tuple of barycentric coordinates
Returns
bary_clip: (w0, w1, w2) barycentric coordinates with no negative values.
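A minimal sketch of this clip-and-renormalize step (the epsilon guard below is illustrative, not the library's exact value):

import torch

def clip_barycentric_sketch(bary):
    # Clamp negative coordinates to 0, then renormalize so the three values sum to 1.
    clipped = bary.clamp(min=0.0)
    return clipped / clipped.sum(dim=-1, keepdim=True).clamp(min=1e-8)

print(clip_barycentric_sketch(torch.tensor([1.2, -0.1, -0.1])))  # tensor([1., 0., 0.])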
pytorch3d.renderer.mesh.rasterize_meshes.barycentric_coordinates(p, v0, v1, v2)[source]

Compute the barycentric coordinates of a point relative to a triangle.

Parameters:
  • p – Coordinates of a point.
  • v0, v1, v2 – Coordinates of the triangle vertices.
Returns
bary: (w0, w1, w2) barycentric coordinates in the range [0, 1].
pytorch3d.renderer.mesh.rasterize_meshes.point_line_distance(p, v0, v1)[source]

Return minimum distance between line segment (v1 - v0) and point p.

Parameters:
  • p – Coordinates of a point.
  • v0, v1 – Coordinates of the end points of the line segment.
Returns:

non-square distance to the boundary of the triangle.

Consider the line extending the segment - this can be parameterized as v0 + t (v1 - v0).

First find the projection of point p onto the line. It falls where t = [(p - v0) . (v1 - v0)] / |v1 - v0|^2 where . is the dot product.

The parameter t is clamped from [0, 1] to handle points outside the segment (v1 - v0).

Once the projection of the point on the segment is known, the distance from p to the projection gives the minimum distance to the segment.
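The steps above translate directly into a short sketch (shown here with the plain, unsquared Euclidean distance; the library's return convention may differ):

import torch

def point_line_distance_sketch(p, v0, v1):
    seg = v1 - v0
    # t = [(p - v0) . (v1 - v0)] / |v1 - v0|^2, clamped to [0, 1] to stay on the segment.
    t = (torch.dot(p - v0, seg) / torch.dot(seg, seg).clamp(min=1e-12)).clamp(0.0, 1.0)
    proj = v0 + t * seg
    return torch.norm(p - proj)

p, v0, v1 = torch.tensor([2.0, 1.0]), torch.tensor([0.0, 0.0]), torch.tensor([1.0, 0.0])
print(point_line_distance_sketch(p, v0, v1))  # ~1.4142 (closest point on the segment is v1)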

pytorch3d.renderer.mesh.rasterize_meshes.point_triangle_distance(p, v0, v1, v2)[source]

Return shortest distance between a point and a triangle.

Parameters:
  • p – Coordinates of a point.
  • v0, v1, v2 – Coordinates of the three triangle vertices.
Returns:

shortest absolute distance from the point to the triangle.

class pytorch3d.renderer.mesh.rasterizer.Fragments(pix_to_face, zbuf, bary_coords, dists)[source]
pix_to_face

Alias for field number 0

zbuf

Alias for field number 1

bary_coords

Alias for field number 2

dists

Alias for field number 3

class pytorch3d.renderer.mesh.rasterizer.RasterizationSettings(image_size: Union[int, Tuple[int, int]] = 256, blur_radius: float = 0.0, faces_per_pixel: int = 1, bin_size: Optional[int] = None, max_faces_per_bin: Optional[int] = None, perspective_correct: Optional[bool] = None, clip_barycentric_coords: Optional[bool] = None, cull_backfaces: bool = False, z_clip_value: Optional[float] = None, cull_to_frustum: bool = False)[source]

Class to store the mesh rasterization params with defaults

Members:

image_size: Either common height and width or (height, width), in pixels.
blur_radius: Float distance in the range [0, 2] used to expand the face bounding boxes for rasterization. Setting blur radius results in blurred edges around the shape instead of a hard boundary. Set to 0 for no blur.
faces_per_pixel: (int) Number of faces to keep track of per pixel. We return the nearest faces_per_pixel faces along the z-axis.
bin_size: Size of bins to use for coarse-to-fine rasterization. Setting bin_size=0 uses naive rasterization; setting bin_size=None attempts to set it heuristically based on the shape of the input. This should not affect the output, but can affect the speed of the forward pass.
max_faces_per_bin: Only applicable when using coarse-to-fine rasterization (bin_size != 0); this is the maximum number of faces allowed within each bin. This should not affect the output values, but can affect the memory usage in the forward pass. Setting max_faces_per_bin=None attempts to set with a heuristic.
perspective_correct: Whether to apply perspective correction when computing barycentric coordinates for pixels. None (default) means make correction if the camera uses perspective.
clip_barycentric_coords: Whether, after any perspective correction is applied but before the depth is calculated (e.g. for z clipping), to “correct” a location outside the face (i.e. with a negative barycentric coordinate) to a position on the edge of the face. None (default) means clip if blur_radius > 0, which is a condition under which such outside-face-points are likely.
cull_backfaces: Whether to only rasterize mesh faces which are visible to the camera. This assumes that vertices of front-facing triangles are ordered in an anti-clockwise fashion, and triangles that face away from the camera are in a clockwise order relative to the current view direction. NOTE: This will only work if the mesh faces are consistently defined with counter-clockwise ordering when viewed from the outside.
z_clip_value: if not None, then triangles will be clipped (and possibly subdivided into smaller triangles) such that z >= z_clip_value. This avoids camera projections that go to infinity as z->0. Default is None as clipping affects rasterization speed and should only be turned on if explicitly needed. See clip.py for all the extra computation that is required.
cull_to_frustum: Whether to cull triangles outside the view frustum. Culling involves removing all faces which fall outside the view frustum. Default is False for performance as often not needed.
image_size = 256
blur_radius = 0.0
faces_per_pixel = 1
bin_size = None
max_faces_per_bin = None
perspective_correct = None
clip_barycentric_coords = None
cull_backfaces = False
z_clip_value = None
cull_to_frustum = False
class pytorch3d.renderer.mesh.rasterizer.MeshRasterizer(cameras=None, raster_settings=None)[source]

This class implements methods for rasterizing a batch of heterogeneous Meshes.

__init__(cameras=None, raster_settings=None) → None[source]
Parameters:
  • cameras – A cameras object which has a transform_points method which returns the transformed points after applying the world-to-view and view-to-ndc transformations.
  • raster_settings – the parameters for rasterization. This should be a named tuple.

All these initial settings can be overridden by passing keyword arguments to the forward function.

to(device)[source]
transform(meshes_world, **kwargs) → torch.Tensor[source]
Parameters:meshes_world – a Meshes object representing a batch of meshes with vertex coordinates in world space.
Returns:meshes_proj – a Meshes object with the vertex positions projected in NDC space

NOTE: keeping this as a separate function for readability but it could be moved into forward.

forward(meshes_world, **kwargs) → pytorch3d.renderer.mesh.rasterizer.Fragments[source]
Parameters:meshes_world – a Meshes object representing a batch of meshes with coordinates in world space.
Returns:Fragments – Rasterization outputs as a named tuple.
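For example, a minimal end-to-end sketch of the rasterizer (ico_sphere from pytorch3d.utils is used here only as a convenient test mesh, and the camera translation is illustrative):

import torch
from pytorch3d.renderer import FoVPerspectiveCameras, MeshRasterizer, RasterizationSettings
from pytorch3d.utils import ico_sphere

mesh = ico_sphere(level=2)                                   # unit sphere test mesh
cameras = FoVPerspectiveCameras(T=torch.tensor([[0.0, 0.0, 3.0]]))
raster_settings = RasterizationSettings(image_size=256, blur_radius=0.0, faces_per_pixel=1)

rasterizer = MeshRasterizer(cameras=cameras, raster_settings=raster_settings)
fragments = rasterizer(mesh)                                 # returns a Fragments named tuple
print(fragments.pix_to_face.shape)                           # torch.Size([1, 256, 256, 1])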

cameras

class pytorch3d.renderer.cameras.CamerasBase(dtype: torch.dtype = torch.float32, device: Union[str, torch.device] = 'cpu', **kwargs)[source]

Bases: pytorch3d.renderer.utils.TensorProperties

CamerasBase implements a base class for all cameras.

For cameras, there are four different coordinate systems (or spaces):
  • World coordinate system: This is the system the object lives in - the world.
  • Camera view coordinate system: This is the system that has its origin on the camera and the Z-axis perpendicular to the image plane. In PyTorch3D, we assume that +X points left, +Y points up and +Z points out from the image plane. The transformation from world –> view happens after applying a rotation (R) and translation (T).
  • NDC coordinate system: This is the normalized coordinate system that confines
    in a volume the rendered part of the object or scene. Also known as view volume. Given the PyTorch3D convention, (+1, +1, znear) is the top left near corner, and (-1, -1, zfar) is the bottom right far corner of the volume. The transformation from view –> NDC happens after applying the camera projection matrix (P) if defined in NDC space.
  • Screen coordinate system: This is another representation of the view volume with
    the XY coordinates defined in image space instead of a normalized space.

A better illustration of the coordinate systems can be found in pytorch3d/docs/notes/cameras.md.

It defines methods that are common to all camera models:
  • get_camera_center that returns the optical center of the camera in
    world coordinates
  • get_world_to_view_transform which returns a 3D transform from
    world coordinates to the camera view coordinates (R, T)
  • get_full_projection_transform which composes the projection
    transform (P) with the world-to-view transform (R, T)
  • transform_points which takes a set of input points in world coordinates and
    projects to the space the camera is defined in (NDC or screen)
  • get_ndc_camera_transform which defines the transform from screen/NDC to
    PyTorch3D’s NDC space
  • transform_points_ndc which takes a set of points in world coordinates and
    projects them to PyTorch3D’s NDC space
  • transform_points_screen which takes a set of points in world coordinates and
    projects them to screen space

For each new camera, one should implement the get_projection_transform routine that returns the mapping from camera view coordinates to camera coordinates (NDC or screen).

Another useful function that is specific to each camera model is unproject_points which sends points from camera coordinates (NDC or screen) back to camera view or world coordinates depending on the world_coordinates boolean argument of the function.

get_projection_transform()[source]

Calculate the projective transformation matrix.

Parameters:**kwargs – parameters for the projection can be passed in as keyword arguments to override the default values set in __init__.
Returns:a Transform3d object which represents a batch of projection matrices of shape (N, 3, 3)
unproject_points()[source]

Transform input points from camera coordinates (NDC or screen) to the world / camera coordinates.

Each of the input points xy_depth of shape (…, 3) is a concatenation of the x, y location and its depth.

For instance, for an input 2D tensor of shape (num_points, 3) xy_depth takes the following form:

xy_depth[i] = [x[i], y[i], depth[i]],

for each point at index i.

The following example demonstrates the relationship between transform_points and unproject_points:

import torch
from pytorch3d.renderer import PerspectiveCameras

# any camera derived from CamerasBase that implements unproject_points works here;
# PerspectiveCameras with default parameters is used purely for illustration
cameras = PerspectiveCameras()
# 3D points of shape (batch_size, num_points, 3), kept away from the camera plane (z=0)
xyz = torch.rand(2, 10, 3) + 2.0
# transform xyz to the camera view coordinates
xyz_cam = cameras.get_world_to_view_transform().transform_points(xyz)
# extract the depth of each point as the 3rd coord of xyz_cam
depth = xyz_cam[:, :, 2:]
# project the points xyz to the camera
xy = cameras.transform_points(xyz)[:, :, :2]
# append depth to xy
xy_depth = torch.cat((xy, depth), dim=2)
# unproject to the world coordinates
xyz_unproj_world = cameras.unproject_points(xy_depth, world_coordinates=True)
print(torch.allclose(xyz, xyz_unproj_world)) # True
# unproject to the camera coordinates
xyz_unproj = cameras.unproject_points(xy_depth, world_coordinates=False)
print(torch.allclose(xyz_cam, xyz_unproj)) # True
Parameters:
  • xy_depth – torch tensor of shape (…, 3).
  • world_coordinates – If True, unprojects the points back to world coordinates using the camera extrinsics R and T. False ignores R and T and unprojects to the camera view coordinates.
Returns
new_points: unprojected points with the same shape as xy_depth.
get_camera_center(**kwargs) → torch.Tensor[source]

Return the 3D location of the camera optical center in the world coordinates.

Parameters:**kwargs – parameters for the camera extrinsics can be passed in as keyword arguments to override the default values set in __init__.

Setting T here will update the values set in init as this value may be needed later on in the rendering pipeline e.g. for lighting calculations.

Returns:C – a batch of 3D locations of shape (N, 3) denoting the locations of the center of each camera in the batch.
get_world_to_view_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Return the world-to-view transform.

Parameters:**kwargs – parameters for the camera extrinsics can be passed in as keyword arguments to override the default values set in __init__.

Setting R and T here will update the values set in init as these values may be needed later on in the rendering pipeline e.g. for lighting calculations.

Returns:A Transform3d object which represents a batch of transforms of shape (N, 3, 3)
get_full_projection_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Return the full world-to-camera transform composing the world-to-view and view-to-camera transforms. If camera is defined in NDC space, the projected points are in NDC space. If camera is defined in screen space, the projected points are in screen space.

Parameters:**kwargs – parameters for the projection transforms can be passed in as keyword arguments to override the default values set in __init__.

Setting R and T here will update the values set in init as these values may be needed later on in the rendering pipeline e.g. for lighting calculations.

Returns:a Transform3d object which represents a batch of transforms of shape (N, 3, 3)
transform_points(points, eps: Optional[float] = None, **kwargs) → torch.Tensor[source]

Transform input points from world to camera space with the projection matrix defined by the camera.

For CamerasBase.transform_points, setting eps > 0 stabilizes gradients since it leads to avoiding division by excessively low numbers for points close to the camera plane.

Parameters:
  • points – torch tensor of shape (…, 3).
  • eps

    If eps!=None, the argument is used to clamp the divisor in the homogeneous normalization of the points transformed to the ndc space. Please see transforms.Transform3D.transform_points for details.

    For CamerasBase.transform_points, setting eps > 0 stabilizes gradients since it leads to avoiding division by excessively low numbers for points close to the camera plane.

Returns
new_points: transformed points with the same shape as the input.
get_ndc_camera_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Returns the transform from camera projection space (screen or NDC) to NDC space. For cameras that can be specified in screen space, this transform allows points to be converted from screen to NDC space. The default transform scales the points from [0, W-1]x[0, H-1] to [-1, 1]. This function should be modified per camera definitions if need be, e.g. for Perspective/Orthographic cameras we provide a custom implementation. This transform assumes PyTorch3D coordinate system conventions for both the NDC space and the input points.

This transform interfaces with the PyTorch3D renderer which assumes input points to the renderer to be in NDC space.

transform_points_ndc(points, eps: Optional[float] = None, **kwargs) → torch.Tensor[source]

Transforms points from PyTorch3D world/camera space to NDC space. Input points follow the PyTorch3D coordinate system conventions: +X left, +Y up. Output points are in NDC space: +X left, +Y up, origin at image center.

Parameters:
  • points – torch tensor of shape (…, 3).
  • eps

    If eps!=None, the argument is used to clamp the divisor in the homogeneous normalization of the points transformed to the ndc space. Please see transforms.Transform3D.transform_points for details.

    For CamerasBase.transform_points, setting eps > 0 stabilizes gradients since it leads to avoiding division by excessively low numbers for points close to the camera plane.

Returns
new_points: transformed points with the same shape as the input.
transform_points_screen(points, eps: Optional[float] = None, **kwargs) → torch.Tensor[source]

Transforms points from PyTorch3D world/camera space to screen space. Input points follow the PyTorch3D coordinate system conventions: +X left, +Y up. Output points are in screen space: +X right, +Y down, origin at top left corner.

Parameters:
  • points – torch tensor of shape (…, 3).
  • eps

    If eps!=None, the argument is used to clamp the divisor in the homogeneous normalization of the points transformed to the ndc space. Please see transforms.Transform3D.transform_points for details.

    For CamerasBase.transform_points, setting eps > 0 stabilizes gradients since it leads to avoiding division by excessively low numbers for points close to the camera plane.

Returns
new_points: transformed points with the same shape as the input.
clone()[source]

Returns a copy of self.

is_perspective()[source]
in_ndc()[source]

Specifies whether the camera is defined in NDC space or in screen (image) space

get_znear()[source]
get_image_size()[source]

Returns the image size, if provided, expected in the form of (height, width). The image size is used for conversion of projected points to screen coordinates.

pytorch3d.renderer.cameras.OpenGLPerspectiveCameras(znear=1.0, zfar=100.0, aspect_ratio=1.0, fov=60.0, degrees: bool = True, R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), device: Union[str, torch.device] = 'cpu') → pytorch3d.renderer.cameras.FoVPerspectiveCameras[source]

OpenGLPerspectiveCameras has been DEPRECATED. Use FoVPerspectiveCameras instead. Preserving OpenGLPerspectiveCameras for backward compatibility.

class pytorch3d.renderer.cameras.FoVPerspectiveCameras(znear=1.0, zfar=100.0, aspect_ratio=1.0, fov=60.0, degrees: bool = True, R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu')[source]

Bases: pytorch3d.renderer.cameras.CamerasBase

A class which stores a batch of parameters to generate a batch of projection matrices by specifying the field of view. The definition of the parameters follows the OpenGL perspective camera.

The extrinsics of the camera (R and T matrices) can also be set in the initializer or passed in to get_full_projection_transform to get the full transformation from world -> ndc.

The transform_points method calculates the full world -> ndc transform and then applies it to the input points.

The transforms can also be returned separately as Transform3d objects.

Setting the Aspect Ratio for Non Square Images

If the desired output image size is non square (i.e. a tuple of (H, W) where H != W) the aspect ratio needs special consideration. There are two aspect ratios to be aware of:

  • the aspect ratio of each pixel
  • the aspect ratio of the output image

The aspect_ratio setting in the FoVPerspectiveCameras sets the pixel aspect ratio. When using this camera with the differentiable rasterizer be aware that in the rasterizer we assume square pixels, but allow variable image aspect ratio (i.e. rectangular images).

In most cases you will want to set the camera aspect_ratio=1.0 (i.e. square pixels) and only vary the output image dimensions in pixels for rasterization.
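For example (a minimal sketch; the sizes are arbitrary):

from pytorch3d.renderer import FoVPerspectiveCameras, RasterizationSettings

# Square pixels, non-square output image: keep aspect_ratio=1.0 on the camera and
# control the image shape through image_size=(H, W) in the rasterization settings.
cameras = FoVPerspectiveCameras(aspect_ratio=1.0, fov=60.0)
raster_settings = RasterizationSettings(image_size=(512, 1024))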

__init__(znear=1.0, zfar=100.0, aspect_ratio=1.0, fov=60.0, degrees: bool = True, R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu') → None[source]
Parameters:
  • znear – near clipping plane of the view frustum.
  • zfar – far clipping plane of the view frustum.
  • aspect_ratio – aspect ratio of the image pixels. 1.0 indicates square pixels.
  • fov – field of view angle of the camera.
  • degrees – bool, set to True if fov is specified in degrees.
  • R – Rotation matrix of shape (N, 3, 3)
  • T – Translation matrix of shape (N, 3)
  • K – (optional) A calibration matrix of shape (N, 4, 4). If provided, znear, zfar, fov, aspect_ratio and degrees are not needed.
  • device – Device (as str or torch.device)
compute_projection_matrix(znear, zfar, fov, aspect_ratio, degrees: bool) → torch.Tensor[source]

Compute the calibration matrix K of shape (N, 4, 4)

Parameters:
  • znear – near clipping plane of the view frustum.
  • zfar – far clipping plane of the view frustum.
  • fov – field of view angle of the camera.
  • aspect_ratio – aspect ratio of the image pixels. 1.0 indicates square pixels.
  • degrees – bool, set to True if fov is specified in degrees.
Returns:

torch.FloatTensor of the calibration matrix with shape (N, 4, 4)

get_projection_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Calculate the perspective projection matrix with a symmetric viewing frustum. Use column major order. The viewing frustum will be projected into NDC, s.t. (max_x, max_y) -> (+1, +1) (min_x, min_y) -> (-1, -1)

Parameters:**kwargs – parameters for the projection can be passed in as keyword arguments to override the default values set in __init__.
Returns:a Transform3d object which represents a batch of projection matrices of shape (N, 4, 4)
h1 = (max_y + min_y)/(max_y - min_y)
w1 = (max_x + min_x)/(max_x - min_x)
tanhalffov = tan((fov/2))
s1 = 1/tanhalffov
s2 = 1/(tanhalffov * (aspect_ratio))

# To map z to the range [0, 1] use:
f1 =  far / (far - near)
f2 = -(far * near) / (far - near)

# Projection matrix
K = [
        [s1,   0,   w1,   0],
        [0,   s2,   h1,   0],
        [0,    0,   f1,  f2],
        [0,    0,    1,   0],
]
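For instance, a minimal sketch of obtaining this projection as a Transform3d (get_matrix is a standard Transform3d accessor; the camera parameters below are arbitrary):

import torch
from pytorch3d.renderer import FoVPerspectiveCameras

cameras = FoVPerspectiveCameras(znear=1.0, zfar=100.0, fov=60.0, aspect_ratio=1.0)
P = cameras.get_projection_transform()
print(P.get_matrix().shape)  # torch.Size([1, 4, 4])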
unproject_points(xy_depth: torch.Tensor, world_coordinates: bool = True, scaled_depth_input: bool = False, **kwargs) → torch.Tensor[source]

FoV cameras further allow for passing depth in world units (scaled_depth_input=False) or in the [0, 1]-normalized units (scaled_depth_input=True)

Parameters:scaled_depth_input – If True, assumes the input depth is in the [0, 1]-normalized units. If False the input depth is in the world units.
is_perspective()[source]
in_ndc()[source]
pytorch3d.renderer.cameras.OpenGLOrthographicCameras(znear=1.0, zfar=100.0, top=1.0, bottom=-1.0, left=-1.0, right=1.0, scale_xyz=((1.0, 1.0, 1.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), device: Union[str, torch.device] = 'cpu') → pytorch3d.renderer.cameras.FoVOrthographicCameras[source]

OpenGLOrthographicCameras has been DEPRECATED. Use FoVOrthographicCameras instead. Preserving OpenGLOrthographicCameras for backward compatibility.

class pytorch3d.renderer.cameras.FoVOrthographicCameras(znear=1.0, zfar=100.0, max_y=1.0, min_y=-1.0, max_x=1.0, min_x=-1.0, scale_xyz=((1.0, 1.0, 1.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu')[source]

Bases: pytorch3d.renderer.cameras.CamerasBase

A class which stores a batch of parameters to generate a batch of projection matrices by specifying the field of view. The definition of the parameters follows the OpenGL orthographic camera.

__init__(znear=1.0, zfar=100.0, max_y=1.0, min_y=-1.0, max_x=1.0, min_x=-1.0, scale_xyz=((1.0, 1.0, 1.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu')[source]
Parameters:
  • znear – near clipping plane of the view frustum.
  • zfar – far clipping plane of the view frustum.
  • max_y – maximum y coordinate of the frustum.
  • min_y – minimum y coordinate of the frustum.
  • max_x – maximum x coordinate of the frustum.
  • min_x – minimum x coordinate of the frustum.
  • scale_xyz – scale factors for each axis of shape (N, 3).
  • R – Rotation matrix of shape (N, 3, 3).
  • T – Translation of shape (N, 3).
  • K – (optional) A calibration matrix of shape (N, 4, 4). If provided, znear, zfar, max_y, min_y, max_x, min_x and scale_xyz are not needed.
  • device – torch.device or string.

min_x, max_x, min_y and max_y only need to be set for viewing frustums which are not symmetric about the origin.

compute_projection_matrix(znear, zfar, max_x, min_x, max_y, min_y, scale_xyz) → torch.Tensor[source]

Compute the calibration matrix K of shape (N, 4, 4)

Parameters:
  • znear – near clipping plane of the view frustum.
  • zfar – far clipping plane of the view frustum.
  • max_x – maximum x coordinate of the frustum.
  • min_x – minimum x coordinate of the frustum.
  • max_y – maximum y coordinate of the frustum.
  • min_y – minimum y coordinate of the frustum.
  • scale_xyz – scale factors for each axis of shape (N, 3).
get_projection_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Calculate the orthographic projection matrix. Use column major order.

Parameters:**kwargs – parameters for the projection can be passed in to override the default values set in __init__.
Returns:
a Transform3d object which represents a batch of projection
matrices of shape (N, 4, 4)
scale_x = 2 / (max_x - min_x)
scale_y = 2 / (max_y - min_y)
scale_z = 2 / (far-near)
mid_x = (max_x + min_x) / (max_x - min_x)
mid_y = (max_y + min_y) / (max_y - min_y)
mid_z = (far + near) / (far - near)

K = [
        [scale_x,        0,         0,  -mid_x],
        [0,        scale_y,         0,  -mid_y],
        [0,              0,  -scale_z,  -mid_z],
        [0,              0,         0,       1],
]
unproject_points(xy_depth: torch.Tensor, world_coordinates: bool = True, scaled_depth_input: bool = False, **kwargs) → torch.Tensor[source]

FoV cameras further allow for passing depth in world units (scaled_depth_input=False) or in the [0, 1]-normalized units (scaled_depth_input=True)

Parameters:scaled_depth_input – If True, assumes the input depth is in the [0, 1]-normalized units. If False the input depth is in the world units.
is_perspective()[source]
in_ndc()[source]
pytorch3d.renderer.cameras.SfMPerspectiveCameras(focal_length=1.0, principal_point=((0.0, 0.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), device: Union[str, torch.device] = 'cpu') → pytorch3d.renderer.cameras.PerspectiveCameras[source]

SfMPerspectiveCameras has been DEPRECATED. Use PerspectiveCameras instead. Preserving SfMPerspectiveCameras for backward compatibility.

class pytorch3d.renderer.cameras.PerspectiveCameras(focal_length=1.0, principal_point=((0.0, 0.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu', in_ndc: bool = True, image_size: Union[List[T], Tuple, torch.Tensor, None] = None)[source]

Bases: pytorch3d.renderer.cameras.CamerasBase

A class which stores a batch of parameters to generate a batch of transformation matrices using the multi-view geometry convention for a perspective camera.

Parameters for this camera are specified in NDC if in_ndc is set to True. If parameters are specified in screen space, in_ndc must be set to False.

__init__(focal_length=1.0, principal_point=((0.0, 0.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu', in_ndc: bool = True, image_size: Union[List[T], Tuple, torch.Tensor, None] = None) → None[source]
Parameters:
  • focal_length – Focal length of the camera in world units. A tensor of shape (N, 1) or (N, 2) for square and non-square pixels respectively.
  • principal_point – xy coordinates of the center of the principal point of the camera in pixels. A tensor of shape (N, 2).
  • in_ndc – True if camera parameters are specified in NDC. If camera parameters are in screen space, it must be set to False.
  • R – Rotation matrix of shape (N, 3, 3)
  • T – Translation matrix of shape (N, 3)
  • K – (optional) A calibration matrix of shape (N, 4, 4). If provided, focal_length and principal_point are not needed.
  • image_size – (height, width) of image size. A tensor of shape (N, 2) or a list/tuple. Required for screen cameras.
  • device – torch.device or string
get_projection_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Calculate the projection matrix using the multi-view geometry convention.

Parameters:**kwargs – parameters for the projection can be passed in as keyword arguments to override the default values set in __init__.
Returns:A Transform3d object with a batch of N projection transforms.
fx = focal_length[:, 0]
fy = focal_length[:, 1]
px = principal_point[:, 0]
py = principal_point[:, 1]

K = [
        [fx,   0,   px,   0],
        [0,   fy,   py,   0],
        [0,    0,    0,   1],
        [0,    0,    1,   0],
]
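A minimal sketch of a screen-space camera (the intrinsics values below are arbitrary and purely illustrative):

import torch
from pytorch3d.renderer import PerspectiveCameras

cameras = PerspectiveCameras(
    focal_length=torch.tensor([[500.0, 500.0]]),        # (N, 2), in pixels
    principal_point=torch.tensor([[320.0, 240.0]]),     # (N, 2), in pixels
    image_size=torch.tensor([[480, 640]]),              # (N, 2) as (height, width)
    in_ndc=False,                                       # parameters are in screen space
)
print(cameras.in_ndc())  # False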
unproject_points(xy_depth: torch.Tensor, world_coordinates: bool = True, **kwargs) → torch.Tensor[source]
get_principal_point(**kwargs) → torch.Tensor[source]

Return the camera’s principal point

Parameters:**kwargs – parameters for the camera extrinsics can be passed in as keyword arguments to override the default values set in __init__.
get_ndc_camera_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Returns the transform from camera projection space (screen or NDC) to NDC space. If the camera is defined already in NDC space, the transform is identity. For cameras defined in screen space, we adjust the principal point computation which is defined in the image space (commonly) and scale the points to NDC space.

Important: This transform assumes PyTorch3D conventions for the input points, i.e. +X left, +Y up.

is_perspective()[source]
in_ndc()[source]
pytorch3d.renderer.cameras.SfMOrthographicCameras(focal_length=1.0, principal_point=((0.0, 0.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), device: Union[str, torch.device] = 'cpu') → pytorch3d.renderer.cameras.OrthographicCameras[source]

SfMOrthographicCameras has been DEPRECATED. Use OrthographicCameras instead. Preserving SfMOrthographicCameras for backward compatibility.

class pytorch3d.renderer.cameras.OrthographicCameras(focal_length=1.0, principal_point=((0.0, 0.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu', in_ndc: bool = True, image_size: Union[List[T], Tuple, torch.Tensor, None] = None)[source]

Bases: pytorch3d.renderer.cameras.CamerasBase

A class which stores a batch of parameters to generate a batch of transformation matrices using the multi-view geometry convention for an orthographic camera.

Parameters for this camera are specified in NDC if in_ndc is set to True. If parameters are specified in screen space, in_ndc must be set to False.

__init__(focal_length=1.0, principal_point=((0.0, 0.0),), R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]]), K: Optional[torch.Tensor] = None, device: Union[str, torch.device] = 'cpu', in_ndc: bool = True, image_size: Union[List[T], Tuple, torch.Tensor, None] = None) → None[source]
Parameters:
  • focal_length – Focal length of the camera in world units. A tensor of shape (N, 1) or (N, 2) for square and non-square pixels respectively.
  • principal_point – xy coordinates of the center of the principal point of the camera in pixels. A tensor of shape (N, 2).
  • in_ndc – True if camera parameters are specified in NDC. If False, then camera parameters are in screen space.
  • R – Rotation matrix of shape (N, 3, 3)
  • T – Translation matrix of shape (N, 3)
  • K – (optional) A calibration matrix of shape (N, 4, 4). If provided, focal_length, principal_point and image_size are not needed.
  • image_size – (height, width) of image size. A tensor of shape (N, 2) or list/tuple. Required for screen cameras.
  • device – torch.device or string
get_projection_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Calculate the projection matrix using the multi-view geometry convention.

Parameters:**kwargs – parameters for the projection can be passed in as keyword arguments to override the default values set in __init__.
Returns:A Transform3d object with a batch of N projection transforms.
fx = focal_length[:,0]
fy = focal_length[:,1]
px = principal_point[:,0]
py = principal_point[:,1]

K = [
        [fx,   0,    0,  px],
        [0,   fy,    0,  py],
        [0,    0,    1,   0],
        [0,    0,    0,   1],
]
unproject_points(xy_depth: torch.Tensor, world_coordinates: bool = True, **kwargs) → torch.Tensor[source]
get_principal_point(**kwargs) → torch.Tensor[source]

Return the camera’s principal point

Parameters:**kwargs – parameters for the camera extrinsics can be passed in as keyword arguments to override the default values set in __init__.
get_ndc_camera_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]

Returns the transform from camera projection space (screen or NDC) to NDC space. If the camera is defined already in NDC space, the transform is identity. For cameras defined in screen space, we adjust the principal point computation which is defined in the image space (commonly) and scale the points to NDC space.

Important: This transform assumes PyTorch3D conventions for the input points, i.e. +X left, +Y up.

is_perspective()[source]
in_ndc()[source]
pytorch3d.renderer.cameras.get_world_to_view_transform(R: torch.Tensor = tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]]), T: torch.Tensor = tensor([[0., 0., 0.]])) → pytorch3d.transforms.transform3d.Transform3d[source]

This function returns a Transform3d representing the transformation matrix to go from world space to view space by applying a rotation and a translation.

PyTorch3D uses the same convention as Hartley & Zisserman. I.e., for camera extrinsic parameters R (rotation) and T (translation), we map a 3D point X_world in world coordinates to a point X_cam in camera coordinates with: X_cam = X_world R + T

Parameters:
  • R – (N, 3, 3) matrix representing the rotation.
  • T – (N, 3) matrix representing the translation.
Returns:

a Transform3d object which represents the composed RT transformation.
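A small sketch verifying the convention (the R and T values are arbitrary):

import torch
from pytorch3d.renderer.cameras import get_world_to_view_transform

R = torch.eye(3)[None]                      # (1, 3, 3)
T = torch.tensor([[0.0, 0.0, 3.0]])         # (1, 3)
w2v = get_world_to_view_transform(R=R, T=T)

X_world = torch.tensor([[[1.0, 2.0, 0.0]]])            # (1, P, 3)
X_cam = w2v.transform_points(X_world)
print(torch.allclose(X_cam, X_world @ R + T))          # True: X_cam = X_world R + T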

pytorch3d.renderer.cameras.camera_position_from_spherical_angles(distance: float, elevation: float, azimuth: float, degrees: bool = True, device: Union[str, torch.device] = 'cpu') → torch.Tensor[source]

Calculate the location of the camera based on the distance away from the target point, the elevation and azimuth angles.

Parameters:
  • distance – distance of the camera from the object.
  • elevation, azimuth – angles. The inputs distance, elevation and azimuth can each be one of the following:

    • Python scalar
    • Torch scalar
    • Torch tensor of shape (N) or (1)
  • degrees – bool, whether the angles are specified in degrees or radians.
  • device – str or torch.device, device for new tensors to be placed on.

The vectors are broadcast against each other so they all have shape (N, 1).

Returns:camera_position – (N, 3) xyz location of the camera.
pytorch3d.renderer.cameras.look_at_rotation(camera_position, at=((0, 0, 0), ), up=((0, 1, 0), ), device: Union[str, torch.device] = 'cpu') → torch.Tensor[source]

This function takes a vector ‘camera_position’ which specifies the location of the camera in world coordinates and two vectors at and up which indicate the position of the object and the up direction of the world coordinate system respectively. The object is assumed to be centered at the origin.

The output is a rotation matrix representing the transformation from world coordinates -> view coordinates.

Parameters:
  • camera_position – position of the camera in world coordinates
  • at – position of the object in world coordinates
  • up – vector specifying the up direction in the world coordinate frame.
The inputs camera_position, at and up can each be a
  • 3 element tuple/list
  • torch tensor of shape (1, 3)
  • torch tensor of shape (N, 3)

The vectors are broadcast against each other so they all have shape (N, 3).

Returns:R – (N, 3, 3) batched rotation matrices
pytorch3d.renderer.cameras.look_at_view_transform(dist=1.0, elev=0.0, azim=0.0, degrees: bool = True, eye: Optional[Sequence[T_co]] = None, at=((0, 0, 0), ), up=((0, 1, 0), ), device: Union[str, torch.device] = 'cpu') → Tuple[torch.Tensor, torch.Tensor][source]

This function returns a rotation and translation matrix to apply the ‘Look At’ transformation from world -> view coordinates [0].

Parameters:
  • dist – distance of the camera from the object
  • elev – angle in degrees or radians. This is the angle between the vector from the object to the camera, and the horizontal plane y = 0 (xz-plane).
  • azim – angle in degrees or radians. The vector from the object to the camera is projected onto a horizontal plane y = 0. azim is the angle between the projected vector and a reference vector at (0, 0, 1) on the reference plane (the horizontal plane).
  • dist, elev and azim can be of shape (1) or (N)
  • degrees – boolean flag to indicate if the elevation and azimuth angles are specified in degrees or radians.
  • eye – the position of the camera(s) in world coordinates. If eye is not None, it will override the camera position derived from dist, elev, azim.
  • up – vector specifying the up direction in the world coordinate frame.
  • at – the position of the object(s) in world coordinates.
  • eye, up and at can be of shape (1, 3) or (N, 3)
Returns:

2-element tuple containing

  • R: the rotation to apply to the points to align with the camera.
  • T: the translation to apply to the points to align with the camera.

References: [0] https://www.scratchapixel.com
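A typical usage sketch (the distance and angles are arbitrary):

from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

R, T = look_at_view_transform(dist=2.7, elev=10.0, azim=20.0)
cameras = FoVPerspectiveCameras(R=R, T=T)
print(R.shape, T.shape)   # torch.Size([1, 3, 3]) torch.Size([1, 3])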

pytorch3d.renderer.cameras.get_ndc_to_screen_transform(cameras, with_xyflip: bool = False, image_size: Union[List[T], Tuple, torch.Tensor, None] = None) → pytorch3d.transforms.transform3d.Transform3d[source]

PyTorch3D NDC to screen conversion. Conversion from PyTorch3D’s NDC space (+X left, +Y up) to screen/image space (+X right, +Y down, origin top left).

Parameters:
  • cameras
  • with_xyflip – flips x- and y-axis if set to True.
Optional kwargs:
image_size: ((height, width),) specifying the height, width of the image. If not provided, it reads it from cameras.

We represent the NDC to screen conversion as a Transform3d with projection matrix

K = [
        [s,   0,   0,   cx],
        [0,   s,   0,   cy],
        [0,   0,   1,    0],
        [0,   0,   0,    1],
]

pytorch3d.renderer.cameras.get_screen_to_ndc_transform(cameras, with_xyflip: bool = False, image_size: Union[List[T], Tuple, torch.Tensor, None] = None) → pytorch3d.transforms.transform3d.Transform3d[source]

Screen to PyTorch3D NDC conversion. Conversion from screen/image space (+X right, +Y down, origin top left) to PyTorch3D’s NDC space (+X left, +Y up).

Parameters:
  • cameras
  • with_xyflip – flips x- and y-axis if set to True.
Optional kwargs:
image_size: ((height, width),) specifying the height, width of the image. If not provided, it reads it from cameras.

We represent the screen to NDC conversion as a Transform3d with projection matrix

K = [
        [1/s,    0,   0,   cx/s],
        [  0,  1/s,   0,   cy/s],
        [  0,    0,   1,      0],
        [  0,    0,   0,      1],
]

lighting

pytorch3d.renderer.lighting.diffuse(normals, color, direction) → torch.Tensor[source]

Calculate the diffuse component of light reflection using Lambert’s cosine law.

Parameters:
  • normals – (N, …, 3) xyz normal vectors. Normals and points are expected to have the same shape.
  • color – (1, 3) or (N, 3) RGB color of the diffuse component of the light.
  • direction – (x,y,z) direction of the light
Returns:

colors – (N, …, 3), same shape as the input points.

The normals and light direction should be in the same coordinate frame i.e. if the points have been transformed from world -> view space then the normals and direction should also be in view space.

NOTE: to use with the packed vertices (i.e. no batch dimension) reformat the inputs in the following way.

Args:
    normals: (P, 3)
    color: (N, 3)[batch_idx, :] -> (P, 3)
    direction: (N, 3)[batch_idx, :] -> (P, 3)

Returns:
    colors: (P, 3)

where batch_idx is of shape (P). For meshes, batch_idx can be meshes.verts_packed_to_mesh_idx() or meshes.faces_packed_to_mesh_idx() depending on whether points refers to the vertex coordinates or average/interpolated face coordinates.
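A minimal sketch of Lambert's cosine law with this function (the normals, color and light direction below are arbitrary illustrative values):

import torch
from pytorch3d.renderer.lighting import diffuse

normals = torch.tensor([[[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]])   # (N=1, P=2, 3)
color = torch.tensor([[1.0, 1.0, 1.0]])                        # (1, 3)
direction = torch.tensor([[0.0, 0.0, 1.0]])                    # light along +Z
out = diffuse(normals=normals, color=color, direction=direction)
print(out)  # first normal is aligned with the light (full color), the second is perpendicular (zero)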
pytorch3d.renderer.lighting.specular(points, normals, direction, color, camera_position, shininess) → torch.Tensor[source]

Calculate the specular component of light reflection.

Parameters:
  • points – (N, …, 3) xyz coordinates of the points.
  • normals – (N, …, 3) xyz normal vectors for each point.
  • color – (N, 3) RGB color of the specular component of the light.
  • direction – (N, 3) vector direction of the light.
  • camera_position – (N, 3) The xyz position of the camera.
  • shininess – The specular exponent of the material.
Returns:

colors – (N, …, 3), same shape as the input points.

The points, normals, camera_position, and direction should be in the same coordinate frame i.e. if the points have been transformed from world -> view space then the normals, camera_position, and light direction should also be in view space.

To use with a batch of packed points, reindex in the following way:

Args:
    points: (P, 3)
    normals: (P, 3)
    color: (N, 3)[batch_idx] -> (P, 3)
    direction: (N, 3)[batch_idx] -> (P, 3)
    camera_position: (N, 3)[batch_idx] -> (P, 3)
    shininess: (N)[batch_idx] -> (P)
Returns:
    colors: (P, 3)

where batch_idx is of shape (P). For meshes, batch_idx can be meshes.verts_packed_to_mesh_idx() or meshes.faces_packed_to_mesh_idx().
class pytorch3d.renderer.lighting.DirectionalLights(ambient_color=((0.5, 0.5, 0.5), ), diffuse_color=((0.3, 0.3, 0.3), ), specular_color=((0.2, 0.2, 0.2), ), direction=((0, 1, 0), ), device: Union[str, torch.device] = 'cpu')[source]
__init__(ambient_color=((0.5, 0.5, 0.5), ), diffuse_color=((0.3, 0.3, 0.3), ), specular_color=((0.2, 0.2, 0.2), ), direction=((0, 1, 0), ), device: Union[str, torch.device] = 'cpu') → None[source]
Parameters:
  • ambient_color – RGB color of the ambient component.
  • diffuse_color – RGB color of the diffuse component.
  • specular_color – RGB color of the specular component.
  • direction – (x, y, z) direction vector of the light.
  • device – Device (as str or torch.device) on which the tensors should be located
The inputs can each be
  • 3 element tuple/list or list of lists
  • torch tensor of shape (1, 3)
  • torch tensor of shape (N, 3)

The inputs are broadcast against each other so they all have batch dimension N.

clone()[source]
diffuse(normals, points=None) → torch.Tensor[source]
specular(normals, points, camera_position, shininess) → torch.Tensor[source]
class pytorch3d.renderer.lighting.PointLights(ambient_color=((0.5, 0.5, 0.5), ), diffuse_color=((0.3, 0.3, 0.3), ), specular_color=((0.2, 0.2, 0.2), ), location=((0, 1, 0), ), device: Union[str, torch.device] = 'cpu')[source]
__init__(ambient_color=((0.5, 0.5, 0.5), ), diffuse_color=((0.3, 0.3, 0.3), ), specular_color=((0.2, 0.2, 0.2), ), location=((0, 1, 0), ), device: Union[str, torch.device] = 'cpu') → None[source]
Parameters:
  • ambient_color – RGB color of the ambient component
  • diffuse_color – RGB color of the diffuse component
  • specular_color – RGB color of the specular component
  • location – xyz position of the light.
  • device – Device (as str or torch.device) on which the tensors should be located
The inputs can each be
  • 3 element tuple/list or list of lists
  • torch tensor of shape (1, 3)
  • torch tensor of shape (N, 3)

The inputs are broadcast against each other so they all have batch dimension N.

clone()[source]
reshape_location(points) → torch.Tensor[source]

Reshape the location tensor to have dimensions compatible with the points which can either be of shape (P, 3) or (N, H, W, K, 3).

diffuse(normals, points) → torch.Tensor[source]
specular(normals, points, camera_position, shininess) → torch.Tensor[source]
class pytorch3d.renderer.lighting.AmbientLights(*, ambient_color=None, device: Union[str, torch.device] = 'cpu')[source]

A light object representing the same color of light everywhere. By default, this is white, which effectively means lighting is not used in rendering.

__init__(*, ambient_color=None, device: Union[str, torch.device] = 'cpu') → None[source]

If ambient_color is provided, it should be a sequence of triples of floats.

Parameters:
  • ambient_color – RGB color
  • device – Device (as str or torch.device) on which the tensors should be located
The ambient_color, if provided, should be
  • 3 element tuple/list or list of lists
  • torch tensor of shape (1, 3)
  • torch tensor of shape (N, 3)
clone()[source]
diffuse(normals, points) → torch.Tensor[source]
specular(normals, points, camera_position, shininess) → torch.Tensor[source]

materials

class pytorch3d.renderer.materials.Materials(ambient_color=((1, 1, 1), ), diffuse_color=((1, 1, 1), ), specular_color=((1, 1, 1), ), shininess=64, device: Union[str, torch.device] = 'cpu')[source]

Bases: pytorch3d.renderer.utils.TensorProperties

A class for storing a batch of material properties. Currently only one material per batch element is supported.

__init__(ambient_color=((1, 1, 1), ), diffuse_color=((1, 1, 1), ), specular_color=((1, 1, 1), ), shininess=64, device: Union[str, torch.device] = 'cpu') → None[source]
Parameters:
  • ambient_color – RGB ambient reflectivity of the material
  • diffuse_color – RGB diffuse reflectivity of the material
  • specular_color – RGB specular reflectivity of the material
  • shininess – The specular exponent for the material. This defines the focus of the specular highlight with a high value resulting in a concentrated highlight. Shininess values can range from 0-1000.
  • device – Device (as str or torch.device) on which the tensors should be located

ambient_color, diffuse_color and specular_color can be of shape (1, 3) or (N, 3). shininess can be of shape (1) or (N).

The colors and shininess are broadcast against each other so need to have either the same batch dimension or batch dimension = 1.

clone()[source]

texturing

blending

class pytorch3d.renderer.blending.BlendParams[source]

Bases: tuple

Data class to store blending params with defaults

Members:
sigma (float): Controls the width of the sigmoid function used to calculate the 2D distance based probability. Determines the sharpness of the edges of the shape. Higher => faces have less defined edges.
gamma (float): Controls the scaling of the exponential function used to set the opacity of the color. Higher => faces are more transparent.
background_color: RGB values for the background color as a tuple or as a tensor of three floats.
sigma

Alias for field number 0

gamma

Alias for field number 1

background_color

Alias for field number 2
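For example (the values shown are illustrative, not required defaults):

from pytorch3d.renderer import BlendParams

blend_params = BlendParams(sigma=1e-4, gamma=1e-4, background_color=(0.0, 0.0, 0.0))
print(blend_params.sigma, blend_params.background_color)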

pytorch3d.renderer.blending.hard_rgb_blend(colors: torch.Tensor, fragments, blend_params: pytorch3d.renderer.blending.BlendParams) → torch.Tensor[source]
Naive blending of top K faces to return an RGBA image
  • RGB - choose color of the closest point i.e. K=0
  • A - 1.0
Parameters:
  • colors – (N, H, W, K, 3) RGB color for each of the top K faces per pixel.
  • fragments – the outputs of rasterization. From this we use pix_to_face, a LongTensor of shape (N, H, W, K) specifying the indices of the faces (in the packed representation) which overlap each pixel in the image. This is used to determine the output shape.
  • blend_params – BlendParams instance that contains a background_color field specifying the color for the background.
Returns:

RGBA pixel_colors – (N, H, W, 4)

pytorch3d.renderer.blending.sigmoid_alpha_blend(colors, fragments, blend_params: pytorch3d.renderer.blending.BlendParams) → torch.Tensor[source]
Silhouette blending to return an RGBA image
  • RGB - choose color of the closest point.
  • A - blend based on the 2D distance based probability map [1].
Parameters:
  • colors – (N, H, W, K, 3) RGB color for each of the top K faces per pixel.
  • fragments – the outputs of rasterization. From this we use:
    • pix_to_face: LongTensor of shape (N, H, W, K) specifying the indices of the faces (in the packed representation) which overlap each pixel in the image.
    • dists: FloatTensor of shape (N, H, W, K) specifying the 2D euclidean distance from the center of each pixel to each of the top K overlapping faces.
Returns:

RGBA pixel_colors – (N, H, W, 4)

[1] Liu et al, ‘Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning’, ICCV 2019
pytorch3d.renderer.blending.softmax_rgb_blend(colors: torch.Tensor, fragments, blend_params: pytorch3d.renderer.blending.BlendParams, znear: Union[float, torch.Tensor] = 1.0, zfar: Union[float, torch.Tensor] = 100) → torch.Tensor[source]

RGB and alpha channel blending to return an RGBA image based on the method proposed in [1]

  • RGB - blend the colors based on the 2D distance based probability map and relative z distances.
  • A - blend based on the 2D distance based probability map.
Parameters:
  • colors – (N, H, W, K, 3) RGB color for each of the top K faces per pixel.
  • fragments – namedtuple with outputs of rasterization. We use the properties:
    • pix_to_face: LongTensor of shape (N, H, W, K) specifying the indices of the faces (in the packed representation) which overlap each pixel in the image.
    • dists: FloatTensor of shape (N, H, W, K) specifying the 2D euclidean distance from the center of each pixel to each of the top K overlapping faces.
    • zbuf: FloatTensor of shape (N, H, W, K) specifying the interpolated depth from each pixel to each of the top K overlapping faces.
  • blend_params – instance of BlendParams dataclass containing the properties:
    • sigma: float, parameter which controls the width of the sigmoid function used to calculate the 2D distance based probability. Sigma controls the sharpness of the edges of the shape.
    • gamma: float, parameter which controls the scaling of the exponential function used to control the opacity of the color.
    • background_color: (3) element list/tuple/torch.Tensor specifying the RGB values for the background color.
  • znear – float, near clipping plane in the z direction
  • zfar – float, far clipping plane in the z direction
Returns:

RGBA pixel_colors – (N, H, W, 4)

[1] Shichen Liu et al, ‘Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning’, ICCV 2019

shading

pytorch3d.renderer.mesh.shading.phong_shading(meshes, fragments, lights, cameras, materials, texels) → torch.Tensor[source]

Apply per pixel shading. First interpolate the vertex normals and vertex coordinates using the barycentric coordinates to get the position and normal at each pixel. Then compute the illumination for each pixel. The pixel color is obtained by multiplying the pixel textures by the ambient and diffuse illumination and adding the specular component.

Parameters:
  • meshes – Batch of meshes
  • fragments – Fragments named tuple with the outputs of rasterization
  • lights – Lights class containing a batch of lights
  • cameras – Cameras class containing a batch of cameras
  • materials – Materials class containing a batch of material properties
  • texels – texture per pixel of shape (N, H, W, K, 3)
Returns:

colors – (N, H, W, K, 3)

pytorch3d.renderer.mesh.shading.gouraud_shading(meshes, fragments, lights, cameras, materials) → torch.Tensor[source]

Apply per vertex shading. First compute the vertex illumination by applying ambient, diffuse and specular lighting. If vertex color is available, combine the ambient and diffuse vertex illumination with the vertex color and add the specular component to determine the vertex shaded color. Then interpolate the vertex shaded colors using the barycentric coordinates to get a color per pixel.

Gouraud shading is only supported for meshes with texture type TexturesVertex. This is because the illumination is applied to the vertex colors.

Parameters:
  • meshes – Batch of meshes
  • fragments – Fragments named tuple with the outputs of rasterization
  • lights – Lights class containing a batch of lights parameters
  • cameras – Cameras class containing a batch of cameras parameters
  • materials – Materials class containing a batch of material properties
Returns:

colors – (N, H, W, K, 3)

pytorch3d.renderer.mesh.shading.flat_shading(meshes, fragments, lights, cameras, materials, texels) → torch.Tensor[source]

Apply per face shading. Use the average face position and the face normals to compute the ambient, diffuse and specular lighting. Apply the ambient and diffuse color to the pixel color and add the specular component to determine the final pixel color.

Parameters:
  • meshes – Batch of meshes
  • fragments – Fragments named tuple with the outputs of rasterization
  • lights – Lights class containing a batch of lights parameters
  • cameras – Cameras class containing a batch of cameras parameters
  • materials – Materials class containing a batch of material properties
  • texels – texture per pixel of shape (N, H, W, K, 3)
Returns:

colors – (N, H, W, K, 3)

shader

class pytorch3d.renderer.mesh.shader.HardPhongShader(device: Union[str, torch.device] = 'cpu', cameras: Optional[pytorch3d.renderer.utils.TensorProperties] = None, lights: Optional[pytorch3d.renderer.utils.TensorProperties] = None, materials: Optional[pytorch3d.renderer.materials.Materials] = None, blend_params: Optional[pytorch3d.renderer.blending.BlendParams] = None)[source]

Per pixel lighting - the lighting model is applied using the interpolated coordinates and normals for each pixel. The blending function hard assigns the color of the closest face for each pixel.

To use the default values, simply initialize the shader with the desired device e.g.
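
shader = HardPhongShader(device="cuda:0")  # any str or torch.device is accepted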

to(device: Union[str, torch.device])[source]
forward(fragments: pytorch3d.renderer.mesh.rasterizer.Fragments, meshes: pytorch3d.structures.meshes.Meshes, **kwargs) → torch.Tensor[source]
class pytorch3d.renderer.mesh.shader.SoftPhongShader(device: Union[str, torch.device] = 'cpu', cameras: Optional[pytorch3d.renderer.utils.TensorProperties] = None, lights: Optional[pytorch3d.renderer.utils.TensorProperties] = None, materials: Optional[pytorch3d.renderer.materials.Materials] = None, blend_params: Optional[pytorch3d.renderer.blending.BlendParams] = None)[source]

Per pixel lighting - the lighting model is applied using the interpolated coordinates and normals for each pixel. The blending function returns the soft aggregated color using all the faces per pixel.

To use the default values, simply initialize the shader with the desired device e.g.
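
shader = SoftPhongShader(device="cuda:0")  # any str or torch.device is accepted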

to(device: Union[str, torch.device])[source]
forward(fragments: pytorch3d.renderer.mesh.rasterizer.Fragments, meshes: pytorch3d.structures.meshes.Meshes, **kwargs) → torch.Tensor[source]
class pytorch3d.renderer.mesh.shader.HardGouraudShader(device: Union[str, torch.device] = 'cpu', cameras: Optional[pytorch3d.renderer.utils.TensorProperties] = None, lights: Optional[pytorch3d.renderer.utils.TensorProperties] = None, materials: Optional[pytorch3d.renderer.materials.Materials] = None, blend_params: Optional[pytorch3d.renderer.blending.BlendParams] = None)[source]

Per vertex lighting - the lighting model is applied to the vertex colors and the colors are then interpolated using the barycentric coordinates to obtain the colors for each pixel. The blending function hard assigns the color of the closest face for each pixel.

To use the default values, simply initialize the shader with the desired device e.g.
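
shader = HardGouraudShader(device="cuda:0")  # any str or torch.device is accepted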

to(device: Union[str, torch.device])[source]
forward(fragments: pytorch3d.renderer.mesh.rasterizer.Fragments, meshes: pytorch3d.structures.meshes.Meshes, **kwargs) → torch.Tensor[source]
class pytorch3d.renderer.mesh.shader.SoftGouraudShader(device: Union[str, torch.device] = 'cpu', cameras: Optional[pytorch3d.renderer.utils.TensorProperties] = None, lights: Optional[pytorch3d.renderer.utils.TensorProperties] = None, materials: Optional[pytorch3d.renderer.materials.Materials] = None, blend_params: Optional[pytorch3d.renderer.blending.BlendParams] = None)[source]

Per vertex lighting - the lighting model is applied to the vertex colors and the colors are then interpolated using the barycentric coordinates to obtain the colors for each pixel. The blending function returns the soft aggregated color using all the faces per pixel.

To use the default values, simply initialize the shader with the desired device e.g.
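
shader = SoftGouraudShader(device="cuda:0")  # any str or torch.device is accepted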

to(device: Union[str, torch.device])[source]
forward(fragments: pytorch3d.renderer.mesh.rasterizer.Fragments, meshes: pytorch3d.structures.meshes.Meshes, **kwargs) → torch.Tensor[source]
pytorch3d.renderer.mesh.shader.TexturedSoftPhongShader(device: Union[str, torch.device] = 'cpu', cameras: Optional[pytorch3d.renderer.utils.TensorProperties] = None, lights: Optional[pytorch3d.renderer.utils.TensorProperties] = None, materials: Optional[pytorch3d.renderer.materials.Materials] = None, blend_params: Optional[pytorch3d.renderer.blending.BlendParams] = None)[source]

TexturedSoftPhongShader class has been DEPRECATED. Use SoftPhongShader instead. Preserving TexturedSoftPhongShader as a function for backwards compatibility.

class pytorch3d.renderer.mesh.shader.HardFlatShader(device: Union[str, torch.device] = 'cpu', cameras: Optional[pytorch3d.renderer.utils.TensorProperties] = None, lights: Optional[pytorch3d.renderer.utils.TensorProperties] = None, materials: Optional[pytorch3d.renderer.materials.Materials] = None, blend_params: Optional[pytorch3d.renderer.blending.BlendParams] = None)[source]

Per face lighting - the lighting model is applied using the average face position and the face normal. The blending function hard assigns the color of the closest face for each pixel.

To use the default values, simply initialize the shader with the desired device e.g.
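
shader = HardFlatShader(device="cuda:0")  # any str or torch.device is accepted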

to(device: Union[str, torch.device])[source]
forward(fragments: pytorch3d.renderer.mesh.rasterizer.Fragments, meshes: pytorch3d.structures.meshes.Meshes, **kwargs) → torch.Tensor[source]
class pytorch3d.renderer.mesh.shader.SoftSilhouetteShader(blend_params: Optional[pytorch3d.renderer.blending.BlendParams] = None)[source]

Calculate the silhouette by blending the top K faces for each pixel based on the 2d euclidean distance of the center of the pixel to the mesh face.

Use this shader for generating silhouettes similar to SoftRasterizer [0].

Note

To be consistent with SoftRasterizer, initialize the RasterizationSettings for the rasterizer with blur_radius = np.log(1. / 1e-4 - 1.) * blend_params.sigma.

[0] Liu et al, ‘Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning’, ICCV 2019
forward(fragments: pytorch3d.renderer.mesh.rasterizer.Fragments, meshes: pytorch3d.structures.meshes.Meshes, **kwargs) → torch.Tensor[source]

Only want to render the silhouette so RGB values can be ones. There is no need for lighting or texturing
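
A minimal sketch of a silhouette rendering setup following the note above (the image size, sigma/gamma values and faces_per_pixel below are illustrative, not prescribed defaults):

import numpy as np
from pytorch3d.renderer import (
    BlendParams, FoVPerspectiveCameras, MeshRasterizer,
    MeshRenderer, RasterizationSettings, SoftSilhouetteShader,
)

blend_params = BlendParams(sigma=1e-4, gamma=1e-4)
raster_settings = RasterizationSettings(
    image_size=256,
    blur_radius=np.log(1.0 / 1e-4 - 1.0) * blend_params.sigma,  # as in the note above
    faces_per_pixel=50,
)
cameras = FoVPerspectiveCameras()
silhouette_renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
    shader=SoftSilhouetteShader(blend_params=blend_params),
)
# images = silhouette_renderer(meshes)  # (N, H, W, 4); the alpha channel holds the silhouette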

utils

class pytorch3d.renderer.utils.TensorAccessor(class_object, index: Union[int, slice])[source]

A helper class to be used with the __getitem__ method. This can be used for getting/setting the values for an attribute of a class at one particular index. This is useful when the attributes of a class are batched tensors and one element in the batch needs to be modified.

__init__(class_object, index: Union[int, slice]) → None[source]
Parameters:
  • class_object – this should be an instance of a class which has attributes which are tensors representing a batch of values.
  • index – int/slice, an index indicating the position in the batch. In __setattr__ and __getattr__ only the value of class attributes at this index will be accessed.
__setattr__(name: str, value: Any)[source]

Update the attribute given by name to the value given by value at the index specified by self.index.

Parameters:
  • name – str, name of the attribute.
  • value – value to set the attribute to.
__getattr__(name: str)[source]

Return the value of the attribute given by “name” on self.class_object at the index specified in self.index.

Parameters:name – string of the attribute name
class pytorch3d.renderer.utils.TensorProperties(dtype: torch.dtype = torch.float32, device: Union[str, torch.device] = 'cpu', **kwargs)[source]

A mix-in class for storing tensors as properties with helper methods.

__init__(dtype: torch.dtype = torch.float32, device: Union[str, torch.device] = 'cpu', **kwargs) → None[source]
Parameters:
  • dtype – data type to set for the inputs
  • device – Device (as str or torch.device)
  • kwargs – any number of keyword arguments. Any arguments which are of type (float/int/list/tuple/tensor/array) are broadcasted and other keyword arguments are set as attributes.
isempty() → bool[source]
__getitem__(index: Union[int, slice]) → pytorch3d.renderer.utils.TensorAccessor[source]
Parameters:index – an int or slice used to index all the fields.
Returns:if index is an int/slice, return a TensorAccessor object with getattribute/setattribute methods which return/update the value at the index in the original object.
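
As an illustration of the broadcasting and per-element indexing behaviour, consider a hypothetical subclass (MyLights below is made up for this example; the built-in lights and cameras classes are defined in the same way):

from pytorch3d.renderer.utils import TensorProperties

class MyLights(TensorProperties):
    def __init__(self, color=((1.0, 1.0, 1.0),), device="cpu"):
        # float/int/list/tuple/tensor kwargs are broadcast to batched tensors.
        super().__init__(device=device, color=color)

lights = MyLights(color=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(lights.color.shape)          # torch.Size([2, 3])
lights[0].color = (0.5, 0.5, 0.5)  # TensorAccessor: updates batch element 0 only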
to(device: Union[str, torch.device] = 'cpu') → pytorch3d.renderer.utils.TensorProperties[source]

In place operation to move class properties which are tensors to a specified device. If self has a property “device”, update this as well.

cpu() → pytorch3d.renderer.utils.TensorProperties[source]
cuda(device: Optional[int] = None) → pytorch3d.renderer.utils.TensorProperties[source]
clone(other) → pytorch3d.renderer.utils.TensorProperties[source]

Update the tensor properties of other with the cloned properties of self.

gather_props(batch_idx) → pytorch3d.renderer.utils.TensorProperties[source]

This is an in place operation to reformat all tensor class attributes based on a set of given indices using torch.gather. This is useful when attributes which are batched tensors e.g. shape (N, 3) need to be multiplied with another tensor which has a different first dimension e.g. packed vertices of shape (V, 3).

Example

self.specular_color = (N, 3) tensor of specular colors for each mesh

A lighting calculation may use

verts_packed = meshes.verts_packed()  # (V, 3)

To multiply these two tensors the batch dimension needs to be the same. To achieve this we can do

batch_idx = meshes.verts_packed_to_mesh_idx()  # (V)

This gives index of the mesh for each vertex in verts_packed.

self.gather_props(batch_idx)
# self.specular_color is now a (V, 3) tensor with the specular color
# for each packed vertex.

torch.gather requires the index tensor to have the same shape as the input tensor so this method takes care of the reshaping of the index tensor to use with class attributes with arbitrary dimensions.

Parameters:batch_idx – shape (B, …) where … represents an arbitrary number of dimensions
Returns:self with all properties reshaped. e.g. a property with shape (N, 3) is transformed to shape (B, 3).
pytorch3d.renderer.utils.format_tensor(input, dtype: torch.dtype = torch.float32, device: Union[str, torch.device] = 'cpu') → torch.Tensor[source]

Helper function for converting a scalar value to a tensor.

Parameters:
  • input – Python scalar, Python list/tuple, torch scalar, 1D torch tensor
  • dtype – data type for the input
  • device – Device (as str or torch.device) on which the tensor should be placed.
Returns:

input_vec – torch tensor with optional added batch dimension.

pytorch3d.renderer.utils.convert_to_tensors_and_broadcast(*args, dtype: torch.dtype = torch.float32, device: Union[str, torch.device] = 'cpu')[source]

Helper function to handle parsing an arbitrary number of inputs (*args) which all need to have the same batch dimension. The output is a list of tensors.

Parameters:
  • *args

    an arbitrary number of inputs Each of the values in args can be one of the following

    • Python scalar
    • Torch scalar
    • Torch tensor of shape (N, K_i) or (1, K_i) where K_i are an arbitrary number of dimensions which can vary for each value in args. In this case each input is broadcast to a tensor of shape (N, K_i)
  • dtype – data type to use when creating new tensors.
  • device – torch device on which the tensors should be placed.
Output:
args: A list of tensors of shape (N, K_i)

pytorch3d.transforms

pytorch3d.transforms.acos_linear_extrapolation(x: torch.Tensor, bounds: Tuple[float, float] = (-0.9999, 0.9999)) → torch.Tensor[source]

Implements arccos(x) which is linearly extrapolated outside x’s original domain of (-1, 1). This allows for stable backpropagation in case x is not guaranteed to be strictly within (-1, 1).

More specifically:

bounds = (lower_bound, upper_bound)
if lower_bound <= x <= upper_bound:
    acos_linear_extrapolation(x) = acos(x)
elif x <= lower_bound:  # 1st order Taylor approximation
    acos_linear_extrapolation(x) = acos(lower_bound) + dacos/dx(lower_bound) * (x - lower_bound)
else:  # x >= upper_bound
    acos_linear_extrapolation(x) = acos(upper_bound) + dacos/dx(upper_bound) * (x - upper_bound)

Parameters:
  • x – Input Tensor.
  • bounds – A float 2-tuple defining the region for the linear extrapolation of acos. The first/second element of bounds describes the lower/upper bound that defines the lower/upper extrapolation region, i.e. the regions where x <= bounds[0] and bounds[1] <= x. Note that both elements of bounds have to be within (-1, 1).
Returns:

acos_linear_extrapolation – Tensor containing the extrapolated arccos(x).

pytorch3d.transforms.axis_angle_to_matrix(axis_angle)[source]

Convert rotations given as axis/angle to rotation matrices.

Parameters:axis_angle – Rotations given as a vector in axis angle form, as a tensor of shape (…, 3), where the magnitude is the angle turned anticlockwise in radians around the vector’s direction.
Returns:Rotation matrices as tensor of shape (…, 3, 3).
pytorch3d.transforms.axis_angle_to_quaternion(axis_angle)[source]

Convert rotations given as axis/angle to quaternions.

Parameters:axis_angle – Rotations given as a vector in axis angle form, as a tensor of shape (…, 3), where the magnitude is the angle turned anticlockwise in radians around the vector’s direction.
Returns:quaternions with real part first, as tensor of shape (…, 4).
pytorch3d.transforms.euler_angles_to_matrix(euler_angles, convention: str)[source]

Convert rotations given as Euler angles in radians to rotation matrices.

Parameters:
  • euler_angles – Euler angles in radians as tensor of shape (…, 3).
  • convention – Convention string of three uppercase letters from {“X”, “Y”, and “Z”}.
Returns:

Rotation matrices as tensor of shape (…, 3, 3).

pytorch3d.transforms.matrix_to_euler_angles(matrix, convention: str)[source]

Convert rotations given as rotation matrices to Euler angles in radians.

Parameters:
  • matrix – Rotation matrices as tensor of shape (…, 3, 3).
  • convention – Convention string of three uppercase letters.
Returns:

Euler angles in radians as tensor of shape (…, 3).

pytorch3d.transforms.matrix_to_quaternion(matrix: torch.Tensor) → torch.Tensor[source]

Convert rotations given as rotation matrices to quaternions.

Parameters:matrix – Rotation matrices as tensor of shape (…, 3, 3).
Returns:quaternions with real part first, as tensor of shape (…, 4).
pytorch3d.transforms.matrix_to_rotation_6d(matrix: torch.Tensor) → torch.Tensor[source]

Converts rotation matrices to 6D rotation representation by Zhou et al. [1] by dropping the last row. Note that the 6D representation is not unique.

Parameters:matrix – batch of rotation matrices of size (*, 3, 3)
Returns:6D rotation representation, of size (*, 6)

[1] Zhou, Y., Barnes, C., Lu, J., Yang, J., & Li, H. On the Continuity of Rotation Representations in Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition, 2019. Retrieved from http://arxiv.org/abs/1812.07035

pytorch3d.transforms.quaternion_apply(quaternion, point)[source]

Apply the rotation given by a quaternion to a 3D point. Usual torch rules for broadcasting apply.

Parameters:
  • quaternion – Tensor of quaternions, real part first, of shape (…, 4).
  • point – Tensor of 3D points of shape (…, 3).
Returns:

Tensor of rotated points of shape (…, 3).

pytorch3d.transforms.quaternion_invert(quaternion)[source]

Given a quaternion representing rotation, get the quaternion representing its inverse.

Parameters:quaternion – Quaternions as tensor of shape (…, 4), with real part first, which must be versors (unit quaternions).
Returns:The inverse, a tensor of quaternions of shape (…, 4).
pytorch3d.transforms.quaternion_multiply(a, b)[source]

Multiply two quaternions representing rotations, returning the quaternion representing their composition, i.e. the versor with nonnegative real part. Usual torch rules for broadcasting apply.

Parameters:
  • a – Quaternions as tensor of shape (…, 4), real part first.
  • b – Quaternions as tensor of shape (…, 4), real part first.
Returns:

The product of a and b, a tensor of quaternions of shape (…, 4).

pytorch3d.transforms.quaternion_raw_multiply(a, b)[source]

Multiply two quaternions. Usual torch rules for broadcasting apply.

Parameters:
  • a – Quaternions as tensor of shape (…, 4), real part first.
  • b – Quaternions as tensor of shape (…, 4), real part first.
Returns:

The product of a and b, a tensor of quaternions shape (…, 4).

pytorch3d.transforms.quaternion_to_axis_angle(quaternions)[source]

Convert rotations given as quaternions to axis/angle.

Parameters:quaternions – quaternions with real part first, as tensor of shape (…, 4).
Returns:Rotations given as a vector in axis angle form, as a tensor of shape (…, 3), where the magnitude is the angle turned anticlockwise in radians around the vector’s direction.
pytorch3d.transforms.quaternion_to_matrix(quaternions)[source]

Convert rotations given as quaternions to rotation matrices.

Parameters:quaternions – quaternions with real part first, as tensor of shape (…, 4).
Returns:Rotation matrices as tensor of shape (…, 3, 3).
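
These conversion utilities can be chained; a small round-trip sketch (the rotation value is arbitrary):

import math
import torch
from pytorch3d.transforms import (
    axis_angle_to_matrix, matrix_to_quaternion, quaternion_to_axis_angle,
)

axis_angle = torch.tensor([[0.0, 0.0, math.pi / 2]])  # 90 degrees about +Z
R = axis_angle_to_matrix(axis_angle)                  # (1, 3, 3)
quat = matrix_to_quaternion(R)                        # (1, 4), real part first
recovered = quaternion_to_axis_angle(quat)            # approximately equal to axis_angle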
pytorch3d.transforms.random_quaternions(n: int, dtype: Optional[torch.dtype] = None, device: Union[str, torch.device, None] = None)[source]

Generate random quaternions representing rotations, i.e. versors with nonnegative real part.

Parameters:
  • n – Number of quaternions in a batch to return.
  • dtype – Type to return.
  • device – Desired device of returned tensor. Default: uses the current device for the default tensor type.
Returns:

Quaternions as tensor of shape (N, 4).

pytorch3d.transforms.random_rotation(dtype: Optional[torch.dtype] = None, device: Union[str, torch.device, None] = None)[source]

Generate a single random 3x3 rotation matrix.

Parameters:
  • dtype – Type to return
  • device – Device of returned tensor. Default: if None, uses the current device for the default tensor type
Returns:

Rotation matrix as tensor of shape (3, 3).

pytorch3d.transforms.random_rotations(n: int, dtype: Optional[torch.dtype] = None, device: Union[str, torch.device, None] = None)[source]

Generate random rotations as 3x3 rotation matrices.

Parameters:
  • n – Number of rotation matrices in a batch to return.
  • dtype – Type to return.
  • device – Device of returned tensor. Default: if None, uses the current device for the default tensor type.
Returns:

Rotation matrices as tensor of shape (n, 3, 3).

pytorch3d.transforms.rotation_6d_to_matrix(d6: torch.Tensor) → torch.Tensor[source]

Converts 6D rotation representation by Zhou et al. [1] to a rotation matrix using Gram–Schmidt orthogonalization per Section B of [1].

Parameters:d6 – 6D rotation representation, of size (*, 6)
Returns:batch of rotation matrices of size (*, 3, 3)

[1] Zhou, Y., Barnes, C., Lu, J., Yang, J., & Li, H. On the Continuity of Rotation Representations in Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition, 2019. Retrieved from http://arxiv.org/abs/1812.07035

pytorch3d.transforms.standardize_quaternion(quaternions)[source]

Convert a unit quaternion to a standard form: one in which the real part is non negative.

Parameters:quaternions – Quaternions with real part first, as tensor of shape (…, 4).
Returns:Standardized quaternions as tensor of shape (…, 4).
pytorch3d.transforms.se3_exp_map(log_transform: torch.Tensor, eps: float = 0.0001) → torch.Tensor[source]

Convert a batch of logarithmic representations of SE(3) matrices log_transform to a batch of 4x4 SE(3) matrices using the exponential map. See e.g. [1], Sec 9.4.2. for more detailed description.

A SE(3) matrix has the following form:

    [ R 0 ]
    [ T 1 ]

where R is a 3x3 rotation matrix and T is a 3-D translation vector. SE(3) matrices are commonly used to represent rigid motions or camera extrinsics.

In the SE(3) logarithmic representation SE(3) matrices are represented as 6-dimensional vectors [log_translation | log_rotation], i.e. a concatenation of two 3D vectors log_translation and log_rotation.

The conversion from the 6D representation to a 4x4 SE(3) matrix transform is done as follows:

    transform = exp( [ hat(log_rotation) 0 ]
                     [ log_translation  1 ] )

where exp is the matrix exponential and hat is the Hat operator [2].

Note that for any log_transform with 0 <= ||log_rotation|| < 2pi (i.e. the rotation angle is between 0 and 2pi), the following identity holds: se3_log_map(se3_exp_map(log_transform)) == log_transform

The conversion has a singularity around ||log(transform)|| = 0 which is handled by clamping controlled with the eps argument.

Parameters:
  • log_transform – Batch of vectors of shape (minibatch, 6).
  • eps – A threshold for clipping the squared norm of the rotation logarithm to avoid unstable gradients in the singular case.
Returns:

Batch of transformation matrices of shape (minibatch, 4, 4).

Raises:

ValueError if log_transform is of incorrect shape.

[1] https://jinyongjeong.github.io/Download/SE3/jlblanco2010geometry3d_techrep.pdf [2] https://en.wikipedia.org/wiki/Hat_operator

pytorch3d.transforms.se3_log_map(transform: torch.Tensor, eps: float = 0.0001, cos_bound: float = 0.0001) → torch.Tensor[source]

Convert a batch of 4x4 transformation matrices transform to a batch of 6-dimensional SE(3) logarithms of the SE(3) matrices. See e.g. [1], Sec 9.4.2. for more detailed description.

A SE(3) matrix has the following form:

    [ R 0 ]
    [ T 1 ]

where R is an orthonormal 3x3 rotation matrix and T is a 3-D translation vector. SE(3) matrices are commonly used to represent rigid motions or camera extrinsics.

In the SE(3) logarithmic representation SE(3) matrices are represented as 6-dimensional vectors [log_translation | log_rotation], i.e. a concatenation of two 3D vectors log_translation and log_rotation.

The conversion from the 4x4 SE(3) matrix transform to the 6D representation log_transform = [log_translation | log_rotation] is done as follows:

    log_transform = log(transform)
    log_translation = log_transform[3, :3]
    log_rotation = inv_hat(log_transform[:3, :3])

where log is the matrix logarithm and inv_hat is the inverse of the Hat operator [2].

Note that for any valid 4x4 transform matrix, the following identity holds: se3_exp_map(se3_log_map(transform)) == transform

The conversion has a singularity around (transform=I) which is handled by clamping controlled with the eps and cos_bound arguments.

Parameters:
  • transform – batch of SE(3) matrices of shape (minibatch, 4, 4).
  • eps – A threshold for clipping the squared norm of the rotation logarithm to avoid division by zero in the singular case.
  • cos_bound – Clamps the cosine of the rotation angle to [-1 + cos_bound, 1 - cos_bound] to avoid non-finite outputs. The non-finite outputs can be caused by passing small rotation angles to the acos function used in so3_rotation_angle within so3_log_map.
Returns:

Batch of logarithms of input SE(3) matrices of shape (minibatch, 6).

Raises:
  • ValueError if transform is of incorrect shape.
  • ValueError if R has an unexpected trace.

[1] https://jinyongjeong.github.io/Download/SE3/jlblanco2010geometry3d_techrep.pdf [2] https://en.wikipedia.org/wiki/Hat_operator
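
A quick round-trip sketch using the identity above (the batch size and the scale of the random logarithms are arbitrary):

import torch
from pytorch3d.transforms import se3_exp_map, se3_log_map

log_transform = 0.1 * torch.randn(4, 6)  # (minibatch, 6) = [log_translation | log_rotation]
T = se3_exp_map(log_transform)           # (4, 4, 4) SE(3) matrices
recovered = se3_log_map(T)               # approximately equal to log_transform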

pytorch3d.transforms.so3_exp_map(log_rot: torch.Tensor, eps: float = 0.0001) → torch.Tensor[source]

Convert a batch of logarithmic representations of rotation matrices log_rot to a batch of 3x3 rotation matrices using Rodrigues formula [1].

In the logarithmic representation, each rotation matrix is represented as a 3-dimensional vector (log_rot) whose l2-norm and direction correspond to the magnitude of the rotation angle and the axis of rotation respectively.

The conversion has a singularity around log(R) = 0 which is handled by clamping controlled with the eps argument.

Parameters:
  • log_rot – Batch of vectors of shape (minibatch, 3).
  • eps – A float constant handling the conversion singularity.
Returns:

Batch of rotation matrices of shape (minibatch, 3, 3).

Raises:

ValueError if log_rot is of incorrect shape.

[1] https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula

pytorch3d.transforms.so3_exponential_map(log_rot: torch.Tensor, eps: float = 0.0001) → torch.Tensor[source]
pytorch3d.transforms.so3_log_map(R: torch.Tensor, eps: float = 0.0001, cos_bound: float = 0.0001) → torch.Tensor[source]

Convert a batch of 3x3 rotation matrices R to a batch of 3-dimensional matrix logarithms of rotation matrices. The conversion has a singularity around (R=I) which is handled by clamping controlled with the eps and cos_bound arguments.

Parameters:
  • R – batch of rotation matrices of shape (minibatch, 3, 3).
  • eps – A float constant handling the conversion singularity.
  • cos_bound – Clamps the cosine of the rotation angle to [-1 + cos_bound, 1 - cos_bound] to avoid non-finite outputs/gradients of the acos call when computing so3_rotation_angle. Note that the non-finite outputs/gradients are returned when the rotation angle is close to 0 or π.
Returns:

Batch of logarithms of input rotation matrices of shape (minibatch, 3).

Raises:
  • ValueError if R is of incorrect shape.
  • ValueError if R has an unexpected trace.
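
Similarly for SO(3), a round-trip sketch (batch size and scale are arbitrary):

import torch
from pytorch3d.transforms import so3_exp_map, so3_log_map, so3_rotation_angle

log_rot = 0.5 * torch.randn(8, 3)   # (minibatch, 3) rotation logarithms
R = so3_exp_map(log_rot)            # (8, 3, 3) rotation matrices
angles = so3_rotation_angle(R)      # (8,) rotation angles in radians
log_rot_back = so3_log_map(R)       # approximately equal to log_rot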
pytorch3d.transforms.so3_relative_angle(R1: torch.Tensor, R2: torch.Tensor, cos_angle: bool = False, cos_bound: float = 0.0001, eps: float = 0.0001) → torch.Tensor[source]

Calculates the relative angle (in radians) between pairs of rotation matrices R1 and R2 with angle = acos(0.5 * (Trace(R1 R2^T)-1))

Note

This corresponds to a geodesic distance on the 3D manifold of rotation matrices.

Parameters:
  • R1 – Batch of rotation matrices of shape (minibatch, 3, 3).
  • R2 – Batch of rotation matrices of shape (minibatch, 3, 3).
  • cos_angle – If True, return the cosine of the relative angle rather than the angle itself. This can avoid the unstable calculation of acos.
  • cos_bound – Clamps the cosine of the relative rotation angle to [-1 + cos_bound, 1 - cos_bound] to avoid non-finite outputs/gradients of the acos call. Note that the non-finite outputs/gradients are returned when the angle is requested (i.e. cos_angle==False) and the rotation angle is close to 0 or π.
  • eps – Tolerance for the valid trace check of the relative rotation matrix in so3_rotation_angle.
Returns:

Corresponding rotation angles of shape (minibatch,). If cos_angle==True, returns the cosine of the angles.

Raises:
  • ValueError if R1 or R2 is of incorrect shape.
  • ValueError if R1 or R2 has an unexpected trace.
pytorch3d.transforms.so3_rotation_angle(R: torch.Tensor, eps: float = 0.0001, cos_angle: bool = False, cos_bound: float = 0.0001) → torch.Tensor[source]

Calculates angles (in radians) of a batch of rotation matrices R with angle = acos(0.5 * (Trace(R)-1)). The trace of the input matrices is checked to be in the valid range [-1-eps,3+eps]. The eps argument is a small constant that allows for small errors caused by limited machine precision.

Parameters:
  • R – Batch of rotation matrices of shape (minibatch, 3, 3).
  • eps – Tolerance for the valid trace check.
  • cos_angle – If True, return the cosine of the rotation angles rather than the angle itself. This can avoid the unstable calculation of acos.
  • cos_bound – Clamps the cosine of the rotation angle to [-1 + cos_bound, 1 - cos_bound] to avoid non-finite outputs/gradients of the acos call. Note that the non-finite outputs/gradients are returned when the angle is requested (i.e. cos_angle==False) and the rotation angle is close to 0 or π.
Returns:

Corresponding rotation angles of shape (minibatch,). If cos_angle==True, returns the cosine of the angles.

Raises:
  • ValueError if R is of incorrect shape.
  • ValueError if R has an unexpected trace.
class pytorch3d.transforms.Rotate(R: torch.Tensor, dtype: torch.dtype = torch.float32, device: Union[str, torch.device, None] = None, orthogonal_tol: float = 1e-05)[source]

Bases: pytorch3d.transforms.transform3d.Transform3d

__init__(R: torch.Tensor, dtype: torch.dtype = torch.float32, device: Union[str, torch.device, None] = None, orthogonal_tol: float = 1e-05) → None[source]

Create a new Transform3d representing 3D rotation using a rotation matrix as the input.

Parameters:
  • R – a tensor of shape (3, 3) or (N, 3, 3)
  • orthogonal_tol – tolerance for the test of the orthogonality of R
class pytorch3d.transforms.RotateAxisAngle(angle, axis: str = 'X', degrees: bool = True, dtype: torch.dtype = torch.float64, device: Union[str, torch.device, None] = None)[source]

Bases: pytorch3d.transforms.transform3d.Rotate

__init__(angle, axis: str = 'X', degrees: bool = True, dtype: torch.dtype = torch.float64, device: Union[str, torch.device, None] = None) → None[source]

Create a new Transform3d representing 3D rotation about an axis by an angle.

Assuming a right-hand coordinate system, positive rotation angles result in a counter clockwise rotation.

Parameters:
  • angle
    • A torch tensor of shape (N,)
    • A python scalar
    • A torch scalar
  • axis – string: one of [“X”, “Y”, “Z”] indicating the axis about which to rotate. NOTE: All batch elements are rotated about the same axis.
class pytorch3d.transforms.Scale(x, y=None, z=None, dtype: torch.dtype = torch.float32, device: Union[str, torch.device, None] = None)[source]

Bases: pytorch3d.transforms.transform3d.Transform3d

__init__(x, y=None, z=None, dtype: torch.dtype = torch.float32, device: Union[str, torch.device, None] = None) → None[source]

A Transform3d representing a scaling operation, with different scale factors along each coordinate axis.

Option I: Scale(s, dtype=torch.float32, device=’cpu’)
s can be one of
  • Python scalar or torch scalar: Single uniform scale
  • 1D torch tensor of shape (N,): A batch of uniform scale
  • 2D torch tensor of shape (N, 3): Scale differently along each axis
Option II: Scale(x, y, z, dtype=torch.float32, device=’cpu’)
Each of x, y, and z can be one of
  • python scalar
  • torch scalar
  • 1D torch tensor
class pytorch3d.transforms.Transform3d(dtype: torch.dtype = torch.float32, device: Union[str, torch.device] = 'cpu', matrix: Optional[torch.Tensor] = None)[source]

Bases: object

A Transform3d object encapsulates a batch of N 3D transformations, and knows how to transform points and normal vectors. Suppose that t is a Transform3d; then we can do the following:

N = len(t)
points = torch.randn(N, P, 3)
normals = torch.randn(N, P, 3)
points_transformed = t.transform_points(points)    # => (N, P, 3)
normals_transformed = t.transform_normals(normals)  # => (N, P, 3)

BROADCASTING Transform3d objects support broadcasting. Suppose that t1 and tN are Transform3d objects with len(t1) == 1 and len(tN) == N respectively. Then we can broadcast transforms like this:

t1.transform_points(torch.randn(P, 3))     # => (P, 3)
t1.transform_points(torch.randn(1, P, 3))  # => (1, P, 3)
t1.transform_points(torch.randn(M, P, 3))  # => (M, P, 3)
tN.transform_points(torch.randn(P, 3))     # => (N, P, 3)
tN.transform_points(torch.randn(1, P, 3))  # => (N, P, 3)

COMBINING TRANSFORMS Transform3d objects can be combined in two ways: composing and stacking. Composing is function composition. Given Transform3d objects t1, t2, t3, the following all compute the same thing:

y1 = t3.transform_points(t2.transform_points(t1.transform_points(x)))
y2 = t1.compose(t2).compose(t3).transform_points(x)
y3 = t1.compose(t2, t3).transform_points(x)

Composing transforms should broadcast: if len(t1) == 1 and len(t2) == N, then len(t1.compose(t2)) == N.

We can also stack a sequence of Transform3d objects, which concatenates them along the batch dimension; then the following should compute the same thing.

N, M = len(tN), len(tM)
xN = torch.randn(N, P, 3)
xM = torch.randn(M, P, 3)
y1 = torch.cat([tN.transform_points(xN), tM.transform_points(xM)], dim=0)
y2 = tN.stack(tM).transform_points(torch.cat([xN, xM], dim=0))

BUILDING TRANSFORMS We provide convenience methods for easily building Transform3d objects as compositions of basic transforms.

# Scale by 0.5, then translate by (1, 2, 3)
t1 = Transform3d().scale(0.5).translate(1, 2, 3)

# Scale each axis by a different amount, then translate, then scale
t2 = Transform3d().scale(1, 3, 3).translate(2, 3, 1).scale(2.0)

t3 = t1.compose(t2)
tN = t1.stack(t3, t3)

BACKPROP THROUGH TRANSFORMS When building transforms, we can also parameterize them by Torch tensors; in this case we can backprop through the construction and application of Transform objects, so they could be learned via gradient descent or predicted by a neural network.

s1_params = torch.randn(N, requires_grad=True)
t_params = torch.randn(N, 3, requires_grad=True)
s2_params = torch.randn(N, 3, requires_grad=True)

t = Transform3d().scale(s1_params).translate(t_params).scale(s2_params)
x = torch.randn(N, 3)
y = t.transform_points(x)
loss = compute_loss(y)
loss.backward()

with torch.no_grad():
    s1_params -= lr * s1_params.grad
    t_params -= lr * t_params.grad
    s2_params -= lr * s2_params.grad

CONVENTIONS We adopt a right-hand coordinate system, meaning that rotation about an axis with a positive angle results in a counter clockwise rotation.

This class assumes that transformations are applied on inputs which are row vectors. The internal representation of the Nx4x4 transformation matrix is of the form:

M = [
        [Rxx, Ryx, Rzx, 0],
        [Rxy, Ryy, Rzy, 0],
        [Rxz, Ryz, Rzz, 0],
        [Tx,  Ty,  Tz,  1],
    ]

To apply the transformation to points which are row vectors, the M matrix can be pre multiplied by the points:

points = [[0, 1, 2]]  # (1 x 3) xyz coordinates of a point
transformed_points = points * M
__init__(dtype: torch.dtype = torch.float32, device: Union[str, torch.device] = 'cpu', matrix: Optional[torch.Tensor] = None) → None[source]
Parameters:
  • dtype – The data type of the transformation matrix, to be used if matrix = None.
  • device – The device for storing the implemented transformation. If matrix != None, uses the device of input matrix.
  • matrix – A tensor of shape (4, 4) or of shape (minibatch, 4, 4) representing the 4x4 3D transformation matrix. If None, initializes with identity using the specified device and dtype.
__getitem__(index: Union[int, List[int], slice, torch.Tensor]) → pytorch3d.transforms.transform3d.Transform3d[source]
Parameters:index – Specifying the index of the transform to retrieve. Can be an int, slice, list of ints, boolean, long tensor. Supports negative indices.
Returns:Transform3d object with selected transforms. The tensors are not cloned.
compose(*others)[source]

Return a new Transform3d with the transforms to compose stored as an internal list.

Parameters:*others – Any number of Transform3d objects
Returns:A new Transform3d with the stored transforms
get_matrix()[source]

Return a matrix which is the result of composing this transform with others stored in self.transforms. Where necessary transforms are broadcast against each other. For example, if self.transforms contains transforms t1, t2, and t3, and given a set of points x, the following should be true:

t123 = t1.compose(t2, t3)
y1 = t123.transform_points(x)
y2 = t3.transform_points(t2.transform_points(t1.transform_points(x)))
# y1 and y2 are equal, and t123.get_matrix() is the single matrix
# representing the whole composition.
Returns:A transformation matrix representing the composed inputs.
inverse(invert_composed: bool = False)[source]

Returns a new Transform3D object that represents an inverse of the current transformation.

Parameters:invert_composed
  • True: First compose the list of stored transformations and then apply inverse to the result. This is potentially slower for classes of transformations with inverses that can be computed efficiently (e.g. rotations and translations).
  • False: Invert the individual stored transformations independently without composing them.
Returns:A new Transform3D object containing the inverse of the original transformation.
stack(*others)[source]
transform_points(points, eps: Optional[float] = None)[source]

Use this transform to transform a set of 3D points. Assumes row major ordering of the input points.

Parameters:
  • points – Tensor of shape (P, 3) or (N, P, 3)
  • eps – If eps!=None, the argument is used to clamp the last coordinate before performing the final division. The clamping corresponds to: last_coord := (last_coord.sign() + (last_coord==0)) * torch.clamp(last_coord.abs(), eps), i.e. the last coordinates that are exactly 0 will be clamped to +eps.
Returns:

points_out – points of shape (N, P, 3) or (P, 3) depending on the dimensions of the transform

transform_normals(normals)[source]

Use this transform to transform a set of normal vectors.

Parameters:normals – Tensor of shape (P, 3) or (N, P, 3)
Returns:normals_out – Tensor of shape (P, 3) or (N, P, 3) depending on the dimensions of the transform
translate(*args, **kwargs)[source]
scale(*args, **kwargs)[source]
rotate(*args, **kwargs)[source]
rotate_axis_angle(*args, **kwargs)[source]
clone()[source]

Deep copy of Transforms object. All internal tensors are cloned individually.

Returns:new Transforms object.
to(device: Union[str, torch.device], copy: bool = False, dtype: Optional[torch.dtype] = None)[source]

Match functionality of torch.Tensor.to(). If copy = True or the self Tensor is on a different device, the returned tensor is a copy of self with the desired torch.device. If copy = False and the self Tensor already has the correct torch.device, then self is returned.

Parameters:
  • device – Device (as str or torch.device) for the new tensor.
  • copy – Boolean indicator whether or not to clone self. Default False.
  • dtype – If not None, casts the internal tensor variables to a given torch.dtype.
Returns:

Transform3d object.

cpu()[source]
cuda()[source]
class pytorch3d.transforms.Translate(x, y=None, z=None, dtype: torch.dtype = torch.float32, device: Union[str, torch.device, None] = None)[source]

Bases: pytorch3d.transforms.transform3d.Transform3d

__init__(x, y=None, z=None, dtype: torch.dtype = torch.float32, device: Union[str, torch.device, None] = None) → None[source]

Create a new Transform3d representing 3D translations.

Option I: Translate(xyz, dtype=torch.float32, device=’cpu’)
xyz should be a tensor of shape (N, 3)
Option II: Translate(x, y, z, dtype=torch.float32, device=’cpu’)

Here x, y, and z will be broadcast against each other and concatenated to form the translation. Each can be:

  • A python scalar
  • A torch scalar
  • A 1D torch tensor
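
Putting the transform classes together, a short composition sketch (the angle, scale factor and translation below are arbitrary example values; the dtype is passed explicitly so all component transforms share it):

import torch
from pytorch3d.transforms import RotateAxisAngle, Scale, Translate

t = (
    RotateAxisAngle(angle=45.0, axis="Y", dtype=torch.float32)
    .compose(Scale(2.0))
    .compose(Translate(1.0, 0.0, 0.0))
)
points = torch.randn(2, 10, 3)
new_points = t.transform_points(points)  # (2, 10, 3)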

pytorch3d.utils

pytorch3d.utils.cameras_from_opencv_projection(R: torch.Tensor, tvec: torch.Tensor, camera_matrix: torch.Tensor, image_size: torch.Tensor) → pytorch3d.renderer.cameras.PerspectiveCameras[source]

Converts a batch of OpenCV-conventioned cameras parametrized with the rotation matrices R, translation vectors tvec, and the camera calibration matrices camera_matrix to PerspectiveCameras in PyTorch3D convention.

More specifically, the conversion is carried out such that a projection of a 3D shape to the OpenCV-conventioned screen of size image_size results in the same image as a projection with the corresponding PyTorch3D camera to the NDC screen convention of PyTorch3D.

More specifically, the OpenCV convention projects points to the OpenCV screen space as follows:

x_screen_opencv = camera_matrix @ (R @ x_world + tvec)

followed by the homogenization of x_screen_opencv.

Note

The parameters R, tvec, camera_matrix correspond to the outputs of cv2.decomposeProjectionMatrix.

The rvec parameter of the cv2.projectPoints is an axis-angle vector that can be converted to the rotation matrix R expected here by calling the so3_exp_map function.

Parameters:
  • R – A batch of rotation matrices of shape (N, 3, 3).
  • tvec – A batch of translation vectors of shape (N, 3).
  • camera_matrix – A batch of camera calibration matrices of shape (N, 3, 3).
  • image_size – A tensor of shape (N, 2) containing the sizes of the images (height, width) attached to each camera.
Returns:

cameras_pytorch3d – A batch of N cameras in the PyTorch3D convention.
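
A minimal usage sketch (the intrinsics, image size and batch size below are arbitrary example values):

import torch
from pytorch3d.utils import cameras_from_opencv_projection

N = 2
R = torch.eye(3)[None].repeat(N, 1, 1)            # (N, 3, 3)
tvec = torch.zeros(N, 3)                          # (N, 3)
camera_matrix = torch.tensor(
    [[500.0, 0.0, 128.0],
     [0.0, 500.0, 128.0],
     [0.0, 0.0, 1.0]]
)[None].repeat(N, 1, 1)                           # (N, 3, 3)
image_size = torch.tensor([[256, 256]] * N)       # (N, 2) as (height, width)
cameras = cameras_from_opencv_projection(R, tvec, camera_matrix, image_size)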

pytorch3d.utils.opencv_from_cameras_projection(cameras: pytorch3d.renderer.cameras.PerspectiveCameras, image_size: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Converts a batch of PerspectiveCameras into OpenCV-convention rotation matrices R, translation vectors tvec, and the camera calibration matrices camera_matrix. This operation is exactly the inverse of cameras_from_opencv_projection.

Note

The outputs R, tvec, camera_matrix correspond to the outputs of cv2.decomposeProjectionMatrix.

The rvec parameter of the cv2.projectPoints is an axis-angle vector that can be converted from the returned rotation matrix R here by calling the so3_log_map function.

Parameters:
  • cameras – A batch of N cameras in the PyTorch3D convention.
  • image_size – A tensor of shape (N, 2) containing the sizes of the images (height, width) attached to each camera.
  • return_as_rotmat (bool) – If set to True, return the full 3x3 rotation matrices. Otherwise, return an axis-angle vector (default).
Returns:

R – A batch of rotation matrices of shape (N, 3, 3).
tvec – A batch of translation vectors of shape (N, 3).
camera_matrix – A batch of camera calibration matrices of shape (N, 3, 3).

pytorch3d.utils.pulsar_from_cameras_projection(cameras: pytorch3d.renderer.cameras.PerspectiveCameras, image_size: torch.Tensor) → torch.Tensor[source]

Convert PyTorch3D PerspectiveCameras to Pulsar style camera parameters.

Note

  • Pulsar does NOT support different focal lengths for x and y. For conversion, we use the average of fx and fy.
  • The Pulsar renderer MUST use a left-handed coordinate system for this mapping to work.
  • The resulting image will be vertically flipped - which has to be addressed AFTER rendering by the user.
Parameters:
  • cameras – A batch of N cameras in the PyTorch3D convention.
  • image_size – A tensor of shape (N, 2) containing the sizes of the images (height, width) attached to each camera.
Returns:

cameras_pulsar – A batch of N Pulsar camera vectors of shape (N, 13) in the Pulsar convention (3 translation, 6 rotation, focal_length, sensor_width, c_x, c_y).

pytorch3d.utils.pulsar_from_opencv_projection(R: torch.Tensor, tvec: torch.Tensor, camera_matrix: torch.Tensor, image_size: torch.Tensor, znear: float = 0.1) → torch.Tensor[source]

Convert OpenCV style camera parameters to Pulsar style camera parameters.

Note

  • Pulsar does NOT support different focal lengths for x and y. For conversion, we use the average of fx and fy.
  • The Pulsar renderer MUST use a left-handed coordinate system for this mapping to work.
  • The resulting image will be vertically flipped - which has to be addressed AFTER rendering by the user.
  • The parameters R, tvec, camera_matrix correspond to the outputs of cv2.decomposeProjectionMatrix.
Parameters:
  • R – A batch of rotation matrices of shape (N, 3, 3).
  • tvec – A batch of translation vectors of shape (N, 3).
  • camera_matrix – A batch of camera calibration matrices of shape (N, 3, 3).
  • image_size – A tensor of shape (N, 2) containing the sizes of the images (height, width) attached to each camera.
  • znear (float) – The near clipping value to use for Pulsar.
Returns:

cameras_pulsar – A batch of N Pulsar camera vectors of shape (N, 13) in the Pulsar convention (3 translation, 6 rotation, focal_length, sensor_width, c_x, c_y).

pytorch3d.utils.ico_sphere(level: int = 0, device=None)[source]

Create verts and faces for a unit ico-sphere, with all faces oriented consistently.

Parameters:
  • level – integer specifying the number of iterations for subdivision of the mesh faces. Each additional level will result in four new faces per face.
  • device – A torch.device object on which the outputs will be allocated.
Returns:

Meshes object with verts and faces.

pytorch3d.utils.torus(r: float, R: float, sides: int, rings: int, device: Optional[torch.device] = None) → pytorch3d.structures.meshes.Meshes[source]

Create vertices and faces for a torus.

Parameters:
  • r – Inner radius of the torus.
  • R – Outer radius of the torus.
  • sides – Number of inner divisions.
  • rings – Number of outer divisions.
  • device – Device on which the outputs will be allocated.
Returns:

Meshes object with the generated vertices and faces.
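
For example (the subdivision level and radii are arbitrary):

from pytorch3d.utils import ico_sphere, torus

sphere = ico_sphere(level=2)                       # Meshes object containing one unit sphere
donut = torus(r=0.5, R=1.5, sides=16, rings=32)    # Meshes object containing one torus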

pytorch3d.datasets

Dataset loaders for datasets including ShapeNetCore.

class pytorch3d.datasets.R2N2(split: str, shapenet_dir, r2n2_dir, splits_file, return_all_views: bool = True, return_voxels: bool = False, views_rel_path: str = 'ShapeNetRendering', voxels_rel_path: str = 'ShapeNetVoxels', load_textures: bool = True, texture_resolution: int = 4)[source]

Bases: pytorch3d.datasets.shapenet_base.ShapeNetBase

This class loads the R2N2 dataset from a given directory into a Dataset object. The R2N2 dataset contains 13 categories that are a subset of the ShapeNetCore v.1 dataset. The R2N2 dataset also contains its own 24 renderings of each object and voxelized models. Most of the models have all 24 views in the same split, but there are eight of them that divide their views between train and test splits.

__init__(split: str, shapenet_dir, r2n2_dir, splits_file, return_all_views: bool = True, return_voxels: bool = False, views_rel_path: str = 'ShapeNetRendering', voxels_rel_path: str = 'ShapeNetVoxels', load_textures: bool = True, texture_resolution: int = 4) → None[source]

Store each object’s synset id and model id from the given directories.

Parameters:
  • split (str) – One of (train, val, test).
  • shapenet_dir (path) – Path to ShapeNet core v1.
  • r2n2_dir (path) – Path to the R2N2 dataset.
  • splits_file (path) – File containing the train/val/test splits.
  • return_all_views (bool) – Indicator of whether or not to load all the views in the split. If set to False, one of the views in the split will be randomly selected and loaded.
  • return_voxels (bool) – Indicator of whether or not to return voxels as a tensor of shape (D, D, D) where D is the number of voxels along each dimension.
  • views_rel_path – path to the rendered views within r2n2_dir. If not specified, the renderings are assumed to be at os.path.join(r2n2_dir, “ShapeNetRendering”).
  • voxels_rel_path – path to the voxels within r2n2_dir. If not specified, the voxels are assumed to be at os.path.join(r2n2_dir, “ShapeNetVoxels”).
  • load_textures – Boolean indicating whether textures should be loaded for the model. Textures will be of type TexturesAtlas i.e. a texture map per face.
  • texture_resolution – Int specifying the resolution of the texture map per face created using the textures in the obj file. A (texture_resolution, texture_resolution, 3) map is created per face.
__getitem__(model_idx, view_idxs: Optional[List[int]] = None) → Dict[KT, VT][source]

Read a model by the given index.

Parameters:
  • model_idx – The idx of the model to be retrieved in the dataset.
  • view_idxs – List of indices of the views to be returned. Each index needs to be contained in the loaded split (always between 0 and 23, inclusive). If an invalid index is supplied, view_idxs will be ignored and all the loaded views will be returned.
Returns:

dictionary with following keys

  • verts: FloatTensor of shape (V, 3).
  • faces: faces.verts_idx, LongTensor of shape (F, 3).
  • synset_id (str): synset id.
  • model_id (str): model id.
  • label (str): synset label.
  • images: FloatTensor of shape (V, H, W, C), where V is number of views
    returned. Returns a batch of the renderings of the models from the R2N2 dataset.
  • R: Rotation matrix of shape (V, 3, 3), where V is number of views returned.
  • T: Translation matrix of shape (V, 3), where V is number of views returned.
  • K: Intrinsic matrix of shape (V, 4, 4), where V is number of views returned.
  • voxels: Voxels of shape (D, D, D), where D is the number of voxels along each
    dimension.

render(model_ids: Optional[List[str]] = None, categories: Optional[List[str]] = None, sample_nums: Optional[List[int]] = None, idxs: Optional[List[int]] = None, view_idxs: Optional[List[int]] = None, shader_type=<class 'pytorch3d.renderer.mesh.shader.HardPhongShader'>, device: Union[str, torch.device] = 'cpu', **kwargs) → torch.Tensor[source]

Render models with BlenderCamera by default to achieve the same orientations as the R2N2 renderings. Also accepts other types of cameras and any of the args that the render function in the ShapeNetBase class accepts.

Parameters:
  • view_idxs – each model will be rendered with the orientation(s) of the specified views. Only render by view_idxs if no camera or args for BlenderCamera is supplied.
  • Accepts any of the args of the render function in ShapeNetBase.
  • model_ids – List[str] of model_ids of models intended to be rendered.
  • categories – List[str] of categories intended to be rendered. categories and sample_nums must be specified at the same time. categories can be given in the form of synset offsets or labels, or a combination of both.
  • sample_nums – List[int] of number of models to be randomly sampled from each category. Could also contain one single integer, in which case it will be broadcasted for every category.
  • idxs – List[int] of indices of models to be rendered in the dataset.
  • shader_type – Shader to use for rendering. Examples include HardPhongShader (default), SoftPhongShader etc, or any other type of valid Shader class.
  • device – Device (as str or torch.device) on which the tensors should be located.
  • **kwargs – Accepts any of the kwargs that the renderer supports and any of the args that BlenderCamera supports.
Returns:

Batch of rendered images of shape (N, H, W, 3).

class pytorch3d.datasets.BlenderCamera(R=..., T=..., K=..., device: Union[str, torch.device] = 'cpu')[source]

Bases: pytorch3d.renderer.cameras.CamerasBase

Camera for rendering objects with calibration matrices from the R2N2 dataset (which uses Blender for rendering the views for each model).

__init__(R=..., T=..., K=..., device: Union[str, torch.device] = 'cpu') → None[source]
Parameters:
  • R – Rotation matrix of shape (N, 3, 3).
  • T – Translation matrix of shape (N, 3).
  • K – Intrinsic matrix of shape (N, 4, 4).
  • device – Device (as str or torch.device).
get_projection_transform(**kwargs) → pytorch3d.transforms.transform3d.Transform3d[source]
pytorch3d.datasets.collate_batched_R2N2(batch: List[Dict[KT, VT]])[source]

Take a list of objects in the form of dictionaries and merge them into a single dictionary. This function can be used with a Dataset object to create a torch.utils.data.DataLoader which directly returns Meshes objects. TODO: Add support for textures.

Parameters:batch – List of dictionaries containing information about objects in the dataset.
Returns:collated_dict – Dictionary of collated lists. If batch contains both verts and faces, a collated mesh batch is also returned.
pytorch3d.datasets.render_cubified_voxels(voxels: torch.Tensor, shader_type=<class 'pytorch3d.renderer.mesh.shader.HardPhongShader'>, device: Union[str, torch.device] = 'cpu', **kwargs)[source]

Use the Cubify operator to convert the input voxels to a mesh and then render that mesh.

Parameters:
  • voxels – FloatTensor of shape (N, D, D, D) where N is the batch size and D is the number of voxels along each dimension.
  • shader_type – Shader to use for rendering. Examples include HardPhongShader (default), SoftPhongShader etc, or any other type of valid Shader class.
  • device – Device (as str or torch.device) on which the tensors should be located.
  • **kwargs – Accepts any of the kwargs that the renderer supports.
Returns:

Batch of rendered images of shape (N, H, W, 3).

class pytorch3d.datasets.ShapeNetCore(data_dir, synsets=None, version: int = 1, load_textures: bool = True, texture_resolution: int = 4)[source]

Bases: pytorch3d.datasets.shapenet_base.ShapeNetBase

This class loads ShapeNetCore from a given directory into a Dataset object. ShapeNetCore is a subset of the ShapeNet dataset and can be downloaded from https://www.shapenet.org/.

__init__(data_dir, synsets=None, version: int = 1, load_textures: bool = True, texture_resolution: int = 4) → None[source]

Store each object’s synset id and model id from data_dir.

Parameters:
  • data_dir – Path to ShapeNetCore data.
  • synsets – List of synset categories to load from ShapeNetCore in the form of synset offsets or labels. A combination of both is also accepted. When no category is specified, all categories in data_dir are loaded.
  • version – (int) version of ShapeNetCore data in data_dir, 1 or 2. Default is set to be 1. Version 1 has 57 categories and version 2 has 55 categories. Note: version 1 has two categories 02858304(boat) and 02992529(cellphone) that are hyponyms of categories 04530566(watercraft) and 04401088(telephone) respectively. You can combine the categories manually if needed. Version 2 doesn’t have 02858304(boat) or 02834778(bicycle) compared to version 1.
  • load_textures – Boolean indicating whether textures should be loaded for the model. Textures will be of type TexturesAtlas i.e. a texture map per face.
  • texture_resolution – Int specifying the resolution of the texture map per face created using the textures in the obj file. A (texture_resolution, texture_resolution, 3) map is created per face.
__getitem__(idx: int) → Dict[KT, VT][source]

Read a model by the given index.

Parameters:idx – The idx of the model to be retrieved in the dataset.
Returns:dictionary with following keys
  • verts: FloatTensor of shape (V, 3).
  • faces: LongTensor of shape (F, 3) which indexes into the verts tensor.
  • synset_id (str): synset id
  • model_id (str): model id
  • label (str): synset label.
pytorch3d.datasets.collate_batched_meshes(batch: List[Dict[KT, VT]])[source]

Take a list of objects in the form of dictionaries and merge them into a single dictionary. This function can be used with a Dataset object to create a torch.utils.data.DataLoader which directly returns Meshes objects. TODO: Add support for textures.

Parameters:batch – List of dictionaries containing information about objects in the dataset.
Returns:collated_dict – Dictionary of collated lists. If batch contains both verts and faces, a collated mesh batch is also returned.
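
A hedged sketch of combining collate_batched_meshes with a DataLoader; SHAPENET_PATH below is a placeholder for a local ShapeNetCore v1 directory:

from torch.utils.data import DataLoader
from pytorch3d.datasets import ShapeNetCore, collate_batched_meshes

SHAPENET_PATH = "/path/to/ShapeNetCore.v1"  # hypothetical local path
dataset = ShapeNetCore(SHAPENET_PATH, version=1)
loader = DataLoader(dataset, batch_size=4, collate_fn=collate_batched_meshes)
batch = next(iter(loader))  # dict of collated lists; also contains a batched Meshes
                            # object when both verts and faces are present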