class pytorch3d.implicitron.models.view_pooler.view_pooler.ViewPooler(*args, **kwargs)

Bases: Configurable, Module

Implements sampling of image-based features at the 2D projections of a set of 3D points, and the subsequent per-point aggregation of the resulting set of features.

  • view_sampler – An instance of ViewSampler which is used for sampling of image-based features at the 2D projections of a set of 3D points.

  • feature_aggregator_class_type – The name of the feature aggregator class which is available in the global registry.

  • feature_aggregator – An instance of a feature aggregator class which inherits from FeatureAggregatorBase. Typically, the features it aggregates, together with their masks, are output by a ViewSampler, which samples feature tensors extracted from a set of source images. The feature aggregator performs the per-point aggregation step described above.

view_sampler: ViewSampler
feature_aggregator_class_type: str = 'AngleWeightedReductionFeatureAggregator'
feature_aggregator: FeatureAggregatorBase
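The two-stage design (sample, then aggregate) can be illustrated without PyTorch3D. The following is a minimal, self-contained sketch, not the library's implementation: it assumes hypothetical translation-only pinhole cameras, single-channel feature maps stored as nested lists, nearest-pixel sampling in place of ViewSampler, and a masked mean in place of the configured feature aggregator. All names in it (project, sample_nearest, toy_view_pool) are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
FeatureMap = List[List[float]]  # single-channel H x W feature map (hypothetical)


def project(pt: Point, t: Point, focal: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a world point into a hypothetical translation-only pinhole camera.

    Returns pixel coordinates (u, v), or None if the point is not in front of the camera.
    """
    x, y, z = pt[0] + t[0], pt[1] + t[1], pt[2] + t[2]
    if z <= 0:
        return None
    return focal * x / z + cx, focal * y / z + cy


def sample_nearest(fmap: FeatureMap, uv: Optional[Tuple[float, float]]) -> Tuple[float, float]:
    """Stand-in for ViewSampler: nearest-pixel lookup returning (feature, mask)."""
    if uv is None:
        return 0.0, 0.0
    col, row = round(uv[0]), round(uv[1])
    if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
        return float(fmap[row][col]), 1.0
    return 0.0, 0.0  # projection falls outside the image: masked out


def toy_view_pool(
    pts: Sequence[Point],
    translations: Sequence[Point],
    fmaps: Sequence[FeatureMap],
    focal: float = 1.0,
    cx: float = 1.0,
    cy: float = 1.0,
) -> List[float]:
    """Sample every point in every camera, then take a masked mean over cameras."""
    pooled = []
    for pt in pts:
        samples = [
            sample_nearest(fmap, project(pt, t, focal, cx, cy))
            for t, fmap in zip(translations, fmaps)
        ]
        weight = sum(m for _, m in samples)
        total = sum(f * m for f, m in samples)
        # Stand-in for the feature aggregator; points seen by no camera get 0.0.
        pooled.append(total / weight if weight > 0 else 0.0)
    return pooled
```

For example, with two 3 x 3 feature maps, the point (0, 0, 1) projects to the centre pixel of both cameras and receives the mean of the two centre values, while a point that projects outside every image, or lies behind every camera, falls back to 0.0.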
get_aggregated_feature_dim(feats: Dict[str, Tensor] | int)

Returns the final dimensionality of the output aggregated features.

feats – Either a dict of sampled features {f_i: t_i} corresponding to the feats_sampled argument of feature_aggregator.forward, or an int representing the sum of the dimensionalities of each t_i.

Returns: The final dimensionality of the output aggregated features.
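The two accepted forms of feats describe the same total input width. The sketch below is a hypothetical helper, not part of PyTorch3D, that normalizes the dict form to the int form; for illustration, each dict value stands for the channel count dim_i of t_i instead of the tensor itself. How that summed width maps to the returned output dimensionality depends on the configured aggregator, so the sketch does not attempt that step.

```python
from typing import Dict, Union


def total_input_dim(feats: Union[Dict[str, int], int]) -> int:
    """Hypothetical helper mirroring the dict-or-int convention of `feats`.

    For illustration, each dict value is the channel count dim_i of t_i
    rather than the tensor t_i itself.
    """
    if isinstance(feats, dict):
        # Dict form: sum the per-feature channel counts.
        return sum(feats.values())
    # Int form: already the summed width.
    return int(feats)
```

The int form is presumably convenient when the output width must be known before any features have been sampled, for example when sizing a downstream layer.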


has_aggregation()

Specifies whether the feature_aggregator reduces the output reduce_dim dimension to 1.

Returns: True if reduce_dim == 1, else False.
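The return value of has_aggregation tells a caller whether the reduce_dim axis of the pooled output is a singleton. The sketch below is a hypothetical consumer, not part of PyTorch3D: it works on nested lists of shape (pts_batch, reduce_dim, n_pts, D), drops the singleton axis when has_aggregation is True, and otherwise averages over reduce_dim, which is an arbitrary illustrative choice.

```python
from typing import List

# Nested-list stand-ins for tensors (hypothetical, for illustration only).
Pooled = List[List[List[List[float]]]]  # (pts_batch, reduce_dim, n_pts, D)
PerPoint = List[List[List[float]]]      # (pts_batch, n_pts, D)


def per_point_features(pooled: Pooled, has_aggregation: bool) -> PerPoint:
    """Collapse the reduce_dim axis of a pooled output into one feature per point."""
    out: PerPoint = []
    for views in pooled:  # views has shape (reduce_dim, n_pts, D)
        if has_aggregation:
            # The aggregator already reduced reduce_dim to 1: drop the singleton axis.
            if len(views) != 1:
                raise ValueError("expected reduce_dim == 1 when has_aggregation is True")
            out.append(views[0])
        else:
            # reduce_dim > 1 (typically n_cameras): illustrative mean over that axis.
            n_views = len(views)
            out.append([
                [sum(v[p][d] for v in views) / n_views for d in range(len(views[0][p]))]
                for p in range(len(views[0]))
            ])
    return out
```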

forward(*, pts: Tensor, seq_id_pts: List[int] | List[str] | LongTensor, camera: CamerasBase, seq_id_camera: List[int] | List[str] | LongTensor, feats: Dict[str, Tensor], masks: Tensor | None, **kwargs) → Tensor | Dict[str, Tensor]

Project each point cloud from a batch of point clouds to corresponding input cameras, sample features at the 2D projection locations in a batch of source images, and aggregate the pointwise sampled features.

  • pts – A tensor of shape [pts_batch x n_pts x 3] in world coords.

  • seq_id_pts – LongTensor of shape [pts_batch] denoting the ids of the scenes from which pts were extracted, or a list of string names.

  • camera – A batch of n_cameras cameras, each corresponding to a batch element of feats.

  • seq_id_camera – LongTensor of shape [n_cameras] denoting the ids of the scenes corresponding to cameras in camera, or a list of string names.

  • feats – a dict of tensors of per-image features {feat_i: T_i}. Each tensor T_i is of shape [n_cameras x dim_i x H_i x W_i].

  • masks – A tensor of shape [n_cameras x 1 x H x W] defining the valid image regions for sampling feats, or None.
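The scene ids in seq_id_pts and seq_id_camera pair each point cloud with the cameras of its own scene. This page does not spell out how the ids are used internally; the sketch below assumes, consistent with their description, that a point cloud should only pool features from cameras of the same scene, and builds the corresponding pts_batch x n_cameras validity mask. seq_id_mask is a hypothetical helper, not a PyTorch3D function, and it covers only the list-of-ids form.

```python
from typing import List, Sequence, Union

SeqIds = Sequence[Union[int, str]]


def seq_id_mask(seq_id_pts: SeqIds, seq_id_camera: SeqIds) -> List[List[float]]:
    """mask[b][c] is 1.0 if point cloud b and camera c come from the same scene, else 0.0."""
    return [
        [1.0 if p == c else 0.0 for c in seq_id_camera]
        for p in seq_id_pts
    ]
```

With LongTensor ids, the same boolean mask can be formed by broadcasting, e.g. seq_id_pts[:, None] == seq_id_camera[None, :].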



Returns: If feature_aggregator.concatenate_output==True, a tensor of shape (pts_batch, reduce_dim, n_pts, sum(dim_1, …, dim_N)) containing the aggregated features. reduce_dim depends on the specific feature aggregator implementation and typically equals 1 or n_cameras. If feature_aggregator.concatenate_output==False, the aggregator does not concatenate the aggregated features and instead returns a dictionary of per-feature aggregations {f_i: t_i_aggregated}, where each t_i_aggregated has shape (pts_batch, reduce_dim, n_pts, aggr_dim_i).
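The shape contract of the return value can be summarized by a small hypothetical helper that is not part of PyTorch3D. It assumes the concatenated width equals the sum of the per-feature aggregated widths aggr_dim_i; the description above writes this width as sum(dim_1, …, dim_N), which agrees whenever aggregation preserves each feature's width.

```python
from typing import Dict, Tuple, Union

Shape = Tuple[int, ...]


def pooled_output_shape(
    pts_batch: int,
    reduce_dim: int,
    n_pts: int,
    aggr_dims: Dict[str, int],
    concatenate_output: bool,
) -> Union[Shape, Dict[str, Shape]]:
    """Expected shape(s) of the pooled output under the documented contract.

    aggr_dims maps each feature name f_i to its aggregated width aggr_dim_i.
    """
    if concatenate_output:
        # One tensor whose last axis concatenates all per-feature aggregations.
        return (pts_batch, reduce_dim, n_pts, sum(aggr_dims.values()))
    # One tensor per feature name.
    return {name: (pts_batch, reduce_dim, n_pts, d) for name, d in aggr_dims.items()}
```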