pytorch3d.implicitron.models.view_pooler.feature_aggregator

class pytorch3d.implicitron.models.view_pooler.feature_aggregator.ReductionFunction(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

AVG = 'avg'
MAX = 'max'
STD = 'std'
STD_AVG = 'std_avg'
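As an illustration of what each reduction computes, here is a plain-Python sketch over one feature channel at one sample point (not the library implementation; STD is shown as a population standard deviation, though the library may use a different normalization, and STD_AVG is omitted because its exact semantics are defined by the implementation):

```python
import statistics

# Hypothetical per-source-view values of one feature channel at one 3D point.
per_view_feature = [0.2, 0.4, 0.6, 0.8]

# Each ReductionFunction collapses the source-view axis to a single number.
avg = statistics.fmean(per_view_feature)   # AVG: mean over source views
mx = max(per_view_feature)                 # MAX: maximum over source views
std = statistics.pstdev(per_view_feature)  # STD: spread over source views
```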
class pytorch3d.implicitron.models.view_pooler.feature_aggregator.FeatureAggregatorBase(*args, **kwargs)[source]

Bases: ABC, ReplaceableBase

Base class for aggregating features.

Typically, the features to be aggregated and their masks are output by ViewSampler, which samples feature tensors extracted from a set of source images.

Settings:

exclude_target_view: If True/False, enables/disables pooling from the target view to itself.

exclude_target_view_mask_features: If True, masks the features from the target view before aggregation.

concatenate_output: If True, concatenates the aggregated features into a single tensor; otherwise returns a dictionary mapping feature names to tensors.

exclude_target_view: bool = True
exclude_target_view_mask_features: bool = True
concatenate_output: bool = True
abstract forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor][source]
Parameters:
  • feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).

  • masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.

  • camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.

  • pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.

Returns:

feats_aggregated: If concatenate_output==True, a tensor of shape (minibatch, reduce_dim, n_samples, sum(dim_1, …, dim_N)) containing the concatenation of the aggregated features feats_sampled. reduce_dim depends on the specific feature aggregator implementation and typically equals 1 or n_source_views. If concatenate_output==False, the aggregator does not concatenate the aggregated features and returns a dictionary of per-feature aggregations {f_i: t_i_aggregated} instead. Each t_i_aggregated is of shape (minibatch, reduce_dim, n_samples, aggr_dim_i).
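To make the shape contract above concrete, here is a shape-only sketch with made-up sizes (no actual aggregation is performed; the feature names and dimensions are hypothetical):

```python
# Hypothetical sizes for illustration.
minibatch, n_source_views, n_samples = 2, 5, 1024
feat_dims = {"level1": 32, "level2": 64}  # dim_i per feature name

# Input shapes, per the forward() contract:
sampled_shapes = {
    f: (minibatch, n_source_views, n_samples, d) for f, d in feat_dims.items()
}
mask_shape = (minibatch, n_source_views, n_samples, 1)

# With concatenate_output=True and an aggregator that collapses the
# source-view axis (reduce_dim == 1), channels concatenate over features:
reduce_dim = 1
out_shape = (minibatch, reduce_dim, n_samples, sum(feat_dims.values()))
```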

abstract get_aggregated_feature_dim(feats_or_feats_dim: Dict[str, Tensor] | int)[source]

Returns the final dimensionality of the output aggregated features.

Parameters:

feats_or_feats_dim – Either a dict of sampled features {f_i: t_i} corresponding to the feats_sampled argument of forward, or an int representing the sum of dimensionalities of each t_i.

Returns:

aggregated_feature_dim: The final dimensionality of the output aggregated features.
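As a sketch of the two accepted argument forms (the concrete return value depends on the aggregator subclass; the helper name and the stub tensor class here are hypothetical, used only to show how a dict of tensors relates to the summed input dimensionality):

```python
class FakeTensor:
    """Minimal stand-in exposing only .shape, for illustration."""
    def __init__(self, *shape):
        self.shape = shape

def summed_input_dim(feats_or_feats_dim):
    # Accept either {f_i: t_i} (channel dim last) or a ready-made int.
    if isinstance(feats_or_feats_dim, int):
        return feats_or_feats_dim
    return sum(t.shape[-1] for t in feats_or_feats_dim.values())

feats = {
    "level1": FakeTensor(2, 5, 1024, 32),
    "level2": FakeTensor(2, 5, 1024, 64),
}
```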

has_aggregation() bool[source]

Specifies whether the aggregator reduces the output reduce_dim dimension to 1.

Returns:

has_aggregation: True if reduce_dim==1, else False.

class pytorch3d.implicitron.models.view_pooler.feature_aggregator.IdentityFeatureAggregator(*args, **kwargs)[source]

Bases: Module, FeatureAggregatorBase

This aggregator does not perform any feature aggregation. Depending on the settings, it can mask the target view features and concatenate the outputs.

get_aggregated_feature_dim(feats_or_feats_dim: Dict[str, Tensor] | int)[source]
forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor][source]
Parameters:
  • feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).

  • masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.

  • camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.

  • pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.

Returns:

feats_aggregated: If concatenate_output==True, a tensor of shape (minibatch, n_source_views, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, n_source_views, n_samples, dim_i).

class pytorch3d.implicitron.models.view_pooler.feature_aggregator.ReductionFeatureAggregator(*args, **kwargs)[source]

Bases: Module, FeatureAggregatorBase

Aggregates features using a set of predefined reduction_functions and concatenates the results of each reduction along the channel dimension. The reduction functions collapse the second dimension of the sampled features, which stacks the source views, to size 1.

Settings:

reduction_functions: A list of ReductionFunctions that reduce the stack of source-view-specific features to a single feature.

reduction_functions: Tuple[ReductionFunction, ...] = (ReductionFunction.AVG, ReductionFunction.STD)
get_aggregated_feature_dim(feats_or_feats_dim: Dict[str, Tensor] | int)[source]
forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor][source]
Parameters:
  • feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).

  • masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.

  • camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.

  • pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.

Returns:

feats_aggregated: If concatenate_output==True, a tensor of shape (minibatch, 1, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, 1, n_samples, aggr_dim_i).
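With the default reduction_functions (AVG and STD), the concatenated output channel dimension scales with the number of reductions. A sketch of that arithmetic, assuming every reduction in the tuple preserves the channel dimensionality of its input (a reduction that collapses channels would contribute differently); the helper name and feature dimensions are hypothetical:

```python
def reduction_output_dim(per_feature_dims, n_reduction_functions=2):
    # One reduced copy of the concatenated channels per reduction function.
    return n_reduction_functions * sum(per_feature_dims)

dim = reduction_output_dim([32, 64])  # two hypothetical feature levels
```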

class pytorch3d.implicitron.models.view_pooler.feature_aggregator.AngleWeightedReductionFeatureAggregator(*args, **kwargs)[source]

Bases: Module, FeatureAggregatorBase

Performs a weighted aggregation using a set of predefined reduction_functions and concatenates the results of each aggregation function along the channel dimension. The weights are proportional to the cosine of the angle between the target ray and the source ray:

weight = (
    dot(target_ray, source_ray) * 0.5 + 0.5 + self.min_ray_angle_weight
)**self.weight_by_ray_angle_gamma
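Numerically, with the default settings (min_ray_angle_weight=0.1, weight_by_ray_angle_gamma=1.0), this formula maps the cosine of the ray angle from [-1, 1] into a strictly positive weight; the helper name here is made up for illustration:

```python
def ray_angle_weight(cos_angle, min_ray_angle_weight=0.1, gamma=1.0):
    # Map cos(angle) from [-1, 1] to [0, 1], add the floor, raise to gamma.
    return (cos_angle * 0.5 + 0.5 + min_ray_angle_weight) ** gamma

w_parallel = ray_angle_weight(1.0)    # source ray aligned with target ray
w_opposite = ray_angle_weight(-1.0)   # anti-parallel: only the floor remains
```

Even a source view looking from the opposite side thus keeps a small nonzero weight (0.1 with the defaults), so no view is discarded entirely.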

The reduction functions collapse the second dimension of the sampled features, which stacks the source views, to size 1.

Settings:

reduction_functions: A list of ReductionFunctions that reduce the stack of source-view-specific features to a single feature.

min_ray_angle_weight: The minimum possible aggregation weight before raising to the power of self.weight_by_ray_angle_gamma.

weight_by_ray_angle_gamma: The exponent of the cosine of the ray angles used when calculating the angle-based aggregation weights.

reduction_functions: Tuple[ReductionFunction, ...] = (ReductionFunction.AVG, ReductionFunction.STD)
weight_by_ray_angle_gamma: float = 1.0
min_ray_angle_weight: float = 0.1
get_aggregated_feature_dim(feats_or_feats_dim: Dict[str, Tensor] | int)[source]
forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor][source]
Parameters:
  • feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).

  • masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.

  • camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.

  • pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.

Returns:

feats_aggregated: If concatenate_output==True, a tensor of shape (minibatch, 1, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, 1, n_samples, aggr_dim_i).

class pytorch3d.implicitron.models.view_pooler.feature_aggregator.AngleWeightedIdentityFeatureAggregator(*args, **kwargs)[source]

Bases: Module, FeatureAggregatorBase

This aggregator does not perform any feature aggregation. It only weights the features by weights proportional to the cosine of the angle between the target ray and the source ray:

weight = (
    dot(target_ray, source_ray) * 0.5 + 0.5 + self.min_ray_angle_weight
)**self.weight_by_ray_angle_gamma
Settings:

min_ray_angle_weight: The minimum possible aggregation weight before raising to the power of self.weight_by_ray_angle_gamma.

weight_by_ray_angle_gamma: The exponent of the cosine of the ray angles used when calculating the angle-based aggregation weights.

Additionally, the aggregator can mask the target view features and concatenate the outputs.

weight_by_ray_angle_gamma: float = 1.0
min_ray_angle_weight: float = 0.1
get_aggregated_feature_dim(feats_or_feats_dim: Dict[str, Tensor] | int)[source]
forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor][source]
Parameters:
  • feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).

  • masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.

  • camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.

  • pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.

Returns:

feats_aggregated: If concatenate_output==True, a tensor of shape (minibatch, n_source_views, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, n_source_views, n_samples, dim_i).