pytorch3d.implicitron.models.view_pooler.feature_aggregator
- class pytorch3d.implicitron.models.view_pooler.feature_aggregator.ReductionFunction(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases: Enum
- AVG = 'avg'
- MAX = 'max'
- STD = 'std'
- STD_AVG = 'std_avg'
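As a rough sketch of what the first three reduction functions compute (using NumPy arrays as stand-ins for torch tensors, ignoring masks, and omitting STD_AVG, whose channel handling differs), each one collapses the source-view axis of a (minibatch, n_source_views, n_samples, dim) tensor:

```python
import numpy as np

# Hypothetical sampled features: 2 batch elements, 3 source views,
# 4 samples, 5 channels.
feats = np.arange(2 * 3 * 4 * 5, dtype=np.float64).reshape(2, 3, 4, 5)

avg = feats.mean(axis=1, keepdims=True)  # ReductionFunction.AVG
mx = feats.max(axis=1, keepdims=True)    # ReductionFunction.MAX
std = feats.std(axis=1, keepdims=True)   # ReductionFunction.STD

# Each reduction shrinks the source-view axis to size 1.
print(avg.shape, mx.shape, std.shape)  # (2, 1, 4, 5) for all three
```

The real aggregators additionally weight these reductions by the sampled validity masks; this sketch shows only the axis being reduced.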
- class pytorch3d.implicitron.models.view_pooler.feature_aggregator.FeatureAggregatorBase(*args, **kwargs)[source]
Bases: ABC, ReplaceableBase
Base class for aggregating features.
Typically, the aggregated features and their masks are output by ViewSampler which samples feature tensors extracted from a set of source images.
- Settings:
- exclude_target_view: If True, disables pooling
from the target view to itself; if False, the target view participates like any other source view.
- exclude_target_view_mask_features: If True,
mask the features from the target view before aggregation.
- concatenate_output: If True,
concatenate the aggregated features into a single tensor; otherwise return a dictionary mapping feature names to tensors.
- exclude_target_view: bool = True
- exclude_target_view_mask_features: bool = True
- concatenate_output: bool = True
- abstract forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor] [source]
- Parameters:
feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).
masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.
camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.
pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.
- Returns:
feats_aggregated – If concatenate_output==True, a tensor of shape (minibatch, reduce_dim, n_samples, sum(dim_1, …, dim_N)) containing the concatenation of the aggregated features feats_sampled. reduce_dim depends on the specific feature aggregator implementation and typically equals 1 or n_source_views. If concatenate_output==False, the aggregator does not concatenate the aggregated features and returns a dictionary of per-feature aggregations {f_i: t_i_aggregated} instead. Each t_i_aggregated is of shape (minibatch, reduce_dim, n_samples, aggr_dim_i).
- abstract get_aggregated_feature_dim(feats_or_feats_dim: Dict[str, Tensor] | int)[source]
Returns the final dimensionality of the output aggregated features.
- Parameters:
feats_or_feats_dim – Either a dict of sampled features {f_i: t_i} corresponding to the feats_sampled argument of forward, or an int representing the sum of dimensionalities of each t_i.
- Returns:
aggregated_feature_dim – The final dimensionality of the output aggregated features.
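The dual-input convention of get_aggregated_feature_dim can be illustrated with a hypothetical helper (total_feature_dim is not part of the API): the argument is either the feats_sampled dict itself, or an int that is already the sum of the per-feature channel dimensionalities.

```python
import numpy as np

def total_feature_dim(feats_or_dim):
    """Sum the channel dims of a feature dict, or pass an int through."""
    if isinstance(feats_or_dim, int):
        return feats_or_dim
    return sum(t.shape[-1] for t in feats_or_dim.values())

# Made-up feature names and shapes: (minibatch, n_source_views, n_samples, dim_i).
feats_sampled = {
    "rgb": np.zeros((2, 3, 4, 16)),  # dim_rgb = 16
    "sem": np.zeros((2, 3, 4, 8)),   # dim_sem = 8
}
print(total_feature_dim(feats_sampled), total_feature_dim(24))  # 24 24
```

Concrete subclasses then map this input dimensionality to the output one, e.g. multiplying by the number of reduction functions.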
- class pytorch3d.implicitron.models.view_pooler.feature_aggregator.IdentityFeatureAggregator(*args, **kwargs)[source]
Bases: Module, FeatureAggregatorBase
This aggregator does not perform any feature aggregation. Depending on the settings, it can mask the target-view features and concatenate the outputs.
- forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor] [source]
- Parameters:
feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).
masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.
camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.
pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.
- Returns:
feats_aggregated – If concatenate_output==True, a tensor of shape (minibatch, 1, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, n_source_views, n_samples, dim_i).
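As a rough sketch of the non-concatenated path (NumPy arrays stand in for torch tensors; the concatenate_output reshaping is omitted), the identity aggregator amounts to masking each feature tensor and returning the dict with per-view shapes unchanged:

```python
import numpy as np

# Made-up sampled features from 3 source views:
# (minibatch, n_source_views, n_samples, dim_i).
feats_sampled = {
    "a": np.ones((2, 3, 4, 5)),
    "b": np.ones((2, 3, 4, 2)),
}
# Binary validity mask; here the third source view is invalid everywhere.
masks_sampled = np.zeros((2, 3, 4, 1))
masks_sampled[:, :2] = 1.0

# Identity "aggregation": mask the features, keep per-view shapes.
feats_aggregated = {k: t * masks_sampled for k, t in feats_sampled.items()}
print(feats_aggregated["a"].shape)  # (2, 3, 4, 5)
```

Whether masking is applied at all is controlled by the exclude_target_view_mask_features setting; this sketch assumes it is on.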
- class pytorch3d.implicitron.models.view_pooler.feature_aggregator.ReductionFeatureAggregator(*args, **kwargs)[source]
Bases: Module, FeatureAggregatorBase
Aggregates using a set of predefined reduction_functions and concatenates the results of each aggregation function along the channel dimension. The reduction functions collapse the second (source-view) dimension of the sampled features.
- Settings:
- reduction_functions: A list of `ReductionFunction`s that reduce the
stack of source-view-specific features to a single feature.
- reduction_functions: Tuple[ReductionFunction, ...] = (ReductionFunction.AVG, ReductionFunction.STD)
- forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor] [source]
- Parameters:
feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).
masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.
camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.
pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.
- Returns:
feats_aggregated – If concatenate_output==True, a tensor of shape (minibatch, 1, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, 1, n_samples, aggr_dim_i).
- class pytorch3d.implicitron.models.view_pooler.feature_aggregator.AngleWeightedReductionFeatureAggregator(*args, **kwargs)[source]
Bases: Module, FeatureAggregatorBase
Performs a weighted aggregation using a set of predefined reduction_functions and concatenates the results of each aggregation function along the channel dimension. The weights are proportional to the cosine of the angle between the target ray and the source ray:
weight = (dot(target_ray, source_ray) * 0.5 + 0.5 + self.min_ray_angle_weight) ** self.weight_by_ray_angle_gamma
The reduction functions collapse the second (source-view) dimension of the sampled features.
- Settings:
- reduction_functions: A list of `ReductionFunction`s that reduce the
stack of source-view-specific features to a single feature.
- min_ray_angle_weight: The minimum possible aggregation weight
before raising to the power of self.weight_by_ray_angle_gamma.
- weight_by_ray_angle_gamma: The exponent of the cosine of the ray angles
used when calculating the angle-based aggregation weights.
- reduction_functions: Tuple[ReductionFunction, ...] = (ReductionFunction.AVG, ReductionFunction.STD)
- weight_by_ray_angle_gamma: float = 1.0
- min_ray_angle_weight: float = 0.1
- forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor] [source]
- Parameters:
feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).
masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.
camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.
pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.
- Returns:
feats_aggregated – If concatenate_output==True, a tensor of shape (minibatch, 1, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, n_source_views, n_samples, dim_i).
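The angle-based weight formula quoted above can be sketched as a small hypothetical helper (ray_angle_weight is not part of the API; min_w and gamma stand for min_ray_angle_weight and weight_by_ray_angle_gamma, and the rays are assumed unit-length):

```python
def ray_angle_weight(target_ray, source_ray, min_w=0.1, gamma=1.0):
    # Cosine of the angle between unit rays, remapped from [-1, 1] to
    # [0, 1], offset by the minimum weight, then raised to gamma.
    cos = sum(t * s for t, s in zip(target_ray, source_ray))
    return (cos * 0.5 + 0.5 + min_w) ** gamma

# Aligned rays get the maximum weight, opposite rays only the floor.
w_aligned = ray_angle_weight((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))   # 1.1
w_opposite = ray_angle_weight((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))  # 0.1
print(w_aligned, w_opposite)
```

The min_ray_angle_weight floor keeps back-facing source views from being zeroed out entirely, and gamma > 1 sharpens the preference for well-aligned views.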
- class pytorch3d.implicitron.models.view_pooler.feature_aggregator.AngleWeightedIdentityFeatureAggregator(*args, **kwargs)[source]
Bases: Module, FeatureAggregatorBase
This aggregator does not perform any feature aggregation. It only scales the features by weights proportional to the cosine of the angle between the target ray and the source ray:
weight = (dot(target_ray, source_ray) * 0.5 + 0.5 + self.min_ray_angle_weight) ** self.weight_by_ray_angle_gamma
- Settings:
- min_ray_angle_weight: The minimum possible aggregation weight
before raising to the power of self.weight_by_ray_angle_gamma.
- weight_by_ray_angle_gamma: The exponent of the cosine of the ray angles
used when calculating the angle-based aggregation weights.
Additionally, the aggregator can mask the target-view features and concatenate the outputs.
- weight_by_ray_angle_gamma: float = 1.0
- min_ray_angle_weight: float = 0.1
- forward(feats_sampled: Dict[str, Tensor], masks_sampled: Tensor, camera: CamerasBase | None = None, pts: Tensor | None = None, **kwargs) Tensor | Dict[str, Tensor] [source]
- Parameters:
feats_sampled – A dict of sampled feature tensors {f_i: t_i}, where each t_i is a tensor of shape (minibatch, n_source_views, n_samples, dim_i).
masks_sampled – A binary mask represented as a tensor of shape (minibatch, n_source_views, n_samples, 1) denoting valid sampled features.
camera – A batch of n_source_views CamerasBase objects corresponding to the source view cameras.
pts – A tensor of shape (minibatch, n_samples, 3) denoting the 3D points whose 2D projections to source views were sampled in order to generate feats_sampled and masks_sampled.
- Returns:
feats_aggregated – If concatenate_output==True, a tensor of shape (minibatch, n_source_views, n_samples, sum(dim_1, …, dim_N)). If concatenate_output==False, a dictionary {f_i: t_i_aggregated} with each t_i_aggregated of shape (minibatch, n_source_views, n_samples, dim_i).