Paper:
Sun, Xiao, et al. “Compositional human pose regression.” The IEEE International Conference on Computer Vision (ICCV). Vol. 2. 2017.
Key: Structure-aware
- Performance:
- 48.3 mm average joint error on H3.6M Protocol 1
- 59.1 mm average joint error on H3.6M Protocol 2
- 86.4% PCKh@0.5 on MPII
- Evaluation
- Metrics:
- Absolute:
- 3D: Procrustes analysis (alignment) followed by MPJPE (mean per joint position error)
- 2D: PCK (percentage of correct keypoints)
- Relative:
- 2D: mean per bone position error
- 3D: bone length standard deviation and percentage of illegal joint angles
- Datasets: MPII, H3.6M
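
The absolute metrics listed above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's evaluation code: the function names `mpjpe`, `procrustes_align` and `pck` are hypothetical, poses are assumed to be arrays of shape `(J, D)`, and the PCK reference length (for PCKh, a head-size measure) is assumed to be supplied by the caller.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error: average Euclidean distance over joints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def procrustes_align(pred, gt):
    """Align pred (J, D) to gt (J, D) with the best rotation, uniform scale and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the cross-covariance matrix gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last singular direction if needed so the result is a rotation, not a reflection.
    d = np.ones(len(s))
    d[-1] = np.sign(np.linalg.det(u @ vt))
    rot = (u * d) @ vt                      # same as u @ diag(d) @ vt
    scale = (s * d).sum() / (p ** 2).sum()  # optimal uniform scale
    return scale * p @ rot + mu_g

def pck(pred, gt, ref_length, alpha=0.5):
    """Fraction of joints whose error is below alpha * ref_length (assumed reference length)."""
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean(dist < alpha * ref_length))
```

With these helpers, a Procrustes-aligned error would be computed as `mpjpe(procrustes_align(pred, gt), gt)`, while the plain error is `mpjpe(pred, gt)`.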
- Basics
- Structure-aware approach
- Use bones instead of joints as pose representation.
- Use joint connection structure to define a compositional loss function.
- Only re-parameterizes the pose representation, so it is compatible with any other algorithm design.
- Applies to both 3D and 2D pose estimation.
- Main method
- Use the L1 norm for joint regression (instead of squared distance).
- Bone-based representation.
- Bones are easier to learn than joints, and they express constraints more easily.
- Many pose-driven applications only need local bones, not global joints.
- Use the L1 norm for the bone loss function.
- A bone is a vector from one joint to another joint. The relative position between two joints is then the sum of the bones along the path connecting them.
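
The bone representation and a loss built on the joint connection structure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the 6-joint skeleton `PARENT`, the function names, and the choice of joint pairs are hypothetical, and the bone direction (parent to child) is a sign convention picked for this sketch.

```python
import numpy as np

# Hypothetical 6-joint skeleton: PARENT[k] is the parent of joint k, -1 marks the root.
# Parents are listed before their children.
PARENT = [-1, 0, 1, 2, 0, 4]

def joints_to_bones(joints):
    """Bone k = joint k minus its parent joint; the root row stays zero."""
    joints = np.asarray(joints, dtype=float)
    bones = np.zeros_like(joints)
    for k, p in enumerate(PARENT):
        if p >= 0:
            bones[k] = joints[k] - joints[p]
    return bones

def bones_to_joints(bones):
    """Root-relative joints: sum the bones along the path from the root to each joint."""
    bones = np.asarray(bones, dtype=float)
    joints = np.zeros_like(bones)
    for k, p in enumerate(PARENT):
        if p >= 0:
            joints[k] = joints[p] + bones[k]
    return joints

def compositional_loss(pred_bones, gt_joints, pairs):
    """Sum of L1 errors of relative joint positions over the given (u, v) joint pairs."""
    pred_rel = bones_to_joints(pred_bones)
    gt_joints = np.asarray(gt_joints, dtype=float)
    loss = 0.0
    for u, v in pairs:
        delta_pred = pred_rel[u] - pred_rel[v]  # composed from predicted bones
        delta_gt = gt_joints[u] - gt_joints[v]  # ground-truth relative position
        loss += np.abs(delta_pred - delta_gt).sum()
    return float(loss)
```

In this sketch, an error in a single bone shows up in every joint pair whose connecting path contains that bone, which is how a loss over joint pairs couples the bones instead of treating them independently.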
- Network
- ResNet-50 pre-trained on ImageNet
- The last FC layer outputs 3 coordinates (or 2 coordinates in the 2D case)
- Fine-tuned on the task
- Other methods mentioned
- Detection based and regression based
- Heatmaps from detection-based methods are usually noisy and multi-modal.
- Problem: regression-based methods simply minimize the per-joint location errors independently and ignore the internal structure of the pose.
- 3D pose estimation
- Methods that do not use prior knowledge from a 3D model
- Two separate steps: first predict the 2D joints, then reconstruct the 3D pose via optimization or search.
- [[20] Sparseness Meets Deepness] combines uncertainty maps of the 2D joint locations with a sparsity-driven 3D geometric prior to infer the 3D joint locations via an EM (expectation-maximization) algorithm.
- Represent the 3D pose with an over-complete dictionary and use a high-dimensional latent pose representation.
- Extend Hourglass from 2D to 3D.
- Methods that use prior knowledge from a 3D model
- Embed a kinematic model layer into deep neural networks and estimate the model parameters instead of the joints.
- The kinematic model parameterization is highly non-linear, and its optimization in deep networks is hard.
- 2D pose estimation
- Pure graphical models and inference models.
- Pictorial structure (PS) model
- Graphical models combined with CNNs
- Evaluation
- Dataset: H3.6M
- Metrics:
- 59.1 mm average joint error (H3.6M Protocol 2)
- 86.4% PCKh@0.5 (MPII)
- Coding
- Caffe
- Two GPUs