Faster R-CNN: Fast R-CNN Detector

RoI Pooling

  • it converts any variable size tensor into fixed size ones

  • why not resizing: max pooling is differentiable, while resizing is not

  • input

    • feature map out of the backbone

    • box coordinates in relative format

  • output: cropped feature map to feed detector

  • max pooling vs. average pooling (TBD)

    • robustness to noise

    • max pooling captures the most important feature in each sliding window, while average pooling may dilute the information of important features

FC Layers

Back to Object Detection.