FCOS: Fully Convolutional One-Stage Object Detection, Explained
1. Problems with anchor-based detectors
- Detection performance is sensitive to the sizes, aspect ratios and number of anchor boxes.
- Because the scales and aspect ratios of anchor boxes are kept fixed, detectors have difficulty dealing with object candidates with large shape variations, particularly small objects.
  In short: fixed anchor sizes and aspect ratios make it hard to handle objects whose scale varies a lot or that are small.
- The excessive number of negative samples aggravates the imbalance between positive and negative samples in training.
- Anchor boxes involve complicated computation, such as calculating the intersection-over-union (IoU) scores with ground-truth bounding boxes.
2. FCNs have achieved tremendous success in dense prediction tasks
Can we solve object detection in a neat per-pixel prediction fashion, analogous to FCN for semantic segmentation, for example?
3. Problems with DenseBox
DenseBox crops and resizes training images to a fixed scale. Thus DenseBox has to perform detection on image pyramids, which is against FCN’s philosophy of computing all convolutions once.
Highly overlapped bounding boxes result in an intractable ambiguity: it is unclear which bounding box a pixel in the overlapped region should regress to.
1. proposal free and anchor free
2. avoids the complicated computation related to anchor boxes such as the IOU computation and matching between the anchor boxes and ground-truth boxes during training
3. achieves state-of-the-art results among one-stage detectors
the authors also show that the proposed FCOS can be used as a Region Proposal Network (RPN) in two-stage detectors and can achieve significantly better performance than its anchor-based RPN counterparts
4. can be immediately extended to solve other vision tasks with minimal modification, including instance segmentation and key-point detection.
1. FCOS model design
Different from anchor-based detectors, which consider the location on the input image as the center of (multiple) anchor boxes and regress the target bounding box with these anchor boxes as references, we directly regress the target bounding box at the location
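After a feature-map location is mapped back to the input image, FCOS does not use anchors; it directly regresses the position of the target bounding box at that location.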
If a location falls into multiple bounding boxes, it is considered as an ambiguous sample. We simply choose the bounding box with minimal area as its regression target.
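- If a feature-map location, mapped back to the input image, falls inside a ground-truth (GT) box, it is considered a positive sample.
- It is worth noting that FCOS can leverage as many foreground samples as possible to train the regressor. It is different from anchor-based detectors, which only consider the anchor boxes with a high enough IoU with ground-truth boxes as positive samples. We argue that it may be one of the reasons that FCOS outperforms its anchor-based counterparts.
In other words, the authors argue that because every feature-map location is a sample and every location falling inside a GT box counts as positive, more positive samples reach the regressor, whereas anchor-based detectors train only on anchors whose IoU with a GT box is high. This may be why FCOS outperforms anchor-based methods.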
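The assignment rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the `(x0, y0, x1, y1)` box layout, and the choice to treat points exactly on a box edge as outside are assumptions made for the example. The mapping from a feature-map cell to an image point follows the formula given in the FCOS paper, `(floor(s/2) + x*s, floor(s/2) + y*s)` for stride `s`.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in input-image pixels


def location_on_image(ix: int, iy: int, stride: int) -> Tuple[int, int]:
    """Map feature-map cell (ix, iy) back to a point on the input image."""
    return (stride // 2 + ix * stride, stride // 2 + iy * stride)


def regression_target(x: float, y: float, boxes: List[Box]) -> Optional[Tuple[float, float, float, float]]:
    """Return (l, t, r, b) for the smallest GT box containing (x, y).

    Returns None when the point lies in no box, i.e. it is a negative sample.
    """
    best = None
    best_area = float("inf")
    for x0, y0, x1, y1 in boxes:
        # Distances from the point to the left, top, right and bottom sides.
        l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
        if min(l, t, r, b) <= 0:
            continue  # assumption: points on or outside the border are not inside
        area = (x1 - x0) * (y1 - y0)
        if area < best_area:  # ambiguous sample: keep the box with minimal area
            best, best_area = (l, t, r, b), area
    return best
```

For example, with a large box `(0, 0, 100, 100)` and a smaller box `(20, 20, 60, 60)` nested inside it, a point at `(28, 28)` is ambiguous and is assigned to the smaller box, while a point at `(80, 80)` is assigned to the larger one.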
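2. Loss function
Classification uses focal loss, and regression uses IoU loss.
The classification loss is computed at every location on the feature map, while the regression loss is computed only at positive locations.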
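The two loss terms can be sketched per location as follows. This is a simplified illustration under stated assumptions: a single foreground class (the real model predicts one score per class), the common focal-loss defaults `alpha=0.25` and `gamma=2.0`, and the `-ln(IoU)` form of IoU loss. The function names are hypothetical. Because both boxes are expressed as `(l, t, r, b)` distances from the same location, their intersection can be computed directly from the element-wise minima.

```python
import math


def focal_loss(p: float, is_positive: bool, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one predicted probability p."""
    p_t = p if is_positive else 1.0 - p
    alpha_t = alpha if is_positive else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


def iou_loss(pred, target) -> float:
    """-ln(IoU) between two boxes given as positive (l, t, r, b) distances."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    inter = (min(pl, tl) + min(pr, tr)) * (min(pt, tt) + min(pb, tb))
    union = (pl + pr) * (pt + pb) + (tl + tr) * (tt + tb) - inter
    return -math.log(max(inter / union, 1e-12))


def location_loss(p: float, pred, target):
    """Return (classification loss, regression loss) for one location.

    target is None for a negative location, which then contributes only
    to the classification term.
    """
    cls = focal_loss(p, target is not None)
    reg = iou_loss(pred, target) if target is not None else 0.0
    return cls, reg
```

A confident correct prediction is down-weighted by the `(1 - p_t) ** gamma` factor, which is how focal loss counters the large number of easy negative locations.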
1. Problems this module addresses
The large stride of the final feature maps in a CNN can result in a relatively low best possible recall (BPR)
Overlaps in ground-truth boxes can cause intractable ambiguity
When ground-truth boxes overlap, a location on the feature map cannot tell which ground-truth box it should regress to during training.
2. we directly limit the range of bounding box regression for each level.
We first compute the regression targets l∗, t∗, r∗ and b∗ for each location on all feature levels. Next, if a location satisfies max(l∗, t∗, r∗, b∗) > m_i or max(l∗, t∗, r∗, b∗) < m_{i−1}, it is set as a negative sample and is thus not required to regress a bounding box anymore.
Since objects with different sizes are assigned to different feature levels and most overlapping happens between objects with considerably different sizes, multi-level prediction largely alleviates this ambiguity. If a location, even with multi-level prediction used, is still assigned to more than one ground-truth box, we simply choose the ground-truth box with minimal area as its target.
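The range check above can be illustrated with a small sketch. The thresholds are not given in these notes; the values below (0, 64, 128, 256, 512, ∞ for m_2 through m_7, covering levels P3 to P7) are the ones reported in the FCOS paper and are treated here as an assumption, as are the names `RANGES` and `keeps_target`.

```python
import math

# Assumed per-level ranges (m_{i-1}, m_i), following the values in the FCOS paper.
RANGES = {
    "P3": (0, 64),
    "P4": (64, 128),
    "P5": (128, 256),
    "P6": (256, 512),
    "P7": (512, math.inf),
}


def keeps_target(level: str, l: float, t: float, r: float, b: float) -> bool:
    """True if a location on `level` should still regress the target (l, t, r, b).

    The location becomes a negative sample when max(l, t, r, b) falls
    outside the range handled by this level.
    """
    lo, hi = RANGES[level]
    m = max(l, t, r, b)
    return not (m > hi or m < lo)
```

With these ranges, a target of `(8, 8, 32, 32)` is kept on P3 and rejected on P4, so a small object and a large object that overlap in the image usually end up on different levels.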
3. Center-ness for FCOS
Center-ness is introduced because a lot of low-quality predicted bounding boxes are produced by locations far away from the center of an object.
The center-ness depicts the normalized distance from the location to the center of the object.
The center-ness ranges from 0 to 1 and is thus trained with binary cross entropy (BCE) loss. The loss is added to the loss function Eq. (2). When testing, the final score (used for ranking the detected bounding boxes) is computed by multiplying the predicted center-ness with the corresponding classification score. Thus the center-ness can down-weight the scores of bounding boxes far from the center of an object.
In summary: during training, center-ness is learned with a binary cross-entropy loss; at test time, center-ness × classification score is used as the final score, which lowers the confidence of boxes predicted far from an object's center.
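As a concrete illustration, the center-ness target and the test-time score can be sketched as below. The notes do not state the formula; the one used here, sqrt(min(l, r)/max(l, r) × min(t, b)/max(t, b)), is the definition given in the FCOS paper. The function names are hypothetical, and the inputs are assumed to be positive distances.

```python
import math


def centerness(l: float, t: float, r: float, b: float) -> float:
    """Center-ness target for a positive location with distances (l, t, r, b).

    Equals 1 at the exact center of the box and decays toward 0 near its edges.
    """
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def final_score(cls_score: float, centerness_pred: float) -> float:
    """Test-time ranking score: classification score scaled by predicted center-ness."""
    return cls_score * centerness_pred
```

A location at the center of a 20×20 box, `(10, 10, 10, 10)`, has center-ness 1, while a location one pixel from the left edge, `(1, 10, 19, 10)`, has center-ness sqrt(1/19), roughly 0.23, so its box would be ranked much lower even with the same classification score.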