Problem Definition
A General Problem
Suppose an object detection dataset \(S\) contains \(n\) images whose objects belong to \(K\) different categories, and the \(i\)-th image contains \(M_i\) objects \(o_1, \ldots, o_{M_i}\):
\[S = \{ \mathbf{z}_i \}_{i=1}^n
= \{ I_i, \mathbf{y}_{o_1}, \ldots, \mathbf{y}_{o_{M_i}} \}_{i=1}^n\]
where
\[I_i \in \mathbb{W} \times \mathbb{H} \times \mathbb{C}\]
\[\mathbf{y}_{o_j} =
\{ c_{o_j}, x_{o_j}, y_{o_j}, w_{o_j}, h_{o_j} \}
\in \mathbb{K} \times \mathbb{Z}^4\]
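where \(c\) denotes the object category, \(x\) and \(y\) are the coordinates of the bounding-box center, and \(w\) and \(h\) are the width and height of the bounding box, respectively [1]. Here \(\mathbb{K}\) is the set of \(K\) categories, and \(\mathbb{W}\), \(\mathbb{H}\), and \(\mathbb{C}\) correspond to the width, height, and channel dimensions of image \(I_i\).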
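The definition can be mirrored directly in code. The following is a minimal Python sketch, not taken from the original text: the class names `ObjectLabel` and `Sample` and the helper `validate` are hypothetical, categories are assumed to be 0-based indices in \(\{0, \ldots, K-1\}\), and only the image shape \((W, H, C)\) is stored instead of pixel data so the sketch stays dependency-free.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectLabel:
    """One annotation y_{o_j} = (c, x, y, w, h).

    c: category index; assumed 0-based, i.e. in {0, ..., K-1}.
    x, y: integer bounding-box center coordinates.
    w, h: integer bounding-box width and height (the Z^4 factor).
    """
    c: int
    x: int
    y: int
    w: int
    h: int


@dataclass
class Sample:
    """One element z_i = (I_i, y_{o_1}, ..., y_{o_{M_i}}) of S.

    Only the image shape (W, H, C) is kept; pixel data is omitted
    in this hypothetical sketch.
    """
    image_shape: Tuple[int, int, int]  # (W, H, C)
    objects: List[ObjectLabel] = field(default_factory=list)

    @property
    def num_objects(self) -> int:
        """M_i: the number of annotated objects in this image."""
        return len(self.objects)


def validate(dataset: List[Sample], num_classes: int) -> None:
    """Check that every label lies in K x Z^4, with K = {0, ..., num_classes - 1}."""
    for i, sample in enumerate(dataset):
        for j, obj in enumerate(sample.objects):
            if not 0 <= obj.c < num_classes:
                raise ValueError(
                    f"sample {i}, object {j}: category {obj.c} "
                    f"not in [0, {num_classes})"
                )
            if not all(isinstance(v, int) for v in (obj.x, obj.y, obj.w, obj.h)):
                raise TypeError(
                    f"sample {i}, object {j}: box values must be integers"
                )


# Usage: a toy dataset S with n = 2 images and K = 3 categories.
S = [
    Sample((640, 480, 3), [ObjectLabel(0, 100, 120, 40, 60),
                           ObjectLabel(2, 300, 200, 80, 50)]),
    Sample((640, 480, 3), [ObjectLabel(1, 50, 60, 20, 30)]),
]
validate(S, num_classes=3)
print([z.num_objects for z in S])  # [2, 1]
```

Each image carries its own list of labels, so \(M_i\) is simply the length of that list and can differ from image to image.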