Problem Definition

A General Problem

Suppose an object detection dataset \(S\) contains \(n\) images and \(K\) different kinds of objects, where the \(i\)-th image \(I_i\) contains \(M_i\) objects \(o_1, \ldots, o_{M_i}\):

\[ S = \{ \mathbf{z}_i \}_{i=1}^n = \{ (I_i, \mathbf{y}_{o_1}, \ldots, \mathbf{y}_{o_{M_i}}) \}_{i=1}^n \]

where

\[ I_i \in \mathbb{R}^{W \times H \times C} \]

\[ \mathbf{y}_{o_j} = ( c_{o_j}, x_{o_j}, y_{o_j}, w_{o_j}, h_{o_j} ) \in \mathbb{K} \times \mathbb{Z}^4 \]

Here \(c\) denotes the object category, \(x\) and \(y\) are the bounding box center coordinates, and \(w\) and \(h\) are the width and height of the bounding box, respectively [1].
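To make the notation concrete, the following is a minimal Python sketch of one way such a dataset could be represented. The class names `ObjectLabel` and `Sample`, the zero-based category index, and all numeric values are illustrative assumptions rather than part of the definition above. Pixel data is omitted, and only the image shape is stored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectLabel:
    """One label y_{o_j} = (c, x, y, w, h) in center format (hypothetical class)."""
    c: int  # category index, assumed here to lie in 0..K-1
    x: int  # bounding box center, horizontal coordinate
    y: int  # bounding box center, vertical coordinate
    w: int  # bounding box width
    h: int  # bounding box height

    def corners(self) -> Tuple[float, float, float, float]:
        """Return (x_min, y_min, x_max, y_max) derived from the center format."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)


@dataclass
class Sample:
    """One dataset element z_i: an image plus its M_i object labels (hypothetical class)."""
    image_shape: Tuple[int, int, int]  # (W, H, C); pixel data omitted in this sketch
    labels: List[ObjectLabel]


# A toy dataset S with n = 2 images and K = 3 categories (made-up values).
S = [
    Sample((640, 480, 3), [ObjectLabel(0, 100, 120, 40, 60),
                           ObjectLabel(2, 320, 240, 200, 100)]),
    Sample((640, 480, 3), [ObjectLabel(1, 50, 50, 20, 20)]),
]

M = [len(z.labels) for z in S]  # number of objects M_i in each image
print(M)
print(S[0].labels[0].corners())
```

Because the number of objects \(M_i\) varies from image to image, the sketch stores the labels in a per-image list rather than a fixed-size array. The `corners` helper only illustrates how the center format relates to a corner format and is not required by the definition.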
