Object detection seeks to answer one of the most fundamental and challenging questions in computer vision: What objects are where? Computational models and various techniques are leveraged to detect instances of humans, animals, cars, or other visual objects in digital images with varying degrees of accuracy and speed.
The technology has evolved rapidly over more than two decades, and its impact on the entire field of computer vision has been profound. Indeed, object detection is the basis for many significant tasks in computer vision, including instance segmentation, image captioning, and object tracking. Today’s object detection techniques, which underlie autonomous driving, robot vision, video surveillance, and several other real-world applications, originate in the ingenious thinking and long-term design perspective of 1990s computer vision and are the product of key advances driven by deep learning.
A paper published in the March 2023 issue of the Proceedings of the IEEE extensively reviews the fast-moving research around object detection in the light of technical evolution spanning more than a quarter of a century. The paper delivers insight into the technology from many viewpoints by addressing a range of critical topics including milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speedup techniques, and recent state-of-the-art detection methods. The paper shows that, while the technology shares some common challenges with the other tasks of computer vision, object detection is also characterized by tasks with totally different objectives, constraints, and varying difficulty. Its particular challenges include object rotation and scale changes (for example, small objects), accurate object localization, dense and occluded object detection, and speedup of detection.
The paper, “Object Detection in 20 Years: A Survey,” also offers a peek into the technology’s promising future directions:
- Lightweight object detection for accelerating detection inference on low-power edge devices
- End-to-end object detection from image to box in a network
- Small object detection for possible applications including counting people in a crowd or animals in the open air
- Three-dimensional object detection utilizing multisource and multiview data
- Real-time detection and tracking in high-definition video
- Cross-modality detection for more closely emulating human perception
- “Toward open-world detection” topics, such as out-of-domain generalization, zero-shot detection, and incremental detection