A practical guide to object detection in images, videos, and a real-time webcam feed using both the CLI and Python
Object detection, a subfield of computer vision, is primarily concerned with identifying and localizing objects in images or videos with a certain degree of confidence. An identified object is usually annotated with a bounding box, which tells the viewer what the object is and where it is located in the scene.
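To make this concrete, a single detection can be thought of as a class label, a confidence score, and a bounding box. The sketch below is a minimal, library-independent illustration that assumes boxes in (x1, y1, x2, y2) pixel coordinates; the `Detection` class and the `filter_by_confidence` helper are hypothetical names, not part of any YOLO API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object: what it is, how sure the model is, and where it is."""
    label: str
    confidence: float  # score in [0, 1]
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)

    def center(self) -> Tuple[float, float]:
        """Return the (x, y) center of the bounding box."""
        x1, y1, x2, y2 = self.box
        return ((x1 + x2) / 2, (y1 + y2) / 2)


def filter_by_confidence(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d.confidence >= threshold]


if __name__ == "__main__":
    dets = [
        Detection("person", 0.91, (10, 20, 110, 220)),
        Detection("bicycle", 0.42, (50, 120, 200, 260)),
    ]
    for d in filter_by_confidence(dets, 0.5):
        print(d.label, d.confidence, d.center())
```

Raising the threshold trades missed objects for fewer false alarms; the right value depends on the application.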
In 2015, the debut of YOLO, or You Only Look Once, shook the world of computer vision, as its system was capable of real-time object detection with astounding accuracy and speed. Since then, YOLO has undergone several iterations of improvements in prediction accuracy and efficiency, eventually culminating in the release of the latest member of the family: YOLOv8 by Ultralytics.
YOLOv8 is available in five versions: nano (n), small (s), medium (m), large (l), and extra large (x). Their differences in accuracy and speed can be compared via their mean average precision (mAP) and latency, evaluated on the COCO val2017 dataset.
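The mAP values reported for COCO rest on intersection over union (IoU): a predicted box only counts as correct if it overlaps a ground-truth box of the same class by at least some IoU threshold, and COCO’s headline mAP averages over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU computation and that threshold grid, assuming (x1, y1, x2, y2) boxes; a full mAP evaluation also needs per-class precision-recall curves and one-to-one matching, which are omitted here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (10 values)
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143
    print(COCO_IOU_THRESHOLDS)
```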
Compared with previous versions, YOLOv8 is not only faster and more accurate but also requires fewer parameters to achieve its performance, and, as if that wasn’t enough, it comes with an intuitive, easy-to-use command-line interface (CLI) in addition to a Python package, providing a more seamless experience for users and developers.
In this article, I’ll show how YOLOv8 can be used to detect objects in static images, videos, and a live webcam feed using both the CLI and Python.
Without further ado, let’s get into it!
All you need to do to get started with YOLOv8 is to run the following command in your terminal:
pip install ultralytics
This will install YOLOv8 via the ultralytics pip package.
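To confirm that the installation worked, one option is to query the installed package version using Python’s standard library. The helper below is a small, hypothetical convenience function (not part of ultralytics) built on importlib.metadata; it returns None if the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("ultralytics")
    print(f"ultralytics {v}" if v else "ultralytics is not installed")
```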
Object detection in static images has proven useful in a variety of domains, such as surveillance, medical imaging, and retail analytics. Whatever domain you choose to apply your detection system to, YOLOv8 makes it incredibly easy to do so.
Below is the raw image that we’re going to perform object detection on.
To run YOLOv8, we’ll look at both the CLI and Python implementations. While in this particular case we’ll be using a jpg image, YOLOv8 supports a variety of other image formats.
CLI
Assuming we’d like to run the extra large YOLOv8x on our image (let’s call it img.jpg), we can enter the following command in the CLI:
yolo detect predict model=yolov8x.pt source="img.jpg" save=True
Here, we specify the following arguments: detect to select the object detection task, predict to run the model in prediction mode, model to select the model version, source to provide the file path of our image, and save to save the processed image with the objects’ bounding boxes, predicted classes, and class probabilities.
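Because the CLI follows a yolo TASK MODE key=value pattern, it can also be driven from a script, for example to batch-process several files. The sketch below only assembles the argument list (`build_yolo_command` and `run_yolo` are hypothetical helper names); actually running it with subprocess.run assumes the yolo executable installed by the ultralytics package is on your PATH.

```python
import subprocess
from typing import List


def build_yolo_command(task: str, mode: str, **args) -> List[str]:
    """Assemble a `yolo TASK MODE key=value ...` argument list."""
    # Passing a list (not a shell string) means file paths need no extra quoting.
    return ["yolo", task, mode] + [f"{key}={value}" for key, value in args.items()]


def run_yolo(task: str, mode: str, **args) -> None:
    """Run the CLI; raises subprocess.CalledProcessError if `yolo` exits with an error."""
    subprocess.run(build_yolo_command(task, mode, **args), check=True)


if __name__ == "__main__":
    print(build_yolo_command("detect", "predict",
                             model="yolov8x.pt", source="img.jpg", save=True))
```

Calling run_yolo("detect", "predict", model="yolov8x.pt", source="img.jpg", save=True) would then be equivalent to typing the command above in a terminal.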
Python
In Python, the very same task can be achieved with the following intuitive, low-code solution:
from ultralytics import YOLO

model = YOLO('yolov8x.pt')
results = model('img.jpg', save=True)
Whether you use the CLI or Python, the saved, processed image looks as follows:
We can clearly see the bounding boxes around every detected object, along with their corresponding class labels and probabilities.
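Each label combines a class name with a confidence score. The sketch below reproduces that formatting in plain Python, assuming you already have class IDs, confidences, and an ID-to-name mapping. With an ultralytics Results object, these would typically come from results[0].boxes.cls and results[0].boxes.conf (tensors, converted with .tolist()) and results[0].names; the small names dict below is a hard-coded excerpt of the COCO classes, for illustration only.

```python
def format_labels(class_ids, confidences, names):
    """Build 'name score' strings such as 'person 0.87', one per detection."""
    return [
        # class IDs may arrive as floats (e.g. 0.0) after .tolist(), hence int()
        f"{names[int(class_id)]} {confidence:0.2f}"
        for class_id, confidence in zip(class_ids, confidences)
    ]


# Excerpt of the COCO class mapping used by the pretrained YOLOv8 models
COCO_NAMES_EXCERPT = {0: "person", 1: "bicycle", 2: "car", 5: "bus", 7: "truck"}

if __name__ == "__main__":
    print(format_labels([0.0, 2.0], [0.873, 0.5], COCO_NAMES_EXCERPT))
    # ['person 0.87', 'car 0.50']
```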
Performing object detection on video files is nearly identical to doing so on image files, the only difference being the source file format. Just as with images, YOLOv8 supports a variety of video formats that can be fed to the model as input. In our case, we’ll be using an mp4 file.
Let’s again look at both the CLI and Python implementations. For faster computation, let’s now use the YOLOv8m model instead of the extra large version.
CLI
yolo detect predict model=yolov8m.pt source="vid.mp4" save=True
Python
from ultralytics import YOLO

model = YOLO('yolov8m.pt')
results = model('vid.mp4', save=True)
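Since a video is simply a sequence of frames, total processing time grows with the clip’s length and frame rate, which is why a smaller model pays off here. A back-of-the-envelope sketch; the per-frame latency is a hypothetical number you would measure on your own hardware, and decoding and saving overhead are ignored.

```python
def estimate_processing_seconds(duration_s: float, fps: float, ms_per_frame: float) -> float:
    """Rough total inference time for a video, ignoring decoding and saving overhead."""
    n_frames = round(duration_s * fps)
    return n_frames * ms_per_frame / 1000.0


if __name__ == "__main__":
    # Hypothetical example: a 10-second clip at 30 fps with 25 ms of inference per frame
    print(estimate_processing_seconds(10, 30, 25))  # 7.5
```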
First, let’s inspect our raw vid.mp4 file before we perform object detection on it:
The video shows a busy city scene with a lot of traffic, including cars, buses, trucks, and cyclists, as well as some people on the right side apparently waiting for a bus.
After processing this file using YOLOv8’s medium version, we get the following result:
Again, we can see that YOLOv8m does a really good job of accurately capturing the objects in the scene. It even detects smaller objects that are part of a larger whole, such as a person on a bicycle wearing a backpack.
Finally, let’s take a look at what’s required to detect objects in a live webcam feed. To do this, I’ll use my own webcam and, just as before, both the CLI and Python approaches.
To reduce latency, and therefore lag in the video, I’ll be using the lightweight nano version of YOLOv8.
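One way to check whether a model is fast enough for live video is to time it over a number of frames and convert the result to frames per second. A minimal, model-agnostic sketch; `process_frame` stands in for whatever you run on each frame (for example, a call such as model(frame)) and is an assumption here.

```python
import time


def measure_fps(process_frame, n_frames: int = 50) -> float:
    """Call process_frame(i) n_frames times and return the achieved frames per second."""
    start = time.perf_counter()
    for i in range(n_frames):
        process_frame(i)
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload: sleep about 20 ms per "frame", so expect a little under 50 FPS
    print(f"{measure_fps(lambda _: time.sleep(0.02), n_frames=10):.1f} FPS")
```

Running this with each model size on your own machine gives a quick, if rough, picture of which version keeps up with your webcam’s frame rate.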
CLI
yolo detect predict model=yolov8n.pt source=0 show=True
Most of these arguments are the same as those we’ve seen above for image and video files, except for source, which lets us specify which video source to use, and show, which displays the annotated feed in a window. In my case, the source is the built-in webcam (0).
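The source argument therefore accepts quite different things: a file path or a numeric camera index. The helper below is a hypothetical sketch (not how ultralytics resolves sources internally) showing one way to tell these cases apart, using illustrative, non-exhaustive sets of file extensions; a numeric string such as "0" is converted to the integer index that OpenCV’s cv2.VideoCapture expects.

```python
from pathlib import Path

# Illustrative subsets only; YOLOv8 supports more formats than these
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def classify_source(source: str):
    """Return (kind, value): ('webcam', index), ('image', path), ('video', path) or ('unknown', path)."""
    if source.isdigit():
        return "webcam", int(source)  # camera index, e.g. 0 for the built-in webcam
    ext = Path(source).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image", source
    if ext in VIDEO_EXTS:
        return "video", source
    return "unknown", source


if __name__ == "__main__":
    for s in ["0", "img.jpg", "vid.mp4", "notes.txt"]:
        print(s, "->", classify_source(s))
```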
Python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.predict(source="0", show=True)
Again, we can perform the same task in Python with an ultra-low-code solution.
Here’s a demonstration of YOLOv8n running on a live webcam feed:
Impressive! Despite the somewhat low video quality and poor lighting conditions, it still captures the objects pretty well and even detects some objects in the background, such as the olive oil and vinegar bottles on the left and the sink on the right.
It’s worth noting that while these intuitive CLI commands and low-code Python solutions are great ways to quickly get started on an object detection task, they do have limitations when it comes to custom configurations. For instance, if we’d like to customize the appearance of the bounding boxes or perform a simple task such as counting and displaying the number of objects detected at any given time, we’d need to code up our own custom implementation using packages such as cv2 or supervision.
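Counting objects per class, for instance, takes only a few lines once you have the predicted class names for a frame. A minimal sketch using collections.Counter; the input list stands in for the names you would look up from a frame’s class IDs, and `summarize_counts` is a hypothetical helper whose output string could then be drawn onto the frame, for example with cv2.putText.

```python
from collections import Counter
from typing import List


def summarize_counts(labels: List[str]) -> str:
    """Return a one-line summary like 'person: 2, car: 1', most frequent class first."""
    counts = Counter(labels)
    return ", ".join(f"{name}: {n}" for name, n in counts.most_common())


if __name__ == "__main__":
    print(summarize_counts(["person", "car", "person"]))  # person: 2, car: 1
```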
In fact, the webcam footage above was recorded using the following Python code in order to adjust the webcam’s resolution and customize the bounding boxes and their annotations. (Note: This was mainly done to make the GIF above more presentable. The CLI and Python implementations shown above would suffice to produce similar results.)
import cv2
import supervision as sv
from ultralytics import YOLO


def main():
    # to save the video (7 fps, 1280x720; must match the size of the frames written)
    writer = cv2.VideoWriter('webcam_yolo.mp4',
                             cv2.VideoWriter_fourcc(*'DIVX'),
                             7,
                             (1280, 720))

    # define resolution
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

    # specify the model
    model = YOLO("yolov8n.pt")

    # customize the bounding box
    # (written for an older supervision release; newer versions draw labels
    # with a separate sv.LabelAnnotator)
    box_annotator = sv.BoxAnnotator(
        thickness=2,
        text_thickness=2,
        text_scale=1
    )

    while True:
        ret, frame = cap.read()
        if not ret:  # stop if no frame could be read from the webcam
            break

        # agnostic_nms=True applies non-max suppression across all classes
        result = model(frame, agnostic_nms=True)[0]
        # newer supervision releases name this sv.Detections.from_ultralytics
        detections = sv.Detections.from_yolov8(result)
        labels = [
            f"{model.model.names[class_id]} {confidence:0.2f}"
            for confidence, class_id
            in zip(detections.confidence, detections.class_id)
        ]
        frame = box_annotator.annotate(
            scene=frame,
            detections=detections,
            labels=labels
        )

        writer.write(frame)
        cv2.imshow("yolov8", frame)

        if cv2.waitKey(30) == 27:  # break with escape key
            break

    cap.release()
    writer.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
While the details of this code are beyond the scope of this article, here’s a great reference that uses a similar approach in case you’re interested in upping your object detection game:
YOLOv8 not only outperforms its predecessors in accuracy and speed but also considerably improves the user experience through an extremely easy-to-use CLI and low-code Python solutions. It also comes in five different model versions, giving users the option to choose according to their individual needs and tolerances for latency and accuracy.
Whether your goal is to perform object detection on static images, videos, or a live webcam feed, YOLOv8 lets you do so seamlessly. However, should your application require custom configurations, you may need to resort to additional computer vision packages such as cv2 and supervision.