5.4 YOLO Module API Manual#
1. Overview#
This manual aims to guide developers in deploying and using models trained and converted with YOLOv5, YOLOv8, and YOLO11 through the YOLO module. The models support three types of tasks: classification, detection, and segmentation. It helps users quickly integrate with the YOLO source code and deploy the trained models on the K230. For YOLO usage examples, refer to the documentation: YOLO Battle.
2. API Introduction#
2.1 YOLOv5 Class#
2.1.1 Constructor#
Description
This is the constructor of the encapsulated YOLOv5
module, which initializes a YOLOv5
type to obtain a YOLOv5
instance.
Syntax
from libs.YOLO import YOLOv5
yolo=YOLOv5(task_type="classify",mode="image",kmodel_path="yolov5_det.kmodel",labels=["apple","banana","orange"],rgb888p_size=[1280,720],model_input_size=[320,320],conf_thresh=0.5,nms_thresh=0.25,max_boxes_num=50,debug_mode=0)
Parameters
Parameter Name |
Description |
Explanation |
Type |
---|---|---|---|
task_type |
Task type |
Supports three types of tasks, with options ‘classify’/’detect’/’segment’. |
str |
mode |
Inference mode |
Supports two inference modes, with options ‘image’/’video’. ‘image’ indicates image inference, and ‘video’ indicates real - time video stream inference from the camera. |
str |
kmodel_path |
kmodel path |
The path of the kmodel copied to the development board. |
str |
labels |
Class label list |
The label names of different classes. |
list[str] |
rgb888p_size |
Inference frame resolution |
The resolution of the current inference frame, such as [1920,1080], [1280,720], [640,640]. |
list[int] |
model_input_size |
Model input resolution |
The input resolution during YOLOv5 model training, such as [224,224], [320,320], [640,640]. |
list[int] |
display_size |
Display resolution |
Set when the inference mode is ‘video’, supporting hdmi ([1920,1080]) and lcd ([800,480]). |
list[int] |
conf_thresh |
Confidence threshold |
The class confidence threshold for classification tasks and the target confidence threshold for detection and segmentation tasks, such as 0.5. |
float [0~1] |
nms_thresh |
NMS threshold |
The non - maximum suppression threshold, required for detection and segmentation tasks. |
float [0~1] |
mask_thresh |
Mask threshold |
The binary threshold for segmenting objects in the detection box in segmentation tasks. |
float [0~1] |
max_boxes_num |
Maximum number of detection boxes |
The maximum number of detection boxes allowed to be returned in one frame of image. |
int |
debug_mode |
Debug mode |
Whether the timing function is enabled, with options 0/1. 0 means no timing, and 1 means timing. |
int [0/1] |
Return Value
Return Value |
Description |
---|---|
YOLOv5 |
YOLOv5 instance |
2.1.2 config_preprocess#
Description This is the YOLOv5 pre - processing configuration function.
Syntax
yolo.config_preprocess()
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
None |
Return Value
Return Value |
Description |
---|---|
None |
2.1.3 run#
Description
Infer one frame of image and return the inference result for use in the draw_result
method. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices.
Syntax
res=yolo.run(img)
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
img |
The image to be inferred in the format of ulab.numpy.ndarray, or one frame of image obtained from the video stream through |
Input |
Return Value
Return Value |
Description |
---|---|
res |
The post - processing result of the model. The return values vary for different tasks. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices. |
2.1.4 draw_result#
Description
Draw the YOLOv5
inference result on the screen or image.
Syntax
yolo.draw_result(res,img_ori)
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
res |
The inference result of |
Input |
|
img_ori |
The Image instance to be drawn on |
Input |
From |
Return Value
Return Value |
Description |
---|---|
None |
2.1.5 Example Programs#
The following is an example program for the YOLOv5
detection task:
from libs.YOLO import YOLOv5
import os,sys,gc
import ulab.numpy as np
import image
# Read an image from the local disk and convert from HWC to CHW
def read_img(img_path):
img_data = image.Image(img_path)
img_data_rgb888=img_data.to_rgb888()
img_hwc=img_data_rgb888.to_numpy_ref()
shape=img_hwc.shape
img_tmp = img_hwc.reshape((shape[0] * shape[1], shape[2]))
img_tmp_trans = img_tmp.transpose()
img_res=img_tmp_trans.copy()
img_return=img_res.reshape((shape[2],shape[0],shape[1]))
return img_return,img_data_rgb888
if __name__=="__main__":
# You can modify the path to fit your own model
img_path="/data/test_images/test.jpg"
kmodel_path="/data/yolo_kmodels/det_yolov5n_320.kmodel"
labels = ["apple","banana","orange"]
confidence_threshold = 0.5
nms_threshold=0.45
model_input_size=[320,320]
img,img_ori=read_img(img_path)
rgb888p_size=[img.shape[2],img.shape[1]]
# Initialize the YOLOv5 instance
yolo=YOLOv5(task_type="detect",mode="image",kmodel_path=kmodel_path,labels=labels,rgb888p_size=rgb888p_size,model_input_size=model_input_size,conf_thresh=confidence_threshold,nms_thresh=nms_threshold,max_boxes_num=50,debug_mode=0)
yolo.config_preprocess()
try:
res=yolo.run(img)
yolo.draw_result(res,img_ori)
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
yolo.deinit()
The above code shows how to use YOLOv5
for image inference.
from libs.PipeLine import PipeLine, ScopedTiming
from libs.YOLO import YOLOv5
import os,sys,gc
import ulab.numpy as np
import image
if __name__=="__main__":
# Display mode, default is "hdmi", can choose "hdmi" or "lcd"
display_mode="hdmi"
rgb888p_size=[1280,720]
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# You can modify the path to fit your own model
kmodel_path="/data/yolo_kmodels/det_yolov5n_320.kmodel"
labels = ["apple","banana","orange"]
confidence_threshold = 0.8
nms_threshold=0.45
model_input_size=[320,320]
# Initialize the PipeLine
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
# Initialize the YOLOv5 instance
yolo=YOLOv5(task_type="detect",mode="video",kmodel_path=kmodel_path,labels=labels,rgb888p_size=rgb888p_size,model_input_size=model_input_size,display_size=display_size,conf_thresh=confidence_threshold,nms_thresh=nms_threshold,max_boxes_num=50,debug_mode=0)
yolo.config_preprocess()
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
# Infer frame by frame
img=pl.get_frame()
res=yolo.run(img)
yolo.draw_result(res,pl.osd_img)
pl.show_image()
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
yolo.deinit()
pl.destroy()
The above code shows how to use YOLOv5
for video inference.
2.2 YOLOv8 Class#
2.2.1 Constructor#
Description
This is the constructor of the encapsulated YOLOv8
module, which initializes a YOLOv8
type to obtain a YOLOv8
instance.
Syntax
from libs.YOLO import YOLOv8
yolo=YOLOv8(task_type="classify",mode="image",kmodel_path="yolov8_det.kmodel",labels=["apple","banana","orange"],rgb888p_size=[1280,720],model_input_size=[320,320],conf_thresh=0.5,nms_thresh=0.25,max_boxes_num=50,debug_mode=0)
Parameters
Parameter Name |
Description |
Explanation |
Type |
---|---|---|---|
task_type |
Task type |
Supports three types of tasks, with options ‘classify’/’detect’/’segment’. |
str |
mode |
Inference mode |
Supports two inference modes, with options ‘image’/’video’. ‘image’ indicates image inference, and ‘video’ indicates real - time video stream inference from the camera. |
str |
kmodel_path |
kmodel path |
The path of the kmodel copied to the development board. |
str |
labels |
Class label list |
The label names of different classes. |
list[str] |
rgb888p_size |
Inference frame resolution |
The resolution of the current inference frame, such as [1920,1080], [1280,720], [640,640]. |
list[int] |
model_input_size |
Model input resolution |
The input resolution during YOLOv8 model training, such as [224,224], [320,320], [640,640]. |
list[int] |
display_size |
Display resolution |
Set when the inference mode is ‘video’, supporting hdmi ([1920,1080]) and lcd ([800,480]). |
list[int] |
conf_thresh |
Confidence threshold |
The class confidence threshold for classification tasks and the target confidence threshold for detection and segmentation tasks, such as 0.5. |
float [0~1] |
nms_thresh |
NMS threshold |
The non - maximum suppression threshold, required for detection and segmentation tasks. |
float [0~1] |
mask_thresh |
Mask threshold |
The binary threshold for segmenting objects in the detection box in segmentation tasks. |
float [0~1] |
max_boxes_num |
Maximum number of detection boxes |
The maximum number of detection boxes allowed to be returned in one frame of image. |
int |
debug_mode |
Debug mode |
Whether the timing function is enabled, with options 0/1. 0 means no timing, and 1 means timing. |
int [0/1] |
Return Value
Return Value |
Description |
---|---|
YOLOv8 |
YOLOv8 instance |
2.2.2 config_preprocess#
Description This is the YOLOv8 pre - processing configuration function.
Syntax
yolo.config_preprocess()
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
None |
Return Value
Return Value |
Description |
---|---|
None |
2.2.3 run#
Description
Infer one frame of image and return the inference result for use in the draw_result
method. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices.
Syntax
res=yolo.run(img)
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
img |
The image to be inferred in the format of ulab.numpy.ndarray, or one frame of image obtained from the video stream through |
Input |
Return Value
Return Value |
Description |
---|---|
res |
The post - processing result of the model. The return values vary for different tasks. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices. |
2.2.4 draw_result#
Description
Draw the YOLOv8
inference result on the screen or image.
Syntax
yolo.draw_result(res,img_ori)
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
res |
The inference result of |
Input |
|
img_ori |
The Image instance to be drawn on |
Input |
From |
Return Value
Return Value |
Description |
---|---|
None |
2.2.5 Example Programs#
The following is an example program for the YOLOv8
classification task:
from libs.YOLO import YOLOv8
import os,sys,gc
import ulab.numpy as np
import image
# Read an image from the local disk and convert from HWC to CHW
def read_img(img_path):
img_data = image.Image(img_path)
img_data_rgb888=img_data.to_rgb888()
img_hwc=img_data_rgb888.to_numpy_ref()
shape=img_hwc.shape
img_tmp = img_hwc.reshape((shape[0] * shape[1], shape[2]))
img_tmp_trans = img_tmp.transpose()
img_res=img_tmp_trans.copy()
img_return=img_res.reshape((shape[2],shape[0],shape[1]))
return img_return,img_data_rgb888
if __name__=="__main__":
# You can modify the path parameters according to your model
img_path="/data/test_images/test_apple.jpg"
kmodel_path="/data/yolo_kmodels/cls_yolov8n_224.kmodel"
labels = ["apple","banana","orange"]
confidence_threshold = 0.5
model_input_size=[224,224]
img,img_ori=read_img(img_path)
rgb888p_size=[img.shape[2],img.shape[1]]
# Initialize the YOLOv8 instance
yolo=YOLOv8(task_type="classify",mode="image",kmodel_path=kmodel_path,labels=labels,rgb888p_size=rgb888p_size,model_input_size=model_input_size,conf_thresh=confidence_threshold,debug_mode=0)
yolo.config_preprocess()
try:
res=yolo.run(img)
yolo.draw_result(res,img_ori)
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
yolo.deinit()
The above code shows how to use YOLOv8
for image inference.
from libs.PipeLine import PipeLine, ScopedTiming
from libs.YOLO import YOLOv8
import os,sys,gc
import ulab.numpy as np
import image
if __name__=="__main__":
# Display mode, default is "hdmi", can choose "hdmi" or "lcd"
display_mode="hdmi"
rgb888p_size=[1280,720]
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# You can modify the path parameters according to your model
kmodel_path="/data/yolo_kmodels/cls_yolov8n_224.kmodel"
labels = ["apple","banana","orange"]
confidence_threshold = 0.8
model_input_size=[224,224]
# Initialize the PipeLine
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
# Initialize the YOLOv8 instance
yolo=YOLOv8(task_type="classify",mode="video",kmodel_path=kmodel_path,labels=labels,rgb888p_size=rgb888p_size,model_input_size=model_input_size,display_size=display_size,conf_thresh=confidence_threshold,debug_mode=0)
yolo.config_preprocess()
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
# Infer frame by frame
img=pl.get_frame()
res=yolo.run(img)
yolo.draw_result(res,pl.osd_img)
pl.show_image()
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
yolo.deinit()
pl.destroy()
The above code shows how to use YOLOv8
for video inference.
2.3 YOLO11 Class#
2.3.1 Constructor#
Description This is the constructor of the encapsulated YOLO11 module, which initializes the YOLO11 type to obtain a YOLO11 instance.
Syntax
from libs.YOLO import YOLO11
yolo=YOLO11(task_type="segment",mode="image",kmodel_path="yolo11_det.kmodel",labels=["apple","banana","orange"],rgb888p_size=[1280,720],model_input_size=[320,320],conf_thresh=0.5,nms_thresh=0.25,mask_thresh=0.5,max_boxes_num=50,debug_mode=0)
Parameters
Parameter Name |
Description |
Explanation |
Type |
---|---|---|---|
task_type |
Task type |
Supports three types of tasks, with available options: ‘classify’, ‘detect’, ‘segment’. |
str |
mode |
Inference mode |
Supports two inference modes, with available options: ‘image’ and ‘video’. ‘image’ means inferring images, and ‘video’ means inferring real - time video streams captured by the camera. |
str |
kmodel_path |
kmodel path |
The path of the kmodel copied to the development board. |
str |
labels |
Class label list |
The label names of different classes. |
list[str] |
rgb888p_size |
Inference frame resolution |
The resolution of the current inference frame, such as [1920, 1080], [1280, 720], [640, 640]. |
list[int] |
model_input_size |
Model input resolution |
The input resolution during the training of the YOLO11 model, such as [224, 224], [320, 320], [640, 640]. |
list[int] |
display_size |
Display resolution |
Set when the inference mode is ‘video’, supporting HDMI ([1920, 1080]) and LCD ([800, 480]). |
list[int] |
conf_thresh |
Confidence threshold |
The class confidence threshold for classification tasks and the target confidence threshold for detection and segmentation tasks, e.g., 0.5. |
float [0~1] |
nms_thresh |
NMS threshold |
The non - maximum suppression threshold, required for detection and segmentation tasks. |
float [0~1] |
mask_thresh |
Mask threshold |
The binary threshold for segmenting objects within the detection box in segmentation tasks. |
float [0~1] |
max_boxes_num |
Maximum number of detection boxes |
The maximum number of detection boxes allowed to be returned in one frame of an image. |
int |
debug_mode |
Debug mode |
Whether the timing function is enabled. Options are 0 or 1, where 0 means no timing and 1 means timing. |
int [0/1] |
Return Value
Return Value |
Description |
---|---|
YOLO11 |
YOLO11 instance |
2.3.2 config_preprocess#
Description This is the YOLO11 pre - processing configuration function.
Syntax
yolo.config_preprocess()
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
None |
Return Value
Return Value |
Description |
---|---|
None |
2.3.3 run#
Description
Infer one frame of an image and return the inference result for use in the draw_result
method. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices.
Syntax
res=yolo.run(img)
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
img |
The image to be inferred in the format of |
Input |
Return Value
Return Value |
Description |
---|---|
res |
The post - processing result of the model. The return values vary for different tasks. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices. |
2.3.4 draw_result#
Description
Draw the YOLO11
inference result on the screen or an image.
Syntax
yolo.draw_result(res,img_ori)
Parameters
Parameter Name |
Description |
Input/Output |
Explanation |
---|---|---|---|
res |
The inference result of |
Input |
|
img_ori |
The Image instance to be drawn on |
Input |
From |
Return Value
Return Value |
Description |
---|---|
None |
2.3.5 Example Programs#
The following is an example program for the YOLO11
segmentation task:
from libs.YOLO import YOLO11
import os,sys,gc
import ulab.numpy as np
import image
# Read an image from the local disk and convert from HWC to CHW
def read_img(img_path):
img_data = image.Image(img_path)
img_data_rgb888=img_data.to_rgb888()
img_hwc=img_data_rgb888.to_numpy_ref()
shape=img_hwc.shape
img_tmp = img_hwc.reshape((shape[0] * shape[1], shape[2]))
img_tmp_trans = img_tmp.transpose()
img_res=img_tmp_trans.copy()
img_return=img_res.reshape((shape[2],shape[0],shape[1]))
return img_return,img_data_rgb888
if __name__=="__main__":
# You can modify the path parameters according to your model
img_path="/data/test_images/test.jpg"
kmodel_path="/data/yolo_kmodels/seg_yolo11n_320.kmodel"
labels = ["apple","banana","orange"]
confidence_threshold = 0.5
nms_threshold=0.45
mask_threshold=0.5
model_input_size=[320,320]
img,img_ori=read_img(img_path)
rgb888p_size=[img.shape[2],img.shape[1]]
# Initialize the YOLO11 instance
yolo=YOLO11(task_type="segment",mode="image",kmodel_path=kmodel_path,labels=labels,rgb888p_size=rgb888p_size,model_input_size=model_input_size,conf_thresh=confidence_threshold,nms_thresh=nms_threshold,mask_thresh=mask_threshold,max_boxes_num=50,debug_mode=0)
yolo.config_preprocess()
try:
res=yolo.run(img)
yolo.draw_result(res,img_ori)
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
yolo.deinit()
The above code shows how to use YOLO11
for image inference.
from libs.PipeLine import PipeLine, ScopedTiming
from libs.YOLO import YOLO11
import os,sys,gc
import ulab.numpy as np
import image
if __name__=="__main__":
# Display mode, default is "hdmi", can choose "hdmi" or "lcd"
display_mode="hdmi"
rgb888p_size=[320,320]
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# You can modify the path parameters according to your model
kmodel_path="/data/yolo_kmodels/seg_yolo11n_320.kmodel"
labels = ["apple","banana","orange"]
confidence_threshold = 0.5
nms_threshold=0.45
mask_threshold=0.5
model_input_size=[320,320]
# Initialize the PipeLine
pl=PipeLine(rgb888p_size=rgb888p_size,display_size=display_size,display_mode=display_mode)
pl.create()
# Initialize the YOLO11 instance
yolo=YOLO11(task_type="segment",mode="video",kmodel_path=kmodel_path,labels=labels,rgb888p_size=rgb888p_size,model_input_size=model_input_size,display_size=display_size,conf_thresh=confidence_threshold,nms_thresh=nms_threshold,mask_thresh=mask_threshold,max_boxes_num=50,debug_mode=0)
yolo.config_preprocess()
try:
while True:
os.exitpoint()
with ScopedTiming("total",1):
# Infer frame by frame
img=pl.get_frame()
res=yolo.run(img)
yolo.draw_result(res,pl.osd_img)
pl.show_image()
gc.collect()
except Exception as e:
sys.print_exception(e)
finally:
yolo.deinit()
pl.destroy()
The above code shows how to use YOLO11
for video inference.