3. AI Demo Documentation#
1. AI Demo Application#
The K230 supports a wide range of AI applications. To make them easy to try out, the CanMV K230 image ships with several built-in AI Demo applications. These are categorized into single-model and multi-model applications, covering areas such as object detection, face, hand, and body analysis, license plate recognition, OCR, and audio (keyword spotting, TTS). Users can run these applications simply by opening the source code in the CanMV IDE. The source code for the AI Demo applications is located in the `/CanMV/sdcard/examples/01-AI-Demo` directory. The table below lists the AI Demo applications; note that some demos cannot run on the K230D chip due to memory limitations. Please refer to the compatibility columns.
| Demo Name | Scenario | Task Type | K230 | K230D |
|---|---|---|---|---|
| dynamic_gesture | Dynamic Gesture Recognition | Multi-Model Task | √ | √ |
| eye_gaze | Gaze Estimation | Multi-Model Task | √ | |
| face_detection | Face Detection | Single-Model Task | √ | √ |
| face_landmark | Face Key Points | Multi-Model Task | √ | √ |
| face_mesh | Face 3D Mesh | Multi-Model Task | √ | |
| face_parse | Face Parsing | Multi-Model Task | √ | |
| face_pose | Face Pose | Multi-Model Task | √ | √ |
| face_registration | Face Registration | Multi-Model Task | √ | √ |
| face_recognition | Face Recognition | Multi-Model Task | √ | √ |
| falldown_detection | Fall Detection | Single-Model Task | √ | √ |
| finger_guessing | Finger Guessing Game | Multi-Model Task | √ | √ |
| hand_detection | Palm Detection | Single-Model Task | √ | √ |
| hand_keypoint_class | Palm Key Point Classification | Multi-Model Task | √ | √ |
| hand_keypoint_detection | Palm Key Point Detection | Multi-Model Task | √ | √ |
| hand_recognition | Gesture Recognition | Multi-Model Task | √ | √ |
| keyword_spotting | Keyword Spotting | Single-Model Task | √ | √ |
| licence_det | License Plate Detection | Single-Model Task | √ | √ |
| licence_det_rec | License Plate Recognition | Multi-Model Task | √ | √ |
| nanotracker | Single Object Tracking | Multi-Model Task | √ | √ |
| object_detect_yolov8n | YOLOv8n Object Detection | Single-Model Task | √ | √ |
| ocr_det | OCR Detection | Single-Model Task | √ | |
| ocr_rec | OCR Recognition | Multi-Model Task | √ | |
| person_detection | Body Detection | Single-Model Task | √ | √ |
| person_kp_detect | Body Key Point Detection | Multi-Model Task | √ | √ |
| puzzle_game | Puzzle Game | Multi-Model Task | √ | √ |
| segment_yolov8n | YOLOv8 Segmentation | Single-Model Task | √ | |
| self_learning | Self-Learning | Single-Model Task | √ | √ |
| space_resize | Local Magnification | Multi-Model Task | √ | √ |
| tts_zh | Chinese Text-to-Speech | Multi-Model Task | √ | |
Note
To run the above demos on the K230D chip development board, you need to change the `display_mode` in `__main__` to `lcd` to adapt the display output, and also reduce the resolution as per the comments in each script. Some demos cannot run on the K230D; see the table above for details.
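As a reference, a typical K230D adaptation of the `__main__` section looks like the sketch below; the resolution values are illustrative, and the comments in each demo script give the recommended settings.

```python
# Illustrative K230D settings inside __main__ (values follow the demo comments,
# e.g. rgb888p_size can be reduced to [640, 360] on the K230D)
display_mode = "lcd"             # switch the output from "hdmi" to "lcd"
rgb888p_size = [640, 360]        # reduced AI input resolution to save memory
if display_mode == "hdmi":
    display_size = [1920, 1080]
else:
    display_size = [800, 480]    # LCD panel resolution used by the demos
```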
2. AI Demo Development Framework Introduction#
2.1. AI Demo Development Framework#
To help users simplify the development of AI components, we have built a supporting AI development framework based on the API interface provided by K230_CanMV. The framework structure is shown in the diagram below:
The Camera outputs two image streams by default: one in YUV420SP format (`Sensor.YUV420SP`), which is sent directly to the Display; the other in planar RGB888 format (`Sensor.RGBP888`), which is used for AI processing. The AI side handles the pre-processing, inference, and post-processing flow of the task. After processing, the results are drawn onto the OSD image instance and sent to the Display for overlay.
Note
The PipeLine encapsulation mainly simplifies the development flow of vision tasks. You can call `get_frame` to obtain a frame for machine-vision processing; if you want to customize the AI flow, refer to face_detection. For audio-related AI, refer to the keyword_spotting and tts_zh examples in the demos.
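As a minimal sketch of this vision flow (using the same PipeLine parameters as the demos below), a custom processing loop can pull frames directly with `get_frame`:

```python
from libs.PipeLine import PipeLine, ScopedTiming
from media.media import *
import os, gc, sys

# Minimal custom vision loop; replace the placeholder comment with your own processing.
pl = PipeLine(rgb888p_size=[1920, 1080], display_size=[1920, 1080], display_mode="hdmi")
pl.create()
try:
    while True:
        os.exitpoint()                  # check for exit signal
        with ScopedTiming("total", 1):
            img = pl.get_frame()        # frame for AI/vision processing (planar RGB888)
            # ... custom pre-processing / inference / drawing on pl.osd_img here ...
            pl.show_image()             # push the OSD overlay to the display
        gc.collect()
except Exception as e:
    sys.print_exception(e)
finally:
    pl.destroy()
```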
2.2. Interface Encapsulation#
For user convenience, we have encapsulated common functions for obtaining images from the Camera, AI2D pre-processing, and kmodel inference on top of the framework above. For the encapsulated interfaces, please refer to: AI Demo API
2.3. Application Methods and Examples#
2.3.1. Overview#
Users can write custom task classes inheriting from AIBase according to specific AI scenarios. Tasks can be divided into four categories: single-model tasks, multi-model tasks, custom pre-processing tasks, and no pre-processing tasks. Different tasks require different code implementations, as shown in the diagram below:
Introduction to different tasks:
| Task Type | Task Description | Code Description |
|---|---|---|
| Single-Model Task | This task has only one model, focusing on the model's pre-processing, inference, and post-processing. Pre-processing uses Ai2d, possibly with one or multiple Ai2d instances. Post-processing is customized based on the scenario. | Write a custom task class, focusing on config_preprocess, postprocess, and other methods like draw_result. If multiple Ai2d instances are used, override preprocess to set the computation process of the pre-processing stages. |
| Custom Pre-Processing Task | This task has one model, focusing on pre-processing, inference, and post-processing. Pre-processing does not use Ai2d; it can be customized with ulab.numpy. Post-processing is customized based on the scenario. | Write a custom task class, focusing on preprocess, postprocess, and other methods like draw_result. |
| No Pre-Processing Task | This task has one model and requires no pre-processing. Focus on inference and post-processing. Usually part of a multi-model task, using the output of the previous model as input. Post-processing is customized based on needs. | Write a custom task class, focusing on run (the entire inference process, including preprocess, inference, postprocess), postprocess, and other methods like draw_result. |
| Multi-Model Task | This task includes multiple models, possibly in series or other combinations. Each model is essentially one of the first three types, unified in a complete task class. | Write multiple sub-model task classes. Different sub-model tasks refer to the first three task definitions. Different tasks focus on different methods. Write a multi-model task class to unify the sub-model task classes for the entire scenario. |
2.3.2. Single-Model Task#
The pseudocode structure for a single-model task is as follows:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# Custom AI task class, inheriting from AIBase
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# Call the constructor of the base class
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# Model file path
self.kmodel_path = kmodel_path
# Model input resolution
self.model_input_size = model_input_size
# Image resolution from sensor to AI, align width to 16
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# Display resolution, align width to 16
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# Enable debug mode
self.debug_mode = debug_mode
# Instantiate Ai2d for model pre-processing
self.ai2d = Ai2d(debug_mode)
# Set input and output format and type for Ai2d
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# Configure pre-processing operations, using resize here. Ai2d supports crop/shift/pad/resize/affine. See /sdcard/libs/AI2D.py for code.
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# Initialize ai2d pre-processing configuration, default to sensor size to AI, can modify input size by setting input_image_size
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# Configure resize pre-processing method
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# Build pre-processing workflow
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])
# Customize post-processing for the current task, results is a list of model output arrays, need to override based on actual task
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# Draw results on the screen, need to write according to the task
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
if __name__ == "__main__":
# Display mode, default "hdmi", can choose "hdmi" or "lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# Set model path, replace with the current task model
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### Other Parameters ########
...
######################
# Initialize PipeLine for image processing workflow
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # Create PipeLine instance
# Initialize custom AI task instance
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320],rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # Configure pre-processing
try:
while True:
os.exitpoint() # Check for exit signal
with ScopedTiming("total",1):
img = pl.get_frame() # Get current frame data
res = my_ai.run(img) # Infer current frame
my_ai.draw_result(pl, res) # Draw result
pl.show_image() # Show result
gc.collect() # Garbage collection
except Exception as e:
sys.print_exception(e) # Print exception information
finally:
my_ai.deinit() # Deinitialize
pl.destroy() # Destroy PipeLine instance
Below is an example code for face detection:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import utime
import image
import random
import gc
import sys
import aidemo
# Custom face detection class, inheriting from the AIBase base class
class FaceDetectionApp(AIBase):
def __init__(self, kmodel_path, model_input_size, anchors, confidence_threshold=0.5, nms_threshold=0.2, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode) # Call the constructor of the base class
self.kmodel_path = kmodel_path # Model file path
self.model_input_size = model_input_size # Model input resolution
self.confidence_threshold = confidence_threshold # Confidence threshold
self.nms_threshold = nms_threshold # NMS (Non-Maximum Suppression) threshold
self.anchors = anchors # Anchor data, used for object detection
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]] # Image resolution from sensor to AI, aligned to 16 in width
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]] # Display resolution, aligned to 16 in width
self.debug_mode = debug_mode # Whether to enable debug mode
self.ai2d = Ai2d(debug_mode) # Instantiate Ai2d for model pre-processing
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8) # Set input and output format and type for Ai2d
# Configure pre-processing operations, using pad and resize here. Ai2d supports crop/shift/pad/resize/affine. See the code in /sdcard/app/libs/AI2D.py.
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0): # Timer, enabled if debug_mode is greater than 0
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size # Initialize ai2d pre-processing configuration, default to sensor size to AI, can modify input size by setting input_image_size
top, bottom, left, right = self.get_padding_param() # Get padding parameters
self.ai2d.pad([0, 0, 0, 0, top, bottom, left, right], 0, [104, 117, 123]) # Pad the edges
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel) # Resize the image
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]]) # Build the pre-processing workflow
# Customize post-processing for the current task, results is a list of model output arrays, using the face_det_post_process interface from the aidemo library here
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
post_ret = aidemo.face_det_post_process(self.confidence_threshold, self.nms_threshold, self.model_input_size[1], self.anchors, self.rgb888p_size, results)
if len(post_ret) == 0:
return post_ret
else:
return post_ret[0]
# Draw detection results on the screen
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
if dets:
pl.osd_img.clear() # Clear the OSD image
for det in dets:
# Convert the coordinates of the detection box to coordinates under the display resolution
x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
x = x * self.display_size[0] // self.rgb888p_size[0]
y = y * self.display_size[1] // self.rgb888p_size[1]
w = w * self.display_size[0] // self.rgb888p_size[0]
h = h * self.display_size[1] // self.rgb888p_size[1]
pl.osd_img.draw_rectangle(x, y, w, h, color=(255, 255, 0, 255), thickness=2) # Draw a rectangle
else:
pl.osd_img.clear()
# Get padding parameters
def get_padding_param(self):
dst_w = self.model_input_size[0] # Model input width
dst_h = self.model_input_size[1] # Model input height
ratio_w = dst_w / self.rgb888p_size[0] # Width scaling ratio
ratio_h = dst_h / self.rgb888p_size[1] # Height scaling ratio
ratio = min(ratio_w, ratio_h) # Take the smaller scaling ratio
new_w = int(ratio * self.rgb888p_size[0]) # New width
new_h = int(ratio * self.rgb888p_size[1]) # New height
dw = (dst_w - new_w) / 2 # Width difference
dh = (dst_h - new_h) / 2 # Height difference
top = int(round(0))
bottom = int(round(dh * 2 + 0.1))
left = int(round(0))
right = int(round(dw * 2 - 0.1))
return top, bottom, left, right
if __name__ == "__main__":
# Display mode, default is "hdmi", you can choose "hdmi" or "lcd"
display_mode="hdmi"
# k230 remains unchanged, k230d can be adjusted to [640,360]
rgb888p_size = [1920, 1080]
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# Set model path and other parameters
kmodel_path = "/sdcard/examples/kmodel/face_detection_320.kmodel"
# Other parameters
confidence_threshold = 0.5
nms_threshold = 0.2
anchor_len = 4200
det_dim = 4
anchors_path = "/sdcard/examples/utils/prior_data_320.bin"
anchors = np.fromfile(anchors_path, dtype=np.float)
anchors = anchors.reshape((anchor_len, det_dim))
# Initialize PipeLine for image processing workflow
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # Create PipeLine instance
# Initialize custom face detection instance
face_det = FaceDetectionApp(kmodel_path, model_input_size=[320, 320], anchors=anchors, confidence_threshold=confidence_threshold, nms_threshold=nms_threshold, rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
face_det.config_preprocess() # Configure pre-processing
try:
while True:
os.exitpoint() # Check for exit signal
with ScopedTiming("total",1):
img = pl.get_frame() # Get current frame data
res = face_det.run(img) # Infer current frame
face_det.draw_result(pl, res) # Draw result
pl.show_image() # Show result
gc.collect() # Garbage collection
except Exception as e:
sys.print_exception(e) # Print exception information
finally:
face_det.deinit() # Deinitialize
pl.destroy() # Destroy PipeLine instance
The pseudocode for multiple Ai2d instances is as follows:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# Custom AI task class, inheriting from AIBase class
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# Call the constructor of the base class
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# Model file path
self.kmodel_path = kmodel_path
# Model input resolution
self.model_input_size = model_input_size
# Image resolution from sensor to AI, aligned to 16 in width
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# Display resolution, aligned to 16 in width
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# Whether to enable debug mode
self.debug_mode = debug_mode
# Instantiate Ai2d for model pre-processing
self.ai2d_resize = Ai2d(debug_mode)
# Set input and output format and type for Ai2d
self.ai2d_resize.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# Instantiate Ai2d for model pre-processing
self.ai2d_crop = Ai2d(debug_mode)
# Set input and output format and type for Ai2d
self.ai2d_crop.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# Configure pre-processing operations, using resize and crop here. Ai2d supports crop/shift/pad/resize/affine. See the code in /sdcard/libs/AI2D.py.
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# Initialize ai2d pre-processing configuration, default to sensor size to AI, can modify input size by setting input_image_size
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# Configure resize pre-processing method
self.ai2d_resize.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# Build pre-processing workflow
self.ai2d_resize.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,640,640])
# Configure crop pre-processing method
self.ai2d_crop.crop(0,0,320,320)
# Build pre-processing workflow
self.ai2d_crop.build([1,3,640,640],[1,3,320,320])
        # Assume this task needs resize followed by crop. Since that order does not match ai2d's single-pass processing order, two Ai2d instances are used, one for each step
def preprocess(self,input_np):
resize_tensor=self.ai2d_resize.run(input_np)
resize_np=resize_tensor.to_numpy()
crop_tensor=self.ai2d_crop.run(resize_np)
return [crop_tensor]
# Customize post-processing for the current task, results is a list of model output arrays, need to override based on actual task
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# Draw results on the screen, need to write according to the task
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
# Override deinit to release multiple ai2d resources
def deinit(self):
with ScopedTiming("deinit",self.debug_mode > 0):
del self.ai2d_resize
del self.ai2d_crop
super().deinit()
if __name__ == "__main__":
# Display mode, default is "hdmi", you can choose "hdmi" or "lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# Set model path, replace with the current task model
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### Other parameters ########
...
######################
# Initialize PipeLine for image processing workflow
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # Create PipeLine instance
# Initialize custom AI task instance
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320],rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # Configure pre-processing
try:
while True:
os.exitpoint() # Check for exit signal
with ScopedTiming("total",1):
img = pl.get_frame() # Get current frame data
res = my_ai.run(img) # Infer current frame
my_ai.draw_result(pl, res) # Draw result
pl.show_image() # Show result
gc.collect() # Garbage collection
except Exception as e:
sys.print_exception(e) # Print exception information
finally:
my_ai.deinit() # Deinitialize
pl.destroy() # Destroy PipeLine instance
2.3.3. Custom Pre-processing Task#
For AI tasks that require rewriting the pre-processing (not using the provided ai2d class, writing pre-processing manually), the pseudocode is as follows:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# Custom AI Task Class, Inheriting from AIBase Base Class
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# Call the constructor of the base class
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# Model file path
self.kmodel_path = kmodel_path
# Model input resolution
self.model_input_size = model_input_size
# Image resolution from sensor to AI, aligned to 16 in width
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# Display resolution, aligned to 16 in width
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# Whether to enable debug mode
self.debug_mode = debug_mode
# Instantiate Ai2d for model pre-processing
self.ai2d = Ai2d(debug_mode)
# Set input and output format and type for Ai2d
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# For AI tasks that do not use ai2d for pre-processing, use the encapsulated interface or ulab.numpy to implement pre-processing, need to override this function in the subclass
def preprocess(self,input_np):
#############
# Note custom pre-processing
#############
return [tensor]
# Customize post-processing for the current task, results is a list of model output arrays, need to override based on actual task
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# Draw results on the screen, need to write according to the task
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
if __name__ == "__main__":
# Display mode, default "hdmi", can choose "hdmi" and "lcd"
display_mode="hdmi"
if display_mode=="hdmi":
display_size=[1920,1080]
else:
display_size=[800,480]
# Set model path, replace with the current task model
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### Other parameters ########
...
######################
# Initialize PipeLine for image processing workflow
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # Create PipeLine instance
# Initialize custom AI task instance
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320],rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # Configure pre-processing
try:
while True:
os.exitpoint() # Check for exit signal
with ScopedTiming("total",1):
img = pl.get_frame() # Get current frame data
res = my_ai.run(img) # Infer current frame
my_ai.draw_result(pl, res) # Draw result
pl.show_image() # Show result
gc.collect() # Garbage collection
except Exception as e:
sys.print_exception(e) # Print exception information
finally:
my_ai.deinit() # Deinitialize
pl.destroy() # Destroy PipeLine instance
Example of Keyword Spotting with Keyword “XiaonanXiaonan”#
from libs.PipeLine import ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
from media.pyaudio import * # Audio module
from media.media import * # Software abstraction module, mainly encapsulates media data link and media buffer
import media.wave as wave # wav audio processing module
import nncase_runtime as nn # nncase runtime module, encapsulates kpu (kmodel inference) and ai2d (image pre-processing acceleration) operations
import ulab.numpy as np # Similar to python numpy operations, but with some different interfaces
import aidemo # aidemo module, encapsulates ai demo related pre-processing, post-processing, etc.
import time # Time statistics
import struct # Byte character conversion module
import gc # Garbage collection module
import os,sys # Operating system interface module
# Custom keyword spotting class, inheriting from AIBase base class
class KWSApp(AIBase):
def __init__(self, kmodel_path, threshold, debug_mode=0):
super().__init__(kmodel_path) # Call the constructor of the base class
self.kmodel_path = kmodel_path # Model file path
self.threshold=threshold
self.debug_mode = debug_mode # Whether to enable debug mode
self.cache_np = np.zeros((1, 256, 105), dtype=np.float)
# Custom pre-processing, return model input tensor list
def preprocess(self,pcm_data):
pcm_data_list=[]
# Get audio stream data
for i in range(0, len(pcm_data), 2):
# Organize every two bytes into a signed integer, then convert it to a float, which is the sampled data for one sample, added to the current frame (0.3s) data list
int_pcm_data = struct.unpack("<h", pcm_data[i:i+2])[0]
float_pcm_data = float(int_pcm_data)
pcm_data_list.append(float_pcm_data)
# Process pcm data into feature vectors for model input
mp_feats = aidemo.kws_preprocess(fp, pcm_data_list)[0]
mp_feats_np = np.array(mp_feats).reshape((1, 30, 40))
audio_input_tensor = nn.from_numpy(mp_feats_np)
cache_input_tensor = nn.from_numpy(self.cache_np)
return [audio_input_tensor,cache_input_tensor]
# Custom post-processing for the current task, results is a list of model output arrays
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
logits_np = results[0]
self.cache_np= results[1]
max_logits = np.max(logits_np, axis=1)[0]
max_p = np.max(max_logits)
idx = np.argmax(max_logits)
# If the score is greater than the threshold and idx==1 (i.e., contains the wake word), play the reply audio
if max_p > self.threshold and idx == 1:
return 1
else:
return 0
if __name__ == "__main__":
os.exitpoint(os.EXITPOINT_ENABLE)
nn.shrink_memory_pool()
# Set model path and other parameters
kmodel_path = "/sdcard/examples/kmodel/kws.kmodel"
# Other parameters
THRESH = 0.5 # Detection threshold
SAMPLE_RATE = 16000 # Sampling rate 16000Hz, i.e., 16000 samples per second
CHANNELS = 1 # Number of channels 1 for mono, 2 for stereo
FORMAT = paInt16 # Audio input/output format paInt16
CHUNK = int(0.3 * 16000) # Number of frames to read audio data each time, set to 0.3s frames 16000*0.3=4800
reply_wav_file = "/sdcard/examples/utils/wozai.wav" # kws wake word reply audio path
# Initialize audio pre-processing interface
fp = aidemo.kws_fp_create()
# Initialize audio stream
p = PyAudio()
p.initialize(CHUNK)
MediaManager.init() # vb buffer initialization
# Used to collect real-time audio data
input_stream = p.open(format=FORMAT,channels=CHANNELS,rate=SAMPLE_RATE,input=True,frames_per_buffer=CHUNK)
# Used to play reply audio
output_stream = p.open(format=FORMAT,channels=CHANNELS,rate=SAMPLE_RATE,output=True,frames_per_buffer=CHUNK)
# Initialize custom keyword spotting instance
kws = KWSApp(kmodel_path,threshold=THRESH,debug_mode=0)
try:
while True:
os.exitpoint() # Check for exit signal
with ScopedTiming("total",1):
pcm_data=input_stream.read()
res=kws.run(pcm_data)
if res:
print("====Detected XiaonanXiaonan!====")
wf = wave.open(reply_wav_file, "rb")
wav_data = wf.read_frames(CHUNK)
while wav_data:
output_stream.write(wav_data)
wav_data = wf.read_frames(CHUNK)
time.sleep(1) # Time buffer for playing reply sound
wf.close()
else:
print("Deactivated!")
gc.collect() # Garbage collection
except Exception as e:
sys.print_exception(e) # Print exception information
finally:
input_stream.stop_stream()
output_stream.stop_stream()
input_stream.close()
output_stream.close()
p.terminate()
MediaManager.deinit() # Release vb buffer
aidemo.kws_fp_destroy(fp)
kws.deinit() # Deinitialize
2.3.4. No Pre-processing Task#
For AI tasks that do not require pre-processing (direct input inference), the pseudocode is as follows:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# Custom AI Task Class, Inheriting from AIBase Base Class
class MyAIApp(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# Call the constructor of the base class
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# Model file path
self.kmodel_path = kmodel_path
# Model input resolution
self.model_input_size = model_input_size
# Image resolution from sensor to AI, aligned to 16 in width
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# Display resolution, aligned to 16 in width
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# Whether to enable debug mode
self.debug_mode = debug_mode
# Custom post-processing for the current task, results is a list of model output arrays, needs to be rewritten based on the actual task
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# For AI tasks without preprocessing, this function needs to be overridden in the subclass
def run(self, inputs_np):
# First convert ulab.numpy.ndarray list to tensor list
tensors = []
for input_np in inputs_np:
tensors.append(nn.from_numpy(input_np))
# Call the inference function in AIBase for model inference
results = self.inference(tensors)
# Call the postprocess method of the current subclass for custom post-processing
outputs = self.postprocess(results)
return outputs
# Draw results on the screen, needs to be written according to the task
def draw_result(self, pl, dets):
with ScopedTiming("display_draw", self.debug_mode > 0):
pass
if __name__ == "__main__":
# Display mode, default "hdmi", can choose "hdmi" and "lcd"
display_mode = "hdmi"
if display_mode == "hdmi":
display_size = [1920, 1080]
else:
display_size = [800, 480]
# Set model path, replace with the current task model
kmodel_path = "example_test.kmodel"
rgb888p_size = [1920, 1080]
###### Other parameters ########
...
######################
# Initialize PipeLine for image processing workflow
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # Create PipeLine instance
# Initialize custom AI task instance
my_ai = MyAIApp(kmodel_path, model_input_size=[320, 320], rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
my_ai.config_preprocess() # Configure preprocessing
try:
while True:
os.exitpoint() # Check for exit signal
with ScopedTiming("total", 1):
img = pl.get_frame() # Get current frame data
res = my_ai.run(img) # Infer current frame
my_ai.draw_result(pl, res) # Draw result
pl.show_image() # Show result
gc.collect() # Garbage collection
except Exception as e:
sys.print_exception(e) # Print exception information
finally:
my_ai.deinit() # Deinitialize
pl.destroy() # Destroy PipeLine instance
For example, in the single object tracking demo (nanotracker.py), the tracking model takes the outputs of the template model and the real-time inference model directly as its inputs, so no pre-processing is required:
class TrackerApp(AIBase):
def __init__(self, kmodel_path, crop_input_size, thresh, rgb888p_size=[1280,720], display_size=[1920,1080], debug_mode=0):
super().__init__(kmodel_path, rgb888p_size, debug_mode)
# kmodel path
self.kmodel_path = kmodel_path
# Crop model input size
self.crop_input_size = crop_input_size
# Tracking box threshold
self.thresh = thresh
# Tracking box width and height adjustment coefficient
self.CONTEXT_AMOUNT = 0.5
# Image resolution from sensor to AI, aligned to 16 in width
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# Video output VO resolution, aligned to 16 in width
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# Debug mode
self.debug_mode = debug_mode
# Optional
self.ai2d = Ai2d(debug_mode)
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# Optional
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
pass
    # Override run: since this task has no pre-processing, the base-class flow (preprocess -> inference -> postprocess) does not apply; only inference -> postprocess is performed here
def run(self, input_np_1, input_np_2, center_xy_wh):
input_tensors = []
input_tensors.append(nn.from_numpy(input_np_1))
input_tensors.append(nn.from_numpy(input_np_2))
results = self.inference(input_tensors)
return self.postprocess(results, center_xy_wh)
# Custom post-processing, results is a list of model output arrays, here using aidemo's nanotracker_postprocess function
def postprocess(self, results, center_xy_wh):
with ScopedTiming("postprocess", self.debug_mode > 0):
det = aidemo.nanotracker_postprocess(results[0], results[1], [self.rgb888p_size[1], self.rgb888p_size[0]], self.thresh, center_xy_wh, self.crop_input_size[0], self.CONTEXT_AMOUNT)
return det
2.3.5. Multi-Model Task#
Here is an example of dual-model cascade inference, with the pseudocode as follows:
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
from media.media import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import gc
import sys
# Custom AI task class, inheriting from AIBase base class
class MyAIApp_1(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# Call the constructor of the base class
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# Model file path
self.kmodel_path = kmodel_path
# Model input resolution
self.model_input_size = model_input_size
# Image resolution from sensor to AI, aligned to 16 in width
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# Display resolution, aligned to 16 in width
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# Whether to enable debug mode
self.debug_mode = debug_mode
# Instantiate Ai2d for model preprocessing
self.ai2d = Ai2d(debug_mode)
# Set input and output format and type for Ai2d
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# Configure preprocessing operations, using resize here. Ai2d supports crop/shift/pad/resize/affine. Check /sdcard/libs/AI2D.py for specific code.
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# Initialize ai2d preprocessing configuration, default is the size from sensor to AI, can modify input size by setting input_image_size
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# Configure resize preprocessing method
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# Build preprocessing workflow
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]], [1,3,self.model_input_size[1],self.model_input_size[0]])
# Custom post-processing for the current task, results is a list of model output arrays, needs to be rewritten based on the actual task
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
# Custom AI task class, inheriting from AIBase base class
class MyAIApp_2(AIBase):
def __init__(self, kmodel_path, model_input_size, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
# Call the constructor of the base class
super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
# Model file path
self.kmodel_path = kmodel_path
# Model input resolution
self.model_input_size = model_input_size
# Image resolution from sensor to AI, aligned to 16 in width
self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]
# Display resolution, aligned to 16 in width
self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]
# Whether to enable debug mode
self.debug_mode = debug_mode
# Instantiate Ai2d for model preprocessing
self.ai2d = Ai2d(debug_mode)
# Set input and output format and type for Ai2d
self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
# Configure preprocessing operations, using resize here. Ai2d supports crop/shift/pad/resize/affine. Check /sdcard/libs/AI2D.py for specific code.
def config_preprocess(self, input_image_size=None):
with ScopedTiming("set preprocess config", self.debug_mode > 0):
# Initialize ai2d preprocessing configuration, default is the size from sensor to AI, can modify input size by setting input_image_size
ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size
# Configure resize preprocessing method
self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
# Build preprocessing workflow
self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]], [1,3,self.model_input_size[1],self.model_input_size[0]])
# Custom post-processing for the current task, results is a list of model output arrays, needs to be rewritten based on the actual task
def postprocess(self, results):
with ScopedTiming("postprocess", self.debug_mode > 0):
pass
class MyApp:
def __init__(self, kmodel1_path, kmodel2_path, kmodel1_input_size, kmodel2_input_size, rgb888p_size, display_size, debug_mode):
# Create instances for two model inferences
self.app_1 = MyAIApp_1(kmodel1_path, kmodel1_input_size, rgb888p_size, display_size, debug_mode)
self.app_2 = MyAIApp_2(kmodel2_path, kmodel2_input_size, rgb888p_size, display_size, debug_mode)
self.app_1.config_preprocess()
# Write the run function, specific code should be written according to AI task requirements, here is just an example
def run(self, input_np):
outputs_1 = self.app_1.run(input_np)
outputs_2 = []
for out in outputs_1:
self.app_2.config_preprocess(out)
out_2 = self.app_2.run(input_np)
outputs_2.append(out_2)
return outputs_1, outputs_2
# Draw
def draw_result(self, pl, outputs_1, outputs_2):
pass
###### Other functions ########
# Omitted
####################
if __name__ == "__main__":
# Display mode, default "hdmi", can choose "hdmi" and "lcd"
display_mode = "hdmi"
if display_mode == "hdmi":
display_size = [1920, 1080]
else:
display_size = [800, 480]
rgb888p_size = [1920, 1080]
# Set model path, replace with the current task model
kmodel1_path = "test_kmodel1.kmodel"
kmodel1_input_size = [320, 320]
kmodel2_path = "test_kmodel2.kmodel"
kmodel2_input_size = [48, 48]
###### Other parameters ########
# Omitted
######################
# Initialize PipeLine for image processing workflow
pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
pl.create() # Create PipeLine instance
# Initialize custom AI task instance
my_ai = MyApp(kmodel1_path, kmodel2_path, kmodel1_input_size, kmodel2_input_size, rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
    # Pre-processing is configured inside MyApp: app_1 in __init__, app_2 per output in run()
try:
while True:
os.exitpoint() # Check for exit signal
with ScopedTiming("total", 1):
img = pl.get_frame() # Get current frame data
outputs_1, outputs_2 = my_ai.run(img) # Infer current frame
my_ai.draw_result(pl, outputs_1, outputs_2) # Draw result
pl.show_image() # Show result
gc.collect() # Garbage collection
except Exception as e:
sys.print_exception(e) # Print exception information
finally:
my_ai.app_1.deinit() # Deinitialize
my_ai.app_2.deinit()
pl.destroy() # Destroy PipeLine instance
For example code, please refer to the license plate detection and recognition: burn the firmware, open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> licence_det_rec.py.
2.4. Reference Documents#
2.4.1. K230 CanMV Documentation#
Documentation link: Welcome to K230 CanMV's documentation! — K230 CanMV Documentation (canaan-creative.com)
2.4.2. Ulab Library Support#
Link: ulab – Manipulate numeric data similar to numpy — Adafruit CircuitPython 9.1.0-beta.3 documentation
3. AI Demo#
Note
To run the above demos on the K230D chip development board, you need to change the `display_mode` in `__main__` to `lcd` to adapt the display output. Additionally, you need to reduce the resolution as per the comments in each script. Some demos may not run on the K230D; please refer to the table above for details.
Please download the latest version of the firmware according to your development board type from the K230 MicroPython Image Download List and K230D MicroPython Image Download List and complete the burning process. For burning instructions, please refer to the documentation: Burn Firmware.
After the image burning is complete, connect the IDE. For IDE usage instructions, refer to the documentation: How to Use IDE.
3.1 Dynamic Gesture Recognition#
3.1.1 Demo Description#
Dynamic gesture recognition implements the recognition of five dynamic gestures, which include: waving up, waving down, waving left, waving right, and pinching with fingers.
3.1.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> dynamic_gesture.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.1.3 Demonstration Effect#
Please point your palm at the camera. After the corresponding direction of the palm appears in the upper left corner of the screen, make actions of waving left, right, up, down, or pinching with five fingers. The recognized action will display the category of the dynamic gesture and mark it with an arrow.
3.2 Gaze Estimation#
3.2.1 Demo Description#
Gaze estimation predicts the eye gaze direction based on facial features. This application uses a dual-model approach, first performing face detection, then predicting the gaze direction of the detected face, and marking it with an arrow on the screen.
3.2.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> eye_gaze.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.2.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.3 Face Detection#
3.3.1 Demo Description#
The face detection application detects every face in the video and marks it with a detection box, while also marking the five key points of each face: left eyeball, right eyeball, nose tip, left mouth corner, and right mouth corner.
3.3.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> face_detection.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.3.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.4 Facial Key Points#
3.4.1 Demo Description#
The facial key points application is a dual-model application. First, it performs face detection on each frame of the video, then recognizes 106 key points for each detected face, and draws the contours of the face, mouth, eyes, nose, and eyebrow regions based on these key points.
3.4.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> face_landmark.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.4.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.5 Face 3D Mesh#
3.5.1 Demo Description#
The face 3D mesh is a multi-model application. First, it performs face detection on each frame of the video, then uses a face alignment model (3DDFA, 3D Dense Face Alignment) for face alignment, and finally uses a face mesh model for face 3D mesh reconstruction to obtain the mesh of each face in the image.
3.5.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> face_mesh.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.5.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.6 Face Parsing#
3.6.1 Demo Description#
Face parsing (also known as face segmentation) is a dual-model application. First, it performs face detection, then implements segmentation of different parts of the face. Face segmentation includes pixel-level classification of parts such as eyes, nose, and mouth, with different regions marked in different colors on the screen.
3.6.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> face_parse.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.6.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.7 Face Pose#
3.7.1 Demo Description#
Face pose is a dual-model application. First, it performs face detection on each frame of the video, then predicts the face orientation for each detected face. Face orientation is represented using Euler angles (roll/yaw/pitch): roll indicates tilting the head toward the left or right shoulder, yaw indicates turning the head left or right, and pitch indicates nodding the head up or down. The application visualizes the face orientation on screen by constructing a projection matrix.
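For intuition, a rotation matrix can be composed from the three Euler angles roughly as follows; this is only an illustrative sketch with one common angle convention, not the demo's actual projection code.

```python
import ulab.numpy as np

# Illustrative Euler-angle composition (degrees in, 3x3 rotation matrix out).
# The demo's own projection matrix may use different conventions.
def rotation_matrix(pitch, yaw, roll):
    p, y, r = [a * np.pi / 180.0 for a in (pitch, yaw, roll)]
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(p), -np.sin(p)],
                   [0.0, np.sin(p),  np.cos(p)]])
    Ry = np.array([[ np.cos(y), 0.0, np.sin(y)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(y), 0.0, np.cos(y)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0.0],
                   [np.sin(r),  np.cos(r), 0.0],
                   [0.0, 0.0, 1.0]])
    return np.dot(Rz, np.dot(Ry, Rx))
```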
3.7.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> face_pose.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.7.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.8 Face Registration#
3.8.1 Demo Description#
Face registration is a pre-task for face recognition: each face image in the face database is encoded into a feature vector, and the features are written to the face database directory as bin files for the face recognition program to load. The face feature produced by registration has a dimension of 512.
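A minimal sketch of the registration output step is shown below; the file path and names are placeholders, not the demo's exact code. Each 512-dimensional feature vector is saved to the database directory as a `.bin` file.

```python
import ulab.numpy as np

# Hypothetical example: persist one face feature to the database directory.
feature = np.zeros(512, dtype=np.float)            # replace with the model's 512-dim output
db_path = "/sdcard/examples/utils/db/person1.bin"  # placeholder path
with open(db_path, "wb") as f:
    f.write(feature.tobytes())

# It can be loaded back with np.fromfile(db_path, dtype=np.float) for recognition.
```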
3.8.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> face_registration.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.8.3 Demonstration Effect#
The script registers the faces in the provided images as feature vectors and stores them in the face feature library. Below is one of the face images:
3.9 Face Recognition#
3.9.1 Demo Description#
The face recognition application performs face identity recognition on each frame of the video based on the information from face registration. If the recognized face is in the registration database, it labels the recognized face’s identity information; otherwise, it displays “unknown.”
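As a sketch of how a live feature could be matched against the registered ones (the similarity measure and threshold here are illustrative assumptions, not necessarily what the demo uses):

```python
import ulab.numpy as np

def cosine_similarity(a, b):
    # a, b: 512-dim ulab.numpy feature vectors
    return np.dot(a, b) / (np.sqrt(np.dot(a, a)) * np.sqrt(np.dot(b, b)) + 1e-6)

def identify(feature, database, threshold=0.6):    # threshold value is illustrative
    best_name, best_score = "unknown", 0.0
    for name, registered in database.items():      # database: {name: feature_vector}
        score = cosine_similarity(feature, registered)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else "unknown"
```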
3.9.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> face_recognition.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.9.3 Demonstration Effect#
The registered face original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.10 Fall Detection#
3.10.1 Demo Description#
The fall detection application detects whether there is a human in each frame of the video and checks for the fall status of the human. If not fallen, it marks “NoFall”; if fallen, it marks “Fall.”
3.10.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> falldown_detect.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.10.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.11 Rock-Paper-Scissors Game#
3.11.1 Demo Description#
The rock-paper-scissors (finger guessing) game is a fun hand-based application. First make sure no other palm is in view, then bring one hand into the camera showing rock, paper, or scissors. The machine also randomly plays rock, paper, or scissors, and the winner is decided by best of three rounds.
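The round-judging logic itself is straightforward; below is a minimal sketch (the labels are illustrative, and the demo's own gesture names may differ).

```python
# Illustrative win/lose rule for one round of the finger guessing game.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def judge(player, machine):
    if player == machine:
        return "draw"
    return "player" if BEATS[player] == machine else "machine"
```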
3.11.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> finger_guessing.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.11.3 Demonstration Effect#
Initially, ensure the camera range is blank. One hand enters the camera while showing rock/paper/scissors. The rule is best of three. Please experience it yourself.
3.12 Palm Detection#
3.12.1 Demo Description#
The palm detection application detects the presence of palms in each frame of the video and marks the palm detection box on the screen.
3.12.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> hand_detection.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.12.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.13 Palm Key Point Classification#
3.13.1 Demo Description#
The palm key point classification application is a dual-model task. First, it detects the palms present in the video, then performs key point regression on the detected palms. After obtaining the key point information, it distinguishes different gestures by calculating the angle information between fingers. It currently supports 9 gestures as shown below.
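For intuition, the angle between two finger-segment vectors built from the key points can be computed as follows (an illustrative sketch, not the demo's exact classification rules):

```python
import ulab.numpy as np

def vector_angle(v1, v2):
    # v1, v2: 2D vectors formed from palm key-point coordinates, e.g. joint -> fingertip
    cos_a = np.dot(v1, v2) / (np.sqrt(np.dot(v1, v1)) * np.sqrt(np.dot(v2, v2)) + 1e-6)
    return np.acos(cos_a) * 180.0 / np.pi      # angle in degrees

print(vector_angle(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # ~90.0
```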
3.13.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> hand_keypoint_class.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.13.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.14 Palm Key Point Detection#
3.14.1 Demo Description#
Palm key point detection is a dual-model task. First, it detects the palm in each frame of the video, then performs key point regression on each detected palm to obtain the positions of 21 palm skeletal key points. These key points and their connections are displayed on the screen.
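As an illustrative sketch, the key points and their connections can be drawn on the OSD layer like this; the key-point indices and skeleton pairs below are placeholders, not the demo's actual connection table.

```python
# Hypothetical helper: draw palm key points and their connections on the OSD image.
# `kpts` is assumed to be a list of 21 (x, y) tuples already scaled to display coordinates.
SKELETON = [(0, 1), (1, 2), (2, 3), (3, 4)]   # illustrative pairs, e.g. wrist -> thumb joints

def draw_hand(osd_img, kpts):
    for x, y in kpts:
        osd_img.draw_circle(int(x), int(y), 3, color=(255, 0, 255, 0), fill=True)
    for a, b in SKELETON:
        osd_img.draw_line(int(kpts[a][0]), int(kpts[a][1]),
                          int(kpts[b][0]), int(kpts[b][1]),
                          color=(255, 255, 0, 255), thickness=2)
```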
3.14.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> hand_keypoint_detection.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.14.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.15 Gesture Recognition#
3.15.1 Demo Description#
The gesture recognition application is a classification-based task. First, it detects the palm in each frame of the video, then classifies the detected palm to recognize the gesture. This example supports three gestures as shown below.
3.15.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> hand_recognition.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.15.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.16 Keyword Spotting#
3.16.1 Demo Description#
The keyword spotting application is a typical audio application. After the application starts, wake it up by saying “Xiao Nan, Xiao Nan” near the microphone. Once the wake word is recognized, it will respond with “I’m here.” Other audio data collection applications can also refer to this application.
3.16.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> keyword_spotting.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.16.3 Demonstration Effect#
Please approach the microphone and wake it up with “Xiao Nan, Xiao Nan.” After waking up, the K230 will respond with “I’m here!”
3.17 License Plate Detection#
3.17.1 Demo Description#
The license plate detection application detects license plates appearing in the video and marks them with detection boxes on the screen.
3.17.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> licence_det.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.17.3 Demonstration Effect#
Please provide your own license plate images for detection. Because license plates involve personal privacy, no demonstration image is shown for this example.
3.18 License Plate Recognition#
3.18.1 Demo Description#
The license plate recognition application is a dual-model task. First, it detects license plates in the video, then recognizes each detected license plate and displays the recognized content near the corresponding detection box.
3.18.2 Code#
Open the IDE, select File -> Open File, and choose the corresponding script according to the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> licence_det_rec.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to demonstrate the demo.
3.18.3 Demonstration Effect#
Please provide your own license plate images for recognition. Because license plates involve personal privacy, no demonstration effect is shown here.
3.19 Single Object Tracking#
3.19.1 Demo Description#
The single object tracking application tracks the object registered within the target box. After starting the application, there is a short registration period; during this time, place the object to be tracked (preferably one whose color differs from the background) inside the target box. Once registration is complete, the demo automatically enters tracking mode, and the target box follows the object as it moves.
3.19.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> nanotracker.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
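The behavior described above is essentially two phases: a short registration window in which a template is captured from the target box, followed by per-frame tracking. A hypothetical sketch of that control flow (the registration duration and the `track` call are placeholders, not the demo's actual tracker):

```python
import time

REGISTER_SECONDS = 3        # illustrative length of the registration window

def crop_template(frame, box):
    x, y, w, h = box
    return frame[y:y + h, x:x + w]   # template patch used to initialize tracking

def track(template, frame, last_box):
    return last_box                  # placeholder: tracker would return an updated box

def run(frames, init_box):
    template, box = None, init_box
    start = time.time()
    for frame in frames:
        if template is None:
            if time.time() - start >= REGISTER_SECONDS:
                template = crop_template(frame, box)   # registration complete
            continue                                   # still registering
        box = track(template, frame, box)              # tracking phase
        yield box
```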
3.19.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.20 YOLOv8n Object Detection#
3.20.1 Demo Description#
The YOLOv8n object detection application uses the YOLOv8n model to detect objects from the 80 COCO categories and marks the results on the screen with detection boxes.
3.20.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> object_detect_yolov8n.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
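Regardless of the exact model wrapper, the post-processing amounts to keeping detections above a confidence threshold and mapping class indices to the 80 COCO names before drawing boxes. A simplified sketch (the threshold is illustrative, and only the first few COCO names are listed here):

```python
CONF_THRESHOLD = 0.45        # illustrative confidence cut-off

# First few of the 80 COCO class names in the usual order; the demo ships the full list.
COCO_NAMES = ["person", "bicycle", "car", "motorcycle", "airplane", "bus"]

def filter_detections(raw):
    """raw: iterable of (x, y, w, h, score, class_id) tuples from the model."""
    kept = []
    for x, y, w, h, score, cls in raw:
        if score < CONF_THRESHOLD:
            continue                        # drop low-confidence detections
        name = COCO_NAMES[cls] if cls < len(COCO_NAMES) else str(cls)
        kept.append(((x, y, w, h), name, score))
    return kept

print(filter_detections([(10, 20, 50, 80, 0.9, 0), (5, 5, 10, 10, 0.2, 2)]))
# -> [((10, 20, 50, 80), 'person', 0.9)]
```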
3.20.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.21 OCR Detection#
3.21.1 Demo Description#
The OCR detection application detects text appearing in the video, and the detection results are marked on the screen with detection boxes.
3.21.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> ocr_det.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
3.21.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.22 OCR Recognition#
3.22.1 Demo Description#
The OCR recognition application is a dual-model task. First, it detects text in each frame of the video, then recognizes the detected text, and finally marks the detection box on the screen and adds the recognized text content near the detection box.
3.22.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> ocr_rec.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
3.22.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.23 Human Detection#
3.23.1 Demo Description#
The human detection application detects people in the video and marks them on the screen with detection boxes.
3.23.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> person_detection.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
3.23.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.24 Human Key Point Detection#
3.24.1 Demo Description#
The human key point detection application uses the YOLOv8n-pose model to detect human poses, resulting in the positions of 17 human skeletal key points, which are connected with lines of different colors and displayed on the screen.
3.24.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> person_keypoint_detect.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
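The 17 key points follow the COCO pose convention, and the skeleton is drawn by connecting a fixed list of index pairs. The sketch below shows only the drawing idea; the pair list is the commonly used COCO skeleton, and the demo's exact pairs and colors may differ:

```python
# Commonly used COCO 17-keypoint skeleton as (index, index) pairs; illustrative only.
SKELETON = [
    (5, 7), (7, 9),       # left arm
    (6, 8), (8, 10),      # right arm
    (11, 13), (13, 15),   # left leg
    (12, 14), (14, 16),   # right leg
    (5, 6), (11, 12),     # shoulders, hips
    (5, 11), (6, 12),     # torso sides
]

def draw_skeleton(keypoints, draw_line, min_score=0.3):
    """keypoints: list of 17 (x, y, score) tuples; draw_line: callback drawing one segment."""
    for a, b in SKELETON:
        xa, ya, sa = keypoints[a]
        xb, yb, sb = keypoints[b]
        if sa >= min_score and sb >= min_score:   # skip joints with low confidence
            draw_line((xa, ya), (xb, yb))
```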
3.24.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.25 Puzzle Game#
3.25.1 Demo Description#
The puzzle game is a fun hand-controlled game. When the application starts, make sure there are no palms within the camera's view and wait for the screen to initialize. The left side shows a randomly scrambled puzzle and the right side shows the target puzzle. When a palm enters the frame, open the thumb and middle finger; the yellow dot between the fingertips is used to select the block to move. Quickly pinch the two fingers together and open them again: the yellow dot turns blue and the selected block moves into the adjacent empty space.
3.25.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> puzzle_game.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
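The pinch interaction described above reduces to measuring the distance between the thumb tip and the middle-finger tip in the hand key points and comparing it to a threshold. A minimal sketch, assuming a 21-point hand model where index 4 is the thumb tip and index 12 the middle-finger tip (the demo's indices and threshold may differ):

```python
import math

THUMB_TIP, MIDDLE_TIP = 4, 12    # assumed landmark indices in a 21-point hand model
PINCH_THRESHOLD = 40             # pixels; illustrative value

def is_pinched(landmarks):
    """landmarks: list of (x, y) hand key points."""
    x1, y1 = landmarks[THUMB_TIP]
    x2, y2 = landmarks[MIDDLE_TIP]
    return math.hypot(x1 - x2, y1 - y2) < PINCH_THRESHOLD

def cursor_position(landmarks):
    # the on-screen dot sits midway between the two fingertips
    x1, y1 = landmarks[THUMB_TIP]
    x2, y2 = landmarks[MIDDLE_TIP]
    return (x1 + x2) // 2, (y1 + y2) // 2
```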
3.25.3 Demonstration Effect#
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar. Keep a certain distance from the camera, open the thumb and middle finger, then pinch them around the empty block; the empty block and the current block will swap positions.
3.26 YOLOv8n Segmentation#
3.26.1 Demo Description#
The YOLOv8n segmentation application uses the YOLOv8n model to segment objects from 80 categories of the COCO dataset appearing in the video, displaying them as a mask on the screen.
3.26.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> segment_yolov8n.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
3.26.3 Demonstration Effect#
The original image is as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar.
3.27 Self-Learning#
3.27.1 Demo Description#
Self-learning is a feature-based classification method. First, set the labels in the code to name the items to be collected. Then start the application and place each item in the feature collection box as prompted on the screen. Once feature collection is complete, the demo automatically enters recognition mode: place an item in the collection box, its features are compared with those of the registered items, and it is classified according to the highest similarity.
3.27.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> self_learning.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
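The classification step is simply a nearest-neighbor comparison between the feature vector of the item in the box and each registered feature. A self-contained sketch of that comparison in plain Python (no CanMV APIs; the similarity threshold is illustrative):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(feature, registered, min_score=0.7):
    """registered: dict mapping label -> stored feature vector."""
    best_label, best_score = None, 0.0
    for label, ref in registered.items():
        score = cosine_similarity(feature, ref)
        if score > best_score:
            best_label, best_score = label, score
    return (best_label, best_score) if best_score >= min_score else (None, best_score)

# Example: two registered items and one query feature
registered = {"apple": [0.9, 0.1, 0.2], "banana": [0.1, 0.8, 0.3]}
print(classify([0.85, 0.15, 0.25], registered))   # -> ('apple', 0.99...)
```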
3.27.3 Demonstration Effect#
The images of the apple and banana to be learned are as follows:
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar. Follow the on-screen prompts to place the apple and banana in the box for learning; the demo then automatically enters recognition mode. The object in the box is recognized as an apple or banana, and a similarity score is given.
3.28 Local Magnifier#
3.28.1 Demo Description#
The local magnifier is a fun application based on hand features. After starting the application, keep one hand within the camera's view. Pinch the thumb and middle finger together, move them to the area of interest, and then open the two fingers; an enlarged view of that area is displayed near the fingers.
3.28.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> space_resize.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
3.28.3 Demonstration Effect#
Here, the demonstration effect is given using the 01studio LCD screen (800*480) as an example. Please change the `display_mode` in the `__main__` function to `lcd` and run it. The HDMI effect is similar. Pinch the thumb and middle finger together, place them within the frame, then open and close the fingers; the small area around them will zoom in and out.
3.29 Text-to-Speech (Chinese)#
3.29.1 Demo Description#
Text-to-Speech (TTS, Chinese) is a typical audio application. You can modify the text to be synthesized in the main function; after the audio is generated, you can listen to it through headphones. Generating the audio takes a relatively long time, and the playback time depends on the length of the input text.
3.29.2 Code#
Open the IDE, select File -> Open File, and open the corresponding script at the following path: This PC -> CanMV -> sdcard -> examples -> 05-AI-Demo -> tts_zh.py. You can refer to the source code displayed in the IDE. Click the run button at the bottom left of the IDE to run the demo.
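As noted in the description, the sentence to be synthesized is set in the script's main section. The snippet below only illustrates the kind of assignment to look for; the actual variable name in tts_zh.py may differ:

```python
# Illustrative only -- edit the corresponding assignment inside tts_zh.py.
if __name__ == "__main__":
    text = "你好，世界。"   # replace with the Chinese sentence you want synthesized
```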
3.29.3 Demonstration Effect#
Please connect headphones or a speaker and run the program; the generated audio will play once synthesis completes. Try it for yourself!