5.3 AIBase Module API Manual#
1. Overview#
This manual guides developers in using MicroPython to develop AI Demos by building a complete AI inference process: loading a model, preprocessing, inference, obtaining model outputs, and post-processing. The module encapsulates the inference process of a single model; preprocessing, inference, and output retrieval are handled inside the framework, so when developing an AI application users only need to focus on configuring preprocessing and writing post-processing.
2. API Introduction#
2.1 init#
Description
AIBase constructor; initializes the kmodel path, the model input resolution, the resolution of the images fed to the AI program, and the debug mode.
Syntax
from libs.AIBase import AIBase
aibase = AIBase(kmodel_path="**.kmodel", model_input_size=[224,224], rgb888p_size=[1280,720], debug_mode=0)
Parameters
| Parameter Name | Description | Input / Output | Note |
| --- | --- | --- | --- |
| kmodel_path | kmodel model path | Input | Required |
| model_input_size | Model input resolution, list type containing width and height, e.g., [512,512] | Input | Required |
| rgb888p_size | AI program input image resolution, list type containing width and height, e.g., [1280,720] | Input | Required |
| debug_mode | Debug timing mode, int type; 0 disables timing, 1 enables timing | Input | Default is 0 |
Return Value
| Return Value | Description | Note |
| --- | --- | --- |
| AIBase | AIBase instance | This class is generally inherited by subclasses to write AI Demo classes for different scenarios |
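As the note above says, AIBase is normally subclassed rather than used directly. Below is a minimal sketch of such a subclass constructor; the class name MyApp is illustrative, and the Ai2d member follows the convention used in the face detection example in section 3.

```python
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import nncase_runtime as nn
import ulab.numpy as np

# Illustrative subclass: forwards the documented parameters to the AIBase
# constructor and prepares an Ai2d instance for later preprocessing.
class MyApp(AIBase):
    def __init__(self, kmodel_path, model_input_size, rgb888p_size=[1280, 720], debug_mode=0):
        super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)
        self.model_input_size = model_input_size
        self.rgb888p_size = rgb888p_size
        self.ai2d = Ai2d(debug_mode)
        self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT,
                                 np.uint8, np.uint8)
```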
2.2 get_kmodel_inputs_num#
Description
Get the number of inputs for the kmodel.
Syntax
aibase.get_kmodel_inputs_num()
Return Value
| Return Value | Description |
| --- | --- |
| inputs_num | Number of kmodel inputs |
2.3 get_kmodel_outputs_num#
Description
Get the number of outputs for the kmodel.
Syntax
aibase.get_kmodel_outputs_num()
Return Value
| Return Value | Description |
| --- | --- |
| outputs_num | Number of kmodel outputs |
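These two getters are useful for verifying that a loaded kmodel has the expected number of inputs and outputs, for example when working with multi-output detection models. A short usage sketch (the model path is only an example):

```python
from libs.AIBase import AIBase

aibase = AIBase(kmodel_path="/sdcard/examples/kmodel/face_detection_320.kmodel",
                model_input_size=[320, 320], rgb888p_size=[1280, 720], debug_mode=0)
print("kmodel inputs:", aibase.get_kmodel_inputs_num())    # e.g. 1 for a single-input model
print("kmodel outputs:", aibase.get_kmodel_outputs_num())  # detection models often have several
```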
2.4 preprocess#
Description
Performs preprocessing using the ai2d configuration defined in the subclass. If the required preprocessing cannot be implemented with ai2d, or no preprocessing is needed, this method must be overridden in the subclass.
Syntax
aibase.preprocess(input_np)
Parameters
| Parameter Name | Description | Input / Output | Note |
| --- | --- | --- | --- |
| input_np | Image data to be preprocessed, in ulab.numpy.ndarray format | Input | Required |
Return Value
| Return Value | Description |
| --- | --- |
| input_tensors | List of input tensors obtained after ai2d preprocessing |
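When the required preprocessing cannot be expressed with ai2d (for example floating-point normalization), the subclass overrides preprocess and builds the input tensor list itself. The sketch below is only an illustration under assumptions that are not part of this manual: the kmodel is assumed to take NCHW float input scaled to [0, 1], and the input data is assumed to already match the model input size; nn.from_numpy is used as shown in the conversion tips in section 4.

```python
from libs.AIBase import AIBase
import nncase_runtime as nn
import ulab.numpy as np

class MyFloatInputApp(AIBase):  # hypothetical subclass for a float-input kmodel
    def preprocess(self, input_np):
        # Assumption: input_np is CHW uint8 image data already sized to the model input.
        img = input_np / 255.0  # scale to [0, 1]; division promotes the array to float
        img = img.reshape((1, 3, self.model_input_size[1], self.model_input_size[0]))
        return [nn.from_numpy(img)]  # preprocess must return a list of input tensors
```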
2.5 inference#
Description
Method to perform inference using kmodel and obtain model outputs.
Syntax
results = aibase.inference()
Return Value
| Return Value | Description |
| --- | --- |
| results | List of kmodel inference outputs; each output is in ulab.numpy.ndarray format |
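inference() is normally called for you inside run() (see 2.7), but it can also be used when stepping through the flow manually, for example to inspect the raw model outputs. The sketch below assumes preprocess() has already been called on the current frame, as run() would do:

```python
input_tensors = aibase.preprocess(input_np)  # ai2d preprocessing (see 2.4)
results = aibase.inference()                 # run the kmodel
for i, out in enumerate(results):
    print("output", i, "shape:", out.shape)  # each output is a ulab.numpy.ndarray
```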
2.6 postprocess#
Description
Definition of post-processing interface. Since different AI applications require different post-processing steps, this method must be overridden in the subclass.
Syntax
aibase.postprocess()
Return Value
| Return Value | Description |
| --- | --- |
| None | |
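For example, a minimal override for a hypothetical image classification model could reduce to an argmax over a single output array. The class name and the (1, num_classes) output layout below are assumptions for illustration only:

```python
from libs.AIBase import AIBase
import ulab.numpy as np

class MyClassifierApp(AIBase):  # hypothetical classification subclass
    def postprocess(self, results):
        scores = results[0].flatten()   # assume one output shaped (1, num_classes)
        idx = int(np.argmax(scores))    # index of the highest-scoring class
        return idx, float(scores[idx])  # returned to the caller of run()
```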
2.7 run#
Description
Runs the complete single-model inference process, including preprocessing, inference, output retrieval, and post-processing, and returns the post-processing output so that results can be drawn on the Display.
Syntax
aibase.run(input_np)
Parameters
| Parameter Name | Description | Input / Output | Note |
| --- | --- | --- | --- |
| input_np | Image data to run inference on, in ulab.numpy.ndarray format | Input | Required |
Return Value
| Return Value | Description |
| --- | --- |
| results | Post-processing output returned by postprocess, used to draw results on the Display |
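In the per-frame loop, run() therefore covers the whole chain from 2.4 to 2.6 and hands the post-processing output back for drawing. A condensed view of the loop used in the section 3 example:

```python
while True:
    img = pl.get_frame()           # current frame from the PipeLine
    res = face_det.run(img)        # preprocess -> inference -> get outputs -> postprocess
    face_det.draw_result(pl, res)  # draw the post-processing result on the OSD layer
    pl.show_image()
```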
2.8 deinit#
Description
AIBase deinitialization method, destroys the kpu instance and releases memory.
Syntax
aibase.deinit()
Return Value
| Return Value | Description |
| --- | --- |
| None | |
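Because deinit() releases the kpu instance and its memory, it is best placed in a finally block so that resources are freed even if the inference loop raises an exception, as the section 3 example does:

```python
try:
    ...  # per-frame inference loop (see 2.7)
finally:
    face_det.deinit()  # release the kpu instance and its memory
    pl.destroy()       # tear down the PipeLine as well
```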
3. Example Program#
Attention
The AIBase class is generally not used on its own. It serves as the parent class for AI Demo application development and provides the basic interfaces; subclasses inherit AIBase and override certain methods according to the task type to implement specific scenarios. During development, a draw_result method must be defined in the subclass to draw results appropriate to the task.
Below is a face detection example program:
```python
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os, sys, gc, time, random, utime
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import image
import aidemo

# Custom face detection class, inheriting from AIBase base class
class FaceDetectionApp(AIBase):
    def __init__(self, kmodel_path, model_input_size, anchors, confidence_threshold=0.5, nms_threshold=0.2, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
        super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)  # Call the base class constructor
        self.kmodel_path = kmodel_path  # Model file path
        self.model_input_size = model_input_size  # Model input resolution
        self.confidence_threshold = confidence_threshold  # Confidence threshold
        self.nms_threshold = nms_threshold  # NMS (Non-Maximum Suppression) threshold
        self.anchors = anchors  # Anchor data for object detection
        self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]  # Sensor image resolution for AI, width aligned to 16
        self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]  # Display resolution, width aligned to 16
        self.debug_mode = debug_mode  # Debug mode
        self.ai2d = Ai2d(debug_mode)  # Instantiate Ai2d for model preprocessing
        self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)  # Set Ai2d input-output format and type

    # Configure preprocessing operations, using pad and resize here. Ai2d supports crop/shift/pad/resize/affine. Check /sdcard/app/libs/AI2D.py for details.
    def config_preprocess(self, input_image_size=None):
        with ScopedTiming("set preprocess config", self.debug_mode > 0):  # Timer, activated if debug_mode > 0
            ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size  # Default is the sensor size for AI; can be modified by setting input_image_size
            top, bottom, left, right = self.get_padding_param()  # Get padding parameters
            self.ai2d.pad([0, 0, 0, 0, top, bottom, left, right], 0, [104, 117, 123])  # Pad edges
            self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)  # Resize image
            self.ai2d.build([1, 3, ai2d_input_size[1], ai2d_input_size[0]], [1, 3, self.model_input_size[1], self.model_input_size[0]])  # Build preprocessing flow

    # Custom post-processing for the current task; results is a list of model output arrays, using aidemo's face_det_post_process interface here
    def postprocess(self, results):
        with ScopedTiming("postprocess", self.debug_mode > 0):
            post_ret = aidemo.face_det_post_process(self.confidence_threshold, self.nms_threshold, self.model_input_size[1], self.anchors, self.rgb888p_size, results)
            if len(post_ret) == 0:
                return post_ret
            else:
                return post_ret[0]

    # Draw detection results on the screen
    def draw_result(self, pl, dets):
        with ScopedTiming("display_draw", self.debug_mode > 0):
            if dets:
                pl.osd_img.clear()  # Clear OSD image
                for det in dets:
                    # Convert detection box coordinates to display resolution coordinates
                    x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
                    x = x * self.display_size[0] // self.rgb888p_size[0]
                    y = y * self.display_size[1] // self.rgb888p_size[1]
                    w = w * self.display_size[0] // self.rgb888p_size[0]
                    h = h * self.display_size[1] // self.rgb888p_size[1]
                    pl.osd_img.draw_rectangle(x, y, w, h, color=(255, 255, 0, 255), thickness=2)  # Draw rectangle
            else:
                pl.osd_img.clear()

    # Get padding parameters
    def get_padding_param(self):
        dst_w = self.model_input_size[0]  # Model input width
        dst_h = self.model_input_size[1]  # Model input height
        ratio_w = dst_w / self.rgb888p_size[0]  # Width scaling ratio
        ratio_h = dst_h / self.rgb888p_size[1]  # Height scaling ratio
        ratio = min(ratio_w, ratio_h)  # Choose the smaller scaling ratio
        new_w = int(ratio * self.rgb888p_size[0])  # New width
        new_h = int(ratio * self.rgb888p_size[1])  # New height
        dw = (dst_w - new_w) / 2  # Width difference
        dh = (dst_h - new_h) / 2  # Height difference
        top = int(round(0))
        bottom = int(round(dh * 2 + 0.1))
        left = int(round(0))
        right = int(round(dw * 2 - 0.1))
        return top, bottom, left, right

if __name__ == "__main__":
    # Display mode, default is "hdmi", options are "hdmi" and "lcd"
    display_mode = "hdmi"
    # k230 remains unchanged, k230d can be adjusted to [640,360]
    rgb888p_size = [1920, 1080]
    if display_mode == "hdmi":
        display_size = [1920, 1080]
    else:
        display_size = [800, 480]
    # Set model path and other parameters
    kmodel_path = "/sdcard/examples/kmodel/face_detection_320.kmodel"
    # Other parameters
    confidence_threshold = 0.5
    nms_threshold = 0.2
    anchor_len = 4200
    det_dim = 4
    anchors_path = "/sdcard/examples/utils/prior_data_320.bin"
    anchors = np.fromfile(anchors_path, dtype=np.float)
    anchors = anchors.reshape((anchor_len, det_dim))
    # Initialize PipeLine for image processing flow
    pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
    pl.create()  # Create PipeLine instance
    # Initialize custom face detection instance
    face_det = FaceDetectionApp(kmodel_path, model_input_size=[320, 320], anchors=anchors, confidence_threshold=confidence_threshold, nms_threshold=nms_threshold, rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
    face_det.config_preprocess()  # Configure preprocessing
    try:
        while True:
            os.exitpoint()  # Check for exit signal
            with ScopedTiming("total", 1):
                img = pl.get_frame()  # Get current frame data
                res = face_det.run(img)  # Infer current frame
                face_det.draw_result(pl, res)  # Draw results
                pl.show_image()  # Display results
                gc.collect()  # Garbage collection
    except Exception as e:
        sys.print_exception(e)  # Print exception info
    finally:
        face_det.deinit()  # Deinitialize
        pl.destroy()  # Destroy PipeLine instance
```
4. Development Tips#
For common data type conversions in development, here are corresponding examples for reference.
Tips:

- Converting an Image object to ulab.numpy.ndarray:

  ```python
  import image

  img.to_rgb888().to_numpy_ref()  # Returns an array in HWC layout
  ```

- Converting ulab.numpy.ndarray to an Image object:

  ```python
  import ulab.numpy as np
  import image

  img_np = np.zeros((height, width, 4), dtype=np.uint8)
  img = image.Image(width, height, image.ARGB8888, alloc=image.ALLOC_REF, data=img_np)
  ```

- Converting ulab.numpy.ndarray to tensor type:

  ```python
  import ulab.numpy as np
  import nncase_runtime as nn

  img_np = np.zeros((height, width, 4), dtype=np.uint8)
  tensor = nn.from_numpy(img_np)
  ```

- Converting tensor type to ulab.numpy.ndarray:

  ```python
  import ulab.numpy as np
  import nncase_runtime as nn

  img_np = tensor.to_numpy()
  ```