5. Quick Start Guide for K230 AI Inference Process#

Tutorial Video for this Chapter: K230 Development Board Tutorial - Quick Start AI Inference Process on Bilibili

This chapter supports Linux+RT-Smart dual system and pure RT-Smart single system. For the system image compilation process, refer to [Compile K230_SDK](./Development Basics.md#make_image).

This chapter introduces the complete process of AI inference on the K230, providing a general overview of the AI inference process based on K230. It includes video capture, image preprocessing, model inference, post-processing, and display.

Video Input: VI (Video Input), video capture
Video Output: VO (Video Output), display

This chapter includes two implementations of the K230 AI inference process: AI inference based on OpenCV (C++) and AI inference based on ulab (MicroPython). Each corresponds to a subchapter, with detailed implementations described in the respective subchapters.

Detailed Code Implementation: k230_AI_Demo_Code_Flow_Introduction

5.1 AI Inference Process Based on OpenCV (C++)#

The K230 AI inference based on OpenCV (C++) briefly introduces the process of video capture, image preprocessing (OpenCV), model inference, post-processing, and display using the C++ language.

5.1.1 Video Capture#

Video Capture: (Video Input, VI) is related to the camera. This subsection briefly introduces the overall process of setting up the camera, starting the camera, capturing a frame from the camera, and stopping the camera using C++; for detailed information, see K230_VICAP_API Reference.md, K230_VICAP_SENSOR_Parameter Partition Reference.md, K230_Camera_Sensor Adaptation Guide.md.

This section takes CANMV-K230-V1.x as an example to explain the corresponding display parameter settings; for other development board settings, refer to vi_vo.h .

1. Setting Camera Properties:

Set the sensor to output two channels;
One channel is for display, with the output size set to 1080p and the image format as PIXEL_FORMAT_YVU_PLANAR_420, directly bound to the VO;
The other channel is for AI computation, with the output size set to 720p and the image format as PIXEL_FORMAT_BGR_888_PLANAR (actually rgb, chw, uint8);

Create Camera (sensor) Output Buffer (VB): Used to store the two outputs from the camera

//vi_vo.h
/***************************unfixed: may need modification for different AI Demos***********************/
#define SENSOR_CHANNEL (3)     // Number of channels
#define SENSOR_HEIGHT (720)    // sensor ch1 output height, AI input
#define SENSOR_WIDTH (1280)    // sensor ch1 output width, AI input
#define ISP_CHN0_WIDTH  (1920) // sensor ch0 output width, VO
#define ISP_CHN0_HEIGHT (1080) // sensor ch0 output height, VO
/*****************************************************************************/

/***************************fixed: no modification needed***********************************/
memset(&config, 0, sizeof(config));
config.max_pool_cnt = 64;
//VB for YUV420SP output
config.comm_pool[0].blk_cnt = 5;
config.comm_pool[0].mode = VB_REMAP_MODE_NOCACHE;
config.comm_pool[0].blk_size = VICAP_ALIGN_UP((ISP_CHN0_WIDTH * ISP_CHN0_HEIGHT * 3 / 2), VICAP_ALIGN_1K);

//VB for RGB888 output
config.comm_pool[1].blk_cnt = 5;
config.comm_pool[1].mode = VB_REMAP_MODE_NOCACHE;
config.comm_pool[1].blk_size = VICAP_ALIGN_UP((SENSOR_HEIGHT * SENSOR_WIDTH * 3 ), VICAP_ALIGN_1K);

ret = kd_mpi_vb_set_config(&config);
if (ret) {
    printf("vb_set_config failed ret:%d\n", ret);
    return ret;
}
/*****************************************************************************/

Set Camera Properties: Set sensor_type; generally, no need to change the camera, no modification needed.

//vi_vo.h
/***************************fixed: no modification needed***********************************/
vicap_dev = VICAP_DEV_ID_0;
ret = kd_mpi_vicap_get_sensor_info(sensor_type, &sensor_info);
if (ret) {
    printf("sample_vicap, the sensor type not supported!\n");
    return ret;
}

......

dev_attr.cpature_frame = 0;
memcpy(&dev_attr.sensor_info, &sensor_info, sizeof(k_vicap_sensor_info));

ret = kd_mpi_vicap_set_dev_attr(vicap_dev, dev_attr);
if (ret) {
    printf("sample_vicap, kd_mpi_vicap_set_dev_attr failed.\n");
    return ret;
}
/*****************************************************************************/

Set Camera Channel 0 Properties: Set camera channel 0 resolution to 1080p and format to PIXEL_FORMAT_YVU_PLANAR_420; bind camera channel 0 to display; generally, only chn_attr.out_win.width, chn_attr.out_win.height, and chn_attr.pix_format need attention, no other modifications needed.

//vi_vo.h
/***************************unfixed: may need modification for different AI Demos***********************/
#define ISP_CHN0_WIDTH  (1920) // sensor ch0 output width, VO
#define ISP_CHN0_HEIGHT (1080) // sensor ch0 output height, VO
/*****************************************************************************/

/***************************unfixed: may need modification for different AI Demos******************/
//set chn0 output yuv420sp
......
chn_attr.out_win.width = ISP_CHN0_WIDTH;
chn_attr.out_win.height = ISP_CHN0_HEIGHT;
......
chn_attr.chn_enable = K_TRUE;
chn_attr.pix_format = PIXEL_FORMAT_YVU_PLANAR_420;
/*****************************************************************************/ 
......
/***************************fixed: no modification needed***********************************/
chn_attr.buffer_num = VICAP_MAX_FRAME_COUNT;        //at least 3 buffers for ISP
chn_attr.buffer_size = config.comm_pool[0].blk_size;
vicap_chn = VICAP_CHN_ID_0;

printf("sample_vicap ...kd_mpi_vicap_set_chn_attr, buffer_size[%d]\n", chn_attr.buffer_size);
ret = kd_mpi_vicap_set_chn_attr(vicap_dev, vicap_chn, chn_attr);
if (ret) {
    printf("sample_vicap, kd_mpi_vicap_set_chn_attr failed.\n");
    return ret;
}

//bind vicap chn 0 to VO
vicap_mpp_chn.mod_id = K_ID_VI;
vicap_mpp_chn.dev_id = vicap_dev;
vicap_mpp_chn.chn_id = vicap_chn;

vo_mpp_chn.mod_id = K_ID_VO;
vo_mpp_chn.dev_id = K_VO_DISPLAY_DEV_ID;
vo_mpp_chn.chn_id = K_VO_DISPLAY_CHN_ID1;

sample_vicap_bind_vo(vicap_mpp_chn, vo_mpp_chn);
printf("sample_vicap ...dwc_dsi_init\n");
/*****************************************************************************/

Set Camera Channel 1 Properties: Set channel 1 resolution to 720p and format to PIXEL_FORMAT_BGR_888_PLANAR.

//vi_vo.h
/***************************unfixed: may need modification for different AI Demos***********************/
#define SENSOR_CHANNEL (3)     // Number of channels
#define SENSOR_HEIGHT (720)    // sensor ch1 output height, AI input
#define SENSOR_WIDTH (1280)    // sensor ch1 output width, AI input
/*****************************************************************************/
......
/***************************unfixed: may need modification for different AI Demos******************/
//set chn1 output rgb888p
....
chn_attr.out_win.width = SENSOR_WIDTH ;
chn_attr.out_win.height = SENSOR_HEIGHT;

......
chn_attr.chn_enable = K_TRUE;
chn_attr.pix_format = PIXEL_FORMAT_BGR_888_PLANAR;
/*****************************************************************************/ 

/***************************fixed: no modification needed***********************************/
chn_attr.buffer_num = VICAP_MAX_FRAME_COUNT;//at least 3 buffers for ISP
chn_attr.buffer_size = config.comm_pool[1].blk_size;

printf("sample_vicap ...kd_mpi_vicap_set_chn_attr, buffer_size[%d]\n", chn_attr.buffer_size);
ret = kd_mpi_vicap_set_chn_attr(vicap_dev, VICAP_CHN_ID_1, chn_attr);
if (ret) {
    printf("sample_vicap, kd_mpi_vicap_set_chn_attr failed.\n");
    return ret;
}
/*****************************************************************************/

Initialize and Start the Camera:

//vi_vo.h
/***************************fixed: no modification needed***********************************/
ret = kd_mpi_vicap_init(vicap_dev);
if (ret) {
    printf("sample_vicap, kd_mpi_vicap_init failed.\n");
    // goto err_exit;
}

printf("sample_vicap ...kd_mpi_vicap_start_stream\n");
ret = kd_mpi_vicap_start_stream(vicap_dev);
if (ret) {
    printf("sample_vicap, kd_mpi_vicap_init failed.\n");
    // goto err_exit;
}
/*****************************************************************************/

Usage Example:

//main.cc
vivcap_start();

**2. Retrieving Camera Images:

Temporary Camera Data Address: Create a vaddr to temporarily store the latest data from the camera. It matches the size of camera channel 1.

// main.cc
/***************************fixed: no modification needed***********************************/
// allocate memory for sensor
size_t paddr = 0;
void *vaddr = nullptr;
size_t size = SENSOR_CHANNEL * SENSOR_HEIGHT * SENSOR_WIDTH;
int ret = kd_mpi_sys_mmz_alloc_cached(&paddr, &vaddr, "allocate", "anonymous", size);
if (ret)
{
    std::cerr << "physical_memory_block::allocate failed: ret = " << ret << ", errno = " << strerror(errno) << std::endl;
    std::abort();
}
/*****************************************************************************/

Reading the Latest Frame:

Dump: Read a frame of image from camera channel 1, i.e., dump a frame of data from VB to dump_info.
Map: Map the DDR address (physical address) corresponding to dump_info to the current system address (virtual address) for access.
Convert to cv::Mat: Convert the camera data to cv::Mat, sensor (rgb, chw) -> cv::Mat (bgr, hwc); converting the camera data to cv::Mat is not mandatory, but it is done here to explain it in a more familiar way (cv::Mat).

//vi_vo.h
/***************************fixed: no modification needed***********************************/
// VB for RGB888 output
config.comm_pool[1].blk_cnt = 5;
config.comm_pool[1].mode = VB_REMAP_MODE_NOCACHE;
config.comm_pool[1].blk_size = VICAP_ALIGN_UP((SENSOR_HEIGHT * SENSOR_WIDTH * 3 ), VICAP_ALIGN_1K);
/*****************************************************************************/

//main.cc
while (!isp_stop)
{
    cv::Mat ori_img;
    // sensor to cv::Mat
    {
        /***************************fixed: no modification needed***********************************/
        // Read a frame of image from camera channel 1, i.e., dump a frame of data from VB to dump_info
        memset(&dump_info, 0 , sizeof(k_video_frame_info));
        ret = kd_mpi_vicap_dump_frame(vicap_dev, VICAP_CHN_ID_1, VICAP_DUMP_YUV, &dump_info, 1000);
        if (ret) {
            printf("sample_vicap...kd_mpi_vicap_dump_frame failed.\n");
            continue;
        }

        // Map the DDR address (physical address) corresponding to dump_info to the current system (virtual address) for access
        // vbvaddr is changing in real-time, so it's best to copy the latest data to a fixed address vaddr for access by other parts
        auto vbvaddr = kd_mpi_sys_mmap_cached(dump_info.v_frame.phys_addr[0], size);
        memcpy(vaddr, (void *)vbvaddr, SENSOR_HEIGHT * SENSOR_WIDTH * 3); 
        kd_mpi_sys_munmap(vbvaddr, size);
        /*****************************************************************************/
        
        // Convert camera data to cv::Mat, sensor (rgb, chw) -> cv::Mat (bgr, hwc)
        cv::Mat image_r = cv::Mat(SENSOR_HEIGHT, SENSOR_WIDTH, CV_8UC1, vaddr);
        cv::Mat image_g = cv::Mat(SENSOR_HEIGHT, SENSOR_WIDTH, CV_8UC1, vaddr + SENSOR_HEIGHT * SENSOR_WIDTH);
        cv::Mat image_b = cv::Mat(SENSOR_HEIGHT, SENSOR_WIDTH, CV_8UC1, vaddr + 2 * SENSOR_HEIGHT * SENSOR_WIDTH);
        std::vector<cv::Mat> color_vec(3);
        color_vec.clear();
        color_vec.push_back(image_b);
        color_vec.push_back(image_g);
        color_vec.push_back(image_r);
        cv::merge(color_vec, ori_img);
    }
    // Use the current frame data
    ......
    ......
    {
        /***************************fixed: no modification needed***********************************/
        // Release the current frame of the sensor
        ret = kd_mpi_vicap_dump_release(vicap_dev, VICAP_CHN_ID_1, &dump_info);
        if (ret) {
            printf("sample_vicap...kd_mpi_vicap_dump_release failed.\n");
        }
        /*****************************************************************************/
    }
}

**3. Stopping the Camera:

//vi_vo.h
/***************************fixed: no modification needed***********************************/
int vivcap_stop()
{
    // Stop the camera
    printf("sample_vicap ...kd_mpi_vicap_stop_stream\n");
    int ret = kd_mpi_vicap_stop_stream(vicap_dev);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_init failed.\n");
        return ret;
    }

    // Release camera resources
    ret = kd_mpi_vicap_deinit(vicap_dev);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_deinit failed.\n");
        return ret;
    }

    kd_mpi_vo_disable_video_layer(K_VO_LAYER1);

    vicap_mpp_chn.mod_id = K_ID_VI;
    vicap_mpp_chn.dev_id = vicap_dev;
    vicap_mpp_chn.chn_id = vicap_chn;

    vo_mpp_chn.mod_id = K_ID_VO;
    vo_mpp_chn.dev_id = K_VO_DISPLAY_DEV_ID;
    vo_mpp_chn.chn_id = K_VO_DISPLAY_CHN_ID1;

    // Unbind VI and VO
    sample_vicap_unbind_vo(vicap_mpp_chn, vo_mpp_chn);

    /* Allow one frame time for the VO to release the VB block */
    k_u32 display_ms = 1000 / 33;
    usleep(1000 * display_ms);

    // Exit VB
    ret = kd_mpi_vb_exit();
    if (ret) {
        printf("sample_vicap, kd_mpi_vb_exit failed.\n");
        return ret;
    }

    return 0;
}
/*****************************************************************************/

Simply call in main.cc:

//main.cc
vivcap_start();
......
vivcap_stop();

5.1.2 Preprocessing#

Resize the current frame data.

//main.cc
while (!isp_stop)
{
    cv::Mat ori_img;        // ori: uint8, chw, rgb
    // sensor to cv::Mat
    {
        ......
    }
    /***************************unfixed: may need modification for different AI Demos******************/
    // pre_process, cv::Mat((1280,720), bgr, hwc) -> kmodel((224,224), rgb, hwc)
    cv::Mat pre_process_img;
    {
        cv::Mat rgb_img;
        cv::cvtColor(ori_img, rgb_img, cv::COLOR_BGR2RGB);
        cv::resize(rgb_img, pre_process_img, cv::Size(kmodel_input_width, kmodel_input_height), cv::INTER_LINEAR);
    }
    /*****************************************************************************/

    // A general implementation has been provided in ai_base.cc and open-source code will be provided in Chapter 6
    /***************************fixed: no modification needed***********************************/
    // set kmodel input
    {
        runtime_tensor tensor0 = kmodel_interp.input_tensor(0).expect("cannot get input tensor");
        auto in_buf = tensor0.impl()->to_host().unwrap()->buffer().as_host().unwrap().map(map_access_::map_write).unwrap().buffer();
        memcpy(reinterpret_cast<unsigned char *>(in_buf.data()), pre_process_img.data, sizeof(uint8_t) * kmodel_input_height * kmodel_input_width * 3);
        hrt::sync(tensor0, sync_op_t::sync_write_back, true).expect("sync write_back failed");
    }
    /*****************************************************************************/ 
    ......
}

5.1.3 Model Inference#

After setting the model input, perform model inference to get the inference results.

//main.cc
string kmodel_path = argv[1];
cout << kmodel_path << endl;
float cls_thresh = 0.5;

// A general implementation has been provided in ai_base.cc and open-source code will be provided in Chapter 6
/***************************fixed: no modification needed***********************************/
// kmodel interpreter, constructed from the kmodel file, responsible for model loading, input/output settings, and inference
interpreter kmodel_interp;        
// load model
std::ifstream ifs(kmodel_path, std::ios::binary);
kmodel_interp.load_model(ifs).expect("Invalid kmodel");

// inputs init
for (size_t i = 0; i < kmodel_interp.inputs_size(); i++)
{
    auto desc = kmodel_interp.input_desc(i);
    auto shape = kmodel_interp.input_shape(i);
    auto tensor = host_runtime_tensor::create(desc.datatype, shape, hrt::pool_shared).expect("cannot create input tensor");
    kmodel_interp.input_tensor(i, tensor).expect("cannot set input tensor");
} 
auto shape0 = kmodel_interp.input_shape(0);      // nhwc
int kmodel_input_height = shape0[1];
int kmodel_input_width = shape0[2];

// outputs init
for (size_t i = 0; i < kmodel_interp.outputs_size(); i++)
{
    auto desc = kmodel_interp.output_desc(i);
    auto shape = kmodel_interp.output_shape(i);
    auto tensor = host_runtime_tensor::create(desc.datatype, shape, hrt::pool_shared).expect("cannot create output tensor");
    kmodel_interp.output_tensor(i, tensor).expect("cannot set output tensor");
}
/*****************************************************************************/ 
    
while (!isp_stop)
{
    cv::Mat ori_img;
    // sensor to cv::Mat
    {
        ......
    }

    // pre_process
    cv::Mat pre_process_img;
    {
        ......
    }

    // set kmodel input
    {
        ......
    }

    // A general implementation has been provided in ai_base.cc and open-source code will be provided in Chapter 6
    /***************************fixed: no modification needed***********************************/
    // kmodel run
    kmodel_interp.run().expect("error occurred in running model");

    // get kmodel output
    vector<float *> k_outputs;
    {
        for (int i = 0; i < kmodel_interp.outputs_size(); i++)
        {
            auto out = kmodel_interp.output_tensor(i).expect("cannot get output tensor");
            auto buf = out.impl()->to_host().unwrap()->buffer().as_host().unwrap().map(map_access_::map_read).unwrap().buffer();
            float *p_out = reinterpret_cast<float *>(buf.data());
            k_outputs.push_back(p_out);
        }
    }
    /*****************************************************************************/
}

5.1.4 Post-processing#

Post-process the model results and store the results in results.

//main.cc
vector<cls_res> results;
while (!isp_stop)
{
    cv::Mat ori_img;
    // sensor to cv::Mat
    {
        ......
    }

    // pre_process
    cv::Mat pre_process_img;
    {
        ......
    }

    // set kmodel input
    {
        ......
    }

    // kmodel run
    ......

    // get kmodel output
    vector<float *> k_outputs;
    {
        ......
    }

    // post process
    results.clear();
    {
        /***************************unfixed: may need modification for different AI Demos******************/
        float* output0 = k_outputs[0];
        // softmax
        float sum = 0.0;
        for (int i = 0; i < labels.size(); i++){
            sum += exp(output0[i]);
        }

        int max_index;
        for (int i = 0; i < labels.size(); i++)
        {
            output0[i] = exp(output0[i]) / sum;
        }
        max_index = std::max_element(output0, output0 + labels.size()) - output0; 
        cls_res b;
        if (output0[max_index] >= cls_thresh)
        {
            b.label = labels[max_index];
            b.score = output0[max_index];
            results.push_back(b);
        }
        /*****************************************************************************/   
    }
}

5.1.5 Display#

Display (Video Output, VO) is related to display settings and overlay. This subsection briefly introduces the overall process of display settings and overlay using C++; for detailed information, see K230_Video_Output_API Reference.md

Display Settings: Set display size and format.
Display Overlay: The display consists of two layers. The lower layer (original image layer) directly displays the camera output, while the upper layer (OSD layer) is used for drawing boxes, points, writing text, etc.

This section takes CANMV-K230-V1.x as an example to explain the corresponding display parameter settings; for other development board settings, refer to vi_vo.h .

1. Display Settings: Set display size and format.

//vi_vo.h
/***************************fixed: no modification needed***********************************/
static k_s32 sample_connector_init(void)
{
    ......
    k_connector_type connector_type = LT9611_MIPI_4LAN_1920X1080_30FPS;
    ......
}

static k_s32 vo_layer_vdss_bind_vo_config(void)
{
    ......
    sample_connector_init();

    // config layer
    info.act_size.width = ISP_CHN0_WIDTH; // ISP_CHN0_HEIGHT; // 1080; // 640; // 1080;
    info.act_size.height = ISP_CHN0_HEIGHT; // ISP_CHN0_WIDTH; // 1920; // 480; // 1920;
    info.format = PIXEL_FORMAT_YVU_PLANAR_420;
    ......
}
/*****************************************************************************/ 

**2. Display Overlay: Since the camera and display channels are bound, we cannot directly manipulate the VO. Therefore, we use an overlay method for display.

Original Image Layer: The camera (VI) channel 0 is bound to the display (VO) channel 1. As the camera starts, the data from camera channel 0 will automatically flow to VO channel 1.

//vi_vo.h
......
/***************************fixed: no modification needed***********************************/
//bind vicap chn 0 to vo
vicap_mpp_chn.mod_id = K_ID_VI;
vicap_mpp_chn.dev_id = vicap_dev;
vicap_mpp_chn.chn_id = vicap_chn;

vo_mpp_chn.mod_id = K_ID_VO;
vo_mpp_chn.dev_id = K_VO_DISPLAY_DEV_ID;
vo_mpp_chn.chn_id = K_VO_DISPLAY_CHN_ID1;

sample_vicap_bind_vo(vicap_mpp_chn, vo_mpp_chn);
printf("sample_vicap ...dwc_dsi_init\n");
/*****************************************************************************/ 
......

OSD Layer: After drawing boxes, points, and writing text on cv::Mat, insert the data into the corresponding VO channel.

// vi_vo.h
#define osd_id                              K_VO_OSD3
/***************************unfixed: may need modification for different AI Demos******************/
#define ISP_CHN0_WIDTH                      (1920) // sensor ch0 output width, vo
#define ISP_CHN0_HEIGHT                     (1080) // sensor ch0 output height, vo
#define osd_width                           (1920)
#define osd_height                          (1080)
/*****************************************************************************/
.....
/***************************fixed: no modification needed***********************************/
k_vb_blk_handle vo_insert_frame(k_video_frame_info *vf_info, void **pic_vaddr)
{
    k_u64 phys_addr = 0;
    k_u32 *virt_addr;
    k_vb_blk_handle handle;
    k_s32 size;

    if (vf_info == NULL)
        return K_FALSE;

    if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_ABGR_8888 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_ARGB_8888)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 4;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_RGB_565 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_BGR_565)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 2;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_ABGR_4444 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_ARGB_4444)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 2;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_RGB_888 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_BGR_888)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 3;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_ARGB_1555 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_ABGR_1555)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 2;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_YVU_PLANAR_420)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 3 / 2;

    size = size + 4096;         // Force 4K alignment, should be removed later

    printf("vb block size is %x \n", size);

    handle = kd_mpi_vb_get_block(g_pool_id, size, NULL);
    if (handle == VB_INVALID_HANDLE)
    {
        printf("%s get vb block error\n", __func__);
        return K_FAILED;
    }

    phys_addr = kd_mpi_vb_handle_to_phyaddr(handle);
    if (phys_addr == 0)
    {
        printf("%s get phys addr error\n", __func__);
        return K_FAILED;
    }

    virt_addr = (k_u32 *)kd_mpi_sys_mmap(phys_addr, size);
    // virt_addr = (k_u32 *)kd_mpi_sys_mmap_cached(phys_addr, size);

    if (virt_addr == NULL)
    {
        printf("%s mmap error\n", __func__);
        return K_FAILED;
    }

    vf_info->mod_id = K_ID_VO;
    vf_info->pool_id = g_pool_id;
    vf_info->v_frame.phys_addr[0] = phys_addr;
    if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_YVU_PLANAR_420)
        vf_info->v_frame.phys_addr[1] = phys_addr + (vf_info->v_frame.height * vf_info->v_frame.stride[0]);
    *pic_vaddr = virt_addr;

    printf("phys_addr is %lx g_pool_id is %d \n", phys_addr, g_pool_id);

    return handle;
}
/*****************************************************************************/

Usage Example:

// main.cc
/***************************fixed: no modification needed***********************************/
// osd create
k_video_frame_info vf_info;
void *pic_vaddr = NULL;       
memset(&vf_info, 0, sizeof(vf_info));
vf_info.v_frame.width = osd_width;
vf_info.v_frame.height = osd_height;
vf_info.v_frame.stride[0] = osd_width;
vf_info.v_frame.pixel_format = PIXEL_FORMAT_ARGB_8888;
block = vo_insert_frame(&vf_info, &pic_vaddr);
/*****************************************************************************/ 

while (!isp_stop)
{
    cv::Mat ori_img;
    // sensor to cv::Mat
    // pre_process
    // set kmodel input
    // kmodel run
    // get kmodel output
    // post process
    results.clear();
    {
        ......
    }

    // draw result to vo
    {
        // draw osd
        {
            cv::Mat osd_frame(osd_height, osd_width, CV_8UC4, cv::Scalar(0, 0, 0, 0));
            /***************************unfixed: may need modification for different AI Demos******************/
            {
                // draw cls
                double fontsize = (osd_frame.cols * osd_frame.rows * 1.0) / (1100 * 1200);
                for(int i = 0; i < results.size(); i++)
                {   
                    std::string text = "class: " + results[i].label + ", score: " + std::to_string(round(results[i].score * 100) / 100.0).substr(0, 4);

                    cv::putText(osd_frame, text, cv::Point(1, 40), cv::FONT_HERSHEY_SIMPLEX, 0.8, cv::Scalar(255, 255, 255, 0), 2);

                    std::cout << text << std::endl;
                }
            }
            
            /*************************************************************************/
            
            /***************************fixed: no modification needed***********************************/
            memcpy(pic_vaddr, osd_frame.data, osd_width * osd_height * 4);
        }

        // insert osd to vo
        {
            kd_mpi_vo_chn_insert_frame(osd_id+3, &vf_info);
            printf("kd_mpi_vo_chn_insert_frame success \n");
        }
        /*****************************************************************************/ 
    }
    ......
}

5.1.6 Complete Code#

For detailed operations and running instructions, refer to Chapter 6. Here, the focus is on the code flow.

vi_vo.h

/* Copyright (c) 2023, Canaan Bright Sight Co., Ltd
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>
#include <atomic>

#include "mpi_sys_api.h"

/* vicap */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>

#include "k_module.h"
#include "k_type.h"
#include "k_vb_comm.h"
#include "k_video_comm.h"
#include "k_sys_comm.h"
#include "mpi_vb_api.h"
#include "mpi_vicap_api.h"
#include "mpi_isp_api.h"
#include "mpi_sys_api.h"
#include "k_vo_comm.h"
#include "mpi_vo_api.h"

#include "vo_test_case.h"

#include "k_connector_comm.h"
#include "mpi_connector_api.h"
#include "k_autoconf_comm.h"

/***************************unfixed：不同AI Demo可能需要修改*******************/
//不同的宏对应不同开发板
#if defined(CONFIG_BOARD_K230_CANMV)
#define SENSOR_CHANNEL (3)    
#define SENSOR_HEIGHT (720)  
#define SENSOR_WIDTH (1280)   
#define ISP_CHN0_WIDTH  (1920)
#define ISP_CHN0_HEIGHT (1080)
#define ISP_INPUT_WIDTH (1920)
#define ISP_INPUT_HEIGHT (1080)
#define vicap_install_osd                   (1)
#define osd_id                              K_VO_OSD3
#define osd_width                           (1920)
#define osd_height                          (1080)
#elif defined(CONFIG_BOARD_K230_CANMV_V2)
#define SENSOR_CHANNEL (3)
#define SENSOR_HEIGHT (720)
#define SENSOR_WIDTH (1280)
#define ISP_CHN0_WIDTH  (1920)
#define ISP_CHN0_HEIGHT (1080)
#define ISP_INPUT_WIDTH (1920)
#define ISP_INPUT_HEIGHT (1080)
#define vicap_install_osd                   (1)
#define osd_id                              K_VO_OSD3
#define osd_width                           (1920)
#define osd_height                          (1080)
#elif defined(CONFIG_BOARD_K230D_CANMV)
#define SENSOR_CHANNEL (3)     
#define SENSOR_HEIGHT (720 / 2)
#define SENSOR_WIDTH (1280 / 2)
#define ISP_CHN0_WIDTH  (800)
#define ISP_CHN0_HEIGHT (480)
#define ISP_INPUT_WIDTH (1920)
#define ISP_INPUT_HEIGHT (1080)
#define vicap_install_osd                   (1)
#define osd_id                              K_VO_OSD3
#define osd_width                           (480)
#define osd_height                          (800)
#elif defined(CONFIG_BOARD_K230_CANMV_01STUDIO)
#define SENSOR_CHANNEL (3)     
#define SENSOR_HEIGHT (720)  
#define SENSOR_WIDTH (1280)    
#define ISP_CHN0_WIDTH  (800)
#define ISP_CHN0_HEIGHT (480)
#define ISP_INPUT_WIDTH (1920)
#define ISP_INPUT_HEIGHT (1080)
#define vicap_install_osd                   (1)
#define osd_id                              K_VO_OSD3
#define osd_width                           (480)
#define osd_height                          (800)
#else
#define SENSOR_CHANNEL (3)     // isp通道数
#define SENSOR_HEIGHT (1280)  // isp高度，ai输入，竖屏
#define SENSOR_WIDTH (720)    // isp宽度，ai输入，竖屏
#define ISP_CHN0_WIDTH  (1088)//(1920)
#define ISP_CHN0_HEIGHT (1920)//(1080)
#define vicap_install_osd                   (1)
#define osd_id                              K_VO_OSD3
#define osd_width                           (1080)
#define osd_height                          (1920)
#endif
/*****************************************************************************/    

/***************************fixed：无需修改***********************************/
k_vb_config config;
k_vicap_dev vicap_dev;
k_vicap_chn vicap_chn;
k_vicap_dev_attr dev_attr;
k_vicap_chn_attr chn_attr;
k_vicap_sensor_info sensor_info;
k_vicap_sensor_type sensor_type;
k_mpp_chn vicap_mpp_chn;
k_mpp_chn vo_mpp_chn;

k_video_frame_info dump_info;

k_vo_draw_frame vo_frame = (k_vo_draw_frame) {
    1,
    16,
    16,
    128,
    128,
    1
};

static k_vb_blk_handle block;
k_u32 g_pool_id;

int vo_creat_layer_test(k_vo_layer chn_id, layer_info *info)
{
    k_vo_video_layer_attr attr;

    // check layer
    if ((chn_id >= K_MAX_VO_LAYER_NUM) || ((info->func & K_VO_SCALER_ENABLE) && (chn_id != K_VO_LAYER0))
            || ((info->func != 0) && (chn_id == K_VO_LAYER2)))
    {
        printf("input layer num failed \n");
        return -1 ;
    }

    // check scaler

    // set offset
    attr.display_rect = info->offset;
    // set act
    attr.img_size = info->act_size;
    // sget size
    info->size = info->act_size.height * info->act_size.width * 3 / 2;
    //set pixel format
    attr.pixel_format = info->format;
    if (info->format != PIXEL_FORMAT_YVU_PLANAR_420)
    {
        printf("input pix format failed \n");
        return -1;
    }
    // set stride
    attr.stride = (info->act_size.width / 8 - 1) + ((info->act_size.height - 1) << 16);
    // set function
    attr.func = info->func;
    // set scaler attr
    attr.scaler_attr = info->attr;

    // set video layer atrr
    kd_mpi_vo_set_video_layer_attr(chn_id, &attr);

    // enable layer
    kd_mpi_vo_enable_video_layer(chn_id);

    return 0;
}

k_vb_blk_handle vo_insert_frame(k_video_frame_info *vf_info, void **pic_vaddr)
{
    k_u64 phys_addr = 0;
    k_u32 *virt_addr;
    k_vb_blk_handle handle;
    k_s32 size;

    if (vf_info == NULL)
        return K_FALSE;

    if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_ABGR_8888 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_ARGB_8888)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 4;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_RGB_565 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_BGR_565)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 2;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_ABGR_4444 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_ARGB_4444)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 2;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_RGB_888 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_BGR_888)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 3;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_ARGB_1555 || vf_info->v_frame.pixel_format == PIXEL_FORMAT_ABGR_1555)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 2;
    else if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_YVU_PLANAR_420)
        size = vf_info->v_frame.height * vf_info->v_frame.width * 3 / 2;

    size = size + 4096;         // 强制4K ，后边得删了

    printf("vb block size is %x \n", size);

    handle = kd_mpi_vb_get_block(g_pool_id, size, NULL);
    if (handle == VB_INVALID_HANDLE)
    {
        printf("%s get vb block error\n", __func__);
        return K_FAILED;
    }

    phys_addr = kd_mpi_vb_handle_to_phyaddr(handle);
    if (phys_addr == 0)
    {
        printf("%s get phys addr error\n", __func__);
        return K_FAILED;
    }

    virt_addr = (k_u32 *)kd_mpi_sys_mmap(phys_addr, size);
    // virt_addr = (k_u32 *)kd_mpi_sys_mmap_cached(phys_addr, size);

    if (virt_addr == NULL)
    {
        printf("%s mmap error\n", __func__);
        return K_FAILED;
    }

    vf_info->mod_id = K_ID_VO;
    vf_info->pool_id = g_pool_id;
    vf_info->v_frame.phys_addr[0] = phys_addr;
    if (vf_info->v_frame.pixel_format == PIXEL_FORMAT_YVU_PLANAR_420)
        vf_info->v_frame.phys_addr[1] = phys_addr + (vf_info->v_frame.height * vf_info->v_frame.stride[0]);
    *pic_vaddr = virt_addr;

    printf("phys_addr is %lx g_pool_id is %d \n", phys_addr, g_pool_id);

    return handle;
}

k_u32 vo_creat_osd_test(k_vo_osd osd, osd_info *info)
{
    k_vo_video_osd_attr attr;

    // set attr
    attr.global_alptha = info->global_alptha;

    if (info->format == PIXEL_FORMAT_ABGR_8888 || info->format == PIXEL_FORMAT_ARGB_8888)
    {
        info->size = info->act_size.width  * info->act_size.height * 4;
        info->stride  = info->act_size.width * 4 / 8;
    }
    else if (info->format == PIXEL_FORMAT_RGB_565 || info->format == PIXEL_FORMAT_BGR_565)
    {
        info->size = info->act_size.width  * info->act_size.height * 2;
        info->stride  = info->act_size.width * 2 / 8;
    }
    else if (info->format == PIXEL_FORMAT_RGB_888 || info->format == PIXEL_FORMAT_BGR_888)
    {
        info->size = info->act_size.width  * info->act_size.height * 3;
        info->stride  = info->act_size.width * 3 / 8;
    }
    else if(info->format == PIXEL_FORMAT_ARGB_4444 || info->format == PIXEL_FORMAT_ABGR_4444)
    {
        info->size = info->act_size.width  * info->act_size.height * 2;
        info->stride  = info->act_size.width * 2 / 8;
    }
    else if(info->format == PIXEL_FORMAT_ARGB_1555 || info->format == PIXEL_FORMAT_ABGR_1555)
    {
        info->size = info->act_size.width  * info->act_size.height * 2;
        info->stride  = info->act_size.width * 2 / 8;
    }
    else
    {
        printf("set osd pixel format failed  \n");
    }

    attr.stride = info->stride;
    attr.pixel_format = info->format;
    attr.display_rect = info->offset;
    attr.img_size = info->act_size;
    kd_mpi_vo_set_video_osd_attr(osd, &attr);

    kd_mpi_vo_osd_enable(osd);

    return 0;
}

void sample_vicap_install_osd(void)
{
    osd_info osd;

    osd.act_size.width = osd_width ;
    osd.act_size.height = osd_height;
    osd.offset.x = 0;
    osd.offset.y = 0;
    osd.global_alptha = 0xff;
    // osd.global_alptha = 0x32;
    osd.format = PIXEL_FORMAT_ARGB_8888;//PIXEL_FORMAT_ARGB_4444; //PIXEL_FORMAT_ARGB_1555;//PIXEL_FORMAT_ARGB_8888;

    vo_creat_osd_test(osd_id, &osd);
}

void vo_osd_release_block(void)
{
    if(vicap_install_osd == 1)
    {
        kd_mpi_vo_osd_disable(osd_id);
        kd_mpi_vb_release_block(block);
    }
    
}

static k_s32 sample_connector_init(void)
{
    k_u32 ret = 0;
    k_s32 connector_fd;
#if defined(CONFIG_BOARD_K230_CANMV)
    k_connector_type connector_type = LT9611_MIPI_4LAN_1920X1080_30FPS;
#elif defined(CONFIG_BOARD_K230_CANMV_V2)
    k_connector_type connector_type = LT9611_MIPI_4LAN_1920X1080_30FPS;
#elif defined(CONFIG_BOARD_K230D_CANMV)
    k_connector_type connector_type = ST7701_V1_MIPI_2LAN_480X800_30FPS;
#elif defined(CONFIG_BOARD_K230_CANMV_01STUDIO)
    k_connector_type connector_type = ST7701_V1_MIPI_2LAN_480X800_30FPS;
#else
    k_connector_type connector_type = HX8377_V2_MIPI_4LAN_1080X1920_30FPS;
#endif

    k_connector_info connector_info;
    memset(&connector_info, 0, sizeof(k_connector_info));
    //connector get sensor info
    ret = kd_mpi_get_connector_info(connector_type, &connector_info);
    if (ret) {
        printf("sample_vicap, the sensor type not supported!\n");
        return ret;
    }

    connector_fd = kd_mpi_connector_open(connector_info.connector_name);
    if (connector_fd < 0) {
        printf("%s, connector open failed.\n", __func__);
        return K_ERR_VO_NOTREADY;
    }

    // set connect power
    kd_mpi_connector_power_set(connector_fd, K_TRUE);
    // connector init
    kd_mpi_connector_init(connector_fd, connector_info);

    return 0;
}

static k_s32 vo_layer_vdss_bind_vo_config(void)
{
    layer_info info;

    k_vo_layer chn_id = K_VO_LAYER1;

    memset(&info, 0, sizeof(info));

    sample_connector_init();

    #if defined(CONFIG_BOARD_K230D_CANMV) 
    info.act_size.width = ISP_CHN0_HEIGHT;//1080;//640;//1080;
    info.act_size.height = ISP_CHN0_WIDTH;//1920;//480;//1920;
    info.format = PIXEL_FORMAT_YVU_PLANAR_420;
    info.func = K_ROTATION_90;
    #elif defined(CONFIG_BOARD_K230_CANMV_01STUDIO)
    info.act_size.width = ISP_CHN0_HEIGHT;//1080;//640;//1080;
    info.act_size.height = ISP_CHN0_WIDTH;//1920;//480;//1920;
    info.format = PIXEL_FORMAT_YVU_PLANAR_420;
    info.func = K_ROTATION_90;
    #else
    // config lyaer
    info.act_size.width = ISP_CHN0_WIDTH;//ISP_CHN0_HEIGHT;//1080;//640;//1080;
    info.act_size.height = ISP_CHN0_HEIGHT;//ISP_CHN0_WIDTH;//1920;//480;//1920;
    info.format = PIXEL_FORMAT_YVU_PLANAR_420;
    info.func = 0;//K_ROTATION_180;////K_ROTATION_90;
    #endif

    info.global_alptha = 0xff;
    info.offset.x = 0;//(1080-w)/2,
    info.offset.y = 0;//(1920-h)/2;
    vo_creat_layer_test(chn_id, &info);

    if(vicap_install_osd == 1)
        sample_vicap_install_osd();

    //exit ;
    return 0;
}

static void sample_vicap_bind_vo(k_mpp_chn vicap_mpp_chn, k_mpp_chn vo_mpp_chn)
{
    k_s32 ret;

    ret = kd_mpi_sys_bind(&vicap_mpp_chn, &vo_mpp_chn);
    if (ret) {
        printf("kd_mpi_sys_unbind failed:0x%x\n", ret);
    }

    return;
}

static void sample_vicap_unbind_vo(k_mpp_chn vicap_mpp_chn, k_mpp_chn vo_mpp_chn)
{
    k_s32 ret;

    ret = kd_mpi_sys_unbind(&vicap_mpp_chn, &vo_mpp_chn);
    if (ret) {
        printf("kd_mpi_sys_unbind failed:0x%x\n", ret);
    }

    return;
}

int vivcap_start()
{
    k_s32 ret = 0;

    k_u32 pool_id;
    k_vb_pool_config pool_config;

    printf("sample_vicap ...\n");

    #if defined(CONFIG_BOARD_K230_CANMV)
    sensor_type = OV_OV5647_MIPI_CSI0_1920X1080_30FPS_10BIT_LINEAR;
    #elif defined(CONFIG_BOARD_K230_CANMV_V2) 
    sensor_type = OV_OV5647_MIPI_CSI2_1920X1080_30FPS_10BIT_LINEAR_V2;
    #elif defined(CONFIG_BOARD_K230_CANMV_01STUDIO)
    sensor_type = OV_OV5647_MIPI_CSI2_1920X1080_30FPS_10BIT_LINEAR_V2;
    #elif defined(CONFIG_BOARD_K230D_CANMV)
    sensor_type = OV_OV5647_MIPI_1920X1080_30FPS_10BIT_LINEAR;
    #else
    sensor_type = IMX335_MIPI_2LANE_RAW12_2592X1944_30FPS_LINEAR;
    #endif

    vicap_dev = VICAP_DEV_ID_0;
    memset(&config, 0, sizeof(config));
    config.max_pool_cnt = 64;
    #if defined(CONFIG_BOARD_K230D_CANMV) 
    //VB for YUV420SP output
    config.comm_pool[0].blk_cnt = 4;
    config.comm_pool[0].mode = VB_REMAP_MODE_NOCACHE;
    config.comm_pool[0].blk_size = VICAP_ALIGN_UP((ISP_CHN0_WIDTH * ISP_CHN0_HEIGHT * 3 / 2), VICAP_ALIGN_1K);
    //VB for RGB888 output
    config.comm_pool[1].blk_cnt = 5;
    config.comm_pool[1].mode = VB_REMAP_MODE_NOCACHE;
    config.comm_pool[1].blk_size = VICAP_ALIGN_UP((SENSOR_HEIGHT * SENSOR_WIDTH * 3 ), VICAP_ALIGN_1K);
    #else
    //VB for YUV420SP output
    config.comm_pool[0].blk_cnt = 5;
    config.comm_pool[0].mode = VB_REMAP_MODE_NOCACHE;
    config.comm_pool[0].blk_size = VICAP_ALIGN_UP((ISP_CHN0_WIDTH * ISP_CHN0_HEIGHT * 3 / 2), VICAP_ALIGN_1K);
    //VB for RGB888 output
    config.comm_pool[1].blk_cnt = 5;
    config.comm_pool[1].mode = VB_REMAP_MODE_NOCACHE;
    config.comm_pool[1].blk_size = VICAP_ALIGN_UP((SENSOR_HEIGHT * SENSOR_WIDTH * 3 ), VICAP_ALIGN_1K);
    #endif

    ret = kd_mpi_vb_set_config(&config);
    if (ret) {
        printf("vb_set_config failed ret:%d\n", ret);
        return ret;
    }

    k_vb_supplement_config supplement_config;
    memset(&supplement_config, 0, sizeof(supplement_config));
    supplement_config.supplement_config |= VB_SUPPLEMENT_JPEG_MASK;

    ret = kd_mpi_vb_set_supplement_config(&supplement_config);
    if (ret) {
        printf("vb_set_supplement_config failed ret:%d\n", ret);
        return ret;
    }

    ret = kd_mpi_vb_init();
    if (ret) {
        printf("vb_init failed ret:%d\n", ret);
        return ret;
    }
    printf("sample_vicap ...kd_mpi_vicap_get_sensor_info\n");

    // dwc_dsi_init();
    vo_layer_vdss_bind_vo_config();

    if(vicap_install_osd == 1)
    {
        #if defined(CONFIG_BOARD_K230D_CANMV) 
        memset(&pool_config, 0, sizeof(pool_config));
        pool_config.blk_size = VICAP_ALIGN_UP((osd_width * osd_height * 4 * 2), VICAP_ALIGN_1K);
        pool_config.blk_cnt = 2;
        pool_config.mode = VB_REMAP_MODE_NOCACHE;
        pool_id = kd_mpi_vb_create_pool(&pool_config);      // osd0 - 3 argb 320 x 240
        g_pool_id = pool_id;
        #else
        memset(&pool_config, 0, sizeof(pool_config));
        pool_config.blk_size = VICAP_ALIGN_UP((osd_width * osd_height * 4 * 2), VICAP_ALIGN_1K);
        pool_config.blk_cnt = 4;
        pool_config.mode = VB_REMAP_MODE_NOCACHE;
        pool_id = kd_mpi_vb_create_pool(&pool_config);      // osd0 - 3 argb 320 x 240
        g_pool_id = pool_id;
        #endif

        printf("--------aa--------------g_pool_id is %d pool_id is %d \n",g_pool_id, pool_id);
    }

    memset(&sensor_info, 0, sizeof(k_vicap_sensor_info));
    ret = kd_mpi_vicap_get_sensor_info(sensor_type, &sensor_info);
    if (ret) {
        printf("sample_vicap, the sensor type not supported!\n");
        return ret;
    }

    memset(&dev_attr, 0, sizeof(k_vicap_dev_attr));
    dev_attr.acq_win.h_start = 0;
    dev_attr.acq_win.v_start = 0;
    #if defined (CONFIG_BOARD_K230_CANMV)  || defined(CONFIG_BOARD_K230_CANMV_V2) || defined(CONFIG_BOARD_K230D_CANMV) || defined(CONFIG_BOARD_K230_CANMV_01STUDIO)
    dev_attr.acq_win.width = ISP_INPUT_WIDTH;
    dev_attr.acq_win.height = ISP_INPUT_HEIGHT;
    #else
    dev_attr.acq_win.width = 2592;//SENSOR_HEIGHT;
    dev_attr.acq_win.height = 1944;//SENSOR_WIDTH;
    #endif
    dev_attr.mode = VICAP_WORK_ONLINE_MODE;

    dev_attr.pipe_ctrl.data = 0xFFFFFFFF;
    dev_attr.pipe_ctrl.bits.af_enable = 0;
    dev_attr.pipe_ctrl.bits.ahdr_enable = 0;


    dev_attr.cpature_frame = 0;
    memcpy(&dev_attr.sensor_info, &sensor_info, sizeof(k_vicap_sensor_info));

    ret = kd_mpi_vicap_set_dev_attr(vicap_dev, dev_attr);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_set_dev_attr failed.\n");
        return ret;
    }

    memset(&chn_attr, 0, sizeof(k_vicap_chn_attr));

    //set chn0 output yuv420sp
    chn_attr.out_win.h_start = 0;
    chn_attr.out_win.v_start = 0;
    chn_attr.out_win.width = ISP_CHN0_WIDTH;
    chn_attr.out_win.height = ISP_CHN0_HEIGHT;


    #if defined(CONFIG_BOARD_K230_CANMV)  || defined(CONFIG_BOARD_K230_CANMV_V2) || defined(CONFIG_BOARD_K230D_CANMV) || defined(CONFIG_BOARD_K230_CANMV_01STUDIO)
    chn_attr.crop_win = dev_attr.acq_win;
    #else
    // chn_attr.crop_win = dev_attr.acq_win;
    chn_attr.crop_win.h_start = 768;
    chn_attr.crop_win.v_start = 16;
    chn_attr.crop_win.width = ISP_CHN0_WIDTH;
    chn_attr.crop_win.height = ISP_CHN0_HEIGHT;
    #endif

    chn_attr.scale_win = chn_attr.out_win;
    chn_attr.crop_enable = K_FALSE;
    chn_attr.scale_enable = K_FALSE;
    // chn_attr.dw_enable = K_FALSE;
    chn_attr.chn_enable = K_TRUE;
    chn_attr.pix_format = PIXEL_FORMAT_YVU_PLANAR_420;
    chn_attr.buffer_num = VICAP_MAX_FRAME_COUNT;//at least 3 buffers for isp
    chn_attr.buffer_size = config.comm_pool[0].blk_size;
    vicap_chn = VICAP_CHN_ID_0;

    printf("sample_vicap ...kd_mpi_vicap_set_chn_attr, buffer_size[%d]\n", chn_attr.buffer_size);
    ret = kd_mpi_vicap_set_chn_attr(vicap_dev, vicap_chn, chn_attr);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_set_chn_attr failed.\n");
        return ret;
    }

    //bind vicap chn 0 to vo
    vicap_mpp_chn.mod_id = K_ID_VI;
    vicap_mpp_chn.dev_id = vicap_dev;
    vicap_mpp_chn.chn_id = vicap_chn;

    vo_mpp_chn.mod_id = K_ID_VO;
    vo_mpp_chn.dev_id = K_VO_DISPLAY_DEV_ID;
    vo_mpp_chn.chn_id = K_VO_DISPLAY_CHN_ID1;

    sample_vicap_bind_vo(vicap_mpp_chn, vo_mpp_chn);
    printf("sample_vicap ...dwc_dsi_init\n");

    //set chn1 output rgb888p
    chn_attr.out_win.h_start = 0;
    chn_attr.out_win.v_start = 0;
    chn_attr.out_win.width = SENSOR_WIDTH ;
    chn_attr.out_win.height = SENSOR_HEIGHT;
    // chn_attr.crop_win = dev_attr.acq_win;

    #if defined(CONFIG_BOARD_K230_CANMV)  || defined(CONFIG_BOARD_K230_CANMV_V2) || defined(CONFIG_BOARD_K230D_CANMV) || defined(CONFIG_BOARD_K230_CANMV_01STUDIO)
    chn_attr.crop_win = dev_attr.acq_win;
    #else   
    chn_attr.crop_win.h_start = 768;
    chn_attr.crop_win.v_start = 16;
    chn_attr.crop_win.width = ISP_CHN0_WIDTH;
    chn_attr.crop_win.height = ISP_CHN0_HEIGHT;
    #endif

    chn_attr.scale_win = chn_attr.out_win;
    chn_attr.crop_enable = K_FALSE;
    chn_attr.scale_enable = K_FALSE;
    // chn_attr.dw_enable = K_FALSE;
    chn_attr.chn_enable = K_TRUE;
    chn_attr.pix_format = PIXEL_FORMAT_BGR_888_PLANAR;
    chn_attr.buffer_num = VICAP_MAX_FRAME_COUNT;//at least 3 buffers for isp
    chn_attr.buffer_size = config.comm_pool[1].blk_size;

    printf("sample_vicap ...kd_mpi_vicap_set_chn_attr, buffer_size[%d]\n", chn_attr.buffer_size);
    ret = kd_mpi_vicap_set_chn_attr(vicap_dev, VICAP_CHN_ID_1, chn_attr);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_set_chn_attr failed.\n");
        return ret;
    }

    // set to header file database parse mode
    ret = kd_mpi_vicap_set_database_parse_mode(vicap_dev, VICAP_DATABASE_PARSE_XML_JSON);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_set_database_parse_mode failed.\n");
        return ret;
    }

    printf("sample_vicap ...kd_mpi_vicap_init\n");
    ret = kd_mpi_vicap_init(vicap_dev);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_init failed.\n");
        // goto err_exit;
    }

    printf("sample_vicap ...kd_mpi_vicap_start_stream\n");
    ret = kd_mpi_vicap_start_stream(vicap_dev);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_init failed.\n");
        // goto err_exit;
    }

    return ret;
}

int vivcap_stop()
{
    printf("sample_vicap ...kd_mpi_vicap_stop_stream\n");
    int ret = kd_mpi_vicap_stop_stream(vicap_dev);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_init failed.\n");
        return ret;
    }

    ret = kd_mpi_vicap_deinit(vicap_dev);
    if (ret) {
        printf("sample_vicap, kd_mpi_vicap_deinit failed.\n");
        return ret;
    }

    kd_mpi_vo_disable_video_layer(K_VO_LAYER1);

    vicap_mpp_chn.mod_id = K_ID_VI;
    vicap_mpp_chn.dev_id = vicap_dev;
    vicap_mpp_chn.chn_id = vicap_chn;

    vo_mpp_chn.mod_id = K_ID_VO;
    vo_mpp_chn.dev_id = K_VO_DISPLAY_DEV_ID;
    vo_mpp_chn.chn_id = K_VO_DISPLAY_CHN_ID1;

    sample_vicap_unbind_vo(vicap_mpp_chn, vo_mpp_chn);

    /*Allow one frame time for the VO to release the VB block*/
    k_u32 display_ms = 1000 / 33;
    usleep(1000 * display_ms);

    ret = kd_mpi_vb_exit();
    if (ret) {
        printf("sample_vicap, kd_mpi_vb_exit failed.\n");
        return ret;
    }

    return 0;
}

void yuv_rotate_90(char *des, char *src,int width,int height)
{
    int n = 0;
    int hw = width>>1;
    int hh = height>>1;
    int size = width * height;
    int hsize = size>>2;

    int pos = 0;

    for(int i = width-1;i >= 0;i--)
    {
        pos = 0;
        for(int j= 0;j < height;j++)
        {
            des[n++]= src[pos+i];
            pos += width;
        }
    }

}
/****************************************************************************/

main.cc

/* Copyright (c) 2023, Canaan Bright Sight Co., Ltd
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
#include <iostream>
#include <thread>
#include <map>
#include <nncase/runtime/runtime_tensor.h>
#include <nncase/runtime/interpreter.h>
#include <nncase/runtime/runtime_op_utility.h>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

#include "vi_vo.h"

using namespace nncase;
using namespace nncase::runtime;

using cv::Mat;
using std::cerr;
using std::cout;
using std::endl;
using namespace std;

auto cache = cv::Mat::zeros(1, 1, CV_32FC1);
/**
 * @brief Classification result structure
 */
typedef struct cls_res
{
    float score; // Classification score
    string label; // Classification label result
} cls_res;

std::atomic<bool> isp_stop(false);

void video_proc_cls(char *argv[])
{
    /***************************fixed: no modification needed***********************************/
    vivcap_start();

    // osd set
    k_video_frame_info vf_info;
    void *pic_vaddr = NULL;       

    memset(&vf_info, 0, sizeof(vf_info));

    vf_info.v_frame.width = osd_width;
    vf_info.v_frame.height = osd_height;
    vf_info.v_frame.stride[0] = osd_width;
    vf_info.v_frame.pixel_format = PIXEL_FORMAT_ARGB_8888;
    block = vo_insert_frame(&vf_info, &pic_vaddr);

    // alloc memory for sensor
    size_t paddr = 0;
    void *vaddr = nullptr;
    size_t size = SENSOR_CHANNEL * SENSOR_HEIGHT * SENSOR_WIDTH;
    int ret = kd_mpi_sys_mmz_alloc_cached(&paddr, &vaddr, "allocate", "anonymous", size);
    if (ret)
    {
        std::cerr << "physical_memory_block::allocate failed: ret = " << ret << ", errno = " << strerror(errno) << std::endl;
        std::abort();
    }
    /*****************************************************************************/ 

    string kmodel_path = argv[1];
    cout << "kmodel_path : " << kmodel_path << endl;
    float cls_thresh = 0.5;

    /***************************fixed: no modification needed***********************************/
    // We have encapsulated the relevant implementation in ai_base.cc, here for simplicity
    interpreter kmodel_interp;        
    // load model
    std::ifstream ifs(kmodel_path, std::ios::binary);
    kmodel_interp.load_model(ifs).expect("Invalid kmodel");

    // inputs init
    for (size_t i = 0; i < kmodel_interp.inputs_size(); i++)
    {
        auto desc = kmodel_interp.input_desc(i);
        auto shape = kmodel_interp.input_shape(i);
        auto tensor = host_runtime_tensor::create(desc.datatype, shape, hrt::pool_shared).expect("cannot create input tensor");
        kmodel_interp.input_tensor(i, tensor).expect("cannot set input tensor");
    } 
    auto shape0 = kmodel_interp.input_shape(0);      // nhwc
    int kmodel_input_height = shape0[1];
    int kmodel_input_width = shape0[2];

    // outputs init
    for (size_t i = 0; i < kmodel_interp.outputs_size(); i++)
    {
        auto desc = kmodel_interp.output_desc(i);
        auto shape = kmodel_interp.output_shape(i);
        auto tensor = host_runtime_tensor::create(desc.datatype, shape, hrt::pool_shared).expect("cannot create output tensor");
        kmodel_interp.output_tensor(i, tensor).expect("cannot set output tensor");
    }
    /*****************************************************************************/ 

    vector<cls_res> results;
    std::vector<std::string> labels = {"bocai", "changqiezi", "huluobo", "xihongshi", "xilanhua"};
    
    while (!isp_stop)
    {
        cv::Mat ori_img;
        // sensor to cv::Mat
        {
            /***************************fixed: no modification needed***********************************/
            // Read a frame of image from the camera
            memset(&dump_info, 0 , sizeof(k_video_frame_info));
            ret = kd_mpi_vicap_dump_frame(vicap_dev, VICAP_CHN_ID_1, VICAP_DUMP_YUV, &dump_info, 1000);
            if (ret) {
                printf("sample_vicap...kd_mpi_vicap_dump_frame failed.\n");
                continue;
            }

            // Map the DDR address of the current frame from the camera to the current system for access
            auto vbvaddr = kd_mpi_sys_mmap_cached(dump_info.v_frame.phys_addr[0], size);
            memcpy(vaddr, (void *)vbvaddr, SENSOR_HEIGHT * SENSOR_WIDTH * 3); 
            kd_mpi_sys_munmap(vbvaddr, size);
            /*****************************************************************************/ 
            
            // Convert camera data to cv::Mat, sensor (rgb, chw) -> cv::Mat (bgr, hwc)
            cv::Mat image_r = cv::Mat(SENSOR_HEIGHT, SENSOR_WIDTH, CV_8UC1, vaddr);
            cv::Mat image_g = cv::Mat(SENSOR_HEIGHT, SENSOR_WIDTH, CV_8UC1, vaddr + SENSOR_HEIGHT * SENSOR_WIDTH);
            cv::Mat image_b = cv::Mat(SENSOR_HEIGHT, SENSOR_WIDTH, CV_8UC1, vaddr + 2 * SENSOR_HEIGHT * SENSOR_WIDTH);
            std::vector<cv::Mat> color_vec(3);
            color_vec.clear();
            color_vec.push_back(image_b);
            color_vec.push_back(image_g);
            color_vec.push_back(image_r);
            cv::merge(color_vec, ori_img);
        }

        /***************************unfixed: may need modification for different AI Demos******************/
        // pre_process
        cv::Mat pre_process_img;
        {
            cv::Mat rgb_img;
            cv::cvtColor(ori_img, rgb_img, cv::COLOR_BGR2RGB);
            cv::resize(rgb_img, pre_process_img, cv::Size(kmodel_input_width, kmodel_input_height), cv::INTER_LINEAR);
        }
        /*****************************************************************************/  

        /***************************fixed: no modification needed***********************************/
        // set kmodel input
        {
            runtime_tensor tensor0 = kmodel_interp.input_tensor(0).expect("cannot get input tensor");
            auto in_buf = tensor0.impl()->to_host().unwrap()->buffer().as_host().unwrap().map(map_access_::map_write).unwrap().buffer();
            memcpy(reinterpret_cast<unsigned char *>(in_buf.data()), pre_process_img.data, sizeof(uint8_t) * kmodel_input_height * kmodel_input_width * 3);
            hrt::sync(tensor0, sync_op_t::sync_write_back, true).expect("sync write_back failed");
        }
        

        // kmodel run
        kmodel_interp.run().expect("error occurred in running model");

        // get kmodel output
        vector<float *> k_outputs;
        {
            for (int i = 0; i < kmodel_interp.outputs_size(); i++)
            {
                auto out = kmodel_interp.output_tensor(i).expect("cannot get output tensor");
                auto buf = out.impl()->to_host().unwrap()->buffer().as_host().unwrap().map(map_access_::map_read).unwrap().buffer();
                float *p_out = reinterpret_cast<float *>(buf.data());
                k_outputs.push_back(p_out);
            }
        }
        /***************************fixed: no modification needed***********************************/

        /***************************unfixed: may need modification for different AI Demos******************/
        // post process
        results.clear();
        {
            float* output0 = k_outputs[0];
            float sum = 0.0;
            for (int i = 0; i < labels.size(); i++){
                sum += exp(output0[i]);
            }
            
            int max_index;
            for (int i = 0; i < labels.size(); i++)
            {
                output0[i] = exp(output0[i]) / sum;
            }
            max_index = std::max_element(output0, output0 + labels.size()) - output0; 
            cls_res b;
            if (output0[max_index] >= cls_thresh)
            {
                b.label = labels[max_index];
                b.score = output0[max_index];
                results.push_back(b);
            }
        }
        /*****************************************************************************/    

        // draw result to vo
        {
            {
                cv::Mat osd_frame(osd_height, osd_width, CV_8UC4, cv::Scalar(0, 0, 0, 0));
                {
                    /***************************unfixed: may need modification for different AI Demos******************/
                    // draw cls
                    double fontsize = (osd_frame.cols * osd_frame.rows * 1.0) / (1100 * 1200);
                    for (int i = 0; i < results.size(); i++)
                    {   
                        std::string text = "class: " + results[i].label + ", score: " + std::to_string(round(results[i].score * 100) / 100.0).substr(0, 4);

                        cv::putText(osd_frame, text, cv::Point(1, 40), cv::FONT_HERSHEY_SIMPLEX, 0.8, cv::Scalar(255, 255, 255, 0), 2);

                        std::cout << text << std::endl;
                    }
                    /*****************************************************************************/
                }

                /***************************fixed: no modification needed***********************************/
                memcpy(pic_vaddr, osd_frame.data, osd_width * osd_height * 4);
            }

            // insert osd to vo
            {
                kd_mpi_vo_chn_insert_frame(osd_id + 3, &vf_info);  // K_VO_OSD0
                printf("kd_mpi_vo_chn_insert_frame success \n");
            }
        }
        
        {
            // Release the current frame of the sensor
            ret = kd_mpi_vicap_dump_release(vicap_dev, VICAP_CHN_ID_1, &dump_info);
            if (ret) {
                printf("sample_vicap...kd_mpi_vicap_dump_release failed.\n");
            }
        }
        /*****************************************************************************/ 
    }
    
    /***************************fixed: no modification needed***********************************/
    vo_osd_release_block();
    vivcap_stop();

    // free memory
    ret = kd_mpi_sys_mmz_free(paddr, vaddr);
    if (ret)
    {
        std::cerr << "free failed: ret = " << ret << ", errno = " << strerror(errno) << std::endl;
        std::abort();
    }
    /*****************************************************************************/ 
}

int main(int argc, char *argv[])
{
    std::cout << "case " << argv[0] << " built at " << __DATE__ << " " << __TIME__ << std::endl;
    if (argc != 2)
    {
        cout << "Model inference parameter explanation:"
             << "<kmodel_path>" << endl
             << "Options:" << endl
             << "  kmodel_path     Path of the Kmodel\n"
             << "\n"
             << endl;
        return -1;
    }

    /***************************fixed: no modification needed***********************************/
    std::thread thread_isp(video_proc_cls, argv);
    while (getchar() != 'q')
    {
        usleep(10000);
    }

    isp_stop = true;
    thread_isp.join();
    /*****************************************************************************/ 
    return 0;
}

5.2 AI Inference Process Based on ulab (MicroPython)#

The K230 AI inference based on ulab (MicroPython) briefly introduces the process of video capture, image preprocessing (ulab), model inference, post-processing (ulab), and display using the MicroPython language.

5.2.1 Video Capture#

Create Camera (sensor) Output Buffer (VB): Already encapsulated in the C++ underlying implementation, no need to worry about it in the MicroPython implementation.

Set the sensor to output two channels;
One channel is for display, with the output size set to 1080p and the image format as PIXEL_FORMAT_YVU_PLANAR_420, directly bound to the VO;
The other channel is for AI computation, with the output size set to (224, 224), and the image format as PIXEL_FORMAT_RGB_888_PLANAR;

import os
from media.sensor import * # Import the sensor module to use sensor-related interfaces
from media.display import * # Import the display module to use display-related interfaces
from media.media import * # Import the media module to use media-related interfaces
from time import *
import gc

# Display output size
DISPLAY_WIDTH = ALIGN_UP(1920, 16)
DISPLAY_HEIGHT = 1080
# Size of the RGB888P image obtained by AI
OUT_RGB888P_WIDTH = ALIGN_UP(1280, 16)
OUT_RGB888P_HEIGH = 720

def sensor_init():
    # Initialize and configure the sensor
    sensor = Sensor()
    sensor.reset()
    # Set mirror
    sensor.set_hmirror(False)
    # Set flip
    sensor.set_vflip(False)
    # Directly output channel 0 to the display VO, format is YUV420
    sensor.set_framesize(width = DISPLAY_WIDTH, height = DISPLAY_HEIGHT)
    sensor.set_pixformat(PIXEL_FORMAT_YUV_SEMIPLANAR_420)
    # Output channel 2 to AI for algorithm processing, format is RGB888
    sensor.set_framesize(width = OUT_RGB888P_WIDTH, height = OUT_RGB888P_HEIGH, chn=CAM_CHN_ID_2)
    sensor.set_pixformat(PIXEL_FORMAT_RGB_888_PLANAR, chn=CAM_CHN_ID_2)
    # Bind the output of channel 0 to the display
    sensor_bind_info = sensor.bind_info(x = 0, y = 0, chn = CAM_CHN_ID_0)
    Display.bind_layer(**sensor_bind_info, layer = Display.LAYER_VIDEO1)
    # Set LT9611 display, default is 1920x1080
    Display.init(Display.LT9611, to_ide = True)
    try:
        # Initialize media
        MediaManager.init()
        # Start the sensor
        sensor.run()
        rgb888p_img = None
        while  True:
            # Capture camera data
            rgb888p_img = sensor.snapshot(chn=CAM_CHN_ID_2)
            # ************ After obtaining a frame of image, subsequent AI processing can be performed ************
            #                      ......
            # ***************************************************
    except Exception as e:
        print(f"An error occurred during running: {e}")
    finally:
        os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
        # Stop the camera output
        sensor.stop()
        # Deinitialize the display device
        Display.deinit()
        # Release media buffer
        MediaManager.deinit()
        gc.collect()
        nn.shrink_memory_pool()
    return 0

sensor_init()

5.2.2 Preprocessing#

import os
from media.sensor import * # Import the camera module to use camera-related interfaces
from media.display import * # Import the display module to use display-related interfaces
from media.media import * # Import the media module to use media-related interfaces
from time import *
import time
import nncase_runtime as nn # Import the nn module to use nn-related interfaces
import ulab.numpy as np # Import the np module to use np-related interfaces
import gc

# Display output size
DISPLAY_WIDTH = ALIGN_UP(1920, 16)
DISPLAY_HEIGHT = 1080
# Size of the RGB888P image obtained by AI
OUT_RGB888P_WIDTH = ALIGN_UP(1280, 16)
OUT_RGB888P_HEIGH = 720

def cls_test():
    print("cls_test start")
    # Initialize AI2D
    ai2d = nn.ai2d()
    # Set the input and output data format and data type for ai2d
    ai2d.set_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
    # Set resize preprocessing
    ai2d.set_resize_param(True, nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
    # Build preprocessing with input and output tensor shapes
    ai2d_builder = ai2d.build([1, 3, OUT_RGB888P_HEIGH, OUT_RGB888P_WIDTH], [1, 3, 224, 224])

    # Initialize and configure the sensor
    sensor = Sensor()
    sensor.reset()
    # Set mirror
    sensor.set_hmirror(False)
    # Set flip
    sensor.set_vflip(False)
    # Directly output channel 0 to the display VO, format is YUV420
    sensor.set_framesize(width = DISPLAY_WIDTH, height = DISPLAY_HEIGHT)
    sensor.set_pixformat(PIXEL_FORMAT_YUV_SEMIPLANAR_420)
    # Output channel 2 to AI for algorithm processing, format is RGB888
    sensor.set_framesize(width = OUT_RGB888P_WIDTH, height = OUT_RGB888P_HEIGH, chn=CAM_CHN_ID_2)
    sensor.set_pixformat(PIXEL_FORMAT_RGB_888_PLANAR, chn=CAM_CHN_ID_2)
    # Bind the output of channel 0 to the display
    sensor_bind_info = sensor.bind_info(x = 0, y = 0, chn = CAM_CHN_ID_0)
    Display.bind_layer(**sensor_bind_info, layer = Display.LAYER_VIDEO1)
    # Set LT9611 display, default is 1920x1080
    Display.init(Display.LT9611, to_ide = True)

    try:
        # Initialize media
        MediaManager.init()
        # Start the sensor
        sensor.run()
        rgb888p_img = None
        while True:
            # Capture camera data
            rgb888p_img = sensor.snapshot(chn=CAM_CHN_ID_2)
            # For RGB888 planar
            if rgb888p_img.format() == image.RGBP888:
                # Convert image to numpy.ndarray
                ai2d_input = rgb888p_img.to_numpy_ref()
                # Create tensor from numpy.ndarray
                ai2d_input_tensor = nn.from_numpy(ai2d_input)
                # Initialize preprocessing output
                data = np.ones((1, 3, 224, 224), dtype=np.uint8)
                # Convert numpy.ndarray to tensor
                ai2d_out = nn.from_numpy(data)
                # Execute preprocessing
                ai2d_builder.run(ai2d_input_tensor, ai2d_out)
                # Convert tensor to numpy.ndarray
                ai2d_out_np = ai2d_out.to_numpy()
                # Print preprocessing shape
                print(ai2d_out_np.shape)
    except Exception as e:
        print(f"An error occurred during running: {e}")
    finally:
        os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
        # Stop the camera output
        sensor.stop()
        # Deinitialize the display device
        Display.deinit()
        # Release media buffer
        MediaManager.deinit()
        gc.collect()
        time.sleep(1)
        nn.shrink_memory_pool()
    print("cls_test end")
    return 0

cls_test()

Note: For this classification model, it is necessary to first obtain the original image, resize the original image to (224, 224), and then feed it to the model. For MicroPython development, resizing can be achieved using the ai2d module.

We can also directly set the camera data resolution fed to the AI to (224, 224), as shown in the code below:

import os
from media.sensor import * # Import the camera module to use camera-related interfaces
from media.display import * # Import the display module to use display-related interfaces
from media.media import * # Import the media module to use media-related interfaces
from time import *
import time
import nncase_runtime as nn # Import the nn module to use nn-related interfaces
import ulab.numpy as np # Import the np module to use np-related interfaces
import gc

DISPLAY_WIDTH = ALIGN_UP(1920, 16)
DISPLAY_HEIGHT = 1080

OUT_RGB888P_WIDTH = ALIGN_UP(224, 16)
OUT_RGB888P_HEIGH = 224

def cls_test():
    print("cls_test start")
    # Initialize and configure the sensor
    sensor = Sensor()
    sensor.reset()
    # Set mirror
    sensor.set_hmirror(False)
    # Set flip
    sensor.set_vflip(False)
    # Directly output channel 0 to the display VO, format is YUV420
    sensor.set_framesize(width = DISPLAY_WIDTH, height = DISPLAY_HEIGHT)
    sensor.set_pixformat(PIXEL_FORMAT_YUV_SEMIPLANAR_420)
    # Output channel 2 to AI for algorithm processing, format is RGB888
    sensor.set_framesize(width = OUT_RGB888P_WIDTH, height = OUT_RGB888P_HEIGH, chn=CAM_CHN_ID_2)
    sensor.set_pixformat(PIXEL_FORMAT_RGB_888_PLANAR, chn=CAM_CHN_ID_2)
    # Bind the output of channel 0 to the display
    sensor_bind_info = sensor.bind_info(x = 0, y = 0, chn = CAM_CHN_ID_0)
    Display.bind_layer(**sensor_bind_info, layer = Display.LAYER_VIDEO1)
    # Set LT9611 display, default is 1920x1080
    Display.init(Display.LT9611, to_ide = True)

    try:
        # Initialize media
        MediaManager.init()
        # Start the sensor
        sensor.run()
        rgb888p_img = None
        while True:
            # Capture camera data
            rgb888p_img = sensor.snapshot(chn=CAM_CHN_ID_2)
            # For RGB888 planar
            if rgb888p_img.format() == image.RGBP888:
                input_img = rgb888p_img.to_numpy_ref()
                # Print shape
                print(input_img.shape)
    except Exception as e:
        print(f"An error occurred during running: {e}")
    finally:
        os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
        # Stop the camera output
        sensor.stop()
        # Deinitialize the display device
        Display.deinit()
        # Release media buffer
        MediaManager.deinit()
        gc.collect()
        time.sleep(1)
        nn.shrink_memory_pool()
    print("cls_test end")
    return 0

cls_test()

5.2.3 Inference#

import os
from media.sensor import * # Import the camera module to use camera-related interfaces
from media.display import * # Import the display module to use display-related interfaces
from media.media import * # Import the media module to use media-related interfaces
from time import *
import time
import nncase_runtime as nn # Import the nn module to use nn-related interfaces
import ulab.numpy as np # Import the np module to use np-related interfaces
import gc

# Display output size
DISPLAY_WIDTH = ALIGN_UP(1920, 16)
DISPLAY_HEIGHT = 1080
# Size of the RGB888P image obtained by AI
OUT_RGB888P_WIDTH = ALIGN_UP(1280, 16)
OUT_RGB888P_HEIGH = 720

kmodel_path = "/sdcard/app/tests/ai_test_kmodel/veg_cls.kmodel"

def cls_test():
    print("cls_test start")
    # Initialize AI2D
    ai2d = nn.ai2d()
    # Set the input and output data format and data type for ai2d
    ai2d.set_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
    # Set resize preprocessing
    ai2d.set_resize_param(True, nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
    # Build preprocessing with input and output tensor shapes
    ai2d_builder = ai2d.build([1, 3, OUT_RGB888P_HEIGH, OUT_RGB888P_WIDTH], [1, 3, 224, 224])
    # Initialize kpu
    kpu = nn.kpu()
    # Load the model
    kpu.load_kmodel(kmodel_path)
    
    # Initialize and configure the sensor
    sensor = Sensor()
    sensor.reset()
    # Set mirror
    sensor.set_hmirror(False)
    # Set flip
    sensor.set_vflip(False)
    # Directly output channel 0 to the display VO, format is YUV420
    sensor.set_framesize(width = DISPLAY_WIDTH, height = DISPLAY_HEIGHT)
    sensor.set_pixformat(PIXEL_FORMAT_YUV_SEMIPLANAR_420)
    # Output channel 2 to AI for algorithm processing, format is RGB888
    sensor.set_framesize(width = OUT_RGB888P_WIDTH, height = OUT_RGB888P_HEIGH, chn=CAM_CHN_ID_2)
    sensor.set_pixformat(PIXEL_FORMAT_RGB_888_PLANAR, chn=CAM_CHN_ID_2)
    # Bind the output of channel 0 to the display
    sensor_bind_info = sensor.bind_info(x = 0, y = 0, chn = CAM_CHN_ID_0)
    Display.bind_layer(**sensor_bind_info, layer = Display.LAYER_VIDEO1)
    # Set LT9611 display, default is 1920x1080
    Display.init(Display.LT9611, to_ide = True)

    try:
        # Initialize media
        MediaManager.init()
        # Start the sensor
        sensor.run()
        rgb888p_img = None
        
        # Initialize preprocessing output
        data = np.ones((1, 3, 224, 224), dtype=np.uint8)
        # Convert numpy.ndarray to tensor
        ai2d_out = nn.from_numpy(data)
        # Bind the output of ai2d preprocessing to the input of kmodel
        kpu.set_input_tensor(0, ai2d_out)
        while True:
            # Capture camera data
            rgb888p_img = sensor.snapshot(chn=CAM_CHN_ID_2)
            # For RGB888 planar
            if rgb888p_img.format() == image.RGBP888:
                # Convert image to numpy.ndarray
                ai2d_input = rgb888p_img.to_numpy_ref()
                # Create tensor from numpy.ndarray
                ai2d_input_tensor = nn.from_numpy(ai2d_input)
                # Execute preprocessing
                ai2d_builder.run(ai2d_input_tensor, ai2d_out)
                # Run kpu inference
                kpu.run()
                
                # Get the inference output tensor of kmodel, which may have multiple outputs, thus returning a list
                results = []
                for i in range(kpu.outputs_size()):
                    # Get the i-th output of kmodel
                    output_data = kpu.get_output_tensor(i)
                    # Convert tensor to numpy.ndarray
                    result = output_data.to_numpy()
                    # Print shape
                    print(result.shape)
                    # Add to list
                    results.append(result)
                    del output_data
                
    except Exception as e:
        print(f"An error occurred during running: {e}")
    finally:
        os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
        del ai2d_input_tensor
        del ai2d_out
        # Stop the camera output
        sensor.stop()
        # Deinitialize the display device
        Display.deinit()
        # Release media buffer
        MediaManager.deinit()
        gc.collect()
        time.sleep(1)
        nn.shrink_memory_pool()
    print("cls_test end")
    return 0

cls_test()

Note: For tensor-type data, please manually release it after allocation.

5.2.4 Post-processing#

Perform post-processing on the model results. For example, in classification, apply softmax to the output first and then use argmax to get the class index.

import os
from media.sensor import * # Import the camera module to use camera-related interfaces
from media.display import * # Import the display module to use display-related interfaces
from media.media import * # Import the media module to use media-related interfaces
from time import *
import time
import nncase_runtime as nn # Import the nn module to use nn-related interfaces
import ulab.numpy as np # Import the np module to use np-related interfaces
import gc

# Display output size
DISPLAY_WIDTH = ALIGN_UP(1920, 16)
DISPLAY_HEIGHT = 1080
# Size of the RGB888P image obtained by AI
OUT_RGB888P_WIDTH = ALIGN_UP(1280, 16)
OUT_RGB888P_HEIGH = 720

kmodel_path = "/sdcard/app/tests/ai_test_kmodel/veg_cls.kmodel"

# Softmax function
def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)

def cls_test():
    print("cls_test start")
    # Initialize AI2D
    ai2d = nn.ai2d()
    # Set the input and output data format and data type for ai2d
    ai2d.set_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
    # Set resize preprocessing
    ai2d.set_resize_param(True, nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
    # Build preprocessing with input and output tensor shapes
    ai2d_builder = ai2d.build([1, 3, OUT_RGB888P_HEIGH, OUT_RGB888P_WIDTH], [1, 3, 224, 224])
    # Initialize kpu
    kpu = nn.kpu()
    # Load the model
    kpu.load_kmodel(kmodel_path)
    
    # Initialize and configure the sensor
    sensor = Sensor()
    sensor.reset()
    # Set mirror
    sensor.set_hmirror(False)
    # Set flip
    sensor.set_vflip(False)
    # Directly output channel 0 to the display VO, format is YUV420
    sensor.set_framesize(width = DISPLAY_WIDTH, height = DISPLAY_HEIGHT)
    sensor.set_pixformat(PIXEL_FORMAT_YUV_SEMIPLANAR_420)
    # Output channel 2 to AI for algorithm processing, format is RGB888
    sensor.set_framesize(width = OUT_RGB888P_WIDTH, height = OUT_RGB888P_HEIGH, chn=CAM_CHN_ID_2)
    sensor.set_pixformat(PIXEL_FORMAT_RGB_888_PLANAR, chn=CAM_CHN_ID_2)
    # Bind the output of channel 0 to the display
    sensor_bind_info = sensor.bind_info(x = 0, y = 0, chn = CAM_CHN_ID_0)
    Display.bind_layer(**sensor_bind_info, layer = Display.LAYER_VIDEO1)
    # Set LT9611 display, default is 1920x1080
    Display.init(Display.LT9611, to_ide = True)

    try:
        # Initialize media
        MediaManager.init()
        # Start the sensor
        sensor.run()
        rgb888p_img = None
        
        # Initialize preprocessing output
        data = np.ones((1, 3, 224, 224), dtype=np.uint8)
        # Convert numpy.ndarray to tensor
        ai2d_out = nn.from_numpy(data)
        # Bind the output of ai2d preprocessing to the input of kmodel
        kpu.set_input_tensor(0, ai2d_out)
        while True:
            # Capture camera data
            rgb888p_img = sensor.snapshot(chn=CAM_CHN_ID_2)
            # For RGB888 planar
            if rgb888p_img.format() == image.RGBP888:
                # Convert image to numpy.ndarray
                ai2d_input = rgb888p_img.to_numpy_ref()
                # Create tensor from numpy.ndarray
                ai2d_input_tensor = nn.from_numpy(ai2d_input)
                # Execute preprocessing
                ai2d_builder.run(ai2d_input_tensor, ai2d_out)
                # Run kpu inference
                kpu.run()
                
                # Get the inference output tensor of kmodel, which may have multiple outputs, thus returning a list
                results = []
                for i in range(kpu.outputs_size()):
                    # Get the i-th output of kmodel
                    output_data = kpu.get_output_tensor(i)
                    # Convert tensor to numpy.ndarray
                    result = output_data.to_numpy()
                    # Add to list
                    results.append(result)
                    del output_data
                # ****************** Post-processing ********************
                # softmax + argmax
                softmax_res = softmax(results[0][0])
                res_idx = np.argmax(softmax_res)
                # Print class index
                print(res_idx)
    except Exception as e:
        print(f"An error occurred during running: {e}")
    finally:
        os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
        del ai2d_input_tensor
        del ai2d_out
        # Stop the camera output
        sensor.stop()
        # Deinitialize the display device
        Display.deinit()
        # Release media buffer
        MediaManager.deinit()
        gc.collect()
        time.sleep(1)
        nn.shrink_memory_pool()
    print("cls_test end")
    return 0

cls_test()

5.2.5 Display#

Display (Video Output, VO) and Display Related Settings. This section briefly introduces the overall process of display settings, resource release, and display overlay based on MicroPython.

Display Settings: Set display size and format.
Display Overlay: The display consists of 2 layers. The lower layer (original image layer) directly displays the camera output, while the upper layer (OSD layer) is used for drawing boxes, points, writing text, etc.
Resource Release: Release resources related to the display.

import os
from media.sensor import *  # Import camera module to use camera-related interfaces
from media.display import *  # Import display module to use display-related interfaces
from media.media import *  # Import media module to use media-related interfaces
from time import *
import time
import nncase_runtime as nn  # Import nn module to use nn-related interfaces
import ulab.numpy as np  # Import np module to use np-related interfaces
import gc

# Display output size
DISPLAY_WIDTH = ALIGN_UP(1920, 16)
DISPLAY_HEIGHT = 1080
# Size of RGB888P image obtained by AI
OUT_RGB888P_WIDTH = ALIGN_UP(1280, 16)
OUT_RGB888P_HEIGH = 720

kmodel_path = "/sdcard/app/tests/ai_test_kmodel/veg_cls.kmodel"
labels = ["Spinach", "Elongated Eggplant", "Red Amaranth", "Carrot", "Tomato", "Broccoli"]

# Softmax function
def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)


def cls_test():
    print("cls_test start")
    # Initialize AI2D
    ai2d = nn.ai2d()
    # Set ai2d input and output data format and data type
    ai2d.set_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
    # Set resize preprocessing
    ai2d.set_resize_param(True, nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
    # Build preprocessing with input and output tensor shapes
    ai2d_builder = ai2d.build([1, 3, OUT_RGB888P_HEIGH, OUT_RGB888P_WIDTH], [1, 3, 224, 224])
    # Initialize kpu
    kpu = nn.kpu()
    # Load the model
    kpu.load_kmodel(kmodel_path)

    # Initialize and configure sensor
    sensor = Sensor()
    sensor.reset()
    # Set mirror
    sensor.set_hmirror(False)
    # Set flip
    sensor.set_vflip(False)
    # Channel 0 directly to display VO, format YUV420
    sensor.set_framesize(width=DISPLAY_WIDTH, height=DISPLAY_HEIGHT)
    sensor.set_pixformat(PIXEL_FORMAT_YUV_SEMIPLANAR_420)
    # Channel 2 for AI processing, format RGB888
    sensor.set_framesize(width=OUT_RGB888P_WIDTH, height=OUT_RGB888P_HEIGH, chn=CAM_CHN_ID_2)
    sensor.set_pixformat(PIXEL_FORMAT_RGB_888_PLANAR, chn=CAM_CHN_ID_2)
    # Bind channel 0 output to VO
    sensor_bind_info = sensor.bind_info(x=0, y=0, chn=CAM_CHN_ID_0)
    Display.bind_layer(**sensor_bind_info, layer=Display.LAYER_VIDEO1)
    # Initialize LT9611 display, default 1920x1080
    Display.init(Display.LT9611, to_ide=True)

    # Create OSD image
    osd_img = image.Image(DISPLAY_WIDTH, DISPLAY_HEIGHT, image.ARGB8888)

    try:
        # Initialize media
        MediaManager.init()
        # Start sensor
        sensor.run()
        rgb888p_img = None

        # Initialize preprocessing output
        data = np.ones((1, 3, 224, 224), dtype=np.uint8)
        # Convert numpy.ndarray to tensor
        ai2d_out = nn.from_numpy(data)
        # Bind ai2d preprocessing output to kmodel input, sharing the same tensor
        kpu.set_input_tensor(0, ai2d_out)
        
        while True:
            # Capture camera data
            rgb888p_img = sensor.snapshot(chn=CAM_CHN_ID_2)
            # for rgb888planar
            if rgb888p_img.format() == image.RGBP888:
                #*************************Preprocessing***********************
                # Convert image to numpy.ndarray
                ai2d_input = rgb888p_img.to_numpy_ref()
                # Create tensor from numpy.ndarray
                ai2d_input_tensor = nn.from_numpy(ai2d_input)
                # Execute preprocessing
                ai2d_builder.run(ai2d_input_tensor, ai2d_out)
                #************************Model Inference**********************
                # Run kpu inference
                kpu.run()

                # Get kmodel output tensor, which may have multiple outputs, hence returned as a list
                results = []
                for i in range(kpu.outputs_size()):
                    # Get the i-th output of kmodel
                    output_data = kpu.get_output_tensor(i)
                    # Convert tensor to numpy.ndarray
                    result = output_data.to_numpy()
                    # Append to results list
                    results.append(result)
                    del output_data
                #******************Post-processing********************
                # softmax + argmax
                softmax_res = softmax(results[0][0])
                res_idx = np.argmax(softmax_res)
                #******************Display*********************
                # Class name
                label = labels[res_idx]
                # Class score
                score = softmax_res[res_idx]
                # Draw text
                text = label + " " + str(score)
                # Clear osd_img
                osd_img.clear()
                # Draw text at position (5, 5), size 32, color (0, 255, 0)
                osd_img.draw_string_advanced(5, 5, 32, text, color=(0, 255, 0))
                Display.show_image(osd_img, 0, 0, Display.LAYER_OSD3)
    except Exception as e:
        print(f"An error occurred during running: {e}")
    finally:
        os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
        del ai2d_input_tensor
        del ai2d_out
        del kpu
        # Stop camera output
        sensor.stop()
        # Deinitialize display device
        Display.deinit()
        # Deinitialize media buffer
        MediaManager.deinit()
        gc.collect()
        time.sleep(1)
        nn.shrink_memory_pool()
    print("cls_test end")
    return 0

cls_test()

5.2.6 Resource Release#

The memory in the main core includes two parts: system memory and GC memory. The former is mainly used for models and some functions within the system, including camera and screen buffers, kmodel, and its inputs and outputs (mmz, released with del); the latter is memory allocated at the interpreter level and can be used for code variables (released with gc.collect()).

1. GC Resource Release:

#classification.py
import gc
import os
 
def func_a():
    a = []
    for i in range(10000):
        a.append(i)
 
func_a()
print(gc.mem_free() / 1024 / 1024) #stack mem
print(gc.mem_alloc() / 1024 / 1024)
gc.collect()
print(gc.mem_alloc() / 1024 / 1024)

2. System Memory Release:

The kmodel module part, kmodel’s input_tensor, output_tensor, and the kmodel itself are all memory allocated by mmz and need to be manually released.

del ai2d_input_tensor
del ai2d_out
del kpu
nn.shrink_memory_pool() # To avoid missing any `del`, traverse all kmodel-related memory and release them

3. Camera, Display, and Media Buffer Release:

# Stop camera output
sensor.stop()
# Deinitialize display device
Display.deinit()
# Release media buffer
MediaManager.deinit()

5.2.7 Complete Code#

Here’s the complete translated code with comments explaining each step, including the encapsulation of AI development steps using classes located in /sdcard/app/libs. Please refer: AI Demo说明文档 — K230 CanMV 文档 (canaan-creative.com)

import os
from media.sensor import *  # Import camera module to use camera-related interfaces
from media.display import *  # Import display module to use display-related interfaces
from media.media import *  # Import media module to use media-related interfaces
from time import *
import time
import nncase_runtime as nn  # Import nn module to use nn-related interfaces
import ulab.numpy as np  # Import np module to use np-related interfaces
import gc

# Display output size
DISPLAY_WIDTH = ALIGN_UP(1920, 16)
DISPLAY_HEIGHT = 1080
# Size of RGB888P image obtained by AI
OUT_RGB888P_WIDTH = ALIGN_UP(1280, 16)
OUT_RGB888P_HEIGH = 720

kmodel_path = "/sdcard/app/tests/ai_test_kmodel/veg_cls.kmodel"
labels = ["Spinach", "Elongated Eggplant", "Red Amaranth", "Carrot", "Tomato", "Broccoli"]

# Softmax function
def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)


def cls_test():
    print("cls_test start")
    # Initialize AI2D
    ai2d = nn.ai2d()
    # Set ai2d input and output data format and data type
    ai2d.set_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)
    # Set resize preprocessing
    ai2d.set_resize_param(True, nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)
    # Build preprocessing with input and output tensor shapes
    ai2d_builder = ai2d.build([1, 3, OUT_RGB888P_HEIGH, OUT_RGB888P_WIDTH], [1, 3, 224, 224])
    # Initialize kpu
    kpu = nn.kpu()
    # Load the model
    kpu.load_kmodel(kmodel_path)

    # Initialize and configure sensor
    sensor = Sensor()
    sensor.reset()
    # Set mirror
    sensor.set_hmirror(False)
    # Set flip
    sensor.set_vflip(False)
    # Channel 0 directly to display VO, format YUV420
    sensor.set_framesize(width=DISPLAY_WIDTH, height=DISPLAY_HEIGHT)
    sensor.set_pixformat(PIXEL_FORMAT_YUV_SEMIPLANAR_420)
    # Channel 2 for AI processing, format RGB888
    sensor.set_framesize(width=OUT_RGB888P_WIDTH, height=OUT_RGB888P_HEIGH, chn=CAM_CHN_ID_2)
    sensor.set_pixformat(PIXEL_FORMAT_RGB_888_PLANAR, chn=CAM_CHN_ID_2)
    # Bind channel 0 output to VO
    sensor_bind_info = sensor.bind_info(x=0, y=0, chn=CAM_CHN_ID_0)
    Display.bind_layer(**sensor_bind_info, layer=Display.LAYER_VIDEO1)
    # Initialize LT9611 display, default 1920x1080
    Display.init(Display.LT9611, to_ide=True)

    # Create OSD image
    osd_img = image.Image(DISPLAY_WIDTH, DISPLAY_HEIGHT, image.ARGB8888)

    try:
        # Initialize media
        MediaManager.init()
        # Start sensor
        sensor.run()
        rgb888p_img = None

        # Initialize preprocessing output
        data = np.ones((1, 3, 224, 224), dtype=np.uint8)
        # Convert numpy.ndarray to tensor
        ai2d_out = nn.from_numpy(data)
        # Bind ai2d preprocessing output to kmodel input, sharing the same tensor
        kpu.set_input_tensor(0, ai2d_out)
        
        while True:
            # Capture camera data
            rgb888p_img = sensor.snapshot(chn=CAM_CHN_ID_2)
            # for rgb888planar
            if rgb888p_img.format() == image.RGBP888:
                #*************************Preprocessing***********************
                # Convert image to numpy.ndarray
                ai2d_input = rgb888p_img.to_numpy_ref()
                # Create tensor from numpy.ndarray
                ai2d_input_tensor = nn.from_numpy(ai2d_input)
                # Execute preprocessing
                ai2d_builder.run(ai2d_input_tensor, ai2d_out)
                #************************Model Inference**********************
                # Run kpu inference
                kpu.run()

                # Get kmodel output tensor, which may have multiple outputs, hence returned as a list
                results = []
                for i in range(kpu.outputs_size()):
                    # Get the i-th output of kmodel
                    output_data = kpu.get_output_tensor(i)
                    # Convert tensor to numpy.ndarray
                    result = output_data.to_numpy()
                    # Append to results list
                    results.append(result)
                    del output_data
                #******************Post-processing********************
                # softmax + argmax
                softmax_res = softmax(results[0][0])
                res_idx = np.argmax(softmax_res)
                #******************Display*********************
                # Class name
                label = labels[res_idx]
                # Class score
                score = softmax_res[res_idx]
                # Draw text
                text = label + " " + str(score)
                # Clear osd_img
                osd_img.clear()
                # Draw text at position (5, 5), size 32, color (0, 255, 0)
                osd_img.draw_string_advanced(5, 5, 32, text, color=(0, 255, 0))
                Display.show_image(osd_img, 0, 0, Display.LAYER_OSD3)
    except Exception as e:
        print(f"An error occurred during running: {e}")
    finally:
        os.exitpoint(os.EXITPOINT_ENABLE_SLEEP)
        del ai2d_input_tensor
        del ai2d_out
        del kpu
        # Stop camera output
        sensor.stop()
        # Deinitialize display device
        Display.deinit()
        # Deinitialize media buffer
        MediaManager.deinit()
        gc.collect()
        time.sleep(1)
        nn.shrink_memory_pool()
    print("cls_test end")
    return 0

cls_test()

5. Quick Start Guide for K230 AI Inference Process

Contents