K230 RVV Practical Guide#
Overview#
RVV (RISC-V Vector Extension) is the vector extension of the RISC-V instruction set architecture. RISC-V is an open instruction set architecture known for its simplicity, scalability, and wide range of applications. RVV is an optional extension that adds vector processing and data-parallel computing. It defines a set of instructions that operate on vectors, so a single instruction can process multiple data elements at once, which improves computational efficiency and throughput. A loop that handles one array element per iteration can therefore be replaced by far fewer vector instructions. The vector register length is chosen by the implementation rather than fixed by the specification, and software configures the element width and the number of elements to process at run time. RVV supports integer, floating-point, and fixed-point data types.
RVV gives processors vector processing and parallel computing capabilities that accelerate applications such as image processing, signal processing, machine learning, and scientific computing. Because the extension is open and scalable, manufacturers and developers can customize and optimize for their own needs. The K230 has two XuanTie C908 cores, and the big core supports the RVV 1.0 extension. This document describes how to use RVV on the big core, which runs RT-Smart, and how to observe the speedup it provides.
Environment Preparation#
Hardware Environment#
K230-UNSIP-LP3-EVB-V1.0/K230-UNSIP-LP3-EVB-V1.1
Software Environment#
k230_SDK
Using RVV Functionality#
Source Code Writing#
To experience the advantages of RVV in vector computation, we will use image scaling as an application scenario and write a demo. Create a folder in the following path of the SDK:
k230_sdk/src/big/rt-smart/userapps/testcases/scale
Note that running make or make rt-smart in the k230_sdk directory may overwrite your changes. To avoid losing them, you can copy the completed source code to the following directory:
k230_sdk/src/big/unittest/testcases
Source Code#
Create a C source file named scale.c in the scale directory:
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#pragma pack(push, 1)
typedef struct {
    unsigned short signature;
    unsigned int fileSize;
    unsigned short reserved1;
    unsigned short reserved2;
    unsigned int dataOffset;
} BitmapFileHeader;
typedef struct {
    unsigned int headerSize;
    int width;
    int height;
    unsigned short planes;
    unsigned short bitsPerPixel;
    unsigned int compression;
    unsigned int imageSize;
    int xPixelsPerMeter;
    int yPixelsPerMeter;
    unsigned int colorsUsed;
    unsigned int colorsImportant;
} BitmapInfoHeader;
#pragma pack(pop)
void __attribute__((optimize(3)))
scaleBMP(const char* inputPath, const char* outputPath, float scaleFactor) {
    // Read input BMP file
    FILE* inputFile = fopen(inputPath, "rb");
    if (inputFile == NULL) {
        printf("Failed to open input BMP file.\n");
        return;
    }
    BitmapFileHeader fileHeader;
    BitmapInfoHeader infoHeader;
    if (fread(&fileHeader, sizeof(BitmapFileHeader), 1, inputFile) != 1 ||
        fread(&infoHeader, sizeof(BitmapInfoHeader), 1, inputFile) != 1) {
        printf("Failed to read BMP headers.\n");
        fclose(inputFile);
        return;
    }
    int originalWidth = infoHeader.width;
    int originalHeight = infoHeader.height;
    int originalImageSize = infoHeader.imageSize;
    unsigned char* originalImageData = (unsigned char*) malloc(originalImageSize);
    if (originalImageData == NULL ||
        fread(originalImageData, originalImageSize, 1, inputFile) != 1) {
        printf("Failed to read BMP pixel data.\n");
        free(originalImageData);
        fclose(inputFile);
        return;
    }
    fclose(inputFile);
    // Calculate scaled image dimensions
    int scaledWidth = (int)(originalWidth * scaleFactor);
    int scaledHeight = (int)(originalHeight * scaleFactor);
    int scaledImageSize = scaledWidth * scaledHeight * 3;
    // Create output BMP file
    FILE* outputFile = fopen(outputPath, "wb");
    if (outputFile == NULL) {
        printf("Failed to create output BMP file.\n");
        free(originalImageData);
        return;
    }
    // Update BMP file header information
    fileHeader.fileSize = sizeof(BitmapFileHeader) + sizeof(BitmapInfoHeader) + scaledImageSize;
    infoHeader.width = scaledWidth;
    infoHeader.height = scaledHeight;
    infoHeader.imageSize = scaledImageSize;
    fwrite(&fileHeader, sizeof(BitmapFileHeader), 1, outputFile);
    fwrite(&infoHeader, sizeof(BitmapInfoHeader), 1, outputFile);
    clock_t start, finish;
    start = clock();
    // Scale image data (nearest-neighbor sampling, 3 bytes per pixel)
    unsigned char* scaledImageData = (unsigned char*) malloc(scaledImageSize);
    if (scaledImageData == NULL) {
        printf("Failed to allocate the scaled image buffer.\n");
        fclose(outputFile);
        free(originalImageData);
        return;
    }
    for (int y = 0; y < scaledHeight; y++) {
        for (int x = 0; x < scaledWidth; x++) {
            int originalX = (int)(x / scaleFactor);
            int originalY = (int)(y / scaleFactor);
            scaledImageData[(y * scaledWidth + x) * 3 + 0] = originalImageData[(originalY * originalWidth + originalX) * 3 + 0];
            scaledImageData[(y * scaledWidth + x) * 3 + 1] = originalImageData[(originalY * originalWidth + originalX) * 3 + 1];
            scaledImageData[(y * scaledWidth + x) * 3 + 2] = originalImageData[(originalY * originalWidth + originalX) * 3 + 2];
        }
    }
    finish = clock();
    // Note: ticks / CLOCKS_PER_SEC is a value in seconds, although the label reads "ms"
    printf("scale calc use time:%f ms\n", (double)(finish - start) / CLOCKS_PER_SEC);
    // Write the scaled image data
    fwrite(scaledImageData, scaledImageSize, 1, outputFile);
    fclose(outputFile);
    free(originalImageData);
    free(scaledImageData);
    printf("BMP image scaling completed.\n");
}
int main(void)
{
    const char* inputPath = "input.bmp";
    const char* outputPath = "output.bmp";
    float scaleFactor = 0.5f; // Scaling factor
    scaleBMP(inputPath, outputPath, scaleFactor);
    return 0;
}
SCONS Configuration File#
Create the SConscript file:
# RT-Thread building script for component
from building import *
cwd = GetCurrentDir()
src = Glob('*.c')
CPPPATH = [cwd]
CPPDEFINES = [
    'HAVE_CCONFIG_H',
]
group = DefineGroup('scale', src, depend = [''], CPPPATH = CPPPATH, CPPDEFINES = CPPDEFINES)
Return('group')
Create the SConstruct file:
import os
import sys
# add building.py path
sys.path = sys.path + [os.path.join('..','..','..','tools')]
from building import *
BuildApplication('scale', 'SConscript', usr_root = '../../..')
Compilation#
Enter the src/big/rt-smart directory and run source smart-env.sh riscv64 to configure the environment variables.
$ source smart-env.sh riscv64
Arch => riscv64
CC => gcc
PREFIX => riscv64-unknown-linux-musl-
EXEC_PATH => /home/testUser/k230_sdk/src/big/rt-smart/../../../toolchain/riscv64-linux-musleabi_for_x86_64-pc-linux-gnu/bin
Enter the userapps/testcases/scale directory and run scons --release to compile.
$ cd userapps/testcases/scale
$ scons --release
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: building associated VariantDir targets: build/scale
CC build/scale/scale.o
LINK scale.elf
/home/haohaibo/work/k230_sdk/toolchain/riscv64-linux-musleabi_for_x86_64-pc-linux-gnu/bin/../lib/gcc/riscv64-unknown-linux-musl/12.0.1/../../../../riscv64-unknown-linux-musl/bin/ld: warning: scale.elf has a LOAD segment with RWX permissions
scons: done building targets.
Rename the compiled program, which was built with the V extension enabled:
mv scale.elf scale_with_rvv.elf
Edit the k230_sdk/src/big/rt-smart/tools/riscv64.py file and remove the v extension from the -march compilation option:
$ git diff
diff --git a/tools/riscv64.py b/tools/riscv64.py
index 16fc9b2..c045bf5 100644
--- a/tools/riscv64.py
+++ b/tools/riscv64.py
@@ -44,7 +44,7 @@ class ARCHRISCV64():
EXT_CFLAGS = ''
EXT_LFLAGS = ''
- DEVICE = ' -mcmodel=medany -march=rv64imafdcv -mabi=lp64d'
+ DEVICE = ' -mcmodel=medany -march=rv64imafdc -mabi=lp64d'
self.CFLAGS = configuration.get('CFLAGS', DEVICE + ' -Werror -Wall' + EXT_CFLAGS)
self.AFLAGS = configuration.get('AFLAGS', ' -c' + DEVICE + ' -x assembler-with-cpp -D__ASSEMBLY__ -I.' + EXT_CFLAGS)
LINK_SCRIPT = configuration.get('LINK_SCRIPT', os.path.join(USR_ROOT, 'linker_scripts', 'riscv64', 'link.lds'))
Recompile the source code to produce a scale.elf built without the V extension, then copy both scale.elf and scale_with_rvv.elf to the sharefs directory for execution.
Execution#
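Prepare a 24-bit BMP image named input.bmp (for example, by saving one from a drawing program on a PC) and place it in the same directory as the programs. The demo does not handle BMP row padding, so use an image whose width is a multiple of 8; both the original and the half-size rows then need no padding. Then run both programs on the big core through sharefs.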
msh /sharefs>scale.elf
scale calc use time:0.013952 ms
BMP image scaling completed.
msh /sharefs>scale.elf
scale calc use time:0.013960 ms
BMP image scaling completed.
msh /sharefs>scale.elf
scale calc use time:0.013941 ms
BMP image scaling completed.
msh /sharefs>scale.elf
scale calc use time:0.013936 ms
BMP image scaling completed.
msh /sharefs>scale.elf
scale calc use time:0.013945 ms
BMP image scaling completed.
msh /sharefs>scale.elf
scale calc use time:0.013957 ms
BMP image scaling completed.
msh /sharefs>scale_with_rvv.elf
scale calc use time:0.010139 ms
BMP image scaling completed.
msh /sharefs>scale_with_rvv.elf
scale calc use time:0.010133 ms
BMP image scaling completed.
msh /sharefs>scale_with_rvv.elf
scale calc use time:0.010135 ms
BMP image scaling completed.
msh /sharefs>scale_with_rvv.elf
scale calc use time:0.010144 ms
BMP image scaling completed.
msh /sharefs>scale_with_rvv.elf
scale calc use time:0.010142 ms
BMP image scaling completed.
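The timings show that the build with the V extension is consistently faster: about 0.01014 versus 0.01395 on average, which is roughly 27% less time (a speedup of about 1.38x). Note that the program divides the clock ticks by CLOCKS_PER_SEC without further scaling, so the printed values are in seconds even though the label reads ms; the runs therefore take about 13.9 ms without and 10.1 ms with the V extension. At higher image resolutions such as 4K, the absolute time difference should be larger.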
Disassembly#
You can use the objdump tool in the source directory to disassemble the program and confirm that vector instructions were generated:
$ riscv64-unknown-linux-musl-objdump scale_with_rvv.elf -S | grep 'vadd'
200002c7e:   03bf4dd7    vadd.vx v27,v27,t5
200002c94:   03834c57    vadd.vx v24,v24,t1
$ riscv64-unknown-linux-musl-objdump scale_with_rvv.elf -S | grep 'vmul'
200002c98:   97856c57    vmul.vx v24,v24,a0