Five EmbedDev

An Embedded RISC-V Blog

Introduction

This post is about using C++ coroutines to suspend and resume functions in real time. The objective is a simple way of building real time tasks using only C++, without the need for an RTOS or operating system kernel.

Coroutines are functions that can be suspended and resumed, using the keywords co_await, co_yield and co_return. The C++20 standard introduced coroutines to the language.

C++ standardized the keywords and type concepts for coroutines, but it did not standardize a runtime [1]. The lack of a standard runtime has made them hard to use “out of the box”, but the implementation of coroutines is very adaptable to different use cases.

Project X

Here I use a simple runtime implementing C++20 coroutines on bare metal (no operating system) for RISC-V, using the co_await keyword. This is done by passing the real time scheduler and resume time condition as the argument to the asynchronous wait operator.

The runtime is described in detail in this post.

This story is also published on Medium.

Why coroutines?

I’m interested in coroutines for the following benefits:

A software timer example

This article will build a simple software timer example. The function has a loop that pauses for a number of microseconds before iterating again. While the loop is paused, control flow returns to the calling function.

A simple coroutine task

A simple task, periodic, is defined in example_simple.cpp. It takes scheduler, period and resume_count as arguments and asynchronously waits period microseconds for 10 iterations, updating the resume_count value each iteration.

The scheduler passed as an argument is not strictly necessary for C++ coroutines, but is used to make the ownership of the context of each task explicit. (It could be possible to use a global scheduler, such as when implementing via OS threads.)

The task returns nop_task. This is a special structure that is linked to the coroutines implementation. In this case a “nop task” refers to a task that does not return a value via co_return.

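template<typename SCHEDULER>
nop_task periodic(
    SCHEDULER& scheduler,
    std::chrono::microseconds period,
    volatile uint32_t& resume_count) {
    driver::timer<> mtimer;
    for (auto i = 0; i < 10; i++) {
        // Suspend here; the scheduler resumes this task after `period` has elapsed
        co_await scheduled_delay{ scheduler, period };
        // Record the resume time (timestamp_resume is not defined in this function)
        *timestamp_resume[resume_count] = mtimer.get_time<driver::timer<>::timer_ticks>().count();
        // Publish the number of completed iterations
        resume_count = i + 1;
    }
    co_return; // Not strictly needed
}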

The function has the following behavior:

The following sequence diagram shows an abstract coroutine execution where an abstracted OS exists to handle the scheduling of process execution.

Task Sequence

Calling the simple coroutine task

The example_simple() function in example_simple.cpp calls the periodic function once, with 100ms as the period value.

The scheduler_delay<mtimer_clock> is a scheduler class that will manage the software timer to wake each coroutine at the appropriate time, using our RISC-V machine mode timer driver mtimer.

    driver::timer<> mtimer;
    // Class to manage timer coroutines
    scheduler_delay<mtimer_clock> scheduler;
    // Start one periodic task. It runs until its first co_await, then returns here.
    auto t0 = periodic(scheduler, 100ms, resume_simple);

Resuming the coroutine tasks

For this example the scheduler is an object instantiated in the example_simple() function. It needs to be called explicitly to calculate when each coroutine needs to be woken and resumed. This is a convention of the runtime for this example, and not a required convention for C++ coroutines.

The tasks are resumed in the WFI busy loop of example_simple() when scheduler.update() is called. However, as the scheduler is just a C++ class, this can be called from other locations, such as a timer interrupt handler.

    do {
        // Get the delay until the next coroutine wake-up
        schedule_by_delay<mtimer_clock> now;
        auto [pending, next_wake] = scheduler.update(now);
        if (pending) {
            // Program the timer compare for the next wake-up
            mtimer.set_time_cmp(next_wake->delay());
            // Globally disable interrupts, then enable the timer interrupt
            riscv::csrs.mstatus.mie.clr();
            riscv::csrs.mie.mti.set();
            // WFI should be called while interrupts are disabled
            // to ensure interrupt enable and WFI is atomic.
            core.wfi();
        }
    } while (true);

Because the IRQ handler in this example is a lambda function, it could also capture the scheduler and resume the timer coroutines from within the IRQ handler.

    static const auto handler = [&](void) {
        ...
        schedule_by_delay<mtimer_clock> now;
        auto [pending, next_wake] = scheduler.update(now);
    };
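One way a capturing lambda can end up behind a plain function pointer, as an interrupt vector entry would need, is sketched below. This is an illustration under assumptions, not the post's interrupt setup: `toy_scheduler`, `irq_entry` and `run_irq_demo` are hypothetical names, and calling `entry()` directly stands in for the hardware taking an interrupt.

```cpp
#include <cassert>

// Hypothetical scheduler that only counts how often it was updated.
struct toy_scheduler {
    int updates = 0;
    void update() { ++updates; }
};

// A plain function pointer, the kind of entry an interrupt vector can hold.
using irq_entry = void (*)();

int run_irq_demo() {
    toy_scheduler scheduler;

    // The handler captures the scheduler by reference. Because it has static
    // storage duration it is initialized only once, so it keeps referring to
    // the scheduler of the first call; this sketch calls the function once.
    static const auto handler = [&]() { scheduler.update(); };

    // A capture-less lambda may use a static variable without capturing it,
    // so it converts to a plain function pointer.
    irq_entry entry = []() { handler(); };

    entry();  // stands in for the first timer interrupt
    entry();  // stands in for the second timer interrupt
    return scheduler.updates;
}
```

The same caveat applies on a real target: the captured scheduler has to outlive every interrupt that can call the handler.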

Building and running with PlatformIO

The example can be built and run using PlatformIO. The default RISC-V platforms use an old version of GCC that does not support C++20, so a custom virtual platform configured to use xPack 12.2.0-3 riscv-none-elf-gcc and run on QEMU has been created in platformio/platforms/virt_riscv.

build_flags = 
    -std=c++20
    -O2
    -g
    -Wall 
    -ffunction-sections 
    -fcoroutines
    -fno-exceptions 
    -fno-rtti 
    -fno-nonansi-builtins 
    -fno-use-cxa-atexit 
    -fno-threadsafe-statics
    -nostartfiles 
    -Wl,-Map,c-hardware-access-riscv.map

The debug sequence shows entering the function example_simple(), initializing scheduler_delay<mtimer_clock> scheduler; then calling periodic(scheduler, 100ms, resume_simple);.

Once the statement co_await scheduled_delay{ scheduler, period }; is reached the context returns to example_simple(). Then when auto [pending, next_wake] = scheduler.resume(now); is called it returns to the for loop in periodic().

The coroutine handle is stored in the scheduler class by the first call to co_await. The following call to scheduler.resume() looks up the pending coroutine handle and calls resume on the handle.

Debug Sequence

The stack of the coroutine periodic() before resume can be seen below. It’s called from example_simple().

Debug Stack - coro

The stack of the coroutine periodic() after resume can be seen below. It’s called from coroutine_handle::resume, which is called from scheduler_ordered::resume.

Debug Stack - coro

The frame of the example_simple() function that calls resume() is on the same stack.

Debug Stack - main

Building with CMake and running with Spike

The Makefile has targets to build with CMake.

$ make target
cmake \
        -DCMAKE_TOOLCHAIN_FILE=cmake/riscv.cmake \
        -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
        -B build_target \
        -S .
cmake --build build_target --verbose
 

The Makefile also has targets to simulate and trace with the standard RISC-V ISA simulator, spike. A fork of spike with VCD tracing is used; it is provided in a Docker container.

$ make spike_sim
docker run \
        -it \
        --rm \
        -v .:/project \
        fiveembeddev/forked_riscv_spike_dev_env:latest  \
        /opt/riscv-isa-sim/bin/spike \
        --log=spike_sim.log \
        --isa=rv32imac_zicsr \
        -m0x8000000:0x2000,0x80000000:0x4000,0x20010000:0x6a120 \
        --priv=m \
        --pc=0x20010000 \
        --vcd-log=spike_sim.vcd \
        --max-cycles=10000000  \
        --trace-var=timestamp_simple --trace-var=timestamp_resume --trace-var=timestamp_resume_0 --trace-var=timestamp_resume_1 --trace-var=timestamp_resume_2 --trace-var=timestamp_resume_3 --trace-var=timestamp_resume_4 --trace-var=timestamp_resume_5 --trace-var=timestamp_resume_6 --trace-var=timestamp_resume_7 --trace-var=timestamp_resume_8 --trace-var=timestamp_resume_9 --trace-var=resume_simple \
        build_target/src/main.elf
docker run \
     --rm \
    -v .:/project \
     fiveembeddev/riscv_gtkwave_base:latest \
     vcd2fst spike_sim.vcd spike_sim.fst

The results can be viewed with GTKWave. The GTKWave savefile includes address decode and opcode decode by using docker images containing the decoders.

gtkwave spike_sim.fst  spike_sim.gtkw

The benefit of tracing results from the ISA simulator is that it is easy to confirm the periodic timing of the coroutine. (For this example the parameter to periodic() was changed to 1ms.)

The periodic write to resume_count and timestamp_resume is traced to VCD so the exact timing of the coroutine execution is visible.

Debug GTKWave Trace

Using GTKWave the context switch can also be examined in detail. With the fake 1 GHz clock used by spike, the context switch takes 104 ns.
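The time figure depends entirely on the clock frequency assumed for the simulator. A small sketch of the conversion follows, assuming one simulated cycle per clock tick; the helper names `cycles_at` and `cycles_to_ns` are hypothetical, and the 100 MHz case is purely illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// A duration counted in clock cycles at a given frequency in Hz.
template <std::intmax_t Hz>
using cycles_at = std::chrono::duration<std::int64_t, std::ratio<1, Hz>>;

// Convert a cycle count at frequency Hz to whole nanoseconds.
template <std::intmax_t Hz>
constexpr std::int64_t cycles_to_ns(std::int64_t cycles) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(cycles_at<Hz>{ cycles }).count();
}

// At 1 GHz one cycle is one nanosecond, so 104 cycles are 104 ns.
static_assert(cycles_to_ns<1'000'000'000>(104) == 104);
// The same cycle count on a 100 MHz clock would take ten times as long.
static_assert(cycles_to_ns<100'000'000>(104) == 1040);
```

The cycle count is the portable part of the measurement; the nanosecond value only applies to the assumed clock.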

Debug GTKWave Trace Detail

Coroutine runtime

A more detailed post will follow to describe the runtime. The runtime for this example is in the header embeddev_coro.hpp, and it uses the embeddev_riscv.hpp header to provide a simple HAL for RISC-V and host emulation.

Summary

This post describes a simple working example of how to use C++ coroutines in an embedded context. The example and context are not meant to be a realistic use case, but the simplest possible use case that involves an interrupt handler and a context switch.

However, the example can be built on to explore portable and lightweight asynchronous programming techniques. Future posts will look at that topic.

  1. The C++23 standard library provides a limited runtime for coroutine generators (std::generator).