Five EmbedDev

An Embedded RISC-V Blog

Intro & References

For information on assembler programming:

Some good cheat sheets.

GCC Extended Assembler

GCC gives direct access to instructions via __asm__. e.g.
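As a minimal sketch of the syntax, the following wraps a single `add` instruction in GCC extended assembler. The helper name `add_via_asm` is made up for illustration, and the `__riscv` guard is an assumption added so the snippet also builds on a non-RISC-V host, where it falls back to plain C++.

```cpp
#include <cassert>

// Hypothetical helper: adds two values with a single RISC-V `add`.
inline long add_via_asm(long a, long b) {
#if defined(__riscv)
    long result;
    // "=r": output in any integer register; "r": inputs in integer registers.
    __asm__ volatile("add %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
    return result;
#else
    return a + b;  // host fallback so the sketch builds off-target
#endif
}

// Usage example.
inline void demo_add_via_asm() {
    assert(add_via_asm(2, 3) == 5);
}
```

The operand constraints let the compiler choose the registers, so the instruction composes with surrounding C or C++ code without hard-coding register names.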

Opcodes are listed in machine readable format here

Instruction List

rv32 rv64 a c d f m q v counters zihintpause zfh csr supervisor hypervisor

rv32

control transfer instructions environment call and breakpoints immediate encoding variants integer register immediate instructions
integer register register operations sec:rv32:ldst

rv32 /


Operation Arguments Description
ebreak

RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity).

Spike ISS Implementation:
if (!STATE.debug_mode && (
        (!STATE.v && STATE.prv == PRV_M && STATE.dcsr->ebreakm) ||
        (!STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreaks) ||
        (!STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreaku) ||
        (STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreakvs) ||
        (STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreakvu))) {
  throw trap_debug_mode();
} else {
  throw trap_breakpoint(STATE.v, pc);
}
ecall

RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity).

Spike ISS Implementation:
switch (STATE.prv)
{
  case PRV_U: throw trap_user_ecall();
  case PRV_S:
    if (STATE.v)
      throw trap_virtual_supervisor_ecall();
    else
      throw trap_supervisor_ecall();
  case PRV_M: throw trap_machine_ecall();
  default: abort();
}
fence rs1, rd

RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity).

Spike ISS Implementation:

nop

RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity).

Pseudo Opcode, Equivalent Operations:
addi x0, x0, 0

rv32 / control-transfer-instructions

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.5 Control Transfer Instructions

Operation Arguments Description
beq rs1, rs2, bimm12

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Spike ISS Implementation:
if (RS1 == RS2)
  set_pc(BRANCH_TARGET);
bge rs1, rs2, bimm12

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Spike ISS Implementation:
if (sreg_t(RS1) >= sreg_t(RS2))
  set_pc(BRANCH_TARGET);
bgeu rs1, rs2, bimm12

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Spike ISS Implementation:
if (RS1 >= RS2)
  set_pc(BRANCH_TARGET);
bgt rs, rt, offset

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Pseudo Opcode, Equivalent Operations:
blt rt, rs, offset

bgtu rs, rt, offset

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Pseudo Opcode, Equivalent Operations:
bltu rt, rs, offset

ble rs, rt, offset

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Pseudo Opcode, Equivalent Operations:
bge rt, rs, offset

bleu rs, rt, offset

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Pseudo Opcode, Equivalent Operations:
bgeu rt, rs, offset
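The operand-swap rule behind BGT, BGTU, BLE, and BLEU can be sketched as a small model in which each function returns true when the branch would be taken. The function names mirror the mnemonics but are illustrative only; this is not Spike code.

```cpp
#include <cassert>
#include <cstdint>

// Models of the four RV32 ordering branches: true means "branch taken".
// The int32_t casts rely on two's-complement conversion (guaranteed since
// C++20, and the behaviour of mainstream compilers before that).
constexpr bool blt (uint32_t rs1, uint32_t rs2) { return int32_t(rs1) <  int32_t(rs2); }
constexpr bool bge (uint32_t rs1, uint32_t rs2) { return int32_t(rs1) >= int32_t(rs2); }
constexpr bool bltu(uint32_t rs1, uint32_t rs2) { return rs1 <  rs2; }
constexpr bool bgeu(uint32_t rs1, uint32_t rs2) { return rs1 >= rs2; }

// Pseudo-ops: the same hardware branch with the operands swapped.
constexpr bool bgt (uint32_t rs, uint32_t rt) { return blt (rt, rs); }
constexpr bool bgtu(uint32_t rs, uint32_t rt) { return bltu(rt, rs); }
constexpr bool ble (uint32_t rs, uint32_t rt) { return bge (rt, rs); }
constexpr bool bleu(uint32_t rs, uint32_t rt) { return bgeu(rt, rs); }

// Usage example: 0xFFFFFFFF is -1 when signed, but the largest unsigned value.
inline void demo_branch_pseudo_ops() {
    assert(bgt(1u, 0xFFFFFFFFu));    // 1 > -1 (signed)
    assert(!bgtu(1u, 0xFFFFFFFFu));  // 1 > 4294967295 is false (unsigned)
}
```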

blt rs1, rs2, bimm12

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Spike ISS Implementation:
if (sreg_t(RS1) < sreg_t(RS2))
  set_pc(BRANCH_TARGET);
bltu rs1, rs2, bimm12

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Signed array bounds may be checked with a single BLTU instruction, since any negative index will compare greater than any nonnegative bound.

Spike ISS Implementation:
if (RS1 < RS2)
  set_pc(BRANCH_TARGET);
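The single-compare bounds check mentioned for BLTU can be illustrated with a short sketch. The helper name `index_in_bounds` is hypothetical, and the check assumes the bound itself is non-negative.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when 0 <= index < bound, using one unsigned comparison as a
// single BLTU would. Assumes bound >= 0. A negative index reinterpreted as
// unsigned is at least 2^31, which exceeds any non-negative 32-bit bound.
constexpr bool index_in_bounds(int32_t index, int32_t bound) {
    return static_cast<uint32_t>(index) < static_cast<uint32_t>(bound);
}

// Usage example.
inline void demo_bounds_check() {
    assert(index_in_bounds(3, 10));
    assert(!index_in_bounds(10, 10));
    assert(!index_in_bounds(-1, 10));
}
```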
bne rs1, rs2, bimm12

Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Spike ISS Implementation:
if (RS1 != RS2)
  set_pc(BRANCH_TARGET);

rv32 / environment-call-and-breakpoints

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.8 Environment Call and Breakpoints

Operation Arguments Description
sbreak

ECALL and EBREAK were previously named SCALL and SBREAK. The instructions have the same functionality and encoding, but were renamed to reflect that they can be used more generally than to call a supervisor-level operating system or debugger.

scall

ECALL and EBREAK were previously named SCALL and SBREAK. The instructions have the same functionality and encoding, but were renamed to reflect that they can be used more generally than to call a supervisor-level operating system or debugger.

rv32 / immediate-encoding-variants

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.3 Immediate Encoding Variants

Operation Arguments Description
addw rd, rs1, rs2

In RV64I, checks of 32-bit signed additions can be optimized further by comparing the results of ADD and ADDW on the operands.

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32(RS1 + RS2));
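The ADD versus ADDW comparison can be sketched as follows. This is a model of the idea rather than generated code: the operands are assumed to be 32-bit signed values held sign-extended in 64-bit registers, and the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit ADD: full-width sum, wrapping modulo 2^64.
constexpr uint64_t add(uint64_t rs1, uint64_t rs2) { return rs1 + rs2; }

// ADDW: low 32 bits of the sum, sign-extended to 64 bits.
constexpr uint64_t addw(uint64_t rs1, uint64_t rs2) {
    return uint64_t(int64_t(int32_t(uint32_t(rs1 + rs2))));
}

// A 32-bit signed addition overflows exactly when the two results differ.
constexpr bool add32_overflows(int32_t a, int32_t b) {
    return add(uint64_t(int64_t(a)), uint64_t(int64_t(b))) !=
           addw(uint64_t(int64_t(a)), uint64_t(int64_t(b)));
}

// Usage example.
inline void demo_add32_overflows() {
    assert(add32_overflows(INT32_MAX, 1));
    assert(!add32_overflows(-5, 3));
}
```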
j offset

There are a further two variants of the instruction formats (B/J) based on the handling of immediates, as shown in Figure 1.3.

Similarly, the only difference between the U and J formats is that the 20-bit immediate is shifted left by 12 bits to form U immediates and by 1 bit to form J immediates. The location of instruction bits in the U and J format immediates is chosen to maximize overlap with the other formats and with each other.

Although more complex implementations might have separate adders for branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead of using dynamic hardware muxes to multiply the immediate by 2, we reduce instruction signal fanout and immediate mux costs by around a factor of 2. The scrambled immediate encoding will add negligible time to static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most common short forward branches have straightforward immediate encodings.

Pseudo Opcode, Equivalent Operations:
jal x0, offset
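To make the scrambled J immediate concrete, here is a hedged sketch that packs and unpacks a jump offset using the J-type bit layout imm[20|10:1|11|19:12] in instruction bits 31 to 12. The function names are made up for illustration; the JAL opcode value 0x6F and rd = x0 correspond to the `j offset` pseudo-op.

```cpp
#include <cassert>
#include <cstdint>

// Scatter a byte offset (even, within +/-1 MiB) into the J-type immediate field.
constexpr uint32_t encode_j_imm(int32_t offset) {
    return (((uint32_t(offset) >> 20) & 0x1u)   << 31)   // imm[20]
         | (((uint32_t(offset) >> 1)  & 0x3FFu) << 21)   // imm[10:1]
         | (((uint32_t(offset) >> 11) & 0x1u)   << 20)   // imm[11]
         | (((uint32_t(offset) >> 12) & 0xFFu)  << 12);  // imm[19:12]
}

// `j offset` is `jal x0, offset`: opcode 0x6F with rd = 0.
constexpr uint32_t encode_j(int32_t offset) { return encode_j_imm(offset) | 0x6Fu; }

// Gather the immediate bits back together and sign-extend from bit 20.
constexpr int32_t decode_j_imm(uint32_t insn) {
    return int32_t((((((insn >> 31) & 0x1u)   << 20)
                   | (((insn >> 21) & 0x3FFu) << 1)
                   | (((insn >> 20) & 0x1u)   << 11)
                   | (((insn >> 12) & 0xFFu)  << 12)) ^ 0x100000u) - 0x100000u);
}

// Usage example: a short forward jump has a simple encoding.
inline void demo_j_encoding() {
    assert(encode_j(8) == 0x0080006Fu);
    assert(decode_j_imm(encode_j(-4)) == -4);
}
```

Bit 0 of the offset is never stored, which is how the format gets a factor-of-two range without a hardware multiply.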

rv32 / integer-register-immediate-instructions

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.4 Integer Computational Instructions

Operation Arguments Description
addi rd, rs1, imm12

ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction.

Spike ISS Implementation:
WRITE_RD(sext_xlen(RS1 + insn.i_imm()));
andi rd, rs1, imm12

ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs).

Spike ISS Implementation:
WRITE_RD(insn.i_imm() & RS1);
auipc rd, imm20

AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd.

The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses. The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address.

Spike ISS Implementation:
WRITE_RD(sext_xlen(insn.u_imm() + pc));
jal rd, jimm20

The current PC can be obtained by setting the U-immediate to 0. Although a JAL +4 instruction could also be used to obtain the local PC (of the instruction following the JAL), it might cause pipeline breaks in simpler microarchitectures or pollute BTB structures in more complex microarchitectures.

Spike ISS Implementation:
reg_t tmp = npc;
set_pc(JUMP_TARGET);
WRITE_RD(tmp);
jalr rd, rs1, imm12

The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses. The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address.

Spike ISS Implementation:
reg_t tmp = npc;
set_pc((RS1 + insn.i_imm()) & ~reg_t(1));
WRITE_RD(tmp);
lui rd, imm20

LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI places the 32-bit U-immediate value into the destination register rd, filling in the lowest 12 bits with zeros.

Spike ISS Implementation:
WRITE_RD(insn.u_imm());
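Building a 32-bit constant normally pairs LUI with an ADDI. Because ADDI sign-extends its 12-bit immediate, the upper 20 bits have to be rounded up whenever the low 12 bits would be read as negative. The sketch below models that split on RV32; `split_const`, `lui`, and `addi` are illustrative names, not an assembler or Spike API.

```cpp
#include <cassert>
#include <cstdint>

struct HiLo {
    uint32_t hi20;  // immediate for LUI
    int32_t  lo12;  // immediate for ADDI, in [-2048, 2047]
};

// Split a 32-bit constant so that lui(hi20) + lo12 reproduces it.
constexpr HiLo split_const(uint32_t value) {
    return HiLo{((value + 0x800u) >> 12) & 0xFFFFFu,
                int32_t(((value & 0xFFFu) ^ 0x800u)) - 0x800};
}

// RV32 models: LUI places imm20 in bits 31:12, ADDI adds a sign-extended imm.
constexpr uint32_t lui(uint32_t imm20) { return imm20 << 12; }
constexpr uint32_t addi(uint32_t rs1, int32_t imm12) { return rs1 + uint32_t(imm12); }

constexpr uint32_t rebuild(uint32_t value) {
    return addi(lui(split_const(value).hi20), split_const(value).lo12);
}

// Usage example: the low part 0xEEF is negative as a 12-bit value,
// so the upper part is bumped from 0xDEADB to 0xDEADC.
inline void demo_split_const() {
    assert(split_const(0xDEADBEEFu).hi20 == 0xDEADCu);
    assert(rebuild(0xDEADBEEFu) == 0xDEADBEEFu);
}
```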
mv rd, rs

ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction.

Pseudo Opcode, Equivalent Operations:
addi rd, rs, 0

not rd, rs

ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs).

Pseudo Opcode, Equivalent Operations:
xori rd, rs, -1

ori rd, rs1, imm12

ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs).

Spike ISS Implementation:
// prefetch.i/r/w hint when rd = 0 and i_imm[4:0] = 0/1/3
WRITE_RD(insn.i_imm() | RS1);
seqz rd, rs

SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs).

Pseudo Opcode, Equivalent Operations:
sltiu rd, rs, 1
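A small model shows why `sltiu rd, rs, 1` acts as a test for zero: the only unsigned value below 1 is 0. The functions below are an RV32 sketch with illustrative names; the immediate is passed already sign-extended, as the description above specifies.

```cpp
#include <cassert>
#include <cstdint>

// SLTIU: the sign-extended immediate is reinterpreted as unsigned.
constexpr uint32_t sltiu(uint32_t rs1, int32_t imm) {
    return rs1 < uint32_t(imm) ? 1u : 0u;
}

// SEQZ pseudo-op: sltiu rd, rs, 1.
constexpr uint32_t seqz(uint32_t rs) { return sltiu(rs, 1); }

// Usage example: an immediate of -1 becomes 0xFFFFFFFF, the largest unsigned value.
inline void demo_sltiu() {
    assert(seqz(0u) == 1u);
    assert(seqz(7u) == 0u);
    assert(sltiu(5u, -1) == 1u);
}
```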

slli rd, rs1, shamt

Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits).

Spike ISS Implementation:
require(SHAMT < xlen);
WRITE_RD(sext_xlen(RS1 << SHAMT));
slti rd, rs1, imm12

SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs).

Spike ISS Implementation:
WRITE_RD(sreg_t(RS1) < sreg_t(insn.i_imm()));
sltiu rd, rs1, imm12

SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs).

Spike ISS Implementation:
WRITE_RD(RS1 < reg_t(insn.i_imm()));
srai rd, rs1, shamt

Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits).

Spike ISS Implementation:
require(SHAMT < xlen);
WRITE_RD(sext_xlen(sext_xlen(RS1) >> SHAMT));
srli rd, rs1, shamt

Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits).

Spike ISS Implementation:
require(SHAMT < xlen);
WRITE_RD(sext_xlen(zext_xlen(RS1) >> SHAMT));
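The difference between the three constant shifts is easiest to see on a value with the sign bit set. The following RV32 sketch uses illustrative function names and builds the arithmetic shift from unsigned operations, so it does not depend on how the compiler treats right shifts of negative numbers.

```cpp
#include <cassert>
#include <cstdint>

// Logical left shift: zeros enter the lower bits.
constexpr uint32_t slli(uint32_t rs1, unsigned shamt) { return rs1 << (shamt & 31u); }

// Logical right shift: zeros enter the upper bits.
constexpr uint32_t srli(uint32_t rs1, unsigned shamt) { return rs1 >> (shamt & 31u); }

// Arithmetic right shift: copies of the original sign bit enter the upper bits.
constexpr uint32_t srai(uint32_t rs1, unsigned shamt) {
    return (rs1 >> (shamt & 31u)) |
           ((rs1 & 0x80000000u) ? ~(0xFFFFFFFFu >> (shamt & 31u)) : 0u);
}

// Usage example.
inline void demo_shifts() {
    assert(srli(0x80000000u, 4) == 0x08000000u);
    assert(srai(0x80000000u, 4) == 0xF8000000u);
}
```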
xori rd, rs1, imm12

ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs).

Spike ISS Implementation:
WRITE_RD(insn.i_imm() ^ RS1);

rv32 / integer-register-register-operations

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.4 Integer Computational Instructions

Operation Arguments Description
add rd, rs1, rs2

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Spike ISS Implementation:
WRITE_RD(sext_xlen(RS1 + RS2));
and rd, rs1, rs2

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Spike ISS Implementation:
WRITE_RD(RS1 & RS2);
or rd, rs1, rs2

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Spike ISS Implementation:
WRITE_RD(RS1 | RS2);
sll rd, rs1, rs2

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2.

Spike ISS Implementation:
WRITE_RD(sext_xlen(RS1 << (RS2 & (xlen-1))));
slt rd, rs1, rs2

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Spike ISS Implementation:
WRITE_RD(sreg_t(RS1) < sreg_t(RS2));
sltu rd, rs1, rs2

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Spike ISS Implementation:
WRITE_RD(RS1 < RS2);
snez rd, rs

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Pseudo Opcode, Equivalent Operations:
sltu rd, x0, rs

sra rd, rs1, rs2

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2.

Spike ISS Implementation:
WRITE_RD(sext_xlen(sext_xlen(RS1) >> (RS2 & (xlen-1))));
srl rd, rs1, rs2

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2.

Spike ISS Implementation:
WRITE_RD(sext_xlen(zext_xlen(RS1) >> (RS2 & (xlen-1))));
sub rd, rs1, rs2

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Spike ISS Implementation:
WRITE_RD(sext_xlen(RS1 - RS2));
xor rd, rs1, rs2

ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

Spike ISS Implementation:
WRITE_RD(RS1 ^ RS2);

rv32 / sec:rv32:ldst

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.6 Load and Store Instructions

Operation Arguments Description
lb rd, rs1, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
WRITE_RD(MMU.load<int8_t>(RS1 + insn.i_imm()));
lbu rd, rs1, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
WRITE_RD(MMU.load<uint8_t>(RS1 + insn.i_imm()));
lh rd, rs1, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
WRITE_RD(MMU.load<int16_t>(RS1 + insn.i_imm()));
lhu rd, rs1, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
WRITE_RD(MMU.load<uint16_t>(RS1 + insn.i_imm()));
lw rd, rs1, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
WRITE_RD(MMU.load<int32_t>(RS1 + insn.i_imm()));
sb rs1, rs2, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
MMU.store<uint8_t>(RS1 + insn.s_imm(), RS2);
sh rs1, rs2, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
MMU.store<uint16_t>(RS1 + insn.s_imm(), RS2);
sw rs1, rs2, imm12

The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.

Spike ISS Implementation:
MMU.store<uint32_t>(RS1 + insn.s_imm(), RS2);
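The sign-extending and zero-extending loads differ only in how the upper bits of rd are filled. Below is a minimal RV32 sketch over a plain byte array. It assumes little-endian byte order and ignores alignment and traps, and the function names are illustrative rather than Spike's MMU interface.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extending loads.
inline uint32_t lbu(const uint8_t* mem, uint32_t addr) { return mem[addr]; }
inline uint32_t lhu(const uint8_t* mem, uint32_t addr) {
    return uint32_t(mem[addr]) | (uint32_t(mem[addr + 1]) << 8);  // little-endian
}

// Sign-extending loads: XOR/subtract on the sign bit widens the value to 32 bits.
inline uint32_t lb(const uint8_t* mem, uint32_t addr) {
    return (lbu(mem, addr) ^ 0x80u) - 0x80u;
}
inline uint32_t lh(const uint8_t* mem, uint32_t addr) {
    return (lhu(mem, addr) ^ 0x8000u) - 0x8000u;
}

// SB stores the low 8 bits of rs2.
inline void sb(uint8_t* mem, uint32_t addr, uint32_t rs2) { mem[addr] = uint8_t(rs2); }

// Usage example: the same byte 0x80 loads as -128 or as 128.
inline void demo_loads() {
    uint8_t mem[2] = {0x80, 0xFF};
    assert(lb(mem, 0) == 0xFFFFFF80u);
    assert(lbu(mem, 0) == 0x00000080u);
}
```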

rv64

integer computational instructions integer register immediate instructions load and store instructions register state

rv64 / integer-computational-instructions

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.2 Integer Computational Instructions

Operation Arguments Description
sllw rd, rs1, rs2

SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0].

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32(RS1 << (RS2 & 0x1F)));
sraw rd, rs1, rs2

SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0].

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32(int32_t(RS1) >> (RS2 & 0x1F)));
srlw rd, rs1, rs2

SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0].

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32((uint32_t)RS1 >> (RS2 & 0x1F)));
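One consequence of the rule above is that even the logical shift SRLW sign-extends its 32-bit result, and that bits 63:32 of rs1 are ignored. The sketch below models the three instructions with illustrative names; it is not the Spike source.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit result to 64 bits (XOR/subtract on bit 31).
constexpr uint64_t sext32(uint32_t x) {
    return (uint64_t(x) ^ 0x80000000ull) - 0x80000000ull;
}

constexpr uint64_t sllw(uint64_t rs1, uint64_t rs2) {
    return sext32(uint32_t(rs1) << (rs2 & 0x1F));
}
constexpr uint64_t srlw(uint64_t rs1, uint64_t rs2) {
    return sext32(uint32_t(rs1) >> (rs2 & 0x1F));
}
constexpr uint64_t sraw(uint64_t rs1, uint64_t rs2) {
    return sext32((uint32_t(rs1) >> (rs2 & 0x1F)) |
                  ((uint32_t(rs1) & 0x80000000u)
                       ? ~(0xFFFFFFFFu >> (rs2 & 0x1F)) : 0u));
}

// Usage example: a logical shift by zero still sign-extends bit 31.
inline void demo_w_shifts() {
    assert(srlw(0x80000000ull, 0) == 0xFFFFFFFF80000000ull);
    assert(sllw(0x100000001ull, 0) == 1ull);
}
```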

rv64 / integer-register-immediate-instructions

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.2 Integer Computational Instructions

Operation Arguments Description
addiw rd, rs1, imm12

ADDIW is an RV64I instruction that adds the sign-extended 12-bit immediate to register rs1 and produces the proper sign-extension of a 32-bit result in rd. Overflows are ignored and the result is the low 32 bits of the result sign-extended to 64 bits. Note, ADDIW rd, rs1, 0 writes the sign-extension of the lower 32 bits of register rs1 into register rd (assembler pseudoinstruction SEXT.W).

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32(insn.i_imm() + RS1));
ld rd, rs1, imm12

Note that the set of address offsets that can be formed by pairing LUI with LD, AUIPC with JALR, etc. in RV64I is $[-2^{31} - 2^{11},\ 2^{31} - 2^{11} - 1]$.

Spike ISS Implementation:
require_rv64;
WRITE_RD(MMU.load<int64_t>(RS1 + insn.i_imm()));
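The endpoints of that range follow from combining the extreme LUI values with the extreme 12-bit offsets. A short sketch, with illustrative function names and relying on the fact that RV64 LUI sign-extends its 32-bit result:

```cpp
#include <cassert>
#include <cstdint>

// RV64 LUI: imm20 placed in bits 31:12, then sign-extended to 64 bits.
constexpr int64_t lui_value(uint32_t imm20) {
    return int64_t((uint64_t((imm20 & 0xFFFFFu) << 12) ^ 0x80000000ull) - 0x80000000ull);
}

// Offset reachable by LUI followed by a load with a 12-bit signed immediate.
constexpr int64_t paired_offset(uint32_t imm20, int32_t imm12) {
    return lui_value(imm20) + imm12;
}

// Usage example: the most negative and most positive reachable offsets.
inline void demo_paired_offset() {
    assert(paired_offset(0x80000u, -2048) == -(INT64_C(1) << 31) - 2048);
    assert(paired_offset(0x7FFFFu, 2047) == (INT64_C(1) << 31) - 2048 - 1);
}
```

The upper end falls 2^11 short of 2^31 because the largest LUI value is 2^31 - 2^12, and the largest 12-bit offset adds back only 2^11 - 1.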
sext.w rd, rs

ADDIW is an RV64I instruction that adds the sign-extended 12-bit immediate to register rs1 and produces the proper sign-extension of a 32-bit result in rd. Overflows are ignored and the result is the low 32 bits of the result sign-extended to 64 bits. Note, ADDIW rd, rs1, 0 writes the sign-extension of the lower 32 bits of register rs1 into register rd (assembler pseudoinstruction SEXT.W).

Pseudo Opcode, Equivalent Operations:
addiw rd, rs, 0
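As a quick illustration of why `addiw rd, rs, 0` sign-extends the low word, here is a hedged model of ADDIW on a 64-bit register value. The names `addiw` and `sext_w` mirror the mnemonics but are otherwise illustrative.

```cpp
#include <cassert>
#include <cstdint>

// ADDIW: add in 32 bits (overflow ignored), then sign-extend the result to 64 bits.
constexpr uint64_t addiw(uint64_t rs1, int32_t imm12) {
    return (uint64_t(uint32_t(rs1) + uint32_t(imm12)) ^ 0x80000000ull) - 0x80000000ull;
}

// SEXT.W pseudo-op: addiw rd, rs, 0.
constexpr uint64_t sext_w(uint64_t rs) { return addiw(rs, 0); }

// Usage example: the upper 32 bits of the source are discarded.
inline void demo_sext_w() {
    assert(sext_w(0x0000000080000000ull) == 0xFFFFFFFF80000000ull);
    assert(sext_w(0xFFFFFFFF00000001ull) == 1ull);
}
```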

slliw rd, rs1, shamt

SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved.

Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change.

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32(RS1 << SHAMT));
sraiw rd, rs1

SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved.

Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change.

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32(int32_t(RS1) >> SHAMT));
srliw rd, rs1

SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved.

Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change.

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32((uint32_t)RS1 >> SHAMT));
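
The difference between the three word shifts is easiest to see on a value whose bit 31 is set. The following is a rough standalone model, with function names chosen for illustration and `shamt` assumed to be the 5-bit shift amount (0..31).

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit result to 64 bits.
inline int64_t sext32(uint32_t x) {
    return static_cast<int64_t>(static_cast<int32_t>(x));
}

// SLLIW: shift the low 32 bits left, then sign-extend the 32-bit result.
inline int64_t slliw(uint64_t rs1, unsigned shamt) {
    return sext32(static_cast<uint32_t>(rs1) << (shamt & 0x1F));
}

// SRLIW: logical right shift of the low 32 bits (zeros shifted in).
inline int64_t srliw(uint64_t rs1, unsigned shamt) {
    return sext32(static_cast<uint32_t>(rs1) >> (shamt & 0x1F));
}

// SRAIW: arithmetic right shift of the low 32 bits (copies of bit 31 shifted in).
// Right-shifting a negative signed value is arithmetic on mainstream
// compilers and guaranteed to be so from C++20.
inline int64_t sraiw(uint64_t rs1, unsigned shamt) {
    const int32_t low = static_cast<int32_t>(static_cast<uint32_t>(rs1));
    return static_cast<int64_t>(low >> (shamt & 0x1F));
}
```

Note that the upper 32 bits of `rs1` never influence the result, and that `srliw` with a shift of zero still returns a sign-extended value.
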

rv64 / load-and-store-instructions

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.3 Load and Store Instructions

Operation Arguments Description
lwu rd, rs1, imm12

The LW instruction loads a 32-bit value from memory and sign-extends this to 64 bits before storing it in register rd for RV64I. The LWU instruction, on the other hand, zero-extends the 32-bit value from memory for RV64I. LH and LHU are defined analogously for 16-bit values, as are LB and LBU for 8-bit values. The SD, SW, SH, and SB instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory respectively.

Spike ISS Implementation:
require_rv64;
WRITE_RD(MMU.load<uint32_t>(RS1 + insn.i_imm()));
sd rs1, rs2, imm12

The LW instruction loads a 32-bit value from memory and sign-extends this to 64 bits before storing it in register rd for RV64I. The LWU instruction, on the other hand, zero-extends the 32-bit value from memory for RV64I. LH and LHU are defined analogously for 16-bit values, as are LB and LBU for 8-bit values. The SD, SW, SH, and SB instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory respectively.

Spike ISS Implementation:
require_rv64;
MMU.store<uint64_t>(RS1 + insn.s_imm(), RS2);
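
As a small illustration of the extension rules above, the sketch below models only what happens to the register value, not the memory access itself. The function names and the `width_bytes` parameter are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// LW on RV64: the loaded 32-bit word is sign-extended to 64 bits.
inline uint64_t lw_result(uint32_t word) {
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(word)));
}

// LWU on RV64: the loaded 32-bit word is zero-extended to 64 bits.
inline uint64_t lwu_result(uint32_t word) {
    return static_cast<uint64_t>(word);
}

// SD, SW, SH and SB store the low 8, 4, 2 or 1 bytes of rs2.
// width_bytes is expected to be 1, 2, 4 or 8.
inline uint64_t stored_bits(uint64_t rs2, unsigned width_bytes) {
    if (width_bytes >= 8) return rs2;
    return rs2 & ((uint64_t{1} << (8 * width_bytes)) - 1);
}
```

The two loads only differ when bit 31 of the memory word is set: `lw_result(0x80000001)` fills the upper half with ones, `lwu_result(0x80000001)` fills it with zeros.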

rv64 / register-state

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.1 Register State

Operation Arguments Description
subw rd, rs1, rs2

The compiler and calling convention maintain an invariant that all 32-bit values are held in a sign-extended format in 64-bit registers. Even 32-bit unsigned integers extend bit 31 into bits 63 through 32. Consequently, conversion between unsigned and signed 32-bit integers is a no-op, as is conversion from a signed 32-bit integer to a signed 64-bit integer. Existing 64-bit wide SLTU and unsigned branch compares still operate correctly on unsigned 32-bit integers under this invariant. Similarly, existing 64-bit wide logical operations on 32-bit sign-extended integers preserve the sign-extension property. A few new instructions (ADD[I]W/SUBW/SxxW) are required for addition and shifts to ensure reasonable performance for 32-bit values.

Spike ISS Implementation:
require_rv64;
WRITE_RD(sext32(RS1 - RS2));
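
The claim that 64-bit SLTU still orders unsigned 32-bit values correctly under the sign-extension invariant can be checked with a few lines of C++. This is only a sketch: `hold_u32`, `sltu` and `subw` are illustrative names, not part of any toolchain.

```cpp
#include <cassert>
#include <cstdint>

// Register image of a 32-bit value under the invariant: bit 31 is copied into
// bits 63..32, even when the value is logically unsigned.
inline uint64_t hold_u32(uint32_t v) {
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(v)));
}

// 64-bit wide SLTU: unsigned compare of the full registers.
inline bool sltu(uint64_t rs1, uint64_t rs2) { return rs1 < rs2; }

// SUBW: subtract, keep the low 32 bits, sign-extend, so the result again
// satisfies the invariant.
inline uint64_t subw(uint64_t rs1, uint64_t rs2) {
    return hold_u32(static_cast<uint32_t>(rs1 - rs2));
}
```

The ordering is preserved because sign-extension maps values below 2^31 to themselves and values from 2^31 upward to the top of the 64-bit unsigned range, which keeps the mapping monotonic.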

a

atomics sec:lrsc

a / atomics

9 “A” Standard Extension for Atomic Instructions, Version 2.1 / 9.4 Atomic Memory Operations

Operation Arguments Description
amoadd.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs + RS2; }));
amoadd.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs + RS2; })));
amoand.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs & RS2; }));
amoand.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs & RS2; })));
amomax.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](int64_t lhs) { return std::max(lhs, int64_t(RS2)); }));
amomax.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](int32_t lhs) { return std::max(lhs, int32_t(RS2)); })));
amomaxu.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return std::max(lhs, RS2); }));
amomaxu.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return std::max(lhs, uint32_t(RS2)); })));
amomin.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](int64_t lhs) { return std::min(lhs, int64_t(RS2)); }));
amomin.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](int32_t lhs) { return std::min(lhs, int32_t(RS2)); })));
amominu.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return std::min(lhs, RS2); }));
amominu.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return std::min(lhs, uint32_t(RS2)); })));
amoor.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs | RS2; }));
amoor.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs | RS2; })));
amoswap.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t UNUSED lhs) { return RS2; }));
amoswap.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t UNUSED lhs) { return RS2; })));
amoxor.d rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs ^ RS2; }));
amoxor.w rd, rs1, rs2

The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs ^ RS2; })));
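
The data flow shared by all the 32-bit AMOs above (load the old value, combine it with rs2, store the result, return the old value sign-extended) can be sketched as follows. This is a single-threaded model for illustration only: it is not atomic, `mem_word` stands in for the memory word addressed by rs1, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit value to 64 bits, as RV64 does for rd of a .W AMO.
inline uint64_t sext32(uint32_t v) {
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(v)));
}

// Data flow of a 32-bit AMO. Real hardware performs these steps atomically;
// this sketch does not.
template <typename Op>
uint64_t amo_w(uint32_t& mem_word, uint64_t rs2, Op op) {
    const uint32_t old = mem_word;                   // load from the address in rs1
    mem_word = op(old, static_cast<uint32_t>(rs2));  // upper 32 bits of rs2 are ignored
    return sext32(old);                              // rd receives the old value
}

inline uint64_t amoadd_w(uint32_t& m, uint64_t rs2) {
    return amo_w(m, rs2, [](uint32_t a, uint32_t b) { return a + b; });
}

// Signed maximum: operands are compared as int32_t.
inline uint64_t amomax_w(uint32_t& m, uint64_t rs2) {
    return amo_w(m, rs2, [](uint32_t a, uint32_t b) {
        return static_cast<uint32_t>(
            std::max(static_cast<int32_t>(a), static_cast<int32_t>(b)));
    });
}

// Unsigned maximum: operands are compared as uint32_t.
inline uint64_t amomaxu_w(uint32_t& m, uint64_t rs2) {
    return amo_w(m, rs2, [](uint32_t a, uint32_t b) { return std::max(a, b); });
}
```

The signed and unsigned variants differ only in how the comparison interprets bit 31, which is why 0x80000000 loses against 1 under `amomax_w` but wins under `amomaxu_w`.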

a / sec:lrsc

9 “A” Standard Extension for Atomic Instructions, Version 2.1 / 9.2 Load-Reserved/Store-Conditional Instructions

Operation Arguments Description
lr.d rd, rs1

Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd.

Spike ISS Implementation:
require_extension('A');
require_rv64;
WRITE_RD(MMU.load_reserved<int64_t>(RS1));
lr.w rd, rs1

Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd.

Spike ISS Implementation:
require_extension('A');
WRITE_RD(MMU.load_reserved<int32_t>(RS1));
sc.d rd, rs1, rs2

Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd.

Spike ISS Implementation:
require_extension('A');
require_rv64;

bool have_reservation = MMU.store_conditional<uint64_t>(RS1, RS2);

WRITE_RD(!have_reservation);
sc.w rd, rs1, rs2

Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd.

Spike ISS Implementation:
require_extension('A');

bool have_reservation = MMU.store_conditional<uint32_t>(RS1, RS2);

WRITE_RD(!have_reservation);
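
The success and failure rules for SC.W can be illustrated with a toy single-hart model. Everything here is an assumption made for the sketch: the `ToyHart` type, the choice of one word as the reservation set, and the `fetch_increment` retry loop. Real implementations track reservations in hardware and may use a larger reservation set.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one hart's reservation state (hypothetical, single-threaded).
struct ToyHart {
    bool has_reservation = false;
    const uint32_t* reserved = nullptr;

    // LR.W: load the word, register a reservation, return the sign-extended value.
    uint64_t lr_w(const uint32_t* addr) {
        has_reservation = true;
        reserved = addr;
        return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(*addr)));
    }

    // SC.W: store only if the reservation is valid and covers addr.
    // Returns the value written to rd: 0 on success, nonzero on failure.
    uint64_t sc_w(uint32_t* addr, uint64_t rs2) {
        const bool ok = has_reservation && reserved == addr;
        if (ok) *addr = static_cast<uint32_t>(rs2);
        has_reservation = false;  // any SC invalidates the reservation
        return ok ? 0 : 1;
    }
};

// Typical LR/SC retry loop: increment a word and return its previous value.
inline uint32_t fetch_increment(ToyHart& hart, uint32_t* addr) {
    for (;;) {
        const uint32_t old = static_cast<uint32_t>(hart.lr_w(addr));
        if (hart.sc_w(addr, old + 1u) == 0) return old;
    }
}
```

In this model an SC.W without a preceding LR.W, or a second SC.W after a successful one, fails and leaves memory untouched, matching the rule that every SC invalidates the hart's reservation.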

c

compressed control transfer instructions integer constant generation instructions integer register immediate operations integer register register operations
load and store instructions nop instruction stack pointer based loads and stores

c / compressed

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.8 RVC Instruction Set Listings

Operation Arguments Description
c.slli_rv32 rd_rs1_n0, c_nzuimm6lo
c.srai_rv32 rd_rs1_p, c_nzuimm5
c.srli_rv32 rd_rs1_p, c_nzuimm5

c / control-transfer-instructions

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.4 Control Transfer Instructions

Operation Arguments Description
c.beqz rs1_p, c_bimm9

C.BEQZ performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. It can therefore target a ±256 B range. C.BEQZ takes the branch if the value in register rs1' is zero. It expands to beq rs1', x0, offset.

Spike ISS Implementation:
require_extension(EXT_ZCA);
if (RVC_RS1S == 0)
set_pc(pc + insn.rvc_b_imm());
c.bnez rs1_p, c_bimm9

C.BNEZ is defined analogously, but it takes the branch if rs1' contains a nonzero value. It expands to bne rs1', x0, offset.

Spike ISS Implementation:
require_extension(EXT_ZCA);
if (RVC_RS1S != 0)
set_pc(pc + insn.rvc_b_imm());
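
To show where the ±256 B range of C.BEQZ and C.BNEZ comes from, here is a sketch of decoding the branch offset from a 16-bit CB-format instruction. The helper names `bits` and `cb_offset` are illustrative; the field placement follows the CB format as I understand it from the RVC encoding tables and should be checked against the specification before being relied on.

```cpp
#include <cassert>
#include <cstdint>

// Extract `len` bits of `insn` starting at bit position `lo`.
inline uint32_t bits(uint16_t insn, unsigned lo, unsigned len) {
    return (static_cast<uint32_t>(insn) >> lo) & ((1u << len) - 1u);
}

// Branch offset of a CB-format instruction (C.BEQZ / C.BNEZ).
// Assumed field placement:
//   inst[12]    -> offset[8] (sign)   inst[11:10] -> offset[4:3]
//   inst[6:5]   -> offset[7:6]        inst[4:3]   -> offset[2:1]
//   inst[2]     -> offset[5]          offset[0] is always zero
inline int32_t cb_offset(uint16_t insn) {
    const uint32_t off = (bits(insn, 3, 2) << 1) | (bits(insn, 10, 2) << 3) |
                         (bits(insn, 2, 1) << 5) | (bits(insn, 5, 2) << 6) |
                         (bits(insn, 12, 1) << 8);
    // Sign-extend from bit 8.
    return static_cast<int32_t>(off ^ 0x100u) - 0x100;
}
```

With nine offset bits and bit 0 fixed at zero, the reachable offsets run from -256 to +254 in steps of 2.
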
c.j c_imm12

C.J performs an unconditional control transfer. The offset is sign-extended and added to the pc to form the jump target address. C.J can therefore target a ±2 KiB range. C.J expands to jal x0, offset.

C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JAL expands to jal x1, offset.

Spike ISS Implementation:
require_extension(EXT_ZCA);
set_pc(pc + insn.rvc_j_imm());
c.jal c_imm12

C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JAL expands to jal x1, offset.

Spike ISS Implementation:
require_extension(EXT_ZCA);
if (xlen == 32) {
reg_t tmp = npc;
set_pc(pc + insn.rvc_j_imm());
WRITE_REG(X_RA, tmp);
} else { // c.addiw
require(insn.rvc_rd() != 0);
WRITE_RD(sext32(RVC_RS1 + insn.rvc_imm()));
}

c / integer-constant-generation-instructions

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation Arguments Description
c.addi16sp c_nzimm10

C.LUI loads the non-zero 6-bit immediate field into bits 17-12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into lui rd, nzimm. C.LUI is only valid when rd ∉ {x0, x2}, and when the immediate is not equal to zero. The code points with nzimm=0 are reserved; the remaining code points with rd=x0 are HINTs; and the remaining code points with rd=x2 correspond to the C.ADDI16SP instruction.

c.li rd, c_imm6

C.LI loads the sign-extended 6-bit immediate, imm, into register rd. C.LI expands into addi rd, x0, imm. C.LI is only valid when rd ≠ x0; the code points with rd=x0 encode HINTs.

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RD(insn.rvc_imm());
c.lui rd_n2, c_nzimm18

C.LUI loads the non-zero 6-bit immediate field into bits 17-12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into lui rd, nzimm. C.LUI is only valid when rd ∉ {x0, x2}, and when the immediate is not equal to zero. The code points with nzimm=0 are reserved; the remaining code points with rd=x0 are HINTs; and the remaining code points with rd=x2 correspond to the C.ADDI16SP instruction.

Spike ISS Implementation:
require_extension(EXT_ZCA);
if (insn.rvc_rd() == 2) { // c.addi16sp
require(insn.rvc_addi16sp_imm() != 0);
WRITE_REG(X_SP, sext_xlen(RVC_SP + insn.rvc_addi16sp_imm()));
} else {
require(insn.rvc_imm() != 0);
WRITE_RD(insn.rvc_imm() << 12);
}
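
As a worked example of the C.LUI immediate handling described above, the sketch below computes the value written to rd from the raw 6-bit field and checks the validity conditions. The names `c_lui_value` and `c_lui_valid` are hypothetical, and multiplication by 4096 is used in place of a left shift to stay well defined for negative values in older C++ standards.

```cpp
#include <cassert>
#include <cstdint>

// Value written by C.LUI, given the raw 6-bit immediate field.
// The field is sign-extended from bit 5 and placed in bits 17:12.
inline int64_t c_lui_value(uint8_t imm6) {
    const int64_t simm = static_cast<int64_t>((imm6 & 0x3F) ^ 0x20) - 0x20;
    return simm * 4096;  // low 12 bits are zero
}

// C.LUI is only valid for rd outside {x0, x2} and a non-zero immediate.
inline bool c_lui_valid(unsigned rd, uint8_t imm6) {
    return rd != 0 && rd != 2 && (imm6 & 0x3F) != 0;
}
```

The reachable constants therefore span -0x20000 to 0x1F000 in steps of 0x1000, excluding zero.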

c / integer-register-immediate-operations

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation Arguments Description
c.addi rd_rs1_n0, c_nzimm6, c_nzimm6

C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in register rd then writes the result to rd. C.ADDI expands into addi rd, rd, nzimm. C.ADDI is only valid when rd ≠ x0 and nzimm ≠ 0. The code points with rd=x0 encode the C.NOP instruction; the remaining code points with nzimm=0 encode HINTs.

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RD(sext_xlen(RVC_RS1 + insn.rvc_imm()));
c.addi4spn rd_p, c_nzuimm10

C.ADDI4SPN is a CIW-format instruction that adds a zero-extended non-zero immediate, scaled by 4, to the stack pointer, x2, and writes the result to rd'. This instruction is used to generate pointers to stack-allocated variables, and expands to addi rd', x2, nzuimm. C.ADDI4SPN is only valid when nzuimm ≠ 0; the code points with nzuimm=0 are reserved.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_addi4spn_imm() != 0);
WRITE_RVC_RS2S(sext_xlen(RVC_SP + insn.rvc_addi4spn_imm()));
c.addiw rd_rs1_n0, c_imm6

C.ADDIW is an RV64C/RV128C-only instruction that performs the same computation as C.ADDI but produces a 32-bit result, then sign-extends the result to 64 bits. C.ADDIW expands into addiw rd, rd, imm. The immediate can be zero for C.ADDIW, where this corresponds to sext.w rd. C.ADDIW is only valid when rd ≠ x0; the code points with rd=x0 are reserved.

c.andi rd_rs1_p, c_imm6

C.ANDI is a CB-format instruction that computes the bitwise AND of the value in register rd' and the sign-extended 6-bit immediate, then writes the result to rd'. C.ANDI expands to andi rd', rd', imm.

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RVC_RS1S(RVC_RS1S & insn.rvc_imm());
c.slli rd_rs1_n0, c_nzuimm6

C.SLLI is a CI-format instruction that performs a logical left shift of the value in register rd then writes the result to rd. The shift amount is encoded in the shamt field. For RV128C, a shift amount of zero is used to encode a shift of 64. C.SLLI expands into slli rd, rd, shamt, except for RV128C with shamt=0, which expands to slli rd, rd, 64.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_zimm() < xlen);
WRITE_RD(sext_xlen(RVC_RS1 << insn.rvc_zimm()));
c.srai rd_rs1_p, c_nzuimm6

C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic right shift. C.SRAI expands to srai rd', rd', shamt.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_zimm() < xlen);
WRITE_RVC_RS1S(sext_xlen(sext_xlen(RVC_RS1S) >> insn.rvc_zimm()));
c.srli rd_rs1_p, c_nzuimm6

C.SRLI is a CB-format instruction that performs a logical right shift of the value in register rd' then writes the result to rd'. The shift amount is encoded in the shamt field. For RV128C, a shift amount of zero is used to encode a shift of 64. Furthermore, the shift amount is sign-extended for RV128C, and so the legal shift amounts are 1-31, 64, and 96-127. C.SRLI expands into srli rd', rd', shamt, except for RV128C with shamt=0, which expands to srli rd', rd', 64.

C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic right shift. C.SRAI expands to srai rd', rd', shamt.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_zimm() < xlen);
WRITE_RVC_RS1S(sext_xlen(zext_xlen(RVC_RS1S) >> insn.rvc_zimm()));
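
The RV128C shift-amount rule for C.SRLI and C.SRAI is compact but easy to misread, so here is one possible reading as code. It assumes the 6-bit shamt field is sign-extended into a 7-bit shift amount, which is what reproduces the legal set 1-31, 64 and 96-127 quoted above; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Effective shift amount of C.SRLI / C.SRAI on RV128C from the raw 6-bit
// shamt field (an interpretation of the rule in the text, not Spike code).
inline unsigned rv128c_right_shift_amount(unsigned shamt6) {
    shamt6 &= 0x3F;
    if (shamt6 == 0) return 64;               // zero encodes a shift of 64
    if (shamt6 & 0x20) return shamt6 | 0x40;  // sign-extend bit 5: 32..63 -> 96..127
    return shamt6;                            // 1..31 are used as-is
}
```

Under this reading no encoding produces a shift amount in the ranges 32-63 or 65-95.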

c / integer-register-register-operations

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation Arguments Description
c.add rd_rs1, c_rs2_n0

C.ADD adds the values in registers rd and rs2 and writes the result to register rd. C.ADD expands into add rd, rd, rs2. C.ADD is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JALR and C.EBREAK instructions. The code points with rs2 ≠ x0 and rd = x0 are HINTs.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_rs2() != 0);
WRITE_RD(sext_xlen(RVC_RS1 + RVC_RS2));
c.addw rd_rs1_p, rs2_p

C.ADDW is an RV64C/RV128C-only instruction that adds the values in registers rd' and rs2', then sign-extends the lower 32 bits of the sum before writing the result to register rd'. C.ADDW expands into addw rd', rd', rs2'.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require_rv64;
WRITE_RVC_RS1S(sext32(RVC_RS1S + RVC_RS2S));
c.and rd_rs1_p, rs2_p

C.AND computes the bitwise AND of the values in registers rd' and rs2', then writes the result to register rd'. C.AND expands into and rd', rd', rs2'.

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RVC_RS1S(RVC_RS1S & RVC_RS2S);
c.ebreak

C.ADD adds the values in registers rd and rs2 and writes the result to register rd. C.ADD expands into add rd, rd, rs2. C.ADD is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JALR and C.EBREAK instructions. The code points with rs2 ≠ x0 and rd = x0 are HINTs.

Spike ISS Implementation:
require_extension(EXT_ZCA);
if (!STATE.debug_mode && (
(!STATE.v && STATE.prv == PRV_M && STATE.dcsr->ebreakm) ||
(!STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreaks) ||
(!STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreaku) ||
(STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreakvs) ||
(STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreakvu))) {
throw trap_debug_mode();
} else {
throw trap_breakpoint(STATE.v, pc);
}
c.jalr c_rs1_n0

C.ADD adds the values in registers rd and rs2 and writes the result to register rd. C.ADD expands into add rd, rd, rs2. C.ADD is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JALR and C.EBREAK instructions. The code points with rs2 ≠ x0 and rd = x0 are HINTs.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_rs1() != 0);
reg_t tmp = npc;
set_pc(RVC_RS1 & ~reg_t(1));
WRITE_REG(X_RA, tmp);
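
The Spike fragment above captures the return address before it overwrites the pc, and writes x1 last. A minimal sketch of that ordering follows. The Hart struct and c_jalr function are hypothetical (they are not Spike types), and the link value is assumed to be pc + 2 because the instruction is 16 bits wide.

```cpp
#include <cstdint>

// Hypothetical, minimal hart state for illustration only.
struct Hart {
    std::uint64_t pc = 0;
    std::uint64_t x[32] = {};
};

// Sketch of C.JALR: jump to x[rs1] with bit 0 cleared, link in x1 (ra).
inline void c_jalr(Hart& hart, unsigned rs1) {
    const std::uint64_t link = hart.pc + 2;     // address after this 16-bit instruction
    hart.pc = hart.x[rs1] & ~std::uint64_t{1};  // read rs1 before x1 is overwritten
    hart.x[1] = link;                           // write the link register last
}
```

Reading rs1 first matters when rs1 is x1 itself: the jump must use the old value of ra, not the freshly written link address.
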
c.jr rs1_n0

C.MV copies the value in register rs2 into register rd. C.MV expands into add rd, x0, rs2. C.MV is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JR instruction. The code points with rs2 ≠ x0 and rd = x0 are HINTs.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_rs1() != 0);
set_pc(RVC_RS1 & ~reg_t(1));
c.mv rd, c_rs2_n0

C.MV copies the value in register rs2 into register rd. C.MV expands into add rd, x0, rs2. C.MV is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JR instruction. The code points with rs2 ≠ x0 and rd = x0 are HINTs.

C.MV expands to a different instruction than the canonical MV pseudoinstruction, which instead uses ADDI. Implementations that handle MV specially, e.g. using register-renaming hardware, may find it more convenient to expand C.MV to MV instead of ADD, at slight additional hardware cost.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_rs2() != 0);
WRITE_RD(RVC_RS2);
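
The descriptions of C.MV and C.ADD say that the rs2 = x0 code points are reused for C.JR, C.JALR and C.EBREAK. The sketch below shows one way a decoder could tell the five apart. The enum and function names are hypothetical, and the field positions (bit 12, rd/rs1 in bits 11:7, rs2 in bits 6:2) are taken from the CR format in the C-extension specification rather than from the text above. The caller is assumed to have already matched op = 10 and funct3 = 100.

```cpp
#include <cstdint>

enum class CrOp { Jr, Mv, Ebreak, Jalr, Add, Reserved };

// Hypothetical classifier for the CR-format group (op = 0b10, funct3 = 0b100).
inline CrOp classify_cr(std::uint16_t insn) {
    const unsigned bit12 = (insn >> 12) & 1u;
    const unsigned rd_rs1 = (insn >> 7) & 0x1fu;
    const unsigned rs2 = (insn >> 2) & 0x1fu;
    if (bit12 == 0) {
        if (rs2 != 0) return CrOp::Mv;  // includes the rd = x0 HINT code points
        return rd_rs1 != 0 ? CrOp::Jr : CrOp::Reserved;
    }
    if (rs2 != 0) return CrOp::Add;     // includes the rd = x0 HINT code points
    return rd_rs1 != 0 ? CrOp::Jalr : CrOp::Ebreak;
}
```

With these assumptions, 0x8082 (c.jr ra) classifies as Jr, 0x852e (c.mv a0, a1) as Mv, and 0x9002 as Ebreak.
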
c.or rd_rs1_p, rs2_p

C.OR computes the bitwise OR of the values in registers rd' and rs2', then writes the result to register rd'. C.OR expands into or rd', rd', rs2'.

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RVC_RS1S(RVC_RS1S | RVC_RS2S);
c.sub rd_rs1_p, rs2_p

C.SUB subtracts the value in register rs2' from the value in register rd', then writes the result to register rd'. C.SUB expands into sub rd', rd', rs2'.

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RVC_RS1S(sext_xlen(RVC_RS1S - RVC_RS2S));
c.subw rd_rs1_p, rs2_p

C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in register rs2' from the value in register rd', then sign-extends the lower 32 bits of the difference before writing the result to register rd'. C.SUBW expands into subw rd', rd', rs2'.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require_rv64;
WRITE_RVC_RS1S(sext32(RVC_RS1S - RVC_RS2S));
c.xor rd_rs1_p, rs2_p

C.XOR computes the bitwise XOR of the values in registers rd' and rs2', then writes the result to register rd'. C.XOR expands into xor rd', rd', rs2'.

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RVC_RS1S(RVC_RS1S ^ RVC_RS2S);

c / load-and-store-instructions

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.3 Load and Store Instructions

Operation Arguments Description
c.fld rd_p, rs1_p, c_uimm8

C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd'. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to fld rd', offset(rs1').

Spike ISS Implementation:
require_extension(EXT_ZCD);
require_fp;
WRITE_RVC_FRS2S(f64(MMU.load<uint64_t>(RVC_RS1S + insn.rvc_ld_imm())));
c.flw rd_p, rs1_p, c_uimm7

C.FLW is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to flw rd', offset(rs1').

Spike ISS Implementation:
if (xlen == 32) {
require_extension(EXT_ZCF);
require_fp;
WRITE_RVC_FRS2S(f32(MMU.load<uint32_t>(RVC_RS1S + insn.rvc_lw_imm())));
} else { // c.ld
require_extension(EXT_ZCA);
WRITE_RVC_RS2S(MMU.load<int64_t>(RVC_RS1S + insn.rvc_ld_imm()));
}
c.fsd rs1_p, rs2_p, c_uimm8

C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to fsd rs2', offset(rs1').

Spike ISS Implementation:
require_extension(EXT_ZCD);
require_fp;
MMU.store<uint64_t>(RVC_RS1S + insn.rvc_ld_imm(), RVC_FRS2S.v[0]);
c.fsw rs1_p, rs2_p, c_uimm7

C.FSW is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to fsw rs2', offset(rs1').

Spike ISS Implementation:
if (xlen == 32) {
require_extension(EXT_ZCF);
require_fp;
MMU.store<uint32_t>(RVC_RS1S + insn.rvc_lw_imm(), RVC_FRS2S.v[0]);
} else { // c.sd
require_extension(EXT_ZCA);
MMU.store<uint64_t>(RVC_RS1S + insn.rvc_ld_imm(), RVC_RS2S);
}
c.ld rd_p, rs1_p, c_uimm8

C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to ld rd', offset(rs1').

c.lw rd_p, rs1_p, c_uimm7

C.LW loads a 32-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to lw rd', offset(rs1').

Spike ISS Implementation:
require_extension(EXT_ZCA);
WRITE_RVC_RS2S(MMU.load<int32_t>(RVC_RS1S + insn.rvc_lw_imm()));
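
The register-based loads and stores above all add a zero-extended, scaled offset to rs1'. The sketch below shows how that offset could be reassembled from a 16-bit encoding. The helper names are hypothetical, and the bit positions come from the CL/CS format tables in the C-extension specification, not from the text above. The results are assumed to correspond to what the Spike accessors rvc_lw_imm() and rvc_ld_imm() return.

```cpp
#include <cstdint>

// Extract `len` bits of `insn` starting at bit `lo`.
constexpr unsigned bits(std::uint16_t insn, unsigned lo, unsigned len) {
    return (insn >> lo) & ((1u << len) - 1u);
}

// Word accesses (C.LW, C.SW, C.FLW, C.FSW):
// offset[5:3] = insn[12:10], offset[2] = insn[6], offset[6] = insn[5].
constexpr unsigned cl_word_offset(std::uint16_t insn) {
    return (bits(insn, 6, 1) << 2) | (bits(insn, 10, 3) << 3) | (bits(insn, 5, 1) << 6);
}

// Doubleword accesses (C.LD, C.SD, C.FLD, C.FSD):
// offset[5:3] = insn[12:10], offset[7:6] = insn[6:5].
constexpr unsigned cl_dword_offset(std::uint16_t insn) {
    return (bits(insn, 10, 3) << 3) | (bits(insn, 5, 2) << 6);
}

// The 3-bit fields rd', rs1' and rs2' select x8..x15 (or f8..f15).
constexpr unsigned prime_reg(unsigned field3) { return 8u + (field3 & 7u); }
```

Because the low offset bits are implied by the scaling, the word form reaches byte offsets 0 to 124 in steps of 4 and the doubleword form reaches 0 to 248 in steps of 8.
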
c.sd rs1_p, rs2_p, c_uimm8

C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to sd rs2', offset(rs1').

c.sw rs1_p, rs2_p, c_uimm7

C.SW stores a 32-bit value in register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to sw rs2', offset(rs1').

Spike ISS Implementation:
require_extension(EXT_ZCA);
MMU.store<uint32_t>(RVC_RS1S + insn.rvc_lw_imm(), RVC_RS2S);

c / nop-instruction

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation Arguments Description
c.nop c_nzimm6

C.NOP is a CI-format instruction that does not change any user-visible state, except for advancing the pc and incrementing any applicable performance counters. C.NOP expands to nop. C.NOP is only valid when imm = 0; the code points with imm ≠ 0 encode HINTs.

c / stack-pointer-based-loads-and-stores

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.3 Load and Store Instructions

Operation Arguments Description
c.fldsp rd, c_uimm9sp

C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to fld rd, offset(x2).

Spike ISS Implementation:
require_extension(EXT_ZCD);
require_fp;
WRITE_FRD(f64(MMU.load<uint64_t>(RVC_SP + insn.rvc_ldsp_imm())));
c.flwsp rd, c_uimm8sp

C.FLWSP is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd. It computes its effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to flw rd, offset(x2).

Spike ISS Implementation:
if (xlen == 32) {
require_extension(EXT_ZCF);
require_fp;
WRITE_FRD(f32(MMU.load<uint32_t>(RVC_SP + insn.rvc_lwsp_imm())));
} else { // c.ldsp
require_extension(EXT_ZCA);
require(insn.rvc_rd() != 0);
WRITE_RD(MMU.load<int64_t>(RVC_SP + insn.rvc_ldsp_imm()));
}
c.fsdsp c_rs2, c_uimm9sp_s

C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to fsd rs2, offset(x2).

Spike ISS Implementation:
require_extension(EXT_ZCD);
require_fp;
MMU.store<uint64_t>(RVC_SP + insn.rvc_sdsp_imm(), RVC_FRS2.v[0]);
c.fswsp c_rs2, c_uimm8sp_s

C.FSWSP is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to fsw rs2, offset(x2).

Spike ISS Implementation:
if (xlen == 32) {
require_extension(EXT_ZCF);
require_fp;
MMU.store<uint32_t>(RVC_SP + insn.rvc_swsp_imm(), RVC_FRS2.v[0]);
} else { // c.sdsp
require_extension(EXT_ZCA);
MMU.store<uint64_t>(RVC_SP + insn.rvc_sdsp_imm(), RVC_RS2);
}
c.ldsp rd_n0, c_uimm9sp

C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register rd. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to ld rd, offset(x2). C.LDSP is only valid when rd ≠ x0; the code points with rd = x0 are reserved.

c.lwsp rd_n0, c_uimm8sp

C.LWSP loads a 32-bit value from memory into register rd. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to lw rd, offset(x2). C.LWSP is only valid when rd ≠ x0; the code points with rd = x0 are reserved.

Spike ISS Implementation:
require_extension(EXT_ZCA);
require(insn.rvc_rd() != 0);
WRITE_RD(MMU.load<int32_t>(RVC_SP + insn.rvc_lwsp_imm()));
c.sdsp c_rs2, c_uimm9sp_s

C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to sd rs2, offset(x2).

c.swsp c_rs2, c_uimm8sp_s

C.SWSP stores a 32-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to sw rs2, offset(x2).

Spike ISS Implementation:
require_extension(EXT_ZCA);
MMU.store<uint32_t>(RVC_SP + insn.rvc_swsp_imm(), RVC_RS2);
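
The stack-pointer-relative forms use the same "zero-extended offset, scaled by 4 or 8" idea, but scatter the offset bits differently for loads and stores. A sketch of the four decoders follows. All names are hypothetical, and the bit positions are taken from the CI and CSS format tables in the C-extension specification rather than from the text above. The results are assumed to match Spike's rvc_lwsp_imm(), rvc_ldsp_imm(), rvc_swsp_imm() and rvc_sdsp_imm().

```cpp
#include <cstdint>

// Extract `len` bits of `insn` starting at bit `lo`.
constexpr unsigned bits(std::uint16_t insn, unsigned lo, unsigned len) {
    return (insn >> lo) & ((1u << len) - 1u);
}

// C.LWSP / C.FLWSP: offset[5] = insn[12], offset[4:2] = insn[6:4], offset[7:6] = insn[3:2].
constexpr unsigned lwsp_offset(std::uint16_t insn) {
    return (bits(insn, 4, 3) << 2) | (bits(insn, 12, 1) << 5) | (bits(insn, 2, 2) << 6);
}

// C.LDSP / C.FLDSP: offset[5] = insn[12], offset[4:3] = insn[6:5], offset[8:6] = insn[4:2].
constexpr unsigned ldsp_offset(std::uint16_t insn) {
    return (bits(insn, 5, 2) << 3) | (bits(insn, 12, 1) << 5) | (bits(insn, 2, 3) << 6);
}

// C.SWSP / C.FSWSP: offset[5:2] = insn[12:9], offset[7:6] = insn[8:7].
constexpr unsigned swsp_offset(std::uint16_t insn) {
    return (bits(insn, 9, 4) << 2) | (bits(insn, 7, 2) << 6);
}

// C.SDSP / C.FSDSP: offset[5:3] = insn[12:10], offset[8:6] = insn[9:7].
constexpr unsigned sdsp_offset(std::uint16_t insn) {
    return (bits(insn, 10, 3) << 3) | (bits(insn, 7, 3) << 6);
}
```

Under these assumptions, 0x40b2 (c.lwsp ra, 12(sp)) and 0xc606 (c.swsp ra, 12(sp)) both decode to an offset of 12, even though the offset bits sit in different fields of the two encodings.
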

d

d standard extension for double precision floating point version 2.2 double precision floating point conversion and move instructions fld fsd sec:single float compute single precision floating point compare instructions
single precision floating point conversion and move instructions

d / d-standard-extension-for-double-precision-floating-point-version-2.2

13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.7 Double-Precision Floating-Point Classify Instruction

Operation Arguments Description
fclass.d rd, rs1

The double-precision floating-point classify instruction, FCLASS.D, is defined analogously to its single-precision counterpart, but operates on double-precision operands.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
WRITE_RD(f64_classify(FRS1_D));
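
Since the text only says FCLASS.D is analogous to FCLASS.S, a sketch of the classification on a double's bit pattern may help. The function name fclass_d is hypothetical, and the one-hot result bits (0: negative infinity, 1: negative normal, 2: negative subnormal, 3: negative zero, 4: positive zero, 5: positive subnormal, 6: positive normal, 7: positive infinity, 8: signaling NaN, 9: quiet NaN) follow the FCLASS table in the F-extension specification, which is not reproduced in the text above. The host is assumed to use IEEE 754 binary64 for double.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of the mask FCLASS.D writes to rd.
inline std::uint32_t fclass_d(double value) {
    std::uint64_t raw;
    std::memcpy(&raw, &value, sizeof raw);  // inspect the bits without converting
    const bool negative = (raw >> 63) != 0;
    const std::uint64_t exponent = (raw >> 52) & 0x7ffu;
    const std::uint64_t fraction = raw & 0x000fffffffffffffULL;

    if (exponent == 0x7ff) {
        if (fraction == 0) return negative ? (1u << 0) : (1u << 7);  // infinities
        return (fraction >> 51) != 0 ? (1u << 9) : (1u << 8);        // quiet : signaling NaN
    }
    if (exponent == 0) {
        if (fraction == 0) return negative ? (1u << 3) : (1u << 4);  // zeros
        return negative ? (1u << 2) : (1u << 5);                     // subnormals
    }
    return negative ? (1u << 1) : (1u << 6);                         // normals
}
```

Exactly one bit is set for any input, so software can test for a group of classes with a single AND against the result.
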

d / double-precision-floating-point-conversion-and-move-instructions

13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.5 Double-Precision Floating-Point Conversion and Move Instructions

Operation Arguments Description
fcvt.d.l rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

fcvt.d.lu rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

fcvt.d.s rd, rs1

The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers. The rs2 field encodes the datatype of the source, and the fmt field encodes the datatype of the destination. FCVT.S.D rounds according to the RM field; FCVT.D.S will never round.

fcvt.d.w rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

All floating-point to integer and integer to floating-point conversion instructions round according to the rm field. Note FCVT.D.W[U] always produces an exact result and is unaffected by rounding mode.

fcvt.d.wu rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

fcvt.l.d rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

fcvt.lu.d rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

fcvt.s.d rd, rs1

The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers. The rs2 field encodes the datatype of the source, and the fmt field encodes the datatype of the destination. FCVT.S.D rounds according to the RM field; FCVT.D.S will never round.

fcvt.w.d rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

fcvt.wu.d rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.

fmv.d rd, rs

For XLEN>=64 only, instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd. FMV.D.X moves the double-precision value encoded in IEEE 754-2008 standard encoding from the integer register rs1 to the floating-point register rd.

FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

Pseudo Opcode, Equivalent Operations:
fsgnj.d rd, rs, rs

fmv.x.d rd, rs1

For XLEN>=64 only, instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd. FMV.D.X moves the double-precision value encoded in IEEE 754-2008 standard encoding from the integer register rs1 to the floating-point register rd.

FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
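
The key point of FMV.X.D and FMV.D.X is that they move a bit pattern rather than convert a value. A minimal sketch of that distinction in C++ follows. The function names are hypothetical, and the host is assumed to represent double as IEEE 754 binary64.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of FMV.X.D: copy the 64 raw bits of a double into an integer.
inline std::uint64_t fmv_x_d(double value) {
    std::uint64_t raw;
    std::memcpy(&raw, &value, sizeof raw);
    return raw;
}

// Hypothetical model of FMV.D.X: reinterpret 64 integer bits as a double.
inline double fmv_d_x(std::uint64_t raw) {
    double value;
    std::memcpy(&value, &raw, sizeof value);
    return value;
}
```

By contrast, a cast such as static_cast<std::uint64_t>(1.0) performs a numeric conversion and yields 1, while fmv_x_d(1.0) yields 0x3ff0000000000000.
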

fsgnj.d rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), false, false));
fsgnjn.d rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), true, false));
fsgnjx.d rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), false, true));
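
The three Spike fragments above differ only in the two boolean arguments passed to fsgnj64. The sketch below is one possible reading of those flags, operating directly on 64-bit patterns. It is not Spike's definition of fsgnj64; the name fsgnj64_bits and the parameter names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kSignBit = 1ULL << 63;

// Take everything except the sign from `a`; derive the sign from `b`.
//   negate = false, xor_sign = false -> FSGNJ.D  (sign of b)
//   negate = true,  xor_sign = false -> FSGNJN.D (opposite of b's sign)
//   negate = false, xor_sign = true  -> FSGNJX.D (sign of a XOR sign of b)
constexpr std::uint64_t fsgnj64_bits(std::uint64_t a, std::uint64_t b,
                                     bool negate, bool xor_sign) {
    return (a & ~kSignBit) |
           (xor_sign ? ((a ^ b) & kSignBit)
                     : (negate ? (~b & kSignBit) : (b & kSignBit)));
}
```

Passing the same register for both operands gives the usual pseudoinstructions: FSGNJ.D copies the value, FSGNJN.D flips its sign, and FSGNJX.D clears the sign bit.
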

d / fld_fsd

13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.3 Double-Precision Load and Store Instructions

Operation Arguments Description
fld rd, rs1, imm12

The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd. FSD stores a double-precision value from the floating-point registers to memory.

FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN>=64.

FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

Spike ISS Implementation:
require_extension('D');
require_fp;
WRITE_FRD(f64(MMU.load<uint64_t>(RS1 + insn.i_imm())));
fsd rs1, rs2, imm12

The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd. FSD stores a double-precision value from the floating-point registers to memory.

FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN>=64.

FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

Spike ISS Implementation:
require_extension('D');
require_fp;
MMU.store<uint64_t>(RS1 + insn.s_imm(), FRS2.v[0]);

d / sec:single-float-compute

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions

Operation Arguments Description
fadd.d rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_add(FRS1_D, FRS2_D));
set_fp_exceptions;
fdiv.d rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_div(FRS1_D, FRS2_D));
set_fp_exceptions;
fmadd.d rd, rs1, rs2, rs3

FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_mulAdd(FRS1_D, FRS2_D, FRS3_D));
set_fp_exceptions;
fmax.d rd, rs1, rs2

Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.

Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
bool greater = f64_lt_quiet(FRS2_D, FRS1_D) ||
(f64_eq(FRS2_D, FRS1_D) && (FRS2_D.v & F64_SIGN));
if (isNaNF64UI(FRS1_D.v) && isNaNF64UI(FRS2_D.v))
WRITE_FRD_D(f64(defaultNaNF64UI));
else
WRITE_FRD_D((greater || isNaNF64UI(FRS2_D.v) ? FRS1_D : FRS2_D));
set_fp_exceptions;
fmin.d rd, rs1, rs2

Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.

Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
bool less = f64_lt_quiet(FRS1_D, FRS2_D) ||
(f64_eq(FRS1_D, FRS2_D) && (FRS1_D.v & F64_SIGN));
if (isNaNF64UI(FRS1_D.v) && isNaNF64UI(FRS2_D.v))
WRITE_FRD_D(f64(defaultNaNF64UI));
else
WRITE_FRD_D((less || isNaNF64UI(FRS2_D.v) ? FRS1_D : FRS2_D));
set_fp_exceptions;
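
The NaN and signed-zero rules for the minimum and maximum instructions can be written out as ordinary C++. This is a sketch of the result value only: it does not model the invalid operation flag, the names fmin_d and fmax_d are hypothetical, and it assumes the host's quiet_NaN() has the same bit pattern as the RISC-V canonical NaN.

```cpp
#include <cmath>
#include <limits>

// Hypothetical model of the FMIN.D result.
inline double fmin_d(double a, double b) {
    if (std::isnan(a) && std::isnan(b)) return std::numeric_limits<double>::quiet_NaN();
    if (std::isnan(a)) return b;                 // only one NaN: return the other operand
    if (std::isnan(b)) return a;
    if (a == b) return std::signbit(a) ? a : b;  // treats -0.0 as less than +0.0
    return a < b ? a : b;
}

// Hypothetical model of the FMAX.D result.
inline double fmax_d(double a, double b) {
    if (std::isnan(a) && std::isnan(b)) return std::numeric_limits<double>::quiet_NaN();
    if (std::isnan(a)) return b;
    if (std::isnan(b)) return a;
    if (a == b) return std::signbit(a) ? b : a;  // prefers +0.0 over -0.0
    return a > b ? a : b;
}
```

The explicit a == b branch is what separates this from a plain comparison: +0.0 and -0.0 compare equal, so the sign bit has to be consulted to pick the right zero.
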
fmsub.d rd, rs1, rs2, rs3

FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_mulAdd(FRS1_D, FRS2_D, f64(FRS3_D.v ^ F64_SIGN)));
set_fp_exceptions;
fmul.d rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_mul(FRS1_D, FRS2_D));
set_fp_exceptions;
fnmadd.d rd, rs1, rs2, rs3

FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_mulAdd(f64(FRS1_D.v ^ F64_SIGN), FRS2_D, f64(FRS3_D.v ^ F64_SIGN)));
set_fp_exceptions;
fnmsub.d rd, rs1, rs2, rs3

FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_mulAdd(f64(FRS1_D.v ^ F64_SIGN), FRS2_D, FRS3_D));
set_fp_exceptions;
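
The four fused multiply-add variants differ only in which inputs have their sign flipped, which is also how the Spike fragments above express them with a single f64_mulAdd call. A sketch using the standard std::fma follows. The function names are hypothetical, and the sketch uses whatever rounding mode is currently active on the host instead of the instruction's rm field.

```cpp
#include <cmath>

// Each function returns the value written to rd, computed with a single rounding.
inline double fmadd_d(double rs1, double rs2, double rs3)  { return std::fma(rs1, rs2, rs3); }    //  (rs1*rs2)+rs3
inline double fmsub_d(double rs1, double rs2, double rs3)  { return std::fma(rs1, rs2, -rs3); }   //  (rs1*rs2)-rs3
inline double fnmsub_d(double rs1, double rs2, double rs3) { return std::fma(-rs1, rs2, rs3); }   // -(rs1*rs2)+rs3
inline double fnmadd_d(double rs1, double rs2, double rs3) { return std::fma(-rs1, rs2, -rs3); }  // -(rs1*rs2)-rs3
```

Note the naming: FNMSUB adds rs3 and FNMADD subtracts it, because the names describe the operation before the overall negation is applied.
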
fsqrt.d rd, rs1

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_sqrt(FRS1_D));
set_fp_exceptions;
fsub.d rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_D(f64_sub(FRS1_D, FRS2_D));
set_fp_exceptions;

d / single-precision-floating-point-compare-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions

Operation Arguments Description
feq.d rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
WRITE_RD(f64_eq(FRS1_D, FRS2_D));
set_fp_exceptions;
fle.d rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
WRITE_RD(f64_le(FRS1_D, FRS2_D));
set_fp_exceptions;
flt.d rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_either_extension('D', EXT_ZDINX);
require_fp;
WRITE_RD(f64_lt(FRS1_D, FRS2_D));
set_fp_exceptions;

d / single-precision-floating-point-conversion-and-move-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.7 Single-Precision Floating-Point Conversion and Move Instructions

Operation Arguments Description
fabs.d rd, rs

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

Pseudo Opcode, Equivalent Operations:
fsgnjx.d rd, rs, rs

fmv.d.x rd, rs1

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.

fneg.d rd, rs

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

Pseudo Opcode, Equivalent Operations:
fsgnjn.d rd, rs, rs

f

floating point control and status register sec:single float sec:single float compute single precision floating point compare instructions single precision floating point conversion and move instructions
single precision load and store instructions

f / floating-point-control-and-status-register

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.2 Floating-Point Control and Status Register

Operation Arguments Description
frcsr rd

The fcsr register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the underlying CSR access instructions. FRCSR reads fcsr by copying it into integer register rd. FSCSR swaps the value in fcsr by copying the original value into integer register rd, and then writing a new value obtained from integer register rs1 into fcsr.

frflags rd

The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.

frrm rd

The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.

fscsr rd, rs1

The fcsr register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the underlying CSR access instructions. FRCSR reads fcsr by copying it into integer register rd. FSCSR swaps the value in fcsr by copying the original value into integer register rd, and then writing a new value obtained from integer register rs1 into fcsr.

fsflags rd, rs1

The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.

fsrm rd, rs1

The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
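The read and swap behaviour described above can be sketched as a small Python model. The field layout used here (fflags in bits 4:0, frm in bits 7:5) follows the F extension's definition of fcsr; the class and method names are illustrative only and are not taken from Spike or any other tool.

```python
class Fcsr:
    """Toy model of fcsr: frm in bits 7:5, fflags (NV, DZ, OF, UF, NX) in bits 4:0."""

    FFLAGS_MASK = 0x1F
    FRM_MASK = 0x7
    FRM_SHIFT = 5

    def __init__(self, value=0):
        self.value = value & 0xFF

    def frcsr(self):
        # FRCSR: copy the whole register into rd.
        return self.value

    def fscsr(self, rs1):
        # FSCSR: return the old value, then write the new one.
        old = self.value
        self.value = rs1 & 0xFF
        return old

    def frrm(self):
        # FRRM: rounding mode in the low three bits, zero elsewhere.
        return (self.value >> self.FRM_SHIFT) & self.FRM_MASK

    def fsrm(self, rs1):
        # FSRM: swap frm, using only the three least-significant bits of rs1.
        old = self.frrm()
        self.value = (self.value & self.FFLAGS_MASK) | (
            (rs1 & self.FRM_MASK) << self.FRM_SHIFT
        )
        return old

    def frflags(self):
        return self.value & self.FFLAGS_MASK

    def fsflags(self, rs1):
        # FSFLAGS: swap the accrued exception flags, leaving frm untouched.
        old = self.frflags()
        self.value = (self.value & ~self.FFLAGS_MASK & 0xFF) | (
            rs1 & self.FFLAGS_MASK
        )
        return old
```

Each "swap" method returns the value that would land in rd, which mirrors how the pseudoinstructions expand to a CSR read-and-write.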

f / sec:single-float

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.9 Single-Precision Floating-Point Classify Instruction

Operation Arguments Description
fclass.s rd, rs1

The FCLASS.S instruction examines the value in floating-point register rs1 and writes to integer register rd a 10-bit mask that indicates the class of the floating-point number. The format of the mask is described in Table [tab:fclass]. The corresponding bit in rd will be set if the property is true and clear otherwise. All other bits in rd are cleared. Note that exactly one bit in rd will be set. FCLASS.S does not set the floating-point exception flags.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
WRITE_RD(f32_classify(FRS1_F));
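
Since the mask table itself is not reproduced here, the following Python sketch spells out the ten classes. The bit positions follow the classify table in the F extension (bit 0 is negative infinity, bit 9 is quiet NaN); the function and constant names are made up for illustration and this is not Spike's `f32_classify`.

```python
import struct

# Bit positions of the 10-bit FCLASS mask.
NEG_INF, NEG_NORMAL, NEG_SUBNORMAL, NEG_ZERO = 0, 1, 2, 3
POS_ZERO, POS_SUBNORMAL, POS_NORMAL, POS_INF = 4, 5, 6, 7
SIGNALING_NAN, QUIET_NAN = 8, 9


def fclass_s(bits):
    """Return the one-hot class mask for a 32-bit single-precision pattern."""
    sign = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        if frac == 0:
            pos = NEG_INF if sign else POS_INF
        else:
            # The top fraction bit distinguishes quiet from signaling NaNs.
            pos = QUIET_NAN if frac & 0x400000 else SIGNALING_NAN
    elif exp == 0:
        if frac == 0:
            pos = NEG_ZERO if sign else POS_ZERO
        else:
            pos = NEG_SUBNORMAL if sign else POS_SUBNORMAL
    else:
        pos = NEG_NORMAL if sign else POS_NORMAL
    return 1 << pos


def f32_bits(x):
    """Bit pattern of a Python float rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

For example, `fclass_s(f32_bits(-0.0))` sets only bit 3, and any pattern produces exactly one set bit.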

f / sec:single-float-compute

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions

Operation Arguments Description
fadd.s rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_add(FRS1_F, FRS2_F));
set_fp_exceptions;
fdiv.s rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_div(FRS1_F, FRS2_F));
set_fp_exceptions;
fmadd.s rd, rs1, rs2, rs3

FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_mulAdd(FRS1_F, FRS2_F, FRS3_F));
set_fp_exceptions;
fmax.s rd, rs1, rs2

Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value −0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.

Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
bool greater = f32_lt_quiet(FRS2_F, FRS1_F) ||
(f32_eq(FRS2_F, FRS1_F) && (FRS2_F.v & F32_SIGN));
if (isNaNF32UI(FRS1_F.v) && isNaNF32UI(FRS2_F.v))
WRITE_FRD_F(f32(defaultNaNF32UI));
else
WRITE_FRD_F((greater || isNaNF32UI(FRS2_F.v) ? FRS1_F : FRS2_F));
set_fp_exceptions;
fmin.s rd, rs1, rs2

Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value −0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.

Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
bool less = f32_lt_quiet(FRS1_F, FRS2_F) ||
(f32_eq(FRS1_F, FRS2_F) && (FRS1_F.v & F32_SIGN));
if (isNaNF32UI(FRS1_F.v) && isNaNF32UI(FRS2_F.v))
WRITE_FRD_F(f32(defaultNaNF32UI));
else
WRITE_FRD_F((less || isNaNF32UI(FRS2_F.v) ? FRS1_F : FRS2_F));
set_fp_exceptions;
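
The zero and NaN rules for FMIN.S and FMAX.S are easy to get wrong, so here is a minimal Python sketch of the value selection. It works on Python floats rather than 32-bit patterns, so it cannot distinguish signaling NaNs and does not model the invalid flag; `fmin_s` and `fmax_s` are illustrative names.

```python
import math


def _is_negative(x):
    # copysign sees the sign of zero, which a plain "x < 0" test does not.
    return math.copysign(1.0, x) < 0


def fmin_s(a, b):
    if math.isnan(a) and math.isnan(b):
        return math.nan          # stands in for the canonical NaN
    if math.isnan(a):
        return b                 # one NaN: return the other operand
    if math.isnan(b):
        return a
    if a == b:                   # also true for -0.0 versus +0.0
        return a if _is_negative(a) else b
    return a if a < b else b


def fmax_s(a, b):
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b:
        return b if _is_negative(a) else a
    return a if a > b else b
```

With this model `fmin_s(0.0, -0.0)` yields −0.0 and `fmax_s(-0.0, 0.0)` yields +0.0, matching the ordering rule stated above.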
fmsub.s rd, rs1, rs2, rs3

FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_mulAdd(FRS1_F, FRS2_F, f32(FRS3_F.v ^ F32_SIGN)));
set_fp_exceptions;
fmul.s rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_mul(FRS1_F, FRS2_F));
set_fp_exceptions;
fnmadd.s rd, rs1, rs2, rs3

FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_mulAdd(f32(FRS1_F.v ^ F32_SIGN), FRS2_F, f32(FRS3_F.v ^ F32_SIGN)));
set_fp_exceptions;
fnmsub.s rd, rs1, rs2, rs3

FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_mulAdd(f32(FRS1_F.v ^ F32_SIGN), FRS2_F, FRS3_F));
set_fp_exceptions;
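
The Spike snippets for FMADD.S, FMSUB.S, FNMSUB.S and FNMADD.S all call the same `f32_mulAdd` and differ only in which operands have their sign flipped. The sketch below shows that sign convention in Python using exact rational arithmetic. It deliberately ignores rounding, NaNs and exception flags, and the function names are illustrative.

```python
from fractions import Fraction


def mul_add(a, b, c):
    """Exact (a * b) + c, standing in for a fused multiply-add."""
    return Fraction(a) * Fraction(b) + Fraction(c)


def fmadd(rs1, rs2, rs3):
    return mul_add(rs1, rs2, rs3)      #  (rs1 * rs2) + rs3


def fmsub(rs1, rs2, rs3):
    return mul_add(rs1, rs2, -rs3)     #  (rs1 * rs2) - rs3


def fnmsub(rs1, rs2, rs3):
    return mul_add(-rs1, rs2, rs3)     # -(rs1 * rs2) + rs3


def fnmadd(rs1, rs2, rs3):
    return mul_add(-rs1, rs2, -rs3)    # -(rs1 * rs2) - rs3
```

Note the naming trap: FNMSUB adds rs3 and FNMADD subtracts it, because the "N" negates the whole FMSUB or FMADD result.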
fsqrt.s rd, rs1

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_sqrt(FRS1_F));
set_fp_exceptions;
fsub.s rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD_F(f32_sub(FRS1_F, FRS2_F));
set_fp_exceptions;
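
To see what "single-precision" means for the results of these instructions, the following Python sketch rounds every result back to 32 bits. It assumes round-to-nearest-even only (the RM field is not modelled), sets no exception flags, and relies on double precision being wide enough that computing in double and then rounding to single gives the correctly rounded single result for these five operations. The helper names are illustrative.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fadd_s(a, b):
    return to_f32(to_f32(a) + to_f32(b))


def fsub_s(a, b):
    return to_f32(to_f32(a) - to_f32(b))   # subtracts rs2 from rs1


def fmul_s(a, b):
    return to_f32(to_f32(a) * to_f32(b))


def fdiv_s(a, b):
    # Python raises ZeroDivisionError for b == 0; hardware would return
    # an infinity or NaN and set a flag instead.
    return to_f32(to_f32(a) / to_f32(b))


def fsqrt_s(a):
    return to_f32(math.sqrt(to_f32(a)))
```

For instance, `fadd_s(16777216.0, 1.0)` returns 16777216.0, because 2^24 + 1 is not representable in single precision and the tie rounds to the even neighbour.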

f / single-precision-floating-point-compare-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions

Operation Arguments Description
feq.s rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
WRITE_RD(f32_eq(FRS1_F, FRS2_F));
set_fp_exceptions;
fle.s rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
WRITE_RD(f32_le(FRS1_F, FRS2_F));
set_fp_exceptions;
flt.s rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
WRITE_RD(f32_lt(FRS1_F, FRS2_F));
set_fp_exceptions;
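
The difference between the quiet FEQ.S and the signaling FLT.S and FLE.S can be sketched in Python on raw 32-bit patterns. Each function returns a pair of the value written to rd and the flag bits that would be raised. `NV = 0x10` is the invalid-operation bit of fflags; all function names are illustrative, and this is a model of the rules above rather than Spike's softfloat code.

```python
import struct

NV = 0x10  # invalid operation flag bit in fflags


def _value(bits):
    """Interpret a 32-bit pattern as a single-precision number."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def _is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def _is_snan(bits):
    # A signaling NaN has the top fraction bit clear.
    return _is_nan(bits) and not (bits & 0x00400000)


def feq_s(a, b):
    """Quiet comparison: only signaling NaNs raise NV."""
    flags = NV if (_is_snan(a) or _is_snan(b)) else 0
    if _is_nan(a) or _is_nan(b):
        return 0, flags
    return int(_value(a) == _value(b)), flags


def _signaling_compare(a, b, op):
    """Signaling comparison: any NaN input raises NV."""
    if _is_nan(a) or _is_nan(b):
        return 0, NV
    return int(op(_value(a), _value(b))), 0


def flt_s(a, b):
    return _signaling_compare(a, b, lambda x, y: x < y)


def fle_s(a, b):
    return _signaling_compare(a, b, lambda x, y: x <= y)
```

One consequence: comparing a register with itself using FEQ.S gives 0 exactly when it holds a NaN, and does so without raising the invalid flag for a quiet NaN.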

f / single-precision-floating-point-conversion-and-move-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.7 Single-Precision Floating-Point Conversion and Move Instructions

Operation Arguments Description
fabs.s rd, rs

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

Pseudo Opcode, Equivalent Operations:
fsgnjx.s rd, rs, rs

fcvt.l.s rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.

fcvt.lu.s rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.

fcvt.s.l rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.

fcvt.s.lu rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.

fcvt.s.w rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.

All floating-point to integer and integer to floating-point conversion instructions round according to the rm field. A floating-point register can be initialized to floating-point positive zero using FCVT.S.W rd, x0, which will never set any exception flags.

fcvt.s.wu rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.

fcvt.w.s rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.

fcvt.wu.s rd, rs1

Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
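
The clipping behaviour for out-of-range inputs can be sketched as follows. This Python model assumes round-to-nearest-even (Python's built-in `round`), reports only the invalid flag (`NV = 0x10`, not the inexact flag), and uses the saturation values from the spec's conversion table: NaN and positive overflow give the largest value, negative overflow gives the smallest. All names are illustrative.

```python
import math

NV = 0x10  # invalid operation flag bit in fflags


def _convert(x, lo, hi):
    """Round x to an integer and saturate to [lo, hi]; returns (value, flags)."""
    if math.isnan(x):
        return hi, NV                       # NaN converts to the largest value
    if math.isinf(x):
        return (hi if x > 0 else lo), NV
    r = round(x)                            # round-half-to-even
    if r < lo:
        return lo, NV
    if r > hi:
        return hi, NV
    return r, 0


def fcvt_w_s(x):
    return _convert(x, -2**31, 2**31 - 1)


def fcvt_wu_s(x):
    return _convert(x, 0, 2**32 - 1)


def sext32(value):
    """Sign-extend a 32-bit result into a 64-bit register image (RV64)."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value | 0xFFFFFFFF00000000
    return value
```

On RV64 even the unsigned FCVT.WU.S result is sign-extended, so a saturated result of 2^32 − 1 appears in rd as all ones: `sext32(0xFFFFFFFF)` is `0xFFFFFFFFFFFFFFFF`.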

fmv.s rd, rs

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.

Pseudo Opcode, Equivalent Operations:
fsgnj.s rd, rs, rs

fmv.w.x rd, rs1

FMV.W.X moves the single-precision value encoded in IEEE 754-2008 standard encoding from the lower 32 bits of integer register rs1 to the floating-point register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved.

The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.

fmv.x.s rd, rs1

The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.

fmv.x.w rd, rs1

Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.W moves the single-precision value in floating-point register rs1 represented in IEEE 754-2008 encoding to the lower 32 bits of integer register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved. For RV64, the higher 32 bits of the destination register are filled with copies of the floating-point number's sign bit.

The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.
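
The two move instructions copy bits without interpreting them, which the following Python sketch makes explicit. It assumes 64-bit register images: `fmv_x_w` sign-extends into an RV64 integer register as described above, and `fmv_w_x` NaN-boxes the value (upper 32 bits all ones) on the assumption that the floating-point registers are 64 bits wide, as they are when the D extension is present. The function names and the `xlen` and `flen` parameters are illustrative.

```python
def fmv_x_w(freg, xlen=64):
    """Copy the low 32 bits of a floating-point register to an integer register."""
    low = freg & 0xFFFFFFFF
    if xlen == 64 and (low & 0x80000000):
        # RV64: the upper half is filled with copies of the float's sign bit.
        return low | 0xFFFFFFFF00000000
    return low


def fmv_w_x(xreg, flen=64):
    """Copy the low 32 bits of an integer register to a floating-point register."""
    low = xreg & 0xFFFFFFFF
    if flen == 64:
        # NaN-boxing: a narrower value sits below an all-ones upper half.
        return low | 0xFFFFFFFF00000000
    return low
```

Because no arithmetic is involved, a NaN payload such as `0x7FC12345` survives the round trip unchanged.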

fneg.s rd, rs

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

Pseudo Opcode, Equivalent Operations:
fsgnjn.s rd, rs, rs

fsgnj.s rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), false, false));
fsgnjn.s rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), true, false));
fsgnjx.s rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).

Spike ISS Implementation:
require_either_extension('F', EXT_ZFINX);
require_fp;
WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), false, true));
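
The sign-injection rules above can be sketched directly on raw bit patterns. This is a minimal model, assuming single-precision values are held as `uint32_t` IEEE 754 binary32 encodings; the names `fsgnj_s`, `fsgnjn_s`, `fsgnjx_s` and `F32_SIGN` are illustrative and are not Spike's.

```c
#include <stdint.h>

/* Sign bit of an IEEE 754 binary32 value held as a raw 32-bit pattern. */
#define F32_SIGN UINT32_C(0x80000000)

/* FSGNJ.S model: all bits except the sign from a, sign bit from b. */
static uint32_t fsgnj_s(uint32_t a, uint32_t b)
{
    return (a & ~F32_SIGN) | (b & F32_SIGN);
}

/* FSGNJN.S model: all bits except the sign from a, inverted sign bit of b. */
static uint32_t fsgnjn_s(uint32_t a, uint32_t b)
{
    return (a & ~F32_SIGN) | (~b & F32_SIGN);
}

/* FSGNJX.S model: sign bit is sign(a) XOR sign(b), other bits from a. */
static uint32_t fsgnjx_s(uint32_t a, uint32_t b)
{
    return a ^ (b & F32_SIGN);
}
```

Passing the same value for both operands reproduces the pseudoinstructions: `fsgnj_s(x, x)` behaves like FMV.S, `fsgnjn_s(x, x)` like FNEG.S, and `fsgnjx_s(x, x)` like FABS.S. No bits other than the sign are touched, so NaN payloads pass through unchanged.
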
neg rd, rs

The sign-injection instructions provide floating-point MV, ABS, and NEG, as well as supporting a few other operations, including the IEEE copySign operation and sign manipulation in transcendental math function libraries. Although MV, ABS, and NEG only need a single register operand, whereas FSGNJ instructions need two, it is unlikely most microarchitectures would add optimizations to benefit from the reduced number of register reads for these relatively infrequent instructions. Even in this case, a microarchitecture can simply detect when both source registers are the same for FSGNJ instructions and only read a single copy.

Pseudo Opcode, Equivalent Operations:
sub rd, x0, rs

f / single-precision-load-and-store-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.5 Single-Precision Load and Store Instructions

Operation Arguments Description
flw rd, rs1, imm12

Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register rs1 and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd. FSW stores a single-precision value from floating-point register rs2 to memory.

FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned.

FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

Spike ISS Implementation:
require_extension('F');
require_fp;
WRITE_FRD(f32(MMU.load<uint32_t>(RS1 + insn.i_imm())));
fsw rs1, rs2, imm12

Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register rs1 and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd. FSW stores a single-precision value from floating-point register rs2 to memory.

FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned.

FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

Spike ISS Implementation:
require_extension('F');
require_fp;
MMU.store<uint32_t>(RS1 + insn.s_imm(), FRS2.v[0]);
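
The base+offset addressing described above can be sketched as follows. This is a minimal model, assuming RV32 (addresses wrap modulo 2^32); `sext12`, `effective_address` and `is_word_aligned` are hypothetical helper names used only for illustration.

```c
#include <stdint.h>

/* Sign-extend the 12-bit immediate field (bits 11:0) to 32 bits. */
static int32_t sext12(uint32_t imm12)
{
    imm12 &= 0xFFFu;
    return (int32_t)(imm12 ^ 0x800u) - 0x800;
}

/* Effective address: base register plus signed byte offset (wraps on RV32). */
static uint32_t effective_address(uint32_t base, uint32_t imm12)
{
    return base + (uint32_t)sext12(imm12);
}

/* Natural alignment for the 4-byte access performed by FLW and FSW. */
static int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}
```

With a base of `0x1000` and an immediate field of `0xFFC` (that is, -4), the access goes to `0xFFC`, which is naturally aligned and therefore falls under the atomicity guarantee quoted above.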

m

division operations multiplication operations

m / division-operations

8 “M” Standard Extension for Integer Multiplication and Division, Version 2.0 / 8.2 Division Operations

Operation Arguments Description
div rd, rs1, rs2

DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend.

If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides.

Spike ISS Implementation:
require_extension('M');
sreg_t lhs = sext_xlen(RS1);
sreg_t rhs = sext_xlen(RS2);
if (rhs == 0)
  WRITE_RD(UINT64_MAX);
else if (lhs == INT64_MIN && rhs == -1)
  WRITE_RD(lhs);
else
  WRITE_RD(sext_xlen(lhs / rhs));
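
As a rough illustration of the signed semantics, here is a 32-bit C model of DIV and REM. It is a sketch assuming XLEN=32, with the divide-by-zero and overflow results taken from the Spike snippets on this page; `rv32_div` and `rv32_rem` are illustrative names.

```c
#include <stdint.h>

/* DIV model for XLEN=32: signed division, rounding towards zero. */
static int32_t rv32_div(int32_t rs1, int32_t rs2)
{
    if (rs2 == 0)
        return -1;                      /* divide by zero: all bits set */
    if (rs1 == INT32_MIN && rs2 == -1)
        return rs1;                     /* overflow: quotient is the dividend */
    return rs1 / rs2;                   /* C99 '/' truncates towards zero */
}

/* REM model for XLEN=32: the result takes the sign of the dividend. */
static int32_t rv32_rem(int32_t rs1, int32_t rs2)
{
    if (rs2 == 0)
        return rs1;                     /* divide by zero: the dividend */
    if (rs1 == INT32_MIN && rs2 == -1)
        return 0;                       /* overflow case; also avoids C undefined behaviour */
    return rs1 % rs2;
}
```

For every divisor other than zero, and outside the overflow case, the pair satisfies `rv32_div(a, b) * b + rv32_rem(a, b) == a`, which is the relationship the fused DIV/REM sequence relies on. For example, -7 divided by 2 gives a quotient of -3 and a remainder of -1.
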
divu rd, rs1, rs2

DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend.

Spike ISS Implementation:
require_extension('M');
reg_t lhs = zext_xlen(RS1);
reg_t rhs = zext_xlen(RS2);
if (rhs == 0)
  WRITE_RD(UINT64_MAX);
else
  WRITE_RD(sext_xlen(lhs / rhs));
divuw rd, rs1, rs2

DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero.

Spike ISS Implementation:
require_extension('M');
require_rv64;
reg_t lhs = zext32(RS1);
reg_t rhs = zext32(RS2);
if (rhs == 0)
  WRITE_RD(UINT64_MAX);
else
  WRITE_RD(sext32(lhs / rhs));
divw rd, rs1, rs2

DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero.

Spike ISS Implementation:
require_extension('M');
require_rv64;
sreg_t lhs = sext32(RS1);
sreg_t rhs = sext32(RS2);
if (rhs == 0)
  WRITE_RD(UINT64_MAX);
else
  WRITE_RD(sext32(lhs / rhs));
rem rd, rs1, rs2

DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend.

If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides.

Spike ISS Implementation:
require_extension('M');
sreg_t lhs = sext_xlen(RS1);
sreg_t rhs = sext_xlen(RS2);
if (rhs == 0)
  WRITE_RD(lhs);
else if (lhs == INT64_MIN && rhs == -1)
  WRITE_RD(0);
else
  WRITE_RD(sext_xlen(lhs % rhs));
remu rd, rs1, rs2

DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend.

Spike ISS Implementation:
require_extension('M');
reg_t lhs = zext_xlen(RS1);
reg_t rhs = zext_xlen(RS2);
if (rhs == 0)
  WRITE_RD(sext_xlen(RS1));
else
  WRITE_RD(sext_xlen(lhs % rhs));
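
The unsigned forms treat the same register bits as non-negative values, so there is no overflow case. A minimal sketch, again assuming XLEN=32 and using the illustrative names `rv32_divu` and `rv32_remu`:

```c
#include <stdint.h>

/* DIVU model for XLEN=32: divide by zero yields all bits set. */
static uint32_t rv32_divu(uint32_t rs1, uint32_t rs2)
{
    return rs2 == 0 ? UINT32_MAX : rs1 / rs2;
}

/* REMU model for XLEN=32: divide by zero yields the dividend. */
static uint32_t rv32_remu(uint32_t rs1, uint32_t rs2)
{
    return rs2 == 0 ? rs1 : rs1 % rs2;
}
```

The bit pattern `0xFFFFFFFE` divided by 2 gives `0x7FFFFFFF` here, whereas a signed DIV would read the same pattern as -2 and produce -1.
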
remuw rd, rs1, rs2

DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero.

Spike ISS Implementation:
require_extension('M');
require_rv64;
reg_t lhs = zext32(RS1);
reg_t rhs = zext32(RS2);
if (rhs == 0)
  WRITE_RD(sext32(lhs));
else
  WRITE_RD(sext32(lhs % rhs));
remw rd, rs1, rs2

DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero.

Spike ISS Implementation:
require_extension('M');
require_rv64;
sreg_t lhs = sext32(RS1);
sreg_t rhs = sext32(RS2);
if (rhs == 0)
  WRITE_RD(lhs);
else
  WRITE_RD(sext32(lhs % rhs));
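
The RV64 word variants can be modelled in the same way. The sketch below follows the structure of the Spike snippets above; the function names and the local `sext32` helper are illustrative, and the narrowing cast assumes the usual two's-complement wraparound that mainstream compilers provide.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of x to 64 bits (assumes two's-complement narrowing). */
static int64_t sext32(uint64_t x)
{
    return (int64_t)(int32_t)(uint32_t)x;
}

/* DIVW model: signed divide of the low 32 bits, result sign-extended. */
static int64_t rv64_divw(uint64_t rs1, uint64_t rs2)
{
    int64_t lhs = sext32(rs1), rhs = sext32(rs2);
    if (rhs == 0)
        return -1;
    /* INT32_MIN / -1 is 2^31, which fits in 64 bits and wraps back to INT32_MIN. */
    return sext32((uint64_t)(lhs / rhs));
}

/* DIVUW model: unsigned divide of the low 32 bits, result sign-extended. */
static int64_t rv64_divuw(uint64_t rs1, uint64_t rs2)
{
    uint64_t lhs = (uint32_t)rs1, rhs = (uint32_t)rs2;
    return rhs == 0 ? -1 : sext32(lhs / rhs);
}

/* REMW model: signed remainder of the low 32 bits. */
static int64_t rv64_remw(uint64_t rs1, uint64_t rs2)
{
    int64_t lhs = sext32(rs1), rhs = sext32(rs2);
    return rhs == 0 ? lhs : sext32((uint64_t)(lhs % rhs));
}

/* REMUW model: unsigned remainder, sign-extended even on divide by zero. */
static int64_t rv64_remuw(uint64_t rs1, uint64_t rs2)
{
    uint64_t lhs = (uint32_t)rs1, rhs = (uint32_t)rs2;
    return rhs == 0 ? sext32(lhs) : sext32(lhs % rhs);
}
```

Because the 32-bit operands are widened before dividing, the signed overflow case needs no explicit branch. The upper 32 bits of the source registers are ignored, so `rv64_divw(0xDEADBEEF00000008, 2)` yields 4.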

m / multiplication-operations

8 “M” Standard Extension for Integer Multiplication and Division, Version 2.0 / 8.1 Multiplication Operations

Operation Arguments Description
mul rd, rs1, rs2

MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.

In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear. If the arguments are not known to be sign- or zero-extended, an alternative is to shift both arguments left by 32 bits, then use MULH[[S]U].

Spike ISS Implementation:
require_either_extension('M', EXT_ZMMUL);
WRITE_RD(sext_xlen(RS1 * RS2));
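
The RV64 remark above, that MUL alone can deliver the upper 32 bits of a 32×32 product when the operands are properly extended, can be sketched in C. This is a minimal model; `rv64_mul`, `high32_signed` and `high32_unsigned` are hypothetical names, and the casts stand in for operands that already sit sign- or zero-extended in 64-bit registers.

```c
#include <stdint.h>

/* RV64 MUL model: low 64 bits of the product (unsigned wraparound is well defined). */
static uint64_t rv64_mul(uint64_t rs1, uint64_t rs2)
{
    return rs1 * rs2;
}

/* Upper 32 bits of the signed 32x32 product: operands sign-extended first. */
static uint32_t high32_signed(int32_t a, int32_t b)
{
    uint64_t p = rv64_mul((uint64_t)(int64_t)a, (uint64_t)(int64_t)b);
    return (uint32_t)(p >> 32);
}

/* Upper 32 bits of the unsigned 32x32 product: operands zero-extended first. */
static uint32_t high32_unsigned(uint32_t a, uint32_t b)
{
    uint64_t p = rv64_mul((uint64_t)a, (uint64_t)b);
    return (uint32_t)(p >> 32);
}
```

The extension matters: the 32-bit pattern `0xFFFFFFFF` multiplied by itself has an upper word of 0 when read as -1 × -1, but `0xFFFFFFFE` when read as an unsigned value.
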
mulh rd, rs1, rs2

MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.

In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear. If the arguments are not known to be sign- or zero-extended, an alternative is to shift both arguments left by 32 bits, then use MULH[[S]U].

Spike ISS Implementation:
require_either_extension('M', EXT_ZMMUL);
if (xlen == 64)
  WRITE_RD(mulh(RS1, RS2));
else
  WRITE_RD(sext32((sext32(RS1) * sext32(RS2)) >> 32));
mulhsu rd, rs1, rs2

MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.

MULHSU is used in multi-word signed multiplication to multiply the most-significant word of the multiplicand (which contains the sign bit) with the less-significant words of the multiplier (which are unsigned).

Spike ISS Implementation:
require_either_extension('M', EXT_ZMMUL);
if (xlen == 64)
  WRITE_RD(mulhsu(RS1, RS2));
else
  WRITE_RD(sext32((sext32(RS1) * reg_t((uint32_t)RS2)) >> 32));
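
To illustrate the multi-word use of MULHSU, the sketch below multiplies a signed 64-bit multiplicand (held as two 32-bit words) by an unsigned 32-bit multiplier, producing a 96-bit result. It is a simplified case chosen for illustration, assuming RV32-style 32-bit words; `mul32`, `mulhu32`, `mulhsu32`, `prod96` and `mul_s64_u32` are hypothetical names that model the instructions rather than any library API.

```c
#include <stdint.h>

/* 32-bit models of MUL, MULHU and MULHSU, computed through a 64-bit product. */
static uint32_t mul32(uint32_t a, uint32_t b)
{
    return a * b;
}

static uint32_t mulhu32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

static uint32_t mulhsu32(int32_t a, uint32_t b)
{
    /* signed a times unsigned b; the magnitude stays below 2^63 */
    int64_t p = (int64_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* 96-bit two's-complement result, least-significant word first. */
typedef struct { uint32_t w[3]; } prod96;

static prod96 mul_s64_u32(int32_t a_hi, uint32_t a_lo, uint32_t b)
{
    prod96 r;
    uint32_t lo0 = mul32(a_lo, b);            /* unsigned low word x multiplier */
    uint32_t hi0 = mulhu32(a_lo, b);
    uint32_t lo1 = mul32((uint32_t)a_hi, b);  /* signed high word x multiplier */
    uint32_t hi1 = mulhsu32(a_hi, b);
    uint32_t carry;

    r.w[0] = lo0;
    r.w[1] = hi0 + lo1;
    carry  = r.w[1] < hi0;                    /* carry out of the middle word */
    r.w[2] = hi1 + carry;
    return r;
}
```

Only the word that carries the sign bit goes through MULHSU; the lower word uses MULHU. Multiplying -1 (`a_hi = -1`, `a_lo = 0xFFFFFFFF`) by 2 yields the words `0xFFFFFFFE`, `0xFFFFFFFF`, `0xFFFFFFFF`, which is -2 in 96 bits.
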
mulhu rd, rs1, rs2

MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.

Spike ISS Implementation:
require_either_extension('M', EXT_ZMMUL);
if (xlen == 64)
  WRITE_RD(mulhu(RS1, RS2));
else
  WRITE_RD(sext32(((uint64_t)(uint32_t)RS1 * (uint64_t)(uint32_t)RS2) >> 32));
mulw rd, rs1, rs2

MULW is an RV64 instruction that multiplies the lower 32 bits of the source registers, placing the sign-extension of the lower 32 bits of the result into the destination register.

Spike ISS Implementation:
require_either_extension('M', EXT_ZMMUL);
require_rv64;
WRITE_RD(sext32(RS1 * RS2));
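
A minimal C model of MULW, assuming 64-bit registers passed as `uint64_t`; the name `rv64_mulw` is illustrative, and the final narrowing cast relies on two's-complement wraparound.

```c
#include <stdint.h>

/* MULW model: multiply the low 32 bits, then sign-extend the low 32 bits of the product. */
static int64_t rv64_mulw(uint64_t rs1, uint64_t rs2)
{
    uint32_t low = (uint32_t)rs1 * (uint32_t)rs2;  /* wraps modulo 2^32 */
    return (int64_t)(int32_t)low;
}
```

For instance, `0x7FFFFFFF` times 2 has the low word `0xFFFFFFFE`, which sign-extends to -2, and the upper halves of the source registers never affect the result.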

q

q standard extension for quad precision floating point version 2.2 quad precision convert and move instructions quad precision load and store instructions sec:single float compute single precision floating point compare instructions

q / q-standard-extension-for-quad-precision-floating-point-version-2.2

14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.5 Quad-Precision Floating-Point Classify Instruction

Operation Arguments Description
fclass.q rd, rs1

The quad-precision floating-point classify instruction, FCLASS.Q, is defined analogously to its double-precision counterpart, but operates on quad-precision operands.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_RD(f128_classify(f128(FRS1)));

q / quad-precision-convert-and-move-instructions

14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.3 Quad-Precision Convert and Move Instructions

Operation Arguments Description
fcvt.d.q rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.

fcvt.l.q rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fcvt.lu.q rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fcvt.q.d rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.

fcvt.q.l rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fcvt.q.lu rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fcvt.q.s rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.

fcvt.q.w rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fcvt.q.wu rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fcvt.s.q rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.

fcvt.w.q rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fcvt.wu.q rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.

fsgnj.q rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_FRD(fsgnj128(FRS1, FRS2, false, false));
fsgnjn.q rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_FRD(fsgnj128(FRS1, FRS2, true, false));
fsgnjx.q rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_FRD(fsgnj128(FRS1, FRS2, false, true));

q / quad-precision-load-and-store-instructions

14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.1 Quad-Precision Load and Store Instructions

Operation Arguments Description
flq rd, rs1, imm12

FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128.

FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_FRD(MMU.load_float128(RS1 + insn.i_imm()));
fsq rs1, rs2, imm12

FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128.

FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

Spike ISS Implementation:
require_extension('Q');
require_fp;
MMU.store_float128(RS1 + insn.s_imm(), FRS2);

q / sec:single-float-compute

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions

Operation Arguments Description
fadd.q rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_add(f128(FRS1), f128(FRS2)));
set_fp_exceptions;
fdiv.q rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_div(f128(FRS1), f128(FRS2)));
set_fp_exceptions;
fmadd.q rd, rs1, rs2, rs3

FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_mulAdd(f128(FRS1), f128(FRS2), f128(FRS3)));
set_fp_exceptions;
fmax.q rd, rs1, rs2

Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.

Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs.

Spike ISS Implementation:
require_extension('Q');
require_fp;
bool greater = f128_lt_quiet(f128(FRS2), f128(FRS1)) ||
               (f128_eq(f128(FRS2), f128(FRS1)) && (f128(FRS2).v[1] & F64_SIGN));
if (isNaNF128(f128(FRS1)) && isNaNF128(f128(FRS2)))
  WRITE_FRD(f128(defaultNaNF128()));
else
  WRITE_FRD(greater || isNaNF128(f128(FRS2)) ? FRS1 : FRS2);
set_fp_exceptions;
fmin.q rd, rs1, rs2

Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.

Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs.

Spike ISS Implementation:
require_extension('Q');
require_fp;
bool less = f128_lt_quiet(f128(FRS1), f128(FRS2)) ||
            (f128_eq(f128(FRS1), f128(FRS2)) && (f128(FRS1).v[1] & F64_SIGN));
if (isNaNF128(f128(FRS1)) && isNaNF128(f128(FRS2)))
  WRITE_FRD(f128(defaultNaNF128()));
else
  WRITE_FRD(less || isNaNF128(f128(FRS2)) ? FRS1 : FRS2);
set_fp_exceptions;
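
The minimum/maximum rules can be sketched in portable C. Standard C has no quad-precision type, so this sketch uses `double` as a stand-in and models only the selected value, not the exception flags; `model_fmin`, `model_fmax` and the helpers are illustrative names, and the canonical NaN pattern shown is the double-precision one.

```c
#include <stdint.h>
#include <string.h>

/* Canonical quiet NaN for binary64, used here as a stand-in for the quad one. */
#define CANONICAL_NAN_BITS UINT64_C(0x7FF8000000000000)

static uint64_t bits_of(double x)    { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double   from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }
static int      is_nan(double x)     { return x != x; }
static int      sign_of(double x)    { return (int)(bits_of(x) >> 63); }

/* FMIN model: -0.0 orders below +0.0; a single NaN operand is ignored. */
static double model_fmin(double a, double b)
{
    if (is_nan(a) && is_nan(b)) return from_bits(CANONICAL_NAN_BITS);
    if (is_nan(a)) return b;
    if (is_nan(b)) return a;
    if (a < b || (a == b && sign_of(a))) return a;
    return b;
}

/* FMAX model: mirror image of FMIN, preferring +0.0 over -0.0. */
static double model_fmax(double a, double b)
{
    if (is_nan(a) && is_nan(b)) return from_bits(CANONICAL_NAN_BITS);
    if (is_nan(a)) return b;
    if (is_nan(b)) return a;
    if (a > b || (a == b && !sign_of(a))) return a;
    return b;
}
```

With these definitions `model_fmin(0.0, -0.0)` returns -0.0 and `model_fmax(-0.0, 0.0)` returns +0.0, while two NaN inputs collapse to the canonical NaN regardless of their payloads.
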
fmsub.q rd, rs1, rs2, rs3

FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_mulAdd(f128(FRS1), f128(FRS2), f128_negate(f128(FRS3))));
set_fp_exceptions;
fmul.q rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_mul(f128(FRS1), f128(FRS2)));
set_fp_exceptions;
fnmadd.q rd, rs1, rs2, rs3

FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_mulAdd(f128_negate(f128(FRS1)), f128(FRS2), f128_negate(f128(FRS3))));
set_fp_exceptions;
fnmsub.q rd, rs1, rs2, rs3

FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_mulAdd(f128_negate(f128(FRS1)), f128(FRS2), f128(FRS3)));
set_fp_exceptions;
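
The four fused multiply-add variants differ only in where the negations fall, which is how the Spike snippets above express them through a single `f128_mulAdd` primitive. The sketch below mirrors that structure with `double` as a stand-in type. Note the assumption: `mul_add` here is a plain multiply followed by an add, not a single-rounding fused operation, so it illustrates the sign conventions only.

```c
/* Stand-in for a mulAdd primitive. NOT fused: it may round twice. */
static double mul_add(double a, double b, double c)
{
    return a * b + c;
}

/* FMADD:  (rs1 x rs2) + rs3 */
static double model_fmadd(double a, double b, double c)  { return mul_add(a, b, c); }

/* FMSUB:  (rs1 x rs2) - rs3 */
static double model_fmsub(double a, double b, double c)  { return mul_add(a, b, -c); }

/* FNMADD: -(rs1 x rs2) - rs3 */
static double model_fnmadd(double a, double b, double c) { return mul_add(-a, b, -c); }

/* FNMSUB: -(rs1 x rs2) + rs3 */
static double model_fnmsub(double a, double b, double c) { return mul_add(-a, b, c); }
```

With rs1 = 2, rs2 = 3 and rs3 = 4 the four variants give 10, 2, -10 and -2 respectively. Note that FNMADD negates both the product and the addend, which is easy to misread from the mnemonic.
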
fsqrt.q rd, rs1

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_sqrt(f128(FRS1)));
set_fp_exceptions;
fsub.q rd, rs1, rs2

Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.

Spike ISS Implementation:
require_extension('Q');
require_fp;
softfloat_roundingMode = RM;
WRITE_FRD(f128_sub(f128(FRS1), f128(FRS2)));
set_fp_exceptions;

q / single-precision-floating-point-compare-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions

Operation Arguments Description
feq.q rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_RD(f128_eq(f128(FRS1), f128(FRS2)));
set_fp_exceptions;
fle.q rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_RD(f128_le(f128(FRS1), f128(FRS2)));
set_fp_exceptions;
flt.q rd, rs1, rs2

Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.

FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.

Spike ISS Implementation:
require_extension('Q');
require_fp;
WRITE_RD(f128_lt(f128(FRS1), f128(FRS2)));
set_fp_exceptions;
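
The NaN rules for the three compares can be summarised in a small model. This is only an illustrative Python sketch: `fp_compare` and its `a_snan`/`b_snan` arguments are hypothetical names, Python floats are double precision rather than quad, and because Python cannot portably tell signaling from quiet NaNs the caller has to state that explicitly.

```python
import math

def fp_compare(op, a, b, a_snan=False, b_snan=False):
    """Return (result, invalid_flag) for op in {'feq', 'flt', 'fle'}.

    a_snan / b_snan mark a NaN operand as signaling (hypothetical helper flags).
    """
    any_nan = math.isnan(a) or math.isnan(b)
    if op == "feq":
        # Quiet comparison: invalid only when a NaN input is signaling.
        invalid = any_nan and (a_snan or b_snan)
        result = int(not any_nan and a == b)
    elif op == "flt":
        # Signaling comparison: invalid for any NaN input.
        invalid = any_nan
        result = int(not any_nan and a < b)
    elif op == "fle":
        invalid = any_nan
        result = int(not any_nan and a <= b)
    else:
        raise ValueError(op)
    return result, invalid
```

In every case the result is 0 when either operand is NaN; only the invalid flag differs between the quiet and signaling forms.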

v

introduction narrowing floating pointinteger type convert instructions single width floating pointinteger type convert instructions state of vector extension at reset unit stride fault only first loads
vector bitwise logical instructions vector compress instruction vector count population in mask vcpop m vector element index instruction vector floating point classify instruction
vector floating point compare instructions vector floating point merge instruction vector floating point minmax instructions vector floating point move instruction vector floating point reciprocal estimate instruction
vector floating point reciprocal square root estimate instruction vector floating point sign injection instructions vector floating point square root instruction vector indexed instructions vector instruction formats
vector instruction listing vector integer add with carry subtract with borrow instructions vector integer compare instructions vector integer divide instructions vector integer merge instructions
vector integer minmax instructions vector integer move instructions vector iota instruction vector loadstore whole register instructions vector narrowing fixed point clip instructions
vector register gather instructions vector register grouping vlmul20 vector single width averaging add and subtract vector single width floating point addsubtract instructions vector single width floating point fused multiply add instructions
vector single width floating point multiplydivide instructions vector single width fractional multiply with rounding and saturation vector single width integer add and subtract vector single width integer multiply add instructions vector single width integer multiply instructions
vector single width saturating add and subtract vector single width scaling shift instructions vector single width shift instructions vector slide1down instruction vector slide1up
vector slide instructions vector slidedown instructions vector strided instructions vector unit stride instructions vector unordered single width floating point sum reduction
vector widening floating point addsubtract instructions vector widening floating point fused multiply add instructions vector widening floating point multiply vector widening integer addsubtract vector widening integer multiply add instructions
vector widening integer multiply instructions vfirst find first set mask bit vmsif m set including first mask bit vmsof m set only first mask bit widening floating pointinteger type convert instructions
zve vector extensions for embedded processors sec agnostic sec mask register logical sec narrowing sec vec operands
sec vector float reduce sec vector float reduce widen sec vector integer reduce sec vector integer reduce widen

v / _introduction

  1. Introduction /
Operation Arguments Description
vamoaddei16.v vs2, rs1, vd Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd
VI_AMO({ return lhs + vs3; }, uint, e16);
vamoaddei32.v vs2, rs1, vd Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd
VI_AMO({ return lhs + vs3; }, uint, e32);
vamoaddei64.v vs2, rs1, vd Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd
VI_AMO({ return lhs + vs3; }, uint, e64);
vamoaddei8.v vs2, rs1, vd Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd
VI_AMO({ return lhs + vs3; }, uint, e8);
vamoandei16.v vs2, rs1, vd Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd
VI_AMO({ return lhs & vs3; }, uint, e16);
vamoandei32.v vs2, rs1, vd Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd
VI_AMO({ return lhs & vs3; }, uint, e32);
vamoandei64.v vs2, rs1, vd Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd
VI_AMO({ return lhs & vs3; }, uint, e64);
vamoandei8.v vs2, rs1, vd Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd
VI_AMO({ return lhs & vs3; }, uint, e8);
vamomaxei16.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e16);
vamomaxei32.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e32);
vamomaxei64.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e64);
vamomaxei8.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e8);
vamomaxuei16.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e16);
vamomaxuei32.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e32);
vamomaxuei64.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e64);
vamomaxuei8.v vs2, rs1, vd Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e8);
vamominei16.v vs2, rs1, vd Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e16);
vamominei32.v vs2, rs1, vd Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e32);
vamominei64.v vs2, rs1, vd Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e64);
vamominei8.v vs2, rs1, vd Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e8);
vamominuei16.v vs2, rs1, vd Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e16);
vamominuei32.v vs2, rs1, vd Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e32);
vamominuei64.v vs2, rs1, vd Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e64);
vamominuei8.v vs2, rs1, vd Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd
VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e8);
vamoorei16.v vs2, rs1, vd Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs | vs3; }, uint, e16);
vamoorei32.v vs2, rs1, vd Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs | vs3; }, uint, e32);
vamoorei64.v vs2, rs1, vd Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs | vs3; }, uint, e64);
vamoorei8.v vs2, rs1, vd Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs | vs3; }, uint, e8);
vamoswapei16.v vs2, rs1, vd Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd
VI_AMO({ return vs3; }, uint, e16);
vamoswapei32.v vs2, rs1, vd Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd
VI_AMO({ return vs3; }, uint, e32);
vamoswapei64.v vs2, rs1, vd Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd
VI_AMO({ return vs3; }, uint, e64);
vamoswapei8.v vs2, rs1, vd Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd
VI_AMO({ return vs3; }, uint, e8);
vamoxorei16.v vs2, rs1, vd Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs ^ vs3; }, uint, e16);
vamoxorei32.v vs2, rs1, vd Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs ^ vs3; }, uint, e32);
vamoxorei64.v vs2, rs1, vd Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs ^ vs3; }, uint, e64);
vamoxorei8.v vs2, rs1, vd Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd
VI_AMO({ return lhs ^ vs3; }, uint, e8);
vl1re16.v rs1, vd Spike ISS Implementation:
// vl1re16.v vd, (rs1)
VI_LD_WHOLE(uint16);
vl1re32.v rs1, vd Spike ISS Implementation:
// vl1re32.v vd, (rs1)
VI_LD_WHOLE(uint32);
vl1re64.v rs1, vd Spike ISS Implementation:
// vl1re64.v vd, (rs1)
VI_LD_WHOLE(uint64);
vl2re16.v rs1, vd Spike ISS Implementation:
// vl2re16.v vd, (rs1)
VI_LD_WHOLE(uint16);
vl2re32.v rs1, vd Spike ISS Implementation:
// vl2re32.v vd, (rs1)
VI_LD_WHOLE(uint32);
vl2re64.v rs1, vd Spike ISS Implementation:
// vl2re64.v vd, (rs1)
VI_LD_WHOLE(uint64);
vl2re8.v rs1, vd Spike ISS Implementation:
// vl2re8.v vd, (rs1)
VI_LD_WHOLE(uint8);
vl4re16.v rs1, vd Spike ISS Implementation:
// vl4re16.v vd, (rs1)
VI_LD_WHOLE(uint16);
vl4re32.v rs1, vd Spike ISS Implementation:
// vl4re32.v vd, (rs1)
VI_LD_WHOLE(uint32);
vl4re64.v rs1, vd Spike ISS Implementation:
// vl4re64.v vd, (rs1)
VI_LD_WHOLE(uint64);
vl4re8.v rs1, vd Spike ISS Implementation:
// vl4re8.v vd, (rs1)
VI_LD_WHOLE(uint8);
vl8re16.v rs1, vd Spike ISS Implementation:
// vl8re16.v vd, (rs1)
VI_LD_WHOLE(uint16);
vl8re32.v rs1, vd Spike ISS Implementation:
// vl8re32.v vd, (rs1)
VI_LD_WHOLE(uint32);
vl8re64.v rs1, vd Spike ISS Implementation:
// vl8re64.v vd, (rs1)
VI_LD_WHOLE(uint64);
vl8re8.v rs1, vd Spike ISS Implementation:
// vl8re8.v vd, (rs1)
VI_LD_WHOLE(uint8);
vle1024.v rs1, vd
vle1024ff.v rs1, vd
vle128.v rs1, vd
vle128ff.v rs1, vd
vle16.v rs1, vd Spike ISS Implementation:
// vle16.v and vlseg[2-8]e16.v
VI_LD(0, (i * nf + fn), int16, false);
vle16ff.v rs1, vd Spike ISS Implementation:
// vle16ff.v and vlseg[2-8]e16ff.v
VI_LDST_FF(int16);
vle256.v rs1, vd
vle256ff.v rs1, vd
vle32.v rs1, vd Spike ISS Implementation:
// vle32.v and vlseg[2-8]e32.v
VI_LD(0, (i * nf + fn), int32, false);
vle32ff.v rs1, vd Spike ISS Implementation:
// vle32ff.v and vlseg[2-8]e32ff.v
VI_LDST_FF(int32);
vle512.v rs1, vd
vle512ff.v rs1, vd
vle64.v rs1, vd Spike ISS Implementation:
// vle64.v and vlseg[2-8]e64.v
VI_LD(0, (i * nf + fn), int64, false);
vle64ff.v rs1, vd Spike ISS Implementation:
// vle64ff.v and vlseg[2-8]e64ff.v
VI_LDST_FF(int64);
vloxei1024.v vs2, rs1, vd
vloxei128.v vs2, rs1, vd
vloxei16.v vs2, rs1, vd Spike ISS Implementation:
// vlxei16.v and vlxseg[2-8]e16.v
VI_LD_INDEX(e16, true);
vloxei256.v vs2, rs1, vd
vloxei32.v vs2, rs1, vd Spike ISS Implementation:
// vlxe32.v and vlxseg[2-8]ei32.v
VI_LD_INDEX(e32, true);
vloxei512.v vs2, rs1, vd
vloxei64.v vs2, rs1, vd Spike ISS Implementation:
// vlxei64.v and vlxseg[2-8]ei64.v
VI_LD_INDEX(e64, true);

vloxei8.v vs2, rs1, vd Spike ISS Implementation:
// vlxei8.v and vlxseg[2-8]ei8.v
VI_LD_INDEX(e8, true);
vlse1024.v rs2, rs1, vd
vlse128.v rs2, rs1, vd
vlse16.v rs2, rs1, vd Spike ISS Implementation:
// vlse16.v and vlsseg[2-8]e16.v
VI_LD(i * RS2, fn, int16, false);
vlse256.v rs2, rs1, vd
vlse32.v rs2, rs1, vd Spike ISS Implementation:
// vlse32.v and vlsseg[2-8]e32.v
VI_LD(i * RS2, fn, int32, false);
vlse512.v rs2, rs1, vd
vlse64.v rs2, rs1, vd Spike ISS Implementation:
// vlse64.v and vlsseg[2-8]e64.v
VI_LD(i * RS2, fn, int64, false);
vluxei1024.v vs2, rs1, vd
vluxei128.v vs2, rs1, vd
vluxei16.v vs2, rs1, vd Spike ISS Implementation:
// vlxei16.v and vlxseg[2-8]e16.v
VI_LD_INDEX(e16, true);
vluxei256.v vs2, rs1, vd
vluxei32.v vs2, rs1, vd Spike ISS Implementation:
// vlxe32.v and vlxseg[2-8]ei32.v
VI_LD_INDEX(e32, true);
vluxei512.v vs2, rs1, vd
vluxei64.v vs2, rs1, vd Spike ISS Implementation:
// vlxei64.v and vlxseg[2-8]ei64.v
VI_LD_INDEX(e64, true);

vmv1r.v vs2, vd Spike ISS Implementation:
// vmv1r.v vd, vs2
#include "vmvnfr_v.h"
vmv2r.v vs2, vd Spike ISS Implementation:
// vmv2r.v vd, vs2
#include "vmvnfr_v.h"
vmv4r.v vs2, vd Spike ISS Implementation:
// vmv4r.v vd, vs2
#include "vmvnfr_v.h"
vmv8r.v vs2, vd Spike ISS Implementation:
// vmv8r.v vd, vs2
#include "vmvnfr_v.h"
vs1r.v rs1, vs3 Spike ISS Implementation:
// vs1r.v vs3, (rs1)
VI_ST_WHOLE
vs2r.v rs1, vs3 Spike ISS Implementation:
// vs2r.v vs3, (rs1)
VI_ST_WHOLE
vs4r.v rs1, vs3 Spike ISS Implementation:
// vs4r.v vs3, (rs1)
VI_ST_WHOLE
vs8r.v rs1, vs3 Spike ISS Implementation:
// vs8r.v vs3, (rs1)
VI_ST_WHOLE
vse1024.v rs1, vs3
vse128.v rs1, vs3
vse16.v rs1, vs3 Spike ISS Implementation:
// vse16.v and vsseg[2-8]e16.v
VI_ST(0, (i * nf + fn), uint16, false);
vse256.v rs1, vs3
vse32.v rs1, vs3 Spike ISS Implementation:
// vse32.v and vsseg[2-8]e32.v
VI_ST(0, (i * nf + fn), uint32, false);
vse512.v rs1, vs3
vse64.v rs1, vs3 Spike ISS Implementation:
// vse64.v and vsseg[2-8]e64.v
VI_ST(0, (i * nf + fn), uint64, false);
vse8.v rs1, vs3 Spike ISS Implementation:
// vse8.v and vsseg[2-8]e8.v
VI_ST(0, (i * nf + fn), uint8, false);
vsm.v rs1, vs3 Spike ISS Implementation:
// vse1.v
VI_ST(0, (i * nf + fn), uint8, true);
vsoxei1024.v vs2, rs1, vs3
vsoxei128.v vs2, rs1, vs3
vsoxei16.v vs2, rs1, vs3 Spike ISS Implementation:
// vsxei16.v and vsxseg[2-8]ei16.v
VI_ST_INDEX(e16, true);
vsoxei256.v vs2, rs1, vs3
vsoxei32.v vs2, rs1, vs3 Spike ISS Implementation:
// vsxei32.v and vsxseg[2-8]ei32.v
VI_ST_INDEX(e32, true);
vsoxei512.v vs2, rs1, vs3
vsoxei64.v vs2, rs1, vs3 Spike ISS Implementation:
// vsxei64.v and vsxseg[2-8]ei64.v
VI_ST_INDEX(e64, true);
vsoxei8.v vs2, rs1, vs3 Spike ISS Implementation:
// vsxei8.v and vsxseg[2-8]ei8.v
VI_ST_INDEX(e8, true);
vsse1024.v rs2, rs1, vs3
vsse128.v rs2, rs1, vs3
vsse16.v rs2, rs1, vs3 Spike ISS Implementation:
// vsse16.v and vssseg[2-8]e16.v
VI_ST(i * RS2, fn, uint16, false);
vsse256.v rs2, rs1, vs3
vsse32.v rs2, rs1, vs3 Spike ISS Implementation:
// vsse32.v and vssseg[2-8]e32.v
VI_ST(i * RS2, fn, uint32, false);
vsse512.v rs2, rs1, vs3
vsse64.v rs2, rs1, vs3 Spike ISS Implementation:
// vsse64.v and vssseg[2-8]e64.v
VI_ST(i * RS2, fn, uint64, false);
vsse8.v rs2, rs1, vs3 Spike ISS Implementation:
// vsse8.v and vssseg[2-8]e8.v
VI_ST(i * RS2, fn, uint8, false);
vsuxei1024.v vs2, rs1, vs3
vsuxei128.v vs2, rs1, vs3
vsuxei16.v vs2, rs1, vs3 Spike ISS Implementation:
// vsuxe16.v
VI_ST_INDEX(e16, true);
vsuxei256.v vs2, rs1, vs3
vsuxei32.v vs2, rs1, vs3 Spike ISS Implementation:
// vsuxe32.v
VI_ST_INDEX(e32, true);
vsuxei512.v vs2, rs1, vs3
vsuxei64.v vs2, rs1, vs3 Spike ISS Implementation:
// vsuxe64.v
VI_ST_INDEX(e64, true);
vsuxei8.v vs2, rs1, vs3 Spike ISS Implementation:
// vsuxe8.v
VI_ST_INDEX(e8, true);

v / _narrowing_floating_pointinteger_type_convert_instructions

  14. Vector Floating-Point Instructions / 14.19. Narrowing Floating-Point/Integer Type-Convert Instructions
Operation Arguments Description
vfncvt.f.f.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

vfncvt.f.x.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

vfncvt.f.xu.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

vfncvt.rod.f.f.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

vfncvt.rtz.x.f.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

vfncvt.rtz.xu.f.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

vfncvt.x.f.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

vfncvt.xu.f.w vs2, vd

vfncvt.xu.f.w vd, vs2, vm       # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm        # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm   # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm    # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm       # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm        # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm        # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm    # Convert double-width float to single-width float,
                                # rounding towards odd.

v / _single_width_floating_pointinteger_type_convert_instructions

  14. Vector Floating-Point Instructions / 14.17. Single-Width Floating-Point/Integer Type-Convert Instructions
Operation Arguments Description
vfcvt.f.x.v vs2, vd

vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm        # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm    # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm        # Convert signed integer to float.

vfcvt.f.xu.v vs2, vd

vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm        # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm    # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm        # Convert signed integer to float.

vfcvt.rtz.x.f.v vs2, vd

vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm        # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm    # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm        # Convert signed integer to float.

vfcvt.rtz.xu.f.v vs2, vd

vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm        # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm    # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm        # Convert signed integer to float.

vfcvt.x.f.v vs2, vd

vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm        # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm    # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm        # Convert signed integer to float.

vfcvt.xu.f.v vs2, vd

vfcvt.xu.f.v vd, vs2, vm       # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm        # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm   # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm    # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm       # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm        # Convert signed integer to float.

v / _state_of_vector_extension_at_reset

  4. Vector Extension Programmer’s Model / 4.11. State of Vector Extension at Reset
Operation Arguments Description
vsetvl rs2, rs1, rd

The vector extension must have a consistent state at reset. In particular, vtype and vl must have values that can be read and then restored with a single vsetvl instruction.

Spike ISS Implementation:
require_vector_novtype(false);
WRITE_RD(P.VU.set_vl(insn.rd(), insn.rs1(), RS1, RS2));

v / _unit_stride_fault_only_first_loads

  8. Vector Loads and Stores / 8.7. Unit-stride Fault-Only-First Loads
Operation Arguments Description
vle8ff.v rs1, vd

# Vector unit-stride fault-only-first loads
# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
vle8ff.v vd, (rs1), vm    # 8-bit unit-stride fault-only-first load
vle16ff.v vd, (rs1), vm   # 16-bit unit-stride fault-only-first load
vle32ff.v vd, (rs1), vm   # 32-bit unit-stride fault-only-first load
vle64ff.v vd, (rs1), vm   # 64-bit unit-stride fault-only-first load

Spike ISS Implementation:
// vle8ff.v and vlseg[2-8]e8ff.v
VI_LDST_FF(int8);

v / _vector_bitwise_logical_instructions

  12. Vector Integer Arithmetic Instructions / 12.5. Vector Bitwise Logical Instructions
Operation Arguments Description
vand.vi vs2, simm5, vd

# Bitwise logical operations.
vand.vv vd, vs2, vs1, vm   # Vector-vector
vand.vx vd, vs2, rs1, vm   # vector-scalar
vand.vi vd, vs2, imm, vm   # vector-immediate

vor.vv vd, vs2, vs1, vm    # Vector-vector
vor.vx vd, vs2, rs1, vm    # vector-scalar
vor.vi vd, vs2, imm, vm    # vector-immediate

vxor.vv vd, vs2, vs1, vm   # Vector-vector
vxor.vx vd, vs2, rs1, vm   # vector-scalar
vxor.vi vd, vs2, imm, vm   # vector-immediate

vand.vv vs2, vs1, vd

# Bitwise logical operations.
vand.vv vd, vs2, vs1, vm   # Vector-vector
vand.vx vd, vs2, rs1, vm   # vector-scalar
vand.vi vd, vs2, imm, vm   # vector-immediate

vor.vv vd, vs2, vs1, vm    # Vector-vector
vor.vx vd, vs2, rs1, vm    # vector-scalar
vor.vi vd, vs2, imm, vm    # vector-immediate

vxor.vv vd, vs2, vs1, vm   # Vector-vector
vxor.vx vd, vs2, rs1, vm   # vector-scalar
vxor.vi vd, vs2, imm, vm   # vector-immediate

vand.vx vs2, rs1, vd

# Bitwise logical operations.
vand.vv vd, vs2, vs1, vm   # Vector-vector
vand.vx vd, vs2, rs1, vm   # vector-scalar
vand.vi vd, vs2, imm, vm   # vector-immediate

vor.vv vd, vs2, vs1, vm    # Vector-vector
vor.vx vd, vs2, rs1, vm    # vector-scalar
vor.vi vd, vs2, imm, vm    # vector-immediate

vxor.vv vd, vs2, vs1, vm   # Vector-vector
vxor.vx vd, vs2, rs1, vm   # vector-scalar
vxor.vi vd, vs2, imm, vm   # vector-immediate
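
As a rough illustration of the vector-immediate form, the Python sketch below models `vand.vi` on a list of unsigned element values. The helper names `sext` and `vand_vi` are made up for this example, masking and tail handling are ignored, and it assumes the 5-bit immediate is sign-extended to SEW before the operation.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def vand_vi(vs2, simm5, sew=8):
    """Element-wise AND of vs2 with a sign-extended 5-bit immediate (unmasked)."""
    elem_mask = (1 << sew) - 1
    imm = sext(simm5, 5) & elem_mask  # immediate widened to SEW bits
    return [(x & imm) & elem_mask for x in vs2]
```

With SEW=8, an immediate of 0b10000 (that is, -16) becomes 0xF0, so `vand_vi([0xFF], 0b10000)` keeps only the upper nibble.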

v / _vector_compress_instruction

  17. Vector Permutation Instructions / 17.5. Vector Compress Instruction
Operation Arguments Description
vcompress.vm vs2, vs1, vd

vcompress is encoded as an unmasked instruction (vm=1). The equivalent masked instruction (vm=0) is reserved.

A trap on a vcompress instruction is always reported with a vstart of 0. Executing a vcompress instruction with a non-zero vstart raises an illegal instruction exception.

vcompress.vm vd, vs2, vs1 # Compress into vd elements of vs2 where vs1 is enabled

Example use of vcompress instruction

  8 7 6 5 4 3 2 1 0   Element number

  1 1 0 1 0 0 1 0 1   v0
  8 7 6 5 4 3 2 1 0   v1
  1 2 3 4 5 6 7 8 9   v2
                          vcompress.vm v2, v1, v0
  1 2 3 4 8 7 5 2 0   v2
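
The packing behaviour of the example can be checked with a short Python model. This is a sketch, not the Spike implementation: `vcompress` is a hypothetical helper, registers are plain lists indexed by element number (element 0 first, so reversed relative to the figure), and destination elements past the packed ones are simply left unchanged, as they appear in the example.

```python
def vcompress(vd, vs2, vs1_mask, vl):
    """Pack elements of vs2 whose mask bit in vs1_mask is set into the low elements of vd."""
    out = list(vd)
    j = 0  # next destination slot
    for i in range(vl):
        if vs1_mask[i]:
            out[j] = vs2[i]
            j += 1
    return out

# Values from the example, listed from element 0 upwards.
v0 = [1, 0, 1, 0, 0, 1, 0, 1, 1]
v1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
v2 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
result = vcompress(v2, v1, v0, 9)
```

Read from element 8 down to element 0, `result` is 1 2 3 4 8 7 5 2 0, which matches the final v2 row of the example.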

v / _vector_count_population_in_mask_vcpop_m

  16. Vector Mask Instructions / 16.2. Vector count population in mask vcpop.m
Operation Arguments Description
vcpop.m vs2, rd

The vcpop.m instruction counts the number of mask elements of the active elements of the vector source mask register that have the value 1 and writes the result to a scalar x register.

The vcpop.m instruction writes x[rd] even if vl=0 (with the value 0, since no mask elements are active).

Traps on vcpop.m are always reported with a vstart of 0. The vcpop.m instruction will raise an illegal instruction exception if vstart is non-zero.

vcpop.m rd, vs2, vm

vcpop.m rd, vs2, v0.t # x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] )

Spike ISS Implementation:
// vmpopc rd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
reg_t popcount = 0;
for (reg_t i=P.VU.vstart->read(); i<vl; ++i) {
const int midx = i / 32;
const int mpos = i % 32;

bool vs2_lsb = ((P.VU.elt<uint32_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
if (insn.v_vm() == 1) {
popcount += vs2_lsb;
} else {
bool do_mask = (P.VU.elt<uint32_t>(0, midx) >> mpos) & 0x1;
popcount += (vs2_lsb && do_mask);
}
}
P.VU.vstart->write(0);
WRITE_RD(popcount);
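
The same counting loop can be written compactly in Python. This is a minimal sketch under simplifying assumptions: mask registers are lists of 0/1 values rather than packed bits, `vcpop_m` is a hypothetical name, and `v0_mask=None` stands for the unmasked form.

```python
def vcpop_m(vs2_mask, vl, v0_mask=None):
    """Count set bits of vs2_mask over elements 0..vl-1, optionally gated by v0_mask."""
    count = 0
    for i in range(vl):
        active = True if v0_mask is None else bool(v0_mask[i])
        if active and vs2_mask[i]:
            count += 1
    return count
```

With `vl` equal to 0 the loop body never runs and the function returns 0, mirroring the rule that x[rd] is still written in that case.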

v / _vector_element_index_instruction

  16. Vector Mask Instructions / 16.9. Vector Element Index Instruction
Operation Arguments Description
vid.v vd

The vid.v instruction writes each element's index to the destination vector register group, from 0 to vl-1.

vid.v vd, vm # Write element ID to destination.

Spike ISS Implementation:
// vid.v vd, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t sew = P.VU.vsew;
reg_t rd_num = insn.rd();
require_align(rd_num, P.VU.vflmul);
require_vm;

for (reg_t i = P.VU.vstart->read() ; i < P.VU.vl->read(); ++i) {
VI_LOOP_ELEMENT_SKIP();

switch (sew) {
case e8:
P.VU.elt<uint8_t>(rd_num, i, true) = i;
break;
case e16:
P.VU.elt<uint16_t>(rd_num, i, true) = i;
break;
case e32:
P.VU.elt<uint32_t>(rd_num, i, true) = i;
break;
default:
P.VU.elt<uint64_t>(rd_num, i, true) = i;
break;
}
}

P.VU.vstart->write(0);
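
A Python sketch of the same loop may make the element handling easier to follow. The name `vid_v` is hypothetical, the register is a plain list, and masked-off elements are assumed to keep their old value (one of the permitted policies), as are elements at or beyond `vl`.

```python
def vid_v(vd, vl, sew=8, v0_mask=None, vstart=0):
    """Write each active element's index into vd for elements vstart..vl-1."""
    out = list(vd)
    for i in range(vstart, vl):
        if v0_mask is not None and not v0_mask[i]:
            continue  # masked-off element: left unchanged in this sketch
        out[i] = i & ((1 << sew) - 1)  # index truncated to SEW bits
    return out
```

For example, `vid_v([9] * 6, 4)` fills the first four elements with 0 to 3 and leaves the last two untouched.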

v / _vector_floating_point_classify_instruction

  14. Vector Floating-Point Instructions / 14.14. Vector Floating-Point Classify Instruction
Operation Arguments Description
vfclass.v vs2, vd

vfclass.v vd, vs2, vm # Vector-vector

Spike ISS Implementation:
// vfclass.v vd, vs2, vm
VI_VFP_V_LOOP
({
vd = f16(f16_classify(vs2));
},
{
vd = f32(f32_classify(vs2));
},
{
vd = f64(f64_classify(vs2));
})
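
To show what a per-element classify produces, here is a Python sketch for 64-bit elements. It assumes the scalar FCLASS bit assignment from the F/D specification (bit 0 for negative infinity up to bit 9 for quiet NaN); `fclass_f64` and `vfclass_v` are illustrative names, and masking is ignored.

```python
import struct

def fclass_f64(x):
    """Return the one-hot 10-bit class mask for a double, following scalar FCLASS.D."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0x7FF:
        if frac == 0:
            return 1 << (0 if sign else 7)       # -inf / +inf
        quiet = (frac >> 51) & 1
        return 1 << (9 if quiet else 8)          # quiet NaN / signaling NaN
    if exp == 0:
        if frac == 0:
            return 1 << (3 if sign else 4)       # -0 / +0
        return 1 << (2 if sign else 5)           # negative / positive subnormal
    return 1 << (1 if sign else 6)               # negative / positive normal

def vfclass_v(vs2):
    """Apply the scalar classification to every element (unmasked)."""
    return [fclass_f64(x) for x in vs2]
```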

v / _vector_floating_point_compare_instructions

  14. Vector Floating-Point Instructions / 14.13. Vector Floating-Point Compare Instructions
Operation Arguments Description
vmfeq.vf vs2, rs1, vd

The compare instructions follow the semantics of the scalar floating-point compare instructions. vmfeq and vmfne raise the invalid operation exception only on signaling NaN inputs. vmflt, vmfle, vmfgt, and vmfge raise the invalid operation exception on both signaling and quiet NaN inputs. vmfne writes 1 to the destination element when either operand is NaN, whereas the other compares write 0 when either operand is NaN.

# Compare equal
vmfeq.vv vd, vs2, vs1, vm   # Vector-vector
vmfeq.vf vd, vs2, rs1, vm   # vector-scalar

# Compare not equal
vmfne.vv vd, vs2, vs1, vm   # Vector-vector
vmfne.vf vd, vs2, rs1, vm   # vector-scalar

# Compare less than
vmflt.vv vd, vs2, vs1, vm   # Vector-vector
vmflt.vf vd, vs2, rs1, vm   # vector-scalar

# Compare less than or equal
vmfle.vv vd, vs2, vs1, vm   # Vector-vector
vmfle.vf vd, vs2, rs1, vm   # vector-scalar

# Compare greater than
vmfgt.vf vd, vs2, rs1, vm   # vector-scalar

# Compare greater than or equal
vmfge.vf vd, vs2, rs1, vm   # vector-scalar

# Example of implementing isgreater()
vmfeq.vv v0, va, va         # Only set where A is not NaN.
vmfeq.vv v1, vb, vb         # Only set where B is not NaN.
vmand.mm v0, v0, v1         # Only set where A and B are ordered,
vmfgt.vv v0, va, vb, v0.t   # so only set flags on ordered values.

vmfeq.vv vs2, vs1, vd

The compare instructions follow the semantics of the scalar floating-point compare instructions. vmfeq and vmfne raise the invalid operation exception only on signaling NaN inputs. vmflt, vmfle, vmfgt, and vmfge raise the invalid operation exception on both signaling and quiet NaN inputs. vmfne writes 1 to the destination element when either operand is NaN, whereas the other compares write 0 when either operand is NaN.

# Compare equal
vmfeq.vv vd, vs2, vs1, vm   # Vector-vector
vmfeq.vf vd, vs2, rs1, vm   # vector-scalar

# Compare not equal
vmfne.vv vd, vs2, vs1, vm   # Vector-vector
vmfne.vf vd, vs2, rs1, vm   # vector-scalar

# Compare less than
vmflt.vv vd, vs2, vs1, vm   # Vector-vector
vmflt.vf vd, vs2, rs1, vm   # vector-scalar

# Compare less than or equal
vmfle.vv vd, vs2, vs1, vm   # Vector-vector
vmfle.vf vd, vs2, rs1, vm   # vector-scalar

# Compare greater than
vmfgt.vf vd, vs2, rs1, vm   # vector-scalar

# Compare greater than or equal
vmfge.vf vd, vs2, rs1, vm   # vector-scalar

# Example of implementing isgreater()
vmfeq.vv v0, va, va         # Only set where A is not NaN.
vmfeq.vv v1, vb, vb         # Only set where B is not NaN.
vmand.mm v0, v0, v1         # Only set where A and B are ordered,
vmfgt.vv v0, va, vb, v0.t   # so only set flags on ordered values.

vmflt.vf vs2, rs1, vd

Comparison   Assembler Mapping            Assembler pseudoinstruction
va < vb      vmflt.vv vd, va, vb, vm
va <= vb     vmfle.vv vd, va, vb, vm
va > vb      vmflt.vv vd, vb, va, vm      vmfgt.vv vd, va, vb, vm
va >= vb     vmfle.vv vd, vb, va, vm      vmfge.vv vd, va, vb, vm
va < f       vmflt.vf vd, va, f, vm
va <= f      vmfle.vf vd, va, f, vm
va > f       vmfgt.vf vd, va, f, vm
va >= f      vmfge.vf vd, va, f, vm
va, vb: vector register groups
f: scalar floating-point register

vmflt.vv vs2, vs1, vd

Comparison   Assembler Mapping            Assembler pseudoinstruction
va < vb      vmflt.vv vd, va, vb, vm
va <= vb     vmfle.vv vd, va, vb, vm
va > vb      vmflt.vv vd, vb, va, vm      vmfgt.vv vd, va, vb, vm
va >= vb     vmfle.vv vd, vb, va, vm      vmfge.vv vd, va, vb, vm
va < f       vmflt.vf vd, va, f, vm
va <= f      vmfle.vf vd, va, f, vm
va > f       vmfgt.vf vd, va, f, vm
va >= f      vmfge.vf vd, va, f, vm
va, vb: vector register groups
f: scalar floating-point register

v / _vector_floating_point_merge_instruction

  14. Vector Floating-Point Instructions / 14.15. Vector Floating-Point Merge Instruction
Operation Arguments Description
vfmerge.vfm vs2, rs1, vd

The vfmerge.vfm instruction is encoded as a masked instruction (vm=0). At elements where the mask value is zero, the first vector operand is copied to the destination element, otherwise a scalar floating-point register value is copied to the destination element.

vfmerge.vfm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? f[rs1] : vs2[i]

v / _vector_floating_point_minmax_instructions

  14. Vector Floating-Point Instructions / 14.11. Vector Floating-Point MIN/MAX Instructions
Operation Arguments Description
vfmin.vf vs2, rs1, vd

The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.

# Floating-point minimum
vfmin.vv vd, vs2, vs1, vm   # Vector-vector
vfmin.vf vd, vs2, rs1, vm   # vector-scalar
# Floating-point maximum
vfmax.vv vd, vs2, vs1, vm   # Vector-vector
vfmax.vf vd, vs2, rs1, vm   # vector-scalar

vfmin.vv vs2, vs1, vd

The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.

# Floating-point minimum
vfmin.vv vd, vs2, vs1, vm   # Vector-vector
vfmin.vf vd, vs2, rs1, vm   # vector-scalar
# Floating-point maximum
vfmax.vv vd, vs2, vs1, vm   # Vector-vector
vfmax.vf vd, vs2, rs1, vm   # vector-scalar
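For reference, the version 2.2 scalar rules that vfmin and vfmax inherit can be sketched in Python: a single NaN operand is ignored, two NaNs give the canonical NaN, and -0.0 is treated as smaller than +0.0. The function names fmin_2_2 and fmax_2_2 are hypothetical, Python floats stand in for the element type, and the invalid flag raised by signaling NaNs is not modelled.

```python
import math


def _pick(a, b, want_smaller):
    """Shared min/max logic following the version 2.2 F/D/Q rules."""
    if math.isnan(a) and math.isnan(b):
        return math.nan                      # both NaN: canonical NaN
    if math.isnan(a):
        return b                             # one NaN: return the other operand
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # -0.0 is considered less than +0.0
        a_negative = math.copysign(1.0, a) < 0
        return a if a_negative == want_smaller else b
    if want_smaller:
        return a if a < b else b
    return a if a > b else b


def fmin_2_2(a, b):
    return _pick(a, b, want_smaller=True)


def fmax_2_2(a, b):
    return _pick(a, b, want_smaller=False)
```

A vector vfmin.vv would apply fmin_2_2 to each active element pair (vs2[i], vs1[i]); vfmin.vf would use f[rs1] as the second operand for every element.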

v / _vector_floating_point_move_instruction

  14. Vector Floating-Point Instructions / 14.16. Vector Floating-Point Move Instruction
Operation Arguments Description
vfmv.f.s vs2, rd

vfmv.f.s rd, vs2 # f[rd] = vs2[0]

vfmv.s.f rs1, vd

vfmv.s.f vd, rs1 # vd[0] = f[rs1]

vfmv.v.f rs1, vd

vfmv.v.f vd, rs1 # vd[i] = f[rs1]

v / _vector_floating_point_reciprocal_estimate_instruction

  14. Vector Floating-Point Instructions / 14.10. Vector Floating-Point Reciprocal Estimate Instruction
Operation Arguments Description
vfrec7.v vs2, vd

Table 17. vfrec7.v common-case lookup table contents

# Floating-point reciprocal estimate to 7 bits.
vfrec7.v vd, vs2, vm

Spike ISS Implementation:
// vfclass.v vd, vs2, vm
VI_VFP_V_LOOP
({
vd = f16_recip7(vs2);
},
{
vd = f32_recip7(vs2);
},
{
vd = f64_recip7(vs2);
})

v / _vector_floating_point_reciprocal_square_root_estimate_instruction

  14. Vector Floating-Point Instructions / 14.9. Vector Floating-Point Reciprocal Square-Root Estimate Instruction
Operation Arguments Description
vfrsqrt7.v vs2, vd

Table 16. vfrsqrt7.v common-case lookup table contents

# Floating-point reciprocal square-root estimate to 7 bits.
vfrsqrt7.v vd, vs2, vm

Spike ISS Implementation:
// vfclass.v vd, vs2, vm
VI_VFP_V_LOOP
({
vd = f16_rsqrte7(vs2);
},
{
vd = f32_rsqrte7(vs2);
},
{
vd = f64_rsqrte7(vs2);
})

v / _vector_floating_point_sign_injection_instructions

  14. Vector Floating-Point Instructions / 14.12. Vector Floating-Point Sign-Injection Instructions
Operation Arguments Description
vfsgnj.vf vs2, rs1, vd

vfsgnj.vv vd, vs2, vs1, vm    # Vector-vector
vfsgnj.vf vd, vs2, rs1, vm    # vector-scalar
vfsgnjn.vv vd, vs2, vs1, vm   # Vector-vector
vfsgnjn.vf vd, vs2, rs1, vm   # vector-scalar
vfsgnjx.vv vd, vs2, vs1, vm   # Vector-vector
vfsgnjx.vf vd, vs2, rs1, vm   # vector-scalar

vfsgnj.vv vs2, vs1, vd

vfsgnj.vv vd, vs2, vs1, vm    # Vector-vector
vfsgnj.vf vd, vs2, rs1, vm    # vector-scalar
vfsgnjn.vv vd, vs2, vs1, vm   # Vector-vector
vfsgnjn.vf vd, vs2, rs1, vm   # vector-scalar
vfsgnjx.vv vd, vs2, vs1, vm   # Vector-vector
vfsgnjx.vf vd, vs2, rs1, vm   # vector-scalar
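The listing gives only the assembler forms, so here is a rough per-element model of the three sign-injection variants. It assumes, following the scalar FSGNJ family, that the magnitude comes from the vs2 element and the sign from the vs1 element (or f[rs1]). The function names are hypothetical and Python floats stand in for the element type.

```python
import math


def _is_negative(x):
    """True when the sign bit of x is set (also works for -0.0)."""
    return math.copysign(1.0, x) < 0


def fsgnj(vs2_elem, vs1_elem):
    """Magnitude of vs2, sign of vs1."""
    return math.copysign(vs2_elem, vs1_elem)


def fsgnjn(vs2_elem, vs1_elem):
    """Magnitude of vs2, inverted sign of vs1."""
    sign = 1.0 if _is_negative(vs1_elem) else -1.0
    return math.copysign(vs2_elem, sign)


def fsgnjx(vs2_elem, vs1_elem):
    """Magnitude of vs2, sign is the XOR of both sign bits."""
    sign = -1.0 if _is_negative(vs2_elem) != _is_negative(vs1_elem) else 1.0
    return math.copysign(vs2_elem, sign)
```

Passing the same register for both sources gives the usual idioms: fsgnjn(x, x) negates x and fsgnjx(x, x) returns its absolute value.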

v / _vector_floating_point_square_root_instruction

  14. Vector Floating-Point Instructions / 14.8. Vector Floating-Point Square-Root Instruction
Operation Arguments Description
vfsqrt.v vs2, vd

# Floating-point square root
vfsqrt.v vd, vs2, vm   # Vector-vector square root

Spike ISS Implementation:
// vsqrt.v vd, vd2, vm
VI_VFP_V_LOOP
({
vd = f16_sqrt(vs2);
},
{
vd = f32_sqrt(vs2);
},
{
vd = f64_sqrt(vs2);
})

v / _vector_indexed_instructions

  8. Vector Loads and Stores / 8.6. Vector Indexed Instructions
Operation Arguments Description
vluxei8.v vs2, rs1, vd

# Vector indexed loads and stores
# Vector indexed-unordered load instructions
# vd destination, rs1 base address, vs2 byte offsets
vluxei8.v  vd, (rs1), vs2, vm   # unordered 8-bit indexed load of SEW data
vluxei16.v vd, (rs1), vs2, vm   # unordered 16-bit indexed load of SEW data
vluxei32.v vd, (rs1), vs2, vm   # unordered 32-bit indexed load of SEW data
vluxei64.v vd, (rs1), vs2, vm   # unordered 64-bit indexed load of SEW data
# Vector indexed-ordered load instructions
# vd destination, rs1 base address, vs2 byte offsets
vloxei8.v  vd, (rs1), vs2, vm   # ordered 8-bit indexed load of SEW data
vloxei16.v vd, (rs1), vs2, vm   # ordered 16-bit indexed load of SEW data
vloxei32.v vd, (rs1), vs2, vm   # ordered 32-bit indexed load of SEW data
vloxei64.v vd, (rs1), vs2, vm   # ordered 64-bit indexed load of SEW data
# Vector indexed-unordered store instructions
# vs3 store data, rs1 base address, vs2 byte offsets
vsuxei8.v  vs3, (rs1), vs2, vm  # unordered 8-bit indexed store of SEW data
vsuxei16.v vs3, (rs1), vs2, vm  # unordered 16-bit indexed store of SEW data
vsuxei32.v vs3, (rs1), vs2, vm  # unordered 32-bit indexed store of SEW data
vsuxei64.v vs3, (rs1), vs2, vm  # unordered 64-bit indexed store of SEW data
# Vector indexed-ordered store instructions
# vs3 store data, rs1 base address, vs2 byte offsets
vsoxei8.v  vs3, (rs1), vs2, vm  # ordered 8-bit indexed store of SEW data
vsoxei16.v vs3, (rs1), vs2, vm  # ordered 16-bit indexed store of SEW data
vsoxei32.v vs3, (rs1), vs2, vm  # ordered 32-bit indexed store of SEW data
vsoxei64.v vs3, (rs1), vs2, vm  # ordered 64-bit indexed store of SEW data

Spike ISS Implementation:
// vlxei8.v and vlxseg[2-8]ei8.v
VI_LD_INDEX(e8, true);
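The addressing used by the indexed forms (base address in rs1 plus a per-element byte offset from vs2) can be sketched as a gather and scatter over a byte array. This is a simplified model, not Spike's VI_LD_INDEX macro: the function names are hypothetical, memory is a Python bytearray, masking is left out, and the ordered/unordered distinction is ignored because it has no visible effect on plain memory.

```python
def indexed_load(memory, base, offsets, sew_bytes):
    """Gather one SEW-wide little-endian element per byte offset."""
    result = []
    for offset in offsets:
        addr = base + offset          # offsets are in bytes, not scaled by SEW
        result.append(int.from_bytes(memory[addr:addr + sew_bytes], "little"))
    return result


def indexed_store(memory, base, offsets, data, sew_bytes):
    """Scatter SEW-wide little-endian elements to base + offset."""
    for offset, value in zip(offsets, data):
        addr = base + offset
        memory[addr:addr + sew_bytes] = value.to_bytes(sew_bytes, "little")
```

For example, with a 16-byte memory holding the values 0 to 15, loading 16-bit elements from base 4 with offsets [0, 8, 2] reads the byte pairs at addresses 4, 12 and 6.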

v / _vector_instruction_formats

  1. Vector Instruction Formats /
Operation Arguments Description
vsetivli zimm10, zimm, rd

vsetivli encoding, fields listed from bit 0 upward:
Bits     Width  Field
6:0      7      0x57 (vsetivli)
11:7     5      rd
14:12    3      7
19:15    5      uimm[4:0]
29:20    10     zimm[9:0]
30       1      1
31       1      1

Spike ISS Implementation:
require_vector_novtype(false);
WRITE_RD(P.VU.set_vl(insn.rd(), -1, insn.rs1(), insn.v_zimm10()));
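The vsetivli field layout given above can be turned into a small encoder and decoder. This sketch uses only the bit widths from that layout; the names encode_vsetivli and decode_vsetivli are hypothetical, and the zimm value (the requested vtype setting) is passed through as a raw 10-bit number rather than interpreted.

```python
def encode_vsetivli(rd, uimm, zimm10):
    """Pack the vsetivli fields into a 32-bit instruction word."""
    if not (0 <= rd < 32 and 0 <= uimm < 32 and 0 <= zimm10 < 1024):
        raise ValueError("field out of range")
    return (0x57                # bits 6:0   opcode
            | (rd << 7)         # bits 11:7  rd
            | (0x7 << 12)       # bits 14:12 fixed value 7
            | (uimm << 15)      # bits 19:15 uimm[4:0]
            | (zimm10 << 20)    # bits 29:20 zimm[9:0]
            | (0b11 << 30))     # bits 31:30 both set to 1


def decode_vsetivli(word):
    """Return (rd, uimm, zimm10), or raise if the fixed bits do not match."""
    if (word & 0x7F) != 0x57 or ((word >> 12) & 0x7) != 0x7 or (word >> 30) != 0b11:
        raise ValueError("not a vsetivli encoding")
    return (word >> 7) & 0x1F, (word >> 15) & 0x1F, (word >> 20) & 0x3FF
```

For instance, encode_vsetivli(5, 4, 0xD0) packs rd = x5, uimm = 4 and a raw zimm of 0xD0 into one word, and decode_vsetivli recovers the same three fields from it.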

v / _vector_instruction_listing

  1. Vector Instruction Listing /
Operation Arguments Description
vaadd.vv vs2, vs1, vd

vaadd

vaadd.vx vs2, rs1, vd

vaadd

vasub.vv vs2, vs1, vd

vasub

vasub.vx vs2, rs1, vd

vasub

vasubu.vv vs2, vs1, vd

vasubu

vasubu.vx vs2, rs1, vd

vasubu

vdiv.vv vs2, vs1, vd

vdiv

vdiv.vx vs2, rs1, vd

vdiv

vfdiv.vf vs2, rs1, vd

vfdiv

vfdiv.vv vs2, vs1, vd

vfdiv

vfmadd.vf vs2, rs1, vd

vfmadd

vfmadd.vv vs2, vs1, vd

vfmadd

vfmax.vf vs2, rs1, vd

vfmax

vfmax.vv vs2, vs1, vd

vfmax

vfmsac.vf vs2, rs1, vd

vfmsac

vfmsac.vv vs2, vs1, vd

vfmsac

vfmsub.vf vs2, rs1, vd

vfmsub

vfmsub.vv vs2, vs1, vd

vfmsub

vfnmacc.vf vs2, rs1, vd

vfnmacc

vfnmacc.vv vs2, vs1, vd

vfnmacc

vfnmadd.vf vs2, rs1, vd

vfnmadd

vfnmadd.vv vs2, vs1, vd

vfnmadd

vfnmsac.vf vs2, rs1, vd

vfnmsac

vfnmsac.vv vs2, vs1, vd

vfnmsac

vfnmsub.vf vs2, rs1, vd

vfnmsub

vfnmsub.vv vs2, vs1, vd

vfnmsub

vfrdiv.vf vs2, rs1, vd

vfrdiv

vfredmax.vs vs2, vs1, vd

vfredmax

vfredmin.vs vs2, vs1, vd

vfredmin

vfrsub.vf vs2, rs1, vd

vfrsub

vfsgnjn.vf vs2, rs1, vd

vfsgnjn

vfsgnjn.vv vs2, vs1, vd

vfsgnjn

vfsgnjx.vf vs2, rs1, vd

vfsgnjx

vfsgnjx.vv vs2, vs1, vd

vfsgnjx

vfsub.vf vs2, rs1, vd

vfsub

vfsub.vv vs2, vs1, vd

vfsub

vfwmsac.vf vs2, rs1, vd

vfwmsac

vfwmsac.vv vs2, vs1, vd

vfwmsac

vfwnmacc.vf vs2, rs1, vd

vfwnmacc

vfwnmacc.vv vs2, vs1, vd

vfwnmacc

vfwnmsac.vf vs2, rs1, vd

vfwnmsac

vfwnmsac.vv vs2, vs1, vd

vfwnmsac

vfwredusum.vs vs2, vs1, vd

vfwredusum

vfwsub.vf vs2, rs1, vd

vfwsub

vfwsub.w

vfwsub.vv vs2, vs1, vd

vfwsub

vfwsub.w

vfwsub.wf vs2, rs1, vd

vfwsub

vfwsub.w

vfwsub.wv vs2, vs1, vd

vfwsub

vfwsub.w

vmadd.vv vs2, vs1, vd

vmadd

vmadd.vx vs2, rs1, vd

vmadd

vmax.vv vs2, vs1, vd

vmax

vmax.vx vs2, rs1, vd

vmax

vmaxu.vv vs2, vs1, vd

vmaxu

vmaxu.vx vs2, rs1, vd

vmaxu

vmfge.vf vs2, rs1, vd

vmfge

vmfgt.vf vs2, rs1, vd

vmfgt

vmfle.vf vs2, rs1, vd

vmfle

vmfle.vv vs2, vs1, vd

vmfle

vmfne.vf vs2, rs1, vd

vmfne

vmfne.vv vs2, vs1, vd

vmfne

vmin.vv vs2, vs1, vd

vmin

vmin.vx vs2, rs1, vd

vmin

vmor.mm vs2, vs1, vd

vmor

vmsgtu.vi vs2, simm5, vd

vmsgtu

vmsgtu.vx vs2, rs1, vd

vmsgtu

vmsle.vi vs2, simm5, vd

vmsle

vmsle.vv vs2, vs1, vd

vmsle

vmsle.vx vs2, rs1, vd

vmsle

vmsleu.vi vs2, simm5, vd

vmsleu

vmsleu.vv vs2, vs1, vd

vmsleu

vmsleu.vx vs2, rs1, vd

vmsleu

vmsltu.vv vs2, vs1, vd

vmsltu

vmsltu.vx vs2, rs1, vd

vmsltu

vmsne.vi vs2, simm5, vd

vmsne

vmsne.vv vs2, vs1, vd

vmsne

vmsne.vx vs2, rs1, vd

vmsne

vmulhsu.vv vs2, vs1, vd

vmulhsu

vmulhsu.vx vs2, rs1, vd

vmulhsu

vmulhu.vv vs2, vs1, vd

vmulhu

vmulhu.vx vs2, rs1, vd

vmulhu

vnmsac.vv vs2, vs1, vd

vnmsac

vnmsac.vx vs2, rs1, vd

vnmsac

vnmsub.vv vs2, vs1, vd

vnmsub

vnmsub.vx vs2, rs1, vd

vnmsub

vor.vi vs2, simm5, vd

vor

vor.vv vs2, vs1, vd

vor

vor.vx vs2, rs1, vd

vor

vredand.vs vs2, vs1, vd

vredand

vredmax.vs vs2, vs1, vd

vredmax

vredmaxu.vs vs2, vs1, vd

vredmaxu

vredmin.vs vs2, vs1, vd

vredmin

vredminu.vs vs2, vs1, vd

vredminu

vredor.vs vs2, vs1, vd

vredor

vredxor.vs vs2, vs1, vd

vredxor

vrem.vv vs2, vs1, vd

vrem

vrem.vx vs2, rs1, vd

vrem

vremu.vv vs2, vs1, vd

vremu

vremu.vx vs2, rs1, vd

vremu

vrgatherei16.vv vs2, vs1, vd

vrgatherei16

vrsub.vi vs2, simm5, vd

vrsub

vrsub.vx vs2, rs1, vd

vrsub

vsadd.vi vs2, simm5, vd

vsadd

vsadd.vv vs2, vs1, vd

vsadd

vsadd.vx vs2, rs1, vd

vsadd

vsext.vf2 vs2, vd

vsext.vf8

vsext.vf4

vsext.vf2

vsext.vf4 vs2, vd

vsext.vf8

vsext.vf4

vsext.vf2

vsext.vf8 vs2, vd

vsext.vf8

vsext.vf4

vsext.vf2

vsra.vi vs2, simm5, vd

vsra

vsra.vv vs2, vs1, vd

vsra

vsra.vx vs2, rs1, vd

vsra

vsrl.vi vs2, simm5, vd

vsrl

vsrl.vv vs2, vs1, vd

vsrl

vsrl.vx vs2, rs1, vd

vsrl

vssra.vi vs2, simm5, vd

vssra

vssra.vv vs2, vs1, vd

vssra

vssra.vx vs2, rs1, vd

vssra

vssub.vv vs2, vs1, vd

vssub

vssub.vx vs2, rs1, vd

vssub

vssubu.vv vs2, vs1, vd

vssubu

vssubu.vx vs2, rs1, vd

vssubu

vsub.vv vs2, vs1, vd

vsub

vsub.vx vs2, rs1, vd

vsub

vwadd.vv vs2, vs1, vd

vwadd

vwadd.w

vwadd.vx vs2, rs1, vd

vwadd

vwadd.w

vwadd.wv vs2, vs1, vd

vwadd

vwadd.w

vwadd.wx vs2, rs1, vd

vwadd

vwadd.w

vwmacc.vv vs2, vs1, vd

vwmacc

vwmacc.vx vs2, rs1, vd

vwmacc

vwmaccsu.vv vs2, vs1, vd

vwmaccsu

vwmaccsu.vx vs2, rs1, vd

vwmaccsu

vwmaccus.vx vs2, rs1, vd

vwmaccus

vwmulsu.vv vs2, vs1, vd

vwmulsu

vwmulsu.vx vs2, rs1, vd

vwmulsu

vwmulu.vv vs2, vs1, vd

vwmulu

vwmulu.vx vs2, rs1, vd

vwmulu

vwsub.vv vs2, vs1, vd

vwsub

vwsub.w

vwsub.vx vs2, rs1, vd

vwsub

vwsub.w

vwsub.wv vs2, vs1, vd

vwsub

vwsub.w

vwsub.wx vs2, rs1, vd

vwsub

vwsub.w

vwsubu.vv vs2, vs1, vd

vwsubu

vwsubu.w

vwsubu.vx vs2, rs1, vd

vwsubu

vwsubu.w

vwsubu.wv vs2, vs1, vd

vwsubu

vwsubu.w

vwsubu.wx vs2, rs1, vd

vwsubu

vwsubu.w

vxor.vi vs2, simm5, vd

vxor

vxor.vv vs2, vs1, vd

vxor

vxor.vx vs2, rs1, vd

vxor

v / _vector_integer_add_with_carry_subtract_with_borrow_instructions

  12. Vector Integer Arithmetic Instructions / 12.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions
Operation Arguments Description
vadc.vim vs2, simm5, vd

vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. These instructions are encoded as masked instructions (vm=0), but they operate on and write back all body elements. Encodings corresponding to the unmasked versions (vm=1) are reserved.

For vadc and vsbc, the instruction encoding is reserved if the destination vector register is v0.

# Produce sum with carry.
# vd[i] = vs2[i] + vs1[i] + v0.mask[i]
vadc.vvm vd, vs2, vs1, v0   # Vector-vector
# vd[i] = vs2[i] + x[rs1] + v0.mask[i]
vadc.vxm vd, vs2, rs1, v0   # Vector-scalar
# vd[i] = vs2[i] + imm + v0.mask[i]
vadc.vim vd, vs2, imm, v0   # Vector-immediate
# Produce carry out in mask register format
# vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
vmadc.vvm vd, vs2, vs1, v0  # Vector-vector
# vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
vmadc.vxm vd, vs2, rs1, v0  # Vector-scalar
# vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
vmadc.vim vd, vs2, imm, v0  # Vector-immediate
# vd.mask[i] = carry_out(vs2[i] + vs1[i])
vmadc.vv vd, vs2, vs1       # Vector-vector, no carry-in
# vd.mask[i] = carry_out(vs2[i] + x[rs1])
vmadc.vx vd, vs2, rs1       # Vector-scalar, no carry-in
# vd.mask[i] = carry_out(vs2[i] + imm)
vmadc.vi vd, vs2, imm       # Vector-immediate, no carry-in

vadc.vvm vs2, vs1, vd

vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. These instructions are encoded as masked instructions (vm=0), but they operate on and write back all body elements. Encodings corresponding to the unmasked versions (vm=1) are reserved.

For vadc and vsbc, the instruction encoding is reserved if the destination vector register is v0.

# Produce sum with carry.
# vd[i] = vs2[i] + vs1[i] + v0.mask[i]
vadc.vvm vd, vs2, vs1, v0   # Vector-vector
# vd[i] = vs2[i] + x[rs1] + v0.mask[i]
vadc.vxm vd, vs2, rs1, v0   # Vector-scalar
# vd[i] = vs2[i] + imm + v0.mask[i]
vadc.vim vd, vs2, imm, v0   # Vector-immediate
# Produce carry out in mask register format
# vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
vmadc.vvm vd, vs2, vs1, v0  # Vector-vector
# vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
vmadc.vxm vd, vs2, rs1, v0  # Vector-scalar
# vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
vmadc.vim vd, vs2, imm, v0  # Vector-immediate
# vd.mask[i] = carry_out(vs2[i] + vs1[i])
vmadc.vv vd, vs2, vs1       # Vector-vector, no carry-in
# vd.mask[i] = carry_out(vs2[i] + x[rs1])
vmadc.vx vd, vs2, rs1       # Vector-scalar, no carry-in
# vd.mask[i] = carry_out(vs2[i] + imm)
vmadc.vi vd, vs2, imm       # Vector-immediate, no carry-in

vadc.vxm vs2, rs1, vd

vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. These instructions are encoded as masked instructions (vm=0), but they operate on and write back all body elements. Encodings corresponding to the unmasked versions (vm=1) are reserved.

For vadc and vsbc, the instruction encoding is reserved if the destination vector register is v0.

# Produce sum with carry.
# vd[i] = vs2[i] + vs1[i] + v0.mask[i]
vadc.vvm vd, vs2, vs1, v0   # Vector-vector
# vd[i] = vs2[i] + x[rs1] + v0.mask[i]
vadc.vxm vd, vs2, rs1, v0   # Vector-scalar
# vd[i] = vs2[i] + imm + v0.mask[i]
vadc.vim vd, vs2, imm, v0   # Vector-immediate
# Produce carry out in mask register format
# vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
vmadc.vvm vd, vs2, vs1, v0  # Vector-vector
# vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
vmadc.vxm vd, vs2, rs1, v0  # Vector-scalar
# vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
vmadc.vim vd, vs2, imm, v0  # Vector-immediate
# vd.mask[i] = carry_out(vs2[i] + vs1[i])
vmadc.vv vd, vs2, vs1       # Vector-vector, no carry-in
# vd.mask[i] = carry_out(vs2[i] + x[rs1])
vmadc.vx vd, vs2, rs1       # Vector-scalar, no carry-in
# vd.mask[i] = carry_out(vs2[i] + imm)
vmadc.vi vd, vs2, imm       # Vector-immediate, no carry-in

vmadc.vi vs2, simm5, vd

vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.

# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0   # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0    # Calc new sum
vmmv.m v0, v1              # Move temp carry into v0 for next word

vmadc.vim vs2, simm5, vd

vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.

# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0   # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0    # Calc new sum
vmmv.m v0, v1              # Move temp carry into v0 for next word

vmadc.vv vs2, vs1, vd

vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.

# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0   # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0    # Calc new sum
vmmv.m v0, v1              # Move temp carry into v0 for next word

vmadc.vvm vs2, vs1, vd

vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.

# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0   # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0    # Calc new sum
vmmv.m v0, v1              # Move temp carry into v0 for next word

vmadc.vx vs2, rs1, vd

vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.

# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0   # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0    # Calc new sum
vmmv.m v0, v1              # Move temp carry into v0 for next word

vmadc.vxm vs2, rs1, vd

vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.

# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0   # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0    # Calc new sum
vmmv.m v0, v1              # Move temp carry into v0 for next word
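The three-instruction multi-word sequence can be traced with a small Python model in which each vector register is a list of unsigned SEW-bit integers and each mask register is a list of 0/1 values. The names SEW, vadc, vmadc and multiword_add are hypothetical, and SEW is fixed at 32 purely for illustration.

```python
SEW = 32
ELEM_MASK = (1 << SEW) - 1


def vadc(vs2, vs1, v0):
    """vd[i] = vs2[i] + vs1[i] + v0.mask[i], truncated to SEW bits."""
    return [(a + b + c) & ELEM_MASK for a, b, c in zip(vs2, vs1, v0)]


def vmadc(vs2, vs1, v0):
    """vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])."""
    return [(a + b + c) >> SEW for a, b, c in zip(vs2, vs1, v0)]


def multiword_add(acc_words, addend_words):
    """Add wide numbers stored as lists of words, least significant first.

    Each word is itself a vector, so several wide numbers are added in parallel.
    """
    carry = [0] * len(acc_words[0])             # v0 starts with no carry-in
    result = []
    for acc, addend in zip(acc_words, addend_words):
        next_carry = vmadc(acc, addend, carry)  # vmadc.vvm v1, v4, v8, v0
        result.append(vadc(acc, addend, carry)) # vadc.vvm  v4, v4, v8, v0
        carry = next_carry                      # vmmv.m    v0, v1
    return result, carry
```

The carry has to be computed before the sum overwrites the accumulator, which is why the assembly sequence issues vmadc first and keeps the result in a temporary mask register.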

vmsbc.vv vs2, vs1, vd

For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative.

vmsbc.vvm vs2, vs1, vd

For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative.

vmsbc.vx vs2, rs1, vd

For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative.

vmsbc.vxm vs2, rs1, vd

For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative.

vsbc.vvm vs2, vs1, vd

The subtract with borrow instruction vsbc performs the equivalent function to support long word arithmetic for subtraction. There are no subtract with immediate instructions.

# Produce difference with borrow.
# vd[i] = vs2[i] - vs1[i] - v0.mask[i]
vsbc.vvm vd, vs2, vs1, v0   # Vector-vector
# vd[i] = vs2[i] - x[rs1] - v0.mask[i]
vsbc.vxm vd, vs2, rs1, v0   # Vector-scalar
# Produce borrow out in mask register format
# vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
vmsbc.vvm vd, vs2, vs1, v0  # Vector-vector
# vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
vmsbc.vxm vd, vs2, rs1, v0  # Vector-scalar
# vd.mask[i] = borrow_out(vs2[i] - vs1[i])
vmsbc.vv vd, vs2, vs1       # Vector-vector, no borrow-in
# vd.mask[i] = borrow_out(vs2[i] - x[rs1])
vmsbc.vx vd, vs2, rs1       # Vector-scalar, no borrow-in

vsbc.vxm vs2, rs1, vd

The subtract with borrow instruction vsbc performs the equivalent function to support long word arithmetic for subtraction. There are no subtract with immediate instructions.

# Produce difference with borrow.
# vd[i] = vs2[i] - vs1[i] - v0.mask[i]
vsbc.vvm vd, vs2, vs1, v0   # Vector-vector
# vd[i] = vs2[i] - x[rs1] - v0.mask[i]
vsbc.vxm vd, vs2, rs1, v0   # Vector-scalar
# Produce borrow out in mask register format
# vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
vmsbc.vvm vd, vs2, vs1, v0  # Vector-vector
# vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
vmsbc.vxm vd, vs2, rs1, v0  # Vector-scalar
# vd.mask[i] = borrow_out(vs2[i] - vs1[i])
vmsbc.vv vd, vs2, vs1       # Vector-vector, no borrow-in
# vd.mask[i] = borrow_out(vs2[i] - x[rs1])
vmsbc.vx vd, vs2, rs1       # Vector-scalar, no borrow-in
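The borrow rule (borrow is 1 iff the difference, prior to truncation, is negative) is easy to check with a short model. As a sketch, elements are treated as unsigned integers, the function names vsbc and vmsbc mirror the mnemonics but are otherwise hypothetical, and an 8-bit SEW is assumed by default to keep the numbers small.

```python
def vsbc(vs2, vs1, v0, sew=8):
    """vd[i] = vs2[i] - vs1[i] - v0.mask[i], truncated to SEW bits."""
    elem_mask = (1 << sew) - 1
    return [(a - b - c) & elem_mask for a, b, c in zip(vs2, vs1, v0)]


def vmsbc(vs2, vs1, v0):
    """vd.mask[i] = 1 iff the untruncated difference is negative."""
    return [int(a - b - c < 0) for a, b, c in zip(vs2, vs1, v0)]
```

As a two-word example with 8-bit elements, 0x0100 - 0x0001 first computes the low word (0x00 - 0x01 gives 0xFF with a borrow of 1) and then the high word (0x01 - 0x00 - 1 gives 0x00 with no borrow), for a result of 0x00FF.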

v / _vector_integer_compare_instructions

  12. Vector Integer Arithmetic Instructions / 12.8. Vector Integer Compare Instructions
Operation Arguments Description
vmseq.vi vs2, simm5, vd

# Set if equal
vmseq.vv vd, vs2, vs1, vm   # Vector-vector
vmseq.vx vd, vs2, rs1, vm   # vector-scalar
vmseq.vi vd, vs2, imm, vm   # vector-immediate
# Set if not equal
vmsne.vv vd, vs2, vs1, vm   # Vector-vector
vmsne.vx vd, vs2, rs1, vm   # vector-scalar
vmsne.vi vd, vs2, imm, vm   # vector-immediate
# Set if less than, unsigned
vmsltu.vv vd, vs2, vs1, vm  # Vector-vector
vmsltu.vx vd, vs2, rs1, vm  # Vector-scalar
# Set if less than, signed
vmslt.vv vd, vs2, vs1, vm   # Vector-vector
vmslt.vx vd, vs2, rs1, vm   # vector-scalar
# Set if less than or equal, unsigned
vmsleu.vv vd, vs2, vs1, vm  # Vector-vector
vmsleu.vx vd, vs2, rs1, vm  # vector-scalar
vmsleu.vi vd, vs2, imm, vm  # Vector-immediate
# Set if less than or equal, signed
vmsle.vv vd, vs2, vs1, vm   # Vector-vector
vmsle.vx vd, vs2, rs1, vm   # vector-scalar
vmsle.vi vd, vs2, imm, vm   # vector-immediate
# Set if greater than, unsigned
vmsgtu.vx vd, vs2, rs1, vm  # Vector-scalar
vmsgtu.vi vd, vs2, imm, vm  # Vector-immediate
# Set if greater than, signed
vmsgt.vx vd, vs2, rs1, vm   # Vector-scalar
vmsgt.vi vd, vs2, imm, vm   # Vector-immediate
# Following two instructions are not provided directly
# Set if greater than or equal, unsigned
# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if greater than or equal, signed
# vmsge.vx vd, vs2, rs1, vm  # Vector-scalar

vmseq.vv vs2, vs1, vd

# Set if equal
vmseq.vv vd, vs2, vs1, vm   # Vector-vector
vmseq.vx vd, vs2, rs1, vm   # vector-scalar
vmseq.vi vd, vs2, imm, vm   # vector-immediate
# Set if not equal
vmsne.vv vd, vs2, vs1, vm   # Vector-vector
vmsne.vx vd, vs2, rs1, vm   # vector-scalar
vmsne.vi vd, vs2, imm, vm   # vector-immediate
# Set if less than, unsigned
vmsltu.vv vd, vs2, vs1, vm  # Vector-vector
vmsltu.vx vd, vs2, rs1, vm  # Vector-scalar
# Set if less than, signed
vmslt.vv vd, vs2, vs1, vm   # Vector-vector
vmslt.vx vd, vs2, rs1, vm   # vector-scalar
# Set if less than or equal, unsigned
vmsleu.vv vd, vs2, vs1, vm  # Vector-vector
vmsleu.vx vd, vs2, rs1, vm  # vector-scalar
vmsleu.vi vd, vs2, imm, vm  # Vector-immediate
# Set if less than or equal, signed
vmsle.vv vd, vs2, vs1, vm   # Vector-vector
vmsle.vx vd, vs2, rs1, vm   # vector-scalar
vmsle.vi vd, vs2, imm, vm   # vector-immediate
# Set if greater than, unsigned
vmsgtu.vx vd, vs2, rs1, vm  # Vector-scalar
vmsgtu.vi vd, vs2, imm, vm  # Vector-immediate
# Set if greater than, signed
vmsgt.vx vd, vs2, rs1, vm   # Vector-scalar
vmsgt.vi vd, vs2, imm, vm   # Vector-immediate
# Following two instructions are not provided directly
# Set if greater than or equal, unsigned
# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if greater than or equal, signed
# vmsge.vx vd, vs2, rs1, vm  # Vector-scalar

vmseq.vx vs2, rs1, vd

# Set if equal
vmseq.vv vd, vs2, vs1, vm   # Vector-vector
vmseq.vx vd, vs2, rs1, vm   # vector-scalar
vmseq.vi vd, vs2, imm, vm   # vector-immediate
# Set if not equal
vmsne.vv vd, vs2, vs1, vm   # Vector-vector
vmsne.vx vd, vs2, rs1, vm   # vector-scalar
vmsne.vi vd, vs2, imm, vm   # vector-immediate
# Set if less than, unsigned
vmsltu.vv vd, vs2, vs1, vm  # Vector-vector
vmsltu.vx vd, vs2, rs1, vm  # Vector-scalar
# Set if less than, signed
vmslt.vv vd, vs2, vs1, vm   # Vector-vector
vmslt.vx vd, vs2, rs1, vm   # vector-scalar
# Set if less than or equal, unsigned
vmsleu.vv vd, vs2, vs1, vm  # Vector-vector
vmsleu.vx vd, vs2, rs1, vm  # vector-scalar
vmsleu.vi vd, vs2, imm, vm  # Vector-immediate
# Set if less than or equal, signed
vmsle.vv vd, vs2, vs1, vm   # Vector-vector
vmsle.vx vd, vs2, rs1, vm   # vector-scalar
vmsle.vi vd, vs2, imm, vm   # vector-immediate
# Set if greater than, unsigned
vmsgtu.vx vd, vs2, rs1, vm  # Vector-scalar
vmsgtu.vi vd, vs2, imm, vm  # Vector-immediate
# Set if greater than, signed
vmsgt.vx vd, vs2, rs1, vm   # Vector-scalar
vmsgt.vi vd, vs2, imm, vm   # Vector-immediate
# Following two instructions are not provided directly
# Set if greater than or equal, unsigned
# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if greater than or equal, signed
# vmsge.vx vd, vs2, rs1, vm  # Vector-scalar

vmsgt.vi vs2, simm5, vd

Similarly, vmsge{u}.vi is not provided and the compare is implemented using vmsgt{u}.vi with the immediate decremented by one. The resulting effective vmsge.vi range is -15 to 16, and the resulting effective vmsgeu.vi range is 1 to 16 (Note, vmsgeu.vi with immediate 0 is not useful as it is always true).

The vmsge{u}.vx operation can be synthesized by reducing the value of x by 1 and using the vmsgt{u}.vx instruction, when it is known that this will not underflow the representation in x.

Sequences to synthesize `vmsge{u}.vx` instruction
va >= x, x > minimum
  addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm

vmsgt.vx vs2, rs1, vd

Similarly, vmsge{u}.vi is not provided and the compare is implemented using vmsgt{u}.vi with the immediate decremented by one. The resulting effective vmsge.vi range is -15 to 16, and the resulting effective vmsgeu.vi range is 1 to 16 (Note, vmsgeu.vi with immediate 0 is not useful as it is always true).

The vmsge{u}.vx operation can be synthesized by reducing the value of x by 1 and using the vmsgt{u}.vx instruction, when it is known that this will not underflow the representation in x.

Sequences to synthesize `vmsge{u}.vx` instruction
va >= x, x > minimum
  addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm
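The "decrement and use greater-than" rewrite, and the immediate ranges it produces, can be checked with a short signed-only model. The function names are hypothetical, the 5-bit signed immediate range of -16 to 15 is taken from the simm5 operand, and an 8-bit SEW is assumed for the vx case.

```python
SIMM5_MIN, SIMM5_MAX = -16, 15


def vmsgt_vi(va, imm):
    """Real instruction: immediate must fit in a signed 5-bit field."""
    if not SIMM5_MIN <= imm <= SIMM5_MAX:
        raise ValueError("immediate does not fit in simm5")
    return [int(elem > imm) for elem in va]


def vmsge_vi(va, imm):
    """Pseudoinstruction: va >= imm is rewritten as va > imm - 1."""
    return vmsgt_vi(va, imm - 1)


def vmsge_vx(va, x, sew=8):
    """addi t0, x, -1; vmsgt.vx: only valid when x - 1 does not underflow."""
    minimum = -(1 << (sew - 1))
    if x <= minimum:
        raise ValueError("x - 1 would underflow the SEW-bit representation")
    return [int(elem > x - 1) for elem in va]
```

Because the encoded immediate is imm - 1, vmsge_vi accepts -15 to 16: an immediate of 16 is encoded as 15, while -16 would need -17 and is rejected.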

vmslt.vv vs2, vs1, vd

Comparison   Assembler Mapping               Assembler Pseudoinstruction
va < vb      vmslt{u}.vv vd, va, vb, vm
va <= vb     vmsle{u}.vv vd, va, vb, vm
va > vb      vmslt{u}.vv vd, vb, va, vm      vmsgt{u}.vv vd, va, vb, vm
va >= vb     vmsle{u}.vv vd, vb, va, vm      vmsge{u}.vv vd, va, vb, vm
va < x       vmslt{u}.vx vd, va, x, vm
va <= x      vmsle{u}.vx vd, va, x, vm
va > x       vmsgt{u}.vx vd, va, x, vm
va >= x      see below
va < i       vmsle{u}.vi vd, va, i-1, vm     vmslt{u}.vi vd, va, i, vm
va <= i      vmsle{u}.vi vd, va, i, vm
va > i       vmsgt{u}.vi vd, va, i, vm
va >= i      vmsgt{u}.vi vd, va, i-1, vm     vmsge{u}.vi vd, va, i, vm
va, vb: vector register groups
x: scalar integer register
i: immediate

unmasked va >= x
  pseudoinstruction: vmsge{u}.vx vd, va, x
  expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
masked va >= x, vd != v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t
  expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
masked va >= x, vd == v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vd, vd, vt
masked va >= x, any vd
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vt, v0, vt; vmandn.mm vd, vd, v0; vmor.mm vd, vt, vd
The vt argument to the pseudoinstruction must name a temporary vector register that is not the same as vd and which will be clobbered by the pseudoinstruction.

# (a < b) && (b < c) in two instructions when mask-undisturbed
vmslt.vv v0, va, vb         # All body elements written
vmslt.vv v0, vb, vc, v0.t   # Only update at set mask

vmslt.vx vs2, rs1, vd

Comparison   Assembler Mapping               Assembler Pseudoinstruction
va < vb      vmslt{u}.vv vd, va, vb, vm
va <= vb     vmsle{u}.vv vd, va, vb, vm
va > vb      vmslt{u}.vv vd, vb, va, vm      vmsgt{u}.vv vd, va, vb, vm
va >= vb     vmsle{u}.vv vd, vb, va, vm      vmsge{u}.vv vd, va, vb, vm
va < x       vmslt{u}.vx vd, va, x, vm
va <= x      vmsle{u}.vx vd, va, x, vm
va > x       vmsgt{u}.vx vd, va, x, vm
va >= x      see below
va < i       vmsle{u}.vi vd, va, i-1, vm     vmslt{u}.vi vd, va, i, vm
va <= i      vmsle{u}.vi vd, va, i, vm
va > i       vmsgt{u}.vi vd, va, i, vm
va >= i      vmsgt{u}.vi vd, va, i-1, vm     vmsge{u}.vi vd, va, i, vm
va, vb: vector register groups
x: scalar integer register
i: immediate

unmasked va >= x
  pseudoinstruction: vmsge{u}.vx vd, va, x
  expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
masked va >= x, vd != v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t
  expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
masked va >= x, vd == v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vd, vd, vt
masked va >= x, any vd
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vt, v0, vt; vmandn.mm vd, vd, v0; vmor.mm vd, vt, vd
The vt argument to the pseudoinstruction must name a temporary vector register that is not the same as vd and which will be clobbered by the pseudoinstruction.

# (a < b) && (b < c) in two instructions when mask-undisturbed
vmslt.vv v0, va, vb         # All body elements written
vmslt.vv v0, vb, vc, v0.t   # Only update at set mask
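To see why the masked vmsge{u}.vx expansions work, the mask logic can be replayed on lists of 0/1 values and compared against a direct "greater than or equal under mask" reference. This is a sketch that assumes the mask-undisturbed policy; all function names are hypothetical, and only the "vd != v0" and "any vd" expansions are modelled.

```python
def vmslt_vx(va, x, mask=None, old=None):
    """Less-than compare; with a mask, inactive elements keep the old value."""
    if mask is None:
        return [int(elem < x) for elem in va]
    return [int(elem < x) if m else o for elem, m, o in zip(va, mask, old)]


def vmxor(a, b):
    return [p ^ q for p, q in zip(a, b)]


def vmandn(a, b):
    """a AND NOT b, matching vmandn.mm vd, a, b."""
    return [p & (1 - q) for p, q in zip(a, b)]


def vmor(a, b):
    return [p | q for p, q in zip(a, b)]


def vmsge_reference(va, x, v0, vd_old):
    """Intended result: compare where active, keep the old bit elsewhere."""
    return [int(elem >= x) if m else o for elem, m, o in zip(va, v0, vd_old)]


def vmsge_vd_not_v0(va, x, v0, vd_old):
    vd = vmslt_vx(va, x, v0, vd_old)    # vmslt.vx vd, va, x, v0.t
    return vmxor(vd, v0)                # vmxor.mm vd, vd, v0


def vmsge_any_vd(va, x, v0, vd_old):
    vt = vmslt_vx(va, x)                # vmslt.vx  vt, va, x
    vt = vmandn(v0, vt)                 # vmandn.mm vt, v0, vt
    vd = vmandn(vd_old, v0)             # vmandn.mm vd, vd, v0
    return vmor(vt, vd)                 # vmor.mm   vd, vt, vd
```

In the first expansion the XOR with v0 flips only the active bits (turning "less than" into "greater than or equal") and leaves inactive bits untouched, which is why it needs vd and v0 to be different registers.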

v / _vector_integer_divide_instructions

  12. Vector Integer Arithmetic Instructions / 12.11. Vector Integer Divide Instructions
Operation Arguments Description
vdivu.vv vs2, vs1, vd

# Unsigned divide.
vdivu.vv vd, vs2, vs1, vm   # Vector-vector
vdivu.vx vd, vs2, rs1, vm   # vector-scalar
# Signed divide
vdiv.vv vd, vs2, vs1, vm    # Vector-vector
vdiv.vx vd, vs2, rs1, vm    # vector-scalar
# Unsigned remainder
vremu.vv vd, vs2, vs1, vm   # Vector-vector
vremu.vx vd, vs2, rs1, vm   # vector-scalar
# Signed remainder
vrem.vv vd, vs2, vs1, vm    # Vector-vector
vrem.vx vd, vs2, rs1, vm    # vector-scalar

vdivu.vx vs2, rs1, vd

# Unsigned divide.
vdivu.vv vd, vs2, vs1, vm   # Vector-vector
vdivu.vx vd, vs2, rs1, vm   # vector-scalar
# Signed divide
vdiv.vv vd, vs2, vs1, vm    # Vector-vector
vdiv.vx vd, vs2, rs1, vm    # vector-scalar
# Unsigned remainder
vremu.vv vd, vs2, vs1, vm   # Vector-vector
vremu.vx vd, vs2, rs1, vm   # vector-scalar
# Signed remainder
vrem.vv vd, vs2, vs1, vm    # Vector-vector
vrem.vx vd, vs2, rs1, vm    # vector-scalar

v / _vector_integer_merge_instructions

  12. Vector Integer Arithmetic Instructions / 12.15. Vector Integer Merge Instructions
Operation Arguments Description
vmerge.vim vs2, simm5, vd
vmerge.vvm vs2, vs1, vd
vmerge.vxm vs2, rs1, vd

The vmerge instructions are encoded as masked instructions (vm=0). The instructions combine two sources as follows. At elements where the mask value is zero, the first operand is copied to the destination element; otherwise the second operand is copied to the destination element. The first operand is always a vector register group specified by vs2. The second operand is a vector register group specified by vs1, a scalar x register specified by rs1, or a 5-bit sign-extended immediate.

vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i]
vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i]
vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i]
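The select behaviour described for vmerge maps directly onto a per-element conditional. The Python sketch below uses hypothetical helper names and plain lists for vector registers; it illustrates the semantics only and says nothing about the encoding.

```python
def vmerge_vvm(vs2, vs1, v0):
    """vd[i] = v0.mask[i] ? vs1[i] : vs2[i]"""
    return [b if m else a for a, b, m in zip(vs2, vs1, v0)]

def vmerge_vxm(vs2, x, v0):
    """vd[i] = v0.mask[i] ? x : vs2[i] (also models vmerge.vim with an immediate)"""
    return [x if m else a for a, m in zip(vs2, v0)]

vs2 = [10, 20, 30, 40]
vs1 = [1, 2, 3, 4]
v0 = [0, 1, 1, 0]

merged_vv = vmerge_vvm(vs2, vs1, v0)
merged_vx = vmerge_vxm(vs2, -1, v0)
```

Where the mask bit is 0 the vs2 element passes through; where it is 1 the second operand wins.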

v / _vector_integer_minmax_instructions

  1. Vector Integer Arithmetic Instructions / 12.9. Vector Integer Min/Max Instructions
Operation Arguments Description
vminu.vv vs2, vs1, vd
vminu.vx vs2, rs1, vd

# Unsigned minimum
vminu.vv vd, vs2, vs1, vm # Vector-vector
vminu.vx vd, vs2, rs1, vm # vector-scalar

# Signed minimum
vmin.vv vd, vs2, vs1, vm # Vector-vector
vmin.vx vd, vs2, rs1, vm # vector-scalar

# Unsigned maximum
vmaxu.vv vd, vs2, vs1, vm # Vector-vector
vmaxu.vx vd, vs2, rs1, vm # vector-scalar

# Signed maximum
vmax.vv vd, vs2, vs1, vm # Vector-vector
vmax.vx vd, vs2, rs1, vm # vector-scalar

v / _vector_integer_move_instructions

  1. Vector Integer Arithmetic Instructions / 12.16. Vector Integer Move Instructions
Operation Arguments Description
vmv.s.x rs1, vd
vmv.v.i simm5, vd
vmv.v.v vs1, vd
vmv.v.x rs1, vd
vmv.x.s vs2, rd

The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved.

The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW.

vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = x[rs1]
vmv.v.i vd, imm # vd[i] = imm

v / _vector_iota_instruction

  1. Vector Mask Instructions / 16.8. Vector Iota Instruction
Operation Arguments Description
viota.m vs2, vd

The viota.m instruction reads a source vector mask register and writes to each element of the destination vector register group the sum of all the bits of elements in the mask register whose index is less than the element, i.e., a parallel prefix sum of the mask values.

Traps on viota.m are always reported with a vstart of 0, and execution is always restarted from the beginning when resuming after a trap handler. An illegal instruction exception is raised if vstart is non-zero.

The viota.m instruction can be combined with memory scatter instructions (indexed stores) to perform vector compress functions.

viota.m vd, vs2, vm

# Example

    7 6 5 4 3 2 1 0   Element number

    1 0 0 1 0 0 0 1   v2 contents
                      viota.m v4, v2 # Unmasked
    2 2 2 1 1 1 1 0   v4 result

    1 1 1 0 1 0 1 1   v0 contents
    1 0 0 1 0 0 0 1   v2 contents
    2 3 4 5 6 7 8 9   v4 contents
                      viota.m v4, v2, v0.t # Masked, vtype.vma=0
    1 1 1 5 1 7 1 0   v4 results
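The worked example can be reproduced with a short Python model. This is a sketch under stated assumptions: `viota_m` and `compress` are hypothetical helper names, lists are written in element order 0..7 (the reverse of the listing, which shows element 7 first), and masked-off elements are left undisturbed (vtype.vma=0).

```python
def viota_m(vs2_mask, vd_old=None, v0=None):
    """Prefix sum of mask bits over active elements; inactive elements keep vd_old[i]."""
    out, count = [], 0
    for i, bit in enumerate(vs2_mask):
        if v0 is None or v0[i]:
            out.append(count)   # number of set bits at lower active indices
            count += bit
        else:
            out.append(vd_old[i])
    return out

def compress(src, mask):
    """Compress via viota + scatter: element i goes to slot iota[i] when mask[i] is set."""
    iota = viota_m(mask)
    dst = [None] * sum(mask)
    for i, bit in enumerate(mask):
        if bit:
            dst[iota[i]] = src[i]
    return dst

v2 = [1, 0, 0, 0, 1, 0, 0, 1]        # element 0 first
v0 = [1, 1, 0, 1, 0, 1, 1, 1]
v4_old = [9, 8, 7, 6, 5, 4, 3, 2]

unmasked = viota_m(v2)
masked = viota_m(v2, vd_old=v4_old, v0=v0)
packed = compress([10, 20, 30, 40], [1, 0, 1, 1])
```

Reversing `unmasked` and `masked` gives the "v4 result" rows of the example, and `compress` shows how the iota values serve as scatter indices.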

Spike ISS Implementation:
// vmpopc rd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t sew = P.VU.vsew;
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
require_vm;
require_align(rd_num, P.VU.vflmul);
require_noover(rd_num, P.VU.vflmul, rs2_num, 1);

int cnt = 0;
for (reg_t i = 0; i < vl; ++i) {
  const int midx = i / 64;
  const int mpos = i % 64;

  bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1;
  bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;

  bool has_one = false;
  if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
    if (vs2_lsb) {
      has_one = true;
    }
  }

  bool use_ori = (insn.v_vm() == 0) && !do_mask;
  switch (sew) {
  case e8:
    P.VU.elt<uint8_t>(rd_num, i, true) = use_ori ?
      P.VU.elt<uint8_t>(rd_num, i) : cnt;
    break;
  case e16:
    P.VU.elt<uint16_t>(rd_num, i, true) = use_ori ?
      P.VU.elt<uint16_t>(rd_num, i) : cnt;
    break;
  case e32:
    P.VU.elt<uint32_t>(rd_num, i, true) = use_ori ?
      P.VU.elt<uint32_t>(rd_num, i) : cnt;
    break;
  default:
    P.VU.elt<uint64_t>(rd_num, i, true) = use_ori ?
      P.VU.elt<uint64_t>(rd_num, i) : cnt;
    break;
  }

  if (has_one) {
    cnt++;
  }
}

v / _vector_loadstore_whole_register_instructions

  1. Vector Loads and Stores / 8.9. Vector Load/Store Whole Register Instructions
Operation Arguments Description
vl1re8.v rs1, vd

# Format of whole register load and store instructions.
vl1r.v v3, (a0) # Pseudoinstruction equal to vl1re8.v

vl1re8.v v3, (a0) # Load v3 with VLEN/8 bytes held at address in a0
vl1re16.v v3, (a0) # Load v3 with VLEN/16 halfwords held at address in a0
vl1re32.v v3, (a0) # Load v3 with VLEN/32 words held at address in a0
vl1re64.v v3, (a0) # Load v3 with VLEN/64 doublewords held at address in a0

vl2r.v v2, (a0) # Pseudoinstruction equal to vl2re8.v v2, (a0)

vl2re8.v v2, (a0) # Load v2-v3 with 2*VLEN/8 bytes from address in a0
vl2re16.v v2, (a0) # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0
vl2re32.v v2, (a0) # Load v2-v3 with 2*VLEN/32 words held at address in a0
vl2re64.v v2, (a0) # Load v2-v3 with 2*VLEN/64 doublewords held at address in a0

vl4r.v v4, (a0) # Pseudoinstruction equal to vl4re8.v

vl4re8.v v4, (a0) # Load v4-v7 with 4*VLEN/8 bytes from address in a0
vl4re16.v v4, (a0)
vl4re32.v v4, (a0)
vl4re64.v v4, (a0)

vl8r.v v8, (a0) # Pseudoinstruction equal to vl8re8.v

vl8re8.v v8, (a0) # Load v8-v15 with 8*VLEN/8 bytes from address in a0
vl8re16.v v8, (a0)
vl8re32.v v8, (a0)
vl8re64.v v8, (a0)

vs1r.v v3, (a1) # Store v3 to address in a1
vs2r.v v2, (a1) # Store v2-v3 to address in a1
vs4r.v v4, (a1) # Store v4-v7 to address in a1
vs8r.v v8, (a1) # Store v8-v15 to address in a1

Spike ISS Implementation:
// vl1re8.v vd, (rs1)
VI_LD_WHOLE(uint8);

v / _vector_narrowing_fixed_point_clip_instructions

  1. Vector Fixed-Point Arithmetic Instructions / 13.5. Vector Narrowing Fixed-Point Clip Instructions
Operation Arguments Description
vnclip.wi vs2, simm5, vd
vnclip.wv vs2, vs1, vd
vnclip.wx vs2, rs1, vd

The vnclip instructions are used to pack a fixed-point value into a narrower destination. The instructions support rounding, scaling, and saturation into the final destination format. The source data is in the vector register group specified by vs2. The scaling shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. The low lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are used to control the right shift amount, which provides the scaling.

For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer.

vnclipu.wi vs2, simm5, vd
vnclipu.wv vs2, vs1, vd
vnclipu.wx vs2, rs1, vd

For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. Rounding occurs around the least-significant bit of the destination and before saturation.

For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer.

# Narrowing unsigned clip
#                               SEW                            2*SEW   SEW
vnclipu.wv vd, vs2, vs1, vm   # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i]))
vnclipu.wx vd, vs2, rs1, vm   # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1]))
vnclipu.wi vd, vs2, uimm, vm  # vd[i] = clip(roundoff_unsigned(vs2[i], uimm))

# Narrowing signed clip
vnclip.wv vd, vs2, vs1, vm    # vd[i] = clip(roundoff_signed(vs2[i], vs1[i]))
vnclip.wx vd, vs2, rs1, vm    # vd[i] = clip(roundoff_signed(vs2[i], x[rs1]))
vnclip.wi vd, vs2, uimm, vm   # vd[i] = clip(roundoff_signed(vs2[i], uimm))
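To make the round-then-saturate order concrete, here is a Python sketch of clip(roundoff(...)). The function names and the string encoding of the vxrm modes ("rnu", "rne", "rdn", "rod") are assumptions made for illustration; the rounding increments follow the vxrm definitions in the vector specification, and Python's unbounded integers stand in for the 2*SEW-wide source.

```python
def roundoff(v, d, vxrm="rnu"):
    """(v >> d) + r, where r is the rounding increment selected by vxrm."""
    if d == 0:
        return v
    lsb = (v >> d) & 1                                 # v[d]
    half = (v >> (d - 1)) & 1                          # v[d-1]
    below_half = int((v & ((1 << (d - 1)) - 1)) != 0)  # v[d-2:0] != 0
    below = int((v & ((1 << d) - 1)) != 0)             # v[d-1:0] != 0
    if vxrm == "rnu":      # round-to-nearest-up
        r = half
    elif vxrm == "rne":    # round-to-nearest-even
        r = half & (below_half | lsb)
    elif vxrm == "rdn":    # round-down (truncate)
        r = 0
    elif vxrm == "rod":    # round-to-odd
        r = (1 - lsb) & below
    else:
        raise ValueError(vxrm)
    return (v >> d) + r

def vnclipu_wx(vs2, shift, sew, vxrm="rnu"):
    """Unsigned narrowing clip of 2*SEW-bit sources to SEW bits."""
    d = shift & (2 * sew - 1)          # only the low lg2(2*SEW) bits are used
    hi = (1 << sew) - 1
    return [min(hi, roundoff(v, d, vxrm)) for v in vs2]

def vnclip_wx(vs2, shift, sew, vxrm="rnu"):
    """Signed narrowing clip of 2*SEW-bit sources to SEW bits."""
    d = shift & (2 * sew - 1)
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    return [max(lo, min(hi, roundoff(v, d, vxrm))) for v in vs2]

unsigned_result = vnclipu_wx([0x1234, 0x00FF, 0xFFFF], 4, sew=8)
signed_result = vnclip_wx([-1000, 100, 38], 2, sew=8)
```

With SEW=8 the 16-bit inputs are shifted, rounded, and only then saturated to the 8-bit range, so 0x00FF >> 4 rounds up to 16 while 0x1234 >> 4 saturates at 255.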

v / _vector_register_gather_instructions

  1. Vector Permutation Instructions / 17.4. Vector Register Gather Instructions
Operation Arguments Description
vrgather.vi vs2, simm5, vd
vrgather.vv vs2, vs1, vd
vrgather.vx vs2, rs1, vd

The vrgather.vv form uses SEW/LMUL for both the data and indices. The vrgatherei16.vv form uses SEW/LMUL for the data in vs2 but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in vs1.

For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, otherwise the instruction encoding is reserved.

vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];

vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
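The gather rule, including the out-of-range case that yields zero, can be sketched in Python as follows. The helper names are hypothetical, vector registers are plain lists, and VLMAX is assumed to equal the length of the source list.

```python
def vrgather_vv(vs2, vs1, vlmax=None):
    """vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]"""
    vlmax = len(vs2) if vlmax is None else vlmax
    return [0 if idx >= vlmax else vs2[idx] for idx in vs1]

def vrgather_vx(vs2, x, vl, vlmax=None):
    """vd[i] = (x >= VLMAX) ? 0 : vs2[x], i.e. a splat of one source element."""
    vlmax = len(vs2) if vlmax is None else vlmax
    return [0 if x >= vlmax else vs2[x]] * vl

data = [10, 20, 30, 40]
gathered = vrgather_vv(data, [3, 0, 7, 1])
splat = vrgather_vx(data, 2, vl=4)
```

Index 7 is beyond VLMAX=4, so that destination element becomes 0 rather than faulting.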

v / _vector_register_grouping_vlmul20

  1. Vector Extension Programmer’s Model / 4.4. Vector type register, vtype
Operation Arguments Description
min rd, rs1, rs2

MIN

Spike ISS Implementation:
require_either_extension(EXT_ZBPBO, EXT_ZBB);
WRITE_RD(sext_xlen(sreg_t(RS1) < sreg_t(RS2) ? RS1 : RS2));

v / _vector_single_width_averaging_add_and_subtract

  1. Vector Fixed-Point Arithmetic Instructions / 13.2. Vector Single-Width Averaging Add and Subtract
Operation Arguments Description
vaaddu.vv vs2, vs1, vd
vaaddu.vx vs2, rs1, vd

The averaging add and subtract instructions right shift the result by one bit and round off the result according to the setting in vxrm. Both unsigned and signed versions are provided. For vaaddu and vaadd there can be no overflow in the result. For vasub and vasubu, overflow is ignored and the result wraps around.

# Averaging add

# Averaging adds of unsigned integers.
vaaddu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] + vs1[i], 1)
vaaddu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] + x[rs1], 1)

# Averaging adds of signed integers.
vaadd.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] + vs1[i], 1)
vaadd.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] + x[rs1], 1)

# Averaging subtract

# Averaging subtract of unsigned integers.
vasubu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] - vs1[i], 1)
vasubu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] - x[rs1], 1)

# Averaging subtract of signed integers.
vasub.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] - vs1[i], 1)
vasub.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] - x[rs1], 1)
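The averaging add is the sum shifted right by one bit plus a rounding increment chosen by vxrm. The Python sketch below models only the add forms; `roundoff1` and `vaadd_model` are hypothetical names, the mode strings are an assumed encoding of vxrm, and the sum is formed at full precision so no overflow occurs before the shift.

```python
def roundoff1(v, vxrm="rnu"):
    """roundoff(v, 1): shift right by one bit and add the vxrm rounding increment."""
    bit0 = v & 1          # the bit shifted out
    bit1 = (v >> 1) & 1   # least-significant bit of the shifted value
    if vxrm == "rnu":     # round-to-nearest-up
        r = bit0
    elif vxrm == "rne":   # round-to-nearest-even
        r = bit0 & bit1
    elif vxrm == "rdn":   # round-down (truncate)
        r = 0
    elif vxrm == "rod":   # round-to-odd
        r = bit0 & (1 - bit1)
    else:
        raise ValueError(vxrm)
    return (v >> 1) + r

def vaadd_model(vs2, vs1, vxrm="rnu"):
    """Models vaaddu.vv / vaadd.vv on Python ints (unsigned or signed inputs)."""
    return [roundoff1(a + b, vxrm) for a, b in zip(vs2, vs1)]

avg_rnu = vaadd_model([255, 1, 4], [255, 2, 6], "rnu")
avg_rdn = vaadd_model([255, 1, 4], [255, 2, 6], "rdn")
avg_signed = vaadd_model([-128, -3], [-128, 2], "rnu")
```

Note that 255 + 255 averages to 255 without overflow, which is why the text says the add forms cannot overflow.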

v / _vector_single_width_floating_point_addsubtract_instructions

  1. Vector Floating-Point Instructions / 14.2. Vector Single-Width Floating-Point Add/Subtract Instructions
Operation Arguments Description
vfadd.vf vs2, rs1, vd
vfadd.vv vs2, vs1, vd

# Floating-point add
vfadd.vv vd, vs2, vs1, vm # Vector-vector
vfadd.vf vd, vs2, rs1, vm # vector-scalar

# Floating-point subtract
vfsub.vv vd, vs2, vs1, vm # Vector-vector
vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1]
vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i]

v / _vector_single_width_floating_point_fused_multiply_add_instructions

  1. Vector Floating-Point Instructions / 14.6. Vector Single-Width Floating-Point Fused Multiply-Add Instructions
Operation Arguments Description
vfmacc.vf vs2, rs1, vd
vfmacc.vv vs2, vs1, vd

# FP multiply-accumulate, overwrites addend
vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]

# FP negate-(multiply-accumulate), overwrites subtrahend
vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]

# FP multiply-subtract-accumulator, overwrites subtrahend
vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]

# FP negate-(multiply-subtract-accumulator), overwrites minuend
vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]

# FP multiply-add, overwrites multiplicand
vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i]
vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i]

# FP negate-(multiply-add), overwrites multiplicand
vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i]
vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i]

# FP multiply-sub, overwrites multiplicand
vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i]
vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i]

# FP negate-(multiply-sub), overwrites multiplicand
vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i]

v / _vector_single_width_floating_point_multiplydivide_instructions

  1. Vector Floating-Point Instructions / 14.4. Vector Single-Width Floating-Point Multiply/Divide Instructions
Operation Arguments Description
vfmul.vf vs2, rs1, vd
vfmul.vv vs2, vs1, vd

# Floating-point multiply
vfmul.vv vd, vs2, vs1, vm # Vector-vector
vfmul.vf vd, vs2, rs1, vm # vector-scalar

# Floating-point divide
vfdiv.vv vd, vs2, vs1, vm # Vector-vector
vfdiv.vf vd, vs2, rs1, vm # vector-scalar

# Reverse floating-point divide vector = scalar / vector
vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i]

v / _vector_single_width_fractional_multiply_with_rounding_and_saturation

  1. Vector Fixed-Point Arithmetic Instructions / 13.3. Vector Single-Width Fractional Multiply with Rounding and Saturation
Operation Arguments Description
vsmul.vv vs2, vs1, vd
vsmul.vx vs2, rs1, vd

# Signed saturating and rounding fractional multiply
# See vxrm description for rounding calculation
vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1))
vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1))
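For SEW=16 the vsmul formula is the familiar Q15 fractional multiply. The Python sketch below is a minimal model that assumes vxrm is set to round-to-nearest-up (rnu); the function name `vsmul` is a hypothetical helper, and the other rounding modes are not modelled.

```python
def vsmul(vs2, vs1, sew=16):
    """vd[i] = clip(roundoff_signed(vs2[i] * vs1[i], SEW-1)), with rnu rounding assumed."""
    d = sew - 1
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    out = []
    for a, b in zip(vs2, vs1):
        prod = a * b                                      # full-precision product
        rounded = (prod >> d) + ((prod >> (d - 1)) & 1)   # rnu: add bit d-1
        out.append(max(lo, min(hi, rounded)))             # saturate to SEW bits
    return out

# 0.5 * 0.5, (-1.0) * (-1.0), and a tiny value times 0.5, all in Q15
q15 = vsmul([16384, -32768, 3], [16384, -32768, 16384])
```

The only input pair that saturates is -1.0 times -1.0, whose exact result +1.0 is not representable in Q15 and clips to 32767.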

v / _vector_single_width_integer_add_and_subtract

  1. Vector Integer Arithmetic Instructions / 12.1. Vector Single-Width Integer Add and Subtract
Operation Arguments Description
vadd.vi vs2, simm5, vd
vadd.vv vs2, vs1, vd
vadd.vx vs2, rs1, vd

# Integer adds.
vadd.vv vd, vs2, vs1, vm # Vector-vector
vadd.vx vd, vs2, rs1, vm # vector-scalar
vadd.vi vd, vs2, imm, vm # vector-immediate

# Integer subtract
vsub.vv vd, vs2, vs1, vm # Vector-vector
vsub.vx vd, vs2, rs1, vm # vector-scalar

# Integer reverse subtract
vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i]
vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i]

v / _vector_single_width_integer_multiply_add_instructions

  1. Vector Integer Arithmetic Instructions / 12.13. Vector Single-Width Integer Multiply-Add Instructions
Operation Arguments Description
vmacc.vv vs2, vs1, vd
vmacc.vx vs2, rs1, vd

The integer multiply-add instructions are destructive and are provided in two forms, one that overwrites the addend or minuend (vmacc, vnmsac) and one that overwrites the first multiplicand (vmadd, vnmsub).

# Integer multiply-add, overwrite addend
vmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]

# Integer multiply-sub, overwrite minuend
vnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vnmsac.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vs2[i]) + vd[i]

# Integer multiply-add, overwrite multiplicand
vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i]
vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i]

# Integer multiply-sub, overwrite multiplicand
vnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vnmsub.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vd[i]) + vs2[i]
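The difference between the two destructive forms is which role vd plays before it is overwritten. The Python sketch below models all four instructions on SEW-bit values; the function names mirror the mnemonics but are hypothetical helpers, and results are reduced modulo 2^SEW on the assumption that only the low SEW bits are kept.

```python
def wrap(v, sew):
    """Keep the low SEW bits, as an unsigned value."""
    return v & ((1 << sew) - 1)

def vmacc(vd, vs1, vs2, sew=8):    # vd is the addend
    return [wrap(a * b + d, sew) for d, a, b in zip(vd, vs1, vs2)]

def vnmsac(vd, vs1, vs2, sew=8):   # vd is the minuend
    return [wrap(-(a * b) + d, sew) for d, a, b in zip(vd, vs1, vs2)]

def vmadd(vd, vs1, vs2, sew=8):    # vd is the multiplicand
    return [wrap(a * d + b, sew) for d, a, b in zip(vd, vs1, vs2)]

def vnmsub(vd, vs1, vs2, sew=8):   # vd is the multiplicand
    return [wrap(-(a * d) + b, sew) for d, a, b in zip(vd, vs1, vs2)]

vd, vs1, vs2 = [1, 2, 3], [4, 5, 6], [7, 8, 100]
macc = vmacc(vd, vs1, vs2)
nmsac = vnmsac(vd, vs1, vs2)
madd = vmadd(vd, vs1, vs2)
nmsub = vnmsub(vd, vs1, vs2)
```

With the same three inputs, vmacc and vmadd give different answers because vd is added in the first case and multiplied in the second.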

v / _vector_single_width_integer_multiply_instructions

  1. Vector Integer Arithmetic Instructions / 12.10. Vector Single-Width Integer Multiply Instructions
Operation Arguments Description
vmul.vv vs2, vs1, vd
vmul.vx vs2, rs1, vd

# Signed multiply, returning low bits of product
vmul.vv vd, vs2, vs1, vm # Vector-vector
vmul.vx vd, vs2, rs1, vm # vector-scalar

# Signed multiply, returning high bits of product
vmulh.vv vd, vs2, vs1, vm # Vector-vector
vmulh.vx vd, vs2, rs1, vm # vector-scalar

# Unsigned multiply, returning high bits of product
vmulhu.vv vd, vs2, vs1, vm # Vector-vector
vmulhu.vx vd, vs2, rs1, vm # vector-scalar

# Signed(vs2)-Unsigned multiply, returning high bits of product
vmulhsu.vv vd, vs2, vs1, vm # Vector-vector
vmulhsu.vx vd, vs2, rs1, vm # vector-scalar

v / _vector_single_width_saturating_add_and_subtract

  13. Vector Fixed-Point Arithmetic Instructions / 13.1. Vector Single-Width Saturating Add and Subtract
Operation Arguments Description
vsaddu.vi vs2, simm5, vd
vsaddu.vv vs2, vs1, vd
vsaddu.vx vs2, rs1, vd

# Saturating adds of unsigned integers.
vsaddu.vv vd, vs2, vs1, vm # Vector-vector
vsaddu.vx vd, vs2, rs1, vm # vector-scalar
vsaddu.vi vd, vs2, imm, vm # vector-immediate

# Saturating adds of signed integers.
vsadd.vv vd, vs2, vs1, vm # Vector-vector
vsadd.vx vd, vs2, rs1, vm # vector-scalar
vsadd.vi vd, vs2, imm, vm # vector-immediate

# Saturating subtract of unsigned integers.
vssubu.vv vd, vs2, vs1, vm # Vector-vector
vssubu.vx vd, vs2, rs1, vm # vector-scalar

# Saturating subtract of signed integers.
vssub.vv vd, vs2, vs1, vm # Vector-vector
vssub.vx vd, vs2, rs1, vm # vector-scalar
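
As a rough illustration of what "saturating" means per element, here is a scalar sketch assuming SEW=8. The function names are hypothetical, and the architectural vxsat flag is deliberately not modeled.

```cpp
#include <algorithm>
#include <cstdint>

// Scalar model of one SEW=8 element; results clamp to the representable range
// instead of wrapping. Names are illustrative only; vxsat is not modeled.

inline uint8_t saddu8(uint8_t a, uint8_t b) {   // vsaddu
    int sum = a + b;                             // computed in int, cannot overflow
    return static_cast<uint8_t>(std::min(sum, 255));
}

inline int8_t sadd8(int8_t a, int8_t b) {       // vsadd
    int sum = a + b;
    return static_cast<int8_t>(std::max(-128, std::min(sum, 127)));
}

inline uint8_t ssubu8(uint8_t a, uint8_t b) {   // vssubu
    return static_cast<uint8_t>(a > b ? a - b : 0);
}

inline int8_t ssub8(int8_t a, int8_t b) {       // vssub
    int diff = a - b;
    return static_cast<int8_t>(std::max(-128, std::min(diff, 127)));
}
```

For example, saddu8(200, 100) gives 255 rather than the wrapped value 44, and ssubu8(5, 10) gives 0.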

v / _vector_single_width_scaling_shift_instructions

  13. Vector Fixed-Point Arithmetic Instructions / 13.4. Vector Single-Width Scaling Shift Instructions
Operation Arguments Description
vssrl.vi vs2, simm5, vd
vssrl.vv vs2, vs1, vd
vssrl.vx vs2, rs1, vd

These instructions shift the input value right, and round off the shifted out bits according to vxrm. The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms. The data to be shifted is in the vector register group specified by vs2 and the shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. Only the low lg2(SEW) bits of the shift-amount value are used to control the shift amount.

# Scaling shift right logical
vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i])
vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1])
vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm)

# Scaling shift right arithmetic
vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i], vs1[i])
vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1])
vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm)
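
The roundoff step can be sketched for a single SEW=32 element as follows. The four vxrm encodings (rnu, rne, rdn, rod) and their increment rules are not spelled out in the text above; they are taken here from the vector specification's fixed-point rounding-mode table, so treat them as an assumption to check against the spec. The names Vxrm, round_inc, vssrl32 and vssra32 are hypothetical.

```cpp
#include <cstdint>

// Assumed vxrm encodings: round-to-nearest-up, round-to-nearest-even,
// round-down (truncate), round-to-odd.
enum class Vxrm { rnu = 0, rne = 1, rdn = 2, rod = 3 };

// Rounding increment r for shifting v right by d bits (0 <= d <= 31).
inline uint32_t round_inc(uint32_t v, unsigned d, Vxrm mode) {
    if (d == 0) return 0;                                    // nothing is shifted out
    const uint32_t half = (v >> (d - 1)) & 1u;               // v[d-1]
    const uint32_t lsb = (v >> d) & 1u;                      // v[d]
    const bool below = (v & ((1u << (d - 1)) - 1u)) != 0;    // v[d-2:0] != 0
    const bool lost = (v & ((1u << d) - 1u)) != 0;           // v[d-1:0] != 0
    switch (mode) {
        case Vxrm::rnu: return half;
        case Vxrm::rne: return (half && (below || lsb)) ? 1u : 0u;
        case Vxrm::rdn: return 0u;
        case Vxrm::rod: return (!lsb && lost) ? 1u : 0u;
    }
    return 0u;
}

// vssrl: only the low lg2(SEW) = 5 bits of the shift amount are used.
inline uint32_t vssrl32(uint32_t v, uint32_t shamt, Vxrm mode) {
    const unsigned d = shamt & 31u;
    return (v >> d) + round_inc(v, d, mode);
}

// vssra: same rounding, but the shift is arithmetic (sign-extending).
inline int32_t vssra32(int32_t v, uint32_t shamt, Vxrm mode) {
    const unsigned d = shamt & 31u;
    return (v >> d) + static_cast<int32_t>(round_inc(static_cast<uint32_t>(v), d, mode));
}
```

Shifting 10 right by 2 (the exact value is 2.5) gives 3 under rnu, 2 under rne and rdn, and 3 under rod.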

v / _vector_single_width_shift_instructions

  12. Vector Integer Arithmetic Instructions / 12.6. Vector Single-Width Shift Instructions
Operation Arguments Description
vsll.vi vs2, simm5, vd
vsll.vv vs2, vs1, vd
vsll.vx vs2, rs1, vd

# Bit shift operations
vsll.vv vd, vs2, vs1, vm # Vector-vector
vsll.vx vd, vs2, rs1, vm # vector-scalar
vsll.vi vd, vs2, uimm, vm # vector-immediate

vsrl.vv vd, vs2, vs1, vm # Vector-vector
vsrl.vx vd, vs2, rs1, vm # vector-scalar
vsrl.vi vd, vs2, uimm, vm # vector-immediate

vsra.vv vd, vs2, vs1, vm # Vector-vector
vsra.vx vd, vs2, rs1, vm # vector-scalar
vsra.vi vd, vs2, uimm, vm # vector-immediate

v / _vector_slide1down_instruction

  17. Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation Arguments Description
vfslide1down.vf vs2, rs1, vd

The vfslide1down instruction is defined analogously, but sources its scalar argument from an f register.

vslide1down.vx vs2, rs1, vd

The vslide1down instruction copies the first vl-1 active element values from index i+1 in the source vector register group to index i in the destination vector register group.

The vslide1down instruction places the x register argument at location vl-1 in the destination vector register, provided that element vl-1 is active, otherwise the destination element is unchanged. If XLEN < SEW, the value is sign-extended to SEW bits. If XLEN > SEW, the least-significant bits are copied over and the high XLEN-SEW bits are ignored.

vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1]
vfslide1down.vf vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=f[rs1]

vslide1down behavior

i < vstart              unchanged
vstart <= i < vl-1      vd[i] = vs2[i+1] if v0.mask[i] enabled
vstart <= i = vl-1      vd[vl-1] = x[rs1] if v0.mask[i] enabled
vl <= i < VLMAX         Follow tail policy
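
The behavior table can be read as a small loop. The sketch below is an assumption-laden scalar model, not Spike's implementation: it uses 32-bit elements, represents the mask as a std::vector<bool>, and leaves inactive and tail elements undisturbed. The function name slide1down is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// vd and vs2 each hold VLMAX elements; mask[i] == true means element i is active.
// Inactive and tail elements are left unchanged (undisturbed policy assumed).
inline void slide1down(std::vector<int32_t>& vd, const std::vector<int32_t>& vs2,
                       int32_t scalar, const std::vector<bool>& mask,
                       std::size_t vstart, std::size_t vl) {
    if (vstart >= vl) return;                      // no operation
    for (std::size_t i = vstart; i < vl; ++i) {
        if (!mask[i]) continue;                    // inactive element
        // The last body element takes the scalar, the others take vs2[i+1].
        vd[i] = (i == vl - 1) ? scalar : vs2[i + 1];
    }
}
```

With vs2 = {10, 20, 30, 40, ...}, vl = 4 and scalar 99, the first four destination elements become 20, 30, 40, 99.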

v / _vector_slide1up

  17. Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation Arguments Description
vfslide1up.vf vs2, rs1, vd

The vfslide1up instruction is defined analogously, but sources its scalar argument from an f register.

vslide1up.vx vs2, rs1, vd

The vslide1up instruction places the x register argument at location 0 of the destination vector register group, provided that element 0 is active, otherwise the destination element update follows the current mask agnostic/undisturbed policy. If XLEN < SEW, the value is sign-extended to SEW bits. If XLEN > SEW, the least-significant bits are copied over and the high XLEN-SEW bits are ignored.

The vslide1up instruction requires that the destination vector register group does not overlap the source vector register group. Otherwise, the instruction encoding is reserved.

vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i]
vfslide1up.vf vd, vs2, rs1, vm # vd[0]=f[rs1], vd[i+1] = vs2[i]

vslide1up behavior

i < vstart                  unchanged
0 = i = vstart              vd[i] = x[rs1] if v0.mask[i] enabled
max(vstart, 1) <= i < vl    vd[i] = vs2[i-1] if v0.mask[i] enabled
vl <= i < VLMAX             Follow tail policy
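
A scalar sketch of the same table, under the assumptions of 32-bit elements, a std::vector<bool> mask, and undisturbed handling of inactive and tail elements. The name slide1up is hypothetical. The destination is a separate vector because the instruction requires that vd does not overlap vs2.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// vd and vs2 each hold VLMAX elements and must be distinct objects,
// mirroring the no-overlap requirement. mask[i] == true means active.
inline void slide1up(std::vector<int32_t>& vd, const std::vector<int32_t>& vs2,
                     int32_t scalar, const std::vector<bool>& mask,
                     std::size_t vstart, std::size_t vl) {
    for (std::size_t i = vstart; i < vl; ++i) {   // does nothing if vstart >= vl
        if (!mask[i]) continue;                   // inactive element left as-is
        // Element 0 takes the scalar, every other element takes vs2[i-1].
        vd[i] = (i == 0) ? scalar : vs2[i - 1];
    }
}
```

With vs2 = {10, 20, 30, 40}, vl = 3 and scalar 99, the destination becomes {99, 10, 20} followed by the untouched tail element.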

v / _vector_slide_instructions

  17. Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation Arguments Description
vslideup.vi vs2, simm5, vd
vslideup.vx vs2, rs1, vd

For all of the vslideup, vslidedown, v[f]slide1up, and v[f]slide1down instructions, if vstart >= vl, the instruction performs no operation and leaves the destination vector register unchanged.

v / _vector_slidedown_instructions

  17. Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation Arguments Description
vslidedown.vi vs2, simm5, vd
vslidedown.vx vs2, rs1, vd

For vslidedown, the value in vl specifies the maximum number of destination elements that are written. The remaining elements past vl are handled according to the current tail policy (see the section "Vector Tail Agnostic and Vector Mask Agnostic vta and vma").

vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+rs1]
vslidedown.vi vd, vs2, uimm, vm # vd[i] = vs2[i+uimm]

vslidedown behavior for source elements for element i in slide

0 <= i+OFFSET < VLMAX   src[i] = vs2[i+OFFSET]
VLMAX <= i+OFFSET       src[i] = 0

vslidedown behavior for destination element i in slide

0 < i < vstart          Unchanged
vstart <= i < vl        vd[i] = src[i] if v0.mask[i] enabled
vl <= i < VLMAX         Follow tail policy
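
The two tables combine into one loop: compute the source element (zero when the index runs past VLMAX), then write it if the destination element is active. The following is a sketch under assumptions: 32-bit elements, VLMAX taken as vs2.size(), a std::vector<bool> mask, and undisturbed inactive and tail elements. The name slidedown is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// vd and vs2 each hold VLMAX elements, with vl <= VLMAX.
inline void slidedown(std::vector<int32_t>& vd, const std::vector<int32_t>& vs2,
                      std::size_t offset, const std::vector<bool>& mask,
                      std::size_t vstart, std::size_t vl) {
    const std::size_t vlmax = vs2.size();
    for (std::size_t i = vstart; i < vl; ++i) {   // does nothing if vstart >= vl
        if (!mask[i]) continue;                   // inactive element left as-is
        // "offset < vlmax - i" is "i + offset < VLMAX" written without overflow.
        vd[i] = (offset < vlmax - i) ? vs2[i + offset] : 0;
    }
}
```

Sliding an 8-element vector {10, ..., 80} down by 3 with vl = 8 yields {40, 50, 60, 70, 80, 0, 0, 0}.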

v / _vector_strided_instructions

  8. Vector Loads and Stores / 8.5. Vector Strided Instructions
Operation Arguments Description
vlse8.v rs2, rs1, vd

# Vector strided loads and stores

# vd destination, rs1 base address, rs2 byte stride
vlse8.v vd, (rs1), rs2, vm # 8-bit strided load
vlse16.v vd, (rs1), rs2, vm # 16-bit strided load
vlse32.v vd, (rs1), rs2, vm # 32-bit strided load
vlse64.v vd, (rs1), rs2, vm # 64-bit strided load

# vs3 store data, rs1 base address, rs2 byte stride
vsse8.v vs3, (rs1), rs2, vm # 8-bit strided store
vsse16.v vs3, (rs1), rs2, vm # 16-bit strided store
vsse32.v vs3, (rs1), rs2, vm # 32-bit strided store
vsse64.v vs3, (rs1), rs2, vm # 64-bit strided store

Spike ISS Implementation:
// vlse8.v and vlsseg[2-8]e8.v
VI_LD(i * RS2, fn, int8, false);
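
The `i * RS2` term in the Spike macro call is the per-element byte offset from the base address. A minimal host-side sketch of that addressing follows, assuming an unmasked load with no segments; strided_load is a hypothetical helper, and the stride is modeled as a signed byte count on the assumption that negative and zero strides are permitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Gathers vl elements of type T, element i being read from base + i * stride.
// stride is in bytes, as for rs2. Masking is not modeled.
template <typename T>
std::vector<T> strided_load(const uint8_t* base, std::ptrdiff_t stride, std::size_t vl) {
    std::vector<T> vd(vl);
    for (std::size_t i = 0; i < vl; ++i) {
        const uint8_t* addr = base + static_cast<std::ptrdiff_t>(i) * stride;
        std::memcpy(&vd[i], addr, sizeof(T));      // memcpy avoids alignment assumptions
    }
    return vd;
}
```

Reading bytes with a stride of 4 from a buffer holding 0, 1, 2, ... returns 0, 4, 8, 12.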

v / _vector_unit_stride_instructions

  8. Vector Loads and Stores / 8.4. Vector Unit-Stride Instructions
Operation Arguments Description
vle8.v rs1, vd

# Vector unit-stride loads and stores

# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
vle8.v vd, (rs1), vm # 8-bit unit-stride load
vle16.v vd, (rs1), vm # 16-bit unit-stride load
vle32.v vd, (rs1), vm # 32-bit unit-stride load
vle64.v vd, (rs1), vm # 64-bit unit-stride load

# vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>)
vse8.v vs3, (rs1), vm # 8-bit unit-stride store
vse16.v vs3, (rs1), vm # 16-bit unit-stride store
vse32.v vs3, (rs1), vm # 32-bit unit-stride store
vse64.v vs3, (rs1), vm # 64-bit unit-stride store

Spike ISS Implementation:
// vle8.v and vlseg[2-8]e8.v
VI_LD(0, (i * nf + fn), int8, false);
vlm.v rs1, vd

vlm.v and vsm.v are encoded with the same width[2:0]=0 encoding as vle8.v and vse8.v, but are distinguished by different lumop and sumop encodings. Since vlm.v and vsm.v operate as byte loads and stores, vstart is in units of bytes for these instructions.

# Vector unit-stride mask load
vlm.v vd, (rs1) # Load byte vector of length ceil(vl/8)

# Vector unit-stride mask store
vsm.v vs3, (rs1) # Store byte vector of length ceil(vl/8)

Spike ISS Implementation:
// vle1.v and vlseg[2-8]e8.v
VI_LD(0, (i * nf + fn), int8, true);
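
The ceil(vl/8) length follows from packing one mask bit per element. A small sketch of that arithmetic, with hypothetical helper names and the assumption that mask element i lives in bit i % 8 of byte i / 8:

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes transferred by a mask load/store for a given vl: ceil(vl / 8).
inline std::size_t mask_bytes(std::size_t vl) { return (vl + 7) / 8; }

// Reads mask element i from a packed byte vector (assumed layout: bit i % 8 of byte i / 8).
inline bool mask_bit(const uint8_t* bytes, std::size_t i) {
    return ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
}
```

So vl = 9 moves two bytes, and only the lowest bit of the second byte carries an element.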

v / _vector_unordered_single_width_floating_point_sum_reduction

  15. Vector Reduction Operations / 15.3. Vector Single-Width Floating-Point Reduction Instructions
Operation Arguments Description
vfredusum.vs vs2, vs1, vd

The unordered sum reduction instruction, vfredusum, provides an implementation more freedom in performing the reduction.
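
The practical consequence of that freedom is that the result may depend on the order in which an implementation combines elements, because floating-point addition is not associative. The sketch below only illustrates this on a host; it does not model vfredusum itself (there is no vs1[0] scalar and no masking), and both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right accumulation, the way an ordered reduction would proceed.
inline float sum_in_order(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc += x;
    return acc;
}

// Pairwise (tree) accumulation over [lo, hi), one order an unordered
// reduction might plausibly use. Requires hi > lo.
inline float sum_pairwise(const std::vector<float>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo == 1) return v[lo];
    const std::size_t mid = lo + (hi - lo) / 2;
    return sum_pairwise(v, lo, mid) + sum_pairwise(v, mid, hi);
}
```

For the input {1e20f, 1.0f, -1e20f, 1.0f}, the in-order sum is 1.0f while the pairwise sum is 0.0f, since each 1.0f is absorbed when added to a value of magnitude 1e20. Code that needs bit-reproducible sums should not rely on the unordered form.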

v / _vector_widening_floating_point_addsubtract_instructions

  14. Vector Floating-Point Instructions / 14.3. Vector Widening Floating-Point Add/Subtract Instructions
Operation Arguments Description
vfwadd.vf vs2, rs1, vd
vfwadd.vv vs2, vs1, vd
vfwadd.wf vs2, rs1, vd
vfwadd.wv vs2, vs1, vd

# Widening FP add/subtract, 2*SEW = SEW +/- SEW
vfwadd.vv vd, vs2, vs1, vm # vector-vector
vfwadd.vf vd, vs2, rs1, vm # vector-scalar
vfwsub.vv vd, vs2, vs1, vm # vector-vector
vfwsub.vf vd, vs2, rs1, vm # vector-scalar

# Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW
vfwadd.wv vd, vs2, vs1, vm # vector-vector
vfwadd.wf vd, vs2, rs1, vm # vector-scalar
vfwsub.wv vd, vs2, vs1, vm # vector-vector
vfwsub.wf vd, vs2, rs1, vm # vector-scalar

v / _vector_widening_floating_point_fused_multiply_add_instructions

  14. Vector Floating-Point Instructions / 14.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
Operation Arguments Description
vfwmacc.vf vs2, rs1, vd
vfwmacc.vv vs2, vs1, vd

# FP widening multiply-accumulate, overwrites addend
vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]

# FP widening negate-(multiply-accumulate), overwrites addend
vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]

# FP widening multiply-subtract-accumulator, overwrites addend
vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]

# FP widening negate-(multiply-subtract-accumulator), overwrites addend
vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]

v / _vector_widening_floating_point_multiply

  14. Vector Floating-Point Instructions / 14.5. Vector Widening Floating-Point Multiply
Operation Arguments Description
vfwmul.vf vs2, rs1, vd
vfwmul.vv vs2, vs1, vd

# Widening floating-point multiply
vfwmul.vv vd, vs2, vs1, vm # vector-vector
vfwmul.vf vd, vs2, rs1, vm # vector-scalar

v / _vector_widening_integer_addsubtract

  12. Vector Integer Arithmetic Instructions / 12.2. Vector Widening Integer Add/Subtract
Operation Arguments Description
vwaddu.vv vs2, vs1, vd
vwaddu.vx vs2, rs1, vd
vwaddu.wv vs2, vs1, vd
vwaddu.wx vs2, rs1, vd

# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW
vwaddu.vv vd, vs2, vs1, vm # vector-vector
vwaddu.vx vd, vs2, rs1, vm # vector-scalar
vwsubu.vv vd, vs2, vs1, vm # vector-vector
vwsubu.vx vd, vs2, rs1, vm # vector-scalar

# Widening signed integer add/subtract, 2*SEW = SEW +/- SEW
vwadd.vv vd, vs2, vs1, vm # vector-vector
vwadd.vx vd, vs2, rs1, vm # vector-scalar
vwsub.vv vd, vs2, vs1, vm # vector-vector
vwsub.vx vd, vs2, rs1, vm # vector-scalar

# Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW
vwaddu.wv vd, vs2, vs1, vm # vector-vector
vwaddu.wx vd, vs2, rs1, vm # vector-scalar
vwsubu.wv vd, vs2, vs1, vm # vector-vector
vwsubu.wx vd, vs2, rs1, vm # vector-scalar

# Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW
vwadd.wv vd, vs2, vs1, vm # vector-vector
vwadd.wx vd, vs2, rs1, vm # vector-scalar
vwsub.wv vd, vs2, vs1, vm # vector-vector
vwsub.wx vd, vs2, rs1, vm # vector-scalar
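
The "2*SEW = SEW +/- SEW" and "2*SEW = 2*SEW +/- SEW" shorthand can be made concrete with a scalar sketch, assuming SEW=8 so that results are 16 bits wide. The function names are hypothetical.

```cpp
#include <cstdint>

// Scalar model of one element with SEW=8 and a 16-bit (2*SEW) result.

// vwaddu.vv / .vx: both sources are SEW wide and zero-extended.
inline uint16_t waddu(uint8_t a, uint8_t b) { return static_cast<uint16_t>(a + b); }

// vwadd.vv / .vx: both sources are SEW wide and sign-extended.
inline int16_t wadd(int8_t a, int8_t b) { return static_cast<int16_t>(a + b); }

// vwsubu.vv / .vx: the difference wraps modulo 2^16.
inline uint16_t wsubu(uint8_t a, uint8_t b) { return static_cast<uint16_t>(a - b); }

// vwaddu.wv / .wx: the first source is already 2*SEW wide, so the sum can wrap.
inline uint16_t waddu_w(uint16_t a, uint8_t b) { return static_cast<uint16_t>(a + b); }
```

The .vv and .vx forms can never overflow the destination (255 + 255 = 510 fits in 16 bits), whereas the .wv and .wx forms wrap like an ordinary 2*SEW-wide add.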

v / _vector_widening_integer_multiply_add_instructions

  12. Vector Integer Arithmetic Instructions / 12.14. Vector Widening Integer Multiply-Add Instructions
Operation Arguments Description
vwmaccu.vv vs2, vs1, vd
vwmaccu.vx vs2, rs1, vd

# Widening unsigned-integer multiply-add, overwrite addend
vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]

# Widening signed-integer multiply-add, overwrite addend
vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]

# Widening signed-unsigned-integer multiply-add, overwrite addend
vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i]
vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i]

# Widening unsigned-signed-integer multiply-add, overwrite addend
vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i]
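
The four variants differ only in how each SEW-wide source is extended before the multiply. A scalar sketch, assuming SEW=8 with a 16-bit accumulator; the function names are hypothetical, and the final conversion to int16_t assumes two's-complement wraparound (guaranteed from C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cstdint>

// Scalar model of one element: SEW=8 sources, 16-bit (2*SEW) accumulator vd.
// Each result wraps modulo 2^16, like the hardware accumulator.

inline uint16_t wmaccu(uint16_t vd, uint8_t vs1, uint8_t vs2) {      // unsigned * unsigned
    return static_cast<uint16_t>(vs1 * vs2 + vd);
}

inline int16_t wmacc(int16_t vd, int8_t vs1, int8_t vs2) {           // signed * signed
    return static_cast<int16_t>(static_cast<uint16_t>(vs1 * vs2 + vd));
}

inline int16_t wmaccsu(int16_t vd, int8_t vs1, uint8_t vs2) {        // signed(vs1) * unsigned(vs2)
    return static_cast<int16_t>(static_cast<uint16_t>(vs1 * vs2 + vd));
}

inline int16_t wmaccus(int16_t vd, uint8_t rs1, int8_t vs2) {        // unsigned(rs1) * signed(vs2)
    return static_cast<int16_t>(static_cast<uint16_t>(rs1 * vs2 + vd));
}
```

The bit pattern 0xFF in both sources gives 65025 with wmaccu (255 * 255) but 1 with wmacc (-1 * -1), starting from a zero accumulator.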

v / _vector_widening_integer_multiply_instructions

  12. Vector Integer Arithmetic Instructions / 12.12. Vector Widening Integer Multiply Instructions
Operation Arguments Description
vwmul.vv vs2, vs1, vd
vwmul.vx vs2, rs1, vd

# Widening signed-integer multiply
vwmul.vv vd, vs2, vs1, vm # vector-vector
vwmul.vx vd, vs2, rs1, vm # vector-scalar

# Widening unsigned-integer multiply
vwmulu.vv vd, vs2, vs1, vm # vector-vector
vwmulu.vx vd, vs2, rs1, vm # vector-scalar

# Widening signed(vs2)-unsigned integer multiply
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar

v / _vfirst_find_first_set_mask_bit

  16. Vector Mask Instructions / 16.3. vfirst find-first-set mask bit
Operation Arguments Description
vfirst.m vs2, rd

The vfirst instruction finds the lowest-numbered active element of the source mask vector that has the value 1 and writes that element's index to a GPR. If no active element has the value 1, -1 is written to the GPR.

The vfirst.m instruction writes x[rd] even if vl=0 (with the value -1, since no mask elements are active).

Traps on vfirst are always reported with a vstart of 0. The vfirst instruction will raise an illegal instruction exception if vstart is non-zero.

vfirst.m rd, vs2, vm

Spike ISS Implementation:
// vmfirst rd, vs2
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
reg_t pos = -1;
for (reg_t i=P.VU.vstart->read(); i < vl; ++i) {
VI_LOOP_ELEMENT_SKIP()

bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
if (vs2_lsb) {
pos = i;
break;
}
}
P.VU.vstart->write(0);
WRITE_RD(pos);
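
Stripped of the simulator plumbing, the search reduces to a few lines. The sketch below is a simplified host-side model, not Spike code: it assumes vstart is 0, represents the source and the v0 mask as std::vector<bool>, and uses a hypothetical function name.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the lowest-numbered active element of vs2 that is set,
// or -1 if there is none (including the vl == 0 case).
// mask[i] == true means element i is active; pass all-true for an unmasked call.
inline long vfirst(const std::vector<bool>& vs2, const std::vector<bool>& mask,
                   std::size_t vl) {
    for (std::size_t i = 0; i < vl; ++i) {
        if (mask[i] && vs2[i]) return static_cast<long>(i);
    }
    return -1;
}
```

A set bit in an inactive element is skipped, so masking can change the answer or turn it into -1.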

v / _vmsif_m_set_including_first_mask_bit

  16. Vector Mask Instructions / 16.5. vmsif.m set-including-first mask bit
Operation Arguments Description
vmsif.m vs2, vd

Traps on vmsif.m are always reported with a vstart of 0. The vmsif instruction will raise an illegal instruction exception if vstart is non-zero.

vmsif.m vd, vs2, vm

# Example

7 6 5 4 3 2 1 0   Element number

1 0 0 1 0 1 0 0   v3 contents
                  vmsif.m v2, v3
0 0 0 0 0 1 1 1   v2 contents

1 0 0 1 0 1 0 1   v3 contents
                  vmsif.m v2, v3
0 0 0 0 0 0 0 1   v2

1 1 0 0 0 0 1 1   v0 contents
1 0 0 1 0 1 0 0   v3 contents
                  vmsif.m v2, v3, v0.t
1 1 x x x x 1 1   v2 contents

Spike ISS Implementation:
// vmsif.m rd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
require(P.VU.vstart->read() == 0);
require_vm;
require(insn.rd() != insn.rs2());

reg_t vl = P.VU.vl->read();
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();

bool has_one = false;
for (reg_t i = P.VU.vstart->read(); i < vl; ++i) {
const int midx = i / 64;
const int mpos = i % 64;
const uint64_t mmask = UINT64_C(1) << mpos;

bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;

if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
auto &vd = P.VU.elt<uint64_t>(rd_num, midx, true);
uint64_t res = 0;
if (!has_one && !vs2_lsb) {
res = 1;
} else if (!has_one && vs2_lsb) {
has_one = true;
res = 1;
}
vd = (vd & ~mmask) | ((res << mpos) & mmask);
}
}
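
The same loop can be condensed into a host-side sketch that works on a single 64-bit mask word. This is an illustration under assumptions, not Spike code: vl is at most 64, vstart is 0, and inactive elements keep their old value (the "x" positions in the example). The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bit i of each word is mask element i. v0 is the active-element mask
// (pass all ones for an unmasked instruction).
inline uint64_t vmsif(uint64_t old_vd, uint64_t vs2, uint64_t v0, std::size_t vl) {
    uint64_t vd = old_vd;
    bool has_one = false;                         // seen a set active source bit yet?
    for (std::size_t i = 0; i < vl && i < 64; ++i) {
        const uint64_t bit = uint64_t{1} << i;
        if (!(v0 & bit)) continue;                // inactive: leave vd bit alone
        const bool res = !has_one;                // set up to and including the first 1
        if (vs2 & bit) has_one = true;
        vd = res ? (vd | bit) : (vd & ~bit);
    }
    return vd;
}
```

Feeding in the three example inputs (source 0x94 unmasked, source 0x95 unmasked, and source 0x94 under v0 = 0xC3) gives 0x07, 0x01 and 0xC3 when the old destination is zero.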

v / _vmsof_m_set_only_first_mask_bit

  16. Vector Mask Instructions / 16.6. vmsof.m set-only-first mask bit
Operation Arguments Description
vmsof.m vs2, vd

Traps on vmsof.m are always reported with a vstart of 0. The vmsof instruction will raise an illegal instruction exception if vstart is non-zero.

vmsof.m vd, vs2, vm

# Example

7 6 5 4 3 2 1 0   Element number

1 0 0 1 0 1 0 0   v3 contents
                  vmsof.m v2, v3
0 0 0 0 0 1 0 0   v2 contents

1 0 0 1 0 1 0 1   v3 contents
                  vmsof.m v2, v3
0 0 0 0 0 0 0 1   v2

1 1 0 0 0 0 1 1   v0 contents
1 1 0 1 0 1 0 0   v3 contents
                  vmsof.m v2, v3, v0.t
0 1 x x x x 0 0   v2 contents

Spike ISS Implementation:
// vmsof.m rd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
require(P.VU.vstart->read() == 0);
require_vm;
require(insn.rd() != insn.rs2());

reg_t vl = P.VU.vl->read();
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();

bool has_one = false;
for (reg_t i = P.VU.vstart->read() ; i < vl; ++i) {
const int midx = i / 64;
const int mpos = i % 64;
const uint64_t mmask = UINT64_C(1) << mpos;

bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;

if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
uint64_t &vd = P.VU.elt<uint64_t>(rd_num, midx, true);
uint64_t res = 0;
if (!has_one && vs2_lsb) {
has_one = true;
res = 1;
}
vd = (vd & ~mmask) | ((res << mpos) & mmask);
}
}
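
A condensed host-side version of the loop, working on a single 64-bit mask word, may make the example easier to check. It is only a sketch: vl is assumed to be at most 64, vstart is assumed to be 0, inactive elements keep their old value, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bit i of each word is mask element i. v0 is the active-element mask
// (pass all ones for an unmasked instruction).
inline uint64_t vmsof(uint64_t old_vd, uint64_t vs2, uint64_t v0, std::size_t vl) {
    uint64_t vd = old_vd;
    bool has_one = false;                         // seen a set active source bit yet?
    for (std::size_t i = 0; i < vl && i < 64; ++i) {
        const uint64_t bit = uint64_t{1} << i;
        if (!(v0 & bit)) continue;                // inactive: leave vd bit alone
        const bool res = !has_one && (vs2 & bit) != 0;   // only the first 1 is set
        if (res) has_one = true;
        vd = res ? (vd | bit) : (vd & ~bit);
    }
    return vd;
}
```

The three example inputs (source 0x94 unmasked, source 0x95 unmasked, and source 0xD4 under v0 = 0xC3) give 0x04, 0x01 and 0x40 when the old destination is zero.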

v / _widening_floating_pointinteger_type_convert_instructions

  14. Vector Floating-Point Instructions / 14.18. Widening Floating-Point/Integer Type-Convert Instructions
Operation Arguments Description
vfwcvt.f.f.v vs2, vd
vfwcvt.f.x.v vs2, vd
vfwcvt.f.xu.v vs2, vd
vfwcvt.rtz.x.f.v vs2, vd
vfwcvt.rtz.xu.f.v vs2, vd
vfwcvt.x.f.v vs2, vd
vfwcvt.xu.f.v vs2, vd

vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer.
vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer.
vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating.
vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating.
vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float.
vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float.
vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
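
The difference between the plain and the "truncating" (rtz) conversions can be sketched on a host, assuming SEW=32 sources and 64-bit results. This is an approximation: std::llrint follows the host's current rounding mode (round-to-nearest-even by default) as a stand-in for the dynamic rounding mode, out-of-range inputs and NaNs are not handled, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// float -> double-width signed integer, using the current rounding mode.
inline int64_t wcvt_x_f(float f) { return std::llrint(f); }

// float -> double-width signed integer, always rounding toward zero.
inline int64_t wcvt_rtz_x_f(float f) { return static_cast<int64_t>(f); }

// signed integer -> double-width float (exact for every 32-bit input).
inline double wcvt_f_x(int32_t x) { return static_cast<double>(x); }

// single-width float -> double-width float (always exact).
inline double wcvt_f_f(float f) { return static_cast<double>(f); }
```

For 2.5f the default-mode conversion gives 2 (ties to even) and 3.5f gives 4, while the truncating form maps 2.9f to 2 and -2.9f to -2.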

v / _zve_vector_extensions_for_embedded_processors

19. Standard Vector Extensions / 19.2. Zve*: Vector Extensions for Embedded Processors
Operation Arguments Description
vmulh.vv vs2, vs1, vd

All Zve* extensions support all vector integer instructions (Section Vector Integer Arithmetic Instructions ), except that the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*.

vmulh.vx vs2, rs1, vd

All Zve* extensions support all vector integer instructions (Section Vector Integer Arithmetic Instructions ), except that the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*.
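
For readers unsure what "the high word of the product" means, the following C sketch models the three multiply-high flavours for SEW=32 on a host machine. The function names are hypothetical. An equivalent model for EEW=64 would need a 128-bit intermediate product.

```c
#include <stdint.h>

/* vmulh-style: signed x signed, return bits 63..32 of the product.
 * The final cast to int32_t relies on two's-complement wrapping, which
 * mainstream compilers provide. */
int32_t model_mulh32(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)((uint64_t)p >> 32);
}

/* vmulhu-style: unsigned x unsigned. */
uint32_t model_mulhu32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* vmulhsu-style: signed x unsigned. */
int32_t model_mulhsu32(int32_t a, uint32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)((uint64_t)p >> 32);
}
```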

v / sec-agnostic

4. Vector Extension Programmer’s Model / 4.4. Vector type register, vtype
Operation Arguments Description
vmsbf.m vs2, vd

In addition, except for mask load instructions, any element in the tail of a mask result can also be written with the value the mask-producing operation would have calculated with vl=VLMAX. Furthermore, for mask-logical instructions and vmsbf.m, vmsif.m, vmsof.m mask-manipulation instructions, any element in the tail of the result can be written with the value the mask-producing operation would have calculated with vl=VLEN, SEW=8, and LMUL=8 (i.e., all bits of the mask register can be overwritten).

Spike ISS Implementation:
// vmsbf.m vd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
require(P.VU.vstart->read() == 0);
require_vm;
require(insn.rd() != insn.rs2());

reg_t vl = P.VU.vl->read();
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();

bool has_one = false;
for (reg_t i = P.VU.vstart->read(); i < vl; ++i) {
  const int midx = i / 64;
  const int mpos = i % 64;
  const uint64_t mmask = UINT64_C(1) << mpos;

  bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1;
  bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;

  if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
    auto &vd = P.VU.elt<uint64_t>(rd_num, midx, true);
    uint64_t res = 0;
    if (!has_one && !vs2_lsb) {
      res = 1;
    } else if (!has_one && vs2_lsb) {
      has_one = true;
    }
    vd = (vd & ~mmask) | ((res << mpos) & mmask);
  }
}
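
The Spike loop above can be condensed into a host-side C model that works on a single 64-bit mask word. This is only a sketch: the function name, the 64-element limit and the choice to leave inactive and tail bits at their old value are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of vmsbf.m (set-before-first) on one 64-bit mask word.
 * vs2    : source mask bits
 * v0     : mask register contents, consulted only when 'masked' is true
 * vd_old : previous destination bits, kept for inactive and tail elements
 * vl     : number of body elements (clamped to 64 here) */
uint64_t model_vmsbf(uint64_t vs2, uint64_t v0, uint64_t vd_old,
                     unsigned vl, bool masked)
{
    uint64_t vd = vd_old;
    bool has_one = false;

    for (unsigned i = 0; i < vl && i < 64; ++i) {
        const uint64_t bit = UINT64_C(1) << i;

        if (masked && !(v0 & bit))
            continue;            /* inactive element: leave vd bit alone */
        if (vs2 & bit)
            has_one = true;      /* first active set bit found */
        if (has_one)
            vd &= ~bit;          /* at or after the first set bit: 0 */
        else
            vd |= bit;           /* before the first set bit: 1 */
    }
    return vd;
}
```

For example, with vs2 = 0b00010100 and vl = 8 the unmasked result is 0b00000011: only the elements below the first set bit are written with 1.
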
vsetvli zimm11, rs1, rd

The assembly syntax adds two mandatory flags to the vsetvli instruction:

ta   # Tail agnostic
tu   # Tail undisturbed
ma   # Mask agnostic
mu   # Mask undisturbed

vsetvli t0, a0, e32, m4, ta, ma   # Tail agnostic, mask agnostic
vsetvli t0, a0, e32, m4, tu, ma   # Tail undisturbed, mask agnostic
vsetvli t0, a0, e32, m4, ta, mu   # Tail agnostic, mask undisturbed
vsetvli t0, a0, e32, m4, tu, mu   # Tail undisturbed, mask undisturbed

Spike ISS Implementation:
require_vector_novtype(false);
WRITE_RD(P.VU.set_vl(insn.rd(), insn.rs1(), RS1, insn.v_zimm11()));
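
As a sketch of how the assembler flags end up in the immediate, the helper below packs SEW, LMUL, ta and ma into the low bits of vtype. The bit layout used here (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7) follows the ratified v1.0 vector specification and is not stated on this page, so treat it as an assumption. The function name is hypothetical, and fractional LMUL is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack SEW, LMUL, ta and ma into the low 8 bits of vtype.
 * Assumed layout: vlmul = bits 2:0, vsew = bits 5:3, vta = bit 6, vma = bit 7.
 * Only integer LMUL is handled; returns -1 for unsupported arguments. */
int encode_vtype(unsigned sew_bits, unsigned lmul, bool ta, bool ma)
{
    unsigned vsew, vlmul;

    switch (sew_bits) {
    case 8:  vsew = 0; break;
    case 16: vsew = 1; break;
    case 32: vsew = 2; break;
    case 64: vsew = 3; break;
    default: return -1;
    }
    switch (lmul) {
    case 1: vlmul = 0; break;
    case 2: vlmul = 1; break;
    case 4: vlmul = 2; break;
    case 8: vlmul = 3; break;
    default: return -1;
    }
    return (int)(((unsigned)ma << 7) | ((unsigned)ta << 6) | (vsew << 3) | vlmul);
}
```

Under that layout, e32, m4, ta, ma packs to 0xD2, and switching to tu, mu clears bits 6 and 7 to give 0x12.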

v / sec-mask-register-logical

16. Vector Mask Instructions / 16.1. Vector Mask-Register Logical Instructions
Operation Arguments Description
vmand.mm vs2, vs1, vd

vmand.mm vd, src1, src2

vmand.mm vd, src2, src2

vmand.mm vd, src1, src1

vmand.mm vd, vs2, vs1    # vd.mask[i] = vs2.mask[i] && vs1.mask[i]
vmnand.mm vd, vs2, vs1   # vd.mask[i] = !(vs2.mask[i] && vs1.mask[i])
vmandn.mm vd, vs2, vs1   # vd.mask[i] = vs2.mask[i] && !vs1.mask[i]
vmxor.mm vd, vs2, vs1    # vd.mask[i] = vs2.mask[i] ^^ vs1.mask[i]
vmor.mm vd, vs2, vs1     # vd.mask[i] = vs2.mask[i] || vs1.mask[i]
vmnor.mm vd, vs2, vs1    # vd.mask[i] = !(vs2.mask[i] || vs1.mask[i])
vmorn.mm vd, vs2, vs1    # vd.mask[i] = vs2.mask[i] || !vs1.mask[i]
vmxnor.mm vd, vs2, vs1   # vd.mask[i] = !(vs2.mask[i] ^^ vs1.mask[i])

vmmv.m vd, vs  => vmand.mm vd, vs, vs    # Copy mask register
vmclr.m vd     => vmxor.mm vd, vd, vd    # Clear mask register
vmset.m vd     => vmxnor.mm vd, vd, vd   # Set mask register
vmnot.m vd, vs => vmnand.mm vd, vs, vs   # Invert bits
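
The eight mask-register operations and the four pseudoinstruction expansions can be checked on a host with plain bitwise C. The sketch below treats one 64-bit word as the whole mask and ignores vl and tail handling. All function names are made up for the example.

```c
#include <stdint.h>

/* Bitwise models of the mask-register logical instructions.
 * Argument order follows the assembly: (vs2, vs1). */
uint64_t model_vmand (uint64_t vs2, uint64_t vs1) { return   vs2 &  vs1;  }
uint64_t model_vmnand(uint64_t vs2, uint64_t vs1) { return ~(vs2 &  vs1); }
uint64_t model_vmandn(uint64_t vs2, uint64_t vs1) { return   vs2 & ~vs1;  }
uint64_t model_vmxor (uint64_t vs2, uint64_t vs1) { return   vs2 ^  vs1;  }
uint64_t model_vmor  (uint64_t vs2, uint64_t vs1) { return   vs2 |  vs1;  }
uint64_t model_vmnor (uint64_t vs2, uint64_t vs1) { return ~(vs2 |  vs1); }
uint64_t model_vmorn (uint64_t vs2, uint64_t vs1) { return   vs2 | ~vs1;  }
uint64_t model_vmxnor(uint64_t vs2, uint64_t vs1) { return ~(vs2 ^  vs1); }

/* Pseudoinstructions, expanded exactly as in the listing above. */
uint64_t model_vmmv (uint64_t vs) { return model_vmand(vs, vs);   }
uint64_t model_vmclr(uint64_t vd) { return model_vmxor(vd, vd);   }
uint64_t model_vmset(uint64_t vd) { return model_vmxnor(vd, vd);  }
uint64_t model_vmnot(uint64_t vs) { return model_vmnand(vs, vs);  }
```

Note that vmclr.m and vmset.m produce the same result whatever vd held before, which is why the destination can double as both sources.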

vmandn.mm vs2, vs1, vd

vmandn.mm vd, src2, src1

vmandn.mm vd, src1, src2

vmnand.mm vs2, vs1, vd

vmnand.mm vd, src1, src1

vmnand.mm vd, src2, src2

vmnand.mm vd, src1, src2

vmnor.mm vs2, vs1, vd

vmnor.mm vd, src1, src2

vmorn.mm vs2, vs1, vd

vmorn.mm vd, src2, src1

vmorn.mm vd, src1, src2

vmxnor.mm vs2, vs1, vd

vmxnor.mm vd, src1, src2

vmxnor.mm vd, vd, vd

vmxor.mm vs2, vs1, vd

vmxor.mm vd, vd, vd

vmxor.mm vd, src1, src2

v / sec-narrowing

11. Vector Arithmetic Instruction Formats / 11.3. Narrowing Vector Arithmetic Instructions
Operation Arguments Description
vnsra.wi vs2, simm5, vd

A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv)

vnsra.wv vs2, vs1, vd

A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv)

vnsra.wx vs2, rs1, vd

A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv)

v / sec-vec-operands

6. Vector Instruction Formats / 6.2. Vector Operands
Operation Arguments Description
vnsrl.wi vs2, simm5, vd

The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not).

vnsrl.wv vs2, vs1, vd

The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not).

vnsrl.wx vs2, rs1, vd

The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not).

vzext.vf2 vs2, vd

The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).

vzext.vf4 vs2, vd

The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).

vzext.vf8 vs2, vd

The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).
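
The two overlap rules quoted in this section can be written as small predicates. The sketch below is an illustration under simplifying assumptions: register groups are described by a base register number and an integer EMUL, fractional EMUL is not modelled, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Do register groups [a, a+a_emul) and [b, b+b_emul) share a register? */
static bool groups_overlap(unsigned a, unsigned a_emul,
                           unsigned b, unsigned b_emul)
{
    return a < b + b_emul && b < a + a_emul;
}

/* Narrowing case (destination EEW smaller than source EEW): an overlap is
 * accepted only when the destination sits in the lowest-numbered part of
 * the source group, i.e. both groups start at the same register. */
bool narrowing_overlap_ok(unsigned vd, unsigned vd_emul,
                          unsigned vs, unsigned vs_emul)
{
    if (!groups_overlap(vd, vd_emul, vs, vs_emul))
        return true;
    return vd == vs;
}

/* Widening case (destination EEW greater than source EEW, source EMUL at
 * least 1): an overlap is accepted only when the source sits in the
 * highest-numbered part of the destination group. */
bool widening_overlap_ok(unsigned vd, unsigned vd_emul,
                         unsigned vs, unsigned vs_emul)
{
    if (!groups_overlap(vd, vd_emul, vs, vs_emul))
        return true;
    return vs_emul >= 1 && vs + vs_emul == vd + vd_emul;
}
```

With LMUL=1, a vnsrl.wi reading the group v0..v1 passes for destination v0 and fails for v1. With LMUL=8, vzext.vf4 v0, v6 passes (source group v6..v7 ends where v0..v7 ends), while sources v0, v2 and v4 fail.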

v / sec-vector-float-reduce

15. Vector Reduction Operations / 15.3. Vector Single-Width Floating-Point Reduction Instructions
Operation Arguments Description
vfredosum.vs vs2, vs1, vd

# Simple reductions.
vfredosum.vs vd, vs2, vs1, vm   # Ordered sum
vfredusum.vs vd, vs2, vs1, vm   # Unordered sum
vfredmax.vs vd, vs2, vs1, vm    # Maximum value
vfredmin.vs vd, vs2, vs1, vm    # Minimum value
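
The difference between the ordered and unordered sums matters because floating-point addition is not associative. The host-side C sketch below contrasts a strictly sequential sum with a pairwise tree, which is just one of many orders an unordered reduction might use. The function names are hypothetical, masking is ignored, and the tree shape is an assumption rather than anything the instruction mandates.

```c
#include <stddef.h>

/* Ordered model: start from the scalar vs1[0] and add the vs2 elements
 * strictly in element order. */
float model_fredosum(float scalar, const float *vs2, size_t vl)
{
    float acc = scalar;
    for (size_t i = 0; i < vl; ++i)
        acc += vs2[i];
    return acc;
}

/* Pairwise (tree) sum of n elements. */
static float pairwise_sum(const float *v, size_t n)
{
    if (n == 0)
        return 0.0f;
    if (n == 1)
        return v[0];
    size_t half = n / 2;
    return pairwise_sum(v, half) + pairwise_sum(v + half, n - half);
}

/* One possible unordered model: reduce vs2 as a tree, then add the scalar. */
float model_fredusum_tree(float scalar, const float *vs2, size_t vl)
{
    return scalar + pairwise_sum(vs2, vl);
}
```

On a typical IEEE 754 binary32 host, the input {1e8, 1, -1e8, 1} with a zero scalar gives 1.0 from the ordered model and 0.0 from the tree model, because each 1 is lost when it is first added to a value of magnitude 1e8.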

v / sec-vector-float-reduce-widen

15. Vector Reduction Operations / 15.4. Vector Widening Floating-Point Reduction Instructions
Operation Arguments Description
vfwredosum.vs vs2, vs1, vd

# Simple reductions.
vfwredosum.vs vd, vs2, vs1, vm   # Ordered sum
vfwredusum.vs vd, vs2, vs1, vm   # Unordered sum

v / sec-vector-integer-reduce

15. Vector Reduction Operations / 15.1. Vector Single-Width Integer Reduction Instructions
Operation Arguments Description
vredsum.vs vs2, vs1, vd

# Simple reductions, where [*] denotes all active elements:
vredsum.vs vd, vs2, vs1, vm    # vd[0] = sum( vs1[0] , vs2[*] )
vredmaxu.vs vd, vs2, vs1, vm   # vd[0] = maxu( vs1[0] , vs2[*] )
vredmax.vs vd, vs2, vs1, vm    # vd[0] = max( vs1[0] , vs2[*] )
vredminu.vs vd, vs2, vs1, vm   # vd[0] = minu( vs1[0] , vs2[*] )
vredmin.vs vd, vs2, vs1, vm    # vd[0] = min( vs1[0] , vs2[*] )
vredand.vs vd, vs2, vs1, vm    # vd[0] = and( vs1[0] , vs2[*] )
vredor.vs vd, vs2, vs1, vm     # vd[0] = or( vs1[0] , vs2[*] )
vredxor.vs vd, vs2, vs1, vm    # vd[0] = xor( vs1[0] , vs2[*] )
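
The "scalar plus all active elements" shape of these reductions is easy to model in C. The sketch below covers vredsum.vs and vredmax.vs for SEW=32. It is a simplification: the mask is passed as one byte per element (NULL meaning unmasked) rather than as packed bits in v0, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* vd[0] = sum(vs1[0], vs2[*]) over active elements; wraps modulo 2^32. */
uint32_t model_vredsum(uint32_t vs1_0, const uint32_t *vs2,
                       const uint8_t *mask, size_t vl)
{
    uint32_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i)
        if (mask == NULL || mask[i])
            acc += vs2[i];
    return acc;
}

/* vd[0] = max(vs1[0], vs2[*]) over active elements, signed compare. */
int32_t model_vredmax(int32_t vs1_0, const int32_t *vs2,
                      const uint8_t *mask, size_t vl)
{
    int32_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i)
        if ((mask == NULL || mask[i]) && vs2[i] > acc)
            acc = vs2[i];
    return acc;
}
```

In this model, when no element is active the result is simply the scalar operand vs1[0].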

v / sec-vector-integer-reduce-widen

15. Vector Reduction Operations / 15.2. Vector Widening Integer Reduction Instructions
Operation Arguments Description
vwredsum.vs vs2, vs1, vd

The vwredsum.vs instruction sign-extends the SEW-wide vector elements before summing them.

vwredsumu.vs vs2, vs1, vd

The unsigned vwredsumu.vs instruction zero-extends the SEW-wide vector elements before summing them, then adds the 2*SEW-width scalar element, and stores the result in a 2*SEW-width scalar element.

For both vwredsumu.vs and vwredsum.vs, overflows wrap around.

# Unsigned sum reduction into double-width accumulator
vwredsumu.vs vd, vs2, vs1, vm   # 2*SEW = 2*SEW + sum(zero-extend(SEW))

# Signed sum reduction into double-width accumulator
vwredsum.vs vd, vs2, vs1, vm    # 2*SEW = 2*SEW + sum(sign-extend(SEW))
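
A host-side C sketch of the two widening sums for SEW=8 (so a 16-bit accumulator) shows how zero-extension and sign-extension change the result for the same input bytes. The function names are made up for the example and masking is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* vwredsumu-style: zero-extend each 8-bit element, accumulate in 16 bits.
 * Overflow wraps modulo 2^16. */
uint16_t model_vwredsumu(uint16_t vs1_0, const uint8_t *vs2, size_t vl)
{
    uint16_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i)
        acc = (uint16_t)(acc + vs2[i]);
    return acc;
}

/* vwredsum-style: sign-extend each 8-bit element before accumulating.
 * The xor/subtract pair is a portable way to sign-extend 8 bits. */
uint16_t model_vwredsum(uint16_t vs1_0, const uint8_t *vs2, size_t vl)
{
    uint16_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i) {
        uint16_t ext = (uint16_t)((vs2[i] ^ 0x80u) - 0x80u);
        acc = (uint16_t)(acc + ext);
    }
    return acc;
}
```

For the bytes {0xFF, 0x01} and a zero accumulator, the unsigned model returns 256 (255 + 1) and the signed model returns 0 (-1 + 1).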

counters

counters / zicntr-standard-extension-for-base-counters-and-timers

11 Counters / 11.1 “Zicntr” Standard Extension for Base Counters and Timers

Operation Arguments Description
rdcycle rd

RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only.

The RDCYCLE pseudoinstruction reads the low XLEN bits of the cycle CSR which holds a count of the number of clock cycles executed by the processor core on which the hart is running from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63-32 of the same cycle counter. The underlying 64-bit counter should never overflow in practice. The rate at which the cycle counter advances will depend on the implementation and operating environment. The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing.

RDCYCLE is intended to return the number of cycles executed by the processor core, not the hart. Precisely defining what is a "core" is difficult given some implementation choices (e.g., AMD Bulldozer). Precisely defining what is a "clock cycle" is also difficult given the range of implementations (including software emulations), but the intent is that RDCYCLE is used for performance monitoring along with the other performance counters. In particular, where there is one hart/core, one would expect cycle-count/instructions-retired to measure CPI for a hart.

Even though there is no precise definition that works for all platforms, this is still a useful facility for most platforms, and an imprecise, common, "usually correct" standard here is better than no standard. The intent of RDCYCLE was primarily performance monitoring/tuning, and the specification was written with that goal in mind.

On some simple platforms, cycle count might represent a valid implementation of RDTIME, in which case RDTIME and RDCYCLE may return the same result.

rdcycleh rd

RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only.

The RDCYCLE pseudoinstruction reads the low XLEN bits of the cycle CSR which holds a count of the number of clock cycles executed by the processor core on which the hart is running from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63-32 of the same cycle counter. The underlying 64-bit counter should never overflow in practice. The rate at which the cycle counter advances will depend on the implementation and operating environment. The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing.
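
Because RV32 reads the counter in two 32-bit pieces, the low half can roll over between the two reads. A common remedy is to read high, low, high and retry if the two high reads differ. The C sketch below shows that loop with the CSR reads abstracted behind function pointers so it can be exercised on a host. The names read_counter64, fake_read_hi and fake_read_lo are hypothetical, and the fake counter exists only to force a rollover.

```c
#include <stdint.h>

typedef uint32_t (*read_csr32_fn)(void);

/* Read a 64-bit counter from two 32-bit halves without tearing. */
uint64_t read_counter64(read_csr32_fn read_hi, read_csr32_fn read_lo)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_hi();
        lo  = read_lo();
        hi2 = read_hi();
    } while (hi != hi2);    /* low half rolled over in between: retry */
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__riscv) && (__riscv_xlen == 32)
/* On an RV32 target the halves would come from the real instructions. */
static inline uint32_t rdcycle_lo(void)
{
    uint32_t v;
    __asm__ volatile ("rdcycle %0" : "=r"(v));
    return v;
}
static inline uint32_t rdcycle_hi(void)
{
    uint32_t v;
    __asm__ volatile ("rdcycleh %0" : "=r"(v));
    return v;
}
#endif

/* Host-side stand-in: a counter that advances on every low-half read and
 * starts one tick before the low half wraps. */
uint64_t fake_counter = UINT64_C(0x00000001FFFFFFFF);
uint32_t fake_read_hi(void) { return (uint32_t)(fake_counter >> 32); }
uint32_t fake_read_lo(void)
{
    uint32_t lo = (uint32_t)fake_counter;
    fake_counter += 1;
    return lo;
}
```

With the fake counter, the first pass sees the high half change from 1 to 2 and retries, so the call returns 0x0000000200000000 rather than a torn value.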

rdinstret rd

RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only.

The RDINSTRET pseudoinstruction reads the low XLEN bits of the instret CSR, which counts the number of instructions retired by this hart from some arbitrary start point in the past. RDINSTRETH is an RV32I-only instruction that reads bits 63-32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice.

rdinstreth rd

RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only.

The RDINSTRET pseudoinstruction reads the low XLEN bits of the instret CSR, which counts the number of instructions retired by this hart from some arbitrary start point in the past. RDINSTRETH is an RV32I-only instruction that reads bits 63-32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice.

rdtime rd

RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only.

The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63-32 of the same real-time counter. The underlying 64-bit counter increments by one with each tick of the real-time clock, and, for realistic real-time clock frequencies, should never overflow in practice. The execution environment should provide a means of determining the period of a counter tick (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock (i.e., the maximum relative error between the nominal and actual real-time clock periods).

On some simple platforms, cycle count might represent a valid implementation of RDTIME, in which case RDTIME and RDCYCLE may return the same result.

rdtimeh rd

RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only.

The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63-32 of the same real-time counter. The underlying 64-bit counter increments by one with each tick of the real-time clock, and, for realistic real-time clock frequencies, should never overflow in practice. The execution environment should provide a means of determining the period of a counter tick (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock (i.e., the maximum relative error between the nominal and actual real-time clock periods).

zihintpause

zihintpause / chap:zihintpause

4 “Zihintpause” Pause Hint, Version 2.0

Operation Arguments Description
pause

The PAUSE instruction is a HINT that indicates the current hart's rate of instruction retirement should be temporarily reduced or paused. The duration of its effect must be bounded and may be zero. No architectural state is changed.

Software can use the PAUSE instruction to reduce energy consumption while executing spin-wait code sequences. Multithreaded cores might temporarily relinquish execution resources to other harts when PAUSE is executed. It is recommended that a PAUSE instruction generally be included in the code sequence for a spin-wait loop.

A future extension might add primitives similar to the x86 MONITOR/MWAIT instructions, which provide a more efficient mechanism to wait on writes to a specific memory location. However, these instructions would not supplant PAUSE. PAUSE is more appropriate when polling for non-memory events, when polling for multiple events, or when software does not know precisely what events it is polling for.

The duration of a PAUSE instruction's effect may vary significantly within and among implementations. In typical implementations this duration should be much less than the time to perform a context switch, probably more on the rough order of an on-chip cache miss latency or a cacheless access to main memory.

A series of PAUSE instructions can be used to create a cumulative delay loosely proportional to the number of PAUSE instructions. In spin-wait loops in portable code, however, only one PAUSE instruction should be used before re-evaluating loop conditions, else the hart might stall longer than optimal on some implementations, degrading system performance.

PAUSE is encoded as a FENCE instruction with pred=W, succ=0, fm=0, rd=x0, and rs1=x0.

PAUSE is encoded as a hint within the FENCE opcode because some implementations are expected to deliberately stall the PAUSE instruction until outstanding memory transactions have completed. Because the successor set is null, however, PAUSE does not mandate any particular memory ordering--hence, it truly is a HINT.

Like other FENCE instructions, PAUSE cannot be used within LR/SC sequences without voiding the forward-progress guarantee.

The choice of a predecessor set of W is arbitrary, since the successor set is null. Other HINTs similar to PAUSE might be encoded with other predecessor sets.
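
The encoding described above can be reproduced with a few shifts. The sketch below assumes the standard FENCE field layout from the base ISA (fm in bits 31:28, pred in 27:24, succ in 23:20, rs1 in 19:15, funct3 = 0, rd in 11:7, opcode 0x0F), which is not spelled out on this page. The helper names are hypothetical. It also shows a spin-wait that issues one PAUSE per iteration. On a non-RISC-V host cpu_relax compiles to nothing.

```c
#include <stdint.h>

/* Predecessor/successor set bits of FENCE (I, O, R, W). */
enum { FENCE_I = 8, FENCE_O = 4, FENCE_R = 2, FENCE_W = 1 };

/* Assemble a FENCE instruction word from its fields. */
uint32_t encode_fence(unsigned fm, unsigned pred, unsigned succ,
                      unsigned rs1, unsigned rd)
{
    return ((uint32_t)(fm   & 0xF)  << 28) |
           ((uint32_t)(pred & 0xF)  << 24) |
           ((uint32_t)(succ & 0xF)  << 20) |
           ((uint32_t)(rs1  & 0x1F) << 15) |
           ((uint32_t)(rd   & 0x1F) << 7)  |
           0x0Fu;                          /* MISC-MEM opcode, funct3 = 0 */
}

/* PAUSE: pred=W, succ=0, fm=0, rd=x0, rs1=x0. */
uint32_t encode_pause(void)
{
    return encode_fence(0, FENCE_W, 0, 0, 0);
}

/* Emit PAUSE as a raw word so older assemblers accept it. */
static inline void cpu_relax(void)
{
#if defined(__riscv)
    __asm__ volatile (".4byte 0x0100000F" ::: "memory");
#endif
}

/* Spin-wait with a single PAUSE before re-evaluating the condition. */
void spin_until_set(volatile const uint32_t *flag)
{
    while (*flag == 0)
        cpu_relax();
}
```

The resulting word for PAUSE is 0x0100000F.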

zfh

half precision convert and move instructions
half precision floating point classify instruction
half precision load and store instructions

zfh / half-precision-convert-and-move-instructions

15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.3 Half-Precision Convert and Move Instructions

Operation Arguments Description
fcvt.d.h rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.

fcvt.h.d rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.

fcvt.h.l rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fcvt.h.lu rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fcvt.h.q rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.

fcvt.h.s rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.

fcvt.h.w rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fcvt.h.wu rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fcvt.l.h rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fcvt.lu.h rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fcvt.q.h rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.

fcvt.s.h rd, rs1

New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.

fcvt.w.h rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fcvt.wu.h rd, rs1

New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.

fmv.h.x rd, rs1

FMV.H.X moves the half-precision value encoded in IEEE 754-2008 standard encoding from the lower 16 bits of integer register rs1 to the floating-point register rd, NaN-boxing the result.

FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.

fmv.x.h rd, rs1

Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.H moves the half-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd, filling the upper XLEN-16 bits with copies of the floating-point number's sign bit.

FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
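
The two moves can be modelled on raw register images. The sketch below assumes XLEN=64 and FLEN=64, so NaN-boxing fills the upper 48 bits of the floating-point register with ones. Those widths and the function names are assumptions for illustration.

```c
#include <stdint.h>

/* FMV.H.X model: take the low 16 bits of the integer register and
 * NaN-box them into a 64-bit floating-point register image. */
uint64_t model_fmv_h_x(uint64_t rs1)
{
    return UINT64_C(0xFFFFFFFFFFFF0000) | (rs1 & 0xFFFF);
}

/* FMV.X.H model: take the low 16 bits of the floating-point register and
 * fill the upper XLEN-16 bits with copies of the sign bit (bit 15). */
uint64_t model_fmv_x_h(uint64_t frs1)
{
    uint64_t h = frs1 & 0xFFFF;
    return (h & 0x8000) ? (h | UINT64_C(0xFFFFFFFFFFFF0000)) : h;
}
```

Neither direction inspects the value, so a NaN payload such as 0x7D01 survives a round trip unchanged in the low 16 bits.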

fsgnj.h rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction.

Spike ISS Implementation:
require_either_extension(EXT_ZFH, EXT_ZHINX);
require_fp;
WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), false, false));
fsgnjn.h rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction.

Spike ISS Implementation:
require_either_extension(EXT_ZFH, EXT_ZHINX);
require_fp;
WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), true, false));
fsgnjx.h rd, rs1, rs2

Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction.

Spike ISS Implementation:
require_either_extension(EXT_ZFH, EXT_ZHINX);
require_fp;
WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), false, true));
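
The three Spike calls above differ only in the last two boolean arguments. A bit-level C sketch with the same argument shape makes the behaviour explicit. The function name and the exact parameter names are assumptions, and the model works on raw 16-bit patterns without NaN-boxing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign injection on raw half-precision bit patterns.
 * negate=false, xor_sign=false : FSGNJ.H  (sign taken from b)
 * negate=true,  xor_sign=false : FSGNJN.H (opposite of b's sign)
 * negate=false, xor_sign=true  : FSGNJX.H (xor of both signs) */
uint16_t model_fsgnj16(uint16_t a, uint16_t b, bool negate, bool xor_sign)
{
    uint16_t sign;

    if (xor_sign)
        sign = (uint16_t)((a ^ b) & 0x8000);
    else if (negate)
        sign = (uint16_t)(~b & 0x8000);
    else
        sign = (uint16_t)(b & 0x8000);

    /* Exponent and fraction always come from a. */
    return (uint16_t)((a & 0x7FFF) | sign);
}
```

Passing the same register for both operands gives a negate (via the FSGNJN.H form) or an absolute value (via the FSGNJX.H form).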

zfh / half-precision-floating-point-classify-instruction

15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.5 Half-Precision Floating-Point Classify Instruction

Operation Arguments Description
fclass.h rd, rs1

The half-precision floating-point classify instruction, FCLASS.H, is defined analogously to its single-precision counterpart, but operates on half-precision operands.

Spike ISS Implementation:
require_either_extension(EXT_ZFH, EXT_ZHINX);
require_fp;
WRITE_RD(f16_classify(FRS1_H));
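
As an illustration of what the classify result looks like, the sketch below decodes a raw half-precision pattern into the one-hot 10-bit mask used by the single-precision FCLASS (bit 0 for negative infinity up to bit 9 for quiet NaN). That bit assignment comes from the F extension specification rather than from this page, and the function name is hypothetical.

```c
#include <stdint.h>

/* Classify a raw IEEE 754 binary16 pattern.
 * Returned bit: 0 -inf, 1 -normal, 2 -subnormal, 3 -0,
 *               4 +0,   5 +subnormal, 6 +normal, 7 +inf,
 *               8 signaling NaN, 9 quiet NaN. */
uint32_t model_fclass_h(uint16_t h)
{
    const unsigned sign = h >> 15;
    const unsigned exp  = (h >> 10) & 0x1F;
    const unsigned frac = h & 0x3FF;

    if (exp == 0x1F) {
        if (frac == 0)
            return sign ? 1u << 0 : 1u << 7;      /* infinities */
        return (frac & 0x200) ? 1u << 9 : 1u << 8; /* quiet : signaling */
    }
    if (exp == 0) {
        if (frac == 0)
            return sign ? 1u << 3 : 1u << 4;      /* zeros */
        return sign ? 1u << 2 : 1u << 5;          /* subnormals */
    }
    return sign ? 1u << 1 : 1u << 6;              /* normals */
}
```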

zfh / half-precision-load-and-store-instructions

15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.1 Half-Precision Load and Store Instructions

Operation Arguments Description
flh rd, rs1, imm12

FLH and FSH are only guaranteed to execute atomically if the effective address is naturally aligned.

FLH and FSH do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. FLH NaN-boxes the result written to rd, whereas FSH ignores all but the lower 16 bits in rs2.

Spike ISS Implementation:
require_extension(EXT_INTERNAL_ZFH_MOVE);
require_fp;
WRITE_FRD(f16(MMU.load<uint16_t>(RS1 + insn.i_imm())));
fsh rs1, rs2, imm12

FLH and FSH are only guaranteed to execute atomically if the effective address is naturally aligned.

FLH and FSH do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. FLH NaN-boxes the result written to rd, whereas FSH ignores all but the lower 16 bits in rs2.

Spike ISS Implementation:
require_extension(EXT_INTERNAL_ZFH_MOVE);
require_fp;
MMU.store<uint16_t>(RS1 + insn.s_imm(), FRS2.v[0]);
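
The NaN-boxing and truncation rules can be sketched on plain integers. This assumes FLEN = 64 (an assumption, the text above does not fix a register width); the function names are illustrative.

```c
#include <stdint.h>

/* FLH: keep the 16 loaded bits unchanged in the low half-word and set
 * every upper bit to 1 (NaN-boxing). Assumes a 64-bit FP register. */
uint64_t flh_nan_box_model(uint16_t loaded)
{
    return 0xFFFFFFFFFFFF0000ULL | loaded;
}

/* FSH: only the low 16 bits of the source register reach memory;
 * the upper bits are ignored, boxed or not. */
uint16_t fsh_store_bits_model(uint64_t frs2)
{
    return (uint16_t)(frs2 & 0xFFFFu);
}
```

A load followed by a store therefore returns the original 16 bits, including the payload of a non-canonical NaN.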

csr

Operation Arguments Description
csrc csr, rs

Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm.

Pseudo Opcode, Equivalent Operations:
csrrc x0, csr, rs

csrci csr, imm

Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm.

Pseudo Opcode, Equivalent Operations:
csrrci x0, csr, imm

csrr rd, csr

The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm.

Pseudo Opcode, Equivalent Operations:
csrrs rd, csr, x0

csrrc rd, csr, rs1

The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are not explicitly written.

For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR.

The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields.

Spike ISS Implementation:
bool write = insn.rs1() != 0;
int csr = validate_csr(insn.csr(), write);
reg_t old = p->get_csr(csr, insn, write);
if (write) {
p->put_csr(csr, old & ~RS1);
}
WRITE_RD(sext_xlen(old));
serialize();
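
The distinction between "rs1 is x0" and "rs1 holds zero" is easy to miss. The sketch below models it with a write counter; the struct and function names are hypothetical and the model ignores read-only CSRs and field masking.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t value;  /* current CSR contents */
    unsigned writes; /* number of write accesses performed */
} csr_model_t;

/* Hypothetical model of CSRRS (set=true) and CSRRC (set=false).
 * rs1_index is the register number encoded in the instruction,
 * rs1_value is that register's contents. Returns the value for rd. */
uint64_t csr_set_clear_model(csr_model_t *csr, bool set,
                             unsigned rs1_index, uint64_t rs1_value)
{
    uint64_t old = csr->value;      /* the read always happens */
    if (rs1_index != 0) {           /* only rs1 = x0 suppresses the write */
        csr->value = set ? (old | rs1_value) : (old & ~rs1_value);
        csr->writes++;
    }
    return old;
}
```

A call with `rs1_index = 6` and `rs1_value = 0` leaves the value unchanged but still counts as a write, which is the case the note about attendant side effects describes.
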
csrrci rd, csr, zimm

The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields.

Spike ISS Implementation:
bool write = insn.rs1() != 0;
int csr = validate_csr(insn.csr(), write);
reg_t old = p->get_csr(csr, insn, write);
if (write) {
p->put_csr(csr, old & ~(reg_t)insn.rs1());
}
WRITE_RD(sext_xlen(old));
serialize();
csrrs rd, csr, rs1

The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are not explicitly written.

For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR.

The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields.

The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm.

Spike ISS Implementation:
bool write = insn.rs1() != 0;
int csr = validate_csr(insn.csr(), write);
reg_t old = p->get_csr(csr, insn, write);
if (write) {
p->put_csr(csr, old | RS1);
}
WRITE_RD(sext_xlen(old));
serialize();
csrrsi rd, csr, zimm

The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields.

Spike ISS Implementation:
bool write = insn.rs1() != 0;
int csr = validate_csr(insn.csr(), write);
reg_t old = p->get_csr(csr, insn, write);
if (write) {
p->put_csr(csr, old | insn.rs1());
}
WRITE_RD(sext_xlen(old));
serialize();
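
For the immediate forms, the operand is the 5-bit rs1 field itself, zero-extended, so only CSR bits 4:0 can be set or cleared. A minimal sketch with hypothetical names, ignoring read-only CSRs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of CSRRSI (set=true) and CSRRCI (set=false).
 * uimm_field is the 5-bit value taken from the rs1 field.
 * *wrote reports whether the CSR was written. Returns the value for rd. */
uint64_t csr_imm_set_clear_model(uint64_t *csr, bool set,
                                 unsigned uimm_field, bool *wrote)
{
    uint64_t uimm = uimm_field & 0x1Fu; /* zero-extended to XLEN */
    uint64_t old = *csr;                /* the read always happens */

    *wrote = (uimm != 0);               /* uimm = 0 suppresses the write */
    if (*wrote)
        *csr = set ? (old | uimm) : (old & ~uimm);
    return old;
}
```
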
csrrw rd, csr, rs1

The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values in the CSRs and integer registers. CSRRW reads the old value of the CSR, zero-extends the value to XLEN bits, then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read.

For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR.

The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields.

The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm.

Spike ISS Implementation:
int csr = validate_csr(insn.csr(), true);
reg_t old = p->get_csr(csr, insn, true);
p->put_csr(csr, RS1);
WRITE_RD(sext_xlen(old));
serialize();
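
The architectural rule for CSRRW is the mirror image of the CSRRS/CSRRC rule: the write always happens, and it is the read that is skipped when rd is x0. A minimal sketch with hypothetical names and access counters:

```c
#include <stdint.h>

typedef struct {
    uint64_t value;  /* current CSR contents */
    unsigned reads;  /* architectural read accesses */
    unsigned writes; /* architectural write accesses */
} csr_rw_model_t;

/* Hypothetical model of CSRRW. rd_index is the destination register
 * number; rs1_value is the value to write (0 when rs1 is x0). */
uint64_t csrrw_model(csr_rw_model_t *csr, unsigned rd_index, uint64_t rs1_value)
{
    uint64_t old = 0;
    if (rd_index != 0) {   /* rd = x0: no read, no read side effects */
        old = csr->value;
        csr->reads++;
    }
    csr->value = rs1_value; /* the write is unconditional */
    csr->writes++;
    return old;             /* discarded by the caller when rd is x0 */
}
```
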
csrrwi rd, csr, zimm

The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields.

The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm.

Spike ISS Implementation:
int csr = validate_csr(insn.csr(), true);
reg_t old = p->get_csr(csr, insn, true);
p->put_csr(csr, insn.rs1());
WRITE_RD(sext_xlen(old));
serialize();
csrs csr, rs

Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm.

Pseudo Opcode, Equivalent Operations:
csrrs x0, csr, rs

csrsi csr, imm

Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm.

Pseudo Opcode, Equivalent Operations:
csrrsi x0, csr, imm

csrw csr, rs

The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm.

Pseudo Opcode, Equivalent Operations:
csrrw x0, csr, rs

csrwi csr, imm

The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm.

Pseudo Opcode, Equivalent Operations:
csrrwi x0, csr, imm
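
To make the pseudoinstruction expansions concrete, the sketch below encodes them as 32-bit instruction words. The field layout and funct3 values (SYSTEM opcode 0x73, CSRRW=1, CSRRS=2, CSRRC=3, CSRRWI=5, CSRRSI=6, CSRRCI=7) are taken from the Zicsr chapter of the unprivileged specification rather than from this page, and the helper names are illustrative.

```c
#include <stdint.h>

enum { F3_CSRRW = 1, F3_CSRRS = 2, F3_CSRRC = 3,
       F3_CSRRWI = 5, F3_CSRRSI = 6, F3_CSRRCI = 7 };

/* I-type SYSTEM layout:
 * csr[31:20] | rs1 or uimm[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0] */
uint32_t encode_csr_insn(unsigned funct3, unsigned rd,
                         unsigned csr, unsigned rs1_or_uimm)
{
    return ((uint32_t)(csr & 0xFFFu) << 20) |
           ((uint32_t)(rs1_or_uimm & 0x1Fu) << 15) |
           ((uint32_t)(funct3 & 0x7u) << 12) |
           ((uint32_t)(rd & 0x1Fu) << 7) | 0x73u;
}

/* Each pseudoinstruction pins rd or rs1 to x0 (register number 0). */
uint32_t encode_csrr(unsigned rd, unsigned csr)     { return encode_csr_insn(F3_CSRRS, rd, csr, 0); }
uint32_t encode_csrw(unsigned csr, unsigned rs1)    { return encode_csr_insn(F3_CSRRW, 0, csr, rs1); }
uint32_t encode_csrs(unsigned csr, unsigned rs1)    { return encode_csr_insn(F3_CSRRS, 0, csr, rs1); }
uint32_t encode_csrc(unsigned csr, unsigned rs1)    { return encode_csr_insn(F3_CSRRC, 0, csr, rs1); }
uint32_t encode_csrwi(unsigned csr, unsigned uimm)  { return encode_csr_insn(F3_CSRRWI, 0, csr, uimm); }
uint32_t encode_csrsi(unsigned csr, unsigned uimm)  { return encode_csr_insn(F3_CSRRSI, 0, csr, uimm); }
uint32_t encode_csrci(unsigned csr, unsigned uimm)  { return encode_csr_insn(F3_CSRRCI, 0, csr, uimm); }
```

For example, `encode_csrr(5, 0x300)` corresponds to `csrr t0, mstatus` and `encode_csrw(0x305, 5)` to `csrw mtvec, t0`, using the standard CSR addresses 0x300 and 0x305.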

supervisor

supervisor / svinval

7 “Svinval” Standard Extension for Fine-Grained Address-Translation Cache Invalidation, Version 1.0

Operation Arguments Description
hfence.gvma rs1, rs2

The Svinval extension splits SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA instructions into finer-grained invalidation and ordering operations that can be more efficiently batched or pipelined on certain classes of high-performance implementation.

If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs.

SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception.

High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed.

Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
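
The permission rules for the three invalidation instructions can be tabulated as a small decision function. This is a sketch covering only the cases spelled out in the paragraph above; every other combination returns "no exception", which is an assumption of the model. All type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { SV_MODE_M, SV_MODE_HS, SV_MODE_U, SV_MODE_VS, SV_MODE_VU } sv_mode_t;
typedef enum { SV_SINVAL_VMA, SV_HINVAL_VVMA, SV_HINVAL_GVMA } sv_insn_t;
typedef enum { SV_EXC_NONE, SV_EXC_ILLEGAL, SV_EXC_VIRTUAL } sv_exc_t;

/* SV_MODE_HS also stands for plain S-mode when there is no hypervisor. */
sv_exc_t svinval_exception_model(sv_insn_t insn, sv_mode_t mode,
                                 bool mstatus_tvm, bool hstatus_vtvm)
{
    switch (mode) {
    case SV_MODE_U:
        return SV_EXC_ILLEGAL;                 /* always illegal in U-mode */
    case SV_MODE_VU:
        return SV_EXC_VIRTUAL;                 /* all three instructions */
    case SV_MODE_VS:
        if (insn != SV_SINVAL_VMA)
            return SV_EXC_VIRTUAL;             /* HINVAL.* in VS-mode */
        return hstatus_vtvm ? SV_EXC_VIRTUAL : SV_EXC_NONE;
    case SV_MODE_HS:
        if (mstatus_tvm && insn != SV_HINVAL_VVMA)
            return SV_EXC_ILLEGAL;             /* SINVAL.VMA, HINVAL.GVMA */
        return SV_EXC_NONE;
    case SV_MODE_M:
        return SV_EXC_NONE;
    }
    return SV_EXC_NONE;
}
```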

hfence.vvma rs1, rs2

The Svinval extension splits SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA instructions into finer-grained invalidation and ordering operations that can be more efficiently batched or pipelined on certain classes of high-performance implementation.

If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs.

SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception.

High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed.

Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.

hinval.gvma rs1, rs2

If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs.

SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception.

In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction.

Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.

hinval.vvma rs1, rs2

If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs.

SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception.

In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction.

Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.

sfence.inval.ir

The SINVAL.VMA instruction invalidates any address-translation cache entries that an SFENCE.VMA instruction with the same values of rs1 and rs2 would invalidate. However, unlike SFENCE.VMA, SINVAL.VMA instructions are only ordered with respect to SFENCE.VMA, SFENCE.W.INVAL, and SFENCE.INVAL.IR instructions as defined below.

The SFENCE.W.INVAL instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before subsequent SINVAL.VMA instructions executed by the same hart. The SFENCE.INVAL.IR instruction guarantees that any previous SINVAL.VMA instructions executed by the current hart are ordered before subsequent implicit references by that hart to the memory-management data structures.

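When executed in order (but not necessarily consecutively) by a single hart, the sequence SFENCE.W.INVAL, SINVAL.VMA, and SFENCE.INVAL.IR has the same effect as a hypothetical SFENCE.VMA instruction in which:

the values of rs1 and rs2 for the SFENCE.VMA are the same as those used in the SINVAL.VMA,

reads and writes prior to the SFENCE.W.INVAL are considered to be those prior to the SFENCE.VMA, and

reads and writes following the SFENCE.INVAL.IR are considered to be those subsequent to the SFENCE.VMA.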

If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs.

SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be trapped when mstatus.TVM=1 or when hstatus.VTVM=1, as they only have ordering effects but no visible side effects. Trapping of the SINVAL.VMA instruction is sufficient to enable emulation of the intended overall TLB maintenance functionality.

In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction.

High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed.

Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
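
The typical-usage pattern (one ordering fence, a batch of invalidations, one closing fence) can be sketched as a planner that emits the operation sequence for a virtual address range. This is a host-side model, not kernel code: the types and the function `plan_range_invalidate` are hypothetical, `start` is assumed to be page-aligned, and ASIDs are left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_SFENCE_W_INVAL, OP_SINVAL_VMA, OP_SFENCE_INVAL_IR } tlb_op_kind_t;

typedef struct {
    tlb_op_kind_t kind;
    uintptr_t vaddr; /* only meaningful for OP_SINVAL_VMA */
} tlb_op_t;

/* Writes the sequence for [start, end) into out[] and returns its length,
 * or 0 if the range is empty or out[] (cap entries) is too small. */
size_t plan_range_invalidate(uintptr_t start, uintptr_t end, uintptr_t page_size,
                             tlb_op_t *out, size_t cap)
{
    if (page_size == 0 || end <= start)
        return 0;
    size_t pages = (size_t)((end - start + page_size - 1) / page_size);
    if (cap < pages + 2)
        return 0;

    size_t n = 0;
    out[n++] = (tlb_op_t){ OP_SFENCE_W_INVAL, 0 };   /* order prior stores */
    for (size_t i = 0; i < pages; i++)               /* one SINVAL.VMA per page */
        out[n++] = (tlb_op_t){ OP_SINVAL_VMA, start + (uintptr_t)i * page_size };
    out[n++] = (tlb_op_t){ OP_SFENCE_INVAL_IR, 0 };  /* order later implicit refs */
    return n;
}
```

The point of the split is visible in the operation count: three pages cost two fences plus three invalidations, where three SFENCE.VMA instructions would each carry full ordering semantics.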

sfence.w.inval

The SINVAL.VMA instruction invalidates any address-translation cache entries that an SFENCE.VMA instruction with the same values of rs1 and rs2 would invalidate. However, unlike SFENCE.VMA, SINVAL.VMA instructions are only ordered with respect to SFENCE.VMA, SFENCE.W.INVAL, and SFENCE.INVAL.IR instructions as defined below.

The SFENCE.W.INVAL instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before subsequent SINVAL.VMA instructions executed by the same hart. The SFENCE.INVAL.IR instruction guarantees that any previous SINVAL.VMA instructions executed by the current hart are ordered before subsequent implicit references by that hart to the memory-management data structures.

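When executed in order (but not necessarily consecutively) by a single hart, the sequence SFENCE.W.INVAL, SINVAL.VMA, and SFENCE.INVAL.IR has the same effect as a hypothetical SFENCE.VMA instruction in which:

the values of rs1 and rs2 for the SFENCE.VMA are the same as those used in the SINVAL.VMA,

reads and writes prior to the SFENCE.W.INVAL are considered to be those prior to the SFENCE.VMA, and

reads and writes following the SFENCE.INVAL.IR are considered to be those subsequent to the SFENCE.VMA.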

If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs.

SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be trapped when mstatus.TVM=1 or when hstatus.VTVM=1, as they only have ordering effects but no visible side effects. Trapping of the SINVAL.VMA instruction is sufficient to enable emulation of the intended overall TLB maintenance functionality.

In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction.

High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed.

Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.

hypervisor

hypervisor / hypervisor-virtual-machine-load-and-store-instructions

5 Hypervisor Extension, Version 1.0 / 5.3 Hypervisor Instructions

Operation Arguments Description
hlv.b rd, rs1

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Spike ISS Implementation:
require_extension('H');
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
WRITE_RD(MMU.guest_load<int8_t>(RS1));
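
The two guards in the Spike snippet above decide who may execute HLV/HSV instructions. The sketch below restates them as a pure function. It assumes that `require_novirt()` corresponds to a virtual instruction exception and a failed `require_privilege()` to an illegal instruction exception, which follows the hypervisor chapter but is not visible in the snippet itself. Names are hypothetical; privilege levels use the usual encoding U=0, S=1, M=3.

```c
#include <stdbool.h>

typedef enum { HLV_OK, HLV_ILLEGAL_INSTRUCTION, HLV_VIRTUAL_INSTRUCTION } hlv_check_t;

enum { PRIV_LEVEL_U = 0, PRIV_LEVEL_S = 1, PRIV_LEVEL_M = 3 };

/* prv: current privilege level, virt: virtualization mode V,
 * hstatus_hu: the HU bit of hstatus. */
hlv_check_t hlv_hsv_check_model(unsigned prv, bool virt, bool hstatus_hu)
{
    if (virt)                                   /* VS-mode or VU-mode */
        return HLV_VIRTUAL_INSTRUCTION;
    unsigned min_prv = hstatus_hu ? PRIV_LEVEL_U : PRIV_LEVEL_S;
    if (prv < min_prv)                          /* U-mode with HU = 0 */
        return HLV_ILLEGAL_INSTRUCTION;
    return HLV_OK;
}
```
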
hlv.bu rd, rs1

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.
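
As with LB and LBU, the signed and unsigned virtual-machine loads differ only in how the loaded byte is widened to the register. A minimal sketch assuming XLEN = 64; the function names are illustrative and this is not Spike's implementation.

```c
#include <stdint.h>

/* HLV.B: the loaded byte is sign-extended to XLEN bits. */
uint64_t hlv_b_extend_model(uint8_t mem_byte)
{
    return (uint64_t)(int64_t)(int8_t)mem_byte;
}

/* HLV.BU: the loaded byte is zero-extended to XLEN bits. */
uint64_t hlv_bu_extend_model(uint8_t mem_byte)
{
    return (uint64_t)mem_byte;
}
```

The two agree for bytes below 0x80 and diverge once bit 7 is set.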

hlv.d rd, rs1

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Spike ISS Implementation:
require_extension('H');
require_rv64;
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
WRITE_RD(MMU.guest_load<int64_t>(RS1));
hlv.h rd, rs1

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Spike ISS Implementation:
require_extension('H');
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
WRITE_RD(MMU.guest_load<int16_t>(RS1));
hlv.hu rd, rs1

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection for supervisor level.)

hlv.w rd, rs1

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.)

Spike ISS Implementation:
require_extension('H');
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
WRITE_RD(MMU.guest_load<int32_t>(RS1));
hlv.wu rd, rs1

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection for supervisor level.)

HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.)

hlvx.hu rd, rs1

Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection for supervisor level.)
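
The permission rule for HLVX can be written as a predicate over three permission sets. This sketch models only the R/X bits named in the paragraph above and ignores every other translation check (validity, user bit, alignment); the struct and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool r; /* read permission */
    bool x; /* execute permission */
} mem_perm_t;

/* vs_stage / g_stage: page permissions from the two translation stages.
 * phys_attrs: supervisor physical memory attributes for the final address. */
bool hlvx_permitted_model(mem_perm_t vs_stage, mem_perm_t g_stage,
                          mem_perm_t phys_attrs)
{
    /* Execute replaces read during translation; the r bits are not consulted. */
    bool translation_ok = vs_stage.x && g_stage.x;
    /* The physical attributes must grant both execute and read. */
    bool physical_ok = phys_attrs.x && phys_attrs.r;
    return translation_ok && physical_ok;
}
```

An execute-only guest page is therefore readable through HLVX, while a readable but non-executable page is not.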

hlvx.wu rd, rs1

Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection for supervisor level.)

HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.)
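
The permission rule for HLVX can be easier to follow as a small model. The sketch below is only an illustration of the paragraph above, not Spike's MMU code: the `access_perms` struct and the two helper functions are hypothetical names, the physical memory attributes are represented with the same R/W/X bit positions as a page-table entry purely for convenience, and effects such as MXR are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Permission bits, using the same positions as RISC-V page-table entries. */
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)

/* Hypothetical container for the permissions that apply to one guest access. */
typedef struct {
    uint32_t vs_stage; /* first-stage (VS-stage) translation permissions  */
    uint32_t g_stage;  /* second-stage (G-stage) translation permissions  */
    uint32_t pma;      /* supervisor physical memory attributes (assumed R/W/X bits) */
} access_perms;

/* HLV.*: ordinary read permission in both stages and on the physical memory.
 * Simplification: MXR is not modelled. */
static bool hlv_allowed(access_perms p) {
    return (p.vs_stage & PTE_R) && (p.g_stage & PTE_R) && (p.pma & PTE_R);
}

/* HLVX.*: execute takes the place of read in both translation stages,
 * and the physical memory attributes must grant both execute and read. */
static bool hlvx_allowed(access_perms p) {
    return (p.vs_stage & PTE_X) && (p.g_stage & PTE_X)
        && (p.pma & PTE_X) && (p.pma & PTE_R);
}
```

With this model, an execute-only guest page (X set, R clear in both stages) backed by readable and executable physical memory passes `hlvx_allowed` but fails `hlv_allowed`, which is the case HLVX exists to cover.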

hsv.b rs1, rs2

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Spike ISS Implementation:
require_extension('H');
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
MMU.guest_store<uint8_t>(RS1, RS2);
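
The checks in the Spike snippet above can be read as a small decision function. The sketch below restates them in plain C under some assumptions: the `hsv_check` enum and `hsv_privilege_check` function are hypothetical names, the H extension is assumed to be present, `require_novirt()` is assumed to raise a virtual-instruction trap when V=1, and `require_privilege()` is assumed to raise an illegal-instruction trap when the current privilege is too low.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRV_U 0u
#define PRV_S 1u
#define PRV_M 3u
#define HSTATUS_HU (1u << 9) /* hstatus.HU: allow HLV/HSV from U-mode */

/* Hypothetical result type for this model. */
typedef enum {
    HSV_OK,
    HSV_VIRTUAL_INSTRUCTION, /* executed while V=1 */
    HSV_ILLEGAL_INSTRUCTION  /* privilege too low  */
} hsv_check;

/* Mirrors the order of checks in the snippet: require_novirt(), then
 * require_privilege(HU ? PRV_U : PRV_S). */
static hsv_check hsv_privilege_check(bool v, unsigned prv, uint64_t hstatus) {
    if (v) {
        return HSV_VIRTUAL_INSTRUCTION;
    }
    unsigned min_prv = (hstatus & HSTATUS_HU) ? PRV_U : PRV_S;
    if (prv < min_prv) {
        return HSV_ILLEGAL_INSTRUCTION;
    }
    return HSV_OK;
}
```

For example, `hsv_privilege_check(false, PRV_U, 0)` reports an illegal instruction, while the same call with `HSTATUS_HU` set in `hstatus` is allowed.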
hsv.d rs1, rs2

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Spike ISS Implementation:
require_extension('H');
require_rv64;
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
MMU.guest_store<uint64_t>(RS1, RS2);
hsv.h rs1, rs2

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Spike ISS Implementation:
require_extension('H');
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
MMU.guest_store<uint16_t>(RS1, RS2);
hsv.w rs1, rs2

For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.

Spike ISS Implementation:
require_extension('H');
require_novirt();
require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S);
MMU.guest_store<uint32_t>(RS1, RS2);
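
When a toolchain does not know the hypervisor mnemonics, the four HSV instructions can be emitted as raw words. The sketch below builds those words, assuming the encoding from the hypervisor opcode table as recalled here (SYSTEM opcode 0x73, funct3 = 100, rd = 0, and a funct7 that selects the width). The `hsv_funct7` enum and `encode_hsv` function are hypothetical names, and the constants should be checked against the machine-readable opcode list before use. Following the Spike snippets, `rs1` holds the guest virtual address and `rs2` the data to store.

```c
#include <stdint.h>

/* funct7 values selecting the store width (assumed from the opcode table). */
typedef enum {
    HSV_B = 0x31, /* 0110001 */
    HSV_H = 0x33, /* 0110011 */
    HSV_W = 0x35, /* 0110101 */
    HSV_D = 0x37  /* 0110111, RV64 only */
} hsv_funct7;

/* Assemble a 32-bit HSV instruction word from register numbers 0..31. */
static uint32_t encode_hsv(hsv_funct7 op, unsigned rs1, unsigned rs2) {
    return ((uint32_t)op << 25)   /* funct7          */
         | ((rs2 & 31u) << 20)    /* data register   */
         | ((rs1 & 31u) << 15)    /* address register */
         | (4u << 12)             /* funct3 = 100    */
         | 0x73u;                 /* SYSTEM opcode, rd = 0 */
}
```

A resulting word could then be placed in the instruction stream with an assembler `.word` directive; whether that is preferable to upgrading the toolchain depends on the project.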