Assembler & Instructions

Intro & References

For information on assembler programming:

Some good cheat sheets:

RISC-V Instruction-Set Cheatsheet, from Erik Engheim. PDF Version
RISC-V-QuickRefCard-v042.pdf, “basic assembly programmer’s quick reference card” from Dylan McNamee.
An old but nicely formatted “green card” summary of the ISA: RISCVGreenCardv8-20151013.pdf

GCC Extended Assembler

GCC gives direct access to instructions via __asm__. For example:

No argument instructions:

    __asm__ volatile ("nop");
    __asm__ volatile ("wfi");

With register arguments:

    __asm__ volatile ("csrrw    %0, mie, %1"  /* read and write atomically */
                      : "=r" (ret)            /* output: register %0 */
                      : "r" (value)           /* input: register %1 */
                      : /* clobbers: none */);
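
As a hedged sketch, the same statement can be wrapped in a small helper so that `ret` and `value` have a visible type and scope. The name `csr_swap_mie` and the non-RISC-V fallback (a plain variable standing in for the CSR, so the file still compiles on a host machine) are assumptions made for illustration. On real hardware `mie` is a machine-mode CSR, so the instruction traps in lower privilege modes.

```c
/* Swap a new value into mie and return the previous value.
 * csr_swap_mie is a hypothetical helper name, not part of any library. */
static inline unsigned long csr_swap_mie(unsigned long value)
{
    unsigned long ret;
#if defined(__riscv)
    __asm__ volatile ("csrrw %0, mie, %1"
                      : "=r" (ret)    /* output: register %0 */
                      : "r" (value)   /* input: register %1 */
                      : /* clobbers: none */);
#else
    /* Host-side stand-in for the CSR (assumption, for illustration only). */
    static unsigned long fake_mie;
    ret = fake_mie;      /* read the old value ... */
    fake_mie = value;    /* ... then write the new one */
#endif
    return ret;
}
```

A typical use is to save the old interrupt-enable mask, run a critical section, and then write the saved value back with a second call.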

Opcodes are listed in machine-readable format here

Instruction List

rv32

rv64

counters

zihintpause

zfh

csr

supervisor

hypervisor

rv32

control transfer instructions

environment call and breakpoints

immediate encoding variants

integer register immediate instructions

integer register register operations

sec:rv32:ldst

rv32 /

Operation	Arguments	Description
ebreak		RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Spike ISS Implementation: if (!STATE.debug_mode && ( (!STATE.v && STATE.prv == PRV_M && STATE.dcsr->ebreakm) || (!STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreaks) || (!STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreaku) || (STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreakvs) || (STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreakvu))) { throw trap_debug_mode(); } else { throw trap_breakpoint(STATE.v, pc); }
ecall		RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Spike ISS Implementation: switch (STATE.prv) { case PRV_U: throw trap_user_ecall(); case PRV_S: if (STATE.v) throw trap_virtual_supervisor_ecall(); else throw trap_supervisor_ecall(); case PRV_M: throw trap_machine_ecall(); default: abort(); }
fence	rs1, rd	RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Spike ISS Implementation:
nop		RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Pseudo Opcode, Equivalent Operations: addi x0, x0, 0

rv32 / control-transfer-instructions

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.5 Control Transfer Instructions

Operation	Arguments	Description
beq	rs1, rs2, bimm12	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation: if (RS1 == RS2) set_pc(BRANCH_TARGET);
bge	rs1, rs2, bimm12	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation: if (sreg_t(RS1) >= sreg_t(RS2)) set_pc(BRANCH_TARGET);
bgeu	rs1, rs2, bimm12	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation: if (RS1 >= RS2) set_pc(BRANCH_TARGET);
bgt	rs, rt, offset	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations: blt rt, rs, offset
bgtu	rs, rt, offset	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations: bltu rt, rs, offset
ble	rs, rt, offset	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations: bge rt, rs, offset
bleu	rs, rt, offset	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations: bgeu rt, rs, offset
blt	rs1, rs2, bimm12	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation: if (sreg_t(RS1) < sreg_t(RS2)) set_pc(BRANCH_TARGET);
bltu	rs1, rs2, bimm12	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Signed array bounds may be checked with a single BLTU instruction, since any negative index will compare greater than any nonnegative bound. Spike ISS Implementation: if (RS1 < RS2) set_pc(BRANCH_TARGET);
bne	rs1, rs2, bimm12	Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation: if (RS1 != RS2) set_pc(BRANCH_TARGET);
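
The operand reversal behind the pseudo-instructions, and the single-BLTU bounds check mentioned for bltu, can be modelled in plain C. This is a minimal sketch of the branch conditions only (true means "branch taken"). The function names mirror the mnemonics, and `index_in_bounds` is a hypothetical helper added for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conditions of the real RV32 branch instructions.
 * The (int32_t) casts assume the usual two's-complement conversion. */
static bool blt (uint32_t rs1, uint32_t rs2) { return (int32_t)rs1 <  (int32_t)rs2; }
static bool bge (uint32_t rs1, uint32_t rs2) { return (int32_t)rs1 >= (int32_t)rs2; }
static bool bltu(uint32_t rs1, uint32_t rs2) { return rs1 <  rs2; }
static bool bgeu(uint32_t rs1, uint32_t rs2) { return rs1 >= rs2; }

/* Pseudo-instructions: the same comparison with the operands reversed. */
static bool bgt (uint32_t rs, uint32_t rt) { return blt (rt, rs); }
static bool bgtu(uint32_t rs, uint32_t rt) { return bltu(rt, rs); }
static bool ble (uint32_t rs, uint32_t rt) { return bge (rt, rs); }
static bool bleu(uint32_t rs, uint32_t rt) { return bgeu(rt, rs); }

/* Signed bounds check with one unsigned compare: a negative index,
 * viewed as unsigned, is larger than any nonnegative bound. */
static bool index_in_bounds(int32_t index, int32_t bound)
{
    return bltu((uint32_t)index, (uint32_t)bound);
}
```

Note how 0xFFFFFFFF is greater than 1 for bgtu but not for bgt, where it is read as -1.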

rv32 / environment-call-and-breakpoints

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.8 Environment Call and Breakpoints

Operation	Arguments	Description
sbreak		ECALL and EBREAK were previously named SCALL and SBREAK. The instructions have the same functionality and encoding, but were renamed to reflect that they can be used more generally than to call a supervisor-level operating system or debugger.
scall		ECALL and EBREAK were previously named SCALL and SBREAK. The instructions have the same functionality and encoding, but were renamed to reflect that they can be used more generally than to call a supervisor-level operating system or debugger.

rv32 / immediate-encoding-variants

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.3 Immediate Encoding Variants

Operation	Arguments	Description
addw	rd, rs1, rs2	In RV64I, checks of 32-bit signed additions can be optimized further by comparing the results of ADD and ADDW on the operands. Spike ISS Implementation: require_rv64; WRITE_RD(sext32(RS1 + RS2));
j	offset	There are a further two variants of the instruction formats (B/J) based on the handling of immediates, as shown in Figure 1.3. Similarly, the only difference between the U and J formats is that the 20-bit immediate is shifted left by 12 bits to form U immediates and by 1 bit to form J immediates. The location of instruction bits in the U and J format immediates is chosen to maximize overlap with the other formats and with each other. Although more complex implementations might have separate adders for branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead of using dynamic hardware muxes to multiply the immediate by 2, we reduce instruction signal fanout and immediate mux costs by around a factor of 2. The scrambled immediate encoding will add negligible time to static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most common short forward branches have straightforward immediate encodings. Pseudo Opcode, Equivalent Operations: jal x0, offset
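
To make the "scrambled" J-immediate concrete, here is a small sketch of an encoder for jal, as a dynamic code generator might write it. The helper names `encode_jal` and `encode_j` are made up for this example. The bit placement follows the J-type layout of the base ISA (imm[20|10:1|11|19:12] in instruction bits 31:12, opcode 0x6F).

```c
#include <stdint.h>

/* Encode "jal rd, offset". offset is a byte offset and must be even
 * and within [-2^20, 2^20 - 2]; this sketch does not check the range. */
static uint32_t encode_jal(unsigned rd, int32_t offset)
{
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u)   << 31)   /* imm[20]    -> inst[31]    */
         | (((imm >> 1)  & 0x3FFu) << 21)   /* imm[10:1]  -> inst[30:21] */
         | (((imm >> 11) & 0x1u)   << 20)   /* imm[11]    -> inst[20]    */
         | (((imm >> 12) & 0xFFu)  << 12)   /* imm[19:12] -> inst[19:12] */
         | ((rd & 0x1Fu) << 7)              /* destination register      */
         | 0x6Fu;                           /* JAL opcode                */
}

/* The j pseudo-instruction is jal with rd = x0. */
static uint32_t encode_j(int32_t offset)
{
    return encode_jal(0, offset);
}
```

For short forward jumps only the imm[10:1] field is non-zero, which is the "straightforward" case the description refers to.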

rv32 / integer-register-immediate-instructions

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.4 Integer Computational Instructions

Operation	Arguments	Description
addi	rd, rs1, imm12	ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction. Spike ISS Implementation: WRITE_RD(sext_xlen(RS1 + insn.i_imm()));
andi	rd, rs1, imm12	ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Spike ISS Implementation: WRITE_RD(insn.i_imm() & RS1);
auipc	rd, imm20	AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd. The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses. The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address. Spike ISS Implementation: WRITE_RD(sext_xlen(insn.u_imm() + pc));
jal	rd, jimm20	The current PC can be obtained by setting the U-immediate to 0. Although a JAL +4 instruction could also be used to obtain the local PC (of the instruction following the JAL), it might cause pipeline breaks in simpler microarchitectures or pollute BTB structures in more complex microarchitectures. Spike ISS Implementation: reg_t tmp = npc; set_pc(JUMP_TARGET); WRITE_RD(tmp);
jalr	rd, rs1, imm12	The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses. The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address. Spike ISS Implementation: reg_t tmp = npc; set_pc((RS1 + insn.i_imm()) & ~reg_t(1)); WRITE_RD(tmp);
lui	rd, imm20	LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI places the 32-bit U-immediate value into the destination register rd, filling in the lowest 12 bits with zeros. Spike ISS Implementation: WRITE_RD(insn.u_imm());
mv	rd, rs	ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction. Pseudo Opcode, Equivalent Operations: addi rd, rs, 0
not	rd, rs	ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Pseudo Opcode, Equivalent Operations: xori rd, rs, -1
ori	rd, rs1, imm12	ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Spike ISS Implementation: // prefetch.i/r/w hint when rd = 0 and i_imm[4:0] = 0/1/3 WRITE_RD(insn.i_imm() | RS1);
seqz	rd, rs	SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs). Pseudo Opcode, Equivalent Operations: sltiu rd, rs, 1
slli	rd, rs1, shamt	Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits). Spike ISS Implementation: require(SHAMT < xlen); WRITE_RD(sext_xlen(RS1 << SHAMT));
slti	rd, rs1, imm12	SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs). Spike ISS Implementation: WRITE_RD(sreg_t(RS1) < sreg_t(insn.i_imm()));
sltiu	rd, rs1, imm12	SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs). Spike ISS Implementation: WRITE_RD(RS1 < reg_t(insn.i_imm()));
srai	rd, rs1, shamt	Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits). Spike ISS Implementation: require(SHAMT < xlen); WRITE_RD(sext_xlen(sext_xlen(RS1) >> SHAMT));
srli	rd, rs1, shamt	Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits). Spike ISS Implementation: require(SHAMT < xlen); WRITE_RD(sext_xlen(zext_xlen(RS1) >> SHAMT));
xori	rd, rs1, imm12	ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Spike ISS Implementation: WRITE_RD(insn.i_imm() ^ RS1);
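
Because ADDI sign-extends its 12-bit immediate, building a 32-bit constant with LUI followed by ADDI needs the upper part rounded up whenever bit 11 of the constant is set. The sketch below models that split on RV32. The names `hi20`, `lo12` and `lui_addi` are chosen for this example and only loosely mirror the assembler's %hi/%lo operators.

```c
#include <stdint.h>

/* Low 12 bits of value as ADDI sees them: sign-extended, in [-2048, 2047]. */
static int32_t lo12(uint32_t value)
{
    int32_t lo = (int32_t)(value & 0xFFFu);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Upper 20 bits for LUI, bumped by one when lo12() is negative. */
static uint32_t hi20(uint32_t value)
{
    return ((value + 0x800u) >> 12) & 0xFFFFFu;
}

/* Model of "lui rd, hi20 ; addi rd, rd, lo12" with XLEN = 32. */
static uint32_t lui_addi(uint32_t value)
{
    uint32_t rd = hi20(value) << 12;       /* lui: lowest 12 bits are zero   */
    return rd + (uint32_t)lo12(value);     /* addi: overflow is ignored      */
}
```

For 0x12345FFF the pair becomes lui with 0x12346 and addi with -1, not 0x12345 and 0xFFF.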

rv32 / integer-register-register-operations

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.4 Integer Computational Instructions

Operation	Arguments	Description
add	rd, rs1, rs2	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation: WRITE_RD(sext_xlen(RS1 + RS2));
and	rd, rs1, rs2	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation: WRITE_RD(RS1 & RS2);
or	rd, rs1, rs2	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation: WRITE_RD(RS1 | RS2);
sll	rd, rs1, rs2	SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2. Spike ISS Implementation: WRITE_RD(sext_xlen(RS1 << (RS2 & (xlen-1))));
slt	rd, rs1, rs2	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation: WRITE_RD(sreg_t(RS1) < sreg_t(RS2));
sltu	rd, rs1, rs2	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation: WRITE_RD(RS1 < RS2);
snez	rd, rs	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). Pseudo Opcode, Equivalent Operations: sltu rd, x0, rs
sra	rd, rs1, rs2	SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2. Spike ISS Implementation: WRITE_RD(sext_xlen(sext_xlen(RS1) >> (RS2 & (xlen-1))));
srl	rd, rs1, rs2	SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2. Spike ISS Implementation: WRITE_RD(sext_xlen(zext_xlen(RS1) >> (RS2 & (xlen-1))));
sub	rd, rs1, rs2	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation: WRITE_RD(sext_xlen(RS1 - RS2));
xor	rd, rs1, rs2	ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation: WRITE_RD(RS1 ^ RS2);
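
A few of the rows above are easy to get wrong when emulating them in C, mainly the masking of the shift amount and the arithmetic right shift. The following is a minimal RV32 model (XLEN = 32 is an assumption of this sketch). The function names simply mirror the mnemonics.

```c
#include <stdint.h>

/* Only the lower 5 bits of rs2 are used as the shift amount. */
static uint32_t sll(uint32_t rs1, uint32_t rs2) { return rs1 << (rs2 & 31u); }
static uint32_t srl(uint32_t rs1, uint32_t rs2) { return rs1 >> (rs2 & 31u); }

/* Arithmetic right shift, written without relying on how the C
 * implementation shifts negative signed values. */
static uint32_t sra(uint32_t rs1, uint32_t rs2)
{
    uint32_t sh = rs2 & 31u;
    uint32_t logical = rs1 >> sh;
    /* When the sign bit is set, fill the vacated upper bits with ones. */
    uint32_t sign_fill = (rs1 & 0x80000000u) ? ~(0xFFFFFFFFu >> sh) : 0u;
    return logical | sign_fill;
}

/* SLTU writes 1 if rs1 < rs2 (unsigned), else 0. */
static uint32_t sltu(uint32_t rs1, uint32_t rs2) { return rs1 < rs2; }

/* snez rd, rs  ==  sltu rd, x0, rs */
static uint32_t snez(uint32_t rs) { return sltu(0u, rs); }
```

A shift amount of 33 therefore behaves like a shift by 1, which differs from C, where shifting a 32-bit value by 32 or more is undefined.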

rv32 / sec:rv32:ldst

2 RV32I Base Integer Instruction Set, Version 2.1 / 2.6 Load and Store Instructions

Operation	Arguments	Description
lb	rd, rs1, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: WRITE_RD(MMU.load<int8_t>(RS1 + insn.i_imm()));
lbu	rd, rs1, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: WRITE_RD(MMU.load<uint8_t>(RS1 + insn.i_imm()));
lh	rd, rs1, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: WRITE_RD(MMU.load<int16_t>(RS1 + insn.i_imm()));
lhu	rd, rs1, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: WRITE_RD(MMU.load<uint16_t>(RS1 + insn.i_imm()));
lw	rd, rs1, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: WRITE_RD(MMU.load<int32_t>(RS1 + insn.i_imm()));
sb	rs1, rs2, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: MMU.store<uint8_t>(RS1 + insn.s_imm(), RS2);
sh	rs1, rs2, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: MMU.store<uint16_t>(RS1 + insn.s_imm(), RS2);
sw	rs1, rs2, imm12	The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation: MMU.store<uint32_t>(RS1 + insn.s_imm(), RS2);
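
The difference between the sign-extending and zero-extending loads can be sketched against a byte array standing in for memory. Assumptions in this example: a little-endian layout, RV32 register width, no alignment or bounds checks, and the already-computed address passed in directly (the rs1 + imm12 addition is left out).

```c
#include <stdint.h>

/* LBU / LHU: zero-extend the loaded value to 32 bits. */
static uint32_t lbu(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}

static uint32_t lhu(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* LB / LH: sign-extend the loaded value to 32 bits. */
static uint32_t lb(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = lbu(mem, addr);
    return (v & 0x80u) ? (v | 0xFFFFFF00u) : v;
}

static uint32_t lh(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = lhu(mem, addr);
    return (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
}

/* SH: store the low 16 bits of rs2; the upper bits are discarded. */
static void sh(uint8_t *mem, uint32_t addr, uint32_t rs2)
{
    mem[addr]     = (uint8_t)(rs2 & 0xFFu);
    mem[addr + 1] = (uint8_t)((rs2 >> 8) & 0xFFu);
}
```

Storing 0xABCD8001 with sh and reading it back gives 0x00008001 through lhu and 0xFFFF8001 through lh.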

rv64

integer computational instructions

integer register immediate instructions

load and store instructions

rv64 / integer-computational-instructions

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.2 Integer Computational Instructions

Operation	Arguments	Description
sllw	rd, rs1, rs2	SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0]. Spike ISS Implementation: require_rv64; WRITE_RD(sext32(RS1 << (RS2 & 0x1F)));
sraw	rd, rs1, rs2	SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0]. Spike ISS Implementation: require_rv64; WRITE_RD(sext32(int32_t(RS1) >> (RS2 & 0x1F)));
srlw	rd, rs1, rs2	SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0]. Spike ISS Implementation: require_rv64; WRITE_RD(sext32((uint32_t)RS1 >> (RS2 & 0x1F)));
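
As a rough illustration of the "operate on 32 bits, then sign-extend" rule, the three shifts can be modelled in portable C. The helper names below are hypothetical, and the arithmetic shift is written out by hand because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on signed casts. */
static inline uint64_t sext32(uint32_t x) {
    return (uint64_t)x | ((x & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0ull);
}

static inline uint64_t sllw(uint64_t rs1, uint64_t rs2) {
    return sext32((uint32_t)rs1 << (rs2 & 0x1F));   /* shift amount is rs2[4:0] */
}

static inline uint64_t srlw(uint64_t rs1, uint64_t rs2) {
    return sext32((uint32_t)rs1 >> (rs2 & 0x1F));   /* zeros shifted in */
}

static inline uint64_t sraw(uint64_t rs1, uint64_t rs2) {
    uint32_t v = (uint32_t)rs1;
    unsigned sh = (unsigned)(rs2 & 0x1F);
    uint32_t r = v >> sh;
    if ((v & 0x80000000u) && sh != 0)
        r |= ~(0xFFFFFFFFu >> sh);                  /* copy bit 31 into the vacated bits */
    return sext32(r);
}
```

Note that a shift amount of 32 behaves like 0 here, because only the low five bits of rs2 are used.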

rv64 / integer-register-immediate-instructions

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.2 Integer Computational Instructions

Operation	Arguments	Description
addiw	rd, rs1, imm12	ADDIW is an RV64I instruction that adds the sign-extended 12-bit immediate to register rs1 and produces the proper sign-extension of a 32-bit result in rd. Overflows are ignored and the result is the low 32 bits of the result sign-extended to 64 bits. Note, ADDIW rd, rs1, 0 writes the sign-extension of the lower 32 bits of register rs1 into register rd (assembler pseudoinstruction SEXT.W). Spike ISS Implementation: require_rv64; WRITE_RD(sext32(insn.i_imm() + RS1));
ld	rd, rs1, imm12	Note that the set of address offsets that can be formed by pairing LUI with LD, AUIPC with JALR, etc. in RV64I is [-2^31 - 2^11, 2^31 - 2^11 - 1]. Spike ISS Implementation: require_rv64; WRITE_RD(MMU.load<int64_t>(RS1 + insn.i_imm()));
sext.w	rd, rs	ADDIW is an RV64I instruction that adds the sign-extended 12-bit immediate to register rs1 and produces the proper sign-extension of a 32-bit result in rd. Overflows are ignored and the result is the low 32 bits of the result sign-extended to 64 bits. Note, ADDIW rd, rs1, 0 writes the sign-extension of the lower 32 bits of register rs1 into register rd (assembler pseudoinstruction SEXT.W). Pseudo Opcode, Equivalent Operations: addiw rd, rs, 0
slliw	rd, rs1	SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved. Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change. Spike ISS Implementation: require_rv64; WRITE_RD(sext32(RS1 << SHAMT));
sraiw	rd, rs1	SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved. Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change. Spike ISS Implementation: require_rv64; WRITE_RD(sext32(int32_t(RS1) >> SHAMT));
srliw	rd, rs1	SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved. Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change. Spike ISS Implementation: require_rv64; WRITE_RD(sext32((uint32_t)RS1 >> SHAMT));
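
The ADDIW/SEXT.W behaviour and the offset range quoted for LUI pairs can be checked with a small C model. This is a sketch under stated assumptions: `addiw`, `sext_w`, and `lui_pair_offset` are names invented for the example, and the immediates are passed as already sign-extended C integers rather than decoded from an instruction word.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on signed casts. */
static inline uint64_t sext32(uint32_t x) {
    return (uint64_t)x | ((x & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0ull);
}

/* ADDIW: add the 12-bit immediate, keep the low 32 bits, sign-extend. */
static inline uint64_t addiw(uint64_t rs1, int32_t imm12) {
    return sext32((uint32_t)(rs1 + (uint64_t)(int64_t)imm12));
}

/* SEXT.W is the pseudoinstruction addiw rd, rs, 0. */
static inline uint64_t sext_w(uint64_t rs) {
    return addiw(rs, 0);
}

/* Offset formed by a LUI-style 20-bit upper immediate plus a 12-bit
 * low immediate; both arguments are signed values in their field ranges. */
static inline int64_t lui_pair_offset(int32_t imm20, int32_t imm12) {
    return (int64_t)imm20 * 4096 + imm12;
}
```

Evaluating `lui_pair_offset` at the extremes of both fields gives the two endpoints of the range stated above, which is why the upper bound falls 2^11 short of 2^31 - 1.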

rv64 / load-and-store-instructions

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.3 Load and Store Instructions

Operation	Arguments	Description
lwu	rd, rs1, imm12	The LW instruction loads a 32-bit value from memory and sign-extends this to 64 bits before storing it in register rd for RV64I. The LWU instruction, on the other hand, zero-extends the 32-bit value from memory for RV64I. LH and LHU are defined analogously for 16-bit values, as are LB and LBU for 8-bit values. The SD, SW, SH, and SB instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory respectively. Spike ISS Implementation: require_rv64; WRITE_RD(MMU.load<uint32_t>(RS1 + insn.i_imm()));
sd	rs1, rs2, imm12	The LW instruction loads a 32-bit value from memory and sign-extends this to 64 bits before storing it in register rd for RV64I. The LWU instruction, on the other hand, zero-extends the 32-bit value from memory for RV64I. LH and LHU are defined analogously for 16-bit values, as are LB and LBU for 8-bit values. The SD, SW, SH, and SB instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory respectively. Spike ISS Implementation: require_rv64; MMU.store<uint64_t>(RS1 + insn.s_imm(), RS2);
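
The difference between LW and LWU on RV64 comes down to how the upper 32 bits of rd are filled. A minimal C sketch follows, assuming a little-endian byte array `mem`; the function names are hypothetical.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte array. */
static inline uint32_t read32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* LWU: zero-extend the loaded word to 64 bits. */
static inline uint64_t load_lwu(const uint8_t *mem, uint64_t addr) {
    return (uint64_t)read32le(mem + addr);
}

/* LW on RV64: sign-extend the loaded word to 64 bits. */
static inline uint64_t load_lw(const uint8_t *mem, uint64_t addr) {
    uint64_t v = read32le(mem + addr);
    return (v & 0x80000000u) ? (v | 0xFFFFFFFF00000000ull) : v;
}

/* SD: store all 64 bits of rs2, low byte first. */
static inline void store_sd(uint8_t *mem, uint64_t addr, uint64_t rs2) {
    for (unsigned i = 0; i < 8; i++)
        mem[addr + i] = (uint8_t)(rs2 >> (8 * i));
}
```

For a word whose bit 31 is clear, the two loads return the same value; they differ only when bit 31 is set.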

rv64 / register-state

6 RV64I Base Integer Instruction Set, Version 2.1 / 6.1 Register State

Operation	Arguments	Description
subw	rd, rs1, rs2	The compiler and calling convention maintain an invariant that all 32-bit values are held in a sign-extended format in 64-bit registers. Even 32-bit unsigned integers extend bit 31 into bits 63 through 32. Consequently, conversion between unsigned and signed 32-bit integers is a no-op, as is conversion from a signed 32-bit integer to a signed 64-bit integer. Existing 64-bit wide SLTU and unsigned branch compares still operate correctly on unsigned 32-bit integers under this invariant. Similarly, existing 64-bit wide logical operations on 32-bit sign-extended integers preserve the sign-extension property. A few new instructions (ADD[I]W/SUBW/SxxW) are required for addition and shifts to ensure reasonable performance for 32-bit values. Spike ISS Implementation: require_rv64; WRITE_RD(sext32(RS1 - RS2));
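
The claim that 64-bit unsigned compares still work on sign-extended unsigned 32-bit values can be checked with a short C model. The helpers `subw`, `sltu`, and `sext32` are illustrative names, not Spike functions.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on signed casts. */
static inline uint64_t sext32(uint32_t x) {
    return (uint64_t)x | ((x & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0ull);
}

/* SUBW: 32-bit subtraction, result sign-extended to 64 bits. */
static inline uint64_t subw(uint64_t rs1, uint64_t rs2) {
    return sext32((uint32_t)(rs1 - rs2));
}

/* SLTU: full-width unsigned compare. */
static inline int sltu(uint64_t rs1, uint64_t rs2) {
    return rs1 < rs2;
}

/* Compare two unsigned 32-bit values the way RV64 code would:
 * both are held sign-extended, then compared with the 64-bit SLTU. */
static inline int u32_less_via_sltu(uint32_t a, uint32_t b) {
    return sltu(sext32(a), sext32(b));
}
```

Sign extension maps values with bit 31 set to the top of the 64-bit unsigned range and leaves the others at the bottom, so the relative order of any two 32-bit unsigned values is preserved.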

a

atomics

sec:lrsc

a / atomics

9 “A” Standard Extension for Atomic Instructions, Version 2.1 / 9.4 Atomic Memory Operations

Operation	Arguments	Description
amoadd.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs + RS2; }));
amoadd.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs + RS2; })));
amoand.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs & RS2; }));
amoand.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs & RS2; })));
amomax.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](int64_t lhs) { return std::max(lhs, int64_t(RS2)); }));
amomax.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](int32_t lhs) { return std::max(lhs, int32_t(RS2)); })));
amomaxu.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return std::max(lhs, RS2); }));
amomaxu.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return std::max(lhs, uint32_t(RS2)); })));
amomin.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](int64_t lhs) { return std::min(lhs, int64_t(RS2)); }));
amomin.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](int32_t lhs) { return std::min(lhs, int32_t(RS2)); })));
amominu.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return std::min(lhs, RS2); }));
amominu.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return std::min(lhs, uint32_t(RS2)); })));
amoor.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs | RS2; }));
amoor.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs | RS2; })));
amoswap.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t UNUSED lhs) { return RS2; }));
amoswap.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t UNUSED lhs) { return RS2; })));
amoxor.d	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs ^ RS2; }));
amoxor.w	rd, rs1, rs2	The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation: require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs ^ RS2; })));
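
The load / apply operator / store-back sequence shared by the 32-bit AMOs can be sketched as a single-hart C model. It mirrors the shape of the Spike lambdas but is not atomic and is not the Spike code: `amo_w`, `amo_op32`, and the `op_*` helpers are hypothetical names for this example.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on signed casts. */
static inline uint64_t sext32(uint32_t x) {
    return (uint64_t)x | ((x & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0ull);
}

/* Binary operator applied to (value loaded from memory, low 32 bits of rs2). */
typedef uint32_t (*amo_op32)(uint32_t loaded, uint32_t src);

static inline uint32_t op_add(uint32_t loaded, uint32_t src) {
    return loaded + src;
}

static inline uint32_t op_maxu(uint32_t loaded, uint32_t src) {
    return loaded > src ? loaded : src;
}

static inline uint32_t op_max(uint32_t loaded, uint32_t src) {
    /* Flipping the sign bit maps signed order onto unsigned order. */
    return ((loaded ^ 0x80000000u) > (src ^ 0x80000000u)) ? loaded : src;
}

/* Model of a 32-bit AMO on RV64 (single hart, so no real atomicity):
 * returns the value for rd, updates the memory word in place. */
static inline uint64_t amo_w(uint32_t *addr, uint64_t rs2, amo_op32 op) {
    uint32_t old = *addr;
    *addr = op(old, (uint32_t)rs2);   /* upper 32 bits of rs2 are ignored */
    return sext32(old);               /* rd receives the old value, sign-extended */
}
```

The value returned for rd is always the original memory contents, never the result of the operator, which is what makes AMOs usable as fetch-and-op primitives.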

a / sec:lrsc

9 “A” Standard Extension for Atomic Instructions, Version 2.1 / 9.2 Load-Reserved/Store-Conditional Instructions

Operation	Arguments	Description
lr.d	rd, rs1	Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation: require_extension('A'); require_rv64; WRITE_RD(MMU.load_reserved<int64_t>(RS1));
lr.w	rd, rs1	Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation: require_extension('A'); WRITE_RD(MMU.load_reserved<int32_t>(RS1));
sc.d	rd, rs1, rs2	Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation: require_extension('A'); require_rv64; bool have_reservation = MMU.store_conditional<uint64_t>(RS1, RS2); WRITE_RD(!have_reservation);
sc.w	rd, rs1, rs2	Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation: require_extension('A'); bool have_reservation = MMU.store_conditional<uint32_t>(RS1, RS2); WRITE_RD(!have_reservation);
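
The reservation rules can be illustrated with a toy single-hart model in C. It tracks one reservation by address only, which is a simplification of the reservation-set wording above; `reservation_t`, `lr_w`, `sc_w`, and `increment_with_lrsc` are names made up for this sketch.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits without relying on signed casts. */
static inline uint64_t sext32(uint32_t x) {
    return (uint64_t)x | ((x & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0ull);
}

/* One reservation per hart; the reservation set is reduced to one word. */
typedef struct {
    int valid;
    const uint32_t *addr;
} reservation_t;

/* LR.W: load the word, sign-extend into rd, register a reservation. */
static inline uint64_t lr_w(reservation_t *r, const uint32_t *addr) {
    r->valid = 1;
    r->addr = addr;
    return sext32(*addr);
}

/* SC.W: returns 0 on success, nonzero on failure. */
static inline uint64_t sc_w(reservation_t *r, uint32_t *addr, uint64_t rs2) {
    int ok = r->valid && r->addr == addr;
    r->valid = 0;                     /* invalidated whether or not it succeeds */
    if (ok)
        *addr = (uint32_t)rs2;        /* memory is written only on success */
    return ok ? 0u : 1u;
}

/* Typical retry loop: repeat LR/SC until the store succeeds. */
static inline void increment_with_lrsc(reservation_t *r, uint32_t *addr) {
    uint64_t old;
    do {
        old = lr_w(r, addr);
    } while (sc_w(r, addr, old + 1) != 0);
}
```

In this single-hart model the loop always succeeds on the first pass; on real hardware another hart's store to the reserved bytes would make the SC fail and force a retry.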

c

compressed	control transfer instructions	integer constant generation instructions	integer register immediate operations	integer register register operations
load and store instructions	nop instruction	stack pointer based loads and stores

c / compressed

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.8 RVC Instruction Set Listings

Operation	Arguments	Description
c.slli_rv32	rd_rs1_n0, c_nzuimm6lo
c.srai_rv32	rd_rs1_p, c_nzuimm5
c.srli_rv32	rd_rs1_p, c_nzuimm5

c / control-transfer-instructions

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.4 Control Transfer Instructions

Operation	Arguments	Description
c.beqz	rs1_p, c_bimm9	C.BEQZ performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. It can therefore target a ±256 B range. C.BEQZ takes the branch if the value in register rs1' is zero. It expands to beq rs1', x0, offset. Spike ISS Implementation: require_extension(EXT_ZCA); if (RVC_RS1S == 0) set_pc(pc + insn.rvc_b_imm());
c.bnez	rs1_p, c_bimm9	C.BNEZ is defined analogously, but it takes the branch if rs1' contains a nonzero value. It expands to bne rs1', x0, offset. Spike ISS Implementation: require_extension(EXT_ZCA); if (RVC_RS1S != 0) set_pc(pc + insn.rvc_b_imm());
c.j	c_imm12	C.J performs an unconditional control transfer. The offset is sign-extended and added to the pc to form the jump target address. C.J can therefore target a ±2 KiB range. C.J expands to jal x0, offset. C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JAL expands to jal x1, offset. Spike ISS Implementation: require_extension(EXT_ZCA); set_pc(pc + insn.rvc_j_imm());
c.jal	c_imm12	C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JAL expands to jal x1, offset. Spike ISS Implementation: require_extension(EXT_ZCA); if (xlen == 32) { reg_t tmp = npc; set_pc(pc + insn.rvc_j_imm()); WRITE_REG(X_RA, tmp); } else { // c.addiw require(insn.rvc_rd() != 0); WRITE_RD(sext32(RVC_RS1 + insn.rvc_imm())); }
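
The ±256 B and ±2 KiB ranges follow directly from sign-extending a 9-bit and a 12-bit byte offset. The sketch below assumes the offset bits have already been reassembled from the instruction encoding into a contiguous field; `sext_bits`, `c_branch_target`, and `c_jump_target` are hypothetical helper names.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` (bits between 1 and 31). */
static inline int32_t sext_bits(uint32_t value, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    value &= (sign << 1) - 1u;                /* keep only the field */
    return (int32_t)(value ^ sign) - (int32_t)sign;
}

/* C.BEQZ / C.BNEZ: 9-bit signed byte offset added to the pc. */
static inline uint64_t c_branch_target(uint64_t pc, uint32_t offset9) {
    return pc + (uint64_t)(int64_t)sext_bits(offset9, 9);
}

/* C.J / C.JAL: 12-bit signed byte offset added to the pc. */
static inline uint64_t c_jump_target(uint64_t pc, uint32_t offset12) {
    return pc + (uint64_t)(int64_t)sext_bits(offset12, 12);
}
```

Because bit 0 of the offset is always zero, the reachable targets run from -256 to +254 bytes for the branches and from -2048 to +2046 bytes for the jumps.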

c / integer-constant-generation-instructions

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation	Arguments	Description
c.addi16sp	c_nzimm10	C.LUI loads the non-zero 6-bit immediate field into bits 17-12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into lui rd, nzimm. C.LUI is only valid when rd ∉ {x0, x2}, and when the immediate is not equal to zero. The code points with nzimm=0 are reserved; the remaining code points with rd=x0 are HINTs; and the remaining code points with rd=x2 correspond to the C.ADDI16SP instruction.
c.li	rd, c_imm6	C.LI loads the sign-extended 6-bit immediate, imm, into register rd. C.LI expands into addi rd, x0, imm. C.LI is only valid when rd ≠ x0; the code points with rd=x0 encode HINTs. Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RD(insn.rvc_imm());
c.lui	rd_n2, c_nzimm18	C.LUI loads the non-zero 6-bit immediate field into bits 17-12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into lui rd, nzimm. C.LUI is only valid when rd ∉ {x0, x2}, and when the immediate is not equal to zero. The code points with nzimm=0 are reserved; the remaining code points with rd=x0 are HINTs; and the remaining code points with rd=x2 correspond to the C.ADDI16SP instruction. Spike ISS Implementation: require_extension(EXT_ZCA); if (insn.rvc_rd() == 2) { // c.addi16sp require(insn.rvc_addi16sp_imm() != 0); WRITE_REG(X_SP, sext_xlen(RVC_SP + insn.rvc_addi16sp_imm())); } else { require(insn.rvc_imm() != 0); WRITE_RD(insn.rvc_imm() << 12); }
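
How the 6-bit immediate becomes a register value for C.LI and C.LUI can be sketched in C. The function names are illustrative, the immediate is passed as the raw 6-bit field, and the reserved nzimm=0 case for C.LUI is deliberately not checked.

```c
#include <stdint.h>

/* Sign-extend a raw 6-bit field to a signed 64-bit value. */
static inline int64_t sext6(uint32_t imm6) {
    imm6 &= 0x3Fu;
    return (int64_t)(imm6 ^ 0x20u) - 0x20;
}

/* C.LI: rd = sign-extended 6-bit immediate. */
static inline int64_t c_li(uint32_t imm6) {
    return sext6(imm6);
}

/* C.LUI: immediate lands in bits 17:12, bottom 12 bits are zero,
 * and bit 17 is copied into all higher bits. */
static inline int64_t c_lui(uint32_t nzimm6) {
    return sext6(nzimm6) * 4096;
}
```

Multiplying by 4096 rather than shifting avoids left-shifting a negative value, which C leaves undefined; the result is the same as the `<< 12` in the Spike snippet.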

c / integer-register-immediate-operations

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation	Arguments	Description
c.addi	rd_rs1_n0, c_nzimm6, c_nzimm6	C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in register rd then writes the result to rd. C.ADDI expands into addi rd, rd, nzimm. C.ADDI is only valid when rd ≠ x0 and nzimm ≠ 0. The code points with rd=x0 encode the C.NOP instruction; the remaining code points with nzimm=0 encode HINTs. Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RD(sext_xlen(RVC_RS1 + insn.rvc_imm()));
c.addi4spn	rd_p, c_nzuimm10	C.ADDI4SPN is a CIW-format instruction that adds a zero-extended non-zero immediate, scaled by 4, to the stack pointer, x2, and writes the result to rd'. This instruction is used to generate pointers to stack-allocated variables, and expands to addi rd', x2, nzuimm. C.ADDI4SPN is only valid when nzuimm ≠ 0; the code points with nzuimm=0 are reserved. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_addi4spn_imm() != 0); WRITE_RVC_RS2S(sext_xlen(RVC_SP + insn.rvc_addi4spn_imm()));
c.addiw	rd_rs1_n0, c_imm6	C.ADDIW is an RV64C/RV128C-only instruction that performs the same computation but produces a 32-bit result, then sign-extends the result to 64 bits. C.ADDIW expands into addiw rd, rd, imm. The immediate can be zero for C.ADDIW, where this corresponds to sext.w rd. C.ADDIW is only valid when rd ≠ x0; the code points with rd=x0 are reserved.
c.andi	rd_rs1_p, c_imm6	C.ANDI is a CB-format instruction that computes the bitwise AND of the value in register rd' and the sign-extended 6-bit immediate, then writes the result to rd'. C.ANDI expands to andi rd', rd', imm. Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S & insn.rvc_imm());
c.slli	rd_rs1_n0, c_nzuimm6	C.SLLI is a CI-format instruction that performs a logical left shift of the value in register rd then writes the result to rd. The shift amount is encoded in the shamt field. For RV128C, a shift amount of zero is used to encode a shift of 64. C.SLLI expands into slli rd, rd, shamt, except for RV128C with shamt=0, which expands to slli rd, rd, 64. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_zimm() < xlen); WRITE_RD(sext_xlen(RVC_RS1 << insn.rvc_zimm()));
c.srai	rd_rs1_p, c_nzuimm6	C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic right shift. C.SRAI expands to srai rd', rd', shamt. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_zimm() < xlen); WRITE_RVC_RS1S(sext_xlen(sext_xlen(RVC_RS1S) >> insn.rvc_zimm()));
c.srli	rd_rs1_p, c_nzuimm6	C.SRLI is a CB-format instruction that performs a logical right shift of the value in register rd' then writes the result to rd'. The shift amount is encoded in the shamt field. For RV128C, a shift amount of zero is used to encode a shift of 64. Furthermore, the shift amount is sign-extended for RV128C, and so the legal shift amounts are 1-31, 64, and 96-127. C.SRLI expands into srli rd', rd', shamt, except for RV128C with shamt=0, which expands to srli rd', rd', 64. C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic right shift. C.SRAI expands to srai rd', rd', shamt. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_zimm() < xlen); WRITE_RVC_RS1S(sext_xlen(zext_xlen(RVC_RS1S) >> insn.rvc_zimm()));
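
The difference between the logical and arithmetic shifts in the table above can be sketched in plain C. This is a minimal model for RV64 only (XLEN = 64), not the Spike code; the function names are hypothetical, and the RV128C special cases for shamt are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modeling RV64 semantics (XLEN = 64 is an assumption). */

/* C.SLLI: logical left shift; low bits are filled with zeros. */
uint64_t c_slli(uint64_t rd, unsigned shamt)
{
    assert(shamt < 64); /* mirrors require(insn.rvc_zimm() < xlen) */
    return rd << shamt;
}

/* C.SRLI: logical right shift; vacated upper bits are filled with zeros. */
uint64_t c_srli(uint64_t rd, unsigned shamt)
{
    assert(shamt < 64);
    return rd >> shamt;
}

/* C.SRAI: arithmetic right shift; vacated upper bits copy the sign bit. */
uint64_t c_srai(uint64_t rd, unsigned shamt)
{
    assert(shamt < 64);
    uint64_t result = rd >> shamt;
    /* Build the sign fill by hand, because right-shifting a negative
       signed value is implementation-defined in C. */
    if ((rd >> 63) != 0 && shamt > 0)
        result |= ~UINT64_C(0) << (64 - shamt);
    return result;
}
```

With the top bit set, c_srli clears the vacated bits while c_srai fills them with ones, which is the only difference between the two instructions.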

c / integer-register-register-operations

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation	Arguments	Description
c.add	rd_rs1, c_rs2_n0	C.ADD adds the values in registers rd and rs2 and writes the result to register rd. C.ADD expands into add rd, rd, rs2. C.ADD is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JALR and C.EBREAK instructions. The code points with rs2 ≠ x0 and rd = x0 are HINTs. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_rs2() != 0); WRITE_RD(sext_xlen(RVC_RS1 + RVC_RS2));
c.addw	rd_rs1_p, rs2_p	C.ADDW is an RV64C/RV128C-only instruction that adds the values in registers rd' and rs2', then sign-extends the lower 32 bits of the sum before writing the result to register rd'. C.ADDW expands into addw rd', rd', rs2'. Spike ISS Implementation: require_extension(EXT_ZCA); require_rv64; WRITE_RVC_RS1S(sext32(RVC_RS1S + RVC_RS2S));
c.and	rd_rs1_p, rs2_p	C.AND computes the bitwise AND of the values in registers rd' and rs2', then writes the result to register rd'. C.AND expands into and rd', rd', rs2'. Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S & RVC_RS2S);
c.ebreak		Debuggers can use the C.EBREAK instruction, which expands to ebreak, to cause control to be transferred back to the debugging environment. C.EBREAK shares the opcode with the C.ADD instruction, but with rd and rs2 both zero, and can therefore also use the CR format. Spike ISS Implementation: require_extension(EXT_ZCA); if (!STATE.debug_mode && ( (!STATE.v && STATE.prv == PRV_M && STATE.dcsr->ebreakm) \|\| (!STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreaks) \|\| (!STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreaku) \|\| (STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreakvs) \|\| (STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreakvu))) { throw trap_debug_mode(); } else { throw trap_breakpoint(STATE.v, pc); }
c.jalr	c_rs1_n0	C.JALR (jump and link register) performs the same operation as C.JR, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JALR expands to jalr x1, 0(rs1). C.JALR is only valid when rs1 ≠ x0; the code point with rs1 = x0 corresponds to the C.EBREAK instruction. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_rs1() != 0); reg_t tmp = npc; set_pc(RVC_RS1 & ~reg_t(1)); WRITE_REG(X_RA, tmp);
c.jr	rs1_n0	C.JR (jump register) performs an unconditional control transfer to the address in register rs1. C.JR expands to jalr x0, 0(rs1). C.JR is only valid when rs1 ≠ x0; the code point with rs1 = x0 is reserved. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_rs1() != 0); set_pc(RVC_RS1 & ~reg_t(1));
c.mv	rd, c_rs2_n0	C.MV copies the value in register rs2 into register rd. C.MV expands into add rd, x0, rs2. C.MV is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JR instruction. The code points with rs2 ≠ x0 and rd = x0 are HINTs. C.MV expands to a different instruction than the canonical MV pseudoinstruction, which instead uses ADDI. Implementations that handle MV specially, e.g. using register-renaming hardware, may find it more convenient to expand C.MV to MV instead of ADD, at slight additional hardware cost. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_rs2() != 0); WRITE_RD(RVC_RS2);
c.or	rd_rs1_p, rs2_p	C.OR computes the bitwise OR of the values in registers rd' and rs2', then writes the result to register rd'. C.OR expands into or rd', rd', rs2'. Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S \| RVC_RS2S);
c.sub	rd_rs1_p, rs2_p	C.SUB subtracts the value in register rs2' from the value in register rd', then writes the result to register rd'. C.SUB expands into sub rd', rd', rs2'. Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RVC_RS1S(sext_xlen(RVC_RS1S - RVC_RS2S));
c.subw	rd_rs1_p, rs2_p	C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in register rs2' from the value in register rd', then sign-extends the lower 32 bits of the difference before writing the result to register rd'. C.SUBW expands into subw rd', rd', rs2'. Spike ISS Implementation: require_extension(EXT_ZCA); require_rv64; WRITE_RVC_RS1S(sext32(RVC_RS1S - RVC_RS2S));
c.xor	rd_rs1_p, rs2_p	C.XOR computes the bitwise XOR of the values in registers rd' and rs2', then writes the result to register rd'. C.XOR expands into xor rd', rd', rs2'. Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S ^ RVC_RS2S);
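
C.ADDW and C.SUBW differ from C.ADD and C.SUB only in the final step: the low 32 bits of the result are sign-extended to 64 bits. A minimal sketch of that step in plain C, assuming RV64; the helper below plays the role that sext32 plays in the Spike snippets, but it is written here for illustration and is not Spike's definition.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of x to 64 bits (illustrative helper). */
uint64_t sext32(uint64_t x)
{
    uint64_t low = x & UINT64_C(0xFFFFFFFF);
    /* Flip bit 31, then subtract it back: this propagates bit 31 upward. */
    return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* C.ADDW model: 32-bit sum, sign-extended to 64 bits. */
uint64_t c_addw(uint64_t rd, uint64_t rs2)
{
    return sext32(rd + rs2);
}

/* C.SUBW model: 32-bit difference, sign-extended to 64 bits. */
uint64_t c_subw(uint64_t rd, uint64_t rs2)
{
    return sext32(rd - rs2);
}
```

Bits 63..32 of the inputs never influence the result, so a 32-bit overflow wraps to a negative 64-bit value instead of carrying into the upper half.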

c / load-and-store-instructions

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.3 Load and Store Instructions

Operation	Arguments	Description
c.fld	rd_p, rs1_p, c_uimm8	C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd'. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to fld rd', offset(rs1'). Spike ISS Implementation: require_extension(EXT_ZCD); require_fp; WRITE_RVC_FRS2S(f64(MMU.load<uint64_t>(RVC_RS1S + insn.rvc_ld_imm())));
c.flw	rd_p, rs1_p, c_uimm7	C.FLW is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to flw rd', offset(rs1'). Spike ISS Implementation: if (xlen == 32) { require_extension(EXT_ZCF); require_fp; WRITE_RVC_FRS2S(f32(MMU.load<uint32_t>(RVC_RS1S + insn.rvc_lw_imm()))); } else { // c.ld require_extension(EXT_ZCA); WRITE_RVC_RS2S(MMU.load<int64_t>(RVC_RS1S + insn.rvc_ld_imm())); }
c.fsd	rs1_p, rs2_p, c_uimm8	C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to fsd rs2', offset(rs1'). Spike ISS Implementation: require_extension(EXT_ZCD); require_fp; MMU.store<uint64_t>(RVC_RS1S + insn.rvc_ld_imm(), RVC_FRS2S.v[0]);
c.fsw	rs1_p, rs2_p, c_uimm7	C.FSW is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to fsw rs2', offset(rs1'). Spike ISS Implementation: if (xlen == 32) { require_extension(EXT_ZCF); require_fp; MMU.store<uint32_t>(RVC_RS1S + insn.rvc_lw_imm(), RVC_FRS2S.v[0]); } else { // c.sd require_extension(EXT_ZCA); MMU.store<uint64_t>(RVC_RS1S + insn.rvc_ld_imm(), RVC_RS2S); }
c.ld	rd_p, rs1_p, c_uimm8	C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to ld rd', offset(rs1').
c.lw	rd_p, rs1_p, c_uimm7	C.LW loads a 32-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to lw rd', offset(rs1'). Spike ISS Implementation: require_extension(EXT_ZCA); WRITE_RVC_RS2S(MMU.load<int32_t>(RVC_RS1S + insn.rvc_lw_imm()));
c.sd	rs1_p, rs2_p, c_uimm8	C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to sd rs2', offset(rs1').
c.sw	rs1_p, rs2_p, c_uimm7	C.SW stores a 32-bit value in register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to sw rs2', offset(rs1'). Spike ISS Implementation: require_extension(EXT_ZCA); MMU.store<uint32_t>(RVC_RS1S + insn.rvc_lw_imm(), RVC_RS2S);
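
The "zero-extended offset, scaled by 4 or 8" wording above limits which offsets these register-based forms can encode. A minimal sketch of that check, assuming the immediate is a 5-bit unsigned field before scaling (so 0..124 for word accesses and 0..248 for doubleword accesses); all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the immediate holds 5 unsigned bits before scaling. */
#define CL_UIMM_SLOTS 32

/* True if `offset` is a non-negative multiple of `scale` that fits the field. */
bool cl_offset_encodable(int64_t offset, int64_t scale)
{
    if (offset < 0)
        return false; /* zero-extended: negative offsets cannot be encoded */
    if (offset % scale != 0)
        return false; /* the low bits are implied zero by the scaling */
    return offset / scale < CL_UIMM_SLOTS;
}

/* Word-sized forms (C.LW, C.SW, C.FLW, C.FSW) scale by 4. */
bool c_lw_offset_ok(int64_t offset) { return cl_offset_encodable(offset, 4); }

/* Doubleword-sized forms (C.LD, C.SD, C.FLD, C.FSD) scale by 8. */
bool c_ld_offset_ok(int64_t offset) { return cl_offset_encodable(offset, 8); }

/* Effective address: base register plus the zero-extended offset. */
uint64_t cl_effective_address(uint64_t rs1, uint64_t offset)
{
    return rs1 + offset;
}
```

An assembler that cannot encode an offset this way would typically fall back to the uncompressed lw/ld forms, which take a signed 12-bit immediate.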

c / nop-instruction

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions

Operation	Arguments	Description
c.nop	c_nzimm6	C.NOP is a CI-format instruction that does not change any user-visible state, except for advancing the pc and incrementing any applicable performance counters. C.NOP expands to nop. C.NOP is only valid when imm=0; the code points with imm ≠ 0 encode HINTs.

c / stack-pointer-based-loads-and-stores

17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.3 Load and Store Instructions

Operation	Arguments	Description
c.fldsp	rd, c_uimm9sp	C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to fld rd, offset(x2). Spike ISS Implementation: require_extension(EXT_ZCD); require_fp; WRITE_FRD(f64(MMU.load<uint64_t>(RVC_SP + insn.rvc_ldsp_imm())));
c.flwsp	rd, c_uimm8sp	C.FLWSP is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd. It computes its effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to flw rd, offset(x2). Spike ISS Implementation: if (xlen == 32) { require_extension(EXT_ZCF); require_fp; WRITE_FRD(f32(MMU.load<uint32_t>(RVC_SP + insn.rvc_lwsp_imm()))); } else { // c.ldsp require_extension(EXT_ZCA); require(insn.rvc_rd() != 0); WRITE_RD(MMU.load<int64_t>(RVC_SP + insn.rvc_ldsp_imm())); }
c.fsdsp	c_rs2, c_uimm9sp_s	C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to fsd rs2, offset(x2). Spike ISS Implementation: require_extension(EXT_ZCD); require_fp; MMU.store<uint64_t>(RVC_SP + insn.rvc_sdsp_imm(), RVC_FRS2.v[0]);
c.fswsp	c_rs2, c_uimm8sp_s	C.FSWSP is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to fsw rs2, offset(x2). Spike ISS Implementation: if (xlen == 32) { require_extension(EXT_ZCF); require_fp; MMU.store<uint32_t>(RVC_SP + insn.rvc_swsp_imm(), RVC_FRS2.v[0]); } else { // c.sdsp require_extension(EXT_ZCA); MMU.store<uint64_t>(RVC_SP + insn.rvc_sdsp_imm(), RVC_RS2); }
c.ldsp	rd_n0, c_uimm9sp	C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register rd. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to ld rd, offset(x2). C.LDSP is only valid when rd ≠ x0; the code points with rd = x0 are reserved.
c.lwsp	rd_n0, c_uimm8sp	C.LWSP loads a 32-bit value from memory into register rd. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to lw rd, offset(x2). C.LWSP is only valid when rd ≠ x0; the code points with rd = x0 are reserved. Spike ISS Implementation: require_extension(EXT_ZCA); require(insn.rvc_rd() != 0); WRITE_RD(MMU.load<int32_t>(RVC_SP + insn.rvc_lwsp_imm()));
c.sdsp	c_rs2, c_uimm9sp_s	C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to sd rs2, offset(x2).
c.swsp	c_rs2, c_uimm8sp_s	C.SWSP stores a 32-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to sw rs2, offset(x2). Spike ISS Implementation: require_extension(EXT_ZCA); MMU.store<uint32_t>(RVC_SP + insn.rvc_swsp_imm(), RVC_RS2);
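
For the stack-pointer-based loads, the scaled offset is scattered across the 16-bit instruction word. The sketch below decodes it for C.LWSP and C.LDSP. It assumes the CI-format bit layout from the C extension chapter (for C.LWSP: offset[5] in bit 12, offset[4:2] in bits 6:4, offset[7:6] in bits 3:2; for C.LDSP: offset[5] in bit 12, offset[4:3] in bits 6:5, offset[8:6] in bits 4:2). The function names are hypothetical and this is not the Spike decoder.

```c
#include <stdint.h>

/* Extract `len` bits of `insn` starting at bit `lo` (illustrative helper). */
uint32_t insn_bits(uint16_t insn, unsigned lo, unsigned len)
{
    return ((uint32_t)insn >> lo) & ((1u << len) - 1u);
}

/* C.LWSP: byte offset from sp, always a multiple of 4 in the range 0..252. */
uint32_t c_lwsp_offset(uint16_t insn)
{
    return (insn_bits(insn, 4, 3) << 2)   /* offset[4:2] */
         | (insn_bits(insn, 12, 1) << 5)  /* offset[5]   */
         | (insn_bits(insn, 2, 2) << 6);  /* offset[7:6] */
}

/* C.LDSP: byte offset from sp, always a multiple of 8 in the range 0..504. */
uint32_t c_ldsp_offset(uint16_t insn)
{
    return (insn_bits(insn, 5, 2) << 3)   /* offset[4:3] */
         | (insn_bits(insn, 12, 1) << 5)  /* offset[5]   */
         | (insn_bits(insn, 2, 3) << 6);  /* offset[8:6] */
}
```

The low offset bits are never stored, which is what "scaled by 4" and "scaled by 8" mean at the encoding level.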

d

d standard extension for double precision floating point version 2.2	double precision floating point conversion and move instructions	fld fsd	sec:single float compute	single precision floating point compare instructions
single precision floating point conversion and move instructions

d / d-standard-extension-for-double-precision-floating-point-version-2.2

13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.7 Double-Precision Floating-Point Classify Instruction

Operation	Arguments	Description
fclass.d	rd, rs1	The double-precision floating-point classify instruction, FCLASS.D, is defined analogously to its single-precision counterpart, but operates on double-precision operands. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_classify(FRS1_D));
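
As a reference for what FCLASS.D writes to rd, the sketch below classifies a raw IEEE 754 double-precision bit pattern. It assumes the one-hot result mask defined for the single-precision FCLASS.S (bit 0 for negative infinity up to bit 9 for quiet NaN), which FCLASS.D is stated to follow. The function name is hypothetical, and it takes the bit pattern rather than a double so that signaling NaNs are not altered in transit.

```c
#include <stdint.h>

/* Returns the 10-bit one-hot class mask for a double-precision bit pattern. */
uint64_t fclass_d_bits(uint64_t bits)
{
    unsigned sign = (unsigned)(bits >> 63);
    uint64_t exponent = (bits >> 52) & 0x7FF;
    uint64_t fraction = bits & UINT64_C(0x000FFFFFFFFFFFFF);

    if (exponent == 0x7FF) {
        if (fraction == 0)
            return sign ? 1u << 0 : 1u << 7;      /* -inf : +inf */
        return (fraction >> 51) ? 1u << 9         /* quiet NaN */
                                : 1u << 8;        /* signaling NaN */
    }
    if (exponent == 0) {
        if (fraction == 0)
            return sign ? 1u << 3 : 1u << 4;      /* -0 : +0 */
        return sign ? 1u << 2 : 1u << 5;          /* subnormals */
    }
    return sign ? 1u << 1 : 1u << 6;              /* normal numbers */
}
```

Exactly one bit is set in every result, so callers can test for a group of classes with a single AND against a mask.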

d / double-precision-floating-point-conversion-and-move-instructions

13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.5 Double-Precision Floating-Point Conversion and Move Instructions

Operation	Arguments	Description
fcvt.d.l	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
fcvt.d.lu	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
fcvt.d.s	rd, rs1	The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers. The rs2 field encodes the datatype of the source, and the fmt field encodes the datatype of the destination. FCVT.S.D rounds according to the RM field; FCVT.D.S will never round.
fcvt.d.w	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. All floating-point to integer and integer to floating-point conversion instructions round according to the rm field. Note FCVT.D.W[U] always produces an exact result and is unaffected by rounding mode.
fcvt.d.wu	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
fcvt.l.d	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
fcvt.lu.d	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
fcvt.s.d	rd, rs1	The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers. The rs2 field encodes the datatype of the source, and the fmt field encodes the datatype of the destination. FCVT.S.D rounds according to the RM field; FCVT.D.S will never round.
fcvt.w.d	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
fcvt.wu.d	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S.
fmv.d	rd, rs	For XLEN>=64 only, instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd. FMV.D.X moves the double-precision value encoded in IEEE 754-2008 standard encoding from the integer register rs1 to the floating-point register rd. FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Pseudo Opcode, Equivalent Operations: fsgnj.d rd, rs, rs
fmv.x.d	rd, rs1	For XLEN>=64 only, instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd. FMV.D.X moves the double-precision value encoded in IEEE 754-2008 standard encoding from the integer register rs1 to the floating-point register rd. FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
fsgnj.d	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), false, false));
fsgnjn.d	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), true, false));
fsgnjx.d	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), false, true));
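
The sign-injection rows above, and the fmv.d pseudo opcode that expands to fsgnj.d rd, rs, rs, can be modeled on raw bit patterns. This is a minimal sketch assuming the single-precision definition carries over unchanged: the result takes every bit except the sign from rs1, and the sign comes from rs2 (FSGNJ), the inverse of rs2's sign (FSGNJN), or the XOR of both signs (FSGNJX). Function names are hypothetical.

```c
#include <stdint.h>

#define F64_SIGN_BIT UINT64_C(0x8000000000000000)

/* FSGNJ.D: magnitude of rs1, sign of rs2. */
uint64_t fsgnj_d(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & ~F64_SIGN_BIT) | (rs2 & F64_SIGN_BIT);
}

/* FSGNJN.D: magnitude of rs1, inverted sign of rs2. */
uint64_t fsgnjn_d(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & ~F64_SIGN_BIT) | (~rs2 & F64_SIGN_BIT);
}

/* FSGNJX.D: magnitude of rs1, sign of rs1 XOR sign of rs2. */
uint64_t fsgnjx_d(uint64_t rs1, uint64_t rs2)
{
    return rs1 ^ (rs2 & F64_SIGN_BIT);
}
```

Passing the same register twice gives a move with fsgnj_d, a negation with fsgnjn_d, and an absolute value with fsgnjx_d. None of the three touches the exponent or fraction, so NaN payloads pass through unchanged.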

d / fld_fsd

13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.3 Double-Precision Load and Store Instructions

Operation	Arguments	Description
fld	rd, rs1, imm12	The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd. FSD stores a double-precision value from the floating-point registers to memory. FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN>=64. FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation: require_extension('D'); require_fp; WRITE_FRD(f64(MMU.load<uint64_t>(RS1 + insn.i_imm())));
fsd	rs1, rs2, imm12	The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd. FSD stores a double-precision value from the floating-point registers to memory. FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN>=64. FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation: require_extension('D'); require_fp; MMU.store<uint64_t>(RS1 + insn.s_imm(), FRS2.v[0]);
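
The two properties stated for FLD and FSD, an effective address of rs1 plus the sign-extended 12-bit immediate and a bit-exact transfer, can be sketched over a flat byte array. This is an illustrative model only: the memory array, the function names, and the absence of alignment or bounds checks are all assumptions, and atomicity is not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Sign-extend a 12-bit immediate field to 64 bits. */
int64_t sext12(uint32_t imm12)
{
    imm12 &= 0xFFF;
    return (int64_t)(imm12 ^ 0x800) - 0x800;
}

/* FSD model: copy the 64-bit pattern unchanged to mem[base + sext(imm12)]. */
void fsd_model(uint8_t *mem, uint64_t base, uint32_t imm12, uint64_t frs2_bits)
{
    memcpy(mem + base + sext12(imm12), &frs2_bits, sizeof frs2_bits);
}

/* FLD model: read the 64-bit pattern back without interpreting it. */
uint64_t fld_model(const uint8_t *mem, uint64_t base, uint32_t imm12)
{
    uint64_t bits;
    memcpy(&bits, mem + base + sext12(imm12), sizeof bits);
    return bits;
}
```

Because the value is moved as an opaque 64-bit pattern, a NaN with a non-canonical payload stored by fsd_model comes back from fld_model with the same payload.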

d / sec:single-float-compute

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions

Operation	Arguments	Description
fadd.d	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_add(FRS1_D, FRS2_D)); set_fp_exceptions;
fdiv.d	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_div(FRS1_D, FRS2_D)); set_fp_exceptions;
fmadd.d	rd, rs1, rs2, rs3	FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(FRS1_D, FRS2_D, FRS3_D)); set_fp_exceptions;
fmax.d	rd, rs1, rs2	Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; bool greater = f64_lt_quiet(FRS2_D, FRS1_D) \|\| (f64_eq(FRS2_D, FRS1_D) && (FRS2_D.v & F64_SIGN)); if (isNaNF64UI(FRS1_D.v) && isNaNF64UI(FRS2_D.v)) WRITE_FRD_D(f64(defaultNaNF64UI)); else WRITE_FRD_D((greater \|\| isNaNF64UI(FRS2_D.v) ? FRS1_D : FRS2_D)); set_fp_exceptions;
fmin.d	rd, rs1, rs2	Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; bool less = f64_lt_quiet(FRS1_D, FRS2_D) \|\| (f64_eq(FRS1_D, FRS2_D) && (FRS1_D.v & F64_SIGN)); if (isNaNF64UI(FRS1_D.v) && isNaNF64UI(FRS2_D.v)) WRITE_FRD_D(f64(defaultNaNF64UI)); else WRITE_FRD_D((less \|\| isNaNF64UI(FRS2_D.v) ? FRS1_D : FRS2_D)); set_fp_exceptions;
fmsub.d	rd, rs1, rs2, rs3	FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(FRS1_D, FRS2_D, f64(FRS3_D.v ^ F64_SIGN))); set_fp_exceptions;
fmul.d	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mul(FRS1_D, FRS2_D)); set_fp_exceptions;
fnmadd.d	rd, rs1, rs2, rs3	FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(f64(FRS1_D.v ^ F64_SIGN), FRS2_D, f64(FRS3_D.v ^ F64_SIGN))); set_fp_exceptions;
fnmsub.d	rd, rs1, rs2, rs3	FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(f64(FRS1_D.v ^ F64_SIGN), FRS2_D, FRS3_D)); set_fp_exceptions;
fsqrt.d	rd, rs1	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_sqrt(FRS1_D)); set_fp_exceptions;
fsub.d	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_sub(FRS1_D, FRS2_D)); set_fp_exceptions;
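
The sign conventions of the four-operand instructions above are easy to mix up, so here is a minimal sketch in portable C. The `model_*` names are hypothetical helpers, not part of any toolchain, and plain C arithmetic may round the product and the sum separately, whereas the hardware instructions are fused and round once. Treat it as an illustration of the signs only.

```c
/* Hypothetical models of the four-operand D instructions.
 * Assumption: these only illustrate the sign conventions; the real
 * instructions are fused and apply a single rounding step. */

/* fmadd.d:  (rs1 x rs2) + rs3 */
static double model_fmadd_d(double rs1, double rs2, double rs3)
{
    return (rs1 * rs2) + rs3;
}

/* fmsub.d:  (rs1 x rs2) - rs3 */
static double model_fmsub_d(double rs1, double rs2, double rs3)
{
    return (rs1 * rs2) - rs3;
}

/* fnmsub.d: -(rs1 x rs2) + rs3 */
static double model_fnmsub_d(double rs1, double rs2, double rs3)
{
    return -(rs1 * rs2) + rs3;
}

/* fnmadd.d: -(rs1 x rs2) - rs3 */
static double model_fnmadd_d(double rs1, double rs2, double rs3)
{
    return -(rs1 * rs2) - rs3;
}
```

Note that the "n" variants negate the product, not the whole expression, which is why FNMSUB adds rs3 and FNMADD subtracts it.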

d / single-precision-floating-point-compare-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions

Operation	Arguments	Description
feq.d	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_eq(FRS1_D, FRS2_D)); set_fp_exceptions;
fle.d	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_le(FRS1_D, FRS2_D)); set_fp_exceptions;
flt.d	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_lt(FRS1_D, FRS2_D)); set_fp_exceptions;
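
As a rough illustration of the value written to rd by the compare instructions, the sketch below models feq.d, flt.d and fle.d with the ordinary C comparison operators. The helper names are hypothetical, the exception flags are not modeled, and the code assumes the compiler follows IEEE 754 comparison semantics (for example, no `-ffast-math`).

```c
#include <stdint.h>
#include <string.h>

/* Build a double from its raw bit pattern (hypothetical helper). */
static double f64_from_bits(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Result written to integer register rd: 1 if the condition holds,
 * 0 otherwise. Any comparison involving a NaN yields 0. */
static int model_feq_d(double rs1, double rs2) { return rs1 == rs2; }
static int model_flt_d(double rs1, double rs2) { return rs1 <  rs2; }
static int model_fle_d(double rs1, double rs2) { return rs1 <= rs2; }
```

A quiet NaN can be produced for experiments with `f64_from_bits(0x7FF8000000000000)`; all three models return 0 when it appears as either operand.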

d / single-precision-floating-point-conversion-and-move-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.7 Single-Precision Floating-Point Conversion and Move Instructions

Operation	Arguments	Description
fabs.d	rd, rs	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations: fsgnjx.d rd, rs, rs
fmv.d.x	rd, rs1	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.
fneg.d	rd, rs	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations: fsgnjn.d rd, rs, rs
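
The sign-injection rules, and the way fmv, fneg and fabs fall out of them, can be sketched on raw 64-bit patterns. This is a minimal model with hypothetical helper names; it works on bit patterns precisely because the instructions do not interpret the value or canonicalize NaNs.

```c
#include <stdint.h>

#define F64_SIGN_BIT UINT64_C(0x8000000000000000)

/* fsgnj.d: all bits from rs1 except the sign, which comes from rs2. */
static uint64_t model_fsgnj_d(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & ~F64_SIGN_BIT) | (rs2 & F64_SIGN_BIT);
}

/* fsgnjn.d: sign bit is the opposite of rs2's sign bit. */
static uint64_t model_fsgnjn_d(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & ~F64_SIGN_BIT) | (~rs2 & F64_SIGN_BIT);
}

/* fsgnjx.d: sign bit is the XOR of the two sign bits. */
static uint64_t model_fsgnjx_d(uint64_t rs1, uint64_t rs2)
{
    return rs1 ^ (rs2 & F64_SIGN_BIT);
}

/* Pseudoinstructions: both source operands are the same register. */
static uint64_t model_fmv_d(uint64_t rs)  { return model_fsgnj_d(rs, rs); }
static uint64_t model_fneg_d(uint64_t rs) { return model_fsgnjn_d(rs, rs); }
static uint64_t model_fabs_d(uint64_t rs) { return model_fsgnjx_d(rs, rs); }
```

For example, the pattern for -1.5 is `0xBFF8000000000000`; `model_fabs_d` clears its sign bit, and a NaN payload passed through any of these helpers keeps its low bits unchanged.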

f

floating point control and status register	sec:single float	sec:single float compute	single precision floating point compare instructions	single precision floating point conversion and move instructions
single precision load and store instructions

f / floating-point-control-and-status-register

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.2 Floating-Point Control and Status Register

Operation	Arguments	Description
frcsr	rd	The fcsr register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the underlying CSR access instructions. FRCSR reads fcsr by copying it into integer register rd. FSCSR swaps the value in fcsr by copying the original value into integer register rd, and then writing a new value obtained from integer register rs1 into fcsr.
frflags	rd	The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
frrm	rd	The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
fscsr	rd, rs1	The fcsr register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the underlying CSR access instructions. FRCSR reads fcsr by copying it into integer register rd. FSCSR swaps the value in fcsr by copying the original value into integer register rd, and then writing a new value obtained from integer register rs1 into fcsr.
fsflags	rd, rs1	The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
fsrm	rd, rs1	The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
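
The read and swap behaviour of these pseudoinstructions can be sketched as plain C operating on a software copy of fcsr. The helper names are hypothetical, and the bit positions (fflags in bits 4:0, frm in bits 7:5) are taken from the F extension's fcsr layout rather than from the table above, so check them against the specification before relying on them.

```c
#include <stdint.h>

#define FCSR_FFLAGS_MASK UINT32_C(0x1F)  /* assumed: fflags in bits 4:0 */
#define FCSR_FRM_SHIFT   5               /* assumed: frm in bits 7:5   */
#define FCSR_FRM_MASK    UINT32_C(0x7)

/* frrm: rounding mode in the three low bits, zero elsewhere. */
static uint32_t model_frrm(uint32_t fcsr)
{
    return (fcsr >> FCSR_FRM_SHIFT) & FCSR_FRM_MASK;
}

/* frflags: accrued exception flags. */
static uint32_t model_frflags(uint32_t fcsr)
{
    return fcsr & FCSR_FFLAGS_MASK;
}

/* fsrm: return the old frm, then write the low three bits of rs1. */
static uint32_t model_fsrm(uint32_t *fcsr, uint32_t rs1)
{
    uint32_t old_frm = model_frrm(*fcsr);
    *fcsr = (*fcsr & ~(FCSR_FRM_MASK << FCSR_FRM_SHIFT))
          | ((rs1 & FCSR_FRM_MASK) << FCSR_FRM_SHIFT);
    return old_frm;
}

/* fsflags: return the old fflags, then write the low five bits of rs1. */
static uint32_t model_fsflags(uint32_t *fcsr, uint32_t rs1)
{
    uint32_t old_flags = model_frflags(*fcsr);
    *fcsr = (*fcsr & ~FCSR_FFLAGS_MASK) | (rs1 & FCSR_FFLAGS_MASK);
    return old_flags;
}
```

Each swap helper touches only its own field, which mirrors the point that the fields can be accessed individually.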

f / sec:single-float

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.9 Single-Precision Floating-Point Classify Instruction

Operation	Arguments	Description
fclass.s	rd, rs1	The FCLASS.S instruction examines the value in floating-point register rs1 and writes to integer register rd a 10-bit mask that indicates the class of the floating-point number. The format of the mask is described in Table [tab:fclass] . The corresponding bit in rd will be set if the property is true and clear otherwise. All other bits in rd are cleared. Note that exactly one bit in rd will be set. FCLASS.S does not set the floating-point exception flags. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_classify(FRS1_F));
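
To make the 10-bit mask concrete, here is a minimal sketch that classifies a raw single-precision bit pattern. The function name is hypothetical, and the bit assignments (bit 0 for negative infinity up to bit 9 for quiet NaN) follow the classification table in the F extension specification as best recalled here, since that table is not reproduced on this page.

```c
#include <stdint.h>

/* Hypothetical model of fclass.s on a raw IEEE 754 binary32 pattern.
 * Assumed bit assignments:
 *   0: -inf        1: -normal     2: -subnormal  3: -0
 *   4: +0          5: +subnormal  6: +normal     7: +inf
 *   8: signaling NaN              9: quiet NaN            */
static uint32_t model_fclass_s(uint32_t bits)
{
    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent == 0xFFu) {
        if (fraction == 0)
            return sign ? (1u << 0) : (1u << 7);           /* infinities */
        /* The top fraction bit distinguishes quiet from signaling. */
        return (fraction & 0x400000u) ? (1u << 9) : (1u << 8);
    }
    if (exponent == 0) {
        if (fraction == 0)
            return sign ? (1u << 3) : (1u << 4);           /* zeros */
        return sign ? (1u << 2) : (1u << 5);               /* subnormals */
    }
    return sign ? (1u << 1) : (1u << 6);                   /* normals */
}
```

Every path returns a value with exactly one bit set, matching the note that exactly one bit in rd will be set.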

f / sec:single-float-compute

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions

Operation	Arguments	Description
fadd.s	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_add(FRS1_F, FRS2_F)); set_fp_exceptions;
fdiv.s	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_div(FRS1_F, FRS2_F)); set_fp_exceptions;
fmadd.s	rd, rs1, rs2, rs3	FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(FRS1_F, FRS2_F, FRS3_F)); set_fp_exceptions;
fmax.s	rd, rs1, rs2	Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; bool greater = f32_lt_quiet(FRS2_F, FRS1_F) || (f32_eq(FRS2_F, FRS1_F) && (FRS2_F.v & F32_SIGN)); if (isNaNF32UI(FRS1_F.v) && isNaNF32UI(FRS2_F.v)) WRITE_FRD_F(f32(defaultNaNF32UI)); else WRITE_FRD_F((greater || isNaNF32UI(FRS2_F.v) ? FRS1_F : FRS2_F)); set_fp_exceptions;
fmin.s	rd, rs1, rs2	Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; bool less = f32_lt_quiet(FRS1_F, FRS2_F) || (f32_eq(FRS1_F, FRS2_F) && (FRS1_F.v & F32_SIGN)); if (isNaNF32UI(FRS1_F.v) && isNaNF32UI(FRS2_F.v)) WRITE_FRD_F(f32(defaultNaNF32UI)); else WRITE_FRD_F((less || isNaNF32UI(FRS2_F.v) ? FRS1_F : FRS2_F)); set_fp_exceptions;
fmsub.s	rd, rs1, rs2, rs3	FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(FRS1_F, FRS2_F, f32(FRS3_F.v ^ F32_SIGN))); set_fp_exceptions;
fmul.s	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mul(FRS1_F, FRS2_F)); set_fp_exceptions;
fnmadd.s	rd, rs1, rs2, rs3	FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(f32(FRS1_F.v ^ F32_SIGN), FRS2_F, f32(FRS3_F.v ^ F32_SIGN))); set_fp_exceptions;
fnmsub.s	rd, rs1, rs2, rs3	FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(f32(FRS1_F.v ^ F32_SIGN), FRS2_F, FRS3_F)); set_fp_exceptions;
fsqrt.s	rd, rs1	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_sqrt(FRS1_F)); set_fp_exceptions;
fsub.s	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_sub(FRS1_F, FRS2_F)); set_fp_exceptions;
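
The special cases of fmin.s and fmax.s (signed zeros and NaN operands) are the part most often gotten wrong, so the sketch below models just the selected result on raw 32-bit patterns. It loosely follows the structure of the Spike snippets in the table; the helper names are hypothetical and the invalid-operation flag is not modeled.

```c
#include <stdint.h>
#include <string.h>

#define F32_SIGN_BIT      UINT32_C(0x80000000)
#define F32_CANONICAL_NAN UINT32_C(0x7FC00000)

static float f32_from_bits(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

static int f32_bits_is_nan(uint32_t bits)
{
    return (bits & ~F32_SIGN_BIT) > UINT32_C(0x7F800000);
}

/* Result pattern of fmin.s: -0.0 counts as less than +0.0. */
static uint32_t model_fmin_s(uint32_t rs1, uint32_t rs2)
{
    if (f32_bits_is_nan(rs1) && f32_bits_is_nan(rs2))
        return F32_CANONICAL_NAN;
    if (f32_bits_is_nan(rs2)) return rs1;
    if (f32_bits_is_nan(rs1)) return rs2;

    float a = f32_from_bits(rs1), b = f32_from_bits(rs2);
    int less = (a < b) || (a == b && (rs1 & F32_SIGN_BIT));
    return less ? rs1 : rs2;
}

/* Result pattern of fmax.s: +0.0 counts as greater than -0.0. */
static uint32_t model_fmax_s(uint32_t rs1, uint32_t rs2)
{
    if (f32_bits_is_nan(rs1) && f32_bits_is_nan(rs2))
        return F32_CANONICAL_NAN;
    if (f32_bits_is_nan(rs2)) return rs1;
    if (f32_bits_is_nan(rs1)) return rs2;

    float a = f32_from_bits(rs1), b = f32_from_bits(rs2);
    int greater = (b < a) || (b == a && (rs2 & F32_SIGN_BIT));
    return greater ? rs1 : rs2;
}
```

Working on bit patterns keeps the two zeros distinguishable, which a comparison on `float` values alone cannot do because `-0.0f == 0.0f` in C.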

f / single-precision-floating-point-compare-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions

Operation	Arguments	Description
feq.s	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_eq(FRS1_F, FRS2_F)); set_fp_exceptions;
fle.s	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_le(FRS1_F, FRS2_F)); set_fp_exceptions;
flt.s	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_lt(FRS1_F, FRS2_F)); set_fp_exceptions;

f / single-precision-floating-point-conversion-and-move-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.7 Single-Precision Floating-Point Conversion and Move Instructions

Operation	Arguments	Description
fabs.s	rd, rs	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations: fsgnjx.s rd, rs, rs
fcvt.l.s	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
fcvt.lu.s	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
fcvt.s.l	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
fcvt.s.lu	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
fcvt.s.w	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. All floating-point to integer and integer to floating-point conversion instructions round according to the rm field. A floating-point register can be initialized to floating-point positive zero using FCVT.S.W rd, x0, which will never set any exception flags.
fcvt.s.wu	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
fcvt.w.s	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
fcvt.wu.s	rd, rs1	Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs.
fmv.s	rd, rs	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools. Pseudo Opcode, Equivalent Operations: fsgnj.s rd, rs, rs
fmv.w.x	rd, rs1	FMV.W.X moves the single-precision value encoded in IEEE 754-2008 standard encoding from the lower 32 bits of integer register rs1 to the floating-point register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved. The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.
fmv.x.s	rd, rs1	The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.
fmv.x.w	rd, rs1	Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.W moves the single-precision value in floating-point register rs1 represented in IEEE 754-2008 encoding to the lower 32 bits of integer register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved. For RV64, the higher 32 bits of the destination register are filled with copies of the floating-point number's sign bit. The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools.
fneg.s	rd, rs	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations: fsgnjn.s rd, rs, rs
fsgnj.s	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), false, false));
fsgnjn.s	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), true, false));
fsgnjx.s	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Spike ISS Implementation: require_either_extension('F', EXT_ZFINX); require_fp; WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), false, true));
neg	rd, rs	The sign-injection instructions provide floating-point MV, ABS, and NEG, as well as supporting a few other operations, including the IEEE copySign operation and sign manipulation in transcendental math function libraries. Although MV, ABS, and NEG only need a single register operand, whereas FSGNJ instructions need two, it is unlikely most microarchitectures would add optimizations to benefit from the reduced number of register reads for these relatively infrequent instructions. Even in this case, a microarchitecture can simply detect when both source registers are the same for FSGNJ instructions and only read a single copy. Pseudo Opcode, Equivalent Operations: sub rd, x0, rs
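
The sign-injection rules described in the rows above can be sketched in portable C by operating on raw 32-bit patterns. This is only an illustrative model, not the Spike implementation; the function names (`fsgnj_s`, `fneg_s`, and so on) and the `F32_SIGN` constant are hypothetical helpers chosen for this example.

```c
#include <stdint.h>

/* Sign bit of an IEEE 754 single-precision value (hypothetical helper constant). */
#define F32_SIGN 0x80000000u

/* FSGNJ.S: all bits except the sign from rs1, sign bit from rs2. */
static uint32_t fsgnj_s(uint32_t rs1, uint32_t rs2)
{
    return (rs1 & ~F32_SIGN) | (rs2 & F32_SIGN);
}

/* FSGNJN.S: sign bit is the opposite of rs2's sign bit. */
static uint32_t fsgnjn_s(uint32_t rs1, uint32_t rs2)
{
    return (rs1 & ~F32_SIGN) | (~rs2 & F32_SIGN);
}

/* FSGNJX.S: sign bit is the XOR of the sign bits of rs1 and rs2. */
static uint32_t fsgnjx_s(uint32_t rs1, uint32_t rs2)
{
    return rs1 ^ (rs2 & F32_SIGN);
}

/* Pseudoinstructions: both source operands are the same register. */
static uint32_t fmv_s(uint32_t rs)  { return fsgnj_s(rs, rs);  }
static uint32_t fneg_s(uint32_t rs) { return fsgnjn_s(rs, rs); }
static uint32_t fabs_s(uint32_t rs) { return fsgnjx_s(rs, rs); }
```

Because the model only touches bit 31, a NaN payload passes through unchanged, which matches the statement that these instructions do not canonicalize NaNs. For example, `fneg_s(0x3F800000u)` (the pattern of 1.0f) yields `0xBF800000u` (the pattern of -1.0f).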

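The clipping behaviour of the float-to-integer conversions (FCVT.W.S and FCVT.WU.S) can be illustrated with a small C model. The sketch below makes several assumptions: it fixes the rounding mode to round-towards-zero, it reports the invalid flag through a hypothetical `bool *invalid` parameter, and it maps NaN inputs to the largest destination value, which is the behaviour the conversion table referenced in the ISA manual gives; check that table before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of FCVT.W.S with round-towards-zero (hypothetical helper name). */
static int32_t fcvt_w_s_rtz(float x, bool *invalid)
{
    *invalid = false;
    if (x != x || x >= 2147483648.0f) {   /* NaN or too large: clip to 2^31 - 1 */
        *invalid = true;
        return INT32_MAX;
    }
    if (x < -2147483648.0f) {             /* too small: clip to -2^31 */
        *invalid = true;
        return INT32_MIN;
    }
    return (int32_t)x;                    /* in range: C conversion truncates toward zero */
}

/* Model of FCVT.WU.S with round-towards-zero (hypothetical helper name). */
static uint32_t fcvt_wu_s_rtz(float x, bool *invalid)
{
    *invalid = false;
    if (x != x || x >= 4294967296.0f) {   /* NaN or too large: clip to 2^32 - 1 */
        *invalid = true;
        return UINT32_MAX;
    }
    if (x <= -1.0f) {                     /* rounded result is negative: clip to 0 */
        *invalid = true;
        return 0;
    }
    return (uint32_t)x;                   /* values in (-1, 0) truncate to 0 and are valid */
}
```

On RV64 the 32-bit result would additionally be sign-extended to 64 bits, which this sketch does not model.
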
f / single-precision-load-and-store-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.5 Single-Precision Load and Store Instructions

Operation	Arguments	Description
flw	rd, rs1, imm12	Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register rs1 and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd. FSW stores a single-precision value from floating-point register rs2 to memory. FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned. FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation: require_extension('F'); require_fp; WRITE_FRD(f32(MMU.load<uint32_t>(RS1 + insn.i_imm())));
fsw	rs1, rs2, imm12	Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register rs1 and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd. FSW stores a single-precision value from floating-point register rs2 to memory. FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned. FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation: require_extension('F'); require_fp; MMU.store<uint32_t>(RS1 + insn.s_imm(), FRS2.v[0]);
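
The base+offset addressing used by FLW and FSW can be sketched as follows for RV32. The helper names are hypothetical and the code only models the address arithmetic (sign-extending the 12-bit immediate and checking natural alignment for a 4-byte access), not the memory access itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend a 12-bit immediate field to a signed byte offset (-2048..2047). */
static int32_t sext_imm12(uint32_t imm12)
{
    return (int32_t)((imm12 & 0xFFFu) ^ 0x800u) - 0x800;
}

/* Effective address on RV32: base register plus signed offset, wrapping at 32 bits. */
static uint32_t effective_address(uint32_t rs1, uint32_t imm12)
{
    return rs1 + (uint32_t)sext_imm12(imm12);
}

/* A 4-byte access is naturally aligned when its address is a multiple of 4. */
static bool naturally_aligned_word(uint32_t addr)
{
    return (addr & 3u) == 0;
}
```

With a base of `0x1000` and an immediate field of `0xFFC` (offset -4), the effective address is `0x0FFC`, which is naturally aligned, so the access falls under the atomicity guarantee quoted above.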

m

division operations

multiplication operations

m / division-operations

8 “M” Standard Extension for Integer Multiplication and Division, Version 2.0 / 8.2 Division Operations

Operation	Arguments	Description
div	rd, rs1, rs2	DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides. DIV[W] Spike ISS Implementation: require_extension('M'); sreg_t lhs = sext_xlen(RS1); sreg_t rhs = sext_xlen(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else if (lhs == INT64_MIN && rhs == -1) WRITE_RD(lhs); else WRITE_RD(sext_xlen(lhs / rhs));
divu	rd, rs1, rs2	DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. DIVU[W] Spike ISS Implementation: require_extension('M'); reg_t lhs = zext_xlen(RS1); reg_t rhs = zext_xlen(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else WRITE_RD(sext_xlen(lhs / rhs));
divuw	rd, rs1, rs2	DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation: require_extension('M'); require_rv64; reg_t lhs = zext32(RS1); reg_t rhs = zext32(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else WRITE_RD(sext32(lhs / rhs));
divw	rd, rs1, rs2	DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation: require_extension('M'); require_rv64; sreg_t lhs = sext32(RS1); sreg_t rhs = sext32(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else WRITE_RD(sext32(lhs / rhs));
rem	rd, rs1, rs2	DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides. REM[W] Spike ISS Implementation: require_extension('M'); sreg_t lhs = sext_xlen(RS1); sreg_t rhs = sext_xlen(RS2); if (rhs == 0) WRITE_RD(lhs); else if (lhs == INT64_MIN && rhs == -1) WRITE_RD(0); else WRITE_RD(sext_xlen(lhs % rhs));
remu	rd, rs1, rs2	DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. REMU[W] Spike ISS Implementation: require_extension('M'); reg_t lhs = zext_xlen(RS1); reg_t rhs = zext_xlen(RS2); if (rhs == 0) WRITE_RD(sext_xlen(RS1)); else WRITE_RD(sext_xlen(lhs % rhs));
remuw	rd, rs1, rs2	DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation: require_extension('M'); require_rv64; reg_t lhs = zext32(RS1); reg_t rhs = zext32(RS2); if (rhs == 0) WRITE_RD(sext32(lhs)); else WRITE_RD(sext32(lhs % rhs));
remw	rd, rs1, rs2	DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation: require_extension('M'); require_rv64; sreg_t lhs = sext32(RS1); sreg_t rhs = sext32(RS2); if (rhs == 0) WRITE_RD(lhs); else WRITE_RD(sext32(lhs % rhs));
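
The division-by-zero and overflow cases in the Spike snippets above are easy to miss, so here is a minimal C sketch of the same rules specialized to XLEN=32. The `rv32_*` function names are hypothetical; the special-case results mirror the Spike code shown in the table rather than any additional source.

```c
#include <stdint.h>

/* DIV: signed division rounding towards zero. */
static int32_t rv32_div(int32_t lhs, int32_t rhs)
{
    if (rhs == 0)
        return -1;                         /* divide by zero: all bits set */
    if (lhs == INT32_MIN && rhs == -1)
        return lhs;                        /* overflow: result is the dividend */
    return lhs / rhs;                      /* C99 division truncates towards zero */
}

/* DIVU: unsigned division. */
static uint32_t rv32_divu(uint32_t lhs, uint32_t rhs)
{
    return rhs == 0 ? UINT32_MAX : lhs / rhs;
}

/* REM: the sign of the result equals the sign of the dividend. */
static int32_t rv32_rem(int32_t lhs, int32_t rhs)
{
    if (rhs == 0)
        return lhs;                        /* divide by zero: result is the dividend */
    if (lhs == INT32_MIN && rhs == -1)
        return 0;                          /* overflow: remainder is zero */
    return lhs % rhs;
}

/* REMU: unsigned remainder. */
static uint32_t rv32_remu(uint32_t lhs, uint32_t rhs)
{
    return rhs == 0 ? lhs : lhs % rhs;
}
```

The explicit checks matter in C as well: `lhs / 0` and `INT32_MIN / -1` are undefined behaviour there, whereas the RISC-V instructions return the defined values above without trapping.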

m / multiplication-operations

8 “M” Standard Extension for Integer Multiplication and Division, Version 2.0 / 8.1 Multiplication Operations

Operation	Arguments	Description
mul	rd, rs1, rs2	MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear. If the arguments are not known to be sign- or zero-extended, an alternative is to shift both arguments left by 32 bits, then use MULH[[S]U]. Spike ISS Implementation: require_either_extension('M', EXT_ZMMUL); WRITE_RD(sext_xlen(RS1 * RS2));
mulh	rd, rs1, rs2	MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear. If the arguments are not known to be sign- or zero-extended, an alternative is to shift both arguments left by 32 bits, then use MULH[[S]U]. Spike ISS Implementation: require_either_extension('M', EXT_ZMMUL); if (xlen == 64) WRITE_RD(mulh(RS1, RS2)); else WRITE_RD(sext32((sext32(RS1) * sext32(RS2)) >> 32));
mulhsu	rd, rs1, rs2	MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. MULHSU is used in multi-word signed multiplication to multiply the most-significant word of the multiplicand (which contains the sign bit) with the less-significant words of the multiplier (which are unsigned). Spike ISS Implementation: require_either_extension('M', EXT_ZMMUL); if (xlen == 64) WRITE_RD(mulhsu(RS1, RS2)); else WRITE_RD(sext32((sext32(RS1) * reg_t((uint32_t)RS2)) >> 32));
mulhu	rd, rs1, rs2	MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. Spike ISS Implementation: require_either_extension('M', EXT_ZMMUL); if (xlen == 64) WRITE_RD(mulhu(RS1, RS2)); else WRITE_RD(sext32(((uint64_t)(uint32_t)RS1 * (uint64_t)(uint32_t)RS2) >> 32));
mulw	rd, rs1, rs2	MULW is an RV64 instruction that multiplies the lower 32 bits of the source registers, placing the sign-extension of the lower 32 bits of the result into the destination register. Spike ISS Implementation: require_either_extension('M', EXT_ZMMUL); require_rv64; WRITE_RD(sext32(RS1 * RS2));
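
To make the difference between MUL, MULH, MULHU, and MULHSU concrete, the following C sketch models them for XLEN=32 by forming the full 64-bit product and keeping the lower or upper half. The `rv32_*` names are hypothetical, and the signed variants assume that right-shifting a negative `int64_t` is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <stdint.h>

/* MUL: lower 32 bits of the product (identical for signed and unsigned operands). */
static uint32_t rv32_mul(uint32_t rs1, uint32_t rs2)
{
    return rs1 * rs2;
}

/* MULH: upper 32 bits of the signed x signed 64-bit product. */
static int32_t rv32_mulh(int32_t rs1, int32_t rs2)
{
    return (int32_t)(((int64_t)rs1 * (int64_t)rs2) >> 32);  /* assumes arithmetic shift */
}

/* MULHU: upper 32 bits of the unsigned x unsigned 64-bit product. */
static uint32_t rv32_mulhu(uint32_t rs1, uint32_t rs2)
{
    return (uint32_t)(((uint64_t)rs1 * (uint64_t)rs2) >> 32);
}

/* MULHSU: upper 32 bits of the signed rs1 x unsigned rs2 product. */
static int32_t rv32_mulhsu(int32_t rs1, uint32_t rs2)
{
    return (int32_t)(((int64_t)rs1 * (int64_t)rs2) >> 32);  /* assumes arithmetic shift */
}
```

For instance, `rv32_mulhu(0xFFFFFFFFu, 0xFFFFFFFFu)` gives `0xFFFFFFFEu`, while `rv32_mulh(-1, -1)` gives `0`, even though both calls pass the same bit patterns.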

q

q standard extension for quad precision floating point version 2.2

quad precision convert and move instructions

quad precision load and store instructions

sec:single float compute

single precision floating point compare instructions

q / q-standard-extension-for-quad-precision-floating-point-version-2.2

14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.5 Quad-Precision Floating-Point Classify Instruction

Operation	Arguments	Description
fclass.q	rd, rs1	The quad-precision floating-point classify instruction, FCLASS.Q, is defined analogously to its double-precision counterpart, but operates on quad-precision operands. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_RD(f128_classify(f128(FRS1)));

q / quad-precision-convert-and-move-instructions

14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.3 Quad-Precision Convert and Move Instructions

Operation	Arguments	Description
fcvt.d.q	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.
fcvt.l.q	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fcvt.lu.q	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fcvt.q.d	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.
fcvt.q.l	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fcvt.q.lu	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fcvt.q.s	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.
fcvt.q.w	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fcvt.q.wu	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fcvt.s.q	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.
fcvt.w.q	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fcvt.wu.q	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
fsgnj.q	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_FRD(fsgnj128(FRS1, FRS2, false, false));
fsgnjn.q	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_FRD(fsgnj128(FRS1, FRS2, true, false));
fsgnjx.q	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_FRD(fsgnj128(FRS1, FRS2, false, true));

q / quad-precision-load-and-store-instructions

14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.1 Quad-Precision Load and Store Instructions

Operation	Arguments	Description
flq	rd, rs1, imm12	FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128. FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_FRD(MMU.load_float128(RS1 + insn.i_imm()));
fsq	rs1, rs2, imm12	FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128. FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation: require_extension('Q'); require_fp; MMU.store_float128(RS1 + insn.s_imm(), FRS2);

q / sec:single-float-compute

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions

Operation	Arguments	Description
fadd.q	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_add(f128(FRS1), f128(FRS2))); set_fp_exceptions;
fdiv.q	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_div(f128(FRS1), f128(FRS2))); set_fp_exceptions;
fmadd.q	rd, rs1, rs2, rs3	FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128(FRS1), f128(FRS2), f128(FRS3))); set_fp_exceptions;
fmax.q	rd, rs1, rs2	Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation: require_extension('Q'); require_fp; bool greater = f128_lt_quiet(f128(FRS2), f128(FRS1)) || (f128_eq(f128(FRS2), f128(FRS1)) && (f128(FRS2).v[1] & F64_SIGN)); if (isNaNF128(f128(FRS1)) && isNaNF128(f128(FRS2))) WRITE_FRD(f128(defaultNaNF128())); else WRITE_FRD(greater || isNaNF128(f128(FRS2)) ? FRS1 : FRS2); set_fp_exceptions;
fmin.q	rd, rs1, rs2	Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation: require_extension('Q'); require_fp; bool less = f128_lt_quiet(f128(FRS1), f128(FRS2)) || (f128_eq(f128(FRS1), f128(FRS2)) && (f128(FRS1).v[1] & F64_SIGN)); if (isNaNF128(f128(FRS1)) && isNaNF128(f128(FRS2))) WRITE_FRD(f128(defaultNaNF128())); else WRITE_FRD(less || isNaNF128(f128(FRS2)) ? FRS1 : FRS2); set_fp_exceptions;
fmsub.q	rd, rs1, rs2, rs3	FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128(FRS1), f128(FRS2), f128_negate(f128(FRS3)))); set_fp_exceptions;
fmul.q	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mul(f128(FRS1), f128(FRS2))); set_fp_exceptions;
fnmadd.q	rd, rs1, rs2, rs3	FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128_negate(f128(FRS1)), f128(FRS2), f128_negate(f128(FRS3)))); set_fp_exceptions;
fnmsub.q	rd, rs1, rs2, rs3	FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128_negate(f128(FRS1)), f128(FRS2), f128(FRS3))); set_fp_exceptions;
fsqrt.q	rd, rs1	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_sqrt(f128(FRS1))); set_fp_exceptions;
fsub.q	rd, rs1, rs2	Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation: require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_sub(f128(FRS1), f128(FRS2))); set_fp_exceptions;
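
The minimum/maximum rules quoted in the fmin.q and fmax.q rows can be sketched in plain C. This is a minimal model, assuming double (binary64) as a stand-in for the 128-bit format; the names model_fmin, model_fmax, is_nan, sign_bit and canonical_nan are hypothetical, and the invalid-operation flag raised for signaling NaNs is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static int is_nan(double x) { return x != x; }

/* Sign bit read from the raw encoding, so -0.0 and +0.0 can be told apart. */
static int sign_bit(double x) {
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t)); /* assumption: binary64 */
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 63);
}

/* Canonical NaN: positive quiet NaN with an otherwise zero payload. */
static double canonical_nan(void) {
    const uint64_t bits = 0x7ff8000000000000ULL;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Smaller operand; -0.0 counts as less than +0.0. */
double model_fmin(double a, double b) {
    if (is_nan(a) && is_nan(b)) return canonical_nan();
    if (is_nan(a)) return b;
    if (is_nan(b)) return a;
    int less = (a < b) || (a == b && sign_bit(a));
    return less ? a : b;
}

/* Larger operand; +0.0 counts as greater than -0.0. */
double model_fmax(double a, double b) {
    if (is_nan(a) && is_nan(b)) return canonical_nan();
    if (is_nan(a)) return b;
    if (is_nan(b)) return a;
    int greater = (b < a) || (a == b && sign_bit(b));
    return greater ? a : b;
}
```

The tie-break on the sign bit mirrors the F64_SIGN test in the Spike snippets: when the operands compare equal, the one carrying the sign bit is treated as the smaller.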

q / single-precision-floating-point-compare-instructions

12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions

Operation	Arguments	Description
feq.q	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_RD(f128_eq(f128(FRS1), f128(FRS2))); set_fp_exceptions;
fle.q	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_RD(f128_le(f128(FRS1), f128(FRS2))); set_fp_exceptions;
flt.q	rd, rs1, rs2	Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation: require_extension('Q'); require_fp; WRITE_RD(f128_lt(f128(FRS1), f128(FRS2))); set_fp_exceptions;
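
The difference between the quiet comparison (feq) and the signaling comparisons (flt, fle) can be sketched as a small C model. It works on raw binary64 bit patterns as a stand-in for the 128-bit format, so that signaling NaNs are not altered on the way in; cmp_result, model_feq, model_flt and model_fle are hypothetical names, and the invalid field stands for the invalid operation exception flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int rd;        /* value written to the integer register rd */
    bool invalid;  /* whether the invalid operation flag is set */
} cmp_result;

static double from_bits(uint64_t b) {
    double d;
    assert(sizeof(double) == sizeof(uint64_t)); /* assumption: binary64 */
    memcpy(&d, &b, sizeof d);
    return d;
}

/* NaN: exponent all ones and a non-zero fraction. */
static bool is_nan_bits(uint64_t b) {
    return (b & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL &&
           (b & 0x000fffffffffffffULL) != 0;
}

/* Signaling NaN: a NaN whose top fraction bit (the quiet bit) is clear. */
static bool is_snan_bits(uint64_t b) {
    return is_nan_bits(b) && (b & 0x0008000000000000ULL) == 0;
}

/* Quiet comparison: only a signaling NaN sets the flag. */
cmp_result model_feq(uint64_t a, uint64_t b) {
    cmp_result r;
    bool any_nan = is_nan_bits(a) || is_nan_bits(b);
    r.invalid = is_snan_bits(a) || is_snan_bits(b);
    r.rd = !any_nan && from_bits(a) == from_bits(b);
    return r;
}

/* Signaling comparisons: any NaN sets the flag. */
cmp_result model_flt(uint64_t a, uint64_t b) {
    cmp_result r;
    bool any_nan = is_nan_bits(a) || is_nan_bits(b);
    r.invalid = any_nan;
    r.rd = !any_nan && from_bits(a) < from_bits(b);
    return r;
}

cmp_result model_fle(uint64_t a, uint64_t b) {
    cmp_result r;
    bool any_nan = is_nan_bits(a) || is_nan_bits(b);
    r.invalid = any_nan;
    r.rd = !any_nan && from_bits(a) <= from_bits(b);
    return r;
}
```

In all three functions the result is 0 whenever either operand is NaN; only the flag behavior differs.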

v

introduction	narrowing floating pointinteger type convert instructions	single width floating pointinteger type convert instructions	state of vector extension at reset	unit stride fault only first loads
vector bitwise logical instructions	vector compress instruction	vector count population in mask vcpop m	vector element index instruction	vector floating point classify instruction
vector floating point compare instructions	vector floating point merge instruction	vector floating point minmax instructions	vector floating point move instruction	vector floating point reciprocal estimate instruction
vector floating point reciprocal square root estimate instruction	vector floating point sign injection instructions	vector floating point square root instruction	vector indexed instructions	vector instruction formats
vector instruction listing	vector integer add with carry subtract with borrow instructions	vector integer compare instructions	vector integer divide instructions	vector integer merge instructions
vector integer minmax instructions	vector integer move instructions	vector iota instruction	vector loadstore whole register instructions	vector narrowing fixed point clip instructions
vector register gather instructions	vector register grouping vlmul20	vector single width averaging add and subtract	vector single width floating point addsubtract instructions	vector single width floating point fused multiply add instructions
vector single width floating point multiplydivide instructions	vector single width fractional multiply with rounding and saturation	vector single width integer add and subtract	vector single width integer multiply add instructions	vector single width integer multiply instructions
vector single width saturating add and subtract	vector single width scaling shift instructions	vector single width shift instructions	vector slide1down instruction	vector slide1up
vector slide instructions	vector slidedown instructions	vector strided instructions	vector unit stride instructions	vector unordered single width floating point sum reduction
vector widening floating point addsubtract instructions	vector widening floating point fused multiply add instructions	vector widening floating point multiply	vector widening integer addsubtract	vector widening integer multiply add instructions
vector widening integer multiply instructions	vfirst find first set mask bit	vmsif m set including first mask bit	vmsof m set only first mask bit	widening floating pointinteger type convert instructions
zve vector extensions for embedded processors	sec agnostic	sec mask register logical	sec narrowing	sec vec operands
sec vector float reduce	sec vector float reduce widen	sec vector integer reduce	sec vector integer reduce widen

v / _introduction

Introduction /

Operation	Arguments	Description
vamoaddei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e16);
vamoaddei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e32);
vamoaddei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e64);
vamoaddei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e8);
vamoandei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e16);
vamoandei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e32);
vamoandei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e64);
vamoandei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e8);
vamomaxei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e16);
vamomaxei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e32);
vamomaxei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e64);
vamomaxei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e8);
vamomaxuei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e16);
vamomaxuei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e32);
vamomaxuei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e64);
vamomaxuei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3;; }, uint, e8);
vamominei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e16);
vamominei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e32);
vamominei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e64);
vamominei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e8);
vamominuei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e16);
vamominuei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e32);
vamominuei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e64);
vamominuei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3;; }, uint, e8);
vamoorei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e16);
vamoorei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e32);
vamoorei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e64);
vamoorei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e8);
vamoswapei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e16);
vamoswapei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e32);
vamoswapei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e64);
vamoswapei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e8);
vamoxorei16.v	vs2, rs1, vd	Spike ISS Implementation: //vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e16);
vamoxorei32.v	vs2, rs1, vd	Spike ISS Implementation: //vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e32);
vamoxorei64.v	vs2, rs1, vd	Spike ISS Implementation: //vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e64);
vamoxorei8.v	vs2, rs1, vd	Spike ISS Implementation: //vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e8);
vl1re16.v	rs1, vd	Spike ISS Implementation: // vl1re16.v vd, (rs1) VI_LD_WHOLE(uint16);
vl1re32.v	rs1, vd	Spike ISS Implementation: // vl1re32.v vd, (rs1) VI_LD_WHOLE(uint32);
vl1re64.v	rs1, vd	Spike ISS Implementation: // vl1re64.v vd, (rs1) VI_LD_WHOLE(uint64);
vl2re16.v	rs1, vd	Spike ISS Implementation: // vl2re16.v vd, (rs1) VI_LD_WHOLE(uint16);
vl2re32.v	rs1, vd	Spike ISS Implementation: // vl2re32.v vd, (rs1) VI_LD_WHOLE(uint32);
vl2re64.v	rs1, vd	Spike ISS Implementation: // vl2re64.v vd, (rs1) VI_LD_WHOLE(uint64);
vl2re8.v	rs1, vd	Spike ISS Implementation: // vl2re8.v vd, (rs1) VI_LD_WHOLE(uint8);
vl4re16.v	rs1, vd	Spike ISS Implementation: // vl4re16.v vd, (rs1) VI_LD_WHOLE(uint16);
vl4re32.v	rs1, vd	Spike ISS Implementation: // vl4re32.v vd, (rs1) VI_LD_WHOLE(uint32);
vl4re64.v	rs1, vd	Spike ISS Implementation: // vl4re64.v vd, (rs1) VI_LD_WHOLE(uint64);
vl4re8.v	rs1, vd	Spike ISS Implementation: // vl4re8.v vd, (rs1) VI_LD_WHOLE(uint8);
vl8re16.v	rs1, vd	Spike ISS Implementation: // vl8re16.v vd, (rs1) VI_LD_WHOLE(uint16);
vl8re32.v	rs1, vd	Spike ISS Implementation: // vl8re32.v vd, (rs1) VI_LD_WHOLE(uint32);
vl8re64.v	rs1, vd	Spike ISS Implementation: // vl8re64.v vd, (rs1) VI_LD_WHOLE(uint64);
vl8re8.v	rs1, vd	Spike ISS Implementation: // vl8re8.v vd, (rs1) VI_LD_WHOLE(uint8);
vle1024.v	rs1, vd
vle1024ff.v	rs1, vd
vle128.v	rs1, vd
vle128ff.v	rs1, vd
vle16.v	rs1, vd	Spike ISS Implementation: // vle16.v and vlseg[2-8]e16.v VI_LD(0, (i * nf + fn), int16, false);
vle16ff.v	rs1, vd	Spike ISS Implementation: // vle16ff.v and vlseg[2-8]e16ff.v VI_LDST_FF(int16);
vle256.v	rs1, vd
vle256ff.v	rs1, vd
vle32.v	rs1, vd	Spike ISS Implementation: // vle32.v and vlseg[2-8]e32.v VI_LD(0, (i * nf + fn), int32, false);
vle32ff.v	rs1, vd	Spike ISS Implementation: // vle32ff.v and vlseg[2-8]e32ff.v VI_LDST_FF(int32);
vle512.v	rs1, vd
vle512ff.v	rs1, vd
vle64.v	rs1, vd	Spike ISS Implementation: // vle64.v and vlseg[2-8]e64.v VI_LD(0, (i * nf + fn), int64, false);
vle64ff.v	rs1, vd	Spike ISS Implementation: // vle64ff.v and vlseg[2-8]e64ff.v VI_LDST_FF(int64);
vloxei1024.v	vs2, rs1, vd
vloxei128.v	vs2, rs1, vd
vloxei16.v	vs2, rs1, vd	Spike ISS Implementation: // vlxei16.v and vlxseg[2-8]e16.v VI_LD_INDEX(e16, true);
vloxei256.v	vs2, rs1, vd
vloxei32.v	vs2, rs1, vd	Spike ISS Implementation: // vlxe32.v and vlxseg[2-8]ei32.v VI_LD_INDEX(e32, true);
vloxei512.v	vs2, rs1, vd
vloxei64.v	vs2, rs1, vd	Spike ISS Implementation: // vlxei64.v and vlxseg[2-8]ei64.v VI_LD_INDEX(e64, true);
vloxei8.v	vs2, rs1, vd	Spike ISS Implementation: // vlxei8.v and vlxseg[2-8]ei8.v VI_LD_INDEX(e8, true);
vlse1024.v	rs2, rs1, vd
vlse128.v	rs2, rs1, vd
vlse16.v	rs2, rs1, vd	Spike ISS Implementation: // vlse16.v and vlsseg[2-8]e16.v VI_LD(i * RS2, fn, int16, false);
vlse256.v	rs2, rs1, vd
vlse32.v	rs2, rs1, vd	Spike ISS Implementation: // vlse32.v and vlsseg[2-8]e32.v VI_LD(i * RS2, fn, int32, false);
vlse512.v	rs2, rs1, vd
vlse64.v	rs2, rs1, vd	Spike ISS Implementation: // vlse64.v and vlsseg[2-8]e64.v VI_LD(i * RS2, fn, int64, false);
vluxei1024.v	vs2, rs1, vd
vluxei128.v	vs2, rs1, vd
vluxei16.v	vs2, rs1, vd	Spike ISS Implementation: // vlxei16.v and vlxseg[2-8]e16.v VI_LD_INDEX(e16, true);
vluxei256.v	vs2, rs1, vd
vluxei32.v	vs2, rs1, vd	Spike ISS Implementation: // vlxe32.v and vlxseg[2-8]ei32.v VI_LD_INDEX(e32, true);
vluxei512.v	vs2, rs1, vd
vluxei64.v	vs2, rs1, vd	Spike ISS Implementation: // vlxei64.v and vlxseg[2-8]ei64.v VI_LD_INDEX(e64, true);
vmv1r.v	vs2, vd	Spike ISS Implementation: // vmv1r.v vd, vs2 #include "vmvnfr_v.h"
vmv2r.v	vs2, vd	Spike ISS Implementation: // vmv2r.v vd, vs2 #include "vmvnfr_v.h"
vmv4r.v	vs2, vd	Spike ISS Implementation: // vmv4r.v vd, vs2 #include "vmvnfr_v.h"
vmv8r.v	vs2, vd	Spike ISS Implementation: // vmv8r.v vd, vs2 #include "vmvnfr_v.h"
vs1r.v	rs1, vs3	Spike ISS Implementation: // vs1r.v vs3, (rs1) VI_ST_WHOLE
vs2r.v	rs1, vs3	Spike ISS Implementation: // vs2r.v vs3, (rs1) VI_ST_WHOLE
vs4r.v	rs1, vs3	Spike ISS Implementation: // vs4r.v vs3, (rs1) VI_ST_WHOLE
vs8r.v	rs1, vs3	Spike ISS Implementation: // vs8r.v vs3, (rs1) VI_ST_WHOLE
vse1024.v	rs1, vs3
vse128.v	rs1, vs3
vse16.v	rs1, vs3	Spike ISS Implementation: // vse16.v and vsseg[2-8]e16.v VI_ST(0, (i * nf + fn), uint16, false);
vse256.v	rs1, vs3
vse32.v	rs1, vs3	Spike ISS Implementation: // vse32.v and vsseg[2-8]e32.v VI_ST(0, (i * nf + fn), uint32, false);
vse512.v	rs1, vs3
vse64.v	rs1, vs3	Spike ISS Implementation: // vse64.v and vsseg[2-8]e64.v VI_ST(0, (i * nf + fn), uint64, false);
vse8.v	rs1, vs3	Spike ISS Implementation: // vse8.v and vsseg[2-8]e8.v VI_ST(0, (i * nf + fn), uint8, false);
vsm.v	rs1, vs3	Spike ISS Implementation: // vse1.v VI_ST(0, (i * nf + fn), uint8, true);
vsoxei1024.v	vs2, rs1, vs3
vsoxei128.v	vs2, rs1, vs3
vsoxei16.v	vs2, rs1, vs3	Spike ISS Implementation: // vsxei16.v and vsxseg[2-8]ei16.v VI_ST_INDEX(e16, true);
vsoxei256.v	vs2, rs1, vs3
vsoxei32.v	vs2, rs1, vs3	Spike ISS Implementation: // vsxei32.v and vsxseg[2-8]ei32.v VI_ST_INDEX(e32, true);
vsoxei512.v	vs2, rs1, vs3
vsoxei64.v	vs2, rs1, vs3	Spike ISS Implementation: // vsxei64.v and vsxseg[2-8]ei64.v VI_ST_INDEX(e64, true);
vsoxei8.v	vs2, rs1, vs3	Spike ISS Implementation: // vsxei8.v and vsxseg[2-8]ei8.v VI_ST_INDEX(e8, true);
vsse1024.v	rs2, rs1, vs3
vsse128.v	rs2, rs1, vs3
vsse16.v	rs2, rs1, vs3	Spike ISS Implementation: // vsse16.v and vssseg[2-8]e16.v VI_ST(i * RS2, fn, uint16, false);
vsse256.v	rs2, rs1, vs3
vsse32.v	rs2, rs1, vs3	Spike ISS Implementation: // vsse32.v and vssseg[2-8]e32.v VI_ST(i * RS2, fn, uint32, false);
vsse512.v	rs2, rs1, vs3
vsse64.v	rs2, rs1, vs3	Spike ISS Implementation: // vsse64.v and vssseg[2-8]e64.v VI_ST(i * RS2, fn, uint64, false);
vsse8.v	rs2, rs1, vs3	Spike ISS Implementation: // vsse8.v and vssseg[2-8]e8.v VI_ST(i * RS2, fn, uint8, false);
vsuxei1024.v	vs2, rs1, vs3
vsuxei128.v	vs2, rs1, vs3
vsuxei16.v	vs2, rs1, vs3	Spike ISS Implementation: // vsuxe16.v VI_ST_INDEX(e16, true);
vsuxei256.v	vs2, rs1, vs3
vsuxei32.v	vs2, rs1, vs3	Spike ISS Implementation: // vsuxe32.v VI_ST_INDEX(e32, true);
vsuxei512.v	vs2, rs1, vs3
vsuxei64.v	vs2, rs1, vs3	Spike ISS Implementation: // vsuxe64.v VI_ST_INDEX(e64, true);
vsuxei8.v	vs2, rs1, vs3	Spike ISS Implementation: // vsuxe8.v VI_ST_INDEX(e8, true);
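
The unit-stride and strided rows above differ only in the first two macro arguments: VI_LD(0, (i * nf + fn), ...) versus VI_LD(i * RS2, fn, ...). A minimal sketch of the resulting element addresses follows, assuming the first argument is a byte offset added to the base address and the second is an element index scaled by the element size. That reading of the macro arguments is an assumption, not something the listing states, and unit_stride_addr and strided_addr are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unit-stride (and segment) access: field fn of element i,
 * with nf fields per element and eew_bytes bytes per field. */
uint64_t unit_stride_addr(uint64_t base, size_t i, size_t nf, size_t fn,
                          size_t eew_bytes) {
    assert(fn < nf);
    return base + (uint64_t)(i * nf + fn) * eew_bytes;
}

/* Strided access: rs2 holds a byte stride, which may be negative. */
uint64_t strided_addr(uint64_t base, size_t i, int64_t stride_bytes,
                      size_t fn, size_t eew_bytes) {
    /* Unsigned wrap-around makes a negative stride step backwards. */
    return base + (uint64_t)((int64_t)i * stride_bytes)
                + (uint64_t)fn * eew_bytes;
}
```

For a plain (non-segment) access nf is 1 and fn is 0, so the unit-stride address reduces to base + i * eew_bytes.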

v / _narrowing_floating_pointinteger_type_convert_instructions

Vector Floating-Point Instructions / 14.19. Narrowing Floating-Point/Integer Type-Convert Instructions

Operation	Arguments	Description
vfncvt.f.f.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.
vfncvt.f.x.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.
vfncvt.f.xu.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.
vfncvt.rod.f.f.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.
vfncvt.rtz.x.f.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.
vfncvt.rtz.xu.f.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.
vfncvt.x.f.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.
vfncvt.xu.f.w	vs2, vd	vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, # rounding towards odd.

v / _single_width_floating_pointinteger_type_convert_instructions

Vector Floating-Point Instructions / 14.17. Single-Width Floating-Point/Integer Type-Convert Instructions

Operation	Arguments	Description
vfcvt.f.x.v	vs2, vd	vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. vfcvt.rtz.xu.f.v vd, vs2, vm # Convert float to unsigned integer, truncating. vfcvt.rtz.x.f.v vd, vs2, vm # Convert float to signed integer, truncating. vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. vfcvt.f.x.v vd, vs2, vm # Convert signed integer to float.
vfcvt.f.xu.v	vs2, vd	vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. vfcvt.rtz.xu.f.v vd, vs2, vm # Convert float to unsigned integer, truncating. vfcvt.rtz.x.f.v vd, vs2, vm # Convert float to signed integer, truncating. vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. vfcvt.f.x.v vd, vs2, vm # Convert signed integer to float.
vfcvt.rtz.x.f.v	vs2, vd	vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. vfcvt.rtz.xu.f.v vd, vs2, vm # Convert float to unsigned integer, truncating. vfcvt.rtz.x.f.v vd, vs2, vm # Convert float to signed integer, truncating. vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. vfcvt.f.x.v vd, vs2, vm # Convert signed integer to float.
vfcvt.rtz.xu.f.v	vs2, vd	vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. vfcvt.rtz.xu.f.v vd, vs2, vm # Convert float to unsigned integer, truncating. vfcvt.rtz.x.f.v vd, vs2, vm # Convert float to signed integer, truncating. vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. vfcvt.f.x.v vd, vs2, vm # Convert signed integer to float.
vfcvt.x.f.v	vs2, vd	vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. vfcvt.rtz.xu.f.v vd, vs2, vm # Convert float to unsigned integer, truncating. vfcvt.rtz.x.f.v vd, vs2, vm # Convert float to signed integer, truncating. vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. vfcvt.f.x.v vd, vs2, vm # Convert signed integer to float.
vfcvt.xu.f.v	vs2, vd	vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. vfcvt.rtz.xu.f.v vd, vs2, vm # Convert float to unsigned integer, truncating. vfcvt.rtz.x.f.v vd, vs2, vm # Convert float to signed integer, truncating. vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. vfcvt.f.x.v vd, vs2, vm # Convert signed integer to float.
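
As an illustration of the "truncating" variants, the per-element behavior of vfcvt.rtz.x.f.v for 32-bit elements can be sketched in C. The out-of-range and NaN results below are an assumption carried over from the scalar float-to-integer conversion rules (saturate to the nearest representable integer, NaN gives the largest positive value); the listing above does not state them. model_cvt_rtz_x_f is a hypothetical name, and exception flags are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one float element to a signed 32-bit integer, rounding toward zero. */
int32_t model_cvt_rtz_x_f(float f) {
    if (f != f) return INT32_MAX;               /* NaN (assumed scalar rule) */
    if (f >= 2147483648.0f) return INT32_MAX;   /* too large: saturate */
    if (f < -2147483648.0f) return INT32_MIN;   /* too small: saturate */
    /* The value is now in range, so the cast is well defined. */
    assert(f >= -2147483648.0f && f < 2147483648.0f);
    return (int32_t)f;                          /* C casts truncate toward zero */
}
```

The non-rtz forms would instead round according to the dynamic rounding mode, which this sketch does not cover.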

v / _state_of_vector_extension_at_reset

Vector Extension Programmer’s Model / 4.11. State of Vector Extension at Reset

Operation	Arguments	Description
vsetvl	rs2, rs1, rd	The vector extension must have a consistent state at reset. In particular, vtype and vl must have values that can be read and then restored with a single vsetvl instruction.

Spike ISS Implementation:

require_vector_novtype(false);
WRITE_RD(P.VU.set_vl(insn.rd(), insn.rs1(), RS1, RS2));

v / _unit_stride_fault_only_first_loads

Vector Loads and Stores / 8.7. Unit-stride Fault-Only-First Loads

Operation	Arguments	Description
vle8ff.v	rs1, vd	Vector unit-stride fault-only-first loads:

    # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
    vle8ff.v  vd, (rs1), vm # 8-bit unit-stride fault-only-first load
    vle16ff.v vd, (rs1), vm # 16-bit unit-stride fault-only-first load
    vle32ff.v vd, (rs1), vm # 32-bit unit-stride fault-only-first load
    vle64ff.v vd, (rs1), vm # 64-bit unit-stride fault-only-first load

Spike ISS Implementation:

// vle8ff.v and vlseg[2-8]e8ff.v
VI_LDST_FF(int8);
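
The fault-only-first behavior is not spelled out in the listing above. As summarized from the vector specification section named in the heading, a fault on element 0 traps as usual, while a fault on a later element takes no trap and instead shortens vl to the index of that element. A minimal C sketch under those assumptions, with a bounds check standing in for a memory access fault and model_vle8ff as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loads up to vl bytes from mem[base...] into vd.
 * Returns the resulting vl, or -1 to represent a trap on element 0. */
long model_vle8ff(const uint8_t *mem, size_t mem_len, size_t base,
                  size_t vl, uint8_t *vd) {
    assert(mem != NULL && vd != NULL);
    for (size_t i = 0; i < vl; i++) {
        size_t addr = base + i;
        if (addr >= mem_len) {      /* stand-in for an access fault */
            if (i == 0) return -1;  /* first element: trap is taken */
            return (long)i;         /* later element: vl is trimmed */
        }
        vd[i] = mem[addr];
    }
    return (long)vl;
}
```

Software typically reads vl back after such a load to see how many elements were actually transferred.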

v / _vector_bitwise_logical_instructions

Vector Integer Arithmetic Instructions / 12.5. Vector Bitwise Logical Instructions

Operation	Arguments	Description
vand.vi	vs2, simm5, vd	Bitwise AND, vector-immediate
vand.vv	vs2, vs1, vd	Bitwise AND, vector-vector
vand.vx	vs2, rs1, vd	Bitwise AND, vector-scalar

# Bitwise logical operations.
vand.vv vd, vs2, vs1, vm # Vector-vector
vand.vx vd, vs2, rs1, vm # vector-scalar
vand.vi vd, vs2, imm, vm # vector-immediate

vor.vv vd, vs2, vs1, vm # Vector-vector
vor.vx vd, vs2, rs1, vm # vector-scalar
vor.vi vd, vs2, imm, vm # vector-immediate

vxor.vv vd, vs2, vs1, vm # Vector-vector
vxor.vx vd, vs2, rs1, vm # vector-scalar
vxor.vi vd, vs2, imm, vm # vector-immediate

v / _vector_compress_instruction

Vector Permutation Instructions / 17.5. Vector Compress Instruction

Operation

Arguments

Description

vcompress.vm

vs2, vs1, vd

vcompress is encoded as an unmasked instruction (vm=1). The equivalent masked instruction (vm=0) is reserved.

A trap on a vcompress instruction is always reported with a vstart of 0. Executing a vcompress instruction with a non-zero vstart raises an illegal instruction exception.

vcompress.vm vd, vs2, vs1 # Compress into vd elements of vs2 where vs1 is enabled

Example use of vcompress instruction

8 7 6 5 4 3 2 1 0   Element number

1 1 0 1 0 0 1 0 1   v0
8 7 6 5 4 3 2 1 0   v1
1 2 3 4 5 6 7 8 9   v2
                    vcompress.vm v2, v1, v0
1 2 3 4 8 7 5 2 0   v2
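The packing behaviour in the example above can be reproduced with a small scalar model. This is only a sketch: the function name vcompress_model and the one-byte-per-mask-bit layout are assumptions made for illustration, and destination elements past the packed results are simply left unchanged, as they are in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of: vcompress.vm vd, vs2, vs1
 * mask[i] stands in for bit i of vs1 (one byte per mask bit, for clarity).
 * Returns the number of elements packed into the low end of vd. */
size_t vcompress_model(uint8_t *vd, const uint8_t *vs2,
                       const uint8_t *mask, size_t vl)
{
    size_t out = 0;

    /* This sketch does not support a destination that aliases the source. */
    assert(vd != vs2);

    for (size_t i = 0; i < vl; ++i) {
        if (mask[i]) {
            vd[out++] = vs2[i]; /* enabled elements are packed contiguously */
        }
    }
    return out; /* vd[out..vl-1] is left unchanged by this model */
}
```

Feeding it the v0, v1 and v2 values from the example (element 0 first) leaves v2 holding 0, 2, 5, 7, 8 in its five lowest elements, followed by the original 4, 3, 2, 1.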

v / _vector_count_population_in_mask_vcpop_m

Vector Mask Instructions / 16.2. Vector count population in mask vcpop.m

Operation

Arguments

Description

vcpop.m

vs2, rd

The vcpop.m instruction counts the number of mask elements of the active elements of the vector source mask register that have the value 1 and writes the result to a scalar x register.

The vcpop.m instruction writes x[rd] even if vl=0 (with the value 0, since no mask elements are active).

Traps on vcpop.m are always reported with a vstart of 0. The vcpop.m instruction will raise an illegal instruction exception if vstart is non-zero.

vcpop.m rd, vs2, vm

vcpop.m rd, vs2, v0.t # x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] )
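As a rough illustration of the sum written next to the masked form, the following sketch counts set mask bits in plain C. The names mask_bit and vcpop_model are hypothetical, the mask registers are modelled as packed byte arrays (bit i of the array is mask element i), and passing NULL for v0 is this sketch's way of expressing the unmasked form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read mask element i from a packed mask image (hypothetical helper). */
static bool mask_bit(const uint8_t *reg, size_t i)
{
    return (reg[i / 8] >> (i % 8)) & 1u;
}

/* Hypothetical model of: vcpop.m rd, vs2, vm
 * v0 == NULL models the unmasked form; otherwise v0 is the mask register. */
uint64_t vcpop_model(const uint8_t *vs2, const uint8_t *v0, size_t vl)
{
    uint64_t count = 0;

    assert(vs2 != NULL);

    for (size_t i = 0; i < vl; ++i) {
        bool active = (v0 == NULL) || mask_bit(v0, i);
        count += (active && mask_bit(vs2, i)) ? 1u : 0u;
    }
    return count; /* 0 when vl == 0, matching the note about x[rd] */
}
```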

Spike ISS Implementation:

// vcpop.m rd, vs2, vm (formerly vmpopc)
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
reg_t popcount = 0;
for (reg_t i=P.VU.vstart->read(); i<vl; ++i) {
  const int midx = i / 32;
  const int mpos = i % 32;

  bool vs2_lsb = ((P.VU.elt<uint32_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
  if (insn.v_vm() == 1) {
    popcount += vs2_lsb;
  } else {
    bool do_mask = (P.VU.elt<uint32_t>(0, midx) >> mpos) & 0x1;
    popcount += (vs2_lsb && do_mask);
  }
}
P.VU.vstart->write(0);
WRITE_RD(popcount);

v / _vector_element_index_instruction

Vector Mask Instructions / 16.9. Vector Element Index Instruction

Operation

Arguments

Description

vid.v

The vid.v instruction writes each element's index to the destination vector register group, from 0 to vl-1.

vid.v vd, vm # Write element ID to destination.

Spike ISS Implementation:

// vid.v vd, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t sew = P.VU.vsew;
reg_t rd_num = insn.rd();
require_align(rd_num, P.VU.vflmul);
require_vm;

for (reg_t i = P.VU.vstart->read() ; i < P.VU.vl->read(); ++i) {
  VI_LOOP_ELEMENT_SKIP();

  switch (sew) {
  case e8:
    P.VU.elt<uint8_t>(rd_num, i, true) = i;
    break;
  case e16:
    P.VU.elt<uint16_t>(rd_num, i, true) = i;
    break;
  case e32:
    P.VU.elt<uint32_t>(rd_num, i, true) = i;
    break;
  default:
    P.VU.elt<uint64_t>(rd_num, i, true) = i;
    break;
  }
}

P.VU.vstart->write(0);

v / _vector_floating_point_classify_instruction

Vector Floating-Point Instructions / 14.14. Vector Floating-Point Classify Instruction

Operation

Arguments

Description

vfclass.v

vs2, vd

vfclass.v vd, vs2, vm # Vector-vector

Spike ISS Implementation:

// vfclass.v vd, vs2, vm
VI_VFP_V_LOOP
({
  vd = f16(f16_classify(vs2));
},
{
  vd = f32(f32_classify(vs2));
},
{
  vd = f64(f64_classify(vs2));
})

v / _vector_floating_point_compare_instructions

Vector Floating-Point Instructions / 14.13. Vector Floating-Point Compare Instructions

Operation	Arguments	Description
vmfeq.vf	vs2, rs1, vd	Compare equal, vector-scalar
vmfeq.vv	vs2, vs1, vd	Compare equal, vector-vector
vmflt.vf	vs2, rs1, vd	Compare less than, vector-scalar
vmflt.vv	vs2, vs1, vd	Compare less than, vector-vector

The compare instructions follow the semantics of the scalar floating-point compare instructions. vmfeq and vmfne raise the invalid operation exception only on signaling NaN inputs. vmflt, vmfle, vmfgt, and vmfge raise the invalid operation exception on both signaling and quiet NaN inputs. vmfne writes 1 to the destination element when either operand is NaN, whereas the other compares write 0 when either operand is NaN.

# Compare equal
vmfeq.vv vd, vs2, vs1, vm # Vector-vector
vmfeq.vf vd, vs2, rs1, vm # vector-scalar

# Compare not equal
vmfne.vv vd, vs2, vs1, vm # Vector-vector
vmfne.vf vd, vs2, rs1, vm # vector-scalar

# Compare less than
vmflt.vv vd, vs2, vs1, vm # Vector-vector
vmflt.vf vd, vs2, rs1, vm # vector-scalar

# Compare less than or equal
vmfle.vv vd, vs2, vs1, vm # Vector-vector
vmfle.vf vd, vs2, rs1, vm # vector-scalar

# Compare greater than
vmfgt.vf vd, vs2, rs1, vm # vector-scalar

# Compare greater than or equal
vmfge.vf vd, vs2, rs1, vm # vector-scalar

# Example of implementing isgreater()
vmfeq.vv v0, va, va # Only set where A is not NaN.
vmfeq.vv v1, vb, vb # Only set where B is not NaN.
vmand.mm v0, v0, v1 # Only set where A and B are ordered,
vmfgt.vv v0, va, vb, v0.t # so only set flags on ordered values.

Comparison	Assembler Mapping	Assembler pseudoinstruction
va < vb	vmflt.vv vd, va, vb, vm
va <= vb	vmfle.vv vd, va, vb, vm
va > vb	vmflt.vv vd, vb, va, vm	vmfgt.vv vd, va, vb, vm
va >= vb	vmfle.vv vd, vb, va, vm	vmfge.vv vd, va, vb, vm
va < f	vmflt.vf vd, va, f, vm
va <= f	vmfle.vf vd, va, f, vm
va > f	vmfgt.vf vd, va, f, vm
va >= f	vmfge.vf vd, va, f, vm

va, vb vector register groups
f scalar floating-point register
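The isgreater() sequence relies on an element comparing equal to itself only when it is not NaN. A per-element sketch in C, under the assumptions that one bool stands for one mask bit, that the mask policy is mask-undisturbed, and that exception flags are not modelled (the function name isgreater_model is hypothetical):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the four-instruction isgreater() sequence.
 * mask[i] plays the role of v0.mask[i]; flags are not modelled. */
void isgreater_model(bool *mask, const float *va, const float *vb, size_t vl)
{
    assert(mask != NULL && va != NULL && vb != NULL);

    for (size_t i = 0; i < vl; ++i) {
        bool a_ordered = (va[i] == va[i]); /* vmfeq.vv v0, va, va */
        bool b_ordered = (vb[i] == vb[i]); /* vmfeq.vv v1, vb, vb */
        mask[i] = a_ordered && b_ordered;  /* vmand.mm v0, v0, v1 */

        /* vmfgt.vv v0, va, vb, v0.t: only active elements are compared,
         * so a NaN operand never reaches the signaling comparison. */
        if (mask[i]) {
            mask[i] = va[i] > vb[i];
        }
    }
}
```

Elements where either input is NaN end up 0 without the greater-than comparison ever being evaluated on them, which is the point of the masked final step.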

v / _vector_floating_point_merge_instruction

Vector Floating-Point Instructions / 14.15. Vector Floating-Point Merge Instruction

Operation

Arguments

Description

vfmerge.vfm

vs2, rs1, vd

The vfmerge.vfm instruction is encoded as a masked instruction (vm=0). At elements where the mask value is zero, the first vector operand is copied to the destination element, otherwise a scalar floating-point register value is copied to the destination element.

vfmerge.vfm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? f[rs1] : vs2[i]
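The per-element selection in the comment above maps directly onto a loop. A minimal sketch, assuming 32-bit floats and one bool per mask bit; vfmerge_model and its parameter names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of: vfmerge.vfm vd, vs2, rs1, v0
 * vd[i] = v0.mask[i] ? f[rs1] : vs2[i] */
void vfmerge_model(float *vd, const float *vs2, float f_rs1,
                   const bool *v0_mask, size_t vl)
{
    assert(vd != NULL && vs2 != NULL && v0_mask != NULL);

    for (size_t i = 0; i < vl; ++i) {
        /* mask bit clear: copy the vector operand; set: copy the scalar */
        vd[i] = v0_mask[i] ? f_rs1 : vs2[i];
    }
}
```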

v / _vector_floating_point_minmax_instructions

Vector Floating-Point Instructions / 14.11. Vector Floating-Point MIN/MAX Instructions

Operation

Arguments

Description

vfmin.vf

vs2, rs1, vd

The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.

# Floating-point minimum
vfmin.vv vd, vs2, vs1, vm # Vector-vector
vfmin.vf vd, vs2, rs1, vm # vector-scalar

# Floating-point maximum
vfmax.vv vd, vs2, vs1, vm # Vector-vector
vfmax.vf vd, vs2, rs1, vm # vector-scalar

vfmin.vv

vs2, vs1, vd

The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.

v / _vector_floating_point_move_instruction

Vector Floating-Point Instructions / 14.16. Vector Floating-Point Move Instruction

Operation	Arguments	Description
vfmv.f.s	vs2, rd	vfmv.f.s rd, vs2 # f[rd] = vs2[0] (rs1=0)
vfmv.s.f	rs1, vd	vfmv.s.f vd, rs1 # vd[0] = f[rs1] (vs2=0)
vfmv.v.f	rs1, vd	vfmv.v.f vd, rs1 # vd[i] = f[rs1]

v / _vector_floating_point_reciprocal_estimate_instruction

Vector Floating-Point Instructions / 14.10. Vector Floating-Point Reciprocal Estimate Instruction

Operation

Arguments

Description

vfrec7.v

vs2, vd

Table 17. vfrec7.v common-case lookup table contents

# Floating-point reciprocal estimate to 7 bits.
vfrec7.v vd, vs2, vm

Spike ISS Implementation:

// vfrec7.v vd, vs2, vm
VI_VFP_V_LOOP
({
  vd = f16_recip7(vs2);
},
{
  vd = f32_recip7(vs2);
},
{
  vd = f64_recip7(vs2);
})

v / _vector_floating_point_reciprocal_square_root_estimate_instruction

Vector Floating-Point Instructions / 14.9. Vector Floating-Point Reciprocal Square-Root Estimate Instruction

Operation

Arguments

Description

vfrsqrt7.v

vs2, vd

Table 16. vfrsqrt7.v common-case lookup table contents

# Floating-point reciprocal square-root estimate to 7 bits.
vfrsqrt7.v vd, vs2, vm

Spike ISS Implementation:

// vfrsqrt7.v vd, vs2, vm
VI_VFP_V_LOOP
({
  vd = f16_rsqrte7(vs2);
},
{
  vd = f32_rsqrte7(vs2);
},
{
  vd = f64_rsqrte7(vs2);
})

v / _vector_floating_point_sign_injection_instructions

Vector Floating-Point Instructions / 14.12. Vector Floating-Point Sign-Injection Instructions

Operation	Arguments	Description
vfsgnj.vf	vs2, rs1, vd	Sign injection, vector-scalar
vfsgnj.vv	vs2, vs1, vd	Sign injection, vector-vector

vfsgnj.vv vd, vs2, vs1, vm # Vector-vector
vfsgnj.vf vd, vs2, rs1, vm # vector-scalar

vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar

vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar

v / _vector_floating_point_square_root_instruction

Vector Floating-Point Instructions / 14.8. Vector Floating-Point Square-Root Instruction

Operation

Arguments

Description

vfsqrt.v

vs2, vd

# Floating-point square root
vfsqrt.v vd, vs2, vm # Vector-vector square root

Spike ISS Implementation:

// vfsqrt.v vd, vs2, vm
VI_VFP_V_LOOP
({
  vd = f16_sqrt(vs2);
},
{
  vd = f32_sqrt(vs2);
},
{
  vd = f64_sqrt(vs2);
})

v / _vector_indexed_instructions

Vector Loads and Stores / 8.6. Vector Indexed Instructions

Operation

Arguments

Description

vluxei8.v

vs2, rs1, vd

# Vector indexed loads and stores

# Vector indexed-unordered load instructions
# vd destination, rs1 base address, vs2 byte offsets
vluxei8.v vd, (rs1), vs2, vm # unordered 8-bit indexed load of SEW data
vluxei16.v vd, (rs1), vs2, vm # unordered 16-bit indexed load of SEW data
vluxei32.v vd, (rs1), vs2, vm # unordered 32-bit indexed load of SEW data
vluxei64.v vd, (rs1), vs2, vm # unordered 64-bit indexed load of SEW data

# Vector indexed-ordered load instructions
# vd destination, rs1 base address, vs2 byte offsets
vloxei8.v vd, (rs1), vs2, vm # ordered 8-bit indexed load of SEW data
vloxei16.v vd, (rs1), vs2, vm # ordered 16-bit indexed load of SEW data
vloxei32.v vd, (rs1), vs2, vm # ordered 32-bit indexed load of SEW data
vloxei64.v vd, (rs1), vs2, vm # ordered 64-bit indexed load of SEW data

# Vector indexed-unordered store instructions
# vs3 store data, rs1 base address, vs2 byte offsets
vsuxei8.v vs3, (rs1), vs2, vm # unordered 8-bit indexed store of SEW data
vsuxei16.v vs3, (rs1), vs2, vm # unordered 16-bit indexed store of SEW data
vsuxei32.v vs3, (rs1), vs2, vm # unordered 32-bit indexed store of SEW data
vsuxei64.v vs3, (rs1), vs2, vm # unordered 64-bit indexed store of SEW data

# Vector indexed-ordered store instructions
# vs3 store data, rs1 base address, vs2 byte offsets
vsoxei8.v vs3, (rs1), vs2, vm # ordered 8-bit indexed store of SEW data
vsoxei16.v vs3, (rs1), vs2, vm # ordered 16-bit indexed store of SEW data
vsoxei32.v vs3, (rs1), vs2, vm # ordered 32-bit indexed store of SEW data
vsoxei64.v vs3, (rs1), vs2, vm # ordered 64-bit indexed store of SEW data
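One detail in the listing that is easy to miss is that vs2 holds byte offsets, not element indices, and that the width in the mnemonic is the index width while the data width is SEW. A sketch of vluxei8.v with SEW assumed to be 32 bits; the function name and the flat-memory model are illustrative assumptions, and the unordered access order is not modelled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of: vluxei8.v vd, (rs1), vs2   with SEW=32
 * vs2_offsets[i] is an unsigned 8-bit byte offset from the base address. */
void vluxei8_sew32_model(uint32_t *vd, const uint8_t *base,
                         const uint8_t *vs2_offsets, size_t vl)
{
    assert(vd != NULL && base != NULL && vs2_offsets != NULL);

    for (size_t i = 0; i < vl; ++i) {
        /* The offset is in bytes, so element k of a 32-bit array is at 4*k. */
        memcpy(&vd[i], base + vs2_offsets[i], sizeof vd[i]);
    }
}
```

To gather 32-bit elements 3, 0 and 2 of an array, the offsets passed in would therefore be 12, 0 and 8.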

Spike ISS Implementation:

// vluxei8.v and vluxseg[2-8]ei8.v
VI_LD_INDEX(e8, true);

v / _vector_instruction_formats

Vector Instruction Formats /

Operation

Arguments

Description

vsetivli

zimm10, zimm, rd

Encoding of vsetivli, most significant bits first:

Bits	Field	Value
31	(fixed)	1
30	(fixed)	1
29:20	zimm[9:0]
19:15	uimm[4:0]
14:12	(fixed)	7
11:7	rd
6:0	opcode	0x57
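The field layout above can be checked by packing the fields into a 32-bit word by hand. The helper below is a sketch only; encode_vsetivli and its parameter names are hypothetical and not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the vsetivli fields into an instruction word.
 * Field positions follow the encoding shown above (least significant first:
 * 7-bit opcode, 5-bit rd, 3-bit fixed 7, 5-bit uimm, 10-bit zimm, two 1 bits). */
uint32_t encode_vsetivli(uint32_t rd, uint32_t uimm5, uint32_t zimm10)
{
    assert(rd < 32 && uimm5 < 32 && zimm10 < 1024);

    return 0x57u             /* bits 6:0   opcode           */
         | (rd << 7)         /* bits 11:7  rd               */
         | (7u << 12)        /* bits 14:12 fixed value 7    */
         | (uimm5 << 15)     /* bits 19:15 uimm[4:0]        */
         | (zimm10 << 20)    /* bits 29:20 zimm[9:0]        */
         | (3u << 30);       /* bits 31:30 both fixed at 1  */
}
```

For example, encode_vsetivli(0, 4, 0xD0) yields 0xCD027057. Assuming the usual vtype layout (which this section does not spell out), 0xD0 selects e32, m1, ta, ma, so this is the word for vsetivli zero, 4, e32, m1, ta, ma.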

Spike ISS Implementation:

require_vector_novtype(false);
WRITE_RD(P.VU.set_vl(insn.rd(), -1, insn.rs1(), insn.v_zimm10()));

v / _vector_instruction_listing

Vector Instruction Listing /

Operation	Arguments	Description
vaadd.vv	vs2, vs1, vd	vaadd
vaadd.vx	vs2, rs1, vd	vaadd
vasub.vv	vs2, vs1, vd	vasub
vasub.vx	vs2, rs1, vd	vasub
vasubu.vv	vs2, vs1, vd	vasubu
vasubu.vx	vs2, rs1, vd	vasubu
vdiv.vv	vs2, vs1, vd	vdiv
vdiv.vx	vs2, rs1, vd	vdiv
vfdiv.vf	vs2, rs1, vd	vfdiv
vfdiv.vv	vs2, vs1, vd	vfdiv
vfmadd.vf	vs2, rs1, vd	vfmadd
vfmadd.vv	vs2, vs1, vd	vfmadd
vfmax.vf	vs2, rs1, vd	vfmax
vfmax.vv	vs2, vs1, vd	vfmax
vfmsac.vf	vs2, rs1, vd	vfmsac
vfmsac.vv	vs2, vs1, vd	vfmsac
vfmsub.vf	vs2, rs1, vd	vfmsub
vfmsub.vv	vs2, vs1, vd	vfmsub
vfnmacc.vf	vs2, rs1, vd	vfnmacc
vfnmacc.vv	vs2, vs1, vd	vfnmacc
vfnmadd.vf	vs2, rs1, vd	vfnmadd
vfnmadd.vv	vs2, vs1, vd	vfnmadd
vfnmsac.vf	vs2, rs1, vd	vfnmsac
vfnmsac.vv	vs2, vs1, vd	vfnmsac
vfnmsub.vf	vs2, rs1, vd	vfnmsub
vfnmsub.vv	vs2, vs1, vd	vfnmsub
vfrdiv.vf	vs2, rs1, vd	vfrdiv
vfredmax.vs	vs2, vs1, vd	vfredmax
vfredmin.vs	vs2, vs1, vd	vfredmin
vfrsub.vf	vs2, rs1, vd	vfrsub
vfsgnjn.vf	vs2, rs1, vd	vfsgnjn
vfsgnjn.vv	vs2, vs1, vd	vfsgnjn
vfsgnjx.vf	vs2, rs1, vd	vfsgnjx
vfsgnjx.vv	vs2, vs1, vd	vfsgnjx
vfsub.vf	vs2, rs1, vd	vfsub
vfsub.vv	vs2, vs1, vd	vfsub
vfwmsac.vf	vs2, rs1, vd	vfwmsac
vfwmsac.vv	vs2, vs1, vd	vfwmsac
vfwnmacc.vf	vs2, rs1, vd	vfwnmacc
vfwnmacc.vv	vs2, vs1, vd	vfwnmacc
vfwnmsac.vf	vs2, rs1, vd	vfwnmsac
vfwnmsac.vv	vs2, vs1, vd	vfwnmsac
vfwredusum.vs	vs2, vs1, vd	vfwredusum
vfwsub.vf	vs2, rs1, vd	vfwsub vfwsub.w
vfwsub.vv	vs2, vs1, vd	vfwsub vfwsub.w
vfwsub.wf	vs2, rs1, vd	vfwsub vfwsub.w
vfwsub.wv	vs2, vs1, vd	vfwsub vfwsub.w
vmadd.vv	vs2, vs1, vd	vmadd
vmadd.vx	vs2, rs1, vd	vmadd
vmax.vv	vs2, vs1, vd	vmax
vmax.vx	vs2, rs1, vd	vmax
vmaxu.vv	vs2, vs1, vd	vmaxu
vmaxu.vx	vs2, rs1, vd	vmaxu
vmfge.vf	vs2, rs1, vd	vmfge
vmfgt.vf	vs2, rs1, vd	vmfgt
vmfle.vf	vs2, rs1, vd	vmfle
vmfle.vv	vs2, vs1, vd	vmfle
vmfne.vf	vs2, rs1, vd	vmfne
vmfne.vv	vs2, vs1, vd	vmfne
vmin.vv	vs2, vs1, vd	vmin
vmin.vx	vs2, rs1, vd	vmin
vmor.mm	vs2, vs1, vd	vmor
vmsgtu.vi	vs2, simm5, vd	vmsgtu
vmsgtu.vx	vs2, rs1, vd	vmsgtu
vmsle.vi	vs2, simm5, vd	vmsle
vmsle.vv	vs2, vs1, vd	vmsle
vmsle.vx	vs2, rs1, vd	vmsle
vmsleu.vi	vs2, simm5, vd	vmsleu
vmsleu.vv	vs2, vs1, vd	vmsleu
vmsleu.vx	vs2, rs1, vd	vmsleu
vmsltu.vv	vs2, vs1, vd	vmsltu
vmsltu.vx	vs2, rs1, vd	vmsltu
vmsne.vi	vs2, simm5, vd	vmsne
vmsne.vv	vs2, vs1, vd	vmsne
vmsne.vx	vs2, rs1, vd	vmsne
vmulhsu.vv	vs2, vs1, vd	vmulhsu
vmulhsu.vx	vs2, rs1, vd	vmulhsu
vmulhu.vv	vs2, vs1, vd	vmulhu
vmulhu.vx	vs2, rs1, vd	vmulhu
vnmsac.vv	vs2, vs1, vd	vnmsac
vnmsac.vx	vs2, rs1, vd	vnmsac
vnmsub.vv	vs2, vs1, vd	vnmsub
vnmsub.vx	vs2, rs1, vd	vnmsub
vor.vi	vs2, simm5, vd	vor
vor.vv	vs2, vs1, vd	vor
vor.vx	vs2, rs1, vd	vor
vredand.vs	vs2, vs1, vd	vredand
vredmax.vs	vs2, vs1, vd	vredmax
vredmaxu.vs	vs2, vs1, vd	vredmaxu
vredmin.vs	vs2, vs1, vd	vredmin
vredminu.vs	vs2, vs1, vd	vredminu
vredor.vs	vs2, vs1, vd	vredor
vredxor.vs	vs2, vs1, vd	vredxor
vrem.vv	vs2, vs1, vd	vrem
vrem.vx	vs2, rs1, vd	vrem
vremu.vv	vs2, vs1, vd	vremu
vremu.vx	vs2, rs1, vd	vremu
vrgatherei16.vv	vs2, vs1, vd	vrgatherei16
vrsub.vi	vs2, simm5, vd	vrsub
vrsub.vx	vs2, rs1, vd	vrsub
vsadd.vi	vs2, simm5, vd	vsadd
vsadd.vv	vs2, vs1, vd	vsadd
vsadd.vx	vs2, rs1, vd	vsadd
vsext.vf2	vs2, vd	vsext.vf8 vsext.vf4 vsext.vf2
vsext.vf4	vs2, vd	vsext.vf8 vsext.vf4 vsext.vf2
vsext.vf8	vs2, vd	vsext.vf8 vsext.vf4 vsext.vf2
vsra.vi	vs2, simm5, vd	vsra
vsra.vv	vs2, vs1, vd	vsra
vsra.vx	vs2, rs1, vd	vsra
vsrl.vi	vs2, simm5, vd	vsrl
vsrl.vv	vs2, vs1, vd	vsrl
vsrl.vx	vs2, rs1, vd	vsrl
vssra.vi	vs2, simm5, vd	vssra
vssra.vv	vs2, vs1, vd	vssra
vssra.vx	vs2, rs1, vd	vssra
vssub.vv	vs2, vs1, vd	vssub
vssub.vx	vs2, rs1, vd	vssub
vssubu.vv	vs2, vs1, vd	vssubu
vssubu.vx	vs2, rs1, vd	vssubu
vsub.vv	vs2, vs1, vd	vsub
vsub.vx	vs2, rs1, vd	vsub
vwadd.vv	vs2, vs1, vd	vwadd vwadd.w
vwadd.vx	vs2, rs1, vd	vwadd vwadd.w
vwadd.wv	vs2, vs1, vd	vwadd vwadd.w
vwadd.wx	vs2, rs1, vd	vwadd vwadd.w
vwmacc.vv	vs2, vs1, vd	vwmacc
vwmacc.vx	vs2, rs1, vd	vwmacc
vwmaccsu.vv	vs2, vs1, vd	vwmaccsu
vwmaccsu.vx	vs2, rs1, vd	vwmaccsu
vwmaccus.vx	vs2, rs1, vd	vwmaccus
vwmulsu.vv	vs2, vs1, vd	vwmulsu
vwmulsu.vx	vs2, rs1, vd	vwmulsu
vwmulu.vv	vs2, vs1, vd	vwmulu
vwmulu.vx	vs2, rs1, vd	vwmulu
vwsub.vv	vs2, vs1, vd	vwsub vwsub.w
vwsub.vx	vs2, rs1, vd	vwsub vwsub.w
vwsub.wv	vs2, vs1, vd	vwsub vwsub.w
vwsub.wx	vs2, rs1, vd	vwsub vwsub.w
vwsubu.vv	vs2, vs1, vd	vwsubu vwsubu.w
vwsubu.vx	vs2, rs1, vd	vwsubu vwsubu.w
vwsubu.wv	vs2, vs1, vd	vwsubu vwsubu.w
vwsubu.wx	vs2, rs1, vd	vwsubu vwsubu.w
vxor.vi	vs2, simm5, vd	vxor
vxor.vv	vs2, vs1, vd	vxor
vxor.vx	vs2, rs1, vd	vxor

v / _vector_integer_add_with_carry_subtract_with_borrow_instructions

Vector Integer Arithmetic Instructions / 12.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions

Operation	Arguments	Description
vadc.vim	vs2, simm5, vd	Sum with carry, vector-immediate
vadc.vvm	vs2, vs1, vd	Sum with carry, vector-vector
vadc.vxm	vs2, rs1, vd	Sum with carry, vector-scalar
vmadc.vi	vs2, simm5, vd	Carry out, vector-immediate, no carry-in
vmadc.vim	vs2, simm5, vd	Carry out, vector-immediate
vmadc.vv	vs2, vs1, vd	Carry out, vector-vector, no carry-in
vmadc.vvm	vs2, vs1, vd	Carry out, vector-vector
vmadc.vx	vs2, rs1, vd	Carry out, vector-scalar, no carry-in
vmadc.vxm	vs2, rs1, vd	Carry out, vector-scalar
vmsbc.vv	vs2, vs1, vd	Borrow out, vector-vector, no borrow-in
vmsbc.vvm	vs2, vs1, vd	Borrow out, vector-vector
vmsbc.vx	vs2, rs1, vd	Borrow out, vector-scalar, no borrow-in
vmsbc.vxm	vs2, rs1, vd	Borrow out, vector-scalar
vsbc.vvm	vs2, vs1, vd	Difference with borrow, vector-vector
vsbc.vxm	vs2, rs1, vd	Difference with borrow, vector-scalar

vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. These instructions are encoded as masked instructions (vm=0), but they operate on and write back all body elements. Encodings corresponding to the unmasked versions (vm=1) are reserved. For vadc and vsbc, the instruction encoding is reserved if the destination vector register is v0.

vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.

# Produce sum with carry.

# vd[i] = vs2[i] + vs1[i] + v0.mask[i]
vadc.vvm vd, vs2, vs1, v0 # Vector-vector

# vd[i] = vs2[i] + x[rs1] + v0.mask[i]
vadc.vxm vd, vs2, rs1, v0 # Vector-scalar

# vd[i] = vs2[i] + imm + v0.mask[i]
vadc.vim vd, vs2, imm, v0 # Vector-immediate

# Produce carry out in mask register format

# vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
vmadc.vvm vd, vs2, vs1, v0 # Vector-vector

# vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar

# vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
vmadc.vim vd, vs2, imm, v0 # Vector-immediate

# vd.mask[i] = carry_out(vs2[i] + vs1[i])
vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in

# vd.mask[i] = carry_out(vs2[i] + x[rs1])
vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in

# vd.mask[i] = carry_out(vs2[i] + imm)
vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in

# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0 # Calc new sum
vmmv.m v0, v1 # Move temp carry into v0 for next word

The subtract with borrow instruction vsbc performs the equivalent function to support long word arithmetic for subtraction. There are no subtract with immediate instructions.

# Produce difference with borrow.

# vd[i] = vs2[i] - vs1[i] - v0.mask[i]
vsbc.vvm vd, vs2, vs1, v0 # Vector-vector

# vd[i] = vs2[i] - x[rs1] - v0.mask[i]
vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar

# Produce borrow out in mask register format

# vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector

# vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar

# vd.mask[i] = borrow_out(vs2[i] - vs1[i])
vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in

# vd.mask[i] = borrow_out(vs2[i] - x[rs1])
vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in

For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative.
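The multi-word sequence (vmadc.vvm, vadc.vvm, vmmv.m) can be followed in a scalar model where each vector element carries its own carry bit from one word to the next. This is a sketch under assumptions: SEW is taken as 32 bits, VL is fixed at 4, and the names add_word_step and multiword_add are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VL 4 /* elements per vector in this sketch (assumed) */

/* One word of the sequence, per element i:
 *   vmadc.vvm v1, v4, v8, v0   carry out into a temporary
 *   vadc.vvm  v4, v4, v8, v0   new sum
 *   vmmv.m    v0, v1           temporary becomes the next carry-in */
void add_word_step(uint32_t acc[VL], const uint32_t addend[VL], bool carry[VL])
{
    for (size_t i = 0; i < VL; ++i) {
        uint64_t wide = (uint64_t)acc[i] + addend[i] + (carry[i] ? 1u : 0u);
        bool carry_out = (wide >> 32) != 0; /* vmadc.vvm */
        acc[i] = (uint32_t)wide;            /* vadc.vvm  */
        carry[i] = carry_out;               /* vmmv.m    */
    }
}

/* Add multi-word numbers stored one word per vector, least significant
 * word first. The carry mask starts cleared, as v0 would be before word 0. */
void multiword_add(uint32_t acc[][VL], uint32_t addend[][VL], size_t nwords)
{
    bool carry[VL] = { false };

    assert(nwords > 0);

    for (size_t w = 0; w < nwords; ++w) {
        add_word_step(acc[w], addend[w], carry);
    }
}
```

Computing the carry before overwriting the accumulator mirrors why the assembly sequence issues vmadc before vadc: both need the original v4.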

v / _vector_integer_compare_instructions

Vector Integer Arithmetic Instructions / 12.8. Vector Integer Compare Instructions

Operation	Arguments	Description
vmseq.vi	vs2, simm5, vd	Set if equal, vector-immediate
vmseq.vv	vs2, vs1, vd	Set if equal, vector-vector
vmseq.vx	vs2, rs1, vd	Set if equal, vector-scalar
vmsgt.vi	vs2, simm5, vd	Set if greater than, signed, vector-immediate
vmsgt.vx	vs2, rs1, vd	Set if greater than, signed, vector-scalar
vmslt.vv	vs2, vs1, vd	Set if less than, signed, vector-vector

# Set if equal
vmseq.vv vd, vs2, vs1, vm # Vector-vector
vmseq.vx vd, vs2, rs1, vm # vector-scalar
vmseq.vi vd, vs2, imm, vm # vector-immediate

# Set if not equal
vmsne.vv vd, vs2, vs1, vm # Vector-vector
vmsne.vx vd, vs2, rs1, vm # vector-scalar
vmsne.vi vd, vs2, imm, vm # vector-immediate

# Set if less than, unsigned
vmsltu.vv vd, vs2, vs1, vm # Vector-vector
vmsltu.vx vd, vs2, rs1, vm # Vector-scalar

# Set if less than, signed
vmslt.vv vd, vs2, vs1, vm # Vector-vector
vmslt.vx vd, vs2, rs1, vm # vector-scalar

# Set if less than or equal, unsigned
vmsleu.vv vd, vs2, vs1, vm # Vector-vector
vmsleu.vx vd, vs2, rs1, vm # vector-scalar
vmsleu.vi vd, vs2, imm, vm # Vector-immediate

# Set if less than or equal, signed
vmsle.vv vd, vs2, vs1, vm # Vector-vector
vmsle.vx vd, vs2, rs1, vm # vector-scalar
vmsle.vi vd, vs2, imm, vm # vector-immediate

# Set if greater than, unsigned
vmsgtu.vx vd, vs2, rs1, vm # Vector-scalar
vmsgtu.vi vd, vs2, imm, vm # Vector-immediate

# Set if greater than, signed
vmsgt.vx vd, vs2, rs1, vm # Vector-scalar
vmsgt.vi vd, vs2, imm, vm # Vector-immediate

# Following two instructions are not provided directly
# Set if greater than or equal, unsigned
# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if greater than or equal, signed
# vmsge.vx vd, vs2, rs1, vm # Vector-scalar

Similarly, vmsge{u}.vi is not provided and the compare is implemented using vmsgt{u}.vi with the immediate decremented by one. The resulting effective vmsge.vi range is -15 to 16, and the resulting effective vmsgeu.vi range is 1 to 16 (Note, vmsgeu.vi with immediate 0 is not useful as it is always true).

The vmsge{u}.vx operation can be synthesized by reducing the value of x by 1 and using the vmsgt{u}.vx instruction, when it is known that this will not underflow the representation in x.

Sequences to synthesize `vmsge{u}.vx` instruction

va >= x, x > minimum
addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm

Comparison	Assembler Mapping	Assembler Pseudoinstruction
va < vb	vmslt{u}.vv vd, va, vb, vm
va <= vb	vmsle{u}.vv vd, va, vb, vm
va > vb	vmslt{u}.vv vd, vb, va, vm	vmsgt{u}.vv vd, va, vb, vm
va >= vb	vmsle{u}.vv vd, vb, va, vm	vmsge{u}.vv vd, va, vb, vm
va < x	vmslt{u}.vx vd, va, x, vm
va <= x	vmsle{u}.vx vd, va, x, vm
va > x	vmsgt{u}.vx vd, va, x, vm
va >= x	see below
va < i	vmsle{u}.vi vd, va, i-1, vm	vmslt{u}.vi vd, va, i, vm
va <= i	vmsle{u}.vi vd, va, i, vm
va > i	vmsgt{u}.vi vd, va, i, vm
va >= i	vmsgt{u}.vi vd, va, i-1, vm	vmsge{u}.vi vd, va, i, vm

va, vb vector register groups
x scalar integer register
i immediate

unmasked va >= x
pseudoinstruction: vmsge{u}.vx vd, va, x
expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

masked va >= x, vd != v0
pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t
expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0

masked va >= x, vd == v0
pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
expansion: vmslt{u}.vx vt, va, x; vmandn.mm vd, vd, vt

masked va >= x, any vd
pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
expansion: vmslt{u}.vx vt, va, x; vmandn.mm vt, v0, vt; vmandn.mm vd, vd, v0; vmor.mm vd, vt, vd

The vt argument to the pseudoinstruction must name a temporary vector register that is not the same as vd and which will be clobbered by the pseudoinstruction.

# (a < b) && (b < c) in two instructions when mask-undisturbed
vmslt.vv v0, va, vb # All body elements written
vmslt.vv v0, vb, vc, v0.t # Only update at set mask
vmslt.vx	vs2, rs1, vd	Comparison Assembler Mapping Assembler Pseudoinstruction va < vb vmslt{u}.vv vd, va, vb, vm va <= vb vmsle{u}.vv vd, va, vb, vm va > vb vmslt{u}.vv vd, vb, va, vm vmsgt{u}.vv vd, va, vb, vm va >= vb vmsle{u}.vv vd, vb, va, vm vmsge{u}.vv vd, va, vb, vm va < x vmslt{u}.vx vd, va, x, vm va <= x vmsle{u}.vx vd, va, x, vm va > x vmsgt{u}.vx vd, va, x, vm va >= x see below va < i vmsle{u}.vi vd, va, i-1, vm vmslt{u}.vi vd, va, i, vm va <= i vmsle{u}.vi vd, va, i, vm va > i vmsgt{u}.vi vd, va, i, vm va >= i vmsgt{u}.vi vd, va, i-1, vm vmsge{u}.vi vd, va, i, vm va, vb vector register groups x scalar integer register i immediate unmasked va >= x pseudoinstruction: vmsge{u}.vx vd, va, x expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd masked va >= x, vd != v0 pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0 masked va >= x, vd == v0 pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt expansion: vmslt{u}.vx vt, va, x; vmandn.mm vd, vd, vt masked va >= x, any vd pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt expansion: vmslt{u}.vx vt, va, x; vmandn.mm vt, v0, vt; vmandn.mm vd, vd, v0; vmor.mm vd, vt, vd The vt argument to the pseudoinstruction must name a temporary vector register that is not same as vd and which will be clobbered by the pseudoinstruction # (a < b) && (b < c) in two instructions when mask-undisturbed vmslt.vv v0, va, vb # All body elements written vmslt.vv v0, vb, vc, v0.t # Only update at set mask
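
The unmasked `va >= x` expansion listed above (vmslt followed by vmnand on the same register) can be checked with a scalar model. The sketch below is only an illustration in C at SEW=32 with signed compares; the function names and the one-byte-per-mask-bit layout are assumptions made for readability, not part of any real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vmslt.vx: vd[i] = (va[i] < x), one byte per mask bit. */
void model_vmslt_vx(uint8_t *vd, const int32_t *va, int32_t x, size_t vl)
{
    assert(vd != NULL && va != NULL);
    for (size_t i = 0; i < vl; i++)
        vd[i] = (uint8_t)(va[i] < x);
}

/* Hypothetical model of vmnand.mm: vd[i] = !(vs2[i] & vs1[i]). */
void model_vmnand_mm(uint8_t *vd, const uint8_t *vs2, const uint8_t *vs1, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = (uint8_t)!(vs2[i] & vs1[i]);
}

/* Unmasked vmsge.vx expansion: vmslt.vx vd, va, x; vmnand.mm vd, vd, vd */
void model_vmsge_vx(uint8_t *vd, const int32_t *va, int32_t x, size_t vl)
{
    model_vmslt_vx(vd, va, x, vl);
    model_vmnand_mm(vd, vd, vd, vl);   /* NAND of a value with itself is NOT */
}
```

Unlike the `addi t0, x, -1; vmsgt.vx` sequence, this form needs no decrement of x, so it also works when x is the minimum representable value.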

v / _vector_integer_divide_instructions

Vector Integer Arithmetic Instructions / 12.11. Vector Integer Divide Instructions

Operation	Arguments	Description
vdivu.vv	vs2, vs1, vd	# Unsigned divide. vdivu.vv vd, vs2, vs1, vm # Vector-vector vdivu.vx vd, vs2, rs1, vm # vector-scalar # Signed divide vdiv.vv vd, vs2, vs1, vm # Vector-vector vdiv.vx vd, vs2, rs1, vm # vector-scalar # Unsigned remainder vremu.vv vd, vs2, vs1, vm # Vector-vector vremu.vx vd, vs2, rs1, vm # vector-scalar # Signed remainder vrem.vv vd, vs2, vs1, vm # Vector-vector vrem.vx vd, vs2, rs1, vm # vector-scalar
vdivu.vx	vs2, rs1, vd	# Unsigned divide. vdivu.vv vd, vs2, vs1, vm # Vector-vector vdivu.vx vd, vs2, rs1, vm # vector-scalar # Signed divide vdiv.vv vd, vs2, vs1, vm # Vector-vector vdiv.vx vd, vs2, rs1, vm # vector-scalar # Unsigned remainder vremu.vv vd, vs2, vs1, vm # Vector-vector vremu.vx vd, vs2, rs1, vm # vector-scalar # Signed remainder vrem.vv vd, vs2, vs1, vm # Vector-vector vrem.vx vd, vs2, rs1, vm # vector-scalar

v / _vector_integer_merge_instructions

Vector Integer Arithmetic Instructions / 12.15. Vector Integer Merge Instructions

Operation	Arguments	Description
vmerge.vim	vs2, simm5, vd	The vmerge instructions are encoded as masked instructions (vm=0). The instructions combine two sources as follows. At elements where the mask value is zero, the first operand is copied to the destination element, otherwise the second operand is copied to the destination element. The first operand is always a vector register group specified by vs2. The second operand is a vector register group specified by vs1 or a scalar x register specified by rs1 or a 5-bit sign-extended immediate. vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i] vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i] vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i]
vmerge.vvm	vs2, vs1, vd	The vmerge instructions are encoded as masked instructions (vm=0). The instructions combine two sources as follows. At elements where the mask value is zero, the first operand is copied to the destination element, otherwise the second operand is copied to the destination element. The first operand is always a vector register group specified by vs2. The second operand is a vector register group specified by vs1 or a scalar x register specified by rs1 or a 5-bit sign-extended immediate. vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i] vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i] vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i]
vmerge.vxm	vs2, rs1, vd	The vmerge instructions are encoded as masked instructions (vm=0). The instructions combine two sources as follows. At elements where the mask value is zero, the first operand is copied to the destination element, otherwise the second operand is copied to the destination element. The first operand is always a vector register group specified by vs2. The second operand is a vector register group specified by vs1 or a scalar x register specified by rs1 or a 5-bit sign-extended immediate. vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i] vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i] vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i]
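
As a rough illustration of the per-element selection described above, the following C sketch models vmerge.vvm and vmerge.vim at SEW=32. The function names and the one-byte-per-mask-bit representation of v0 are assumptions for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vmerge.vvm: vd[i] = v0.mask[i] ? vs1[i] : vs2[i].
 * mask[i] holds the v0 mask bit for element i as 0 or 1. */
void model_vmerge_vvm(int32_t *vd, const int32_t *vs2, const int32_t *vs1,
                      const uint8_t *mask, size_t vl)
{
    assert(vd != NULL && vs2 != NULL && vs1 != NULL && mask != NULL);
    for (size_t i = 0; i < vl; i++)
        vd[i] = mask[i] ? vs1[i] : vs2[i];
}

/* Hypothetical model of vmerge.vim: the 5-bit immediate is sign-extended,
 * so it must lie in the range -16..15. */
void model_vmerge_vim(int32_t *vd, const int32_t *vs2, int32_t simm5,
                      const uint8_t *mask, size_t vl)
{
    assert(simm5 >= -16 && simm5 <= 15);
    for (size_t i = 0; i < vl; i++)
        vd[i] = mask[i] ? simm5 : vs2[i];
}
```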

v / _vector_integer_minmax_instructions

Vector Integer Arithmetic Instructions / 12.9. Vector Integer Min/Max Instructions

Operation	Arguments	Description
vminu.vv	vs2, vs1, vd	# Unsigned minimum vminu.vv vd, vs2, vs1, vm # Vector-vector vminu.vx vd, vs2, rs1, vm # vector-scalar # Signed minimum vmin.vv vd, vs2, vs1, vm # Vector-vector vmin.vx vd, vs2, rs1, vm # vector-scalar # Unsigned maximum vmaxu.vv vd, vs2, vs1, vm # Vector-vector vmaxu.vx vd, vs2, rs1, vm # vector-scalar # Signed maximum vmax.vv vd, vs2, vs1, vm # Vector-vector vmax.vx vd, vs2, rs1, vm # vector-scalar
vminu.vx	vs2, rs1, vd	# Unsigned minimum vminu.vv vd, vs2, vs1, vm # Vector-vector vminu.vx vd, vs2, rs1, vm # vector-scalar # Signed minimum vmin.vv vd, vs2, vs1, vm # Vector-vector vmin.vx vd, vs2, rs1, vm # vector-scalar # Unsigned maximum vmaxu.vv vd, vs2, vs1, vm # Vector-vector vmaxu.vx vd, vs2, rs1, vm # vector-scalar # Signed maximum vmax.vv vd, vs2, vs1, vm # Vector-vector vmax.vx vd, vs2, rs1, vm # vector-scalar

v / _vector_integer_move_instructions

Vector Integer Arithmetic Instructions / 12.16. Vector Integer Move Instructions

Operation	Arguments	Description
vmv.s.x	rs1, vd	The vmv.s.x instruction copies the scalar integer register to element 0 of the destination vector register; the remaining elements are treated as tail elements under the current tail policy. It is an integer scalar move, distinct from the vmv.v.* splat forms listed in this table. vmv.s.x vd, rs1 # vd[0] = x[rs1]
vmv.v.i	simm5, vd	The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW. vmv.v.v vd, vs1 # vd[i] = vs1[i] vmv.v.x vd, rs1 # vd[i] = x[rs1] vmv.v.i vd, imm # vd[i] = imm
vmv.v.v	vs1, vd	The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW. vmv.v.v vd, vs1 # vd[i] = vs1[i] vmv.v.x vd, rs1 # vd[i] = x[rs1] vmv.v.i vd, imm # vd[i] = imm
vmv.v.x	rs1, vd	The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW. vmv.v.v vd, vs1 # vd[i] = vs1[i] vmv.v.x vd, rs1 # vd[i] = x[rs1] vmv.v.i vd, imm # vd[i] = imm
vmv.x.s	vs2, rd	The vmv.x.s instruction copies a single SEW-wide element from index 0 of the source vector register to the destination integer register. It is an integer scalar move, distinct from the vmv.v.* splat forms listed in this table. vmv.x.s rd, vs2 # x[rd] = vs2[0]

v / _vector_iota_instruction

Vector Mask Instructions / 16.8. Vector Iota Instruction

Operation

Arguments

Description

viota.m

vs2, vd

The viota.m instruction reads a source vector mask register and writes to each element of the destination vector register group the sum of all the bits of elements in the mask register whose index is less than the element, i.e., a parallel prefix sum of the mask values.

Traps on viota.m are always reported with a vstart of 0, and execution is always restarted from the beginning when resuming after a trap handler. An illegal instruction exception is raised if vstart is non-zero.

The viota.m instruction can be combined with memory scatter instructions (indexed stores) to perform vector compress functions.

    viota.m vd, vs2, vm

    # Example

    7 6 5 4 3 2 1 0   Element number

    1 0 0 1 0 0 0 1   v2 contents
                      viota.m v4, v2 # Unmasked
    2 2 2 1 1 1 1 0   v4 result

    1 1 1 0 1 0 1 1   v0 contents
    1 0 0 1 0 0 0 1   v2 contents
    2 3 4 5 6 7 8 9   v4 contents
                      viota.m v4, v2, v0.t # Masked, vtype.vma=0
    1 1 1 5 1 7 1 0   v4 results

Spike ISS Implementation:

// viota.m vd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t sew = P.VU.vsew;
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
require_vm;
require_align(rd_num, P.VU.vflmul);
require_noover(rd_num, P.VU.vflmul, rs2_num, 1);

// Running count of set mask bits among the active elements seen so far
int cnt = 0;
for (reg_t i = 0; i < vl; ++i) {
  const int midx = i / 64;
  const int mpos = i % 64;

  // Bit i of the source mask (vs2) and of the v0 mask
  bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1;
  bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;

  bool has_one = false;
  if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
    if (vs2_lsb) {
      has_one = true;
    }
  }

  // Masked-off elements keep their original value
  bool use_ori = (insn.v_vm() == 0) && !do_mask;
  switch (sew) {
  case e8:
    P.VU.elt<uint8_t>(rd_num, i, true) = use_ori ?
                                         P.VU.elt<uint8_t>(rd_num, i) : cnt;
    break;
  case e16:
    P.VU.elt<uint16_t>(rd_num, i, true) = use_ori ?
                                          P.VU.elt<uint16_t>(rd_num, i) : cnt;
    break;
  case e32:
    P.VU.elt<uint32_t>(rd_num, i, true) = use_ori ?
                                          P.VU.elt<uint32_t>(rd_num, i) : cnt;
    break;
  default:
    P.VU.elt<uint64_t>(rd_num, i, true) = use_ori ?
                                          P.VU.elt<uint64_t>(rd_num, i) : cnt;
    break;
  }

  // The count is written before it is incremented, giving an exclusive prefix sum
  if (has_one) {
    cnt++;
  }
}
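
The prefix-sum behaviour described above can be reproduced with a much smaller scalar model. The C sketch below is an illustration only: the function name, the fixed SEW=32, and the one-byte-per-mask-bit layout are assumptions, and masked-off elements are left undisturbed as in the masked example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of viota.m at SEW=32.
 * vs2 and v0 hold one mask bit per byte (0 or 1); v0 == NULL means unmasked.
 * Inactive elements keep their old value and do not contribute to the count. */
void model_viota_m(uint32_t *vd, const uint8_t *vs2, const uint8_t *v0, size_t vl)
{
    assert(vd != NULL && vs2 != NULL);
    uint32_t cnt = 0;
    for (size_t i = 0; i < vl; i++) {
        int active = (v0 == NULL) || v0[i];
        if (!active)
            continue;
        vd[i] = cnt;          /* sum of active mask bits with index < i */
        if (vs2[i])
            cnt++;
    }
}
```

Feeding it the v2, v0 and v4 contents from the example (stored in element order 0 to 7, the reverse of how the example prints them) yields the same v4 results.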

v / _vector_loadstore_whole_register_instructions

Vector Loads and Stores / 8.9. Vector Load/Store Whole Register Instructions

Operation

Arguments

Description

vl1re8.v

rs1, vd

    # Format of whole register load and store instructions.
    vl1r.v v3, (a0)       # Pseudoinstruction equal to vl1re8.v

    vl1re8.v  v3, (a0)    # Load v3 with VLEN/8 bytes held at address in a0
    vl1re16.v v3, (a0)    # Load v3 with VLEN/16 halfwords held at address in a0
    vl1re32.v v3, (a0)    # Load v3 with VLEN/32 words held at address in a0
    vl1re64.v v3, (a0)    # Load v3 with VLEN/64 doublewords held at address in a0

    vl2r.v v2, (a0)       # Pseudoinstruction equal to vl2re8.v v2, (a0)

    vl2re8.v  v2, (a0)    # Load v2-v3 with 2*VLEN/8 bytes from address in a0
    vl2re16.v v2, (a0)    # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0
    vl2re32.v v2, (a0)    # Load v2-v3 with 2*VLEN/32 words held at address in a0
    vl2re64.v v2, (a0)    # Load v2-v3 with 2*VLEN/64 doublewords held at address in a0

    vl4r.v v4, (a0)       # Pseudoinstruction equal to vl4re8.v

    vl4re8.v  v4, (a0)    # Load v4-v7 with 4*VLEN/8 bytes from address in a0
    vl4re16.v v4, (a0)
    vl4re32.v v4, (a0)
    vl4re64.v v4, (a0)

    vl8r.v v8, (a0)       # Pseudoinstruction equal to vl8re8.v

    vl8re8.v  v8, (a0)    # Load v8-v15 with 8*VLEN/8 bytes from address in a0
    vl8re16.v v8, (a0)
    vl8re32.v v8, (a0)
    vl8re64.v v8, (a0)

    vs1r.v v3, (a1)       # Store v3 to address in a1
    vs2r.v v2, (a1)       # Store v2-v3 to address in a1
    vs4r.v v4, (a1)       # Store v4-v7 to address in a1
    vs8r.v v8, (a1)       # Store v8-v15 to address in a1

Spike ISS Implementation:

// vl1re8.v vd, (rs1)
VI_LD_WHOLE(uint8);
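
The transfer sizes quoted in the comments above follow one pattern: the number of registers times VLEN, divided by the element width. A small C sketch of that arithmetic, with hypothetical helper names and VLEN passed in as a parameter, might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes moved by vl<nreg>r.v / vs<nreg>r.v,
 * i.e. nreg * VLEN / 8. nreg is the 1, 2, 4 or 8 in the mnemonic. */
size_t whole_reg_bytes(unsigned nreg, unsigned vlen_bits)
{
    assert(nreg == 1 || nreg == 2 || nreg == 4 || nreg == 8);
    return (size_t)nreg * vlen_bits / 8;
}

/* Hypothetical helper: elements moved by vl<nreg>re<eew>.v,
 * i.e. nreg * VLEN / EEW. */
size_t whole_reg_elems(unsigned nreg, unsigned vlen_bits, unsigned eew_bits)
{
    assert(eew_bits == 8 || eew_bits == 16 || eew_bits == 32 || eew_bits == 64);
    return whole_reg_bytes(nreg, vlen_bits) * 8 / eew_bits;
}
```

For example, with an assumed VLEN of 128 bits, vl2re32.v transfers 32 bytes, which is 8 words.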

v / _vector_narrowing_fixed_point_clip_instructions

Vector Fixed-Point Arithmetic Instructions / 13.5. Vector Narrowing Fixed-Point Clip Instructions

Operation	Arguments	Description
vnclip.wi	vs2, simm5, vd	The vnclip instructions are used to pack a fixed-point value into a narrower destination. The instructions support rounding, scaling, and saturation into the final destination format. The source data is in the vector register group specified by vs2. The scaling shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. The low lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are used to control the right shift amount, which provides the scaling. For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer.
vnclip.wv	vs2, vs1, vd	The vnclip instructions are used to pack a fixed-point value into a narrower destination. The instructions support rounding, scaling, and saturation into the final destination format. The source data is in the vector register group specified by vs2. The scaling shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. The low lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are used to control the right shift amount, which provides the scaling. For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer.
vnclip.wx	vs2, rs1, vd	The vnclip instructions are used to pack a fixed-point value into a narrower destination. The instructions support rounding, scaling, and saturation into the final destination format. The source data is in the vector register group specified by vs2. The scaling shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. The low lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are used to control the right shift amount, which provides the scaling. For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer.
vnclipu.wi	vs2, simm5, vd	For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. Rounding occurs around the least-significant bit of the destination and before saturation. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer. # Narrowing unsigned clip (operand widths: vd is SEW, vs2 is 2*SEW, vs1 is SEW) vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i])) vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1])) vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm)) # Narrowing signed clip vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i])) vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1])) vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm))
vnclipu.wv	vs2, vs1, vd	For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. Rounding occurs around the least-significant bit of the destination and before saturation. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer. # Narrowing unsigned clip (operand widths: vd is SEW, vs2 is 2*SEW, vs1 is SEW) vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i])) vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1])) vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm)) # Narrowing signed clip vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i])) vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1])) vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm))
vnclipu.wx	vs2, rs1, vd	For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. Rounding occurs around the least-significant bit of the destination and before saturation. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer. # Narrowing unsigned clip (operand widths: vd is SEW, vs2 is 2*SEW, vs1 is SEW) vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i])) vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1])) vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm)) # Narrowing signed clip vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i])) vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1])) vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm))
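
To make the shift, round and saturate sequence above concrete, here is a hedged C sketch of a single vnclipu element narrowing from 16 bits to SEW=8. The function names are hypothetical, and the rounding-increment rules and vxrm encodings are written from the vxrm description in the vector specification as an assumption; check them against that section before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* vxrm rounding modes (encodings assumed from the vxrm description). */
enum vxrm { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* roundoff_unsigned(v, d) = (v >> d) + r, where r depends on the mode. */
uint64_t roundoff_unsigned(uint64_t v, unsigned d, enum vxrm mode)
{
    assert(d < 64);
    if (d == 0)
        return v;                                   /* nothing is shifted out */
    uint64_t half = (v >> (d - 1)) & 1;             /* v[d-1] */
    uint64_t below = v & ((UINT64_C(1) << (d - 1)) - 1); /* v[d-2:0] */
    uint64_t lsb = (v >> d) & 1;                    /* v[d] */
    uint64_t r = 0;
    switch (mode) {
    case VXRM_RNU: r = half; break;                          /* nearest, ties up */
    case VXRM_RNE: r = half & ((below != 0) | lsb); break;   /* nearest, ties even */
    case VXRM_RDN: r = 0; break;                             /* truncate */
    case VXRM_ROD: r = !lsb & ((half | (below != 0)) != 0); break; /* round to odd */
    }
    return (v >> d) + r;
}

/* Hypothetical model of one vnclipu element: 16-bit source, 8-bit result. */
uint8_t model_vnclipu_e8(uint16_t src, unsigned shamt, enum vxrm mode)
{
    unsigned d = shamt & 0xF;   /* low lg2(2*SEW) = 4 bits of the shift amount */
    uint64_t rounded = roundoff_unsigned(src, d, mode);
    return rounded > UINT8_MAX ? UINT8_MAX : (uint8_t)rounded;  /* saturate */
}
```

Rounding is applied before the saturation check, matching the order stated in the description.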

v / _vector_register_gather_instructions

Vector Permutation Instructions / 17.4. Vector Register Gather Instructions

Operation	Arguments	Description
vrgather.vi	vs2, simm5, vd	The vrgather.vv form uses SEW/LMUL for both the data and indices. The vrgatherei16.vv form uses SEW/LMUL for the data in vs2 but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in vs1. For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, otherwise the instruction encoding is reserved. vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]] vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
vrgather.vv	vs2, vs1, vd	The vrgather.vv form uses SEW/LMUL for both the data and indices. The vrgatherei16.vv form uses SEW/LMUL for the data in vs2 but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in vs1. For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, otherwise the instruction encoding is reserved. vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]] vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
vrgather.vx	vs2, rs1, vd	The vrgather.vv form uses SEW/LMUL for both the data and indices. The vrgatherei16.vv form uses SEW/LMUL for the data in vs2 but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in vs1. For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, otherwise the instruction encoding is reserved. vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]] vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
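
The indexing rule in the listing above, including the zero result for out-of-range indices, can be sketched as a scalar loop. This C model is illustrative only: the function name and the fixed SEW=32 are assumptions, and vs2 is assumed to hold at least vlmax elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vrgather.vv at SEW=32:
 * vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]] */
void model_vrgather_vv(uint32_t *vd, const uint32_t *vs2, const uint32_t *vs1,
                       size_t vl, size_t vlmax)
{
    /* The destination may not overlap either source register group. */
    assert(vd != vs2 && vd != vs1);
    for (size_t i = 0; i < vl; i++)
        vd[i] = (vs1[i] >= vlmax) ? 0 : vs2[vs1[i]];
}
```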

v / _vector_register_grouping_vlmul20

Vector Extension Programmer’s Model / 4.4. Vector type register, vtype

Operation

Arguments

Description

min

rd, rs1, rs2

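Signed minimum (scalar instruction, Zbb or Zbpbo): rd = (rs1 < rs2) ? rs1 : rs2, with both operands compared as signed XLEN-bit integers.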

Spike ISS Implementation:

require_either_extension(EXT_ZBPBO, EXT_ZBB);
WRITE_RD(sext_xlen(sreg_t(RS1) < sreg_t(RS2) ? RS1 : RS2));

v / _vector_single_width_averaging_add_and_subtract

Vector Fixed-Point Arithmetic Instructions / 13.2. Vector Single-Width Averaging Add and Subtract

Operation

Arguments

Description

vaaddu.vv

vs2, vs1, vd

The averaging add and subtract instructions right shift the result by one bit and round off the result according to the setting in vxrm. Both unsigned and signed versions are provided. For vaaddu and vaadd there can be no overflow in the result. For vasub and vasubu, overflow is ignored and the result wraps around.

    # Averaging add

    # Averaging adds of unsigned integers.
    vaaddu.vv vd, vs2, vs1, vm   # roundoff_unsigned(vs2[i] + vs1[i], 1)
    vaaddu.vx vd, vs2, rs1, vm   # roundoff_unsigned(vs2[i] + x[rs1], 1)

    # Averaging adds of signed integers.
    vaadd.vv vd, vs2, vs1, vm    # roundoff_signed(vs2[i] + vs1[i], 1)
    vaadd.vx vd, vs2, rs1, vm    # roundoff_signed(vs2[i] + x[rs1], 1)

    # Averaging subtract

    # Averaging subtract of unsigned integers.
    vasubu.vv vd, vs2, vs1, vm   # roundoff_unsigned(vs2[i] - vs1[i], 1)
    vasubu.vx vd, vs2, rs1, vm   # roundoff_unsigned(vs2[i] - x[rs1], 1)

    # Averaging subtract of signed integers.
    vasub.vv vd, vs2, vs1, vm    # roundoff_signed(vs2[i] - vs1[i], 1)
    vasub.vx vd, vs2, rs1, vm    # roundoff_signed(vs2[i] - x[rs1], 1)

vaaddu.vx

vs2, rs1, vd
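
The claim that vaaddu cannot overflow is easy to see in a scalar model: the sum is formed with one extra bit and then shifted right by one. The C sketch below models a single vaaddu element at SEW=8; the names are hypothetical, and the rounding rules for each vxrm mode are an assumption taken from the vxrm description, specialised here to a one-bit shift.

```c
#include <assert.h>
#include <stdint.h>

/* vxrm rounding modes (encodings assumed from the vxrm description). */
enum vxrm { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Rounding increment for a right shift by one bit:
 * b0 is the discarded bit, b1 becomes the LSB of the shifted result. */
static unsigned round_inc_shift1(unsigned v, enum vxrm mode)
{
    unsigned b0 = v & 1, b1 = (v >> 1) & 1;
    switch (mode) {
    case VXRM_RNU: return b0;          /* nearest, ties up */
    case VXRM_RNE: return b0 & b1;     /* nearest, ties to even */
    case VXRM_RDN: return 0;           /* truncate */
    case VXRM_ROD: return b0 & !b1;    /* round to odd */
    }
    return 0;
}

/* Hypothetical model of one vaaddu element at SEW=8. */
uint8_t model_vaaddu_e8(uint8_t a, uint8_t b, enum vxrm mode)
{
    unsigned sum = (unsigned)a + b;    /* 9-bit intermediate, at most 510 */
    unsigned avg = (sum >> 1) + round_inc_shift1(sum, mode);
    assert(avg <= UINT8_MAX);          /* the average always fits in SEW bits */
    return (uint8_t)avg;
}
```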

v / _vector_single_width_floating_point_addsubtract_instructions

Vector Floating-Point Instructions / 14.2. Vector Single-Width Floating-Point Add/Subtract Instructions

Operation	Arguments	Description
vfadd.vf	vs2, rs1, vd	# Floating-point add vfadd.vv vd, vs2, vs1, vm # Vector-vector vfadd.vf vd, vs2, rs1, vm # vector-scalar # Floating-point subtract vfsub.vv vd, vs2, vs1, vm # Vector-vector vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1] vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i]
vfadd.vv	vs2, vs1, vd	# Floating-point add vfadd.vv vd, vs2, vs1, vm # Vector-vector vfadd.vf vd, vs2, rs1, vm # vector-scalar # Floating-point subtract vfsub.vv vd, vs2, vs1, vm # Vector-vector vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1] vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i]

v / _vector_single_width_floating_point_fused_multiply_add_instructions

Vector Floating-Point Instructions / 14.6. Vector Single-Width Floating-Point Fused Multiply-Add Instructions

Operation	Arguments	Description
vfmacc.vf	vs2, rs1, vd	# FP multiply-accumulate, overwrites addend vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] # FP negate-(multiply-accumulate), overwrites subtrahend vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] # FP multiply-subtract-accumulator, overwrites subtrahend vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] # FP negate-(multiply-subtract-accumulator), overwrites minuend vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i] # FP multiply-add, overwrites multiplicand vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i] vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i] # FP negate-(multiply-add), overwrites multiplicand vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i] vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i] # FP multiply-sub, overwrites multiplicand vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i] vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i] # FP negate-(multiply-sub), overwrites multiplicand vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i] vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i]
vfmacc.vv	vs2, vs1, vd	# FP multiply-accumulate, overwrites addend vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] # FP negate-(multiply-accumulate), overwrites subtrahend vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] # FP multiply-subtract-accumulator, overwrites subtrahend vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] # FP negate-(multiply-subtract-accumulator), overwrites minuend vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i] # FP multiply-add, overwrites multiplicand vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i] vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i] # FP negate-(multiply-add), overwrites multiplicand vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i] vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i] # FP multiply-sub, overwrites multiplicand vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i] vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i] # FP negate-(multiply-sub), overwrites multiplicand vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i] vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i]

v / _vector_single_width_floating_point_multiplydivide_instructions

Vector Floating-Point Instructions / 14.4. Vector Single-Width Floating-Point Multiply/Divide Instructions

Operation	Arguments	Description
vfmul.vf	vs2, rs1, vd	# Floating-point multiply vfmul.vv vd, vs2, vs1, vm # Vector-vector vfmul.vf vd, vs2, rs1, vm # vector-scalar # Floating-point divide vfdiv.vv vd, vs2, vs1, vm # Vector-vector vfdiv.vf vd, vs2, rs1, vm # vector-scalar # Reverse floating-point divide vector = scalar / vector vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i]
vfmul.vv	vs2, vs1, vd	# Floating-point multiply vfmul.vv vd, vs2, vs1, vm # Vector-vector vfmul.vf vd, vs2, rs1, vm # vector-scalar # Floating-point divide vfdiv.vv vd, vs2, vs1, vm # Vector-vector vfdiv.vf vd, vs2, rs1, vm # vector-scalar # Reverse floating-point divide vector = scalar / vector vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i]

v / _vector_single_width_fractional_multiply_with_rounding_and_saturation

Vector Fixed-Point Arithmetic Instructions / 13.3. Vector Single-Width Fractional Multiply with Rounding and Saturation

Operation	Arguments	Description
vsmul.vv	vs2, vs1, vd	# Signed saturating and rounding fractional multiply # See vxrm description for rounding calculation vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1)) vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1))
vsmul.vx	vs2, rs1, vd	# Signed saturating and rounding fractional multiply # See vxrm description for rounding calculation vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1)) vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1))
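
A scalar sketch may help show why vsmul needs saturation at all: with SEW=8 the operands are signed fractions with 7 fractional bits, and only (-128) * (-128) produces a result that does not fit. The C model below is an illustration under stated assumptions: the function name is hypothetical, and the rounding mode is fixed to round-to-nearest-up (vxrm = rnu) rather than read from vxrm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one vsmul element at SEW=8, assuming vxrm = rnu:
 * vd = clip(roundoff_signed(a * b, SEW-1)) with SEW-1 = 7.
 * With rnu, roundoff_signed(v, 7) equals floor((v + 64) / 128). */
int8_t model_vsmul_e8_rnu(int8_t a, int8_t b)
{
    int32_t p = (int32_t)a * (int32_t)b + 64;  /* add half of 2^7 before dividing */
    int32_t q = p / 128;                       /* C division truncates toward zero */
    if (p < 0 && p % 128 != 0)
        q -= 1;                                /* adjust truncation to floor */
    assert(q >= INT8_MIN);                     /* only the positive side can overflow */
    return q > INT8_MAX ? INT8_MAX : (int8_t)q;
}
```

Floor division is spelled out by hand because right-shifting a negative signed value is implementation-defined in C.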

v / _vector_single_width_integer_add_and_subtract

Vector Integer Arithmetic Instructions / 12.1. Vector Single-Width Integer Add and Subtract

Operation	Arguments	Description
vadd.vi	vs2, simm5, vd	# Integer adds. vadd.vv vd, vs2, vs1, vm # Vector-vector vadd.vx vd, vs2, rs1, vm # vector-scalar vadd.vi vd, vs2, imm, vm # vector-immediate # Integer subtract vsub.vv vd, vs2, vs1, vm # Vector-vector vsub.vx vd, vs2, rs1, vm # vector-scalar # Integer reverse subtract vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i] vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i]
vadd.vv	vs2, vs1, vd	# Integer adds. vadd.vv vd, vs2, vs1, vm # Vector-vector vadd.vx vd, vs2, rs1, vm # vector-scalar vadd.vi vd, vs2, imm, vm # vector-immediate # Integer subtract vsub.vv vd, vs2, vs1, vm # Vector-vector vsub.vx vd, vs2, rs1, vm # vector-scalar # Integer reverse subtract vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i] vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i]
vadd.vx	vs2, rs1, vd	# Integer adds. vadd.vv vd, vs2, vs1, vm # Vector-vector vadd.vx vd, vs2, rs1, vm # vector-scalar vadd.vi vd, vs2, imm, vm # vector-immediate # Integer subtract vsub.vv vd, vs2, vs1, vm # Vector-vector vsub.vx vd, vs2, rs1, vm # vector-scalar # Integer reverse subtract vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i] vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i]

v / _vector_single_width_integer_multiply_add_instructions

Vector Integer Arithmetic Instructions / 12.13. Vector Single-Width Integer Multiply-Add Instructions

Operation

Arguments

Description

vmacc.vv

vs2, vs1, vd

The integer multiply-add instructions are destructive and are provided in two forms, one that overwrites the addend or minuend (vmacc, vnmsac) and one that overwrites the first multiplicand (vmadd, vnmsub).

# Integer multiply-add, overwrite addend
vmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vmacc.vx vd, rs1, vs2, vm    # vd[i] = +(x[rs1] * vs2[i]) + vd[i]

# Integer multiply-sub, overwrite minuend
vnmsac.vv vd, vs1, vs2, vm    # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vnmsac.vx vd, rs1, vs2, vm    # vd[i] = -(x[rs1] * vs2[i]) + vd[i]

# Integer multiply-add, overwrite multiplicand
vmadd.vv vd, vs1, vs2, vm    # vd[i] = (vs1[i] * vd[i]) + vs2[i]
vmadd.vx vd, rs1, vs2, vm    # vd[i] = (x[rs1] * vd[i]) + vs2[i]

# Integer multiply-sub, overwrite multiplicand
vnmsub.vv vd, vs1, vs2, vm    # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vnmsub.vx vd, rs1, vs2, vm    # vd[i] = -(x[rs1] * vd[i]) + vs2[i]

vmacc.vx

vs2, rs1, vd
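
The difference between the two destructive forms (overwrite the addend versus overwrite the multiplicand) may be easier to see in scalar code. The sketch below assumes SEW=32, no masking and vstart=0; the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* vmacc.vv: vd[i] = (vs1[i] * vs2[i]) + vd[i]; vd supplies the addend. */
void vmacc_vv_e32(uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = vs1[i] * vs2[i] + vd[i];
}

/* vmadd.vv: vd[i] = (vs1[i] * vd[i]) + vs2[i]; vd supplies a multiplicand. */
void vmadd_vv_e32(uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = vs1[i] * vd[i] + vs2[i];
}
```

Unsigned arithmetic is used so that the result wraps modulo 2^SEW, which gives the same bit pattern as the signed operation.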

v / _vector_single_width_integer_multiply_instructions

Vector Integer Arithmetic Instructions / 12.10. Vector Single-Width Integer Multiply Instructions

Operation	Arguments	Description
vmul.vv	vs2, vs1, vd	# Signed multiply, returning low bits of product vmul.vv vd, vs2, vs1, vm # Vector-vector vmul.vx vd, vs2, rs1, vm # vector-scalar # Signed multiply, returning high bits of product vmulh.vv vd, vs2, vs1, vm # Vector-vector vmulh.vx vd, vs2, rs1, vm # vector-scalar # Unsigned multiply, returning high bits of product vmulhu.vv vd, vs2, vs1, vm # Vector-vector vmulhu.vx vd, vs2, rs1, vm # vector-scalar # Signed(vs2)-Unsigned multiply, returning high bits of product vmulhsu.vv vd, vs2, vs1, vm # Vector-vector vmulhsu.vx vd, vs2, rs1, vm # vector-scalar
vmul.vx	vs2, rs1, vd	# Signed multiply, returning low bits of product vmul.vv vd, vs2, vs1, vm # Vector-vector vmul.vx vd, vs2, rs1, vm # vector-scalar # Signed multiply, returning high bits of product vmulh.vv vd, vs2, vs1, vm # Vector-vector vmulh.vx vd, vs2, rs1, vm # vector-scalar # Unsigned multiply, returning high bits of product vmulhu.vv vd, vs2, vs1, vm # Vector-vector vmulhu.vx vd, vs2, rs1, vm # vector-scalar # Signed(vs2)-Unsigned multiply, returning high bits of product vmulhsu.vv vd, vs2, vs1, vm # Vector-vector vmulhsu.vx vd, vs2, rs1, vm # vector-scalar
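
As a rough illustration of how the low-bits and high-bits variants differ, the following sketch computes one SEW=32 element of each. The function names are hypothetical, and results are returned as raw 32-bit patterns to sidestep implementation-defined signed conversions.

```c
#include <stdint.h>

/* vmul: low SEW bits of the product (same bits for signed and unsigned). */
uint32_t vmul_e32(uint32_t a, uint32_t b)
{
    return a * b;
}

/* vmulh: high SEW bits of the signed x signed product. */
uint32_t vmulh_e32(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* vmulhu: high SEW bits of the unsigned x unsigned product. */
uint32_t vmulhu_e32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* vmulhsu: high SEW bits of signed(vs2) x unsigned(vs1 or x[rs1]). */
uint32_t vmulhsu_e32(int32_t a, uint32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}
```

The same bit pattern 0xFFFFFFFF multiplied by 2 gives a high half of 0xFFFFFFFF when read as signed (-1) and 1 when read as unsigned, which is why three high-half variants exist.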

v / _vector_single_width_saturating_add_and_subtract

Vector Fixed-Point Arithmetic Instructions / 13.1. Vector Single-Width Saturating Add and Subtract

Operation	Arguments	Description
vsaddu.vi	vs2, simm5, vd	# Saturating adds of unsigned integers. vsaddu.vv vd, vs2, vs1, vm # Vector-vector vsaddu.vx vd, vs2, rs1, vm # vector-scalar vsaddu.vi vd, vs2, imm, vm # vector-immediate # Saturating adds of signed integers. vsadd.vv vd, vs2, vs1, vm # Vector-vector vsadd.vx vd, vs2, rs1, vm # vector-scalar vsadd.vi vd, vs2, imm, vm # vector-immediate # Saturating subtract of unsigned integers. vssubu.vv vd, vs2, vs1, vm # Vector-vector vssubu.vx vd, vs2, rs1, vm # vector-scalar # Saturating subtract of signed integers. vssub.vv vd, vs2, vs1, vm # Vector-vector vssub.vx vd, vs2, rs1, vm # vector-scalar
vsaddu.vv	vs2, vs1, vd	# Saturating adds of unsigned integers. vsaddu.vv vd, vs2, vs1, vm # Vector-vector vsaddu.vx vd, vs2, rs1, vm # vector-scalar vsaddu.vi vd, vs2, imm, vm # vector-immediate # Saturating adds of signed integers. vsadd.vv vd, vs2, vs1, vm # Vector-vector vsadd.vx vd, vs2, rs1, vm # vector-scalar vsadd.vi vd, vs2, imm, vm # vector-immediate # Saturating subtract of unsigned integers. vssubu.vv vd, vs2, vs1, vm # Vector-vector vssubu.vx vd, vs2, rs1, vm # vector-scalar # Saturating subtract of signed integers. vssub.vv vd, vs2, vs1, vm # Vector-vector vssub.vx vd, vs2, rs1, vm # vector-scalar
vsaddu.vx	vs2, rs1, vd	# Saturating adds of unsigned integers. vsaddu.vv vd, vs2, vs1, vm # Vector-vector vsaddu.vx vd, vs2, rs1, vm # vector-scalar vsaddu.vi vd, vs2, imm, vm # vector-immediate # Saturating adds of signed integers. vsadd.vv vd, vs2, vs1, vm # Vector-vector vsadd.vx vd, vs2, rs1, vm # vector-scalar vsadd.vi vd, vs2, imm, vm # vector-immediate # Saturating subtract of unsigned integers. vssubu.vv vd, vs2, vs1, vm # Vector-vector vssubu.vx vd, vs2, rs1, vm # vector-scalar # Saturating subtract of signed integers. vssub.vv vd, vs2, vs1, vm # Vector-vector vssub.vx vd, vs2, rs1, vm # vector-scalar
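
A minimal sketch of the saturating behaviour for a single SEW=8 element follows. The function names are illustrative only; the point is that results are clamped to the representable range instead of wrapping.

```c
#include <stdint.h>

/* Unsigned saturating add: clamp at 255 instead of wrapping. */
uint8_t vsaddu_e8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > UINT8_MAX ? UINT8_MAX : (uint8_t)s;
}

/* Signed saturating add: clamp to [-128, 127]. */
int8_t vsadd_e8(int8_t a, int8_t b)
{
    int s = a + b;
    if (s > INT8_MAX) return INT8_MAX;
    if (s < INT8_MIN) return INT8_MIN;
    return (int8_t)s;
}

/* Unsigned saturating subtract: clamp at 0. */
uint8_t vssubu_e8(uint8_t a, uint8_t b)
{
    return a < b ? 0 : (uint8_t)(a - b);
}

/* Signed saturating subtract: clamp to [-128, 127]. */
int8_t vssub_e8(int8_t a, int8_t b)
{
    int s = a - b;
    if (s > INT8_MAX) return INT8_MAX;
    if (s < INT8_MIN) return INT8_MIN;
    return (int8_t)s;
}
```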

v / _vector_single_width_scaling_shift_instructions

Vector Fixed-Point Arithmetic Instructions / 13.4. Vector Single-Width Scaling Shift Instructions

Operation	Arguments	Description
vssrl.vi	vs2, simm5, vd	These instructions shift the input value right, and round off the shifted out bits according to vxrm. The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms. The data to be shifted is in the vector register group specified by vs2 and the shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. Only the low lg2(SEW) bits of the shift-amount value are used to control the shift amount. # Scaling shift right logical vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i]) vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1]) vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm) # Scaling shift right arithmetic vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i]) vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1]) vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm)
vssrl.vv	vs2, vs1, vd	These instructions shift the input value right, and round off the shifted out bits according to vxrm. The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms. The data to be shifted is in the vector register group specified by vs2 and the shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. Only the low lg2(SEW) bits of the shift-amount value are used to control the shift amount. # Scaling shift right logical vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i]) vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1]) vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm) # Scaling shift right arithmetic vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i]) vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1]) vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm)
vssrl.vx	vs2, rs1, vd	These instructions shift the input value right, and round off the shifted out bits according to vxrm. The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms. The data to be shifted is in the vector register group specified by vs2 and the shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. Only the low lg2(SEW) bits of the shift-amount value are used to control the shift amount. # Scaling shift right logical vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i]) vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1]) vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm) # Scaling shift right arithmetic vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i]) vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1]) vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm)
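
The rounding step can be sketched as follows for one SEW=32 element of vssrl. This is an illustration under stated assumptions: the enum and function names are invented here, and the four rounding rules follow the vxrm encoding in the vector specification (round-to-nearest-up, round-to-nearest-even, round-down, round-to-odd).

```c
#include <stdint.h>

enum vxrm_mode { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* roundoff_unsigned(v, shamt) for SEW=32. */
uint32_t vssrl_e32(uint32_t v, uint32_t shamt, enum vxrm_mode mode)
{
    unsigned d = shamt & 31;        /* only the low lg2(SEW) = 5 bits are used */
    if (d == 0)
        return v;                   /* nothing is shifted out, nothing to round */

    uint32_t bit_d    = (v >> d) & 1;                      /* v[d]   */
    uint32_t bit_dm1  = (v >> (d - 1)) & 1;                /* v[d-1] */
    uint32_t low_nz   = (v & ((1u << d) - 1)) != 0;        /* v[d-1:0] != 0 */
    uint32_t lower_nz = (v & ((1u << (d - 1)) - 1)) != 0;  /* v[d-2:0] != 0 */

    uint32_t r;
    switch (mode) {
    case VXRM_RNU: r = bit_dm1; break;                      /* nearest, ties up */
    case VXRM_RNE: r = bit_dm1 & (lower_nz | bit_d); break; /* nearest, ties to even */
    case VXRM_RDN: r = 0; break;                            /* truncate */
    default:       r = (bit_d == 0) & low_nz; break;        /* round to odd */
    }
    return (v >> d) + r;
}
```

For example, shifting 10 right by 2 (2.5) gives 3 under round-to-nearest-up, 2 under round-to-nearest-even and round-down, and 3 under round-to-odd.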

v / _vector_single_width_shift_instructions

Vector Integer Arithmetic Instructions / 12.6. Vector Single-Width Shift Instructions

Operation	Arguments	Description
vsll.vi	vs2, simm5, vd	# Bit shift operations vsll.vv vd, vs2, vs1, vm # Vector-vector vsll.vx vd, vs2, rs1, vm # vector-scalar vsll.vi vd, vs2, uimm, vm # vector-immediate vsrl.vv vd, vs2, vs1, vm # Vector-vector vsrl.vx vd, vs2, rs1, vm # vector-scalar vsrl.vi vd, vs2, uimm, vm # vector-immediate vsra.vv vd, vs2, vs1, vm # Vector-vector vsra.vx vd, vs2, rs1, vm # vector-scalar vsra.vi vd, vs2, uimm, vm # vector-immediate
vsll.vv	vs2, vs1, vd	# Bit shift operations vsll.vv vd, vs2, vs1, vm # Vector-vector vsll.vx vd, vs2, rs1, vm # vector-scalar vsll.vi vd, vs2, uimm, vm # vector-immediate vsrl.vv vd, vs2, vs1, vm # Vector-vector vsrl.vx vd, vs2, rs1, vm # vector-scalar vsrl.vi vd, vs2, uimm, vm # vector-immediate vsra.vv vd, vs2, vs1, vm # Vector-vector vsra.vx vd, vs2, rs1, vm # vector-scalar vsra.vi vd, vs2, uimm, vm # vector-immediate
vsll.vx	vs2, rs1, vd	# Bit shift operations vsll.vv vd, vs2, vs1, vm # Vector-vector vsll.vx vd, vs2, rs1, vm # vector-scalar vsll.vi vd, vs2, uimm, vm # vector-immediate vsrl.vv vd, vs2, vs1, vm # Vector-vector vsrl.vx vd, vs2, rs1, vm # vector-scalar vsrl.vi vd, vs2, uimm, vm # vector-immediate vsra.vv vd, vs2, vs1, vm # Vector-vector vsra.vx vd, vs2, rs1, vm # vector-scalar vsra.vi vd, vs2, uimm, vm # vector-immediate

v / _vector_slide1down_instruction

Vector Permutation Instructions / 17.3. Vector Slide Instructions

Operation

Arguments

Description

vfslide1down.vf

vs2, rs1, vd

The vfslide1down instruction is defined analogously, but sources its scalar argument from an f register.

vslide1down.vx

vs2, rs1, vd

The vslide1down instruction copies the first vl-1 active element values from index i+1 in the source vector register group to index i in the destination vector register group.

The vslide1down instruction places the x register argument at location vl-1 in the destination vector register, provided that element vl-1 is active, otherwise the destination element is unchanged. If XLEN < SEW, the value is sign-extended to SEW bits. If XLEN > SEW, the least-significant bits are copied over and the high XLEN-SEW bits are ignored.

vslide1down.vx vd, vs2, rs1, vm     # vd[i] = vs2[i+1], vd[vl-1]=x[rs1]
vfslide1down.vf vd, vs2, rs1, vm    # vd[i] = vs2[i+1], vd[vl-1]=f[rs1]

vslide1down behavior

i < vstart	unchanged
vstart <= i < vl-1	vd[i] = vs2[i+1] if v0.mask[i] enabled
vstart <= i = vl-1	vd[vl-1] = x[rs1] if v0.mask[i] enabled
vl <= i < VLMAX	Follow tail policy
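
The behavior table above can be sketched in scalar C as follows. The sketch assumes SEW=32, represents the mask as a hypothetical bool array (NULL meaning unmasked), and leaves inactive and tail elements undisturbed, which is only one of the permitted policies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* vslide1down.vx sketch: elements move down by one, x[rs1] enters at vl-1. */
void vslide1down_vx_e32(uint32_t *vd, const uint32_t *vs2, uint32_t x_rs1,
                        const bool *mask, size_t vstart, size_t vl)
{
    for (size_t i = vstart; i < vl; i++) {
        if (mask && !mask[i])
            continue;                          /* inactive element: not written */
        vd[i] = (i == vl - 1) ? x_rs1          /* last body element gets the scalar */
                              : vs2[i + 1];    /* the others take their upper neighbour */
    }
}
```

Elements below vstart and at or above vl are never touched by the loop, matching the first and last rows of the table for an undisturbed tail.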

v / _vector_slide1up

Vector Permutation Instructions / 17.3. Vector Slide Instructions

Operation

Arguments

Description

vfslide1up.vf

vs2, rs1, vd

The vfslide1up instruction is defined analogously, but sources its scalar argument from an f register.

vslide1up.vx

vs2, rs1, vd

The vslide1up instruction places the x register argument at location 0 of the destination vector register group, provided that element 0 is active, otherwise the destination element update follows the current mask agnostic/undisturbed policy. If XLEN < SEW, the value is sign-extended to SEW bits. If XLEN > SEW, the least-significant bits are copied over and the high XLEN-SEW bits are ignored.

The vslide1up instruction requires that the destination vector register group does not overlap the source vector register group. Otherwise, the instruction encoding is reserved.

vslide1up.vx vd, vs2, rs1, vm     # vd[0]=x[rs1], vd[i+1] = vs2[i]
vfslide1up.vf vd, vs2, rs1, vm    # vd[0]=f[rs1], vd[i+1] = vs2[i]

vslide1up behavior

i < vstart	unchanged
0 = i = vstart	vd[i] = x[rs1] if v0.mask[i] enabled
max(vstart, 1) <= i < vl	vd[i] = vs2[i-1] if v0.mask[i] enabled
vl <= i < VLMAX	Follow tail policy
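
A scalar sketch of the table above, under the same kind of assumptions (SEW=32, a hypothetical bool-array mask with NULL meaning unmasked, inactive and tail elements left undisturbed). The function name is illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* vslide1up.vx sketch: x[rs1] enters at element 0, the rest move up by one.
 * vd and vs2 must not overlap, mirroring the register-group constraint. */
void vslide1up_vx_e32(uint32_t *vd, const uint32_t *vs2, uint32_t x_rs1,
                      const bool *mask, size_t vstart, size_t vl)
{
    for (size_t i = vstart; i < vl; i++) {
        if (mask && !mask[i])
            continue;                        /* inactive element: not written */
        vd[i] = (i == 0) ? x_rs1             /* only reached when vstart == 0 */
                         : vs2[i - 1];       /* take the lower neighbour */
    }
}
```

With a non-zero vstart the scalar is never written, because element 0 lies below vstart and is left unchanged.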

v / _vector_slide_instructions

Vector Permutation Instructions / 17.3. Vector Slide Instructions

Operation	Arguments	Description
vslideup.vi	vs2, simm5, vd	For all of the vslideup, vslidedown, v[f]slide1up, and v[f]slide1down instructions, if vstart >= vl, the instruction performs no operation and leaves the destination vector register unchanged.
vslideup.vx	vs2, rs1, vd	For all of the vslideup, vslidedown, v[f]slide1up, and v[f]slide1down instructions, if vstart >= vl, the instruction performs no operation and leaves the destination vector register unchanged.

v / _vector_slidedown_instructions

Vector Permutation Instructions / 17.3. Vector Slide Instructions

Operation

Arguments

Description

vslidedown.vi

vs2, simm5, vd

For vslidedown, the value in vl specifies the maximum number of destination elements that are written. The remaining elements past vl are handled according to the current tail policy (see the section "Vector Tail Agnostic and Vector Mask Agnostic vta and vma").

vslidedown.vx vd, vs2, rs1, vm     # vd[i] = vs2[i+rs1]
vslidedown.vi vd, vs2, uimm, vm    # vd[i] = vs2[i+uimm]

vslidedown behavior for source elements for element i in slide

0 <= i+OFFSET < VLMAX	src[i] = vs2[i+OFFSET]
VLMAX <= i+OFFSET	src[i] = 0

vslidedown behavior for destination element i in slide

0 <= i < vstart	Unchanged
vstart <= i < vl	vd[i] = src[i] if v0.mask[i] enabled
vl <= i < VLMAX	Follow tail policy

vslidedown.vx

vs2, rs1, vd

vslidedown.vx vd, vs2, rs1, vm     # vd[i] = vs2[i+rs1]
vslidedown.vi vd, vs2, uimm, vm    # vd[i] = vs2[i+uimm]
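
The source and destination rules for vslidedown can be combined into one loop, sketched below for SEW=32. The function name, the bool-array mask (NULL meaning unmasked) and the explicit vlmax parameter are assumptions made for the example; tail and inactive elements are left undisturbed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* vslidedown sketch: vd[i] = vs2[i + offset], or 0 once i + offset >= VLMAX. */
void vslidedown_vx_e32(uint32_t *vd, const uint32_t *vs2, size_t offset,
                       const bool *mask, size_t vstart, size_t vl, size_t vlmax)
{
    for (size_t i = vstart; i < vl; i++) {
        if (mask && !mask[i])
            continue;                        /* inactive element: not written */
        /* "offset < vlmax - i" is "i + offset < vlmax" without overflow,
         * since i < vl <= vlmax. */
        vd[i] = (offset < vlmax - i) ? vs2[i + offset] : 0;
    }
}
```

Note that source elements between vl and VLMAX are still read, so only offsets reaching past VLMAX produce zeros.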

v / _vector_strided_instructions

Vector Loads and Stores / 8.5. Vector Strided Instructions

Operation

Arguments

Description

vlse8.v

rs2, rs1, vd

# Vector strided loads and stores

# vd destination, rs1 base address, rs2 byte stride
vlse8.v vd, (rs1), rs2, vm     # 8-bit strided load
vlse16.v vd, (rs1), rs2, vm    # 16-bit strided load
vlse32.v vd, (rs1), rs2, vm    # 32-bit strided load
vlse64.v vd, (rs1), rs2, vm    # 64-bit strided load

# vs3 store data, rs1 base address, rs2 byte stride
vsse8.v vs3, (rs1), rs2, vm     # 8-bit strided store
vsse16.v vs3, (rs1), rs2, vm    # 16-bit strided store
vsse32.v vs3, (rs1), rs2, vm    # 32-bit strided store
vsse64.v vs3, (rs1), rs2, vm    # 64-bit strided store

Spike ISS Implementation:

// vlse8.v and vlsseg[2-8]e8.v
VI_LD(i * RS2, fn, int8, false);
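
The addressing implied by the i * RS2 term can be sketched as a plain C loop. This is an unmasked illustration for 32-bit elements with a hypothetical function name; the stride is treated as a signed byte offset, so zero and negative values simply revisit or walk backwards through memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* vlse32.v sketch: element i is loaded from base + i * stride (in bytes). */
void vlse32_v(uint32_t *vd, const uint8_t *base, ptrdiff_t stride, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        const uint8_t *addr = base + (ptrdiff_t)i * stride;
        memcpy(&vd[i], addr, sizeof vd[i]);   /* memcpy tolerates unaligned addresses */
    }
}
```

A stride of 8 bytes over an array of 32-bit words, for instance, picks every second word.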

v / _vector_unit_stride_instructions

Vector Loads and Stores / 8.4. Vector Unit-Stride Instructions

Operation

Arguments

Description

vle8.v

rs1, vd

# Vector unit-stride loads and stores

# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
vle8.v vd, (rs1), vm     # 8-bit unit-stride load
vle16.v vd, (rs1), vm    # 16-bit unit-stride load
vle32.v vd, (rs1), vm    # 32-bit unit-stride load
vle64.v vd, (rs1), vm    # 64-bit unit-stride load

# vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>)
vse8.v vs3, (rs1), vm     # 8-bit unit-stride store
vse16.v vs3, (rs1), vm    # 16-bit unit-stride store
vse32.v vs3, (rs1), vm    # 32-bit unit-stride store
vse64.v vs3, (rs1), vm    # 64-bit unit-stride store

Spike ISS Implementation:

// vle8.v and vlseg[2-8]e8.v
VI_LD(0, (i * nf + fn), int8, false);

vlm.v

rs1, vd

vlm.v and vsm.v are encoded with the same width[2:0]=0 encoding as vle8.v and vse8.v, but are distinguished by different lumop and sumop encodings. Since vlm.v and vsm.v operate as byte loads and stores, vstart is in units of bytes for these instructions.

# Vector unit-stride mask load
vlm.v vd, (rs1)     # Load byte vector of length ceil(vl/8)

# Vector unit-stride mask store
vsm.v vs3, (rs1)    # Store byte vector of length ceil(vl/8)

Spike ISS Implementation:

// vle1.v and vlseg[2-8]e8.v
VI_LD(0, (i * nf + fn), int8, true);
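
The ceil(vl/8) transfer length mentioned above works out as in this small sketch. Both function names are hypothetical, and the mask register is modelled as a plain byte array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes moved by vlm.v / vsm.v for a given vl: ceil(vl / 8). */
size_t mask_bytes(size_t vl)
{
    return (vl + 7) / 8;
}

/* vlm.v sketch: a byte-wise copy of ceil(vl/8) bytes into the mask register. */
void vlm_v(uint8_t *vd, const uint8_t *base, size_t vl)
{
    memcpy(vd, base, mask_bytes(vl));
}
```

So vl=9 transfers two bytes even though only nine mask bits are meaningful.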

v / _vector_unordered_single_width_floating_point_sum_reduction

Vector Reduction Operations / 15.3. Vector Single-Width Floating-Point Reduction Instructions

Operation	Arguments	Description
vfredusum.vs	vs2, vs1, vd	The unordered sum reduction instruction, vfredusum, provides an implementation more freedom in performing the reduction.

v / _vector_widening_floating_point_addsubtract_instructions

Vector Floating-Point Instructions / 14.3. Vector Widening Floating-Point Add/Subtract Instructions

Operation	Arguments	Description
vfwadd.vf	vs2, rs1, vd	# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar
vfwadd.vv	vs2, vs1, vd	# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar
vfwadd.wf	vs2, rs1, vd	# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar
vfwadd.wv	vs2, vs1, vd	# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar

v / _vector_widening_floating_point_fused_multiply_add_instructions

Vector Floating-Point Instructions / 14.7. Vector Widening Floating-Point Fused Multiply-Add Instructions

Operation	Arguments	Description
vfwmacc.vf	vs2, rs1, vd	# FP widening multiply-accumulate, overwrites addend vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] # FP widening negate-(multiply-accumulate), overwrites addend vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] # FP widening multiply-subtract-accumulator, overwrites addend vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] # FP widening negate-(multiply-subtract-accumulator), overwrites addend vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
vfwmacc.vv	vs2, vs1, vd	# FP widening multiply-accumulate, overwrites addend vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] # FP widening negate-(multiply-accumulate), overwrites addend vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] # FP widening multiply-subtract-accumulator, overwrites addend vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] # FP widening negate-(multiply-subtract-accumulator), overwrites addend vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]

v / _vector_widening_floating_point_multiply

Vector Floating-Point Instructions / 14.5. Vector Widening Floating-Point Multiply

Operation	Arguments	Description
vfwmul.vf	vs2, rs1, vd	# Widening floating-point multiply vfwmul.vv vd, vs2, vs1, vm # vector-vector vfwmul.vf vd, vs2, rs1, vm # vector-scalar
vfwmul.vv	vs2, vs1, vd	# Widening floating-point multiply vfwmul.vv vd, vs2, vs1, vm # vector-vector vfwmul.vf vd, vs2, rs1, vm # vector-scalar

v / _vector_widening_integer_addsubtract

Vector Integer Arithmetic Instructions / 12.2. Vector Widening Integer Add/Subtract

Operation	Arguments	Description
vwaddu.vv	vs2, vs1, vd	# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar
vwaddu.vx	vs2, rs1, vd	# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar
vwaddu.wv	vs2, vs1, vd	# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar
vwaddu.wx	vs2, rs1, vd	# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar
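
To make the operand widths concrete, here is a sketch of a few of these operations for SEW=8, where the destination is 2*SEW = 16 bits wide. The function names are invented for the example; the .vv forms extend both narrow sources, while the .wv forms take an already-wide first operand.

```c
#include <stdint.h>

/* vwaddu.vv: both 8-bit sources are zero-extended, so the sum cannot overflow. */
uint16_t vwaddu_vv_e8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a + (uint16_t)b);
}

/* vwadd.vv: both 8-bit sources are sign-extended. */
int16_t vwadd_vv_e8(int8_t a, int8_t b)
{
    return (int16_t)((int16_t)a + (int16_t)b);
}

/* vwsubu.vv: unsigned subtract; a negative difference wraps modulo 2^16. */
uint16_t vwsubu_vv_e8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a - (uint16_t)b);
}

/* vwaddu.wv: first operand is already 2*SEW wide; the result wraps at 2*SEW. */
uint16_t vwaddu_wv_e8(uint16_t a, uint8_t b)
{
    return (uint16_t)(a + b);
}
```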

v / _vector_widening_integer_multiply_add_instructions

Vector Integer Arithmetic Instructions / 12.14. Vector Widening Integer Multiply-Add Instructions

Operation	Arguments	Description
vwmaccu.vv	vs2, vs1, vd	# Widening unsigned-integer multiply-add, overwrite addend vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-integer multiply-add, overwrite addend vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-unsigned-integer multiply-add, overwrite addend vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i] vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i] # Widening unsigned-signed-integer multiply-add, overwrite addend vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i]
vwmaccu.vx	vs2, rs1, vd	# Widening unsigned-integer multiply-add, overwrite addend vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-integer multiply-add, overwrite addend vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-unsigned-integer multiply-add, overwrite addend vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i] vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i] # Widening unsigned-signed-integer multiply-add, overwrite addend vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i]

v / _vector_widening_integer_multiply_instructions

Vector Integer Arithmetic Instructions / 12.12. Vector Widening Integer Multiply Instructions

Operation	Arguments	Description
vwmul.vv	vs2, vs1, vd	# Widening signed-integer multiply vwmul.vv vd, vs2, vs1, vm # vector-vector vwmul.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned-integer multiply vwmulu.vv vd, vs2, vs1, vm # vector-vector vwmulu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed(vs2)-unsigned integer multiply vwmulsu.vv vd, vs2, vs1, vm # vector-vector vwmulsu.vx vd, vs2, rs1, vm # vector-scalar
vwmul.vx	vs2, rs1, vd	# Widening signed-integer multiply vwmul.vv vd, vs2, vs1, vm # vector-vector vwmul.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned-integer multiply vwmulu.vv vd, vs2, vs1, vm # vector-vector vwmulu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed(vs2)-unsigned integer multiply vwmulsu.vv vd, vs2, vs1, vm # vector-vector vwmulsu.vx vd, vs2, rs1, vm # vector-scalar

v / _vfirst_find_first_set_mask_bit

Vector Mask Instructions / 16.3. vfirst find-first-set mask bit

Operation

Arguments

Description

vfirst.m

vs2, rd

The vfirst instruction finds the lowest-numbered active element of the source mask vector that has the value 1 and writes that element's index to a GPR. If no active element has the value 1, -1 is written to the GPR.

The vfirst.m instruction writes x[rd] even if vl=0 (with the value -1, since no mask elements are active).

Traps on vfirst are always reported with a vstart of 0. The vfirst instruction will raise an illegal instruction exception if vstart is non-zero.

vfirst.m rd, vs2, vm

Spike ISS Implementation:

// vmfirst rd, vs2
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
reg_t pos = -1;
for (reg_t i=P.VU.vstart->read(); i < vl; ++i) {
  VI_LOOP_ELEMENT_SKIP()

  bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
  if (vs2_lsb) {
    pos = i;
    break;
  }
}
P.VU.vstart->write(0);
WRITE_RD(pos);
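
Outside of Spike's macros, the same search can be sketched in standalone C. The mask registers are modelled here as byte arrays with mask element i stored in bit i % 8 of byte i / 8; the helper and function names are hypothetical, and a NULL v0 stands for the unmasked form.

```c
#include <stddef.h>
#include <stdint.h>

/* Read mask element i from a packed bit array. */
int mask_bit(const uint8_t *m, size_t i)
{
    return (m[i / 8] >> (i % 8)) & 1;
}

/* vfirst.m sketch: index of the lowest active element of vs2 that is 1, else -1. */
long vfirst_m(const uint8_t *vs2, const uint8_t *v0, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        int active = (v0 == NULL) || mask_bit(v0, i);
        if (active && mask_bit(vs2, i))
            return (long)i;
    }
    return -1;   /* also the result when vl == 0 */
}
```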

v / _vmsif_m_set_including_first_mask_bit

Vector Mask Instructions / 16.5. vmsif.m set-including-first mask bit

Operation

Arguments

Description

vmsif.m

vs2, vd

Traps on vmsif.m are always reported with a vstart of 0. The vmsif instruction will raise an illegal instruction exception if vstart is non-zero.

vmsif.m vd, vs2, vm

# Example

7 6 5 4 3 2 1 0    Element number

1 0 0 1 0 1 0 0    v3 contents
                   vmsif.m v2, v3
0 0 0 0 0 1 1 1    v2 contents

1 0 0 1 0 1 0 1    v3 contents
                   vmsif.m v2, v3
0 0 0 0 0 0 0 1    v2

1 1 0 0 0 0 1 1    v0 contents
1 0 0 1 0 1 0 0    v3 contents
                   vmsif.m v2, v3, v0.t
1 1 x x x x 1 1    v2 contents

Spike ISS Implementation:

// vmsif.m rd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
require(P.VU.vstart->read() == 0);
require_vm;
require(insn.rd() != insn.rs2());

reg_t vl = P.VU.vl->read();
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();

bool has_one = false;
for (reg_t i = P.VU.vstart->read(); i < vl; ++i) {
  const int midx = i / 64;
  const int mpos = i % 64;
  const uint64_t mmask = UINT64_C(1) << mpos;

  bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
  bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;

  if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
    auto &vd = P.VU.elt<uint64_t>(rd_num, midx, true);
    uint64_t res = 0;
    if (!has_one && !vs2_lsb) {
      res = 1;
    } else if (!has_one && vs2_lsb) {
      has_one = true;
      res = 1;
    }
    vd = (vd & ~mmask) | ((res << mpos) & mmask);
  }
}
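
The example rows above can be reproduced with a small standalone sketch. It assumes vl <= 8 so that each mask fits in one byte, leaves inactive elements (the x positions) undisturbed, and uses a hypothetical function name.

```c
#include <stdint.h>

/* vmsif.m sketch: set every active element up to and including the first 1 in vs2. */
uint8_t vmsif_m8(uint8_t vd, uint8_t vs2, uint8_t v0, unsigned vl)
{
    int seen = 0;                              /* has an active 1 been passed yet? */
    for (unsigned i = 0; i < vl && i < 8; i++) {
        uint8_t bit = (uint8_t)(1u << i);
        if (!(v0 & bit))
            continue;                          /* inactive element: left as it was */
        if (!seen)
            vd = (uint8_t)(vd | bit);          /* before or at the first 1 */
        else
            vd = (uint8_t)(vd & ~bit);         /* after the first 1 */
        if (vs2 & bit)
            seen = 1;
    }
    return vd;
}
```

Passing v0 = 0xFF models the unmasked form; with v3 = 0x94 (1 0 0 1 0 1 0 0) the result is 0x07, as in the first example.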

v / _vmsof_m_set_only_first_mask_bit

Vector Mask Instructions / 16.6. vmsof.m set-only-first mask bit

Operation

Arguments

Description

vmsof.m

vs2, vd

Traps on vmsof.m are always reported with a vstart of 0. The vmsof instruction will raise an illegal instruction exception if vstart is non-zero.

vmsof.m vd, vs2, vm

# Example

7 6 5 4 3 2 1 0    Element number

1 0 0 1 0 1 0 0    v3 contents
                   vmsof.m v2, v3
0 0 0 0 0 1 0 0    v2 contents

1 0 0 1 0 1 0 1    v3 contents
                   vmsof.m v2, v3
0 0 0 0 0 0 0 1    v2

1 1 0 0 0 0 1 1    v0 contents
1 1 0 1 0 1 0 0    v3 contents
                   vmsof.m v2, v3, v0.t
0 1 x x x x 0 0    v2 contents

Spike ISS Implementation:

// vmsof.m rd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
require(P.VU.vstart->read() == 0);
require_vm;
require(insn.rd() != insn.rs2());

reg_t vl = P.VU.vl->read();
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();

bool has_one = false;
for (reg_t i = P.VU.vstart->read() ; i < vl; ++i) {
  const int midx = i / 64;
  const int mpos = i % 64;
  const uint64_t mmask = UINT64_C(1) << mpos;

  bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
  bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;

  if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
    uint64_t &vd = P.VU.elt<uint64_t>(rd_num, midx, true);
    uint64_t res = 0;
    if (!has_one && vs2_lsb) {
      has_one = true;
      res = 1;
    }
    vd = (vd & ~mmask) | ((res << mpos) & mmask);
  }
}
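
As with vmsif.m, the example rows can be checked with a standalone sketch. It assumes vl <= 8 so each mask fits in one byte, leaves inactive elements (the x positions) undisturbed, and uses a hypothetical function name.

```c
#include <stdint.h>

/* vmsof.m sketch: set only the first active element whose vs2 bit is 1. */
uint8_t vmsof_m8(uint8_t vd, uint8_t vs2, uint8_t v0, unsigned vl)
{
    int seen = 0;                              /* has the first active 1 been found? */
    for (unsigned i = 0; i < vl && i < 8; i++) {
        uint8_t bit = (uint8_t)(1u << i);
        if (!(v0 & bit))
            continue;                          /* inactive element: left as it was */
        if (!seen && (vs2 & bit)) {
            seen = 1;
            vd = (uint8_t)(vd | bit);          /* the one element that gets set */
        } else {
            vd = (uint8_t)(vd & ~bit);         /* every other active element is cleared */
        }
    }
    return vd;
}
```

For the masked example, v3 = 0xD4 and v0 = 0xC3 give 0x40 when the destination starts at zero: element 6 is the first active element holding a 1.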

v / _widening_floating_pointinteger_type_convert_instructions

Vector Floating-Point Instructions / 14.18. Widening Floating-Point/Integer Type-Convert Instructions

Operation	Arguments	Description
vfwcvt.f.f.v	vs2, vd	vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
vfwcvt.f.x.v	vs2, vd	vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
vfwcvt.f.xu.v	vs2, vd	vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
vfwcvt.rtz.x.f.v	vs2, vd	vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
vfwcvt.rtz.xu.f.v	vs2, vd	vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
vfwcvt.x.f.v	vs2, vd	vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
vfwcvt.xu.f.v	vs2, vd	vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float.
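
As a scalar reference for what "double-width" means in the listing above, the following C sketch models two of the conversions at SEW=32. The function names and the array-based interface are hypothetical (they are not part of Spike or any toolchain), and masking, vstart, rounding-mode selection and the saturating behaviour of out-of-range inputs are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vfwcvt.f.x.v at SEW=32:
 * each signed 32-bit source element becomes a 64-bit float.
 * Every int32_t value is exactly representable as a double. */
void model_vfwcvt_f_x_v(double *vd, const int32_t *vs2, size_t vl) {
    for (size_t i = 0; i < vl; ++i) {
        vd[i] = (double)vs2[i];
    }
}

/* Hypothetical scalar model of vfwcvt.rtz.x.f.v at SEW=32:
 * each 32-bit float becomes a signed 64-bit integer, truncating toward zero.
 * Assumption: inputs are finite and within int64_t range; the C cast is
 * undefined outside that range, so this sketch does not model saturation. */
void model_vfwcvt_rtz_x_f_v(int64_t *vd, const float *vs2, size_t vl) {
    for (size_t i = 0; i < vl; ++i) {
        vd[i] = (int64_t)vs2[i];
    }
}
```

In both cases the destination elements are twice as wide as the source elements, which is why the destination occupies a register group twice the size of the source group.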

v / _zve_vector_extensions_for_embedded_processors

Standard Vector Extensions / 19.2. Zve*: Vector Extensions for Embedded Processors

Operation	Arguments	Description
vmulh.vv	vs2, vs1, vd	All Zve* extensions support all vector integer instructions (Section Vector Integer Arithmetic Instructions ), except that the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*.
vmulh.vx	vs2, rs1, vd	All Zve* extensions support all vector integer instructions (Section Vector Integer Arithmetic Instructions ), except that the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*.
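
To make "the high word of the product" concrete, here is a minimal C sketch of a single element at SEW=32, where a 64-bit intermediate is enough. The function names are hypothetical. At EEW=64 the same operation would need a 128-bit intermediate product.

```c
#include <stdint.h>

/* Hypothetical model of one vmulh element at SEW=32:
 * signed x signed, returning bits 63:32 of the 64-bit product. */
int32_t model_vmulh32(int32_t a, int32_t b) {
    int64_t product = (int64_t)a * (int64_t)b;
    /* Floor-divide by 2^32 so negative products round toward minus infinity,
     * which matches taking the upper half of the two's-complement product. */
    int64_t high = product / 4294967296LL;
    if (product % 4294967296LL < 0) {
        high -= 1;
    }
    return (int32_t)high;
}

/* Hypothetical model of one vmulhu element at SEW=32:
 * unsigned x unsigned, returning bits 63:32 of the 64-bit product. */
uint32_t model_vmulhu32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```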

v / sec-agnostic

Vector Extension Programmer’s Model / 4.4. Vector type register, vtype

Operation

Arguments

Description

vmsbf.m

vs2, vd

In addition, except for mask load instructions, any element in the tail of a mask result can also be written with the value the mask-producing operation would have calculated with vl=VLMAX. Furthermore, for mask-logical instructions and vmsbf.m, vmsif.m, vmsof.m mask-manipulation instructions, any element in the tail of the result can be written with the value the mask-producing operation would have calculated with vl=VLEN, SEW=8, and LMUL=8 (i.e., all bits of the mask register can be overwritten).

Spike ISS Implementation:

// vmsbf.m vd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
require(P.VU.vstart->read() == 0);
require_vm;
require(insn.rd() != insn.rs2());

reg_t vl = P.VU.vl->read();
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();

bool has_one = false;
for (reg_t i = P.VU.vstart->read(); i < vl; ++i) {
const int midx = i / 64;
const int mpos = i % 64;
const uint64_t mmask = UINT64_C(1) << mpos;

bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1;
bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;


if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
auto &vd = P.VU.elt<uint64_t>(rd_num, midx, true);
uint64_t res = 0;
if (!has_one && !vs2_lsb) {
res = 1;
} else if (!has_one && vs2_lsb) {
has_one = true;
}
vd = (vd & ~mmask) | ((res << mpos) & mmask);
}
}
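
The Spike loop above can be condensed into a small stand-alone model. The sketch below is an illustration only: it assumes an unmasked operation on at most 64 mask bits, uses a hypothetical function name, and leaves tail bits at zero (one of the outcomes the tail-agnostic text above permits).

```c
#include <stdint.h>

/* Hypothetical model of unmasked vmsbf.m ("set before first") on up to 64
 * mask bits. Bit i of the result is 1 for every i below the first set bit
 * of vs2. If none of the first vl bits of vs2 is set, all vl result bits
 * are 1. */
uint64_t model_vmsbf(uint64_t vs2, unsigned vl) {
    uint64_t vd = 0;
    for (unsigned i = 0; i < vl && i < 64; ++i) {
        if ((vs2 >> i) & 1u) {
            break;                      /* first set bit found: stop setting */
        }
        vd |= UINT64_C(1) << i;
    }
    return vd;
}
```

For example, with vl=8 a source mask of 0x94 (first set bit at position 2) yields 0x03.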

vsetvli

zimm11, rs1, rd

The assembly syntax adds two mandatory flags to the vsetvli instruction:

ta   # Tail agnostic
tu   # Tail undisturbed
ma   # Mask agnostic
mu   # Mask undisturbed

vsetvli t0, a0, e32, m4, ta, ma   # Tail agnostic, mask agnostic
vsetvli t0, a0, e32, m4, tu, ma   # Tail undisturbed, mask agnostic
vsetvli t0, a0, e32, m4, ta, mu   # Tail agnostic, mask undisturbed
vsetvli t0, a0, e32, m4, tu, mu   # Tail undisturbed, mask undisturbed

Spike ISS Implementation:

require_vector_novtype(false);
WRITE_RD(P.VU.set_vl(insn.rd(), insn.rs1(), RS1, insn.v_zimm11()));
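
The four flag combinations end up in the immediate that vsetvli writes to vtype. The sketch below shows how that immediate could be assembled, assuming the vtype layout of the ratified V 1.0 specification (vma in bit 7, vta in bit 6, vsew in bits 5:3, vlmul in bits 2:0). The enum and function names are hypothetical, and fractional LMUL settings are not covered.

```c
#include <stdint.h>

/* Assumed vsew / vlmul field encodings (V 1.0), integer LMUL only. */
enum { VSEW_E8 = 0, VSEW_E16 = 1, VSEW_E32 = 2, VSEW_E64 = 3 };
enum { VLMUL_M1 = 0, VLMUL_M2 = 1, VLMUL_M4 = 2, VLMUL_M8 = 3 };

/* Hypothetical helper: build the vtype immediate used by vsetvli.
 * ta != 0 selects tail agnostic, ma != 0 selects mask agnostic. */
uint32_t encode_vtype(unsigned vsew, unsigned vlmul, int ta, int ma) {
    return ((ma ? 1u : 0u) << 7)
         | ((ta ? 1u : 0u) << 6)
         | ((vsew & 7u) << 3)
         | (vlmul & 7u);
}
```

Under these assumptions, `vsetvli t0, a0, e32, m4, ta, ma` corresponds to an immediate of 0xD2 and the `tu, mu` form to 0x12.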

v / sec-mask-register-logical

Vector Mask Instructions / 16.1. Vector Mask-Register Logical Instructions

Operation	Arguments	Description
vmand.mm	vs2, vs1, vd	vmand.mm vd, src1, src2 vmand.mm vd, src2, src2 vmand.mm vd, src1, src1 vmand.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && vs1.mask[i] vmnand.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] && vs1.mask[i]) vmandn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && !vs1.mask[i] vmxor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] ^^ vs1.mask[i] vmor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || vs1.mask[i] vmnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] || vs1.mask[i]) vmorn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || !vs1.mask[i] vmxnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] ^^ vs1.mask[i]) vmmv.m vd, vs => vmand.mm vd, vs, vs # Copy mask register vmclr.m vd => vmxor.mm vd, vd, vd # Clear mask register vmset.m vd => vmxnor.mm vd, vd, vd # Set mask register vmnot.m vd, vs => vmnand.mm vd, vs, vs # Invert bits
vmandn.mm	vs2, vs1, vd	vmandn.mm vd, src2, src1 vmandn.mm vd, src1, src2
vmnand.mm	vs2, vs1, vd	vmnand.mm vd, src1, src1 vmnand.mm vd, src2, src2 vmnand.mm vd, src1, src2
vmnor.mm	vs2, vs1, vd	vmnor.mm vd, src1, src2
vmorn.mm	vs2, vs1, vd	vmorn.mm vd, src2, src1 vmorn.mm vd, src1, src2
vmxnor.mm	vs2, vs1, vd	vmxnor.mm vd, src1, src2 vmxnor.mm vd, vd, vd
vmxor.mm	vs2, vs1, vd	vmxor.mm vd, vd, vd vmxor.mm vd, src1, src2
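
The eight mask-register operations and the four pseudoinstructions built from them can be checked with a bitwise model. In this sketch each mask is a 64-bit word with one bit per element. The function names are hypothetical, and the result is computed for all 64 bits rather than only the first vl.

```c
#include <stdint.h>

/* Hypothetical bitwise models; vs2 is the first source, vs1 the second. */
uint64_t model_vmand (uint64_t vs2, uint64_t vs1) { return vs2 & vs1; }
uint64_t model_vmnand(uint64_t vs2, uint64_t vs1) { return ~(vs2 & vs1); }
uint64_t model_vmandn(uint64_t vs2, uint64_t vs1) { return vs2 & ~vs1; }
uint64_t model_vmxor (uint64_t vs2, uint64_t vs1) { return vs2 ^ vs1; }
uint64_t model_vmor  (uint64_t vs2, uint64_t vs1) { return vs2 | vs1; }
uint64_t model_vmnor (uint64_t vs2, uint64_t vs1) { return ~(vs2 | vs1); }
uint64_t model_vmorn (uint64_t vs2, uint64_t vs1) { return vs2 | ~vs1; }
uint64_t model_vmxnor(uint64_t vs2, uint64_t vs1) { return ~(vs2 ^ vs1); }

/* Pseudoinstructions expressed through the operations above. */
uint64_t model_vmmv (uint64_t vs) { return model_vmand(vs, vs);   } /* copy   */
uint64_t model_vmclr(uint64_t vd) { return model_vmxor(vd, vd);   } /* clear  */
uint64_t model_vmset(uint64_t vd) { return model_vmxnor(vd, vd);  } /* set    */
uint64_t model_vmnot(uint64_t vs) { return model_vmnand(vs, vs);  } /* invert */
```

Note that vmclr.m and vmset.m produce the same result whatever the previous contents of vd were, which is what makes the `vd, vd, vd` forms usable as constants.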

v / sec-narrowing

Vector Arithmetic Instruction Formats / 11.3. Narrowing Vector Arithmetic Instructions

Operation	Arguments	Description
vnsra.wi	vs2, simm5, vd	A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv)
vnsra.wv	vs2, vs1, vd	A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv)
vnsra.wx	vs2, rs1, vd	A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv)
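
A single element of a narrowing arithmetic shift can be modelled as follows. This is a sketch for SEW=16 only, with a hypothetical function name. It assumes, as the V specification states for narrowing shifts, that only the low log2(2*SEW) bits of the shift amount are used.

```c
#include <stdint.h>

/* Hypothetical model of one vnsra.wx element at SEW=16:
 * the source element is 2*SEW = 32 bits wide, the result is 16 bits wide. */
int16_t model_vnsra16(int32_t wide, uint32_t shamt) {
    unsigned sh = shamt & 31u;              /* low log2(32) = 5 bits only */
    uint32_t u = (uint32_t)wide;
    /* Arithmetic right shift written without relying on signed >>. */
    uint32_t shifted = (wide < 0) ? ~(~u >> sh) : (u >> sh);
    uint16_t low = (uint16_t)shifted;       /* keep the low SEW bits */
    /* Reinterpret the 16-bit pattern as a signed value. */
    return (low & 0x8000u) ? (int16_t)((int32_t)low - 65536) : (int16_t)low;
}
```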

v / sec-vec-operands

Vector Instruction Formats / 6.2. Vector Operands

Operation	Arguments	Description
vnsrl.wi	vs2, simm5, vd	The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not).
vnsrl.wv	vs2, vs1, vd	The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not).
vnsrl.wx	vs2, rs1, vd	The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not).
vzext.vf2	vs2, vd	The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).
vzext.vf4	vs2, vd	The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).
vzext.vf8	vs2, vd	The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not).
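
The two overlap rules quoted in the table can be expressed as a small predicate. The sketch below is a simplification under stated assumptions: register groups are described by a base register number and an integer register count (so fractional EMUL is not modelled), groups are assumed to be correctly aligned, and the equal-EEW case, which the specification also permits, is included for completeness. The function name and parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical check: may destination group [vd, vd+vd_regs) overlap
 * source group [vs, vs+vs_regs)? EEWs are given in bits. */
bool overlap_is_legal(unsigned vd, unsigned vd_regs, unsigned vd_eew,
                      unsigned vs, unsigned vs_regs, unsigned vs_eew) {
    bool overlaps = (vd < vs + vs_regs) && (vs < vd + vd_regs);
    if (!overlaps) {
        return true;                        /* nothing to check */
    }
    if (vd_eew == vs_eew) {
        return true;                        /* equal EEW: overlap allowed */
    }
    if (vd_eew < vs_eew) {
        /* Narrowing: overlap must be in the lowest-numbered part of the
         * source group. */
        return vd == vs;
    }
    /* Widening (source EMUL >= 1 assumed): overlap must be in the
     * highest-numbered part of the destination group. */
    return vs + vs_regs == vd + vd_regs;
}
```

With LMUL=1, `vnsrl.wi v0, v0, 3` maps to `overlap_is_legal(0, 1, 16, 0, 2, 32)`, and with LMUL=8, `vzext.vf4 v0, v6` maps to `overlap_is_legal(0, 8, 32, 6, 2, 8)`. The SEW values chosen here are arbitrary.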

v / sec-vector-float-reduce

Vector Reduction Operations / 15.3. Vector Single-Width Floating-Point Reduction Instructions

Operation	Arguments	Description
vfredosum.vs	vs2, vs1, vd	# Simple reductions. vfredosum.vs vd, vs2, vs1, vm # Ordered sum vfredusum.vs vd, vs2, vs1, vm # Unordered sum vfredmax.vs vd, vs2, vs1, vm # Maximum value vfredmin.vs vd, vs2, vs1, vm # Minimum value
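
As an illustration of the ordered sum, the sketch below accumulates active elements strictly in element order, starting from the scalar in vs1[0]. The function name, the `bool` mask array and the NULL-means-unmasked convention are hypothetical. The unordered variant may associate the additions differently, so it can round differently and is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of vfredosum.vs at SEW=32.
 * mask == NULL means every element is active. Returns the new vd[0]. */
float model_vfredosum(float vs1_0, const float *vs2,
                      const bool *mask, size_t vl) {
    float acc = vs1_0;                      /* start with the scalar operand */
    for (size_t i = 0; i < vl; ++i) {
        if (mask == NULL || mask[i]) {
            acc += vs2[i];                  /* strictly in element order */
        }
    }
    return acc;
}
```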

v / sec-vector-float-reduce-widen

Vector Reduction Operations / 15.4. Vector Widening Floating-Point Reduction Instructions

Operation	Arguments	Description
vfwredosum.vs	vs2, vs1, vd	# Simple reductions. vfwredosum.vs vd, vs2, vs1, vm # Ordered sum vfwredusum.vs vd, vs2, vs1, vm # Unordered sum

v / sec-vector-integer-reduce

Vector Reduction Operations / 15.1. Vector Single-Width Integer Reduction Instructions

Operation	Arguments	Description
vredsum.vs	vs2, vs1, vd	# Simple reductions, where [*] denotes all active elements: vredsum.vs vd, vs2, vs1, vm # vd[0] = sum( vs1[0] , vs2[*] ) vredmaxu.vs vd, vs2, vs1, vm # vd[0] = maxu( vs1[0] , vs2[*] ) vredmax.vs vd, vs2, vs1, vm # vd[0] = max( vs1[0] , vs2[*] ) vredminu.vs vd, vs2, vs1, vm # vd[0] = minu( vs1[0] , vs2[*] ) vredmin.vs vd, vs2, vs1, vm # vd[0] = min( vs1[0] , vs2[*] ) vredand.vs vd, vs2, vs1, vm # vd[0] = and( vs1[0] , vs2[*] ) vredor.vs vd, vs2, vs1, vm # vd[0] = or( vs1[0] , vs2[*] ) vredxor.vs vd, vs2, vs1, vm # vd[0] = xor( vs1[0] , vs2[*] )
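
The reduction pattern (scalar from vs1[0] combined with all active elements of vs2) can be sketched in C for two of the operations. The function names and the `bool` mask array are hypothetical, SEW=32 is assumed, and the sum uses unsigned arithmetic so that wraparound is well defined in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vredsum.vs at SEW=32; mask == NULL means unmasked.
 * Returns the new vd[0]; the sum wraps modulo 2^32. */
uint32_t model_vredsum(uint32_t vs1_0, const uint32_t *vs2,
                       const bool *mask, size_t vl) {
    uint32_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i) {
        if (mask == NULL || mask[i]) {
            acc += vs2[i];
        }
    }
    return acc;
}

/* Hypothetical model of vredmax.vs (signed maximum) at SEW=32. */
int32_t model_vredmax(int32_t vs1_0, const int32_t *vs2,
                      const bool *mask, size_t vl) {
    int32_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i) {
        if ((mask == NULL || mask[i]) && vs2[i] > acc) {
            acc = vs2[i];
        }
    }
    return acc;
}
```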

v / sec-vector-integer-reduce-widen

Vector Reduction Operations / 15.2. Vector Widening Integer Reduction Instructions

Operation

Arguments

Description

vwredsum.vs

vs2, vs1, vd

The vwredsum.vs instruction sign-extends the SEW-wide vector elements before summing them.

vwredsumu.vs

vs2, vs1, vd

The unsigned vwredsumu.vs instruction zero-extends the SEW-wide vector elements before summing them, then adds the 2*SEW-width scalar element, and stores the result in a 2*SEW-width scalar element.

For both vwredsumu.vs and vwredsum.vs, overflows wrap around.

# Unsigned sum reduction into double-width accumulator
vwredsumu.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(zero-extend(SEW))

# Signed sum reduction into double-width accumulator
vwredsum.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(sign-extend(SEW))
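
The difference between zero-extension and sign-extension, and the wraparound on overflow, show up clearly in a model with SEW=8 and a 16-bit accumulator. The sketch below is unmasked, uses hypothetical function names, and returns the signed result as a raw 16-bit pattern to keep the arithmetic well defined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vwredsumu.vs at SEW=8: zero-extend each element to
 * 16 bits, add to the 16-bit scalar, wrap modulo 2^16. */
uint16_t model_vwredsumu8(uint16_t vs1_0, const uint8_t *vs2, size_t vl) {
    uint16_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i) {
        acc = (uint16_t)(acc + (uint16_t)vs2[i]);
    }
    return acc;
}

/* Hypothetical model of vwredsum.vs at SEW=8: sign-extend each element to
 * 16 bits before adding. The result is the 16-bit two's-complement pattern. */
uint16_t model_vwredsum8(uint16_t vs1_0, const int8_t *vs2, size_t vl) {
    uint16_t acc = vs1_0;
    for (size_t i = 0; i < vl; ++i) {
        acc = (uint16_t)(acc + (uint16_t)(int16_t)vs2[i]);
    }
    return acc;
}
```

Two elements with the byte pattern 0xFF sum to 510 in the unsigned form and to -2 (pattern 0xFFFE) in the signed form.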

counters

counters / zicntr-standard-extension-for-base-counters-and-timers

11 Counters / 11.1 “Zicntr” Standard Extension for Base Counters and Timers

Operation	Arguments	Description
rdcycle	rd	RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDCYCLE pseudoinstruction reads the low XLEN bits of the cycle CSR which holds a count of the number of clock cycles executed by the processor core on which the hart is running from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63-32 of the same cycle counter. The underlying 64-bit counter should never overflow in practice. The rate at which the cycle counter advances will depend on the implementation and operating environment. The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing. RDCYCLE is intended to return the number of cycles executed by the processor core, not the hart. Precisely defining what is a "core" is difficult given some implementation choices (e.g., AMD Bulldozer). Precisely defining what is a "clock cycle" is also difficult given the range of implementations (including software emulations), but the intent is that RDCYCLE is used for performance monitoring along with the other performance counters. In particular, where there is one hart/core, one would expect cycle-count/instructions-retired to measure CPI for a hart. Even though there is no precise definition that works for all platforms, this is still a useful facility for most platforms, and an imprecise, common, "usually correct" standard here is better than no standard. The intent of RDCYCLE was primarily performance monitoring/tuning, and the specification was written with that goal in mind. On some simple platforms, cycle count might represent a valid implementation of RDTIME, in which case RDTIME and RDCYCLE may return the same result.
rdcycleh	rd	RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDCYCLE pseudoinstruction reads the low XLEN bits of the cycle CSR which holds a count of the number of clock cycles executed by the processor core on which the hart is running from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63-32 of the same cycle counter. The underlying 64-bit counter should never overflow in practice. The rate at which the cycle counter advances will depend on the implementation and operating environment. The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing.
rdinstret	rd	RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDINSTRET pseudoinstruction reads the low XLEN bits of the instret CSR, which counts the number of instructions retired by this hart from some arbitrary start point in the past. RDINSTRETH is an RV32I-only instruction that reads bits 63-32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice.
rdinstreth	rd	RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDINSTRET pseudoinstruction reads the low XLEN bits of the instret CSR, which counts the number of instructions retired by this hart from some arbitrary start point in the past. RDINSTRETH is an RV32I-only instruction that reads bits 63-32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice.
rdtime	rd	RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63-32 of the same real-time counter. The underlying 64-bit counter increments by one with each tick of the real-time clock, and, for realistic real-time clock frequencies, should never overflow in practice. The execution environment should provide a means of determining the period of a counter tick (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock (i.e., the maximum relative error between the nominal and actual real-time clock periods). On some simple platforms, cycle count might represent a valid implementation of RDTIME, in which case RDTIME and RDCYCLE may return the same result.
rdtimeh	rd	RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63-32 of the same real-time counter. The underlying 64-bit counter increments by one with each tick of the real-time clock, and, for realistic real-time clock frequencies, should never overflow in practice. The execution environment should provide a means of determining the period of a counter tick (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock (i.e., the maximum relative error between the nominal and actual real-time clock periods).
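
Because RV32 reads the two halves of a 64-bit counter with separate instructions, the low half can wrap between the two reads. A common way to cope is to read the high half, then the low half, then the high half again, and retry if the two high reads differ. The sketch below is one possible arrangement, not a definitive implementation: all function names are hypothetical, the inline-assembly readers are only compiled for RV32 targets (and assume the assembler accepts the rdcycle/rdcycleh mnemonics), and a stand-in counter is included so the retry logic can be exercised off-target.

```c
#include <stdint.h>

typedef uint32_t (*read32_fn)(void);

/* Read a 64-bit counter from two 32-bit halves, retrying if the high half
 * changed while the low half was being read. */
uint64_t read_counter64(read32_fn read_hi, read32_fn read_lo) {
    uint32_t hi, lo, hi_again;
    do {
        hi = read_hi();
        lo = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__riscv) && (__riscv_xlen == 32)
/* RV32 readers for the cycle counter. */
uint32_t cycle_lo(void) {
    uint32_t v;
    __asm__ volatile ("rdcycle %0" : "=r" (v));
    return v;
}
uint32_t cycle_hi(void) {
    uint32_t v;
    __asm__ volatile ("rdcycleh %0" : "=r" (v));
    return v;
}
uint64_t read_cycle64(void) {
    return read_counter64(cycle_hi, cycle_lo);
}
#endif

/* Stand-in counter for testing on any host: it advances by one on every
 * read and starts just below a carry into the high word. */
static uint64_t fake_counter = 0xFFFFFFFFu;
uint32_t fake_hi(void) {
    uint32_t v = (uint32_t)(fake_counter >> 32);
    fake_counter++;
    return v;
}
uint32_t fake_lo(void) {
    uint32_t v = (uint32_t)fake_counter;
    fake_counter++;
    return v;
}
```

On RV64 none of this is needed, since a single RDCYCLE, RDTIME or RDINSTRET returns all 64 bits.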

zihintpause

zihintpause / chap:zihintpause

4 “Zihintpause” Pause Hint, Version 2.0 /

Operation

Arguments

Description

pause

The PAUSE instruction is a HINT that indicates the current hart's rate of instruction retirement should be temporarily reduced or paused. The duration of its effect must be bounded and may be zero. No architectural state is changed.

Software can use the PAUSE instruction to reduce energy consumption while executing spin-wait code sequences. Multithreaded cores might temporarily relinquish execution resources to other harts when PAUSE is executed. It is recommended that a PAUSE instruction generally be included in the code sequence for a spin-wait loop.

A future extension might add primitives similar to the x86 MONITOR/MWAIT instructions, which provide a more efficient mechanism to wait on writes to a specific memory location. However, these instructions would not supplant PAUSE. PAUSE is more appropriate when polling for non-memory events, when polling for multiple events, or when software does not know precisely what events it is polling for.

The duration of a PAUSE instruction's effect may vary significantly within and among implementations. In typical implementations this duration should be much less than the time to perform a context switch, probably more on the rough order of an on-chip cache miss latency or a cacheless access to main memory.

A series of PAUSE instructions can be used to create a cumulative delay loosely proportional to the number of PAUSE instructions. In spin-wait loops in portable code, however, only one PAUSE instruction should be used before re-evaluating loop conditions, else the hart might stall longer than optimal on some implementations, degrading system performance.

PAUSE is encoded as a FENCE instruction with pred=W, succ=0, fm=0, rd=x0, and rs1=x0.
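
The encoding can be reproduced by packing the FENCE fields into a 32-bit word. The sketch below assumes the standard FENCE layout (fm in bits 31:28, pred in 27:24, succ in 23:20, rs1 in 19:15, funct3=000, rd in 11:7, opcode MISC-MEM = 0x0F) and uses hypothetical names for the helper and the predecessor/successor bit constants.

```c
#include <stdint.h>

/* Bits of the pred/succ sets: device input, device output, reads, writes. */
enum { FENCE_W = 1, FENCE_R = 2, FENCE_O = 4, FENCE_I = 8 };

/* Hypothetical helper: assemble a FENCE instruction word. */
uint32_t encode_fence(unsigned fm, unsigned pred, unsigned succ,
                      unsigned rs1, unsigned rd) {
    return ((uint32_t)(fm   & 0xFu)  << 28)
         | ((uint32_t)(pred & 0xFu)  << 24)
         | ((uint32_t)(succ & 0xFu)  << 20)
         | ((uint32_t)(rs1  & 0x1Fu) << 15)
         | ((uint32_t)(rd   & 0x1Fu) << 7)
         | 0x0Fu;                           /* funct3 = 000, opcode = 0x0F */
}
```

With pred=W and every other field zero this gives 0x0100000F, the word that represents PAUSE.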

PAUSE is encoded as a hint within the FENCE opcode because some implementations are expected to deliberately stall the PAUSE instruction until outstanding memory transactions have completed. Because the successor set is null, however, PAUSE does not mandate any particular memory ordering--hence, it truly is a HINT.

Like other FENCE instructions, PAUSE cannot be used within LR/SC sequences without voiding the forward-progress guarantee.

The choice of a predecessor set of W is arbitrary, since the successor set is null. Other HINTs similar to PAUSE might be encoded with other predecessor sets.
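
A spin-wait loop following the recommendations above might look like the sketch below. It is an illustration under assumptions: the function names are hypothetical, the instruction is emitted as a raw word (0x0100000F, the FENCE encoding described above) so that it does not depend on the assembler knowing the `pause` mnemonic, and on non-RISC-V hosts the hint compiles to nothing.

```c
#include <stdatomic.h>

/* Emit one PAUSE hint on RISC-V; do nothing elsewhere. */
static inline void cpu_relax(void) {
#if defined(__riscv)
    __asm__ volatile (".4byte 0x0100000f");
#endif
}

/* Spin until *flag becomes non-zero. Returns how many times the loop
 * paused, which is only useful for diagnostics. */
unsigned long spin_until_nonzero(atomic_int *flag) {
    unsigned long spins = 0;
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        cpu_relax();    /* exactly one PAUSE before re-checking the flag */
        ++spins;
    }
    return spins;
}
```

Only one hint is issued per iteration, in line with the advice for portable code, so the loop condition is re-evaluated as soon as the hint's effect ends.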

zfh

half precision convert and move instructions

half precision floating point classify instruction

half precision load and store instructions

zfh / half-precision-convert-and-move-instructions

15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.3 Half-Precision Convert and Move Instructions

Operation	Arguments	Description
fcvt.d.h	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.
fcvt.h.d	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.
fcvt.h.l	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fcvt.h.lu	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fcvt.h.q	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.
fcvt.h.s	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.
fcvt.h.w	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fcvt.h.wu	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fcvt.l.h	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fcvt.lu.h	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fcvt.q.h	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.
fcvt.s.h	rd, rs1	New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively.
fcvt.w.h	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fcvt.wu.h	rd, rs1	New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions.
fmv.h.x	rd, rs1	FMV.H.X moves the half-precision value encoded in IEEE 754-2008 standard encoding from the lower 16 bits of integer register rs1 to the floating-point register rd, NaN-boxing the result. FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
fmv.x.h	rd, rs1	Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.H moves the half-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd, filling the upper XLEN-16 bits with copies of the floating-point number's sign bit. FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
fsgnj.h	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation: require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), false, false));
fsgnjn.h	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation: require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), true, false));
fsgnjx.h	rd, rs1, rs2	Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation: require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), false, true));
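
The move and sign-injection rows above are pure bit manipulation, so they can be sketched on a host machine. The following is a minimal C model, not Spike's implementation: the `model_*` names are hypothetical, and it assumes FLEN = 64 and XLEN = 64 so that NaN-boxing and sign-filling both cover the upper 48 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model; assumes FLEN = 64 and XLEN = 64. */

/* FMV.H.X: keep the low 16 bits of the integer register unchanged and
 * NaN-box them, i.e. set every bit above bit 15 of the FP register to 1. */
uint64_t model_fmv_h_x(uint64_t xreg)
{
    return UINT64_C(0xFFFFFFFFFFFF0000) | (xreg & 0xFFFFu);
}

/* FMV.X.H: copy the low 16 bits of the FP register unchanged and fill the
 * upper XLEN-16 bits with copies of the sign bit (bit 15). */
uint64_t model_fmv_x_h(uint64_t freg)
{
    uint64_t bits = freg & 0xFFFFu;
    uint64_t fill = (bits & 0x8000u) ? UINT64_C(0xFFFFFFFFFFFF0000) : 0;
    return fill | bits;
}

/* FSGNJ.H / FSGNJN.H / FSGNJX.H on raw half-precision encodings.
 * The two flags mirror the last two arguments of the Spike calls above:
 *   (0, 0) -> FSGNJ.H   result sign = sign of b
 *   (1, 0) -> FSGNJN.H  result sign = inverted sign of b
 *   (0, 1) -> FSGNJX.H  result sign = sign of a XOR sign of b */
uint16_t model_fsgnj_h(uint16_t a, uint16_t b, int negate, int xor_sign)
{
    unsigned sign;

    assert(!(negate && xor_sign));  /* no instruction sets both flags */
    if (xor_sign)
        sign = (unsigned)(a ^ b) & 0x8000u;
    else if (negate)
        sign = ~(unsigned)b & 0x8000u;
    else
        sign = (unsigned)b & 0x8000u;

    /* Exponent and fraction always come from a. */
    return (uint16_t)((a & 0x7FFFu) | sign);
}
```

With `0x3C00` as the encoding of 1.0, `model_fsgnj_h(0x3C00, 0x8000, 0, 0)` yields the encoding of -1.0, which is how the `fneg`/`fabs` style idioms are built from sign injection.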

zfh / half-precision-floating-point-classify-instruction

15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.5 Half-Precision Floating-Point Classify Instruction

Operation	Arguments	Description
fclass.h	rd, rs1	The half-precision floating-point classify instruction, FCLASS.H, is defined analogously to its single-precision counterpart, but operates on half-precision operands. Spike ISS Implementation: require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_RD(f16_classify(FRS1_H));
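
As a rough illustration of what "defined analogously to its single-precision counterpart" means in practice, here is a small C sketch of the classification. It assumes the standard 10-bit FCLASS result mask from the F extension (bit 0 = negative infinity through bit 9 = quiet NaN) applied to the half-precision layout of 1 sign, 5 exponent and 10 fraction bits. The name `model_fclass_h` is hypothetical and this is not Spike's `f16_classify`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of FCLASS.H on a raw half-precision encoding.
 * Result bits (same assignment as FCLASS.S):
 *   0: -inf        1: -normal     2: -subnormal  3: -0
 *   4: +0          5: +subnormal  6: +normal     7: +inf
 *   8: signaling NaN              9: quiet NaN */
uint32_t model_fclass_h(uint16_t h)
{
    unsigned sign = (h >> 15) & 1u;
    unsigned exp  = (h >> 10) & 0x1Fu;
    unsigned frac = h & 0x3FFu;
    unsigned bit;

    if (exp == 0x1Fu && frac != 0)
        bit = (frac & 0x200u) ? 9 : 8;  /* top fraction bit set => quiet NaN */
    else if (exp == 0x1Fu)
        bit = sign ? 0 : 7;             /* infinity */
    else if (exp == 0 && frac == 0)
        bit = sign ? 3 : 4;             /* zero */
    else if (exp == 0)
        bit = sign ? 2 : 5;             /* subnormal */
    else
        bit = sign ? 1 : 6;             /* normal */

    assert(bit <= 9);                   /* exactly one of ten bits is set */
    return (uint32_t)1 << bit;
}
```

Because exactly one bit is set, callers can test for a group of classes with a single mask, for example `model_fclass_h(x) & 0x300` to detect any NaN.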

zfh / half-precision-load-and-store-instructions

15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.1 Half-Precision Load and Store Instructions

Operation	Arguments	Description
flh	rd, rs1, imm12	FLH and FSH are only guaranteed to execute atomically if the effective address is naturally aligned. FLH and FSH do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. FLH NaN-boxes the result written to rd, whereas FSH ignores all but the lower 16 bits in rs2. Spike ISS Implementation: require_extension(EXT_INTERNAL_ZFH_MOVE); require_fp; WRITE_FRD(f16(MMU.load<uint16_t>(RS1 + insn.i_imm())));
fsh	rs1, rs2, imm12	FLH and FSH are only guaranteed to execute atomically if the effective address is naturally aligned. Spike ISS Implementation: require_extension(EXT_INTERNAL_ZFH_MOVE); require_fp; MMU.store<uint16_t>(RS1 + insn.s_imm(), FRS2.v[0]);
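
The load/store behaviour described above (bits transferred unchanged, FLH NaN-boxing its result, FSH using only the low 16 bits) can be sketched as a host-side model. Everything here is illustrative: the `model_*` and `demo_*` names are hypothetical, memory is a flat byte array, and FLEN = 64 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: `mem` stands in for memory, FP registers are 64 bits. */

/* FLH: load 16 bits unchanged and NaN-box them into the FP register. */
uint64_t model_flh(const uint8_t *mem, size_t size, size_t addr)
{
    uint16_t bits;

    assert(addr + sizeof bits <= size);
    memcpy(&bits, mem + addr, sizeof bits);
    return UINT64_C(0xFFFFFFFFFFFF0000) | bits;
}

/* FSH: store only the low 16 bits of the FP register; upper bits are ignored. */
void model_fsh(uint8_t *mem, size_t size, size_t addr, uint64_t freg)
{
    uint16_t bits = (uint16_t)(freg & 0xFFFFu);

    assert(addr + sizeof bits <= size);
    memcpy(mem + addr, &bits, sizeof bits);
}

/* Natural alignment for a halfword means a 2-byte-aligned effective address;
 * only then are FLH and FSH guaranteed to execute atomically. */
int is_naturally_aligned_half(size_t addr)
{
    return (addr % 2u) == 0;
}

/* Usage: store a register, then load it back from the same address. */
uint64_t demo_store_then_load(uint64_t freg)
{
    uint8_t mem[4] = {0};

    model_fsh(mem, sizeof mem, 2, freg);
    return model_flh(mem, sizeof mem, 2);
}
```

The round trip keeps the low 16 bits, including a non-canonical NaN payload, and replaces whatever was in the upper bits with the NaN box.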

csr

csr /

Operation	Arguments	Description
csrc	csr, rs	Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations: csrrc x0, csr, rs
csrci	csr, imm	Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations: csrrci x0, csr, imm
csrr	rd, csr	The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Pseudo Opcode, Equivalent Operations: csrrs rd, csr, x0
csrrc	rd, csr, rs1	The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are not explicitly written. For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR. The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. Spike ISS Implementation: bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old & ~RS1); } WRITE_RD(sext_xlen(old)); serialize();
csrrci	rd, csr, uimm	The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. Spike ISS Implementation: bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old & ~(reg_t)insn.rs1()); } WRITE_RD(sext_xlen(old)); serialize();
csrrs	rd, csr, rs1	The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are not explicitly written. For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR. The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Spike ISS Implementation: bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old | RS1); } WRITE_RD(sext_xlen(old)); serialize();
csrrsi	rd, csr, uimm	The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. Spike ISS Implementation: bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old | insn.rs1()); } WRITE_RD(sext_xlen(old)); serialize();
csrrw	rd, csr, rs1	The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values in the CSRs and integer registers. CSRRW reads the old value of the CSR, zero-extends the value to XLEN bits, then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR. The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Spike ISS Implementation: int csr = validate_csr(insn.csr(), true); reg_t old = p->get_csr(csr, insn, true); p->put_csr(csr, RS1); WRITE_RD(sext_xlen(old)); serialize();
csrrwi	rd, csr, uimm	The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Spike ISS Implementation: int csr = validate_csr(insn.csr(), true); reg_t old = p->get_csr(csr, insn, true); p->put_csr(csr, insn.rs1()); WRITE_RD(sext_xlen(old)); serialize();
csrs	csr, rs	Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations: csrrs x0, csr, rs
csrsi	csr, imm	Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations: csrrsi x0, csr, imm
csrw	csr, rs	The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Pseudo Opcode, Equivalent Operations: csrrw x0, csr, rs
csrwi	csr, imm	The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Pseudo Opcode, Equivalent Operations: csrrwi x0, csr, imm
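
The write-suppression rules in the table are easy to misread: the write is skipped when the rs1 field names x0 (or uimm is zero), not when a register merely holds zero. The following C sketch models that distinction on a host. It is an illustration under stated assumptions, not Spike's code: `csr_model`, the `model_*` functions and the `demo_*` helpers are all hypothetical, write side effects are represented by a simple counter, and the rd=x0 read-suppression rule for CSRRW is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CSR model: a value plus a count of write attempts, so the
 * "no write when rs1 = x0" rule becomes observable. */
typedef struct {
    uint64_t value;
    unsigned writes;
} csr_model;

/* CSRRW: always writes, even with rs1 = x0 (which writes zero). */
uint64_t model_csrrw(csr_model *csr, uint64_t rs1_value)
{
    uint64_t old = csr->value;
    csr->value = rs1_value;
    csr->writes++;
    return old;
}

/* CSRRS: set the masked bits. The write is skipped only when the rs1
 * register index is 0 (x0), not when another register holds zero. */
uint64_t model_csrrs(csr_model *csr, unsigned rs1_index, uint64_t rs1_value)
{
    uint64_t old = csr->value;
    assert(rs1_index < 32);
    if (rs1_index != 0) {
        csr->value = old | rs1_value;
        csr->writes++;
    }
    return old;
}

/* CSRRC: clear the masked bits, with the same x0 rule as CSRRS. */
uint64_t model_csrrc(csr_model *csr, unsigned rs1_index, uint64_t rs1_value)
{
    uint64_t old = csr->value;
    assert(rs1_index < 32);
    if (rs1_index != 0) {
        csr->value = old & ~rs1_value;
        csr->writes++;
    }
    return old;
}

/* CSRRCI: the mask is a zero-extended 5-bit immediate; no write if it is 0. */
uint64_t model_csrrci(csr_model *csr, unsigned uimm)
{
    uint64_t old = csr->value;
    assert(uimm < 32);
    if (uimm != 0) {
        csr->value = old & ~(uint64_t)uimm;
        csr->writes++;
    }
    return old;
}

/* Usage: a pure read (csrr, i.e. csrrs with x0) followed by a csrrs whose
 * source register x5 happens to hold zero. Only the second one writes. */
unsigned demo_write_attempts(void)
{
    csr_model csr = { 0x8u, 0 };
    (void)model_csrrs(&csr, 0, 0);
    (void)model_csrrs(&csr, 5, 0);
    return csr.writes;
}

/* Usage: clear the low nibble, then issue a csrrci with uimm = 0 (no write). */
uint64_t demo_clear_low_nibble(void)
{
    csr_model csr = { 0xFFu, 0 };
    (void)model_csrrc(&csr, 5, 0x0Fu);
    (void)model_csrrci(&csr, 0);
    return csr.value;
}
```

This is why the `csrr` pseudoinstruction is safe on read-only CSRs while `csrrs rd, csr, t0` with `t0 == 0` is not: the decision is made from the instruction encoding, before any register value is examined.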

supervisor

supervisor / svinval

7 “Svinval” Standard Extension for Fine-Grained Address-Translation Cache Invalidation, Version 1.0 /

Operation	Arguments	Description
hfence.gvma	rs1, rs2	The Svinval extension splits SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA instructions into finer-grained invalidation and ordering operations that can be more efficiently batched or pipelined on certain classes of high-performance implementation. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
hfence.vvma	rs1, rs2	The Svinval extension splits SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA instructions into finer-grained invalidation and ordering operations that can be more efficiently batched or pipelined on certain classes of high-performance implementation. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
hinval.gvma	rs1, rs2	If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
hinval.vvma	rs1, rs2	If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
sfence.inval.ir		The SINVAL.VMA instruction invalidates any address-translation cache entries that an SFENCE.VMA instruction with the same values of rs1 and rs2 would invalidate. However, unlike SFENCE.VMA, SINVAL.VMA instructions are only ordered with respect to SFENCE.VMA, SFENCE.W.INVAL, and SFENCE.INVAL.IR instructions as defined below. The SFENCE.W.INVAL instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before subsequent SINVAL.VMA instructions executed by the same hart. The SFENCE.INVAL.IR instruction guarantees that any previous SINVAL.VMA instructions executed by the current hart are ordered before subsequent implicit references by that hart to the memory-management data structures. When executed in order (but not necessarily consecutively) by a single hart, the sequence SFENCE.W.INVAL, SINVAL.VMA, and SFENCE.INVAL.IR has the same effect as a hypothetical SFENCE.VMA instruction in which reads and writes prior to the SFENCE.W.INVAL are considered to be those prior to the SFENCE.VMA, and reads and writes following the SFENCE.INVAL.IR are considered to be those subsequent to the SFENCE.VMA. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be trapped when mstatus.TVM=1 or when hstatus.VTVM=1, as they only have ordering effects but no visible side effects. Trapping of the SINVAL.VMA instruction is sufficient to enable emulation of the intended overall TLB maintenance functionality. In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
sfence.w.inval		The SINVAL.VMA instruction invalidates any address-translation cache entries that an SFENCE.VMA instruction with the same values of rs1 and rs2 would invalidate. However, unlike SFENCE.VMA, SINVAL.VMA instructions are only ordered with respect to SFENCE.VMA, SFENCE.W.INVAL, and SFENCE.INVAL.IR instructions as defined below. The SFENCE.W.INVAL instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before subsequent SINVAL.VMA instructions executed by the same hart. The SFENCE.INVAL.IR instruction guarantees that any previous SINVAL.VMA instructions executed by the current hart are ordered before subsequent implicit references by that hart to the memory-management data structures. When executed in order (but not necessarily consecutively) by a single hart, the sequence SFENCE.W.INVAL, SINVAL.VMA, and SFENCE.INVAL.IR has the same effect as a hypothetical SFENCE.VMA instruction in which reads and writes prior to the SFENCE.W.INVAL are considered to be those prior to the SFENCE.VMA, and reads and writes following the SFENCE.INVAL.IR are considered to be those subsequent to the SFENCE.VMA. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be trapped when mstatus.TVM=1 or when hstatus.VTVM=1, as they only have ordering effects but no visible side effects. Trapping of the SINVAL.VMA instruction is sufficient to enable emulation of the intended overall TLB maintenance functionality. In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops.
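
The "typical usage" pattern described in the rows above (one SFENCE.W.INVAL, a batch of SINVAL.VMA, one SFENCE.INVAL.IR) can be sketched as a planner that emits the operation sequence for an address range. This is a host-side illustration only. The types and function names are hypothetical, a fixed page size is assumed, and on real hardware each entry would correspond to issuing the named instruction from a privileged mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one step in the invalidation sequence. */
typedef enum {
    OP_SFENCE_W_INVAL,
    OP_SINVAL_VMA,
    OP_SFENCE_INVAL_IR
} tlb_op_kind;

typedef struct {
    tlb_op_kind kind;
    uint64_t vaddr;  /* rs1 operand of SINVAL.VMA; unused for the fences */
    uint64_t asid;   /* rs2 operand of SINVAL.VMA; unused for the fences */
} tlb_op;

/* Plan the invalidation of [start, end): one ordering fence, one SINVAL.VMA
 * per page, one closing fence. Returns the number of operations written to
 * `out`, or 0 if they do not fit in `cap` entries. */
size_t plan_range_invalidate(uint64_t start, uint64_t end, uint64_t page_size,
                             uint64_t asid, tlb_op *out, size_t cap)
{
    size_t n = 0;

    assert(page_size != 0 && start <= end);
    uint64_t pages = (end - start + page_size - 1) / page_size;
    if (pages + 2 > cap)
        return 0;

    /* Order earlier stores (e.g. page-table updates) before the batch. */
    out[n++] = (tlb_op){ OP_SFENCE_W_INVAL, 0, 0 };
    for (uint64_t i = 0; i < pages; i++)
        out[n++] = (tlb_op){ OP_SINVAL_VMA, start + i * page_size, asid };
    /* Order the batch before later implicit page-table references. */
    out[n++] = (tlb_op){ OP_SFENCE_INVAL_IR, 0, 0 };
    return n;
}

/* Usage: three 4 KiB pages give 1 + 3 + 1 operations. Returns the length
 * only if the plan is bracketed by the two fences, otherwise 0. */
size_t demo_plan_length(void)
{
    tlb_op ops[8];
    size_t n = plan_range_invalidate(0x1000, 0x4000, 0x1000, 1, ops, 8);

    if (n < 2 || ops[0].kind != OP_SFENCE_W_INVAL ||
        ops[n - 1].kind != OP_SFENCE_INVAL_IR)
        return 0;
    return n;
}
```

The point of the split is visible in the shape of the plan: the two fences are paid once per batch, whereas a loop of SFENCE.VMA instructions would imply the ordering cost on every page.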

hypervisor

hypervisor / hypervisor-virtual-machine-load-and-store-instructions

5 Hypervisor Extension, Version 1.0 / 5.3 Hypervisor Instructions

Operation	Arguments	Description
hlv.b	rd, rs1	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int8_t>(RS1));
hlv.bu	rd, rs1	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course.
hlv.d	rd, rs1	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_rv64; require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int64_t>(RS1));
hlv.h	rd, rs1	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int16_t>(RS1));
hlv.hu	rd, rs1	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection (PMP) for supervisor level.)
hlv.w	rd, rs1	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.) Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int32_t>(RS1));
hlv.wu	rd, rs1	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection (PMP) for supervisor level.) HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.)
hlvx.hu	rd, rs1	Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection (PMP) for supervisor level.)
hlvx.wu	rd, rs1	Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection (PMP) for supervisor level.) HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.)
hsv.b	rs1, rs2	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint8_t>(RS1, RS2);
hsv.d	rs1, rs2	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_rv64; require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint64_t>(RS1, RS2);
hsv.h	rs1, rs2	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint16_t>(RS1, RS2);
hsv.w	rs1, rs2	For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint32_t>(RS1, RS2);
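
The rules repeated in the rows above (which mnemonics exist on RV32, the V and privilege checks that the Spike snippets perform before the guest access, and why sign extension only matters on RV64) can be summarised in a small host-side model. This is a minimal sketch, not Spike's code and not a normative statement of the specification: every name in it (hv_valid_for_xlen, hv_check, hv_extend_word, the enums) is hypothetical and chosen for illustration. The trap choice assumes the hypervisor extension's behaviour that V=1 raises a virtual-instruction exception, and that U-mode with hstatus.HU=0 raises an illegal-instruction exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; none of these come from Spike. */
enum hv_priv   { HV_PRIV_U = 0, HV_PRIV_S = 1, HV_PRIV_M = 3 };
enum hv_result { HV_OK, HV_ILLEGAL_INSN, HV_VIRTUAL_INSN };

/* HLV.WU, HLV.D and HSV.D are RV64-only; everything else, including
 * HLVX.WU, is also valid on RV32. */
static bool hv_valid_for_xlen(const char *mnemonic, int xlen)
{
    if (xlen == 64)
        return true;
    return strcmp(mnemonic, "hlv.wu") != 0 &&
           strcmp(mnemonic, "hlv.d")  != 0 &&
           strcmp(mnemonic, "hsv.d")  != 0;
}

/* Models require_novirt() followed by
 * require_privilege(hstatus.HU ? PRV_U : PRV_S). */
static enum hv_result hv_check(bool v, enum hv_priv prv, bool hstatus_hu)
{
    if (v)                                  /* executed from VS or VU mode */
        return HV_VIRTUAL_INSN;
    enum hv_priv needed = hstatus_hu ? HV_PRIV_U : HV_PRIV_S;
    if (prv < needed)                       /* U-mode while HU = 0 */
        return HV_ILLEGAL_INSN;
    return HV_OK;
}

/* On RV64 a loaded 32-bit word is sign-extended (HLV.W) or
 * zero-extended (HLV.WU, HLVX.WU) into the 64-bit rd. On RV32 the two
 * results are identical because rd is only 32 bits wide. */
static uint64_t hv_extend_word(uint32_t raw, bool is_unsigned)
{
    return is_unsigned ? (uint64_t)raw : (uint64_t)(int64_t)(int32_t)raw;
}

/* Usage example: a few spot checks of the model. */
void hv_self_check(void)
{
    assert(!hv_valid_for_xlen("hlv.wu", 32));
    assert(hv_valid_for_xlen("hlvx.wu", 32));
    assert(hv_check(false, HV_PRIV_U, true) == HV_OK);
    assert(hv_check(true, HV_PRIV_S, true) == HV_VIRTUAL_INSN);
    assert(hv_extend_word(0x80000000u, false) == 0xFFFFFFFF80000000ull);
}
```

The model deliberately leaves out address translation and the execute-versus-read permission difference of the HLVX forms; it only covers the checks that happen before the guest memory access.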