- Intro & References
- GCC Extended Assembler
- Instruction List
- rv32
- rv64
- a
- c
- d
- d / d-standard-extension-for-double-precision-floating-point-version-2.2
- d / double-precision-floating-point-conversion-and-move-instructions
- d / fld_fsd
- d / sec:single-float-compute
- d / single-precision-floating-point-compare-instructions
- d / single-precision-floating-point-conversion-and-move-instructions
- f
- m
- q
- v
- v / _introduction
- v / _narrowing_floating_pointinteger_type_convert_instructions
- v / _single_width_floating_pointinteger_type_convert_instructions
- v / _state_of_vector_extension_at_reset
- v / _unit_stride_fault_only_first_loads
- v / _vector_bitwise_logical_instructions
- v / _vector_compress_instruction
- v / _vector_count_population_in_mask_vcpop_m
- v / _vector_element_index_instruction
- v / _vector_floating_point_classify_instruction
- v / _vector_floating_point_compare_instructions
- v / _vector_floating_point_merge_instruction
- v / _vector_floating_point_minmax_instructions
- v / _vector_floating_point_move_instruction
- v / _vector_floating_point_reciprocal_estimate_instruction
- v / _vector_floating_point_reciprocal_square_root_estimate_instruction
- v / _vector_floating_point_sign_injection_instructions
- v / _vector_floating_point_square_root_instruction
- v / _vector_indexed_instructions
- v / _vector_instruction_formats
- v / _vector_instruction_listing
- v / _vector_integer_add_with_carry_subtract_with_borrow_instructions
- v / _vector_integer_compare_instructions
- v / _vector_integer_divide_instructions
- v / _vector_integer_merge_instructions
- v / _vector_integer_minmax_instructions
- v / _vector_integer_move_instructions
- v / _vector_iota_instruction
- v / _vector_loadstore_whole_register_instructions
- v / _vector_narrowing_fixed_point_clip_instructions
- v / _vector_register_gather_instructions
- v / _vector_register_grouping_vlmul20
- v / _vector_single_width_averaging_add_and_subtract
- v / _vector_single_width_floating_point_addsubtract_instructions
- v / _vector_single_width_floating_point_fused_multiply_add_instructions
- v / _vector_single_width_floating_point_multiplydivide_instructions
- v / _vector_single_width_fractional_multiply_with_rounding_and_saturation
- v / _vector_single_width_integer_add_and_subtract
- v / _vector_single_width_integer_multiply_add_instructions
- v / _vector_single_width_integer_multiply_instructions
- v / _vector_single_width_saturating_add_and_subtract
- v / _vector_single_width_scaling_shift_instructions
- v / _vector_single_width_shift_instructions
- v / _vector_slide1down_instruction
- v / _vector_slide1up
- v / _vector_slide_instructions
- v / _vector_slidedown_instructions
- v / _vector_strided_instructions
- v / _vector_unit_stride_instructions
- v / _vector_unordered_single_width_floating_point_sum_reduction
- v / _vector_widening_floating_point_addsubtract_instructions
- v / _vector_widening_floating_point_fused_multiply_add_instructions
- v / _vector_widening_floating_point_multiply
- v / _vector_widening_integer_addsubtract
- v / _vector_widening_integer_multiply_add_instructions
- v / _vector_widening_integer_multiply_instructions
- v / _vfirst_find_first_set_mask_bit
- v / _vmsif_m_set_including_first_mask_bit
- v / _vmsof_m_set_only_first_mask_bit
- v / _widening_floating_pointinteger_type_convert_instructions
- v / _zve_vector_extensions_for_embedded_processors
- v / sec-agnostic
- v / sec-mask-register-logical
- v / sec-narrowing
- v / sec-vec-operands
- v / sec-vector-float-reduce
- v / sec-vector-float-reduce-widen
- v / sec-vector-integer-reduce
- v / sec-vector-integer-reduce-widen
- counters
- zihintpause
- zfh
- csr
- supervisor
- hypervisor
Intro & References
For information on assembler programming:
- RISC-V Assembly Programmer’s Manual
- Pseudo Opcodes (Not covered below)
- RISC-V Instruction Table
- RISC-V Opcode Map
- GNU Assembler, RISC-V Section
- GNU Linker
Some good cheat sheets:
- RISC-V Instruction-Set Cheatsheet, from Erik Engheim. PDF Version
- RISC-V-QuickRefCard-v042.pdf, “basic assembly programmer’s quick reference card” from Dylan McNamee.
- An old but nicely formatted “green card” summary of the ISA: RISCVGreenCardv8-20151013.pdf
GCC Extended Assembler
GCC gives direct access to instructions via __asm__, e.g.:
- No argument instructions:
__asm__ volatile ("nop");
__asm__ volatile ("wfi");
- With register arguments:
__asm__ volatile ("csrrw %0, mie, %1"   /* read and write atomically */
                  : "=r" (ret)           /* output: register %0 */
                  : "r" (value)          /* input: register %1 */
                  : /* clobbers: none */);
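As a further sketch of the operand syntax, the wrapper below passes two inputs and one output to a plain add instruction. The function name asm_add and the non-RISC-V fallback branch are assumptions added for illustration, so that the snippet also builds on a development host; neither comes from GCC or from any library.

```c
#include <stdint.h>

/* Hypothetical wrapper: adds two register-width values with an explicit
 * RISC-V `add`. On other targets it falls back to plain C, which gives the
 * same wrap-around result, so the example still compiles everywhere. */
static inline uintptr_t asm_add(uintptr_t a, uintptr_t b)
{
#if defined(__riscv)
    uintptr_t sum;
    __asm__ volatile ("add %0, %1, %2"
                      : "=r" (sum)          /* output: register %0 */
                      : "r" (a), "r" (b)    /* inputs: registers %1, %2 */
                      : /* clobbers: none */);
    return sum;
#else
    return a + b;   /* unsigned arithmetic wraps, like ADD */
#endif
}
```

Calling asm_add(2, 3) yields 5. The "=r" and "r" constraints leave register allocation to the compiler, so no registers need to be named or listed as clobbers.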
Opcodes are listed in machine-readable format here
Instruction List
rv32 | rv64 | a | c | d | f | m | q | v | counters | zihintpause | zfh | csr | supervisor | hypervisor |
rv32
control transfer instructions | environment call and breakpoints | immediate encoding variants | integer register immediate instructions | |
integer register register operations | sec:rv32:ldst |
rv32 /
/
Operation | Arguments | Description |
ebreak |
RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Spike ISS Implementation:if (!STATE.debug_mode && ( (!STATE.v && STATE.prv == PRV_M && STATE.dcsr->ebreakm) || (!STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreaks) || (!STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreaku) || (STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreakvs) || (STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreakvu))) { throw trap_debug_mode(); } else { throw trap_breakpoint(STATE.v, pc); } |
|
ecall |
RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Spike ISS Implementation:switch (STATE.prv) { case PRV_U: throw trap_user_ecall(); case PRV_S: if (STATE.v) throw trap_virtual_supervisor_ecall(); else throw trap_supervisor_ecall(); case PRV_M: throw trap_machine_ecall(); default: abort(); } |
|
fence | rs1, rd |
RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Spike ISS Implementation: |
nop |
RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity). Pseudo Opcode, Equivalent Operations:addi x0, x0, 0 |
rv32 / control-transfer-instructions
2 RV32I Base Integer Instruction Set, Version 2.1 / 2.5 Control Transfer Instructions
Operation | Arguments | Description |
beq | rs1, rs2, bimm12 |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation:if (RS1 == RS2) set_pc(BRANCH_TARGET); |
bge | rs1, rs2, bimm12 |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation:if (sreg_t(RS1) >= sreg_t(RS2)) set_pc(BRANCH_TARGET); |
bgeu | rs1, rs2, bimm12 |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation:if (RS1 >= RS2) set_pc(BRANCH_TARGET); |
bgt | rs, rt, offset |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations:blt rt, rs, offset |
bgtu | rs, rt, offset |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations:bltu rt, rs, offset |
ble | rs, rt, offset |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations:bge rt, rs, offset |
bleu | rs, rt, offset |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Pseudo Opcode, Equivalent Operations:bgeu rt, rs, offset |
blt | rs1, rs2, bimm12 |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation:if (sreg_t(RS1) < sreg_t(RS2)) set_pc(BRANCH_TARGET); |
bltu | rs1, rs2, bimm12 |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Signed array bounds may be checked with a single BLTU instruction, since any negative index will compare greater than any nonnegative bound. Spike ISS Implementation:if (RS1 < RS2) set_pc(BRANCH_TARGET); |
bne | rs1, rs2, bimm12 |
Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively. Spike ISS Implementation:if (RS1 != RS2) set_pc(BRANCH_TARGET); |
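The two notes in this table (synthesizing BGT/BGTU/BLE/BLEU by swapping operands, and checking signed array bounds with a single BLTU) can be sketched as a small software model. The function names below are illustrative only; they mirror the mnemonics but are not taken from Spike or any toolchain, and the model assumes RV32 (32-bit registers).

```c
#include <stdbool.h>
#include <stdint.h>

/* Models of the real branch conditions: true means "branch taken". */
static bool blt(uint32_t rs1, uint32_t rs2)  { return (int32_t)rs1 <  (int32_t)rs2; }
static bool bge(uint32_t rs1, uint32_t rs2)  { return (int32_t)rs1 >= (int32_t)rs2; }
static bool bltu(uint32_t rs1, uint32_t rs2) { return rs1 <  rs2; }
static bool bgeu(uint32_t rs1, uint32_t rs2) { return rs1 >= rs2; }

/* Pseudo-branches: the same comparison with the operands reversed. */
static bool bgt(uint32_t rs, uint32_t rt)  { return blt(rt, rs); }
static bool ble(uint32_t rs, uint32_t rt)  { return bge(rt, rs); }
static bool bgtu(uint32_t rs, uint32_t rt) { return bltu(rt, rs); }
static bool bleu(uint32_t rs, uint32_t rt) { return bgeu(rt, rs); }

/* Checks 0 <= index < bound with one unsigned compare. A negative index
 * reinterpreted as unsigned is larger than any nonnegative bound, so it
 * fails the test. Assumes bound itself is nonnegative. */
static bool index_in_bounds(int32_t index, int32_t bound)
{
    return bltu((uint32_t)index, (uint32_t)bound);
}
```

For example, index_in_bounds(-1, 10) is false because -1 becomes 0xFFFFFFFF when viewed as unsigned, while bgt(5, 3) is true because it evaluates blt(3, 5).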
rv32 / environment-call-and-breakpoints
2 RV32I Base Integer Instruction Set, Version 2.1 / 2.8 Environment Call and Breakpoints
Operation | Arguments | Description |
sbreak |
ECALL and EBREAK were previously named SCALL and SBREAK. The instructions have the same functionality and encoding, but were renamed to reflect that they can be used more generally than to call a supervisor-level operating system or debugger. |
|
scall |
ECALL and EBREAK were previously named SCALL and SBREAK. The instructions have the same functionality and encoding, but were renamed to reflect that they can be used more generally than to call a supervisor-level operating system or debugger. |
rv32 / immediate-encoding-variants
2 RV32I Base Integer Instruction Set, Version 2.1 / 2.3 Immediate Encoding Variants
Operation | Arguments | Description |
addw | rd, rs1, rs2 |
In RV64I, checks of 32-bit signed additions can be optimized further by comparing the results of ADD and ADDW on the operands. Spike ISS Implementation:require_rv64; WRITE_RD(sext32(RS1 + RS2)); |
j | offset |
There are a further two variants of the instruction formats (B/J) based on the handling of immediates, as shown in Figure 1.3. Similarly, the only difference between the U and J formats is that the 20-bit immediate is shifted left by 12 bits to form U immediates and by 1 bit to form J immediates. The location of instruction bits in the U and J format immediates is chosen to maximize overlap with the other formats and with each other. Although more complex implementations might have separate adders for branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead of using dynamic hardware muxes to multiply the immediate by 2, we reduce instruction signal fanout and immediate mux costs by around a factor of 2. The scrambled immediate encoding will add negligible time to static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most common short forward branches have straightforward immediate encodings. Pseudo Opcode, Equivalent Operations:jal x0, offset |
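To make the "scrambled" J immediate concrete, here is a minimal sketch of decoding and encoding it in software. It assumes the standard J-type layout (instruction bits 31, 30:21, 20 and 19:12 hold imm[20], imm[10:1], imm[11] and imm[19:12]) and the JAL major opcode 0x6F; the helper names decode_j_imm and encode_jal are hypothetical.

```c
#include <stdint.h>

/* Extract the J-type immediate: a signed byte offset whose bit 0 is
 * always zero because it is not stored in the instruction. */
static int32_t decode_j_imm(uint32_t insn)
{
    uint32_t imm = (((insn >> 31) & 0x1u)   << 20)   /* imm[20]    */
                 | (((insn >> 21) & 0x3FFu) << 1)    /* imm[10:1]  */
                 | (((insn >> 20) & 0x1u)   << 11)   /* imm[11]    */
                 | (((insn >> 12) & 0xFFu)  << 12);  /* imm[19:12] */
    /* Sign-extend from bit 20 without relying on signed shifts. */
    return (int32_t)(imm ^ 0x100000u) - 0x100000;
}

/* Inverse operation: build the word for `jal rd, offset`. */
static uint32_t encode_jal(unsigned rd, int32_t offset)
{
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u)   << 31)
         | (((imm >> 1)  & 0x3FFu) << 21)
         | (((imm >> 11) & 0x1u)   << 20)
         | (((imm >> 12) & 0xFFu)  << 12)
         | ((rd & 0x1Fu) << 7)
         | 0x6Fu;                                    /* JAL opcode */
}
```

Since j offset expands to jal x0, offset, encode_jal(0, 8) produces the word for a jump two instructions ahead, and decode_j_imm applied to that word returns 8 again.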
rv32 / integer-register-immediate-instructions
2 RV32I Base Integer Instruction Set, Version 2.1 / 2.4 Integer Computational Instructions
Operation | Arguments | Description |
addi | rd, rs1, imm12 |
ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction. Spike ISS Implementation:WRITE_RD(sext_xlen(RS1 + insn.i_imm())); |
andi | rd, rs1, imm12 |
ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Spike ISS Implementation:WRITE_RD(insn.i_imm() & RS1); |
auipc | rd, imm20 |
AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd. The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses. The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address. Spike ISS Implementation:WRITE_RD(sext_xlen(insn.u_imm() + pc)); |
jal | rd, jimm20 |
The current PC can be obtained by setting the U-immediate to 0. Although a JAL +4 instruction could also be used to obtain the local PC (of the instruction following the JAL), it might cause pipeline breaks in simpler microarchitectures or pollute BTB structures in more complex microarchitectures. Spike ISS Implementation:reg_t tmp = npc; set_pc(JUMP_TARGET); WRITE_RD(tmp); |
jalr | rd, rs1, imm12 |
The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses. The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address. Spike ISS Implementation:reg_t tmp = npc; set_pc((RS1 + insn.i_imm()) & ~reg_t(1)); WRITE_RD(tmp); |
lui | rd, imm20 |
LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI places the 32-bit U-immediate value into the destination register rd, filling in the lowest 12 bits with zeros. Spike ISS Implementation:WRITE_RD(insn.u_imm()); |
mv | rd, rs |
ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction. Pseudo Opcode, Equivalent Operations:addi rd, rs, 0 |
not | rd, rs |
ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Pseudo Opcode, Equivalent Operations:xori rd, rs, -1 |
ori | rd, rs1, imm12 |
ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Spike ISS Implementation:// prefetch.i/r/w hint when rd = 0 and i_imm[4:0] = 0/1/3 WRITE_RD(insn.i_imm() | RS1); |
seqz | rd, rs |
SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs). Pseudo Opcode, Equivalent Operations:sltiu rd, rs, 1 |
slli | rd, rs1, shamt |
Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits). Spike ISS Implementation:require(SHAMT < xlen); WRITE_RD(sext_xlen(RS1 << SHAMT)); |
slti | rd, rs1, imm12 |
SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs). Spike ISS Implementation:WRITE_RD(sreg_t(RS1) < sreg_t(insn.i_imm())); |
sltiu | rd, rs1, imm12 |
SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs). Spike ISS Implementation:WRITE_RD(RS1 < reg_t(insn.i_imm())); |
srai | rd, rs1, shamt |
Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits). Spike ISS Implementation:require(SHAMT < xlen); WRITE_RD(sext_xlen(sext_xlen(RS1) >> SHAMT)); |
srli | rd, rs1, shamt |
Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits). Spike ISS Implementation:require(SHAMT < xlen); WRITE_RD(sext_xlen(zext_xlen(RS1) >> SHAMT)); |
xori | rd, rs1, imm12 |
ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs). Spike ISS Implementation:WRITE_RD(insn.i_imm() ^ RS1); |
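Because ADDI sign-extends its 12-bit immediate, building an arbitrary 32-bit constant with LUI followed by ADDI needs a small adjustment to the upper part whenever bit 11 of the constant is set. The sketch below shows one common way to do the split on RV32; the helper names split_hi_lo and lui_addi are made up for illustration and only model the arithmetic.

```c
#include <stdint.h>

/* Split a 32-bit constant into a LUI immediate (20 bits) and an ADDI
 * immediate (signed 12 bits) so that (hi20 << 12) + lo12 == value. */
static void split_hi_lo(uint32_t value, uint32_t *hi20, int32_t *lo12)
{
    /* Round the upper part up when bit 11 is set: the ADDI immediate
     * will then be negative and subtract the excess again. */
    *hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
    *lo12 = (int32_t)(value & 0xFFFu);
    if (*lo12 >= 0x800)
        *lo12 -= 0x1000;
}

/* Model of `lui rd, hi20` followed by `addi rd, rd, lo12` on RV32. */
static uint32_t lui_addi(uint32_t hi20, int32_t lo12)
{
    return (hi20 << 12) + (uint32_t)lo12;   /* overflow wraps, as in ADDI */
}
```

For the constant 0x12345FFF the split gives hi20 = 0x12346 and lo12 = -1, and lui_addi(0x12346, -1) reproduces the original value.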
rv32 / integer-register-register-operations
2 RV32I Base Integer Instruction Set, Version 2.1 / 2.4 Integer Computational Instructions
Operation | Arguments | Description |
add | rd, rs1, rs2 |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation:WRITE_RD(sext_xlen(RS1 + RS2)); |
and | rd, rs1, rs2 |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation:WRITE_RD(RS1 & RS2); |
or | rd, rs1, rs2 |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation:WRITE_RD(RS1 | RS2); |
sll | rd, rs1, rs2 |
SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2. Spike ISS Implementation:WRITE_RD(sext_xlen(RS1 << (RS2 & (xlen-1)))); |
slt | rd, rs1, rs2 |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation:WRITE_RD(sreg_t(RS1) < sreg_t(RS2)); |
sltu | rd, rs1, rs2 |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation:WRITE_RD(RS1 < RS2); |
snez | rd, rs |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Pseudo Opcode, Equivalent Operations:sltu rd, x0, rs |
sra | rd, rs1, rs2 |
SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2. Spike ISS Implementation:WRITE_RD(sext_xlen(sext_xlen(RS1) >> (RS2 & (xlen-1)))); |
srl | rd, rs1, rs2 |
SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2. Spike ISS Implementation:WRITE_RD(sext_xlen(zext_xlen(RS1) >> (RS2 & (xlen-1)))); |
sub | rd, rs1, rs2 |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation:WRITE_RD(sext_xlen(RS1 - RS2)); |
xor | rd, rs1, rs2 |
ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations. Spike ISS Implementation:WRITE_RD(RS1 ^ RS2); |
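The register-register shifts and compares above can be modeled in a few lines. This is a sketch for RV32 only (32-bit registers, shift amount taken from the low 5 bits of rs2); the function names simply reuse the mnemonics and are not an existing API.

```c
#include <stdint.h>

/* Shifts use only the low 5 bits of rs2 as the shift amount. */
static uint32_t sll(uint32_t rs1, uint32_t rs2) { return rs1 << (rs2 & 31u); }
static uint32_t srl(uint32_t rs1, uint32_t rs2) { return rs1 >> (rs2 & 31u); }

static uint32_t sra(uint32_t rs1, uint32_t rs2)
{
    uint32_t shamt = rs2 & 31u;
    uint32_t logical = rs1 >> shamt;
    /* Copy the original sign bit into the vacated upper bits. */
    uint32_t fill = (rs1 & 0x80000000u) ? ~(0xFFFFFFFFu >> shamt) : 0u;
    return logical | fill;
}

/* Compares write 1 or 0 to the destination. */
static uint32_t slt(uint32_t rs1, uint32_t rs2)  { return (int32_t)rs1 < (int32_t)rs2; }
static uint32_t sltu(uint32_t rs1, uint32_t rs2) { return rs1 < rs2; }

/* SNEZ rd, rs is SLTU rd, x0, rs: zero is below every nonzero value. */
static uint32_t snez(uint32_t rs) { return sltu(0u, rs); }
```

Note that sll(1, 33) behaves like a shift by 1, since 33 & 31 is 1, and that slt and sltu disagree on 0xFFFFFFFF versus 0 because only slt treats the first operand as -1.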
rv32 / sec:rv32:ldst
2 RV32I Base Integer Instruction Set, Version 2.1 / 2.6 Load and Store Instructions
Operation | Arguments | Description |
lb | rd, rs1, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:WRITE_RD(MMU.load<int8_t>(RS1 + insn.i_imm())); |
lbu | rd, rs1, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:WRITE_RD(MMU.load<uint8_t>(RS1 + insn.i_imm())); |
lh | rd, rs1, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:WRITE_RD(MMU.load<int16_t>(RS1 + insn.i_imm())); |
lhu | rd, rs1, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:WRITE_RD(MMU.load<uint16_t>(RS1 + insn.i_imm())); |
lw | rd, rs1, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:WRITE_RD(MMU.load<int32_t>(RS1 + insn.i_imm())); |
sb | rs1, rs2, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:MMU.store<uint8_t>(RS1 + insn.s_imm(), RS2); |
sh | rs1, rs2, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:MMU.store<uint16_t>(RS1 + insn.s_imm(), RS2); |
sw | rs1, rs2, imm12 |
The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory. Spike ISS Implementation:MMU.store<uint32_t>(RS1 + insn.s_imm(), RS2); |
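The difference between the signed and unsigned loads comes down to how a narrow value is widened to 32 bits. The C++ sketch below mirrors that behaviour on a plain byte array. The names `load_as_rv32` and `store_from_rv32`, and the little-endian byte layout, are assumptions made for illustration; they are not Spike's `MMU` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: read sizeof(T) bytes (little-endian, assumed) and
// widen the value to 32 bits. A signed T sign-extends (LB, LH); an unsigned
// T zero-extends (LBU, LHU). A 32-bit T is returned unchanged (LW).
template <typename T>
uint32_t load_as_rv32(const uint8_t* mem, uint32_t addr) {
    assert(mem != nullptr);
    using U = typename std::make_unsigned<T>::type;
    U raw = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        raw = static_cast<U>(raw | (static_cast<uint32_t>(mem[addr + i]) << (8 * i)));
    }
    // Converting a negative T to uint32_t wraps modulo 2^32, which is
    // exactly sign extension.
    return static_cast<uint32_t>(static_cast<T>(raw));
}

// Hypothetical helper: store the low sizeof(T) bytes of rs2 (SB, SH, SW).
template <typename T>
void store_from_rv32(uint8_t* mem, uint32_t addr, uint32_t rs2) {
    assert(mem != nullptr);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        mem[addr + i] = static_cast<uint8_t>(rs2 >> (8 * i));
    }
}
```

With the two bytes 0x80, 0xFF in memory, `load_as_rv32<int16_t>` yields 0xFFFFFF80 (as LH would) while `load_as_rv32<uint16_t>` yields 0x0000FF80 (as LHU would).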
rv64
integer computational instructions | integer register immediate instructions | load and store instructions | register state |
rv64 / integer-computational-instructions
6 RV64I Base Integer Instruction Set, Version 2.1 / 6.2 Integer Computational Instructions
Operation | Arguments | Description |
sllw | rd, rs1, rs2 |
SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0]. Spike ISS Implementation:require_rv64; WRITE_RD(sext32(RS1 << (RS2 & 0x1F))); |
sraw | rd, rs1, rs2 |
SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0]. Spike ISS Implementation:require_rv64; WRITE_RD(sext32(int32_t(RS1) >> (RS2 & 0x1F))); |
srlw | rd, rs1, rs2 |
SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. The shift amount is given by rs2[4:0]. Spike ISS Implementation:require_rv64; WRITE_RD(sext32((uint32_t)RS1 >> (RS2 & 0x1F))); |
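The three W-shifts differ only in how the low 32 bits of rs1 are interpreted before shifting; all of them sign-extend the 32-bit result. A minimal C++ sketch of that behaviour follows. The function names are illustrative, and `sext32` is a local stand-in with the same role as the Spike macro of that name. The right shift of a negative `int32_t` is assumed to be arithmetic, which holds on mainstream compilers and is guaranteed from C++20.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 32 bits of x to 64 bits.
inline uint64_t sext32(uint64_t x) {
    return static_cast<uint64_t>(
        static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(x))));
}

// SLLW: shift the low 32 bits left by rs2[4:0].
inline uint64_t sllw(uint64_t rs1, uint64_t rs2) {
    return sext32(static_cast<uint32_t>(rs1) << (rs2 & 0x1F));
}

// SRLW: logical right shift of the low 32 bits (zeros shifted in).
inline uint64_t srlw(uint64_t rs1, uint64_t rs2) {
    return sext32(static_cast<uint32_t>(rs1) >> (rs2 & 0x1F));
}

// SRAW: arithmetic right shift of the low 32 bits (bit 31 replicated).
inline uint64_t sraw(uint64_t rs1, uint64_t rs2) {
    const int32_t low = static_cast<int32_t>(static_cast<uint32_t>(rs1));
    return sext32(static_cast<uint64_t>(static_cast<int64_t>(low >> (rs2 & 0x1F))));
}

// Spot checks of the behaviour described above.
inline void check_w_shifts() {
    assert(sllw(1, 31) == 0xFFFFFFFF80000000ull);  // bit 31 set, result sign-extends
    assert(sllw(1, 32) == 1);                      // only rs2[4:0] is used
    assert(srlw(0xFFFFFFFF80000000ull, 31) == 1);  // upper 32 bits of rs1 ignored
    assert(sraw(0x80000000ull, 31) == 0xFFFFFFFFFFFFFFFFull);
}
```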
rv64 / integer-register-immediate-instructions
6 RV64I Base Integer Instruction Set, Version 2.1 / 6.2 Integer Computational Instructions
Operation | Arguments | Description |
addiw | rd, rs1, imm12 |
ADDIW is an RV64I instruction that adds the sign-extended 12-bit immediate to register rs1 and produces the proper sign-extension of a 32-bit result in rd. Overflows are ignored and the result is the low 32 bits of the result sign-extended to 64 bits. Note, ADDIW rd, rs1, 0 writes the sign-extension of the lower 32 bits of register rs1 into register rd (assembler pseudoinstruction SEXT.W). Spike ISS Implementation:require_rv64; WRITE_RD(sext32(insn.i_imm() + RS1)); |
ld | rd, rs1, imm12 |
Note that the set of address offsets that can be formed by pairing LUI with LD, AUIPC with JALR, etc. in RV64I is [-2^31 - 2^11, 2^31 - 2^11 - 1]. Spike ISS Implementation:require_rv64; WRITE_RD(MMU.load<int64_t>(RS1 + insn.i_imm())); |
sext.w | rd, rs |
ADDIW is an RV64I instruction that adds the sign-extended 12-bit immediate to register rs1 and produces the proper sign-extension of a 32-bit result in rd. Overflows are ignored and the result is the low 32 bits of the result sign-extended to 64 bits. Note, ADDIW rd, rs1, 0 writes the sign-extension of the lower 32 bits of register rs1 into register rd (assembler pseudoinstruction SEXT.W). Pseudo Opcode, Equivalent Operations:addiw rd, rs, 0 |
slliw | rd, rs1 |
SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved. Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change. Spike ISS Implementation:require_rv64; WRITE_RD(sext32(RS1 << SHAMT)); |
sraiw | rd, rs1 |
SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved. Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change. Spike ISS Implementation:require_rv64; WRITE_RD(sext32(int32_t(RS1) >> SHAMT)); |
srliw | rd, rs1 |
SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and sign-extend their 32-bit results to 64 bits. SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0 are reserved. Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0 were defined to cause illegal instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change. Spike ISS Implementation:require_rv64; WRITE_RD(sext32((uint32_t)RS1 >> SHAMT)); |
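Two points from the table above lend themselves to a quick check: ADDIW discards overflow beyond 32 bits before sign-extending (so `sext.w` is just ADDIW with a zero immediate), and the reachable offset range of an LUI plus 12-bit-immediate pair follows from adding the two immediates. The sketch below is illustrative only; `addiw`, `sext_w`, and `lui_plus_imm12` are hypothetical helper names, and immediates are passed already sign-extended.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 32 bits of x to 64 bits.
inline int64_t sext32(uint64_t x) {
    return static_cast<int32_t>(static_cast<uint32_t>(x));
}

// ADDIW: add a sign-extended 12-bit immediate, keep the low 32 bits,
// then sign-extend to 64 bits. Overflow is ignored.
inline int64_t addiw(uint64_t rs1, int64_t imm12) {
    assert(imm12 >= -2048 && imm12 <= 2047);
    return sext32(rs1 + static_cast<uint64_t>(imm12));
}

// SEXT.W is the pseudoinstruction for ADDIW rd, rs, 0.
inline int64_t sext_w(uint64_t rs) { return addiw(rs, 0); }

// Offset formed by LUI (20-bit immediate placed at bit 12, sign-extended)
// followed by a 12-bit signed immediate, as in an LUI + LD pair.
inline int64_t lui_plus_imm12(int64_t imm20, int64_t imm12) {
    assert(imm20 >= -524288 && imm20 <= 524287);
    assert(imm12 >= -2048 && imm12 <= 2047);
    return imm20 * 4096 + imm12;
}
```

Evaluating `lui_plus_imm12` at the extreme immediates gives -2^31 - 2^11 and 2^31 - 2^11 - 1, the bounds of the offset range quoted for RV64I.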
rv64 / load-and-store-instructions
6 RV64I Base Integer Instruction Set, Version 2.1 / 6.3 Load and Store Instructions
Operation | Arguments | Description |
lwu | rd, rs1, imm12 |
The LW instruction loads a 32-bit value from memory and sign-extends this to 64 bits before storing it in register rd for RV64I. The LWU instruction, on the other hand, zero-extends the 32-bit value from memory for RV64I. LH and LHU are defined analogously for 16-bit values, as are LB and LBU for 8-bit values. The SD, SW, SH, and SB instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory respectively. Spike ISS Implementation:require_rv64; WRITE_RD(MMU.load<uint32_t>(RS1 + insn.i_imm())); |
sd | rs1, rs2, imm12 |
The LW instruction loads a 32-bit value from memory and sign-extends this to 64 bits before storing it in register rd for RV64I. The LWU instruction, on the other hand, zero-extends the 32-bit value from memory for RV64I. LH and LHU are defined analogously for 16-bit values, as are LB and LBU for 8-bit values. The SD, SW, SH, and SB instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory respectively. Spike ISS Implementation:require_rv64; MMU.store<uint64_t>(RS1 + insn.s_imm(), RS2); |
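On RV64 the only difference between LW and LWU is what happens to bits 63 through 32 of rd. A small C++ sketch of the two widening rules, using hypothetical helper names that operate on an already-fetched 32-bit word:

```cpp
#include <cassert>
#include <cstdint>

// LW on RV64: the 32-bit word is sign-extended to 64 bits.
inline uint64_t lw_extend(uint32_t word) {
    return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(word)));
}

// LWU on RV64: the 32-bit word is zero-extended to 64 bits.
inline uint64_t lwu_extend(uint32_t word) {
    return static_cast<uint64_t>(word);
}

// The two agree whenever bit 31 of the word is clear.
inline void check_lw_lwu() {
    assert(lw_extend(0x80000000u) == 0xFFFFFFFF80000000ull);
    assert(lwu_extend(0x80000000u) == 0x0000000080000000ull);
    assert(lw_extend(0x7FFFFFFFu) == lwu_extend(0x7FFFFFFFu));
}
```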
rv64 / register-state
6 RV64I Base Integer Instruction Set, Version 2.1 / 6.1 Register State
Operation | Arguments | Description |
subw | rd, rs1, rs2 |
The compiler and calling convention maintain an invariant that all 32-bit values are held in a sign-extended format in 64-bit registers. Even 32-bit unsigned integers extend bit 31 into bits 63 through 32. Consequently, conversion between unsigned and signed 32-bit integers is a no-op, as is conversion from a signed 32-bit integer to a signed 64-bit integer. Existing 64-bit wide SLTU and unsigned branch compares still operate correctly on unsigned 32-bit integers under this invariant. Similarly, existing 64-bit wide logical operations on 32-bit sign-extended integers preserve the sign-extension property. A few new instructions (ADD[I]W/SUBW/SxxW) are required for addition and shifts to ensure reasonable performance for 32-bit values. Spike ISS Implementation:require_rv64; WRITE_RD(sext32(RS1 - RS2)); |
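The sign-extension invariant can be checked directly: if every 32-bit value, signed or unsigned, is kept sign-extended in a 64-bit register, a 64-bit unsigned compare still orders unsigned 32-bit values correctly, and 64-bit logical operations keep the result sign-extended. The following C++ sketch tests those claims on sample values; `as_register` and the other names are illustrative assumptions, not part of any toolchain.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 32 bits of x to 64 bits.
inline uint64_t sext32(uint64_t x) {
    return static_cast<uint64_t>(
        static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(x))));
}

// Register image of a 32-bit value under the invariant: bit 31 is copied
// into bits 63..32, even for unsigned values.
inline uint64_t as_register(uint32_t v) { return sext32(v); }

// SUBW: 32-bit subtraction with a sign-extended result.
inline uint64_t subw(uint64_t rs1, uint64_t rs2) { return sext32(rs1 - rs2); }

// 64-bit SLTU on the register images matches a 32-bit unsigned compare.
inline bool sltu_matches(uint32_t a, uint32_t b) {
    return (as_register(a) < as_register(b)) == (a < b);
}

// 64-bit AND on the register images stays sign-extended.
inline bool and_preserves_invariant(uint32_t a, uint32_t b) {
    return (as_register(a) & as_register(b)) == as_register(a & b);
}
```

The compare works because sign-extension maps values with bit 31 set to the top of the 64-bit unsigned range, in the same relative order, so they remain larger than every value with bit 31 clear.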
a
atomics | sec:lrsc |
a / atomics
9 “A” Standard Extension for Atomic Instructions, Version 2.1 / 9.4 Atomic Memory Operations
Operation | Arguments | Description |
amoadd.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs + RS2; })); |
amoadd.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs + RS2; }))); |
amoand.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs & RS2; })); |
amoand.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs & RS2; }))); |
amomax.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](int64_t lhs) { return std::max(lhs, int64_t(RS2)); })); |
amomax.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](int32_t lhs) { return std::max(lhs, int32_t(RS2)); }))); |
amomaxu.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return std::max(lhs, RS2); })); |
amomaxu.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return std::max(lhs, uint32_t(RS2)); }))); |
amomin.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](int64_t lhs) { return std::min(lhs, int64_t(RS2)); })); |
amomin.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](int32_t lhs) { return std::min(lhs, int32_t(RS2)); }))); |
amominu.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return std::min(lhs, RS2); })); |
amominu.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return std::min(lhs, uint32_t(RS2)); }))); |
amoor.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs | RS2; })); |
amoor.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs | RS2; }))); |
amoswap.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t UNUSED lhs) { return RS2; })); |
amoswap.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t UNUSED lhs) { return RS2; }))); |
amoxor.d | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.amo<uint64_t>(RS1, [&](uint64_t lhs) { return lhs ^ RS2; })); |
amoxor.w | rd, rs1, rs2 |
The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an R-type instruction format. These AMO instructions atomically load a data value from the address in rs1, place the value into register rd, apply a binary operator to the loaded value and the original value in rs2, then store the result back to the original address in rs1. AMOs can either operate on 64-bit (RV64 only) or 32-bit words in memory. For RV64, 32-bit AMOs always sign-extend the value placed in rd, and ignore the upper 32 bits of the original value of rs2. Spike ISS Implementation:require_extension('A'); WRITE_RD(sext32(MMU.amo<uint32_t>(RS1, [&](uint32_t lhs) { return lhs ^ RS2; }))); |
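Every AMO in the table has the same shape: read the old value, combine it with rs2, write the result back, and return the old value in rd. The C++ sketch below expresses that shape for the 32-bit variants with `std::atomic` and a compare-exchange retry loop. This is an analogy on the host machine under stated assumptions, not how Spike or hardware implements AMOs: `amo_w` and the wrappers are hypothetical names, and the aq/rl ordering bits are not modelled.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Generic 32-bit AMO: apply `op` to the memory word and the low 32 bits of
// rs2, store the result, and return the old word sign-extended (as for rd
// on RV64).
template <typename Op>
int64_t amo_w(std::atomic<uint32_t>& mem, uint64_t rs2, Op op) {
    const uint32_t operand = static_cast<uint32_t>(rs2);  // upper 32 bits ignored
    uint32_t old = mem.load();
    while (!mem.compare_exchange_weak(old, op(old, operand))) {
        // `old` was refreshed with the current memory value; retry.
    }
    return static_cast<int32_t>(old);
}

inline int64_t amoadd_w(std::atomic<uint32_t>& mem, uint64_t rs2) {
    return amo_w(mem, rs2, [](uint32_t a, uint32_t b) { return a + b; });
}

// Signed maximum: both operands are compared as int32_t.
inline int64_t amomax_w(std::atomic<uint32_t>& mem, uint64_t rs2) {
    return amo_w(mem, rs2, [](uint32_t a, uint32_t b) {
        return static_cast<uint32_t>(
            std::max(static_cast<int32_t>(a), static_cast<int32_t>(b)));
    });
}

// Unsigned maximum: both operands are compared as uint32_t.
inline int64_t amomaxu_w(std::atomic<uint32_t>& mem, uint64_t rs2) {
    return amo_w(mem, rs2, [](uint32_t a, uint32_t b) { return std::max(a, b); });
}
```

Starting from a memory word of 0x80000000 and rs2 = 1, `amomax_w` leaves 1 in memory (signed compare), while `amomaxu_w` leaves 0x80000000 (unsigned compare). Both return the old word sign-extended.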
a / sec:lrsc
9 “A” Standard Extension for Atomic Instructions, Version 2.1 / 9.2 Load-Reserved/Store-Conditional Instructions
Operation | Arguments | Description |
lr.d | rd, rs1 |
Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation:require_extension('A'); require_rv64; WRITE_RD(MMU.load_reserved<int64_t>(RS1)); |
lr.w | rd, rs1 |
Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation:require_extension('A'); WRITE_RD(MMU.load_reserved<int32_t>(RS1)); |
sc.d | rd, rs1, rs2 |
Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation:require_extension('A'); require_rv64; bool have_reservation = MMU.store_conditional<uint64_t>(RS1, RS2); WRITE_RD(!have_reservation); |
sc.w | rd, rs1, rs2 |
Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) instructions. LR.W loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation set--a set of bytes that subsumes the bytes in the addressed word. SC.W conditionally writes a word in rs2 to the address in rs1: the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd. Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd. Spike ISS Implementation:require_extension('A'); bool have_reservation = MMU.store_conditional<uint32_t>(RS1, RS2); WRITE_RD(!have_reservation); |
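The LR/SC rules above (SC succeeds only with a valid reservation covering the written bytes, writes zero to rd on success and nonzero on failure, and always invalidates the reservation) can be captured in a toy single-hart model. The C++ sketch below is a simplification for illustration: `ToyHart` is a hypothetical class, the reservation set is reduced to one word index, and interference from other harts is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy single-hart model (hypothetical; not Spike's MMU).
class ToyHart {
public:
    explicit ToyHart(std::vector<uint32_t>& words) : mem_(words) {}

    // LR.W: return the sign-extended word and register a reservation.
    int64_t lr_w(std::size_t index) {
        assert(index < mem_.size());
        reserved_ = true;
        reserved_index_ = index;
        return static_cast<int32_t>(mem_[index]);
    }

    // SC.W: write only if the reservation is valid and covers the word.
    // Returns 0 on success and a nonzero code (here 1) on failure.
    uint64_t sc_w(std::size_t index, uint32_t value) {
        assert(index < mem_.size());
        const bool ok = reserved_ && reserved_index_ == index;
        reserved_ = false;  // any SC.W invalidates the reservation
        if (ok) mem_[index] = value;
        return ok ? 0 : 1;
    }

private:
    std::vector<uint32_t>& mem_;
    bool reserved_ = false;
    std::size_t reserved_index_ = 0;
};

// Typical retry loop: atomically add `delta` and return the old value.
inline int64_t fetch_add_w(ToyHart& hart, std::size_t index, uint32_t delta) {
    int64_t old;
    do {
        old = hart.lr_w(index);
    } while (hart.sc_w(index, static_cast<uint32_t>(old) + delta) != 0);
    return old;
}
```

The retry loop in `fetch_add_w` is the usual pattern for building a more complex atomic operation out of an LR/SC pair: reload and try again whenever the store-conditional reports failure.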
c
c / compressed
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.8 RVC Instruction Set Listings
Operation | Arguments | Description |
c.slli_rv32 | rd_rs1_n0, c_nzuimm6lo | |
c.srai_rv32 | rd_rs1_p, c_nzuimm5 | |
c.srli_rv32 | rd_rs1_p, c_nzuimm5 |
c / control-transfer-instructions
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.4 Control Transfer Instructions
Operation | Arguments | Description |
c.beqz | rs1_p, c_bimm9 |
C.BEQZ performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. It can therefore target a ±256 B range. C.BEQZ takes the branch if the value in register rs1' is zero. It expands to beq rs1', x0, offset. Spike ISS Implementation:require_extension(EXT_ZCA); if (RVC_RS1S == 0) set_pc(pc + insn.rvc_b_imm()); |
c.bnez | rs1_p, c_bimm9 |
C.BNEZ is defined analogously, but it takes the branch if rs1' contains a nonzero value. It expands to bne rs1', x0, offset. Spike ISS Implementation:require_extension(EXT_ZCA); if (RVC_RS1S != 0) set_pc(pc + insn.rvc_b_imm()); |
c.j | c_imm12 |
C.J performs an unconditional control transfer. The offset is sign-extended and added to the pc to form the jump target address. C.J can therefore target a ±2 KiB range. C.J expands to jal x0, offset. C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JAL expands to jal x1, offset. Spike ISS Implementation:require_extension(EXT_ZCA); set_pc(pc + insn.rvc_j_imm()); |
c.jal | c_imm12 |
C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JAL expands to jal x1, offset. Spike ISS Implementation:require_extension(EXT_ZCA); if (xlen == 32) { reg_t tmp = npc; set_pc(pc + insn.rvc_j_imm()); WRITE_REG(X_RA, tmp); } else { // c.addiw require(insn.rvc_rd() != 0); WRITE_RD(sext32(RVC_RS1 + insn.rvc_imm())); } |
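The quoted ranges follow from the offset widths: a 9-bit signed branch offset spans -256 to +254 bytes and a 12-bit signed jump offset spans -2048 to +2046 bytes, since offsets are multiples of 2. The C++ sketch below shows the sign extension and the resulting next-pc values. It works on offsets that have already been reassembled from the instruction; the bit scrambling of the encoded immediate is deliberately left out, and all names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of value (1 <= bits < 64).
inline int64_t sign_extend(uint64_t value, unsigned bits) {
    assert(bits >= 1 && bits < 64);
    const uint64_t sign = 1ull << (bits - 1);
    const uint64_t mask = (sign << 1) - 1;
    value &= mask;
    return static_cast<int64_t>((value ^ sign) - sign);
}

// C.BEQZ: branch when rs1' is zero, otherwise fall through to pc + 2.
inline uint64_t c_beqz_next_pc(uint64_t pc, uint64_t rs1, int64_t offset) {
    return rs1 == 0 ? pc + static_cast<uint64_t>(offset) : pc + 2;
}

struct JumpResult {
    uint64_t next_pc;  // jump target
    uint64_t ra;       // value written to x1
};

// C.JAL (RV32C only): jump and write pc + 2 to the link register x1.
inline JumpResult c_jal(uint64_t pc, int64_t offset) {
    return JumpResult{pc + static_cast<uint64_t>(offset), pc + 2};
}
```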
c / integer-constant-generation-instructions
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions
Operation | Arguments | Description |
c.addi16sp | c_nzimm10 |
C.LUI loads the non-zero 6-bit immediate field into bits 17-12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into lui rd, nzimm. C.LUI is only valid when rd ≠ {x0, x2}, and when the immediate is not equal to zero. The code points with nzimm=0 are reserved; the remaining code points with rd=x0 are HINTs; and the remaining code points with rd=x2 correspond to the C.ADDI16SP instruction. |
c.li | rd, c_imm6 |
C.LI loads the sign-extended 6-bit immediate, imm, into register rd. C.LI expands into addi rd, x0, imm. C.LI is only valid when rd ≠ x0; the code points with rd=x0 encode HINTs. Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RD(insn.rvc_imm()); |
c.lui | rd_n2, c_nzimm18 |
C.LUI loads the non-zero 6-bit immediate field into bits 17-12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into lui rd, nzimm. C.LUI is only valid when rd ≠ {x0, x2}, and when the immediate is not equal to zero. The code points with nzimm=0 are reserved; the remaining code points with rd=x0 are HINTs; and the remaining code points with rd=x2 correspond to the C.ADDI16SP instruction. Spike ISS Implementation:require_extension(EXT_ZCA); if (insn.rvc_rd() == 2) { // c.addi16sp require(insn.rvc_addi16sp_imm() != 0); WRITE_REG(X_SP, sext_xlen(RVC_SP + insn.rvc_addi16sp_imm())); } else { require(insn.rvc_imm() != 0); WRITE_RD(insn.rvc_imm() << 12); } |
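The constant-generation rules can be restated as a small decision procedure plus two value computations. The C++ sketch below follows the wording of the table: it classifies a C.LUI code point by rd and nzimm, and computes the values that C.LI and C.LUI write. The enum, the function names, and the convention of passing the raw 6-bit field are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a raw 6-bit immediate field.
inline int64_t sext6(unsigned imm6) {
    const int64_t v = static_cast<int64_t>(imm6 & 0x3Fu);
    return (v ^ 0x20) - 0x20;
}

// C.LI: rd = sign-extended 6-bit immediate.
inline int64_t c_li(unsigned imm6) { return sext6(imm6); }

// C.LUI: the field lands in bits 17..12, bit 17 is sign-extended upward,
// and the low 12 bits are zero.
inline int64_t c_lui(unsigned nzimm6) {
    assert((nzimm6 & 0x3Fu) != 0);  // nzimm = 0 is reserved
    return sext6(nzimm6) * 4096;
}

// How a code point in the C.LUI opcode space is interpreted.
enum class CLuiSlot { Reserved, Hint, Addi16sp, Lui };

inline CLuiSlot classify_c_lui(unsigned rd, unsigned nzimm6) {
    if ((nzimm6 & 0x3Fu) == 0) return CLuiSlot::Reserved;  // nzimm = 0
    if (rd == 0) return CLuiSlot::Hint;                    // rd = x0
    if (rd == 2) return CLuiSlot::Addi16sp;                // rd = x2
    return CLuiSlot::Lui;
}
```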
c / integer-register-immediate-operations
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions
Operation | Arguments | Description |
c.addi | rd_rs1_n0, c_nzimm6, c_nzimm6 |
C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in register rd then writes the result to rd. C.ADDI expands into addi rd, rd, nzimm. C.ADDI is only valid when rd ≠ x0 and nzimm ≠ 0. The code points with rd=x0 encode the C.NOP instruction; the remaining code points with nzimm=0 encode HINTs. Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RD(sext_xlen(RVC_RS1 + insn.rvc_imm())); |
c.addi4spn | rd_p, c_nzuimm10 |
C.ADDI4SPN is a CIW-format instruction that adds a zero-extended non-zero immediate, scaled by 4, to the stack pointer, x2, and writes the result to rd'. This instruction is used to generate pointers to stack-allocated variables, and expands to addi rd', x2, nzuimm. C.ADDI4SPN is only valid when nzuimm ≠ 0; the code points with nzuimm=0 are reserved. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_addi4spn_imm() != 0); WRITE_RVC_RS2S(sext_xlen(RVC_SP + insn.rvc_addi4spn_imm())); |
c.addiw | rd_rs1_n0, c_imm6 |
C.ADDIW is an RV64C/RV128C-only instruction that performs the same computation but produces a 32-bit result, then sign-extends the result to 64 bits. C.ADDIW expands into addiw rd, rd, imm. The immediate can be zero for C.ADDIW, where this corresponds to sext.w rd. C.ADDIW is only valid when rd ≠ x0; the code points with rd=x0 are reserved. |
c.andi | rd_rs1_p, c_imm6 |
C.ANDI is a CB-format instruction that computes the bitwise AND of the value in register rd' and the sign-extended 6-bit immediate, then writes the result to rd'. C.ANDI expands to andi rd', rd', imm. Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S & insn.rvc_imm()); |
c.slli | rd_rs1_n0, c_nzuimm6 |
C.SLLI is a CI-format instruction that performs a logical left shift of the value in register rd then writes the result to rd. The shift amount is encoded in the shamt field. For RV128C, a shift amount of zero is used to encode a shift of 64. C.SLLI expands into slli rd, rd, shamt, except for RV128C with shamt=0, which expands to slli rd, rd, 64. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_zimm() < xlen); WRITE_RD(sext_xlen(RVC_RS1 << insn.rvc_zimm())); |
c.srai | rd_rs1_p, c_nzuimm6 |
C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic right shift. C.SRAI expands to srai rd', rd', shamt. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_zimm() < xlen); WRITE_RVC_RS1S(sext_xlen(sext_xlen(RVC_RS1S) >> insn.rvc_zimm())); |
c.srli | rd_rs1_p, c_nzuimm6 |
C.SRLI is a CB-format instruction that performs a logical right shift of the value in register rd' then writes the result to rd'. The shift amount is encoded in the shamt field. For RV128C, a shift amount of zero is used to encode a shift of 64. Furthermore, the shift amount is sign-extended for RV128C, and so the legal shift amounts are 1-31, 64, and 96-127. C.SRLI expands into srli rd', rd', shamt, except for RV128C with shamt=0, which expands to srli rd', rd', 64. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_zimm() < xlen); WRITE_RVC_RS1S(sext_xlen(zext_xlen(RVC_RS1S) >> insn.rvc_zimm())); |
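The difference between the logical shift of C.SRLI and the arithmetic shift of C.SRAI can be sketched as follows. This is a minimal model that assumes XLEN=64 and represents the register as an unsigned 64-bit integer; the names srl64 and sra64 are made up for this example, and the RV128C shamt=0 special case is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift (C.SRLI): vacated high bits are filled with zeros.
std::uint64_t srl64(std::uint64_t value, unsigned shamt) {
    return value >> (shamt & 63);
}

// Arithmetic right shift (C.SRAI): vacated high bits copy the sign bit.
std::uint64_t sra64(std::uint64_t value, unsigned shamt) {
    shamt &= 63;
    std::uint64_t shifted = value >> shamt;
    // When the sign bit is set, fill the top `shamt` bits with ones.
    if (shamt != 0 && (value >> 63) != 0) {
        shifted |= ~std::uint64_t{0} << (64 - shamt);
    }
    return shifted;
}
```

For the input 0x8000000000000000 and a shift of 4, srl64 gives 0x0800000000000000 while sra64 gives 0xF800000000000000.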
c / integer-register-register-operations
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions
Operation | Arguments | Description |
c.add | rd_rs1, c_rs2_n0 |
C.ADD adds the values in registers rd and rs2 and writes the result to register rd. C.ADD expands into add rd, rd, rs2. C.ADD is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JALR and C.EBREAK instructions. The code points with rs2 ≠ x0 and rd = x0 are HINTs. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_rs2() != 0); WRITE_RD(sext_xlen(RVC_RS1 + RVC_RS2)); |
c.addw | rd_rs1_p, rs2_p |
C.ADDW is an RV64C/RV128C-only instruction that adds the values in registers rd' and rs2', then sign-extends the lower 32 bits of the sum before writing the result to register rd'. C.ADDW expands into addw rd', rd', rs2'. Spike ISS Implementation:require_extension(EXT_ZCA); require_rv64; WRITE_RVC_RS1S(sext32(RVC_RS1S + RVC_RS2S)); |
c.and | rd_rs1_p, rs2_p |
C.AND computes the bitwise AND of the values in registers rd' and rs2', then writes the result to register rd'. C.AND expands into and rd', rd', rs2'. Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S & RVC_RS2S); |
c.ebreak |
C.EBREAK shares the opcode with the C.ADD instruction, but with rd and rs2 both zero, and can therefore also use the CR format. C.EBREAK expands to ebreak, which debuggers can use to cause control to be transferred back to the debugging environment. Spike ISS Implementation:require_extension(EXT_ZCA); if (!STATE.debug_mode && ( (!STATE.v && STATE.prv == PRV_M && STATE.dcsr->ebreakm) || (!STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreaks) || (!STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreaku) || (STATE.v && STATE.prv == PRV_S && STATE.dcsr->ebreakvs) || (STATE.v && STATE.prv == PRV_U && STATE.dcsr->ebreakvu))) { throw trap_debug_mode(); } else { throw trap_breakpoint(STATE.v, pc); } |
c.jalr | c_rs1_n0 |
C.JALR (jump and link register) performs an unconditional control transfer to the address in register rs1 and writes the address of the instruction following the jump (pc+2) to the link register, x1. C.JALR expands to jalr x1, 0(rs1). C.JALR is only valid when rs1 ≠ x0; the code point with rs1 = x0 corresponds to the C.EBREAK instruction. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_rs1() != 0); reg_t tmp = npc; set_pc(RVC_RS1 & ~reg_t(1)); WRITE_REG(X_RA, tmp); |
c.jr | rs1_n0 |
C.JR (jump register) performs an unconditional control transfer to the address in register rs1. C.JR expands to jalr x0, 0(rs1). C.JR is only valid when rs1 ≠ x0; the code point with rs1 = x0 is reserved. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_rs1() != 0); set_pc(RVC_RS1 & ~reg_t(1)); |
c.mv | rd, c_rs2_n0 |
C.MV copies the value in register rs2 into register rd. C.MV expands into add rd, x0, rs2. C.MV is only valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to the C.JR instruction. The code points with rs2 ≠ x0 and rd = x0 are HINTs. C.MV expands to a different instruction than the canonical MV pseudoinstruction, which instead uses ADDI. Implementations that handle MV specially, e.g. using register-renaming hardware, may find it more convenient to expand C.MV to MV instead of ADD, at slight additional hardware cost. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_rs2() != 0); WRITE_RD(RVC_RS2); |
c.or | rd_rs1_p, rs2_p |
C.OR computes the bitwise OR of the values in registers rd' and rs2', then writes the result to register rd'. C.OR expands into or rd', rd', rs2'. Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S | RVC_RS2S); |
c.sub | rd_rs1_p, rs2_p |
C.SUB subtracts the value in register rs2' from the value in register rd', then writes the result to register rd'. C.SUB expands into sub rd', rd', rs2'. Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RVC_RS1S(sext_xlen(RVC_RS1S - RVC_RS2S)); |
c.subw | rd_rs1_p, rs2_p |
C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in register rs2' from the value in register rd', then sign-extends the lower 32 bits of the difference before writing the result to register rd'. C.SUBW expands into subw rd', rd', rs2'. Spike ISS Implementation:require_extension(EXT_ZCA); require_rv64; WRITE_RVC_RS1S(sext32(RVC_RS1S - RVC_RS2S)); |
c.xor | rd_rs1_p, rs2_p |
C.XOR computes the bitwise XOR of the values in registers rd' and rs2', then writes the result to register rd'. C.XOR expands into xor rd', rd', rs2'. Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RVC_RS1S(RVC_RS1S ^ RVC_RS2S); |
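The "sign-extends the lower 32 bits" step shared by C.ADDW and C.SUBW can be sketched on a host machine as shown here. The helper sign_extend32 is a stand-in written for this example and only plays the role that sext32 plays in the Spike snippets; registers are modelled as 64-bit integers (RV64 assumed).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for sext32: keep the low 32 bits of `value` and
// sign-extend them to 64 bits.
std::int64_t sign_extend32(std::uint64_t value) {
    const std::uint32_t low = static_cast<std::uint32_t>(value);
    // Subtract 2^32 when bit 31 is set instead of relying on a narrowing cast.
    return (low & 0x80000000u) ? static_cast<std::int64_t>(low) - 4294967296LL
                               : static_cast<std::int64_t>(low);
}

// C.ADDW: 64-bit wrapping add, then keep and sign-extend the low word.
std::int64_t addw(std::uint64_t a, std::uint64_t b) {
    return sign_extend32(a + b);
}

// C.SUBW: 64-bit wrapping subtract, then keep and sign-extend the low word.
std::int64_t subw(std::uint64_t a, std::uint64_t b) {
    return sign_extend32(a - b);
}
```

With this model, adding 1 to 0x7FFFFFFF wraps to the most negative 32-bit value, and any bits above bit 31 in the inputs do not affect the result.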
c / load-and-store-instructions
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.3 Load and Store Instructions
Operation | Arguments | Description |
c.fld | rd_p, rs1_p, c_uimm8 |
C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd'. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to fld rd', offset(rs1'). Spike ISS Implementation:require_extension(EXT_ZCD); require_fp; WRITE_RVC_FRS2S(f64(MMU.load<uint64_t>(RVC_RS1S + insn.rvc_ld_imm()))); |
c.flw | rd_p, rs1_p, c_uimm7 |
C.FLW is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to flw rd', offset(rs1'). Spike ISS Implementation:if (xlen == 32) { require_extension(EXT_ZCF); require_fp; WRITE_RVC_FRS2S(f32(MMU.load<uint32_t>(RVC_RS1S + insn.rvc_lw_imm()))); } else { // c.ld require_extension(EXT_ZCA); WRITE_RVC_RS2S(MMU.load<int64_t>(RVC_RS1S + insn.rvc_ld_imm())); } |
c.fsd | rs1_p, rs2_p, c_uimm8 |
C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to fsd rs2', offset(rs1'). Spike ISS Implementation:require_extension(EXT_ZCD); require_fp; MMU.store<uint64_t>(RVC_RS1S + insn.rvc_ld_imm(), RVC_FRS2S.v[0]); |
c.fsw | rs1_p, rs2_p, c_uimm7 |
C.FSW is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to fsw rs2', offset(rs1'). Spike ISS Implementation:if (xlen == 32) { require_extension(EXT_ZCF); require_fp; MMU.store<uint32_t>(RVC_RS1S + insn.rvc_lw_imm(), RVC_FRS2S.v[0]); } else { // c.sd require_extension(EXT_ZCA); MMU.store<uint64_t>(RVC_RS1S + insn.rvc_ld_imm(), RVC_RS2S); } |
c.ld | rd_p, rs1_p, c_uimm8 |
C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to ld rd', offset(rs1'). |
c.lw | rd_p, rs1_p, c_uimm7 |
C.LW loads a 32-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to lw rd', offset(rs1'). Spike ISS Implementation:require_extension(EXT_ZCA); WRITE_RVC_RS2S(MMU.load<int32_t>(RVC_RS1S + insn.rvc_lw_imm())); |
c.sd | rs1_p, rs2_p, c_uimm8 |
C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. It expands to sd rs2', offset(rs1'). |
c.sw | rs1_p, rs2_p, c_uimm7 |
C.SW stores a 32-bit value in register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. It expands to sw rs2', offset(rs1'). Spike ISS Implementation:require_extension(EXT_ZCA); MMU.store<uint32_t>(RVC_RS1S + insn.rvc_lw_imm(), RVC_RS2S); |
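To make "zero-extended offset, scaled by 4" and "scaled by 8" concrete, the sketch below decodes the offset from a 16-bit register-based load/store encoding and forms the effective address. The bit positions follow the CL/CS format layout as understood from the RVC specification and are not stated in the table above; the function names are hypothetical and only loosely mirror what insn.rvc_lw_imm() and insn.rvc_ld_imm() appear to compute.

```cpp
#include <cassert>
#include <cstdint>

// Extract `len` bits of a 16-bit instruction, starting at bit `lo`.
std::uint32_t bits(std::uint16_t insn, unsigned lo, unsigned len) {
    return (static_cast<std::uint32_t>(insn) >> lo) & ((1u << len) - 1u);
}

// Word accesses (C.LW, C.SW, C.FLW, C.FSW):
// offset[5:3] = insn[12:10], offset[2] = insn[6], offset[6] = insn[5].
std::uint32_t word_offset(std::uint16_t insn) {
    return (bits(insn, 6, 1) << 2) | (bits(insn, 10, 3) << 3) | (bits(insn, 5, 1) << 6);
}

// Doubleword accesses (C.LD, C.SD, C.FLD, C.FSD):
// offset[5:3] = insn[12:10], offset[7:6] = insn[6:5].
std::uint32_t doubleword_offset(std::uint16_t insn) {
    return (bits(insn, 10, 3) << 3) | (bits(insn, 5, 2) << 6);
}

// The offset is zero-extended, so the address is a plain unsigned sum.
std::uint64_t effective_address(std::uint64_t base, std::uint32_t offset) {
    return base + offset;
}
```

Because the low two (or three) offset bits are never encoded, every reachable offset is a multiple of 4 (or 8); the largest values under this layout are 124 for word accesses and 248 for doubleword accesses.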
c / nop-instruction
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.5 Integer Computational Instructions
Operation | Arguments | Description |
c.nop | c_nzimm6 |
C.NOP is a CI-format instruction that does not change any user-visible state, except for advancing the pc and incrementing any applicable performance counters. C.NOP expands to nop. C.NOP is only valid when imm=0; the code points with imm≠0 encode HINTs. |
c / stack-pointer-based-loads-and-stores
17 “C” Standard Extension for Compressed Instructions, Version 2.0 / 17.3 Load and Store Instructions
Operation | Arguments | Description |
c.fldsp | rd, c_uimm9sp |
C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to fld rd, offset(x2). Spike ISS Implementation:require_extension(EXT_ZCD); require_fp; WRITE_FRD(f64(MMU.load<uint64_t>(RVC_SP + insn.rvc_ldsp_imm()))); |
c.flwsp | rd, c_uimm8sp |
C.FLWSP is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd. It computes its effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to flw rd, offset(x2). Spike ISS Implementation:if (xlen == 32) { require_extension(EXT_ZCF); require_fp; WRITE_FRD(f32(MMU.load<uint32_t>(RVC_SP + insn.rvc_lwsp_imm()))); } else { // c.ldsp require_extension(EXT_ZCA); require(insn.rvc_rd() != 0); WRITE_RD(MMU.load<int64_t>(RVC_SP + insn.rvc_ldsp_imm())); } |
c.fsdsp | c_rs2, c_uimm9sp_s |
C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to fsd rs2, offset(x2). Spike ISS Implementation:require_extension(EXT_ZCD); require_fp; MMU.store<uint64_t>(RVC_SP + insn.rvc_sdsp_imm(), RVC_FRS2.v[0]); |
c.fswsp | c_rs2, c_uimm8sp_s |
C.FSWSP is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to fsw rs2, offset(x2). Spike ISS Implementation:if (xlen == 32) { require_extension(EXT_ZCF); require_fp; MMU.store<uint32_t>(RVC_SP + insn.rvc_swsp_imm(), RVC_FRS2.v[0]); } else { // c.sdsp require_extension(EXT_ZCA); MMU.store<uint64_t>(RVC_SP + insn.rvc_sdsp_imm(), RVC_RS2); } |
c.ldsp | rd_n0, c_uimm9sp |
C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register rd. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to ld rd, offset(x2). C.LDSP is only valid when rd ≠ x0; the code points with rd = x0 are reserved. |
c.lwsp | rd_n0, c_uimm8sp |
C.LWSP loads a 32-bit value from memory into register rd. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to lw rd, offset(x2). C.LWSP is only valid when rd ≠ x0; the code points with rd = x0 are reserved. Spike ISS Implementation:require_extension(EXT_ZCA); require(insn.rvc_rd() != 0); WRITE_RD(MMU.load<int32_t>(RVC_SP + insn.rvc_lwsp_imm())); |
c.sdsp | c_rs2, c_uimm9sp_s |
C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, x2. It expands to sd rs2, offset(x2). |
c.swsp | c_rs2, c_uimm8sp_s |
C.SWSP stores a 32-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. It expands to sw rs2, offset(x2). Spike ISS Implementation:require_extension(EXT_ZCA); MMU.store<uint32_t>(RVC_SP + insn.rvc_swsp_imm(), RVC_RS2); |
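The stack-pointer-relative forms use a 6-bit offset field, but loads (CI format) and stores (CSS format) place its bits differently. The following sketch decodes the word-sized variants; the bit positions are taken from the RVC format definitions as an assumption, since the table above does not list them, and the names lwsp_offset and swsp_offset are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Extract `len` bits of a 16-bit instruction, starting at bit `lo`.
std::uint32_t bits(std::uint16_t insn, unsigned lo, unsigned len) {
    return (static_cast<std::uint32_t>(insn) >> lo) & ((1u << len) - 1u);
}

// C.LWSP / C.FLWSP (CI format):
// offset[5] = insn[12], offset[4:2] = insn[6:4], offset[7:6] = insn[3:2].
std::uint32_t lwsp_offset(std::uint16_t insn) {
    return (bits(insn, 4, 3) << 2) | (bits(insn, 12, 1) << 5) | (bits(insn, 2, 2) << 6);
}

// C.SWSP / C.FSWSP (CSS format):
// offset[5:2] = insn[12:9], offset[7:6] = insn[8:7].
std::uint32_t swsp_offset(std::uint16_t insn) {
    return (bits(insn, 9, 4) << 2) | (bits(insn, 7, 2) << 6);
}

// Effective address: stack pointer (x2) plus the zero-extended offset.
std::uint64_t sp_address(std::uint64_t sp, std::uint32_t offset) {
    return sp + offset;
}
```

Both decoders produce multiples of 4 in the range 0 to 252, which is the reach of a 6-bit field scaled by 4.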
d
d / d-standard-extension-for-double-precision-floating-point-version-2.2
13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.7 Double-Precision Floating-Point Classify Instruction
Operation | Arguments | Description |
fclass.d | rd, rs1 |
The double-precision floating-point classify instruction, FCLASS.D, is defined analogously to its single-precision counterpart, but operates on double-precision operands. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_classify(FRS1_D)); |
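Since FCLASS.D is only described here by reference to its single-precision counterpart, a small model may help. The sketch below classifies a raw binary64 bit pattern into the 10-bit one-hot mask used by the classify instructions (bit 0 for negative infinity up to bit 9 for quiet NaN). The function name fclass_d is an assumption for this example and it is not Spike's f64_classify.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of FCLASS.D on a raw IEEE 754 binary64 pattern.
// Exactly one bit of the returned mask is set.
std::uint32_t fclass_d(std::uint64_t raw) {
    const bool negative = (raw >> 63) != 0;
    const std::uint64_t exponent = (raw >> 52) & 0x7FF;
    const std::uint64_t fraction = raw & 0x000FFFFFFFFFFFFFULL;

    if (exponent == 0x7FF) {
        if (fraction == 0) {
            return negative ? (1u << 0) : (1u << 7);  // -infinity / +infinity
        }
        const bool quiet = (fraction >> 51) != 0;     // top fraction bit set
        return quiet ? (1u << 9) : (1u << 8);         // quiet / signaling NaN
    }
    if (exponent == 0) {
        if (fraction == 0) {
            return negative ? (1u << 3) : (1u << 4);  // -0 / +0
        }
        return negative ? (1u << 2) : (1u << 5);      // subnormal
    }
    return negative ? (1u << 1) : (1u << 6);          // normal
}
```

For example, the pattern 0x3FF0000000000000 (1.0) maps to bit 6, and 0x7FF8000000000000 (the canonical NaN) maps to bit 9.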
d / double-precision-floating-point-conversion-and-move-instructions
13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.5 Double-Precision Floating-Point Conversion and Move Instructions
Operation | Arguments | Description |
fcvt.d.l | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. |
fcvt.d.lu | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. |
fcvt.d.s | rd, rs1 |
The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers. The rs2 field encodes the datatype of the source, and the fmt field encodes the datatype of the destination. FCVT.S.D rounds according to the RM field; FCVT.D.S will never round. |
fcvt.d.w | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. All floating-point to integer and integer to floating-point conversion instructions round according to the rm field. Note FCVT.D.W[U] always produces an exact result and is unaffected by rounding mode. |
fcvt.d.wu | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. |
fcvt.l.d | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. |
fcvt.lu.d | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. |
fcvt.s.d | rd, rs1 |
The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers. The rs2 field encodes the datatype of the source, and the fmt field encodes the datatype of the destination. FCVT.S.D rounds according to the RM field; FCVT.D.S will never round. |
fcvt.w.d | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. |
fcvt.wu.d | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd. FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values. For RV64, FCVT.W[U].D sign-extends the 32-bit result. FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions. The range of valid inputs for FCVT.int.D and the behavior for invalid inputs are the same as for FCVT.int.S. |
fmv.d | rd, rs |
For XLEN>=64 only, instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd. FMV.D.X moves the double-precision value encoded in IEEE 754-2008 standard encoding from the integer register rs1 to the floating-point register rd. FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Pseudo Opcode, Equivalent Operations:fsgnj.d rd, rs, rs |
fmv.x.d | rd, rs1 |
For XLEN>=64 only, instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd. FMV.D.X moves the double-precision value encoded in IEEE 754-2008 standard encoding from the integer register rs1 to the floating-point register rd. FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. |
fsgnj.d | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), false, false)); |
fsgnjn.d | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), true, false)); |
fsgnjx.d | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; WRITE_FRD_D(fsgnj64(freg(FRS1_D), freg(FRS2_D), false, true)); |
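The three sign-injection variants differ only in how the result's sign bit is chosen, which can be sketched directly on binary64 bit patterns. The function names and the use of raw 64-bit integers are assumptions for this example; Spike's own fsgnj64 helper takes two extra boolean flags instead.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kSignBit = 0x8000000000000000ULL;

// FSGNJ.D: magnitude of a, sign of b.
std::uint64_t fsgnj_d(std::uint64_t a, std::uint64_t b) {
    return (a & ~kSignBit) | (b & kSignBit);
}

// FSGNJN.D: magnitude of a, inverted sign of b.
std::uint64_t fsgnjn_d(std::uint64_t a, std::uint64_t b) {
    return (a & ~kSignBit) | (~b & kSignBit);
}

// FSGNJX.D: magnitude of a, sign is the XOR of both sign bits.
std::uint64_t fsgnjx_d(std::uint64_t a, std::uint64_t b) {
    return a ^ (b & kSignBit);
}
```

Passing the same value for both operands to fsgnj_d returns it unchanged, which is why the fmv.d pseudoinstruction listed above can be expressed as fsgnj.d rd, rs, rs.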
d / fld_fsd
13 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / 13.3 Double-Precision Load and Store Instructions
Operation | Arguments | Description |
fld | rd, rs1, imm12 |
The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd. FSD stores a double-precision value from the floating-point registers to memory. FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN>=64. FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation:require_extension('D'); require_fp; WRITE_FRD(f64(MMU.load<uint64_t>(RS1 + insn.i_imm()))); |
fsd | rs1, rs2, imm12 |
The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd. FSD stores a double-precision value from the floating-point registers to memory. FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN>=64. FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation:require_extension('D'); require_fp; MMU.store<uint64_t>(RS1 + insn.s_imm(), FRS2.v[0]); |
d / sec:single-float-compute
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions
Operation | Arguments | Description |
fadd.d | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_add(FRS1_D, FRS2_D)); set_fp_exceptions; |
fdiv.d | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_div(FRS1_D, FRS2_D)); set_fp_exceptions; |
fmadd.d | rd, rs1, rs2, rs3 |
FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(FRS1_D, FRS2_D, FRS3_D)); set_fp_exceptions; |
fmax.d | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; bool greater = f64_lt_quiet(FRS2_D, FRS1_D) || (f64_eq(FRS2_D, FRS1_D) && (FRS2_D.v & F64_SIGN)); if (isNaNF64UI(FRS1_D.v) && isNaNF64UI(FRS2_D.v)) WRITE_FRD_D(f64(defaultNaNF64UI)); else WRITE_FRD_D((greater || isNaNF64UI(FRS2_D.v) ? FRS1_D : FRS2_D)); set_fp_exceptions; |
fmin.d | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; bool less = f64_lt_quiet(FRS1_D, FRS2_D) || (f64_eq(FRS1_D, FRS2_D) && (FRS1_D.v & F64_SIGN)); if (isNaNF64UI(FRS1_D.v) && isNaNF64UI(FRS2_D.v)) WRITE_FRD_D(f64(defaultNaNF64UI)); else WRITE_FRD_D((less || isNaNF64UI(FRS2_D.v) ? FRS1_D : FRS2_D)); set_fp_exceptions; |
fmsub.d | rd, rs1, rs2, rs3 |
FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(FRS1_D, FRS2_D, f64(FRS3_D.v ^ F64_SIGN))); set_fp_exceptions; |
fmul.d | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mul(FRS1_D, FRS2_D)); set_fp_exceptions; |
fnmadd.d | rd, rs1, rs2, rs3 |
FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(f64(FRS1_D.v ^ F64_SIGN), FRS2_D, f64(FRS3_D.v ^ F64_SIGN))); set_fp_exceptions; |
fnmsub.d | rd, rs1, rs2, rs3 |
FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_mulAdd(f64(FRS1_D.v ^ F64_SIGN), FRS2_D, FRS3_D)); set_fp_exceptions; |
fsqrt.d | rd, rs1 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_sqrt(FRS1_D)); set_fp_exceptions; |
fsub.d | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_D(f64_sub(FRS1_D, FRS2_D)); set_fp_exceptions; |
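The Spike snippets for fmsub.d, fnmadd.d, and fnmsub.d above all reuse a single mulAdd and only XOR the sign bit of selected operands. The following is a minimal sketch of that operand preparation on raw binary64 bit patterns. The names `fused_variant`, `fused_operands`, and `prepare_fused` are hypothetical and do not come from Spike, and the fused multiply-add itself is deliberately left out.

```c
#include <stdint.h>

#define F64_SIGN_BIT (UINT64_C(1) << 63)  /* sign bit of an IEEE 754 binary64 value */

/* Hypothetical names for the four fused forms. */
enum fused_variant { FUSED_MADD, FUSED_MSUB, FUSED_NMSUB, FUSED_NMADD };

/* Operands, as raw bit patterns, to hand to one fused multiply-add a*b+c. */
struct fused_operands { uint64_t a, b, c; };

struct fused_operands prepare_fused(enum fused_variant v,
                                    uint64_t rs1, uint64_t rs2, uint64_t rs3)
{
    struct fused_operands op = { rs1, rs2, rs3 };

    /* Negating rs1 negates the product: needed by FNMSUB and FNMADD. */
    if (v == FUSED_NMSUB || v == FUSED_NMADD)
        op.a ^= F64_SIGN_BIT;

    /* Negating rs3 turns the final addition into a subtraction:
       needed by FMSUB and FNMADD. */
    if (v == FUSED_MSUB || v == FUSED_NMADD)
        op.c ^= F64_SIGN_BIT;

    return op;
}
```

Because only the sign bit changes, NaN payloads and zeros pass through untouched before the single rounding step of the fused operation.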
d / single-precision-floating-point-compare-instructions
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions
Operation | Arguments | Description |
feq.d | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_eq(FRS1_D, FRS2_D)); set_fp_exceptions; |
fle.d | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_le(FRS1_D, FRS2_D)); set_fp_exceptions; |
flt.d | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation:require_either_extension('D', EXT_ZDINX); require_fp; WRITE_RD(f64_lt(FRS1_D, FRS2_D)); set_fp_exceptions; |
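The difference between the quiet comparison (feq.d) and the signaling comparisons (flt.d, fle.d) lies only in when the invalid operation flag is raised. The sketch below models that rule on raw binary64 bit patterns. The names `fp_compare`, `cmp_op`, and `cmp_result` are hypothetical, and the flag value 0x10 for NV is taken from the fflags layout in the RISC-V specification rather than from the text above.

```c
#include <stdint.h>
#include <string.h>

#define FFLAG_NV 0x10u  /* invalid operation flag (assumed fflags encoding) */

enum cmp_op { CMP_EQ, CMP_LT, CMP_LE };

/* rd is the 0/1 result, flags holds the exception flags that were raised. */
struct cmp_result { int rd; unsigned flags; };

static int is_nan64(uint64_t b)
{
    /* Exponent all ones and a non-zero fraction. */
    return (b & UINT64_C(0x7FFFFFFFFFFFFFFF)) > UINT64_C(0x7FF0000000000000);
}

static int is_snan64(uint64_t b)
{
    /* A signaling NaN has the top fraction bit (bit 51) clear. */
    return is_nan64(b) && !(b & UINT64_C(0x0008000000000000));
}

static double as_double(uint64_t b)
{
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

struct cmp_result fp_compare(enum cmp_op op, uint64_t rs1, uint64_t rs2)
{
    struct cmp_result r = { 0, 0 };

    if (is_nan64(rs1) || is_nan64(rs2)) {
        /* FLT/FLE signal on any NaN; FEQ only on a signaling NaN. */
        if (op != CMP_EQ || is_snan64(rs1) || is_snan64(rs2))
            r.flags |= FFLAG_NV;
        return r;  /* the result is 0 whenever an operand is NaN */
    }

    double a = as_double(rs1), b = as_double(rs2);
    r.rd = (op == CMP_EQ) ? (a == b) : (op == CMP_LT) ? (a < b) : (a <= b);
    return r;
}
```

The inputs are passed as integers so that a signaling NaN cannot be silently quieted by the host before the check runs.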
d / single-precision-floating-point-conversion-and-move-instructions
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.7 Single-Precision Floating-Point Conversion and Move Instructions
Operation | Arguments | Description |
fabs.d | rd, rs |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations:fsgnjx.d rd, rs, rs |
fmv.d.x | rd, rs1 |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools. |
fneg.d | rd, rs |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations:fsgnjn.d rd, rs, rs |
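Sign injection is pure bit manipulation, which is why it raises no flags and leaves NaN payloads alone. The sketch below shows the three operations and the pseudoinstructions built on them for binary64 bit patterns, matching the .d forms in the table above. The `rv_` function names are hypothetical helpers, not part of any toolchain.

```c
#include <stdint.h>

#define SIGN64 (UINT64_C(1) << 63)  /* sign bit of an IEEE 754 binary64 value */

/* Magnitude from rs1, sign from rs2. */
uint64_t rv_fsgnj_d(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & ~SIGN64) | (rs2 & SIGN64);
}

/* Magnitude from rs1, inverted sign of rs2. */
uint64_t rv_fsgnjn_d(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & ~SIGN64) | (~rs2 & SIGN64);
}

/* Magnitude from rs1, sign is the XOR of both signs. */
uint64_t rv_fsgnjx_d(uint64_t rs1, uint64_t rs2)
{
    return rs1 ^ (rs2 & SIGN64);
}

/* Pseudoinstructions: both source operands are the same register. */
uint64_t rv_fmv_d(uint64_t rs)  { return rv_fsgnj_d(rs, rs); }
uint64_t rv_fneg_d(uint64_t rs) { return rv_fsgnjn_d(rs, rs); }
uint64_t rv_fabs_d(uint64_t rs) { return rv_fsgnjx_d(rs, rs); }
```

With identical operands, the XOR of a sign bit with itself is always zero, which is what makes fsgnjx.d rd, rs, rs an absolute value.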
f
f / floating-point-control-and-status-register
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.2 Floating-Point Control and Status Register
Operation | Arguments | Description |
frcsr | rd |
The fcsr register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the underlying CSR access instructions. FRCSR reads fcsr by copying it into integer register rd. FSCSR swaps the value in fcsr by copying the original value into integer register rd, and then writing a new value obtained from integer register rs1 into fcsr. |
frflags | rd |
The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags. |
frrm | rd |
The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags. |
fscsr | rd, rs1 |
The fcsr register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the underlying CSR access instructions. FRCSR reads fcsr by copying it into integer register rd. FSCSR swaps the value in fcsr by copying the original value into integer register rd, and then writing a new value obtained from integer register rs1 into fcsr. |
fsflags | rd, rs1 |
The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags. |
fsrm | rd, rs1 |
The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags. |
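The read and swap behaviour of the six pseudoinstructions above can be sketched against a software model of fcsr. The code assumes the field layout from the RISC-V specification (fflags in bits 4:0, frm in bits 7:5) and a hart with no other extension using the upper bits, so writes to bits 31:8 are dropped. The `fp_state` type and the `rv_` function names are hypothetical.

```c
#include <stdint.h>

#define FFLAGS_MASK 0x1Fu  /* accrued exception flags, bits 4:0 */
#define FRM_SHIFT   5      /* rounding mode, bits 7:5 */
#define FRM_MASK    0x7u
#define FCSR_MASK   0xFFu  /* assumption: bits 31:8 are not implemented */

typedef struct { uint32_t fcsr; } fp_state;  /* hypothetical model of the CSR */

/* FRCSR rd: read the whole register. */
uint32_t rv_frcsr(const fp_state *s) { return s->fcsr; }

/* FSCSR rd, rs1: return the old value, then install the new one. */
uint32_t rv_fscsr(fp_state *s, uint32_t rs1)
{
    uint32_t old = s->fcsr;
    s->fcsr = rs1 & FCSR_MASK;
    return old;
}

/* FRRM rd: rounding mode in the three low bits, zero elsewhere. */
uint32_t rv_frrm(const fp_state *s) { return (s->fcsr >> FRM_SHIFT) & FRM_MASK; }

/* FSRM rd, rs1: swap only the frm field, using the three low bits of rs1. */
uint32_t rv_fsrm(fp_state *s, uint32_t rs1)
{
    uint32_t old = rv_frrm(s);
    s->fcsr = (s->fcsr & ~(FRM_MASK << FRM_SHIFT)) | ((rs1 & FRM_MASK) << FRM_SHIFT);
    return old;
}

/* FRFLAGS rd: accrued flags in the five low bits. */
uint32_t rv_frflags(const fp_state *s) { return s->fcsr & FFLAGS_MASK; }

/* FSFLAGS rd, rs1: swap only the fflags field. */
uint32_t rv_fsflags(fp_state *s, uint32_t rs1)
{
    uint32_t old = rv_frflags(s);
    s->fcsr = (s->fcsr & ~FFLAGS_MASK) | (rs1 & FFLAGS_MASK);
    return old;
}
```

A typical use is to save the rounding mode with the swap form, run a computation under a different mode, and then write the saved value back.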
f / sec:single-float
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.9 Single-Precision Floating-Point Classify Instruction
Operation | Arguments | Description |
fclass.s | rd, rs1 |
The FCLASS.S instruction examines the value in floating-point register rs1 and writes to integer register rd a 10-bit mask that indicates the class of the floating-point number. The format of the mask is described in Table [tab:fclass]. The corresponding bit in rd will be set if the property is true and clear otherwise. All other bits in rd are cleared. Note that exactly one bit in rd will be set. FCLASS.S does not set the floating-point exception flags. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_classify(FRS1_F)); |
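The table referenced above is not reproduced in this listing, so the sketch below uses the bit assignment from the RISC-V specification (bit 0 for negative infinity up to bit 9 for quiet NaN); treat that mapping as an assumption to check against Table [tab:fclass]. The function name `fclass_s` is a hypothetical stand-in for Spike's `f32_classify`, operating on a raw binary32 bit pattern.

```c
#include <stdint.h>

/* Returns the 10-bit class mask for a binary32 bit pattern.
   Exactly one bit is set in the result. */
uint32_t fclass_s(uint32_t bits)
{
    uint32_t sign = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;

    if (exp == 0xFFu) {
        if (frac == 0)
            return sign ? 1u << 0 : 1u << 7;            /* -inf / +inf */
        return (frac & 0x400000u) ? 1u << 9 : 1u << 8;  /* quiet / signaling NaN */
    }
    if (exp == 0) {
        if (frac == 0)
            return sign ? 1u << 3 : 1u << 4;            /* -0 / +0 */
        return sign ? 1u << 2 : 1u << 5;                /* negative / positive subnormal */
    }
    return sign ? 1u << 1 : 1u << 6;                    /* negative / positive normal */
}
```

Since the mask is one-hot, software can test for a group of classes with a single AND, for example `fclass_s(x) & 0x300` for any NaN.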
f / sec:single-float-compute
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions
Operation | Arguments | Description |
fadd.s | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_add(FRS1_F, FRS2_F)); set_fp_exceptions; |
fdiv.s | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_div(FRS1_F, FRS2_F)); set_fp_exceptions; |
fmadd.s | rd, rs1, rs2, rs3 |
FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(FRS1_F, FRS2_F, FRS3_F)); set_fp_exceptions; |
fmax.s | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; bool greater = f32_lt_quiet(FRS2_F, FRS1_F) || (f32_eq(FRS2_F, FRS1_F) && (FRS2_F.v & F32_SIGN)); if (isNaNF32UI(FRS1_F.v) && isNaNF32UI(FRS2_F.v)) WRITE_FRD_F(f32(defaultNaNF32UI)); else WRITE_FRD_F((greater || isNaNF32UI(FRS2_F.v) ? FRS1_F : FRS2_F)); set_fp_exceptions; |
fmin.s | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; bool less = f32_lt_quiet(FRS1_F, FRS2_F) || (f32_eq(FRS1_F, FRS2_F) && (FRS1_F.v & F32_SIGN)); if (isNaNF32UI(FRS1_F.v) && isNaNF32UI(FRS2_F.v)) WRITE_FRD_F(f32(defaultNaNF32UI)); else WRITE_FRD_F((less || isNaNF32UI(FRS2_F.v) ? FRS1_F : FRS2_F)); set_fp_exceptions; |
fmsub.s | rd, rs1, rs2, rs3 |
FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(FRS1_F, FRS2_F, f32(FRS3_F.v ^ F32_SIGN))); set_fp_exceptions; |
fmul.s | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mul(FRS1_F, FRS2_F)); set_fp_exceptions; |
fnmadd.s | rd, rs1, rs2, rs3 |
FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(f32(FRS1_F.v ^ F32_SIGN), FRS2_F, f32(FRS3_F.v ^ F32_SIGN))); set_fp_exceptions; |
fnmsub.s | rd, rs1, rs2, rs3 |
FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_mulAdd(f32(FRS1_F.v ^ F32_SIGN), FRS2_F, FRS3_F)); set_fp_exceptions; |
fsqrt.s | rd, rs1 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_sqrt(FRS1_F)); set_fp_exceptions; |
fsub.s | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; softfloat_roundingMode = RM; WRITE_FRD_F(f32_sub(FRS1_F, FRS2_F)); set_fp_exceptions; |
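The FMIN.S and FMAX.S rules in the table above (signed zeros ordered, a lone NaN ignored, two NaNs giving the canonical NaN, signaling NaNs raising the invalid flag) can be sketched on raw binary32 bit patterns as follows. The function name `fminmax_s` and its `is_max` selector are hypothetical. The canonical NaN pattern 0x7FC00000 and the NV flag value 0x10 are taken from the RISC-V specification, not from the text above.

```c
#include <stdint.h>
#include <string.h>

#define F32_SIGN      0x80000000u
#define F32_CANON_NAN 0x7FC00000u  /* canonical quiet NaN (assumed encoding) */
#define FFLAG_NV      0x10u        /* invalid operation flag (assumed encoding) */

static int is_nan32(uint32_t b)  { return (b & 0x7FFFFFFFu) > 0x7F800000u; }
static int is_snan32(uint32_t b) { return is_nan32(b) && !(b & 0x00400000u); }

static float as_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* is_max == 0 models FMIN.S, any other value models FMAX.S.
   Raised flags are OR-ed into *flags. */
uint32_t fminmax_s(int is_max, uint32_t rs1, uint32_t rs2, unsigned *flags)
{
    /* A signaling NaN raises NV even if the result is not a NaN. */
    if (is_snan32(rs1) || is_snan32(rs2))
        *flags |= FFLAG_NV;

    if (is_nan32(rs1) && is_nan32(rs2))
        return F32_CANON_NAN;
    if (is_nan32(rs1))
        return rs2;
    if (is_nan32(rs2))
        return rs1;

    float a = as_float(rs1), b = as_float(rs2);
    int rs1_less;
    if (a == b)
        rs1_less = (rs1 & F32_SIGN) != 0;  /* orders -0.0 below +0.0 */
    else
        rs1_less = a < b;

    int pick_rs1 = is_max ? !rs1_less : rs1_less;
    return pick_rs1 ? rs1 : rs2;
}
```

Returning one of the input bit patterns unchanged, rather than a recomputed value, keeps the sign of zero intact.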
f / single-precision-floating-point-compare-instructions
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions
Operation | Arguments | Description |
feq.s | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_eq(FRS1_F, FRS2_F)); set_fp_exceptions; |
fle.s | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_le(FRS1_F, FRS2_F)); set_fp_exceptions; |
flt.s | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; WRITE_RD(f32_lt(FRS1_F, FRS2_F)); set_fp_exceptions; |
f / single-precision-floating-point-conversion-and-move-instructions
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.7 Single-Precision Floating-Point Conversion and Move Instructions
Operation | Arguments | Description |
fabs.s | rd, rs |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations:fsgnjx.s rd, rs, rs |
fcvt.l.s | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. |
fcvt.lu.s | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. |
fcvt.s.l | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. |
fcvt.s.lu | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. |
fcvt.s.w | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. All floating-point to integer and integer to floating-point conversion instructions round according to the rm field. A floating-point register can be initialized to floating-point positive zero using FCVT.S.W rd, x0, which will never set any exception flags. |
fcvt.s.wu | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. |
fcvt.w.s | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. |
fcvt.wu.s | rd, rs1 |
Floating-point-to-integer and integer-to-floating-point conversion instructions are encoded in the OP-FP major opcode space. FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values. For XLEN > 32, FCVT.W[U].S sign-extends the 32-bit result to the destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions. If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set. Table [tab:int_conv] gives the range of valid inputs for FCVT.int.S and the behavior for invalid inputs. |
fmv.s | rd, rs |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools. Pseudo Opcode, Equivalent Operations:fsgnj.s rd, rs, rs |
fmv.w.x | rd, rs1 |
FMV.W.X moves the single-precision value encoded in IEEE 754-2008 standard encoding from the lower 32 bits of integer register rs1 to the floating-point register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved. The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools. |
fmv.x.s | rd, rs1 |
The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools. |
fmv.x.w | rd, rs1 |
Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.W moves the single-precision value in floating-point register rs1 represented in IEEE 754-2008 encoding to the lower 32 bits of integer register rd. The bits are not modified in the transfer, and in particular, the payloads of non-canonical NaNs are preserved. For RV64, the higher 32 bits of the destination register are filled with copies of the floating-point number's sign bit. The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools. |
fneg.s | rd, rs |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Pseudo Opcode, Equivalent Operations:fsgnjn.s rd, rs, rs |
fsgnj.s | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), false, false)); |
fsgnjn.s | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), true, false)); |
fsgnjx.s | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result's sign bit is rs2's sign bit; for FSGNJN, the result's sign bit is the opposite of rs2's sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry). Spike ISS Implementation:require_either_extension('F', EXT_ZFINX); require_fp; WRITE_FRD_F(fsgnj32(freg(FRS1_F), freg(FRS2_F), false, true)); |
neg | rd, rs |
The sign-injection instructions provide floating-point MV, ABS, and NEG, as well as supporting a few other operations, including the IEEE copySign operation and sign manipulation in transcendental math function libraries. Although MV, ABS, and NEG only need a single register operand, whereas FSGNJ instructions need two, it is unlikely most microarchitectures would add optimizations to benefit from the reduced number of register reads for these relatively infrequent instructions. Even in this case, a microarchitecture can simply detect when both source registers are the same for FSGNJ instructions and only read a single copy. Note that neg itself is an integer pseudoinstruction: it computes the two's-complement negation of rs by subtracting it from x0. Pseudo Opcode, Equivalent Operations:sub rd, x0, rs |
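The sign-injection rules described above can be sketched in C on raw 32-bit patterns. This is an illustrative model only, not the Spike implementation: the function names and the `F32_SIGN_MASK` macro are hypothetical, and NaN-boxing in wider floating-point registers is not modeled.

```c
#include <stdint.h>

/* Bit 31 is the sign bit of an IEEE 754 single-precision encoding. */
#define F32_SIGN_MASK UINT32_C(0x80000000)

/* FSGNJ.S: all bits except the sign from rs1, sign bit from rs2. */
static uint32_t fsgnj_s(uint32_t rs1, uint32_t rs2)
{
    return (rs1 & ~F32_SIGN_MASK) | (rs2 & F32_SIGN_MASK);
}

/* FSGNJN.S: all bits except the sign from rs1, inverted sign bit of rs2. */
static uint32_t fsgnjn_s(uint32_t rs1, uint32_t rs2)
{
    return (rs1 & ~F32_SIGN_MASK) | (~rs2 & F32_SIGN_MASK);
}

/* FSGNJX.S: sign bit is sign(rs1) XOR sign(rs2); other bits from rs1. */
static uint32_t fsgnjx_s(uint32_t rs1, uint32_t rs2)
{
    return rs1 ^ (rs2 & F32_SIGN_MASK);
}
```

Passing the same pattern as both operands gives the pseudoinstructions: `fsgnj_s(x, x)` behaves like FMV.S, `fsgnjn_s(x, x)` like FNEG.S, and `fsgnjx_s(x, x)` like FABS.S. Because only bit 31 is touched, NaN payloads pass through unchanged.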
f / single-precision-load-and-store-instructions
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.5 Single-Precision Load and Store Instructions
Operation | Arguments | Description |
flw | rd, rs1, imm12 |
Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register rs1 and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd. FSW stores a single-precision value from floating-point register rs2 to memory. FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned. FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation:require_extension('F'); require_fp; WRITE_FRD(f32(MMU.load<uint32_t>(RS1 + insn.i_imm()))); |
fsw | rs1, rs2, imm12 |
Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register rs1 and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd. FSW stores a single-precision value from floating-point register rs2 to memory. FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned. FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation:require_extension('F'); require_fp; MMU.store<uint32_t>(RS1 + insn.s_imm(), FRS2.v[0]); |
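As a rough illustration of the base+offset addressing used by FLW and FSW, the sketch below sign-extends a 12-bit offset and copies the 32-bit pattern without interpreting it. The flat byte-array memory and the names `sext12`, `flw`, and `fsw` are assumptions made for this example; alignment, atomicity, and access faults are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Sign-extend the low 12 bits of imm12 to a signed 64-bit byte offset. */
static int64_t sext12(uint32_t imm12)
{
    imm12 &= 0xFFFu;
    return (int64_t)(imm12 ^ 0x800u) - 0x800;
}

/* FLW sketch: load the raw 32-bit pattern at rs1 + sext(imm12). */
static uint32_t flw(const uint8_t *mem, uint64_t rs1, uint32_t imm12)
{
    uint64_t addr = rs1 + (uint64_t)sext12(imm12);
    uint32_t bits;
    memcpy(&bits, mem + addr, sizeof bits);   /* bits are not modified */
    return bits;
}

/* FSW sketch: store the raw 32-bit pattern rs2_bits at rs1 + sext(imm12). */
static void fsw(uint8_t *mem, uint64_t rs1, uint32_t rs2_bits, uint32_t imm12)
{
    uint64_t addr = rs1 + (uint64_t)sext12(imm12);
    memcpy(mem + addr, &rs2_bits, sizeof rs2_bits);
}
```

Using `memcpy` on the integer pattern rather than a `float` value mirrors the requirement that the payloads of non-canonical NaNs survive the transfer.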
m
division operations | multiplication operations |
m / division-operations
8 “M” Standard Extension for Integer Multiplication and Division, Version 2.0 / 8.2 Division Operations
Operation | Arguments | Description |
div | rd, rs1, rs2 |
DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides. DIV[W] Spike ISS Implementation:require_extension('M'); sreg_t lhs = sext_xlen(RS1); sreg_t rhs = sext_xlen(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else if (lhs == INT64_MIN && rhs == -1) WRITE_RD(lhs); else WRITE_RD(sext_xlen(lhs / rhs)); |
divu | rd, rs1, rs2 |
DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. DIVU[W] Spike ISS Implementation:require_extension('M'); reg_t lhs = zext_xlen(RS1); reg_t rhs = zext_xlen(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else WRITE_RD(sext_xlen(lhs / rhs)); |
divuw | rd, rs1, rs2 |
DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation:require_extension('M'); require_rv64; reg_t lhs = zext32(RS1); reg_t rhs = zext32(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else WRITE_RD(sext32(lhs / rhs)); |
divw | rd, rs1, rs2 |
DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation:require_extension('M'); require_rv64; sreg_t lhs = sext32(RS1); sreg_t rhs = sext32(RS2); if (rhs == 0) WRITE_RD(UINT64_MAX); else WRITE_RD(sext32(lhs / rhs)); |
rem | rd, rs1, rs2 |
DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides. REM[W] Spike ISS Implementation:require_extension('M'); sreg_t lhs = sext_xlen(RS1); sreg_t rhs = sext_xlen(RS2); if (rhs == 0) WRITE_RD(lhs); else if (lhs == INT64_MIN && rhs == -1) WRITE_RD(0); else WRITE_RD(sext_xlen(lhs % rhs)); |
remu | rd, rs1, rs2 |
DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend. REMU[W] Spike ISS Implementation:require_extension('M'); reg_t lhs = zext_xlen(RS1); reg_t rhs = zext_xlen(RS2); if (rhs == 0) WRITE_RD(sext_xlen(RS1)); else WRITE_RD(sext_xlen(lhs % rhs)); |
remuw | rd, rs1, rs2 |
DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation:require_extension('M'); require_rv64; reg_t lhs = zext32(RS1); reg_t rhs = zext32(RS2); if (rhs == 0) WRITE_RD(sext32(lhs)); else WRITE_RD(sext32(lhs % rhs)); |
remw | rd, rs1, rs2 |
DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2, treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively. Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. Spike ISS Implementation:require_extension('M'); require_rv64; sreg_t lhs = sext32(RS1); sreg_t rhs = sext32(RS2); if (rhs == 0) WRITE_RD(lhs); else WRITE_RD(sext32(lhs % rhs)); |
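The Spike snippets above encode two corner cases that are easy to miss: division by zero and signed overflow. A minimal C sketch of the RV64 behavior follows, mirroring those snippets. The function names are hypothetical, and the cast from `uint32_t` to `int32_t` in `rv64_divuw` assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* DIV: signed division rounding toward zero. */
static int64_t rv64_div(int64_t rs1, int64_t rs2)
{
    if (rs2 == 0) return -1;                        /* all bits set */
    if (rs1 == INT64_MIN && rs2 == -1) return rs1;  /* overflow: dividend */
    return rs1 / rs2;
}

/* REM: remainder takes the sign of the dividend. */
static int64_t rv64_rem(int64_t rs1, int64_t rs2)
{
    if (rs2 == 0) return rs1;                       /* dividend unchanged */
    if (rs1 == INT64_MIN && rs2 == -1) return 0;    /* overflow: zero */
    return rs1 % rs2;
}

/* DIVU: unsigned division; divide by zero yields all ones. */
static uint64_t rv64_divu(uint64_t rs1, uint64_t rs2)
{
    return rs2 == 0 ? UINT64_MAX : rs1 / rs2;
}

/* REMU: unsigned remainder; divide by zero yields the dividend. */
static uint64_t rv64_remu(uint64_t rs1, uint64_t rs2)
{
    return rs2 == 0 ? rs1 : rs1 % rs2;
}

/* DIVUW: unsigned 32-bit division, result sign-extended to 64 bits. */
static int64_t rv64_divuw(uint64_t rs1, uint64_t rs2)
{
    uint32_t lhs = (uint32_t)rs1, rhs = (uint32_t)rs2;
    uint32_t q = rhs == 0 ? UINT32_MAX : lhs / rhs;
    return (int64_t)(int32_t)q;                     /* sign-extend bit 31 */
}
```

The explicit checks matter in C as well: dividing by zero or computing `INT64_MIN / -1` is undefined behavior, so the guards must come before the `/` and `%` operators.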
m / multiplication-operations
8 “M” Standard Extension for Integer Multiplication and Division, Version 2.0 / 8.1 Multiplication Operations
Operation | Arguments | Description |
mul | rd, rs1, rs2 |
MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear. If the arguments are not known to be sign- or zero-extended, an alternative is to shift both arguments left by 32 bits, then use MULH[[S]U]. Spike ISS Implementation:require_either_extension('M', EXT_ZMMUL); WRITE_RD(sext_xlen(RS1 * RS2)); |
mulh | rd, rs1, rs2 |
MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear. If the arguments are not known to be sign- or zero-extended, an alternative is to shift both arguments left by 32 bits, then use MULH[[S]U]. Spike ISS Implementation:require_either_extension('M', EXT_ZMMUL); if (xlen == 64) WRITE_RD(mulh(RS1, RS2)); else WRITE_RD(sext32((sext32(RS1) * sext32(RS2)) >> 32)); |
mulhsu | rd, rs1, rs2 |
MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. MULHSU is used in multi-word signed multiplication to multiply the most-significant word of the multiplicand (which contains the sign bit) with the less-significant words of the multiplier (which are unsigned). Spike ISS Implementation:require_either_extension('M', EXT_ZMMUL); if (xlen == 64) WRITE_RD(mulhsu(RS1, RS2)); else WRITE_RD(sext32((sext32(RS1) * reg_t((uint32_t)RS2)) >> 32)); |
mulhu | rd, rs1, rs2 |
MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and signed rs1×unsigned rs2 multiplication, respectively. If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. Spike ISS Implementation:require_either_extension('M', EXT_ZMMUL); if (xlen == 64) WRITE_RD(mulhu(RS1, RS2)); else WRITE_RD(sext32(((uint64_t)(uint32_t)RS1 * (uint64_t)(uint32_t)RS2) >> 32)); |
mulw | rd, rs1, rs2 |
MULW is an RV64 instruction that multiplies the lower 32 bits of the source registers, placing the sign-extension of the lower 32 bits of the result into the destination register. Spike ISS Implementation:require_either_extension('M', EXT_ZMMUL); require_rv64; WRITE_RD(sext32(RS1 * RS2)); |
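For XLEN=32, the high-half multiplies described above can be sketched by widening to 64 bits and keeping the upper word, in the same spirit as the RV32 branches of the Spike snippets. The function names are hypothetical, and the right shift of a negative `int64_t` is assumed to be arithmetic, as it is on mainstream compilers.

```c
#include <stdint.h>

/* MUL (RV32): lower 32 bits of the product. */
static uint32_t mul32(uint32_t rs1, uint32_t rs2)
{
    return rs1 * rs2;   /* unsigned arithmetic wraps modulo 2^32 */
}

/* MULH (RV32): upper 32 bits of the signed x signed product. */
static int32_t mulh32(int32_t rs1, int32_t rs2)
{
    int64_t product = (int64_t)rs1 * (int64_t)rs2;
    return (int32_t)(product >> 32);
}

/* MULHU (RV32): upper 32 bits of the unsigned x unsigned product. */
static uint32_t mulhu32(uint32_t rs1, uint32_t rs2)
{
    uint64_t product = (uint64_t)rs1 * (uint64_t)rs2;
    return (uint32_t)(product >> 32);
}

/* MULHSU (RV32): upper 32 bits of signed rs1 x unsigned rs2. */
static int32_t mulhsu32(int32_t rs1, uint32_t rs2)
{
    int64_t product = (int64_t)rs1 * (int64_t)rs2;  /* fits in 64 bits */
    return (int32_t)(product >> 32);
}
```

The three high-half variants differ only in how each operand is widened before the multiply, which is why the low half returned by MUL is the same regardless of signedness.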
q
q / q-standard-extension-for-quad-precision-floating-point-version-2.2
14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.5 Quad-Precision Floating-Point Classify Instruction
Operation | Arguments | Description |
fclass.q | rd, rs1 |
The quad-precision floating-point classify instruction, FCLASS.Q, is defined analogously to its double-precision counterpart, but operates on quad-precision operands. Spike ISS Implementation:require_extension('Q'); require_fp; WRITE_RD(f128_classify(f128(FRS1))); |
q / quad-precision-convert-and-move-instructions
14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.3 Quad-Precision Convert and Move Instructions
Operation | Arguments | Description |
fcvt.d.q | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. |
fcvt.l.q | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fcvt.lu.q | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fcvt.q.d | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. |
fcvt.q.l | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fcvt.q.lu | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fcvt.q.s | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. |
fcvt.q.w | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fcvt.q.wu | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fcvt.s.q | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. |
fcvt.w.q | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fcvt.wu.q | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision-to-integer and integer-to-double-precision conversion instructions. FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number. FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. |
fsgnj.q | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction. Spike ISS Implementation:require_extension('Q'); require_fp; WRITE_FRD(fsgnj128(FRS1, FRS2, false, false)); |
fsgnjn.q | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction. Spike ISS Implementation:require_extension('Q'); require_fp; WRITE_FRD(fsgnj128(FRS1, FRS2, true, false)); |
fsgnjx.q | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction. Spike ISS Implementation:require_extension('Q'); require_fp; WRITE_FRD(fsgnj128(FRS1, FRS2, false, true)); |
q / quad-precision-load-and-store-instructions
14 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / 14.1 Quad-Precision Load and Store Instructions
Operation | Arguments | Description |
flq | rd, rs1, imm12 |
FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128. FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation:require_extension('Q'); require_fp; WRITE_FRD(MMU.load_float128(RS1 + insn.i_imm())); |
fsq | rs1, rs2, imm12 |
FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128. FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. Spike ISS Implementation:require_extension('Q'); require_fp; MMU.store_float128(RS1 + insn.s_imm(), FRS2); |
q / sec:single-float-compute
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.6 Single-Precision Floating-Point Computational Instructions
Operation | Arguments | Description |
fadd.q | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_add(f128(FRS1), f128(FRS2))); set_fp_exceptions; |
fdiv.q | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_div(f128(FRS1), f128(FRS2))); set_fp_exceptions; |
fmadd.q | rd, rs1, rs2, rs3 |
FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3. Spike ISS Implementation:require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128(FRS1), f128(FRS2), f128(FRS3))); set_fp_exceptions; |
fmax.q | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation:require_extension('Q'); require_fp; bool greater = f128_lt_quiet(f128(FRS2), f128(FRS1)) || (f128_eq(f128(FRS2), f128(FRS1)) && (f128(FRS2).v[1] & F64_SIGN)); if (isNaNF128(f128(FRS1)) && isNaNF128(f128(FRS2))) WRITE_FRD(f128(defaultNaNF128())); else WRITE_FRD(greater || isNaNF128(f128(FRS2)) ? FRS1 : FRS2); set_fp_exceptions; |
fmin.q | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN. Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. Spike ISS Implementation:require_extension('Q'); require_fp; bool less = f128_lt_quiet(f128(FRS1), f128(FRS2)) || (f128_eq(f128(FRS1), f128(FRS2)) && (f128(FRS1).v[1] & F64_SIGN)); if (isNaNF128(f128(FRS1)) && isNaNF128(f128(FRS2))) WRITE_FRD(f128(defaultNaNF128())); else WRITE_FRD(less || isNaNF128(f128(FRS2)) ? FRS1 : FRS2); set_fp_exceptions; |
fmsub.q | rd, rs1, rs2, rs3 |
FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3. Spike ISS Implementation:require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128(FRS1), f128(FRS2), f128_negate(f128(FRS3)))); set_fp_exceptions; |
fmul.q | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd. Spike ISS Implementation:require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mul(f128(FRS1), f128(FRS2))); set_fp_exceptions; |
fnmadd.q | rd, rs1, rs2, rs3 |
FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3.
Spike ISS Implementation:
require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128_negate(f128(FRS1)), f128(FRS2), f128_negate(f128(FRS3)))); set_fp_exceptions; |
fnmsub.q | rd, rs1, rs2, rs3 |
FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3.
Spike ISS Implementation:
require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_mulAdd(f128_negate(f128(FRS1)), f128(FRS2), f128(FRS3))); set_fp_exceptions; |
fsqrt.q | rd, rs1 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.
Spike ISS Implementation:
require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_sqrt(f128(FRS1))); set_fp_exceptions; |
fsub.q | rd, rs1, rs2 |
Floating-point arithmetic instructions with one or two source operands use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1. FDIV.S performs the single-precision floating-point division of rs1 by rs2. FSQRT.S computes the square root of rs1. In each case, the result is written to rd.
Spike ISS Implementation:
require_extension('Q'); require_fp; softfloat_roundingMode = RM; WRITE_FRD(f128_sub(f128(FRS1), f128(FRS2))); set_fp_exceptions; |
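The minimum/maximum rules quoted in this table (signed zeros ordered as -0.0 < +0.0, a single NaN operand ignored, two NaN operands giving the canonical NaN) can be sketched as a small model. This is only an illustration on host double-precision values, not quad precision; the names fmin_model and fmax_model are hypothetical, math.nan merely stands in for the canonical NaN, and the invalid-operation flag is not modeled.

```python
import math


def fmin_model(a: float, b: float) -> float:
    """minimumNumber-style minimum on host doubles (exception flags not modeled)."""
    if math.isnan(a) and math.isnan(b):
        return math.nan  # stands in for the canonical NaN
    if math.isnan(a):
        return b  # only one NaN: return the non-NaN operand
    if math.isnan(b):
        return a
    if a == b:
        # Covers the signed-zero case: -0.0 counts as less than +0.0.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def fmax_model(a: float, b: float) -> float:
    """maximumNumber-style maximum on host doubles (exception flags not modeled)."""
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b:
        # For equal operands prefer the one that is not negative zero.
        return b if math.copysign(1.0, a) < 0 else a
    return a if a > b else b
```

For example, fmin_model(0.0, -0.0) yields negative zero, while fmin_model(math.nan, 3.0) yields 3.0.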
q / single-precision-floating-point-compare-instructions
12 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / 12.8 Single-Precision Floating-Point Compare Instructions
Operation | Arguments | Description |
feq.q | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.
Spike ISS Implementation:
require_extension('Q'); require_fp; WRITE_RD(f128_eq(f128(FRS1), f128(FRS2))); set_fp_exceptions; |
fle.q | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.
Spike ISS Implementation:
require_extension('Q'); require_fp; WRITE_RD(f128_le(f128(FRS1), f128(FRS2))); set_fp_exceptions; |
flt.q | rd, rs1, rs2 |
Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2) writing 1 to the integer register rd if the condition holds, and 0 otherwise. FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN. FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN.
Spike ISS Implementation:
require_extension('Q'); require_fp; WRITE_RD(f128_lt(f128(FRS1), f128(FRS2))); set_fp_exceptions; |
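The difference between the quiet comparison (FEQ) and the signaling comparisons (FLT, FLE) described in this table can be illustrated with a small model. It is a sketch under stated assumptions: it works on 64-bit double-precision bit patterns rather than quad precision, the function names are hypothetical, and the "invalid" value returned is just a boolean standing in for the invalid operation exception flag.

```python
import struct

EXP_MASK = 0x7FF0_0000_0000_0000
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF
QUIET_BIT = 0x0008_0000_0000_0000  # top fraction bit marks a quiet NaN


def bits_of(x: float) -> int:
    """Raw IEEE 754 binary64 encoding of a host float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_nan(bits: int) -> bool:
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_signaling_nan(bits: int) -> bool:
    return is_nan(bits) and not (bits & QUIET_BIT)


def compare(op: str, a_bits: int, b_bits: int):
    """Return (result, invalid) for op in {'feq', 'flt', 'fle'}."""
    any_nan = is_nan(a_bits) or is_nan(b_bits)
    any_snan = is_signaling_nan(a_bits) or is_signaling_nan(b_bits)
    # FEQ is quiet: only signaling NaNs raise invalid. FLT/FLE raise on any NaN.
    invalid = any_snan if op == "feq" else any_nan
    if any_nan:
        return 0, invalid  # every comparison is false when a NaN is involved
    a = struct.unpack("<d", struct.pack("<Q", a_bits))[0]
    b = struct.unpack("<d", struct.pack("<Q", b_bits))[0]
    result = {"feq": a == b, "flt": a < b, "fle": a <= b}[op]
    return int(result), invalid
```

With a quiet NaN pattern such as 0x7FF8000000000000, compare("feq", ...) reports no invalid flag while compare("flt", ...) does; both return 0.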
v
v / _introduction
- Introduction /
Operation | Arguments | Description |
vamoaddei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e16); |
vamoaddei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e32); |
vamoaddei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e64); |
vamoaddei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoadde.v vd, (rs1), vs2, vd VI_AMO({ return lhs + vs3; }, uint, e8); |
vamoandei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e16); |
vamoandei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e32); |
vamoandei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e64); |
vamoandei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoande.v vd, (rs1), vs2, vd VI_AMO({ return lhs & vs3; }, uint, e8); |
vamomaxei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e16); |
vamomaxei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e32); |
vamomaxei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e64); |
vamomaxei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxe.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, int, e8); |
vamomaxuei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, uint, e16); |
vamomaxuei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, uint, e32); |
vamomaxuei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, uint, e64); |
vamomaxuei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomaxue.v vd, (rs1), vs2, vd VI_AMO({ return lhs >= vs3 ? lhs : vs3; }, uint, e8); |
vamominei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e16); |
vamominei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e32); |
vamominei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e64); |
vamominei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamomine.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, int, e8); |
vamominuei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, uint, e16); |
vamominuei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, uint, e32); |
vamominuei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, uint, e64); |
vamominuei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamominue.v vd, (rs1), vs2, vd VI_AMO({ return lhs < vs3 ? lhs : vs3; }, uint, e8); |
vamoorei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e16); |
vamoorei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e32); |
vamoorei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e64); |
vamoorei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoore.v vd, (rs1), vs2, vd VI_AMO({ return lhs | vs3; }, uint, e8); |
vamoswapei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e16); |
vamoswapei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e32); |
vamoswapei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e64); |
vamoswapei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoswape.v vd, (rs1), vs2, vd VI_AMO({ return vs3; }, uint, e8); |
vamoxorei16.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e16); |
vamoxorei32.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e32); |
vamoxorei64.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e64); |
vamoxorei8.v | vs2, rs1, vd |
Spike ISS Implementation:
//vamoxore.v vd, (rs1), vs2, vd VI_AMO({ return lhs ^ vs3; }, uint, e8); |
vl1re16.v | rs1, vd |
Spike ISS Implementation:
// vl1re16.v vd, (rs1) VI_LD_WHOLE(uint16); |
vl1re32.v | rs1, vd |
Spike ISS Implementation:
// vl1re32.v vd, (rs1) VI_LD_WHOLE(uint32); |
vl1re64.v | rs1, vd |
Spike ISS Implementation:
// vl1re64.v vd, (rs1) VI_LD_WHOLE(uint64); |
vl2re16.v | rs1, vd |
Spike ISS Implementation:
// vl2re16.v vd, (rs1) VI_LD_WHOLE(uint16); |
vl2re32.v | rs1, vd |
Spike ISS Implementation:
// vl2re32.v vd, (rs1) VI_LD_WHOLE(uint32); |
vl2re64.v | rs1, vd |
Spike ISS Implementation:
// vl2re64.v vd, (rs1) VI_LD_WHOLE(uint64); |
vl2re8.v | rs1, vd |
Spike ISS Implementation:
// vl2re8.v vd, (rs1) VI_LD_WHOLE(uint8); |
vl4re16.v | rs1, vd |
Spike ISS Implementation:
// vl4re16.v vd, (rs1) VI_LD_WHOLE(uint16); |
vl4re32.v | rs1, vd |
Spike ISS Implementation:
// vl4re32.v vd, (rs1) VI_LD_WHOLE(uint32); |
vl4re64.v | rs1, vd |
Spike ISS Implementation:
// vl4re64.v vd, (rs1) VI_LD_WHOLE(uint64); |
vl4re8.v | rs1, vd |
Spike ISS Implementation:
// vl4re8.v vd, (rs1) VI_LD_WHOLE(uint8); |
vl8re16.v | rs1, vd |
Spike ISS Implementation:
// vl8re16.v vd, (rs1) VI_LD_WHOLE(uint16); |
vl8re32.v | rs1, vd |
Spike ISS Implementation:
// vl8re32.v vd, (rs1) VI_LD_WHOLE(uint32); |
vl8re64.v | rs1, vd |
Spike ISS Implementation:
// vl8re64.v vd, (rs1) VI_LD_WHOLE(uint64); |
vl8re8.v | rs1, vd |
Spike ISS Implementation:
// vl8re8.v vd, (rs1) VI_LD_WHOLE(uint8); |
vle1024.v | rs1, vd | |
vle1024ff.v | rs1, vd | |
vle128.v | rs1, vd | |
vle128ff.v | rs1, vd | |
vle16.v | rs1, vd |
Spike ISS Implementation:
// vle16.v and vlseg[2-8]e16.v VI_LD(0, (i * nf + fn), int16, false); |
vle16ff.v | rs1, vd |
Spike ISS Implementation:
// vle16ff.v and vlseg[2-8]e16ff.v VI_LDST_FF(int16); |
vle256.v | rs1, vd | |
vle256ff.v | rs1, vd | |
vle32.v | rs1, vd |
Spike ISS Implementation:
// vle32.v and vlseg[2-8]e32.v VI_LD(0, (i * nf + fn), int32, false); |
vle32ff.v | rs1, vd |
Spike ISS Implementation:
// vle32ff.v and vlseg[2-8]e32ff.v VI_LDST_FF(int32); |
vle512.v | rs1, vd | |
vle512ff.v | rs1, vd | |
vle64.v | rs1, vd |
Spike ISS Implementation:
// vle64.v and vlseg[2-8]e64.v VI_LD(0, (i * nf + fn), int64, false); |
vle64ff.v | rs1, vd |
Spike ISS Implementation:
// vle64ff.v and vlseg[2-8]e64ff.v VI_LDST_FF(int64); |
vloxei1024.v | vs2, rs1, vd | |
vloxei128.v | vs2, rs1, vd | |
vloxei16.v | vs2, rs1, vd |
Spike ISS Implementation:
// vlxei16.v and vlxseg[2-8]ei16.v VI_LD_INDEX(e16, true); |
vloxei256.v | vs2, rs1, vd | |
vloxei32.v | vs2, rs1, vd |
Spike ISS Implementation:
// vlxei32.v and vlxseg[2-8]ei32.v VI_LD_INDEX(e32, true); |
vloxei512.v | vs2, rs1, vd | |
vloxei64.v | vs2, rs1, vd |
Spike ISS Implementation:
// vlxei64.v and vlxseg[2-8]ei64.v VI_LD_INDEX(e64, true); |
vloxei8.v | vs2, rs1, vd |
Spike ISS Implementation:
// vlxei8.v and vlxseg[2-8]ei8.v VI_LD_INDEX(e8, true); |
vlse1024.v | rs2, rs1, vd | |
vlse128.v | rs2, rs1, vd | |
vlse16.v | rs2, rs1, vd |
Spike ISS Implementation:
// vlse16.v and vlsseg[2-8]e16.v VI_LD(i * RS2, fn, int16, false); |
vlse256.v | rs2, rs1, vd | |
vlse32.v | rs2, rs1, vd |
Spike ISS Implementation:
// vlse32.v and vlsseg[2-8]e32.v VI_LD(i * RS2, fn, int32, false); |
vlse512.v | rs2, rs1, vd | |
vlse64.v | rs2, rs1, vd |
Spike ISS Implementation:
// vlse64.v and vlsseg[2-8]e64.v VI_LD(i * RS2, fn, int64, false); |
vluxei1024.v | vs2, rs1, vd | |
vluxei128.v | vs2, rs1, vd | |
vluxei16.v | vs2, rs1, vd |
Spike ISS Implementation:
// vlxei16.v and vlxseg[2-8]ei16.v VI_LD_INDEX(e16, true); |
vluxei256.v | vs2, rs1, vd | |
vluxei32.v | vs2, rs1, vd |
Spike ISS Implementation:
// vlxei32.v and vlxseg[2-8]ei32.v VI_LD_INDEX(e32, true); |
vluxei512.v | vs2, rs1, vd | |
vluxei64.v | vs2, rs1, vd |
Spike ISS Implementation:
// vlxei64.v and vlxseg[2-8]ei64.v VI_LD_INDEX(e64, true); |
vmv1r.v | vs2, vd |
Spike ISS Implementation:
// vmv1r.v vd, vs2 #include "vmvnfr_v.h" |
vmv2r.v | vs2, vd |
Spike ISS Implementation:
// vmv2r.v vd, vs2 #include "vmvnfr_v.h" |
vmv4r.v | vs2, vd |
Spike ISS Implementation:
// vmv4r.v vd, vs2 #include "vmvnfr_v.h" |
vmv8r.v | vs2, vd |
Spike ISS Implementation:
// vmv8r.v vd, vs2 #include "vmvnfr_v.h" |
vs1r.v | rs1, vs3 |
Spike ISS Implementation:
// vs1r.v vs3, (rs1) VI_ST_WHOLE |
vs2r.v | rs1, vs3 |
Spike ISS Implementation:
// vs2r.v vs3, (rs1) VI_ST_WHOLE |
vs4r.v | rs1, vs3 |
Spike ISS Implementation:
// vs4r.v vs3, (rs1) VI_ST_WHOLE |
vs8r.v | rs1, vs3 |
Spike ISS Implementation:
// vs8r.v vs3, (rs1) VI_ST_WHOLE |
vse1024.v | rs1, vs3 | |
vse128.v | rs1, vs3 | |
vse16.v | rs1, vs3 |
Spike ISS Implementation:
// vse16.v and vsseg[2-8]e16.v VI_ST(0, (i * nf + fn), uint16, false); |
vse256.v | rs1, vs3 | |
vse32.v | rs1, vs3 |
Spike ISS Implementation:
// vse32.v and vsseg[2-8]e32.v VI_ST(0, (i * nf + fn), uint32, false); |
vse512.v | rs1, vs3 | |
vse64.v | rs1, vs3 |
Spike ISS Implementation:
// vse64.v and vsseg[2-8]e64.v VI_ST(0, (i * nf + fn), uint64, false); |
vse8.v | rs1, vs3 |
Spike ISS Implementation:
// vse8.v and vsseg[2-8]e8.v VI_ST(0, (i * nf + fn), uint8, false); |
vsm.v | rs1, vs3 |
Spike ISS Implementation:
// vsm.v (formerly vse1.v) VI_ST(0, (i * nf + fn), uint8, true); |
vsoxei1024.v | vs2, rs1, vs3 | |
vsoxei128.v | vs2, rs1, vs3 | |
vsoxei16.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsxei16.v and vsxseg[2-8]ei16.v VI_ST_INDEX(e16, true); |
vsoxei256.v | vs2, rs1, vs3 | |
vsoxei32.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsxei32.v and vsxseg[2-8]ei32.v VI_ST_INDEX(e32, true); |
vsoxei512.v | vs2, rs1, vs3 | |
vsoxei64.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsxei64.v and vsxseg[2-8]ei64.v VI_ST_INDEX(e64, true); |
vsoxei8.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsxei8.v and vsxseg[2-8]ei8.v VI_ST_INDEX(e8, true); |
vsse1024.v | rs2, rs1, vs3 | |
vsse128.v | rs2, rs1, vs3 | |
vsse16.v | rs2, rs1, vs3 |
Spike ISS Implementation:
// vsse16.v and vssseg[2-8]e16.v VI_ST(i * RS2, fn, uint16, false); |
vsse256.v | rs2, rs1, vs3 | |
vsse32.v | rs2, rs1, vs3 |
Spike ISS Implementation:
// vsse32.v and vssseg[2-8]e32.v VI_ST(i * RS2, fn, uint32, false); |
vsse512.v | rs2, rs1, vs3 | |
vsse64.v | rs2, rs1, vs3 |
Spike ISS Implementation:
// vsse64.v and vssseg[2-8]e64.v VI_ST(i * RS2, fn, uint64, false); |
vsse8.v | rs2, rs1, vs3 |
Spike ISS Implementation:
// vsse8.v and vssseg[2-8]e8.v VI_ST(i * RS2, fn, uint8, false); |
vsuxei1024.v | vs2, rs1, vs3 | |
vsuxei128.v | vs2, rs1, vs3 | |
vsuxei16.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsuxei16.v VI_ST_INDEX(e16, true); |
vsuxei256.v | vs2, rs1, vs3 | |
vsuxei32.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsuxei32.v VI_ST_INDEX(e32, true); |
vsuxei512.v | vs2, rs1, vs3 | |
vsuxei64.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsuxei64.v VI_ST_INDEX(e64, true); |
vsuxei8.v | vs2, rs1, vs3 |
Spike ISS Implementation:
// vsuxei8.v VI_ST_INDEX(e8, true); |
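The unit-stride and strided rows in this table differ only in how each element's address is formed. The sketch below computes the byte addresses that a load or store would touch. It assumes the usual addressing rules (element i of a unit-stride access sits i elements past the base, element i of a strided access sits i times the byte stride past the base, and segment field fn adds fn elements); the function names and the nf parameter default are hypothetical, and masking is ignored.

```python
def unit_stride_addresses(base: int, vl: int, elt_bytes: int, nf: int = 1):
    """Byte addresses per element (and per segment field) for vle/vse-style accesses."""
    return [
        [base + (i * nf + fn) * elt_bytes for fn in range(nf)]
        for i in range(vl)
    ]


def strided_addresses(base: int, stride: int, vl: int, elt_bytes: int, nf: int = 1):
    """Byte addresses for vlse/vsse-style accesses; stride is a byte count (rs2)."""
    return [
        [base + i * stride + fn * elt_bytes for fn in range(nf)]
        for i in range(vl)
    ]
```

For instance, strided_addresses(0x1000, 16, 2, 4, nf=2) describes a two-field segment access with 32-bit elements whose segments start 16 bytes apart.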
v / _narrowing_floating_pointinteger_type_convert_instructions
- Vector Floating-Point Instructions / 14.19. Narrowing Floating-Point/Integer Type-Convert Instructions
Operation | Arguments | Description |
vfncvt.f.f.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
vfncvt.f.x.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
vfncvt.f.xu.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
vfncvt.rod.f.f.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
vfncvt.rtz.x.f.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
vfncvt.rtz.xu.f.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
vfncvt.x.f.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
vfncvt.xu.f.w | vs2, vd |
vfncvt.xu.f.w vd, vs2, vm      # Convert double-width float to unsigned integer.
vfncvt.x.f.w vd, vs2, vm       # Convert double-width float to signed integer.
vfncvt.rtz.xu.f.w vd, vs2, vm  # Convert double-width float to unsigned integer, truncating.
vfncvt.rtz.x.f.w vd, vs2, vm   # Convert double-width float to signed integer, truncating.
vfncvt.f.xu.w vd, vs2, vm      # Convert double-width unsigned integer to float.
vfncvt.f.x.w vd, vs2, vm       # Convert double-width signed integer to float.
vfncvt.f.f.w vd, vs2, vm       # Convert double-width float to single-width float.
vfncvt.rod.f.f.w vd, vs2, vm   # Convert double-width float to single-width float,
                               #  rounding towards odd. |
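The "rounding towards odd" mode of vfncvt.rod.f.f.w is the least familiar entry in this table, so a small model may help. The sketch narrows a binary64 value to binary32: an exactly representable input is returned unchanged, and any inexact result is truncated toward zero with its least-significant mantissa bit forced to 1. The function name is hypothetical, and the sketch assumes a finite input within binary32 range (overflow is not handled) and a host whose struct "f" packing rounds to nearest.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Encoding of x after rounding to binary32 (host rounding, normally nearest-even)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_from_bits(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def narrow_round_to_odd(x: float) -> float:
    """Narrow a binary64 value to binary32 using round-to-odd."""
    if math.isnan(x) or math.isinf(x):
        return x
    bits = f32_bits(x)
    nearest = f32_from_bits(bits)
    if nearest == x:
        return nearest  # exact: no rounding needed
    if abs(nearest) > abs(x):
        bits -= 1  # step one ulp toward zero to get the truncated value
    # Inexact result: force the low mantissa bit to 1.
    return f32_from_bits(bits | 1)
```

For example, 1 + 2**-24 lies exactly halfway between two binary32 values; round-to-nearest-even would give 1.0, whereas this model returns 1 + 2**-23, the neighbour with an odd mantissa.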
v / _single_width_floating_pointinteger_type_convert_instructions
- Vector Floating-Point Instructions / 14.17. Single-Width Floating-Point/Integer Type-Convert Instructions
Operation | Arguments | Description |
vfcvt.f.x.v | vs2, vd |
vfcvt.xu.f.v vd, vs2, vm      # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm       # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm  # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm   # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm      # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm       # Convert signed integer to float. |
vfcvt.f.xu.v | vs2, vd |
vfcvt.xu.f.v vd, vs2, vm      # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm       # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm  # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm   # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm      # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm       # Convert signed integer to float. |
vfcvt.rtz.x.f.v | vs2, vd |
vfcvt.xu.f.v vd, vs2, vm      # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm       # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm  # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm   # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm      # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm       # Convert signed integer to float. |
vfcvt.rtz.xu.f.v | vs2, vd |
vfcvt.xu.f.v vd, vs2, vm      # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm       # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm  # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm   # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm      # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm       # Convert signed integer to float. |
vfcvt.x.f.v | vs2, vd |
vfcvt.xu.f.v vd, vs2, vm      # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm       # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm  # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm   # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm      # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm       # Convert signed integer to float. |
vfcvt.xu.f.v | vs2, vd |
vfcvt.xu.f.v vd, vs2, vm      # Convert float to unsigned integer.
vfcvt.x.f.v vd, vs2, vm       # Convert float to signed integer.
vfcvt.rtz.xu.f.v vd, vs2, vm  # Convert float to unsigned integer, truncating.
vfcvt.rtz.x.f.v vd, vs2, vm   # Convert float to signed integer, truncating.
vfcvt.f.xu.v vd, vs2, vm      # Convert unsigned integer to float.
vfcvt.f.x.v vd, vs2, vm       # Convert signed integer to float. |
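The practical difference between vfcvt.xu.f.v and its truncating form vfcvt.rtz.xu.f.v is the rounding applied before the value is clamped to the unsigned range. The model below is a sketch with several assumptions: the non-truncating form is shown with round-to-nearest-even only (the real instruction uses the dynamic rounding mode), out-of-range and NaN inputs are assumed to saturate as the scalar conversions do, exception flags are not modeled, and the function name and parameters are hypothetical.

```python
import math


def cvt_float_to_unsigned(x: float, sew: int, truncate: bool = False) -> int:
    """Model of a float -> unsigned SEW-bit integer conversion for one element."""
    max_value = (1 << sew) - 1
    if math.isnan(x):
        return max_value  # assumed: NaN converts to the largest unsigned value
    if math.isinf(x):
        return max_value if x > 0 else 0
    # math.trunc rounds toward zero; round() on a float rounds half to even.
    rounded = math.trunc(x) if truncate else round(x)
    return min(max(rounded, 0), max_value)  # saturate to [0, 2**sew - 1]
```

With 8-bit elements, 3.5 converts to 4 by default and to 3 when truncating, while 2.5 converts to 2 either way because ties go to the even neighbour.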
v / _state_of_vector_extension_at_reset
- Vector Extension Programmer’s Model / 4.11. State of Vector Extension at Reset
Operation | Arguments | Description |
vsetvl | rs2, rs1, rd |
The vector extension must have a consistent state at reset. In particular, vtype and vl must have values that can be read and then restored with a single vsetvl instruction.
Spike ISS Implementation:
require_vector_novtype(false); WRITE_RD(P.VU.set_vl(insn.rd(), insn.rs1(), RS1, RS2)); |
v / _unit_stride_fault_only_first_loads
- Vector Loads and Stores / 8.7. Unit-stride Fault-Only-First Loads
Operation | Arguments | Description |
vle8ff.v | rs1, vd |
# Vector unit-stride fault-only-first loads
# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>)
vle8ff.v vd, (rs1), vm   # 8-bit unit-stride fault-only-first load
vle16ff.v vd, (rs1), vm  # 16-bit unit-stride fault-only-first load
vle32ff.v vd, (rs1), vm  # 32-bit unit-stride fault-only-first load
vle64ff.v vd, (rs1), vm  # 64-bit unit-stride fault-only-first load
Spike ISS Implementation:
// vle8ff.v and vlseg[2-8]e8ff.v
VI_LDST_FF(int8); |
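The fault-only-first behaviour of these loads can be sketched as follows: a fault on element 0 is reported as a trap, while a fault on any later element simply shortens the vector length to the number of elements loaded so far. This is an illustrative model, not Spike's implementation; memory is represented by a hypothetical dictionary from byte address to element value, a missing key stands in for a faulting address, and masking and segments are ignored.

```python
def vle_ff(memory: dict, base: int, vl: int, elt_bytes: int):
    """Return (loaded_values, new_vl) for a unit-stride fault-only-first load."""
    values = []
    for i in range(vl):
        addr = base + i * elt_bytes
        if addr not in memory:
            if i == 0:
                # Only a fault on the first element is visible as a trap.
                raise MemoryError(f"fault on element 0 at {addr:#x}")
            return values, i  # later fault: trim vl, no trap
        values.append(memory[addr])
    return values, vl
```

A typical use is a strlen-style loop that requests a full vector and then continues with whatever vl the load actually delivered.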
v / _vector_bitwise_logical_instructions
- Vector Integer Arithmetic Instructions / 12.5. Vector Bitwise Logical Instructions
Operation | Arguments | Description |
vand.vi | vs2, simm5, vd |
# Bitwise logical operations.
vand.vv vd, vs2, vs1, vm  # Vector-vector
vand.vx vd, vs2, rs1, vm  # vector-scalar
vand.vi vd, vs2, imm, vm  # vector-immediate
vor.vv vd, vs2, vs1, vm   # Vector-vector
vor.vx vd, vs2, rs1, vm   # vector-scalar
vor.vi vd, vs2, imm, vm   # vector-immediate
vxor.vv vd, vs2, vs1, vm  # Vector-vector
vxor.vx vd, vs2, rs1, vm  # vector-scalar
vxor.vi vd, vs2, imm, vm  # vector-immediate |
vand.vv | vs2, vs1, vd |
# Bitwise logical operations.
vand.vv vd, vs2, vs1, vm  # Vector-vector
vand.vx vd, vs2, rs1, vm  # vector-scalar
vand.vi vd, vs2, imm, vm  # vector-immediate
vor.vv vd, vs2, vs1, vm   # Vector-vector
vor.vx vd, vs2, rs1, vm   # vector-scalar
vor.vi vd, vs2, imm, vm   # vector-immediate
vxor.vv vd, vs2, vs1, vm  # Vector-vector
vxor.vx vd, vs2, rs1, vm  # vector-scalar
vxor.vi vd, vs2, imm, vm  # vector-immediate |
vand.vx | vs2, rs1, vd |
# Bitwise logical operations.
vand.vv vd, vs2, vs1, vm  # Vector-vector
vand.vx vd, vs2, rs1, vm  # vector-scalar
vand.vi vd, vs2, imm, vm  # vector-immediate
vor.vv vd, vs2, vs1, vm   # Vector-vector
vor.vx vd, vs2, rs1, vm   # vector-scalar
vor.vi vd, vs2, imm, vm   # vector-immediate
vxor.vv vd, vs2, vs1, vm  # Vector-vector
vxor.vx vd, vs2, rs1, vm  # vector-scalar
vxor.vi vd, vs2, imm, vm  # vector-immediate |
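For the vector-immediate forms, the 5-bit immediate (simm5 in the argument column) is sign-extended to the element width before the operation is applied. The sketch below models that for unmasked operation; the function names are hypothetical and elements are represented as unsigned Python integers.

```python
def sign_extend_5(imm: int) -> int:
    """Interpret the low 5 bits of imm as a signed value in [-16, 15]."""
    imm &= 0x1F
    return imm - 32 if imm & 0x10 else imm


def vlogic_vi(op: str, vs2, imm5: int, sew: int):
    """Apply vand.vi / vor.vi / vxor.vi to every element of vs2 (no masking)."""
    element_mask = (1 << sew) - 1
    operand = sign_extend_5(imm5) & element_mask  # sign-extended to SEW bits
    operations = {
        "vand": lambda a, b: a & b,
        "vor": lambda a, b: a | b,
        "vxor": lambda a, b: a ^ b,
    }
    return [operations[op](x & element_mask, operand) for x in vs2]
```

Because -1 sign-extends to all ones, vlogic_vi("vxor", values, -1, sew) inverts every bit of each element.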
v / _vector_compress_instruction
- Vector Permutation Instructions / 17.5. Vector Compress Instruction
Operation | Arguments | Description |
vcompress.vm | vs2, vs1, vd |
vcompress is encoded as an unmasked instruction (vm=1). The equivalent masked instruction (vm=0) is reserved. A trap on a vcompress instruction is always reported with a vstart of 0. Executing a vcompress instruction with a non-zero vstart raises an illegal instruction exception.
vcompress.vm vd, vs2, vs1  # Compress into vd elements of vs2 where vs1 is enabled
Example use of vcompress instruction
  8 7 6 5 4 3 2 1 0   Element number
  1 1 0 1 0 0 1 0 1   v0
  8 7 6 5 4 3 2 1 0   v1
  1 2 3 4 5 6 7 8 9   v2
                      vcompress.vm v2, v1, v0
  1 2 3 4 8 7 5 2 0   v2 |
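The example above can be reproduced with a short model. It is a sketch only: registers are Python lists indexed by element number (so they read left to right as element 0, 1, 2, ..., the reverse of the listing), the function name is hypothetical, and elements past the packed ones are left unchanged, as they are in the example.

```python
def vcompress(vd, vs2, vs1_mask, vl):
    """Pack the elements of vs2 whose mask bit in vs1 is set into the low end of vd."""
    result = list(vd)
    position = 0
    for i in range(vl):
        if vs1_mask[i]:
            result[position] = vs2[i]
            position += 1
    return result  # elements from `position` upward keep their old values


# Registers from the example, listed from element 0 upward.
v0 = [1, 0, 1, 0, 0, 1, 0, 1, 1]
v1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
v2 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
compressed = vcompress(v2, v1, v0, vl=9)
```

Reading compressed from element 8 down to element 0 gives the final v2 row of the example.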
v / _vector_count_population_in_mask_vcpop_m
- Vector Mask Instructions / 16.2. Vector count population in mask vcpop.m
Operation | Arguments | Description |
vcpop.m | vs2, rd |
The vcpop.m instruction counts the number of mask elements of the active elements of the vector source mask register that have the value 1 and writes the result to a scalar x register. The vcpop.m instruction writes x[rd] even if vl=0 (with the value 0, since no mask elements are active). Traps on vcpop.m are always reported with a vstart of 0. The vcpop.m instruction will raise an illegal instruction exception if vstart is non-zero.
vcpop.m rd, vs2, vm
vcpop.m rd, vs2, v0.t  # x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] )
Spike ISS Implementation:
// vmpopc rd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
reg_t popcount = 0;
for (reg_t i=P.VU.vstart->read(); i<vl; ++i) {
  const int midx = i / 32;
  const int mpos = i % 32;
  bool vs2_lsb = ((P.VU.elt<uint32_t>(rs2_num, midx ) >> mpos) & 0x1) == 1;
  if (insn.v_vm() == 1) {
    popcount += vs2_lsb;
  } else {
    bool do_mask = (P.VU.elt<uint32_t>(0, midx) >> mpos) & 0x1;
    popcount += (vs2_lsb && do_mask);
  }
}
P.VU.vstart->write(0);
WRITE_RD(popcount); |
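The counting rule x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] ) can be written out directly. The sketch below represents each mask register as a list of 0/1 values, one per element; the function name is hypothetical, and passing no v0 mask models the unmasked form.

```python
def vcpop_m(vs2_mask, vl, v0_mask=None):
    """Count set mask bits of vs2 among the first vl elements that are active."""
    return sum(
        1
        for i in range(vl)
        if vs2_mask[i] and (v0_mask is None or v0_mask[i])
    )
```

With vl equal to 0 the sum is empty, which matches the statement that x[rd] is still written with the value 0.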
v / _vector_element_index_instruction
- Vector Mask Instructions / 16.9. Vector Element Index Instruction
Operation | Arguments | Description |
vid.v | vd |
The vid.v instruction writes each element's index to the destination vector register group, from 0 to vl-1.
vid.v vd, vm  # Write element ID to destination.
Spike ISS Implementation:
// vid.v vd, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t sew = P.VU.vsew;
reg_t rd_num = insn.rd();
require_align(rd_num, P.VU.vflmul);
require_vm;
for (reg_t i = P.VU.vstart->read() ; i < P.VU.vl->read(); ++i) {
  VI_LOOP_ELEMENT_SKIP();
  switch (sew) {
  case e8: P.VU.elt<uint8_t>(rd_num, i, true) = i; break;
  case e16: P.VU.elt<uint16_t>(rd_num, i, true) = i; break;
  case e32: P.VU.elt<uint32_t>(rd_num, i, true) = i; break;
  default: P.VU.elt<uint64_t>(rd_num, i, true) = i; break;
  }
}
P.VU.vstart->write(0); |
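A minimal model of vid.v follows, mirroring the loop in the implementation above: every active element from vstart to vl-1 receives its own index. The function name is hypothetical, registers are plain lists, and masked-off and tail elements are simply left unchanged here, which corresponds to an undisturbed policy chosen for illustration.

```python
def vid_v(vd, vl, v0_mask=None, vstart=0):
    """Write each active element's index into a copy of vd."""
    result = list(vd)
    for i in range(vstart, vl):
        if v0_mask is None or v0_mask[i]:
            result[i] = i
    return result
```

Combined with a mask, this gives the indices of the selected elements in place, for example vid_v([9, 9, 9, 9], 4, v0_mask=[1, 0, 1, 0]).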
v / _vector_floating_point_classify_instruction
- Vector Floating-Point Instructions / 14.14. Vector Floating-Point Classify Instruction
Operation | Arguments | Description |
vfclass.v | vs2, vd |
vfclass.v vd, vs2, vm  # Vector-vector
Spike ISS Implementation:
// vfclass.v vd, vs2, vm
VI_VFP_V_LOOP
({
  vd = f16(f16_classify(vs2));
},
{
  vd = f32(f32_classify(vs2));
},
{
  vd = f64(f64_classify(vs2));
}) |
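The classify helpers called above produce a 10-bit one-hot class mask per element, using the same bit assignment as the scalar FCLASS instructions. The sketch below shows that mapping for a binary64 bit pattern. It is an illustration rather than the softfloat code; the function names are hypothetical, and the input is taken as raw bits so that signaling NaNs are not altered by the host.

```python
import struct


def bits_of_double(x: float) -> int:
    """Raw IEEE 754 binary64 encoding of a host float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def classify_f64(bits: int) -> int:
    """Return the one-hot FCLASS-style mask for a binary64 bit pattern."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if fraction == 0:
            return 1 << (0 if sign else 7)  # -inf / +inf
        return 1 << (9 if fraction >> 51 else 8)  # quiet NaN / signaling NaN
    if exponent == 0:
        if fraction == 0:
            return 1 << (3 if sign else 4)  # -0 / +0
        return 1 << (2 if sign else 5)  # negative / positive subnormal
    return 1 << (1 if sign else 6)  # negative / positive normal
```

For example, classify_f64(bits_of_double(1.0)) sets bit 6 (positive normal) and classify_f64(bits_of_double(-0.0)) sets bit 3.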
v / _vector_floating_point_compare_instructions
- Vector Floating-Point Instructions / 14.13. Vector Floating-Point Compare Instructions
Operation | Arguments | Description |
vmfeq.vf | vs2, rs1, vd |
The compare instructions follow the semantics of the scalar floating-point compare instructions. vmfeq and vmfne raise the invalid operation exception only on signaling NaN inputs. vmflt, vmfle, vmfgt, and vmfge raise the invalid operation exception on both signaling and quiet NaN inputs. vmfne writes 1 to the destination element when either operand is NaN, whereas the other compares write 0 when either operand is NaN.
# Compare equal
vmfeq.vv vd, vs2, vs1, vm  # Vector-vector
vmfeq.vf vd, vs2, rs1, vm  # vector-scalar
# Compare not equal
vmfne.vv vd, vs2, vs1, vm  # Vector-vector
vmfne.vf vd, vs2, rs1, vm  # vector-scalar
# Compare less than
vmflt.vv vd, vs2, vs1, vm  # Vector-vector
vmflt.vf vd, vs2, rs1, vm  # vector-scalar
# Compare less than or equal
vmfle.vv vd, vs2, vs1, vm  # Vector-vector
vmfle.vf vd, vs2, rs1, vm  # vector-scalar
# Compare greater than
vmfgt.vf vd, vs2, rs1, vm  # vector-scalar
# Compare greater than or equal
vmfge.vf vd, vs2, rs1, vm  # vector-scalar
# Example of implementing isgreater()
vmfeq.vv v0, va, va        # Only set where A is not NaN.
vmfeq.vv v1, vb, vb        # Only set where B is not NaN.
vmand.mm v0, v0, v1        # Only set where A and B are ordered,
vmfgt.vv v0, va, vb, v0.t  #  so only set flags on ordered values. |
vmfeq.vv | vs2, vs1, vd |
The compare instructions follow the semantics of the scalar floating-point compare instructions. vmfeq and vmfne raise the invalid operation exception only on signaling NaN inputs. vmflt, vmfle, vmfgt, and vmfge raise the invalid operation exception on both signaling and quiet NaN inputs. vmfne writes 1 to the destination element when either operand is NaN, whereas the other compares write 0 when either operand is NaN.
# Compare equal
vmfeq.vv vd, vs2, vs1, vm # Vector-vector
vmfeq.vf vd, vs2, rs1, vm # vector-scalar
# Compare not equal
vmfne.vv vd, vs2, vs1, vm # Vector-vector
vmfne.vf vd, vs2, rs1, vm # vector-scalar
# Compare less than
vmflt.vv vd, vs2, vs1, vm # Vector-vector
vmflt.vf vd, vs2, rs1, vm # vector-scalar
# Compare less than or equal
vmfle.vv vd, vs2, vs1, vm # Vector-vector
vmfle.vf vd, vs2, rs1, vm # vector-scalar
# Compare greater than
vmfgt.vf vd, vs2, rs1, vm # vector-scalar
# Compare greater than or equal
vmfge.vf vd, vs2, rs1, vm # vector-scalar
# Example of implementing isgreater()
vmfeq.vv v0, va, va # Only set where A is not NaN.
vmfeq.vv v1, vb, vb # Only set where B is not NaN.
vmand.mm v0, v0, v1 # Only set where A and B are ordered,
vmfgt.vv v0, va, vb, v0.t # so only set flags on ordered values. |
vmflt.vf | vs2, rs1, vd |
Comparison   Assembler Mapping            Assembler pseudoinstruction
va < vb      vmflt.vv vd, va, vb, vm
va <= vb     vmfle.vv vd, va, vb, vm
va > vb      vmflt.vv vd, vb, va, vm      vmfgt.vv vd, va, vb, vm
va >= vb     vmfle.vv vd, vb, va, vm      vmfge.vv vd, va, vb, vm
va < f       vmflt.vf vd, va, f, vm
va <= f      vmfle.vf vd, va, f, vm
va > f       vmfgt.vf vd, va, f, vm
va >= f      vmfge.vf vd, va, f, vm
va, vb       vector register groups
f            scalar floating-point register |
vmflt.vv | vs2, vs1, vd |
Comparison   Assembler Mapping            Assembler pseudoinstruction
va < vb      vmflt.vv vd, va, vb, vm
va <= vb     vmfle.vv vd, va, vb, vm
va > vb      vmflt.vv vd, vb, va, vm      vmfgt.vv vd, va, vb, vm
va >= vb     vmfle.vv vd, vb, va, vm      vmfge.vv vd, va, vb, vm
va < f       vmflt.vf vd, va, f, vm
va <= f      vmfle.vf vd, va, f, vm
va > f       vmfgt.vf vd, va, f, vm
va >= f      vmfge.vf vd, va, f, vm
va, vb       vector register groups
f            scalar floating-point register |
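The NaN behavior of the compares and the isgreater() sequence can be modeled in Python, whose float operators already treat NaN as unordered. This is a minimal sketch: the helper names are hypothetical, masks are plain lists of 0/1, and exception flags are not modeled at all.

```python
import math


def vmfeq(a, b):
    # Python's == is false whenever either operand is NaN.
    return [int(x == y) for x, y in zip(a, b)]


def vmfne(a, b):
    # Python's != is true whenever either operand is NaN.
    return [int(x != y) for x, y in zip(a, b)]


def vmfgt_masked(a, b, mask, old):
    # Masked compare: active elements get the result, others keep old.
    return [int(x > y) if m else o for x, y, m, o in zip(a, b, mask, old)]


def isgreater(a, b):
    v0 = vmfeq(a, a)                       # set where A is not NaN
    v1 = vmfeq(b, b)                       # set where B is not NaN
    v0 = [p & q for p, q in zip(v0, v1)]   # vmand.mm v0, v0, v1
    return vmfgt_masked(a, b, v0, v0)      # vmfgt.vv v0, va, vb, v0.t


nan = math.nan
a = [1.0, nan, 3.0, 2.0]
b = [0.5, 1.0, nan, 2.0]
print(isgreater(a, b))
```

Because the destination of the final compare is also the mask register, masked-off positions keep the 0 already stored there, which is why `old` and `mask` are the same list in `isgreater`.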
v / _vector_floating_point_merge_instruction
- Vector Floating-Point Instructions / 14.15. Vector Floating-Point Merge Instruction
Operation | Arguments | Description |
vfmerge.vfm | vs2, rs1, vd |
The vfmerge.vfm instruction is encoded as a masked instruction (vm=0). At elements where the mask value is zero, the first vector operand is copied to the destination element; otherwise, a scalar floating-point register value is copied to the destination element.
vfmerge.vfm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? f[rs1] : vs2[i] |
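The per-element selection rule can be written directly from the comment `vd[i] = v0.mask[i] ? f[rs1] : vs2[i]`. The function name and list-based operands below are illustrative assumptions.

```python
def vfmerge_vfm(vs2, f_rs1, v0_mask):
    """Hypothetical model: pick the scalar where the mask bit is 1, else vs2[i]."""
    return [f_rs1 if m else x for x, m in zip(vs2, v0_mask)]


merged = vfmerge_vfm([1.0, 2.0, 3.0], 9.5, [0, 1, 0])
print(merged)
```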
v / _vector_floating_point_minmax_instructions
- Vector Floating-Point Instructions / 14.11. Vector Floating-Point MIN/MAX Instructions
Operation | Arguments | Description |
vfmin.vf | vs2, rs1, vd |
The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.
# Floating-point minimum
vfmin.vv vd, vs2, vs1, vm # Vector-vector
vfmin.vf vd, vs2, rs1, vm # vector-scalar
# Floating-point maximum
vfmax.vv vd, vs2, vs1, vm # Vector-vector
vfmax.vf vd, vs2, rs1, vm # vector-scalar |
vfmin.vv | vs2, vs1, vd |
The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.
# Floating-point minimum
vfmin.vv vd, vs2, vs1, vm # Vector-vector
vfmin.vf vd, vs2, rs1, vm # vector-scalar
# Floating-point maximum
vfmax.vv vd, vs2, vs1, vm # Vector-vector
vfmax.vf vd, vs2, rs1, vm # vector-scalar |
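As a reminder of what the scalar behavior referred to above looks like, here is a small Python sketch. It assumes the version 2.2 scalar rules: a single NaN operand yields the other operand, two NaN operands yield a NaN, and -0.0 is treated as smaller than +0.0. The function names are hypothetical and exception flags are not modeled.

```python
import math


def fmin_model(a, b):
    if math.isnan(a) and math.isnan(b):
        return math.nan            # stands in for the canonical NaN
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b == 0.0:
        # -0.0 is considered less than +0.0
        return a if math.copysign(1.0, a) < 0 else b
    return min(a, b)


def fmax_model(a, b):
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b == 0.0:
        return a if math.copysign(1.0, a) > 0 else b
    return max(a, b)


def vfmin_vv(vs2, vs1):
    return [fmin_model(x, y) for x, y in zip(vs2, vs1)]


def vfmax_vv(vs2, vs1):
    return [fmax_model(x, y) for x, y in zip(vs2, vs1)]


lo = vfmin_vv([1.0, math.nan, -0.0], [2.0, 5.0, 0.0])
hi = vfmax_vv([1.0, math.nan, -0.0], [2.0, 5.0, 0.0])
print(lo, hi)
```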
v / _vector_floating_point_move_instruction
- Vector Floating-Point Instructions / 14.16. Vector Floating-Point Move Instruction
Operation | Arguments | Description |
vfmv.f.s | vs2, rd |
vfmv.f.s rd, vs2 # f[rd] = vs2[0] (rs1=0) |
vfmv.s.f | rs1, vd |
vfmv.s.f vd, rs1 # vd[0] = f[rs1] (vs2=0) |
vfmv.v.f | rs1, vd |
vfmv.v.f vd, rs1 # vd[i] = f[rs1] |
v / _vector_floating_point_reciprocal_estimate_instruction
- Vector Floating-Point Instructions / 14.10. Vector Floating-Point Reciprocal Estimate Instruction
Operation | Arguments | Description |
vfrec7.v | vs2, vd |
Table 17. vfrec7.v common-case lookup table contents
# Floating-point reciprocal estimate to 7 bits.
vfrec7.v vd, vs2, vm
Spike ISS Implementation:
// vfrec7.v vd, vs2, vm
VI_VFP_V_LOOP
({ vd = f16_recip7(vs2); },
 { vd = f32_recip7(vs2); },
 { vd = f64_recip7(vs2); }) |
v / _vector_floating_point_reciprocal_square_root_estimate_instruction
- Vector Floating-Point Instructions / 14.9. Vector Floating-Point Reciprocal Square-Root Estimate Instruction
Operation | Arguments | Description |
vfrsqrt7.v | vs2, vd |
Table 16. vfrsqrt7.v common-case lookup table contents
# Floating-point reciprocal square-root estimate to 7 bits.
vfrsqrt7.v vd, vs2, vm
Spike ISS Implementation:
// vfrsqrt7.v vd, vs2, vm
VI_VFP_V_LOOP
({ vd = f16_rsqrte7(vs2); },
 { vd = f32_rsqrte7(vs2); },
 { vd = f64_rsqrte7(vs2); }) |
v / _vector_floating_point_sign_injection_instructions
- Vector Floating-Point Instructions / 14.12. Vector Floating-Point Sign-Injection Instructions
Operation | Arguments | Description |
vfsgnj.vf | vs2, rs1, vd |
vfsgnj.vv vd, vs2, vs1, vm # Vector-vector
vfsgnj.vf vd, vs2, rs1, vm # vector-scalar
vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar
vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar |
vfsgnj.vv | vs2, vs1, vd |
vfsgnj.vv vd, vs2, vs1, vm # Vector-vector
vfsgnj.vf vd, vs2, rs1, vm # vector-scalar
vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar
vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar |
v / _vector_floating_point_square_root_instruction
- Vector Floating-Point Instructions / 14.8. Vector Floating-Point Square-Root Instruction
Operation | Arguments | Description |
vfsqrt.v | vs2, vd |
# Floating-point square root
vfsqrt.v vd, vs2, vm # Vector-vector square root
Spike ISS Implementation:
// vfsqrt.v vd, vs2, vm
VI_VFP_V_LOOP
({ vd = f16_sqrt(vs2); },
 { vd = f32_sqrt(vs2); },
 { vd = f64_sqrt(vs2); }) |
v / _vector_indexed_instructions
- Vector Loads and Stores / 8.6. Vector Indexed Instructions
Operation | Arguments | Description |
vluxei8.v | vs2, rs1, vd |
# Vector indexed loads and stores
# Vector indexed-unordered load instructions
# vd destination, rs1 base address, vs2 byte offsets
vluxei8.v vd, (rs1), vs2, vm # unordered 8-bit indexed load of SEW data
vluxei16.v vd, (rs1), vs2, vm # unordered 16-bit indexed load of SEW data
vluxei32.v vd, (rs1), vs2, vm # unordered 32-bit indexed load of SEW data
vluxei64.v vd, (rs1), vs2, vm # unordered 64-bit indexed load of SEW data
# Vector indexed-ordered load instructions
# vd destination, rs1 base address, vs2 byte offsets
vloxei8.v vd, (rs1), vs2, vm # ordered 8-bit indexed load of SEW data
vloxei16.v vd, (rs1), vs2, vm # ordered 16-bit indexed load of SEW data
vloxei32.v vd, (rs1), vs2, vm # ordered 32-bit indexed load of SEW data
vloxei64.v vd, (rs1), vs2, vm # ordered 64-bit indexed load of SEW data
# Vector indexed-unordered store instructions
# vs3 store data, rs1 base address, vs2 byte offsets
vsuxei8.v vs3, (rs1), vs2, vm # unordered 8-bit indexed store of SEW data
vsuxei16.v vs3, (rs1), vs2, vm # unordered 16-bit indexed store of SEW data
vsuxei32.v vs3, (rs1), vs2, vm # unordered 32-bit indexed store of SEW data
vsuxei64.v vs3, (rs1), vs2, vm # unordered 64-bit indexed store of SEW data
# Vector indexed-ordered store instructions
# vs3 store data, rs1 base address, vs2 byte offsets
vsoxei8.v vs3, (rs1), vs2, vm # ordered 8-bit indexed store of SEW data
vsoxei16.v vs3, (rs1), vs2, vm # ordered 16-bit indexed store of SEW data
vsoxei32.v vs3, (rs1), vs2, vm # ordered 32-bit indexed store of SEW data
vsoxei64.v vs3, (rs1), vs2, vm # ordered 64-bit indexed store of SEW data
Spike ISS Implementation:
// vlxei8.v and vlxseg[2-8]ei8.v
VI_LD_INDEX(e8, true); |
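The addressing rule in the comments above (vd destination, rs1 base address, vs2 byte offsets) amounts to a gather for loads and a scatter for stores. The sketch below models it over a `bytearray`. The function names, the little-endian memory model, and the mask-undisturbed treatment of inactive elements are assumptions made for illustration. Ordering between element accesses is not modeled, so the ordered and unordered forms look the same here.

```python
def indexed_load(mem, base, offsets, sew_bytes, mask=None, vd=None):
    """Gather: element i comes from byte address base + offsets[i]."""
    out = list(vd) if vd is not None else [0] * len(offsets)
    for i, off in enumerate(offsets):
        if mask is not None and not mask[i]:
            continue  # inactive element keeps its old value in this model
        addr = base + off
        out[i] = int.from_bytes(mem[addr:addr + sew_bytes], "little")
    return out


def indexed_store(mem, base, offsets, data, sew_bytes, mask=None):
    """Scatter: element i goes to byte address base + offsets[i]."""
    for i, (off, val) in enumerate(zip(offsets, data)):
        if mask is not None and not mask[i]:
            continue
        addr = base + off
        mem[addr:addr + sew_bytes] = val.to_bytes(sew_bytes, "little")


mem = bytearray(range(16))
loaded = indexed_load(mem, 4, [0, 8, 2], sew_bytes=2)
indexed_store(mem, 0, [2, 0], [0xBEEF, 0x1234], sew_bytes=2)
print([hex(v) for v in loaded], mem[:4].hex())
```

Note that the offsets are byte offsets, not element indices, so they are not scaled by the element size.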
v / _vector_instruction_formats
- Vector Instruction Formats /
Operation | Arguments | Description |
vsetivli | zimm10, zimm, rd |
{reg: [
  {bits: 7, name: 0x57, attr: 'vsetivli'},
  {bits: 5, name: 'rd', type: 4},
  {bits: 3, name: 7},
  {bits: 5, name: 'uimm[4:0]', type: 5},
  {bits: 10, name: 'zimm[9:0]', type: 5},
  {bits: 1, name: '1'},
  {bits: 1, name: '1'},
]}
Spike ISS Implementation:
require_vector_novtype(false);
WRITE_RD(P.VU.set_vl(insn.rd(), -1, insn.rs1(), insn.v_zimm10())); |
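The field list above reads from the least significant bit upward, so the 32-bit encoding can be assembled with shifts. The following Python sketch packs and unpacks those fields. The function names are hypothetical, and the example vtype value 0xD0 (e32, m1, ta, ma) assumes the usual vtype layout with vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6 and vma in bit 7.

```python
def encode_vsetivli(rd, uimm, zimm10):
    """Pack the vsetivli fields listed in the register diagram."""
    assert 0 <= rd < 32 and 0 <= uimm < 32 and 0 <= zimm10 < 1024
    return (0x57               # bits 6:0, opcode
            | rd << 7          # bits 11:7
            | 0b111 << 12      # bits 14:12, fixed value 7
            | uimm << 15       # bits 19:15, uimm[4:0] (the AVL)
            | zimm10 << 20     # bits 29:20, zimm[9:0] (the new vtype)
            | 0b11 << 30)      # bits 31:30, both fixed to 1


def decode_vsetivli(insn):
    """Unpack the same fields from a 32-bit instruction word."""
    return {
        "opcode": insn & 0x7F,
        "rd": (insn >> 7) & 0x1F,
        "funct3": (insn >> 12) & 0x7,
        "uimm": (insn >> 15) & 0x1F,
        "zimm10": (insn >> 20) & 0x3FF,
        "top2": (insn >> 30) & 0x3,
    }


word = encode_vsetivli(rd=5, uimm=4, zimm10=0xD0)
print(hex(word), decode_vsetivli(word))
```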
v / _vector_instruction_listing
- Vector Instruction Listing /
Operation | Arguments | Description |
vaadd.vv | vs2, vs1, vd |
vaadd |
vaadd.vx | vs2, rs1, vd |
vaadd |
vasub.vv | vs2, vs1, vd |
vasub |
vasub.vx | vs2, rs1, vd |
vasub |
vasubu.vv | vs2, vs1, vd |
vasubu |
vasubu.vx | vs2, rs1, vd |
vasubu |
vdiv.vv | vs2, vs1, vd |
vdiv |
vdiv.vx | vs2, rs1, vd |
vdiv |
vfdiv.vf | vs2, rs1, vd |
vfdiv |
vfdiv.vv | vs2, vs1, vd |
vfdiv |
vfmadd.vf | vs2, rs1, vd |
vfmadd |
vfmadd.vv | vs2, vs1, vd |
vfmadd |
vfmax.vf | vs2, rs1, vd |
vfmax |
vfmax.vv | vs2, vs1, vd |
vfmax |
vfmsac.vf | vs2, rs1, vd |
vfmsac |
vfmsac.vv | vs2, vs1, vd |
vfmsac |
vfmsub.vf | vs2, rs1, vd |
vfmsub |
vfmsub.vv | vs2, vs1, vd |
vfmsub |
vfnmacc.vf | vs2, rs1, vd |
vfnmacc |
vfnmacc.vv | vs2, vs1, vd |
vfnmacc |
vfnmadd.vf | vs2, rs1, vd |
vfnmadd |
vfnmadd.vv | vs2, vs1, vd |
vfnmadd |
vfnmsac.vf | vs2, rs1, vd |
vfnmsac |
vfnmsac.vv | vs2, vs1, vd |
vfnmsac |
vfnmsub.vf | vs2, rs1, vd |
vfnmsub |
vfnmsub.vv | vs2, vs1, vd |
vfnmsub |
vfrdiv.vf | vs2, rs1, vd |
vfrdiv |
vfredmax.vs | vs2, vs1, vd |
vfredmax |
vfredmin.vs | vs2, vs1, vd |
vfredmin |
vfrsub.vf | vs2, rs1, vd |
vfrsub |
vfsgnjn.vf | vs2, rs1, vd |
vfsgnjn |
vfsgnjn.vv | vs2, vs1, vd |
vfsgnjn |
vfsgnjx.vf | vs2, rs1, vd |
vfsgnjx |
vfsgnjx.vv | vs2, vs1, vd |
vfsgnjx |
vfsub.vf | vs2, rs1, vd |
vfsub |
vfsub.vv | vs2, vs1, vd |
vfsub |
vfwmsac.vf | vs2, rs1, vd |
vfwmsac |
vfwmsac.vv | vs2, vs1, vd |
vfwmsac |
vfwnmacc.vf | vs2, rs1, vd |
vfwnmacc |
vfwnmacc.vv | vs2, vs1, vd |
vfwnmacc |
vfwnmsac.vf | vs2, rs1, vd |
vfwnmsac |
vfwnmsac.vv | vs2, vs1, vd |
vfwnmsac |
vfwredusum.vs | vs2, vs1, vd |
vfwredusum |
vfwsub.vf | vs2, rs1, vd |
vfwsub vfwsub.w |
vfwsub.vv | vs2, vs1, vd |
vfwsub vfwsub.w |
vfwsub.wf | vs2, rs1, vd |
vfwsub vfwsub.w |
vfwsub.wv | vs2, vs1, vd |
vfwsub vfwsub.w |
vmadd.vv | vs2, vs1, vd |
vmadd |
vmadd.vx | vs2, rs1, vd |
vmadd |
vmax.vv | vs2, vs1, vd |
vmax |
vmax.vx | vs2, rs1, vd |
vmax |
vmaxu.vv | vs2, vs1, vd |
vmaxu |
vmaxu.vx | vs2, rs1, vd |
vmaxu |
vmfge.vf | vs2, rs1, vd |
vmfge |
vmfgt.vf | vs2, rs1, vd |
vmfgt |
vmfle.vf | vs2, rs1, vd |
vmfle |
vmfle.vv | vs2, vs1, vd |
vmfle |
vmfne.vf | vs2, rs1, vd |
vmfne |
vmfne.vv | vs2, vs1, vd |
vmfne |
vmin.vv | vs2, vs1, vd |
vmin |
vmin.vx | vs2, rs1, vd |
vmin |
vmor.mm | vs2, vs1, vd |
vmor |
vmsgtu.vi | vs2, simm5, vd |
vmsgtu |
vmsgtu.vx | vs2, rs1, vd |
vmsgtu |
vmsle.vi | vs2, simm5, vd |
vmsle |
vmsle.vv | vs2, vs1, vd |
vmsle |
vmsle.vx | vs2, rs1, vd |
vmsle |
vmsleu.vi | vs2, simm5, vd |
vmsleu |
vmsleu.vv | vs2, vs1, vd |
vmsleu |
vmsleu.vx | vs2, rs1, vd |
vmsleu |
vmsltu.vv | vs2, vs1, vd |
vmsltu |
vmsltu.vx | vs2, rs1, vd |
vmsltu |
vmsne.vi | vs2, simm5, vd |
vmsne |
vmsne.vv | vs2, vs1, vd |
vmsne |
vmsne.vx | vs2, rs1, vd |
vmsne |
vmulhsu.vv | vs2, vs1, vd |
vmulhsu |
vmulhsu.vx | vs2, rs1, vd |
vmulhsu |
vmulhu.vv | vs2, vs1, vd |
vmulhu |
vmulhu.vx | vs2, rs1, vd |
vmulhu |
vnmsac.vv | vs2, vs1, vd |
vnmsac |
vnmsac.vx | vs2, rs1, vd |
vnmsac |
vnmsub.vv | vs2, vs1, vd |
vnmsub |
vnmsub.vx | vs2, rs1, vd |
vnmsub |
vor.vi | vs2, simm5, vd |
vor |
vor.vv | vs2, vs1, vd |
vor |
vor.vx | vs2, rs1, vd |
vor |
vredand.vs | vs2, vs1, vd |
vredand |
vredmax.vs | vs2, vs1, vd |
vredmax |
vredmaxu.vs | vs2, vs1, vd |
vredmaxu |
vredmin.vs | vs2, vs1, vd |
vredmin |
vredminu.vs | vs2, vs1, vd |
vredminu |
vredor.vs | vs2, vs1, vd |
vredor |
vredxor.vs | vs2, vs1, vd |
vredxor |
vrem.vv | vs2, vs1, vd |
vrem |
vrem.vx | vs2, rs1, vd |
vrem |
vremu.vv | vs2, vs1, vd |
vremu |
vremu.vx | vs2, rs1, vd |
vremu |
vrgatherei16.vv | vs2, vs1, vd |
vrgatherei16 |
vrsub.vi | vs2, simm5, vd |
vrsub |
vrsub.vx | vs2, rs1, vd |
vrsub |
vsadd.vi | vs2, simm5, vd |
vsadd |
vsadd.vv | vs2, vs1, vd |
vsadd |
vsadd.vx | vs2, rs1, vd |
vsadd |
vsext.vf2 | vs2, vd |
vsext.vf8 vsext.vf4 vsext.vf2 |
vsext.vf4 | vs2, vd |
vsext.vf8 vsext.vf4 vsext.vf2 |
vsext.vf8 | vs2, vd |
vsext.vf8 vsext.vf4 vsext.vf2 |
vsra.vi | vs2, simm5, vd |
vsra |
vsra.vv | vs2, vs1, vd |
vsra |
vsra.vx | vs2, rs1, vd |
vsra |
vsrl.vi | vs2, simm5, vd |
vsrl |
vsrl.vv | vs2, vs1, vd |
vsrl |
vsrl.vx | vs2, rs1, vd |
vsrl |
vssra.vi | vs2, simm5, vd |
vssra |
vssra.vv | vs2, vs1, vd |
vssra |
vssra.vx | vs2, rs1, vd |
vssra |
vssub.vv | vs2, vs1, vd |
vssub |
vssub.vx | vs2, rs1, vd |
vssub |
vssubu.vv | vs2, vs1, vd |
vssubu |
vssubu.vx | vs2, rs1, vd |
vssubu |
vsub.vv | vs2, vs1, vd |
vsub |
vsub.vx | vs2, rs1, vd |
vsub |
vwadd.vv | vs2, vs1, vd |
vwadd vwadd.w |
vwadd.vx | vs2, rs1, vd |
vwadd vwadd.w |
vwadd.wv | vs2, vs1, vd |
vwadd vwadd.w |
vwadd.wx | vs2, rs1, vd |
vwadd vwadd.w |
vwmacc.vv | vs2, vs1, vd |
vwmacc |
vwmacc.vx | vs2, rs1, vd |
vwmacc |
vwmaccsu.vv | vs2, vs1, vd |
vwmaccsu |
vwmaccsu.vx | vs2, rs1, vd |
vwmaccsu |
vwmaccus.vx | vs2, rs1, vd |
vwmaccus |
vwmulsu.vv | vs2, vs1, vd |
vwmulsu |
vwmulsu.vx | vs2, rs1, vd |
vwmulsu |
vwmulu.vv | vs2, vs1, vd |
vwmulu |
vwmulu.vx | vs2, rs1, vd |
vwmulu |
vwsub.vv | vs2, vs1, vd |
vwsub vwsub.w |
vwsub.vx | vs2, rs1, vd |
vwsub vwsub.w |
vwsub.wv | vs2, vs1, vd |
vwsub vwsub.w |
vwsub.wx | vs2, rs1, vd |
vwsub vwsub.w |
vwsubu.vv | vs2, vs1, vd |
vwsubu vwsubu.w |
vwsubu.vx | vs2, rs1, vd |
vwsubu vwsubu.w |
vwsubu.wv | vs2, vs1, vd |
vwsubu vwsubu.w |
vwsubu.wx | vs2, rs1, vd |
vwsubu vwsubu.w |
vxor.vi | vs2, simm5, vd |
vxor |
vxor.vv | vs2, vs1, vd |
vxor |
vxor.vx | vs2, rs1, vd |
vxor |
v / _vector_integer_add_with_carry_subtract_with_borrow_instructions
- Vector Integer Arithmetic Instructions / 12.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions
Operation | Arguments | Description |
vadc.vim | vs2, simm5, vd |
vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. These instructions are encoded as masked instructions (vm=0), but they operate on and write back all body elements. Encodings corresponding to the unmasked versions (vm=1) are reserved. For vadc and vsbc, the instruction encoding is reserved if the destination vector register is v0.
# Produce sum with carry.
# vd[i] = vs2[i] + vs1[i] + v0.mask[i]
vadc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd[i] = vs2[i] + x[rs1] + v0.mask[i]
vadc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd[i] = vs2[i] + imm + v0.mask[i]
vadc.vim vd, vs2, imm, v0 # Vector-immediate
# Produce carry out in mask register format
# vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
vmadc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
vmadc.vim vd, vs2, imm, v0 # Vector-immediate
# vd.mask[i] = carry_out(vs2[i] + vs1[i])
vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in
# vd.mask[i] = carry_out(vs2[i] + x[rs1])
vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in
# vd.mask[i] = carry_out(vs2[i] + imm)
vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in |
vadc.vvm | vs2, vs1, vd |
vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. These instructions are encoded as masked instructions (vm=0), but they operate on and write back all body elements. Encodings corresponding to the unmasked versions (vm=1) are reserved. For vadc and vsbc, the instruction encoding is reserved if the destination vector register is v0.
# Produce sum with carry.
# vd[i] = vs2[i] + vs1[i] + v0.mask[i]
vadc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd[i] = vs2[i] + x[rs1] + v0.mask[i]
vadc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd[i] = vs2[i] + imm + v0.mask[i]
vadc.vim vd, vs2, imm, v0 # Vector-immediate
# Produce carry out in mask register format
# vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
vmadc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
vmadc.vim vd, vs2, imm, v0 # Vector-immediate
# vd.mask[i] = carry_out(vs2[i] + vs1[i])
vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in
# vd.mask[i] = carry_out(vs2[i] + x[rs1])
vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in
# vd.mask[i] = carry_out(vs2[i] + imm)
vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in |
vadc.vxm | vs2, rs1, vd |
vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. These instructions are encoded as masked instructions (vm=0), but they operate on and write back all body elements. Encodings corresponding to the unmasked versions (vm=1) are reserved. For vadc and vsbc, the instruction encoding is reserved if the destination vector register is v0.
# Produce sum with carry.
# vd[i] = vs2[i] + vs1[i] + v0.mask[i]
vadc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd[i] = vs2[i] + x[rs1] + v0.mask[i]
vadc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd[i] = vs2[i] + imm + v0.mask[i]
vadc.vim vd, vs2, imm, v0 # Vector-immediate
# Produce carry out in mask register format
# vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i])
vmadc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i])
vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i])
vmadc.vim vd, vs2, imm, v0 # Vector-immediate
# vd.mask[i] = carry_out(vs2[i] + vs1[i])
vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in
# vd.mask[i] = carry_out(vs2[i] + x[rs1])
vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in
# vd.mask[i] = carry_out(vs2[i] + imm)
vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in |
vmadc.vi | vs2, simm5, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.
# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0 # Calc new sum
vmmv.m v0, v1 # Move temp carry into v0 for next word |
vmadc.vim | vs2, simm5, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.
# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0 # Calc new sum
vmmv.m v0, v1 # Move temp carry into v0 for next word |
vmadc.vv | vs2, vs1, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.
# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0 # Calc new sum
vmmv.m v0, v1 # Move temp carry into v0 for next word |
vmadc.vvm | vs2, vs1, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.
# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0 # Calc new sum
vmmv.m v0, v1 # Move temp carry into v0 for next word |
vmadc.vx | vs2, rs1, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.
# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0 # Calc new sum
vmmv.m v0, v1 # Move temp carry into v0 for next word |
vmadc.vxm | vs2, rs1, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the result back to mask register vd. If unmasked (vm=1), there is no carry-in or borrow-in. These instructions operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy.
# Example multi-word arithmetic sequence, accumulating into v4
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
vadc.vvm v4, v4, v8, v0 # Calc new sum
vmmv.m v0, v1 # Move temp carry into v0 for next word |
vmsbc.vv | vs2, vs1, vd |
For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative. |
vmsbc.vvm | vs2, vs1, vd |
For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative. |
vmsbc.vx | vs2, rs1, vd |
For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative. |
vmsbc.vxm | vs2, rs1, vd |
For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative. |
vsbc.vvm | vs2, vs1, vd |
The subtract with borrow instruction vsbc performs the equivalent function to support long word arithmetic for subtraction. There are no subtract with immediate instructions.
# Produce difference with borrow.
# vd[i] = vs2[i] - vs1[i] - v0.mask[i]
vsbc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd[i] = vs2[i] - x[rs1] - v0.mask[i]
vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
# Produce borrow out in mask register format
# vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd.mask[i] = borrow_out(vs2[i] - vs1[i])
vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in
# vd.mask[i] = borrow_out(vs2[i] - x[rs1])
vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in |
vsbc.vxm | vs2, rs1, vd |
The subtract with borrow instruction vsbc performs the equivalent function to support long word arithmetic for subtraction. There are no subtract with immediate instructions.
# Produce difference with borrow.
# vd[i] = vs2[i] - vs1[i] - v0.mask[i]
vsbc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd[i] = vs2[i] - x[rs1] - v0.mask[i]
vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
# Produce borrow out in mask register format
# vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector
# vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
# vd.mask[i] = borrow_out(vs2[i] - vs1[i])
vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in
# vd.mask[i] = borrow_out(vs2[i] - x[rs1])
vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in |
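The carry and borrow rules in this section, including the multi-word sequence given for vmadc (vmadc.vvm, then vadc.vvm, then vmmv.m), can be checked with a small Python model. All names below are hypothetical, each register is a list of unsigned SEW-bit integers, and the carry mask is a list of 0/1 values.

```python
def vadc(vs2, vs1, v0, sew):
    m = (1 << sew) - 1
    return [(a + b + c) & m for a, b, c in zip(vs2, vs1, v0)]


def vmadc(vs2, vs1, v0, sew):
    # Carry out is the bit just above the SEW-bit result.
    return [(a + b + c) >> sew for a, b, c in zip(vs2, vs1, v0)]


def vsbc(vs2, vs1, v0, sew):
    m = (1 << sew) - 1
    return [(a - b - c) & m for a, b, c in zip(vs2, vs1, v0)]


def vmsbc(vs2, vs1, v0, sew):
    # Borrow is 1 iff the difference, prior to truncation, is negative.
    return [int(a - b - c < 0) for a, b, c in zip(vs2, vs1, v0)]


def add_multiword(x_words, y_words, sew):
    """x_words[k][i] is word k (least significant first) of element i."""
    carry = [0] * len(x_words[0])
    out = []
    for xw, yw in zip(x_words, y_words):
        v1 = vmadc(xw, yw, carry, sew)        # vmadc.vvm v1, v4, v8, v0
        out.append(vadc(xw, yw, carry, sew))  # vadc.vvm  v4, v4, v8, v0
        carry = v1                            # vmmv.m    v0, v1
    return out, carry


# Two 16-bit values per operand, held as two 8-bit words each:
# 0x01FF + 0x0001 and 0xFFFF + 0x0001
total, carry_out = add_multiword([[0xFF, 0xFF], [0x01, 0xFF]],
                                 [[0x01, 0x01], [0x00, 0x00]], sew=8)
print(total, carry_out)
```

The carry is computed before the sum in each step because the sum overwrites the accumulator that the carry computation also reads, which mirrors the order of the assembly sequence.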
v / _vector_integer_compare_instructions
- Vector Integer Arithmetic Instructions / 12.8. Vector Integer Compare Instructions
Operation | Arguments | Description |
vmseq.vi | vs2, simm5, vd |
# Set if equal
vmseq.vv vd, vs2, vs1, vm # Vector-vector
vmseq.vx vd, vs2, rs1, vm # vector-scalar
vmseq.vi vd, vs2, imm, vm # vector-immediate
# Set if not equal
vmsne.vv vd, vs2, vs1, vm # Vector-vector
vmsne.vx vd, vs2, rs1, vm # vector-scalar
vmsne.vi vd, vs2, imm, vm # vector-immediate
# Set if less than, unsigned
vmsltu.vv vd, vs2, vs1, vm # Vector-vector
vmsltu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if less than, signed
vmslt.vv vd, vs2, vs1, vm # Vector-vector
vmslt.vx vd, vs2, rs1, vm # vector-scalar
# Set if less than or equal, unsigned
vmsleu.vv vd, vs2, vs1, vm # Vector-vector
vmsleu.vx vd, vs2, rs1, vm # vector-scalar
vmsleu.vi vd, vs2, imm, vm # Vector-immediate
# Set if less than or equal, signed
vmsle.vv vd, vs2, vs1, vm # Vector-vector
vmsle.vx vd, vs2, rs1, vm # vector-scalar
vmsle.vi vd, vs2, imm, vm # vector-immediate
# Set if greater than, unsigned
vmsgtu.vx vd, vs2, rs1, vm # Vector-scalar
vmsgtu.vi vd, vs2, imm, vm # Vector-immediate
# Set if greater than, signed
vmsgt.vx vd, vs2, rs1, vm # Vector-scalar
vmsgt.vi vd, vs2, imm, vm # Vector-immediate
# Following two instructions are not provided directly
# Set if greater than or equal, unsigned
# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if greater than or equal, signed
# vmsge.vx vd, vs2, rs1, vm # Vector-scalar |
vmseq.vv | vs2, vs1, vd |
# Set if equal
vmseq.vv vd, vs2, vs1, vm # Vector-vector
vmseq.vx vd, vs2, rs1, vm # vector-scalar
vmseq.vi vd, vs2, imm, vm # vector-immediate
# Set if not equal
vmsne.vv vd, vs2, vs1, vm # Vector-vector
vmsne.vx vd, vs2, rs1, vm # vector-scalar
vmsne.vi vd, vs2, imm, vm # vector-immediate
# Set if less than, unsigned
vmsltu.vv vd, vs2, vs1, vm # Vector-vector
vmsltu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if less than, signed
vmslt.vv vd, vs2, vs1, vm # Vector-vector
vmslt.vx vd, vs2, rs1, vm # vector-scalar
# Set if less than or equal, unsigned
vmsleu.vv vd, vs2, vs1, vm # Vector-vector
vmsleu.vx vd, vs2, rs1, vm # vector-scalar
vmsleu.vi vd, vs2, imm, vm # Vector-immediate
# Set if less than or equal, signed
vmsle.vv vd, vs2, vs1, vm # Vector-vector
vmsle.vx vd, vs2, rs1, vm # vector-scalar
vmsle.vi vd, vs2, imm, vm # vector-immediate
# Set if greater than, unsigned
vmsgtu.vx vd, vs2, rs1, vm # Vector-scalar
vmsgtu.vi vd, vs2, imm, vm # Vector-immediate
# Set if greater than, signed
vmsgt.vx vd, vs2, rs1, vm # Vector-scalar
vmsgt.vi vd, vs2, imm, vm # Vector-immediate
# Following two instructions are not provided directly
# Set if greater than or equal, unsigned
# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if greater than or equal, signed
# vmsge.vx vd, vs2, rs1, vm # Vector-scalar |
vmseq.vx | vs2, rs1, vd |
# Set if equal
vmseq.vv vd, vs2, vs1, vm # Vector-vector
vmseq.vx vd, vs2, rs1, vm # vector-scalar
vmseq.vi vd, vs2, imm, vm # vector-immediate
# Set if not equal
vmsne.vv vd, vs2, vs1, vm # Vector-vector
vmsne.vx vd, vs2, rs1, vm # vector-scalar
vmsne.vi vd, vs2, imm, vm # vector-immediate
# Set if less than, unsigned
vmsltu.vv vd, vs2, vs1, vm # Vector-vector
vmsltu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if less than, signed
vmslt.vv vd, vs2, vs1, vm # Vector-vector
vmslt.vx vd, vs2, rs1, vm # vector-scalar
# Set if less than or equal, unsigned
vmsleu.vv vd, vs2, vs1, vm # Vector-vector
vmsleu.vx vd, vs2, rs1, vm # vector-scalar
vmsleu.vi vd, vs2, imm, vm # Vector-immediate
# Set if less than or equal, signed
vmsle.vv vd, vs2, vs1, vm # Vector-vector
vmsle.vx vd, vs2, rs1, vm # vector-scalar
vmsle.vi vd, vs2, imm, vm # vector-immediate
# Set if greater than, unsigned
vmsgtu.vx vd, vs2, rs1, vm # Vector-scalar
vmsgtu.vi vd, vs2, imm, vm # Vector-immediate
# Set if greater than, signed
vmsgt.vx vd, vs2, rs1, vm # Vector-scalar
vmsgt.vi vd, vs2, imm, vm # Vector-immediate
# Following two instructions are not provided directly
# Set if greater than or equal, unsigned
# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar
# Set if greater than or equal, signed
# vmsge.vx vd, vs2, rs1, vm # Vector-scalar |
vmsgt.vi | vs2, simm5, vd |
Similarly, vmsge{u}.vi is not provided and the compare is implemented using vmsgt{u}.vi with the immediate decremented by one. The resulting effective vmsge.vi range is -15 to 16, and the resulting effective vmsgeu.vi range is 1 to 16 (note that vmsgeu.vi with immediate 0 is not useful, as it is always true).
The vmsge{u}.vx operation can be synthesized by reducing the value of x by 1 and using the vmsgt{u}.vx instruction, when it is known that this will not underflow the representation in x.
Sequences to synthesize `vmsge{u}.vx` instruction
va >= x, x > minimum
  addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm |
vmsgt.vx | vs2, rs1, vd |
Similarly, vmsge{u}.vi is not provided and the compare is implemented using vmsgt{u}.vi with the immediate decremented by one. The resulting effective vmsge.vi range is -15 to 16, and the resulting effective vmsgeu.vi range is 1 to 16 (note that vmsgeu.vi with immediate 0 is not useful, as it is always true).
The vmsge{u}.vx operation can be synthesized by reducing the value of x by 1 and using the vmsgt{u}.vx instruction, when it is known that this will not underflow the representation in x.
Sequences to synthesize `vmsge{u}.vx` instruction
va >= x, x > minimum
  addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm |
vmslt.vv | vs2, vs1, vd |
Comparison   Assembler Mapping                Assembler Pseudoinstruction
va < vb      vmslt{u}.vv vd, va, vb, vm
va <= vb     vmsle{u}.vv vd, va, vb, vm
va > vb      vmslt{u}.vv vd, vb, va, vm       vmsgt{u}.vv vd, va, vb, vm
va >= vb     vmsle{u}.vv vd, vb, va, vm       vmsge{u}.vv vd, va, vb, vm
va < x       vmslt{u}.vx vd, va, x, vm
va <= x      vmsle{u}.vx vd, va, x, vm
va > x       vmsgt{u}.vx vd, va, x, vm
va >= x      see below
va < i       vmsle{u}.vi vd, va, i-1, vm      vmslt{u}.vi vd, va, i, vm
va <= i      vmsle{u}.vi vd, va, i, vm
va > i       vmsgt{u}.vi vd, va, i, vm
va >= i      vmsgt{u}.vi vd, va, i-1, vm      vmsge{u}.vi vd, va, i, vm
va, vb       vector register groups
x            scalar integer register
i            immediate
unmasked va >= x
  pseudoinstruction: vmsge{u}.vx vd, va, x
  expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
masked va >= x, vd != v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t
  expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
masked va >= x, vd == v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vd, vd, vt
masked va >= x, any vd
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vt, v0, vt; vmandn.mm vd, vd, v0; vmor.mm vd, vt, vd
The vt argument to the pseudoinstruction must name a temporary vector register that is not the same as vd and which will be clobbered by the pseudoinstruction.
# (a < b) && (b < c) in two instructions when mask-undisturbed
vmslt.vv v0, va, vb # All body elements written
vmslt.vv v0, vb, vc, v0.t # Only update at set mask |
vmslt.vx | vs2, rs1, vd |
Comparison   Assembler Mapping                Assembler Pseudoinstruction
va < vb      vmslt{u}.vv vd, va, vb, vm
va <= vb     vmsle{u}.vv vd, va, vb, vm
va > vb      vmslt{u}.vv vd, vb, va, vm       vmsgt{u}.vv vd, va, vb, vm
va >= vb     vmsle{u}.vv vd, vb, va, vm       vmsge{u}.vv vd, va, vb, vm
va < x       vmslt{u}.vx vd, va, x, vm
va <= x      vmsle{u}.vx vd, va, x, vm
va > x       vmsgt{u}.vx vd, va, x, vm
va >= x      see below
va < i       vmsle{u}.vi vd, va, i-1, vm      vmslt{u}.vi vd, va, i, vm
va <= i      vmsle{u}.vi vd, va, i, vm
va > i       vmsgt{u}.vi vd, va, i, vm
va >= i      vmsgt{u}.vi vd, va, i-1, vm      vmsge{u}.vi vd, va, i, vm
va, vb       vector register groups
x            scalar integer register
i            immediate
unmasked va >= x
  pseudoinstruction: vmsge{u}.vx vd, va, x
  expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
masked va >= x, vd != v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t
  expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
masked va >= x, vd == v0
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vd, vd, vt
masked va >= x, any vd
  pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt
  expansion: vmslt{u}.vx vt, va, x; vmandn.mm vt, v0, vt; vmandn.mm vd, vd, v0; vmor.mm vd, vt, vd
The vt argument to the pseudoinstruction must name a temporary vector register that is not the same as vd and which will be clobbered by the pseudoinstruction.
# (a < b) && (b < c) in two instructions when mask-undisturbed
vmslt.vv v0, va, vb # All body elements written
vmslt.vv v0, vb, vc, v0.t # Only update at set mask |
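The vmsge{u}.vx expansions listed in this section can be sanity-checked with a bit-level Python model. This is a sketch under stated assumptions: mask registers are lists of 0/1, the masked forms use a mask-undisturbed policy, Python integers stand in for signed elements, and every function name is hypothetical.

```python
def vmslt_vx(va, x, v0=None, vd_old=None):
    """va < x; when v0 is given, inactive elements keep vd_old."""
    if v0 is None:
        return [int(a < x) for a in va]
    return [int(a < x) if m else d for a, m, d in zip(va, v0, vd_old)]


def vmsge_unmasked(va, x):
    vd = vmslt_vx(va, x)                              # vmslt.vx vd, va, x
    return [1 - (p & p) for p in vd]                  # vmnand.mm vd, vd, vd


def vmsge_masked(va, x, v0, vd_old):
    vd = vmslt_vx(va, x, v0, vd_old)                  # vmslt.vx vd, va, x, v0.t
    return [p ^ m for p, m in zip(vd, v0)]            # vmxor.mm vd, vd, v0


def vmsge_masked_any_vd(va, x, v0, vd_old):
    vt = vmslt_vx(va, x)                              # vmslt.vx vt, va, x
    vt = [m & (1 - t) for m, t in zip(v0, vt)]        # vmandn.mm vt, v0, vt
    vd = [d & (1 - m) for d, m in zip(vd_old, v0)]    # vmandn.mm vd, vd, v0
    return [t | d for t, d in zip(vt, vd)]            # vmor.mm vd, vt, vd


def vmsge_by_decrement(va, x):
    t0 = x - 1                                        # addi t0, x, -1
    return [int(a > t0) for a in va]                  # vmsgt.vx vd, va, t0


va = [-3, 5, 7, 2]
v0 = [1, 0, 0, 1]
vd_old = [1, 1, 0, 1]
print(vmsge_unmasked(va, 5), vmsge_masked(va, 5, v0, vd_old))
```

In the masked forms, active elements receive the inverted less-than result while inactive elements keep their previous destination value, which is what the XOR with v0 (or the andn/or pair) achieves.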
v / _vector_integer_divide_instructions
- Vector Integer Arithmetic Instructions / 12.11. Vector Integer Divide Instructions
Operation | Arguments | Description |
vdivu.vv | vs2, vs1, vd |
# Unsigned divide.
vdivu.vv vd, vs2, vs1, vm # Vector-vector
vdivu.vx vd, vs2, rs1, vm # vector-scalar
# Signed divide
vdiv.vv vd, vs2, vs1, vm # Vector-vector
vdiv.vx vd, vs2, rs1, vm # vector-scalar
# Unsigned remainder
vremu.vv vd, vs2, vs1, vm # Vector-vector
vremu.vx vd, vs2, rs1, vm # vector-scalar
# Signed remainder
vrem.vv vd, vs2, vs1, vm # Vector-vector
vrem.vx vd, vs2, rs1, vm # vector-scalar |
vdivu.vx | vs2, rs1, vd |
# Unsigned divide.
vdivu.vv vd, vs2, vs1, vm # Vector-vector
vdivu.vx vd, vs2, rs1, vm # vector-scalar
# Signed divide
vdiv.vv vd, vs2, vs1, vm # Vector-vector
vdiv.vx vd, vs2, rs1, vm # vector-scalar
# Unsigned remainder
vremu.vv vd, vs2, vs1, vm # Vector-vector
vremu.vx vd, vs2, rs1, vm # vector-scalar
# Signed remainder
vrem.vv vd, vs2, vs1, vm # Vector-vector
vrem.vx vd, vs2, rs1, vm # vector-scalar |
v / _vector_integer_merge_instructions
- Vector Integer Arithmetic Instructions / 12.15. Vector Integer Merge Instructions
Operation | Arguments | Description |
vmerge.vim | vs2, simm5, vd |
The vmerge instructions are encoded as masked instructions (vm=0). The instructions combine two sources as follows. At elements where the mask value is zero, the first operand is copied to the destination element, otherwise the second operand is copied to the destination element. The first operand is always a vector register group specified by vs2. The second operand is a vector register group specified by vs1 or a scalar x register specified by rs1 or a 5-bit sign-extended immediate.
vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i]
vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i]
vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i] |
vmerge.vvm | vs2, vs1, vd |
The vmerge instructions are encoded as masked instructions (vm=0). The instructions combine two sources as follows. At elements where the mask value is zero, the first operand is copied to the destination element, otherwise the second operand is copied to the destination element. The first operand is always a vector register group specified by vs2. The second operand is a vector register group specified by vs1 or a scalar x register specified by rs1 or a 5-bit sign-extended immediate.
vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i]
vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i]
vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i] |
vmerge.vxm | vs2, rs1, vd |
The vmerge instructions are encoded as masked instructions (vm=0). The instructions combine two sources as follows. At elements where the mask value is zero, the first operand is copied to the destination element, otherwise the second operand is copied to the destination element. The first operand is always a vector register group specified by vs2. The second operand is a vector register group specified by vs1 or a scalar x register specified by rs1 or a 5-bit sign-extended immediate.
vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i]
vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i]
vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i] |
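The per-element selection described for vmerge can be sketched as a reference model. The function name `vmerge` and the list-based representation of registers are illustrative assumptions; a scalar second operand stands in for both the .vxm and .vim forms.

```python
def vmerge(vs2, second, v0_mask):
    """Model of vmerge.vvm / .vxm / .vim.

    vs2     : list, first operand (chosen where the mask bit is 0)
    second  : list (vvm form) or int (vxm / vim form), chosen where the mask bit is 1
    v0_mask : list of 0/1 mask bits taken from v0
    """
    if isinstance(second, int):
        second = [second] * len(vs2)   # splat the scalar or immediate
    return [s if m else f for f, s, m in zip(vs2, second, v0_mask)]
```

For example, `vmerge([1, 2, 3, 4], [10, 20, 30, 40], [0, 1, 0, 1])` takes elements 1 and 3 from the second operand and the rest from vs2.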
v / _vector_integer_minmax_instructions
- Vector Integer Arithmetic Instructions / 12.9. Vector Integer Min/Max Instructions
Operation | Arguments | Description |
vminu.vv | vs2, vs1, vd |
# Unsigned minimum
vminu.vv vd, vs2, vs1, vm # Vector-vector
vminu.vx vd, vs2, rs1, vm # vector-scalar
# Signed minimum
vmin.vv vd, vs2, vs1, vm # Vector-vector
vmin.vx vd, vs2, rs1, vm # vector-scalar
# Unsigned maximum
vmaxu.vv vd, vs2, vs1, vm # Vector-vector
vmaxu.vx vd, vs2, rs1, vm # vector-scalar
# Signed maximum
vmax.vv vd, vs2, vs1, vm # Vector-vector
vmax.vx vd, vs2, rs1, vm # vector-scalar |
vminu.vx | vs2, rs1, vd |
# Unsigned minimum
vminu.vv vd, vs2, vs1, vm # Vector-vector
vminu.vx vd, vs2, rs1, vm # vector-scalar
# Signed minimum
vmin.vv vd, vs2, vs1, vm # Vector-vector
vmin.vx vd, vs2, rs1, vm # vector-scalar
# Unsigned maximum
vmaxu.vv vd, vs2, vs1, vm # Vector-vector
vmaxu.vx vd, vs2, rs1, vm # vector-scalar
# Signed maximum
vmax.vv vd, vs2, vs1, vm # Vector-vector
vmax.vx vd, vs2, rs1, vm # vector-scalar |
v / _vector_integer_move_instructions
- Vector Integer Arithmetic Instructions / 12.16. Vector Integer Move Instructions
Operation | Arguments | Description |
vmv.s.x | rs1, vd |
The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW.
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = x[rs1]
vmv.v.i vd, imm # vd[i] = imm |
vmv.v.i | simm5, vd |
The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW.
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = x[rs1]
vmv.v.i vd, imm # vd[i] = imm |
vmv.v.v | vs1, vd |
The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW.
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = x[rs1]
vmv.v.i vd, imm # vd[i] = imm |
vmv.v.x | rs1, vd |
The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW.
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = x[rs1]
vmv.v.i vd, imm # vd[i] = imm |
vmv.x.s | vs2, rd |
The vector integer move instructions copy a source operand to a vector register group. The vmv.v.v variant copies a vector register group, whereas the vmv.v.x and vmv.v.i variants splat a scalar register or immediate to all active elements of the destination vector register group. These instructions are encoded as unmasked instructions (vm=1). The first operand specifier (vs2) must contain v0, and any other vector register number in vs2 is reserved. The form vmv.v.v vd, vd, which leaves body elements unchanged, can be used to indicate that the register will next be used with an EEW equal to SEW.
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = x[rs1]
vmv.v.i vd, imm # vd[i] = imm |
v / _vector_iota_instruction
- Vector Mask Instructions / 16.8. Vector Iota Instruction
Operation | Arguments | Description |
viota.m | vs2, vd |
The viota.m instruction reads a source vector mask register and writes to each element of the destination vector register group the sum of all the bits of elements in the mask register whose index is less than the element, i.e., a parallel prefix sum of the mask values. Traps on viota.m are always reported with a vstart of 0, and execution is always restarted from the beginning when resuming after a trap handler. An illegal instruction exception is raised if vstart is non-zero. The viota.m instruction can be combined with memory scatter instructions (indexed stores) to perform vector compress functions.
viota.m vd, vs2, vm
# Example
 7 6 5 4 3 2 1 0   Element number
 1 0 0 1 0 0 0 1   v2 contents
                   viota.m v4, v2 # Unmasked
 2 2 2 1 1 1 1 0   v4 result
 1 1 1 0 1 0 1 1   v0 contents
 1 0 0 1 0 0 0 1   v2 contents
 2 3 4 5 6 7 8 9   v4 contents
                   viota.m v4, v2, v0.t # Masked, vtype.vma=0
 1 1 1 5 1 7 1 0   v4 results
Spike ISS Implementation:
// viota.m vd, vs2, vm
require(P.VU.vsew >= e8 && P.VU.vsew <= e64);
require_vector(true);
reg_t vl = P.VU.vl->read();
reg_t sew = P.VU.vsew;
reg_t rd_num = insn.rd();
reg_t rs2_num = insn.rs2();
require(P.VU.vstart->read() == 0);
require_vm;
require_align(rd_num, P.VU.vflmul);
require_noover(rd_num, P.VU.vflmul, rs2_num, 1);
int cnt = 0;
for (reg_t i = 0; i < vl; ++i) {
  const int midx = i / 64;
  const int mpos = i % 64;
  bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1;
  bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1;
  bool has_one = false;
  if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) {
    if (vs2_lsb) {
      has_one = true;
    }
  }
  bool use_ori = (insn.v_vm() == 0) && !do_mask;
  switch (sew) {
  case e8:
    P.VU.elt<uint8_t>(rd_num, i, true) = use_ori ? P.VU.elt<uint8_t>(rd_num, i) : cnt;
    break;
  case e16:
    P.VU.elt<uint16_t>(rd_num, i, true) = use_ori ? P.VU.elt<uint16_t>(rd_num, i) : cnt;
    break;
  case e32:
    P.VU.elt<uint32_t>(rd_num, i, true) = use_ori ? P.VU.elt<uint32_t>(rd_num, i) : cnt;
    break;
  default:
    P.VU.elt<uint64_t>(rd_num, i, true) = use_ori ? P.VU.elt<uint64_t>(rd_num, i) : cnt;
    break;
  }
  if (has_one) {
    cnt++;
  }
} |
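The prefix-sum behaviour of viota.m, including the masked case from the example, can be reproduced with a short model. This is a sketch: lists are written in element-index order (element 0 first, the reverse of the diagram above), inactive elements are assumed to be left undisturbed (vtype.vma=0), and `viota_m` is a hypothetical helper name.

```python
def viota_m(vs2_mask, v0_mask=None, vd_old=None):
    """Model of viota.m: running count of set bits among active elements.

    vs2_mask : list of 0/1 source mask bits, element 0 first
    v0_mask  : optional list of 0/1 mask bits (None means unmasked)
    vd_old   : previous destination contents, needed only when masked
    """
    count = 0
    out = []
    for i, bit in enumerate(vs2_mask):
        active = v0_mask is None or v0_mask[i] == 1
        if active:
            out.append(count)      # bits at lower active indices only
            count += bit
        else:
            out.append(vd_old[i])  # mask-undisturbed: keep the old value
    return out

# Inputs from the example above, rewritten element 0 first.
v2 = [1, 0, 0, 0, 1, 0, 0, 1]
v0 = [1, 1, 0, 1, 0, 1, 1, 1]
v4_old = [9, 8, 7, 6, 5, 4, 3, 2]
unmasked = viota_m(v2)
masked = viota_m(v2, v0, v4_old)
```

Reading `unmasked` and `masked` from the last element to the first gives the `v4 result` rows of the example.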
v / _vector_loadstore_whole_register_instructions
- Vector Loads and Stores / 8.9. Vector Load/Store Whole Register Instructions
Operation | Arguments | Description |
vl1re8.v | rs1, vd |
# Format of whole register load and store instructions.
vl1r.v v3, (a0) # Pseudoinstruction equal to vl1re8.v
vl1re8.v v3, (a0) # Load v3 with VLEN/8 bytes held at address in a0
vl1re16.v v3, (a0) # Load v3 with VLEN/16 halfwords held at address in a0
vl1re32.v v3, (a0) # Load v3 with VLEN/32 words held at address in a0
vl1re64.v v3, (a0) # Load v3 with VLEN/64 doublewords held at address in a0
vl2r.v v2, (a0) # Pseudoinstruction equal to vl2re8.v v2, (a0)
vl2re8.v v2, (a0) # Load v2-v3 with 2*VLEN/8 bytes from address in a0
vl2re16.v v2, (a0) # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0
vl2re32.v v2, (a0) # Load v2-v3 with 2*VLEN/32 words held at address in a0
vl2re64.v v2, (a0) # Load v2-v3 with 2*VLEN/64 doublewords held at address in a0
vl4r.v v4, (a0) # Pseudoinstruction equal to vl4re8.v
vl4re8.v v4, (a0) # Load v4-v7 with 4*VLEN/8 bytes from address in a0
vl4re16.v v4, (a0)
vl4re32.v v4, (a0)
vl4re64.v v4, (a0)
vl8r.v v8, (a0) # Pseudoinstruction equal to vl8re8.v
vl8re8.v v8, (a0) # Load v8-v15 with 8*VLEN/8 bytes from address in a0
vl8re16.v v8, (a0)
vl8re32.v v8, (a0)
vl8re64.v v8, (a0)
vs1r.v v3, (a1) # Store v3 to address in a1
vs2r.v v2, (a1) # Store v2-v3 to address in a1
vs4r.v v4, (a1) # Store v4-v7 to address in a1
vs8r.v v8, (a1) # Store v8-v15 to address in a1
Spike ISS Implementation:
// vl1re8.v vd, (rs1)
VI_LD_WHOLE(uint8); |
v / _vector_narrowing_fixed_point_clip_instructions
- Vector Fixed-Point Arithmetic Instructions / 13.5. Vector Narrowing Fixed-Point Clip Instructions
Operation | Arguments | Description |
vnclip.wi | vs2, simm5, vd |
The vnclip instructions are used to pack a fixed-point value into a narrower destination. The instructions support rounding, scaling, and saturation into the final destination format. The source data is in the vector register group specified by vs2. The scaling shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. The low lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are used to control the right shift amount, which provides the scaling. For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer. |
vnclip.wv | vs2, vs1, vd |
The vnclip instructions are used to pack a fixed-point value into a narrower destination. The instructions support rounding, scaling, and saturation into the final destination format. The source data is in the vector register group specified by vs2. The scaling shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. The low lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are used to control the right shift amount, which provides the scaling. For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer. |
vnclip.wx | vs2, rs1, vd |
The vnclip instructions are used to pack a fixed-point value into a narrower destination. The instructions support rounding, scaling, and saturation into the final destination format. The source data is in the vector register group specified by vs2. The scaling shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. The low lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are used to control the right shift amount, which provides the scaling. For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer. |
vnclipu.wi | vs2, simm5, vd |
For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. Rounding occurs around the least-significant bit of the destination and before saturation. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer.
# Narrowing unsigned clip
#                             SEW                            2*SEW   SEW
vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i]))
vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1]))
vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm))
# Narrowing signed clip
vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i]))
vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1]))
vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm)) |
vnclipu.wv | vs2, vs1, vd |
For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. Rounding occurs around the least-significant bit of the destination and before saturation. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer.
# Narrowing unsigned clip
#                             SEW                            2*SEW   SEW
vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i]))
vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1]))
vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm))
# Narrowing signed clip
vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i]))
vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1]))
vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm)) |
vnclipu.wx | vs2, rs1, vd |
For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. Rounding occurs around the least-significant bit of the destination and before saturation. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer.
# Narrowing unsigned clip
#                             SEW                            2*SEW   SEW
vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i]))
vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1]))
vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm))
# Narrowing signed clip
vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i]))
vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1]))
vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm)) |
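The clip(roundoff(...)) pipeline for vnclip/vnclipu can be sketched per element as below. The names `roundoff` and `vnclip` and the integer vxrm codes (0=rnu, 1=rne, 2=rdn, 3=rod) follow the fixed-point rounding-mode definition in the V specification as understood here; treat the code as an illustrative model, not Spike's implementation.

```python
def roundoff(v, d, vxrm):
    """Shift v right by d bits, adding the rounding increment selected by vxrm."""
    if d == 0:
        return v
    half = (v >> (d - 1)) & 1                 # most significant dropped bit
    below_half = v & ((1 << (d - 1)) - 1)     # remaining dropped bits
    low = v & ((1 << d) - 1)                  # all dropped bits
    lsb = (v >> d) & 1                        # bit that becomes the result LSB
    if vxrm == 0:      # rnu: round to nearest, ties up
        r = half
    elif vxrm == 1:    # rne: round to nearest, ties to even
        r = half & int(below_half != 0 or lsb == 1)
    elif vxrm == 2:    # rdn: truncate
        r = 0
    else:              # rod: round to odd (jam)
        r = int(lsb == 0 and low != 0)
    return (v >> d) + r

def vnclip(v, shift, sew, vxrm=0, signed=True):
    """Narrow a 2*SEW-bit value v to SEW bits with rounding and saturation."""
    shift &= 2 * sew - 1                      # only the low lg2(2*SEW) bits are used
    r = roundoff(v, shift, vxrm)
    if signed:
        lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    else:
        lo, hi = 0, (1 << sew) - 1
    return max(lo, min(hi, r))
```

Python's `>>` on negative integers is an arithmetic shift, so the same `roundoff` serves both the signed and unsigned forms. For instance, `vnclip(1000, 2, 8)` rounds to 250 and then saturates to 127, while the unsigned form keeps 250.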
v / _vector_register_gather_instructions
- Vector Permutation Instructions / 17.4. Vector Register Gather Instructions
Operation | Arguments | Description |
vrgather.vi | vs2, simm5, vd |
The vrgather.vv form uses SEW/LMUL for both the data and indices. The vrgatherei16.vv form uses SEW/LMUL for the data in vs2 but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in vs1. For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, otherwise the instruction encoding is reserved.
vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm] |
vrgather.vv | vs2, vs1, vd |
The vrgather.vv form uses SEW/LMUL for both the data and indices. The vrgatherei16.vv form uses SEW/LMUL for the data in vs2 but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in vs1. For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, otherwise the instruction encoding is reserved.
vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm] |
vrgather.vx | vs2, rs1, vd |
The vrgather.vv form uses SEW/LMUL for both the data and indices. The vrgatherei16.vv form uses SEW/LMUL for the data in vs2 but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in vs1. For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, otherwise the instruction encoding is reserved.
vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm] |
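The index-with-bounds-check rule shared by the vrgather forms can be modelled as follows. This is a sketch with hypothetical helper names; `vs2` is assumed to hold VLMAX elements, indices are assumed unsigned, and masking is omitted.

```python
def vrgather_vv(vs2, vs1, vlmax):
    """vd[i] = 0 if vs1[i] >= VLMAX else vs2[vs1[i]]."""
    return [0 if idx >= vlmax else vs2[idx] for idx in vs1]

def vrgather_vx(vs2, index, vl, vlmax):
    """Scalar or immediate index: every body element reads the same source element."""
    value = 0 if index >= vlmax else vs2[index]
    return [value] * vl
```

With `vs2 = [10, 11, 12, 13]` and VLMAX of 4, the index vector `[3, 0, 7, 1]` gathers `[13, 10, 0, 11]`: the out-of-range index 7 yields 0.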
v / _vector_register_grouping_vlmul20
- Vector Extension Programmer’s Model / 4.4. Vector type register, vtype
Operation | Arguments | Description |
min | rd, rs1, rs2 |
MIN
Spike ISS Implementation:
require_either_extension(EXT_ZBPBO, EXT_ZBB);
WRITE_RD(sext_xlen(sreg_t(RS1) < sreg_t(RS2) ? RS1 : RS2)); |
v / _vector_single_width_averaging_add_and_subtract
- Vector Fixed-Point Arithmetic Instructions / 13.2. Vector Single-Width Averaging Add and Subtract
Operation | Arguments | Description |
vaaddu.vv | vs2, vs1, vd |
The averaging add and subtract instructions right shift the result by one bit and round off the result according to the setting in vxrm. Both unsigned and signed versions are provided. For vaaddu and vaadd there can be no overflow in the result. For vasub and vasubu, overflow is ignored and the result wraps around.
# Averaging add
# Averaging adds of unsigned integers.
vaaddu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] + vs1[i], 1)
vaaddu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] + x[rs1], 1)
# Averaging adds of signed integers.
vaadd.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] + vs1[i], 1)
vaadd.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] + x[rs1], 1)
# Averaging subtract
# Averaging subtract of unsigned integers.
vasubu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] - vs1[i], 1)
vasubu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] - x[rs1], 1)
# Averaging subtract of signed integers.
vasub.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] - vs1[i], 1)
vasub.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] - x[rs1], 1) |
vaaddu.vx | vs2, rs1, vd |
The averaging add and subtract instructions right shift the result by one bit and round off the result according to the setting in vxrm. Both unsigned and signed versions are provided. For vaaddu and vaadd there can be no overflow in the result. For vasub and vasubu, overflow is ignored and the result wraps around.
# Averaging add
# Averaging adds of unsigned integers.
vaaddu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] + vs1[i], 1)
vaaddu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] + x[rs1], 1)
# Averaging adds of signed integers.
vaadd.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] + vs1[i], 1)
vaadd.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] + x[rs1], 1)
# Averaging subtract
# Averaging subtract of unsigned integers.
vasubu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] - vs1[i], 1)
vasubu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] - x[rs1], 1)
# Averaging subtract of signed integers.
vasub.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] - vs1[i], 1)
vasub.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] - x[rs1], 1) |
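The averaging operations amount to computing the sum or difference at full precision, applying a one-bit roundoff, and keeping the low SEW bits. The sketch below assumes the vxrm codes 0=rnu, 1=rne, 2=rdn, 3=rod; `averaging` and `roundoff1` are hypothetical helper names.

```python
def roundoff1(v, vxrm):
    """One-bit roundoff: (v >> 1) plus the increment selected by vxrm."""
    dropped = v & 1
    lsb = (v >> 1) & 1
    if vxrm == 0:      # rnu
        r = dropped
    elif vxrm == 1:    # rne
        r = dropped & lsb
    elif vxrm == 2:    # rdn
        r = 0
    else:              # rod
        r = int(lsb == 0 and dropped == 1)
    return (v >> 1) + r

def averaging(a, b, sew, vxrm=0, signed=False, subtract=False):
    """Model of vaadd{u} / vasub{u} for a single element."""
    full = a - b if subtract else a + b       # exact, wider than SEW
    r = roundoff1(full, vxrm) & ((1 << sew) - 1)
    if signed and r >= 1 << (sew - 1):
        r -= 1 << sew                         # reinterpret as two's complement
    return r
```

The last step is where the wrap-around mentioned for vasub shows up: `averaging(127, -128, 8, signed=True, subtract=True)` rounds 255/2 up to 128, which wraps to -128 in 8 bits.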
v / _vector_single_width_floating_point_addsubtract_instructions
- Vector Floating-Point Instructions / 14.2. Vector Single-Width Floating-Point Add/Subtract Instructions
Operation | Arguments | Description |
vfadd.vf | vs2, rs1, vd |
# Floating-point add
vfadd.vv vd, vs2, vs1, vm # Vector-vector
vfadd.vf vd, vs2, rs1, vm # vector-scalar
# Floating-point subtract
vfsub.vv vd, vs2, vs1, vm # Vector-vector
vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1]
vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i] |
vfadd.vv | vs2, vs1, vd |
# Floating-point add
vfadd.vv vd, vs2, vs1, vm # Vector-vector
vfadd.vf vd, vs2, rs1, vm # vector-scalar
# Floating-point subtract
vfsub.vv vd, vs2, vs1, vm # Vector-vector
vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1]
vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i] |
v / _vector_single_width_floating_point_fused_multiply_add_instructions
- Vector Floating-Point Instructions / 14.6. Vector Single-Width Floating-Point Fused Multiply-Add Instructions
Operation | Arguments | Description |
vfmacc.vf | vs2, rs1, vd |
# FP multiply-accumulate, overwrites addend
vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
# FP negate-(multiply-accumulate), overwrites subtrahend
vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
# FP multiply-subtract-accumulator, overwrites subtrahend
vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
# FP negate-(multiply-subtract-accumulator), overwrites minuend
vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
# FP multiply-add, overwrites multiplicand
vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i]
vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i]
# FP negate-(multiply-add), overwrites multiplicand
vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i]
vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i]
# FP multiply-sub, overwrites multiplicand
vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i]
vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i]
# FP negate-(multiply-sub), overwrites multiplicand
vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i] |
vfmacc.vv | vs2, vs1, vd |
# FP multiply-accumulate, overwrites addend
vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
# FP negate-(multiply-accumulate), overwrites subtrahend
vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
# FP multiply-subtract-accumulator, overwrites subtrahend
vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
# FP negate-(multiply-subtract-accumulator), overwrites minuend
vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
# FP multiply-add, overwrites multiplicand
vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i]
vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i]
# FP negate-(multiply-add), overwrites multiplicand
vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i]
vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i]
# FP multiply-sub, overwrites multiplicand
vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i]
vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i]
# FP negate-(multiply-sub), overwrites multiplicand
vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i] |
v / _vector_single_width_floating_point_multiplydivide_instructions
- Vector Floating-Point Instructions / 14.4. Vector Single-Width Floating-Point Multiply/Divide Instructions
Operation | Arguments | Description |
vfmul.vf | vs2, rs1, vd |
# Floating-point multiply
vfmul.vv vd, vs2, vs1, vm # Vector-vector
vfmul.vf vd, vs2, rs1, vm # vector-scalar
# Floating-point divide
vfdiv.vv vd, vs2, vs1, vm # Vector-vector
vfdiv.vf vd, vs2, rs1, vm # vector-scalar
# Reverse floating-point divide vector = scalar / vector
vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i] |
vfmul.vv | vs2, vs1, vd |
# Floating-point multiply
vfmul.vv vd, vs2, vs1, vm # Vector-vector
vfmul.vf vd, vs2, rs1, vm # vector-scalar
# Floating-point divide
vfdiv.vv vd, vs2, vs1, vm # Vector-vector
vfdiv.vf vd, vs2, rs1, vm # vector-scalar
# Reverse floating-point divide vector = scalar / vector
vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i] |
v / _vector_single_width_fractional_multiply_with_rounding_and_saturation
- Vector Fixed-Point Arithmetic Instructions / 13.3. Vector Single-Width Fractional Multiply with Rounding and Saturation
Operation | Arguments | Description |
vsmul.vv | vs2, vs1, vd |
# Signed saturating and rounding fractional multiply
# See vxrm description for rounding calculation
vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1))
vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1)) |
vsmul.vx | vs2, rs1, vd |
# Signed saturating and rounding fractional multiply
# See vxrm description for rounding calculation
vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1))
vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1)) |
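The formula clip(roundoff_signed(vs2[i]*vs1[i], SEW-1)) treats both operands as signed fractions with SEW-1 fractional bits. A per-element sketch follows, with the vxrm codes assumed to be 0=rnu, 1=rne, 2=rdn, 3=rod and `vsmul` / `roundoff` as hypothetical helper names.

```python
def roundoff(v, d, vxrm):
    """Shift v right by d (d >= 1) with the rounding increment selected by vxrm."""
    half = (v >> (d - 1)) & 1
    below_half = v & ((1 << (d - 1)) - 1)
    low = v & ((1 << d) - 1)
    lsb = (v >> d) & 1
    if vxrm == 0:
        r = half
    elif vxrm == 1:
        r = half & int(below_half != 0 or lsb == 1)
    elif vxrm == 2:
        r = 0
    else:
        r = int(lsb == 0 and low != 0)
    return (v >> d) + r

def vsmul(a, b, sew, vxrm=0):
    """Signed fractional multiply of two SEW-bit values with saturation."""
    product = a * b                              # exact 2*SEW-bit product
    r = roundoff(product, sew - 1, vxrm)
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    return max(lo, min(hi, r))                   # clip to the signed SEW range
```

With SEW=8, `vsmul(64, 64, 8)` is 0.5 * 0.5 = 0.25, i.e. 32. The only product that needs saturation is `vsmul(-128, -128, 8)`, where (-1.0) * (-1.0) would be +1.0 and is clipped to 127.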
v / _vector_single_width_integer_add_and_subtract
- Vector Integer Arithmetic Instructions / 12.1. Vector Single-Width Integer Add and Subtract
Operation | Arguments | Description |
vadd.vi | vs2, simm5, vd |
# Integer adds.
vadd.vv vd, vs2, vs1, vm # Vector-vector
vadd.vx vd, vs2, rs1, vm # vector-scalar
vadd.vi vd, vs2, imm, vm # vector-immediate
# Integer subtract
vsub.vv vd, vs2, vs1, vm # Vector-vector
vsub.vx vd, vs2, rs1, vm # vector-scalar
# Integer reverse subtract
vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i]
vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i] |
vadd.vv | vs2, vs1, vd |
# Integer adds.
vadd.vv vd, vs2, vs1, vm # Vector-vector
vadd.vx vd, vs2, rs1, vm # vector-scalar
vadd.vi vd, vs2, imm, vm # vector-immediate
# Integer subtract
vsub.vv vd, vs2, vs1, vm # Vector-vector
vsub.vx vd, vs2, rs1, vm # vector-scalar
# Integer reverse subtract
vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i]
vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i] |
vadd.vx | vs2, rs1, vd |
# Integer adds.
vadd.vv vd, vs2, vs1, vm # Vector-vector
vadd.vx vd, vs2, rs1, vm # vector-scalar
vadd.vi vd, vs2, imm, vm # vector-immediate
# Integer subtract
vsub.vv vd, vs2, vs1, vm # Vector-vector
vsub.vx vd, vs2, rs1, vm # vector-scalar
# Integer reverse subtract
vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i]
vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i] |
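The operand order of the reverse-subtract forms is easy to misread, so a small model may help. It assumes results wrap modulo 2^SEW and are shown as unsigned bit patterns; the function names and the list-or-scalar second operand are illustrative choices.

```python
def _operand(op, n):
    """Expand a scalar or immediate to n elements; pass a vector through."""
    return [op] * n if isinstance(op, int) else op

def vadd(vs2, op, sew):
    mask = (1 << sew) - 1
    return [(a + b) & mask for a, b in zip(vs2, _operand(op, len(vs2)))]

def vsub(vs2, op, sew):
    mask = (1 << sew) - 1
    return [(a - b) & mask for a, b in zip(vs2, _operand(op, len(vs2)))]

def vrsub(vs2, op, sew):
    """Reverse subtract: the scalar or immediate is the minuend."""
    mask = (1 << sew) - 1
    return [(b - a) & mask for a, b in zip(vs2, _operand(op, len(vs2)))]
```

A common use of the reversed form is negation: `vrsub([1, 2, 3], 0, 8)` computes 0 - vs2[i], giving the 8-bit patterns 255, 254 and 253.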
v / _vector_single_width_integer_multiply_add_instructions
- Vector Integer Arithmetic Instructions / 12.13. Vector Single-Width Integer Multiply-Add Instructions
Operation | Arguments | Description |
vmacc.vv | vs2, vs1, vd |
The integer multiply-add instructions are destructive and are provided in two forms, one that overwrites the addend or minuend (vmacc, vnmsac) and one that overwrites the first multiplicand (vmadd, vnmsub).
# Integer multiply-add, overwrite addend
vmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
# Integer multiply-sub, overwrite minuend
vnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vnmsac.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vs2[i]) + vd[i]
# Integer multiply-add, overwrite multiplicand
vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i]
vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i]
# Integer multiply-sub, overwrite multiplicand
vnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vnmsub.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vd[i]) + vs2[i] |
vmacc.vx | vs2, rs1, vd |
The integer multiply-add instructions are destructive and are provided in two forms, one that overwrites the addend or minuend (vmacc, vnmsac) and one that overwrites the first multiplicand (vmadd, vnmsub).
# Integer multiply-add, overwrite addend
vmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
# Integer multiply-sub, overwrite minuend
vnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vnmsac.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vs2[i]) + vd[i]
# Integer multiply-add, overwrite multiplicand
vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i]
vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i]
# Integer multiply-sub, overwrite multiplicand
vnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vnmsub.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vd[i]) + vs2[i] |
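The difference between the two destructive forms is which role the old vd value plays. The sketch below models the .vv variants on lists, assuming results wrap modulo 2^SEW and are shown as unsigned bit patterns; the Python function names simply mirror the mnemonics.

```python
def vmacc(vd, vs1, vs2, sew):
    """vd is the addend: vd[i] = (vs1[i] * vs2[i]) + vd[i]."""
    m = (1 << sew) - 1
    return [(a * b + d) & m for d, a, b in zip(vd, vs1, vs2)]

def vnmsac(vd, vs1, vs2, sew):
    """vd is the minuend: vd[i] = -(vs1[i] * vs2[i]) + vd[i]."""
    m = (1 << sew) - 1
    return [(-(a * b) + d) & m for d, a, b in zip(vd, vs1, vs2)]

def vmadd(vd, vs1, vs2, sew):
    """vd is a multiplicand: vd[i] = (vs1[i] * vd[i]) + vs2[i]."""
    m = (1 << sew) - 1
    return [(a * d + b) & m for d, a, b in zip(vd, vs1, vs2)]

def vnmsub(vd, vs1, vs2, sew):
    """vd is a multiplicand: vd[i] = -(vs1[i] * vd[i]) + vs2[i]."""
    m = (1 << sew) - 1
    return [(-(a * d) + b) & m for d, a, b in zip(vd, vs1, vs2)]
```

Given the same inputs `vd=[1, 2]`, `vs1=[3, 4]`, `vs2=[5, 6]`, vmacc produces `[16, 26]` while vmadd produces `[8, 14]`, which shows why the choice of form depends on which operand can be overwritten.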
v / _vector_single_width_integer_multiply_instructions
- Vector Integer Arithmetic Instructions / 12.10. Vector Single-Width Integer Multiply Instructions
Operation | Arguments | Description |
vmul.vv | vs2, vs1, vd |
# Signed multiply, returning low bits of product
vmul.vv vd, vs2, vs1, vm # Vector-vector
vmul.vx vd, vs2, rs1, vm # vector-scalar
# Signed multiply, returning high bits of product
vmulh.vv vd, vs2, vs1, vm # Vector-vector
vmulh.vx vd, vs2, rs1, vm # vector-scalar
# Unsigned multiply, returning high bits of product
vmulhu.vv vd, vs2, vs1, vm # Vector-vector
vmulhu.vx vd, vs2, rs1, vm # vector-scalar
# Signed(vs2)-Unsigned multiply, returning high bits of product
vmulhsu.vv vd, vs2, vs1, vm # Vector-vector
vmulhsu.vx vd, vs2, rs1, vm # vector-scalar |
vmul.vx | vs2, rs1, vd |
# Signed multiply, returning low bits of product vmul.vv vd, vs2, vs1, vm # Vector-vector vmul.vx vd, vs2, rs1, vm # vector-scalar # Signed multiply, returning high bits of product vmulh.vv vd, vs2, vs1, vm # Vector-vector vmulh.vx vd, vs2, rs1, vm # vector-scalar # Unsigned multiply, returning high bits of product vmulhu.vv vd, vs2, vs1, vm # Vector-vector vmulhu.vx vd, vs2, rs1, vm # vector-scalar # Signed(vs2)-Unsigned multiply, returning high bits of product vmulhsu.vv vd, vs2, vs1, vm # Vector-vector vmulhsu.vx vd, vs2, rs1, vm # vector-scalar |
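To make the low/high distinction concrete, here is a small Python sketch of the four multiply flavours for a single element. The helper names and the convention of passing raw SEW-bit patterns as non-negative integers are assumptions of this sketch, not part of the listing.

```python
def _signed(x, sew):
    """Interpret the low `sew` bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x

def _unsigned(x, sew):
    return x & ((1 << sew) - 1)

def vmul(a, b, sew):
    # Low SEW bits of the product (identical for signed and unsigned inputs).
    return (a * b) & ((1 << sew) - 1)

def vmulh(a, b, sew):
    # High SEW bits of the signed x signed product.
    return ((_signed(a, sew) * _signed(b, sew)) >> sew) & ((1 << sew) - 1)

def vmulhu(a, b, sew):
    # High SEW bits of the unsigned x unsigned product.
    return ((_unsigned(a, sew) * _unsigned(b, sew)) >> sew) & ((1 << sew) - 1)

def vmulhsu(a, b, sew):
    # High SEW bits of signed(vs2) x unsigned(vs1 or rs1).
    return ((_signed(a, sew) * _unsigned(b, sew)) >> sew) & ((1 << sew) - 1)
```

With SEW=8 and both inputs 0xFF, the low byte is 0x01 in every case, while the high byte depends on how each operand is interpreted.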
v / _vector_single_width_saturating_add_and_subtract
- Vector Fixed-Point Arithmetic Instructions / 13.1. Vector Single-Width Saturating Add and Subtract
Operation | Arguments | Description |
vsaddu.vi | vs2, simm5, vd |
# Saturating adds of unsigned integers. vsaddu.vv vd, vs2, vs1, vm # Vector-vector vsaddu.vx vd, vs2, rs1, vm # vector-scalar vsaddu.vi vd, vs2, imm, vm # vector-immediate # Saturating adds of signed integers. vsadd.vv vd, vs2, vs1, vm # Vector-vector vsadd.vx vd, vs2, rs1, vm # vector-scalar vsadd.vi vd, vs2, imm, vm # vector-immediate # Saturating subtract of unsigned integers. vssubu.vv vd, vs2, vs1, vm # Vector-vector vssubu.vx vd, vs2, rs1, vm # vector-scalar # Saturating subtract of signed integers. vssub.vv vd, vs2, vs1, vm # Vector-vector vssub.vx vd, vs2, rs1, vm # vector-scalar |
vsaddu.vv | vs2, vs1, vd |
# Saturating adds of unsigned integers. vsaddu.vv vd, vs2, vs1, vm # Vector-vector vsaddu.vx vd, vs2, rs1, vm # vector-scalar vsaddu.vi vd, vs2, imm, vm # vector-immediate # Saturating adds of signed integers. vsadd.vv vd, vs2, vs1, vm # Vector-vector vsadd.vx vd, vs2, rs1, vm # vector-scalar vsadd.vi vd, vs2, imm, vm # vector-immediate # Saturating subtract of unsigned integers. vssubu.vv vd, vs2, vs1, vm # Vector-vector vssubu.vx vd, vs2, rs1, vm # vector-scalar # Saturating subtract of signed integers. vssub.vv vd, vs2, vs1, vm # Vector-vector vssub.vx vd, vs2, rs1, vm # vector-scalar |
vsaddu.vx | vs2, rs1, vd |
# Saturating adds of unsigned integers. vsaddu.vv vd, vs2, vs1, vm # Vector-vector vsaddu.vx vd, vs2, rs1, vm # vector-scalar vsaddu.vi vd, vs2, imm, vm # vector-immediate # Saturating adds of signed integers. vsadd.vv vd, vs2, vs1, vm # Vector-vector vsadd.vx vd, vs2, rs1, vm # vector-scalar vsadd.vi vd, vs2, imm, vm # vector-immediate # Saturating subtract of unsigned integers. vssubu.vv vd, vs2, vs1, vm # Vector-vector vssubu.vx vd, vs2, rs1, vm # vector-scalar # Saturating subtract of signed integers. vssub.vv vd, vs2, vs1, vm # Vector-vector vssub.vx vd, vs2, rs1, vm # vector-scalar |
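The saturating behaviour can be sketched in Python as a clamp to the representable range for the element width. The function names and the use of ordinary Python integers for element values are assumptions of this sketch; masking and any status-flag side effects are not modelled.

```python
def _clamp(value, lo, hi):
    return max(lo, min(hi, value))

def vsaddu(a, b, sew):
    # Unsigned saturating add: clamp to [0, 2**sew - 1].
    return _clamp(a + b, 0, (1 << sew) - 1)

def vsadd(a, b, sew):
    # Signed saturating add: clamp to [-2**(sew-1), 2**(sew-1) - 1].
    return _clamp(a + b, -(1 << (sew - 1)), (1 << (sew - 1)) - 1)

def vssubu(a, b, sew):
    # Unsigned saturating subtract: results below zero become 0.
    return _clamp(a - b, 0, (1 << sew) - 1)

def vssub(a, b, sew):
    # Signed saturating subtract.
    return _clamp(a - b, -(1 << (sew - 1)), (1 << (sew - 1)) - 1)
```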
v / _vector_single_width_scaling_shift_instructions
- Vector Fixed-Point Arithmetic Instructions / 13.4. Vector Single-Width Scaling Shift Instructions
Operation | Arguments | Description |
vssrl.vi | vs2, simm5, vd |
These instructions shift the input value right, and round off the shifted out bits according to vxrm. The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms. The data to be shifted is in the vector register group specified by vs2 and the shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. Only the low lg2(SEW) bits of the shift-amount value are used to control the shift amount. # Scaling shift right logical vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i]) vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1]) vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm) # Scaling shift right arithmetic vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i]) vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1]) vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm) |
vssrl.vv | vs2, vs1, vd |
These instructions shift the input value right, and round off the shifted out bits according to vxrm. The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms. The data to be shifted is in the vector register group specified by vs2 and the shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. Only the low lg2(SEW) bits of the shift-amount value are used to control the shift amount. # Scaling shift right logical vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i]) vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1]) vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm) # Scaling shift right arithmetic vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i]) vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1]) vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm) |
vssrl.vx | vs2, rs1, vd |
These instructions shift the input value right, and round off the shifted out bits according to vxrm. The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms. The data to be shifted is in the vector register group specified by vs2 and the shift amount value can come from a vector register group vs1, a scalar integer register rs1, or a zero-extended 5-bit immediate. Only the low lg2(SEW) bits of the shift-amount value are used to control the shift amount. # Scaling shift right logical vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i]) vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1]) vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm) # Scaling shift right arithmetic vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i]) vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1]) vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm) |
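The roundoff step described above can be sketched in Python. The four mode names (`rnu`, `rne`, `rdn`, `rod`) and their increment rules are recalled from the fixed-point rounding-mode definition in the vector specification rather than from this listing, so treat them as an assumption to verify; the function names are hypothetical.

```python
def _round_increment(v, d, vxrm):
    """Rounding increment r when shifting v right by d bits under mode vxrm."""
    if d == 0:
        return 0  # nothing is shifted out

    def bit(n):
        return (v >> n) & 1

    if vxrm == "rnu":  # round-to-nearest-up: add the most significant shifted-out bit
        return bit(d - 1)
    if vxrm == "rne":  # round-to-nearest-even
        below_half = v & ((1 << (d - 1)) - 1)  # v[d-2:0]
        return bit(d - 1) & int(below_half != 0 or bit(d) == 1)
    if vxrm == "rdn":  # round-down (truncate)
        return 0
    if vxrm == "rod":  # round-to-odd (jam)
        shifted_out = v & ((1 << d) - 1)  # v[d-1:0]
        return int(bit(d) == 0 and shifted_out != 0)
    raise ValueError("unknown rounding mode: " + vxrm)

def vssrl(value, shamt, sew, vxrm):
    d = shamt & (sew - 1)            # only the low lg2(SEW) bits are used
    v = value & ((1 << sew) - 1)     # zero-extended source
    return (v >> d) + _round_increment(v, d, vxrm)

def vssra(value, shamt, sew, vxrm):
    d = shamt & (sew - 1)
    v = value & ((1 << sew) - 1)
    if v >> (sew - 1):               # sign-extend the source
        v -= 1 << sew
    return (v >> d) + _round_increment(v, d, vxrm)
```

Shifting 2 right by 2 (the exact value 0.5) separates the modes: `rnu` and `rod` give 1, while `rne` and `rdn` give 0.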
v / _vector_single_width_shift_instructions
- Vector Integer Arithmetic Instructions / 12.6. Vector Single-Width Shift Instructions
Operation | Arguments | Description |
vsll.vi | vs2, simm5, vd |
# Bit shift operations vsll.vv vd, vs2, vs1, vm # Vector-vector vsll.vx vd, vs2, rs1, vm # vector-scalar vsll.vi vd, vs2, uimm, vm # vector-immediate vsrl.vv vd, vs2, vs1, vm # Vector-vector vsrl.vx vd, vs2, rs1, vm # vector-scalar vsrl.vi vd, vs2, uimm, vm # vector-immediate vsra.vv vd, vs2, vs1, vm # Vector-vector vsra.vx vd, vs2, rs1, vm # vector-scalar vsra.vi vd, vs2, uimm, vm # vector-immediate |
vsll.vv | vs2, vs1, vd |
# Bit shift operations vsll.vv vd, vs2, vs1, vm # Vector-vector vsll.vx vd, vs2, rs1, vm # vector-scalar vsll.vi vd, vs2, uimm, vm # vector-immediate vsrl.vv vd, vs2, vs1, vm # Vector-vector vsrl.vx vd, vs2, rs1, vm # vector-scalar vsrl.vi vd, vs2, uimm, vm # vector-immediate vsra.vv vd, vs2, vs1, vm # Vector-vector vsra.vx vd, vs2, rs1, vm # vector-scalar vsra.vi vd, vs2, uimm, vm # vector-immediate |
vsll.vx | vs2, rs1, vd |
# Bit shift operations vsll.vv vd, vs2, vs1, vm # Vector-vector vsll.vx vd, vs2, rs1, vm # vector-scalar vsll.vi vd, vs2, uimm, vm # vector-immediate vsrl.vv vd, vs2, vs1, vm # Vector-vector vsrl.vx vd, vs2, rs1, vm # vector-scalar vsrl.vi vd, vs2, uimm, vm # vector-immediate vsra.vv vd, vs2, vs1, vm # Vector-vector vsra.vx vd, vs2, rs1, vm # vector-scalar vsra.vi vd, vs2, uimm, vm # vector-immediate |
v / _vector_slide1down_instruction
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vfslide1down.vf | vs2, rs1, vd |
The vfslide1down instruction is defined analogously, but sources its scalar argument from an f register. |
vslide1down.vx | vs2, rs1, vd |
The vslide1down instruction copies the first vl-1 active element values from index i+1 in the source vector register group to index i in the destination vector register group. The vslide1down instruction places the x register argument at location vl-1 in the destination vector register, provided that element vl-1 is active, otherwise the destination element is unchanged. If XLEN < SEW, the value is sign-extended to SEW bits. If XLEN > SEW, the least-significant SEW bits are copied over and the high XLEN-SEW bits are ignored. vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1] vfslide1down.vf vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=f[rs1] vslide1down behavior: i < vstart: unchanged; vstart <= i < vl-1: vd[i] = vs2[i+1] if v0.mask[i] enabled; vstart <= i = vl-1: vd[vl-1] = x[rs1] if v0.mask[i] enabled; vl <= i < VLMAX: follow tail policy |
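The vslide1down behavior listed above can be sketched in Python on plain lists. The function name and parameters are hypothetical, and the sketch assumes that masked-off and tail elements are simply left undisturbed, which is only one of the permitted policies.

```python
def vslide1down(vd, vs2, scalar, vl, vstart=0, mask=None):
    """Return the new destination list; vd and vs2 hold VLMAX elements each."""
    out = list(vd)
    for i in range(vstart, vl):       # empty when vstart >= vl, so nothing changes
        if mask is not None and not mask[i]:
            continue                  # inactive element: left undisturbed here
        # Element vl-1 receives the scalar, all others take their upper neighbour.
        out[i] = scalar if i == vl - 1 else vs2[i + 1]
    return out                        # elements at index >= vl are untouched (tail)
```

For example, with `vl=4` the first three destination elements come from `vs2[1..3]` and element 3 receives the scalar.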
v / _vector_slide1up
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vfslide1up.vf | vs2, rs1, vd |
The vfslide1up instruction is defined analogously, but sources its scalar argument from an f register. |
vslide1up.vx | vs2, rs1, vd |
The vslide1up instruction places the x register argument at location 0 of the destination vector register group, provided that element 0 is active, otherwise the destination element update follows the current mask agnostic/undisturbed policy. If XLEN < SEW, the value is sign-extended to SEW bits. If XLEN > SEW, the least-significant SEW bits are copied over and the high XLEN-SEW bits are ignored. The vslide1up instruction requires that the destination vector register group does not overlap the source vector register group. Otherwise, the instruction encoding is reserved. vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i] vfslide1up.vf vd, vs2, rs1, vm # vd[0]=f[rs1], vd[i+1] = vs2[i] vslide1up behavior: i < vstart: unchanged; 0 = i = vstart: vd[i] = x[rs1] if v0.mask[i] enabled; max(vstart, 1) <= i < vl: vd[i] = vs2[i-1] if v0.mask[i] enabled; vl <= i < VLMAX: follow tail policy |
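A minimal Python sketch of the vslide1up behavior above, using plain lists. The function name and parameters are hypothetical; the sketch treats masked-off and tail elements as undisturbed and returns a fresh list, so the source/destination overlap restriction does not arise.

```python
def vslide1up(vd, vs2, scalar, vl, vstart=0, mask=None):
    """Return the new destination list; vd and vs2 hold VLMAX elements each."""
    out = list(vd)
    for i in range(vstart, vl):       # elements below vstart are unchanged
        if mask is not None and not mask[i]:
            continue                  # inactive element: left undisturbed here
        # Element 0 receives the scalar, all others take their lower neighbour.
        out[i] = scalar if i == 0 else vs2[i - 1]
    return out
```

Note that the scalar is only written when `vstart` is 0, matching the `0 = i = vstart` row.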
v / _vector_slide_instructions
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vslideup.vi | vs2, simm5, vd |
For all of the vslideup, vslidedown, v[f]slide1up, and v[f]slide1down instructions, if vstart >= vl, the instruction performs no operation and leaves the destination vector register unchanged. |
vslideup.vx | vs2, rs1, vd |
For all of the vslideup, vslidedown, v[f]slide1up, and v[f]slide1down instructions, if vstart >= vl, the instruction performs no operation and leaves the destination vector register unchanged. |
v / _vector_slidedown_instructions
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vslidedown.vi | vs2, simm5, vd |
For vslidedown, the value in vl specifies the maximum number of destination elements that are written. The remaining elements past vl are handled according to the current tail policy (Section Vector Tail Agnostic and Vector Mask Agnostic vta and vma). vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+rs1] vslidedown.vi vd, vs2, uimm, vm # vd[i] = vs2[i+uimm] vslidedown behavior for source elements for element i in slide: 0 <= i+OFFSET < VLMAX: src[i] = vs2[i+OFFSET]; VLMAX <= i+OFFSET: src[i] = 0. vslidedown behavior for destination element i in slide: 0 <= i < vstart: unchanged; vstart <= i < vl: vd[i] = src[i] if v0.mask[i] enabled; vl <= i < VLMAX: follow tail policy |
vslidedown.vx | vs2, rs1, vd |
For vslidedown, the value in vl specifies the maximum number of destination elements that are written. The remaining elements past vl are handled according to the current tail policy (Section Vector Tail Agnostic and Vector Mask Agnostic vta and vma). vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+rs1] vslidedown.vi vd, vs2, uimm, vm # vd[i] = vs2[i+uimm] vslidedown behavior for source elements for element i in slide: 0 <= i+OFFSET < VLMAX: src[i] = vs2[i+OFFSET]; VLMAX <= i+OFFSET: src[i] = 0. vslidedown behavior for destination element i in slide: 0 <= i < vstart: unchanged; vstart <= i < vl: vd[i] = src[i] if v0.mask[i] enabled; vl <= i < VLMAX: follow tail policy |
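The source and destination rules for vslidedown can be combined into a short Python sketch. The function name and parameters are hypothetical, and masked-off and tail elements are assumed to stay undisturbed.

```python
def vslidedown(vd, vs2, offset, vl, vlmax, vstart=0, mask=None):
    """Return the new destination list; vd and vs2 hold vlmax elements each."""
    out = list(vd)
    for i in range(vstart, vl):
        if mask is not None and not mask[i]:
            continue                  # inactive element: left undisturbed here
        # Source elements at or beyond VLMAX read as zero.
        out[i] = vs2[i + offset] if i + offset < vlmax else 0
    return out
```

With `vlmax=8`, `vl=6` and an offset of 3, element 5 would read source index 8, which is out of range and therefore yields 0.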
v / _vector_strided_instructions
- Vector Loads and Stores / 8.5. Vector Strided Instructions
Operation | Arguments | Description |
vlse8.v | rs2, rs1, vd |
# Vector strided loads and stores # vd destination, rs1 base address, rs2 byte stride vlse8.v vd, (rs1), rs2, vm # 8-bit strided load vlse16.v vd, (rs1), rs2, vm # 16-bit strided load vlse32.v vd, (rs1), rs2, vm # 32-bit strided load vlse64.v vd, (rs1), rs2, vm # 64-bit strided load # vs3 store data, rs1 base address, rs2 byte stride vsse8.v vs3, (rs1), rs2, vm # 8-bit strided store vsse16.v vs3, (rs1), rs2, vm # 16-bit strided store vsse32.v vs3, (rs1), rs2, vm # 32-bit strided store vsse64.v vs3, (rs1), rs2, vm # 64-bit strided store Spike ISS Implementation:// vlse8.v and vlsseg[2-8]e8.v VI_LD(i * RS2, fn, int8, false); |
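The address calculation behind a strided load (element i is read from base + i * stride, as in the `i * RS2` term of the Spike snippet) can be sketched in Python. The `memory` bytes object, the function name, and the little-endian element layout are assumptions of this sketch; masking and `vstart` are omitted.

```python
def vlse(memory, base, stride, vl, eew_bits):
    """Load vl elements of eew_bits each, spaced `stride` bytes apart."""
    nbytes = eew_bits // 8
    elements = []
    for i in range(vl):
        addr = base + i * stride      # the stride is a byte stride, not an element count
        elements.append(int.from_bytes(memory[addr:addr + nbytes], "little"))
    return elements
```

For example, over a memory image holding the bytes 0..31, an 8-bit load with stride 4 picks every fourth byte.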
v / _vector_unit_stride_instructions
- Vector Loads and Stores / 8.4. Vector Unit-Stride Instructions
Operation | Arguments | Description |
vle8.v | rs1, vd |
# Vector unit-stride loads and stores # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>) vle8.v vd, (rs1), vm # 8-bit unit-stride load vle16.v vd, (rs1), vm # 16-bit unit-stride load vle32.v vd, (rs1), vm # 32-bit unit-stride load vle64.v vd, (rs1), vm # 64-bit unit-stride load # vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>) vse8.v vs3, (rs1), vm # 8-bit unit-stride store vse16.v vs3, (rs1), vm # 16-bit unit-stride store vse32.v vs3, (rs1), vm # 32-bit unit-stride store vse64.v vs3, (rs1), vm # 64-bit unit-stride store Spike ISS Implementation:// vle8.v and vlseg[2-8]e8.v VI_LD(0, (i * nf + fn), int8, false); |
vlm.v | rs1, vd |
vlm.v and vsm.v are encoded with the same width[2:0]=0 encoding as vle8.v and vse8.v, but are distinguished by different lumop and sumop encodings. Since vlm.v and vsm.v operate as byte loads and stores, vstart is in units of bytes for these instructions. # Vector unit-stride mask load vlm.v vd, (rs1) # Load byte vector of length ceil(vl/8) # Vector unit-stride mask store vsm.v vs3, (rs1) # Store byte vector of length ceil(vl/8) Spike ISS Implementation:// vle1.v and vlseg[2-8]e8.v VI_LD(0, (i * nf + fn), int8, true); |
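As a small illustration of the ceil(vl/8) byte count used by vlm.v and vsm.v, the Python sketch below computes the transfer length and unpacks the loaded bytes into mask bits. The function names and the `memory` bytes object are hypothetical, and the bit ordering (mask element i in bit i%8 of byte i//8) is an assumption of this sketch.

```python
def mask_byte_count(vl):
    """Number of bytes transferred by a mask load/store: ceil(vl / 8)."""
    return (vl + 7) // 8

def vlm(memory, base, vl):
    """Load the mask bytes and return the first vl mask bits as a list."""
    data = memory[base:base + mask_byte_count(vl)]
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(vl)]
```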
v / _vector_unordered_single_width_floating_point_sum_reduction
- Vector Reduction Operations / 15.3. Vector Single-Width Floating-Point Reduction Instructions
Operation | Arguments | Description |
vfredusum.vs | vs2, vs1, vd |
The unordered sum reduction instruction, vfredusum, provides an implementation more freedom in performing the reduction. |
v / _vector_widening_floating_point_addsubtract_instructions
- Vector Floating-Point Instructions / 14.3. Vector Widening Floating-Point Add/Subtract Instructions
Operation | Arguments | Description |
vfwadd.vf | vs2, rs1, vd |
# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar |
vfwadd.vv | vs2, vs1, vd |
# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar |
vfwadd.wf | vs2, rs1, vd |
# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar |
vfwadd.wv | vs2, vs1, vd |
# Widening FP add/subtract, 2*SEW = SEW +/- SEW vfwadd.vv vd, vs2, vs1, vm # vector-vector vfwadd.vf vd, vs2, rs1, vm # vector-scalar vfwsub.vv vd, vs2, vs1, vm # vector-vector vfwsub.vf vd, vs2, rs1, vm # vector-scalar # Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW vfwadd.wv vd, vs2, vs1, vm # vector-vector vfwadd.wf vd, vs2, rs1, vm # vector-scalar vfwsub.wv vd, vs2, vs1, vm # vector-vector vfwsub.wf vd, vs2, rs1, vm # vector-scalar |
v / _vector_widening_floating_point_fused_multiply_add_instructions
- Vector Floating-Point Instructions / 14.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
Operation | Arguments | Description |
vfwmacc.vf | vs2, rs1, vd |
# FP widening multiply-accumulate, overwrites addend vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] # FP widening negate-(multiply-accumulate), overwrites addend vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] # FP widening multiply-subtract-accumulator, overwrites addend vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] # FP widening negate-(multiply-subtract-accumulator), overwrites addend vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i] |
vfwmacc.vv | vs2, vs1, vd |
# FP widening multiply-accumulate, overwrites addend vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] # FP widening negate-(multiply-accumulate), overwrites addend vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] # FP widening multiply-subtract-accumulator, overwrites addend vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] # FP widening negate-(multiply-subtract-accumulator), overwrites addend vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i] |
v / _vector_widening_floating_point_multiply
- Vector Floating-Point Instructions / 14.5. Vector Widening Floating-Point Multiply
Operation | Arguments | Description |
vfwmul.vf | vs2, rs1, vd |
# Widening floating-point multiply vfwmul.vv vd, vs2, vs1, vm # vector-vector vfwmul.vf vd, vs2, rs1, vm # vector-scalar |
vfwmul.vv | vs2, vs1, vd |
# Widening floating-point multiply vfwmul.vv vd, vs2, vs1, vm # vector-vector vfwmul.vf vd, vs2, rs1, vm # vector-scalar |
v / _vector_widening_integer_addsubtract
- Vector Integer Arithmetic Instructions / 12.2. Vector Widening Integer Add/Subtract
Operation | Arguments | Description |
vwaddu.vv | vs2, vs1, vd |
# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar |
vwaddu.vx | vs2, rs1, vd |
# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar |
vwaddu.wv | vs2, vs1, vd |
# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar |
vwaddu.wx | vs2, rs1, vd |
# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW vwaddu.vv vd, vs2, vs1, vm # vector-vector vwaddu.vx vd, vs2, rs1, vm # vector-scalar vwsubu.vv vd, vs2, vs1, vm # vector-vector vwsubu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = SEW +/- SEW vwadd.vv vd, vs2, vs1, vm # vector-vector vwadd.vx vd, vs2, rs1, vm # vector-scalar vwsub.vv vd, vs2, vs1, vm # vector-vector vwsub.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW vwaddu.wv vd, vs2, vs1, vm # vector-vector vwaddu.wx vd, vs2, rs1, vm # vector-scalar vwsubu.wv vd, vs2, vs1, vm # vector-vector vwsubu.wx vd, vs2, rs1, vm # vector-scalar # Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW vwadd.wv vd, vs2, vs1, vm # vector-vector vwadd.wx vd, vs2, rs1, vm # vector-scalar vwsub.wv vd, vs2, vs1, vm # vector-vector vwsub.wx vd, vs2, rs1, vm # vector-scalar |
v / _vector_widening_integer_multiply_add_instructions
- Vector Integer Arithmetic Instructions / 12.14. Vector Widening Integer Multiply-Add Instructions
Operation | Arguments | Description |
vwmaccu.vv | vs2, vs1, vd |
# Widening unsigned-integer multiply-add, overwrite addend vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-integer multiply-add, overwrite addend vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-unsigned-integer multiply-add, overwrite addend vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i] vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i] # Widening unsigned-signed-integer multiply-add, overwrite addend vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i] |
vwmaccu.vx | vs2, rs1, vd |
# Widening unsigned-integer multiply-add, overwrite addend vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-integer multiply-add, overwrite addend vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] # Widening signed-unsigned-integer multiply-add, overwrite addend vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i] vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i] # Widening unsigned-signed-integer multiply-add, overwrite addend vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i] |
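The four signedness combinations above differ only in how each SEW-bit source is interpreted before the 2*SEW-bit accumulate. A single-element Python sketch, with hypothetical helper names and raw bit patterns passed as non-negative integers:

```python
def _signed(x, sew):
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x

def _unsigned(x, sew):
    return x & ((1 << sew) - 1)

def _accumulate(vd, product, sew):
    # The accumulator and result are 2*SEW bits wide.
    return (vd + product) & ((1 << (2 * sew)) - 1)

def vwmaccu(vd, s1, s2, sew):    # unsigned * unsigned
    return _accumulate(vd, _unsigned(s1, sew) * _unsigned(s2, sew), sew)

def vwmacc(vd, s1, s2, sew):     # signed * signed
    return _accumulate(vd, _signed(s1, sew) * _signed(s2, sew), sew)

def vwmaccsu(vd, s1, s2, sew):   # signed(vs1 or rs1) * unsigned(vs2)
    return _accumulate(vd, _signed(s1, sew) * _unsigned(s2, sew), sew)

def vwmaccus(vd, s1, s2, sew):   # unsigned(rs1) * signed(vs2)
    return _accumulate(vd, _unsigned(s1, sew) * _signed(s2, sew), sew)
```

Feeding the same bit patterns (0xFF and 0xFE at SEW=8) to all four functions gives four different 16-bit results.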
v / _vector_widening_integer_multiply_instructions
- Vector Integer Arithmetic Instructions / 12.12. Vector Widening Integer Multiply Instructions
Operation | Arguments | Description |
vwmul.vv | vs2, vs1, vd |
# Widening signed-integer multiply vwmul.vv vd, vs2, vs1, vm # vector-vector vwmul.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned-integer multiply vwmulu.vv vd, vs2, vs1, vm # vector-vector vwmulu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed(vs2)-unsigned integer multiply vwmulsu.vv vd, vs2, vs1, vm # vector-vector vwmulsu.vx vd, vs2, rs1, vm # vector-scalar |
vwmul.vx | vs2, rs1, vd |
# Widening signed-integer multiply vwmul.vv vd, vs2, vs1, vm # vector-vector vwmul.vx vd, vs2, rs1, vm # vector-scalar # Widening unsigned-integer multiply vwmulu.vv vd, vs2, vs1, vm # vector-vector vwmulu.vx vd, vs2, rs1, vm # vector-scalar # Widening signed(vs2)-unsigned integer multiply vwmulsu.vv vd, vs2, vs1, vm # vector-vector vwmulsu.vx vd, vs2, rs1, vm # vector-scalar |
v / _vfirst_find_first_set_mask_bit
- Vector Mask Instructions / 16.3. vfirst find-first-set mask bit
Operation | Arguments | Description |
vfirst.m | vs2, rd |
The vfirst instruction finds the lowest-numbered active element of the source mask vector that has the value 1 and writes that element's index to a GPR. If no active element has the value 1, -1 is written to the GPR. The vfirst.m instruction writes x[rd] even if vl=0 (with the value -1, since no mask elements are active). Traps on vfirst are always reported with a vstart of 0. The vfirst instruction will raise an illegal instruction exception if vstart is non-zero. vfirst.m rd, vs2, vm Spike ISS Implementation:// vmfirst rd, vs2 require(P.VU.vsew >= e8 && P.VU.vsew <= e64); require_vector(true); reg_t vl = P.VU.vl->read(); reg_t rs2_num = insn.rs2(); require(P.VU.vstart->read() == 0); reg_t pos = -1; for (reg_t i=P.VU.vstart->read(); i < vl; ++i) { VI_LOOP_ELEMENT_SKIP() bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx ) >> mpos) & 0x1) == 1; if (vs2_lsb) { pos = i; break; } } P.VU.vstart->write(0); WRITE_RD(pos); |
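The search described above reduces to a short loop. The following Python sketch mirrors the Spike logic on plain lists of 0/1 values; the function name and list-based mask representation are assumptions of this sketch.

```python
def vfirst(vs2, vl, v0=None):
    """Index of the lowest-numbered active element of vs2 that is 1, else -1."""
    for i in range(vl):
        if v0 is not None and not v0[i]:
            continue              # element is masked off, so it is not active
        if vs2[i]:
            return i
    return -1                     # also the result when vl == 0
```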
v / _vmsif_m_set_including_first_mask_bit
- Vector Mask Instructions / 16.5. vmsif.m set-including-first mask bit
Operation | Arguments | Description |
vmsif.m | vs2, vd |
Traps on vmsif.m are always reported with a vstart of 0. The vmsif instruction will raise an illegal instruction exception if vstart is non-zero. vmsif.m vd, vs2, vm # Example 7 6 5 4 3 2 1 0 Element number 1 0 0 1 0 1 0 0 v3 contents vmsif.m v2, v3 0 0 0 0 0 1 1 1 v2 contents 1 0 0 1 0 1 0 1 v3 contents vmsif.m v2, v3 0 0 0 0 0 0 0 1 v2 contents 1 1 0 0 0 0 1 1 v0 contents 1 0 0 1 0 1 0 0 v3 contents vmsif.m v2, v3, v0.t 1 1 x x x x 1 1 v2 contents Spike ISS Implementation:// vmsif.m rd, vs2, vm require(P.VU.vsew >= e8 && P.VU.vsew <= e64); require_vector(true); require(P.VU.vstart->read() == 0); require_vm; require(insn.rd() != insn.rs2()); reg_t vl = P.VU.vl->read(); reg_t rd_num = insn.rd(); reg_t rs2_num = insn.rs2(); bool has_one = false; for (reg_t i = P.VU.vstart->read(); i < vl; ++i) { const int midx = i / 64; const int mpos = i % 64; const uint64_t mmask = UINT64_C(1) << mpos; bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1; bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1; if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) { auto &vd = P.VU.elt<uint64_t>(rd_num, midx, true); uint64_t res = 0; if (!has_one && !vs2_lsb) { res = 1; } else if (!has_one && vs2_lsb) { has_one = true; res = 1; } vd = (vd & ~mmask) | ((res << mpos) & mmask); } } |
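The vmsif.m examples above can be reproduced with a small Python model that follows the same loop as the Spike snippet. The function names, the `bits` parsing helper, and the choice to leave masked-off destination elements undisturbed (the `x` entries in the example) are assumptions of this sketch.

```python
def bits(text):
    """Parse '1 0 0 1 0 1 0 0' (element 7 on the left) into a list indexed by element."""
    return [int(b) for b in reversed(text.split())]

def vmsif(vs2, vl, v0=None, vd=None):
    """Set-including-first: 1 for every active element up to and including the first 1."""
    out = list(vd) if vd is not None else [0] * len(vs2)
    has_one = False
    for i in range(vl):
        if v0 is not None and not v0[i]:
            continue                      # masked-off element: left undisturbed here
        out[i] = 0 if has_one else 1      # still at or before the first set bit?
        if vs2[i]:
            has_one = True
    return out
```

For instance, `vmsif(bits("1 0 0 1 0 1 0 0"), 8)` returns the list corresponding to `0 0 0 0 0 1 1 1` in the document's element order.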
v / _vmsof_m_set_only_first_mask_bit
- Vector Mask Instructions / 16.6. vmsof.m set-only-first mask bit
Operation | Arguments | Description |
vmsof.m | vs2, vd |
Traps on vmsof.m are always reported with a vstart of 0. The vmsof instruction will raise an illegal instruction exception if vstart is non-zero. vmsof.m vd, vs2, vm # Example 7 6 5 4 3 2 1 0 Element number 1 0 0 1 0 1 0 0 v3 contents vmsof.m v2, v3 0 0 0 0 0 1 0 0 v2 contents 1 0 0 1 0 1 0 1 v3 contents vmsof.m v2, v3 0 0 0 0 0 0 0 1 v2 contents 1 1 0 0 0 0 1 1 v0 contents 1 1 0 1 0 1 0 0 v3 contents vmsof.m v2, v3, v0.t 0 1 x x x x 0 0 v2 contents Spike ISS Implementation:// vmsof.m rd, vs2, vm require(P.VU.vsew >= e8 && P.VU.vsew <= e64); require_vector(true); require(P.VU.vstart->read() == 0); require_vm; require(insn.rd() != insn.rs2()); reg_t vl = P.VU.vl->read(); reg_t rd_num = insn.rd(); reg_t rs2_num = insn.rs2(); bool has_one = false; for (reg_t i = P.VU.vstart->read() ; i < vl; ++i) { const int midx = i / 64; const int mpos = i % 64; const uint64_t mmask = UINT64_C(1) << mpos; bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1; bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1; if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) { uint64_t &vd = P.VU.elt<uint64_t>(rd_num, midx, true); uint64_t res = 0; if (!has_one && vs2_lsb) { has_one = true; res = 1; } vd = (vd & ~mmask) | ((res << mpos) & mmask); } } |
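The vmsof.m examples above can be checked with a Python model of the same loop. The function names, the `bits` parsing helper, and the choice to leave masked-off destination elements undisturbed (the `x` entries in the example) are assumptions of this sketch.

```python
def bits(text):
    """Parse '1 0 0 1 0 1 0 0' (element 7 on the left) into a list indexed by element."""
    return [int(b) for b in reversed(text.split())]

def vmsof(vs2, vl, v0=None, vd=None):
    """Set-only-first: 1 at the first active element of vs2 that is 1, 0 elsewhere."""
    out = list(vd) if vd is not None else [0] * len(vs2)
    has_one = False
    for i in range(vl):
        if v0 is not None and not v0[i]:
            continue                      # masked-off element: left undisturbed here
        if not has_one and vs2[i]:
            has_one = True
            out[i] = 1                    # the first set bit among active elements
        else:
            out[i] = 0
    return out
```

In the masked example, elements 0 and 1 are active but zero, so the first active set bit is element 6, even though element 2 of `v3` is also set.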
v / _widening_floating_pointinteger_type_convert_instructions
- Vector Floating-Point Instructions / 14.18. Widening Floating-Point/Integer Type-Convert Instructions
Operation | Arguments | Description |
vfwcvt.f.f.v | vs2, vd |
vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. |
vfwcvt.f.x.v | vs2, vd |
vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. |
vfwcvt.f.xu.v | vs2, vd |
vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. |
vfwcvt.rtz.x.f.v | vs2, vd |
vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. |
vfwcvt.rtz.xu.f.v | vs2, vd |
vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. |
vfwcvt.x.f.v | vs2, vd |
vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. |
vfwcvt.xu.f.v | vs2, vd |
vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. |
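The difference between the default and rtz forms of the float-to-integer conversions can be modelled in a few lines. This is a simplified reference sketch, not the Spike implementation: the name `vfwcvt_x_f` is hypothetical, the non-rtz form is assumed to run with round-to-nearest-even selected in frm, and NaN, infinite, and out-of-range inputs are assumed to saturate the way the scalar FCVT instructions do.

```python
import math

def vfwcvt_x_f(values, sew, rtz=False):
    """Model vfwcvt.x.f.v (rtz=False) or vfwcvt.rtz.x.f.v (rtz=True).

    Each SEW-wide float becomes a signed integer of width 2*SEW.
    Assumption: frm selects round-to-nearest-even for the non-rtz form.
    """
    width = 2 * sew
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    out = []
    for v in values:
        if math.isnan(v):
            out.append(hi)                      # NaN maps to the largest value
        elif math.isinf(v):
            out.append(hi if v > 0 else lo)     # infinities saturate
        else:
            r = math.trunc(v) if rtz else round(v)  # round() breaks ties to even
            out.append(min(max(r, lo), hi))         # saturate to 2*SEW bits
    return out
```

For example, `vfwcvt_x_f([2.5, 3.5, -2.7], 32)` rounds to nearest, while passing `rtz=True` discards the fraction.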
v / _zve_vector_extensions_for_embedded_processors
- Standard Vector Extensions / 19.2. Zve*: Vector Extensions for Embedded Processors
Operation | Arguments | Description |
vmulh.vv | vs2, vs1, vd |
All Zve* extensions support all vector integer instructions (Section Vector Integer Arithmetic Instructions), except that the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*. |
vmulh.vx | vs2, rs1, vd |
All Zve* extensions support all vector integer instructions (Section Vector Integer Arithmetic Instructions), except that the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*. |
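To make "the high word of the product" concrete, the following sketch models one element of each vmulh variant. The helper names are hypothetical, and the model works on plain Python integers rather than vector registers. For vmulhsu the vs2 operand is treated as signed and the vs1/rs1 operand as unsigned.

```python
def _signed(x, sew):
    """Interpret the low SEW bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x

def _unsigned(x, sew):
    return x & ((1 << sew) - 1)

def vmulh(vs2, vs1, sew):
    """Signed x signed, return the upper SEW bits of the 2*SEW product."""
    return ((_signed(vs2, sew) * _signed(vs1, sew)) >> sew) & ((1 << sew) - 1)

def vmulhu(vs2, vs1, sew):
    """Unsigned x unsigned, upper SEW bits."""
    return ((_unsigned(vs2, sew) * _unsigned(vs1, sew)) >> sew) & ((1 << sew) - 1)

def vmulhsu(vs2, vs1, sew):
    """Signed vs2 x unsigned vs1, upper SEW bits."""
    return ((_signed(vs2, sew) * _unsigned(vs1, sew)) >> sew) & ((1 << sew) - 1)
```

With SEW=8 and both operands 0xFF, the three variants give different high bytes because the operands are read as -1 or 255.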
v / sec-agnostic
- Vector Extension Programmer’s Model / 4.4. Vector type register, vtype
Operation | Arguments | Description |
vmsbf.m | vs2, vd |
In addition, except for mask load instructions, any element in the tail of a mask result can also be written with the value the mask-producing operation would have calculated with vl=VLMAX. Furthermore, for mask-logical instructions and vmsbf.m, vmsif.m, vmsof.m mask-manipulation instructions, any element in the tail of the result can be written with the value the mask-producing operation would have calculated with vl=VLEN, SEW=8, and LMUL=8 (i.e., all bits of the mask register can be overwritten). Spike ISS Implementation: // vmsbf.m vd, vs2, vm require(P.VU.vsew >= e8 && P.VU.vsew <= e64); require_vector(true); require(P.VU.vstart->read() == 0); require_vm; require(insn.rd() != insn.rs2()); reg_t vl = P.VU.vl->read(); reg_t rd_num = insn.rd(); reg_t rs2_num = insn.rs2(); bool has_one = false; for (reg_t i = P.VU.vstart->read(); i < vl; ++i) { const int midx = i / 64; const int mpos = i % 64; const uint64_t mmask = UINT64_C(1) << mpos; bool vs2_lsb = ((P.VU.elt<uint64_t>(rs2_num, midx) >> mpos) & 0x1) == 1; bool do_mask = (P.VU.elt<uint64_t>(0, midx) >> mpos) & 0x1; if (insn.v_vm() == 1 || (insn.v_vm() == 0 && do_mask)) { auto &vd = P.VU.elt<uint64_t>(rd_num, midx, true); uint64_t res = 0; if (!has_one && !vs2_lsb) { res = 1; } else if (!has_one && vs2_lsb) { has_one = true; } vd = (vd & ~mmask) | ((res << mpos) & mmask); } } |
vsetvli | zimm11, rs1, rd |
The assembly syntax adds two mandatory flags to the vsetvli instruction: ta # Tail agnostic tu # Tail undisturbed ma # Mask agnostic mu # Mask undisturbed vsetvli t0, a0, e32, m4, ta, ma # Tail agnostic, mask agnostic vsetvli t0, a0, e32, m4, tu, ma # Tail undisturbed, mask agnostic vsetvli t0, a0, e32, m4, ta, mu # Tail agnostic, mask undisturbed vsetvli t0, a0, e32, m4, tu, mu # Tail undisturbed, mask undisturbed Spike ISS Implementation: require_vector_novtype(false); WRITE_RD(P.VU.set_vl(insn.rd(), insn.rs1(), RS1, insn.v_zimm11())); |
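The Spike loop for vmsbf.m is easier to follow on a list of mask bits. The sketch below mirrors that loop under simplifying assumptions: masks are Python lists of 0/1 indexed by element number, the function name `vmsbf` is hypothetical, and inactive or tail elements simply keep their previous value (the undisturbed behaviour, which is one of the outcomes the agnostic policy permits).

```python
def vmsbf(vs2, vl, v0=None, vd=None):
    """Set-before-first: write 1 to every active element before the
    first active element whose vs2 bit is set, and 0 from there on.

    vs2, v0, vd are lists of 0/1; v0=None means unmasked (vm=1).
    """
    result = list(vd) if vd is not None else [0] * len(vs2)
    has_one = False
    for i in range(vl):
        if v0 is not None and not v0[i]:
            continue                      # inactive element: left as-is
        res = 0
        if not has_one and not vs2[i]:
            res = 1                       # still before the first set bit
        elif not has_one and vs2[i]:
            has_one = True                # first set bit found
        result[i] = res
    return result
```

With a mask supplied, set bits of vs2 in inactive positions are skipped entirely, so they do not stop the run of ones.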
v / sec-mask-register-logical
- Vector Mask Instructions / 16.1. Vector Mask-Register Logical Instructions
Operation | Arguments | Description |
vmand.mm | vs2, vs1, vd |
vmand.mm vd, src1, src2 vmand.mm vd, src2, src2 vmand.mm vd, src1, src1 vmand.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && vs1.mask[i] vmnand.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] && vs1.mask[i]) vmandn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && !vs1.mask[i] vmxor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] ^^ vs1.mask[i] vmor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || vs1.mask[i] vmnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] || vs1.mask[i]) vmorn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || !vs1.mask[i] vmxnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] ^^ vs1.mask[i]) vmmv.m vd, vs => vmand.mm vd, vs, vs # Copy mask register vmclr.m vd => vmxor.mm vd, vd, vd # Clear mask register vmset.m vd => vmxnor.mm vd, vd, vd # Set mask register vmnot.m vd, vs => vmnand.mm vd, vs, vs # Invert bits |
vmandn.mm | vs2, vs1, vd |
vmandn.mm vd, src2, src1 vmandn.mm vd, src1, src2 |
vmnand.mm | vs2, vs1, vd |
vmnand.mm vd, src1, src1 vmnand.mm vd, src2, src2 vmnand.mm vd, src1, src2 |
vmnor.mm | vs2, vs1, vd |
vmnor.mm vd, src1, src2 |
vmorn.mm | vs2, vs1, vd |
vmorn.mm vd, src2, src1 vmorn.mm vd, src1, src2 |
vmxnor.mm | vs2, vs1, vd |
vmxnor.mm vd, src1, src2 vmxnor.mm vd, vd, vd |
vmxor.mm | vs2, vs1, vd |
vmxor.mm vd, vd, vd vmxor.mm vd, src1, src2 |
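The eight mask-logical operations and the four assembler pseudoinstructions built on them can be checked with a small truth-table model. This is a sketch that treats a mask register as a Python list of 0/1 bits; `mask_op` and the `vm*_m` helper names are hypothetical stand-ins for the instructions.

```python
# Per-bit definitions: a is the vs2 bit, b is the vs1 bit.
MASK_OPS = {
    "vmand.mm":  lambda a, b: a & b,
    "vmnand.mm": lambda a, b: 1 - (a & b),
    "vmandn.mm": lambda a, b: a & (1 - b),
    "vmxor.mm":  lambda a, b: a ^ b,
    "vmor.mm":   lambda a, b: a | b,
    "vmnor.mm":  lambda a, b: 1 - (a | b),
    "vmorn.mm":  lambda a, b: a | (1 - b),
    "vmxnor.mm": lambda a, b: 1 - (a ^ b),
}

def mask_op(name, vs2, vs1):
    """Apply one mask-logical instruction element by element."""
    return [MASK_OPS[name](a, b) for a, b in zip(vs2, vs1)]

# Pseudoinstructions, expanded exactly as the assembler does.
def vmmv_m(vs):
    return mask_op("vmand.mm", vs, vs)     # copy

def vmclr_m(vd):
    return mask_op("vmxor.mm", vd, vd)     # clear

def vmset_m(vd):
    return mask_op("vmxnor.mm", vd, vd)    # set

def vmnot_m(vs):
    return mask_op("vmnand.mm", vs, vs)    # invert
```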
v / sec-narrowing
- Vector Arithmetic Instruction Formats / 11.3. Narrowing Vector Arithmetic Instructions
Operation | Arguments | Description |
vnsra.wi | vs2, simm5, vd |
A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv). |
vnsra.wv | vs2, vs1, vd |
A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv). |
vnsra.wx | vs2, rs1, vd |
A vn* prefix on the opcode is used to distinguish these instructions in the assembler, or a vfn* prefix for narrowing floating-point opcodes. The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv). |
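A one-element model shows what "narrowing" means for vnsra: the source element is 2*SEW bits wide, the shift is arithmetic, and only the low SEW bits of the shifted value are kept. The function name is hypothetical, and the sketch assumes the shift amount is taken from the low log2(2*SEW) bits of the second operand.

```python
def vnsra(wide_src, shamt, sew):
    """Narrowing arithmetic right shift of one 2*SEW-bit element."""
    width = 2 * sew
    wide_src &= (1 << width) - 1
    # Reinterpret the source as a signed 2*SEW-bit value.
    if wide_src >> (width - 1):
        wide_src -= 1 << width
    shift = shamt & (width - 1)           # only the low log2(2*SEW) bits count
    return (wide_src >> shift) & ((1 << sew) - 1)   # keep the low SEW bits
```

For SEW=8, shifting 0x8000 right by 12 yields 0xF8 because the sign bit is replicated before the result is truncated.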
v / sec-vec-operands
- Vector Instruction Formats / 6.2. Vector Operands
Operation | Arguments | Description |
vnsrl.wi | vs2, simm5, vd |
The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not). |
vnsrl.wv | vs2, vs1, vd |
The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not). |
vnsrl.wx | vs2, rs1, vd |
The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group (e.g., when LMUL=1, vnsrl.wi v0, v0, 3 is legal, but a destination of v1 is not). |
vzext.vf2 | vs2, vd |
The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not). |
vzext.vf4 | vs2, vd |
The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not). |
vzext.vf8 | vs2, vd |
The destination EEW is greater than the source EEW, the source EMUL is at least 1, and the overlap is in the highest-numbered part of the destination register group (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, v2, or v4 is not). |
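The overlap rules quoted in this table can be expressed as a small legality check. This is a sketch under stated assumptions: `overlap_is_legal` and `reg_group` are hypothetical names, register groups are modelled as ranges of register numbers, fractional EMUL values occupy one register, and the equal-EEW case (also allowed by the vector specification, though not quoted in the rows above) is included for completeness.

```python
from fractions import Fraction

def reg_group(base, emul):
    """Registers occupied by a group; fractional EMUL still uses one."""
    return range(base, base + max(1, int(emul)))

def overlap_is_legal(vd, dst_emul, dst_eew, vs, src_emul, src_eew):
    """Return True if a destination group may overlap this source group."""
    dst, src = reg_group(vd, dst_emul), reg_group(vs, src_emul)
    if not set(dst) & set(src):
        return True                       # no overlap at all
    if dst_eew == src_eew:
        return True                       # same EEW: overlap allowed
    if dst_eew < src_eew:
        # Narrowing: overlap must be the lowest-numbered part of the source.
        return dst.start == src.start
    # Widening: source EMUL >= 1 and overlap in the highest-numbered
    # part of the destination group.
    return Fraction(src_emul) >= 1 and dst.stop == src.stop
```

The two examples from the text map onto calls such as `overlap_is_legal(0, 1, 32, 0, 2, 64)` for vnsrl.wi v0, v0, 3 at LMUL=1 and `overlap_is_legal(0, 8, 32, 6, 2, 8)` for vzext.vf4 v0, v6 at LMUL=8 (SEW=32 is chosen arbitrarily here).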
v / sec-vector-float-reduce
- Vector Reduction Operations / 15.3. Vector Single-Width Floating-Point Reduction Instructions
Operation | Arguments | Description |
vfredosum.vs | vs2, vs1, vd |
# Simple reductions. vfredosum.vs vd, vs2, vs1, vm # Ordered sum vfredusum.vs vd, vs2, vs1, vm # Unordered sum vfredmax.vs vd, vs2, vs1, vm # Maximum value vfredmin.vs vd, vs2, vs1, vm # Minimum value |
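Ordered and unordered sums can differ because floating-point addition is not associative. The sketch below models vfredosum.vs as a strict left-to-right accumulation starting from vs1[0], and contrasts it with a pairwise tree. The tree is only one reassociation an unordered reduction might use; the actual order for vfredusum.vs is implementation-defined, and both function names are hypothetical. Python floats are binary64, so the model corresponds to SEW=64.

```python
def vfredosum(vs2, vs1_0, mask=None):
    """Ordered sum: ((vs1[0] + vs2[0]) + vs2[1]) + ... over active elements."""
    acc = vs1_0
    for i, x in enumerate(vs2):
        if mask is None or mask[i]:
            acc += x
    return acc

def tree_sum(values):
    """Pairwise (binary tree) sum of a non-empty list, padded with 0.0."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0.0)
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]
```

With the inputs 1e16, -1e16, 1.0 and a zero scalar, the ordered sum keeps the final 1.0, while the tree absorbs it into -1e16 before the large terms cancel.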
v / sec-vector-float-reduce-widen
- Vector Reduction Operations / 15.4. Vector Widening Floating-Point Reduction Instructions
Operation | Arguments | Description |
vfwredosum.vs | vs2, vs1, vd |
# Simple reductions. vfwredosum.vs vd, vs2, vs1, vm # Ordered sum vfwredusum.vs vd, vs2, vs1, vm # Unordered sum |
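The benefit of the widening form is that the accumulator has twice the precision of the elements. The sketch below compares a single-width ordered sum with a widening one for SEW=32, using `struct` to round Python's binary64 floats to binary32. The function names are hypothetical, and masking is omitted for brevity.

```python
import struct

def f32(x):
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def vfredosum_f32(vs2, vs1_0):
    """Single-width ordered sum: every partial sum is rounded to binary32."""
    acc = f32(vs1_0)
    for x in vs2:
        acc = f32(acc + f32(x))
    return acc

def vfwredosum_f32(vs2, vs1_0):
    """Widening ordered sum: binary32 elements, binary64 accumulator."""
    acc = vs1_0
    for x in vs2:
        acc += f32(x)      # widening a binary32 value is exact
    return acc
```

Adding 1.0 to an accumulator holding 2^24 is lost in the single-width form but preserved in the widening form.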
v / sec-vector-integer-reduce
- Vector Reduction Operations / 15.1. Vector Single-Width Integer Reduction Instructions
Operation | Arguments | Description |
vredsum.vs | vs2, vs1, vd |
# Simple reductions, where [*] denotes all active elements: vredsum.vs vd, vs2, vs1, vm # vd[0] = sum( vs1[0] , vs2[*] ) vredmaxu.vs vd, vs2, vs1, vm # vd[0] = maxu( vs1[0] , vs2[*] ) vredmax.vs vd, vs2, vs1, vm # vd[0] = max( vs1[0] , vs2[*] ) vredminu.vs vd, vs2, vs1, vm # vd[0] = minu( vs1[0] , vs2[*] ) vredmin.vs vd, vs2, vs1, vm # vd[0] = min( vs1[0] , vs2[*] ) vredand.vs vd, vs2, vs1, vm # vd[0] = and( vs1[0] , vs2[*] ) vredor.vs vd, vs2, vs1, vm # vd[0] = or( vs1[0] , vs2[*] ) vredxor.vs vd, vs2, vs1, vm # vd[0] = xor( vs1[0] , vs2[*] ) |
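The pattern vd[0] = op(vs1[0], vs2[*]) is a fold over the active elements seeded with the scalar vs1[0]. The following sketch models it with `functools.reduce`; `vred` is a hypothetical helper, elements are unsigned SEW-bit integers, and masks are lists of 0/1.

```python
from functools import reduce

def _signed(x, sew):
    """Interpret an SEW-bit pattern as two's complement."""
    return x - (1 << sew) if x >> (sew - 1) else x

def vred(op, vs2, vs1_0, sew, mask=None):
    """Reduce the active elements of vs2 together with the scalar vs1[0]."""
    m = (1 << sew) - 1
    key = lambda v: _signed(v, sew)
    ops = {
        "sum":  lambda a, b: (a + b) & m,      # wraps on overflow
        "maxu": max,
        "minu": min,
        "max":  lambda a, b: max(a, b, key=key),
        "min":  lambda a, b: min(a, b, key=key),
        "and":  lambda a, b: a & b,
        "or":   lambda a, b: a | b,
        "xor":  lambda a, b: a ^ b,
    }
    active = [x & m for i, x in enumerate(vs2) if mask is None or mask[i]]
    return reduce(ops[op], active, vs1_0 & m)
```

When no element is active the fold returns the seed unchanged, so the result is simply vs1[0].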
v / sec-vector-integer-reduce-widen
- Vector Reduction Operations / 15.2. Vector Widening Integer Reduction Instructions
Operation | Arguments | Description |
vwredsum.vs | vs2, vs1, vd |
The vwredsum.vs instruction sign-extends the SEW-wide vector elements before summing them. |
vwredsumu.vs | vs2, vs1, vd |
The unsigned vwredsumu.vs instruction zero-extends the SEW-wide vector elements before summing them, then adds the 2*SEW-width scalar element, and stores the result in a 2*SEW-width scalar element. For both vwredsumu.vs and vwredsum.vs, overflows wrap around. # Unsigned sum reduction into double-width accumulator vwredsumu.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(zero-extend(SEW)) # Signed sum reduction into double-width accumulator vwredsum.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(sign-extend(SEW)) |
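The difference between the signed and unsigned widening sums is only in how each SEW-wide element is extended before it is added to the 2*SEW-wide accumulator. A minimal sketch, with the hypothetical helper `vwredsum` covering both variants through a `signed` flag:

```python
def vwredsum(vs2, vs1_0, sew, signed=True, mask=None):
    """Widening sum reduction into a 2*SEW-bit accumulator.

    signed=True models vwredsum.vs (sign-extend each element),
    signed=False models vwredsumu.vs (zero-extend each element).
    """
    wide_mask = (1 << (2 * sew)) - 1
    acc = vs1_0 & wide_mask
    for i, x in enumerate(vs2):
        if mask is not None and not mask[i]:
            continue                      # inactive elements do not contribute
        x &= (1 << sew) - 1
        if signed and x >> (sew - 1):
            x -= 1 << sew                 # sign-extend
        acc = (acc + x) & wide_mask       # overflow wraps around
    return acc
```

With SEW=8, two elements of 0xFF sum to 0x01FE when zero-extended and to 0xFFFE (that is, -2) when sign-extended.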
counters
counters / zicntr-standard-extension-for-base-counters-and-timers
11 Counters / 11.1 “Zicntr” Standard Extension for Base Counters and Timers
Operation | Arguments | Description |
rdcycle | rd |
RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDCYCLE pseudoinstruction reads the low XLEN bits of the cycle CSR which holds a count of the number of clock cycles executed by the processor core on which the hart is running from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63-32 of the same cycle counter. The underlying 64-bit counter should never overflow in practice. The rate at which the cycle counter advances will depend on the implementation and operating environment. The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing. RDCYCLE is intended to return the number of cycles executed by the processor core, not the hart. Precisely defining what is a "core" is difficult given some implementation choices (e.g., AMD Bulldozer). Precisely defining what is a "clock cycle" is also difficult given the range of implementations (including software emulations), but the intent is that RDCYCLE is used for performance monitoring along with the other performance counters. In particular, where there is one hart/core, one would expect cycle-count/instructions-retired to measure CPI for a hart. Even though there is no precise definition that works for all platforms, this is still a useful facility for most platforms, and an imprecise, common, "usually correct" standard here is better than no standard. The intent of RDCYCLE was primarily performance monitoring/tuning, and the specification was written with that goal in mind. 
On some simple platforms, cycle count might represent a valid implementation of RDTIME, in which case RDTIME and RDCYCLE may return the same result. |
rdcycleh | rd |
RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDCYCLE pseudoinstruction reads the low XLEN bits of the cycle CSR which holds a count of the number of clock cycles executed by the processor core on which the hart is running from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63-32 of the same cycle counter. The underlying 64-bit counter should never overflow in practice. The rate at which the cycle counter advances will depend on the implementation and operating environment. The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing. |
rdinstret | rd |
RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDINSTRET pseudoinstruction reads the low XLEN bits of the instret CSR, which counts the number of instructions retired by this hart from some arbitrary start point in the past. RDINSTRETH is an RV32I-only instruction that reads bits 63-32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice. |
rdinstreth | rd |
RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDINSTRET pseudoinstruction reads the low XLEN bits of the instret CSR, which counts the number of instructions retired by this hart from some arbitrary start point in the past. RDINSTRETH is an RV32I-only instruction that reads bits 63-32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice. |
rdtime | rd |
RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63-32 of the same real-time counter. The underlying 64-bit counter increments by one with each tick of the real-time clock, and, for realistic real-time clock frequencies, should never overflow in practice. The execution environment should provide a means of determining the period of a counter tick (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock (i.e., the maximum relative error between the nominal and actual real-time clock periods). On some simple platforms, cycle count might represent a valid implementation of RDTIME, in which case RDTIME and RDCYCLE may return the same result. |
rdtimeh | rd |
RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63-32 of the same real-time counter. The underlying 64-bit counter increments by one with each tick of the real-time clock, and, for realistic real-time clock frequencies, should never overflow in practice. The execution environment should provide a means of determining the period of a counter tick (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock (i.e., the maximum relative error between the nominal and actual real-time clock periods). |
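Because RV32I reads a 64-bit counter in two 32-bit halves, the low half can roll over between the two reads. The usual remedy is to read the high half, then the low half, then the high half again, and retry if the two high reads differ. The sketch below simulates that loop; `FakeCounter` is a hypothetical stand-in for the hardware counter that advances on every read, so the rollover can be reproduced deterministically.

```python
class FakeCounter:
    """64-bit counter that advances by `step` on every access."""
    def __init__(self, start, step=1):
        self.value = start
        self.step = step

    def _sample(self):
        v = self.value
        self.value = (self.value + self.step) & 0xFFFFFFFFFFFFFFFF
        return v

    def read_lo(self):          # models rdcycle on RV32
        return self._sample() & 0xFFFFFFFF

    def read_hi(self):          # models rdcycleh on RV32
        return self._sample() >> 32

def read_counter64(counter):
    """Read a consistent 64-bit value from two 32-bit halves."""
    while True:
        hi = counter.read_hi()
        lo = counter.read_lo()
        if hi == counter.read_hi():       # high half unchanged: no rollover
            return (hi << 32) | lo
```

Starting the fake counter at 0xFFFFFFFF forces a rollover during the first attempt; the loop detects the changed high half and retries rather than returning a torn value.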
zihintpause
zihintpause / chap:zihintpause
4 “Zihintpause” Pause Hint, Version 2.0
Operation | Arguments | Description |
pause | |
The PAUSE instruction is a HINT that indicates the current hart's rate of instruction retirement should be temporarily reduced or paused. The duration of its effect must be bounded and may be zero. No architectural state is changed. Software can use the PAUSE instruction to reduce energy consumption while executing spin-wait code sequences. Multithreaded cores might temporarily relinquish execution resources to other harts when PAUSE is executed. It is recommended that a PAUSE instruction generally be included in the code sequence for a spin-wait loop. A future extension might add primitives similar to the x86 MONITOR/MWAIT instructions, which provide a more efficient mechanism to wait on writes to a specific memory location. However, these instructions would not supplant PAUSE. PAUSE is more appropriate when polling for non-memory events, when polling for multiple events, or when software does not know precisely what events it is polling for. The duration of a PAUSE instruction's effect may vary significantly within and among implementations. In typical implementations this duration should be much less than the time to perform a context switch, probably more on the rough order of an on-chip cache miss latency or a cacheless access to main memory. A series of PAUSE instructions can be used to create a cumulative delay loosely proportional to the number of PAUSE instructions. In spin-wait loops in portable code, however, only one PAUSE instruction should be used before re-evaluating loop conditions, else the hart might stall longer than optimal on some implementations, degrading system performance. PAUSE is encoded as a FENCE instruction with pred=W, succ=0, fm=0, rd=x0, and rs1=x0. PAUSE is encoded as a hint within the FENCE opcode because some implementations are expected to deliberately stall the PAUSE instruction until outstanding memory transactions have completed. 
Because the successor set is null, however, PAUSE does not mandate any particular memory ordering--hence, it truly is a HINT. Like other FENCE instructions, PAUSE cannot be used within LR/SC sequences without voiding the forward-progress guarantee. The choice of a predecessor set of W is arbitrary, since the successor set is null. Other HINTs similar to PAUSE might be encoded with other predecessor sets. |
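The stated encoding (FENCE with pred=W, succ=0, fm=0, rd=x0, rs1=x0) can be turned into the 32-bit instruction word with a small encoder. This is a sketch: `encode_fence` and `fence_set` are hypothetical helper names, and the field positions follow the base ISA's FENCE format (fm in bits 31:28, pred in 27:24, succ in 23:20, opcode 0001111).

```python
FENCE_OPCODE = 0b0001111
_SET_BITS = {"i": 8, "o": 4, "r": 2, "w": 1}   # bit order within pred/succ: I O R W

def fence_set(letters):
    """Turn a string such as 'rw' into the 4-bit pred/succ field."""
    bits = 0
    for ch in letters:
        bits |= _SET_BITS[ch]
    return bits

def encode_fence(pred, succ, fm=0, rd=0, rs1=0):
    """Assemble a FENCE-format instruction word (funct3 is 000)."""
    return ((fm << 28) | (fence_set(pred) << 24) | (fence_set(succ) << 20)
            | (rs1 << 15) | (rd << 7) | FENCE_OPCODE)

PAUSE = encode_fence("w", "")   # pred=W, succ=0
```

An assembler without Zihintpause support can still emit the hint by placing this word directly, for example with a `.word` directive.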
zfh
half precision convert and move instructions | half precision floating point classify instruction | half precision load and store instructions |
zfh / half-precision-convert-and-move-instructions
15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.3 Half-Precision Convert and Move Instructions
Operation | Arguments | Description |
fcvt.d.h | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively. |
fcvt.h.d | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively. |
fcvt.h.l | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fcvt.h.lu | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fcvt.h.q | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively. |
fcvt.h.s | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively. |
fcvt.h.w | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fcvt.h.wu | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fcvt.l.h | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fcvt.lu.h | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fcvt.q.h | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively. |
fcvt.s.h | rd, rs1 |
New floating-point-to-floating-point conversion instructions are added. These instructions are defined analogously to the double-precision floating-point-to-floating-point conversion instructions. FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively. If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively. |
fcvt.w.h | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fcvt.wu.h | rd, rs1 |
New floating-point-to-integer and integer-to-floating-point conversion instructions are added. These instructions are defined analogously to the single-precision-to-integer and integer-to-single-precision conversion instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. |
fmv.h.x | rd, rs1 |
FMV.H.X moves the half-precision value encoded in IEEE 754-2008 standard encoding from the lower 16 bits of integer register rs1 to the floating-point register rd, NaN-boxing the result. FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. |
fmv.x.h | rd, rs1 |
Instructions are provided to move bit patterns between the floating-point and integer registers. FMV.X.H moves the half-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd, filling the upper XLEN-16 bits with copies of the floating-point number's sign bit. FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. |
fsgnj.h | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation:require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), false, false)); |
fsgnjn.h | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction. Spike ISS Implementation:require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), true, false)); |
fsgnjx.h | rd, rs1, rs2 |
The floating-point to floating-point sign-injection instructions FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to their single-precision counterparts. Spike ISS Implementation:require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_FRD_H(fsgnj16(freg(FRS1_H), freg(FRS2_H), false, true)); |
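The move and sign-injection rows above operate purely on bit patterns, so they can be sketched without any floating-point arithmetic. The following is a minimal C++ model, assuming XLEN=64 and the single-precision sign-injection semantics carried over to 16 bits (magnitude from rs1; sign from rs2, its inverse, or the XOR of both signs). The function names are hypothetical and are not Spike's macros.

```cpp
#include <cstdint>

// Bit 15 is the sign bit of an IEEE 754 binary16 value.
constexpr uint16_t kF16Sign = 0x8000;

// FSGNJ.H: exponent and fraction from a, sign from b.
constexpr uint16_t fsgnj_h(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((a & 0x7FFF) | (b & kF16Sign));
}

// FSGNJN.H: exponent and fraction from a, inverted sign of b.
constexpr uint16_t fsgnjn_h(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((a & 0x7FFF) | (~b & kF16Sign));
}

// FSGNJX.H: exponent and fraction from a, sign is sign(a) XOR sign(b).
constexpr uint16_t fsgnjx_h(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>(a ^ (b & kF16Sign));
}

// FMV.X.H on an assumed XLEN=64 hart: the 16-bit pattern is copied unchanged
// and the upper XLEN-16 bits are filled with copies of the sign bit.
constexpr uint64_t fmv_x_h(uint16_t bits) {
    return (bits & kF16Sign) ? (0xFFFFFFFFFFFF0000ull | bits)
                             : static_cast<uint64_t>(bits);
}

// 0x3C00 encodes 1.0, 0xBC00 encodes -1.0, 0xC000 encodes -2.0.
static_assert(fsgnj_h(0x3C00, 0xC000) == 0xBC00, "sign copied from b");
static_assert(fsgnjn_h(0x3C00, 0xC000) == 0x3C00, "inverted sign of b");
static_assert(fsgnjx_h(0xBC00, 0xC000) == 0x3C00, "negative XOR negative");
static_assert(fmv_x_h(0xBC00) == 0xFFFFFFFFFFFFBC00ull, "sign-filled upper bits");
```

Passing the same register as both operands gives the usual idioms: `fsgnj_h(a, a)` copies the value, `fsgnjn_h(a, a)` negates it, and `fsgnjx_h(a, a)` clears the sign bit (absolute value).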
zfh / half-precision-floating-point-classify-instruction
15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.5 Half-Precision Floating-Point Classify Instruction
Operation | Arguments | Description |
fclass.h | rd, rs1 |
The half-precision floating-point classify instruction, FCLASS.H, is defined analogously to its single-precision counterpart, but operates on half-precision operands. Spike ISS Implementation:require_either_extension(EXT_ZFH, EXT_ZHINX); require_fp; WRITE_RD(f16_classify(FRS1_H)); |
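Since the description only refers to the single-precision counterpart, a bit-level sketch may help. The model below assumes FCLASS.H uses the same 10-bit one-hot result mask as the single-precision FCLASS instruction (bit 0 = negative infinity through bit 9 = quiet NaN), applied to the binary16 layout of 1 sign bit, 5 exponent bits, and 10 fraction bits. `fclass_h` is a hypothetical name, not Spike's `f16_classify`.

```cpp
#include <cstdint>

// Returns a one-hot mask describing the class of a binary16 bit pattern.
//   bit 0: -inf        bit 5: +subnormal
//   bit 1: -normal     bit 6: +normal
//   bit 2: -subnormal  bit 7: +inf
//   bit 3: -0          bit 8: signaling NaN
//   bit 4: +0          bit 9: quiet NaN
constexpr uint32_t fclass_h(uint16_t bits) {
    const bool sign = (bits & 0x8000) != 0;
    const uint32_t exp = (bits >> 10) & 0x1F;   // 5-bit biased exponent
    const uint32_t frac = bits & 0x3FF;         // 10-bit fraction

    if (exp == 0x1F) {
        if (frac == 0) return sign ? (1u << 0) : (1u << 7);   // infinities
        // The most significant fraction bit distinguishes quiet from signaling NaNs.
        return (frac & 0x200) ? (1u << 9) : (1u << 8);
    }
    if (exp == 0) {
        if (frac == 0) return sign ? (1u << 3) : (1u << 4);   // zeros
        return sign ? (1u << 2) : (1u << 5);                  // subnormals
    }
    return sign ? (1u << 1) : (1u << 6);                      // normal numbers
}

static_assert(fclass_h(0x3C00) == (1u << 6), "1.0 is a positive normal");
static_assert(fclass_h(0xFC00) == (1u << 0), "negative infinity");
static_assert(fclass_h(0x7E00) == (1u << 9), "canonical quiet NaN");
```

Exactly one bit of the result is set for any input, which lets software test for several classes at once with a single AND against a mask.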
zfh / half-precision-load-and-store-instructions
15 “Zfh” and “Zfhmin” Standard Extensions for Half-Precision Floating-Point, Version 0.1 / 15.1 Half-Precision Load and Store Instructions
Operation | Arguments | Description |
flh | rd, rs1, imm12 |
FLH and FSH are only guaranteed to execute atomically if the effective address is naturally aligned. FLH and FSH do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. FLH NaN-boxes the result written to rd, whereas FSH ignores all but the lower 16 bits in rs2. Spike ISS Implementation:require_extension(EXT_INTERNAL_ZFH_MOVE); require_fp; WRITE_FRD(f16(MMU.load<uint16_t>(RS1 + insn.i_imm()))); |
fsh | rs1, rs2, imm12 |
FLH and FSH are only guaranteed to execute atomically if the effective address is naturally aligned. FLH and FSH do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. FLH NaN-boxes the result written to rd, whereas FSH ignores all but the lower 16 bits in rs2. Spike ISS Implementation:require_extension(EXT_INTERNAL_ZFH_MOVE); require_fp; MMU.store<uint16_t>(RS1 + insn.s_imm(), FRS2.v[0]); |
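The NaN-boxing behaviour of FLH and the truncation performed by FSH can be illustrated with a small model. This is a sketch under stated assumptions: floating-point registers are 64 bits wide (FLEN=64), memory is a little-endian byte array, and the `Hart`, `flh`, and `fsh` names are hypothetical. Alignment, atomicity, and traps are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical hart state: byte-addressed memory plus 32 FP registers (FLEN=64 assumed).
struct Hart {
    std::vector<uint8_t> mem;
    uint64_t f[32] = {};
};

// NaN-boxing: a narrower value occupies the low bits and all upper bits are 1.
inline uint64_t nan_box_h(uint16_t bits) {
    return 0xFFFFFFFFFFFF0000ull | bits;
}

// FLH: load 16 bits (little-endian) without altering them, then NaN-box into rd.
inline void flh(Hart& h, unsigned rd, uint64_t addr) {
    assert(rd < 32 && addr + 1 < h.mem.size());
    const uint16_t bits = static_cast<uint16_t>(h.mem[addr] | (h.mem[addr + 1] << 8));
    h.f[rd] = nan_box_h(bits);
}

// FSH: store only the lower 16 bits of rs2; the upper bits are ignored.
inline void fsh(Hart& h, unsigned rs2, uint64_t addr) {
    assert(rs2 < 32 && addr + 1 < h.mem.size());
    const uint16_t bits = static_cast<uint16_t>(h.f[rs2]);
    h.mem[addr] = static_cast<uint8_t>(bits & 0xFF);
    h.mem[addr + 1] = static_cast<uint8_t>(bits >> 8);
}
```

Loading the signaling-NaN pattern 0x7D00 and storing it again returns the same 16 bits, which is the payload-preservation property stated in the description.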
csr
csr /
/
Operation | Arguments | Description |
csrc | csr, rs |
Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations:csrrc x0, csr, rs |
csrci | csr, imm |
Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations:csrrci x0, csr, imm |
csrr | rd, csr |
The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Pseudo Opcode, Equivalent Operations:csrrs rd, csr, x0 |
csrrc | rd, csr, rs1 |
The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are not explicitly written. For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR. The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. 
Spike ISS Implementation:bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old & ~RS1); } WRITE_RD(sext_xlen(old)); serialize(); |
csrrci | rd, csr, uimm |
The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. Spike ISS Implementation:bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old & ~(reg_t)insn.rs1()); } WRITE_RD(sext_xlen(old)); serialize(); |
csrrs | rd, csr, rs1 |
The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are not explicitly written. For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR. The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. 
The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Spike ISS Implementation:bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old | RS1); } WRITE_RD(sext_xlen(old)); serialize(); |
csrrsi | rd, csr, uimm |
The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. Spike ISS Implementation:bool write = insn.rs1() != 0; int csr = validate_csr(insn.csr(), write); reg_t old = p->get_csr(csr, insn, write); if (write) { p->put_csr(csr, old | insn.rs1()); } WRITE_RD(sext_xlen(old)); serialize(); |
csrrw | rd, csr, rs1 |
The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values in the CSRs and integer registers. CSRRW reads the old value of the CSR, zero-extends the value to XLEN bits, then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects. A CSRRW with rs1=x0 will attempt to write zero to the destination CSR. The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. 
The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Spike ISS Implementation:int csr = validate_csr(insn.csr(), true); reg_t old = p->get_csr(csr, insn, true); p->put_csr(csr, RS1); WRITE_RD(sext_xlen(old)); serialize(); |
csrrwi | rd, csr, uimm |
The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal instruction exceptions on accesses to read-only CSRs. For CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields. The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Spike ISS Implementation:int csr = validate_csr(insn.csr(), true); reg_t old = p->get_csr(csr, insn, true); p->put_csr(csr, insn.rs1()); WRITE_RD(sext_xlen(old)); serialize(); |
csrs | csr, rs |
Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations:csrrs x0, csr, rs |
csrsi | csr, imm |
Further assembler pseudoinstructions are defined to set and clear bits in the CSR when the old value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, uimm. Pseudo Opcode, Equivalent Operations:csrrsi x0, csr, imm |
csrw | csr, rs |
The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Pseudo Opcode, Equivalent Operations:csrrw x0, csr, rs |
csrwi | csr, imm |
The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI csr, uimm, is encoded as CSRRWI x0, csr, uimm. Pseudo Opcode, Equivalent Operations:csrrwi x0, csr, imm |
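The rules about when a CSR access is suppressed depend on the register specifier, not on the register's value. A minimal C++ model can make that distinction concrete. It is only a sketch: the `Csr` and `Regs` types and their access counters are hypothetical, a single CSR is modeled, and privilege checks, read-only CSRs, and the immediate variants are left out.

```cpp
#include <cstdint>

// Hypothetical CSR that counts accesses so read/write side effects are observable.
struct Csr {
    uint64_t value = 0;
    unsigned reads = 0;
    unsigned writes = 0;
};

// Integer register file; x0 always reads as zero and ignores writes.
struct Regs {
    uint64_t x[32] = {};
    uint64_t get(unsigned i) const { return i == 0 ? 0 : x[i]; }
    void set(unsigned i, uint64_t v) { if (i != 0) x[i] = v; }
};

// CSRRW: always writes; the read is skipped when rd is x0.
inline void csrrw(Regs& r, Csr& c, unsigned rd, unsigned rs1) {
    const uint64_t src = r.get(rs1);   // capture first, since rd may equal rs1
    if (rd != 0) { ++c.reads; r.set(rd, c.value); }
    c.value = src;
    ++c.writes;
}

// CSRRS: always reads; the write is skipped only when rs1 is x0 itself.
inline void csrrs(Regs& r, Csr& c, unsigned rd, unsigned rs1) {
    const uint64_t mask = r.get(rs1);
    const uint64_t old = c.value;
    ++c.reads;
    if (rs1 != 0) { c.value = old | mask; ++c.writes; }
    r.set(rd, old);
}

// CSRRC: same access rules as CSRRS, but clears the masked bits.
inline void csrrc(Regs& r, Csr& c, unsigned rd, unsigned rs1) {
    const uint64_t mask = r.get(rs1);
    const uint64_t old = c.value;
    ++c.reads;
    if (rs1 != 0) { c.value = old & ~mask; ++c.writes; }
    r.set(rd, old);
}

// Pseudoinstructions expressed through their documented expansions.
inline void csrr(Regs& r, Csr& c, unsigned rd)  { csrrs(r, c, rd, 0); }
inline void csrw(Regs& r, Csr& c, unsigned rs1) { csrrw(r, c, 0, rs1); }
inline void csrs(Regs& r, Csr& c, unsigned rs1) { csrrs(r, c, 0, rs1); }
inline void csrc(Regs& r, Csr& c, unsigned rs1) { csrrc(r, c, 0, rs1); }
```

In this model, `csrrs(r, c, 6, 5)` with `x5` holding zero still increments `writes`, whereas `csrr(r, c, 7)` does not, and `csrw` leaves `reads` untouched. That mirrors the distinction the descriptions draw between rs1=x0 and a register that merely contains zero.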
supervisor
supervisor / svinval
7 “Svinval” Standard Extension for Fine-Grained Address-Translation Cache Invalidation, Version 1.0 /
Operation | Arguments | Description |
hfence.gvma | rs1, rs2 |
The Svinval extension splits SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA instructions into finer-grained invalidation and ordering operations that can be more efficiently batched or pipelined on certain classes of high-performance implementation. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops. |
hfence.vvma | rs1, rs2 |
The Svinval extension splits SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA instructions into finer-grained invalidation and ordering operations that can be more efficiently batched or pipelined on certain classes of high-performance implementation. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops. |
hinval.gvma | rs1, rs2 |
If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops. |
hinval.vvma | rs1, rs2 |
If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA require the same permissions and raise the same exceptions as SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively. In particular, an attempt to execute any of these instructions in U-mode always raises an illegal instruction exception, and an attempt to execute SINVAL.VMA or HINVAL.GVMA in S-mode or HS-mode when mstatus.TVM=1 also raises an illegal instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual instruction exception. When hstatus.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops. |
sfence.inval.ir |
The SINVAL.VMA instruction invalidates any address-translation cache entries that an SFENCE.VMA instruction with the same values of rs1 and rs2 would invalidate. However, unlike SFENCE.VMA, SINVAL.VMA instructions are only ordered with respect to SFENCE.VMA, SFENCE.W.INVAL, and SFENCE.INVAL.IR instructions as defined below. The SFENCE.W.INVAL instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before subsequent SINVAL.VMA instructions executed by the same hart. The SFENCE.INVAL.IR instruction guarantees that any previous SINVAL.VMA instructions executed by the current hart are ordered before subsequent implicit references by that hart to the memory-management data structures. When executed in order (but not necessarily consecutively) by a single hart, the sequence SFENCE.W.INVAL, SINVAL.VMA, and SFENCE.INVAL.IR has the same effect as a hypothetical SFENCE.VMA instruction in which reads and writes prior to the SFENCE.W.INVAL are considered to be those prior to the SFENCE.VMA, and reads and writes following the SFENCE.INVAL.IR are considered to be those subsequent to the SFENCE.VMA. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be trapped when mstatus.TVM=1 or when hstatus.VTVM=1, as they only have ordering effects but no visible side effects. Trapping of the SINVAL.VMA instruction is sufficient to enable emulation of the intended overall TLB maintenance functionality. 
In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops. |
sfence.w.inval |
The SINVAL.VMA instruction invalidates any address-translation cache entries that an SFENCE.VMA instruction with the same values of rs1 and rs2 would invalidate. However, unlike SFENCE.VMA, SINVAL.VMA instructions are only ordered with respect to SFENCE.VMA, SFENCE.W.INVAL, and SFENCE.INVAL.IR instructions as defined below. The SFENCE.W.INVAL instruction guarantees that any previous stores already visible to the current RISC-V hart are ordered before subsequent SINVAL.VMA instructions executed by the same hart. The SFENCE.INVAL.IR instruction guarantees that any previous SINVAL.VMA instructions executed by the current hart are ordered before subsequent implicit references by that hart to the memory-management data structures. When executed in order (but not necessarily consecutively) by a single hart, the sequence SFENCE.W.INVAL, SINVAL.VMA, and SFENCE.INVAL.IR has the same effect as a hypothetical SFENCE.VMA instruction in which reads and writes prior to the SFENCE.W.INVAL are considered to be those prior to the SFENCE.VMA, and reads and writes following the SFENCE.INVAL.IR are considered to be those subsequent to the SFENCE.VMA. If the hypervisor extension is implemented, the Svinval extension also provides two additional instructions: HINVAL.VVMA and HINVAL.GVMA. These have the same semantics as SINVAL.VMA, except that they combine with SFENCE.W.INVAL and SFENCE.INVAL.IR to replace HFENCE.VVMA and HFENCE.GVMA, respectively, instead of SFENCE.VMA. In addition, HINVAL.GVMA uses VMIDs instead of ASIDs. SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be trapped when mstatus.TVM=1 or when hstatus.VTVM=1, as they only have ordering effects but no visible side effects. Trapping of the SINVAL.VMA instruction is sufficient to enable emulation of the intended overall TLB maintenance functionality. 
In typical usage, software will invalidate a range of virtual addresses in the address-translation caches by executing an SFENCE.W.INVAL instruction, executing a series of SINVAL.VMA, HINVAL.VVMA, or HINVAL.GVMA instructions to the addresses (and optionally ASIDs or VMIDs) in question, and then executing an SFENCE.INVAL.IR instruction. High-performance implementations will be able to pipeline the address-translation cache invalidation operations, and will defer any pipeline stalls or other memory ordering enforcement until an SFENCE.W.INVAL, SFENCE.INVAL.IR, SFENCE.VMA, HFENCE.GVMA, or HFENCE.VVMA instruction is executed. Simpler implementations may implement SINVAL.VMA, HINVAL.VVMA, and HINVAL.GVMA identically to SFENCE.VMA, HFENCE.VVMA, and HFENCE.GVMA, respectively, while implementing SFENCE.W.INVAL and SFENCE.INVAL.IR instructions as no-ops. |
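The typical-usage pattern described above (one SFENCE.W.INVAL, a run of SINVAL.VMA instructions, one SFENCE.INVAL.IR) can be sketched as a planner that lists the instructions a range invalidation would issue. This is an illustration only: the `Step` type and `invalidate_range` function are hypothetical, a 4 KiB page size is assumed, and the code emits a description of the sequence rather than executing privileged instructions. ASID selection and the hypervisor variants are omitted.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One planned instruction; vaddr is only meaningful for sinval.vma steps.
struct Step {
    std::string mnemonic;
    uint64_t vaddr;
};

// Plans the invalidation of every page overlapping [start, end).
// Assumes page_size is a power of two (4 KiB by default) and that the loop
// does not wrap around the top of the address space.
inline std::vector<Step> invalidate_range(uint64_t start, uint64_t end,
                                          uint64_t page_size = 4096) {
    std::vector<Step> plan;

    // Order earlier stores (e.g. page-table updates) before the invalidations.
    plan.push_back({"sfence.w.inval", 0});

    // One invalidation per page; implementations may pipeline these.
    for (uint64_t va = start & ~(page_size - 1); va < end; va += page_size) {
        plan.push_back({"sinval.vma", va});
    }

    // Order the invalidations before later implicit page-table references.
    plan.push_back({"sfence.inval.ir", 0});
    return plan;
}
```

For example, `invalidate_range(0x1800, 0x2001)` plans four steps: the opening fence, `sinval.vma` for pages 0x1000 and 0x2000, and the closing fence. Batching this way pays the ordering cost once per range instead of once per page, which is the motivation the description gives for splitting SFENCE.VMA.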
hypervisor
hypervisor / hypervisor-virtual-machine-load-and-store-instructions
5 Hypervisor Extension, Version 1.0 / 5.3 Hypervisor Instructions
Operation | Arguments | Description |
hlv.b | rd, rs1 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation:require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int8_t>(RS1)); |
hlv.bu | rd, rs1 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. |
hlv.d | rd, rs1 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation:require_extension('H'); require_rv64; require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int64_t>(RS1)); |
hlv.h | rd, rs1 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation:require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int16_t>(RS1)); |
hlv.hu | rd, rs1 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection, Section [sec:pmp] , for supervisor level.) |
hlv.w | rd, rs1 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.) Spike ISS Implementation:require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); WRITE_RD(MMU.guest_load<int32_t>(RS1)); |
hlv.wu | rd, rs1 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection, Section [sec:pmp] , for supervisor level.) HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.) |
hlvx.hu | rd, rs1 |
Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection, Section [sec:pmp], for supervisor level.) |
hlvx.wu | rd, rs1 |
Instructions HLVX.HU and HLVX.WU are the same as HLV.HU and HLV.WU, except that execute permission takes the place of read permission during address translation. That is, the memory being read must be executable in both stages of address translation, but read permission is not required. For the supervisor physical address that results from address translation, the supervisor physical memory attributes must grant both execute and read permissions. (The supervisor physical memory attributes are the machine's physical memory attributes as modified by physical memory protection, Section [sec:pmp], for supervisor level.) HLVX.WU is valid for RV32, even though LWU and HLV.WU are not. (For RV32, HLVX.WU can be considered a variant of HLV.W, as sign extension is irrelevant for 32-bit values.) |
hsv.b | rs1, rs2 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint8_t>(RS1, RS2); |
hsv.d | rs1, rs2 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_rv64; require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint64_t>(RS1, RS2); |
hsv.h | rs1, rs2 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint16_t>(RS1, RS2); |
hsv.w | rs1, rs2 |
For every RV32I or RV64I load instruction, LB, LBU, LH, LHU, LW, LWU, and LD, there is a corresponding virtual-machine load instruction: HLV.B, HLV.BU, HLV.H, HLV.HU, HLV.W, HLV.WU, and HLV.D. For every RV32I or RV64I store instruction, SB, SH, SW, and SD, there is a corresponding virtual-machine store instruction: HSV.B, HSV.H, HSV.W, and HSV.D. Instructions HLV.WU, HLV.D, and HSV.D are not valid for RV32, of course. Spike ISS Implementation: require_extension('H'); require_novirt(); require_privilege(get_field(STATE.hstatus->read(), HSTATUS_HU) ? PRV_U : PRV_S); MMU.guest_store<uint32_t>(RS1, RS2); |
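
The rules shared by the HLV, HLVX, and HSV entries above can be summarized in a small model. The following is a minimal sketch, not Spike's code: every name in it (`Priv`, `hyp_access_allowed`, `valid_on_rv32`, `translation_permits`, `PagePerms`, `Kind`) is hypothetical and introduced only for illustration. It assumes the privilege check behaves as the `require_novirt()` and `require_privilege(...)` lines suggest, and it deliberately ignores refinements such as MXR and the physical memory attribute checks.

```cpp
#include <cassert>
#include <string>

// Hypothetical model; names are illustrative and not part of Spike.
enum class Priv { U = 0, S = 1, M = 3 };

// Mirrors require_novirt() plus the require_privilege(...) expression:
// the instruction runs only with V=0, and from U-mode only if hstatus.HU=1.
inline bool hyp_access_allowed(Priv priv, bool virt, bool hstatus_hu) {
    if (virt) return false;                       // require_novirt()
    const Priv needed = hstatus_hu ? Priv::U : Priv::S;
    return static_cast<int>(priv) >= static_cast<int>(needed);
}

// RV32 validity as stated in the text: HLV.WU, HLV.D, and HSV.D are
// not valid for RV32, while HLVX.WU is.
inline bool valid_on_rv32(const std::string& mnemonic) {
    return mnemonic != "hlv.wu" && mnemonic != "hlv.d" && mnemonic != "hsv.d";
}

// Page permissions seen at one stage of address translation.
struct PagePerms { bool r, w, x; };
enum class Kind { HLV, HLVX, HSV };

// Per-stage permission test (simplified, MXR ignored): HLVX substitutes
// execute permission for read permission.
inline bool translation_permits(Kind kind, PagePerms p) {
    switch (kind) {
        case Kind::HLV:  return p.r;
        case Kind::HLVX: return p.x;
        case Kind::HSV:  return p.w;
    }
    return false;
}
```

In this model, the same `translation_permits` test would be applied at both stages of address translation, so an execute-only page passes for HLVX.HU or HLVX.WU but fails for the corresponding HLV instruction.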