Is your feature request related to a problem? Please describe.
While working on #17807, I was frustrated that, for some inputs, the number of iterations required may be much smaller than the worst case. SFPU code must still run the worst-case number of iterations (since there is no SFPU control flow), which hurts performance.
One possible solution is to use RISCV_DEBUG_REG_FPU_STICKY_BITS (a.k.a. special values) as a way for the SFPU to signal to the RISC-V core whether more iterations are needed. This register contains flags indicating that special values (NaN, infinity, denormal) were generated in the outputs of the FPU or SFPU.
For example, after setting condition codes on the SFPU, we can generate a NaN value in the lanes that are still enabled. If any lane is enabled, this should set the special values register.
Then all that remains is for the RISC-V core to wait for an appropriate semaphore signal, check the special values register, and clear it. If the flag was set, the core issues more instructions for the loop; otherwise, it can move on and issue instructions for the next set of operations.
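The polling loop described above can be sketched as a small Python simulation. This only models the protocol under stated assumptions: `SimulatedSfpu`, `run_adaptive` and the `batch` parameter are hypothetical names (not a tt-metal or LLK API), the semaphore wait is reduced to a comment, and the sticky bit is a plain boolean that is assumed to be set reliably, which is exactly the reliability question this issue raises.

```python
from dataclasses import dataclass


@dataclass
class SimulatedSfpu:
    """Toy model of the SFPU (hypothetical; not a real tt-metal/LLK API).

    remaining[i] is how many more iterations lane i needs to converge.
    sticky_nan models the NaN flag in RISCV_DEBUG_REG_FPU_STICKY_BITS.
    """
    remaining: list[int]
    sticky_nan: bool = False

    def step(self) -> None:
        # One loop iteration is issued to every lane (no SFPU control flow);
        # condition codes mask off lanes that have already converged.
        self.remaining = [r - 1 if r > 0 else 0 for r in self.remaining]

    def signal_if_any_lane_enabled(self) -> None:
        # Set condition codes to "lane still needs work", then write a NaN
        # in the enabled lanes; any such write sets the sticky bit.
        if any(r > 0 for r in self.remaining):
            self.sticky_nan = True


def run_adaptive(sfpu: SimulatedSfpu, batch: int, worst_case: int) -> int:
    """RISC-V side: issue iterations in batches until no lane signals.

    Returns the number of iterations actually issued, which never exceeds
    worst_case (the count a fixed-length loop would always issue).
    """
    issued = 0
    while issued < worst_case:
        for _ in range(min(batch, worst_case - issued)):
            sfpu.step()
            issued += 1
        sfpu.signal_if_any_lane_enabled()
        # On hardware, the core would first wait on a semaphore so that the
        # SFPU work (and the flag update) is complete before reading the flag.
        more = sfpu.sticky_nan
        sfpu.sticky_nan = False  # clear before the next round
        if not more:
            break
    return issued


sfpu = SimulatedSfpu(remaining=[3, 5, 2, 7])
print(run_adaptive(sfpu, batch=4, worst_case=32))  # prints 8
```

Batching trades some wasted SFPU work (up to `batch - 1` extra iterations in the final round) against the cost of each round-trip to the RISC-V core; a batch size of 1 minimizes wasted iterations but maximizes polling overhead.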
Describe the solution you'd like
The above feels like a bit of a "hack". Also, while testing, I found that the sticky bits weren't always set properly, although I've had difficulty reproducing this in a standalone test case. One workaround was to issue two NaN-generating instructions, but it's unclear why this works.
Is this method 100% reliable? What latency guarantees are there for these flags to be readable by the RISC-V core?
It would be great to have an officially supported/documented method for doing this.
Describe alternatives you've considered
The above special values register hack.