You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2022/04/22 11:22:20 UTC
[GitHub] [incubator-nuttx] pussuw opened a new pull request, #6134: RISC-V: Fix system crash when FPU is in use
pussuw opened a new pull request, #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134
FPU registers need to be written prior to updating CSR_STATUS
## Summary
Fixes full system crash due to illegal instruction when accessing FPU when it is too late
## Impact
Fixes system crash
## Testing
icicle:nsh and icicle:knsh, ostest fails at once prior fix, passes with fix
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1110343817
@pussuw
>I confirmed that maix-bit:smp works on real hardware.
I noticed that fpu_test works ** without ** this PR on maix-bit (real hardware).
So I think that something is wrong with your icicle.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] pussuw commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
pussuw commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109545726
I have never used any emulated environment so obviously I did not notice this issue. I ran several tests with FPU enabled/disabled and with flat/protected mode, and both work with this patch, but at least protected mode fails randomly without.
I'll continue monitoring this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109525750
>@masayuki2009 I am not able to reproduce on hardware (mpfs icicle:nsh):
@pussuw
I confirmed that `maix-bit:smp` works on real hardware.
```
nsh> uname -a
NuttX 10.3.0-RC0 aaa5316235 Apr 26 2022 17:30:16 risc-v maix-bit
nsh> mount
/proc type procfs
nsh> ps
PID GROUP CPU PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 0 FIFO Kthread N-- Assigned 00000000 002000 001072 53.6% CPU0 IDLE
1 1 1 0 FIFO Kthread N-- Running 00000000 002000 000592 29.6% CPU1 IDLE
2 2 0 100 RR Task --- Running 00000000 003040 002256 74.2% nsh_main
nsh> free
total used free largest nused nfree
Umem: 2079920 11984 2067936 2067936 38 1
...
nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
stdio_test: Standard I/O Check: fprintf to stderr
ostest_main: putenv(Variable1=BadValue3)
ostest_main: setenv(Variable1, GoodValue1, TRUE)
ostest_main: setenv(Variable2, BadValue1, FALSE)
ostest_main: setenv(Variable2, GoodValue2, TRUE)
ostest_main: setenv(Variable3, Variable3, FALSE)
ostest_main: setenv(Variable3, Variable3, FALSE)
show_variable: Variable=Variable1 has value=GoodValue1
show_variable: Variable=Variable2 has value=GoodValue2
show_variable: Variable=Variable3 has value=GoodValue3
ostest_main: Started user_main at PID=24
...
user_main: FPU test
Starting task FPU#1
FPU#1: pass 1
fpu_test: Started task FPU#1 at PID=25
Starting task FPU#2
FPU#2: pass 1
fpu_test: Started task FPU#2 at PID=26
FPU#1: pass 2
FPU#2: pass 2
FPU#1: pass 3
FPU#2: pass 3
FPU#1: pass 4
FPU#2: pass 4
FPU#1: pass 5
FPU#2: pass 5
FPU#1: pass 6
FPU#2: pass 6
FPU#1: pass 7
FPU#2: pass 7
FPU#1: pass 8
FPU#2: pass 8
FPU#1: pass 9
FPU#2: pass 9
FPU#1: pass 10
FPU#2: pass 10
FPU#1: pass 11
FPU#2: pass 11
FPU#1: pass 12
FPU#2: pass 12
FPU#1: pass 13
FPU#2: pass 13
FPU#1: pass 14
FPU#2: pass 14
FPU#1: pass 15
FPU#2: pass 15
FPU#1: pass 16
FPU#2: pass 16
FPU#1: Succeeded
FPU#2: Succeeded
fpu_test: Returning
....
```
So the issue might only happen with `rv-virt:nsh64` on QEMU.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109173828
@pussuw
I noticed that fpu_test with rv-virt:nsh64 on QEMU failed.
If I revert this PR, it works.
$ ~/opensource/QEMU/qemu-6.2/build/riscv64-softmmu/qemu-system-riscv64 -nographic -cpu rv64 -smp 8 -M virt -bios none -kernel ./nuttx
```
user_main: FPU test
Starting task FPU#1
fpu_test: Started task FPU#1 at PID=4
riscv_exception: EXCEPTION: Illegal instruction. MCAUSE: 0000000000000002
riscv_exception: PANIC!!! Exception = 0000000000000002
up_assert: Assertion failed at file:common/riscv_exception.c line: 89 task: FPU#1
riscv_registerdump: EPC: 000000008000f574
riscv_registerdump: A0: 0000000000000000 A1: 0000000080025f30 A2: 0000000080025f30 A3: 000000008000f518
riscv_registerdump: A4: 0000000000000020 A5: 0000000000000440 A6: 0000000000000000 A7: 0000000000000000
riscv_registerdump: T0: 0000000000000000 T1: 0000000000000000 T2: 0000000000000000 T3: 0000000000000000
riscv_registerdump: T4: 0000000000000000 T5: 0000000000000000 T6: 0000000000000000
riscv_registerdump: S0: 0000000000000000 S1: 0000000000000001 S2: 0000000000000000 S3: 0000000000000000
riscv_registerdump: S4: 0000000000000000 S5: 000000008001f608 S6: 0000000000000000 S7: 0000000000000000
riscv_registerdump: S8: 0000000000000000 S9: 0000000000000000 S10: 0000000000000000 S11: 0000000000000000
riscv_registerdump: SP: 0000000080026670 FP: 0000000000000000 TP: 0000000000000000 RA: 000000008000f566
riscv_dumpstate: sp: 000000008001ea70
riscv_dumpstate: IRQ stack:
riscv_dumpstate: base: 000000008001e310
riscv_dumpstate: size: 0000000000000800
riscv_stackdump: 000000008001ea60: 000007e0 00000000 80004baa 00000000 00000059 00000000 80015478 00000000
riscv_stackdump: 000000008001ea80: 00000000 00000000 00000000 00000000 00000000 00000000 80026460 00000000
riscv_stackdump: 000000008001eaa0: 00000002 00000000 8000307c 00000000 00000009 00000000 800006a4 00000000
riscv_stackdump: 000000008001eac0: 8001ff18 00000000 00000001 00000000 8001ff18 00000000 80000aa8 00000000
riscv_stackdump: 000000008001eae0: deadbeef deadbeef 80000400 00000000 00000002 00000000 800003c2 00000000
riscv_dumpstate: sp: 0000000080026670
riscv_dumpstate: User stack:
riscv_dumpstate: base: 0000000080025f50
riscv_dumpstate: size: 00000000000007e0
riscv_dumpstate: User Stack
riscv_stackdump: 0000000080026660: 00000000 00000000 8000f566 00000000 8001ec38 00000000 00000c00 00000000
riscv_stackdump: 0000000080026680: 80005082 00000000 00000000 00000000 80021540 00000000 80000bec 00000000
riscv_stackdump: 00000000800266a0: 80021580 00000000 80000fb0 00000000 80021580 00000000 80000cfa 00000000
riscv_stackdump: 00000000800266c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
riscv_stackdump: 00000000800266e0: 00000000 00000000 00000000 00000000 8000f518 00000000 80003160 00000000
riscv_stackdump: 0000000080026700: 80025f30 00000000 00000001 00000000 00000000 00000000 8000100a 00000000
riscv_showtasks: PID PRI USED STACK FILLED COMMAND
riscv_showtasks: ---- ---- 648 2048 31.6% irq
riscv_dump_task: 0 0 1040 2000 52.0% Idle Task
riscv_dump_task: 1 100 1616 3040 53.1% nsh_main
riscv_dump_task: 2 100 1328 2016 65.8% ostest
riscv_dump_task: 3 100 1440 8112 17.7% ostest
riscv_dump_task: 4 100 720 2016 35.7% FPU#1
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] pussuw commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
pussuw commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1301895125
@masayuki2009 It has been a while but I was revisiting the context saving code and remembered you had an issue with FPU on QEMU. I found at least one issue, that might be the cause for my crashes:
The int/fpu register restoration ordering in exception_common:
```
load_ctx sp
riscv_loadfpu sp
```
is bad, because riscv_loadfpu uses registers t0,t1,t2
```
.macro riscv_loadfpu out
#ifdef CONFIG_ARCH_FPU
REGLOAD t0, REG_INT_CTX(\out)
li t1, MSTATUS_FS
and t2, t0, t1
li t1, MSTATUS_FS_INIT
ble t2, t1, skip_load_fpu
```
So the bad ordering restores the original integer registers including t0,t1,t2, and then these are immediately destroyed by riscv_loadfpu.
One potential fix would be use the "bad" ordering above, but to restore t0,t1,t2 from the context after the FPU context is restored:
```
skip_load_fpu:
... restore t0,t1,t2
```
I will try this again when I have time, but if you ever get the urge to get FPU working on QEMU this might be a potential fix.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] pussuw commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
pussuw commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1110623077
@masayuki2009
I also did testing on icicle:nsh (BUILD_FLAT) and it works with and without this patch.
However, with icicle:knsh (BUILD_PROTECTED) the issue is 100% reproducible. FPU gets enabled, but ostest does not run fpu_test as it only compiles with build flat. With knsh, the system crashes at random locations during ostest, which has never completed successfully **without** this patch, and I have not seen an error **with* this patch.
The way I found this (potential) issue, is by bisecting:
5b6dd876b87196cc82dd45258d77e22d0c8b033b <- good commit (risc-v/riscv_assert.c Fix dumping of status from ISR)
c29a3b7bd89c45d3781133679ae875e0e835bdd4 <- bad commit (pipe: Increase buffer size by one byte to ompensate the full indicator)
So somewhere between those two commits there is an issue.
48b81bda0904a426360fa809fd747e3966d29024 <- (arch/risc-v: Change riscv_savefpu/riscv_loadfpu to macro) by partially reverting this commit the crash goes away. The partial revert is to move loadfpu back to where it was prior to 48b8...
So this is how I made this patch. Of course there can be some other issue and I just masked it with this patch. One clear candidate can be `riscv_get_newintctx()`, because ostest creates / destroys new tasks for certain tests
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] xiaoxiang781216 merged pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 merged PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] pussuw commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
pussuw commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109396888
@masayuki2009 I'll run fpu test on hardware. I get an immediate crash due to illegal instruction if this patch is not in place, so the exact opposite behavior to yours.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] pussuw commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
pussuw commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109419451
@masayuki2009 I am not able to reproduce on hardware (mpfs icicle:nsh):
```
user_main: FPU test
Starting task FPU#1
fpu_test: Started task FPU#1 at PID=6
FPU#1: pass 1
Starting task FPU#2
fpu_test: Started task FPU#2 at PID=7
FPU#2: pass 1
FPU#1: pass 2
FPU#2: pass 2
FPU#1: pass 3
FPU#2: pass 3
FPU#1: pass 4
FPU#2: pass 4
FPU#1: pass 5
FPU#2: pass 5
FPU#1: pass 6
FPU#2: pass 6
FPU#1: pass 7
FPU#2: pass 7
FPU#1: pass 8
FPU#2: pass 8
FPU#1: pass 9
FPU#2: pass 9
FPU#1: pass 10
FPU#2: pass 10
FPU#1: pass 11
FPU#2: pass 11
FPU#1: pass 12
FPU#2: pass 12
FPU#1: pass 13
FPU#2: pass 13
FPU#1: pass 14
FPU#2: pass 14
FPU#1: pass 15
FPU#2: pass 15
FPU#1: pass 16
FPU#2: pass 16
FPU#1: Succeeded
FPU#2: Succeeded
fpu_test: Returning
user_main: Exiting
ostest_main: Exiting with status 0
nsh>
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-nuttx] pussuw commented on pull request #6134: RISC-V: Fix system crash when FPU is in use
Posted by GitBox <gi...@apache.org>.
pussuw commented on PR #6134:
URL: https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109449088
I'm wondering if this is somehow related:
https://github.com/apache/incubator-nuttx/pull/6125
This PR:
https://github.com/apache/incubator-nuttx/pull/6101
Changed the order in which FPU and CSR_STATUS are restored, 6134 (this PR) basically just restores the original ordering. In my case, without 6134 the system crashes totally randomly, not even necessarily during the FPU test. With 6134 I don't get crashes and FPU test also works fine.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org