I have been experiencing frequent OS crashes since around September last year, back when I was still running Kubuntu, and I wasn't able to pin them down. I filled several pages over at the Kubuntu Forum and spent hours with a Dell support guy but never got to the bottom of the issue.
Part of my quest to get this issue solved was moving to Manjaro, and I got stuck here because I liked what I saw and it seemed to be a lot more resilient to this issue, with a crash maybe once a week. I only recently stumbled over the likely root cause; this is exactly what I see happening:
Once I found this I started to play around with the nvme_core.default_ps_max_latency_us kernel parameter, but if anything things are worse than before, no matter whether I set it to 0, 220, 5500 or leave it out.
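For anyone who wants to double-check which value the running kernel actually booted with, here is a minimal sketch. The helper name ps_latency_from_cmdline is made up for illustration, and the sysfs path is the usual location for module parameters, which I am assuming nvme_core exposes on your kernel as well.

```shell
#!/bin/sh
# Hypothetical helper: print the value of nvme_core.default_ps_max_latency_us
# found in a kernel command line string, or "unset" if it is absent.
# If the parameter appears more than once, the last occurrence is reported.
ps_latency_from_cmdline() {
    value="unset"
    for word in $1; do
        case "$word" in
            nvme_core.default_ps_max_latency_us=*)
                value="${word#*=}"
                ;;
        esac
    done
    echo "$value"
}

# What the running kernel was booted with.
ps_latency_from_cmdline "$(cat /proc/cmdline 2>/dev/null)"

# What the nvme_core module reports (assumed path; readable only if the
# module is loaded and exposes the parameter).
param=/sys/module/nvme_core/parameters/default_ps_max_latency_us
if [ -r "$param" ]; then
    cat "$param"
else
    echo "nvme_core parameter not exposed on this system"
fi
```

If the two values disagree, the boot loader configuration probably was not regenerated after the edit.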
Actually, today was the first time I was able to reliably reproduce the error (at least for a while) by running a backup to a second internal drive. The OS reliably (sic) crashed, no matter what I set in
I can see the changes are picked up by running
sudo nvme get-feature -f 0x0c -H /dev/nvme0
get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
Autonomous Power State Transition Enable (APSTE): Enabled
Auto PST Entries .................
Idle Time Prior to Transition (ITPT): 86 ms
Idle Transition Power State (ITPS): 3
Which gives me Enabled when a value >0 or nothing is set. When I set nvme_core.default_ps_max_latency_us=0 this will switch to
The kernel version doesn't seem to matter: I tried 4.20 (my default), 4.19 and 4.14 today and got crashes with all of them.
I had probably 20 crashes today, most of them forced via the backup, but now things are stable again for 2 hours, including running a backup!?
The unpredictability of this bug is driving me mad, and I've got a few presentations coming up where I can't have a crashing system. I think I will open a case with Dell to get this nasty NVMe drive replaced, but before I do so I wanted to quickly check whether anyone here has any other ideas of what to check.
And because I know you will ask, here's my inxi:
inxi -D
Drives:    Local Storage: total: 2.29 TiB used: 1.04 TiB (45.6%)
           ID-1: /dev/nvme0n1 vendor: Samsung model: SM961 NVMe 512GB size: 476.94 GiB
           ID-2: /dev/sda vendor: Seagate model: ST2000LM015-2E8174 size: 1.82 TiB
thomas@hermes:~$ inxi -b
System:    Host: hermes Kernel: 4.20.3-1-MANJARO x86_64 bits: 64 Desktop: KDE Plasma 5.14.5 Distro: Manjaro Linux
Machine:   Type: Laptop System: Dell product: Precision 7510 v: N/A serial: <root required>
           Mobo: Dell model: 0M1YNP v: A00 serial: <root required> UEFI: Dell v: 1.16.3 date: 09/12/2018
Battery:   ID-1: BAT0 charge: 41.8 Wh condition: 66.5/91.0 Wh (73%)
CPU:       Quad Core: Intel Core i7-6920HQ type: MT MCP speed: 825 MHz min/max: 800/3800 MHz
Graphics:  Device-1: Intel HD Graphics 530 driver: i915 v: kernel
           Device-2: NVIDIA GM107GLM [Quadro M1000M] driver: nouveau v: kernel
           Display: x11 server: X.Org 1.20.3 driver: intel,nouveau unloaded: modesetting resolution: 1920x1080~60Hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 530 (Skylake GT2) v: 4.5 Mesa 18.3.2
Network:   Device-1: Intel Ethernet I219-LM driver: e1000e
           Device-2: Intel Wireless 8260 driver: iwlwifi
Drives:    Local Storage: total: 2.29 TiB used: 1.04 TiB (45.6%)
Info:      Processes: 319 Uptime: 1h 32m Memory: 15.56 GiB used: 2.92 GiB (18.8%) Shell: bash inxi: 3.0.30