System crashes due to SSD

Hello everyone!
I recently bought a Kingston A2000 (1 TB) and did a fresh install of Manjaro KDE. At first everything was fine, but after a while the system started to freeze for no apparent reason. Initially I thought it was KDE (there are still some warnings in KSystemLog about kwin_x11, krunner, and plasmashell), but I wasn't able to switch to a TTY, and during one of these crashes I saw this:
[Image: photo_2020-07-26_10-19-08]
Searching the internet, I found reports of some kernel bugs:
https://bbs.archlinux.org/viewtopic.php?id=243390
https://bugzilla.kernel.org/show_bug.cgi?id=204887
But I don't know whether this bug has been fixed yet.
I tried several different kernels (linux-pf, linux-xanmod, and the stock Manjaro 5.7.9-1), but the crashes are still there.
I also tried nvme_core.default_ps_max_latency_us=0 and nvme_core.default_ps_max_latency_us=5500, but the system still crashes from time to time.
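To check whether the parameter actually reached the running kernel, something like the sketch below may help. It assumes bash and the usual /proc/cmdline and /sys/module/nvme_core paths; cmdline_param is a hypothetical helper name, not an existing tool. Note that the parameters line in the inxi output further down lists no nvme_core option, so that particular boot may not have had it applied.

```shell
#!/usr/bin/env bash
# Print the value of one kernel parameter from a command-line string.
# Usage: cmdline_param NAME [CMDLINE]; CMDLINE defaults to /proc/cmdline.
# If the parameter appears more than once, the last occurrence is printed.
cmdline_param() {
    local name=$1 cmdline=${2-} word value found=1
    local -a words
    if [ -z "$cmdline" ] && [ -r /proc/cmdline ]; then
        cmdline=$(cat /proc/cmdline)
    fi
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        case $word in
            "$name"=*) value=${word#*=}; found=0 ;;
        esac
    done
    if [ "$found" -eq 0 ]; then
        printf '%s\n' "$value"
    fi
    return "$found"
}

param=nvme_core.default_ps_max_latency_us
if value=$(cmdline_param "$param"); then
    echo "booted with $param=$value"
else
    echo "$param is not on the running kernel's command line"
fi

# The value the nvme_core module is currently using, if it is loaded.
sysfs=/sys/module/nvme_core/parameters/default_ps_max_latency_us
if [ -r "$sysfs" ]; then
    echo "nvme_core currently uses: $(cat "$sysfs")"
fi
```

If the first message says the parameter is missing, the bootloader entry for that kernel probably was not regenerated after editing the configuration.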
Here is the output of inxi -Fxxxza --no-host:

System:    Kernel: 5.7.6-pf3-1 x86_64 bits: 64 compiler: gcc v: 10.1.0 
           parameters: BOOT_IMAGE=/boot/vmlinuz-linux-pf root=UUID=7c8c852b-f4d7-4293-991d-301797f6b1c7 rw quiet apparmor=1 
           security=apparmor udev.log_priority=3 
           Desktop: KDE Plasma 5.19.3 tk: Qt 5.15.0 info: latte-dock wm: kwin_x11 dm: SDDM Distro: Manjaro Linux 
Machine:   Type: Laptop System: ASUSTeK product: TUF Gaming FX705DY_FX705DY v: 1.0 serial: <filter> 
           Mobo: ASUSTeK model: FX705DY v: 1.0 serial: <filter> UEFI: American Megatrends v: FX705DY.315 date: 03/09/2020 
Battery:   ID-1: BAT0 charge: 57.8 Wh condition: 57.8/64.0 Wh (90%) volts: 15.6/15.6 model: ASUSTeK ASUS Battery type: Li-ion 
           serial: <filter> status: Not charging 
CPU:       Topology: Quad Core model: AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx bits: 64 type: MT MCP arch: Zen+ 
           family: 17 (23) model-id: 18 (24) stepping: 1 microcode: 8108102 L2 cache: 2048 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 33538 
           Speed: 2315 MHz min/max: 1400/2100 MHz boost: enabled Core speeds (MHz): 1: 2125 2: 2110 3: 1610 4: 1559 5: 1308 
           6: 1441 7: 1543 8: 1497 
           Vulnerabilities: Type: itlb_multihit status: Not affected 
           Type: l1tf status: Not affected 
           Type: mds status: Not affected 
           Type: meltdown status: Not affected 
           Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP: disabled, RSB filling 
           Type: srbds status: Not affected 
           Type: tsx_async_abort status: Not affected 
Graphics:  Device-1: AMD Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] vendor: ASUSTeK driver: amdgpu 
           v: kernel bus ID: 01:00.0 chip ID: 1002:67ef 
           Device-2: Advanced Micro Devices [AMD/ATI] Picasso vendor: ASUSTeK driver: amdgpu v: kernel bus ID: 05:00.0 
           chip ID: 1002:15d8 
           Display: x11 server: X.Org 1.20.8 driver: amdgpu FAILED: ati unloaded: modesetting alternate: fbdev,vesa 
           compositor: kwin_x11 resolution: 1920x1080~60Hz 
           OpenGL: renderer: AMD RAVEN (DRM 3.37.0 5.7.6-pf3-1 LLVM 10.0.0) v: 4.6 Mesa 20.1.3 direct render: Yes 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio vendor: ASUSTeK 
           driver: snd_hda_intel v: kernel bus ID: 05:00.1 chip ID: 1002:15de 
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel 
           bus ID: 05:00.6 chip ID: 1022:15e3 
           Sound Server: ALSA v: k5.7.6-pf3-1 
Network:   Device-1: Realtek RTL8821CE 802.11ac PCIe Wireless Network Adapter vendor: AzureWave driver: rtl8821ce v: N/A 
           port: e000 bus ID: 03:00.0 chip ID: 10ec:c821 
           IF: wlp3s0 state: down mac: <filter> 
           Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASUSTeK driver: r8169 v: kernel port: d000 
           bus ID: 04:00.0 chip ID: 10ec:8168 
           IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 1.82 TiB used: 252.11 GiB (13.5%) 
           ID-1: /dev/nvme0n1 vendor: Kingston model: SA2000M81000G size: 931.51 GiB block size: physical: 512 B 
           logical: 512 B speed: 31.6 Gb/s lanes: 4 serial: <filter> rev: S5Z42105 scheme: GPT 
           ID-2: /dev/sda vendor: Toshiba model: MQ04ABF100 size: 931.51 GiB block size: physical: 4096 B logical: 512 B 
           speed: 6.0 Gb/s rotation: 5400 rpm serial: <filter> rev: 0J scheme: GPT 
Partition: ID-1: / raw size: 100.00 GiB size: 97.93 GiB (97.93%) used: 20.20 GiB (20.6%) fs: ext4 dev: /dev/nvme0n1p1 
           ID-2: /home raw size: 831.51 GiB size: 817.46 GiB (98.31%) used: 231.88 GiB (28.4%) fs: ext4 dev: /dev/nvme0n1p2 
Sensors:   System Temperatures: cpu: 48.5 C mobo: N/A 
           Fan Speeds (RPM): cpu: 2100 
           GPU: device: amdgpu temp: 44 C device: amdgpu temp: 48 C 
Info:      Processes: 286 Uptime: 49m Memory: 15.39 GiB used: 3.27 GiB (21.3%) Init: systemd v: 245 Compilers: gcc: 10.1.0 
           Shell: bash v: 5.0.18 running in: konsole inxi: 3.0.37

GSmartControl info

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.6-pf3-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KINGSTON SA2000M81000G
Serial Number:                      50026B7683D01F40
Firmware Version:                   S5Z42105
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            379,850,522,624 [379 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 683d01f405
Local Time is:                      Sun Jul 26 10:49:51 2020 MSK
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
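As a rough reading of the power state table above: the limit given in nvme_core.default_ps_max_latency_us decides which non-operational states (Op column "-") the kernel may use for autonomous transitions. The sketch below assumes the kernel compares each state's exit latency (Ex_Lat) against the limit, which is a simplification and is not confirmed anywhere in this thread; apst_allowed_states is a hypothetical helper name. Under that assumption a limit of 5500 still permits state 3 and only rules out state 4, while 0 permits none.

```shell
#!/usr/bin/env bash
# List the non-operational power states whose exit latency fits under a limit.
# Reads a "Supported Power States" table in smartctl's layout on stdin.
# Columns: St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
apst_allowed_states() {
    local limit_us=$1
    awk -v limit="$limit_us" '
        # Data rows start with a state number; "-" in column 2 marks a
        # non-operational state; the last column is the exit latency in us.
        $1 ~ /^[0-9]+$/ && $2 == "-" && $NF + 0 <= limit + 0 { print $1 }
    '
}

# The table exactly as posted above.
power_states='St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000'

printf '%s\n' "$power_states" | apst_allowed_states 5500    # prints: 3
printf '%s\n' "$power_states" | apst_allowed_states 15000   # prints: 3 and 4
printf '%s\n' "$power_states" | apst_allowed_states 0       # prints nothing
```

So if the freezes are related to low-power states at all, trying both values makes sense: 5500 only tests whether state 4 is the problem, while 0 takes state 3 out of the picture as well.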

I can provide any other info that would help to fix this. Thank you!
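Since a photo of the screen is hard to search, the kernel messages from the boot that froze may be more useful to post. A minimal sketch, assuming bash and journalctl with a persistent journal (otherwise the previous boot is not available); nvme_log_lines is a hypothetical helper name and the patterns are only a starting guess. If the SSD itself dropped out during the freeze, the last messages may never have reached the disk.

```shell
#!/usr/bin/env bash
# Keep only log lines on stdin that mention NVMe or I/O errors.
# "|| true" keeps the function from failing when nothing matches.
nvme_log_lines() {
    grep -iE 'nvme|I/O error' || true
}

if command -v journalctl >/dev/null 2>&1; then
    # -k: kernel messages only; -b -1: the previous boot.
    journalctl -k -b -1 --no-pager 2>/dev/null | nvme_log_lines | tail -n 40 || true
fi
```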

Hi!
Did you try updating the kernel to 5.7.9?

You mean a custom 5.7.9? I can't update any of my custom kernels because the system hangs while compiling them.

You could try switching to the unstable branch and running a full update.

sudo pacman-mirrors -aS unstable -U https://manjaro.moson.eu && sudo pacman -Syyu

This will bring you to 5.7.10.

My system is Intel-based, so I can't be of much help otherwise. I have heard of this 'bug' with NVMe devices but, luckily, I have never been affected by it.

I don't think this would help me. When I install something small from the AUR or the official repos (it doesn't matter which), everything goes fine, but when I try to install (and compile) something bigger, like a custom kernel, the system just hangs about three quarters of the way through.

It sounds like a memory or cache issue. Could the /tmp directory be filling up? Try changing the /tmp mount from tmpfs to a directory on disk.
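Whether /tmp is actually a tmpfs, and how full it is, can be checked before changing anything. A minimal sketch, assuming bash and a Linux /proc/mounts; mount_fstype is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the filesystem type mounted at a mount point, reading a file in
# /proc/mounts format. Prints "none" if the path is not itself a mount point.
# Usage: mount_fstype MOUNTPOINT [MOUNTS_FILE]
mount_fstype() {
    local target=$1 mounts=${2:-/proc/mounts}
    awk -v t="$target" '
        $2 == t { fs = $3 }                      # later mounts override earlier ones
        END     { print (fs == "" ? "none" : fs) }
    ' "$mounts"
}

if [ -r /proc/mounts ]; then
    fstype=$(mount_fstype /tmp)
    case $fstype in
        tmpfs) echo "/tmp is a RAM-backed tmpfs" ;;
        none)  echo "/tmp is a plain directory on its parent filesystem" ;;
        *)     echo "/tmp is a separate $fstype mount" ;;
    esac
    # Show how much of /tmp is in use right now.
    df -h /tmp || true
fi
```

If it does turn out to be a tmpfs, on systemd-based installs `sudo systemctl mask tmp.mount` followed by a reboot is one way to keep /tmp on disk; treat that as an assumption to verify for your setup.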

I don't think it's related to that, because the system can also hang when I'm shutting down or rebooting the laptop. It may also hang during gaming or even right after logging in. But I'll give your advice a try.

You are probably right; it was just worth mentioning. I understand why you are puzzled. I am too, because I have never experienced an issue like this; I only read about it briefly some time ago.

Dumb question: have you tried doing the kernel update from a TTY? I had a similar issue with my Western Digital NVMe drive, and that is how I had to do a kernel update (4.19). I still have to use the kernel argument you mentioned above.

Nope, but I'll give it a try! Did that kernel argument help you?

Yes, it did. I use the nvme_core.default_ps_max_latency_us=5500 argument. You should be able to drop to a TTY right at the login screen by pressing Ctrl+Alt+F3. If a new kernel doesn't work, try uninstalling TLP. It may be putting the drive into a "standby or sleep" state.

I am on my phone so I hope the formatting is correct. I had to do the above process over a year ago. Currently I am running the latest Linux-Zen kernel and I am not having any issues.

BTW, my laptop is an Asus FX504G.

Do I need to uninstall TLP or replace it with something else? I've already turned off all sleep/hibernate options in the system power management settings.
So you're saying that you don't have issues like mine with the same (or almost the same) hardware?

My laptop is a generation behind yours using an Intel processor. I had some major problems when I first got it over a year ago, but they have all been resolved now.

I would just uninstall TLP and not replace it with another program. If that doesn't work, you can always reinstall it.

Also, make sure that you are using the latest BIOS. You can get to it by holding down the Esc key when you first power on your laptop. This will bring up a "boot" menu. Choose "setup". Asus has a real nice utility that allows you to update the BIOS from within the BIOS setup itself. Just make sure to disable secure boot again after updating.

If memory serves, those settings do not affect TLP. TLP has to be configured separately, either from the terminal or using the program TLPUI.

Like I said, I hope I am remembering correctly.

Sorry about the double post.

I checked the BIOS version: I have 315 (the latest version on the official website).
I checked /etc/default/grub: it had nvme_core.default_ps_max_latency_us="5500". I removed the quotes and tried again with and without TLP (both from a TTY and in a graphical session), but the system still hangs while compiling anything large. Are there any special packages for SSDs?
Besides, I have the zram and zswap packages installed, so I can't be running out of RAM or anything like that.
I don't want to switch to another distro (except maybe pure Arch, if that would fix this), but this problem is ruining both my work and my gaming.

In pamac the cache is 0, and there are only 1.5 GB of AUR files in the tmp directory.

