Nvidia Optimus laptop using PRIME fails to resume from suspend: GPU has fallen off the bus

Hello,

I installed the video-hybrid-intel-nvidia-440xx-prime configuration using mhwd and everything seems to work fine except for one thing: resuming from suspend results in a black screen.

Laptop: DELL XPS 15 7590 (Intel UHD Graphics 630 + NVIDIA GeForce GTX 1650 Mobile)

inxi -Fxxxza --no-host:

System:    Kernel: 5.4.20-1-MANJARO x86_64 bits: 64 compiler: gcc v: 9.2.1 
           parameters: BOOT_IMAGE=/boot/vmlinuz-5.4-x86_64 
           root=UUID=a6df5f56-6a54-45ee-893c-f4ea4c4be361 rw quiet 
           resume=UUID=1c4aa7ee-8b84-49f7-8f0b-50758729ea73 udev.log_priority=3 
           Desktop: KDE Plasma 5.18.0 tk: Qt 5.14.1 info: latte-dock wm: kwin_x11 dm: SDDM 
           Distro: Manjaro Linux 
Machine:   Type: Laptop System: Dell product: XPS 15 7590 v: N/A serial: <filter> Chassis: type: 10 
           serial: <filter> 
           Mobo: Dell model: 0VYV0G v: A00 serial: <filter> UEFI: Dell v: 1.5.0 date: 12/25/2019 
Battery:   ID-1: BAT0 charge: 28.5 Wh condition: 96.2/97.0 Wh (99%) volts: 11.1/11.4 
           model: SMP DELL GPM0365 type: Li-ion serial: <filter> status: Discharging 
CPU:       Topology: 6-Core model: Intel Core i7-9750H bits: 64 type: MT MCP arch: Kaby Lake family: 6 
           model-id: 9E (158) stepping: A (10) microcode: CA L2 cache: 12.0 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 62431 
           Speed: 900 MHz min/max: 800/4500 MHz Core speeds (MHz): 1: 900 2: 900 3: 900 4: 900 5: 900 
           6: 900 7: 900 8: 900 9: 900 10: 900 11: 900 12: 900 
           Vulnerabilities: Type: itlb_multihit status: KVM: Split huge pages 
           Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable 
           Type: mds mitigation: Clear CPU buffers; SMT vulnerable 
           Type: meltdown mitigation: PTI 
           Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: 
           conditional, RSB filling 
           Type: tsx_async_abort status: Not affected 
Graphics:  Device-1: Intel UHD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 
           chip ID: 8086:3e9b 
           Device-2: NVIDIA TU117M [GeForce GTX 1650 Mobile / Max-Q] vendor: Hewlett-Packard 
           driver: nvidia v: 440.59 bus ID: 01:00.0 chip ID: 10de:1f91 
           Display: x11 server: X.Org 1.20.7 driver: modesetting,nvidia 
           alternate: fbdev,intel,nouveau,nv,vesa compositor: kwin_x11 tty: N/A 
           OpenGL: renderer: Mesa DRI Intel UHD Graphics 630 (Coffeelake 3x8 GT2) v: 4.6 Mesa 19.3.4 
           compat-v: 3.0 direct render: Yes 
Audio:     Device-1: Intel Cannon Lake PCH cAVS vendor: Dell driver: snd_hda_intel v: kernel 
           bus ID: 00:1f.3 chip ID: 8086:a348 
           Sound Server: ALSA v: k5.4.20-1-MANJARO 
Network:   Device-1: Intel Wi-Fi 6 AX200 vendor: Bigfoot Networks driver: iwlwifi v: kernel port: 3000 
           bus ID: 3b:00.0 chip ID: 8086:2723 
           IF: wlp59s0 state: up mac: <filter> 
Drives:    Local Storage: total: 476.94 GiB used: 77.27 GiB (16.2%) 
           ID-1: /dev/nvme0n1 vendor: SK Hynix model: PC601 NVMe 512GB size: 476.94 GiB block size: 
           physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 serial: <filter> rev: 80002111 
           scheme: GPT 
Partition: ID-1: / raw size: 459.83 GiB size: 451.62 GiB (98.21%) used: 77.23 GiB (17.1%) fs: ext4 
           dev: /dev/nvme0n1p2 
           ID-2: swap-1 size: 16.81 GiB used: 0 KiB (0.0%) fs: swap swappiness: 60 (default) 
           cache pressure: 100 (default) dev: /dev/nvme0n1p3 
Sensors:   System Temperatures: cpu: 46.0 C mobo: N/A 
           Fan Speeds (RPM): N/A 
Info:      Processes: 293 Uptime: 4h 05m Memory: 15.28 GiB used: 2.99 GiB (19.6%) Init: systemd v: 244 
           Compilers: gcc: 9.2.1 clang: 9.0.1 Shell: fish v: 3.0.2 running in: tilix inxi: 3.0.37 

prime-run seems to work:

$ glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2) 
OpenGL core profile version string: 4.6 (Core Profile) Mesa 19.3.4
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 19.3.4
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.3.4
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

$ prime-run glxinfo | grep OpenGL
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GTX 1650/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 440.59
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6.0 NVIDIA 440.59
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 440.59
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

However, as said above, I get a black screen when trying to resume from suspend. Looking through the output of journalctl -b -1 after a forced reboot, I found these lines, which were logged when the black screen occurred:

fév 13 15:52:16 DELL kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.59  Thu Jan 30 01:00:41 UTC 2020
fév 13 16:04:00 DELL kernel: NVRM: GPU at PCI:0000:01:00: GPU-ee5b9bdb-e3b0-ac64-a3ad-f4b06bb7039f
fév 13 16:04:00 DELL kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=881, GPU has fallen off the bus.
fév 13 16:04:00 DELL kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
fév 13 16:04:00 DELL kernel: NVRM: A GPU crash dump has been created. If possible, please run
                              NVRM: nvidia-bug-report.sh as root to collect this data before
                              NVRM: the NVIDIA kernel module is unloaded.
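
For anyone who wants to pull only the NVIDIA driver messages out of the previous boot's kernel log, a small filter like the one below may help. It is only a sketch: `nvrm_lines` and `xid79_seen` are hypothetical helper names, and it assumes the driver prefixes its kernel messages with `NVRM:` as in the excerpt above.

```shell
# Hypothetical helper: keep only NVIDIA driver (NVRM) lines from log text on stdin.
nvrm_lines() {
    grep -F 'NVRM:'
}

# Hypothetical helper: succeed if the log text on stdin contains an
# Xid 79 report ("GPU has fallen off the bus").
xid79_seen() {
    grep -Eq 'NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): 79,'
}

# Usage (kernel messages of the previous boot):
#   journalctl -k -b -1 --no-pager | nvrm_lines
#   journalctl -k -b -1 --no-pager | xid79_seen && echo "Xid 79 was logged"
```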

Do I need to do anything else after installing the PRIME mhwd configuration? Is this the correct installation method? I found these posts: https://archived.forum.manjaro.org/t/install-nvidia-prime-on-manjaro-18-1-4/114993, https://archived.forum.manjaro.org/t/install-nvidia-prime-on-manjaro-18-1-4/114993, but they are not very recent and cover older versions of Manjaro, so I am not sure which instructions I should follow.

Thanks a lot.

Hello,
I would suggest adding this kernel boot parameter:
pcie_aspm=off
and see if that changes the behavior.
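
In case it helps, here is one way the parameter could be added, assuming GRUB is the bootloader and the kernel options live in `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub. `add_kernel_param` is a hypothetical helper written for this post, not part of any package, and it only handles simple parameters without spaces or quotes.

```shell
# Hypothetical helper: append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE
# unless it is already listed there. Intended only for simple parameters
# (no spaces, quotes, or characters special to sed such as / & \).
add_kernel_param() {
    file=$1
    param=$2
    # Split the current value into words and look for an exact match.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" \
        | tr '"' ' ' | tr ' ' '\n' | grep -qxF -- "$param"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert the parameter just before the closing double quote.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage, from a root shell (shell functions are not visible to sudo):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub pcie_aspm=off
#   update-grub
# After rebooting, check that the parameter is active:
#   grep -o 'pcie_aspm=[a-z]*' /proc/cmdline
```

For a one-off test it is also possible to edit the kernel line from the GRUB menu at boot instead of changing the file.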


I tried and I got the same error unfortunately.

That means it needs more investigation on that particular laptop model. Have you checked this:
https://wiki.archlinux.org/index.php/Dell_XPS_15_7590


I did, and I didn't find anything related to using NVIDIA PRIME specifically, only nvidia-xrun or Bumblebee, which I would prefer to avoid for now.

However in the meantime I found the following links: https://forums.linuxmint.com/viewtopic.php?t=258577, https://archived.forum.manjaro.org/t/gpu-falling-off-bus-and-freezing-laptop/104046/10 and after some trial and error discovered I only need the acpi_osi=! acpi_osi='Windows 2009' parameters to solve my issue. Unfortunately, now I can't change my screen backlight intensity anymore and my laptop's power button doesn't work.

I haven't looked deeply into it yet, but it seems like the acpi_osi kernel parameter can influence which features the BIOS enables which I guess might be what causes these other issues.

Also, in case it is useful: resuming from hibernation (suspend to disk) works perfectly fine.

The same Arch Wiki page has a section on the backlight:
https://wiki.archlinux.org/index.php/Dell_XPS_15_7590#Backlight

The thing is, if you combine instructions from different sources, you have to make sure they do not conflict with one another.
Best of luck!

Indeed haha. I just updated to kernel 5.5 though and the issue seems to have disappeared, I'll look into it more when I have time, thanks for the link and your help!

Some laptops don't like acpi_osi=! and it breaks certain things while fixing others. For a Dell XPS, try using only one of these, and leave out the Windows acpi_osi parameter as well.

acpi_rev_override=1
acpi_rev_override=2
acpi_rev_override=3
acpi_rev_override=4
acpi_rev_override=5

Most Dells seem to work better with these, and without breaking something else (like the backlight).
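
When trying these one at a time, it is easy to lose track of what the running kernel was actually booted with. A rough sketch for checking, where `acpi_params` is a hypothetical helper name; it splits the command line on spaces, so a quoted value such as 'Windows 2009' would show up cut in two.

```shell
# Hypothetical helper: print the acpi_osi / acpi_rev_override parameters found
# in a kernel command line file (defaults to /proc/cmdline), one per line.
# Splits on spaces, so values containing spaces are not handled.
acpi_params() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -E '^acpi_(osi|rev_override)(=|$)'
}

# Usage:
#   acpi_params          # inspects the currently running kernel
```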
