AMD Ryzen 7 3700X and Radeon RX 5700 XT support

Not having the latest and greatest LLVM was probably why it wasn't working properly before. With LLVM 8 I couldn't even boot...

Which kernel are you on now with those instabilities? If you're on 5.3rc7 too, then maybe it's some difference in the software we're using. In that case you might want to try something like https://aur.archlinux.org/packages/linux-amd-staging-drm-next-git, for example. Maybe it helps.

I think @Zamundaaa might be right - I've had zero issues so far (besides the fan-control and Doom things mentioned above).
If you've got the lcarlier repo, you can install llvm-libs-git to get LLVM 10.

Yeah, that must have been the problem. I'm on the rc7. I noticed the freezes only happen when the GPU is being pushed. I had a few windows open with video playback in each. A quick google search suggests it may be a power problem. At least the system is stable enough for my regular work.

update:
The problem is a little more severe than I thought. Resume from suspend also causes a freeze.

Sep 06 14:13:13 twifty-lynx systemd[1]: tlp-sleep.service: Succeeded.
Sep 06 14:13:13 twifty-lynx systemd[1]: Stopped TLP suspend/resume.
Sep 06 14:13:13 twifty-lynx audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=tlp-sleep comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 06 14:13:13 twifty-lynx kernel: audit: type=1131 audit(1567750393.846:85): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=tlp-sleep comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=succ>
Sep 06 14:13:16 twifty-lynx systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Sep 06 14:13:16 twifty-lynx audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 06 14:13:16 twifty-lynx kernel: audit: type=1131 audit(1567750396.273:86): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? ter>
Sep 06 14:13:19 twifty-lynx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=658084, emitted seq=658086
Sep 06 14:13:19 twifty-lynx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1314 thread gnome-shel:cs0 pid 1320
Sep 06 14:13:19 twifty-lynx kernel: [drm] GPU recovery disabled.
Sep 06 14:13:24 twifty-lynx org.gnome.Shell.desktop[1314]: amdgpu: The CS has been rejected, see dmesg for more information (-16).
Sep 06 14:13:24 twifty-lynx kernel: Move buffer fallback to memcpy unavailable
Sep 06 14:13:24 twifty-lynx kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Sep 06 14:13:24 twifty-lynx kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -16!
Sep 06 14:13:30 twifty-lynx NetworkManager[739]: <info>  [1567750410.0778] manager: NetworkManager state is now CONNECTED_SITE
Sep 06 14:13:30 twifty-lynx dbus-daemon[737]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.3' (uid=0 pid=739 comm="/usr/bin/N>
Sep 06 14:13:30 twifty-lynx systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 06 14:13:30 twifty-lynx dbus-daemon[737]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Sep 06 14:13:30 twifty-lynx systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 06 14:13:30 twifty-lynx audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 06 14:13:30 twifty-lynx kernel: audit: type=1130 audit(1567750410.086:87): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? ter>
Sep 06 14:13:33 twifty-lynx org.gnome.Shell.desktop[1314]: [1989:2009:0906/141333.893251:ERROR:connection_factory_impl.cc(413)] Failed to connect to MCS endpoint with error -105
Sep 06 14:13:40 twifty-lynx org.gnome.Shell.desktop[1314]: amdgpu: The CS has been rejected, see dmesg for more information (-16).
Sep 06 14:13:40 twifty-lynx kernel: Move buffer fallback to memcpy unavailable
Sep 06 14:13:40 twifty-lynx kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Sep 06 14:13:40 twifty-lynx kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -16!
Sep 06 14:13:40 twifty-lynx systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Sep 06 14:13:40 twifty-lynx audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 06 14:13:40 twifty-lynx kernel: audit: type=1131 audit(1567750420.279:88): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? ter>
Sep 06 14:13:48 twifty-lynx kernel: check: Corrupted low memory at 00000000bd059f0f (2aa8 phys) = 00550000
Sep 06 14:13:48 twifty-lynx kernel: check: Corrupted low memory at 00000000881dc0d7 (5550 phys) = f00000000000
Sep 06 14:13:48 twifty-lynx kernel: ------------[ cut here ]------------
Sep 06 14:13:48 twifty-lynx kernel: Memory corruption detected in low memory
Sep 06 14:13:48 twifty-lynx kernel: WARNING: CPU: 0 PID: 18847 at arch/x86/kernel/check.c:161 check_corruption+0xcd/0xd1
Sep 06 14:13:48 twifty-lynx kernel: Modules linked in: tun isofs cmac rfcomm bnep fuse edac_mce_amd nls_iso8859_1 nls_cp437 vfat fat rtwpci rtw88 snd_hda_codec_realtek kvm snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdm>
Sep 06 14:13:48 twifty-lynx kernel:  nf_defrag_ipv4 libcrc32c iptable_filter uinput crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sd_mod uas usb_storage hid_logitech_hidpp hid_logitech_dj hid_generic u>
Sep 06 14:13:48 twifty-lynx kernel: CPU: 0 PID: 18847 Comm: kworker/0:1 Tainted: G        W         5.3.0-1-MANJARO #1
Sep 06 14:13:48 twifty-lynx kernel: Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 2703 08/20/2019
Sep 06 14:13:48 twifty-lynx kernel: Workqueue: events check_corruption
Sep 06 14:13:48 twifty-lynx kernel: RIP: 0010:check_corruption+0xcd/0xd1
Sep 06 14:13:48 twifty-lynx kernel: Code: 41 5c bf 40 01 00 00 41 5d e9 1f bf 03 00 80 3d 5b a7 2a 01 00 75 c7 48 c7 c7 40 42 4c af c6 05 4b a7 2a 01 01 e8 92 f5 01 00 <0f> 0b eb b0 48 c7 c7 c8 41 4c af e8 d1 a6 08 00 48 8b 3>
Sep 06 14:13:48 twifty-lynx kernel: RSP: 0018:ffff93928119fe58 EFLAGS: 00010282
Sep 06 14:13:48 twifty-lynx kernel: RAX: 0000000000000000 RBX: ffff8f69c0010000 RCX: 0000000000000006
Sep 06 14:13:48 twifty-lynx kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8f71be617700
Sep 06 14:13:48 twifty-lynx kernel: RBP: 0000000000000001 R08: 0000000000000659 R09: 0000000000000001
Sep 06 14:13:48 twifty-lynx kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f69c0010000
Sep 06 14:13:48 twifty-lynx kernel: R13: 0000000080000000 R14: 0000000000000000 R15: 0ffff8f71be62f70
Sep 06 14:13:48 twifty-lynx kernel: FS:  0000000000000000(0000) GS:ffff8f71be600000(0000) knlGS:0000000000000000
Sep 06 14:13:48 twifty-lynx kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 06 14:13:48 twifty-lynx kernel: CR2: 000055e803543c50 CR3: 0000000780da8000 CR4: 00000000003406f0
Sep 06 14:13:48 twifty-lynx kernel: Call Trace:
Sep 06 14:13:48 twifty-lynx kernel:  process_one_work+0x1da/0x380
Sep 06 14:13:48 twifty-lynx kernel:  worker_thread+0x4d/0x3f0
Sep 06 14:13:48 twifty-lynx kernel:  kthread+0xfb/0x130
Sep 06 14:13:48 twifty-lynx kernel:  ? process_one_work+0x380/0x380
Sep 06 14:13:48 twifty-lynx kernel:  ? kthread_park+0x80/0x80
Sep 06 14:13:48 twifty-lynx kernel:  ret_from_fork+0x22/0x40
Sep 06 14:13:48 twifty-lynx kernel: ---[ end trace 953af0f8b75a034c ]---
Sep 06 14:13:55 twifty-lynx org.gnome.Shell.desktop[1314]: amdgpu: The CS has been rejected, see dmesg for more information (-16).
Sep 06 14:13:55 twifty-lynx kernel: Move buffer fallback to memcpy unavailable
Sep 06 14:13:55 twifty-lynx kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Sep 06 14:13:55 twifty-lynx kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -16!

Which DM are you guys using? I'm wondering if Gnome has anything to do with this.

Update:
Switching to Wayland has solved all my issues. The card runs very smoothly now.

Using KWin/KDE here, haven't tested suspend/resume yet (don't know when I last used that - must have been years ^^)

What kind of PSU do you have and what GPU did you have before that?

Using the 2700X on the Crosshair VII Hero, the PSU is a corsair RM750x. I built the system about 10 months ago, but held back on the GPU until AMD came out with something better than vega (which was their "best" at the time). I was using a Strix RX580 (my son is happy for the hand-me-down).

I may get a 3000 series CPU in the near future, but TBH the 2700X has more than enough power for what I do.

Resume from sleep is working perfectly for me on KDE. Looking through journalctl | grep amdgpu I can't see any errors.
If it were a power problem, it probably wouldn't happen on resume from suspend.
BUT just in case: how's your GPU connected to your PSU? If it's on the same lane as the CPU, you might want to try switching to the other one. It shouldn't make a difference unless your CPU pulls something like 200W, but whatever.
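
For anyone wanting to run the same check against the kernel log only, here is a rough sketch. `amdgpu_errors` is a made-up helper name, not part of any tool, and matching on `*ERROR*` assumes the message format seen in the log pasted earlier in this thread.

```shell
# Hypothetical helper: keep only amdgpu lines flagged *ERROR* from a log on stdin.
amdgpu_errors() {
    grep -F 'amdgpu' | grep -F '*ERROR*'
}

# Example (not run here):
#   -k limits the journal to kernel messages, -b to the current boot
#   journalctl -k -b --no-pager | amdgpu_errors
```

No output means nothing matched. Using `-b -1` instead of `-b` looks at the previous boot, which is handy after a freeze that forced a reset.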

OK, got this working and got great results on the benchmark!
The process I used was:
install the firmware from the AUR package
add the lcarlier repo to /etc/pacman.conf

[mesa-git]
SigLevel = Never
Server = https://pkgbuild.com/~lcarlier/$repo/$arch

Then do the installs:
sudo pacman -Sy mesa-git lib32-mesa-git mesa-demos
sudo pacman -S vulkan-icd-loader lib32-vulkan-icd-loader vulkan-tools
sudo pacman -S vulkan-radeon-git lib32-vulkan-radeon-git
I am running an AMD CPU so I didn't install the Intel stuff.
Used glxinfo | grep 'OpenGL' to check Mesa and LLVM info/versions.
Then vulkaninfo | grep "GPU id" to check the GPU was showing up OK.
Used glxgears to check it was rendering OK.
My kernel info is Linux rommi 5.3.0-1-MANJARO #1 SMP Mon Sep 2 18:26:38 UTC 2019 x86_64 GNU/Linux

Then to test I used phoronix-test-suite from the AUR.
phoronix-test-suite install unigine-heaven installs the benchmark.
phoronix-test-suite run unigine-heaven runs it. I used 1920x1080 and the standard 3 passes. An average FPS of 150.3 put me in the top 95%.
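
Before running the installs, it can help to confirm that the repo section really made it into pacman.conf. A minimal sketch follows; `repo_enabled` is a hypothetical helper name, and the default path is only the usual location of the file.

```shell
# Hypothetical helper: succeed if a pacman.conf-style file has a "[repo]" section header.
# Usage: repo_enabled <repo-name> [conf-file]
# The body runs in a subshell so the variables do not leak into the caller.
repo_enabled() (
    repo="$1"
    conf="${2:-/etc/pacman.conf}"
    # -x: whole-line match, -F: literal string, so a commented-out header does not count
    grep -qxF "[$repo]" "$conf"
)

# Example (not run here):
#   repo_enabled mesa-git && glxinfo | grep 'OpenGL'
```

The whole-line match is deliberately strict: a header with trailing spaces or a leading `#` is treated as not enabled.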

sources,

  1. This thread for the AUR firmware and kernel info
  2. GloriousEggroll's blog on mesa-git. I just changed a few bits and skipped llvm-svn
  3. This blog for the benchmark info.

Super happy to have it all running :)
The original install was KDE from the Architect installer on the testing branch to get the latest kernel and stuff.

What is your power level like (sensors)?

Mine hovers around 25W on idle. The arch threads mention a similar higher than normal draw.

Yeah, looks like 34W just watching a video at 1920x1080.

I've got 35W, but I've got two screens, so as the VRAM should always be running at max clock it didn't surprise me ^^

Same. 35W with two displays. And it's definitely the VRAM - I tried a Vega64 once and it pulled like 5W only with 3 displays...

Really? I thought the max VRAM clock thing was true for all cards? My RX580 had the same behaviour. Or is it because HBM is less power hungry?

HBM is just that much less power hungry. And IIRC it didn't even stay at max clock all the time.
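
For comparing the idle draw figures above, here is a small sketch that pulls just the GPU power reading out of `sensors` output. `gpu_power_watts` is a made-up name, and it assumes the card appears under an `amdgpu-` chip header with a `power1:` line; the label may differ between kernel and lm-sensors versions.

```shell
# Hypothetical helper: print the power1 reading (in watts) from the amdgpu
# block of lm-sensors output read on stdin.
gpu_power_watts() {
    awk '
        /^amdgpu-/ { in_gpu = 1; next }   # start of an amdgpu chip block
        /^$/       { in_gpu = 0 }         # a blank line ends the block
        in_gpu && $1 == "power1:" { print $2 }
    '
}

# Example (not run here):
#   sensors | gpu_power_watts
```

Tracking the chip block avoids picking up a `power1:` line that belongs to some other sensor chip.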

I added in:

AMD_DEBUG=nodma

to /etc/environment, which has stopped some of the crashes I have been getting using the 5700XT.

I had not installed any fan control library, but there seemed to be maybe more than one issue.

I'm not sure about the performance impact of using this, but I have not noticed anything and my system is much more stable with it set.
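
If you want to add the `AMD_DEBUG=nodma` line without ending up with duplicates after repeated attempts, something like the sketch below should do. `ensure_env_line` is a hypothetical helper name, and the default path is simply the file named above.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not there yet.
# Usage: ensure_env_line <line> [file]
# The body runs in a subshell so the variables do not leak into the caller.
ensure_env_line() (
    line="$1"
    file="${2:-/etc/environment}"
    # -x: whole-line match, -F: literal string; a missing file counts as "not present"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
)

# Example (not run here; needs root to write to /etc/environment):
#   ensure_env_line 'AMD_DEBUG=nodma'
```

Removing the line again and logging back in is enough to undo the change if it turns out not to help.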

I'm seriously considering returning my card, and (with a sickly taste in my mouth) replacing it with Nvidia.

I booted into Windows and started playing Metro Exodus. It ran great for 10 minutes, then I was kicked back to the desktop. After that I would get nothing but random application crashes and blue screens. I reinstalled everything and tried again. Exactly the same result. After the final BSOD, I booted back into Linux. The problems followed me. Anything I did on Linux, opening Chrome for example, would crash Gnome. I've tried this a few times, each time watching the sensors (card never goes above 80C, fans switch between 40-60%, usage at about 98%). Nothing seems out of the ordinary.

The card is acting like it builds up a temperature during those 10 minutes of gaming, hits a ■■■■■ wall and crashes. Then it requires at least 30 minutes to cool off, during which random crashes are encountered on ANY OS. After that, the card is back to normal.

At this point I don't know if it's the card, firmware or drivers. I paid more than a month's salary for this. While I expected there to be a few problems on Linux, I didn't expect such serious problems on Windows, and I'm more than annoyed that those problems persist across reboots and OSs.

That sounds like defective hardware to me, as you encounter the problems in both Linux and Windows.
I'd try RMAing it instead of turning to evilNV ^^

Just tried again with the newest Afterburner. Put fans at 100% and clock to 1960 MHz. The only temp that had me worried was the junction, which hovered between 85-95C (though I think this may be normal). Managed to play Metro2033 for 15 minutes before it crashed back to the desktop. I've started letting the system idle for 15-20 minutes after these crashes to avoid the BSOD.

Since it doesn't seem to be a temperature problem, I'm wondering if a capacitor or something on the card is causing this.

I also use Windows for games and it's never crashed on me there. As @Termy said, it sounds defective.

All the tests I've seen report junction in this range, so that should be normal (my Red Devil reaches 92° in silent BIOS).

I'd say it sounds like either the memory or (more likely) the VRMs overheating.

My first thought was the memory. But, unless the values reported by the card are incorrect, they all appear to be within range. Saying that though, ALL the problems I've seen so far tell me memory access is the issue.

I now have to decide, get another Strix replacement, another model or go with Ngreedier.
