Random Freezes in Linux (multiple distros) but not in Win10

I'm not a complete newbie (I've been toying with Linux off and on for 20 years), but I never really got deep into the OS or its workings. I've again decided to give it a go as my main OS, but I'm running into some issues with random freezes.

I have experienced this in Mint, Ubuntu, and now Manjaro (as I've tried to decide which distro I want to run with). I have not been able to identify any specific activity as a cause. I am fairly confident that the hardware is fine as I have no problems in Win10. I also ran memtest for about 10 hours with no errors. I think that it's most likely related to video drivers.

The freeze does not always look exactly the same. Sometimes the video and system just freeze, with the display unchanged. Sometimes the entire display freezes except the mouse cursor, which then freezes a couple of seconds later. And a couple of times I got a graphical glitch that looks like this. That screenshot may be from Mint, but it glitches about the same. The length of time the system stays up varies, but I've had it freeze before I could even enter my password at the login screen. One time I noticed that while it was frozen I was able to turn Caps Lock and Num Lock on and off, but usually they are frozen too. What does happen every time is that shortly after the freeze, my monitor goes to sleep because it is no longer receiving a signal.

So that is what I've been able to figure out so far, but I'm not sure where to go next as far as what logs I want to look at or anything.

I also want to state my appreciation for the forum software automatically saving drafts of my message. It froze twice on me while typing this all out.

System:
  Host: kk Kernel: 5.7.0-1-MANJARO x86_64 bits: 64 compiler: gcc v: 9.3.0 
  Desktop: KDE Plasma 5.18.4 Distro: Manjaro Linux 
Machine:
  Type: Desktop System: Gigabyte product: Z170X-UD3 v: N/A 
  serial: <root required> 
  Mobo: Gigabyte model: Z170X-UD3-CF serial: <root required> 
  UEFI [Legacy]: American Megatrends v: F23g date: 03/09/2018 
CPU:
  Topology: Quad Core model: Intel Core i5-6600K bits: 64 type: MCP 
  arch: Skylake-S rev: 3 L2 cache: 6144 KiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx 
  bogomips: 28009 
  Speed: 800 MHz min/max: 800/3900 MHz Core speeds (MHz): 1: 800 2: 800 
  3: 800 4: 800 
Graphics:
  Device-1: AMD Baffin [Radeon RX 460/560D / Pro 
  450/455/460/555/555X/560/560X] 
  vendor: XFX Pine Polaris 21 XL driver: amdgpu v: kernel bus ID: 01:00.0 
  Display: x11 server: X.Org 1.20.8 driver: amdgpu FAILED: ati 
  unloaded: modesetting resolution: 1920x1080~60Hz 
  OpenGL: renderer: Radeon RX 560 Series (POLARIS11 DRM 3.36.0 
  5.7.0-1-MANJARO LLVM 10.0.0) 
  v: 4.6 Mesa 20.0.5 direct render: Yes 
Audio:
  Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Gigabyte 
  driver: snd_hda_intel v: kernel bus ID: 00:1f.3 
  Device-2: AMD Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] 
  vendor: XFX Pine driver: snd_hda_intel v: kernel bus ID: 01:00.1 
  Sound Server: ALSA v: k5.7.0-1-MANJARO 
Network:
  Device-1: Intel Ethernet I219-V vendor: Gigabyte driver: e1000e 
  v: 3.2.6-k port: f000 bus ID: 00:1f.6 
  IF: enp0s31f6 state: down mac: 40:8d:5c:1f:56:2a 
  Device-2: Intel Wireless 8260 driver: iwlwifi v: kernel port: e000 
  bus ID: 0e:00.0 
  IF: wlp14s0 state: up mac: 34:41:5d:87:c3:78 
  IF-ID-1: eth0 state: up speed: -1 duplex: unknown 
  mac: 96:4f:fe:5c:fc:42 
Drives:
  Local Storage: total: 5.91 TiB used: 13.89 GiB (0.2%) 
  ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO M.2 500GB 
  size: 465.76 GiB 
  ID-2: /dev/sdb vendor: Western Digital model: WD20EARS-00MVWB0 
  size: 1.82 TiB 
  ID-3: /dev/sdc vendor: Western Digital model: WD40EZRZ-00GXCB0 
  size: 3.64 TiB 
Partition:
  ID-1: / size: 78.73 GiB used: 13.89 GiB (17.6%) fs: ext4 dev: /dev/sda5 
Sensors:
  System Temperatures: cpu: 41.0 C mobo: 29.8 C gpu: amdgpu temp: 39 C 
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 960 
Info:
  Processes: 219 Uptime: 1m Memory: 7.73 GiB used: 1.16 GiB (15.0%) 
  Init: systemd Compilers: gcc: 9.3.0 Shell: bash v: 5.0.16 inxi: 3.0.37 

How is the swap going?
cat /proc/swaps

Literally nothing there past the headings. I wonder why...

Filename                                Type            Size    Used    Priority

Because you have no swap?
That would probably make you freeze.
Usually we have a swap partition .. I didn't notice one, which is why I asked.
Some folks use a swap file or a dynamically managed one (systemd-swap) .. but you must have some sort of swap .. or any time you reach your max memory .. your system will lock up.
This should be part of the automatic partitioning/install process, and one of the steps in Architect .. so I am not sure how you missed it .. but if it wasn't intentional .. you should probably make a swap.
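
For anyone following along, here is a minimal sketch of the same check done from a script. The function name `has_swap` is made up for illustration; it only assumes the `/proc/swaps` layout shown above (one header line, then one line per swap area).

```shell
# has_swap is a hypothetical helper, not part of any distro tooling.
# It reads a /proc/swaps-style listing from the file given as $1
# (default: /proc/swaps) and succeeds only if a swap area is listed.
has_swap() {
    # Line 1 is the column header; any later non-empty line is a swap area.
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit (found ? 0 : 1) }' "${1:-/proc/swaps}"
}
```

Something like `has_swap && echo "swap is active" || echo "no swap configured"` would then report the state. `swapon --show` and `free -h` give the same information interactively.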

I'm not sure how it missed it, but I do now have the swap set up and the random freezing continues. Is there a log that might point me in the right direction?

System:    Host: kk Kernel: 5.6.7-1-MANJARO x86_64 bits: 64 compiler: gcc v: 9.3.0 Desktop: KDE Plasma 5.18.4 
           Distro: Manjaro Linux 
Machine:   Type: Desktop System: Gigabyte product: Z170X-UD3 v: N/A serial: <root required> 
           Mobo: Gigabyte model: Z170X-UD3-CF serial: <root required> UEFI [Legacy]: American Megatrends v: F23g 
           date: 03/09/2018 
CPU:       Topology: Quad Core model: Intel Core i5-6600K bits: 64 type: MCP arch: Skylake-S rev: 3 L2 cache: 6144 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 28009 
           Speed: 800 MHz min/max: 800/3900 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 
Graphics:  Device-1: AMD Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] vendor: XFX Pine Polaris 21 XL 
           driver: amdgpu v: kernel bus ID: 01:00.0 
           Display: x11 server: X.Org 1.20.8 driver: amdgpu FAILED: ati unloaded: modesetting resolution: 1920x1080~60Hz 
           OpenGL: renderer: Radeon RX 560 Series (POLARIS11 DRM 3.36.0 5.6.7-1-MANJARO LLVM 10.0.0) v: 4.6 Mesa 20.0.5 
           direct render: Yes 
Audio:     Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Gigabyte driver: snd_hda_intel v: kernel 
           bus ID: 00:1f.3 
           Device-2: AMD Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] vendor: XFX Pine driver: snd_hda_intel 
           v: kernel bus ID: 01:00.1 
           Sound Server: ALSA v: k5.6.7-1-MANJARO 
Network:   Device-1: Intel Ethernet I219-V vendor: Gigabyte driver: e1000e v: 3.2.6-k port: f000 bus ID: 00:1f.6 
           IF: enp0s31f6 state: down mac: 40:8d:5c:1f:56:2a 
           Device-2: Intel Wireless 8260 driver: iwlwifi v: kernel port: e000 bus ID: 0e:00.0 
           IF: wlp14s0 state: up mac: 34:41:5d:87:c3:78 
Drives:    Local Storage: total: 5.91 TiB used: 19.39 GiB (0.3%) 
           ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO M.2 500GB size: 465.76 GiB 
           ID-2: /dev/sdb vendor: Western Digital model: WD20EARS-00MVWB0 size: 1.82 TiB 
           ID-3: /dev/sdc vendor: Western Digital model: WD40EZRZ-00GXCB0 size: 3.64 TiB 
Partition: ID-1: / size: 96.78 GiB used: 19.39 GiB (20.0%) fs: ext4 dev: /dev/sda5 
           ID-2: swap-1 size: 8.00 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda6 
Sensors:   System Temperatures: cpu: 42.0 C mobo: 29.8 C gpu: amdgpu temp: 34 C 
           Fan Speeds (RPM): N/A gpu: amdgpu fan: 965 
Info:      Processes: 242 Uptime: 2m Memory: 7.73 GiB used: 2.10 GiB (27.2%) Init: systemd Compilers: gcc: 9.3.0 Shell: bash 
           v: 5.0.16 inxi: 3.0.37

Hello,
On an SSD I would not make use of a swap partition at all. I would go with systemd-swap. (There are a couple of posts I made about it.)
I would also consider this:

I will look into that. It doesn't seem like that would cause my system to keep freezing though.

Then what is the cause?

Maybe I misunderstood your advice? From what I read, the issue with swap on SSD is an issue of preserving lifespan for the drive. Is there reason to think it would cause random freezing?

With the graphical glitching and lack of display output, I figured it was likely related to something in that area.

But if I really knew what the issue was, I wouldn't be here asking for help.

Some systems require a scheduler tweak, yes.

That is another issue, and it could be just one part of a combination of several issues. That is why you go step by step, through elimination. I'm not experienced with AMD GPU drivers, but I'm sure you can investigate that too.

Exactly, in order to know what is not causing something you have to know what could cause it first.

Is there any chance your system is overclocked?
Also your inxi output looks strange, as your video driver seems to fail.

Maybe try to re-install it via mhwd.

Please check also this thread. The user has the same motherboard and overclocking through Windows was his issue.

This is a development kernel.
Would you mind testing with an LTS kernel like linux54 or linux419?
Also, please check your RAM with memtest.

Thanks all for the suggestions. I was unable to continue troubleshooting for a bit over a week, but I've continued on now, and have found a solution.

First, I'm not using Manjaro right now, but this wasn't a distro-specific bug, so hopefully the fix will work the same way in Manjaro (I'm on openSUSE currently, still testing out distros).

The solution was to disable dynamic power management in the amdgpu driver. I'm not sure what other consequences there might be, but my system is now stable.

To do this, you need to add the following boot option: amdgpu.dpm=0
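
A rough sketch of one way to make that option permanent, assuming the system boots through GRUB and keeps its kernel command line in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`. The function name `add_kernel_param` is hypothetical, and the file path is passed in so it can be tried on a copy first.

```shell
# add_kernel_param is a hypothetical helper for illustration only.
# $1: path to a GRUB defaults file (normally /etc/default/grub)
# $2: kernel parameter to add, e.g. amdgpu.dpm=0
add_kernel_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already on the command line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote,
    # writing to a temporary file and then replacing the original.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

Editing the real file needs root, and the GRUB config has to be regenerated afterwards (`sudo update-grub` on Manjaro; on openSUSE I believe it is `sudo grub2-mkconfig -o /boot/grub2/grub.cfg`). After a reboot, `cat /proc/cmdline` should show the option.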

For more detail and googleability, I've included the dmesg errors I was seeing at the bottom of this post. The later errors popped up at the time of freeze.

Also, the system was not completely frozen, as I was usually able to ssh in long enough to run a few commands. However, I was not able to restart X, complete a shutdown, or do much of anything other than get dmesg output.
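
Since the ssh window is short, it helps to have the filtering ready in advance. This is only a sketch: `gpu_errors` is a made-up name, and the patterns are based on nothing more than the error lines I saw on my own machine.

```shell
# gpu_errors is a hypothetical helper. It reads kernel log text on stdin
# and keeps only amdgpu/drm lines that are flagged as errors.
gpu_errors() {
    grep -E 'amdgpu|\[drm' | grep -F '*ERROR*'
}
```

From the ssh session, `sudo dmesg | gpu_errors` shows the current boot. If the journal is persistent, `journalctl -k -b -1 | gpu_errors` should show the kernel log of the previous boot after a forced restart.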

The only way I was able to restart was with the power button or with "alt-SysRq-reisub"
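
Whether that key sequence works depends on the `kernel.sysrq` setting, which many distros restrict by default. Below is a small sketch for checking it. `reisub_allowed` is a hypothetical helper, and the bit values are my reading of the kernel's SysRq documentation, so double-check them for your kernel.

```shell
# reisub_allowed is a hypothetical helper. $1 is the value read from
# /proc/sys/kernel/sysrq. It succeeds if every key in R-E-I-S-U-B is permitted.
reisub_allowed() {
    value=$1
    # A value of 1 enables all SysRq functions.
    [ "$value" -eq 1 ] && return 0
    # Otherwise the value is a bitmask: 4 keyboard control (R), 64 process
    # signalling (E, I), 16 sync (S), 32 remount read-only (U), 128 reboot (B).
    needed=$((4 + 64 + 16 + 32 + 128))
    [ $((value & needed)) -eq "$needed" ]
}
```

For example, `reisub_allowed "$(cat /proc/sys/kernel/sysrq)" || echo "some REISUB keys are disabled"`. Running `sudo sysctl kernel.sysrq=1` enables everything until the next reboot.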

And finally in regards to the inxi output I originally posted:
driver: amdgpu FAILED: ati

It turns out this is not saying that the amdgpu driver failed. Those are two separate details that should be read as:

driver: amdgpu
FAILED: ati

Here are the dmesg errors:

[    4.640180] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[    4.640240] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
[ 7847.395134] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=400137, emitted seq=400139
[ 7849.695105] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=115648, emitted seq=115649
[ 7849.695139] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=49935, emitted seq=49937
