Manjaro Linux not properly throttling/undervolting/fan controlling. Giant Heat Pump vs Windows 10

System:    Host: Datavore Kernel: 5.7.0-3-MANJARO x86_64 bits: 64 compiler: gcc v: 10.1.0
           Desktop: KDE Plasma 5.18.5 tk: Qt 5.15.0 info: latte-dock wm: kwin_x11 dm: SDDM 
           Distro: Manjaro Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-E GAMING v: Rev X.0x 
           serial: <filter> UEFI: American Megatrends v: 1407 date: 02/24/2020 
Battery:   Device-1: hidpp_battery_0 model: Logitech Wireless Mouse PID:0036 serial: N/A 
           charge: 55% (should be ignored) rechargeable: yes status: Discharging 
           Device-2: hidpp_battery_1 model: Logitech Wireless Keyboard PID:0056 serial: N/A 
           charge: 70% (should be ignored) rechargeable: yes status: Discharging 
           Device-3: hidpp_battery_2 model: Logitech M720 Triathlon Multi-Device Mouse 
           serial: <filter> charge: 55% (should be ignored) rechargeable: yes 
           status: Discharging 
CPU:       Topology: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP arch: Zen 
           L2 cache: 6144 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm 
           bogomips: 182139 
           Speed: 2195 MHz min/max: 2200/3800 MHz boost: enabled Core speeds (MHz): 1: 2500 
           2: 2120 3: 2015 4: 2212 5: 2281 6: 2188 7: 2102 8: 2700 9: 1916 10: 1871 11: 3929 
           12: 2113 13: 1928 14: 2458 15: 1926 16: 1881 17: 2686 18: 2482 19: 1872 20: 2021 
           21: 3865 22: 2053 23: 2170 24: 2014 
Graphics:  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] vendor: ASUSTeK 
           driver: amdgpu v: kernel bus ID: 0c:00.0 chip ID: 1002:731f 
           Display: x11 server: X.Org 1.20.8 driver: amdgpu FAILED: ati 
           unloaded: modesetting,radeon alternate: fbdev,vesa compositor: kwin_x11 
           resolution: 1920x1080~60Hz, 1920x1080~60Hz 
           renderer: AMD Radeon RX 5600 XT (NAVI10 DRM 3.37.0 5.7.0-3-MANJARO LLVM 10.0.0) 
           v: 4.6 Mesa 20.0.7 direct render: Yes 
Audio:     Device-1: AMD Navi 10 HDMI Audio driver: snd_hda_intel v: kernel bus ID: 0c:00.1 
           chip ID: 1002:ab38 
           Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK driver: snd_hda_intel 
           v: kernel bus ID: 0e:00.4 chip ID: 1022:1487 
           Sound Server: ALSA v: k5.7.0-3-MANJARO 
Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus ID: 04:00.0 
           chip ID: 8086:2723 
           IF: wlp4s0 state: up mac: <filter> 
           Device-2: Realtek RTL8125 2.5GbE vendor: ASUSTeK driver: r8169 v: kernel 
           port: e000 bus ID: 05:00.0 chip ID: 10ec:8125 
           IF: enp5s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           Device-3: Intel I211 Gigabit Network vendor: ASUSTeK driver: igb v: 5.6.0-k 
           port: d000 bus ID: 06:00.0 chip ID: 8086:1539 
           IF: enp6s0 state: down mac: <filter> 
Drives:    Local Storage: total: 4.57 TiB used: 95.77 GiB (2.0%) 
           ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO 1TB size: 931.51 GiB 
           speed: 31.6 Gb/s lanes: 4 serial: <filter> rev: 2B2QEXE7 scheme: GPT 
           ID-2: /dev/sda vendor: HGST (Hitachi) model: HDN724030ALE640 size: 2.73 TiB 
           speed: 6.0 Gb/s rotation: 7200 rpm serial: <filter> rev: A5E0 scheme: GPT 
           ID-3: /dev/sdb vendor: Samsung model: SSD 860 DCT 960GB size: 894.25 GiB 
           speed: 6.0 Gb/s serial: <filter> rev: 0B6Q scheme: MBR 
           ID-4: /dev/sdc type: USB vendor: PNY model: USB 2.0 FD size: 57.70 GiB 
           serial: <filter> rev: PMAP scheme: MBR 
Partition: ID-1: / size: 881.67 GiB used: 92.46 GiB (10.5%) fs: ext4 dev: /dev/nvme0n1p2 
           ID-2: swap-1 size: 34.47 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p3 
Sensors:   System Temperatures: cpu: 57.5 C mobo: N/A gpu: amdgpu temp: 44 C 
           Fan Speeds (RPM): N/A gpu: amdgpu fan: 0 
Info:      Processes: 406 Uptime: 1h 18m Memory: 31.33 GiB used: 3.27 GiB (10.4%) 
           Init: systemd v: 245 Compilers: gcc: 10.1.0 Shell: bash v: 5.0.17 
           running in: konsole inxi: 3.0.37 

I've been working hard at getting my Manjaro install to behave well. I've installed and configured sensors to get proper data, and installed corectrl and worked with it. Both report reasonable values, such as low graphics card wattage and low or no fan speed when idle. But the fans are not slowing down, and in general Linux seems to put out a lot of heat, which contradicts what corectrl and the other tools are reporting. The issue seems to be the CPU. When idle, Windows will spin the fan down to an almost complete stop, restart it, and then settle at an average of about 100 rpm or less. Tools under Windows report the idle cores being properly parked into sleep.

Corectrl reports that the core clock is being properly downclocked, but it does not show individual cores, only "all cores". Same with sensors: it does not list individual core temps, clocks, or voltages, only a single combined value. Mind you, it is "correct" in that none of the values are unexpected. But if corectrl's output is correct, the CPU does not stay downclocked or parked for any length of time, because it keeps ramping back up. Given the heat output, the fans never going below 50%, and the fact that just idling in Linux raises the room temperature significantly, it would appear that Linux is in fact not downclocking, undervolting, or parking cores.
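
Since corectrl and sensors only show a combined figure, one way to look at individual core clocks is to read the kernel's cpufreq files directly. Below is a minimal Python sketch, assuming the cpufreq driver exposes `scaling_cur_freq` (in kHz) under `/sys/devices/system/cpu/cpuN/cpufreq/`. The function name `read_core_freqs` is made up for illustration, and the values are whatever the driver reports, so very short boosts may not show up.

```python
from pathlib import Path


def read_core_freqs(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_index: MHz} read from each CPU's cpufreq/scaling_cur_freq."""
    freqs = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but not sibling dirs like cpufreq/cpuidle.
    for freq_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu_index = int(freq_file.parent.parent.name[3:])
        try:
            # The file holds the frequency in kHz; convert to MHz.
            freqs[cpu_index] = int(freq_file.read_text().strip()) / 1000
        except (OSError, ValueError):
            continue  # CPU offline or file unreadable: skip it
    return dict(sorted(freqs.items()))
```

Calling `read_core_freqs()` a few times in a row while the desktop is idle would show which logical CPUs keep jumping back to boost clocks.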

It seems to be trying, but instead it just keeps waking everything back up, even with the monitoring tools off (that is, to avoid the observer effect I closed the tools and just watched fan speed and room temperature). I wonder if this is an artifact of the 5.7.0 kernel, or whether something KDE is doing makes the system think it needs to throttle up.

I just want to spend more time under Manjaro. I have been using it as my primary OS for almost half a year now and would rather not go back to Windows. I even game under Linux with great success. But it's summer, and having a space heater that can raise the room temp by 3-6°F even when idle isn't comfortable.

Wasting power really isn't a concern for me; heat is.

I am probably overlooking something. If there is any output I can give that would help, I can copy and paste it from the terminal here. The various tools are indeed trying to slow down the CPU, but something is making the clocks run right back up less than a millisecond after they are ramped down.

Htop shows no excessive CPU load, and neither does KSysGuard. Both agree the system is just sitting at 1% or less utilization, while corectrl shows a very jagged CPU clock graph.
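
Low utilization in htop does not by itself say how deeply the cores are sleeping, so it can help to look at idle-state residency as well. Here is a rough Python sketch, assuming the kernel's cpuidle interface is present at `/sys/devices/system/cpu/cpuN/cpuidle/stateM/` with `name` and `time` (cumulative microseconds) files. The helper names `idle_snapshot` and `residency_share` are hypothetical, and the state names depend on the idle driver in use.

```python
from pathlib import Path


def idle_snapshot(cpu_dir):
    """Return {state_name: cumulative_residency_us} for one CPU directory."""
    snapshot = {}
    for state in sorted(Path(cpu_dir).glob("cpuidle/state[0-9]*")):
        try:
            name = (state / "name").read_text().strip()
            snapshot[name] = int((state / "time").read_text().strip())
        except (OSError, ValueError):
            continue  # state missing or unreadable: skip it
    return snapshot


def residency_share(before, after, interval_us):
    """Fraction of the interval spent in each idle state between two snapshots."""
    return {
        name: (after[name] - before.get(name, 0)) / interval_us
        for name in after
    }
```

For example, take `idle_snapshot("/sys/devices/system/cpu/cpu0")`, wait ten seconds, take a second snapshot, and pass both to `residency_share` with `interval_us=10_000_000`. If the deepest state accounts for only a small share on an idle desktop, something is waking the cores frequently.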

I've searched the forums and Reddit, and made sure the relevant tools and kernel modules were installed. I am stumped at this point.

What is the output of sensors-detect (it needs to be run as root)?

powertop could be helpful to see if something is wrong.

That is a long list, but here it is:

# sensors-detect version 3.6.0
# Kernel: 5.7.0-3-MANJARO x86_64
# Processor: AMD Ryzen 9 3900X 12-Core Processor (23/113/0)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no): 
Module cpuid loaded successfully.
Silicon Integrated Systems SIS5595...                       No
VIA VT82C686 Integrated Sensors...                          No
VIA VT8231 Integrated Sensors...                            No
AMD K8 thermal sensors...                                   No
AMD Family 10h thermal sensors...                           No
AMD Family 11h thermal sensors...                           No
AMD Family 12h and 14h thermal sensors...                   No
AMD Family 15h thermal sensors...                           No
AMD Family 16h thermal sensors...                           No
AMD Family 17h thermal sensors...                           Success!
    (driver `k10temp')
AMD Family 15h power sensors...                             No
AMD Family 16h power sensors...                             No
Hygon Family 18h thermal sensors...                         No
Intel digital thermal sensor...                             No
Intel AMB FB-DIMM thermal sensor...                         No
Intel 5500/5520/X58 thermal sensor...                       No
VIA C7 thermal sensor...                                    No
VIA Nano thermal sensor...                                  No

Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no): 
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'...               No
Trying family `SMSC'...                                     No
Trying family `VIA/Winbond/Nuvoton/Fintek'...               Yes
Found `Nuvoton NCT6798D Super IO Sensors'                   Success!
    (address 0x290, driver `nct6775')
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'...               No
Trying family `SMSC'...                                     No
Trying family `VIA/Winbond/Nuvoton/Fintek'...               No
Trying family `ITE'...                                      No

Some systems (mainly servers) implement IPMI, a set of common interfaces
through which system health data may be retrieved, amongst other things.
We first try to get the information from SMBIOS. If we don't find it
there, we have to read from arbitrary I/O ports to probe for such
interfaces. This is normally safe. Do you want to scan for IPMI
interfaces? (YES/no): 
Probing for `IPMI BMC KCS' at 0xca0...                      No
Probing for `IPMI BMC SMIC' at 0xca8...                     No

Some hardware monitoring chips are accessible through the ISA I/O ports.
We have to write to arbitrary I/O ports to probe them. This is usually
safe though. Yes, you do have ISA I/O ports even if you do not have any
ISA slots! Do you want to scan the ISA I/O ports? (yes/NO): 

Lastly, we can probe the I2C/SMBus adapters for connected hardware
monitoring devices. This is the most risky part, and while it works
reasonably well on most systems, it has been reported to cause trouble
on some systems.
Do you want to probe the I2C/SMBus adapters now? (YES/no): 
Using driver `i2c-piix4' for device 0000:00:14.0: AMD KERNCZ SMBus
Module i2c-dev loaded successfully.

Next adapter: SMBus PIIX4 adapter port 0 at 0b00 (i2c-0)
Do you want to scan it? (YES/no/selectively): 
Client found at address 0x52
Probing for `Analog Devices ADM1033'...                     No
Probing for `Analog Devices ADM1034'...                     No
Probing for `SPD EEPROM'...                                 Yes
    (confidence 8, not a hardware monitoring chip)
Client found at address 0x53
Probing for `Analog Devices ADM1033'...                     No
Probing for `Analog Devices ADM1034'...                     No
Probing for `SPD EEPROM'...                                 Yes
    (confidence 8, not a hardware monitoring chip)

Next adapter: SMBus PIIX4 adapter port 2 at 0b00 (i2c-1)
Do you want to scan it? (YES/no/selectively): 

Next adapter: AMDGPU DM i2c hw bus 0 (i2c-2)
Do you want to scan it? (yes/NO/selectively): YES

Next adapter: AMDGPU DM i2c hw bus 1 (i2c-3)
Do you want to scan it? (yes/NO/selectively): YES
Client found at address 0x49
Probing for `National Semiconductor LM75'...                No
Probing for `National Semiconductor LM75A'...               No
Probing for `Dallas Semiconductor DS75'...                  No
Probing for `National Semiconductor LM77'...                No
Probing for `Analog Devices ADT7410/ADT7420'...             No
Probing for `Maxim MAX6642'...                              No
Probing for `Texas Instruments TMP435'...                   No
Probing for `National Semiconductor LM73'...                No
Probing for `National Semiconductor LM92'...                No
Probing for `National Semiconductor LM76'...                No
Probing for `Maxim MAX6633/MAX6634/MAX6635'...              No
Probing for `NXP/Philips SA56004'...                        No
Probing for `SMSC EMC1023'...                               No
Probing for `SMSC EMC1043'...                               No
Probing for `SMSC EMC1053'...                               No
Probing for `SMSC EMC1063'...                               No

Next adapter: AMDGPU DM i2c hw bus 2 (i2c-4)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: AMDGPU DM i2c hw bus 3 (i2c-5)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: AMDGPU DM aux hw bus 0 (i2c-6)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: AMDGPU DM aux hw bus 1 (i2c-7)
Do you want to scan it? (yes/NO/selectively): 

Next adapter: AMDGPU DM aux hw bus 2 (i2c-8)
Do you want to scan it? (yes/NO/selectively): 

Now follows a summary of the probes I have just done.
Just press ENTER to continue: 

Driver `k10temp' (autoloaded):
  * Chip `AMD Family 17h thermal sensors' (confidence: 9)

Driver `nct6775':
  * ISA bus, address 0x290
    Chip `Nuvoton NCT6798D Super IO Sensors' (confidence: 9)

Do you want to overwrite /etc/conf.d/lm_sensors? (YES/no): 
Unloading i2c-dev... OK
Unloading cpuid... OK

I don't have any thoughts WRT AMD's coretemp.

  • Did you load the nct6775 module?

If so, what is the output of sensors?

Looking at powertop, it seems... hmm. I am honestly having trouble interpreting the data.

The nct6775 module is loaded, with a use count of zero.

Adapter: Virtual device
temp1:        +48.0°C  

Adapter: PCI adapter
Composite:    +44.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +44.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)

Adapter: PCI adapter
vddgfx:      775.00 mV 
fan1:           0 RPM  (min =    0 RPM, max = 3630 RPM)
edge:         +53.0°C  (crit = +118.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
junction:     +53.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
mem:          +52.0°C  (crit = +99.0°C, hyst = -273.1°C)
                       (emerg = +99.0°C)
power1:       10.00 W  (cap = 160.00 W)

Adapter: PCI adapter
Vcore:         1.47 V  
Vsoc:          1.08 V  
Tctl:         +62.5°C  
Tdie:         +62.5°C  
Tccd1:        +45.5°C  
Tccd2:        +44.2°C  
Icore:        19.00 A  
Isoc:         12.50 A  
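
As a rough sanity check on the idle heat, the voltage and current readings in the last block above (Vcore/Icore and Vsoc/Isoc, which I assume come from k10temp) can be multiplied to estimate rail power. This is only back-of-the-envelope arithmetic using P = V × I on a single snapshot of telemetry values, not a measurement of package power. The helper name `rail_power` is made up for illustration.

```python
def rail_power(volts, amps):
    """Approximate power on one rail in watts (P = V * I)."""
    return volts * amps


# Readings copied from the sensors output in this thread.
core_w = rail_power(1.47, 19.00)   # Vcore * Icore
soc_w = rail_power(1.08, 12.50)    # Vsoc * Isoc
total_w = core_w + soc_w

print(f"core ≈ {core_w:.1f} W, SoC ≈ {soc_w:.1f} W, total ≈ {total_w:.1f} W")
# prints: core ≈ 27.9 W, SoC ≈ 13.5 W, total ≈ 41.4 W
```

If those readings are representative, that is roughly 41 W on the CPU rails at 1% utilization, with Vcore sitting at 1.47 V, which would be consistent with the cores not staying in a low-power state for long.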

I set up some more controls in the BIOS and it is mostly working better. It still runs hotter, but fan control via the BIOS seems to be the only realistic way to manage the fans, as none of the other fan controls seem to have any real impact.
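
For anyone checking why software fan control has no effect, it may be worth confirming whether a driver actually registered fan and PWM controls with the kernel. The sensors output earlier in the thread shows no motherboard fan readings, even though the nct6775 module was loaded. Below is a small Python sketch, assuming the standard hwmon sysfs layout (`/sys/class/hwmon/hwmonN/` with `name`, `fanN_input`, and `pwmN` files). The function name `list_hwmon` is hypothetical.

```python
from pathlib import Path


def list_hwmon(hwmon_root="/sys/class/hwmon"):
    """Return {hwmon_dir: {"chip": name, "fans": {file: rpm}, "pwm": [files]}}."""
    devices = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon[0-9]*")):
        try:
            chip = (dev / "name").read_text().strip()
        except OSError:
            continue  # no readable name file: not a usable hwmon device
        fans = {}
        for fan_file in sorted(dev.glob("fan[0-9]*_input")):
            try:
                fans[fan_file.name] = int(fan_file.read_text().strip())
            except (OSError, ValueError):
                pass  # fan header present but unreadable
        # Keep only the duty-cycle files (pwm1, pwm2, ...), not pwm1_enable etc.
        pwm = sorted(p.name for p in dev.glob("pwm[0-9]*") if p.name[3:].isdigit())
        devices[dev.name] = {"chip": chip, "fans": fans, "pwm": pwm}
    return devices
```

If no entry in the result has any `pwm` files, user-space tools have nothing to write to, and BIOS fan curves are the remaining option.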

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
