I'm posting here because I've run out of ideas, and I've somewhat lost track of everything I've tried.
The main symptom is this: if I leave my computer (a desktop with all power-saving options turned off) running downloads for a while, the network is sometimes down when I come back. Restarting NetworkManager does not bring it back; only a reboot does.
Relevant lspci output for the network card:
07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
Subsystem: ASUSTeK Computer Inc. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
Flags: bus master, fast devsel, latency 0, IRQ 37
I/O ports at f000 [size=256]
Memory at fcc04000 (64-bit, non-prefetchable) [size=4K]
Memory at fcc00000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Capabilities: [178] L1 PM Substates
Kernel driver in use: r8169
Kernel modules: r8169
I tried using the "hardware configuration" tool to install "network-r8168", but the problem happens with that driver as well.
I'm currently running kernel 5.5.2-1, but I have also tried 5.4.18-1 and 5.3.18-1.
Sometimes, a while after the network drops (I don't know exactly how long), more of the system seems to go offline. I don't know whether this is related to the network issue.
When the network goes down, nothing new shows up in dmesg: no error messages, no reconnection attempts, nothing. Pings on the local network just stop working, while other devices on the same network are fine.
When more of the system goes down, my btrfs filesystem usually goes read-only, and sometimes the entire system falls into a cycle of hiccups: it freezes for 2-3 seconds, runs for 4-8 seconds, and repeats.
At that point dmesg fills up with much more output. I don't know which parts are relevant, so I'm posting the first messages that aren't just the usual audit entries:
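Because nothing shows up in dmesg when the network drops, it is hard to tell how long afterwards the rest of the system fails. One option is to log the link state with timestamps while downloads run and compare those times with the dmesg timestamps. The minimal Python sketch below polls /sys/class/net/<interface>/operstate; the interface name enp7s0 comes from the logs in this post, while the function names, the 5-second interval and the "missing" marker are illustrative assumptions.

```python
import time
from datetime import datetime
from pathlib import Path

IFACE = "enp7s0"  # interface name as it appears in the dmesg output in this post


def read_operstate(iface: str = IFACE) -> str:
    """Return the kernel's link state for iface ('up', 'down', ...)."""
    path = Path("/sys/class/net") / iface / "operstate"
    try:
        return path.read_text().strip()
    except OSError:
        # Hypothetical marker: the sysfs entry could not be read at all.
        return "missing"


def watch(read=read_operstate, interval=5.0, max_polls=None, log=print):
    """Poll the link state and log a timestamped line on every change.

    Runs forever when max_polls is None; stop it with Ctrl+C.
    """
    previous = None
    polls = 0
    while max_polls is None or polls < max_polls:
        state = read()
        if state != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {IFACE} {state}")
            previous = state
        polls += 1
        time.sleep(interval)
```

Left running in a terminal during a long download, watch() prints one line per state change, and `dmesg -T` shows kernel timestamps as wall-clock times for comparison. If the card stops working without the kernel noticing, operstate may still read "up"; in that case a periodic ping would be the better signal.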
[ 3611.548615] audit: type=1130 audit(1582413211.591:127): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3621.799999] audit: type=1131 audit(1582413221.845:128): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3930.215468] pcieport 0000:02:06.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 3930.215504] pcieport 0000:02:05.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 3930.215519] pcieport 0000:02:04.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 3930.215534] pcieport 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 3930.508779] enp7s0: cmd = 0xff, should be 0x07
.
[ 3930.508785] enp7s0: io_base_l = 0xffff, should be 0xf001
.
[ 3930.508789] enp7s0: mem_base_l = 0xffff, should be 0x4004
.
[ 3930.508793] enp7s0: mem_base_h = 0xffff, should be 0xfcc0
.
[ 3930.508797] enp7s0: resv_0x1c_l = 0xffff, should be 0x0000
.
[ 3930.508800] enp7s0: resv_0x1c_h = 0xffff, should be 0x0000
.
[ 3930.508804] enp7s0: resv_0x20_l = 0xffff, should be 0x0004
.
[ 3930.508807] enp7s0: resv_0x20_h = 0xffff, should be 0xfcc0
.
[ 3930.508811] enp7s0: resv_0x24_l = 0xffff, should be 0x0000
.
[ 3930.508815] enp7s0: resv_0x24_h = 0xffff, should be 0x0000
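Every value the driver reads back above is 0xff or 0xffff. On PCI/PCIe, reads from a device that no longer responds return all ones, which fits the "config space inaccessible" messages from pcieport just before; so these lines most likely mean the card could not be reached at all, rather than that individual registers held wrong values. A small Python sketch that picks such all-ones reads out of a saved dmesg dump (the function names are hypothetical; the line pattern is fitted to the messages above):

```python
import re

# Matches r8168 register-check lines such as:
#   [ 3930.508779] enp7s0: cmd = 0xff, should be 0x07
LINE_RE = re.compile(
    r"\[\s*(?P<ts>[\d.]+)\]\s+(?P<dev>\S+):\s+(?P<reg>\w+)\s*=\s*"
    r"0x(?P<got>[0-9a-fA-F]+),\s*should be\s+0x(?P<want>[0-9a-fA-F]+)"
)


def is_all_ones(hex_digits: str) -> bool:
    """True if every bit is set, e.g. 'ff', 'ffff', 'ffffffff'."""
    value = int(hex_digits, 16)
    width = len(hex_digits) * 4
    return value == (1 << width) - 1


def find_all_ones_reads(log_text: str):
    """Return (timestamp, device, register) for each mismatch read as all ones."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m and m["got"] != m["want"] and is_all_ones(m["got"]):
            hits.append((float(m["ts"]), m["dev"], m["reg"]))
    return hits


if __name__ == "__main__":
    sample = """\
[ 3930.508779] enp7s0: cmd = 0xff, should be 0x07
.
[ 3930.508785] enp7s0: io_base_l = 0xffff, should be 0xf001
[ 3959.949184] enp7s0: pci_sn_l = 0xffffffff, should be 0x684ce000
"""
    for ts, dev, reg in find_all_ones_reads(sample):
        print(f"{ts:.6f} {dev} {reg} read back as all ones")
```

Lines without a "should be" part, such as the esd_flag line, are simply skipped.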
I also got this warning from the r8168 driver:
[ 3930.632849] ------------[ cut here ]------------
[ 3930.632861] WARNING: CPU: 7 PID: 63 at /storage/manjaro/makepkg/linux55-r8168/src/r8168-8.048.00/src/r8168_n.c:6843 rtl8168_wait_phy_ups_resume+0x52/0x60 [r8168]
[ 3930.632862] Modules linked in: snd_seq_dummy snd_seq snd_seq_device fuse squashfs loop mousedev joydev input_leds edac_mce_amd nls_iso8859_1 nls_cp437 vfat fat ccp rng_core amdgpu kvm irqbypass btrfs blake2b_generic xor gpu_sched i2c_algo_bit ttm snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper ledtrig_audio snd_hda_codec_hdmi drm snd_hda_intel snd_intel_dspcfg snd_hda_codec eeepc_wmi snd_hda_core asus_wmi agpgart battery snd_hwdep crct10dif_pclmul syscopyarea crc32_pclmul sparse_keymap sysfillrect ghash_clmulni_intel snd_pcm rfkill wmi_bmof snd_timer raid6_pq snd aesni_intel sp5100_tco libcrc32c crypto_simd sysimgblt cryptd r8168(OE) glue_helper fb_sys_fops soundcore i2c_piix4 k10temp pcspkr wmi gpio_amdpt pinctrl_amd evdev mac_hid acpi_cpufreq uinput crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sd_mod hid_microsoft ff_memless hid_generic usbhid hid ahci libahci crc32c_intel libata xhci_pci xhci_hcd scsi_mod
[ 3930.632911] CPU: 7 PID: 63 Comm: ksoftirqd/7 Tainted: G W OE 5.5.2-1-MANJARO #1
[ 3930.632913] Hardware name: System manufacturer System Product Name/PRIME B450M-GAMING/BR, BIOS 2006 11/13/2019
[ 3930.632919] RIP: 0010:rtl8168_wait_phy_ups_resume+0x52/0x60 [r8168]
[ 3930.632923] Code: a4 ff ff ff bf 58 89 41 00 89 c3 e8 38 68 a9 d4 83 e3 07 66 44 39 eb 74 05 83 fd 63 7e d1 83 fd 64 74 07 5b 5d 41 5c 41 5d c3 <0f> 0b 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 0f b6 87
[ 3930.632925] RSP: 0018:ffffa12fc03c3d28 EFLAGS: 00010046
[ 3930.632928] RAX: 00000d4532da1d05 RBX: 0000000000000007 RCX: 0000000000000007
[ 3930.632929] RDX: 0000000000386412 RSI: 00000d4532a1b8f3 RDI: 0000000000385ae1
[ 3930.632931] RBP: 0000000000000064 R08: 00000000ffffffff R09: 0000000000000000
[ 3930.632932] R10: 0000000000000002 R11: 00000000000000f0 R12: ffff91a5791a88c0
[ 3930.632933] R13: 0000000000000003 R14: ffff91a5791a8b18 R15: ffff91a5791a88c0
[ 3930.632936] FS: 0000000000000000(0000) GS:ffff91a5909c0000(0000) knlGS:0000000000000000
[ 3930.632937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3930.632939] CR2: 0000557811de9712 CR3: 00000002ab924000 CR4: 00000000003406e0
[ 3930.632940] Call Trace:
[ 3930.632952] rtl8168_esd_timer.cold+0x1db/0x3a9 [r8168]
[ 3930.632963] ? rtl8168_open+0x430/0x430 [r8168]
[ 3930.632967] call_timer_fn+0x2d/0x160
[ 3930.632971] run_timer_softirq+0x1ad/0x510
[ 3930.632977] ? rtl8168_open+0x430/0x430 [r8168]
[ 3930.632983] __do_softirq+0x111/0x34d
[ 3930.632989] run_ksoftirqd+0x32/0x40
[ 3930.632992] smpboot_thread_fn+0x19a/0x230
[ 3930.632996] kthread+0xfb/0x130
[ 3930.632998] ? sort_range+0x20/0x20
[ 3930.633000] ? kthread_park+0x90/0x90
[ 3930.633004] ret_from_fork+0x22/0x40
[ 3930.633009] ---[ end trace 7894c6017069a6f9 ]---
And then the rest of the system comes crashing down:
[ 3959.949053] enp7s0: resv_0x2c_l = 0xffff, should be 0x1043
.
[ 3959.949057] enp7s0: resv_0x2c_h = 0xffff, should be 0x8677
.
[ 3959.949184] enp7s0: pci_sn_l = 0xffffffff, should be 0x684ce000
.
[ 3959.950348] enp7s0: pci_sn_h = 0xffffffff, should be 0x01000000
.
[ 3959.951388] enp7s0: esd_flag = 0x7fff
.
[ 3962.788040] r8168: enp7s0: link up
[ 3962.806956] ata5.00: exception Emask 0x52 SAct 0x80fff841 SErr 0xffffffff action 0x6 frozen
[ 3962.806962] ata5: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
[ 3962.806967] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.806976] ata5.00: cmd 61/00:00:80:9d:43/0a:00:38:00:00/40 tag 0 ncq dma 1310720 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.806979] ata5.00: status: { DRDY }
[ 3962.806983] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.806993] ata5.00: cmd 61/00:30:80:07:43/0a:00:38:00:00/40 tag 6 ncq dma 1310720 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.806995] ata5.00: status: { DRDY }
[ 3962.806998] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807005] ata5.00: cmd 61/00:58:80:11:43/0a:00:38:00:00/40 tag 11 ncq dma 1310720 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807006] ata5.00: status: { DRDY }
[ 3962.807008] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807014] ata5.00: cmd 61/00:60:80:1b:43/0a:00:38:00:00/40 tag 12 ncq dma 1310720 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807016] ata5.00: status: { DRDY }
[ 3962.807018] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807024] ata5.00: cmd 61/00:68:80:25:43/0a:00:38:00:00/40 tag 13 ncq dma 1310720 out
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807026] ata5.00: status: { DRDY }
[ 3962.807028] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807034] ata5.00: cmd 61/00:70:80:2f:43/0a:00:38:00:00/40 tag 14 ncq dma 1310720 out
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807035] ata5.00: status: { DRDY }
[ 3962.807037] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807043] ata5.00: cmd 61/00:78:80:39:43/0a:00:38:00:00/40 tag 15 ncq dma 1310720 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807044] ata5.00: status: { DRDY }
[ 3962.807046] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807052] ata5.00: cmd 61/00:80:80:43:43/0a:00:38:00:00/40 tag 16 ncq dma 1310720 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807054] ata5.00: status: { DRDY }
[ 3962.807055] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807061] ata5.00: cmd 61/00:88:80:4d:43/0a:00:38:00:00/40 tag 17 ncq dma 1310720 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807063] ata5.00: status: { DRDY }
[ 3962.807064] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807070] ata5.00: cmd 61/00:90:80:57:43/0a:00:38:00:00/40 tag 18 ncq dma 1310720 out
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807071] ata5.00: status: { DRDY }
[ 3962.807073] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807079] ata5.00: cmd 61/00:98:80:61:43/0a:00:38:00:00/40 tag 19 ncq dma 1310720 out
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807080] ata5.00: status: { DRDY }
[ 3962.807082] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807087] ata5.00: cmd 61/00:a0:80:6b:43/0a:00:38:00:00/40 tag 20 ncq dma 1310720 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807089] ata5.00: status: { DRDY }
[ 3962.807091] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807096] ata5.00: cmd 61/00:a8:80:75:43/0a:00:38:00:00/40 tag 21 ncq dma 1310720 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807098] ata5.00: status: { DRDY }
[ 3962.807100] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807105] ata5.00: cmd 61/00:b0:80:7f:43/0a:00:38:00:00/40 tag 22 ncq dma 1310720 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807107] ata5.00: status: { DRDY }
[ 3962.807108] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807114] ata5.00: cmd 61/00:b8:80:89:43/0a:00:38:00:00/40 tag 23 ncq dma 1310720 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807115] ata5.00: status: { DRDY }
[ 3962.807117] ata5.00: failed command: WRITE FPDMA QUEUED
[ 3962.807123] ata5.00: cmd 61/00:f8:80:93:43/0a:00:38:00:00/40 tag 31 ncq dma 1310720 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[ 3962.807124] ata5.00: status: { DRDY }
[ 3962.807128] ata5: hard resetting link
[ 3962.807133] ahci 0000:01:00.1: AHCI controller unavailable!
[ 3963.839057] ata5: failed to resume link (SControl FFFFFFFF)
[ 3963.839072] ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
[ 3964.855824] enp7s0: cmd = 0xff, should be 0x07
.
[ 3964.855829] enp7s0: io_base_l = 0xffff, should be 0xf001
.
[ 3964.855834] enp7s0: mem_base_l = 0xffff, should be 0x4004
.
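The timestamps suggest an order of events: the pcieport messages at about 3930.2, the enp7s0 register dump and r8168 warning shortly after, then the SATA errors and "AHCI controller unavailable!" around 3962. A Python sketch that extracts the first error-looking line per device from a saved dmesg log can make that sequence easier to see. The keyword list and the device-prefix pattern are assumptions fitted to the messages in this post, not a general classifier.

```python
import re

# Keywords taken from the messages above; this list is an assumption, not exhaustive.
ERROR_HINTS = ("can't change power state", "should be", "exception",
               "failed", "unavailable", "link down")

# "[ timestamp] rest-of-line"
TS_RE = re.compile(r"^\[\s*([\d.]+)\]\s+(.*)$")
# Device prefix: either "driver 0000:bb:dd.f" or a bare name such as "enp7s0" / "ata5.00"
DEV_RE = re.compile(r"^(\w+ [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.\d|[\w.]+):")


def first_error_per_device(log_text: str) -> dict:
    """Map device prefix -> timestamp of its first error-looking line.

    Dict insertion order follows the order in which devices first failed.
    """
    first_seen = {}
    for line in log_text.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue  # continuation lines such as "res 40/..." carry no timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if not any(hint in msg for hint in ERROR_HINTS):
            continue
        d = DEV_RE.match(msg)
        if d:
            first_seen.setdefault(d.group(1), ts)
    return first_seen
```

Applied to the full log above, this would list the pcieport bridges first, then enp7s0, then ata5.00 and the AHCI controller, which is the same cascade described in the post.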
I can do I/O-heavy tasks, like moving files around on my btrfs filesystem, without problems, and I have been running games for testing with no issues either. So far, these problems only seem to appear when the network card is under heavy use.
I do NOT have a lot of experience with Linux in general, but here's a list of what I've tried:
- Different kernel versions between 5.3 and 5.5
- Kernel parameters in GRUB, such as iommu=off, pci_aspm=off and libata.force=noncq
- Unplugging any peripherals that are not absolutely necessary (including disconnecting mouse and keyboard)
- With and without the driver from "hardware configuration"; the problem seemed to occur more quickly with it installed, but I can't be sure of that.
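Parameters added in GRUB only matter if they reach the booted kernel (on Manjaro, usually after running update-grub and rebooting), and /proc/cmdline shows what the running kernel actually received. Note that the kernel's documented ASPM parameter is pcie_aspm, so a pci_aspm=off entry would not be recognized; the sketch below therefore checks for pcie_aspm=off. It is a minimal Python example; the WANTED mapping and function names are illustrative assumptions.

```python
import shlex
from pathlib import Path

# Parameters to verify; pcie_aspm is the kernel's documented ASPM switch.
WANTED = {"iommu": "off", "pcie_aspm": "off", "libata.force": "noncq"}


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")  # split on the first '=' only
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str, wanted=WANTED) -> dict:
    """Return the wanted parameters that are absent or set to a different value."""
    active = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if active.get(k) != v}


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        print(missing_params(path.read_text()) or "all wanted parameters are active")
```

An empty result means every listed parameter was active for the current boot; anything printed was either never applied or spelled differently.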
Some of these attempts may seem completely irrelevant to the issue, but keep in mind that I don't know what else to try, and I'm desperate for any help I can get. Any suggestions would be greatly appreciated.