Several issues at once, getting worse

Hello everyone,

before moving away from Manjaro and switching to KDE Neon (which I really do not want to do, mainly because I like rolling releases and do not trust Ubuntu/Canonical), let me try to tell you about my increasingly troublesome relationship with Arch distros.

Over the past few months with Manjaro (KDE), the following issues started to show:

· After a while (really that vague, sometimes very quickly, sometimes only after a half hour or more) any program stops opening, no matter if tried from the menu (Application Launcher) or by clicking on an associated file, without any error message except when I click on a text file, then it says KDEinit can not open Kate.

· Related to that, although the menu (Application Launcher) still opens, even basic OS functions such as reboot and poweroff stop working. The only option is to open Yakuake (F12, which still works) and type the respective command. This is usually answered with a “watchdog did not stop” warning and a few others (not always, and not always the same) before the machine restarts or shuts down.

· ISO files are sometimes recognised as such, sometimes are treated as text files (did you ever wait for Kate to open 3.2 GB?)

· Startup has become much slower. I have Manjaro installed on an external SSD (Angelbird SSD2GO, a real gem!) connected via USB3/USB3.1-C. In the beginning it took about 20 seconds from power-on to the login screen and another 7 seconds from password confirmation to the desktop; now it takes a minute or more, though I haven’t timed it exactly. The second part in particular, after login confirmation, has become disturbingly slow.

· Almost every other time, the wired network connection fails. I have set it to not connect automatically, and when I click “connect”, the symbol spins for a few seconds, turns red, then goes back to the not-connected symbol. Sometimes I have to try three or four times before I am actually connected.

· Sometimes (vague again, because not consistent) when leaving the machine for a while (long enough for a screen lock, that is), coming back would give me a black screen with the message
The screen locker is broken and unlocking is not possible anymore. In order to unlock switch to a virtual terminal (e.g. Ctrl+Alt+F2), log in and execute the command
loginctl unlock-session c2
Afterwards switch back to the running session (Ctrl+Alt+F1),
That never works for me because I am not using an English keyboard layout, which seems to be the default for the virtual terminal at this stage, so my password doesn’t match. My temporary workaround was to deactivate screen locking altogether, but that cannot be a permanent solution. Sometimes I need the locking.

· Most worrisome, though, is the warning that sometimes shows on startup, between GRUB and login, which reads something like
Delayed block allocation failed for inode 2753385 at logical offset 1378 with max blocks 3 with error -5 This should not happen!! Data will be lost
The numbers are never the same, the disk tests fine, and the message does not display every time. It never displays when I start Manjaro using kernel 4.9.66-1 (switching kernels appears to help with the programs-not-opening issue), only with my standard kernel, 4.14.3-1.

To be fair, the programs-not-opening issue I have experienced with USB thumb drive installs of both SwagArch (XFCE) and Antergos (KDE), too. So, it’s more likely an Arch issue, but I do not find anything in their forum. The KDEinit message from above might also point to a Plasma related situation, but I didn’t find anything there, either (and it’s happening on a non-Plasma system, too).

So, should anyone have any idea what might be the cause here, please let me know. I’m desperate.

Thanks, and of course thanks to Manjaro’s developers, too! Apart from those annoying issues, I love Manjaro!

Some kernels work better with some machines. If kernel 4.9 works better, there may be a regression in kernel 4.14, or 4.14 may never work well on your hardware. I've been through several kernel releases where odd-numbered releases worked better on my hardware than even-numbered releases (though 4.14 seems to be an exception...).

So - I'd start with switching back to 4.9 (or up to 4.15) to narrow down the range of issues.

As it's a USB drive then disabling USB power saving may also help; edit /etc/default/tlp and set USB_AUTOSUSPEND=0 (then reboot).


(moved from #general-discussion to #support-for-manjaro-editions as this is a request for help, not a discussion item; added #kde tag)

Thanks Jonathon, and sorry: I am new here and was totally unsure where to post this.

Thanks for your quick reply!

That only helps avoid the Delayed block allocation… warning. Programs also refuse to open when Manjaro starts with kernel 4.9.

Help with what exactly, the Delayed block allocation… warning, or the programs-not-opening issue? I really like autosuspend.

Ah, OK. So we need to dig deeper. :slight_smile:

What happens when you run a program from the terminal (e.g. by typing firefox)?

Is there any output in dmesg?
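
Since the full dmesg output can be long, a filter along these lines may help narrow it down to storage-related messages. This is only a sketch: the function name storage_msgs and the keyword list are my own assumptions, not an exhaustive set of what the kernel can log.

```shell
# Keep only kernel log lines that mention USB, ext4 or I/O trouble.
# The keyword list is a guess at what matters for a USB-attached SSD.
storage_msgs() {
    grep -iE 'usb|ext4|i/o error|blk_update_request'
}

# Usage (may need sudo on some systems):
#   dmesg | storage_msgs
```

Running it right after a program refuses to open should show whether the disk was reset or reported errors around that time.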

The TLP default will aggressively suspend USB devices that may not support it; by not forcing it, the device drivers will suspend the device only if it's supported by the driver.

Basically, it's one extra thing to rule out as a possibility, especially as it's a USB device.
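
For anyone who would rather keep autosuspend for their other devices, TLP also has a per-device exclusion list. A sketch of both options for /etc/default/tlp follows. The ID 1234:5678 is a placeholder, not the real ID of the SSD; lsusb would show the actual one.

```shell
# /etc/default/tlp (fragment)

# Option 1: turn USB autosuspend off for all devices.
USB_AUTOSUSPEND=0

# Option 2: leave autosuspend on but exclude one device.
# 1234:5678 is a placeholder; take the real vendor:product ID from lsusb.
#USB_AUTOSUSPEND=1
#USB_BLACKLIST="1234:5678"
```

Either change only applies once TLP has been restarted or the machine rebooted.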

There's more info here:

Thanks again, this is a very positive experience I’m having; not only is your forum software amazing, but I also really get profound help. Great. :hearts:

I will now reboot (I have nano’ed /etc/default/tlp already) and wait for the problems to re-occur, then get back here with any dmesg output.

See you in a bit.

Well, well, well … may be totally unrelated, or not: startup was very fast this time.

As I said, I edited tlp, then (without rebooting yet) opened some random programs. After a short while the issue occurred again and nothing would open. I then tried opening KPatience from the CLI, and this is what I got:

Invalid MIT-MAGIC-COOKIE-1 key
qt.qpa.screen: QXcbConnection: Could not connect to display :0
Could not connect to any X display.
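
The "Invalid MIT-MAGIC-COOKIE-1 key" part usually means the program presented an X authorisation cookie that the server did not accept. Below is a minimal sketch for inspecting the cookie file, assuming the standard xauth tool is installed; check_xauth is just an illustrative name.

```shell
# Inspect the X authority file that programs started from this shell would use.
check_xauth() {
    # xauth uses $XAUTHORITY if set, otherwise ~/.Xauthority
    file="${XAUTHORITY:-$HOME/.Xauthority}"
    echo "DISPLAY=${DISPLAY:-<unset>}"
    echo "cookie file: $file"
    ls -l "$file" 2>&1
    # List the stored cookies if xauth is available.
    if command -v xauth >/dev/null 2>&1; then
        xauth -f "$file" list
    fi
}

# Usage: check_xauth
```

If the file turns out to be missing, empty or unreadable (for instance after file system errors), new programs could not authenticate against the X server, which might match the symptom.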

Then I rebooted, using kernel 4.9. Very speedy, as I mentioned, and I have now opened several random programs (at least ten, normally the issue should have happened already) without any problem.

Does this clarify anything? Could “unforcing“ suspend actually also solve the other problems? Or, to put it another way: Could the X server be affected by the forced autosuspend command, although no suspending was initiated?

Thank you once again for your help, Jonathon. Very much appreciated!

Maybe I’ll try rebooting in a while, using kernel 4.14 again. I will keep posting further results. For now, I cross my fingers …

The change wouldn't have taken effect at that point. TLP is a service, so you could have restarted it with systemctl restart tlp, but "reboot" is easier to say. :slight_smile:
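
To see whether the setting actually took hold, the kernel exposes each USB device's runtime power mode in sysfs: "on" means the device is kept awake, "auto" means it may be autosuspended. A small sketch follows; the function name and the optional directory argument are mine, the latter added so it can be tried against a test directory.

```shell
# Print the runtime power-management mode of every USB device.
# "on" = never autosuspended, "auto" = autosuspend allowed.
usb_power_modes() {
    root="${1:-/sys/bus/usb/devices}"
    for ctl in "$root"/*/power/control; do
        [ -r "$ctl" ] || continue
        # Device name is the directory two levels above the control file.
        dev=$(basename "$(dirname "$(dirname "$ctl")")")
        printf '%s %s\n' "$dev" "$(cat "$ctl")"
    done
}

# Usage: usb_power_modes
```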

Yes - you're running from a USB disk, so if TLP is trying to suspend the disk while you're using it, it could well cause stalls and all those related issues.

Well worth trying 4.14 now. :slight_smile:

Slightly off-topic but USB autosuspend should be disabled by default. It's a PITA when working with external USB drives.

I'm very likely now to take on maintenance of the TLP package and change the terrible defaults. This comes up way too often...

I was aware of that; I kept going so the issue would come up again and I could give you a proper dmesg :slight_smile:

:slight_smile: Got it! For future reference, try me: I consider myself moderately experienced and pretty adaptive.

Will do. Get back to you.

Is TLP a Manjaro-specific package? Or does it relate to Arch? Because, remember, I’ve had those issues with SwagArch¹ and Antergos, too.


¹ Which is quite nice, by the way, but needs too much tweaking for my needs, Thunar not offering search and some other XFCE-related things, mainly.

TLP is installed by default in Manjaro, not sure about Arch and other derivatives, but with the Arch approach the default configuration is whatever the upstream project suggests. These aren't necessarily the best, but Arch expects the user to be able to configure their own system.

Manjaro tries to be a little more proactive in making users' lives easier so I'm likely to change these defaults in a Manjaro-specific package. If I can find the upstream issue tracker I'll report the issue there too so hopefully they can fix it for everyone.


Edit:

Great. Thanks!

Well, I am sorry to say that my problems are not solved.

First, I kept going with the rebooted system for a while, which ultimately resulted in yet another program opening failure. More often than not it’s the network manager that at some point stops opening the configuration window.

Then I restarted (having to use Yakuake, F12 reboot again), this time allowing kernel 4.14 to load, and yet again, there it was, Delayed block allocation… It only showed very briefly, though, the first part of the booting process was still fast. After entering the password it got much slower again.

Do you perchance have any other idea what I might try?

Thanks!

(I’ll be off for an hour or three, but will check back later.)

OK, so if behaviour is roughly the same on both 4.9 and 4.14, and given this is an SSD, the next thing to try will be making sure the disk scheduler isn't playing a part. I've recently found that bfq can get in the way of certain workloads whereas noop, which should be slower, works better. It's also normally recommended for SSDs, so:

First, identify the disk on your system with e.g. lsblk. My examples will use /dev/sdc, change it to match your SSD location.

On a live system,

  • to check the current scheduler:
    # cat /sys/block/sdc/queue/scheduler
    [noop] deadline cfq bfq-sq
    
  • to change it:
    # echo "noop" > /sys/block/sdc/queue/scheduler
    

To make this take effect on boot,

  • for a single boot, press e at the GRUB menu, add elevator=noop to the end of the line starting with linux, then press F10 to boot.

  • for every boot, edit /etc/default/grub, add elevator=noop to the end of GRUB_CMDLINE_LINUX_DEFAULT, then sudo update-grub (and reboot).

(this last bit needs checking, I haven't used elevator= for a while)
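
As a rough illustration of the permanent variant, the relevant line in /etc/default/grub might end up looking like this. The word "quiet" only stands in for whatever options are already on that line on your system.

```shell
# /etc/default/grub (fragment)
# Append elevator=noop to the options that are already there;
# "quiet" is a stand-in for the existing contents.
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=noop"

# Then regenerate the GRUB configuration and reboot:
#   sudo update-grub
# After rebooting, check that the option reached the kernel:
#   cat /proc/cmdline
```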


A last resort, given some of the other issues, would be to reinstall to make sure there aren't any missing or corrupted files or settings.

You could check the state of packaged files with:

pacman -Qkk $(pacman -Qsq) | grep -v " 0 altered"

Jonathon, thank you so much for again getting back to me!
I was unable to check yesterday, but now have a few replies:

My result here is
noop deadline cfq [bfq]
which probably means that bfq is running, right?
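
For reference, the name in square brackets is indeed the scheduler currently in use. A tiny sketch for pulling out just that name (active_scheduler is an invented helper, not an existing command):

```shell
# Print the active (bracketed) scheduler from a sysfs scheduler line.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage: active_scheduler < /sys/block/sdc/queue/scheduler
```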

Trying this, I get Permission denied, no matter if I try it as user or add sudo.

Seems to be correct, other sites are recommending the same. I will first try the on-single-boot solution, then change /etc/default/grub.

Well, this is pretty crazy. That command presents me with lines on end, most of them warnings about missing language files (e.g. /usr/share/locale/xy/LC_MESSAGES/zvbi.mo (No such file or directory) with xy being various language codes)
Also, it gives me results like these,

backup file: sddm: /etc/pam.d/sddm (Size mismatch)
sed: 127 total files, 39 altered files
shadow: 558 total files, 372 altered files
shared-mime-info: 230 total files, 72 altered files
skanlite: 242 total files, 51 altered files
solid: 251 total files, 59 altered files
sonnet: 373 total files, 101 altered files
spectacle: 186 total files, 35 altered files
subversion: 356 total files, 12 altered files
backup file: sudo: /etc/sudoers (Modification time mismatch)
backup file: sudo: /etc/sudoers (Size mismatch)
sudo: 187 total files, 58 altered files
syntax-highlighting: 300 total files, 83 altered files
system-config-printer: 479 total files, 61 altered files
systemd: 1398 total files, 28 altered files
systemd-kcm: 82 total files, 20 altered files
tar: 131 total files, 38 altered files
testdisk: 34 total files, 6 altered files
texinfo: 551 total files, 62 altered files
backup file: tlp: /etc/default/tlp (Modification time mismatch)
tre: 33 total files, 2 altered files
udiskie: 101 total files, 3 altered files
udisks2: 406 total files, 69 altered files
upower: 73 total files, 4 altered files
util-linux: 484 total files, 26 altered files
v4l-utils: 278 total files, 2 altered files

and later

vlc: 1005 total files, 115 altered files
volume_key: 132 total files, 38 altered files
webkit2gtk: 565 total files, 46 altered files
wget: 129 total files, 38 altered files
wxgtk-common: 823 total files, 22 altered files
xdg-desktop-portal: 85 total files, 15 altered files
xdg-user-dirs: 242 total files, 75 altered files
xkeyboard-config: 451 total files, 41 altered files
xorg-fonts-encodings: 59 total files, 2 altered files
xz: 119 total files, 6 altered files
yakuake: 354 total files, 52 altered files
yaourt: 118 total files, 33 altered files

Earlier results (for anything before sddm) are cut off because my Konsole has a limitation (and I don’t know which one, nor where to change it).
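
One way around the scrollback limit would be to write the output to a file and condense it. The sketch below ranks packages by number of altered files. rank_altered and the log path are names made up for this example, and it assumes the summary lines look like the ones quoted above.

```shell
# Read pacman -Qkk output on stdin and print "<altered count> <package>",
# highest count first, skipping packages with 0 altered files.
rank_altered() {
    awk -F': ' '
        / total files, [0-9]+ altered files$/ {
            split($2, f, /[ ,]+/)      # f[1] = total count, f[4] = altered count
            if (f[4] + 0 > 0) print f[4], $1
        }' | sort -rn
}

# Usage (keeps the complete output in a log file as well):
#   pacman -Qkk 2>&1 | tee ~/pacman-check.log | rank_altered
```

The per-file warnings are dropped from the ranking but remain in the log file for later inspection.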

Yeah, that I was trying to avoid. Because my mind tells me, “why reinstall? Probably the same thing happens again after a few days…”

Thanks anyhow, I will check elevator=noop now.

You have to run it as root (e.g. after using sudo -i). (I think there's something involving "sudo tee" that would also work.)
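
The reason plain sudo does not help is that the > redirection is carried out by your own unprivileged shell before sudo even starts, so only echo runs as root. Piping into tee avoids that, because tee itself opens the file. A sketch of that approach follows; set_scheduler, SUDO and SYSFS_ROOT are names invented for this example.

```shell
# Write a scheduler name into a disk's sysfs file with root rights.
# SUDO and SYSFS_ROOT are illustrative knobs, not standard variables:
#   SUDO=""      skip sudo when already root
#   SYSFS_ROOT   alternative root directory, for dry runs
set_scheduler() {
    dev=$1
    sched=$2
    file="${SYSFS_ROOT:-/sys}/block/$dev/queue/scheduler"
    # tee opens the file, so it is tee (run via sudo) that needs root,
    # not the shell doing the redirection.
    printf '%s\n' "$sched" | ${SUDO-sudo} tee "$file" >/dev/null
}

# Usage: set_scheduler sdc noop
```

As before, sdc is only the example device; it needs to match the SSD as shown by lsblk.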

You have an awful lot of altered files. Some of that could be down to missing localisation files, but this many suggests file system corruption.

I'd recommend reinstalling, then running with USB_AUTOSUSPEND=0 to make sure the drive doesn't get powered down while you're using it.

Well, as per our other conversation: a fresh install with your tlp modification did not change a thing. The same problem appeared after the very first reboot.

Thanks anyhow, you have been very kind to try and help.

Maybe it was just not meant to be.

:cry:
