Manjaro fails to boot after hard reset (presumably because it ran out of memory)

Hi!

[Addition: Unfortunately new users can only post one picture per post but all I have are pictures. All the pictures included in the original post can be found here]
[Addition 2: I was so confused about how to start a new topic (I guess I couldn't before?) that I already posted a question on Unix Stackexchange, too (https://unix.stackexchange.com/questions/579765/manjaro-fails-to-boot-after-hard-reset-because-it-was-presumably-out-of-memory). I hope that's not a problem. If anyone is in the same situation that I was, then yes, newbies cannot post to the newbie corner right away but only after some magical time limit? I read all the newbie materials but didn't find any information on this]

What's the situation?
My Manjaro installation is not booting properly: it throws some errors (details below) and leaves me in an emergency console. I have a dual-boot setup, so I can boot into Windows, where the computer seems to function properly, but Manjaro has problems. I do not have a USB stick right now, so I cannot do live-USB debugging, but I can access a terminal from both GRUB and the emergency shell that Manjaro dumps me in after a failed boot.

What happened?
I've been running Manjaro without any issues for about a year. Today I had to perform some extensive matrix calculations that eat up all the RAM, and sure enough, when I ran them on my machine with 16 GB RAM + 8 GB swap, all of that was consumed and the OS killed the process, which is fairly standard. I thought I could mitigate this by adding another swapfile, so I created a 16 GB swapfile2 and gave it to the system to use as swap (I only ran swapon; I did not add the swapfile to /etc/fstab because I knew I wouldn't want it once the computation was done). With my newly found 24 GB of swap + 16 GB RAM I ran the operation again, only to find ~20 minutes later that the computer had completely frozen. I waited a couple of minutes to see if it was just busy with swap (which is relatively slow and sometimes freezes the UI), but after 5-10 minutes it seemed that it would not recover. I resorted to the easiest solution and did a hard reset, only to find that the system doesn't boot any more.
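For readers following along, here is a rough sketch of how a temporary swapfile like this is commonly set up and torn down. The post only mentions swapon, so the dd, chmod and mkswap steps are an assumption about the usual procedure; the run helper and the DRY_RUN, SWAPFILE and SIZE_MIB variables are made up for this example.

```shell
#!/bin/sh
# Sketch: add a temporary swapfile, then remove it again.
# Assumptions: /swapfile2 and 16 GiB mirror the post; the dd/chmod/mkswap
# steps are the usual procedure and are not confirmed by the post.

SWAPFILE="${SWAPFILE:-/swapfile2}"
SIZE_MIB="${SIZE_MIB:-16384}"

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the file, restrict its permissions, format it as swap, enable it.
add_temp_swap() {
    run dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MIB" &&
    run chmod 600 "$SWAPFILE" &&
    run mkswap "$SWAPFILE" &&
    run swapon "$SWAPFILE"
}

# Disable the swapfile and delete it once the job is done.
remove_temp_swap() {
    run swapoff "$SWAPFILE" &&
    run rm -f "$SWAPFILE"
}
```

With DRY_RUN=1 set, add_temp_swap only prints the commands, which is a safe way to review them before running them for real as root. Since nothing is written to /etc/fstab, swap enabled this way does not come back after a reboot.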

What do I see?
I've been rummaging around the emergency terminal for a while now. First I deleted the second swapfile, but since it wasn't in /etc/fstab anyway, that probably didn't change anything.
For some weird reason I cannot type the |, < and > symbols (I tried every key on my keyboard but they are nowhere to be found), which limits what I can do in the shell. However, I have retrieved some logs and uncovered some weird phenomena.

This is the "splash screen" that I see every time I try to boot:
image

From here on, I have two options. If I press nothing and wait, it logs some more errors and dumps me in an emergency shell. If I press enter, I can select some locale options (out of all the things!) and then after some logging it just hangs in the following screen until after a while it reboots on its own:

image removed

Anyways, I think the emergency shell is a bit more interesting. Start sequence for the emergency shell:

image removed

journalctl -xb, which the emergency shell recommends, returns nothing, as do plain journalctl and all the variations suggested here. It just says -- no entries --. systemctl default hangs for a minute or so and then reboots into the same broken Manjaro.

The end of the dmesg output can be seen here:

image removed

Unfortunately there's no scroll or copy-paste option, so I don't know how to extract more than that. The dmesg output points to a failure of the LVM2 logical volumes, so after googling that a bit I came across some commands whose output I should post. Here's the output of fdisk -l and blkid.

image removed

Checking the fstab file I also noticed that there's some problem with all the files in the /etc directory:

image removed

I googled that too but it seems more like a symptom than the underlying problem.

And finally checked the status of the journald service, which probably is not very informative.

image removed

I hope someone can point me in the right direction, because right now I have no clue where to start debugging this mess.
Thanks!

Since you are new, it looks like your screenshots were removed so I only see the first one.

If your root partition is formatted to ext4 or some other filesystem that supports fsck, I would start by running fsck on that partition. It is possible that the hard freeze caused some disk corruption.
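A minimal sketch of that check, assuming it is run as root from a rescue or live environment. fsck should not be run on a filesystem that is mounted read-write, so the helper below refuses if the device shows up in /proc/mounts. The function names is_mounted and safe_fsck are made up for this example, and /dev/sdXN in the usage note is a placeholder, not a device name taken from this thread.

```shell
#!/bin/sh
# Sketch: refuse to fsck a filesystem that is currently mounted.
# Function names are hypothetical; pass the real partition as the argument
# (for LVM, use the name exactly as it appears in /proc/mounts).

# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears in the first column of MOUNTS_FILE.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# safe_fsck DEVICE [MOUNTS_FILE]
safe_fsck() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing: $1 is mounted; unmount it or use a live USB" >&2
        return 1
    fi
    # -f forces a check even if the filesystem is marked clean.
    fsck -f "$1"
}
```

Usage would be something like safe_fsck /dev/sdXN, with the placeholder replaced by the actual root partition as reported by blkid or fdisk -l.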

I would also check that your process didn't fill the disk up.

"Structure needs cleaning" indicates you need to run an fsck on the disk. You can do that via a live environment.

I don't have access to a USB pen drive right now so running fsck is complicated. Is there any other way than from live USB?

How would I check that? df -h seems to indicate that there is some space left
image

Yeah, that looks fine. You probably just need to run fsck.

90% use on your root partition is not really that fine - something has eaten a lot of space so some cleaning needs to be done (e.g. pacman -Sc).
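To keep an eye on this, something along these lines could be used. It is only a sketch: the function names use_pct and warn_if_full are made up here, and the default threshold of 90 is an arbitrary value picked to match the figure above.

```shell
#!/bin/sh
# use_pct: read `df -P` output on stdin and print the use percentage of the
# first filesystem listed, as a bare integer (hypothetical helper).
use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH [THRESHOLD]
# THRESHOLD defaults to 90, an arbitrary value chosen for illustration.
warn_if_full() {
    pct="$(df -P "$1" | use_pct)"
    if [ "$pct" -ge "${2:-90}" ]; then
        echo "$1 is ${pct}% full; consider cleaning up (e.g. pacman -Sc)"
        return 1
    fi
    echo "$1 is ${pct}% full"
}
```

Running warn_if_full / reports the usage of the root filesystem. The parsing relies on the POSIX output format of df -P, where the fifth column is the capacity, and assumes the device name contains no spaces.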

A couple of days later I got a USB pen drive and ran fsck on the partition. It worked and fixed a lot of problems. Now the system boots to a terminal, and the first time around it asked me to create a new root account and select some locale options, as if this were a brand new installation. Previously I had Gnome, but that is nowhere to be found, just like my previous account. On the other hand, my data is still all there, and so is my previous account's home folder.

I would be OK with losing my previous accounts but how could I get Gnome back? What happened to it?

Thanks

What does pacman -Qsq gnome return?

It says that /etc/pacman.conf could not be read because such file/directory does not exist. So I don't have pacman any more either? Or just the configuration?

I ran out of time to fix it because I need the computer in full swing now, so I resorted to the easy option and reinstalled Manjaro. I still don't understand why it happened in the first place, though, so if anyone can explain that, it would be much appreciated. All I did was add a temporary swapfile and do a hard reset after that. Why does that mess up the system so badly?

A "hard reset" can cause data loss. This is a pretty standard thing - you don't repeatedly yank the power cable out of a running computer and expect it to keep working as if nothing had happened.

A full root partition will cause other issues.
