Log showing why the hardware fails

catch22 · April 26, 2023, 9:08am

I have a desktop that dates from 2011. It used to work great and was very reliable, always running Linux Mint.
Recently it has stopped working properly (it crashes regularly), and I’d like to find out why: overheating? A dying motherboard? …
I know absolutely nothing about hardware, but of course I can give some details about the machine, if needed.

My question is:

What command(s) can I run to get a log diagnosing why the machine crashes?

ThatGuyB · April 26, 2023, 11:18am

Welcome to the forum!

Run sudo dmesg while the system is running, and sudo cat /var/log/messages plus any of the other messages* files, to check whether any errors were reported there.
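A minimal bash sketch for narrowing a long log down to lines that often point at hardware trouble. The function name scan_log and its keyword list are assumptions, not an exhaustive list. Also note that on Ubuntu-based systems such as Linux Mint the system log is usually /var/log/syslog (plus the systemd journal) rather than /var/log/messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and keep only lines that
# commonly indicate hardware problems. The pattern list is an assumption.
scan_log() {
  grep -iE 'hardware error|mce:|machine check|i/o error|ata[0-9]+.*(error|failed)|segfault|out of memory|thermal|temperature above threshold|panic|call trace' \
    || true  # grep exits 1 when nothing matches; treat "no hits" as success
}
```

Possible usage: `sudo dmesg | scan_log`, `sudo cat /var/log/syslog | scan_log`, or, if the journal is kept across reboots, `sudo journalctl -b -1 | scan_log` to look at the boot that ended in the crash.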

sudo smartctl -a /dev/sda would also show disk failures, if there are any, assuming /dev/sda is your main disk. Running lsblk will show you which partition your root filesystem (/) is mounted from; use that device name minus the partition number (for example, sda1 becomes sda).
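The "minus the partition number" step can be sketched as plain string handling. This is a hypothetical helper based on common Linux device naming (sda1, nvme0n1p2, mmcblk0p1); it does not understand LVM, RAID or /dev/mapper devices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the partition suffix from a partition device path,
# e.g. /dev/sda1 -> /dev/sda. Pure string handling, no hardware access.
parent_disk() {
  local part="$1"
  case "$part" in
    # NVMe and eMMC/SD disk names end in a digit, so partitions use a "p" separator.
    /dev/nvme*|/dev/mmcblk*) printf '%s\n' "${part%p[0-9]*}" ;;
    # SATA/SCSI/virtio names (sda, vdb, ...) have the partition number appended directly.
    *) printf '%s\n' "${part%%[0-9]*}" ;;
  esac
}
```

For example, `parent_disk "$(findmnt -no SOURCE /)"` gives the disk to pass to smartctl. Alternatively, `lsblk -no PKNAME /dev/sda1` asks lsblk for the parent device directly, which also copes with layouts the string approach does not.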

Desktops that old tend to develop problems with their PSUs, which stop providing enough power to the system; those are mostly silent, undetected errors. Your system generally freezes when that happens, but it could also crash on rare occasions. Crashes are more often caused by faulty RAM and disks.

hulxmash · April 27, 2023, 12:27am

You can test the RAM easily with memtest86. If you happen to have bootable Arch Linux install media, memtest is built into the ISO. Hard drives (as long as they’re not NVMe drives) often report signs of failure in their S.M.A.R.T. data. This can often be found in the OS’s disks utility. I’ve never used Mint, so I’m not sure which utility it has, but I’m sure a quick Google search will find you something.

catch22 · May 2, 2023, 4:10pm

Where can I upload the output of sudo dmesg for you to see?
For the second suggestion you gave (cat /var/log/messages) I get “no such file”.
With the smartctl command I get “command not found” and when installing smartmontools it says “temporary failure resolving ‘archive.ubuntu.com’”

ThatGuyB · May 3, 2023, 1:07am

Seems like you have some DNS issues. Check /etc/resolv.conf and set a nameserver such as 9.9.9.9.
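A small sketch for checking which DNS servers the system is configured to use. A working resolv.conf contains lines such as `nameserver 9.9.9.9`. The helper name is hypothetical, and on many Mint installs /etc/resolv.conf is generated by another service (resolvconf or systemd-resolved), so hand edits to it may be overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nameserver entries from a resolv.conf-style file.
# Defaults to /etc/resolv.conf; pass another path to inspect a copy.
list_nameservers() {
  local file="${1:-/etc/resolv.conf}"
  # Commented-out lines ('#' or ';') are skipped because their first field
  # is not exactly "nameserver".
  awk '$1 == "nameserver" { print $2 }' "$file"
}
```

If `list_nameservers` prints nothing, that alone would explain "temporary failure resolving". After fixing it, `getent hosts archive.ubuntu.com` is a quick way to confirm that name resolution works again.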

Weird that you cannot see /var/log/messages. Did you try “sudo cat /var/log/messages”?

https://0bin.net/

catch22 · May 3, 2023, 8:14am

https://0bin.net/paste/HAUq-Sr

That’s the pastebin; the output of the other suggestions will follow a bit later.
EDIT: here they are…
I even tried su - instead of sudo, but still get “no such file or directory”.

In /etc/resolvconf I see three files (base, tail, head), so where do I put 9.9.9.9?

catch22 · May 3, 2023, 8:47am

Is this the disk utility? I see it says SMART is not enabled.

hulxmash · May 3, 2023, 11:02am

That is the disk utility I was referring to. That utility would show you the output of the smartctl command mentioned above. You might check your BIOS to be sure that S.M.A.R.T. monitoring is enabled (although it would be strange if it weren’t), but most likely it’s because you are missing smartmontools. I assumed Mint would have this installed.
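Once smartmontools is installed, `sudo smartctl -i /dev/sda` reports whether SMART is enabled on the drive itself, in a line such as `SMART support is: Enabled`. A minimal sketch that reads that output; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -i` output on stdin and report
# whether SMART is enabled on the drive.
smart_enabled() {
  if grep -q '^SMART support is: *Enabled'; then
    echo "enabled"
  else
    echo "disabled or unsupported"
  fi
}
```

Usage: `sudo smartctl -i /dev/sda | smart_enabled`. If it reports disabled but the drive supports SMART, `sudo smartctl -s on /dev/sda` switches it on.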

catch22 · May 7, 2023, 4:48pm

Meanwhile I managed to install smartmontools and ran these two commands:

    sudo smartctl -t short -a /dev/sda
    sudo smartctl -a /dev/sda
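Note that `smartctl -t short` only starts the self-test; the drive runs it in the background for a few minutes, and the result appears later in `sudo smartctl -l selftest /dev/sda`, where the newest entry is numbered `# 1`. A minimal sketch for pulling that entry out, assuming the ATA-style log layout (NVMe drives print a different table); the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for `smartctl -l selftest` output (ATA layout) on stdin.

# Print the most recent entry ("# 1") without its "# 1" prefix.
last_selftest() {
  sed -n 's/^# *1  *//p' | head -n 1
}

# Succeed only if the most recent self-test completed without error.
selftest_passed() {
  last_selftest | grep -q 'Completed without error'
}
```

Usage: `sudo smartctl -l selftest /dev/sda | selftest_passed && echo "last self-test passed"`.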

One of the lines reads:

    177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always       -       99

I’m guessing this means the SSD is almost completely dead??
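For reading a line like the Wear_Leveling_Count row above: smartctl’s attribute table has the columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED and RAW_VALUE. "Pre-fail" is the attribute’s TYPE (it is the kind of attribute that predicts failure), not a verdict. An attribute counts as failed when its normalized VALUE drops to or below a non-zero THRESH, and WHEN_FAILED then shows something other than "-". How VALUE maps to remaining life is vendor-specific, so the sketch below only applies the VALUE/THRESH/WHEN_FAILED rule; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -A` output on stdin and flag attributes whose
# normalized VALUE is at or below a non-zero THRESH, or whose WHEN_FAILED is not "-".
# Expected columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
check_smart_attrs() {
  awk '
    # Attribute rows start with a numeric ID; headers and other text are skipped.
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      value = $4 + 0; thresh = $6 + 0
      status = "OK"
      if ($9 != "-" || (thresh > 0 && value <= thresh)) status = "FAILING"
      printf "%s %s value=%d thresh=%d type=%s -> %s\n", $1, $2, value, thresh, $7, status
    }'
}
```

Usage: `sudo smartctl -A /dev/sda | check_smart_attrs`. Fed the line quoted above, it prints `177 Wear_Leveling_Count value=94 thresh=0 type=Pre-fail -> OK`. `sudo smartctl -H /dev/sda` gives the drive’s own overall verdict (PASSED or FAILED).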

ThatGuyB · May 7, 2023, 7:53pm

I have used HDDs in pre-fail safely for years, but never SSDs in that state. It technically means you have reached what the manufacturer reports as the expected life cycle, which you have probably gone past. It can still work, but you may encounter write errors and maybe bit flips (unless you use something like ZFS in a RAID configuration, which ensures the correct data always gets read).

Generally when you reach pre-fail, it is time to back up your data and expect the drives will die, as in, be prepared to buy new ones in case they do.

catch22 · May 19, 2023, 9:50am

A friend of mine has some kind of benchmark tool for SSDs, and he took the disk home to test it; apparently it is in good shape.
When I get the disk back in June, I’ll see what happens when I do a clean reinstall.