I have a desktop that dates from 2011. It used to work great and was very reliable, always running Linux Mint.
Recently it has stopped being fully functional (it crashes regularly) and I’d like to find out why: overheating? A dying motherboard? …
I know absolutely nothing about hardware, but of course I can give some details about the machine, if needed.
My question is:
- what command(s) can I run to get a log diagnosing why the machine crashes?
Welcome to the forum!
Run sudo dmesg while the system is running,
and sudo cat /var/log/messages (plus any of the other messages* files) to check whether any errors were reported there.
sudo smartctl -a /dev/sda would also show disk failures, if there are any, assuming /dev/sda is your main disk. Running
lsblk will show you where your root partition is mounted. Use that device name, minus the partition number.
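To make "minus the partition number" concrete, here is a minimal sketch. The helper name strip_partition is made up for illustration, and it only covers the common naming schemes (sdX, nvme, mmcblk); a root filesystem on LVM or an encrypted mapper device would need a different approach.

```shell
# Hypothetical helper: strip the partition suffix from a block device path.
strip_partition() {
    case "$1" in
        # NVMe and eMMC/SD names end in a digit, so partitions use a "p<N>" suffix.
        /dev/nvme*|/dev/mmcblk*) printf '%s\n' "$1" | sed -E 's/p[0-9]+$//' ;;
        # Classic SATA/SCSI names (sda1, sdb2, ...) simply append the number.
        *) printf '%s\n' "$1" | sed -E 's/[0-9]+$//' ;;
    esac
}
```

For example, strip_partition /dev/sda1 prints /dev/sda, which is the name to pass to smartctl. On a live system the argument could come from something like findmnt -n -o SOURCE /, assuming the root filesystem sits directly on a partition.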
Desktops that old tend to develop issues with their PSUs, which stop providing enough juice to the system; those would be mostly silent / undetected errors. Your system generally freezes when that happens, but it could also crash on rare occasions. Crashes are more frequent with faulty RAM and disks.
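The dmesg output can be long, so it may help to narrow it down first. The following is only a sketch: the function name filter_hw_errors and the keyword list are my own assumptions, not an official or exhaustive set, and a clean result does not rule out a PSU problem, since those tend to leave no log entry at all.

```shell
# Hypothetical helper: keep only log lines that often accompany hardware trouble.
# The keyword list is an assumption and is certainly not exhaustive.
filter_hw_errors() {
    grep -i -E 'machine check|mce:|hardware error|i/o error|thermal|temperature above|out of memory|segfault'
}
```

Typical use would be sudo dmesg | filter_hw_errors. If the system journal is kept across reboots, journalctl -b -1 -p err should list the errors logged during the previous boot, which is usually the one that crashed.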
You can test the RAM easily with memtest86. If you happen to have bootable Arch Linux install media, memtest is built into the ISO. Hard drives (as long as it’s not an NVMe drive) often report signs of failure in their S.M.A.R.T. data. This can often be found in the disks utility of the OS. I’ve never used Mint, so I’m not sure what utility it has, but I’m sure a quick Google search will find you something.
Where can I upload the output of sudo dmesg for you to see?
For the second suggestion you gave (cat /var/log/messages) I get “no such file”.
With the smartctl command I get “command not found” and when installing smartmontools it says “temporary failure resolving ‘archive.ubuntu.com’”
Seems like you got some DNS issues. Check /etc/resolv.conf and use something like 220.127.116.11.
Weird that you cannot see /var/log/messages. Did you try “sudo cat /var/log/messages”?
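One possible explanation: as far as I know, Debian/Ubuntu-based systems such as Mint usually write the general system log to /var/log/syslog rather than /var/log/messages. A small sketch that tries several candidate paths and reports the first one that exists (first_existing is a made-up helper name):

```shell
# Hypothetical helper: print the first argument that exists on disk.
first_existing() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    # None of the candidates exist.
    return 1
}
```

Something like log=$(first_existing /var/log/messages /var/log/syslog) && sudo cat "$log" would then read whichever file is present.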
That’s the pastebin; the output of the other suggestions will follow a bit later.
EDIT: here they are…
I even tried su - instead of sudo, but still get “no such file or directory”.
In /etc/resolvconf I see 3 files: base, tail, head, so where do I use 18.104.22.168.?
Is this the disk utility? I see it says SMART is not enabled.
That is the disk utility I was referring to. That utility would show you the output of the smartctl command as shown above. You might check your BIOS to be sure that S.M.A.R.T. monitoring is enabled (although it would be strange if it wasn’t), but most likely it’s because you are missing smartmontools. I assumed Mint would have this installed.
Meanwhile I managed to install smartmontools and used these 2 commands:
sudo smartctl -t short -a /dev/sda
sudo smartctl -a /dev/sda
One of the lines reads:
177 Wear_Leveling_Count 0x0013 094 094 000 Pre-fail Always - 99
I’m guessing this means the SSD is almost completely dead??
I have used HDDs in pre-fail safely for years, but never SSDs in that state. It is technically what the manufacturer reports as the expected life cycles, which you probably went past. It can still work, but it is possible you will encounter write errors and maybe bit flips (unless you use something like ZFS in a RAID configuration, which ensures the proper data always gets read).
Generally when you reach pre-fail, it is time to back up your data and expect the drives will die, as in, be prepared to buy new ones in case they do.
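Reading that row column by column may help before drawing conclusions. As far as I know from the smartctl documentation, "Pre-fail" in the TYPE column describes the kind of attribute (one whose failure would predict imminent loss) rather than the drive's current state, and an attribute only counts as failed once its normalized VALUE drops to or below THRESH. The sketch below labels the columns of the row quoted above; explain_smart_attr is a made-up helper name, and how the normalized value maps to remaining life is vendor-specific.

```shell
# Hypothetical helper: label the columns of one attribute row from smartctl -a output.
# Column order assumed: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
explain_smart_attr() {
    printf '%s\n' "$1" | awk '{
        printf "id=%s name=%s value=%d worst=%d thresh=%d type=%s raw=%s\n", $1, $2, $4, $5, $6, $7, $10
        # Assumption: VALUE at or below THRESH is the condition reported as failing.
        if ($4 + 0 <= $6 + 0) { print "status: at or below threshold" }
        else { print "status: above threshold" }
    }'
}

explain_smart_attr '177 Wear_Leveling_Count 0x0013 094 094 000 Pre-fail Always - 99'
```

For the quoted row this reports value=94 against thresh=0, so the attribute is above its threshold. That alone would not indicate a dying drive, though a backup is sensible either way.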
A friend of mine has some kind of benchmark tool for SSDs and he took the disk home to test; apparently it is in good shape.
When I get the disk back in June, I’ll see what happens when I do a clean reinstall.