My server has been crashing a lot, and now I'm home to do some more fixing on it. If you've been sending me mail and didn't get a reply, that's one more reason to ping me again.
Various things were crashing:
- weird vm failures like "Bad page state at free_hot_cold_page" errors
- pyblosxom going wild with memory allocation and getting OOMed
- sata_nv driver starting to do nothing but dump pointers on the console, really fast
- occasional kernel panics
This was on kernel 2.6.12. I could not find a way to upgrade to a newer kernel on sarge for amd64 and still get the system to boot: it seems to be because some partitions are on LVM.
Yet another update of the mobo flash didn't help much.
I left memtest86+ running for a night: memory seems to be perfect.
I turned off swap and ran badblocks on the swap partition, read and write tests: that's ok as well.
I'm monitoring internal temperature, both motherboard and hard drives, and the system isn't overheating.
So, it was time to upgrade to testing. Thanks to various cool people we have testing-security which makes the idea of running testing on my server a bit less scary.
The upgrade brought udev, and (sigh) the random swap of the ethernet devices. I found something here. It works, and I even have to say it's cool, although it doesn't work if I use 'eth0' and 'eth1' as device names: gotta rewrite a couple of scripts.
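For readers who want something concrete: the fix linked above isn't reproduced here, but one common way to pin interface names is a udev rule that matches each card by MAC address. Here is a minimal sketch, assuming the modern rule syntax (older udev releases spelled the attribute match as SYSFS{address}). The function name, the MAC addresses and the names 'lan0' and 'wan0' are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one udev rule that gives the interface with
# the given MAC address a fixed name. The rule syntax is the modern one
# and is an assumption; check it against your udev version.
udev_net_rule() {
    # sysfs reports MAC addresses in lowercase, so normalise the input
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$mac" "$2"
}

# Usage (made-up addresses; the output would go in a file under /etc/udev/rules.d/):
#   udev_net_rule 00:16:3E:AA:BB:01 lan0
#   udev_net_rule 00:16:3E:AA:BB:02 wan0
```

Picking names outside the kernel's own ethN namespace avoids the clash that shows up when asking for 'eth0' and 'eth1'.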
Now I'm on testing, and some files are corrupted because of the various hard drive-related crashes. Ganneff and Yoe nicely pointed me at debsums, so let's have some fun:
debsums -a -c 2>&1 |tee debsums-report.log
Not so cool things come out. Among them, a list of packages that are not using dh_md5sums. Bad bad: people, please build packages with that: it's nice and comfy. Luckily debsums reports them and I can just apt-get install --reinstall them.
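Pulling the package names out of the report by hand gets boring fast, so here is a rough sketch of how it could be scripted. It assumes the report contains lines of the form "debsums: no md5sums for PACKAGE"; that wording may differ between debsums versions, so check your own debsums-report.log first. The function name and reinstall.list are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a debsums report on stdin and print the names
# of the packages reported as shipping no md5sums, one per line.
# Assumes lines like "debsums: no md5sums for <package>"; other lines
# (for example paths of changed files) are ignored.
packages_without_md5sums() {
    sed -n 's/^debsums: no md5sums for \([^ ]*\).*$/\1/p' | sort -u
}

# Usage (review the list before reinstalling anything):
#   packages_without_md5sums < debsums-report.log > reinstall.list
#   xargs -r apt-get install --reinstall < reinstall.list
```

Files that debsums lists as changed can be traced back to their package with dpkg -S and handled the same way.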
Another cool thing about debsums, Ganneff pointed out, is that once it's installed, whenever you install a package that doesn't ship md5sums, debsums will compute and store the sums itself. So, if you don't have debsums installed, it seems to be a good idea to install it now.
Ok, so now I'm on 2.6.15. As a side effect, the onboard gigabit ethernet card is now supported: cool! Now let's see if it holds up, and then, since I'm now on etch, I might as well enjoy it and start playing with Rails :)
It could even be an opportunity to replace pyblosxom. Any good Rails blog around? I couldn't find any that is also available in Debian :(