Server crashes

eng pdo tips

I managed to tame my crashing server quite a bit.

When I phoned the shopkeeper to ask about the terms for getting the motherboard replaced, he asked me which SATA hard drive I was using. He told me that Maxtor hard drives have big problems with nForce chipsets (there's also something about it in this forum). Of course I had a damn Maxtor DiamondMax Plus 9 hard drive mounted on that damn SATA port.

So I disconnected the hard drive, then ran 8 kernel compiles in parallel together with this small C program:

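    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>   /* sleep() */

    #define SIZE 1000000000

    int main(void)
    {
       /* Grab about 1 GB so the kernel compiles get pushed into swap. */
       char* buf = malloc(SIZE);
       int i;
       if (buf == NULL)
       {
          perror("malloc");
          return 1;
       }
       printf("allocated\n");
       while (1)
       {
          /* Touch every byte so the pages really end up in memory. */
          for (i = 0; i < SIZE; i++)
             buf[i]++;
          printf("filled\n");
          sleep(1);
       }
       return 0;
    }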

The C program is there to make sure that swap gets used, otherwise even 8 kernel compiles in parallel would happily fit in 1 GB of RAM.
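To check that swap really is being hit during such a run, something like the following could be used. This is only a rough sketch, not what I ran: it assumes Linux and its `sysinfo(2)` call, and the helper names `swap_used_bytes` and `print_swap_usage` are made up for illustration.

```c
#include <stdio.h>
#include <sys/sysinfo.h>   /* Linux-specific: sysinfo() and struct sysinfo */

/* Hypothetical helper: bytes of swap in use, given the total and free
   amounts expressed in blocks of `unit` bytes (as sysinfo reports them). */
unsigned long long swap_used_bytes(unsigned long long total,
                                   unsigned long long free_units,
                                   unsigned long long unit)
{
   if (free_units > total)
      return 0;   /* inconsistent input, report nothing in use */
   return (total - free_units) * unit;
}

/* Hypothetical helper: prints the swap currently in use.
   Returns 0 on success, -1 if sysinfo() fails. */
int print_swap_usage(void)
{
   struct sysinfo si;
   unsigned long long used;

   if (sysinfo(&si) != 0)
   {
      perror("sysinfo");
      return -1;
   }
   used = swap_used_bytes(si.totalswap, si.freeswap, si.mem_unit);
   printf("swap used: %llu MB\n", used / (1024 * 1024));
   return 0;
}
```

Calling `print_swap_usage()` right after the `printf("filled\n")` in the loop above would show whether the number keeps growing while the compiles run.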

I ran this kind of heavy load for 3 hours and I only got some timeouts in some network services (the load was around 25!) and this error twice:

    Jan  9 14:58:46 eddie kernel: ----------- [cut here ] --------- [please bite here ] ---------
    Jan  9 14:58:46 eddie kernel: Kernel BUG at lib/radix-tree.c:372
    Jan  9 14:58:46 eddie kernel: invalid operand: 0000 [1]
    Jan  9 14:58:46 eddie kernel: CPU 0
    [...blah blah blah...]

Since the error happened twice and in the exact same position, and since I had previously wanted to try the latest drivers and the kernel is a too-new 2.6.15, I'd blame those errors on a normal kernel bug. That code seems to be a nasty bit which has already caused problems on 64-bit systems.

Now it's time to get that hard drive replaced. This is so time-consuming, but at least things are improving.