I got home from work on Thursday night and noticed something odd going on with our web server. The automatic software update that keeps our backend software current and secure was stuck partway through an upgrade.
I stopped the stalled process and tried running the update by hand. It got stuck again, this time while trying to install an update for our RAID controller card. I stopped the stuck process once more and tried again, with the same result.
So I contacted cPanel support, and they logged in and somehow got the update to run to completion.
However, some time after they finished, a new problem appeared: at random times the server load would jump from about 1.2 to 125 or higher, with a PFAULT error. The only way to bring the load back down was to reboot the server.
I stayed up Thursday night and rebooted the server by hand whenever it started acting flaky.
Friday night I got home from work and spent time rebuilding Apache. I tried a number of configurations, but after each one the problem came back. So again on Friday night I went to bed but woke up about every hour to check on things and reboot the server if needed.
Then I had a dream about a way to at least work around the problem: I wrote a cron job to restart the web server software every 15 minutes. It worked. The server locked up twice during the day yesterday, but each time we were back online within a few minutes.
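For anyone curious, a stop-gap cron job like this can be a single crontab line. The entry below is only a sketch, not the exact job I used: the `apachectl` path is an assumption (cPanel servers often keep Apache under `/usr/local/apache`), so check where your own Apache control script lives before copying it.

```crontab
# Hypothetical root crontab entry: restart Apache every 15 minutes.
# Fields: minute (every 15th), hour, day of month, month, day of week.
# The apachectl path below is an assumption; adjust it for your server.
*/15 * * * * /usr/local/apache/bin/apachectl restart
```

You would add it with `crontab -e` as root. Keep in mind that this restarts Apache on a fixed schedule whether or not anything is wrong, so it only shortens outages; it does not fix the underlying cause.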
Last night I got a call from our server guru, LER. I explained to Larry what was happening, and he was as stumped as I was. Then I remembered what had started this whole thing: the update for our RAID controller card that had crashed. I said to him, "I wonder if the new RAID controller software needs a newer FreeBSD driver or kernel." Larry did some quick research on a FreeBSD site and found that there was an update for our RAID controller card, and that it also required a kernel update.
Larry worked his magic, upgraded us to the latest stable version of FreeBSD, and rebooted the server.
Since then we have not had an issue!
Overnight I woke up every hour to check on things, and all was good. I then got up and removed the cron job that was restarting the web server software. We have been running GREAT since then.
So THANK YOU to LER for his help last night! And thanks also to all of you for your patience while I banged my head against the wall trying to figure out this issue. I am sorry for any slowness or downtime over the past few days.
Hopefully tonight I can get some sleep instead of automatically waking up every hour on the hour.