I remoted into the Moodle application server, which was very slow. After about a minute of waiting for the app server to load, I went down the hall to our networking guys and asked them to recycle the server. I tried reloading the web page and got a different error message: could not connect to the DB. That told me the app server was now loading, but not connecting to the DB server. A little progress.
I remoted into the Moodle DB server, which was up, and saw a message in the status bar about low disk space. A real lead! A couple of weeks ago, I automated the backups of the 11 DB instances on this shared server. I even wrote a sticky note that says "monitor the space on the F drive". The F drive is the partition where we store backup data, away from the C drive, which stores the program code.
I went into some of the folders and removed some of the backups, freeing about 3 GB of space. I reloaded the web page but still got the same DB connection message. I got out of my chair to go ask the network guys to recycle the DB server as well, when I realized that since the server was up and I had remoted in successfully, I could recycle it myself. I restarted the DB server and waited about 1 minute.
I hit the refresh key on my browser and saw the site loading... my blood pressure started retreating. I checked another site; it was up. Blood pressure returned to normal.
High-level steps to recover from a server crash
- Confirm the site is down and note the error message
- Remote into the app server
- Remote into the DB server
- Check the drive space on the DB server
- Recycle the app and DB servers, if you can connect
- If you cannot connect, ask LakeNet to recycle the servers
- Check the error logs for specific messages
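One check that would have saved me a trip down the hall is testing whether the DB server is accepting connections at all. Here is a minimal sketch of that check in Python. The function name `can_connect` is my own, and the port is an assumption: 3306 is the MySQL default, but your instance may listen elsewhere.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the "with" block closes it again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling something like `can_connect("db-server-hostname", 3306)` (the hostname is a placeholder) tells you whether anything is listening on the DB port. A `False` here while you can still remote into the machine suggests the database service itself is down, not the whole server.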
Moral of the story? A bunch.
Monitor drive space. Increase drive space where backups are stored.
Move backups somewhere else, like a shared Novell drive.
Do not share a DB server; let each Moodle instance live separately on its own VM.
Do not panic, stay calm, think clearly.
Do not assume what the problem is; verify it yourself.
Use the error logs for details.
Take steps to prevent a recurrence.
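A sticky note is not much of a monitoring system, so here is a rough sketch of what an automated drive space check could look like. It uses Python's standard `shutil.disk_usage`. The function name `free_space_report` and the 5 GB threshold are assumptions of mine, not anything we actually run; pick a threshold that fits the size of your backups.

```python
import shutil
from typing import Tuple


def free_space_report(path: str, min_free_gb: float = 5.0) -> Tuple[bool, float]:
    """Return (ok, free_gb) for the drive that contains path.

    ok is False when free space has dropped below min_free_gb.
    """
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    free_gb = usage.free / 1024 ** 3  # convert bytes to GB
    return free_gb >= min_free_gb, free_gb
```

On the DB server this might be called as `free_space_report("F:\\")` from a scheduled task, sending an email or writing a log entry whenever the first value comes back `False`.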
This is the error message reported in the Moodle error log:
[Thu May 17 08:59:07 2012] [error] [client ip address] ADODB Error: Can't connect to MySQL server on 'server ip address' (10061)
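The number in parentheses is the operating system's socket error code. On Windows, 10061 is the "connection refused" code, meaning the machine was reachable but nothing was accepting connections on that port, which fits a database service that had stopped. If you want to pull lines like this out of a log automatically, a small parser along these lines could work. The regular expressions and the name `parse_error_line` are my own and assume the bracketed format shown above.

```python
import re

# Matches: [timestamp] [level] [client address] message
LOG_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<level>[^\]]+)\] "
    r"\[client (?P<client>[^\]]+)\] (?P<message>.*)$"
)
# A trailing "(12345)" at the end of the message is the OS error code.
OS_ERROR_CODE = re.compile(r"\((\d+)\)\s*$")


def parse_error_line(line):
    """Split one error log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    code = OS_ERROR_CODE.search(entry["message"])
    entry["os_error"] = int(code.group(1)) if code else None
    return entry
```

Filtering a log for entries where `os_error` is 10061 would give a quick list of every time the app server was refused by the database.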