Not a usual scripting or deployment post this time – this is a website server rebuild diary. I wanted to document a server rebuild and log the sort of work that goes into maintaining this site and www.autoitscript.com. Even small(ish) sites can take quite a bit of effort.

Current environment is CentOS 5.8 (upgraded many times from earlier versions of 5.x) running on a dedicated server (Quad core, 8GB RAM, mirrored disks).

CentOS packages aren’t upgraded throughout the lifetime of a major release (all 5.x versions) aside from security fixes. This means that the OS is very stable and great for hosting but it can require some custom packages of newer software to cater for the websites that I want to run. Significant packages include:

  • Apache 2.2.22 (Jason Litka Repo)
  • MySQL 5.1.48 (Jason Litka Repo)
  • PHP 5.2.17 (Jason Litka Repo)
  • Subversion 1.7 (self build)
  • Python 2.4

As you can see, the only “standard” CentOS package is Python 2.4, because many of the CentOS internals rely on it and upgrading it can cause major problems.

The main web applications we use are:

We currently handle around 40GB of traffic per day and the forum frequently has close to 1000 active users at once. I have recently started to use Amazon CloudFront to serve images so that the site is quite fast no matter where it is accessed from.

Tuesday 13th November, 2012

Attempted to upgrade MediaWiki to version 1.20 and found that it now has a minimum requirement of PHP 5.3. Unusually, CentOS had released a special set of PHP 5.3 RPMs, so I installed those. PHP immediately stopped working, which I traced to my APC (PHP accelerator) extension: it needs to be compiled against the installed version of PHP. I reinstalled APC using “pecl” and everything worked again. I was pretty pleased to have PHP 5.3 running, as more and more applications (particularly WordPress extensions) require 5.3 and I could now use them.

I then looked at upgrading Trac to v1.0. Trac upgrades have always been fairly straightforward, but this version has Python 2.5 as a minimum requirement. I could have attempted to build a custom version of Python just for Trac, but as my hosting provider could supply a CentOS 6 image I decided to use the opportunity to rebuild the server entirely (after so many upgrades and custom packages it would be nice to get back to a fresh install).

Unfortunately, the ISP imaging process will wipe the disk entirely, making the upgrade “fun”…

Wednesday 14th November, 2012

I spent a couple of hours double checking my backup processes. These are performed daily using a combination of:

  • Plesk Backup Manager – Backs up hosting settings, email, DNS and MySQL databases.
  • Custom shell scripts – Back up anything not handled by Plesk – there are a lot of tweaks to config files that happen over the years that I want to preserve. I also manually back up the MySQL databases in addition to Plesk as I’m a little paranoid…

The custom shell scripts create a rolling 30 day set of backups locally on the server. They also store the backups on a special FTP server provided by the ISP for off-server backups. Once a month I take a copy of the latest backup to my home machine. This is not a quick operation as the backup files are 4 GB…zipped!

After checking the backup scripts I noticed that the local FTP upload had not been working correctly due to an ISP infrastructure change. I also found that the newer version of Plesk that I’m using no longer backs up some of the customisations that I’d made to web hosting config files.
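The rolling 30 day window is easy to get subtly wrong, so here is a minimal sketch of the pruning decision. It is written in Python rather than shell purely for illustration, and the `backup-YYYY-MM-DD.tar.gz` naming scheme and the `expired_backups` function are assumptions of mine, not what the real scripts use.

```python
import re
from datetime import date, timedelta

# Hypothetical naming scheme: backup-YYYY-MM-DD.tar.gz
BACKUP_NAME = re.compile(r"^backup-(\d{4})-(\d{2})-(\d{2})\.tar\.gz$")


def expired_backups(filenames, today, keep_days=30):
    """Return the backup files that fall outside the rolling window."""
    # Anything dated on or before the cutoff is older than keep_days backups.
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        match = BACKUP_NAME.match(name)
        if not match:
            continue  # never touch files that do not look like backups
        year, month, day = map(int, match.groups())
        if date(year, month, day) <= cutoff:
            expired.append(name)
    return sorted(expired)
```

The function only reports what it would delete; keeping the decision separate from the actual removal makes it possible to do a dry run before trusting it with real backups.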

Time to triple check the scripts!

Thursday 15th November, 2012

During the rebuild the site is likely to be offline for a number of hours. At some point (after Plesk initially restores the hosting and database settings) the websites will likely be “up” but “broken” as I install the various software packages that the web applications depend on. I don’t want my Google rankings to be hit during this time and I want to minimise the problems that users get trying to use a half-installed site. This Google Webmaster post recommends using the 503 HTTP result code to indicate that the site is temporarily down so that search results don’t get mangled. I added the following code to my hosting configuration and will be enabling it just before I start the full site backup and restore.

ErrorDocument 503 "Our website is temporarily closed for maintenance. Please check back later."
RewriteEngine On
RewriteCond %{REMOTE_ADDR} !^111\.222\.333\.444$
RewriteRule .* - [R=503,L]

This will allow me to access the websites normally from my IP address, but anyone else will get the 503 message.
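The dots in the IP address have to be escaped because RewriteCond treats the pattern as a regular expression. As a small sketch (Python, with a hypothetical `maintenance_rules` helper of my own), the same four directives can be generated for any allowed address so the escaping is never done by hand:

```python
import re

MESSAGE = ("Our website is temporarily closed for maintenance. "
           "Please check back later.")


def maintenance_rules(allowed_ip, message=MESSAGE):
    """Build the 503 maintenance directives for one allowed IPv4 address."""
    # re.escape turns each "." into "\." so it only matches a literal dot.
    pattern = "^" + re.escape(allowed_ip) + "$"
    return "\n".join([
        f'ErrorDocument 503 "{message}"',
        "RewriteEngine On",
        f"RewriteCond %{{REMOTE_ADDR}} !{pattern}",
        "RewriteRule .* - [R=503,L]",
    ])


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only example address.
    print(maintenance_rules("203.0.113.7"))
```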

Friday 16th November, 2012

I took the sites offline with the 503 message above and started the backup to the ISP local FTP site. I also took a copy to my home machine where I manually extracted the files to make sure that they looked OK. I started the ISP server image process for their CentOS 6.3 image. A nice feature of the 1&1 hosting package I use is that you can connect to your server via an SSH serial console – this means that even if the server is completely broken and not on the network you can still interact with it. When doing a server rebuild this helps to reassure me that something is actually happening.

The imaging process took about an hour, and then it took another hour to download and extract the backup from FTP. Unlike the backup process, I don’t have scripts to automate the restore – usually I’m rebuilding to a different platform so it would be tricky to automate properly. However, I have notes for every piece of software I need which document how I set it up previously. I also keep a master list of Yum packages in a script so that I can reinstall every Yum package quickly.
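Eyeballing the extracted files works, but an automated integrity check is a useful extra before wiping a disk. The sketch below assumes the backup is a plain zip archive (the real files may well be in another format) and uses a hypothetical `check_backup` helper built on Python's standard `zipfile` module.

```python
import zipfile


def check_backup(path):
    """Return (ok, detail) after verifying every member of a zip archive."""
    if not zipfile.is_zipfile(path):
        return False, "not a zip archive"
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and checks its CRC; it returns the
        # name of the first corrupt member, or None if all are intact.
        bad = archive.testzip()
        if bad is not None:
            return False, "corrupt member: " + bad
        return True, "%d members verified" % len(archive.namelist())
```

This only proves the archive is readable, not that it contains everything needed, so it complements rather than replaces a manual look through the contents.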
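A master package list can be as simple as a text file with one package per line. As a rough sketch (Python, with a hypothetical `yum_install_command` helper; my actual list lives in a shell script), this turns such a list into a single non-interactive `yum -y install` command:

```python
def yum_install_command(list_text):
    """Build a yum install command from a one-package-per-line list."""
    packages = []
    for line in list_text.splitlines():
        # Allow "# comments" and blank lines in the list file.
        name = line.split("#", 1)[0].strip()
        if name and name not in packages:
            packages.append(name)
    # -y answers "yes" to yum's confirmation prompt.
    return ["yum", "-y", "install"] + packages
```

The returned list can be passed straight to `subprocess.call` when run as root. Installing everything in one yum transaction lets dependencies be resolved once rather than per package.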

A summary of the restore procedure is:

  • Run “yum update” to ensure that all components are current
  • Reinstall Yum packages from my master list
  • Run some custom scripts that recreate the users and permissions I need
  • Extract backup files from FTP
  • Use the Plesk Backup Manager to restore the website files, basic hosting settings, DNS, email and MySQL databases

At this point the main website functionality was restored, so I brought the website and forums back online by removing my 503 message. The site was usable again for the majority of users and I could restore the rest of the site services over the next few days (Subversion, Trac, etc).