Wireless Networking is a practical guide to planning and building low-cost telecommunications infrastructure.



Mirroring a Website

With the permission of a site's owner or webmaster, the whole site can be mirrored to a local server overnight, provided it is not too large. This is worth considering for websites that are important to the organization or very popular with its users. Mirroring has its uses, but it also has pitfalls. If the mirrored site contains CGI scripts or other dynamic content that requires interactive input from the user, the local copy will not behave like the original. Consider a website where people register online for a conference. If someone registers on the mirrored server (and the mirrored script runs at all), the registration is stored on the local server, so the site's organizers never learn that the person registered.
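Before offering a mirror to users, it can help to find the pages that contain forms, since those are the ones whose submissions would land on the local copy instead of the original site. The sketch below uses Python's standard `html.parser` module. The names `FormFinder`, `find_form_actions` and `scan_mirror` are hypothetical. It assumes the mirror is a directory of `.html`/`.htm` files, which is how `wget -m` normally stores a site.

```python
from html.parser import HTMLParser
from pathlib import Path


class FormFinder(HTMLParser):
    """Collects the action attribute of every <form> tag on a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag == "form":
            # A form without an action posts back to the page's own URL;
            # record that case as an empty string.
            self.actions.append(dict(attrs).get("action") or "")


def find_form_actions(html_text):
    """Return the form actions found in one HTML document."""
    finder = FormFinder()
    finder.feed(html_text)
    finder.close()
    return finder.actions


def scan_mirror(root):
    """Yield (path, actions) for each mirrored HTML page that contains forms."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            text = path.read_text(encoding="utf-8", errors="replace")
            actions = find_form_actions(text)
            if actions:
                yield path, actions
```

For example, `for page, actions in scan_mirror("/var/www/www.python.org"): print(page, actions)` lists the pages to review. Pages that post to CGI scripts could then be removed from the mirror or pointed back at the original site.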

Because mirroring a site may infringe copyright, use this technique only with the permission of the site concerned. If the remote site runs an rsync server, the site can be mirrored with rsync. This is likely the fastest and most efficient way to keep the contents synchronized, because rsync transfers only the parts of files that have changed. If the remote web server does not offer rsync, the recommended program is wget. It is part of most versions of Unix/Linux. A Windows version can be found at xoomer.virgilio.it/hherold, or in the free Cygwin Unix tools package.
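When the remote site does offer rsync, the whole mirror can be fetched with a single command. The sketch below builds that command as an argument list for Python's `subprocess` module. The function name `rsync_mirror_command` is hypothetical. So is the module name: an rsync server exports named modules, and the real name must come from the site's webmaster. The destination directory is named after the host so that it matches the layout `wget -m` would produce.

```python
from urllib.parse import urlsplit


def rsync_mirror_command(site_url, module):
    """Build an rsync command that mirrors rsync://<host>/<module>/ locally.

    `module` is whatever name the remote rsync server exports; it is
    site-specific and not something this function can discover.
    """
    host = urlsplit(site_url).hostname
    return [
        "rsync",
        "-az",                        # -a: recurse, keep times/permissions; -z: compress
        "--delete",                   # drop local files removed from the remote site
        f"rsync://{host}/{module}/",  # trailing slash: copy the module's contents
        f"{host}/",                   # same directory name wget -m would create
    ]
```

Run it from the document root, e.g. `subprocess.run(rsync_mirror_command("http://www.python.org", "www"), cwd="/var/www", check=True)`, where `"www"` stands in for the real module name. On later runs rsync only sends the differences.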

A script can be set up to run every night on a local web server and do the following:

  • Change directory to the web server document root: for example, /var/www/ on Unix, or C:\Inetpub\wwwroot on Windows.

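  • Mirror the website using the command: wget --cache=off -m http://www.python.org (the -m option turns on mirroring, which means recursive retrieval with timestamping, so later runs fetch only pages that have changed; newer versions of wget spell the first option --no-cache)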
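The two steps above can be sketched as a small Python script that a scheduler runs each night. It assumes Python 3 is installed on the web server. The document roots are the examples from the list above, and the site list is hypothetical. It uses `--no-cache`, the current GNU Wget spelling of `--cache=off`. Rather than calling `chdir`, it passes the document root as the working directory of each `wget` process, which has the same effect.

```python
import os
import subprocess
import sys

# Assumed settings: adjust for the local server.
DOCUMENT_ROOT = r"C:\Inetpub\wwwroot" if os.name == "nt" else "/var/www"
SITES = ["http://www.python.org"]


def nightly_commands(sites):
    """One wget mirror command per site."""
    return [["wget", "--no-cache", "-m", url] for url in sites]


def run_nightly(document_root, sites):
    """Mirror every site into document_root; return the number of failures."""
    failures = 0
    for cmd in nightly_commands(sites):
        try:
            # cwd= plays the role of "change directory": wget -m writes into
            # a subdirectory named after the host, e.g. www.python.org/.
            result = subprocess.run(cmd, cwd=document_root)
            status = result.returncode
        except OSError as exc:  # e.g. wget not installed
            print(f"could not run {cmd[0]}: {exc}", file=sys.stderr)
            status = -1
        if status != 0:
            print(f"mirror failed ({status}): {' '.join(cmd)}", file=sys.stderr)
            failures += 1
    return failures


if __name__ == "__main__":
    sys.exit(1 if run_nightly(DOCUMENT_ROOT, SITES) else 0)
```

On Unix it could be scheduled with a cron entry such as `0 2 * * * python3 /usr/local/bin/mirror_sites.py`, where the script path is hypothetical. On Windows, use Task Scheduler. wget exits with status 8 when the server returns an error for any page, so a "failure" here may still leave a mostly complete mirror.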

The mirrored website will be in a directory named www.python.org. The web server should now be configured to serve the contents of that directory as a name-based virtual host. Then set up the local DNS server with a fake entry for this site, so that www.python.org resolves to the local web server instead of the real one. For this to work, client PCs should be configured to use the local DNS server(s) as their primary DNS. (This is advisable in any case, because a local caching DNS server speeds up web response times.)
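The two configuration pieces described above can be generated with a short script. This is a minimal sketch that assumes Apache on Unix with the mirror under /var/www, and a BIND-style local DNS server. The function names, the IP address, and the `ns.example.lan.` nameserver are all hypothetical placeholders. The fake zone's apex is the mirrored hostname itself (www.python.org rather than python.org), so other names in the real domain still resolve normally.

```python
from pathlib import PurePosixPath


def apache_vhost(hostname, document_root="/var/www"):
    """Name-based virtual host that serves the wget mirror of `hostname`."""
    mirror_dir = PurePosixPath(document_root) / hostname
    return (
        "<VirtualHost *:80>\n"
        f"    ServerName {hostname}\n"
        f"    DocumentRoot {mirror_dir}\n"
        "</VirtualHost>\n"
    )


def named_conf_zone(hostname, zone_file):
    """BIND zone statement making the local server authoritative for hostname."""
    return f'zone "{hostname}" {{ type master; file "{zone_file}"; }};'


def fake_zone(local_ip, nameserver="ns.example.lan."):
    """Minimal zone file: the zone apex (the site name) points at local_ip."""
    return (
        "$TTL 3600\n"
        f"@ IN SOA {nameserver} hostmaster.example.lan. (1 3600 900 604800 3600)\n"
        f"@ IN NS {nameserver}\n"
        f"@ IN A {local_ip}\n"
    )
```

For example, `apache_vhost("www.python.org")` yields a stanza whose DocumentRoot is /var/www/www.python.org. `fake_zone("192.168.1.10")` (an assumed server address) gives the records to save as the zone file. Older Apache 2.2 installations also need a `NameVirtualHost *:80` line.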




Last Update: 2007-01-24