Understanding server response codes

Every time a page is requested, your server reports how the process of finding and sending the file to the user went. By analysing the response codes that your server is spitting out, you can diagnose various problems with your site and learn much about the surfing habits of your readers.

Headers

Whenever a user sends a request to a server, a process called a ‘handshake’ begins, in which the user’s computer and the server communicate and the server makes sure it can accommodate what has been requested of it. This means being able to make the connection between the two computers and then completing the transfer of data.

Headers are short lines of text that the client and the server attach to each transfer to describe it. HTTP/1.1 traditionally groups them into four kinds:

General
These can appear in both requests and responses and describe the message as a whole, such as its date or how the connection should be handled.
Entity
These hold information about the data that is being transferred, such as its type and length.
Request
These are sent by the client and hold information about the request, including the formats the client will accept.
Response
These are sent by the server along with its reply. The reply opens with a status line carrying the numeric response code that reports the outcome of the request.
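
To make this concrete, here is a minimal Python sketch that splits the opening part of a server reply into its status line and its headers. The sample reply text is made up for illustration, and the function name is hypothetical; real replies carry more headers than this.

```python
# Made-up sample: the opening lines a server might send back for a missing page.
RAW_RESPONSE = (
    "HTTP/1.1 404 Not Found\r\n"
    "Date: Mon, 01 Jan 2024 12:00:00 GMT\r\n"
    "Server: Apache\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_response_head(raw):
    """Split the head of an HTTP reply into status line parts and headers."""
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the blank line
    lines = head.split("\r\n")
    # The first line is the status line: protocol version, code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


version, code, reason, headers = parse_response_head(RAW_RESPONSE)
print(code, reason)
print(headers["content-type"])
```

Header names are lower-cased here because HTTP treats them as case-insensitive.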

The Response Codes

As a web surfer you’ve probably become familiar with the dreaded 404 error message (and possibly made your own), which signifies a ‘page not found’ error. That’s the best-known server response code, but there are many more. These numerical codes are grouped by their first digit: the low numbers are generally ‘good’ and operate silently, while anything from 400 upwards is bad news and will usually be reported to the user in the form of an error message.

Code Explanation
100-199 Silent response codes that signify that a request has been received and is still being processed.
100 Continue: the first part of the request has been received and the client can go ahead and send the rest.
101 Switching Protocols: the client asked to switch to a different protocol (using the Upgrade header) and the server has agreed.
200-299 Silent codes that confirm that requests have completed successfully.
200 Ok — the file which the client requested is available for transfer. This is the response code you want to see all of your users receiving.
201 When new pages are created by posted form data or by a CGI process, this is confirmation that it worked.
202 The client’s request was accepted, though not yet processed.
203 The information contained in the entity header is not from the original site, but from a third party server.
204 No Content: the request succeeded but the server has nothing to send back, so the browser stays on the page it is already showing. It’s silent and doesn’t warn the user about anything.
205 Reset Content: the request succeeded and the browser should reset the view that sent it, for example by clearing a submitted form.
206 Partial Content: the server is sending only part of the file because the client asked for a specific range of it. This is what happens when a download manager resumes an interrupted download, for example.
300-399 A redirection is occurring from the original request.
300 The requested address refers to more than one file. Depending on how the server is configured, you get an error or a choice of which page you want.
301 Moved Permanently — if the server is set up properly it will automatically redirect the reader to the new location of the file.
302 Found — page has been moved temporarily, and the new URL is available. You should be sent there by the server.
303 See Other: the response to the request can be found at another URL, and the client should use the GET method to retrieve it.
304 Not Modified: if the request includes an If-Modified-Since header, this code is returned when the file has not changed since that date. Search engine robots may generate a lot of these.
400-499 The client’s request contains an error or cannot be fulfilled as it was sent.
400 Bad Request — there is a syntax error in the request, and it is denied.
401 The request header did not contain the necessary authentication codes, and the client is denied access.
402 Payment is required. This code is not yet in operation.
403 Forbidden — the client is not allowed to see a certain file. This is also returned at times when the server doesn’t want any more visitors.
404 Document not found — the requested file was not found on the server. Possibly because it was deleted, or never existed before. Often caused by misspellings of URLs.
405 Method Not Allowed: the method used to access the file (such as POST) is not allowed for it.
406 Not Acceptable: the requested file exists, but the server cannot deliver it in any of the formats the client said it would accept.
407 Proxy Authentication Required: the client must first authenticate itself with the proxy server it is going through.
408 Request Timeout: the client took longer to send its complete request than the server was prepared to wait. Often caused by heavy net traffic or a slow connection.
409 Conflict: the request could not be carried out because it conflicts with the current state of the file on the server.
410 Gone: the file used to be at this address, but is there no longer and is not expected to return.
411 Length Required: the request is missing its Content-Length header.
412 Precondition Failed: the request set out a condition in its headers that the server found to be false, so the request was not carried out.
413 The body of the request was too big for the server to process.
414 The address you requested was too long for the server.
415 Unsupported Media Type: the data sent with the request is in a format the server does not support.
500-599 Errors have occurred in the server itself.
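500 Internal Server Error: a nasty response that is often caused by a bug in a server-side script (such as a Perl CGI program) or by a mistake in the server’s configuration.
501 Not Implemented: the server does not support the functionality needed to carry out the request.
502 Bad Gateway: a server acting as a gateway or proxy received an invalid response from the server behind it.
503 Service Unavailable: the service or file that is being requested is temporarily unavailable, often because the server is overloaded or down for maintenance.
504 Gateway Timeout: like the 408 timeout error, but here a gateway or proxy gave up waiting for a response from the server behind it.
505 HTTP Version Not Supported: the version of HTTP used in the request is not supported by the server.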
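
If you want to see which codes your own server is sending out, a short script over the access log is often enough. The following is a rough Python sketch, assuming log lines in Apache’s Common Log Format; the sample lines, addresses and function names are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical sample lines in Apache's Common Log Format. The status code
# is the three-digit field right after the quoted request.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /old-page.html HTTP/1.1" 404 512',
    '203.0.113.9 - - [10/Oct/2023:13:56:02 +0000] "GET /logo.gif HTTP/1.1" 304 -',
    '198.51.100.7 - - [10/Oct/2023:13:57:11 +0000] "GET /missing.html HTTP/1.1" 404 512',
]

STATUS_RE = re.compile(r'" (\d{3}) ')

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Name the group a code belongs to, based on its first digit."""
    return CLASSES.get(code // 100, "unknown")


def count_statuses(lines):
    """Count how often each response code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


counts = count_statuses(SAMPLE_LOG)
for code, hits in sorted(counts.items()):
    print(code, status_class(code), hits)
```

A sudden rise in 404s usually points to a broken link somewhere, while a run of 500s suggests a script or configuration problem on the server.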

Preventing Bandwidth Theft with Apache

Even if you expressly ask them not to, some webmasters will try to directly link to your images from their pages. By stealing your images in this way they’re using up your site’s bandwidth without notifying their visitors of your site, which means you get no credit and a bigger bandwidth bill at the end of the month. Erk! Luckily, a simple configuration change provides the necessary fix.

The Swindle

So how exactly is this dastardly deed carried out? It’s a simple ruse: the offending webmaster sees an image they like the look of on your site. Smiling craftily, they murmur “I’ll have that,” and proceed to write an img tag like this:

<img src="http://www.yoursite.com/media/image.gif" alt="I eat bandwidth for breakfast">

Your image will then appear on their site as if it was one of their own. This practice is known as “hotlinking” or “leeching” an image. It’s rude, and often infringes on the copyright of the image.

You’ll often see people hotlinking images to use as their avatar or signature image on messageboards. It is a poor strategy to link to images stored on a different server from the one your webpages are on, as it slows the loading of the page, not to mention leaving you wide open to embarrassing retribution of the “switcheroo” variety.

A webmaster could also offer a link to an image on your site, like this:

<a href="http://www.yoursite.com/media/image.gif">Download this image</a>

This is pretty much the same deal, though at least this time people will see that they’re getting the image from a different server than the one that referred them to it. However, this is poor form, and it would be much better for the webmaster in question to either take a local copy of the image (save it on their own server and link to that image, provided there’s no copyright infringement), or link to a page on your site that has the image on it.

What we want to do is to stop images from loading on remote sites by redirecting all requests for them to another location.

Locking the Door

As with most configuration changes in this section, this technique requires that your site is hosted on a computer running the Apache web server. A great many servers do run Apache, since it’s free and excellent, although those of you on IIS or another server could consider a PHP-based solution instead.

So, how are we going to work this? Well, whenever an image is requested from a server, the page that referred to this image is sent as part of the request. This is the page that linked to the image or the page that contained the img call to it. We can look at this referrer, and if it doesn’t match a list of sites that you allow to link to your images, you can block the image from being sent.

Let’s Do This Thing

Open your .htaccess (or create a new one), and add these lines:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yoursite\.com [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} ^https?://.*$
RewriteRule \.(jpe?g|gif|bmp|png)$ /media/nohotlinks.png [L]

We’re using URL rewriting to redirect any unwanted image requests. If you’ve done any redirecting before this should seem straightforward enough. Let me step through this, line by line:

Before we do any redirect, we set down some conditions: those are the three RewriteConds. The first checks that the variable HTTP_REFERER does not start with your own site’s address, with or without the www. prefix (the question mark means “zero or one occurrences of whatever precedes it,” and the exclamation mark negates the match). The [NC] flag simply makes the match case-insensitive.
The second condition checks that a referrer was actually sent. It may be missing if a visitor typed the image’s address into the location bar, and we don’t want to block those requests.
The third condition checks that the referrer header does actually contain a website’s URL. This is to guard against doing the wrong thing in the case of users with special software on their computers that replaces all referrer headers they send with text like “Blocked by personal firewall.” Again, we don’t want to block those requests.
If all of these conditions are true, we know that the image is being requested from a remote site, and can go ahead with the redirect. “HTTP_REFERER” (with one ‘r’) is not a mistake; some joker on the HTTP team just couldn’t spell, and this has survived as a geeky joke ever since.
The RewriteRule itself is a simple one. It looks at the extension of the file being requested. If the file has any of the extensions listed, the request is rewritten to our ‘nohotlinks’ image.
Tip: Don’t try to redirect blocked image requests to an HTML page instead of this ‘nohotlinks’ image. It won’t work, because the browser is expecting a file with the MIME type of an image, and so will only accept another image.
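
If the regular expressions are hard to follow, the same decision can be written out as a small Python function. This is only a sketch of the logic, not something Apache runs: the function name and the sample addresses are made up, and it accepts both http:// and https:// referrers.

```python
import re

# The same three tests as the RewriteConds, plus the extension test
# from the RewriteRule. "yoursite.com" is the placeholder used in the article.
OWN_SITE = re.compile(r"^https?://(www\.)?yoursite\.com", re.IGNORECASE)
LOOKS_LIKE_URL = re.compile(r"^https?://.*$")
IMAGE_EXT = re.compile(r"\.(jpe?g|gif|bmp|png)$")


def should_block(path, referer):
    """Return True when an image request looks like a hotlink."""
    if not IMAGE_EXT.search(path):
        return False  # the rule only applies to image files
    not_own_site = OWN_SITE.match(referer) is None      # condition 1
    not_empty = referer != ""                           # condition 2
    is_url = LOOKS_LIKE_URL.match(referer) is not None  # condition 3
    return not_own_site and not_empty and is_url


print(should_block("/media/image.gif", "http://www.othersite.example/page.html"))
print(should_block("/media/image.gif", ""))
print(should_block("/media/image.gif", "Blocked by personal firewall"))
```

Only the first call is treated as a hotlink; the empty referrer and the firewall text are both let through, as described above.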

The image you redirect to should obviously be small or you’re defeating the purpose. What you put in the image is up to you: people have had past success with retina-searing animations or the aforementioned humorous replacement strategy.

If you would like instead to simply block the images completely and not redirect to another image, you can send back a “403 Forbidden” error message by replacing the RewriteRule above with this:

RewriteRule \.(jpe?g|gif|bmp|png)$ - [F]

Nobody’s Perfect…

There are some isolated cases when this won’t work. Some tools that allow people to surf “anonymously” will not send proper referrer headers, meaning that images will become broken on your own site for these visitors. Some proxies and firewalls will have the same effect. However, this won’t affect the vast majority of your visitors, and those who use referrer-hiding services are likely well aware of the side-effects.

Password protecting a folder using Apache

In general, all websites are freely viewable by anybody who wants to see them. Requiring a username and password to access various sensitive areas of your site allows you to restrict access to only a chosen few people who know the secret codes. In this tutorial I’ll present a method to secure a directory of documents by using a special Apache server configuration file.

Password protection through JavaScript

Before we get into this section, I present a minor caveat: using JavaScript to secure your website is an absolutely rubbish way to keep unwanted visitors out. If I encounter a site that tries to block access using JavaScript, it is a simple matter of temporarily disabling JavaScript in my browser to circumvent the dialog box. With no JavaScript, the link to the protected area of the site will work like any other normal link, and I will be able to roam free through the heretofore unseen depths of the site.

On top of this rather large chink in the armour, those pages will also be automatically indexed by search engines, leaving the private information accessible simply by searching for it.

So, given that any halfway competent infiltrator will easily be able to access a site secured only through JavaScript, I am not going to describe the method to do it, as there are significantly more secure ways to protect a section of your site that are much safer to use.

Using a .htaccess file

This special “.htaccess file”, which you may have encountered before if you’ve set up your own 404 error, is a configuration file for the Apache web server. It is just a text file with a special name that contains rules that your server will apply before it sends any files to a viewer of your site. These rules can change the URL of a page, create custom error messages, or in this case require a valid username and password to gain access to a certain area of the site.

These configuration files work on a directory basis, so if your site is at www.example.com and you place the .htaccess file in the root directory (where your index.html homepage is), the entire site will be off-limits and all visitors will need a password to view anything. This is generally not what you want, and so you will create a .htaccess file within a certain directory.

When you set up authorisation for a certain directory, that directory, all of its files, and any directories within it are all protected by this one file. You can have multiple different .htaccess files in multiple directories in your site if necessary.
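
To picture how this inheritance works, here is a small Python sketch that lists the .htaccess files which could affect a given page, from the top of the site down to the page’s own directory. It is a simplification: the document root path and the function name are assumptions for illustration, and it assumes the server is configured to honour .htaccess files in every directory.

```python
from pathlib import PurePosixPath


def htaccess_files_for(doc_root, url_path):
    """List the .htaccess files that apply to url_path, outermost first."""
    root = PurePosixPath(doc_root)
    # Directory that holds the requested file.
    target_dir = (root / url_path.lstrip("/")).parent
    paths = [root / ".htaccess"]
    current = root
    # Walk down one directory at a time from the document root.
    for part in target_dir.relative_to(root).parts:
        current = current / part
        paths.append(current / ".htaccess")
    return [str(p) for p in paths]


# Hypothetical site layout: a page two directories below the document root.
for path in htaccess_files_for("/var/www/site", "/members/reports/q1.html"):
    print(path)
```

A rule placed in the members directory therefore also covers everything in members/reports, which is why one file is enough to protect a whole section.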

To create the file, open your text editor and save a blank file as “.htaccess” in the directory you want to protect, noting that the filename starts with a dot. Windows users may find that they are told they can’t start a filename with a dot. If you get this error, use your FTP program to create the .htaccess file on your server and edit it there instead.

Setting up Authorisation

Now that we have our all-important .htaccess file, we’ll need to add the authorisation rules to it. Add these lines to your file:

AuthName "Section name"
AuthType Basic
AuthUserFile /.htpasswd
Require valid-user

Change the “Section name” to whatever the secure section of your website is called. This value will be placed in the dialog box when a user is asked for their details, so try to make it descriptive so that they know what they’re being asked for. The dialog looks like this in Firefox:

[Image: Firefox .htaccess authentication dialog box]

If you save that file now and try to access this part of your website, you should be presented with a dialog box in your browser asking you for your username and password. Of course, there is no right answer yet because we haven’t set up any users. If you press Cancel in the dialog you will be given the standard “401 Authorization Required” error response code. This is what everyone will see if they log in incorrectly.

The .htpasswd file

To add valid users to our authentication scheme, we use a second file called a .htpasswd file. This file will hold a list of usernames and passwords. Because it contains potentially sensitive information, you should store it in a place that’s impossible to access from the web. This means putting it somewhere else on the server outside of your “web” or “www” directory where your website files are stored. Your hosting company will be able to help you place this file securely so that no ne’er-do-wells can access it.

Once you have secured this file, change the line in .htaccess that points to it. It’ll then look something like this:

AuthUserFile /usr/home/ross/.htpasswd
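
As a quick sanity check on the location you have chosen, the following Python sketch tests whether a file path sits outside the web directory. The web directory path used here is an assumption for illustration, and the check is purely textual: it does not follow symbolic links or look at how the server is actually configured.

```python
from pathlib import PurePosixPath


def is_outside_web_root(file_path, web_root):
    """Return True if file_path is not inside web_root (by path comparison only)."""
    file_path = PurePosixPath(file_path)
    web_root = PurePosixPath(web_root)
    return file_path != web_root and web_root not in file_path.parents


# Assumed layout: the website files live in /usr/home/ross/www.
print(is_outside_web_root("/usr/home/ross/.htpasswd", "/usr/home/ross/www"))
print(is_outside_web_root("/usr/home/ross/www/.htpasswd", "/usr/home/ross/www"))
```

The first path is safely outside the web directory; the second is inside it and could be requested by a visitor.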

Finally, we just need to start adding valid users to this file. For added security, the passwords of your users aren’t stored in plain text in the .htpasswd file — they’re encrypted so that they can’t be read by a user snooping around the server. To add a user called “rustyleroo” with the password “flummox45”, we would add this line to the file:

rustyleroo:E2JbzVpOLlE6Y

As you can see, the password has been obfuscated into a strange form of gobbledegook. I derived this value (technically called a “hash”) by running the original password through a hashing program. Apache comes with a command-line tool called htpasswd that does this for you, and there are also generators available online. You can add new users by adding new lines to this file, all in the form username:encryptedpassword.
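
For the curious, here is a minimal Python sketch of how such a line can be built. Note that it does not reproduce the example line above: it uses the {SHA} format, a different scheme that Apache also understands, chosen here only because Python’s standard library can produce it. That format is unsalted and considered weak, so treat this as an illustration of the idea and prefer the htpasswd tool with a stronger scheme for real use. The function name and the user “alice” are made up.

```python
import base64
import hashlib


def htpasswd_sha1_line(username, password):
    """Build a 'username:{SHA}hash' line (weak format, for illustration only)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"


print(htpasswd_sha1_line("alice", "password"))
```

The same password always gives the same hash in this scheme, which is one of the reasons it is considered weak.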

Accessing the protected section

Now when you reload a file behind the authorisation wall, you enter a username and password into the dialog box. The server will encrypt this password again, and compare it to the encrypted version stored in the file to see if they match. If they do, you will be allowed to view the rest of the protected files as normal.
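
The comparison step can be sketched in a few lines of Python. This is an assumption-laden illustration rather than what Apache does internally: it only understands lines in the weak {SHA} format (a Base64-encoded SHA-1 hash), and the function name, user and file contents are invented.

```python
import base64
import hashlib
import hmac

# Hypothetical file contents: one user, "alice", whose password is "password".
HTPASSWD_TEXT = "alice:{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=\n"


def check_login(htpasswd_text, username, password):
    """Hash the supplied password and compare it with the stored hash."""
    for line in htpasswd_text.splitlines():
        name, sep, stored = line.partition(":")
        if not sep or name != username:
            continue  # not the user we are looking for
        if not stored.startswith("{SHA}"):
            return False  # other hash formats are not handled in this sketch
        digest = hashlib.sha1(password.encode("utf-8")).digest()
        candidate = "{SHA}" + base64.b64encode(digest).decode("ascii")
        # compare_digest avoids leaking how many characters matched.
        return hmac.compare_digest(candidate, stored)
    return False  # unknown user


print(check_login(HTPASSWD_TEXT, "alice", "password"))
print(check_login(HTPASSWD_TEXT, "alice", "wrong guess"))
```

The stored hash is never turned back into the password; the server only ever compares one hash with another.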

You can send the username and password to people in this format:

http://username:password@www.example.com/directory/

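Clicking a link like that will log you in as the user at the start of the URL. Of course, you need to make sure that only the intended person gets their hands on this information. Be aware that some browsers warn about or refuse links with embedded usernames and passwords, and that the password will be visible in the address bar and the browser history.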
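
Behind the scenes, Basic authentication sends the username and password in an Authorization header, joined by a colon and Base64-encoded. The Python sketch below shows how a client could turn a URL of this form into that header; the function name is hypothetical. Base64 is an encoding, not encryption, so anyone who sees the header can read the password unless the connection itself is encrypted.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(url):
    """Build the Authorization header value from credentials in a URL."""
    parts = urlsplit(url)
    if parts.username is None:
        return None  # no credentials in this URL
    credentials = f"{parts.username}:{parts.password or ''}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("http://username:password@www.example.com/directory/"))
```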

Finally, to remove any password restrictions on your files, just delete the .htaccess file. For the final cherry on the cake, you can follow the steps in the 404 error tutorial, but instead set up a custom 401 error, so that users who log in incorrectly get a nicely-formatted error message.