Redirects, rewrites and other juicy .htaccess morsels

Redirects and rewrites are two essential tools in your arsenal as a web developer, especially when migrating and upgrading sites that end up with different URL/path structures from the original. They can also help overcome problems with consistency, search engine optimisation, website hosting limitations and much more.

There are, however, a few key differences between the ways redirects and rewrite rules can be implemented.

In its simplest form, a redirect says “go here, not there”. This might be to fetch a different page, image or other resource, or to return a different response telling the visitor that something has changed or is wrong. It uses Apache’s mod_alias, and is the most common type of .htaccess directive used by SEO companies to transfer the value (as perceived by the search engines) of an old web page to the new one.

A rewrite, on the other hand, asks the server to manipulate what is returned to the user, whether that’s just the content, the URL, or both. It uses Apache’s mod_rewrite, which, although less compact in its syntax, offers developers far more power and flexibility.

For the most part we’ll be talking about rewrite rules; however, redirects certainly have their place: they’re easy, they’re fast, and they use fewer server resources. So consider this carefully when deciding how to implement your code, especially in shared hosting environments.
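
To make the difference concrete, here is a minimal sketch with hypothetical paths and a placeholder host (www.example.com). The first snippet is a mod_alias redirect: the browser is sent to a new address and its address bar changes. The second is a mod_rewrite rule: the server quietly serves a different file and the URL the visitor sees stays the same. It assumes the .htaccess file sits in the document root and that the server permits these directives in .htaccess (via AllowOverride).

```apache
# Redirect (mod_alias): tell the browser the page has moved for good.
# The visitor ends up at the new URL. (Hypothetical paths.)
Redirect 301 /about-us.html http://www.example.com/about/

# Rewrite (mod_rewrite): serve about-us.php for /about/ internally.
# The visitor still sees /about/ in the address bar. (Hypothetical file.)
RewriteEngine On
RewriteRule ^about/$ about-us.php [L]
```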

Rules & syntax – Redirects (mod_alias)

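Common HTTP response codes, for reference. Redirect takes the code as a number, or as one of the keywords permanent (301), temp (302), seeother (303) and gone (410):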

  • 200 – OK – This is the standard response for a successful HTTP request
  • 301 – Moved Permanently – All requests, current and future should be sent to the given URI
  • 304 – Not Modified – Suggests that the resource has not been modified since the version specified by any request headers (If-Modified-Since or If-None-Match). It means there is no need to transmit the resource again, since the client still has a copy downloaded previously.
  • 401 – Unauthorized – Similar to 403 Forbidden, but specifically for use when authentication is required and has failed or has not yet been provided. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. See Basic access authentication and Digest access authentication.
  • 403 – Forbidden – The request was a valid request, but the server is refusing to respond to it. Unlike a 401 Unauthorized response, authenticating will make no difference.
  • 404 – Not Found – The requested resource could not be found but may be available again in the future. Subsequent requests by the client are permissible.
  • 418 – I’m a teapot (RFC 2324) – This code was defined in 1998 as one of the traditional IETF April Fools’ jokes, in RFC 2324, Hyper Text Coffee Pot Control Protocol, and is not expected to be implemented by actual HTTP servers. The RFC specifies this code should be returned by tea pots requested to brew coffee.
  • 420 – Enhance Your Calm (Twitter) – Not part of the HTTP standard, but returned by version 1 of the Twitter Search and Trends API when the client is being rate limited.
  • 500 – Internal Server Error – A generic error message, given when an unexpected condition was encountered and no more specific message is suitable.

Example Redirect using mod_alias

So let’s say we wanted to send all requests for the path “/shop” to a subdomain with the same name…
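
A minimal sketch, assuming the subdomain is shop.example.com (a placeholder for your own host). Redirect matches the beginning of the URL path and appends whatever follows it to the target, so deeper paths are carried across too.

```apache
# Permanently send /shop and everything beneath it to the subdomain.
# shop.example.com is a placeholder host.
#   /shop           -> http://shop.example.com
#   /shop/cart.html -> http://shop.example.com/cart.html
Redirect 301 /shop http://shop.example.com
```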

Rules & syntax – Rewrites (mod_rewrite)

Allowed values (Condition flags):

  • NC – Case insensitive
  • OR – Combines a condition with the next one using a logical OR instead of the implicit AND, so the rule applies if any one of a series of conditions is true.
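
For instance, a sketch assuming two hypothetical hostnames, example.com and example.net, that should both redirect to www.example.com. NC makes each hostname test case-insensitive, and OR joins the two conditions so that either one is enough for the rule to fire.

```apache
RewriteEngine On
# example.com OR example.net (in any letter case) triggers the rule below
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^example\.net$ [NC]
# Keep the requested path when sending visitors to the canonical host
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```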

Allowed values (Rule flags)

  • R[=code] – Redirect to new URL, with optional code (see below). This prefixes the substitution with http://thishost[:thisport]/ (making the new URL a full URI), forcing an external redirection. If no code is given, an HTTP response of 302 (MOVED TEMPORARILY) is used. If you want to use other response codes in the range 300-399, just specify them as a number or use one of the following symbolic names: temp (default), permanent, seeother. Use it for rules which should canonicalize the URL and give it back to the client, e.g., translate “/~” into “/u/” or always append a slash to /u/user, etc.

    Note: When you use this flag, make sure that the substitution field is a valid URL! If not, you are redirecting to an invalid location! And remember that this flag itself only prefixes the URL with http://thishost[:thisport]/; rewriting continues. Usually you also want to stop and do the redirection immediately. To stop the rewriting you also have to provide the ‘L’ flag.

  • F – Forbidden (sends 403 header). This forces the current URL to be forbidden, i.e., it immediately sends back an HTTP response of 403 (FORBIDDEN). Use this flag in conjunction with appropriate RewriteConds to conditionally block some URLs.
  • G – Gone (no longer exists). This forces the current URL to be gone, i.e., it immediately sends back an HTTP response of 410 (GONE). Use this flag to mark pages which no longer exist as gone.
  • P – Proxy. This flag forces the substitution to be sent internally as a proxy request and immediately (i.e., rewriting rule processing stops here) put through the proxy module. You have to make sure that the substitution string is a valid URI (e.g., typically starting with http://hostname) which can be handled by the Apache proxy module. If not, you get an error from the proxy module. Use this flag to achieve a more powerful implementation of the ProxyPass directive, to map remote content into the namespace of the local server.

    Note: To use this functionality, make sure you have the proxy module compiled into your Apache server program. If you’re not sure, check whether mod_proxy.c is part of the “httpd -l” output. If it is, this functionality is available to mod_rewrite. If not, you first have to rebuild the “httpd” program with mod_proxy enabled.

  • L – Last Rule. Stop the rewriting process here and don’t apply any more rewriting rules. This corresponds to the Perl last command or the break command from the C language. Use this flag to prevent the currently rewritten URL from being rewritten further by following rules. For example, use it to rewrite the root-path URL (‘/’) to a real one, e.g., ‘/e/www/’.
  • N – Next (i.e. restart rules) – Re-run the rewriting process (starting again with the first rewriting rule). Here the URL to match is again not the original URL but the URL from the last rewriting rule. This corresponds to the Perl next command or the continue command from the C language. Use this flag to restart the rewriting process, i.e., to immediately go to the top of the loop. But be careful not to create an infinite loop!
  • C – Chain. This flag chains the current rule with the next rule (which itself can be chained with the following rule, etc.). This has the following effect: if a rule matches, then processing continues as usual, i.e., the flag has no effect. If the rule does not match, then all following chained rules are skipped. For instance, use it to remove the “.www” part inside a per-directory rule set when you let an external redirect happen (where the “.www” part should not occur!).
  • T=mime-type – Set Mime Type. Force the MIME-type of the target file to be MIME-type. For instance, this can be used to simulate the mod_alias directive ScriptAlias which internally forces all files inside the mapped directory to have a MIME type of “application/x-httpd-cgi”.
  • NS – Skip if internal sub-request. This flag forces the rewriting engine to skip a rewriting rule if the current request is an internal sub-request. For instance, sub-requests occur internally in Apache when mod_include tries to find out information about possible directory default files (index.xxx). On sub-requests, applying the complete set of rules is not always useful and can even cause failures. Use this flag to exclude some rules. As a rule of thumb: whenever you prefix some URLs with CGI scripts to force them to be processed by the CGI script, the chance is high that you will run into problems (or extra overhead) on sub-requests. In these cases, use this flag.
  • NC – Case insensitive. This makes the Pattern case-insensitive, i.e., there is no difference between ‘A-Z’ and ‘a-z’ when Pattern is matched against the current URL.
  • QSA – Append query string. This flag forces the rewriting engine to append a query string part in the substitution string to the existing one instead of replacing it. Use this when you want to add more data to the query string via a rewrite rule.
  • NE – Do not escape output. This flag keeps mod_rewrite from applying the usual URI escaping rules to the result of a rewrite. Ordinarily, special characters (such as ‘%’, ‘$’, ‘;’, and so on) will be escaped into their hexcode equivalents (‘%25’, ‘%24’, and ‘%3B’, respectively); this flag prevents this from being done. This allows percent symbols to appear in the output, as in…

    RewriteRule /foo/(.*) /bar?arg=P1\%3d$1 [R,NE]

    …which would turn ‘/foo/zed’ into a safe request for ‘/bar?arg=P1=zed’.

  • PT – Pass through. This flag forces the rewriting engine to set the uri field of the internal request_rec structure to the value of the filename field. This flag is just a hack to be able to post-process the output of RewriteRule directives by Alias, ScriptAlias, Redirect, etc. directives from other URI-to-filename translators. A trivial example to show the semantics: If you want to rewrite /abc to /def via the rewriting engine of mod_rewrite and then /def to /ghi with mod_alias:

    RewriteRule ^/abc(.*) /def$1 [PT]
    Alias /def /ghi

    If you omit the PT flag, mod_rewrite still does its job fine, i.e., it rewrites uri=/abc/… to filename=/def/… as a full API-compliant URI-to-filename translator should. Then mod_alias comes along and tries to do its own URI-to-filename translation, which will not work.

    Note: You have to use this flag if you want to intermix directives of different modules which contain URL-to-filename translators. The typical example is the use of mod_alias and mod_rewrite.

  • S=x – Skip next x rules. This flag forces the rewriting engine to skip the next x rules in sequence when the current rule matches. Use this to make pseudo if-then-else constructs: the last rule of the then-clause becomes S=x, where x is the number of rules in the else-clause. (This is not the same as the ‘chain|C’ flag!)
  • E=var:value – Set environment variable “var” to “value”. This forces an environment variable named var to be set to value, where value can contain regexp backreferences $N and %N which will be expanded. You can use this flag more than once to set more than one variable. The variables can later be dereferenced in many situations, but usually from within XSSI (via <!--#echo var="var"-->) or CGI (e.g. $ENV{'var'}). Additionally, you can dereference it in a following RewriteCond pattern via %{ENV:var}. Use this to strip information from URLs while still remembering it.
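
A few of these flags in action, as a sketch rather than a drop-in file: every path and file name below is hypothetical, and the rules assume they live in an .htaccess file in the document root (where the pattern is matched against the path without its leading slash).

```apache
RewriteEngine On

# R + L: permanent (301) external redirect, then stop processing
RewriteRule ^old\.html$ /new.html [R=301,L]

# F: refuse anything under a private folder with a 403
# ("-" means no substitution: the URL is left unchanged)
RewriteRule ^private/ - [F]

# G: report a retired page as 410 Gone
RewriteRule ^discontinued\.html$ - [G]

# QSA: /product/42?color=red is served by product.php?id=42&color=red
# (without QSA the original query string would be replaced)
RewriteRule ^product/([0-9]+)$ product.php?id=$1 [QSA,L]

# NC: /downloads/, /Downloads/ and /DOWNLOADS/ all map to /files/
RewriteRule ^downloads/(.*)$ /files/$1 [NC,L]
```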

Allowed values (Server variables)

HTTP Headers

  • HTTP_USER_AGENT
  • HTTP_REFERER
  • HTTP_COOKIE
  • HTTP_FORWARDED
  • HTTP_HOST
  • HTTP_PROXY_CONNECTION
  • HTTP_ACCEPT

Request

  • REMOTE_ADDR
  • REMOTE_HOST
  • REMOTE_USER
  • REMOTE_IDENT
  • REQUEST_METHOD
  • SCRIPT_FILENAME
  • PATH_INFO
  • QUERY_STRING
  • AUTH_TYPE

Server

  • DOCUMENT_ROOT
  • SERVER_ADMIN
  • SERVER_NAME
  • SERVER_ADDR
  • SERVER_PORT
  • SERVER_PROTOCOL
  • SERVER_SOFTWARE

Time

  • TIME_YEAR
  • TIME_MON
  • TIME_DAY
  • TIME_HOUR
  • TIME_MIN
  • TIME_SEC
  • TIME_WDAY
  • TIME

Special

  • API_VERSION
  • THE_REQUEST
  • REQUEST_URI
  • REQUEST_FILENAME
  • IS_SUBREQ
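
Server variables are referenced as %{NAME}, most often in a RewriteCond. Two hedged sketches follow, with a hypothetical user-agent string (“BadBot”) and hypothetical page names. The time-based rule compares TIME_HOUR and TIME_MIN as strings using the < and > operators, which works because both are zero-padded two-digit values.

```apache
RewriteEngine On

# HTTP_USER_AGENT: refuse requests from a (hypothetical) bot with a 403
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F]

# TIME_HOUR/TIME_MIN: after 07:00 and before 19:00 server time,
# serve a (hypothetical) daytime version of the home page
RewriteCond %{TIME_HOUR}%{TIME_MIN} >0700
RewriteCond %{TIME_HOUR}%{TIME_MIN} <1900
RewriteRule ^index\.html$ index.day.html [L]
```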

Example Rewrite Rule using mod_rewrite

Let’s say we have some files in a folder (img/thumbs) that need to be served from a different location, but only if the request was made for that specific folder on a specific domain.
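
A sketch of that scenario, assuming the domain is www.example.com and the new location is a media/thumbnails folder (both placeholders). Because there is no R flag, this is an internal rewrite: the browser keeps requesting img/thumbs/… while the server quietly serves the files from the new folder.

```apache
RewriteEngine On
# Only apply the rule on this (placeholder) domain
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# Map img/thumbs/<file> to the (placeholder) new folder; URL unchanged
RewriteRule ^img/thumbs/(.*)$ /media/thumbnails/$1 [L]
```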

Operators, options
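
RewriteCond can also test the filesystem or compare strings instead of matching a regular expression. The sketch lists some of mod_rewrite’s condition operators and uses two of them for the common “front controller” pattern, where index.php is a hypothetical script that handles every request not matching a real file or directory.

```apache
# Some RewriteCond operators (prefix any of them with ! to negate):
#   -d        the test string is an existing directory
#   -f        the test string is an existing regular file
#   -s        the test string is a regular file larger than zero bytes
#   -l        the test string is a symbolic link
#   <, >, =   lexical string comparison with the condition pattern
RewriteEngine On
# Leave real files and directories alone...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# ...and hand everything else to a (hypothetical) index.php
RewriteRule ^(.*)$ index.php [L]
```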

Redirect one page to another
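
With mod_alias this is a one-liner. The file names and the www.example.com host are placeholders; an absolute target URL is used to be safe across Apache versions.

```apache
# Permanently move a single (placeholder) page to its new address
Redirect 301 /old-page.html http://www.example.com/new-page.html
```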

Redirect an entire path to another (includes subfolders)
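
Redirect matches the beginning of the path and carries the remainder across, so one line covers a whole section, subfolders included. /blog, /news and the host are placeholders.

```apache
# Move a whole (placeholder) section, keeping everything after /blog:
#   /blog/2012/post.html -> http://www.example.com/news/2012/post.html
Redirect 301 /blog http://www.example.com/news
```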

Redirect an entire domain to another domain (includes subfolders)
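
When both domains point at the same site, mod_rewrite lets the rule check which host was requested. A sketch with placeholder domains old-domain.com and new-domain.com: the path is carried across, and because the substitution contains no query string of its own, the original query string is passed along unchanged.

```apache
RewriteEngine On
# Requests for the old domain, with or without www (placeholder names)...
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.com$ [NC]
# ...go permanently to the same path on the new domain
RewriteRule ^(.*)$ http://www.new-domain.com/$1 [R=301,L]
```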

Redirect http to https (or https to http)
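
A sketch using the HTTPS variable, which mod_rewrite sets to “on” for SSL/TLS connections and “off” otherwise (it is not in the lists above). This assumes Apache handles SSL itself; behind a load balancer or proxy that terminates SSL, the variable may always read “off” and the redirect would loop, so check how your host is set up.

```apache
RewriteEngine On
# http -> https: same host, same path, permanent redirect
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

# https -> http: use these two lines instead of the two above
# RewriteCond %{HTTPS} on
# RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
```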

Redirect www to non-www (or non-www to www)
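
A sketch that works for any domain name: the condition captures the hostname without its leading “www.” as %1, which the rule then reuses. It assumes the site is served over plain http; change the scheme in the substitution if it runs on https.

```apache
RewriteEngine On
# www -> non-www: %1 holds the host minus "www."
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# non-www -> www: use these two lines instead of the two above
# RewriteCond %{HTTP_HOST} !^www\. [NC]
# RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```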

Redirect based on query string
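
RewriteRule only sees the path, so the query string has to be tested with a RewriteCond on %{QUERY_STRING}. In this sketch, index.php?page=about is a hypothetical old-style URL being sent to /about/; %1 refers to the group captured by the condition, and the trailing ? stops the old query string from being appended to the new URL.

```apache
RewriteEngine On
# Capture the page name from a (hypothetical) ?page=... query string
RewriteCond %{QUERY_STRING} ^page=([a-z0-9-]+)$ [NC]
# /index.php?page=about -> /about/ (the trailing "?" drops the query string)
RewriteRule ^index\.php$ /%1/? [R=301,L]
```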

Redirect based on file extension
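
mod_alias’s RedirectMatch works like Redirect but takes a regular expression, which suits extension-based rules. This sketch assumes every .htm page has been renamed to .html at the same path, on a placeholder host.

```apache
# /docs/intro.htm -> http://www.example.com/docs/intro.html
# ($ anchors the match, so URLs already ending in .html are not caught)
RedirectMatch 301 ^(.*)\.htm$ http://www.example.com$1.html
```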

Rewrites – Remove file extension
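
A sketch for serving extensionless URLs (for example /contact) from PHP files on disk (contact.php), assuming PHP and an .htaccess file in the document root. The conditions only let the rule fire when no directory of that name exists and a matching .php file does, which also keeps the rule from firing again once .php has been added.

```apache
RewriteEngine On
# Skip real directories...
RewriteCond %{REQUEST_FILENAME} !-d
# ...and only continue if "<requested path>.php" exists on disk
RewriteCond %{REQUEST_FILENAME}.php -f
# Serve /contact from contact.php; the address bar still shows /contact
RewriteRule ^(.+)$ $1.php [L]
```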

Rewrites – Replace entire path
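
Here the visitor’s URL stays the same while the content comes from a different folder. A sketch, assuming the files that used to live in /old-section/ now sit in /new-section/ (placeholder names); add R=301 to the flags if you would rather visitors saw the new path.

```apache
RewriteEngine On
# Internal rewrite: /old-section/page.html is served from /new-section/page.html
RewriteRule ^old-section/(.*)$ /new-section/$1 [L]
```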

Rewrites – Replace entire domain
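
Serving another domain’s content under your own URL is what the P (proxy) flag is for. This sketch assumes placeholder domains and that mod_proxy (plus mod_proxy_http on Apache 2.x) is available, which is often not the case on shared hosting. P also stops rule processing, so no L flag is needed.

```apache
RewriteEngine On
# Requests for the old (placeholder) domain...
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.com$ [NC]
# ...are fetched from the new domain behind the scenes; the URL does not change
RewriteRule ^(.*)$ http://www.new-domain.com/$1 [P]
```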

Some popular resources