URL manipulation attacks: examples and countermeasures

The URL (Uniform Resource Locator) of a web application identifies the resource being requested. This article explains how a URL can be manipulated and how to protect a site against URL manipulation attacks.

What is the essence of a URL?

A URL is a string of printable ASCII characters divided into five parts.

The first is the name of the protocol, the "language" used to communicate on the network. The most widely used is HTTP (HyperText Transfer Protocol), which makes it possible to exchange web pages in HTML format. Other protocols may also be used, including FTP, News, and Mailto.

The second is the ID and password, which make it possible to specify the credentials required to access a secure server. This option is not recommended since the password travels in clear text in the URL.

The third is the server's name, the domain name of the computer hosting the requested resource. Note that it is possible to use the server's IP address.

The fourth is the port number, which is associated with a service and tells the server what type of resource is being requested. The default port for the HTTP protocol is port number 80. When the server's web service is associated with port number 80, specifying the port number is optional.

The fifth is the access path to the resource, which tells the server where the resource is located; generally, this includes the location (directory) and the requested file name.

A URL has the following structure:

Protocol	ID and password (optional)	Server name	Port (optional if 80)	Path
http://	user:password@	www.commentcamarche.net	:80	/glossair/glossair.php3
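As a rough illustration of the five parts listed above, the example URL from the table can be split with Python's standard urllib.parse module. The variable names below are chosen for this sketch only and are not part of the original article.

```python
from urllib.parse import urlsplit

# The example URL from the table above, written as a single string.
url = "http://user:password@www.commentcamarche.net:80/glossair/glossair.php3"

parts = urlsplit(url)

protocol = parts.scheme    # the protocol name
username = parts.username  # the ID, if one is present
password = parts.password  # the password, if one is present
server = parts.hostname    # the server's name
port = parts.port          # the port number as an integer, or None if omitted
path = parts.path          # the access path to the resource

print(protocol, username, password, server, port, path)
```

Because the password is just another substring of the URL, anything that records the URL (logs, browser history) also records the password, which is why this form is discouraged.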

The URL can also carry parameters to the server: the file name is followed by a question mark and then by name=value pairs in ASCII format, separated by ampersands (&). A URL with parameters has the following format:

http://en.kioskea.net/forum/?cat=1&page=2
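The sketch below shows how a server-side program might read the parameters from such a URL, here using Python's standard urllib.parse module. It is only an illustration; the site in the example may handle its parameters differently.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://en.kioskea.net/forum/?cat=1&page=2"

# Everything after the question mark is the query string.
query = urlsplit(url).query

# parse_qs maps each parameter name to a list of values,
# because the same name may appear more than once in a URL.
params = parse_qs(query)

print(query)   # cat=1&page=2
print(params)  # {'cat': ['1'], 'page': ['2']}
```

Note that every value arrives as a string chosen by whoever wrote the URL, so the server cannot assume it is a number or that it falls within an expected range.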

What is URL manipulation?

By manipulating certain parts of a URL, a hacker can get a web server to deliver web pages that they are not supposed to have access to.

On dynamic websites, parameters are mostly passed via the URL as follows:

http://target/forum/?cat=2

The data present in the URL is automatically created by the site. When navigating normally, a user simply clicks the links proposed by the website. If a user manually modifies the parameter, they can try different values, for example:

http://target/forum/?cat=6

If the designer has not anticipated this possibility, the hacker may potentially obtain access to an area that is usually protected.

In addition, the hacker can get the site to process an unexpected case, for example:

http://target/forum/?cat=***********

In the above example, if the site's designer has not anticipated the case where the data is not a number, the site may enter an unexpected state and reveal information in an error message.
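One way to anticipate both cases is to validate the parameter before using it. The following is a minimal sketch, assuming the forum categories are small positive integers; ALLOWED_CATEGORIES and parse_category are hypothetical names invented for this example.

```python
# Hypothetical: the categories the current user is allowed to read.
ALLOWED_CATEGORIES = {1, 2, 3}


def parse_category(raw):
    """Return the category as an int, or None if the value is rejected."""
    # Accept only plain ASCII digits, and keep the length small.
    if not (raw.isascii() and raw.isdigit()) or len(raw) > 6:
        return None
    category = int(raw)
    # A well-formed number is not enough: check that access is permitted.
    if category not in ALLOWED_CATEGORIES:
        return None
    return category


print(parse_category("2"))            # 2
print(parse_category("6"))            # None
print(parse_category("***********"))  # None
```

When the function returns None, the site can answer with a generic error page rather than a detailed message, so that a malformed value does not reveal anything about the system.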

Trial and Error

A hacker may test directories and file extensions by trial and error in order to find important information.

They may try searching for directories that make it possible to administer the site:

http://target/admin/
http://target/admin.cgi

They could also try searching for a script to reveal information about the remote system:

http://target/phpinfo.php3

They may also try searching for backup copies. The .bak extension is commonly used for these and is not interpreted by servers by default, so the source code of a script may be displayed instead of being executed:

http://target/.bak

Finally, they may search for hidden files in the remote system. On UNIX systems, when the site's root directory corresponds to a user's directory, the files created by the system might be accessible via the web:

http://target/.bash_history
http://target/.htaccess
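Requests of this kind can be refused before any file is read. The sketch below is one possible check, not a complete defense; the list of extensions and the function name is_blocked are assumptions made for this example.

```python
import posixpath
from urllib.parse import urlsplit, unquote

# Hypothetical list of extensions typically left behind by editors and backups.
BLOCKED_EXTENSIONS = {".bak", ".old", ".orig", ".swp"}


def is_blocked(url):
    """Return True if the URL path names a hidden file or a backup copy."""
    path = unquote(urlsplit(url).path)
    for segment in path.split("/"):
        # Names starting with a dot are hidden files on UNIX systems.
        if segment.startswith("."):
            return True
        _, extension = posixpath.splitext(segment)
        if extension.lower() in BLOCKED_EXTENSIONS:
            return True
    return False


print(is_blocked("http://target/.bash_history"))  # True
print(is_blocked("http://target/.htaccess"))      # True
print(is_blocked("http://target/forum/"))         # False
```

A check like this complements, but does not replace, removing such files from the server in the first place.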

Directory Traversal

Directory traversal or path traversal attacks involve modifying the tree structure path in the URL in order to force the server to access unauthorized parts of the site.

In a classic example, the user moves back up through the tree structure step by step, particularly when the requested resource is not accessible, for example:

http://target/base/test/ascii.php3
http://target/base/test/
http://target/base/

On vulnerable servers, attackers can simply move back through the path with several ../-type strings:

http://target/../../../../directory/file

More advanced attacks encode certain characters in the form of URL encoding:

http://target/..%2F..%2F..%2Fdirectory/file

They may also employ a Unicode notation:

http://target/..%u2216..%u2216directory/file
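The encoded forms above are meant to slip past filters that only look for the literal ../ string. The following sketch decodes the path once before checking it; looks_like_traversal is a hypothetical helper, and it assumes that a single round of decoding matches what the server itself does.

```python
from urllib.parse import unquote


def looks_like_traversal(path):
    """Return True if the URL path appears to climb out of its directory."""
    # The %uXXXX notation is non-standard and is not decoded by unquote().
    # A literal percent sign must be written %25, so a raw "%u" is never
    # part of a valid path and is rejected outright.
    if "%u" in path.lower():
        return True
    decoded = unquote(path)
    # Treat backslashes like slashes, then look for ".." path segments.
    segments = decoded.replace("\\", "/").split("/")
    return ".." in segments


print(looks_like_traversal("/../../../../directory/file"))      # True
print(looks_like_traversal("/..%2F..%2F..%2Fdirectory/file"))   # True
print(looks_like_traversal("/..%u2216..%u2216directory/file"))  # True
print(looks_like_traversal("/base/test/ascii.php3"))            # False
```

The important design point is the order of operations: decode first, then validate, using the same decoding the server applies before it touches the file system.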

Many dynamic sites pass the name of pages to be displayed as parameters in a form similar to the following:

http://target/cgi-bin/script.cgi?url=index.htm

If no verifications are carried out, a hacker may modify the URL manually in order to request access to a site resource that they do not have direct access to, for example:

http://target/cgi-bin/script.cgi?url=script.cgi
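A common verification for this pattern is to compare the requested page name against a fixed list of pages the script is allowed to display. The sketch below assumes such a list exists; ALLOWED_PAGES, its contents other than index.htm, and requested_page are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of pages the script is permitted to display.
ALLOWED_PAGES = {"index.htm", "about.htm"}


def requested_page(url):
    """Return the page named by the url parameter, or None if it is refused."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("url", [])
    # Refuse a missing parameter or one supplied several times.
    if len(values) != 1:
        return None
    page = values[0]
    # Only names on the list are accepted; everything else is refused.
    return page if page in ALLOWED_PAGES else None


print(requested_page("http://target/cgi-bin/script.cgi?url=index.htm"))   # index.htm
print(requested_page("http://target/cgi-bin/script.cgi?url=script.cgi"))  # None
```

Listing what is permitted is generally more robust than trying to list everything that is forbidden, since new attack strings do not need to be anticipated.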

What are the countermeasures?

To secure a web server against URL manipulation attacks, it is necessary to watch for newly reported vulnerabilities and regularly apply the patches provided by the web server's publisher. Moreover, a careful configuration of the web server helps keep users from browsing pages they are not supposed to have access to.

The web server should therefore be configured to:

Prevent the browsing of pages located outside the website's root directory (chroot mechanism).
Disable the display of files present in a directory that does not contain an index file (also known as Directory Browsing).
Delete useless directories and files (including hidden files).
Make sure the server protects access to directories containing sensitive data.
Delete unnecessary configuration options.
Make sure the server accurately interprets dynamic pages, including backup files (.bak).
Delete unnecessary script interpreters.
Prevent HTTP viewing of HTTPS-accessible pages.
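The first item in the list is enforced by the server or the operating system, but an application that builds file paths from URLs can apply the same idea itself. The sketch below is an application-level analogue, not the chroot mechanism; the root directory /srv/www and the function resolve_inside_root are assumptions made for this example.

```python
from pathlib import Path


def resolve_inside_root(root, requested):
    """Return the resolved file path, or None if it falls outside the root."""
    root_path = Path(root).resolve()
    # Strip leading slashes so the request is always joined onto the root.
    candidate = (root_path / requested.lstrip("/")).resolve()
    # resolve() collapses ".." segments and follows symbolic links, so the
    # comparison is made on the location that would really be accessed.
    if candidate == root_path or root_path in candidate.parents:
        return candidate
    return None


# Hypothetical site root; the directory does not need to exist for this check.
print(resolve_inside_root("/srv/www", "base/test/ascii.php3"))
print(resolve_inside_root("/srv/www", "../../../../directory/file"))  # None
```

Comparing resolved paths, rather than searching the request for suspicious strings, means that any spelling of the same location gets the same answer.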