
MoreNiche Affiliate SEO Tutorials

How to Use Robots.txt

This guide looks at setting up your own robots.txt file so that you can control spiders from search engines as well as from more malicious sources.

The only tools you need are Notepad, this web page, and access to your website's root folder via FTP or similar means.

What is Robots.txt?

Robots.txt is a special file, uploaded to your server, whose primary function is to control search engine robots.

What are robots, I hear you ask? Search engine robots are more commonly referred to as spiders. These spiders crawl through your websites by following links and record everything they find. The search engines then use this data in different ways to generate search results.

Why would I Want to Control Spiders?

Good question. Since the robots are generally doing a good thing, why would you want to hinder or stop them? The truth is there are lots of reasons to control the spiders. Here are a few examples of where robots.txt can come in handy.

Under Construction

When your site is under construction it's not really much use to users or spiders. Use robots.txt to stop spiders from indexing a half-built or broken site, so you can create a positive first impression when you’re ready to launch.

Bad Robots

Unfortunately, not all robots are friendly. Robots.txt lets you tell content and email scrapers to stay away, as well as specific search engine spiders that you don’t want checking your content or that are overloading your server. Bear in mind that obeying robots.txt is voluntary, so a badly behaved robot may simply ignore it.

Search Engines Look for Them

Most search engines check for robots.txt. This alone is reason enough to put at least something there, so that the request returns a 200 status rather than an error. Google even shows you what it has found in your robots.txt file in Webmaster Tools.

Duplicated or Sensitive Content

There are plenty of legitimate reasons to duplicate content, both internally and externally. If you feel the duplication is enough to endanger your site in certain search engines, you may want to stop them crawling the duplicated pages.

It is also possible that the content on some of your pages is irrelevant or indeed sensitive. You may wish to exclude these pages as well.

How to use Robots.txt

Robots.txt is pretty easy to use. To create one you just need Notepad and the right syntax. Luckily the syntax is also very simple and is made up of only a few elements.

The file is made up of several records and each record features a user-agent section and a disallow section. You may choose to include comments with the use of a "#" at the start of a line.

The "user-agent" section defines which robots should follow the commands beneath it. To list multiple robots, simply use multiple user-agent lines. You can also use the wildcard character "*" to make the commands apply to ALL robots. An example of this line is below.

User-agent: *
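
If you would like to see how a robot matches itself against user-agent lines, here is a minimal sketch using Python's standard urllib.robotparser module. The robot names BadBot, OtherBot and GoodBot are made up for illustration, and the record borrows the Disallow line that is explained in the next paragraph. Python's parser is only one implementation, so real robots may match names a little differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical record: two made-up robots share one set of rules.
ROBOTS_TXT = """\
User-agent: BadBot
User-agent: OtherBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Both named robots are blocked from every page.
badbot_allowed = parser.can_fetch("BadBot", "/page.html")
otherbot_allowed = parser.can_fetch("OtherBot", "/page.html")

# A robot that is not named, and has no "*" record, is not restricted.
goodbot_allowed = parser.can_fetch("GoodBot", "/page.html")

print(badbot_allowed, otherbot_allowed, goodbot_allowed)  # False False True
```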

The "Disallow" section specifies the directory or file that should not be accessed. The syntax is fairly simple: you just need to include the directory or file path (excluding your base URL). You can use multiple Disallow lines to block a selection of files or directories. An example of this line is below.

Disallow: /folder1/
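
To get a feel for what a Disallow line actually covers, the sketch below feeds a small record to Python's standard urllib.robotparser and asks about a few paths. The paths /private.html, /folder10/ and /index.html are assumptions made up for the example. In this parser a rule matches any path that starts with the text after "Disallow:", so /folder1/ does not catch /folder10/.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Keep every robot out of two folders and one file
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
Disallow: /private.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Paths are written relative to the site root, without the base URL.
paths = [
    "/folder1/page.html",
    "/folder2/",
    "/private.html",
    "/folder10/page.html",
    "/index.html",
]
results = {path: parser.can_fetch("*", path) for path in paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```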

The "Allow" command can also come in useful: it lets you name a specific file to allow inside a directory you have disallowed. It still needs to be paired with a User-agent line but can be mixed in with Disallow commands. Here is an example of a full record using all of the above.

User-Agent: *
Disallow: /folder1/
Disallow: /folder2/
Disallow: /folder3/
Allow: /folder1/important.html
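
One thing worth knowing is that robots do not all settle a clash between Allow and Disallow in the same way. Google documents that it goes with the most specific (longest) matching rule, while simpler parsers such as Python's standard urllib.robotparser take the first rule that matches. The sketch below assumes you want the record to behave the same under both approaches, so it lists the Allow line first. It is an illustration only, and the page names other.html and index.html are made up.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the record above, with the Allow line moved to the top
# so that first-match parsers see it before "Disallow: /folder1/".
ROBOTS_TXT = """\
User-Agent: *
Allow: /folder1/important.html
Disallow: /folder1/
Disallow: /folder2/
Disallow: /folder3/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

important = parser.can_fetch("*", "http://www.example.com/folder1/important.html")
other = parser.can_fetch("*", "http://www.example.com/folder1/other.html")
home = parser.can_fetch("*", "http://www.example.com/index.html")

print(important, other, home)  # True False True
```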

Allow All

Happy with all the robots getting to your site? Just use this simple code to open the gates and ensure you’re not blocking anything.

User-Agent: *
Disallow:

Block All

Site down for a while for construction? This will block everything from spidering. Don’t forget to remove it when you’re done!

User-Agent: *
Disallow: /
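
The "Allow All" and "Block All" records differ by a single character, so it is easy to upload the wrong one. As a quick sanity check, this small sketch runs both through Python's standard urllib.robotparser. The helper name can_crawl and the test path are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-Agent: *\nDisallow:\n"    # empty Disallow blocks nothing
BLOCK_ALL = "User-Agent: *\nDisallow: /\n"  # "/" matches every path

def can_crawl(robots_txt: str, path: str) -> bool:
    """Return True if a generic robot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)

print(can_crawl(ALLOW_ALL, "/any/page.html"))  # True
print(can_crawl(BLOCK_ALL, "/any/page.html"))  # False
```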

Linking to your Sitemap

A handy little trick that some robots support is linking to your sitemap inside your robots file. Simply use the following example and modify the URL to suit your own domain.

Sitemap: http://www.example.com/sitemap.xml
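
If you are curious how a robot picks up the Sitemap line, here is a minimal sketch with Python's standard urllib.robotparser. It assumes Python 3.8 or newer, since that is when the site_maps() method was added, and it pairs the Sitemap line with the "Allow All" record purely for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.example.com/sitemap.xml']
```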

Uploading Your File

Once you’re ready you can upload the file to your server. It MUST be placed in the root of the domain, so that it can be reached at an address like http://www.example.com/robots.txt.
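
Robots look for the file at the root of the domain no matter which page they are about to visit. The sketch below shows one way to work out that address from any page URL using Python's standard urllib.parse. The function name robots_txt_url is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Build the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/folder1/important.html"))
# http://www.example.com/robots.txt
```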

If you want to test that the right robots have access to the right pages, Google Webmaster Tools has a good testing tool.
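
You can also run a rough check on your own computer before uploading. The sketch below is not Google's tool, just a small helper built on Python's standard urllib.robotparser. The function name failed_checks, the robot names and the paths are all assumptions for the example, and Python's parser may not agree with every search engine on every rule.

```python
from urllib.robotparser import RobotFileParser

def failed_checks(robots_txt, expectations):
    """Return the (robot, path) pairs whose result differs from what was expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for robot, path, should_be_allowed in expectations:
        if parser.can_fetch(robot, path) != should_be_allowed:
            failures.append((robot, path))
    return failures

# Hypothetical file: block one made-up robot entirely, and one folder for everyone else.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /folder1/
"""

# (robot name, path, should it be allowed?)
EXPECTATIONS = [
    ("BadBot", "/index.html", False),
    ("GoodBot", "/index.html", True),
    ("GoodBot", "/folder1/page.html", False),
]

print(failed_checks(ROBOTS_TXT, EXPECTATIONS))  # [] means every check passed
```

To check a file that is already live, the same module can fetch it for you with its set_url() and read() methods in place of parse().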

I hope this guide has helped you understand some of the things you can do with robots.txt. It's a useful and easy-to-understand little file that can help your SEO and keep well-behaved robots away from pages you would rather they did not visit.
