How to prevent duplicate content with effective use of the robots.txt and robots tag.

Posted: Sep 19, 2005

Duplicate content is one of the problems we regularly come across as part of the search engine optimization services we offer. If the search engines determine that pages on your site contain duplicate or near-duplicate content, this may result in penalties and even exclusion from the search engines. Fortunately, it's a problem that is easily rectified.

Your primary weapon of choice against duplicate content can be found within the "Robots Exclusion Protocol", which has now been adopted by all the major search engines.

There are two ways to control how the search engine spiders index your site:

1. The Robots Exclusion File, or "robots.txt"
2. The Robots <meta> Tag

The Robots Exclusion File (robots.txt)

This is a simple text file that can be created in Notepad. Once created, you must upload the file into the root directory of your website, e.g. www.yourwebsite.com/robots.txt. Before a search engine spider indexes your website, it looks for this file, which tells it which parts of your site's content it may and may not index.

The robots.txt file is best suited to static HTML sites, or to excluding certain files on dynamic sites. If the majority of your site is dynamically created, consider using the Robots <meta> Tag.

Creating your robots.txt file

Example 1

Scenario: If you wanted to make the robots.txt file applicable to all search engine spiders and make the entire site available for indexing, the file would look like this:

User-agent: *
Disallow:

Explanation: The asterisk in the "User-agent" line means this robots.txt file applies to all search engine spiders. Leaving "Disallow" blank means all parts of the site are available for indexing.
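If you would rather generate the file than type it by hand, a minimal Python sketch might look like the following. It is an illustration only and not part of the original article; the function name build_robots_txt and its parameters are hypothetical.

```python
def build_robots_txt(user_agent="*", disallowed_paths=()):
    """Return the text of a robots.txt file with a single User-agent record."""
    lines = ["User-agent: {}".format(user_agent)]
    if disallowed_paths:
        # One Disallow line per directory or page to exclude.
        lines.extend("Disallow: {}".format(path) for path in disallowed_paths)
    else:
        # A blank Disallow value leaves the whole site open, as in Example 1.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


# With no arguments the function reproduces the Example 1 file.
allow_all = build_robots_txt()
print(allow_all, end="")
```

Save the returned text as robots.txt and upload it to the root directory of the site, as described earlier.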

Example 2

Scenario: If you wanted to make the robots.txt file applicable to all search engine spiders and stop them from indexing the faq, cgi-bin and images directories, plus a specific page called faqs.html in the root directory, the file would look like this:

User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation: The asterisk in the "User-agent" line means this robots.txt file applies to all search engine spiders. Access to each directory is blocked by naming it in a "Disallow" line, and the specific page is referenced directly by its path. Spiders that honour the file will no longer index the named files and directories.
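Before uploading a rule set like this, it can be worth checking that it blocks what you expect. The sketch below uses the urllib.robotparser module from Python's standard library to test the Example 2 rules. It is an illustration only: the spider name "ExampleBot" and the file names under each directory are made up for the test.

```python
from urllib.robotparser import RobotFileParser

# The Example 2 rules, one directive per line.
rules = """\
User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URLs: four that should be blocked and one that should not.
base = "http://www.yourwebsite.com"
paths = [
    "/faq/answers.html",
    "/cgi-bin/form.cgi",
    "/images/logo.gif",
    "/faqs.html",
    "/index.html",
]

# can_fetch() returns True when the named spider may request the URL.
results = {path: parser.can_fetch("ExampleBot", base + path) for path in paths}
for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Each "Disallow" value is matched as a prefix of the URL path, which is why everything under /faq/ is blocked while /index.html remains available.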

Example 3

Scenario: If you wanted to make the robots.txt file applicable only to the Google spider, googlebot, and stop it from indexing the faq, cgi-bin and images directories, plus a specific page called faqs.html in the root directory, the file would look like this:

User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation: By naming a particular spider in the "User-agent" line, you apply the rules that follow to that spider only. Access to each directory is blocked by simply naming it, and the specific page is referenced directly. The named files and directories will not be indexed by Google, while other spiders are unaffected.
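To confirm that a record written for one spider leaves the others alone, the same standard-library module can be asked about two different user agents. This is a sketch only; "ExampleBot" is a made-up spider name and the URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The Example 3 rules, which name googlebot only.
rules = """\
User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "http://www.yourwebsite.com/faq/answers.html"

# The user-agent name is compared case-insensitively by this parser.
google_allowed = parser.can_fetch("Googlebot", url)
# No record names ExampleBot and there is no "*" record, so nothing applies to it.
other_allowed = parser.can_fetch("ExampleBot", url)

print("Googlebot:", "allowed" if google_allowed else "blocked")
print("ExampleBot:", "allowed" if other_allowed else "blocked")
```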

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites, and in that case it's probably necessary to use a combination of the robots.txt file and the robots tag.

The Robots <meta> Tag

This alternative way of telling the search engines what to do with site content appears in the <head> section of a web page. A simple example would be as follows:

<meta name="robots" content="noindex,nofollow">

In this example we are telling all search engines not to index the page or to follow any of the links contained within the page.
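On a dynamically generated site the tag is usually written out by the page template rather than typed by hand. The following Python sketch shows one way that might be done; the function name robots_meta_tag and its parameters are hypothetical and are not taken from the original article.

```python
from html import escape


def robots_meta_tag(index=True, follow=True, archive=True, robot="robots"):
    """Build a robots <meta> tag for a dynamically generated page."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    if not archive:
        # Ask the spider not to keep a cached copy of the page.
        directives.append("noarchive")
    content = ",".join(directives)
    # Escape the name in case it ever comes from configuration or user input.
    return '<meta name="{}" content="{}">'.format(escape(robot, quote=True), content)


# A duplicate view of a page (a printer-friendly version, say) can be kept out of the index.
tag = robots_meta_tag(index=False, follow=False)
print(tag)  # <meta name="robots" content="noindex,nofollow">
```

The template would place the returned string inside the <head> section of the page.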

As a second example, suppose I don't want Google to cache the page, because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive to the tag's content attribute.
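To check which directives a finished page actually carries, its HTML can be read with Python's standard html.parser module. The sketch below is an illustration only. The sample page, the class name RobotsMetaParser, and the exact form of the noarchive tag (addressed to googlebot by name) are assumptions, since the article's own markup for this example is not shown.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in robots-related <meta> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            # Split "noindex, nofollow" style values into a clean list.
            self.directives[name] = [
                item.strip().lower() for item in content.split(",") if item.strip()
            ]


# Hypothetical page that asks Google not to keep a cached copy.
page = """<html><head>
<title>Special offers</title>
<meta name="googlebot" content="noarchive">
</head><body>...</body></html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(parser.directives)  # {'googlebot': ['noarchive']}
```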

What could be simpler!

Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and all websites should operate a robots.txt file, a robots tag, or a combination of the two.

Should you require further information about our search engine marketing or optimization services please visit us at www.e-prominence.co.uk - The search marketing company

(ArticlesBase SC #2324)

    Source:  http://www.articlesbase.com/seo-articles/how-to-prevent-duplicate-content-with-effective-use-of-the-robotstxt-and-robots-tag-2324.html

By: Andrew Allfrey
