Oct 24 2005

Updating Your Robots.txt File


Dan Thies has found a neat, little-known feature that can be used in your robots.txt file: the wildcard.

User-agent: Googlebot
Disallow: /*sort=

This would stop Googlebot from crawling any URL that includes the string “sort=”, no matter where that string occurs in the URL.
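To get a feel for how a wildcard rule like this behaves, here is a rough Python sketch. It is only an approximation of the matching, not Googlebot's actual implementation: it assumes `*` stands for any run of characters and that the rule is matched from the start of the URL path. The function names and the sample URLs are made up for illustration.

```python
import re

def rule_to_regex(rule_path):
    # Escape each literal piece, then join the pieces with '.*'
    # so every '*' in the rule matches any run of characters.
    pieces = [re.escape(piece) for piece in rule_path.split("*")]
    return re.compile(".*".join(pieces))

def is_disallowed(url_path, rule_path):
    # re.match anchors at the start of the path, which mirrors the
    # prefix-style matching robots.txt rules use (an assumption here).
    return rule_to_regex(rule_path).match(url_path) is not None

# Hypothetical URLs checked against the rule from this post.
print(is_disallowed("/products?sort=price", "/*sort="))         # True
print(is_disallowed("/products?page=2&sort=price", "/*sort="))  # True
print(is_disallowed("/products?page=2", "/*sort="))             # False
```

The first two URLs are blocked because “sort=” appears somewhere after the leading slash; the third never contains it, so it stays crawlable.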

This will come in handy for sites that put user or session IDs in their URLs. We recently optimized a client's site that did this and had them change when the ID is attached: instead of being appended immediately, it is now added to the URL only once a product is placed in the shopping cart. A big plus. Unfortunately, once a visitor adds a product and browses back into the site, the ID gets attached to every page. This can cause duplicate content problems, because the same page becomes reachable at many different URLs.
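The duplicate content problem is easy to see in a small Python sketch. This is purely an illustration: the domain, the `id` parameter, and the session value are hypothetical, and `ps_session` is assumed to be an ordinary query-string parameter. Once the session ID is stripped, the two addresses turn out to be the same page.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_param(url, param="ps_session"):
    # Rebuild the URL without the given query parameter.
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Hypothetical URLs: the same product page, with and without a session ID.
plain = "http://www.example.com/widgets?id=7"
with_session = "http://www.example.com/widgets?id=7&ps_session=abc123"

print(strip_param(plain) == strip_param(with_session))  # True
```

To a spider, those are two separate URLs serving identical content, and every new session ID creates yet another copy.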

This is the code we’ll be using for our client:

User-agent: Googlebot
Disallow: /*ps_session=

This will prevent any URL containing the session ID from being spidered, and therefore keep pages with duplicate content from getting into Google’s index.
