WordPress robots.txt
This is the next thing I hurried to set up: robots.txt. I noticed that by default WordPress redirects robots.txt to robots.txt/ and the output looks like:
User-agent: *
Disallow:
That means search engines are allowed to crawl and index everything.
First, I’m not sure why it redirects to robots.txt/ (the file doesn’t exist; WordPress seems to generate it dynamically). Second, I want some restrictions here.
So I created a robots.txt file with the following content:
User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /author/
Disallow: /wp-admin/
Disallow: /cgi-bin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp-
Disallow: /page/
Disallow: /tag/
Disallow: /trackback/
Disallow: /feed/
I allow everything for Mediapartners-Google, the crawler used by the Google AdSense service.
For everyone else I disallow WordPress files, author pages that may be rated as duplicate content, tags, and paginated archive pages (/page/). For now I think it’s enough to allow indexing of the homepage, categories and, of course, posts.
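A quick way to sanity-check rules like these is to feed them to Python's standard urllib.robotparser and ask which paths a generic crawler may fetch. This is only a rough sketch: the crawler name ExampleBot and the sample paths are made up for illustration, and Python's parser does simple prefix matching, so real search engines may interpret some lines differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, one directive per line.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /author/
Disallow: /wp-admin/
Disallow: /cgi-bin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp-
Disallow: /page/
Disallow: /tag/
Disallow: /trackback/
Disallow: /feed/
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT.

    "ExampleBot" is a hypothetical crawler name; it matches no named
    group, so it falls through to the "User-agent: *" group.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are assumptions, not URLs taken from the blog.
    for path in ["/", "/2008/06/some-post/", "/category/seo/",
                 "/tag/seo/", "/author/admin/", "/page/2/",
                 "/wp-login.php", "/feed/"]:
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Note that "Disallow: /wp-" is a prefix rule, so it already covers /wp-admin/, /wp-includes/ and /wp-content/; the more specific lines are harmless but redundant.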
Tags: robots.txt, seo, wordpress

June 30th, 2008 at 1:00 pm
One update on that: I decided to disallow indexing of categories for a while as well. I’m still not sure about my blog structure, so it’s OK to keep them closed for a while.
So I added at the bottom:
Disallow: /theme/
As the category base name I set [theme] instead of [category].
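The point of using /theme/ here is that the Disallow rule has to match the category base actually configured on the blog, not the default one. A small sketch with Python's urllib.robotparser shows the difference; the crawler name ExampleBot and the category slug "seo" are hypothetical, and only a trimmed-down rule set is used.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down rules: only the lines relevant to this check.
RULES = [
    "User-agent: *",
    "Disallow: /tag/",
    "Disallow: /theme/",  # category base renamed from "category" to "theme"
]


def blocked(path):
    """Return True if a generic crawler may NOT fetch `path`."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return not parser.can_fetch("ExampleBot", path)


if __name__ == "__main__":
    print(blocked("/theme/seo/"))     # matches the custom category base
    print(blocked("/category/seo/"))  # default base is not covered by the rule
```

With the default category base, the line to add would be "Disallow: /category/" instead.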