This is a video of Matt Cutts going over a few ways to remove URLs from your site. You should watch it if you have, or suspect you have, duplicate content issues on your site.
Why should you be concerned with URL removal? Because duplicate content penalties can be solved quickly. In my experience, using all of these tactics, particularly the last one, will:
- Remove your content from the search engine within a day.
- Fix any ranking penalties in about a week.
To cut to the chase, the URL Removal Tool can be found in Google Webmaster Tools. In short, you’ll need to:
- Verify your site with Google Webmaster Tools. It’s going to be useful for more than just this for the future SEO health of your site.
- 404 or robots.txt the entire page or directory that you’re going to be removing.
It won’t allow you to otherwise. UPDATE: May 17th, 2011: This is no longer true, but it’s still a good practice. You’ll need to do it within 90 days anyway if you think something might still be linking to the page you’re trying to remove. See Easier URL Removals For Site Owners.
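For the robots.txt route, a minimal sketch looks like this (the /old-directory/ path is a placeholder for whatever you’re removing):

```
# Block all crawlers from the directory being removed
User-agent: *
Disallow: /old-directory/
```

Save this as robots.txt at the root of your site; the URL Removal Tool checks for either this rule or a 404 before it will process the request.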
Review & Input on Matt’s Methods
Stop linking to pages - Run a query on your database for the particular links and you can find all of them quickly. The Search tab in phpMyAdmin makes this easy.
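If your site runs on WordPress, a query along these lines will turn up every post that still references the page. This is a sketch: it assumes the standard wp_posts table, and "example.com/old-page" is a placeholder for the URL you’re removing.

```sql
-- Find all posts whose content still links to the page being removed
-- ("example.com/old-page" is a placeholder URL)
SELECT ID, post_title
FROM wp_posts
WHERE post_content LIKE '%example.com/old-page%';
```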
.htaccess – Password protecting a page is not an option for most business scenarios. Nonetheless, here is a .htaccess password tool if you need it.
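If you do go the password route, a minimal .htaccess sketch looks like the following. The AuthUserFile path is an assumption; adjust it to wherever your server keeps the .htpasswd file the tool above generates.

```apache
# Require a login for everything in this directory
AuthType Basic
AuthName "Restricted"
# Path to the password file is an assumption; adjust for your server
AuthUserFile /home/youruser/.htpasswd
Require valid-user
```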
robots.txt / nofollow – I combined these because my responses to them are so similar. Historically I’ve had issues with Google not recognizing robots.txt right away. It’s also not a true removal, because Google still lists the page as an uncrawled reference. Granted, when it does this the content is not counted as duplicate. Also worth mentioning: using this tactic alone seems to take Google months to recognize on its own. You’ll want to do it anyway, since the fast URL Removal method requires it.
Noindex Meta Tag - I’ve seen issues with Google crawls when noindex/nofollow meta tags are combined with 301/302 redirects (page moved permanently/temporarily) or robots.txt. If you’re trying to get pages out of the index, temporarily disable redirects so that Google can see you want the page dropped from the index. This normally applies to old pages from a recent site migration.
<META NAME="ROBOTS" CONTENT="NOINDEX">
robotstxt.org goes into a lot more detail about the usage of the noindex/nofollow tags when used as meta tags.
The URL Removal Tool - As mentioned above, it will allow you to:
- Remove entire directory structures
- Remove single pages
- Remove just the cache of a page.
It does not currently allow you to remove based on a wildcard, which would be REALLY helpful in my opinion. I.e., a pattern like http://example.com/directory/* won’t work.
You’ll find it under Site Configuration >> Crawler Access.
In this example, you can see that I recently used this tool to remove some twenty auto-translated directories from this site.
This was created by the WordPress plugin Global Translator and is a good example of why you might need to use the methods above.
Global Translator seemed like a good idea at the time, but after some research it turned out to be causing me and others a few SEO penalties. I removed it and, shortly after, broke even on the traffic I lost. And no more confusing comments in Chinese. The traffic quality is higher since more of it comes from the US. That means more money, because most ads in non-English-speaking countries seem to pay about two cents a click. But that’s a story for another day.
Entrepreneurs, subscribe for the latest tools, tips, and tutorials.