Austin SEO Company | TastyPlacement

Tutorial: Block Bad Bots with .htaccess

In this tutorial, we’ll learn how to block bad bots and spiders from your website. Blocking them saves bandwidth, preserves server performance for real customers, improves security, and keeps scrapers from spreading duplicate copies of your content around the web.

Quick Start Instructions/Roadmap

For those looking to get started right away (without a lot of chit-chat), here are the steps to blocking bad bots with .htaccess:

  • FTP to your website and find the .htaccess file in your root directory
  • Create a page in your root directory called 403.html. The content of the page doesn’t matter; ours is a text file with just the characters “403”
  • Browse to the page on AskApache that has a sample .htaccess snippet with bad bots already coded in
  • Add any other bots you want to the sample .htaccess snippet, following the .htaccess syntax rules
  • Test your .htaccess file with a bot-spoofing site like wannabrowser.com

Check Your Server Logs for Bad Bots

[Image: Bad Bots Server Log]

If you read your website server logs, you’ll see that bots and crawlers regularly visit your site. These visits can add up to hundreds of hits a day and plenty of bandwidth. The server log pasted above is from TastyPlacement, and the bot identified in red is discoverybot. This bot was nice enough to identify its website for me: DiscoveryEngine.com touts itself as the next great search engine, but presently it offers nothing except stolen bandwidth. It’s not a bot I want visiting my site. If you check your server logs, you might see bad bots like sitesnagger, reaper, harvest, and others. Make a note of any suspicious bots you see in your logs.
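If your logs are too long to read by eye, a short script can tally the User Agents for you. Here is a minimal Python sketch, assuming your server writes Apache’s “combined” log format (where the User Agent is the last quoted field on each line). The sample lines and the bot name “ExampleBadBot” are made up for illustration, and the function name is our own.

```python
import re
from collections import Counter

# Assumption: the log uses Apache's "combined" format, where each line ends
# with two quoted fields: "referer" "user-agent".
UA_AT_END = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')


def count_user_agents(lines):
    """Count how many requests each User-Agent string made."""
    counts = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if match:
            counts[match.group("ua")] += 1
    return counts


# Made-up sample lines for illustration (not taken from a real log).
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2013:10:00:00 -0600] "GET / HTTP/1.1" 200 5120 "-" "ExampleBadBot/1.0"',
    '192.0.2.10 - - [01/Jan/2013:10:00:01 -0600] "GET /blog/ HTTP/1.1" 200 8042 "-" "ExampleBadBot/1.0"',
    '198.51.100.7 - - [01/Jan/2013:10:00:05 -0600] "GET / HTTP/1.1" 200 5120 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1)"',
]

# Print the busiest User Agents first.
for user_agent, hits in count_user_agents(SAMPLE_LOG).most_common():
    print(f"{hits:5d}  {user_agent}")
```

To run it against a real log, pass an open file instead of the sample list, since the function accepts any iterable of lines. The location of your log file depends on your host.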

AskApache’s Bad Bot RewriteRules

AskApache maintains a very brief tutorial with a very comprehensive .htaccess code snippet. What makes that page so great is that the .htaccess snippet already has dozens of bad bots blocked (like reaper, blackwidow, sitesnagger), and you can simply add any new bots you identify.

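If we want to block a bot not covered by AskApache’s default text, we just add its name to the list inside a “RewriteCond” line, separating each bot with a “|” pipe character. We’ve put “discoverybot” in our file because that’s a visitor we know we don’t want. Note that the leading “^” anchors the match to the start of the User Agent string, so this condition only catches bots whose User Agent begins with one of the listed names; if your logs show a bot’s name later in the string, list it in a condition without the “^”: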

# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(verybadbot|discoverybot) [NC,OR]

If you are on the WordPress platform, be careful not to disrupt existing entries in your .htaccess file. As always, keep a backup of your .htaccess file; it’s quite easy to break your site with one coding error. It’s also probably better to put these rewrite rules at the beginning of your .htaccess file, so the server checks for bad bots before any other rules run. Here’s a simplified version of the complete .htaccess file:

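ErrorDocument 403 /403.html

RewriteEngine On
RewriteBase /

# IF THE UA STARTS WITH THESE
# (the last RewriteCond before the RewriteRule must not carry the OR flag,
# or the rule would apply to every visitor)
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|discoverybot) [NC]

# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]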

Here’s a translation of the .htaccess file above:

  • ErrorDocument sets a webpage titled 403.html to serve as our error document when bad bots are encountered. You’ll want to create a page in your root directory called 403.html; the content of the page doesn’t matter, and ours is a text file with just the characters “403”
  • RewriteEngine and RewriteBase simply mean “ready to enforce rewrite rules, and set the base URL to the website root”
  • RewriteCond directs the server “if the User Agent matches any of these bot names, enforce the RewriteRule that follows”
  • RewriteRule answers every request from a matching bot with a 403 Forbidden response (that’s the “F” flag), and the server sends our ErrorDocument, 403.html, as the body of that response
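To get a feel for which visitors a RewriteCond pattern catches, you can try the same regular expression outside of Apache. The Python sketch below is only a rough stand-in: Apache uses the PCRE library rather than Python’s re module, though a simple pattern like this one behaves the same in both. The User Agent strings are hypothetical test values, and the function name is our own.

```python
import re

# Rough Python stand-in for:
#   RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|discoverybot) [NC]
# re.IGNORECASE plays the role of the NC (no case) flag.
BAD_BOT_PREFIX = re.compile(r"^(black.?hole|blackwidow|discoverybot)", re.IGNORECASE)


def would_be_blocked(user_agent):
    """Return True if the condition above would match this User-Agent."""
    return BAD_BOT_PREFIX.search(user_agent) is not None


# Hypothetical User-Agent strings, chosen only to exercise the pattern.
for ua in [
    "BlackWidow",
    "Black Hole",
    "discoverybot/2.0",
    "Mozilla/5.0 (compatible; discoverybot/2.0)",
    "Mozilla/5.0 (Windows NT 6.1)",
]:
    print(f"{would_be_blocked(ua)!s:5}  {ua}")
```

The fourth string shows the effect of the “^” anchor: a bot name that appears in the middle of the User Agent is not matched by a starts-with condition, so check how the bot actually identifies itself in your logs before choosing the pattern.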

Testing Our .htaccess File

Once you upload your .htaccess file, you can test it by browsing to your site and pretending to be a bad bot. You do this by going to wannabrowser.com and spoofing a User Agent. In this case, we spoofed “SiteSnagger”.

If you installed the file properly, you should get your 403 page, which means you have successfully blocked most bad bots.
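If you’d rather test from your own machine, you can spoof a User Agent with a few lines of Python. This is a minimal sketch using only the standard library; the function names are our own, and you would substitute your own site’s URL when calling it.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is what we want.
        return err.code
```

Calling fetch_status with your site’s address and a blocked name such as “SiteSnagger” should return 403 if the rules are working, while an ordinary browser User Agent should still return 200.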

Limitations

Now, why don’t we do this with robots.txt and simply tell bots not to index? Simple: robots.txt is only a request, so bots might simply ignore our directive, or they’ll crawl anyway and just not index the content. That’s not a fix. Even this .htaccess fix will only block bots that identify themselves. If a bot is spoofing itself as a legitimate User Agent, then this technique won’t work. We’ll post a tutorial soon about how to block traffic based on IP address. That said, you’ll still block the large majority of bad bot traffic with this technique (we’d estimate around 90%).

Enjoy!

About the Author: Michael David


Michael David is the founder, current CEO, and lead strategist at TastyPlacement, based in Austin, Texas. He is the author of "WordPress 3.0 Search Engine Optimization" with the prestigious IT publisher, Packt Publishing. TastyPlacement performs search marketing campaigns, public relations, search engine optimization, social media consulting and online advertising for companies in a wide range of fields.
