machina-policy parses robots.txt files and lets you query them, so your web-scraping bot can be a good bot instead of a bad bot.

machina-policy supports the following basic elements of robots.txt files:

  • Allow: lines
  • Disallow: lines
  • URL globbing (as Googlebot does it: * matches any run of characters, $ anchors the end of the URL)
  • Crawl-delay: lines (actually obeying the delay is up to you)
  • Falling back to User-agent: * if the specified user-agent is not found
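
The semantics of the elements listed above can be sketched in Python as follows. This is an illustration only, not machina-policy's implementation or API: the names parse, allowed and crawl_delay are hypothetical, and the precedence rule (the longest matching pattern wins, and Allow wins ties) is an assumption borrowed from Googlebot's documented behaviour, which machina-policy may or may not share.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where each '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def parse(text):
    """Parse robots.txt text into {agent: {"rules": [(is_allow, pattern)], "delay": float or None}}."""
    groups = {}
    current = []       # groups that the lines being read apply to
    in_header = False  # True while reading consecutive User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_header:
                current, in_header = [], True
            current.append(groups.setdefault(value.lower(), {"rules": [], "delay": None}))
            continue
        in_header = False
        if field in ("allow", "disallow") and value:
            for group in current:
                group["rules"].append((field == "allow", value))
        elif field == "crawl-delay":
            try:
                delay = float(value)
            except ValueError:
                continue  # ignore malformed delays
            for group in current:
                group["delay"] = delay
    return groups


def group_for(groups, agent):
    """Return the group for this agent, falling back to 'User-agent: *'."""
    return groups.get(agent.lower()) or groups.get("*")


def allowed(groups, agent, path):
    """Assumed precedence: longest matching pattern wins; Allow wins ties."""
    group = group_for(groups, agent)
    if group is None:
        return True
    best = None  # (pattern length, is_allow)
    for is_allow, pattern in group["rules"]:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


def crawl_delay(groups, agent):
    """Report the delay in seconds; the caller decides whether to sleep."""
    group = group_for(groups, agent)
    return None if group is None else group["delay"]


# Made-up robots.txt content for illustration.
ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /private/public.html
Disallow: /*.pdf$
Crawl-delay: 2

User-agent: examplebot
Disallow: /
"""

groups = parse(ROBOTS)
print(allowed(groups, "otherbot", "/private/public.html"))
print(allowed(groups, "otherbot", "/docs/report.pdf"))
print(crawl_delay(groups, "otherbot"))
```

In this sketch "otherbot" has no group of its own, so it is judged against the User-agent: * group, while "examplebot" is judged only against its own group and is shut out entirely. The delay is returned but never enforced, so waiting between requests stays the caller's job.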