The Ultimate Guide to Robots.txt: From Beginner to Pro (with Full Examples)
## What is Robots.txt?
A `robots.txt` file is a plain text file located in the root directory of your website. It follows the Robots Exclusion Protocol to inform search engine crawlers (like Googlebot) which pages or files they may or may not crawl. Correctly configuring `robots.txt` is a fundamental part of technical SEO: it steers crawl budget toward your important content and keeps private or low-value pages from being crawled. Note that `robots.txt` controls crawling, not indexing; a blocked URL can still end up in search results if other sites link to it, so use a `noindex` meta tag for pages that must stay out of the index.
---
## Key Point: Placement Matters
The location of the `robots.txt` file is critical. It **must** be placed in the root directory of your website. If placed incorrectly, search engines will not be able to find and follow its rules.
- **Correct Location**: `https://wiki.lib00.com/robots.txt`
- **Incorrect Location**: `https://wiki.lib00.com/blog/robots.txt`
**Core Rules**:
- The filename must be `robots.txt` in all lowercase.
- Each domain (or subdomain) requires its own `robots.txt` file.
- It must be UTF-8 encoded.
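To see why placement and per-subdomain files matter, note that a crawler derives the `robots.txt` location from the scheme and host of a page URL alone; the page's path is ignored. A minimal Python sketch (using the example domain from above):

```python
from urllib.parse import urlsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would fetch for this page."""
    parts = urlsplit(page_url)
    # Only the scheme and host are used; the path and query are discarded.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://wiki.lib00.com/blog/some-post"))
# https://wiki.lib00.com/robots.txt
```

Because only the host is used, `blog.example.com` and `www.example.com` are looked up separately, each needing its own file.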
---
## Core Syntax Explained
The syntax of `robots.txt` is simple and consists of a few main directives:
| Directive | Description | Example |
|---|---|---|
| `User-agent:` | Specifies which crawler the rule applies to. `*` means all crawlers. | `User-agent: Googlebot` |
| `Disallow:` | Prohibits crawlers from accessing the specified path. | `Disallow: /admin-lib00/` |
| `Allow:` | Permits crawlers to access a path inside an otherwise disallowed directory. In Google's implementation, the most specific (longest) matching rule wins, with `Allow` winning ties. | `Allow: /public/` |
| `Sitemap:` | Informs crawlers of the location of your sitemap(s) to help them discover all important pages. | `Sitemap: https://wiki.lib00.com/sitemap.xml` |
| `Crawl-delay:` | (Non-standard, but supported by some crawlers) Sets the minimum time interval (in seconds) between fetches. | `Crawl-delay: 5` |
### Using Wildcards
- `*`: Matches any sequence of characters.
- `$`: Matches the end of a URL.
For example, `Disallow: /*.pdf$` will block crawlers from fetching all files ending in `.pdf`.
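The wildcard semantics can be sketched as a small pattern matcher. This is an illustrative re-implementation, not a library call: `*` becomes "match anything" and a trailing `$` anchors the end of the URL path:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt path pattern matches a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every character except '*', which maps to '.*'.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    # Robots rules match from the start of the path (prefix match).
    return re.match(regex, path) is not None

print(rule_matches("/*.pdf$", "/docs/report.pdf"))       # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?dl=1"))  # False
print(rule_matches("/admin/", "/admin/settings"))        # True
```

The second case shows why `$` matters: without it, `/*.pdf` would also block PDF URLs that carry query strings.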
---
## Complete Configuration Example (Recommended)
Here is a comprehensive `robots.txt` example, curated by DP@lib00, which you can adapt to your needs.
```txt
# robots.txt for wiki.lib00.com
# Allow all crawlers by default
User-agent: *
Allow: /
# 1. Disallow admin, private, and temporary directories
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Disallow: /api/
# 2. Disallow specific file types
Disallow: /*.zip$
Disallow: /*.log$
# 3. Disallow crawling of search results and user action pages
Disallow: /search
Disallow: /login
Disallow: /cart
Disallow: /checkout
# 4. Disallow dynamic URLs with specific parameters
Disallow: /*?sessionid=
Disallow: /*?sort=
# 5. Set special rules for specific bots (optional)
User-agent: BadBot
Disallow: /
User-agent: Googlebot
# Allow Google to access everything except one specific directory.
# Note: a named group replaces the "*" group entirely for that bot,
# so repeat any general disallows that should still apply to Googlebot.
Disallow: /nogoogle/
# 6. Specify Sitemap locations
Sitemap: https://wiki.lib00.com/sitemap.xml
Sitemap: https://wiki.lib00.com/sitemap-images.xml
```
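Before deploying a file like the one above, you can spot-check individual paths locally with Python's standard-library `urllib.robotparser`. Two caveats: CPython's parser does not implement the `*`/`$` wildcards, and it applies rules in file order rather than Google's longest-match precedence, so keep the rules you test this way simple:

```python
from urllib.robotparser import RobotFileParser

# A simplified subset of the example rules (no wildcards).
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "https://wiki.lib00.com/admin/users"))  # False
print(parser.can_fetch("*", "https://wiki.lib00.com/blog/post-1"))  # True
```

For wildcard rules, rely on a validator that implements the full modern syntax (see the validation section below) rather than this parser.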
---
## Common Pitfall: Sitemaps Require Absolute URLs
A very common mistake is using a relative path in the `Sitemap` directive. According to the official protocol, **a Sitemap must be specified using a full absolute URL**, including the protocol and domain name.
- **✅ Correct**: `Sitemap: https://wiki.lib00.com/sitemap.xml`
- **❌ Incorrect**: `Sitemap: /sitemap.xml`
**Why?**
1. **The protocol requires it**: the `Sitemap` directive is defined to take a full URL, and crawlers may silently ignore a relative path.
2. **Cross-Domain Support**: an absolute URL allows you to host your sitemap file on a CDN or a different domain.
You can specify multiple sitemaps for a single site; just place each on a new line.
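This mistake is easy to catch automatically. A sketch of a pre-deploy check (the file content below is illustrative): extract every `Sitemap` line and verify it parses as an absolute URL, i.e. has both a scheme and a host:

```python
from urllib.parse import urlsplit

robots_txt = """\
User-agent: *
Disallow: /admin/
Sitemap: https://wiki.lib00.com/sitemap.xml
Sitemap: /sitemap-images.xml
"""

for line in robots_txt.splitlines():
    if line.lower().startswith("sitemap:"):
        url = line.split(":", 1)[1].strip()
        parts = urlsplit(url)
        # Absolute iff both scheme (https) and host are present.
        ok = bool(parts.scheme and parts.netloc)
        print(f"{url} -> {'OK' if ok else 'NOT ABSOLUTE'}")
```

Running this flags the second entry, which a crawler might ignore entirely.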
---
## Practical Templates
### Template 1: Allow All
Suitable for simple websites where all content is intended for indexing.
```txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
```
### Template 2: Disallow All
Ideal for sites under development or those that should not be indexed by any search engine.
```txt
User-agent: *
Disallow: /
```
---
## How to Validate?
After configuring your file, always use a tool to validate it and ensure there are no syntax errors.
- **Google Search Console**: provides a robots.txt report showing how Googlebot fetched and parsed your file (the old standalone robots.txt Tester has been retired).
- **Bing Webmaster Tools**: Also offers similar functionality.
By following this guide, you can confidently create a robust and effective `robots.txt` file for your site (like wiki.lib00.com) and improve its SEO performance.