How to close the site from indexing using robots.txt

This article answers five frequently asked questions about closing a site from search engines.

Search engine crawlers scan everything they can reach on the Internet. Website owners can nevertheless limit or deny access to their resource by closing the site from indexing in robots.txt, a system file placed in the root directory of the site.

If you don’t need to close the site completely, block search indexing of individual pages. Users shouldn’t find the back office of the site, personal accounts, or outdated information from the promotions or calendar section in their search results. It is also advisable to close scripts, pop-up windows, banners, and heavy files from indexing. This helps reduce indexing time and server load.

How to close the site completely?

A website is usually closed from indexing completely during development or redesign. Websites where webmasters are learning or experimenting are also often closed.

You can prohibit indexing of the site for all search engines, for a single bot, or choose to ban all except one.

Ban for all:

User-agent: *
Disallow: /

Ban for an individual robot:

User-agent: Googlebot-Image
Disallow: /

Ban for all but one robot:

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
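To see how a crawler would read the three variants above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name who_may_crawl, the example.com URL, and the Bingbot comparison crawler are assumptions made for illustration, and the third variant assumes Googlebot as the name of the one allowed crawler. The standard-library parser only approximates the behavior of real search engines.

```python
from urllib.robotparser import RobotFileParser


def who_may_crawl(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True/False} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# The three variants, with each User-agent line directly above its rule.
BAN_ALL = "User-agent: *\nDisallow: /\n"
BAN_ONE = "User-agent: Googlebot-Image\nDisallow: /\n"
ALL_BUT_ONE = "User-agent: *\nDisallow: /\n\nUser-agent: Googlebot\nAllow: /\n"

if __name__ == "__main__":
    for label, text in [("ban all", BAN_ALL),
                        ("ban one", BAN_ONE),
                        ("all but one", ALL_BUT_ONE)]:
        print(label, who_may_crawl(text, ["Googlebot", "Bingbot"]))
```

Running a check like this before publishing the file is a cheap way to catch a ban that accidentally covers the wrong robot.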

How to close individual pages?

Small business card websites don’t usually require hiding individual pages. On websites with a lot of service information, close the following pages and whole sections:

  • administration panel;
  • system directories;
  • personal account;
  • registration forms;
  • order forms;
  • product comparison;
  • favorites;
  • recycle bin;
  • captcha;
  • pop-ups and banners;
  • site search;
  • session IDs.

It is advisable to ban the indexing of so-called junk pages: those that contain outdated news, promotions, special offers, and events. On information websites, close articles with outdated content, or else the site will be perceived as out of date or irrelevant. To avoid having to close articles and materials, update the data in them regularly.

Block indexing of:

A single page:

User-agent: *
Disallow: /contact.html

A section:

User-agent: *
Disallow: /catalog/

The entire site, except for one section:

User-agent: *
Disallow: /
Allow: /catalog

An entire section, except for one subsection:

User-agent: *
Disallow: /product
Allow: /product/auto

Site search:

User-agent: *
Disallow: /search

Administration panel:

User-agent: *
Disallow: /admin
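When Allow and Disallow rules overlap, as in the exception examples above, crawlers that follow the Robots Exclusion Protocol (RFC 9309) apply the most specific, that is the longest, matching rule and prefer Allow on a tie. The sketch below illustrates that precedence for plain path prefixes only. The function name is_allowed and the sample paths are hypothetical, and wildcards are deliberately left out.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled.

    rules: list of ("allow" | "disallow", path_prefix) pairs from one group.
    The longest matching prefix wins; on equal length, "allow" wins.
    """
    best_length = -1
    allowed = True  # no matching rule means the path is crawlable
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules do not apply
        is_allow = kind == "allow"
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    section_rules = [("disallow", "/product"), ("allow", "/product/auto")]
    for sample in ["/product/auto/sedan", "/product/moto", "/contact.html"]:
        print(sample, is_allowed(section_rules, sample))
```

Because rules are matched as prefixes, Disallow: /product also covers paths such as /products.html. Add a trailing slash when only the folder is meant.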

How to close other information?

With the robots.txt file, you can close folders, files, scripts, and UTM tags. You can hide them completely or selectively, and apply the ban to all robots or to individual ones.

Indexing ban for:

A file type:

User-agent: *
Disallow: /*.jpg

A folder:

User-agent: *
Disallow: /images/

A folder, except for one file:

User-agent: *
Disallow: /images/
Allow: /images/file.jpg

Scripts:

User-agent: *
Disallow: /plugins/*.js

UTM tags:

User-agent: *
Disallow: /*utm_

UTM tags for Yandex:

Clean-param: utm_source&utm_medium&utm_campaign
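The wildcard rules above are easier to reason about when tested against sample URLs. The following sketch translates a rule path into a regular expression, assuming the two special characters that Google and RFC 9309 document: * for any sequence of characters and a trailing $ for the end of the URL. The function names, the sample URLs, and the /*utm_ pattern for UTM parameters are illustrative assumptions.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each "*" match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern, url_path):
    """True if the rule pattern covers the URL path (including any query)."""
    return rule_to_regex(pattern).match(url_path) is not None


if __name__ == "__main__":
    checks = [
        ("/*.jpg", "/images/photo.jpg"),
        ("/*.jpg$", "/images/photo.jpg.html"),
        ("/plugins/*.js", "/plugins/slider/main.js"),
        ("/plugins/*.js", "/themes/main.js"),
        ("/*utm_", "/page?utm_source=mail"),  # assumed UTM pattern
    ]
    for pattern, path in checks:
        print(pattern, path, rule_matches(pattern, path))
```

Without a trailing $, a pattern such as /*.jpg also matches longer URLs that merely contain ".jpg", for example /images/photo.jpg.html.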

How to close a site through meta tags?

A good alternative to the robots.txt file is the robots meta tag. Insert it in the source code of every page you want to hide (for a one-page site, the index.html file), inside the <head> container. In the name attribute, indicate which crawlers may not index the page: enter "robots" to close the page from all crawlers, or the name of a specific crawler to close it from that crawler only.

Option 1.

<meta name="robots" content="noindex, nofollow"/>

Option 2.

<meta name="robots" content="none"/>

The “content” attribute has the following values:

none: neither indexing nor following links is allowed (equivalent to noindex, nofollow);

noindex: indexing of the page content is not allowed;

nofollow: following the links on the page is not allowed;

follow: following the links is allowed;

index: indexing of the content is allowed;

all: indexing the content and following the links are both allowed.

Thus, you can disallow indexing of a page’s content but still let crawlers follow its links. To do this, enter content="noindex, follow". The crawler will follow the links on such a page but will not index its text. Combine values as needed for different cases.
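To check which directives a page actually carries, the tag can be read back programmatically. Below is a minimal sketch built on Python's standard html.parser module. The class and function names are hypothetical, and expanding "none" into noindex and nofollow simply follows the value list above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="..."> tags for one crawler name."""

    def __init__(self, crawler="robots"):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        if attributes.get("name", "").lower() != self.crawler:
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token == "none":
                # "none" is shorthand for both restrictions.
                self.directives.update({"noindex", "nofollow"})
            elif token:
                self.directives.add(token)


def robots_directives(html, crawler="robots"):
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, follow"/></head>'
    print(robots_directives(page))
```

Passing a crawler name as the second argument, for example robots_directives(page, "googlebot"), reads the tag addressed to that crawler instead of the general one.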

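If you decide to close the site from indexing using meta tags, there is no need to create robots.txt rules for the same pages. Make sure robots.txt does not block them, because a crawler that is not allowed to fetch a page never sees its meta tag.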

What kind of errors may occur?

Logical — when the rules contradict each other. Detect logical errors by checking the robots.txt file in the Google Robots Testing Tool.

Syntactic — when rules are not written correctly in the file.

The most common ones include:

  • ignoring letter case in paths (rule paths are case-sensitive);
  • writing the file name in capital letters (it must be robots.txt);
  • listing all the rules in one line;
  • not having a blank line between rule groups;
  • specifying the crawler name in a Disallow or Allow directive instead of User-agent;
  • listing files one by one instead of closing an entire section or folder;
  • omitting the mandatory Disallow directive.
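Some of the syntactic errors listed above can be caught automatically. The following sketch is a deliberately small checker, not a full validator: it flags rules placed before any User-agent line, several directives merged into one line, unknown directive names, and paths that do not start with / or *. The function name lint_robots and the KNOWN_DIRECTIVES list are assumptions made for this example.

```python
import re

# Directive names this sketch accepts; an illustrative list, not an official registry.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap",
                    "crawl-delay", "clean-param"}

# A second directive name followed by ":" inside a value suggests merged lines.
SECOND_DIRECTIVE = re.compile(r"\s(user-agent|allow|disallow)\s*:", re.IGNORECASE)


def lint_robots(text):
    """Return a list of (line_number, message) pairs for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' between directive and value"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append((number, f"unknown directive '{field}'"))
            continue
        if SECOND_DIRECTIVE.search(value):
            problems.append((number, "several directives on one line"))
        if name == "user-agent":
            seen_user_agent = True
        elif name in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, "rule before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/' or '*'"))
    return problems


if __name__ == "__main__":
    sample = "Disallow: /admin\nUser-agent: * Disallow: /\nUser-agent: *\nAllow: file.jpg\n"
    for number, message in lint_robots(sample):
        print(f"line {number}: {message}")
```

A checker like this does not detect logical contradictions between rules, so it complements the search engines' own testing tools and does not replace them.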

Cheat sheet

There are two ways to ban site indexing. Create a robots.txt file and specify a Disallow directive for all crawlers. Or write the ban in the robots meta tag, inside the <head> tag of each page.

Close service information, out-of-date information, scripts, sessions, and UTM tags. Create a separate rule for each ban. Ban all search robots via * or specify the name of a specific crawler. If you want to allow only one robot, disallow all the others via * and add a separate rule group for that robot. And don’t forget to check whether links are indexed with an online Google index checker.

Avoid logical and syntactic errors when creating a robots.txt file. Check the file using Yandex.Webmaster and the Google Robots Testing Tool.

Occasionally check page indexing bans in bulk using Linkbox. This takes two steps: upload all the URLs and click on check links.
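For a quick local check before turning to an online service, a list of URLs can also be tested against the robots.txt text directly. This is a minimal sketch that relies on Python's standard urllib.robotparser; the function name blocked_urls and the example.com addresses are hypothetical. Note that this parser applies rules in file order and does not understand * or $ wildcards inside paths, so it is only reliable for simple prefix rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, agent="*"):
    """Return the URLs from `urls` that the rules close for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /admin\nDisallow: /search\n"
    pages = [
        "https://example.com/",
        "https://example.com/admin/login",
        "https://example.com/search?q=shoes",
        "https://example.com/catalog/",
    ]
    for url in blocked_urls(rules, pages):
        print("closed:", url)
```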

H4ck0
Step by step hacking tutorials about wireless cracking, kali linux, metasploit, ethical hacking, seo tips and tricks, malware analysis and scanning.
https://www.yeahhub.com/