How do I prevent search engines from picking up my page in their index?
Question by Compi | 2012-09-01 at 20:49
I have a website that should not and must not end up in the index of any search engine (Google, Yahoo, Bing)!
So, before I put the page online (because after that it will be too late faster than I can think), I want to ask what can be done to effectively keep a page out of the search index, provided there are any meaningful options at all.
Usually, the head of each page contains a line like this:
<meta name="robots" content="index, follow">
This means that you allow search engines to crawl your page and include it in their index, and that they may follow the links on your website. Instead, you can write, for example:
<meta name="robots" content="noindex, follow">
With this meta tag, you tell search engines that the page should not be included in their index. If the search engines are well-behaved, they will also remove your page from their index once they have discovered the tag.
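To check that a page really carries the tag before it goes online, a small script can help. This is only a minimal sketch using Python's standard html.parser module; it assumes the tag has the form <meta name="robots" content="noindex, follow">, and the names RobotsMetaParser and is_noindex are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, follow"
        content = attr.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

You would pass the HTML source of your page to is_noindex and only publish the page if it returns True.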
If there are links on your page that search engines should not follow, you can mark them like this, for example:
<a href="page.html" rel="nofollow">Link text</a>
The rel="nofollow" attribute tells the search engine not to follow this link. However, some search engines ignore this information and visit the linked page anyway. After all, your pages can still be requested and visited even though they are marked with these tags.
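To illustrate what a well-behaved crawler does with rel="nofollow", here is a rough sketch that sorts the links of a page into those it would follow and those it would skip. It is not how any particular search engine works; LinkCollector and the sample links are invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Splits the links of a page by whether they carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.follow.append(href)


collector = LinkCollector()
collector.feed('<a href="/public.html">A</a> <a href="/hidden.html" rel="nofollow">B</a>')
```

After feed has run, collector.follow holds the links a polite crawler would visit and collector.nofollow the ones it would leave alone. A crawler that ignores the attribute simply visits both.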
2012-09-04 at 12:51
You can also put instructions in a robots.txt file. The robots.txt should be available at yourdomain.com/robots.txt; it is a plain text file in which you simply write directives for the crawlers.
To exclude web crawlers from all areas and files of your website, just write the following in the robots.txt:
User-agent: *
Disallow: /
When crawlers want to index your site, they first look in the robots.txt and find there the statement that they should not take anything from your site. Then they leave again.
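You can test how a rule-abiding crawler reads your robots.txt with Python's standard urllib.robotparser module. The sketch below assumes the file contains the usual "block everything" rules (User-agent: * and Disallow: /); the helper may_crawl and the URLs are only examples.

```python
from urllib.robotparser import RobotFileParser

# Assumed content of yourdomain.com/robots.txt: block every crawler everywhere.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_crawl(ROBOTS_TXT, "Googlebot", "https://yourdomain.com/page.html")
```

Here blocked comes out as False, meaning the crawler is not allowed to fetch the page. Keep in mind that this only shows what a crawler that respects robots.txt would do.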
However, if you only want to exclude a few pages or specific directories, you can specify that in the robots.txt as well. That is a bit much for this comment, so I have written a little tutorial on the subject.
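As a small taste of the partial variant, here is a sketch with a robots.txt that blocks only one directory and one page. The paths /private/ and /draft.html are hypothetical and just stand in for whatever you want to hide; the check again uses Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one directory and one single page only.
PARTIAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /draft.html
"""

parser = RobotFileParser()
parser.parse(PARTIAL_ROBOTS_TXT.splitlines())

# can_fetch returns False for excluded paths and True for everything else.
dir_allowed = parser.can_fetch("*", "https://yourdomain.com/private/notes.html")
page_allowed = parser.can_fetch("*", "https://yourdomain.com/draft.html")
home_allowed = parser.can_fetch("*", "https://yourdomain.com/index.html")
```

In this sketch only home_allowed is True; the other two are False because their paths begin with one of the Disallow entries.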
2012-09-06 at 08:04
Unfortunately, there are also some search engines that neither comply with the instructions given in robots.txt nor respect your meta tags.
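In this case, only a .htaccess file in the appropriate directory can help. It may look like this, for example:
order allow,deny
deny from 123.456.789.000
deny from 100.100.100.100
deny from 200.123
allow from all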
Placed in your .htaccess file, this blocks all requests from the two example IPs 123.456.789.000 and 100.100.100.100. The fourth line ensures that all requests from IP addresses beginning with 200.123 are blocked as well, for instance 200.123.1.1 or 200.123.10.27.
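The matching logic behind such a block list can be sketched in a few lines of Python with the standard ipaddress module. This is only an illustration of the idea, not what the web server does internally. It assumes that the prefix 200.123 corresponds to the network 200.123.0.0/16, and it leaves out 123.456.789.000 because that is a placeholder and not a valid IPv4 address; BLOCKED and is_blocked are names made up for this example.

```python
import ipaddress

# Assumed block list: one single address and one whole address range.
BLOCKED = [
    ipaddress.ip_network("100.100.100.100/32"),  # exactly one IP
    ipaddress.ip_network("200.123.0.0/16"),      # every IP starting with 200.123
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls into one of the blocked networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED)
```

With this list, is_blocked("200.123.10.27") is True, while an address such as 200.124.0.1 is let through.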
Only in this way can you be sure that the spider in question really cannot access your site! The only problem is that you need a list of all the "bad" spiders and crawlers, or at least the most important ones, and these may change constantly. You should search the Internet for a list that is reasonably up to date.
Last update on 2020-01-20 | Created on 2012-09-08
I would consider protecting your site with a password in general and not making the page publicly available on the internet at all.
It seems to me that your website is not meant to be found by search engines, so it should probably only be available to a small circle of people. You can simply give those people the password for the site. With this solution, you neither have to exclude search engines from your website nor worry that possibly confidential information published on the page becomes public.
2012-09-10 at 01:41