NamePros.Com (http://www.namepros.com/)
-   Search Engines (http://www.namepros.com/search-engines/)
-   -   robots.txt help needed (http://www.namepros.com/search-engines/455845-robots-txt-help-needed.html)

abdulbasituae 04-10-2008 07:46 AM

robots.txt help needed
 
Hello everyone,

I have a forum and have created a robots.txt file at http://www.funwadi.com/robots.txt. Google is accessing it successfully and is blocking the pages I have disallowed.

Now I want to block the members' profile pages, which Google is indexing at a rapid pace. I just don't want Google to index profile pages. The problem is that the profile pages are in this format:

http://www.funwadi.com/forum/member2.html
http://www.funwadi.com/forum/member3.html
http://www.funwadi.com/forum/member4.html
http://www.funwadi.com/forum/member5.html

and so on. I have over 44,000 registered users, so what should I enter in the robots.txt file so that Google won't crawl these member pages?

Thanks a bunch in advance

AbdulBasit Makrani

enlytend 04-10-2008 12:53 PM

Google, Yahoo and MSN support two wildcards in robots.txt: *, which matches any sequence of characters, and $, which anchors the pattern to the end of the URL.

This should do it ...
Disallow: /forum/member*.html$
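To see why that one line covers every profile page, here is a minimal Python sketch of how a crawler that supports these two wildcards might match a Disallow pattern against a URL path. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the match to the end, otherwise a prefix match is enough). `rule_matches` is a hypothetical helper written for illustration; Python's own urllib.robotparser does not implement these wildcards.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics (Google-style, hypothetical helper): '*' matches any
    sequence of characters (including none), a trailing '$' anchors the match
    to the end of the path, and otherwise only a prefix has to match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then rejoin the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start of the string, which gives prefix matching.
    return re.match(regex, path) is not None


rule = "/forum/member*.html$"
for path in [
    "/forum/member2.html",
    "/forum/member44000.html",
    "/forum/member2.html?foo=1",
    "/forum/showthread.html",
]:
    print(path, "blocked" if rule_matches(rule, path) else "allowed")
```

The first two paths come out as blocked and the last two as allowed. Because of the `$` anchor, a profile URL with a query string appended is not matched; dropping the `$` would block those variants as well.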

abdulbasituae 04-12-2008 12:51 AM

Thanks for the reply. I added the above Disallow line to the robots.txt file and checked it with the URL tester in Google Webmaster Tools, which reports whether a specific URL is blocked, but it shows every member page as allowed!

Any idea what to do now? :(

enlytend 04-12-2008 03:51 AM

??? Should have worked. But here are two other suggestions:

.htaccess: deny the spiders' user agents access to those files. This is the most foolproof method. (Sorry, I don't have time to figure out the syntax and write it out.)

or

modify the profile page code so that it outputs a robots meta tag in the page's <head>:
<meta name="robots" content="noindex" />
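If you go the meta-tag route, one way to confirm the profile template actually emits the tag is to parse a rendered profile page. The sketch below is a minimal, hypothetical checker built on Python's standard html.parser; `RobotsMetaFinder` and `has_noindex` are illustrative names, and it only looks at `<meta name="robots">`, not crawler-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # tags like <meta ... /> are routed here by the default
        # handle_startendtag as well.
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute values keep their case, so compare them case-insensitively.
        if (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


profile_page = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
print(has_noindex(profile_page))
```

Keep in mind that a crawler only sees this tag on pages it is allowed to fetch, so the meta tag works as an alternative to the robots.txt rule rather than alongside it.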

weblord 04-12-2008 03:59 AM

robots.txt is currently blocking Googlebot.
See the proof under "restricted with robots.txt":
http://www.xml-sitemaps.com/se-bot-...ot&submit=Start


abdulbasituae 04-12-2008 11:17 AM

Oh yeah, it has started working. Awesome :) :)

Thanks a lot

enlytend, thanks a lot. The code you gave me has actually started blocking Googlebot from indexing the profile pages.
Once again, thank you very much :)

abdulbasituae 04-20-2008 04:11 AM

Rep added, enlytend :)


All times are GMT -7.
Powered by: vBulletin Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 2.4.0