Q: Hello,
I have already tried blocking Google's crawler, but it has had no noticeable effect and my traffic is still being consumed. My robots.txt contains:

User-agent: Googlebot
Disallow: /
Sitemap: http://www.cmmi345.org/sitemap.xml

Forbidding Google from crawling the site's data is clearly not working.
A: Hello,
Blocking with robots.txt rules will not necessarily work, because spiders do not always crawl according to the rules in robots.txt. Those rules are not enforced in any way; they are simply a convention that well-behaved crawlers agree to follow.
If you want to stop the Google spider from crawling your site, please follow the steps at https://www.west.cn/faq/list.asp?unid=662 directly and then observe the result. Thank you very much for your support of our company!
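The linked article is not reproduced in this exchange, but the general server-side approach it points to is to match the crawler's User-Agent at the web server and answer with a 403 instead of serving the page. A minimal sketch of such a rule for IIS with the URL Rewrite module (the rule name is illustrative, and the linked FAQ's exact steps may differ):

<rule name="BlockGooglebot" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <!-- Fires whenever the request's User-Agent header claims to be Googlebot -->
    <add input="{HTTP_USER_AGENT}" pattern="Googlebot" ignoreCase="true" />
  </conditions>
  <!-- Refuse the request outright instead of serving content -->
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>

Unlike robots.txt, this is enforced by the server itself, so it works regardless of whether the crawler honors any convention.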
Q: Please help me confirm whether there is anything wrong with my configuration, which is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules><rules>
        <rule name="Block spider">
          <match url="(^robots.txt$)" ignoreCase="false" negate="true" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="SemrushBot|googlebot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|curl|perl|Python|Wget|Xenu|ZmEu" ignoreCase="true" />
          </conditions>
          <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
        </rule>
      </rules>
      <rule name="/ Z-BlogPHP Imported Rule" stopProcessing="true">
        <match url="^.*?" ignoreCase="false" />
        <conditions logicalGrouping="MatchAll">
          <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
          <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
        </conditions>
        <action type="Rewrite" url="index.php/{R:0}" />
      </rule>
      <rule name="/ Z-BlogPHP Imported Rule index.php" stopProcessing="true">
        <match url="^index.php/.*?" ignoreCase="false" />
        <conditions logicalGrouping="MatchAll">
          <add input="{REQUEST_FILENAME}" matchType="IsFile" />
        </conditions>
        <action type="Rewrite" url="index.php/{R:0}" />
      </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
A: Hello,
Your rules were added incorrectly; we have adjusted them for you. Simulating the Google spider against your site now shows that it can no longer fetch pages and receives a 403 error instead. Please test and verify on your side. Thank you very much for your support of our company!
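For reference, the visible defect in the configuration quoted above is the doubled <rules> opening tag (with a matching stray </rules> after the first rule): the IIS URL Rewrite schema does not allow one <rules> collection nested inside another, so the whole web.config fails to load and none of the rules take effect. The adjustment actually applied on the server is not shown in this exchange, but the corrected structure presumably keeps all three rules in a single <rules> collection, along these lines:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <!-- Exactly one <rules> collection directly under <rewrite> -->
      <rules>
        <!-- 1. The "Block spider" rule, body unchanged from above -->
        <!-- 2. The "/ Z-BlogPHP Imported Rule", body unchanged from above -->
        <!-- 3. The "/ Z-BlogPHP Imported Rule index.php", body unchanged from above -->
      </rules>
    </rewrite>
  </system.webServer>
</configuration>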