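Shocked to hear that 360 Comprehensive Search (360综合搜索) had launched, I went to have a look. Impressive indeed: the results page is basically Baidu all over again, and a remarkably faithful copy at that.
If you want to do search, you have to actually "build" it. Copying the page this closely, down to reusing the background images and CSS directly, doesn't look like a very ethical way to "build". Then again, copying look-and-feel has long been standard practice in China's internet industry, and the Penguin (Tencent) has been a past master of it for years, so blaming 360 for this alone seems a bit unfair. Why should the Penguin get to clone things while our mighty 360 can't?
OK, let's keep gawking. Driven by curiosity, vanity and the like, I typed my blog's site query "site:www.zhujianfeng.info" into 360's search box. Everyone knows what this does: it shows how many pages have been indexed. When the results came back I was thrilled. 360 was really doing me a favor: I had never submitted a URL to 360, my site gets tiny traffic, and yet it was indexed, with 3 whole pages!!! Back when I wanted Baidu to index me, I submitted my URL and then waited for months!! And after all those months, Baidu indexed only 3 pages!! Wait a minute, why are the 3 pages 360 indexed the same as the three pages Baidu had indexed a month earlier? Even the order is the same. What am I supposed to make of that? So you're just a reseller?
Er, that would be jumping to conclusions, so I made a not-so-difficult decision: go look for 360's spider in my Apache access log. If it isn't there, there's reason for suspicion. A spider is the program a search engine uses to fetch web pages (see http://baike.baidu.com/view/2755932.htm); the results a search engine shows us were all fetched from websites by its spider beforehand. In other words, since I can find my site in 360 Comprehensive Search, 360's spider must have visited my site and fetched pages at some earlier point, and my Apache log must contain "traces" of it.
For example, here are the traces Baidu's spider left on my site: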
123.125.71.113 - - [05/Jul/2012:06:59:34 +0000] "GET /wp-includes/wlwmanifest.xml HTTP/1.1" 200 1314 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.27 - - [05/Jul/2012:06:59:36 +0000] "GET /?feed=rss2 HTTP/1.1" 200 9435 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.25 - - [05/Jul/2012:06:59:38 +0000] "GET /?feed=rss2&p=18 HTTP/1.1" 200 1314 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.74 - - [05/Jul/2012:06:59:40 +0000] "GET /?p=18&replytocom=2 HTTP/1.1" 200 7297 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.48 - - [05/Jul/2012:06:59:42 +0000] "GET /?feed=rss2&page_id=2 HTTP/1.1" 200 1041 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.101 - - [05/Jul/2012:06:59:44 +0000] "GET /?feed=rss2&cat=4 HTTP/1.1" 200 4828 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.101 - - [05/Jul/2012:06:59:46 +0000] "GET /?feed=rss2&p=47 HTTP/1.1" 200 1089 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.103 - - [05/Jul/2012:06:59:48 +0000] "GET /?feed=rss2&cat=3 HTTP/1.1" 200 2864 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.53 - - [05/Jul/2012:06:59:50 +0000] "GET /?feed=rss2&p=27 HTTP/1.1" 200 1082 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.23 - - [05/Jul/2012:06:59:52 +0000] "GET /?feed=rss2&p=13 HTTP/1.1" 200 1079 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.102 - - [05/Jul/2012:06:59:54 +0000] "GET /?p=50&replytocom=3 HTTP/1.1" 200 7843 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.83 - - [05/Jul/2012:06:59:56 +0000] "GET /?feed=rss2&p=50 HTTP/1.1" 200 1209 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.113 - - [05/Jul/2012:06:59:58 +0000] "GET /?feed=rss2&cat=5 HTTP/1.1" 200 4827 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.36 - - [05/Jul/2012:07:00:00 +0000] "GET /?feed=rss2&cat=7 HTTP/1.1" 200 3824 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.45 - - [05/Jul/2012:07:00:02 +0000] "GET /?feed=rss2&cat=6 HTTP/1.1" 200 2250 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
Google's:
66.249.73.201 - - [23/Jun/2012:10:48:07 +0000] "GET /robots.txt HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.201 - - [23/Jun/2012:10:48:07 +0000] "GET /?p=mxvrrqtcpsp HTTP/1.1" 200 19085 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.201 - - [23/Jun/2012:11:13:51 +0000] "GET /?p=mxvrrqtcpsp&paged=2 HTTP/1.1" 200 6873 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.68.52 - - [23/Jun/2012:16:45:26 +0000] "GET /robots.txt HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.68.52 - - [23/Jun/2012:16:45:26 +0000] "GET /?p=41 HTTP/1.1" 200 4450 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.68.52 - - [23/Jun/2012:19:45:24 +0000] "GET /?m=201204 HTTP/1.1" 200 8477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.67.162 - - [23/Jun/2012:22:03:28 +0000] "GET /robots.txt HTTP/1.1" 404 510 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.67.162 - - [23/Jun/2012:22:03:28 +0000] "GET /?m=201206 HTTP/1.1" 200 5522 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.67.162 - - [24/Jun/2012:04:03:36 +0000] "GET /wp-trackback.php?p=13 HTTP/1.1" 302 604 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.67.162 - - [24/Jun/2012:04:03:37 +0000] "GET /?p=13 HTTP/1.1" 200 6593 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Bing's:
157.55.18.23 - - [28/Jun/2012:08:52:12 +0000] "GET /robots.txt HTTP/1.1" 404 535 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.18.23 - - [28/Jun/2012:08:52:46 +0000] "GET /?C=M;O=A HTTP/1.1" 200 9419 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.150 - - [29/Jun/2012:06:13:25 +0000] "GET /robots.txt HTTP/1.1" 404 535 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.18.23 - - [29/Jun/2012:06:56:05 +0000] "GET /?C=M;O=A HTTP/1.1" 200 9419 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.16.11 - - [29/Jun/2012:08:12:05 +0000] "GET /robots.txt HTTP/1.1" 404 535 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.16.11 - - [29/Jun/2012:08:23:58 +0000] "GET /phpinfo.php HTTP/1.1" 404 508 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.17.150 - - [29/Jun/2012:08:55:06 +0000] "GET /wiki/index.php/%E9%A6%96%E9%A1%B5 HTTP/1.1" 404 527 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.16.11 - - [29/Jun/2012:14:56:49 +0000] "GET /robots.txt HTTP/1.1" 404 535 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.16.11 - - [29/Jun/2012:15:16:30 +0000] "GET / HTTP/1.1" 200 9419 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.16.11 - - [23/Jun/2012:16:21:24 +0000] "GET /robots.txt HTTP/1.1" 404 535 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
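For anyone who would rather not eyeball the log, here is a rough Python sketch of how these entries could be tallied by crawler name. It assumes the file on disk is in Apache's usual "combined" format with plain ASCII quotes and hyphens; the regular expressions and the function names parse_line and count_crawlers are my own illustration, not something from the server setup described here, and requests containing escaped quotes would need a more careful parser.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Crawlers in the excerpts identify themselves as "(compatible; Name/version; +url)".
CRAWLER_RE = re.compile(r'compatible; ([A-Za-z0-9_-]+)/')


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def count_crawlers(lines):
    """Count requests per self-declared crawler name, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = CRAWLER_RE.search(rec["agent"])
        if m:
            counts[m.group(1)] += 1
    return counts
```

Feeding it an open log file, for instance count_crawlers(open("access.log")) with a hypothetical file name, gives a Counter keyed by names such as Baiduspider, Googlebot and bingbot. Keep in mind that a user-agent string is only what the client claims to be.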
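As a search engine of some standing, 360 surely has a spider too. Next step: find it in the logs. First we have to guess what 360's spider would be called. It should at least contain "360" or "qihu" or something similar. If it really doesn't, then treat everything above as hot air. I searched all the logs for "360", case-insensitively. Oh no: lots of 360 browsers show up. After excluding 360EE and 360SE, every remaining hit is just a number that happens to contain 360. So... has 360's spider really, really, really never come? Fine, let's search the other way round and first find every record containing "bot" or "spider". A normal spider should have one of those two words in its name; if 360's spider is called "zhizhu" (the pinyin for "spider"), I have nothing to say and may as well go bash my head against a block of tofu. So I first selected the lines containing bot or spider, then looked among them for any word related to 360. The result... nothing!!
What does this tell us? It tells us that, very likely, 360's spider has never visited my site, and yet my site appears in its results pages. To borrow a catchphrase from Xiao Shenyang (小沈阳): now why would that be?
In case I'm wronging 360 because my text-searching skills are poor, I'm publishing the logs. Please help me check whether 360's spider ever came by. If anyone finds it, let me know, and I'll prepare some flies and bugs to welcome the 360 spider.
Log download: http://www.zhujianfeng.info/temp/access.tar.gz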
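The two searches described above can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the exact commands that were run: the function names are made up, "qihoo" is an extra spelling I added next to "qihu", and the reverse search looks only at the user-agent field (the last quoted field of a combined-format line) so that byte counts containing 360 do not produce false hits.

```python
import re

# Browser tokens excluded in the first search (360EE and 360SE).
BROWSER_TOKENS = re.compile(r'360(EE|SE)', re.IGNORECASE)
# Words a crawler would normally carry in its name.
CRAWLER_HINT = re.compile(r'bot|spider', re.IGNORECASE)
# Guessed vendor markers; "qihoo" is an assumed extra spelling.
VENDOR_HINT = re.compile(r'360|qihu|qihoo', re.IGNORECASE)


def user_agent(line):
    """Return the last double-quoted field of a log line, or "" if absent."""
    parts = line.rstrip().rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def forward_search(lines):
    """Search 1: lines mentioning 360 anywhere, minus 360EE/360SE browser hits."""
    return [l for l in lines if '360' in l and not BROWSER_TOKENS.search(l)]


def reverse_search(lines):
    """Search 2: crawler-like user agents that also mention 360/qihu/qihoo."""
    hits = []
    for l in lines:
        ua = user_agent(l)
        if CRAWLER_HINT.search(ua) and VENDOR_HINT.search(ua):
            hits.append(l)
    return hits
```

Whatever forward_search returns still has to be read by eye, since numbers such as a response size of 13600 also contain 360. An empty list from reverse_search corresponds to the "nothing" result described above.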