pzg's blog

More than 100 common web crawlers

107 robots
Yahoo Slurp
Unknown robot (identified by ‘crawl’)
Googlebot
Yahoo! Slurp China
GouGou
OutfoxBot
GigaBot
Lilina
MSNBot
Java (Often spam bot)
NewsGator Online
BaiDuSpider
Sina Iask Spider
Bloglines
MagpieRSS
Alexa (IA Archiver)
Feedfetcher-Google
MT::Telegraph::Agent
Feedburner
RoJo aggregator
Python-urllib
Jakarta commons-httpclient
Heritrix
Google AdSense
Mydoyouhike
Unknown robot (identified by ‘spider’)
Voyager
Ocelli
Unknown robot (identified by hit on ‘robots.txt’)
Unknown robot (identified by ‘robot’)
larbin
Hylanda
ZyBorg
ZhuaXia
lanshanbot
Technoratibot
Microsoft URL Control
IRLbot
Unknown robot (identified by ‘bot/’ or ‘bot-’)
Turn It In
Megite
MSIECrawler
msnbot-media
Sogou Spider
NutchCVS
MJ12bot
Kinjabot
Gaisbot
SurveyBot
Ask
StackRambler
Girafabot
T-H-U-N-D-E-R-S-T-O-N-E
Yahoo Feed Seeker
WordPress
UniversalFeedParser
Sphere Scout
findlinks
SBIder
Yahoo-Blogs
FeedValidator
Yahoo-MMCrawler
lwp-trivial
Webdup
Blogslive
IBM Almaden Research Center WebFountain?
Openfind data gatherer
BlogPulse ISSpider intelliseek.com
HPPrint
Walhello appie
BlogSearch
ping.blo.gs
Biz360 spider
UP.Browser
topicblogs
Exabot
Snappy
LinkWalker
BlogBridge Service
The World Wide Web Worm
Nutch
The web archive (IA Archiver)
Feedster
YahooSeeker-Testing
Voila
aipbot
PluckFeedCrawler
Everest-Vulcan
NG 2.x (Exalead)
MS SharePoint Portal Server – MS Search 4.0 Robot
Missigua_Locator
boitho.com-dc
Sunrise
Blogshares Spiders
ExactSeek Crawler
Nagios
nicebot
HTTrack off-line browser
Harvest
SandCrawler (Microsoft)
edgeio-retriever
NG 1.x (Exalead)
HTMLParser
Scooter
Y!J Yahoo Japan
arks
Tagyu Agent
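
Several entries above are catch-all rules rather than named robots: a visitor counts as an "Unknown robot" when its User-Agent contains ‘crawl’, ‘spider’, ‘robot’, ‘bot/’ or ‘bot-’, or when it requests robots.txt. Here is a minimal sketch of that kind of matching. The `KNOWN_ROBOTS` table, the substrings in it, and the `classify` function are all hypothetical and only for illustration; they are not the rules of whatever tool produced this list.

```python
import re

# Hypothetical subset of the names listed above, keyed by a lowercase
# substring that is ASSUMED to appear in the robot's User-Agent header.
KNOWN_ROBOTS = {
    "googlebot": "Googlebot",
    "baiduspider": "BaiDuSpider",
    "msnbot": "MSNBot",
    "mj12bot": "MJ12bot",
}

# Generic fallbacks mirroring the "Unknown robot" entries in the list.
GENERIC_PATTERNS = [
    (re.compile(r"crawl"), "Unknown robot (identified by 'crawl')"),
    (re.compile(r"spider"), "Unknown robot (identified by 'spider')"),
    (re.compile(r"robot"), "Unknown robot (identified by 'robot')"),
    (re.compile(r"bot[/-]"), "Unknown robot (identified by 'bot/' or 'bot-')"),
]


def classify(user_agent, path=""):
    """Return a robot label for the request, or None if nothing matches."""
    ua = user_agent.lower()
    # Named robots are checked first so they win over the generic rules.
    for needle, name in KNOWN_ROBOTS.items():
        if needle in ua:
            return name
    for pattern, label in GENERIC_PATTERNS:
        if pattern.search(ua):
            return label
    # Last resort: ordinary browsers rarely fetch robots.txt.
    if path == "/robots.txt":
        return "Unknown robot (identified by hit on 'robots.txt')"
    return None
```

Calling `classify(ua, path)` on each access-log line gives a rough answer to "is this visitor a spider?". The User-Agent header is easy to forge, so the result is a hint and not proof.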

3 comments on “More than 100 common web crawlers”

  1. Louis Han

    What a thorough roundup.

    1. countmeon

      Unfamiliar spiders have been showing up a lot lately, so I dug up this list to check whether they really are spiders.

  2. 娜娜lei1314

    How come I don't recognize a single one of them? Boo-hoo!
