OliWarner

User-agent strings

I need a list of the most popular spider user-agent strings. I've got several items on my website that log or increment things and I really only want some of those to be logging if the thing hitting the page is a real person and not a bot. So I'm left checking the user-agents.

I can either grab the most popular browsers or the most popular bots... Whatever is most efficient -- you decide!

Either way, time-complexity is an issue as it is a fairly busy site, so the shortest and most effective list wins =)
fpintos

Have you tried setting up robots.txt to block these spiders? This is by far the simplest way.
OliWarner

ASKER

I don't want to block them from viewing the pages -- just stop my logging script counting hits from them.
I've got them in a db, Oli.  Give me a minute to extract the bots from the browsers.
ASKER CERTIFIED SOLUTION
rdivilbiss
This solution is only available to members.
> shortest and most effective list wins
So the list of all popular bot user-agent strings comes down to:

    *Bot*

Most spider user-agent strings have the substring "Bot" in them.
Ordinary user-agent strings from IE, FF, NS, etc. do not contain "Bot".
Try filtering on this.
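
A minimal sketch of that substring filter, in Python purely for illustration (the thread never says what language the logging script uses). The names `BOT_MARKERS`, `is_probable_bot`, and `should_count_hit` are hypothetical. Only "bot" comes from the suggestion above; the extra markers "crawler" and "spider" are assumptions. The comparison is case-insensitive so that "Bot", "bot", and "BOT" all match.

```python
# Substrings that suggest a crawler. Only "bot" comes from the thread;
# "crawler" and "spider" are illustrative assumptions.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_bot(user_agent):
    """Return True if the User-Agent header looks like it belongs to a crawler."""
    if not user_agent:
        # Assumption: a missing or empty header is treated as non-human traffic.
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def should_count_hit(user_agent):
    """Decide whether the logging script should count this request."""
    return not is_probable_bot(user_agent)
```

With a list this short, each request costs only a few substring scans of one header, which should be cheap even on a busy site. A crawler whose string contains none of the markers would still be counted, so the list may need extending over time.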
I'm already using it.

From the above list I found that "CrawlerBot" is also a substring of the bot agent strings.

No, in the list above, "CrawlerBot" is not part of the user-agent string. It was a field in my database of live user agents, collected from dozens of my web sites and hand-categorized.  Ignore: ,"CrawlerBot"
Yeah, I can parse those out without issue. Thanks Rod, that looks like it'll do the job perfectly.
Those were as of January. There could always be a few new ones cropping up.