Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2009/06/18 02:36:07 UTC

[jira] Commented: (PIG-855) Filter to determine if a UserAgent string is a bot

    [ https://issues.apache.org/jira/browse/PIG-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721012#action_12721012 ] 

Dmitriy V. Ryaboy commented on PIG-855:
---------------------------------------

Jeff, the approach depends on whether you care more about false positives or false negatives.

The right way to do this is probably not to write a boolean function, but something that returns one of several codes -- known browser, known crawler, monitor, command-line tools like wget and curl, and "unknown".
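
A minimal sketch of the multi-code idea, in plain Python rather than as an actual PiggyBank UDF. The names (AgentClass, classify_user_agent) and the patterns are hypothetical and purely illustrative; a real filter would load a maintained list instead of a handful of regexes.

```python
import re
from enum import Enum


class AgentClass(Enum):
    """Hypothetical result codes, one per category named in the comment."""
    BROWSER = "browser"
    CRAWLER = "crawler"
    MONITOR = "monitor"
    TOOL = "tool"
    UNKNOWN = "unknown"


# Illustrative patterns only (assumptions, not a vetted bot list).
# Order matters: many crawlers also claim "Mozilla/...", so the
# browser rule is checked last.
_RULES = [
    (AgentClass.TOOL, re.compile(r"^(wget|curl)\b", re.IGNORECASE)),
    (AgentClass.MONITOR, re.compile(r"pingdom|nagios", re.IGNORECASE)),
    (AgentClass.CRAWLER, re.compile(r"bot\b|crawler|spider|slurp", re.IGNORECASE)),
    (AgentClass.BROWSER, re.compile(r"^(mozilla|opera)/", re.IGNORECASE)),
]


def classify_user_agent(ua):
    """Return the first matching AgentClass, or UNKNOWN if nothing matches."""
    if not ua or not ua.strip():
        return AgentClass.UNKNOWN
    for label, pattern in _RULES:
        if pattern.search(ua):
            return label
    return AgentClass.UNKNOWN
```

With codes like these, the false positive vs. false negative tradeoff becomes the caller's decision: dropping only CRAWLER records keeps "unknown" traffic in the data, while keeping only BROWSER records treats everything unrecognized as a bot.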

The IAB has a standard list of bots and spiders (http://www.iab.net/sites/login.php), and maintains an industry standard for the filters that should be applied before numbers are reported.

> Filter to determine if a UserAgent string is a bot
> --------------------------------------------------
>
>                 Key: PIG-855
>                 URL: https://issues.apache.org/jira/browse/PIG-855
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Priority: Minor
>
> A PiggyBank contrib that would allow one to filter records by whether a UserAgent string represents a bot.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.