Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2009/06/17 19:02:07 UTC
[jira] Created: (PIG-855) Filter to determine if a UserAgent string is a bot
Filter to determine if a UserAgent string is a bot
--------------------------------------------------
Key: PIG-855
URL: https://issues.apache.org/jira/browse/PIG-855
Project: Pig
Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Priority: Minor
A PiggyBank contrib that would allow one to filter records by whether a UserAgent string represents a bot.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-855) Filter to determine if a UserAgent string is a bot
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721012#action_12721012 ]
Dmitriy V. Ryaboy commented on PIG-855:
---------------------------------------
Jeff, the approach depends on whether you care more about false positives or false negatives.
The right way to do this is probably not a boolean function, but one that returns one of several codes -- known browser, known crawler, monitor, command-line tools like wget and curl, and "unknown".
IAB has a standard list of bots and spiders (http://www.iab.net/sites/login.php), and maintains an industry standard for the filters that should be applied before numbers are reported.
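A rough sketch of the multi-code idea, in plain Java with no Pig dependency so it can be read on its own. Everything here is an assumption made for illustration: the class name UserAgentClassifier, the AgentType codes, and the short substring lists are hypothetical and are not taken from PiggyBank or from the IAB list. A real version would load a maintained list and would be wrapped in a Pig UDF.

```java
import java.util.Locale;

/**
 * Sketch of a multi-code UserAgent classifier. All names and substring
 * lists are hypothetical placeholders, not PiggyBank or IAB data.
 */
final class UserAgentClassifier {

    /** One code per category suggested in this thread. */
    enum AgentType { BROWSER, CRAWLER, MONITOR, TOOL, UNKNOWN }

    // Illustrative lowercase substrings only; far from complete.
    private static final String[] TOOLS = {"wget", "curl"};
    private static final String[] MONITORS = {"pingdom", "nagios"};
    private static final String[] CRAWLERS = {"bot", "crawler", "spider", "slurp"};
    private static final String[] BROWSERS = {"firefox", "msie", "safari", "opera"};

    private UserAgentClassifier() {}

    /** Returns a category code instead of a plain bot / not-bot boolean. */
    static AgentType classify(String userAgent) {
        if (userAgent == null || userAgent.trim().isEmpty()) {
            return AgentType.UNKNOWN;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);

        // Order matters: many crawlers send browser-like strings that begin
        // with "Mozilla", so the browser check comes last.
        if (containsAny(ua, TOOLS)) {
            return AgentType.TOOL;
        }
        if (containsAny(ua, MONITORS)) {
            return AgentType.MONITOR;
        }
        if (containsAny(ua, CRAWLERS)) {
            return AgentType.CRAWLER;
        }
        if (containsAny(ua, BROWSERS)) {
            return AgentType.BROWSER;
        }
        return AgentType.UNKNOWN;
    }

    private static boolean containsAny(String haystack, String[] needles) {
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

A boolean filter can still be derived from the code. Whether UNKNOWN is treated as a bot is then the caller's decision, which is where the false positive versus false negative trade-off gets made.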
> Filter to determine if a UserAgent string is a bot
> --------------------------------------------------
>
> Key: PIG-855
> URL: https://issues.apache.org/jira/browse/PIG-855
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Priority: Minor
>
> A PiggyBank contrib that would allow one to filter records by whether a UserAgent string represents a bot.