Posted to dev@flume.apache.org by "Mingjie Lai (JIRA)" <ji...@apache.org> on 2011/08/04 22:41:27 UTC
[jira] [Issue Comment Edited] (FLUME-722) RegexAllExtractor doesn't ignore empty groups
[ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079579#comment-13079579 ]
Mingjie Lai edited comment on FLUME-722 at 8/4/11 8:40 PM:
-----------------------------------------------------------
I cannot reproduce it with the test cases in TestExtractors.
But in a Flume deployment, I have a Flume node configured as:
<code>
tail ("/tmp/log.log") | regexAll( "(.+)\\t(.+)\\t(.+)\\t(.+)\\t(.+)", "", "license", "timestamp", "rating", "url") => text("/tmp/aa.txt")
</code>
What I saw is:
<code>
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] { : 210.5.102.6 } { license : xxx } { rating : 38 } { tailSrcFile : (long)7382926376034922343 (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx}
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] { : 194.78.100.30 } { license : xxx } { rating : 41 } { tailSrcFile : (long)7382926376034922343 (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx}
</code>
It's an obvious Java string comparison bug, so there are no test cases for this issue.
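A minimal sketch of why the comparison can pass in a unit test yet fail on a running node. It assumes (not verified against the Flume source) that the test supplies the names as string literals, while the node builds them at runtime from the parsed configuration. The class `EmptyGroupNameDemo` and the helper `extract` are hypothetical and are not Flume code. Only the `!= ""` check comes from the issue description.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of Flume.
class EmptyGroupNameDemo {

    // Hypothetical helper: maps each group name to its value and skips
    // groups whose name is empty, comparing by content rather than reference.
    static Map<String, String> extract(List<String> names, List<String> values) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int grp = 1; grp <= names.size() && grp <= values.size(); grp++) {
            String name = names.get(grp - 1);
            if (!name.isEmpty()) { // content check, unlike name != ""
                attrs.put(name, values.get(grp - 1));
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        String literal = "";             // interned literal, as a test might pass
        String runtime = new String(""); // distinct object, as a parser might produce

        // != compares references, so the result depends on how the string was created.
        System.out.println(literal != "");     // false: same interned object
        System.out.println(runtime != "");     // true: empty, but a different object
        System.out.println(runtime.isEmpty()); // true: content comparison

        // Assumption for illustration: the first group name is empty and should be skipped.
        List<String> names = Arrays.asList(runtime, "license", "timestamp");
        List<String> values = Arrays.asList("210.5.102.6", "xxx", "1304406740");
        System.out.println(extract(names, values).keySet()); // [license, timestamp]
    }
}
```

With the reference comparison, the empty name built at runtime passes the check, which would be consistent with the `{ : 210.5.102.6 }` attribute in the output above.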
was (Author: mingjielai):
I cannot reproduce it with the test cases in TestExtractors.
But in a Flume deployment, I have a Flume node configured as:
<code>
tail ("/tmp/log.log") | regexAll( "(.+)\\t(.+)\\t(.+)\\t(.+)\\t(.+)", "", "license", "timestamp", "rating", "url") => text("/tmp/aa.txt")
</code>
What I saw is:
<code>
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] { : 210.5.102.6 } { license : xxx } { rating : 38 } { tailSrcFile : (long)7382926376034922343 (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx}
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] { : 194.78.100.30 } { license : xxx } { rating : 41 } { tailSrcFile : (long)7382926376034922343 (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx}
</code>
It's an obvious Java string comparison bug, so there are no test cases for this issue.
> RegexAllExtractor doesn't ignore empty groups
> ---------------------------------------------
>
> Key: FLUME-722
> URL: https://issues.apache.org/jira/browse/FLUME-722
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v0.9.4
> Reporter: Nicholas Verbeck
> Assignee: Nicholas Verbeck
> Priority: Minor
> Labels: decorator, regexall
>
> Hi Flume devs.
> I saw a bug when using RegexAllExtractor, at line 94:
> if(names.get(grp-1) != ""){
> Attributes.setString(e, names.get(grp-1), val);
> }
> Please help to file a JIRA and correct it to use String.equals(); otherwise it doesn't ignore empty groups.
> (I don't think I can open an issue in the Cloudera JIRA.)
> Thanks,
> Mingjie