Posted to dev@flume.apache.org by "Mingjie Lai (JIRA)" <ji...@apache.org> on 2011/08/04 22:41:27 UTC

[jira] [Issue Comment Edited] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079579#comment-13079579 ] 

Mingjie Lai edited comment on FLUME-722 at 8/4/11 8:40 PM:
-----------------------------------------------------------

I cannot reproduce it with the test cases in TestExtractors.

But in a Flume deployment, I have a Flume node configured as:

<code>
tail ("/tmp/log.log") | regexAll( "(.+)\\t(.+)\\t(.+)\\t(.+)\\t(.+)", "", "license", "timestamp", "rating", "url") => text("/tmp/aa.txt")
</code>

What I saw is:
<code>
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] {  : 210.5.102.6 } { license : xxx } { rating : 38 } { tailSrcFile : (long)7382926376034922343  (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx}
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] {  : 194.78.100.30 } { license : xxx } { rating : 41 } { tailSrcFile : (long)7382926376034922343  (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx} 
</code>

It's an obvious Java string comparison bug, so there are no test cases for this issue.
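
A minimal sketch of why a check like this could pass in unit tests yet fail in a deployment. It assumes that the names in a deployment come from a parsed configuration string and are therefore distinct String objects, while names in a test are compile-time literals; that is an assumption, not something confirmed here. The class and method names are hypothetical and are not part of Flume.

```java
// Hypothetical illustration; not Flume code.
class EmptyNameCheck {
    // Mirrors the reported check: compares object references, not contents.
    static boolean keptByReferenceCheck(String name) {
        return name != "";
    }

    // Compares contents, as the issue description suggests.
    static boolean keptByEqualsCheck(String name) {
        return !"".equals(name);
    }

    public static void main(String[] args) {
        // A literal "" is the same interned object as the "" inside the check.
        String literal = "";
        // Assumption: a config parser yields a distinct String object.
        String parsed = new String("");

        System.out.println(keptByReferenceCheck(literal)); // false: empty name dropped
        System.out.println(keptByReferenceCheck(parsed));  // true: empty name wrongly kept
        System.out.println(keptByEqualsCheck(parsed));     // false: empty name dropped
    }
}
```

With the reference check, the empty name from the parsed string is treated as a real attribute name, which would match the `{  : 210.5.102.6 }` entry in the output above.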


> RegexAllExtractor doesn't ignore empty groups
> ---------------------------------------------
>
>                 Key: FLUME-722
>                 URL: https://issues.apache.org/jira/browse/FLUME-722
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Nicholas Verbeck
>            Assignee: Nicholas Verbeck
>            Priority: Minor
>              Labels: decorator, regexall
>
> Hi flume devs.
> I saw a bug when using RegexAllExtractor, at line 94:
>          if(names.get(grp-1) != ""){
>            Attributes.setString(e, names.get(grp-1), val);
>          }
> Please help to file a JIRA and correct it to use String.equals(); otherwise it doesn't ignore empty groups.
> (I don't think I can open an issue at the Cloudera JIRA.)
> Thanks,
> Mingjie

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira