Posted to dev@flume.apache.org by "Nicholas Verbeck (JIRA)" <ji...@apache.org> on 2011/08/04 04:01:28 UTC

[jira] [Created] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

RegexAllExtractor doesn't ignore empty groups
---------------------------------------------

                 Key: FLUME-722
                 URL: https://issues.apache.org/jira/browse/FLUME-722
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v0.9.4
            Reporter: Nicholas Verbeck
            Assignee: Nicholas Verbeck
            Priority: Minor


Hi flume devs.

I saw a bug when using RegexAllExtractor: line 94:

         if(names.get(grp-1) != ""){
           Attributes.setString(e, names.get(grp-1), val);
         }

Please help file a JIRA and correct it to use String.equals(); otherwise it doesn't ignore empty groups.

(I don't think I can open an issue at the Cloudera JIRA.)

Thanks,
Mingjie
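
To illustrate the report, here is a minimal standalone sketch of the difference between the two comparisons. It is not Flume code: the class EmptyNameCheck and both method names are made up for this example, and only the `!= ""` condition is taken from the snippet above.

```java
public class EmptyNameCheck {
    // Mirrors the reported condition: compares object references, not contents.
    static boolean keepByIdentity(String name) {
        return name != "";
    }

    // Compares contents, as the report suggests.
    static boolean keepByEquals(String name) {
        return !name.equals("");
    }

    public static void main(String[] args) {
        // An empty string built at runtime is a different object from the literal "".
        String parsed = new String(new char[0]);
        System.out.println(keepByIdentity(parsed)); // true: the empty name is wrongly kept
        System.out.println(keepByEquals(parsed));   // false: the empty name is skipped
        System.out.println(keepByIdentity(""));     // false: literals share one interned object
    }
}
```

The last line shows why the problem can stay hidden: when the empty name is itself a literal, the reference check happens to work.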

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-722:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> RegexAllExtractor doesn't ignore empty groups
> ---------------------------------------------
>
>                 Key: FLUME-722
>                 URL: https://issues.apache.org/jira/browse/FLUME-722
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Nicholas Verbeck
>            Assignee: Mingjie Lai
>            Priority: Minor
>              Labels: decorator, regexall
>             Fix For: v0.9.5
>
>         Attachments: FLUME-722.patch
>
>
> Hi flume devs.
> I saw a bug when using RegexAllExtractor: line 94:
>          if(names.get(grp-1) != ""){
>            Attributes.setString(e, names.get(grp-1), val);
>          }
> Please help file a JIRA and correct it to use String.equals(); otherwise it doesn't ignore empty groups.
> (I don't think I can open an issue at the Cloudera JIRA.)
> Thanks,
> Mingjie


        

[jira] [Updated] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Mingjie Lai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingjie Lai updated FLUME-722:
------------------------------

    Attachment: FLUME-722.patch

Patch available. 


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129329#comment-13129329 ] 

Hudson commented on FLUME-722:
------------------------------

Integrated in flume-728 #40 (See [https://builds.apache.org/job/flume-728/40/])
    FLUME-722. Transactional memory channel implementation.

(Prasad Mujumdar via Arvind Prabhakar)

arvind : http://svn.apache.org/viewvc/?view=rev&rev=1185412
Files : 
* /incubator/flume/branches/flume-728/flume-ng-core/src/main/java/org/apache/flume/channel/MemoryChannel.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/main/java/org/apache/flume/channel/MultiOpMemChannel.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/main/java/org/apache/flume/channel/PseudoTxnMemoryChannel.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/test/java/org/apache/flume/channel/TestMemoryChannelTransaction.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/test/java/org/apache/flume/sink/TestAvroSink.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/test/java/org/apache/flume/sink/TestLoggerSink.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/test/java/org/apache/flume/sink/TestRollingFileSink.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/test/java/org/apache/flume/source/TestAvroSource.java
* /incubator/flume/branches/flume-728/flume-ng-core/src/test/java/org/apache/flume/source/TestSequenceGeneratorSource.java
* /incubator/flume/branches/flume-728/flume-ng-node/src/test/java/org/apache/flume/source/TestNetcatSource.java
* /incubator/flume/branches/flume-728/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java

                

        

[jira] [Issue Comment Edited] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Mingjie Lai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079579#comment-13079579 ] 

Mingjie Lai edited comment on FLUME-722 at 8/4/11 8:40 PM:
-----------------------------------------------------------

I cannot reproduce it with test cases -- TestExtractors. 

But in a Flume deployment, I have a Flume node configured as:

<code>
tail ("/tmp/log.log") | regexAll( "(.+)\\t(.+)\\t(.+)\\t(.+)\\t(.+)", "", "license", "timestamp", "rating", "url") => text("/tmp/aa.txt")
</code>

What I saw is:
<code>
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] {  : 210.5.102.6 } { license : xxx } { rating : 38 } { tailSrcFile : (long)7382926376034922343  (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx}
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] {  : 194.78.100.30 } { license : xxx } { rating : 41 } { tailSrcFile : (long)7382926376034922343  (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx} 
</code>

It's an obvious Java string comparison bug, so there are no test cases for this issue. 
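
For context, the per-group naming that the configuration above relies on can be sketched roughly as follows. This is a hedged standalone approximation, not the RegexAllExtractor source: the class GroupNamer, the extract method, the use of Matcher.find(), and the sample input line are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNamer {
    // Maps capture group i (1-based) to names.get(i - 1), skipping empty names.
    static Map<String, String> extract(String regex, List<String> names, String line) {
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.find()) {
            return attrs; // no match, nothing to attach
        }
        for (int grp = 1; grp <= m.groupCount() && grp <= names.size(); grp++) {
            String name = names.get(grp - 1);
            // Content comparison, so any empty name is skipped regardless of identity.
            if (!"".equals(name)) {
                attrs.put(name, m.group(grp));
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        // The first name is an empty string built at runtime, as a parser might produce.
        List<String> names = Arrays.asList(
            new String(new char[0]), "license", "timestamp", "rating", "url");
        // Invented sample line with five tab-separated fields.
        String line = "1.2.3.4\txxxLicense\t1304406740\t42\txxxUrl";
        System.out.println(extract("(.+)\\t(.+)\\t(.+)\\t(.+)\\t(.+)", names, line));
    }
}
```

With a content comparison, the first group (the IP address here) gets no attribute, which is the behavior the empty name in the configuration is meant to request.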



        

[jira] [Assigned] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh reassigned FLUME-722:
------------------------------------

    Assignee: Mingjie Lai  (was: Nicholas Verbeck)


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079876#comment-13079876 ] 

Jonathan Hsieh commented on FLUME-722:
--------------------------------------

OK, I've been able to reproduce the problem, and it seems to come from the parsing of the config.  Digging deeper.


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079877#comment-13079877 ] 

Jonathan Hsieh commented on FLUME-722:
--------------------------------------

Here's a test that fails before and succeeds after the fix.  Could you fold this into TestExtractors, please?

{code}
  /**
   * The parser creates an empty string that is not the canonical (interned) ""
   * literal, so the check must use equals() rather than ==.
   */
  @Test
  public void testRegexAllExtractorEmptyProblem() throws IOException,
      InterruptedException, FlumeSpecException {
    final MemorySinkSource mem = new MemorySinkSource();
    mem.open();
    SinkFactoryImpl sf = new SinkFactoryImpl();
    sf.setSink("mem", new SinkBuilder() {
      @Override
      public EventSink build(Context context, String... argv) {
        return mem;
      }
      
    });
    FlumeBuilder.setSinkFactory(sf);
    RegexAllExtractor re = (RegexAllExtractor) FlumeBuilder.buildSink(
        LogicalNodeContext.testingContext(), "regexAll(\"(.+)\\\\t(.+)\","
            + "\"\", \"keep\") mem");

    re.open();
    re.append(new EventImpl("ignoreme\tkeepme".getBytes()));
    re.close();

    mem.close();
    mem.open();
    Event e1 = mem.next();
    assertEquals(null, Attributes.readString(e1, ""));
    assertEquals("keepme", Attributes.readString(e1, "keep"));
  }
{code}

Thanks,
Jon.


        

[jira] [Updated] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-722:
---------------------------------

    Fix Version/s: v0.9.5


        

[jira] [Updated] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-722:
---------------------------------

    Status: Patch Available  (was: Open)


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Mingjie Lai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081283#comment-13081283 ] 

Mingjie Lai commented on FLUME-722:
-----------------------------------

@jon Thanks for your comments. 

I created https://review.cloudera.org/r/1891/ for easier review, and I also posted a new patch there that includes the test case. 


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081459#comment-13081459 ] 

Jonathan Hsieh commented on FLUME-722:
--------------------------------------

Looks good.  Thanks, Mingjie!

Still waiting on svn import so committing to github.com for now.  Will close issue when pushed to apache svn.


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079867#comment-13079867 ] 

Jonathan Hsieh commented on FLUME-722:
--------------------------------------

@Mingjie

Nice catch!   I agree this is a bad code smell. Am I right to assume that after the patch, the { : 210.5.102.6 } and { :194.78.100.30 } parts are not present?

I'm curious how to reproduce this and why it happens, and I'd like to figure that out before we commit.  I've tried to write a test case that makes an "" attribute show up, but I can't seem to do it either.  Can you give a scrubbed example line?  Do they look like:

"1.2.3.4\txxxLicense\t1304406740\t42\txxxUrl"

Also, a style nit -- can you invert the equals expression to eliminate the possibility of an NPE?  This could be done by changing: 
!names.get(grp-1).equals("")
into 
!"".equals(names.get(grp-1))
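
A small standalone sketch of the difference between the two forms follows. The class NullSafeEquals and its method names are hypothetical and exist only for this illustration; they are not part of the patch.

```java
public class NullSafeEquals {
    // Literal-first form: a null name simply compares unequal to "".
    static boolean isNamed(String name) {
        return !"".equals(name);
    }

    // Receiver-first form: throws NullPointerException when name is null.
    static boolean isNamedUnsafe(String name) {
        return !name.equals("");
    }

    public static void main(String[] args) {
        System.out.println(isNamed(null)); // true, and no exception is thrown
        try {
            isNamedUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from the receiver-first form");
        }
    }
}
```

Note that the literal-first form treats a null name as non-empty, so a caller that could actually see null names would still need to decide how to handle them.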


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Mingjie Lai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079579#comment-13079579 ] 

Mingjie Lai commented on FLUME-722:
-----------------------------------

I cannot reproduce it with test cases -- TestExtractors. 

But in a Flume deployment, I have a Flume node configured as:

<code>
tail ("/tmp/log.log") | regexAll( "(.+)\\t(.+)\\t(.+)\\t(.+)\\t(.+)", "", "license", "timestamp", "rating", "url") => text("/tmp/aa.txt")
</code>

What I saw is:
<code>
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] {  : 210.5.102.6 } { license : xxx } { rating : 38 } { tailSrcFile : (long)7382926376034922343  (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx}
localhost [INFO Thu Aug 04 13:05:12 PDT 2011] {  : 194.78.100.30 } { license : xxx } { rating : 41 } { tailSrcFile : (long)7382926376034922343  (string) 'full.log' (double)3.641231503360977E185 } { timestamp : 1304406740 } { url : xxx} 
</code>

It's an obvious Java string comparison bug, so there are no test cases for this issue. 



        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079883#comment-13079883 ] 

Jonathan Hsieh commented on FLUME-722:
--------------------------------------

Actually, instead of the test in my previous comment, here is a smaller change that better demonstrates the root cause (the second call to names.add is different, and I added an assert). 

Can you modify the existing function to be this instead?  

{code}
  @Test
  public void testRegexAllExtractor() throws IOException, InterruptedException {
    MemorySinkSource mem = new MemorySinkSource();
    mem.open();
    ArrayList<String> names = new ArrayList<String>();
    names.add("d1");
    // when parsed, a separate instance of "" is created
    names.add(new String(new byte[0]));
    names.add("d2");

    RegexAllExtractor re = new RegexAllExtractor(mem, "(\\d):(\\d):(\\d)",
        names);

    re.open();
    re.append(new EventImpl("1:2:3.4foobar5".getBytes()));
    re.close();

    mem.close();
    mem.open();
    Event e1 = mem.next();
    assertEquals("1", Attributes.readString(e1, "d1"));
    assertEquals(null, Attributes.readString(e1, ""));
    assertEquals("3", Attributes.readString(e1, "d2"));
  }
{code}

Thanks,
Jon.
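
As a hedged aside on why a test that passes the literal "" cannot catch this, the standalone sketch below applies the pre-fix `!= ""` condition to two name lists. The class LiteralHidesBug and its method are invented for illustration, and the loop only stands in for the extractor's per-group check.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralHidesBug {
    // Applies the pre-fix reference comparison to every configured name.
    static int countKeptByIdentity(List<String> names) {
        int kept = 0;
        for (String name : names) {
            if (name != "") {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> fromLiterals = new ArrayList<String>();
        fromLiterals.add("d1");
        fromLiterals.add("");   // same interned object as the literal in the check
        fromLiterals.add("d2");

        List<String> fromParser = new ArrayList<String>();
        fromParser.add("d1");
        fromParser.add(new String(new byte[0])); // distinct empty instance
        fromParser.add("d2");

        System.out.println(countKeptByIdentity(fromLiterals)); // 2: bug stays hidden
        System.out.println(countKeptByIdentity(fromParser));   // 3: empty name leaks through
    }
}
```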


        

[jira] [Commented] (FLUME-722) RegexAllExtractor doesn't ignore empty groups

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082202#comment-13082202 ] 

jiraposter@reviews.apache.org commented on FLUME-722:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1448/
-----------------------------------------------------------

(Updated 2011-08-10 07:11:31.240611)


Review request for Flume and jmhsieh.


Changes
-------

Updated to include a link to the JIRA issue.


Summary
-------

Reposted for Mingjie Lai (was originally posted and reviewed on https://review.cloudera.org/r/1891/)


This addresses bug flume-722.
    https://issues.apache.org/jira/browse/flume-722


Diffs
-----

  /trunk/flume-core/src/main/java/com/cloudera/flume/core/extractors/RegexAllExtractor.java 1155987 
  /trunk/flume-core/src/test/java/com/cloudera/flume/core/extractors/TestExtractors.java 1155987 

Diff: https://reviews.apache.org/r/1448/diff


Testing
-------

Ran test, it passed.


Thanks,

jmhsieh


