Posted to dev@flume.apache.org by "Eran Kutner (JIRA)" <ji...@apache.org> on 2011/08/10 11:11:27 UTC

[jira] [Created] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

escapedFormatDfs goes into a file creation frenzy
-------------------------------------------------

                 Key: FLUME-734
                 URL: https://issues.apache.org/jira/browse/FLUME-734
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v0.9.4
         Environment: CentOS 5.6
            Reporter: Eran Kutner
            Priority: Critical
         Attachments: flume.log

Using this configuration:
collectorSource(54001) | collector(600000) { escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/", "events-%{rolltag}-col1.snappy", seqfile("SnappyCodec")) }

The expected behavior is to see a new file created every 10 minutes. However, once in a while the collector goes into a file creation frenzy, creating a new file every second.
The log indicates that writing failed with the error "OutputFormat instance can only write to the same OutputStream", which causes the file to be closed and a new one to be opened, only for that one to be closed again.
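For scale: per the report, collector(600000) should roll once every 10 minutes, so its argument is a roll period of 600,000 ms. The minimal Java check below works through that arithmetic. It also counts how many files a one-file-per-second frenzy would open in the same window. The class and method names are illustrative only, not Flume API.

```java
import java.util.concurrent.TimeUnit;

// Illustrative arithmetic only; not part of Flume.
class RollMath {
    // Roll period passed to collector(...), in milliseconds (10 minutes per the report).
    static final long ROLL_MILLIS = 600000L;

    // Expected roll interval in minutes.
    static long rollMinutes() {
        return TimeUnit.MILLISECONDS.toMinutes(ROLL_MILLIS);
    }

    // Files opened in one roll window if a new file is created every secondsPerFile seconds.
    static long filesPerWindow(long secondsPerFile) {
        return TimeUnit.MILLISECONDS.toSeconds(ROLL_MILLIS) / secondsPerFile;
    }

    public static void main(String[] args) {
        System.out.println(rollMinutes());     // prints 10
        System.out.println(filesPerWindow(1)); // prints 600
    }
}
```

So in the failure mode, one roll window can yield roughly 600 files instead of one.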

Looking at the code, I'm not even sure how the output stream could change, but the behavior I'm seeing feels like some sort of race condition. It happens much more often under heavy load than under light load.

See attached log excerpt.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Work started] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on FLUME-734 started by Jonathan Hsieh.


[jira] [Assigned] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh reassigned FLUME-734:
------------------------------------

    Assignee: Jonathan Hsieh


[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118427#comment-13118427 ] 

jiraposter@reviews.apache.org commented on FLUME-734:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2129/#review2230
-----------------------------------------------------------



flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java
<https://reviews.apache.org/r/2129/#comment5178>

    Added context here.


- jmhsieh


On 2011-09-30 20:47:52, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2129/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-30 20:47:52)
bq.  
bq.  
bq.  Review request for Flume, Prasad Mujumdar and Eric Sammer.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Output formats are now function expressions, and a new output format is created for each CustomDfsSink. (The bug occurs when an escapedFormatDfs writes multiple files through a single shared output format object.)
bq.  
bq.  commit 20a85af7d21a2a33c63903c794b4bd0d3dd2be02
bq.  Author: Jonathan Hsieh <jo...@cloudera.com>
bq.  Date:   Wed Aug 10 09:18:02 2011 -0700
bq.  
bq.      FLUME-734: escapedFormatDfs goes into a file creation frenzy
bq.  
bq.  
bq.  This addresses bug flume-734.
bq.      https://issues.apache.org/jira/browse/flume-734
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-core/src/main/java/com/cloudera/flume/conf/SinkBuilderUtil.java 2aaa566 
bq.    flume-core/src/main/java/com/cloudera/flume/handlers/hdfs/EscapedCustomDfsSink.java 20ebdfd 
bq.    flume-core/src/test/java/com/cloudera/flume/handlers/hdfs/TestEscapedCustomOutputDfs.java 7618acd 
bq.    flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 3096f59 
bq.    flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 8ecfed1 
bq.  
bq.  Diff: https://reviews.apache.org/r/2129/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Tests pass.
bq.  
bq.  From comments on jira, an earlier version of this patch is working for a few folks already.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                

[jira] [Resolved] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh resolved FLUME-734.
----------------------------------

       Resolution: Fixed
    Fix Version/s: v0.9.5
    
> escapedFormatDfs goes into a file creation frenzy
> -------------------------------------------------
>
>                 Key: FLUME-734
>                 URL: https://issues.apache.org/jira/browse/FLUME-734
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>         Environment: CentOS 5.6
>            Reporter: Eran Kutner
>            Assignee: Jonathan Hsieh
>            Priority: Critical
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-734-escapedFormatDfs-goes-into-a-file-creation.patch, FLUME-734-draft.patch, flume.log

[jira] [Updated] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Luke Forehand (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Forehand updated FLUME-734:
--------------------------------

    Comment: was deleted

(was: We are struggling with this same problem, we are seeing the exception with version 0.9.4
{{{
011-09-19 17:39:42,772 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode sink.hdfs-24 exited with error: OutputFormat instance can only write to the same OutputStream
java.io.IOException: OutputFormat instance can only write to the same OutputStream
    at com.ni.flume.outputformat.SeqFileJsonOutputFormat.format(SeqFileJsonOutputFormat.java:78)
    at com.cloudera.flume.handlers.hdfs.CustomDfsSink.append(CustomDfsSink.java:80)
    at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.append(EscapedCustomDfsSink.java:123)
    at com.cloudera.flume.core.CompositeSink.append(CompositeSink.java:61)
    at com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:234)
}}})


[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Luke Forehand (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109054#comment-13109054 ] 

Luke Forehand commented on FLUME-734:
-------------------------------------

We are struggling with the same problem; we are seeing the exception with version 0.9.4:

011-09-19 17:39:42,772 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode sink.hdfs-24 exited with error: OutputFormat instance can only write to the same OutputStream
java.io.IOException: OutputFormat instance can only write to the same OutputStream


[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082368#comment-13082368 ] 

Jonathan Hsieh commented on FLUME-734:
--------------------------------------

The root cause is that escapedFormatDfs reuses a single output format object for every file it writes, while the seqfile output format assumes it is only ever used by a single file handle. I've been able to reliably reproduce this problem with a test case and am working on a patch now.
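The failure mode described above can be sketched in plain Java. `StreamBoundFormat` is a hypothetical stand-in, not Flume's seqfile format. It binds to the first `OutputStream` it writes to and throws the message from the report for any other stream. With one instance shared across two files, the second file's write fails; with one instance per file, both writes succeed. In this sketch the failure depends only on which streams the shared instance has already seen, not on thread timing. One plausible reading of the frenzy follows: every replacement file brings a new stream, which the shared instance also rejects, so each newly opened file is closed again.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a stateful output format: it binds itself to the
// first stream it writes to, mirroring the error message in the report.
class StreamBoundFormat {
    private OutputStream boundTo;

    void format(OutputStream out, String event) throws IOException {
        if (boundTo == null) {
            boundTo = out; // first use: remember this stream
        } else if (boundTo != out) {
            throw new IOException(
                "OutputFormat instance can only write to the same OutputStream");
        }
        out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
    }
}

class SharedFormatDemo {
    // Two files written through one shared format instance: the second write throws.
    static boolean sharedInstanceFails() {
        StreamBoundFormat shared = new StreamBoundFormat();
        OutputStream fileA = new ByteArrayOutputStream();
        OutputStream fileB = new ByteArrayOutputStream();
        try {
            shared.format(fileA, "event for file A");
            shared.format(fileB, "event for file B");
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    // One format instance per file: both writes succeed and produce bytes.
    static boolean perFileInstancesSucceed() {
        ByteArrayOutputStream fileA = new ByteArrayOutputStream();
        ByteArrayOutputStream fileB = new ByteArrayOutputStream();
        try {
            new StreamBoundFormat().format(fileA, "event for file A");
            new StreamBoundFormat().format(fileB, "event for file B");
            return fileA.size() > 0 && fileB.size() > 0;
        } catch (IOException e) {
            return false;
        }
    }
}
```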


[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Eric Hauser (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105411#comment-13105411 ] 

Eric Hauser commented on FLUME-734:
-----------------------------------

When trying this patch using avrodata, I got the following:


java.lang.NullPointerException
        at com.cloudera.flume.conf.SinkBuilderUtil.resolveOutputFormat(SinkBuilderUtil.java:59)
        at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.openWriter(EscapedCustomDfsSink.java:109)
        at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.append(EscapedCustomDfsSink.java:131)
        at com.cloudera.flume.core.CompositeSink.append(CompositeSink.java:61)
        at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
        at com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:234)
        at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:183)
        at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:181)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)



[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118581#comment-13118581 ] 

Jonathan Hsieh commented on FLUME-734:
--------------------------------------

Committed.
                

[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099168#comment-13099168 ] 

Jonathan Hsieh commented on FLUME-734:
--------------------------------------

Good to hear! 

I don't expect TestRollSink to fail; that failure might come from a different patch in progress. Did you modify the patch I posted? If so, it would be great if you could share that change!


[jira] [Updated] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-734:
---------------------------------

    Attachment: FLUME-734-draft.patch

@Eran, I've attached a draft fix that I was working on.  I haven't been able to test it as much as I would like.  Take a look and give it a try.


[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118202#comment-13118202 ] 

jiraposter@reviews.apache.org commented on FLUME-734:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2129/#review2221
-----------------------------------------------------------

Ship it!


Looks fine to me.
Note that TestRollSink.java gets a new test case as part of FLUME-768, and that new test calls EscapedCustomDfsSink. We'll need to make a minor change to that test once both patches are committed.

thanx
Prasad
 

- Prasad


On 2011-09-30 15:09:20, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2129/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-30 15:09:20)
bq.  
bq.  
bq.  Review request for Flume, Prasad Mujumdar and Eric Sammer.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Output formats are now function expressions, and a new output format is created for each CustomDfsSink. (The bug occurs when an escapedFormatDfs writes multiple files through a single shared output format object.)
bq.  
bq.  commit 20a85af7d21a2a33c63903c794b4bd0d3dd2be02
bq.  Author: Jonathan Hsieh <jo...@cloudera.com>
bq.  Date:   Wed Aug 10 09:18:02 2011 -0700
bq.  
bq.      FLUME-734: escapedFormatDfs goes into a file creation frenzy
bq.  
bq.  
bq.  This addresses bug flume-734.
bq.      https://issues.apache.org/jira/browse/flume-734
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 8ecfed1 
bq.    flume-core/src/main/java/com/cloudera/flume/conf/SinkBuilderUtil.java 2aaa566 
bq.    flume-core/src/main/java/com/cloudera/flume/handlers/hdfs/EscapedCustomDfsSink.java 20ebdfd 
bq.    flume-core/src/test/java/com/cloudera/flume/handlers/hdfs/TestEscapedCustomOutputDfs.java 7618acd 
bq.    flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 1fd788f 
bq.  
bq.  Diff: https://reviews.apache.org/r/2129/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Tests pass.
bq.  
bq.  From comments on jira, an earlier version of this patch is working for a few folks already.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                

[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118428#comment-13118428 ] 

jiraposter@reviews.apache.org commented on FLUME-734:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2129/
-----------------------------------------------------------

(Updated 2011-09-30 20:47:52.242185)


Review request for Flume, Prasad Mujumdar and Eric Sammer.


Changes
-------

After FLUME-768, a minor tweak was required for the patch to compile and pass its tests.


Summary
-------

Output formats are now changed into function expressions, and a new output format instance is created for each CustomDfsSink. (The bug occurred when an escapedFormatDfs was writing multiple files and all of them shared a single OutputFormat object.)
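The summary pairs two facts: the error message suggests an output format instance remembers the first stream it wrote to, and one escapedFormatDfs can have several files open. The following is a minimal sketch of why that combination fails and why a per-file instance fixes it. It is not the Flume code: StatefulFormat, EscapingSink and run() are hypothetical stand-ins, and the "bind to the first stream" guard is an assumption modeled on the quoted error text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PerSinkFormatSketch {

  // Hypothetical stateful output format: binds to the first stream it writes
  // to and rejects any other stream afterwards (assumed behavior).
  static class StatefulFormat {
    private OutputStream boundStream; // null until the first write

    void format(OutputStream out, String event) throws IOException {
      if (boundStream == null) {
        boundStream = out;
      } else if (boundStream != out) {
        throw new IOException("OutputFormat instance can only write to the same OutputStream");
      }
      out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
    }
  }

  // Hypothetical escaping sink: one stream per resolved path; the format for a
  // path is obtained from a factory the first time that path is seen.
  static class EscapingSink {
    private final Supplier<StatefulFormat> formatFactory;
    private final Map<String, OutputStream> streams = new HashMap<>();
    private final Map<String, StatefulFormat> formats = new HashMap<>();

    EscapingSink(Supplier<StatefulFormat> formatFactory) {
      this.formatFactory = formatFactory;
    }

    void append(String path, String event) throws IOException {
      OutputStream out = streams.computeIfAbsent(path, p -> new ByteArrayOutputStream());
      StatefulFormat fmt = formats.computeIfAbsent(path, p -> formatFactory.get());
      fmt.format(out, event);
    }
  }

  // Writes one event to each of two paths; returns "ok" or the error message.
  static String run(boolean shareOneFormat) {
    Supplier<StatefulFormat> factory;
    if (shareOneFormat) {
      StatefulFormat shared = new StatefulFormat();
      factory = () -> shared;        // old wiring: one instance for every path
    } else {
      factory = StatefulFormat::new; // fixed wiring: a fresh instance per path
    }
    EscapingSink sink = new EscapingSink(factory);
    try {
      sink.append("/raw-events/2011-08-10/", "event-1");
      sink.append("/raw-events/2011-08-11/", "event-2");
      return "ok";
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println("shared format:   " + run(true));
    System.out.println("per-path format: " + run(false));
  }
}
```

Passing a factory instead of a ready-made instance is what lets each open file get its own format object; a format that caches its stream cannot safely be shared between files.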

commit 20a85af7d21a2a33c63903c794b4bd0d3dd2be02
Author: Jonathan Hsieh <jo...@cloudera.com>
Date:   Wed Aug 10 09:18:02 2011 -0700

    FLUME-734: escapedFormatDfs goes into a file creation frenzy


This addresses bug flume-734.
    https://issues.apache.org/jira/browse/flume-734


Diffs (updated)
-----

  flume-core/src/main/java/com/cloudera/flume/conf/SinkBuilderUtil.java 2aaa566 
  flume-core/src/main/java/com/cloudera/flume/handlers/hdfs/EscapedCustomDfsSink.java 20ebdfd 
  flume-core/src/test/java/com/cloudera/flume/handlers/hdfs/TestEscapedCustomOutputDfs.java 7618acd 
  flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 3096f59 
  flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 8ecfed1 

Diff: https://reviews.apache.org/r/2129/diff


Testing
-------

Tests pass.

From comments on jira, an earlier version of this patch is working for a few folks already.


Thanks,

jmhsieh


                

[jira] [Updated] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated FLUME-734:
---------------------------------

    Attachment: 0001-FLUME-734-escapedFormatDfs-goes-into-a-file-creation.patch
    
> escapedFormatDfs goes into a file creation frenzy
> -------------------------------------------------
>
>                 Key: FLUME-734
>                 URL: https://issues.apache.org/jira/browse/FLUME-734
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>         Environment: CentOS 5.6
>            Reporter: Eran Kutner
>            Assignee: Jonathan Hsieh
>            Priority: Critical
>             Fix For: v0.9.5
>
>         Attachments: 0001-FLUME-734-escapedFormatDfs-goes-into-a-file-creation.patch, FLUME-734-draft.patch, flume.log
>
>

[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Eran Kutner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096852#comment-13096852 ] 

Eran Kutner commented on FLUME-734:
-----------------------------------

Any progress? Is there anything I can help with?

> escapedFormatDfs goes into a file creation frenzy
> -------------------------------------------------
>
>                 Key: FLUME-734
>                 URL: https://issues.apache.org/jira/browse/FLUME-734
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>         Environment: CentOS 5.6
>            Reporter: Eran Kutner
>            Assignee: Jonathan Hsieh
>            Priority: Critical
>         Attachments: flume.log
>
>

[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Eran Kutner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098670#comment-13098670 ] 

Eran Kutner commented on FLUME-734:
-----------------------------------

After running it for a day, it seems that the problem is gone now.
The only issue I noticed was that TestRollSink failed to compile after applying the patch, complaining about a missing constructor for EscapedCustomDfsSink. I guess that constructor comes from another patch?


[jira] [Issue Comment Edited] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Eric Hauser (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105411#comment-13105411 ] 

Eric Hauser edited comment on FLUME-734 at 9/15/11 10:20 PM:
-------------------------------------------------------------

When trying this patch using avrodata, I got the following:

java.lang.NullPointerException
        at com.cloudera.flume.conf.SinkBuilderUtil.resolveOutputFormat(SinkBuilderUtil.java:59)
        at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.openWriter(EscapedCustomDfsSink.java:109)
        at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.append(EscapedCustomDfsSink.java:131)
        at com.cloudera.flume.core.CompositeSink.append(CompositeSink.java:61)
        at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
        at com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:234)
        at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:183)
        at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:181)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

The constructor needs to assign the context to its member variable. After making that change, the patch worked for us.
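The NullPointerException above is the classic "constructor accepts an argument but never stores it" bug. A minimal sketch of the pattern and the one-line fix follows; Context, BuggySink, FixedSink, outputFormatName() and the "format" key are hypothetical names for illustration, not the actual EscapedCustomDfsSink code.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextAssignmentSketch {

  // Hypothetical stand-in for the configuration context handed to the sink.
  static class Context {
    private final Map<String, String> values = new HashMap<>();

    Context put(String key, String value) {
      values.put(key, value);
      return this;
    }

    String get(String key) {
      return values.get(key);
    }
  }

  // Buggy: the constructor receives a context but never stores it.
  static class BuggySink {
    private Context context; // stays null

    BuggySink(Context context) {
      // missing: this.context = context;
    }

    String outputFormatName() {
      return context.get("format"); // throws NullPointerException
    }
  }

  // Fixed: the constructor assigns the argument to the member variable.
  static class FixedSink {
    private final Context context;

    FixedSink(Context context) {
      this.context = context;
    }

    String outputFormatName() {
      return context.get("format");
    }
  }

  public static void main(String[] args) {
    Context ctx = new Context().put("format", "avrodata");
    System.out.println("fixed: " + new FixedSink(ctx).outputFormatName());
    try {
      new BuggySink(ctx).outputFormatName();
    } catch (NullPointerException e) {
      System.out.println("buggy: NullPointerException");
    }
  }
}
```

Declaring the field final, as in FixedSink, makes the compiler reject a constructor that forgets the assignment.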


[jira] [Updated] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Eran Kutner (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eran Kutner updated FLUME-734:
------------------------------

    Attachment: flume.log

> escapedFormatDfs goes into a file creation frenzy
> -------------------------------------------------
>
>                 Key: FLUME-734
>                 URL: https://issues.apache.org/jira/browse/FLUME-734
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>         Environment: CentOS 5.6
>            Reporter: Eran Kutner
>            Priority: Critical
>         Attachments: flume.log
>
>

[jira] [Commented] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118124#comment-13118124 ] 

jiraposter@reviews.apache.org commented on FLUME-734:
-----------------------------------------------------



Review request for Flume, Prasad Mujumdar and Eric Sammer.


Summary
-------

Output formats are now changed into function expressions, and a new output format instance is created for each CustomDfsSink. (The bug occurred when an escapedFormatDfs was writing multiple files and all of them shared a single OutputFormat object.)
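The failure needs one escapedFormatDfs to have more than one file open. With the reporter's templates, that can happen when, for example, buffered events carry timestamps on either side of midnight, because %Y-%m-%d then resolves to two directories. The sketch below assumes the escapes are filled from each event's timestamp and uses a hypothetical, simplified expander (only %Y, %m, %d and %{rolltag}); it is not Flume's escaping code, and the rolltag value is made up.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashSet;
import java.util.Set;

public class EscapeExpansionSketch {

  // Hypothetical, simplified expander: handles only %Y, %m, %d and %{rolltag}.
  static String expand(String template, LocalDateTime ts, String rolltag) {
    return template
        .replace("%Y", ts.format(DateTimeFormatter.ofPattern("yyyy")))
        .replace("%m", ts.format(DateTimeFormatter.ofPattern("MM")))
        .replace("%d", ts.format(DateTimeFormatter.ofPattern("dd")))
        .replace("%{rolltag}", rolltag);
  }

  // Distinct files one sink would have to keep open for the given event timestamps.
  static Set<String> openPaths(String dirTemplate, String fileTemplate,
                               String rolltag, LocalDateTime... eventTimes) {
    Set<String> paths = new LinkedHashSet<>();
    for (LocalDateTime ts : eventTimes) {
      paths.add(expand(dirTemplate, ts, rolltag) + expand(fileTemplate, ts, rolltag));
    }
    return paths;
  }

  public static void main(String[] args) {
    Set<String> paths = openPaths(
        "hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
        "events-%{rolltag}-col1.snappy",
        "tag1", // hypothetical rolltag value
        LocalDateTime.of(2011, 8, 10, 23, 59, 58),
        LocalDateTime.of(2011, 8, 11, 0, 0, 1));
    paths.forEach(System.out::println);
  }
}
```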

commit 20a85af7d21a2a33c63903c794b4bd0d3dd2be02
Author: Jonathan Hsieh <jo...@cloudera.com>
Date:   Wed Aug 10 09:18:02 2011 -0700

    FLUME-734: escapedFormatDfs goes into a file creation frenzy


This addresses bug flume-734.
    https://issues.apache.org/jira/browse/flume-734


Diffs
-----

  flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 8ecfed1 
  flume-core/src/main/java/com/cloudera/flume/conf/SinkBuilderUtil.java 2aaa566 
  flume-core/src/main/java/com/cloudera/flume/handlers/hdfs/EscapedCustomDfsSink.java 20ebdfd 
  flume-core/src/test/java/com/cloudera/flume/handlers/hdfs/TestEscapedCustomOutputDfs.java 7618acd 
  flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 1fd788f 

Diff: https://reviews.apache.org/r/2129/diff


Testing
-------

Tests pass.

From comments on jira, an earlier version of this patch is working for a few folks already.


Thanks,

jmhsieh


                

[jira] [Issue Comment Edited] (FLUME-734) escapedFormatDfs goes into a file creation frenzy

Posted by "Luke Forehand (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109054#comment-13109054 ] 

Luke Forehand edited comment on FLUME-734 at 9/20/11 10:50 PM:
---------------------------------------------------------------

We are struggling with the same problem; we see this exception with version 0.9.4:
{{{
011-09-19 17:39:42,772 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode sink.hdfs-24 exited with error: OutputFormat instance can only write to the same OutputStream
java.io.IOException: OutputFormat instance can only write to the same OutputStream
    at com.ni.flume.outputformat.SeqFileJsonOutputFormat.format(SeqFileJsonOutputFormat.java:78)
    at com.cloudera.flume.handlers.hdfs.CustomDfsSink.append(CustomDfsSink.java:80)
    at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.append(EscapedCustomDfsSink.java:123)
    at com.cloudera.flume.core.CompositeSink.append(CompositeSink.java:61)
    at com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:234)
}}}
