Posted to common-issues@hadoop.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2009/10/28 01:23:59 UTC

[jira] Created: (HADOOP-6339) SequenceFile writer does not properly flush stream with external DataOutputStream

SequenceFile writer does not properly flush stream with external DataOutputStream
---------------------------------------------------------------------------------

                 Key: HADOOP-6339
                 URL: https://issues.apache.org/jira/browse/HADOOP-6339
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.20.1
            Reporter: Jonathan Hsieh


When using the SequenceFile.createWriter(.., FSDataOutputStream, ...) method to create a Writer, data is not flushed to the file when the encapsulating SequenceFile.Writer is closed.

Example test case skeleton:
{code}
public void testWhyFail() throws IOException {

    // There was a failure case using:
    Configuration conf = ...;
    Path path = new Path("file:///tmp/testfile");
    FileSystem hdfs = path.getFileSystem(conf);

    // writing
    FSDataOutputStream dos = hdfs.create(path);
    hdfs.deleteOnExit(path);

    // The failure occurs specifically with this stream-based writer.
    Writer writer = SequenceFile.createWriter(conf, dos,
        WriteableEventKey.class, WriteableEvent.class,
        SequenceFile.CompressionType.NONE, new DefaultCodec());

    Writable value = ...;
    Writable key = ...;

    writer.append(key, value);
    writer.sync();   // writes a SequenceFile sync marker into the stream; not an fsync
    writer.close();

    // Test fails unless I close the underlying FSDataOutputStream handle with the line below.
    //    dos.close(); 
    
    // Unexpected: nothing from this writer is visible in the file!
    FileStatus stats = hdfs.getFileStatus(path);
    assertTrue(stats.getLen() > 0);
    // It should have written something, but this assertion fails.
  }
{code}
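
For contrast, the Path-based factory closes the stream it opened itself, so the same assertion passes. A minimal sketch, assuming the same user-defined WriteableEventKey/WriteableEvent classes and key/value instances as in the skeleton above:
{code}
    // The writer owns the stream here, so writer.close() closes it fully.
    SequenceFile.Writer ownedWriter = SequenceFile.createWriter(hdfs, conf, path,
        WriteableEventKey.class, WriteableEvent.class);
    ownedWriter.append(key, value);
    ownedWriter.close();   // also closes the underlying FSDataOutputStream
    assertTrue(hdfs.getFileStatus(path).getLen() > 0);   // passes in this variant
{code}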


[jira] Reopened: (HADOOP-6339) SequenceFile writer does not properly flush stream with external DataOutputStream

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher reopened HADOOP-6339:
---------------------------------------


[jira] Resolved: (HADOOP-6339) SequenceFile writer does not properly flush stream with external DataOutputStream

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HADOOP-6339.
---------------------------------

    Resolution: Invalid

Hey Jonathan, this is supposed to work that way. The output stream is supposed to be closed by the app, since it was created by the app.
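
A minimal sketch of that ownership rule, assuming the reporter's WriteableEventKey/WriteableEvent classes and key/value instances: the code that opens the FSDataOutputStream also closes it, after the writer is closed.
{code}
FSDataOutputStream dos = fs.create(path);
try {
  SequenceFile.Writer writer = SequenceFile.createWriter(conf, dos,
      WriteableEventKey.class, WriteableEvent.class,
      SequenceFile.CompressionType.NONE, new DefaultCodec());
  writer.append(key, value);
  writer.close();   // flushes the writer, but deliberately leaves dos open
} finally {
  dos.close();      // the app created the stream, so the app closes it
}
{code}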


[jira] Commented: (HADOOP-6339) SequenceFile writer does not properly flush stream with external DataOutputStream

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771083#action_12771083 ] 

Aaron Kimball commented on HADOOP-6339:
---------------------------------------

I agree that the app should close the underlying OutputStream. But after the SequenceFile.Writer is closed, its data should be available in the file. SequenceFile.Writer.close() in fact calls flush() on the underlying DataOutputStream. It is bizarre that this data is not available to other readers. I still consider this a bug.

At SequenceFile.java, line 985:
{code}
        // Close the underlying stream iff we own it...
        if (ownOutputStream) {
          out.close();
        } else {
          out.flush(); // <-- This does not seem to work right.
        }
        out = null;
{code}
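
A quick way to see this in isolation (a sketch, assuming the local file system as in the report; FlushProbe is a hypothetical class name): compare the visible file length after flush() and after close().
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("file:///tmp/flushprobe");
    FileSystem fs = path.getFileSystem(conf);

    FSDataOutputStream dos = fs.create(path);
    dos.writeBytes("a few bytes, shorter than a checksum chunk");
    dos.flush();
    // On LocalFileSystem this may still print 0: buffered bytes are not drained.
    System.out.println("after flush: " + fs.getFileStatus(path).getLen());

    dos.close();
    // After close() the full length is visible.
    System.out.println("after close: " + fs.getFileStatus(path).getLen());
  }
}
{code}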



[jira] Commented: (HADOOP-6339) SequenceFile writer does not properly flush stream with external DataOutputStream

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770732#action_12770732 ] 

Jonathan Hsieh commented on HADOOP-6339:
----------------------------------------

Devaraj: If that is the case, I think the javadoc should be updated to explain these semantics, because they differ from the close semantics when the writer is created via SequenceFile.createWriter(..., Path, ...).

Inside SequenceFile.Writer.close(), flush() is eventually called on the FSDataOutputStream. Note that this test case actually writes to the local file system (file:///tmp/testfile). Is that flush call supposed to do nothing, to stay consistent with the semantics of writing to HDFS?

Thanks,
Jon.


[jira] Commented: (HADOOP-6339) SequenceFile writer does not properly flush stream with external DataOutputStream

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771841#action_12771841 ] 

Devaraj Das commented on HADOOP-6339:
-------------------------------------

Actually, the flush call is somewhat misleading. If I am reading things right, it seems that LocalFileSystem's flush() ends up being a no-op.
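
That matches a general pitfall with layered streams: if a wrapper buffers data in fixed-size chunks (as a checksumming stream must) and its flush() forwards the call without draining the partial chunk, then flushing the outer stream moves nothing to disk. A hypothetical illustration in plain java.io, not the actual ChecksumFileSystem code:
{code}
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical: buffers writes into fixed-size chunks and emits only full chunks.
class ChunkingOutputStream extends FilterOutputStream {
  private final byte[] chunk = new byte[512];
  private int fill = 0;

  ChunkingOutputStream(OutputStream out) { super(out); }

  @Override public void write(int b) throws IOException {
    chunk[fill++] = (byte) b;
    if (fill == chunk.length) {   // emit only complete chunks
      out.write(chunk, 0, fill);
      fill = 0;
    }
  }

  @Override public void flush() throws IOException {
    out.flush();   // the partial chunk stays buffered, so flush() is effectively a no-op
  }

  @Override public void close() throws IOException {
    out.write(chunk, 0, fill);   // only close() drains the remainder
    fill = 0;
    super.close();
  }
}
{code}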

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.