Posted to commits@samza.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2016/07/15 23:45:21 UTC

[jira] [Commented] (SAMZA-968) SequenceFileHdfsFileWriter does not close file properly

    [ https://issues.apache.org/jira/browse/SAMZA-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380315#comment-15380315 ] 

Eli Reisman commented on SAMZA-968:
-----------------------------------

Thanks. I have been heads down lately, otherwise I would have jumped in on this thread sooner. This change looks great; thanks for catching this.

> SequenceFileHdfsFileWriter does not close file properly
> -------------------------------------------------------
>
>                 Key: SAMZA-968
>                 URL: https://issues.apache.org/jira/browse/SAMZA-968
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.10.0, 0.10.1
>            Reporter: Benjamin Smith
>            Assignee: Benjamin Smith
>            Priority: Minor
>             Fix For: 0.10.1
>
>         Attachments: SAMZA-968.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> From dev@samza.apache.org:
> Hi, Benjamin,
> Thanks a lot for reporting this! It makes sense from reading the posts.
> Could you open a JIRA? Are you interested in assigning it to yourself and
> contributing the fix?
> Thanks a lot again!
> -Yi
> > Hello,
> >
> > I am working on a project where we are integrating Samza and Hive. As part
> > of this project, we ran into an issue where sequence files written from
> > Samza were taking a long time (hours) to completely sync with HDFS.
> >
> > After some Googling and digging into the code, it appears that the issue
> > is here:
> >
> > https://github.com/apache/samza/blob/master/samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala#L111
> >
> > Writer.stream(dfs.create(path)) implies that the caller of
> > dfs.create(path) is responsible for closing the created stream explicitly.
> > This doesn't happen, and the SequenceFileHdfsWriter call to close will only
> > flush the stream.
> >
> > I believe the correct line should be:
> >
> > Writer.file(path)
> >
> > Or, SequenceFileHdfsWriter should explicitly track and close the stream.
> >
> > Thanks!
> >
> > Ben
> >
> > Reference material:
> >
> > http://stackoverflow.com/questions/27916872/why-the-sequencefile-is-truncated
> >
> > https://apache.googlesource.com/hadoop-common/+/HADOOP-6685/src/java/org/apache/hadoop/io/SequenceFile.java#1238
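The stream-ownership issue described in the quoted report can be sketched with a small, self-contained toy model. This is not Hadoop or Samza code: `StreamOwnershipSketch`, `TrackingStream`, `ToyWriter`, and all method names below are hypothetical, and the model only assumes the behavior stated in the report, namely that a writer handed an already-open stream flushes it on close and leaves closing to the caller, while a writer that opens its own stream closes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Toy model only: none of these classes are Hadoop or Samza classes.
class StreamOwnershipSketch {

    /** In-memory stream that records whether close() was called on it. */
    static class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical writer that closes its stream only if it created it. */
    static class ToyWriter implements AutoCloseable {
        private final TrackingStream out;
        private final boolean ownsStream;

        private ToyWriter(TrackingStream out, boolean ownsStream) {
            this.out = out;
            this.ownsStream = ownsStream;
        }

        /** Stand-in for the "stream" style: the caller keeps ownership. */
        static ToyWriter wrapping(TrackingStream callerStream) {
            return new ToyWriter(callerStream, false);
        }

        /** Stand-in for the "file" style: the writer opens and owns the stream. */
        static ToyWriter owning() {
            return new ToyWriter(new TrackingStream(), true);
        }

        TrackingStream stream() {
            return out;
        }

        void append(String record) throws IOException {
            out.write(record.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            if (ownsStream) {
                out.close();   // writer created the stream, so it closes it
            } else {
                out.flush();   // caller's stream: flush only, never close
            }
        }
    }

    /**
     * Writes one record, closes the writer, optionally closes the stream
     * from the caller side, and reports whether the stream ended up closed.
     */
    static boolean streamClosed(boolean writerOwnsStream, boolean callerClosesToo) {
        try {
            ToyWriter writer = writerOwnsStream
                    ? ToyWriter.owning()
                    : ToyWriter.wrapping(new TrackingStream());
            writer.append("record");
            writer.close();
            if (callerClosesToo) {
                writer.stream().close();   // the explicit "track and close" fix
            }
            return writer.stream().closed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("wrapped, writer.close() only: closed=" + streamClosed(false, false));
        System.out.println("wrapped, caller closes too:   closed=" + streamClosed(false, true));
        System.out.println("writer-owned stream:          closed=" + streamClosed(true, false));
    }
}
```

In this model the first case mirrors the reported bug (the stream is left open after the writer is closed), and the other two mirror the two fixes proposed in the report: closing the stream explicitly, or letting the writer open and own it.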



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)