Posted to commits@samza.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2016/07/15 23:45:21 UTC
[jira] [Commented] (SAMZA-968) SequenceFileHdfsFileWriter does not close file properly
[ https://issues.apache.org/jira/browse/SAMZA-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380315#comment-15380315 ]
Eli Reisman commented on SAMZA-968:
-----------------------------------
Thanks. I have been heads down lately, otherwise I would have jumped in on this thread sooner. This change looks great; thanks for catching this.
> SequenceFileHdfsFileWriter does not close file properly
> -------------------------------------------------------
>
> Key: SAMZA-968
> URL: https://issues.apache.org/jira/browse/SAMZA-968
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.10.0, 0.10.1
> Reporter: Benjamin Smith
> Assignee: Benjamin Smith
> Priority: Minor
> Fix For: 0.10.1
>
> Attachments: SAMZA-968.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> From dev@samza.apache.org:
> Hi, Benjamin,
> Thanks a lot for reporting this! It makes sense from reading the posts.
> Could you open a JIRA? Are you interested in assigning it to yourself and
> contributing the fix?
> Thanks a lot again!
> -Yi
> > Hello,
> >
> > I am working on a project where we are integrating Samza and Hive. As part
> > of this project, we ran into an issue where sequence files written from
> > Samza were taking a long time (hours) to completely sync with HDFS.
> >
> > After some Googling and digging into the code, it appears that the issue
> > is here:
> >
> > https://github.com/apache/samza/blob/master/samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala#L111
> >
> > Writer.stream(dfs.create(path)) implies that the caller of
> > dfs.create(path) is responsible for closing the created stream explicitly.
> > This doesn't happen, so when SequenceFileHdfsWriter closes the writer, the
> > stream is only flushed and never closed.
> >
> > I believe the correct line should be:
> >
> > Writer.file(path)
> >
> > Or, SequenceFileHdfsWriter should explicitly track and close the stream.
> >
> > Thanks!
> >
> > Ben
> >
> > Reference material:
> >
> > http://stackoverflow.com/questions/27916872/why-the-sequencefile-is-truncated
> >
> > https://apache.googlesource.com/hadoop-common/+/HADOOP-6685/src/java/org/apache/hadoop/io/SequenceFile.java#1238
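Illustration of the stream-ownership behavior described in the quoted report. This is a minimal, self-contained sketch and not Hadoop or Samza code: StreamOwnershipSketch, TrackingStream, SketchWriter, borrowing(), and owning() are hypothetical names invented for the example, and an in-memory stream stands in for the HDFS output stream. It assumes the rule stated in the report: a writer built over a caller-supplied stream (as with Writer.stream) only flushes on close, while a writer that owns its stream (as with Writer.file) closes it. As a simplification, owning() is handed a stream here, whereas Writer.file(path) would open one itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamOwnershipSketch {

    /** In-memory stand-in for an HDFS output stream; records whether close() ran. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical writer that closes the underlying stream only if it owns it. */
    static final class SketchWriter {
        private final OutputStream out;
        private final boolean ownsStream;

        private SketchWriter(OutputStream out, boolean ownsStream) {
            this.out = out;
            this.ownsStream = ownsStream;
        }

        /** Models the Writer.stream(...) case: the caller keeps ownership. */
        static SketchWriter borrowing(OutputStream out) {
            return new SketchWriter(out, false);
        }

        /** Models the Writer.file(...) case: the writer owns the stream. */
        static SketchWriter owning(OutputStream out) {
            return new SketchWriter(out, true);
        }

        void append(String record) {
            try {
                out.write(record.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        void close() {
            try {
                if (ownsStream) {
                    out.close();  // owned: release the underlying stream
                } else {
                    out.flush();  // borrowed: flush only, the stream stays open
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Writes one record, closes the writer, and reports whether the stream was closed. */
    static boolean streamClosedAfterWriterClose(boolean writerOwnsStream) {
        TrackingStream stream = new TrackingStream();
        SketchWriter writer = writerOwnsStream
                ? SketchWriter.owning(stream)
                : SketchWriter.borrowing(stream);
        writer.append("record");
        writer.close();
        return stream.closed;
    }

    public static void main(String[] args) {
        System.out.println("borrowed stream closed: " + streamClosedAfterWriterClose(false));
        System.out.println("owned stream closed: " + streamClosedAfterWriterClose(true));
    }
}
```

In the borrowed case the stream is still open after the writer is closed, so the caller has to close it. That corresponds to the two fixes proposed in the report: let the writer own the file, or track the stream and close it explicitly.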
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)