Posted to dev@avro.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/12/11 19:15:01 UTC

[jira] [Commented] (AVRO-2109) Reset buffers in case of IOException

    [ https://issues.apache.org/jira/browse/AVRO-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717787#comment-16717787 ] 

ASF subversion and git services commented on AVRO-2109:
-------------------------------------------------------

Commit a731fab500606404ecfd755717b441109ccf7337 in avro's branch refs/heads/branch-1.8 from [~gszadovszky]
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=a731fab ]

AVRO-2109: Reset buffers in case of IOException

Closes #260

Signed-off-by: Zoltan Ivanfi <zi...@cloudera.com>
Signed-off-by: sacharya <su...@apache.org>
Signed-off-by: Nandor Kollar <nk...@apache.org>
(cherry picked from commit 673261c8656124cc58bee65fe5e8c779350779ee)


> Reset buffers in case of IOException
> ------------------------------------
>
>                 Key: AVRO-2109
>                 URL: https://issues.apache.org/jira/browse/AVRO-2109
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.2
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>             Fix For: 1.7.8, 1.9.0, 1.8.3
>
>
> If an {{IOException}} is thrown from {{DataFileWriter.writeBlock}}, the {{buffer}} and {{blockCount}} are not reset, so duplicated data is written out on a subsequent {{close}}/{{flush}}.
> This raises a conceptual question: should we reset the buffer after an exception at all? If an exception occurs while writing the file, we should expect the file to be corrupt anyway, so the possible duplication of data should not matter.
> On the other hand, if the file is already corrupt, why would we try to write anything again at file close?
> This issue originates from a Flume issue where the HDFS wait thread is interrupted because of a timeout while writing an Avro file. The block itself has already been written correctly, but because of the {{IOException}} caused by the thread interrupt, {{close()}} is invoked on the writer, which writes the block again along with some other data (possibly a duplicated sync marker), corrupting the file.
> [~busbey], [~nkollar], [~zi], any thoughts?
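
The failure mode described above can be sketched as follows. This is a minimal, hypothetical stand-in for {{DataFileWriter}}, not Avro code: the class name {{BlockBufferingWriter}} and its methods are invented for illustration, while the field names {{buffer}} and {{blockCount}} mirror the ones mentioned in the description. The key point is resetting the in-memory block in a {{finally}} clause so that a later {{close()}}/{{flush()}} cannot re-emit the same data after a failed write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified writer illustrating the reset-on-failure pattern.
// It buffers appended records and writes them out as one block on writeBlock().
class BlockBufferingWriter {
    private final OutputStream out;
    private final List<byte[]> buffer = new ArrayList<>();
    private int blockCount = 0;

    BlockBufferingWriter(OutputStream out) {
        this.out = out;
    }

    void append(byte[] record) {
        buffer.add(record);
        blockCount++;
    }

    void writeBlock() throws IOException {
        if (blockCount == 0) {
            return; // nothing buffered, nothing to write
        }
        try {
            for (byte[] record : buffer) {
                out.write(record);
            }
        } finally {
            // The pattern under discussion: reset the buffered block even if
            // out.write threw, so a subsequent close()/flush() cannot write
            // the same records a second time.
            buffer.clear();
            blockCount = 0;
        }
    }

    void close() throws IOException {
        writeBlock(); // flushes any remaining block; a no-op if already reset
        out.close();
    }
}
```

With this shape, a {{writeBlock}} that fails mid-write leaves the buffer empty, so {{close()}} does not duplicate data; whether discarding the block on failure is the right policy (versus accepting duplication in an already-corrupt file) is exactly the question raised above.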



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)