Posted to common-issues@hadoop.apache.org by "Mikhail Bernadsky (JIRA)" <ji...@apache.org> on 2014/06/08 07:45:02 UTC

[jira] [Updated] (HADOOP-10669) Avro serialization does not flush buffered serialized values causing data loss

     [ https://issues.apache.org/jira/browse/HADOOP-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bernadsky updated HADOOP-10669:
---------------------------------------

    Attachment: HADOOP-10669.patch

> Avro serialization does not flush buffered serialized values causing data loss
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-10669
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10669
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.4.0
>            Reporter: Mikhail Bernadsky
>         Attachments: HADOOP-10669.patch
>
>
> Found this while debugging Nutch.
> MapTask serializes keys and values to the same stream, in pairs:
>     keySerializer.serialize(key);
>     ...
>     valSerializer.serialize(value);
>     ...
>     bb.write(b0, 0, 0);
> AvroSerializer does not flush its internal buffer after each serialization. So when it is used as valSerializer, the value bytes are only partially written, or not written at all, to the output stream before the record is marked as complete (the last line above).
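The failure mode above can be reproduced with plain java.io streams: a writer with an internal buffer (as Avro's BinaryEncoder has) shows nothing to the underlying stream until it is flushed. This is a minimal sketch, not Hadoop's actual code path; the class and method names here are illustrative only.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {
    // Returns {bytes visible in the sink before flush, bytes visible after flush}.
    static int[] bufferedWriteSizes() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Stands in for a serializer that buffers internally, like Avro's BinaryEncoder.
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 8192);

        buffered.write("value-bytes".getBytes(StandardCharsets.UTF_8));
        int beforeFlush = sink.size(); // nothing has reached the sink yet
        buffered.flush();
        int afterFlush = sink.size();  // all 11 bytes are now visible
        return new int[] { beforeFlush, afterFlush };
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = bufferedWriteSizes();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

If MapTask records the end of the record while the serializer is in the "before flush" state, the buffered value bytes are silently dropped or land after the record boundary, which is the data loss described above.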



--
This message was sent by Atlassian JIRA
(v6.2#6252)