Posted to hdfs-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/30 23:02:40 UTC

[jira] [Resolved] (HDFS-1264) 0.20: OOME in HDFS client made an unrecoverable HDFS block

     [ https://issues.apache.org/jira/browse/HDFS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HDFS-1264.
------------------------------------

    Resolution: Fixed

> 0.20: OOME in HDFS client made an unrecoverable HDFS block
> ----------------------------------------------------------
>
>                 Key: HDFS-1264
>                 URL: https://issues.apache.org/jira/browse/HDFS-1264
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>             Fix For: 0.20-append
>
>         Attachments: blk_logs_sorted.txt, hdfs-679-testcase-20.txt
>
>
> Ran into a bad issue in testing overnight. One of the writers experienced an OOME in the middle of writing a checksum chunk to the stream inside a sync() call. It then proceeded to retry recovery on each DN in the pipeline, but each recovery failed because the writer's internal checksum buffer was corrupted in some way: on the DNs I see "Unexpected checksum mismatch" errors after each recovery attempt.
> When another client tried to recover the file using appendFile, it got the "Partial CRC 3766269197 does not match value computed the  last time file was closed" error (plus there was only one replica left in targets). It thus failed to set up the append pipeline, and ran into HDFS-1262.
> This was on 0.20-append, though it may happen on trunk as well.
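
For readers unfamiliar with the failure mode described above, here is a minimal sketch of why a partial-chunk checksum stops matching once the data and the stored checksum diverge. It is not the HDFS client or DataNode code: the class and method names (PartialCrcSketch, crcOf, partialChunkMatches) are hypothetical, and it assumes a plain CRC32 over the bytes of the last, partially filled chunk.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class PartialCrcSketch {
    // CRC32 over the first len bytes of data (assumed checksum type).
    public static long crcOf(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return crc.getValue();
    }

    // Hypothetical check: does the checksum recorded for the partial
    // last chunk still match the bytes that are actually present?
    public static boolean partialChunkMatches(byte[] bytes, int len, long storedCrc) {
        return crcOf(bytes, len) == storedCrc;
    }

    public static void main(String[] args) {
        byte[] chunk = "partial chunk bytes".getBytes(StandardCharsets.US_ASCII);
        long stored = crcOf(chunk, chunk.length);

        // Consistent state: data and recorded checksum agree.
        System.out.println(partialChunkMatches(chunk, chunk.length, stored));

        // Simulated divergence: the data changes after the checksum was
        // recorded, standing in for a writer interrupted mid-chunk.
        byte[] diverged = chunk.clone();
        diverged[0] ^= 0x01;
        System.out.println(partialChunkMatches(diverged, diverged.length, stored));
    }
}
```

The second check fails because the recorded value no longer describes the bytes being verified. Any recovery or append attempt that re-verifies the partial chunk against that stale value would be rejected in the same way, which is consistent with the errors quoted in the report.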



--
This message was sent by Atlassian JIRA
(v6.2#6252)