Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2007/03/15 22:12:10 UTC

[jira] Created: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

ChecksumFileSystem does not handle ChecksumError correctly
----------------------------------------------------------

                 Key: HADOOP-1124
                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
             Project: Hadoop
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.12.0
            Reporter: Hairong Kuang
             Fix For: 0.13.0


When handling a ChecksumError, the checksummed file system tries to recover by rereading from a different replica.

I have three comments:
1. One bug in the code is that, when retrying, the object that computes the checksum is not restored to its previous state.
2. The code also assumes that the first byte read and the byte being read when the ChecksumError occurs are in the same block.
3. It would be more efficient to roll back to the first byte of the chunk being checksummed instead of rolling back to the first byte that was read.
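
A minimal sketch of comment 1, assuming java.util.zip.CRC32 stands in for the object that computes the checksum and an in-memory list stands in for the replicas. The class and method names (ChecksumRetrySketch, readVerified, crcOf) are hypothetical and are not taken from ChecksumFileSystem:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Illustrative only: this is not ChecksumFileSystem code.
final class ChecksumRetrySketch {

    // Tries each replica of one chunk in turn and returns the first one whose
    // CRC32 matches expectedCrc, or null if none matches.
    // resetOnRetry toggles the behavior described in comment 1.
    static byte[] readVerified(List<byte[]> replicas, long expectedCrc, boolean resetOnRetry) {
        CRC32 sum = new CRC32();
        for (byte[] replica : replicas) {
            if (resetOnRetry) {
                sum.reset(); // restore the checksum object to its pre-read state
            }
            sum.update(replica, 0, replica.length);
            if (sum.getValue() == expectedCrc) {
                return replica;
            }
        }
        return null; // no replica passed verification
    }

    // CRC32 of a whole byte array, computed with a fresh checksum object.
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] good = "123456788".getBytes(StandardCharsets.US_ASCII);
        byte[] corrupt = "123456789".getBytes(StandardCharsets.US_ASCII); // last byte differs
        long expected = crcOf(good);
        List<byte[]> replicas = Arrays.asList(corrupt, good);

        System.out.println("with reset:    " + (readVerified(replicas, expected, true) == good));
        System.out.println("without reset: " + (readVerified(replicas, expected, false) == good));
    }
}
```

With resetOnRetry set to false, the CRC32 still carries the bytes of the failed attempt when the second replica is read, so the intact replica is rejected as well. That is the kind of failure comment 1 describes.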

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang resolved HADOOP-1124.
-----------------------------------

    Resolution: Fixed

Resolved by HADOOP-1470.

> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
>                 Key: HADOOP-1124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.12.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.14.0

[jira] Assigned: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang reassigned HADOOP-1124:
-------------------------------------

    Assignee: Hairong Kuang

> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
>                 Key: HADOOP-1124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.12.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0

[jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1124:
----------------------------------

    Fix Version/s:     (was: 0.13.0)
                   0.14.0

> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
>                 Key: HADOOP-1124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.12.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>             Fix For: 0.14.0

RE: [jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by Hairong Kuang <ha...@yahoo-inc.com>.

Doug,

Yes, you are right. I have retargeted it for 0.14.

Hairong 

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Friday, May 11, 2007 11:00 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: [jira] Updated: (HADOOP-1124) ChecksumFileSystem does not
handle ChecksumError correctly

Shouldn't this be targeted for 0.14?

Doug

Re: [jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by Doug Cutting <cu...@apache.org>.

Shouldn't this be targeted for 0.14?

Doug

Hairong Kuang (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Hairong Kuang updated HADOOP-1124:
> ----------------------------------
> 
>     Priority: Major  (was: Blocker)
> 
>> ChecksumFileSystem does not handle ChecksumError correctly
>> ----------------------------------------------------------
>>
>>                 Key: HADOOP-1124
>>                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
>>             Project: Hadoop
>>          Issue Type: Bug
>>          Components: fs
>>    Affects Versions: 0.12.0
>>            Reporter: Hairong Kuang
>>         Assigned To: Hairong Kuang
>>             Fix For: 0.13.0

[jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1124:
----------------------------------

    Priority: Major  (was: Blocker)

> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
>                 Key: HADOOP-1124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.12.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>             Fix For: 0.13.0

[jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1124:
-------------------------------------

    Priority: Blocker  (was: Major)

Possible blocker for 0.13 release. Investigation in progress.

> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
>                 Key: HADOOP-1124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.12.0
>            Reporter: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0

[jira] Commented: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494594 ] 

Hairong Kuang commented on HADOOP-1124:
---------------------------------------

Problems 2 and 3 described above are not critical. But problem 1 causes a job to fail with a ChecksumException when a task hits a ChecksumError while reading after seeking to a position that is not on a checksum chunk boundary, even though non-corrupted replicas are available.

I plan to create a separate issue dealing with problem 1 and mark it as a Blocker, then I will mark this issue as a non-blocker.
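
A rough sketch of the chunk-boundary arithmetic behind this scenario, assuming one CRC32 per fixed-size chunk and an example chunk size of 512 bytes. ChunkRollbackSketch and its methods are hypothetical names and do not reflect Hadoop's actual classes or on-disk checksum layout:

```java
import java.util.zip.CRC32;

// Illustrative only: names and layout are assumptions, not Hadoop's format.
final class ChunkRollbackSketch {

    // Offset of the first byte of the checksum chunk that contains pos.
    static long chunkStart(long pos, int bytesPerChecksum) {
        return pos - (pos % bytesPerChecksum);
    }

    // One CRC32 per chunk; the last chunk may be shorter than bytesPerChecksum.
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verifies the chunk containing pos by rolling back to the chunk's first
    // byte; a checksum over only the bytes from pos onward could not be
    // compared with the stored per-chunk value.
    static boolean verifyChunkAt(byte[] data, long[] sums, int bytesPerChecksum, long pos) {
        int start = (int) chunkStart(pos, bytesPerChecksum);
        int len = Math.min(bytesPerChecksum, data.length - start);
        CRC32 crc = new CRC32(); // fresh state for every verification attempt
        crc.update(data, start, len);
        return crc.getValue() == sums[start / bytesPerChecksum];
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 512; // example value
        byte[] data = new byte[1500];
        long[] sums = checksums(data, bytesPerChecksum);
        long pos = 1300; // a seek target that is not on a chunk boundary

        System.out.println("chunk start: " + chunkStart(pos, bytesPerChecksum));
        System.out.println("verified:    " + verifyChunkAt(data, sums, bytesPerChecksum, pos));
    }
}
```

Rolling back to chunkStart(pos, ...) rather than to the first byte read keeps the retry within a single chunk, which is the efficiency point made in item 3 of the description.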

> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
>                 Key: HADOOP-1124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1124
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.12.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.13.0