Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2007/03/15 22:12:10 UTC
[jira] Created: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
ChecksumFileSystem does not handle ChecksumError correctly
----------------------------------------------------------
Key: HADOOP-1124
URL: https://issues.apache.org/jira/browse/HADOOP-1124
Project: Hadoop
Issue Type: Bug
Components: fs
Affects Versions: 0.12.0
Reporter: Hairong Kuang
Fix For: 0.13.0
When handling a ChecksumError, the checksummed file system tries to recover by rereading from a different replica.
I have three comments:
1. One bug in the code is that, when retrying, the object that computes the checksum is not restored to its previous state.
2. The code also assumes that the first byte read and the byte being read when the ChecksumError occurs are in the same block.
3. It would be more efficient to roll back to the first byte of the chunk being checksummed instead of rolling back to the first byte that was read.
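A minimal sketch of the recovery these comments describe, assuming replicas can be modelled as in-memory byte arrays and that one CRC32 is stored per chunk of `bytesPerChecksum` bytes. `ChunkRetrySketch` and its methods are hypothetical illustrations, not Hadoop's `ChecksumFileSystem` code, and the `IllegalStateException` stands in for Hadoop's `ChecksumException`. On a mismatch it resets the checksum object before the next attempt (comment 1) and rereads the failing chunk from its first byte on the next replica (comment 3). Because the retry position is derived from the failing chunk rather than from the first byte of the caller's read, it does not rely on the assumption in comment 2, although blocks themselves are not modelled here.

```java
import java.util.List;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch, not Hadoop code: replicas are in-memory byte arrays of
 * equal length, and chunkCrcs[i] is the stored CRC32 of chunk i.
 * Assumes pos + length does not exceed the file length.
 */
public class ChunkRetrySketch {

  /** Computes one CRC32 per chunk of bytesPerChecksum bytes (the last chunk may be short). */
  public static long[] chunkCrcs(byte[] data, int bytesPerChecksum) {
    int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
    long[] crcs = new long[chunks];
    CRC32 crc = new CRC32();
    for (int i = 0; i < chunks; i++) {
      int start = i * bytesPerChecksum;
      int end = Math.min(start + bytesPerChecksum, data.length);
      crc.reset();
      crc.update(data, start, end - start);
      crcs[i] = crc.getValue();
    }
    return crcs;
  }

  /** Reads length bytes at pos, retrying a failing chunk on the next replica. */
  public static byte[] read(List<byte[]> replicas, long[] chunkCrcs,
                            int bytesPerChecksum, int pos, int length) {
    byte[] out = new byte[length];
    if (length == 0) return out;
    int fileLength = replicas.get(0).length;
    CRC32 crc = new CRC32();
    int firstChunk = pos / bytesPerChecksum;
    int lastChunk = (pos + length - 1) / bytesPerChecksum;
    for (int chunk = firstChunk; chunk <= lastChunk; chunk++) {
      // Comment 3: a retry restarts at the first byte of the failing chunk,
      // not at the first byte the caller asked for.
      int chunkStart = chunk * bytesPerChecksum;
      int chunkEnd = Math.min(chunkStart + bytesPerChecksum, fileLength);
      byte[] source = null;
      for (byte[] replica : replicas) {
        // Comment 1: every attempt starts from a clean checksum state.
        crc.reset();
        crc.update(replica, chunkStart, chunkEnd - chunkStart);
        if (crc.getValue() == chunkCrcs[chunk]) {
          source = replica;
          break;
        }
      }
      if (source == null) {
        throw new IllegalStateException("checksum error in chunk " + chunk + " on every replica");
      }
      // Copy only the part of this chunk that the caller requested.
      int from = Math.max(pos, chunkStart);
      int to = Math.min(pos + length, chunkEnd);
      System.arraycopy(source, from, out, from - pos, to - from);
    }
    return out;
  }
}
```

With an 8-byte chunk size, a read of bytes 5 to 14 whose first replica is corrupt at byte 9 verifies chunk 0 from the first replica, fails chunk 1 there, and rereads chunk 1 from byte 8 on the second replica.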
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang resolved HADOOP-1124.
-----------------------------------
Resolution: Fixed
Resolved by HADOOP-1470.
> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
> Key: HADOOP-1124
> URL: https://issues.apache.org/jira/browse/HADOOP-1124
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.12.0
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.14.0
[jira] Assigned: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang reassigned HADOOP-1124:
-------------------------------------
Assignee: Hairong Kuang
> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
> Key: HADOOP-1124
> URL: https://issues.apache.org/jira/browse/HADOOP-1124
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.12.0
> Reporter: Hairong Kuang
> Assigned To: Hairong Kuang
> Priority: Blocker
> Fix For: 0.13.0
[jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-1124:
----------------------------------
Fix Version/s: 0.14.0 (was: 0.13.0)
> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
> Key: HADOOP-1124
> URL: https://issues.apache.org/jira/browse/HADOOP-1124
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.12.0
> Reporter: Hairong Kuang
> Assigned To: Hairong Kuang
> Fix For: 0.14.0
RE: [jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by Hairong Kuang <ha...@yahoo-inc.com>.
Doug,
Yes, you are right. I have retargeted it for 0.14.
Hairong
-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: Friday, May 11, 2007 11:00 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: [jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Shouldn't this be targeted for 0.14?
Doug
Re: [jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by Doug Cutting <cu...@apache.org>.
Shouldn't this be targeted for 0.14?
Doug
Hairong Kuang (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Hairong Kuang updated HADOOP-1124:
> ----------------------------------
>
> Priority: Major (was: Blocker)
>
>> ChecksumFileSystem does not handle ChecksumError correctly
>> ----------------------------------------------------------
>>
>> Key: HADOOP-1124
>> URL: https://issues.apache.org/jira/browse/HADOOP-1124
>> Project: Hadoop
>> Issue Type: Bug
>> Components: fs
>> Affects Versions: 0.12.0
>> Reporter: Hairong Kuang
>> Assigned To: Hairong Kuang
>> Fix For: 0.13.0
[jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-1124:
----------------------------------
Priority: Major (was: Blocker)
> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
> Key: HADOOP-1124
> URL: https://issues.apache.org/jira/browse/HADOOP-1124
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.12.0
> Reporter: Hairong Kuang
> Assigned To: Hairong Kuang
> Fix For: 0.13.0
[jira] Updated: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HADOOP-1124:
-------------------------------------
Priority: Blocker (was: Major)
Possible blocker for the 0.13 release. Investigation in progress.
> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
> Key: HADOOP-1124
> URL: https://issues.apache.org/jira/browse/HADOOP-1124
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.12.0
> Reporter: Hairong Kuang
> Priority: Blocker
> Fix For: 0.13.0
[jira] Commented: (HADOOP-1124) ChecksumFileSystem does not handle ChecksumError correctly
Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494594 ]
Hairong Kuang commented on HADOOP-1124:
---------------------------------------
Problems 2 and 3 described above are not critical. Problem 1, however, causes a job to fail with a ChecksumException even though non-corrupted replicas are available: this happens when a task gets a ChecksumError while reading after seeking to a position that is not on a checksum chunk boundary.
I plan to create a separate issue for problem 1 and mark it as a Blocker; I will then mark this issue as a non-blocker.
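Problem 1 only shows up after a seek into the middle of a chunk, because the checksum of that chunk has to be computed starting from its first byte. A minimal sketch of the alignment arithmetic involved, with hypothetical names (`SeekAlignSketch`, `chunkStart`, `leadingSkip`) and an illustrative 512-byte chunk size: a read at offset `pos` starts checksumming at the enclosing chunk boundary and discards the bytes before `pos`.

```java
/**
 * Hypothetical helper, not Hadoop code: where a checksummed read (or a retry
 * after a ChecksumError) has to restart when the caller seeks to pos.
 */
public class SeekAlignSketch {

  /** First byte of the checksum chunk that contains pos. */
  public static long chunkStart(long pos, int bytesPerChecksum) {
    if (pos < 0 || bytesPerChecksum <= 0) {
      throw new IllegalArgumentException("pos must be >= 0 and bytesPerChecksum > 0");
    }
    return pos - (pos % bytesPerChecksum);
  }

  /** Bytes between the chunk start and pos: checksummed, but not returned to the caller. */
  public static int leadingSkip(long pos, int bytesPerChecksum) {
    return (int) (pos - chunkStart(pos, bytesPerChecksum));
  }

  public static void main(String[] args) {
    int bytesPerChecksum = 512; // illustrative chunk size
    long pos = 700;             // seek target that is not on a chunk boundary
    System.out.println(chunkStart(pos, bytesPerChecksum));  // 512
    System.out.println(leadingSkip(pos, bytesPerChecksum)); // 188
  }
}
```

Under this model, a retry on another replica would seek to `chunkStart(pos, bytesPerChecksum)`, reset the checksum object, and skip `leadingSkip(pos, bytesPerChecksum)` bytes again, instead of resuming with checksum state left over from the failed attempt.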
> ChecksumFileSystem does not handle ChecksumError correctly
> ----------------------------------------------------------
>
> Key: HADOOP-1124
> URL: https://issues.apache.org/jira/browse/HADOOP-1124
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.12.0
> Reporter: Hairong Kuang
> Assigned To: Hairong Kuang
> Priority: Blocker
> Fix For: 0.13.0