Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2007/05/10 01:06:15 UTC

[jira] Created: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Checksum object does not get restored to the old state in retries when handle ChecksumException
-----------------------------------------------------------------------------------------------

                 Key: HADOOP-1345
                 URL: https://issues.apache.org/jira/browse/HADOOP-1345
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.12.3
            Reporter: Hairong Kuang
         Assigned To: Hairong Kuang
            Priority: Blocker
             Fix For: 0.13.0


In ChecksumFile.FSInputChecker, when a ChecksumException occurs, the reader tries to recover from the error by reading a different replica. However, the current code does not restore the Checksum object's old state before retrying. As a result, if the read follows a seek to a position that is not on a checksum chunk boundary, it cannot recover from the ChecksumException even though non-corrupted replicas are available.
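
The failure mode described above can be illustrated with a small stand-alone sketch. It is not the code from checksum.patch and does not use any Hadoop classes: the class ChecksumRetrySketch, the method firstGoodReplica, and the in-memory "replicas" are hypothetical names chosen for illustration, and java.util.zip.CRC32 stands in for the Checksum object. The assumption is that a seek into the middle of a chunk leaves the checksum holding the state for the chunk's leading bytes, and that a retry must put it back into that state.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumRetrySketch {

    /** CRC32 of a whole chunk; stands in for the value stored in the .crc file. */
    static long crcOf(byte[] chunk) {
        CRC32 c = new CRC32();
        c.update(chunk, 0, chunk.length);
        return c.getValue();
    }

    /**
     * Tries each replica of one checksum chunk in turn and returns the index
     * of the first replica that verifies, or -1 if none does.
     * prefixLen models a seek that landed inside the chunk: those leading
     * bytes have already been fed to the checksum before the read starts.
     */
    static int firstGoodReplica(byte[][] replicas, int prefixLen,
                                long expected, boolean restoreOnRetry) {
        CRC32 sum = new CRC32();
        // State left behind by the seek: the chunk prefix is already summed.
        sum.update(replicas[0], 0, prefixLen);
        for (int i = 0; i < replicas.length; i++) {
            if (i > 0 && restoreOnRetry) {
                // Restore the pre-read state before trying the next replica.
                sum.reset();
                sum.update(replicas[i], 0, prefixLen);
            }
            // Sum the remainder of the chunk from this replica.
            sum.update(replicas[i], prefixLen, replicas[i].length - prefixLen);
            if (sum.getValue() == expected) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] good = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        byte[] bad = good.clone();
        bad[12] ^= 1; // flip one bit after the seek position
        byte[][] replicas = { bad, good };
        long expected = crcOf(good);
        System.out.println("without restore: "
                + firstGoodReplica(replicas, 5, expected, false));
        System.out.println("with restore:    "
                + firstGoodReplica(replicas, 5, expected, true));
    }
}
```

With the restore step, the second replica (index 1) verifies. Without it, the checksum still carries the bytes of the failed attempt, so later attempts are expected to fail even though a good replica exists.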

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494787 ] 

Raghu Angadi commented on HADOOP-1345:
--------------------------------------

+1. Looks good.


[jira] Updated: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1345:
----------------------------------

    Status: Patch Available  (was: Open)

[jira] Updated: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1345:
----------------------------------

    Attachment: checksum.patch

There is a related bug: SeekToNewSources does not compute the position in the .crc file correctly. The new patch also includes a fix for this bug.
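
As a rough sketch of what "position in the .crc file" means, the mapping from a data offset to a checksum offset might look like the following. The layout is an assumption made for illustration, not taken from the patch: an 8-byte header followed by one 4-byte CRC per bytesPerSum bytes of data. The class and method names (CrcPositionSketch, checksumFilePos, chunkStart) are hypothetical.

```java
public class CrcPositionSketch {

    // Assumed layout of the checksum file (illustrative values only).
    static final int HEADER_LEN = 8; // assumed fixed-size header
    static final int SUM_LEN = 4;    // one 32-bit CRC per data chunk

    /** Offset in the checksum file of the CRC covering dataPos. */
    static long checksumFilePos(long dataPos, int bytesPerSum) {
        return HEADER_LEN + SUM_LEN * (dataPos / bytesPerSum);
    }

    /** Start of the data chunk containing dataPos. */
    static long chunkStart(long dataPos, int bytesPerSum) {
        return dataPos - dataPos % bytesPerSum;
    }

    public static void main(String[] args) {
        int bytesPerSum = 512;
        long dataPos = 1300; // a position that is not on a chunk boundary
        System.out.println("chunk start:  " + chunkStart(dataPos, bytesPerSum));
        System.out.println("crc position: " + checksumFilePos(dataPos, bytesPerSum));
    }
}
```

The point of the sketch is that both files have to be repositioned together: when switching to another source after a seek into the middle of a chunk, the data stream goes back to the chunk start and the checksum stream goes to the entry for that same chunk.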

[jira] Commented: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495963 ] 

Hadoop QA commented on HADOOP-1345:
-----------------------------------

Integrated in Hadoop-Nightly #89 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/89/)

[jira] Updated: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1345:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Hairong!

[jira] Updated: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1345:
----------------------------------

    Attachment: checksum.patch

[jira] Commented: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495129 ] 

Nigel Daley commented on HADOOP-1345:
-------------------------------------

Can this be unit tested?

[jira] Commented: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494807 ] 

Hadoop QA commented on HADOOP-1345:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12357004/checksum.patch applied and successfully tested against trunk revision r536583.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/128/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/128/console

[jira] Commented: (HADOOP-1345) Checksum object does not get restored to the old state in retries when handle ChecksumException

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495154 ] 

Hairong Kuang commented on HADOOP-1345:
---------------------------------------

I thought about it, but with MiniDFSCluster it is hard to decide which replica to corrupt and to produce a ChecksumException deterministically.
