
Posted to mapreduce-issues@hadoop.apache.org by "ZhuGuanyin (JIRA)" <ji...@apache.org> on 2009/12/01 04:07:29 UTC

[jira] Created: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

job.xml should add crc check in tasktracker and sub jvm.
--------------------------------------------------------

                 Key: MAPREDUCE-1254
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1254
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: task, tasktracker
    Affects Versions: 0.22.0
            Reporter: ZhuGuanyin


Currently, job.xml in the tasktracker and the sub-task JVM is written to local disk through ChecksumFileSystem, so a CRC checksum already exists for it, but the file is loaded without checking that checksum. A disk error could therefore let a MapReduce job finish successfully while producing wrong data. Example: if the tasktracker or the sub-task JVM does not load job.xml successfully, it loads the default configuration, which may replace the mapper with IdentityMapper.
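
The failure mode described above can be illustrated with a small sketch. It is not Hadoop code: java.util.Properties stands in for the Hadoop configuration, and the class name, the key/value file format, and the mapper names are assumptions made only for illustration. It shows how a lookup with a default silently yields the default once the file contents are gone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class SilentDefaultDemo {
    // Illustrative key and default; a stand-in for a configuration lookup.
    static final String MAPPER_KEY = "mapred.mapper.class";
    static final String DEFAULT_MAPPER = "IdentityMapper";

    /** Loads key=value settings and returns the mapper name, or the default if the key is absent. */
    static String mapperFrom(byte[] fileContents) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(fileContents));
        return props.getProperty(MAPPER_KEY, DEFAULT_MAPPER);
    }

    public static void main(String[] args) throws IOException {
        byte[] intact = "mapred.mapper.class=WordCountMapper\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] truncated = new byte[0]; // file truncated to zero length

        // The second call raises no error; it just returns the default.
        System.out.println(mapperFrom(intact));
        System.out.println(mapperFrom(truncated));
    }
}
```

Nothing in the second call signals that the settings were lost, which is why a check before the lookup is needed.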

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785838#action_12785838 ] 

ZhuGuanyin commented on MAPREDUCE-1254:
---------------------------------------

Local inexpensive disks are not reliable. We once found that a non-empty file had become zero-length while the OS kernel log showed no warning; only some minutes later did the kernel log report disk failures. During that time, read operations returned success without throwing any IOException.

In the current implementation, an IOException is thrown if job.xml is missing, but there is no way to detect that the configuration file has been corrupted or truncated.
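
A minimal sketch of the kind of check being asked for, assuming a checksum stored next to the file. It uses java.util.zip.CRC32 and a made-up ".crc-demo" sidecar file rather than Hadoop's ChecksumFileSystem, so the class name, method names, and on-disk layout are all hypothetical. The point is only that a file truncated to zero length reads "successfully" but fails the checksum comparison.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class CrcCheckedFile {
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Hypothetical sidecar naming, for illustration only.
    static Path sidecar(Path file) {
        return file.resolveSibling(file.getFileName() + ".crc-demo");
    }

    /** Writes the data and stores its CRC32 in a sidecar file. */
    static void write(Path file, byte[] data) throws IOException {
        Files.write(file, data);
        byte[] crcBytes = ByteBuffer.allocate(Long.BYTES).putLong(crcOf(data)).array();
        Files.write(sidecar(file), crcBytes);
    }

    /** Reads the data and throws IOException if it no longer matches the stored CRC32. */
    static byte[] readVerified(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        byte[] stored = Files.readAllBytes(sidecar(file));
        if (stored.length != Long.BYTES) {
            throw new IOException("Checksum file is damaged: " + sidecar(file));
        }
        long expected = ByteBuffer.wrap(stored).getLong();
        if (crcOf(data) != expected) {
            throw new IOException("Checksum mismatch for " + file);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crc-demo");
        Path jobXml = dir.resolve("job.xml");
        write(jobXml, "<configuration></configuration>".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact read ok: " + (readVerified(jobXml).length > 0));

        Files.write(jobXml, new byte[0]); // simulate silent truncation to zero length
        try {
            readVerified(jobXml);
            System.out.println("truncation went unnoticed");
        } catch (IOException e) {
            System.out.println("truncation detected: " + e.getMessage());
        }
    }
}
```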


[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785788#action_12785788 ] 

Zheng Shao commented on MAPREDUCE-1254:
---------------------------------------

Can you explain why the sub task jvm will continue if it doesn't successfully load the job.xml?
Shouldn't it error out with an IOException?



[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788912#action_12788912 ] 

Todd Lipcon commented on MAPREDUCE-1254:
----------------------------------------

> if it does happen, the corrupted data or default data would load without notice

This seems like a bug on its own (or a bug waiting to happen)

I'm not against the CRC (I think it's a good idea) but we should also fail a job if job.xml fails to parse as valid XML, I think.
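
As a rough sketch of the "fail if it does not parse" idea, the following uses the JDK's own DOM parser, not Hadoop's configuration loader. The class and method names are hypothetical, and the check that the root element is named "configuration" is an assumption about the file layout. An empty or truncated document makes the parser throw, and the sketch turns that into an IOException instead of continuing with defaults.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class StrictJobXml {
    /** Parses the bytes as XML and throws IOException if they are not well-formed. */
    static Document parseOrFail(byte[] xml) throws IOException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            // Assumed layout: the root element is <configuration>.
            String root = doc.getDocumentElement().getNodeName();
            if (!"configuration".equals(root)) {
                throw new IOException("Unexpected root element: " + root);
            }
            return doc;
        } catch (SAXException | ParserConfigurationException e) {
            throw new IOException("job.xml is not well-formed XML", e);
        }
    }

    static boolean loads(byte[] xml) {
        try {
            parseOrFail(xml);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] intact = "<configuration><property/></configuration>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] truncated = "<configuration><property>".getBytes(StandardCharsets.UTF_8);
        byte[] empty = new byte[0];

        System.out.println("intact loads: " + loads(intact));
        System.out.println("truncated loads: " + loads(truncated));
        System.out.println("empty loads: " + loads(empty));
    }
}
```

A parse check like this catches empty or cut-off files, but not corruption that still happens to be well-formed XML, so it complements a checksum rather than replacing it.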


[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786259#action_12786259 ] 

Zheng Shao commented on MAPREDUCE-1254:
---------------------------------------

Got it. It seems a good idea to read and check the checksum.
Will you upload a patch including a simple test case?



[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786261#action_12786261 ] 

Todd Lipcon commented on MAPREDUCE-1254:
----------------------------------------

Curious why the XML reading doesn't fail for an empty file. Emptiness is not valid XML, right?


[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788657#action_12788657 ] 

ZhuGuanyin commented on MAPREDUCE-1254:
---------------------------------------

I only gave that example to show that inexpensive disks are not reliable: the kernel did not notice the hardware failure while the file was being truncated.

1) job.xml is loaded into the configuration asynchronously, so it could be corrupted or go missing before it is parsed. If that happens, the corrupted data or the default data is loaded without notice (that is, some tasks run with the right configuration while others run with a wrong one);

2) job.xml holds many important parameters, so it needs to be checked before it is used;

3) if we don't check the CRC, why do we generate the CRC checksum file?  :)
