Posted to mapreduce-issues@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2012/11/27 16:52:01 UTC

[jira] [Created] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Tom White created MAPREDUCE-4824:
------------------------------------

             Summary: Provide a mechanism for jobs to indicate they should not be recovered on restart
                 Key: MAPREDUCE-4824
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: mrv1
    Affects Versions: 1.1.0
            Reporter: Tom White
            Assignee: Tom White


Some jobs (like Sqoop or HBase jobs) are not idempotent, so they should not be recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2; however, the approach there is not applicable to MR1. Even if we use only the job-level part of that patch and add an isRecoverySupported method to OutputCommitter, there is no way to use that information from the JT (which initiates recovery), since the JT does not instantiate OutputCommitters, and it shouldn't, since they are user-level code. (In MR2 it's OK since the MR AM calls the method.)

Instead, we can add an MR configuration property to say that a job is not recoverable, and the JT could safely read this from the job conf.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507428#comment-13507428 ] 

Harsh J commented on MAPREDUCE-4824:
------------------------------------

+1, please commit. Thanks Tom!
                
> Provide a mechanism for jobs to indicate they should not be recovered on restart
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4824
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv1
>    Affects Versions: 1.1.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-4824.patch, MAPREDUCE-4824.patch, MAPREDUCE-4824.patch
>
>
> Some jobs (like Sqoop or HBase jobs) are not idempotent, so should not be recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2, however the approach there is not applicable for MR1, since even if we only use the job-level part of the patch and add a isRecoverySupported method to OutputCommitter, there is no way to use that information from the JT (which initiates recovery), since the JT does not instantiate OutputCommitters - and it shouldn't since they are user-level code. (In MR2 it's OK since the MR AM calls the method.)
> Instead, we can add a MR configuration property to say that a job is not recoverable, and the JT could safely read this from the job conf.


[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507593#comment-13507593 ] 

Arun C Murthy commented on MAPREDUCE-4824:
------------------------------------------

Tom, I'm concerned that this might blow up different schedulers in different ways. I need to re-check, but have you tested this with all 3 schedulers?

Maybe we need to do an 'if' check during recovery and not throw an IOException? 
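
A minimal sketch of what such an 'if' check could look like, assuming a simplified recovery pass that walks a map of job IDs to their already-resolved recover flags. The class name, method name and data structure are hypothetical and are not taken from the JobTracker or from any attached patch; the point is only that a job with recovery disabled is skipped and noted rather than reported through an exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a recovery pass; this is not JobTracker code.
class RecoverySkipSketch {
    // Returns the job IDs that would be resubmitted. Each map value is the
    // already-resolved recover flag for that job.
    static List<String> jobsToRecover(Map<String, Boolean> recoverFlagByJobId) {
        List<String> recovered = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : recoverFlagByJobId.entrySet()) {
            if (!e.getValue()) {
                // Expected case: note it and move on, no exception thrown.
                System.out.println("Skipping " + e.getKey() + ": recovery disabled by the job");
                continue;
            }
            recovered.add(e.getKey());
        }
        return recovered;
    }

    public static void main(String[] args) {
        Map<String, Boolean> jobs = new LinkedHashMap<>();
        jobs.put("job_0001", true);
        jobs.put("job_0002", false); // e.g. a non-idempotent job
        System.out.println(jobsToRecover(jobs));
    }
}
```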
                

[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4824:
---------------------------------

    Attachment: MAPREDUCE-4824.patch

Here's a patch that implements this idea. Jobs that shouldn't be recovered should set mapred.job.restart.recover to false.
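
As a rough illustration of the idea (not the patch itself), the sketch below uses java.util.Properties as a stand-in for the job conf so that it runs without Hadoop on the classpath; in a real MR1 job the equivalent would be a setBoolean call on the JobConf. The class and method names are hypothetical. Only the property name and its default of true come from this issue.

```java
import java.util.Properties;

// Hypothetical stand-in for a job conf; a real MR1 job would use JobConf.
class RecoverFlagSketch {
    static final String RECOVER_KEY = "mapred.job.restart.recover";

    // Client side: mark a non-idempotent job as not recoverable.
    static Properties nonRecoverableJobConf() {
        Properties conf = new Properties();
        conf.setProperty(RECOVER_KEY, "false");
        return conf;
    }

    // JT side: read the flag from the job conf, defaulting to true (recoverable).
    static boolean isRecoverable(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(RECOVER_KEY, "true"));
    }

    public static void main(String[] args) {
        System.out.println(isRecoverable(new Properties()));        // property unset
        System.out.println(isRecoverable(nonRecoverableJobConf())); // property set to false
    }
}
```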
                

[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505234#comment-13505234 ] 

Bikas Saha commented on MAPREDUCE-4824:
---------------------------------------

Agree with Harsh.
I assume this config is job specific and cannot be inadvertently set to disable recovery of all jobs?
                

[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4824:
---------------------------------

    Attachment: MAPREDUCE-4824.patch

Good point, Harsh. Here's a new patch with the property documented in mapred-default.xml.
                

[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507596#comment-13507596 ] 

Arun C Murthy commented on MAPREDUCE-4824:
------------------------------------------

Also, we might want to optimize this for hadoop-2, wherein JobClient should set a field in AppSubmissionContext whereby it informs the RM that 'I do not want retries.'

Thoughts?
                

[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505596#comment-13505596 ] 

Harsh J commented on MAPREDUCE-4824:
------------------------------------

bq. I didn't add the property to mapred-default.xml, since it is a job-specific property and these are generally not added there.

We do have several job-specific properties with proper defaults listed in that file. Unless someone overrides them manually, what is the harm in doing this? And must we then remove the ones already present?

The file just serves as good documentation for the config feature, because otherwise there's no doc reference to this property in the patch.
                

[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4824:
---------------------------------

    Attachment: MAPREDUCE-4824.patch

Thanks for the feedback. Here's an updated patch with the improved message.

I didn't add the property to mapred-default.xml, since it is a job-specific property and these are generally not added there. There's no way to have true job-specific properties, since if someone adds the property to the jobtracker's mapred-site.xml file then it will be picked up. I'm not sure there's an easy way around this. 
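
To make the concern concrete, here is a small model of the layering, assuming site-level values act as fallbacks beneath whatever the job itself sets. java.util.Properties stands in for both the jobtracker's site config and the submitted job conf, and the class and method names are hypothetical; this is not how the JT loads configuration, only an illustration of the effect described above.

```java
import java.util.Properties;

// Hypothetical model of config layering; java.util.Properties stands in for
// the JT's site config (fallbacks) and the submitted job conf (overrides).
class SiteOverrideSketch {
    static final String RECOVER_KEY = "mapred.job.restart.recover";

    static boolean effectiveRecoverFlag(Properties siteConf, Properties jobOverrides) {
        Properties effective = new Properties(siteConf); // site values act as fallbacks
        for (String name : jobOverrides.stringPropertyNames()) {
            effective.setProperty(name, jobOverrides.getProperty(name));
        }
        return Boolean.parseBoolean(effective.getProperty(RECOVER_KEY, "true"));
    }

    public static void main(String[] args) {
        Properties site = new Properties();
        site.setProperty(RECOVER_KEY, "false"); // set cluster-wide by mistake
        Properties job = new Properties();      // job says nothing about recovery
        System.out.println(effectiveRecoverFlag(site, job));
    }
}
```

In this model a job that never mentions the property still ends up non-recoverable, which is the inadvertent cluster-wide effect being discussed.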
                

[jira] [Commented] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504897#comment-13504897 ] 

Harsh J commented on MAPREDUCE-4824:
------------------------------------

Hi,

- The message below in the exception can be improved, I feel. I think it's better to say "Job ID was not recovered since it disabled recovery-upon-restart (mapred.job.restart.recover set to false)." Also, since this case is to be expected (non-default override), I think it ought to be a simple INFO log, but I understand we need to throw an Exception to halt the loading of the JIP.

{code}
+      if (recovered && !conf.getBoolean("mapred.job.restart.recover", true)) {
+        throw new IOException("Job " + jobId + " should not be recovered " +
+            "since mapred.job.restart.recover is set to false.");
+      }
{code}

- We could also add this property to mapred-default.xml and document it that way.

The test changes look good.
                

[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4824:
---------------------------------

    Attachment: MAPREDUCE-4824.patch

> I'm concerned that this might blow up different schedulers in different ways.

I don't think that's a problem since the code change only affects job submission, which kicks in before scheduling code is run.

> Maybe we need to do an 'if' check during recovery and not throw an IOException?

I had another look at this and came up with a new patch. Does it look better?

The Hadoop 2 change sounds like the right approach. At first I thought we didn't need the property in Hadoop 2, due to MAPREDUCE-2702, but actually it would allow users to mark a job as non-recoverable on a per-instance basis. It would build on YARN-128.

                