Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2016/10/25 14:32:58 UTC
[jira] [Comment Edited] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

    [ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603334#comment-15603334 ] 

Jonathan Eagles edited comment on TEZ-3271 at 10/25/16 2:32 PM:
----------------------------------------------------------------

bq. generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an abstract function. Given that CartesianProductEdgeManager needs changing this is an incompatible feature. An appropriate exception thrown could be used to indicate that the EM plugin in use does not support the failure threshold percent feature.
If we strictly limit this feature to known Tez outputs, we can avoid empty event generation in the edge manager plugin for now and move it to the Edge instead.

bq. I think we can add a fail-safe in the edge plugins to generate the events only for known outputs (maybe if they belong to the Tez runtime package?)
I can add an exception in the Edge to restrict this to org.apache.tez outputs only.
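
A minimal sketch of such a fail-safe, assuming the check is made against the output's fully qualified class name. The class KnownOutputGuard, its methods, and the choice of UnsupportedOperationException are hypothetical names for illustration only; they are not Tez APIs and not necessarily what the patch does.

```java
/** Hypothetical guard (not part of Tez): limits the feature to org.apache.tez outputs. */
final class KnownOutputGuard {
  // Assumption: a package-prefix match is enough to identify a "known" Tez output.
  private static final String TEZ_PACKAGE_PREFIX = "org.apache.tez.";

  private KnownOutputGuard() {}

  /** Returns true if the output class lives under the org.apache.tez package. */
  static boolean isKnownOutput(String outputClassName) {
    return outputClassName != null && outputClassName.startsWith(TEZ_PACKAGE_PREFIX);
  }

  /** Throws if the output is not a known Tez output, so unsupported outputs fail fast. */
  static void requireKnownOutput(String outputClassName) {
    if (!isKnownOutput(outputClassName)) {
      throw new UnsupportedOperationException(
          "Failure threshold is not supported for output class " + outputClassName);
    }
  }
}
```

With a check like this, a custom output that uses a different payload would be rejected with an error instead of receiving an empty event it may not understand.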

bq. i.e. if someone ends up writing a new output that uses a different payload, we would need to throw an error, at least with the current impl, though we do need to figure out how the EM plugin can invoke an empty event that the Input understands. One option here would be to enhance the DME meta info to indicate an empty/null payload, or to invoke an API on the Output to generate the empty data event.
I think this is aimed at how to implement this completely generically, and it should go into a follow-up JIRA if we are using this JIRA to implement a stop-gap until a full-blown implementation can be finished.

bq. As for event generation, I have a doubt with respect to recovery given that we expect all DME events to be generated before a task completes. This might be something to test more carefully on recovery to see if events are generated correctly as needed when a failed vertex is recovered or replayed as needed.
Will see about this.

bq. Unit test could be moved to TestTezJobs. At some point we probably need to get rid of a lot of the TestMRR* minicluster tests.
I am assuming you mean reimplementing the test in a non-MR way rather than just moving the code over, so I will approach this comment from that perspective.


was (Author: jeagles):
bq. generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an abstract function. Given that CartesianProductEdgeManager needs changing this is an incompatible feature. An appropriate exception thrown could be used to indicate that the EM plugin in use does not support the failure threshold percent feature.
If we strictly limit this feature to know tez outputs, we can avoid empty event generation at this time in the edge manager plugin and can promote that to the edge.

bq. I think we can add a fail-safe in the edge plugins to generate the events only for known outputs (maybe if they belong the tez runtime package ? )
I add exception throwing to the Edge to restrict this to org.apache.tez outputs only

bq. i.e. if someone ends up writing a new output that uses a different payload we would need to throw an error at least with the current impl though we do need to figure out how the EM plugin can invoke an empty event that the Input understands. One option here would be to enhance the DME meta info to indicate empty/null payload or invoke an api on the Output to generate the empty data event.
I think this is aimed at how to implement this completely generically and should go into a follow up JIRA if we are using this jira to implement a stop-gap until a full blow implementation can be finished.

bq. As for event generation, I have a doubt with respect to recovery given that we expect all DME events to be generated before a task completes. This might be something to test more carefully on recovery to see if events are generated correctly as needed when a failed vertex is recovered or replayed as needed.
Will see about this.

bq. Unit test could be moved to TestTezJobs. At some point we probably need to get rid of a lot of the TestMRR* minicluster tests.
I am assuming you mean to reimplement in a non-mr way and not to just move the code over and so will approach this comment from that perspective.

> Provide mapreduce failures.maxpercent equivalent
> ------------------------------------------------
>
>                 Key: TEZ-3271
>                 URL: https://issues.apache.org/jira/browse/TEZ-3271
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, TEZ-3271.6.patch, TEZ-3271.7.patch
>
>
> There is a certain category of work that does not need 100% of its tasks to succeed for the work to be considered a success. To that end, I propose we provide a Tez equivalent of mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent. In this way, a vertex will be considered a success if the number of failures is below a configured threshold.
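
The threshold rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the class FailureThreshold and its method are hypothetical, the threshold is taken as a percentage of the vertex's total tasks, and a failure count exactly at the threshold is treated as acceptable (the description says "below", so the boundary behavior is an assumption).

```java
/** Hypothetical helper (not part of Tez): decides whether failures stay within a max percent. */
final class FailureThreshold {
  private FailureThreshold() {}

  /**
   * Returns true if the failed tasks do not exceed maxFailuresPercent of totalTasks.
   * Assumption: a vertex with no tasks is trivially within the threshold.
   */
  static boolean withinThreshold(int failedTasks, int totalTasks, double maxFailuresPercent) {
    if (totalTasks <= 0) {
      return true;
    }
    // Compare failed/total against percent/100 without dividing.
    return failedTasks * 100.0 <= maxFailuresPercent * totalTasks;
  }
}
```

For example, with a threshold of 5 percent and 100 tasks, up to 5 failed tasks would still let the vertex succeed under this sketch, while 6 would not.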



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)