Posted to user@spark.apache.org by AnilKumar B <ak...@gmail.com> on 2017/06/14 18:01:28 UTC

Configurable task-level timeouts and task failures

Hi,

We use Spark for some of our data science use cases, such as predictions.
Most of the time we ran into data skew issues, which we fixed by
redistributing keys across the partitions/tasks using Murmur hashing or
round-robin assignment.
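
To give an idea, below is a simplified sketch of the kind of key salting we
do. The class and variable names are placeholders, and numSalts is an
assumed fan-out per hot key, not a value from our actual jobs.

    import org.apache.spark.sql.SparkSession
    import scala.util.hashing.MurmurHash3

    object SaltedAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("salting-sketch").getOrCreate()
        val sc = spark.sparkContext

        val numSalts = 16  // assumed fan-out per key; tuned per job
        val records = sc.parallelize(Seq(("hotKey", 1), ("hotKey", 2), ("otherKey", 3)))

        // Attach a Murmur-hash-based salt so a single hot key is spread
        // over several partitions instead of landing in one task.
        val salted = records.map { case (k, v) =>
          val salt = (MurmurHash3.stringHash(v.toString) & Int.MaxValue) % numSalts
          ((k, salt), v)
        }

        // Aggregate per salted key first, then strip the salt and combine.
        val partial = salted.reduceByKey(_ + _)
        val result  = partial.map { case ((k, _), v) => (k, v) }.reduceByKey(_ + _)

        result.collect().foreach(println)
        spark.stop()
      }
    }

Round-robin assignment works the same way, just with a rotating counter in
place of the hash.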

Even so, some tasks still take a very long time because of their logical
flow, which depends on the nature of the data for a particular key. For our
use cases we are OK with omitting a few tasks if they cannot complete within
a certain amount of time.

That's why we implemented task-level timeouts: the job still succeeds even
when some tasks do not complete within the defined time, and this lets us
define SLAs for our Spark applications. A rough sketch of the approach is
below.
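
The sketch below shows one way to do it, roughly along the lines of what we
built: the per-key work is wrapped in a Future and keys that miss the time
budget are dropped, so the stage still finishes. Names like expensiveModel
and the 30-second budget are placeholders, not our real code.

    import java.util.concurrent.TimeoutException
    import org.apache.spark.sql.SparkSession
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    object PerKeyTimeout {
      // Placeholder for the long-running per-key computation.
      def expensiveModel(key: String, values: Iterable[Int]): Int = values.sum

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("timeout-sketch").getOrCreate()
        val sc = spark.sparkContext
        val budget = 30.seconds  // assumed per-key SLA

        val grouped = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3))).groupByKey()

        // Run each key's work in a Future and drop keys that exceed the budget,
        // so the task (and the job) still completes.
        val scored = grouped.mapPartitions { iter =>
          iter.flatMap { case (key, values) =>
            val work = Future(expensiveModel(key, values))
            try Iterator.single((key, Await.result(work, budget)))
            catch { case _: TimeoutException => Iterator.empty }
          }
        }

        scored.collect().foreach(println)
        spark.stop()
      }
    }

One caveat with this kind of wrapper: Await.result only stops waiting, it
does not cancel the underlying computation, so the slow work keeps running
on an executor thread until it finishes. It is also a per-key budget inside
a task rather than a true per-task timeout, which is part of why a
framework-level mechanism would be nicer.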

Is there any mechanism in the Spark framework to define task-level timeouts
and to mark the job successful when only x% of the tasks succeed (where x
can be configured)? Has anyone else faced such issues?


Thanks & Regards,
B Anil Kumar.