Posted to user@spark.apache.org by domibd <db...@lipn.univ-paris13.fr> on 2016/01/04 13:05:56 UTC

stopping a process using an RDD

Hello,

Is there a way to stop, under a condition, a process (like map-reduce) that is
using an RDD?

(This could be useful if the process does not always need to
 explore the whole RDD.)

thanks

Dominique





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/stopping-a-process-usgin-an-RDD-tp25870.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: stopping a process using an RDD

Posted by Michael Segel <ms...@hotmail.com>.
Not really a good idea.

It breaks the paradigm.

If I understand the OP’s idea, they want to halt processing of the RDD, but not the entire job.
So when a certain condition is hit, the current task stops, yet processing continues with the next partition (assuming you have more partitions than you have task ’slots’). The catch is that if enough tasks fail, your job fails, meaning you don’t get any results.

The best you could do is a no-op. That is, if your condition is met in a partition, your map/reduce code emits nothing more to the collection, so no more data is added to the result set.
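
A rough sketch of what that no-op could look like with mapPartitions (the
names stopCondition and process are made up purely for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object NoOpSketch {
  // Hypothetical per-record stop condition and transformation.
  def stopCondition(x: Int): Boolean = x > 100
  def process(x: Int): Int = x * 2

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("noop-sketch").setMaster("local[*]"))
    val rdd = sc.parallelize(1 to 1000, numSlices = 8)

    // takeWhile stops pulling from the partition's iterator once the
    // condition is met, so the rest of that partition is skipped and
    // nothing more is emitted, without failing the task.
    val result = rdd.mapPartitions { iter =>
      iter.takeWhile(x => !stopCondition(x)).map(process)
    }

    println(result.count())
    sc.stop()
  }
}

Note that this only cuts the remainder of each partition short; the other
partitions still run until they hit the condition (or the end) themselves.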

The whole paradigm is to process the entire RDD at a time.

You may spin some wasted cycles, but that’s not really a bad thing.

HTH

-Mike

> On Jan 4, 2016, at 6:45 AM, Daniel Darabos <da...@lynxanalytics.com> wrote:
> 
> You can cause a failure by throwing an exception in the code running on the executors. The task will be retried (if spark.task.maxFailures > 1), and then the stage is failed. No further tasks are processed after that, and an exception is thrown on the driver. You could catch the exception and see if it was caused by your own special exception.


Re: stopping a process using an RDD

Posted by Daniel Darabos <da...@lynxanalytics.com>.
You can cause a failure by throwing an exception in the code running on the
executors. The task will be retried (if spark.task.maxFailures > 1), and
then the stage is failed. No further tasks are processed after that, and an
exception is thrown on the driver. You could catch the exception and see if
it was caused by your own special exception.
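
A minimal sketch of this exception-based early stop (MyStopException and
shouldStop are made-up names; depending on the Spark version the original
exception may only show up in the text of the driver-side SparkException,
so that is what is checked here):

import org.apache.spark.{SparkConf, SparkContext, SparkException}

// Hypothetical marker exception thrown from executor code when the stop
// condition is met.
class MyStopException(msg: String) extends RuntimeException(msg)

object EarlyStopSketch {
  def shouldStop(x: Int): Boolean = x > 100  // made-up condition

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("early-stop-sketch")
      .setMaster("local[*]")
      .set("spark.task.maxFailures", "1") // fail the stage on the first task failure
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 1000)

    try {
      rdd.foreach { x =>
        if (shouldStop(x)) throw new MyStopException(s"stop condition hit at $x")
        // ... normal per-record work would go here ...
      }
    } catch {
      // The failed job surfaces on the driver as a SparkException; check
      // whether our own marker exception caused it.
      case e: SparkException
          if Option(e.getMessage).exists(_.contains("MyStopException")) =>
        println("Job stopped early by the stop condition")
    } finally {
      sc.stop()
    }
  }
}

With retries enabled (spark.task.maxFailures > 1) the failing task would be
re-run before the stage is finally aborted, so the marker exception may be
thrown several times before the driver sees the failure.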
