Posted to user@spark.apache.org by "Liu, Raymond" <ra...@intel.com> on 2014/09/01 03:00:08 UTC

RE: The concurrent model of spark job/stage/task

1, 2: As the docs say, jobs run in parallel only "if they were submitted from separate threads", i.e. you fork threads from your main thread and invoke an action in each thread. Jobs and stages are always numbered in order, but that is the order in which they were generated, not necessarily the order in which they execute. In your case, if you just call multiple actions from a single thread, each job blocks until the previous one finishes.
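A minimal Java sketch of the threaded approach (not from this thread; the input/output paths, the local master, and the use of Java 8 lambdas are assumptions for illustration):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelJobs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("parallel-jobs").setMaster("local[4]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("data.txt");   // hypothetical input path

        // Each thread invokes its own action, so the scheduler sees two jobs
        // at the same time and can run them concurrently.
        Thread job1 = new Thread(() -> System.out.println("count = " + lines.count()));
        Thread job2 = new Thread(() -> lines.saveAsTextFile("out"));   // hypothetical output path

        job1.start();
        job2.start();
        job1.join();
        job2.join();

        sc.stop();
    }
}

Whether the two jobs actually overlap also depends on free cores/executors being available; from a single thread the second action would not even be submitted until the first returned.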

3: You can rdd.collect() the data to the driver side if you like, but you may prefer to do that work on the worker side by applying your logic inside transformations.
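For example, a sketch of the two options (again just an illustration, with a made-up MyPojo class; Spark needs it to be Serializable when it is used on the executors):

import org.apache.spark.api.java.JavaRDD;
import java.io.Serializable;
import java.util.List;

public class PojoExample {
    // Hypothetical POJO built from one input line.
    public static class MyPojo implements Serializable {
        public final String value;
        public MyPojo(String line) { this.value = line; }
    }

    public static void process(JavaRDD<String> lines) {
        // Option A: collect to the driver and compute locally
        // (all rows must fit in driver memory).
        List<String> onDriver = lines.collect();
        for (String line : onDriver) {
            MyPojo pojo = new MyPojo(line);
            // ... compute on pojo here, on the driver ...
        }

        // Option B (usually preferable): apply the logic inside a transformation,
        // so the work runs distributed on the executors.
        JavaRDD<MyPojo> pojos = lines.map(MyPojo::new);
        long n = pojos.count();   // an action to actually trigger the work
    }
}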

Best Regards,
Raymond Liu

From: 李华 [mailto:35597813@qq.com] 
Sent: Thursday, August 28, 2014 4:39 PM
To: user
Subject: The concurrent model of spark job/stage/task 

hi, guys

  I am trying to understand how Spark works with respect to its concurrency model. I read the following at https://spark.apache.org/docs/1.0.2/job-scheduling.html

quote
" Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users)."

I searched everywhere but could not find answers to:
1. How do I start 2 or more jobs in one Spark driver, in Java code? I wrote 2 actions in the code, but the jobs are still staged at indexes 0, 1, 2, 3... it looks like they run sequentially.
2. Do the stages run concurrently? They are always numbered in order 0, 1, 2, 3..., as I observed in the Spark stage UI.
3. Can I retrieve the data out of an RDD, e.g. populate a POJO myself and compute on it?

Thanks in advance, guys.
