Posted to user@spark.apache.org by Kostas Kougios <ko...@googlemail.com> on 2015/07/28 16:58:22 UTC

sc.parallelize to work more like a producer/consumer?

Hi, I am using sc.parallelize(...) with about 32k items, several times within
one job. Each executor takes a varying amount of time to process its items,
so some executors finish quickly and sit idle until the others catch up. Only
after all executors have completed the first 32k batch is the next batch
sent for processing.

Is there a way to make it work more like a producer/consumer?
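
For concreteness, here is a minimal sketch of the pattern I mean (the item
counts, class name and process function are placeholders, not my real job):

    import org.apache.spark.{SparkConf, SparkContext}

    object BatchedJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("batched-job"))

        // Placeholder for the real per-item work; runtime varies per item.
        val process = (item: Int) => { Thread.sleep(item % 100); item }

        // Each parallelize + count pair is a separate Spark job: every task
        // of one 32k batch must finish before the next batch is submitted,
        // which leaves fast executors idle at the tail of each batch.
        for (batch <- (1 to 96000).grouped(32000))
          sc.parallelize(batch.toSeq).map(process).count()

        sc.stop()
      }
    }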

Thanks



Re: sc.parallelize to work more like a producer/consumer?

Posted by Kostas Kougios <ko...@googlemail.com>.
There is a workaround:

    sc.parallelize(items, items.size / 2)

This way each partition holds only 2 items, and since there are now far more
tasks than executor cores, Spark hands each executor a fresh 2-item task as
soon as it finishes one, simulating a producer/consumer. With items.size / 4,
each partition gets 4 items.
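
Putting it together, a minimal sketch (the class name, item values and
process function are invented for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object ManyPartitionsJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("many-partitions"))

        val items = (1 to 32000).toVector
        val process = (item: Int) => { Thread.sleep(item % 100); item }

        // Requesting items.size / 2 partitions puts ~2 items in each task.
        // With far more tasks than executor cores, the scheduler hands each
        // core a new small task as soon as it finishes the previous one,
        // approximating a producer/consumer queue.
        sc.parallelize(items, items.size / 2).map(process).count()

        sc.stop()
      }
    }

Note that very small partitions add per-task scheduling overhead, so the
divisor is a trade-off between keeping executors busy and not drowning the
scheduler in tiny tasks.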


