Posted to user@spark.apache.org by Guy Harmach <Gu...@Amdocs.com> on 2017/07/31 06:48:39 UTC

Running several spark actions in parallel

Hi,

I need to run a batch job, written in Java, that executes several SQL statements on different Hive tables and then processes each partition of the result sets in a foreachPartition() operator.
I'd like to run these actions in parallel.
I saw there are two approaches for achieving this:

1. Using the java.util.concurrent package, e.g. Future/ForkJoinPool.

2. Transforming my Dataset to JavaRDD<Row> and using foreachPartitionAsync() on the RDD.

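For what it's worth, option 1 boils down to submitting each independent Spark action from its own driver-side thread. Below is a minimal sketch of that pattern using an ExecutorService. The table names and the work done per "table" are placeholders standing in for real Spark calls (e.g. spark.sql(...).toJavaRDD().foreachPartition(...)); only the concurrency structure is the point here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelActions {

    // Placeholder for one Spark action per table; in the real job this
    // would run the SQL and the foreachPartition() processing, then
    // return something like a processed-row count.
    static long processTable(String table) {
        return table.length();
    }

    // Submits one action per table to a thread pool and waits for all
    // of them, so the actions run as concurrent Spark jobs.
    public static List<Long> runAll(List<String> tables) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tables.size());
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String t : tables) {
                futures.add(pool.submit(() -> processTable(t)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that action finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Option 2 gets you something similar without managing threads yourself: foreachPartitionAsync() returns a JavaFutureAction you can collect and then wait on. Either way, the jobs share the same SparkContext, so if you want them to share executors fairly rather than run FIFO, look at spark.scheduler.mode=FAIR.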
Can you please recommend the best way to achieve this using one of these options, or suggest a better approach?

Thanks, Guy