You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Darin McBeath <dd...@yahoo.com.INVALID> on 2014/11/11 17:40:58 UTC

What should be the number of partitions after a union and a subtractByKey

Assume the following where both updatePairRDD and deletePairRDD are both HashPartitioned.  Before the union, each one of these has 512 partitions.   The new created updateDeletePairRDD has 1024 partitions.  Is this the general/expected behavior for a union (the number of partitions to double)?
JavaPairRDD<String,String> updateDeletePairRDD = updatePairRDD.union(deletePairRDD);
Then a similar question for subtractByKey.  In the example below, baselinePairRDD is HashPartitioned (with 512 partitions).  We know from above that updateDeletePairRDD has 1024 partitions.  The newly created workSubtractBaselinePairRDD has 512 partitions.  This makes sense because we are only 'subtracting' records from the baselinePairRDD and one wouldn't think the number of partitions would increase.  Is this the general/expected behavior for a subractByKey?

JavaPairRDD<String,String> workSubtractBaselinePairRDD = baselinePairRDD.subtractByKey(updateDeletePairRDD);