Posted to dev@spark.apache.org by "Chawla,Sumit " <su...@gmail.com> on 2017/01/31 17:08:22 UTC
Unique Partition Id per partition
Hi All
I have an RDD, which I partition based on some key, and then call sc.runJob
for each partition.
Inside this function, I assign each partition a unique key using the following:
"%s_%s" % (id(part), int(round(time.time())))
This is to make sure that each partition produces separate bookkeeping records,
which can be aggregated by an external system. However, I sometimes notice
multiple partition results pointing to the same partition_id. Is this an issue
with the way the above code is serialized by PySpark? What's the best way to
define a unique id for each partition? I understand that the same executor is
getting multiple partitions to process, but I would expect the above code to
produce a unique id for each partition.
Regards
Sumit Chawla
Re: Unique Partition Id per partition
Posted by Michael Allman <mi...@videoamp.com>.
Hi Sumit,
Can you use RDD.mapPartitionsWithIndex (http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=rdd#pyspark.RDD.mapPartitionsWithIndex) to solve your problem?
Michael
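
A minimal sketch of how the mapPartitionsWithIndex suggestion might be applied, not code from the thread. One plausible cause of the collisions is that id() in CPython is a memory address that can be reused once an object is freed, and the timestamp is rounded to whole seconds, so two partitions handled by the same Python worker can end up with the same string. The partition index passed in by mapPartitionsWithIndex is unique within an RDD. The names make_tagger, tag_partition and job_id are hypothetical, and the uuid prefix is an assumption added to keep ids distinct across separate runs.

```python
import uuid


def make_tagger(job_id):
    """Return a function with the (index, iterator) signature that
    RDD.mapPartitionsWithIndex expects."""

    def tag_partition(index, iterator):
        # The partition index is unique within one RDD; the job_id prefix
        # (a hypothetical addition) separates one run from another.
        partition_id = "%s_%d" % (job_id, index)
        for record in iterator:
            yield (partition_id, record)

    return tag_partition


# Generate the run-level prefix once, on the driver.
job_id = uuid.uuid4().hex
tagger = make_tagger(job_id)

# With a live SparkContext this would be used roughly as:
#   tagged = rdd.mapPartitionsWithIndex(tagger)
# Here the function is called directly on two fake partitions instead.
part0 = list(tagger(0, iter(["a", "b"])))
part1 = list(tagger(1, iter(["c"])))
```

Every record in a partition carries the same id, and different partitions get different ids because their indices differ, regardless of which executor or worker process handles them.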