Posted to user@spark.apache.org by Donni Khan <pr...@googlemail.com> on 2018/04/20 11:19:22 UTC

assign one identifier for all rows that have similar value in RDD

Hi Spark Users,

I want to add one identifier for all rows that have similar values in a
specific column.
I have a Spark RDD containing "ID_Nr" and "Col", and I want to assign
one identifier to all rows that have a similar value in "Col".

Does anyone have an idea (or code) for how to do that?

Thank you,

Re: assign one identifier for all rows that have similar value in RDD

Posted by Yong Zhang <ja...@hotmail.com>.
Look up Spark window functions and the first_value function.


Yong
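
A minimal sketch of the window-function idea, not taken from the thread: it runs the SQL through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer, which added window functions) so it works without a Spark cluster. The table name `t`, the alias `group_id`, and the sample rows are hypothetical. Spark SQL also provides `first_value` and the `OVER (PARTITION BY ...)` clause, so a similar query should be usable there after registering the data as a temporary view, but this sketch has not been run against Spark.

```python
import sqlite3

# Hypothetical sample data: (ID_Nr, Col) pairs like those described in the question.
rows = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")]

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE t (ID_Nr INTEGER, Col TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Every row in a "Col" partition receives the smallest ID_Nr of that partition,
# so all rows sharing a Col value end up with the same group_id.
query = """
    SELECT ID_Nr, Col,
           first_value(ID_Nr) OVER (PARTITION BY Col ORDER BY ID_Nr) AS group_id
    FROM t
    ORDER BY ID_Nr
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that this groups rows whose "Col" values are exactly equal; if "similar" means something looser, the values would need to be normalized first.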



Re: assign one identifier for all rows that have similar value in RDD

Posted by "Bowden, Chris" <ch...@microfocus.com>.
Just hash the column value.


-Chris
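
One way the hashing suggestion could look, as a sketch under assumptions: the helper name `hash_col` and the sample rows are hypothetical, and SHA-256 is just one possible choice of hash. The function is plain Python, so it could be applied per row with an RDD `map`. Python's built-in `hash()` is avoided here because string hashes are randomized per process by default, so separate worker processes would not agree on the result.

```python
import hashlib

def hash_col(value: str) -> str:
    """Derive a deterministic identifier from a column value (hypothetical helper)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Hypothetical sample data: (ID_Nr, Col) pairs.
rows = [(1, "a"), (2, "b"), (3, "a")]

# Append the identifier to each row; equal Col values yield equal identifiers.
with_ids = [(id_nr, col, hash_col(col)) for id_nr, col in rows]
for row in with_ids:
    print(row)
```

Hashing only gives the same identifier to identical values, and the identifiers are long hex strings rather than small consecutive integers.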


Re: assign one identifier for all rows that have similar value in RDD

Posted by Vadim Semenov <va...@datadoghq.com>.
Create another RDD with a one-to-one mapping Col -> Id, and then join on it?
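
The lookup-and-join idea can be modeled locally as follows. This is a plain-Python sketch with hypothetical sample rows and names, not Spark code. On an actual RDD, the lookup could presumably be built with `distinct()` followed by `zipWithIndex()` or `zipWithUniqueId()`, and the rows keyed by "Col" could then be combined with it through `join()`.

```python
# Hypothetical sample data: (ID_Nr, Col) pairs.
rows = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")]

# Step 1: build the one-to-one lookup Col -> Id from the distinct values.
distinct_cols = sorted({col for _, col in rows})
col_to_id = {col: idx for idx, col in enumerate(distinct_cols)}

# Step 2: "join" every row to the lookup on Col.
joined = [(id_nr, col, col_to_id[col]) for id_nr, col in rows]
for row in joined:
    print(row)
```

Compared with hashing, this produces compact consecutive identifiers at the cost of an extra pass over the data and a join.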
