Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2020/02/28 12:56:10 UTC

Compute the Hash of each row in new column

Hi Spark Users,
How can I compute a hash of each row and store it in a new column of a
DataFrame? Could someone help me?

Thanks

Re: Compute the Hash of each row in new column

Posted by Chetan Khatri <ch...@gmail.com>.
Thanks Enrico. I meant a single hash per row in an extra column,
something like this:

val newDs = typedRows.withColumn("hash", hash(typedRows.columns.map(col): _*))
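
For anyone finding this thread later, here is a fuller sketch of the one-hash-per-row idea. The object name RowHash, the helper names, the "|" separator and the sample data are illustrative assumptions, not something from the thread. It relies only on hash, sha2 and concat_ws from org.apache.spark.sql.functions. Note that hash returns a 32-bit integer, so collisions become likely on large datasets; the sha2 variant is one possible alternative when a longer digest is wanted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, hash, sha2}

object RowHash {
  // Appends "hash": a 32-bit integer hash computed over every column of the row.
  def withRowHash(df: DataFrame): DataFrame =
    df.withColumn("hash", hash(df.columns.map(col): _*))

  // Appends "sha256": a hex digest of all values cast to string and joined by "|".
  // Caveat: concat_ws skips nulls, so rows that differ only in where a null
  // sits can end up with the same digest.
  def withRowSha256(df: DataFrame): DataFrame =
    df.withColumn(
      "sha256",
      sha2(concat_ws("|", df.columns.map(c => col(c).cast("string")): _*), 256))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("row-hash").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for `typedRows`.
    val typedRows = Seq((1, "a", 1.5), (2, "b", 2.5)).toDF("id", "name", "score")

    withRowHash(typedRows).show(false)
    withRowSha256(typedRows).show(false)
    spark.stop()
  }
}
```

Both helpers take the column list from the DataFrame itself, so they should keep working when columns are added, but the resulting hash values change whenever the column set or column order changes.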

On Mon, Mar 2, 2020 at 3:51 PM Enrico Minack <ma...@enrico.minack.dev> wrote:

> Well, then apply md5 on all columns:
>
> ds.select(ds.columns.map(col) ++ ds.columns.map(column =>
> md5(col(column)).as(s"$column hash")): _*).show(false)
>
> Enrico

Re: Compute the Hash of each row in new column

Posted by Enrico Minack <ma...@Enrico.Minack.dev>.
Well, then apply md5 on all columns:

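import org.apache.spark.sql.functions.{col, md5}
// Keep the original columns and append one md5 column per input column.
ds.select(ds.columns.map(col) ++ ds.columns.map(column =>
  md5(col(column)).as(s"$column hash")): _*).show(false)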

Enrico

On 2 Mar 2020 at 11:10, Chetan Khatri wrote:
> Thanks Enrico.
> I want to compute a hash over the values of all columns in each row.


Re: Compute the Hash of each row in new column

Posted by Chetan Khatri <ch...@gmail.com>.
Thanks Enrico.
I want to compute a hash over the values of all columns in each row.

On Fri, Feb 28, 2020 at 7:28 PM Enrico Minack <ma...@enrico.minack.dev>
wrote:

> This computes the md5 hash of a given column id of Dataset ds:
>
> ds.withColumn("id hash", md5($"id")).show(false)
>
> Test with this Dataset ds:
>
> import org.apache.spark.sql.types._
> val ds = spark.range(10).select($"id".cast(StringType))
>
> Available are md5, sha, sha1, sha2 and hash:
> https://spark.apache.org/docs/2.4.5/api/sql/index.html
>
> Enrico

Re: Compute the Hash of each row in new column

Posted by Enrico Minack <ma...@Enrico.Minack.dev>.
This computes the md5 hash of a given column id of Dataset ds:

ds.withColumn("id hash", md5($"id")).show(false)

Test with this Dataset ds:

import org.apache.spark.sql.types._
val ds = spark.range(10).select($"id".cast(StringType))

Available are md5, sha, sha1, sha2 and hash: 
https://spark.apache.org/docs/2.4.5/api/sql/index.html
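
To compare the functions listed above side by side, a small sketch along these lines may help. The object and helper names are made up for illustration, and 256 is just one of the bit lengths sha2 accepts. md5, sha1 and sha2 return hex strings, while hash returns a 32-bit integer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, hash, md5, sha1, sha2}
import org.apache.spark.sql.types.StringType

object ColumnHashes {
  // Returns the chosen column plus one extra column per hash function.
  def hashesOf(df: DataFrame, column: String): DataFrame =
    df.select(
      col(column),
      md5(col(column)).as("md5"),           // 32 hex characters
      sha1(col(column)).as("sha1"),         // 40 hex characters
      sha2(col(column), 256).as("sha2_256"), // 64 hex characters
      hash(col(column)).as("hash")          // 32-bit integer
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("column-hashes").getOrCreate()

    // Same test Dataset as above: ids 0..9 cast to strings.
    val ds = spark.range(10).select(col("id").cast(StringType))

    hashesOf(ds, "id").show(false)
    spark.stop()
  }
}
```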

Enrico


On 28 Feb 2020 at 13:56, Chetan Khatri wrote:
> Hi Spark Users,
> How can I compute a hash of each row and store it in a new column of a
> DataFrame? Could someone help me?
>
> Thanks





Re: Compute the Hash of each row in new column

Posted by Riccardo Ferrari <fe...@gmail.com>.
Hi Chetan,

Would the SQL function `hash` do the trick for your use case?
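
As a rough sketch of what that could look like through a SQL expression: the helper rowHashExpr, the column name row_hash and the sample data are assumptions made for illustration. Column names are backtick-quoted so that names containing spaces still parse; names that themselves contain backticks are not handled here.

```scala
import org.apache.spark.sql.SparkSession

object SqlRowHash {
  // Builds the SQL text "hash(`c1`, `c2`, ...)" from a list of column names.
  def rowHashExpr(columns: Seq[String]): String =
    columns.map(c => s"`$c`").mkString("hash(", ", ", ")")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("sql-row-hash").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // Keep all columns and append the SQL `hash` of the whole row.
    df.selectExpr("*", s"${rowHashExpr(df.columns)} AS row_hash").show(false)
    spark.stop()
  }
}
```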

Best,

On Fri, Feb 28, 2020 at 1:56 PM Chetan Khatri <ch...@gmail.com>
wrote:

> Hi Spark Users,
> How can I compute a hash of each row and store it in a new column of a
> DataFrame? Could someone help me?
>
> Thanks
>