You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Anthony Brew <at...@gmail.com> on 2016/02/25 00:06:10 UTC

How to Exploding a Map[String,Int] column in a DataFrame (Scala)

Hi,
     I have a Dataframe containing a column with a map Map[A,B] with
multiple values. I want to explode the key,value pairs in the map into a
new column, actually planing to create 2 new cols.

My plan had been

- explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
- new col temp to temp.key
- new col temp to temp.value
- drop temp

But I am failing at the first hurdle.

For example my data looks a bit like like....

scala> test.show()
+----------------------------+------------------------------------------+
|         id                       |      brand
        |
+----------------------------+------------------------------------------+
|a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2,  ...|


but I want to get to

scala> test.show()
+-----------------------------+------------------------------------------+
|         id                        |      brand_key    | brand_count   |
+-----------------------------+------------------------------------------+
| a02d1fa5d87dce6a7...|           Vans       |   1                   |
| a02d1fa5d87dce6a7...|           Versace  |    2                  |


Any suggestions would be appreciated.

Thanks,
Anthony

Re: How to Exploding a Map[String,Int] column in a DataFrame (Scala)

Posted by Michael Armbrust <mi...@databricks.com>.
You can do this using the explode function defined in
org.apache.spark.sql.functions.

Here is some example code
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1075277772969587/2840265927289860/fca8d7c609.html>
.


On Wed, Feb 24, 2016 at 3:06 PM, Anthony Brew <at...@gmail.com> wrote:

> Hi,
>      I have a Dataframe containing a column with a map Map[A,B] with
> multiple values. I want to explode the key,value pairs in the map into a
> new column, actually planing to create 2 new cols.
>
> My plan had been
>
> - explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
> - new col temp to temp.key
> - new col temp to temp.value
> - drop temp
>
> But I am failing at the first hurdle.
>
> For example my data looks a bit like like....
>
> scala> test.show()
> +----------------------------+------------------------------------------+
> |         id                       |      brand
>           |
> +----------------------------+------------------------------------------+
> |a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2,  ...|
>
>
> but I want to get to
>
> scala> test.show()
> +-----------------------------+------------------------------------------+
> |         id                        |      brand_key    | brand_count   |
> +-----------------------------+------------------------------------------+
> | a02d1fa5d87dce6a7...|           Vans       |   1                   |
> | a02d1fa5d87dce6a7...|           Versace  |    2                  |
>
>
> Any suggestions would be appreciated.
>
> Thanks,
> Anthony
>
>
>

Re: How to Exploding a Map[String,Int] column in a DataFrame (Scala)

Posted by Anthony Brew <at...@gmail.com>.
Thanks guys both those answers are really helpful. Really appreciate this.
On 25 Feb 2016 12:05 a.m., "Michał Zieliński" <zi...@gmail.com>
wrote:

> Hi Anthony,
>
> Hopefully that's what you wanted :)
>
> case class Example(id:String, myMap: Map[String,Int])
>> val myDF = sqlContext.createDataFrame(
>>   Seq(
>>     Example("a02d1fa5d87dce6a7", Map("Vans" -> 1, "Versace" ->2))
>>   )
>> )
>> myDF.
>>   explode("myMap","tempSeq")((x: Map[String,Int]) => x.toSeq).
>>   withColumn("brand_key",col("tempSeq")("_1")).
>>   withColumn("brand_count",col("tempSeq")("_2")).
>>   drop("tempSeq").
>>   show(false)
>
>
> This gives us:
>
> +-----------------+----------------------------+---------+-----------+
>> |id               |myMap                       |brand_key|brand_count|
>> +-----------------+----------------------------+---------+-----------+
>> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Vans     |1          |
>> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Versace  |2          |
>> +-----------------+----------------------------+---------+-----------+
>
>
>
> Cheers,
> Michael
>
> On 25 February 2016 at 00:06, Anthony Brew <at...@gmail.com> wrote:
>
>> Hi,
>>      I have a Dataframe containing a column with a map Map[A,B] with
>> multiple values. I want to explode the key,value pairs in the map into a
>> new column, actually planing to create 2 new cols.
>>
>> My plan had been
>>
>> - explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
>> - new col temp to temp.key
>> - new col temp to temp.value
>> - drop temp
>>
>> But I am failing at the first hurdle.
>>
>> For example my data looks a bit like like....
>>
>> scala> test.show()
>> +----------------------------+------------------------------------------+
>> |         id                       |      brand
>>           |
>> +----------------------------+------------------------------------------+
>> |a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2,  ...|
>>
>>
>> but I want to get to
>>
>> scala> test.show()
>> +-----------------------------+------------------------------------------+
>> |         id                        |      brand_key    | brand_count   |
>> +-----------------------------+------------------------------------------+
>> | a02d1fa5d87dce6a7...|           Vans       |   1                   |
>> | a02d1fa5d87dce6a7...|           Versace  |    2                  |
>>
>>
>> Any suggestions would be appreciated.
>>
>> Thanks,
>> Anthony
>>
>>
>>
>

Re: How to Exploding a Map[String,Int] column in a DataFrame (Scala)

Posted by Michał Zieliński <zi...@gmail.com>.
Hi Anthony,

Hopefully that's what you wanted :)

case class Example(id:String, myMap: Map[String,Int])
> val myDF = sqlContext.createDataFrame(
>   Seq(
>     Example("a02d1fa5d87dce6a7", Map("Vans" -> 1, "Versace" ->2))
>   )
> )
> myDF.
>   explode("myMap","tempSeq")((x: Map[String,Int]) => x.toSeq).
>   withColumn("brand_key",col("tempSeq")("_1")).
>   withColumn("brand_count",col("tempSeq")("_2")).
>   drop("tempSeq").
>   show(false)


This gives us:

+-----------------+----------------------------+---------+-----------+
> |id               |myMap                       |brand_key|brand_count|
> +-----------------+----------------------------+---------+-----------+
> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Vans     |1          |
> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Versace  |2          |
> +-----------------+----------------------------+---------+-----------+



Cheers,
Michael

On 25 February 2016 at 00:06, Anthony Brew <at...@gmail.com> wrote:

> Hi,
>      I have a Dataframe containing a column with a map Map[A,B] with
> multiple values. I want to explode the key,value pairs in the map into a
> new column, actually planing to create 2 new cols.
>
> My plan had been
>
> - explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
> - new col temp to temp.key
> - new col temp to temp.value
> - drop temp
>
> But I am failing at the first hurdle.
>
> For example my data looks a bit like like....
>
> scala> test.show()
> +----------------------------+------------------------------------------+
> |         id                       |      brand
>           |
> +----------------------------+------------------------------------------+
> |a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2,  ...|
>
>
> but I want to get to
>
> scala> test.show()
> +-----------------------------+------------------------------------------+
> |         id                        |      brand_key    | brand_count   |
> +-----------------------------+------------------------------------------+
> | a02d1fa5d87dce6a7...|           Vans       |   1                   |
> | a02d1fa5d87dce6a7...|           Versace  |    2                  |
>
>
> Any suggestions would be appreciated.
>
> Thanks,
> Anthony
>
>
>