You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Anthony Brew <at...@gmail.com> on 2016/02/25 00:06:10 UTC
How to Exploding a Map[String,Int] column in a DataFrame (Scala)
Hi,
I have a Dataframe containing a column with a map Map[A,B] with
multiple values. I want to explode the key,value pairs in the map into a
new column, actually planing to create 2 new cols.
My plan had been
- explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
- new col temp to temp.key
- new col temp to temp.value
- drop temp
But I am failing at the first hurdle.
For example my data looks a bit like like....
scala> test.show()
+----------------------------+------------------------------------------+
| id | brand
|
+----------------------------+------------------------------------------+
|a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2, ...|
but I want to get to
scala> test.show()
+-----------------------------+------------------------------------------+
| id | brand_key | brand_count |
+-----------------------------+------------------------------------------+
| a02d1fa5d87dce6a7...| Vans | 1 |
| a02d1fa5d87dce6a7...| Versace | 2 |
Any suggestions would be appreciated.
Thanks,
Anthony
Re: How to Exploding a Map[String,Int] column in a DataFrame (Scala)
Posted by Michael Armbrust <mi...@databricks.com>.
You can do this using the explode function defined in
org.apache.spark.sql.functions.
Here is some example code
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1075277772969587/2840265927289860/fca8d7c609.html>
.
On Wed, Feb 24, 2016 at 3:06 PM, Anthony Brew <at...@gmail.com> wrote:
> Hi,
> I have a Dataframe containing a column with a map Map[A,B] with
> multiple values. I want to explode the key,value pairs in the map into a
> new column, actually planing to create 2 new cols.
>
> My plan had been
>
> - explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
> - new col temp to temp.key
> - new col temp to temp.value
> - drop temp
>
> But I am failing at the first hurdle.
>
> For example my data looks a bit like like....
>
> scala> test.show()
> +----------------------------+------------------------------------------+
> | id | brand
> |
> +----------------------------+------------------------------------------+
> |a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2, ...|
>
>
> but I want to get to
>
> scala> test.show()
> +-----------------------------+------------------------------------------+
> | id | brand_key | brand_count |
> +-----------------------------+------------------------------------------+
> | a02d1fa5d87dce6a7...| Vans | 1 |
> | a02d1fa5d87dce6a7...| Versace | 2 |
>
>
> Any suggestions would be appreciated.
>
> Thanks,
> Anthony
>
>
>
Re: How to Exploding a Map[String,Int] column in a DataFrame (Scala)
Posted by Anthony Brew <at...@gmail.com>.
Thanks guys both those answers are really helpful. Really appreciate this.
On 25 Feb 2016 12:05 a.m., "Michał Zieliński" <zi...@gmail.com>
wrote:
> Hi Anthony,
>
> Hopefully that's what you wanted :)
>
> case class Example(id:String, myMap: Map[String,Int])
>> val myDF = sqlContext.createDataFrame(
>> Seq(
>> Example("a02d1fa5d87dce6a7", Map("Vans" -> 1, "Versace" ->2))
>> )
>> )
>> myDF.
>> explode("myMap","tempSeq")((x: Map[String,Int]) => x.toSeq).
>> withColumn("brand_key",col("tempSeq")("_1")).
>> withColumn("brand_count",col("tempSeq")("_2")).
>> drop("tempSeq").
>> show(false)
>
>
> This gives us:
>
> +-----------------+----------------------------+---------+-----------+
>> |id |myMap |brand_key|brand_count|
>> +-----------------+----------------------------+---------+-----------+
>> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Vans |1 |
>> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Versace |2 |
>> +-----------------+----------------------------+---------+-----------+
>
>
>
> Cheers,
> Michael
>
> On 25 February 2016 at 00:06, Anthony Brew <at...@gmail.com> wrote:
>
>> Hi,
>> I have a Dataframe containing a column with a map Map[A,B] with
>> multiple values. I want to explode the key,value pairs in the map into a
>> new column, actually planing to create 2 new cols.
>>
>> My plan had been
>>
>> - explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
>> - new col temp to temp.key
>> - new col temp to temp.value
>> - drop temp
>>
>> But I am failing at the first hurdle.
>>
>> For example my data looks a bit like like....
>>
>> scala> test.show()
>> +----------------------------+------------------------------------------+
>> | id | brand
>> |
>> +----------------------------+------------------------------------------+
>> |a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2, ...|
>>
>>
>> but I want to get to
>>
>> scala> test.show()
>> +-----------------------------+------------------------------------------+
>> | id | brand_key | brand_count |
>> +-----------------------------+------------------------------------------+
>> | a02d1fa5d87dce6a7...| Vans | 1 |
>> | a02d1fa5d87dce6a7...| Versace | 2 |
>>
>>
>> Any suggestions would be appreciated.
>>
>> Thanks,
>> Anthony
>>
>>
>>
>
Re: How to Exploding a Map[String,Int] column in a DataFrame (Scala)
Posted by Michał Zieliński <zi...@gmail.com>.
Hi Anthony,
Hopefully that's what you wanted :)
case class Example(id:String, myMap: Map[String,Int])
> val myDF = sqlContext.createDataFrame(
> Seq(
> Example("a02d1fa5d87dce6a7", Map("Vans" -> 1, "Versace" ->2))
> )
> )
> myDF.
> explode("myMap","tempSeq")((x: Map[String,Int]) => x.toSeq).
> withColumn("brand_key",col("tempSeq")("_1")).
> withColumn("brand_count",col("tempSeq")("_2")).
> drop("tempSeq").
> show(false)
This gives us:
+-----------------+----------------------------+---------+-----------+
> |id |myMap |brand_key|brand_count|
> +-----------------+----------------------------+---------+-----------+
> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Vans |1 |
> |a02d1fa5d87dce6a7|Map(Vans -> 1, Versace -> 2)|Versace |2 |
> +-----------------+----------------------------+---------+-----------+
Cheers,
Michael
On 25 February 2016 at 00:06, Anthony Brew <at...@gmail.com> wrote:
> Hi,
> I have a Dataframe containing a column with a map Map[A,B] with
> multiple values. I want to explode the key,value pairs in the map into a
> new column, actually planing to create 2 new cols.
>
> My plan had been
>
> - explode "input": Map[K,V] to "temp":Iterable[Map[K,V]]
> - new col temp to temp.key
> - new col temp to temp.value
> - drop temp
>
> But I am failing at the first hurdle.
>
> For example my data looks a bit like like....
>
> scala> test.show()
> +----------------------------+------------------------------------------+
> | id | brand
> |
> +----------------------------+------------------------------------------+
> |a02d1fa5d87dce6a7...|Map(Vans -> 1, Versace ->2, ...|
>
>
> but I want to get to
>
> scala> test.show()
> +-----------------------------+------------------------------------------+
> | id | brand_key | brand_count |
> +-----------------------------+------------------------------------------+
> | a02d1fa5d87dce6a7...| Vans | 1 |
> | a02d1fa5d87dce6a7...| Versace | 2 |
>
>
> Any suggestions would be appreciated.
>
> Thanks,
> Anthony
>
>
>