Posted to user@spark.apache.org by Jianshi Huang <ji...@gmail.com> on 2014/12/07 07:32:30 UTC

Convert RDD[Map[String, Any]] to SchemaRDD

Hi,

What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?

I'm currently converting each Map to a JSON string and then calling
JsonRDD.inferSchema.

How about adding inferSchema support to Map[String, Any] directly? It would
be very useful.

Thanks,
-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

Re: Convert RDD[Map[String, Any]] to SchemaRDD

Posted by Jianshi Huang <ji...@gmail.com>.
Hi Huai,

Exactly. I'll probably implement one using the new data source API when I
have time. I've already found the utility functions in JsonRDD.

Jianshi

On Tue, Dec 9, 2014 at 3:41 AM, Yin Huai <hu...@gmail.com> wrote:

> Hello Jianshi,
>
> You meant you want to convert a Map to a Struct, right? We can extract
> some useful functions from JsonRDD.scala, so others can access them.
>
> Thanks,
>
> Yin

Re: Convert RDD[Map[String, Any]] to SchemaRDD

Posted by Yin Huai <hu...@gmail.com>.
Hello Jianshi,

You meant you want to convert a Map to a Struct, right? We can extract some
useful functions from JsonRDD.scala, so others can access them.
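
A minimal sketch of what "convert a Map to a Struct" could look like, in plain Scala with no Spark dependency. The schema description (`Leaf`, `Struct`) and the names `MapToStruct` and `convert` are hypothetical and exist only for illustration. They are not Spark SQL's `StructType` or `Row` API. The sketch lays out a Map's values in the schema's field order, recurses into nested Maps, and fills any key missing from the Map with null.

```scala
// Illustrative sketch only: a hypothetical schema description, not Spark's StructType/Row API.
object MapToStruct {
  // A field is either a plain value (Leaf) or a nested struct with ordered, named fields.
  sealed trait Schema
  case object Leaf extends Schema
  final case class Struct(fields: Seq[(String, Schema)]) extends Schema

  // Lay a Map's values out in schema order; nested Maps become nested Seqs,
  // and keys missing from the Map become null.
  def convert(schema: Schema, value: Any): Any = (schema, value) match {
    case (Struct(fields), m: Map[_, _]) =>
      val record = m.asInstanceOf[Map[String, Any]]
      fields.map { case (name, fieldSchema) =>
        record.get(name).map(v => convert(fieldSchema, v)).orNull
      }
    // Leaf values, nulls, and anything that does not match the schema pass through unchanged.
    case _ => value
  }
}
```

For example, take a schema with `id` plus a nested `addr` struct holding `city` and `zip`. Under that schema, the map `Map("id" -> 7, "addr" -> Map("city" -> "Tokyo"))` becomes `Seq(7, Seq("Tokyo", null))`. The field order comes from the schema rather than from the Map because a struct row needs its values in a fixed position for every record.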

Thanks,

Yin

On Mon, Dec 8, 2014 at 1:29 AM, Jianshi Huang <ji...@gmail.com>
wrote:

> I checked the source code for inferSchema. Looks like this is exactly what
> I want:
>
>   val allKeys = rdd.map(allKeysWithValueTypes).reduce(_ ++ _)
>
> Then I can do createSchema(allKeys).
>
> Jianshi

Re: Convert RDD[Map[String, Any]] to SchemaRDD

Posted by Jianshi Huang <ji...@gmail.com>.
I checked the source code for inferSchema. It looks like this is exactly
what I want:

  val allKeys = rdd.map(allKeysWithValueTypes).reduce(_ ++ _)

Then I can call createSchema(allKeys).
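
The same two-step idea can be sketched in plain Scala, with a `Seq` standing in for the RDD. First, collect the key/type pairs of each record and merge them with `reduce(_ ++ _)`. Then build a schema from the merged set. `keysWithValueTypes`, `createSchema`, `inferSchema`, and the small `FieldType` hierarchy are hypothetical stand-ins. They are not Spark's internal JsonRDD helpers or Spark SQL's data types. The widening rules are also assumptions of this sketch:

* Long combined with Double gives Double.
* Any other conflict gives String.
* A key seen only as null gives String.

```scala
// Illustrative sketch only: these names mirror the idea above but are NOT
// Spark's internal JsonRDD helpers or Spark SQL's DataType classes.
object MapSchemaInference {
  // Hypothetical, minimal stand-in for Spark SQL data types.
  sealed trait FieldType
  case object LongField extends FieldType
  case object DoubleField extends FieldType
  case object BooleanField extends FieldType
  case object StringField extends FieldType
  case object NullField extends FieldType

  final case class Field(name: String, dataType: FieldType)

  // Classify a single value; unrecognized types fall back to StringField (assumption).
  def typeOf(v: Any): FieldType = v match {
    case null                 => NullField
    case _: Int | _: Long     => LongField
    case _: Float | _: Double => DoubleField
    case _: Boolean           => BooleanField
    case _                    => StringField
  }

  // Per-record step: the set of (key, value type) pairs found in one Map.
  def keysWithValueTypes(record: Map[String, Any]): Set[(String, FieldType)] =
    record.map { case (k, v) => (k, typeOf(v)) }.toSet

  // Resolve two observed types for the same key (assumed widening rules).
  def widen(a: FieldType, b: FieldType): FieldType = (a, b) match {
    case (x, y) if x == y                                    => x
    case (NullField, y)                                      => y
    case (x, NullField)                                      => x
    case (LongField, DoubleField) | (DoubleField, LongField) => DoubleField
    case _                                                   => StringField
  }

  // One field per key, sorted by name; a key seen only as null becomes StringField.
  def createSchema(allKeys: Set[(String, FieldType)]): Seq[Field] =
    allKeys.groupBy(_._1).toSeq.sortBy(_._1).map { case (name, pairs) =>
      val merged = pairs.map(_._2).reduce(widen(_, _))
      Field(name, if (merged == NullField) StringField else merged)
    }

  // Mirrors rdd.map(allKeysWithValueTypes).reduce(_ ++ _) followed by createSchema;
  // like RDD.reduce, this throws on an empty input.
  def inferSchema(records: Seq[Map[String, Any]]): Seq[Field] =
    createSchema(records.map(keysWithValueTypes).reduce(_ ++ _))
}
```

For example, `inferSchema(Seq(Map("id" -> 1, "name" -> "a"), Map("id" -> 2.5, "flag" -> true), Map("name" -> null)))` returns the fields `flag: BooleanField`, `id: DoubleField` and `name: StringField`. Set union is associative and commutative. That gives the merge step the shape `RDD.reduce` expects when it combines partial results across partitions.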

Jianshi

On Sun, Dec 7, 2014 at 2:50 PM, Jianshi Huang <ji...@gmail.com>
wrote:

> Hmm..
>
> I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
>
> Jianshi

Re: Convert RDD[Map[String, Any]] to SchemaRDD

Posted by Jianshi Huang <ji...@gmail.com>.
Hmm..

I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782

Jianshi

On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang <ji...@gmail.com>
wrote:

> Hi,
>
> What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
>
> I'm currently converting each Map to a JSON String and do
> JsonRDD.inferSchema.
>
> How about adding inferSchema support to Map[String, Any] directly? It
> would be very useful.
>
> Thanks,