Posted to dev@spark.apache.org by Daniil Osipov <da...@shazam.com> on 2015/02/03 01:16:47 UTC

[spark-sql] JsonRDD

Hey Spark developers,

Is there a good reason for JsonRDD being a Scala object as opposed to a
class? Most other RDDs seem to be classes, and can be extended.

The reason I'm asking is that there is a problem with Hive interoperability
for JSON DataFrames: jsonFile generates a case-sensitive schema, while
Hive expects case-insensitive column names and fails with an exception
during saveAsTable if there are two columns whose names differ only in case.
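A minimal sketch of how such a collision could be spotted before calling saveAsTable, assuming you already have the top-level field names of the inferred schema as plain strings. `CaseCollisions` and `find` are hypothetical names, not part of Spark, and the check only covers top-level names, not nested struct fields.

```scala
import java.util.Locale

// Hypothetical helper, not part of Spark: works on plain column-name strings
// (for example, the top-level field names of a schema inferred from JSON).
object CaseCollisions {
  /** Returns each lower-cased name that has more than one distinct spelling,
    * mapped to those spellings in their original order. */
  def find(fieldNames: Seq[String]): Map[String, Seq[String]] =
    fieldNames.distinct
      .groupBy(_.toLowerCase(Locale.ROOT))
      .filter { case (_, spellings) => spellings.size > 1 }
}
```

For example, `CaseCollisions.find(Seq("id", "userName", "username"))` returns `Map("username" -> List("userName", "username"))`, i.e. exactly the pair Hive would reject.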

I'm trying to resolve the problem, but that requires me to extend JsonRDD,
which I can't do. Other RDDs are subclass-friendly; why is JsonRDD
different?

Dan

Re: [spark-sql] JsonRDD

Posted by Yin Huai <yh...@databricks.com>.
We will probably extract general-purpose functions from JsonRDD and also do
the renaming through https://issues.apache.org/jira/browse/SPARK-5260.

On Tue, Feb 3, 2015 at 9:15 AM, Daniil Osipov <da...@shazam.com>
wrote:

> Thanks Reynold,
>
> Case sensitivity issues are definitely orthogonal. I'll submit a bug or PR.
>
> Is there a way to rename the object to eliminate the confusion? I'm not
> sure how locked down the API is at this time, but the name seems like a
> potential confusion point for developers.

Re: [spark-sql] JsonRDD

Posted by Daniil Osipov <da...@shazam.com>.
Thanks Reynold,

Case sensitivity issues are definitely orthogonal. I'll submit a bug or PR.

Is there a way to rename the object to eliminate the confusion? I'm not
sure how locked down the API is at this time, but the name seems like a
potential confusion point for developers.

On Mon, Feb 2, 2015 at 4:30 PM, Reynold Xin <rx...@databricks.com> wrote:

> It's bad naming: JsonRDD is actually not an RDD. It is just a set of
> utility methods.
>
> The case sensitivity issues seem orthogonal, and it would be great to be
> able to control that with a flag.

Re: [spark-sql] JsonRDD

Posted by Reynold Xin <rx...@databricks.com>.
It's bad naming: JsonRDD is actually not an RDD. It is just a set of utility
methods.
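Because a Scala `object` is a singleton, it cannot be subclassed, which is why extending JsonRDD is not possible. One common Scala pattern for a set of utility methods that callers may want to customize is to put the methods in a trait and have a same-named companion object extend it. A sketch with hypothetical names (`JsonSchemaHelpers`, `columnName`, `LowerCaseJsonSchemaHelpers`), not Spark's actual code:

```scala
import java.util.Locale

// Hypothetical names; this is not Spark's code.
// The helpers live in a trait, so callers can mix them in and override them.
trait JsonSchemaHelpers {
  /** Maps a raw JSON key to a column name. Default: keep the key unchanged. */
  def columnName(jsonKey: String): String = jsonKey

  /** Column names for a list of keys, with exact duplicates removed. */
  def columnNames(jsonKeys: Seq[String]): Seq[String] =
    jsonKeys.map(columnName).distinct
}

// A companion object that extends the trait keeps the singleton call style,
// e.g. JsonSchemaHelpers.columnNames(keys).
object JsonSchemaHelpers extends JsonSchemaHelpers

// A caller who needs Hive-friendly names overrides just one method.
object LowerCaseJsonSchemaHelpers extends JsonSchemaHelpers {
  override def columnName(jsonKey: String): String =
    jsonKey.toLowerCase(Locale.ROOT)
}
```

Existing callers keep writing `JsonSchemaHelpers.columnNames(...)`, while someone with a Hive-specific need can supply their own object without touching the shared code.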

The case sensitivity issues seem orthogonal, and it would be great to be able
to control that with a flag.
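A minimal sketch of what such a flag could control, assuming a hypothetical `caseSensitive: Boolean` parameter on a schema-building helper (`SchemaNames.resolve` is an illustrative name, not a Spark API). When the flag is off, keys that differ only in case are folded into one column, keeping the first spelling seen:

```scala
// Hypothetical sketch, not a Spark API.
object SchemaNames {
  /** Resolves JSON keys to column names.
    * caseSensitive = true:  "userName" and "username" stay separate columns.
    * caseSensitive = false: they fold into one column named after the first spelling seen. */
  def resolve(jsonKeys: Seq[String], caseSensitive: Boolean): Seq[String] =
    if (caseSensitive) jsonKeys.distinct
    else
      jsonKeys.foldLeft(Vector.empty[String]) { (kept, key) =>
        // Skip a key if a case-insensitive match was already kept.
        if (kept.exists(_.equalsIgnoreCase(key))) kept else kept :+ key
      }
}
```

With the flag off, `SchemaNames.resolve(Seq("userName", "username", "id"), caseSensitive = false)` keeps only `userName` and `id`, a schema Hive would accept. Deciding which value wins when both keys appear in one record is a separate question this sketch does not address.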


On Mon, Feb 2, 2015 at 4:16 PM, Daniil Osipov <da...@shazam.com>
wrote:

> Hey Spark developers,
>
> Is there a good reason for JsonRDD being a Scala object as opposed to a
> class? Most other RDDs seem to be classes, and can be extended.
>
> The reason I'm asking is that there is a problem with Hive interoperability
> for JSON DataFrames: jsonFile generates a case-sensitive schema, while
> Hive expects case-insensitive column names and fails with an exception
> during saveAsTable if there are two columns whose names differ only in case.
>
> I'm trying to resolve the problem, but that requires me to extend JsonRDD,
> which I can't do. Other RDDs are subclass-friendly; why is JsonRDD
> different?
>
> Dan
>