Posted to user@spark.apache.org by Jean Georges Perrin <jg...@jgp.net> on 2018/01/29 22:05:18 UTC

Schema - DataTypes.NullType

Hi Sparkians,

Can someone tell me what the purpose of DataTypes.NullType is, especially when building a schema?

Thanks

jg
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Schema - DataTypes.NullType

Posted by Jean Georges Perrin <jg...@jgp.net>.
Thanks Nicholas. It makes sense. Now that I have a hint, I can play with it too!

jg

> On Feb 11, 2018, at 19:15, Nicholas Hakobian <ni...@rallyhealth.com> wrote:
> [...]


Re: Schema - DataTypes.NullType

Posted by Nicholas Hakobian <ni...@rallyhealth.com>.
I spent a few minutes poking around in the source code and found this:

The data type representing None, used for the types that cannot be inferred.

https://github.com/apache/spark/blob/branch-2.1/python/pyspark/sql/types.py#L107-L113

Playing around a bit, this is the only use case that I could immediately
come up with: you have some kind of placeholder field already in the data,
but it's always null. If you let createDataFrame (and I bet other things
like DataFrameReader would behave similarly) try to infer it directly, it
will error out since it can't infer the type of that field automatically.
Declaring the field as NullType in an explicit schema, as below, allows the
data to be used. And, if memory serves, Hive also has a concept of a Null
data type for these types of situations.

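In [9]: from pyspark.sql import Row
   ...: from pyspark.sql.types import LongType, NullType, StructField, StructType
   ...: # `spark` is the SparkSession that the pyspark shell provides
   ...: df = spark.createDataFrame([Row(id=1, val=None), Row(id=2, val=None)],
   ...:     schema=StructType([StructField('id', LongType()),
   ...:                        StructField('val', NullType())]))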

In [10]: df.show()
+---+----+
| id| val|
+---+----+
|  1|null|
|  2|null|
+---+----+


In [11]: df.printSchema()
root
 |-- id: long (nullable = true)
 |-- val: null (nullable = true)


Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com


On Sun, Feb 11, 2018 at 5:40 AM, Jean Georges Perrin <jg...@jgp.net> wrote:

> [...]

Re: Schema - DataTypes.NullType

Posted by Jean Georges Perrin <jg...@jgp.net>.
What is the purpose of DataTypes.NullType, especially when building a schema? Has anyone used it or seen it as part of schema auto-generation?


(If I keep asking long enough, I may get an answer, no? :) )


> On Feb 4, 2018, at 13:15, Jean Georges Perrin <jg...@jgp.net> wrote:
> [...]


Re: Schema - DataTypes.NullType

Posted by Jean Georges Perrin <jg...@jgp.net>.
Any takers on this one? ;)

> On Jan 29, 2018, at 16:05, Jean Georges Perrin <jg...@jgp.net> wrote:
> [...]