Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2016/03/16 23:44:43 UTC

Re: df.dtypes -> pyspark.sql.types

We probably should have the alias. Is this still a problem on master
branch?
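
If someone wants to add it, a rough sketch (untested) would be to
register the simpleString names that df.dtypes emits next to the
_all_atomic_types table in python/pyspark/sql/types.py, which is what
_parse_datatype_json_value consults before raising "Could not parse
datatype":

    # sketch only: alias the df.dtypes names ("bigint", "int") to the
    # existing atomic type classes so StructType.add() can resolve them
    _all_atomic_types['bigint'] = LongType
    _all_atomic_types['int'] = IntegerType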

On Wed, Mar 16, 2016 at 9:40 AM, Ruslan Dautkhanov <da...@gmail.com>
wrote:

> Running the following:
>
>> # fix schema for gaid, which should not be Double
>> from pyspark.sql.types import *
>> customSchema = StructType()
>> for (col, typ) in tsp_orig.dtypes:
>>     if col == 'Agility_GAID':
>>         typ = 'string'
>>     customSchema.add(col, typ, True)
>
>
> Getting
>
>   ValueError: Could not parse datatype: bigint
>
>
> Looks like pyspark.sql.types doesn't know anything about bigint.
> Should it be aliased to LongType in pyspark.sql.types?
>
> Thanks
>
>
> On Wed, Mar 16, 2016 at 10:18 AM, Ruslan Dautkhanov <da...@gmail.com>
> wrote:
>
>> Hello,
>>
>> Looking at
>>
>> https://spark.apache.org/docs/1.5.1/api/python/_modules/pyspark/sql/types.html
>>
>> and can't wrap my head around how to convert string data type names
>> to the actual pyspark.sql.types data types.
>>
>> Does pyspark.sql.types have an interface that returns StringType() for
>> "string", IntegerType() for "integer", etc.? If it doesn't, it would be
>> great to have such a mapping function.
>>
>> Thank you.
>>
>>
>> PS: I have a data frame and use its dtypes to loop through all columns
>> to fix a few columns' data types, as a workaround for SPARK-13866.
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>
>

Re: df.dtypes -> pyspark.sql.types

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Spark 1.5 is the latest version I have access to, and it's where this
problem happens.

I don't see it fixed in master, but I might be wrong. Diff attached.

https://raw.githubusercontent.com/apache/spark/branch-1.5/python/pyspark/sql/types.py
https://raw.githubusercontent.com/apache/spark/d57daf1f7732a7ac54a91fe112deeda0a254f9ef/python/pyspark/sql/types.py
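
In the meantime I'm thinking of working around it by looking the type
up explicitly instead of passing the string to add(). A rough, untested
sketch; the map below only covers the types my frame actually uses, so
extend it as needed:

    from pyspark.sql.types import *

    # map the simpleString names returned by df.dtypes to DataType
    # instances (parameterized types like decimal(p,s) would need
    # extra handling)
    dtype_map = {
        'string': StringType(),
        'int': IntegerType(),
        'bigint': LongType(),      # the missing alias
        'double': DoubleType(),
        'float': FloatType(),
        'boolean': BooleanType(),
        'timestamp': TimestampType(),
        'date': DateType(),
    }

    customSchema = StructType()
    for (col, typ) in tsp_orig.dtypes:
        if col == 'Agility_GAID':
            typ = 'string'
        customSchema.add(col, dtype_map[typ], True)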



-- 
Ruslan Dautkhanov

On Wed, Mar 16, 2016 at 4:44 PM, Reynold Xin <rx...@databricks.com> wrote:

> We probably should have the alias. Is this still a problem on master
> branch?