Posted to dev@spark.apache.org by Arkadiusz Bicz <ar...@gmail.com> on 2017/02/23 12:31:33 UTC

Re: Support for decimal separator (comma or period) in spark 2.1

Hi Team,

I would like to know whether it is possible to specify the decimal locale
for DataFrameReader when reading CSV files.

I have CSV files from a locale where the decimal separator is a comma, e.g.
0,32 instead of the US style 0.32.

Is there a way in the current version of Spark to specify the locale:

spark.read.option("sep", ";").option("header",
"true").option("inferSchema", "true").format("csv").load("nonuslocalized.csv")

If not, should I create a JIRA ticket for this? I can work on a solution if
one is not available.

Best Regards,

Arkadiusz Bicz
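For context, locale-aware decimal parsing already exists on the JVM outside of Spark, and a CSV reader option would presumably delegate to something like java.text.NumberFormat. A minimal sketch of what the requested behaviour means (this is plain Scala, not a Spark API):

```scala
import java.text.NumberFormat
import java.util.Locale

// The German locale uses ',' as the decimal separator, so "0,32"
// parses to the double 0.32.
val german = NumberFormat.getInstance(Locale.GERMANY)
val parsed = german.parse("0,32").doubleValue()
```

A hypothetical DataFrameReader locale option could route inferred numeric columns through a per-locale NumberFormat like this one.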

Re: Support for decimal separator (comma or period) in spark 2.1

Posted by Hyukjin Kwon <gu...@gmail.com>.
Please take a look at https://issues.apache.org/jira/browse/SPARK-18359.


Re: Support for decimal separator (comma or period) in spark 2.1

Posted by Arkadiusz Bicz <ar...@gmail.com>.
Thank you Sam for the answer. I have solved the problem by loading all
decimal columns as strings and replacing all commas with dots, but this
solution loses the automatic schema inference, which is quite a nice
functionality.

I can work on adding a new option to DataFrameReader for the locale, like:

spark.read.option("NumberLocale", "German").csv("filefromeurope.csv")

I just wonder whether it would be accepted?

Best Regards,

Arkadiusz Bicz
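The string-and-replace workaround described above can be sketched with built-in column functions, no UDF required. This assumes an active SparkSession `spark`, and the column name "price" is made up for illustration:

```scala
import org.apache.spark.sql.functions.{col, regexp_replace}

// Workaround sketch: read the decimal column as a string, swap the
// comma for a dot, then cast to double. Schema inference is skipped
// for this column, which is exactly the drawback described above.
val raw = spark.read
  .option("sep", ";")
  .option("header", "true")
  .csv("nonuslocalized.csv")

val fixed = raw.withColumn("price",
  regexp_replace(col("price"), ",", ".").cast("double"))
```

Because the column arrives as a string, inferSchema never sees it as numeric, so the cast has to be applied per column by hand.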


Re: Support for decimal separator (comma or period) in spark 2.1

Posted by Sam Elamin <hu...@gmail.com>.
Hi Arkadiusz,

I am not sure whether there is a localisation ability, but I'm sure others
will correct me if I'm wrong.

What you could do is write a UDF that replaces the commas with a period,
assuming you know the column in question.

Regards
Sam
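Sam's UDF suggestion could look roughly like the following sketch; the column name "amount" is made up, and null handling goes through Option so that null cells stay null:

```scala
import org.apache.spark.sql.functions.{col, udf}

// A UDF that swaps the decimal comma for a period and converts the
// result to Double; null input yields null output via Option.
val toUsDecimal = udf { s: String =>
  Option(s).map(_.replace(',', '.').toDouble)
}

// Usage, given a DataFrame df with a comma-decimal string column:
// df.withColumn("amount", toUsDecimal(col("amount")))
```

Compared with a regexp_replace-and-cast expression, a UDF blocks some Catalyst optimisations, but it keeps the conversion logic in one named place.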