Posted to user@spark.apache.org by "Jain, Nishit" <nj...@underarmour.com> on 2016/10/27 15:54:06 UTC

CSV escaping not working

I am using spark-core version 2.0.1 with Scala 2.11. I have simple code to read a CSV file that uses \ as an escape character.

val myDA = spark.read
  .option("quote", null)
  .schema(mySchema)
  .csv(filePath)


According to the documentation, \ is the default escape character for the CSV reader, but it does not work: Spark reads the \ as part of my data. For example, the City column in the CSV file contains north rocks\,au. I expect the City column to be read as north rocks,au, but instead Spark reads it as north rocks\ and moves au to the next column.

I have tried the following, but none of it worked:

  *   Explicitly setting the escape character with .option("escape", "\\")
  *   Changing the escape character to | or : in both the file and the code
  *   Using the spark-csv library

Is anyone else facing the same issue? Am I missing something?

Thanks

Re: CSV escaping not working

Posted by Daniel Barclay <da...@gmail.com>.
In any case, it seems that the current behavior is not documented sufficiently.

Koert Kuipers wrote:
> i can see how unquoted csv would work if you escape delimiters, but i have never seen that in practice.


Re: CSV escaping not working

Posted by Koert Kuipers <ko...@tresata.com>.
i can see how unquoted csv would work if you escape delimiters, but i have
never seen that in practice.
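Koert's remark about unquoted CSV with escaped delimiters describes the convention the original post assumed. A minimal sketch of that convention, assuming no quoting at all and `\` making the next character (including a delimiter) literal, could look like this. `UnquotedEscape.parseLine` is a hypothetical name written for illustration; it is not part of Spark or spark-csv, and it treats quote characters as ordinary data.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical parser for "unquoted CSV with escaped delimiters".
// Not Spark or spark-csv code: an illustration of the convention only.
object UnquotedEscape {
  def parseLine(line: String, delim: Char = ',', escape: Char = '\\'): List[String] = {
    val fields = ListBuffer.empty[String]
    val field = new StringBuilder
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == escape && i + 1 < line.length) {
        // The escape makes the next character literal, even a delimiter.
        field.append(line.charAt(i + 1))
        i += 2
      } else if (c == delim) {
        // An unescaped delimiter ends the current field.
        fields += field.toString
        field.clear()
        i += 1
      } else {
        field.append(c)
        i += 1
      }
    }
    fields += field.toString
    fields.toList
  }
}
```

Under this convention, `UnquotedEscape.parseLine("1,north rocks\\,au")` returns `List("1", "north rocks,au")`, which is the result the original post expected. A file written this way could, for example, be read line by line with `spark.read.textFile(filePath)` and rewritten into quoted form before being handed to the CSV reader, which (per this thread) only honors escapes inside quoted values.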

On Thu, Oct 27, 2016 at 2:03 PM, Jain, Nishit <nj...@underarmour.com>
wrote:

> I’d think quoting is only necessary if you are not escaping delimiters in
> data. But we can only share our opinions. It would be good to see something
> documented.
> Could this be the cause of the issue? https://issues.apache.org/jira/browse/CSV-135

Re: CSV escaping not working

Posted by "Jain, Nishit" <nj...@underarmour.com>.
I’d think quoting is only necessary if you are not escaping delimiters in data. But we can only share our opinions. It would be good to see something documented.
Could this be the cause of the issue? https://issues.apache.org/jira/browse/CSV-135

From: Koert Kuipers <ko...@tresata.com>
Date: Thursday, October 27, 2016 at 12:49 PM
To: "Jain, Nishit" <nj...@underarmour.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: CSV escaping not working

well my expectation would be that if you have delimiters in your data you need to quote your values. if you now have quotes within your data you need to escape them.

so escaping is only necessary if quoted.




Re: CSV escaping not working

Posted by Koert Kuipers <ko...@tresata.com>.
well my expectation would be that if you have delimiters in your data you
need to quote your values. if you now have quotes within your data you
need to escape them.

so escaping is only necessary if quoted.
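The writer-side rule Koert describes (quote a value when it contains the delimiter, and escape quotes only inside quoted values) can be sketched as follows. `QuoteWhenNeeded` is a hypothetical name, not Spark's CSV writer; it assumes `"` as the quote and `\` as the escape to match this thread, and it deliberately does not handle a value that ends with the escape character.

```scala
// Hypothetical writer following "quote values that contain delimiters,
// and escape quotes only inside quoted values". Not Spark's CSV writer.
object QuoteWhenNeeded {
  def formatField(value: String, delim: Char = ',', quote: Char = '"', escape: Char = '\\'): String = {
    val needsQuotes = value.exists(ch => ch == delim || ch == quote || ch == '\n' || ch == '\r')
    if (!needsQuotes) value // plain values are written as-is, with no escaping
    else {
      // Inside quotes, only the quote character is escaped.
      // Limitation: a value ending in the escape character is not handled.
      val body = value.replace(quote.toString, s"$escape$quote")
      s"$quote$body$quote"
    }
  }

  def formatLine(values: Seq[String], delim: Char = ','): String =
    values.map(v => formatField(v, delim)).mkString(delim.toString)
}
```

For the value from the original post, `QuoteWhenNeeded.formatLine(Seq("1", "north rocks,au"))` produces `1,"north rocks,au"`: the comma is protected by quoting, so no escape character is needed at all.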

On Thu, Oct 27, 2016 at 1:45 PM, Jain, Nishit <nj...@underarmour.com>
wrote:

> Do you mind sharing why escaping should not work without quotes?

Re: CSV escaping not working

Posted by "Jain, Nishit" <nj...@underarmour.com>.
Do you mind sharing why escaping should not work without quotes?

From: Koert Kuipers <ko...@tresata.com>
Date: Thursday, October 27, 2016 at 12:40 PM
To: "Jain, Nishit" <nj...@underarmour.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: CSV escaping not working

that is what i would expect: escaping only works if quoted



Re: CSV escaping not working

Posted by Koert Kuipers <ko...@tresata.com>.
that is what i would expect: escaping only works if quoted
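Koert's expectation matches how Spark's `DataFrameReader.csv` documentation describes `escape`: the character used for escaping quotes inside an already quoted value. The sketch below is a simplified, hypothetical model of that rule, not Spark's actual parser; `QuotedOnlyEscape.parseLine` is an illustrative name. Outside quotes, `\` is ordinary data, which reproduces the split reported in the original post.

```scala
import scala.collection.mutable.ListBuffer

// Simplified, hypothetical model: the escape character only matters
// before a quote character inside a quoted field. Not Spark's parser.
object QuotedOnlyEscape {
  def parseLine(line: String, delim: Char = ',', quote: Char = '"', escape: Char = '\\'): List[String] = {
    val fields = ListBuffer.empty[String]
    val field = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == escape && i + 1 < line.length && line.charAt(i + 1) == quote) {
          field.append(quote) // escaped quote inside a quoted value
          i += 1              // skip the quote just consumed
        } else if (c == quote) {
          inQuotes = false    // closing quote
        } else {
          field.append(c)     // delimiters are literal inside quotes
        }
      } else if (c == quote && field.length == 0) {
        inQuotes = true       // opening quote at the start of a field
      } else if (c == delim) {
        fields += field.toString // an unquoted delimiter ends the field
        field.clear()
      } else {
        field.append(c)       // outside quotes, the escape is plain data
      }
      i += 1
    }
    fields += field.toString
    fields.toList
  }
}
```

With the original post's value, `QuotedOnlyEscape.parseLine("1,north rocks\\,au")` yields `List("1", "north rocks\\", "au")`, the same split Nishit observed, while the quoted form `1,"north rocks,au"` parses to `List("1", "north rocks,au")` without any escape.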

On Thu, Oct 27, 2016 at 1:24 PM, Jain, Nishit <nj...@underarmour.com>
wrote:

> Interesting finding: Escaping works if data is quoted but not otherwise.

Re: CSV escaping not working

Posted by "Jain, Nishit" <nj...@underarmour.com>.
Interesting finding: Escaping works if data is quoted but not otherwise.
