Posted to user@spark.apache.org by Anil Kulkarni <an...@gmail.com> on 2019/07/11 20:45:10 UTC

Spark CSV Quote only NOT NULL

Hi Spark users,

My question is:
I am writing a DataFrame to CSV, using the option
.option("quoteAll", "true")

This quotes even null values, so they appear as empty strings.

How do I make sure that quotes are applied only to non-null values?

-- 
Cheers,
Anil Kulkarni
about.me/anilkulkarni

Re: Spark CSV Quote only NOT NULL

Posted by Swetha Ramaiah <sw...@gmail.com>.
Glad to help!

Regards,
Swetha

Re: Spark CSV Quote only NOT NULL

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Swetha,
I look into the source code a lot, but it never occurred to me to
look into the test suite. Thanks a ton for the tip; it definitely gives
me quite a few ideas.

Thanks and Regards,
Gourav

Re: Spark CSV Quote only NOT NULL

Posted by Swetha Ramaiah <sw...@gmail.com>.
Hi Anil

That was an example. You can replace the quote character with a double quote. But these options should give you an idea of how you want to treat nulls, empty values and quotes.

When I faced this issue, I forked the Spark repo and looked at the test suite. This definitely helped me solve my issue.
https://github.com/apache/spark/blob/v2.4.3/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala

Regards
Swetha

Re: Spark CSV Quote only NOT NULL

Posted by Anil Kulkarni <an...@gmail.com>.
Hi Swetha,

Thank you.
But we need the data to be quoted with ",
and when a field is null, we don't need the quotes around it.

Example:
"A",,"B","C"

Thanks
Anil
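
To pin down the rule being asked for here, the following is a minimal plain-Python sketch of it. It is not a Spark option and not something proposed in the thread; the function name quote_non_null is made up for this illustration. It treats None as null (written with no quotes at all) and doubles embedded quote characters, which is an assumed escaping convention.

```python
from typing import Optional, Sequence


def quote_non_null(fields: Sequence[Optional[str]],
                   sep: str = ",",
                   quote: str = '"') -> str:
    """Join fields into one CSV line, quoting every non-null field.

    None is written as nothing at all; an empty string is written as "".
    """
    parts = []
    for value in fields:
        if value is None:
            # Null field: emit nothing, so no quotes appear.
            parts.append("")
        else:
            # Assumed escaping rule: double any embedded quote character.
            parts.append(quote + value.replace(quote, quote * 2) + quote)
    return sep.join(parts)


print(quote_non_null(["A", None, "B", "C"]))  # prints "A",,"B","C"
```

Note that this keeps nulls and empty strings distinguishable in the output: a null becomes an empty field, while an empty string becomes "".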

Re: Spark CSV Quote only NOT NULL

Posted by Swetha Ramaiah <sw...@gmail.com>.
If you are using Spark 2.4.0, I think you can try something like this:

.option("quote", "\u0000")
.option("emptyValue", "")
.option("nullValue", null)

Regards
Swetha
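
A different route to the output asked for in this thread, offered only as a sketch under assumptions, is to build each CSV line yourself as a single string column and write it as plain text, so that the CSV writer's quoting options are not involved. The helper name quoted_line_expr and the output path in the comment are hypothetical. The generated expression relies on the Spark SQL built-ins concat_ws, coalesce, concat, replace and cast, and it assumes column names contain no backticks. Treat it as a starting point to verify on your own Spark version.

```python
from typing import Sequence

# SQL string literals for one double quote and for two double quotes.
DQ = "'\"'"
DQ2 = "'\"\"'"


def quoted_line_expr(columns: Sequence[str]) -> str:
    """Build a Spark SQL expression that renders one CSV line per row."""
    fields = []
    for name in columns:
        col = f"`{name}`"  # assumes the name contains no backticks
        # concat() yields NULL when any input is NULL, so a null column
        # produces NULL here instead of a pair of quotes.
        quoted = f"concat({DQ}, replace(cast({col} AS string), {DQ}, {DQ2}), {DQ})"
        # coalesce() turns that NULL into an empty, unquoted field.
        # This matters because concat_ws() skips NULL arguments entirely.
        fields.append(f"coalesce({quoted}, '')")
    return "concat_ws(',', " + ", ".join(fields) + ") AS value"


print(quoted_line_expr(["a", "b"]))

# Hypothetical usage with an existing DataFrame `df`:
#   df.selectExpr(quoted_line_expr(df.columns)).write.text("/tmp/out")
```

Writing with the text data source means no header row is produced, so a header would have to be added separately if one is needed.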

