Posted to jira@arrow.apache.org by "Johan Peltenburg (Jira)" <ji...@apache.org> on 2021/12/03 13:30:00 UTC

[jira] [Comment Edited] (ARROW-14905) [C++] Enable CSV Writer to handle quoting

    [ https://issues.apache.org/jira/browse/ARROW-14905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453003#comment-17453003 ] 

Johan Peltenburg edited comment on ARROW-14905 at 12/3/21, 1:29 PM:
--------------------------------------------------------------------

The default behavior is currently {{needed}}, so I'll stick to that as the default.

In the case of {{all}}, we need to decide whether quotes should be inserted for nulls.

What I'm currently brewing up doesn't insert them.
In that case, I'd opt for calling this option {{all_valid}}, making it clear to the user that only valid (non-null) values are quoted.
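A minimal sketch of the {{all_valid}} behavior described above, under stated assumptions: {{FormatAllValid}} and {{FormatRow}} are hypothetical helpers (not the Arrow C++ API), {{std::nullopt}} stands in for a null value, and embedded quotes are escaped by doubling as in RFC 4180. Every valid value is quoted, while a null is written as an empty, unquoted field.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: quote every valid value, leave nulls unquoted.
// std::nullopt models a null (invalid) slot.
std::string FormatAllValid(const std::optional<std::string>& value) {
  if (!value) return "";  // null: nothing between the delimiters
  std::string out = "\"";
  for (char c : *value) {
    if (c == '"') out += '"';  // escape an embedded quote by doubling it
    out += c;
  }
  out += '"';
  return out;
}

// Hypothetical helper: join one row of fields with commas.
std::string FormatRow(const std::vector<std::optional<std::string>>& row) {
  std::string line;
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (i > 0) line += ',';
    line += FormatAllValid(row[i]);
  }
  return line;
}
```

With this approach a row holding {{"a"}}, a null and an empty string becomes {{"a",,""}}, so a null and an empty string stay distinguishable in the output.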

The second possibility is related to ARROW-14903, where we can set a custom value for nulls.
If we choose to insert quotes everywhere, an empty null value will produce {{""}}.
A custom null value, as described in that issue, would likewise be enclosed in quotes.
The drawback is that a quoted null then becomes indistinguishable from a string whose content equals the null value.
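The ambiguity can be illustrated with a small sketch, assuming a hypothetical {{FormatQuoteAll}} helper (not the Arrow C++ API) that quotes every field and renders nulls as a configurable null string in the spirit of ARROW-14903. {{std::nullopt}} again models a null value.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: quote every field, including nulls, which are
// rendered as null_string before quoting.
std::string FormatQuoteAll(const std::optional<std::string>& value,
                           const std::string& null_string) {
  // Both branches are const std::string lvalues, so binding a reference
  // to the result does not dangle.
  const std::string& text = value ? *value : null_string;
  std::string out = "\"";
  for (char c : text) {
    if (c == '"') out += '"';  // escape an embedded quote by doubling it
    out += c;
  }
  out += '"';
  return out;
}
```

With an empty null string, a null and an empty string both come out as {{""}}; with a null string of {{NA}}, a null and the literal string {{NA}} both come out as {{"NA"}}, so a reader cannot tell them apart.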


was (Author: johanpel):
The default behavior is currently {{needed}}, so I'll stick to that as the default.

In the case of {{all}}, it's necessary to make a decision about whether quotes should be inserted for nulls.

What I'm currently brewing up doesn't insert them.
In that case, I'd opt for calling this option {{all_valid}}, exposing to the user that only valid (non-null) values are quoted.

If we choose to insert them, even if the null value is empty, it will produce {{""}} when the quote style is set to {{all}}.
This is slightly related to ARROW-14903, where a custom null value would also be enclosed in quotes.
The drawback is that it is indistinguishable from possible strings that contain the null value.

> [C++] Enable CSV Writer to handle quoting
> -----------------------------------------
>
>                 Key: ARROW-14905
>                 URL: https://issues.apache.org/jira/browse/ARROW-14905
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Dragoș Moldovan-Grünfeld
>            Assignee: Johan Peltenburg
>            Priority: Major
>
> This will allow more control over quoting. In {{readr::write_csv()}}, {{quote}} controls how to handle fields which contain characters that need to be quoted:
>  * {{needed}}: only quote fields which need them
>  * {{all}}: quote all fields - I think this might be the implicit default behaviour for {{write_csv_arrow()}}
>  * {{none}}: never quote fields
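The three options above can be sketched as a per-field formatting policy. This is a minimal illustration under stated assumptions: {{QuoteStyle}}, {{NeedsQuoting}}, {{Quote}} and {{FormatField}} are hypothetical names (not the Arrow C++ or readr API), and "needs quoting" is taken to mean the field contains the delimiter, a double quote, or a line break, as in RFC 4180.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum mirroring the readr::write_csv() quote options.
enum class QuoteStyle { Needed, All, None };

// Assumed rule: quote if the field holds the delimiter, a quote or a newline.
bool NeedsQuoting(const std::string& field, char delimiter = ',') {
  for (char c : field) {
    if (c == delimiter || c == '"' || c == '\n' || c == '\r') return true;
  }
  return false;
}

// Wrap in double quotes, doubling any embedded quote characters.
std::string Quote(const std::string& field) {
  std::string out = "\"";
  for (char c : field) {
    if (c == '"') out += '"';
    out += c;
  }
  out += '"';
  return out;
}

std::string FormatField(const std::string& field, QuoteStyle style,
                        char delimiter = ',') {
  switch (style) {
    case QuoteStyle::All:
      return Quote(field);
    case QuoteStyle::Needed:
      return NeedsQuoting(field, delimiter) ? Quote(field) : field;
    case QuoteStyle::None:
      // Written verbatim; a field containing the delimiter would break the
      // row structure, so a real writer might reject it instead.
      return field;
  }
  return field;  // unreachable; keeps compilers from warning
}
```

For example, {{a,b}} is quoted under {{needed}} but {{abc}} is not, while {{all}} quotes both and {{none}} quotes neither.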



--
This message was sent by Atlassian Jira
(v8.20.1#820001)