Posted to issues@spark.apache.org by "M. Le Bihan (JIRA)" <ji...@apache.org> on 2019/02/25 14:02:00 UTC

[jira] [Comment Edited] (SPARK-26968) option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation

    [ https://issues.apache.org/jira/browse/SPARK-26968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776903#comment-16776903 ] 

M. Le Bihan edited comment on SPARK-26968 at 2/25/19 2:01 PM:
--------------------------------------------------------------

This is still a problem.
I see no equivalent option in Univocity that produces the result I expect, which is:

String values surrounded by quotes,
but numeric values left unquoted.

Otherwise, a standard import of that CSV into Excel or OpenCalc cannot easily apply its default type conversions. The expected file is:
"codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
"03142","LENAX",267,43
If Univocity cannot do this, the issue should be classified as a regression, because it was possible before. The issue can be closed once this result can be produced again.
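
To make the expected quoting rule concrete, here is a minimal plain-Java sketch of "quote everything except numbers". It uses neither Spark nor Univocity; the class name NonNumericQuoting and the helper formatRow are hypothetical and only illustrate the behaviour requested, not how either library implements it.

```java
import java.util.StringJoiner;

public class NonNumericQuoting {

    // Hypothetical helper: quotes every value except java.lang.Number instances.
    static String formatRow(Object... values) {
        StringJoiner row = new StringJoiner(",");
        for (Object value : values) {
            if (value instanceof Number) {
                // Numeric values are written as-is, without quotes.
                row.add(value.toString());
            } else {
                // Non-numeric values are quoted; embedded quotes are doubled.
                String text = String.valueOf(value).replace("\"", "\"\"");
                row.add("\"" + text + "\"");
            }
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Header row: all strings, so every field is quoted.
        System.out.println(formatRow("codeCommuneCR", "nomCommuneCR",
                "populationCR", "resultatComptable"));
        // Data row: the two String columns are quoted, the two Integer columns are not.
        System.out.println(formatRow("03142", "LENAX", 267, 43));
    }
}
```

Run as-is, this prints the two lines shown above: a fully quoted header, then "03142","LENAX",267,43.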

 

Please don't close this issue too early.


was (Author: mlebihan):
This is still a problem.
I see no equivalent option in Univocity that produces the result I expect, which is:

String values surrounded by quotes,
but numeric values left unquoted.

Otherwise, a standard import of that CSV into Excel or OpenCalc cannot easily apply its default type conversions.

> option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation
> ---------------------------------------------------------------------
>
>                 Key: SPARK-26968
>                 URL: https://issues.apache.org/jira/browse/SPARK-26968
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: M. Le Bihan
>            Priority: Minor
>
> I have a CSV to write with the following schema:
> {code:java}
> StructType s = schema.add("codeCommuneCR", StringType, false);
> s = s.add("nomCommuneCR", StringType, false);
> s = s.add("populationCR", IntegerType, false);
> s = s.add("resultatComptable", IntegerType, false);{code}
> Whether I omit the "_quoteMode_" option or set it to {{NON_NUMERIC}}, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>     .option("header", "true")
>     .option("quoteMode", "NON_NUMERIC")
>     .option("quote", "\"")
>     .csv("./target/out_200071470.csv");{code}
> the CSV written by {{Spark}} is:
> {code:java}
> codeCommuneCR,nomCommuneCR,populationCR,resultatComptable
> 03142,LENAX,267,43{code}
> If I set the "_quoteAll_" option instead, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>     .option("header", "true")
>     .option("quoteAll", true)
>     .option("quote", "\"")
>     .csv("./target/out_200071470.csv");{code}
> it generates:
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
> "03142","LENAX","267","43"{code}
> It seems that {{.option("quoteMode", "NON_NUMERIC")}} is broken. It should generate:
>  
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
> "03142","LENAX",267,43
> {code}
>  


