Posted to issues@drill.apache.org by "benj (JIRA)" <ji...@apache.org> on 2019/06/21 15:44:02 UTC

[jira] [Commented] (DRILL-6958) CTAS csv with option

    [ https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869614#comment-16869614 ] 

benj commented on DRILL-6958:
-----------------------------

Currently, when you run a CTAS in csv format and some text columns contain the separator (i.e. ","), the resulting file is incorrect.

Example:
{code:java}
ALTER SESSION SET `store.format`='csv';
CREATE TABLE ....`mycsv` AS
 (SELECT 'there,is,a,problem' AS col1, 'due,to,separator' AS col2, 'in,text,columns...' AS col3);

cat mycsv
col1,col2,col3
there,is,a,problem,due,to,separator,in,text,columns...
{code}
It is possible to work around this, either by laboriously adding quotes by hand or by changing the separator. Examples:
{code:java}
// -- With another separator (#)
ALTER SESSION SET `store.format`='csv';
CREATE TABLE ....`mycsv` AS
 (SELECT col1 || '#' || col2 || '#' || col3 AS `col1#col2#col3` FROM
  (SELECT 'there,is,a,problem' AS col1, 'due,to,separator' AS col2, 'in,text,columns...' AS col3));

cat mycsv
col1#col2#col3
there,is,a,problem#due,to,separator#in,text,columns...
{code}
{code:java}
// -- With quotes (simple version that doesn't work if a field contains a quote, which would need to be escaped)
ALTER SESSION SET `store.format`='csv';
CREATE TABLE ....`mycsv` AS
 (SELECT '"' || col1 || '"'  AS col1, '"' || col2 || '"' AS col2, '"' || col3 || '"' AS col3 FROM
  (SELECT 'there,is,a,problem' AS col1, 'due,to,separator' AS col2, 'in,text,columns...' AS col3));

cat mycsv
col1,col2,col3
"there,is,a,problem","due,to,separator","in,text,columns..."
{code}
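For the quoted variant, embedded double quotes could be doubled before each field is wrapped, following the usual CSV convention. This is a minimal sketch, assuming Drill's REGEXP_REPLACE function; the col2 sample value is hypothetical and only there to show a field containing a quote. It remains a manual workaround, not a writer option:
{code:java}
// -- With quotes, doubling any embedded quote first (sketch; the col2 sample value is hypothetical)
ALTER SESSION SET `store.format`='csv';
CREATE TABLE ....`mycsv` AS
 (SELECT '"' || REGEXP_REPLACE(col1, '"', '""') || '"' AS col1,
         '"' || REGEXP_REPLACE(col2, '"', '""') || '"' AS col2 FROM
  (SELECT 'there,is,a,problem' AS col1, 'a "quoted" word' AS col2));
{code}
Every text column still has to be wrapped by hand this way.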
So currently CTAS in CSV format is not really usable.
It would be very useful to have some options to configure how CSV files are written.

 

> CTAS csv with option
> --------------------
>
>                 Key: DRILL-6958
>                 URL: https://issues.apache.org/jira/browse/DRILL-6958
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.15.0
>            Reporter: benj
>            Priority: Major
>
> Add some options for writing CSV files with CTAS:
>  * possibility to change/define the separator,
>  * possibility to write the header or not,
>  * possibility to force writing a single file instead of many parts,
>  * possibility to force quoting


