Posted to issues@drill.apache.org by "benj (JIRA)" <ji...@apache.org> on 2019/06/21 15:44:02 UTC
[jira] [Commented] (DRILL-6958) CTAS csv with option
[ https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869614#comment-16869614 ]
benj commented on DRILL-6958:
-----------------------------
Currently, when you use CTAS with the CSV format and some text columns contain the separator (i.e. ","), the resulting file is incorrect, because the values are written without any quoting.
Example:
{code:java}
ALTER SESSION SET `store.format`='csv';
CREATE TABLE ....`mycsv` AS
(SELECT 'there,is,a,problem' AS col1, 'due,to,separator' AS col2, 'in,text,columns...' AS col3);
cat mycsv
col1,col2,col3
there,is,a,problem,due,to,separator,in,text,columns...
{code}
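For illustration only, the following Python sketch (standard {{csv}} module, not Drill code) writes the three values from the example once as a plain comma-joined line, which is what the current CTAS output amounts to, and once with RFC 4180-style quoting. It then reads both lines back with a standard CSV parser:

{code:python}
import csv
import io

# The three text values from the example; each contains the "," separator.
row = ["there,is,a,problem", "due,to,separator", "in,text,columns..."]

# Equivalent of the current CTAS output: values joined with "," and no quoting.
naive_line = ",".join(row)

# RFC 4180-style output: values containing the separator are wrapped in quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
quoted_line = buf.getvalue().rstrip("\n")

# Read both lines back with a standard CSV parser.
naive_fields = next(csv.reader([naive_line]))
quoted_fields = next(csv.reader([quoted_line]))

print(len(naive_fields))   # 10: the column boundaries are lost
print(len(quoted_fields))  # 3: the original columns are recovered
print(quoted_line)
{code}

With the unquoted line, a reader sees 10 fields instead of 3, so the original columns cannot be recovered from the file.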
It is possible to work around this, laboriously, by adding the quotes manually or by changing the separator. Examples:
{code:java}
-- With another separator (#)
ALTER SESSION SET `store.format`='csv';
CREATE TABLE ....`mycsv` AS
(SELECT col1 || '#' || col2 || '#' || col3 AS `col1#col2#col3` FROM
(SELECT 'there,is,a,problem' AS col1, 'due,to,separator' AS col2, 'in,text,columns...' AS col3));
cat mycsv
col1#col2#col3
there,is,a,problem#due,to,separator#in,text,columns...
{code}
{code:java}
-- With quotes (simple version: it does not work if a field contains a quote, which would need to be escaped)
ALTER SESSION SET `store.format`='csv';
CREATE TABLE ....`mycsv` AS
(SELECT '"' || col1 || '"' AS col1, '"' || col2 || '"' AS col2, '"' || col3 || '"' AS col3 FROM
(SELECT 'there,is,a,problem' AS col1, 'due,to,separator' AS col2, 'in,text,columns...' AS col3));
cat mycsv
col1,col2,col3
"there,is,a,problem","due,to,separator","in,text,columns..."
{code}
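The quote trick only covers values without embedded quotes. A common convention (RFC 4180) is to wrap the value in double quotes and to double every embedded quote. The Python sketch below assumes that convention; {{quote_field}} is a hypothetical helper written for illustration, and its result is checked against Python's own {{csv.writer}}:

{code:python}
import csv
import io


def quote_field(value: str, sep: str = ",") -> str:
    """Hypothetical helper: quote a value in the RFC 4180 style.

    The value is wrapped in double quotes when it contains the separator,
    a double quote or a line break; embedded double quotes are doubled.
    """
    if any(ch in value for ch in (sep, '"', "\n", "\r")):
        return '"' + value.replace('"', '""') + '"'
    return value


# A value that breaks the simple '"' || col || '"' trick: it contains quotes.
value = 'he said "hi", then left'

simple = '"' + value + '"'     # quotes added without escaping
escaped = quote_field(value)   # embedded quotes doubled

# Python's standard CSV writer produces the same escaped form for this value.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([value])
assert buf.getvalue() == escaped + "\n"

# Read each version back as a one-line CSV document.
print(next(csv.reader([simple])))   # the original value is not recovered
print(next(csv.reader([escaped])))  # ['he said "hi", then left']
{code}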
So, currently, CTAS in CSV format is not really usable when text columns may contain the separator.
It would be very useful to have some options to configure how CTAS writes CSV files.
> CTAS csv with option
> --------------------
>
> Key: DRILL-6958
> URL: https://issues.apache.org/jira/browse/DRILL-6958
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.15.0
> Reporter: benj
> Priority: Major
>
> Add some options for writing CSV files with CTAS:
> * possibility to change/define the separator,
> * possibility to choose whether or not to write the header,
> * possibility to force writing only one file instead of many parts,
> * possibility to force quoting
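To make the requested options concrete, here is a minimal Python sketch that maps them onto parameters of Python's standard {{csv.writer}}. It is not a proposal for Drill's actual option names or API: {{write_csv}} and its parameters are hypothetical, and the "single file" option is modeled simply by writing every row to one output stream.

{code:python}
import csv
import io
from typing import Iterable, Sequence


def write_csv(
    out,
    header: Sequence[str],
    rows: Iterable[Sequence[str]],
    delimiter: str = ",",       # option: change/define the separator
    write_header: bool = True,  # option: write the header or not
    quote_all: bool = False,    # option: force quoting of every field
) -> None:
    """Hypothetical writer illustrating the requested CTAS options.

    All rows go to the single file-like object `out`, which stands in
    for the "only one file instead of many parts" option.
    """
    # QUOTE_MINIMAL still quotes values that contain the delimiter or a quote.
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(out, delimiter=delimiter, quoting=quoting, lineterminator="\n")
    if write_header:
        writer.writerow(header)
    writer.writerows(rows)


buf = io.StringIO()
write_csv(
    buf,
    ["col1", "col2", "col3"],
    [["there,is,a,problem", "due,to,separator", "in,text,columns..."]],
    delimiter=";",
    quote_all=True,
)
print(buf.getvalue())
{code}

With {{delimiter="#"}} and no forced quoting, the values containing "," are written unquoted, which gives the same result as the # workaround above.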
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)