Posted to issues@drill.apache.org by "Hari Sekhon (JIRA)" <ji...@apache.org> on 2014/11/14 12:40:35 UTC

[jira] [Updated] (DRILL-1712) Quoted CSV parsing

     [ https://issues.apache.org/jira/browse/DRILL-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated DRILL-1712:
-------------------------------
    Description: 
When querying CSV files, Drill doesn't handle quoted CSV files properly: it includes the quotes in the data. The directory /tmp/hari in MapR-FS has two simple CSV files, one quoted and one unquoted, so you can see the difference.
{code}
0: jdbc:drill:> select * from dfs.`/tmp/hari` limit 10;
+------------+
|  columns   |
+------------+
| ["1","2","3"] |
| ["4","5","6"] |
| ["7","8","9"] |
| ["\"1\"","\"2\"","\"3\""] |
| ["\"4\"","\"5\"","\"6\""] |
| ["\"7\"","\"8\"","\"9\""] |
+------------+
6 rows selected (0.238 seconds)

 cat hari/hari.csv
1,2,3
4,5,6
7,8,9
cat hari/hari2.csv
"1","2","3"
"4","5","6"
"7","8","9"
{code}
Drill shouldn't include the quotes as data; they are just delimiters enclosing the data.
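
For comparison, here is a minimal sketch of the behaviour I would expect, using Python's standard csv module rather than anything from Drill. The two strings simply reproduce the contents of hari.csv and hari2.csv shown above, and the helper name parse is made up for illustration. With the default dialect, the enclosing quotes are stripped, so both files produce identical rows.

```python
import csv
import io

# Contents of the two sample files from this report (reproduced inline).
unquoted = "1,2,3\n4,5,6\n7,8,9\n"
quoted = '"1","2","3"\n"4","5","6"\n"7","8","9"\n'


def parse(text):
    """Parse CSV text into a list of rows using Python's default CSV dialect."""
    return list(csv.reader(io.StringIO(text)))


rows_unquoted = parse(unquoted)
rows_quoted = parse(quoted)

# The quotes are treated as field delimiters, not as part of the data.
print(rows_quoted)                   # [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
print(rows_unquoted == rows_quoted)  # True
```

This only illustrates the expected result; it says nothing about how Drill's text reader is or should be implemented.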

This is related to DRILL-950 but is not the same issue.

  was:
When querying CSV files, Drill doesn't handle quoted CSV files properly: it includes the quotes in the data. The directory /tmp/hari in MapR-FS has two simple CSV files, one quoted and one unquoted, so you can see the difference.
{code}
0: jdbc:drill:> select * from dfs.`/tmp/hari` limit 10;
+------------+
|  columns   |
+------------+
| ["1","2","3"] |
| ["4","5","6"] |
| ["7","8","9"] |
| ["\"1\"","\"2\"","\"3\""] |
| ["\"4\"","\"5\"","\"6\""] |
| ["\"7\"","\"8\"","\"9\""] |
+------------+
6 rows selected (0.238 seconds)

 cat hari/hari.csv
1,2,3
4,5,6
7,8,9
cat hari/hari2.csv
"1","2","3"
"4","5","6"
"7","8","9"
{code}
Drill shouldn't include the quotes as data; they are just delimiters enclosing the data.


> Quoted CSV parsing
> ------------------
>
>                 Key: DRILL-1712
>                 URL: https://issues.apache.org/jira/browse/DRILL-1712
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 0.6.0
>         Environment: MapR 4.0.1 M5
>            Reporter: Hari Sekhon


