Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/06/11 17:11:00 UTC

[jira] [Commented] (IMPALA-1127) Can't get a comma into a STRING column in Impala CSV table

    [ https://issues.apache.org/jira/browse/IMPALA-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508390#comment-16508390 ] 

Tim Armstrong commented on IMPALA-1127:
---------------------------------------

This is a side effect of neither Impala nor Hive specifying an escape character by default for CSV tables, so a comma inside a value is always read as a field separator. Hive behaves the same way.
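
As a minimal sketch of the workaround this implies, the table from the report could be declared with an explicit escape character. ESCAPED BY is part of the ROW FORMAT DELIMITED clause in Impala's CREATE TABLE syntax. The table name csv_escaped is hypothetical, and the result of the final query is not shown because it should be checked on the Impala version in use.

```sql
-- Hypothetical table name; same columns as the table in the report,
-- plus an explicit escape character (backslash).
CREATE TABLE csv_escaped (c1 STRING, c2 STRING, c3 STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
  STORED AS TEXTFILE;

-- Same values as in the report, with an unescaped comma in the c3 value.
INSERT INTO csv_escaped VALUES
  ('one', 'two', 'three'),
  ('double " quote', 'single \' quote', 'and , comma');

SELECT * FROM csv_escaped;
```

With an escape character declared, the text scanner has a way to tell a literal comma from a field separator, which the default table definition does not provide.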

> Can't get a comma into a STRING column in Impala CSV table
> ----------------------------------------------------------
>
>                 Key: IMPALA-1127
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1127
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 1.4
>            Reporter: John Russell
>            Priority: Minor
>
> I thought in the past I had put strings containing commas into an Impala CSV table and it did the right thing automatically (escaped the commas with \, since there isn't the notion of optional double quotes like in some text input formats). I tried just now with 1.4 and Impala would always interpret the comma as a separator regardless of escaping.
> [localhost:21000] > create table csv (c1 string, c2 string, c3 string) row format delimited fields terminated by "," stored as textfile;
> [localhost:21000] > insert into csv values ("one","two","three"), ('double " quote',"single \' quote","and , comma");
> [localhost:21000] > select * from csv;
> +----------------+----------------+-------+
> | c1             | c2             | c3    |
> +----------------+----------------+-------+
> | one            | two            | three |
> | double " quote | single ' quote | and   |
> +----------------+----------------+-------+
> The bottom row of c3 is truncated where the comma appeared in the input string.
> [localhost:21000] > insert overwrite csv values ("one","two","three"), ('double " quote',"single \' quote","and \, comma");
> [localhost:21000] > select * from csv;
> +----------------+----------------+-------+
> | c1             | c2             | c3    |
> +----------------+----------------+-------+
> | one            | two            | three |
> | double " quote | single ' quote | and   |
> +----------------+----------------+-------+
> Adding a \ escape before the comma didn't help; the value is still truncated.
> Maybe the escape character is being misinterpreted and I need to double it somehow, to get the \ actually into the text file:
> [localhost:21000] > insert overwrite csv values ("one","two","three"), ('double " quote',"single \' quote","and \\, comma");
> [localhost:21000] > select * from csv;
> +----------------+----------------+-------+
> | c1             | c2             | c3    |
> +----------------+----------------+-------+
> | one            | two            | three |
> | double " quote | single ' quote | and \ |
> +----------------+----------------+-------+
> No, the \ shows up but the comma is still treated as a separator by the query.


