Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/06/19 19:13:00 UTC

[jira] [Updated] (IMPALA-3478) Support for UTF-8 BOM on text backed tables.

     [ https://issues.apache.org/jira/browse/IMPALA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-3478:
----------------------------------
    Labels: newbie ramp-up  (was: ramp-up)

> Support for UTF-8 BOM on text backed tables.
> --------------------------------------------
>
>                 Key: IMPALA-3478
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3478
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Clients
>    Affects Versions: Impala 2.3.0
>            Reporter: Thomas Scott
>            Priority: Minor
>              Labels: newbie, ramp-up
>
> Files encoded as UTF-8 can begin with a Byte Order Mark (BOM), the bytes "ef bb bf" in hex. Hive ignores the BOM, but Impala does not, which can cause the first field in the file to be misread. A good example is a table whose first column is of type timestamp: Impala shows the value as NULL even though the same data reads correctly in Hive.
> Steps to reproduce:
> In Hive:
> CREATE EXTERNAL TABLE IF NOT EXISTS test_table (col1 timestamp) LOCATION '/tmp/test_table'
> Then write a file with a BOM into the /tmp/test_table directory. I use vim for this, as below:
> echo '2010-01-01 00:00:00.000' > foo
> vim -e -s -c ':set bomb' -c ':wq' foo
> Then, in both Hive and Impala, run:
> SELECT * FROM test_table
> Hive displays the timestamp; Impala displays NULL.
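
For illustration only, here is a minimal Python sketch of the behavior the report asks for: detecting a leading UTF-8 BOM and skipping it before the first field is parsed. This is not Impala's code (its text scanner is not written in Python), and `strip_utf8_bom` is a hypothetical helper name chosen for this example. The sample row is the one from the reproduction steps.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (ef bb bf), if one is present.

    Hypothetical helper for illustration; not part of Impala or Hive.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The row from the report, with the BOM prepended as `:set bomb` in vim would do.
with_bom = codecs.BOM_UTF8 + b"2010-01-01 00:00:00.000\n"

# Without BOM handling, the first field starts with three extra bytes,
# so it no longer looks like a timestamp literal.
raw_first_field = with_bom.split(b"\n")[0]

# With BOM handling, the first field is exactly the timestamp text.
clean_first_field = strip_utf8_bom(with_bom).split(b"\n")[0]
```

Only the start of a file needs this check, since the BOM appears once, before the first row.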



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
