Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/10/21 07:21:01 UTC

[jira] [Updated] (HIVE-3677) Encoding Issue - ISO-8859-1

     [ https://issues.apache.org/jira/browse/HIVE-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-3677:
--------------------------------------
    Fix Version/s:     (was: 0.8.1)

I cleared the fixVersion field since this ticket is still open. Please review this ticket and, if the fix is already committed to a specific version, set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] the fixVersion should be set only when the issue is resolved/closed.

> Encoding Issue - ISO-8859-1
> ---------------------------
>
>                 Key: HIVE-3677
>                 URL: https://issues.apache.org/jira/browse/HIVE-3677
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration, Import/Export
>    Affects Versions: 0.8.1
>         Environment: Amazon EMR with Hive (Hive 0.8.1 and Hadoop 1.0.3)
>            Reporter: Sergio Kameoka
>            Priority: Major
>
> We’ve created a very simple example using Amazon EMR with Hive that creates a single table and loads some data into it. Below is the code that we used:
> -- CREATE TABLE CODE
> CREATE TABLE sampletable (
> valorstring STRING, valordecimal DOUBLE)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'
> WITH SERDEPROPERTIES (
> 'serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol',
> 'quote.delim'='("|\\[|\\])',
> 'field.delim'=' ',
> 'serialization.null.format'='-')
> STORED AS TEXTFILE;
>  
> -- LOAD DATA CODE
> LOAD DATA LOCAL INPATH '/tmp/sampletable.txt' OVERWRITE INTO TABLE sampletable;
> Here is the text file content that we are using to load the data:
> /tmp/sampletable.txt
> "Exemplo de texto com acentuação" 90,15
> "Exemplo de texto com acentuação" 80.15
> The problem that we are facing seems to be with the encoding that is being used in the Hive configuration. It seems that UTF-8 is being used, but for the Brazilian format we need ISO-8859-1.
> In the example above, when the data is loaded into the table and we run a simple select (SELECT * FROM sampletable), the accented text is returned garbled and the double value with a comma is returned as NULL.
> We’ve already changed the LANG environment variable and set Hive variables with SET, but nothing has worked so far.
> Thank you in advance!!!
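
The symptom described above can be reproduced outside Hive. The sketch below is not part of the original report and is not a fix in Hive itself. It assumes the input file was saved as ISO-8859-1 and that the reader decodes it as UTF-8, and it shows one possible workaround: re-encoding the file to UTF-8 and switching the decimal comma to a dot before LOAD DATA. The function names normalize_line and convert_to_utf8 are hypothetical.

```python
# One sample row from the report, encoded as ISO-8859-1 (assumed source encoding).
raw = '"Exemplo de texto com acentuação" 90,15\n'.encode("iso-8859-1")

# Decoding ISO-8859-1 bytes as UTF-8 cannot recover the accented characters;
# with errors="replace" they become U+FFFD replacement characters.
garbled = raw.decode("utf-8", errors="replace")


def normalize_line(line: str) -> str:
    """Replace the decimal comma in the last space-separated field with a dot."""
    head, sep, number = line.rpartition(" ")
    return head + sep + number.replace(",", ".")


def convert_to_utf8(data: bytes) -> bytes:
    """Decode ISO-8859-1 input, normalize each row, and re-encode as UTF-8."""
    text = data.decode("iso-8859-1")
    lines = [normalize_line(line) for line in text.splitlines()]
    return ("\n".join(lines) + "\n").encode("utf-8")


converted = convert_to_utf8(raw)
```

Running convert_to_utf8 over the bytes of /tmp/sampletable.txt before loading would yield UTF-8 text with dot-separated decimals, which sidesteps both symptoms under the assumptions stated above.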



--
This message was sent by Atlassian Jira
(v8.20.10#820010)