Posted to issues@hive.apache.org by "Bharath Krishna (Jira)" <ji...@apache.org> on 2022/09/21 23:25:00 UTC

[jira] [Comment Edited] (HIVE-20693) Case-sensitivity for column names when reading from ORC

    [ https://issues.apache.org/jira/browse/HIVE-20693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607999#comment-17607999 ] 

Bharath Krishna edited comment on HIVE-20693 at 9/21/22 11:24 PM:
------------------------------------------------------------------

The workaround, {{set orc.force.positional.evolution=true;}}, works fine for select queries.

But I still see an issue. In the example below, assume userId is a camel-case column.

 

{{ALTER TABLE orc_table SET TBLPROPERTIES('orc.force.positional.evolution'='true');}}

Now if you run:

{{select count(*) from orc_table where date_key='2022-09-21' and userid IS NULL limit 5;}}

It still returns a non-zero value; in fact, it returns the count of all rows in the table.

 

But

{{select * from orc_table where date_key='2022-09-21' and userid IS NULL limit 5;}}

returns no results.

 

So the select query works as expected, but count(*) still has issues when filtering with IS NULL on the camel-case column.

Can something be done to fix this count issue?
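
One way to narrow this down, offered as a rough diagnostic sketch rather than a confirmed fix: rerun the count for the session with filter pushdown to the ORC reader and stats-based answering switched off, then cross-check the result with an aggregate that does not use count(*). Both hive.optimize.index.filter and hive.compute.query.using.stats are standard Hive properties, but whether either one is involved in this behavior is purely an assumption. The table and column names are the ones from the example above.

```sql
-- Diagnostic sketch only. It is an ASSUMPTION that these settings are
-- related to the count(*) discrepancy; nothing in this issue confirms it.

-- Do not push the WHERE filter down to the ORC reader as a search argument.
SET hive.optimize.index.filter=false;

-- Do not let Hive answer simple aggregates from metastore statistics.
SET hive.compute.query.using.stats=false;

-- Rerun the problematic query (LIMIT dropped, since count(*) returns one row).
SELECT count(*)
FROM orc_table
WHERE date_key='2022-09-21' AND userid IS NULL;

-- Cross-check: count NULL values of the column without filtering on it.
SELECT sum(CASE WHEN userid IS NULL THEN 1 ELSE 0 END)
FROM orc_table
WHERE date_key='2022-09-21';
```

If the two results differ, or the count changes once the settings are disabled, that would suggest the discrepancy arises in how the filter is evaluated rather than in how the column values are read.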



> Case-sensitivity for column names when reading from ORC
> -------------------------------------------------------
>
>                 Key: HIVE-20693
>                 URL: https://issues.apache.org/jira/browse/HIVE-20693
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, ORC
>    Affects Versions: 2.3.2
>            Reporter: Alexandre Crayssac
>            Priority: Major
>
> Hello everyone,
> I observed different behavior between versions 1.2.1 and 2.3.2 (those are the only two versions I've been able to test).
> When creating an external table pointing to ORC files that have upper-case column names in their metadata, I'm able to read the data on 1.2.1 but not on 2.3.2.
> I tested with both upper-case and lower-case column names in my CREATE TABLE statement, and it does not work in either case. That seems expected, since column names are normalized to lower case in Hive.
> So, I would like to know whether this is a feature or a bug in Hive 2.3.2.
> In fact, if this is a feature, it would be impossible to have upper-case column names in ORC files with Hive 2.3.2.
> Please let me know if you need more information.
> Kind regards,
> Alexandre


