Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/08/31 13:03:00 UTC

[jira] [Work logged] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

     [ https://issues.apache.org/jira/browse/HIVE-22622?focusedWorklogId=476568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476568 ]

ASF GitHub Bot logged work on HIVE-22622:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Aug/20 13:02
            Start Date: 31/Aug/20 13:02
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #1446:
URL: https://github.com/apache/hive/pull/1446


   ### What changes were proposed in this pull request?
   Add a check for duplicate struct field identifiers and throw a SemanticException with a customized error message when one is found.
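
For illustration only, such a uniqueness check could look like the sketch below. It is not the code from this pull request: the class `StructFieldCheck` and the method `firstDuplicate` are hypothetical names, and the case-insensitive comparison is an assumption about how field identifiers are matched. A caller would turn a non-empty result into the SemanticException described above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of Hive: finds the first repeated struct field name.
final class StructFieldCheck {
    private StructFieldCheck() {
    }

    // Returns the first field name that was already seen earlier in the list,
    // or an empty Optional when all names are unique.
    static Optional<String> firstDuplicate(List<String> fieldNames) {
        Set<String> seen = new HashSet<>();
        for (String name : fieldNames) {
            // Assumption: identifiers are compared case-insensitively.
            String key = name.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }
}
```

With the field list from the reproduction case in this description (`number`, `street`, `number`), the helper would report `number` as the duplicate.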
   
   ### Why are the changes needed?
   Creating a table with a struct-type column that has duplicate field identifiers, and inserting records into it, is allowed. Later, when querying the table, we cannot distinguish between struct attributes that share the same identifier.
   In some cases (depending on the table's SerDe format) the query may fail. See the Jira issue for details.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it introduces a new error code and message. Example:
   ```
   FAILED: SemanticException [Error 10423]: Struct field is not unique: id
   ```
   
   ### How was this patch tested?
   1. Created a new negative test:
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestNegativeCliDriver -Dqfile=struct_field_uniqueness.q -pl itests/qtest -Pitests
   ```
   
   2. Reproduced the query failure:
   ```
   CREATE TABLE person
   (
       `id`      int,
       `address` struct<number:int,street:string,number:int>
   )
       ROW FORMAT SERDE
           'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
       STORED AS INPUTFORMAT
           'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
           OUTPUTFORMAT
               'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
   
   INSERT INTO person
   VALUES (1, named_struct('number', 61, 'street', 'Terrasse', 'number', 62));
   INSERT INTO person
   VALUES (2, named_struct('number', 51, 'street', 'Terrasse', 'number', 52));
   
   SELECT address.number FROM person;
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 476568)
    Remaining Estimate: 0h
            Time Spent: 10m

> Hive allows to create a struct with duplicate attribute names
> -------------------------------------------------------------
>
>                 Key: HIVE-22622
>                 URL: https://issues.apache.org/jira/browse/HIVE-22622
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Denys Kuzmenko
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When you create a table with a struct that uses the same attribute name twice, Hive allows you to create it:
> create table test_struct( duplicateColumn struct<id:int, id:int>);
> You can insert data into it:
> insert into test_struct select named_struct("id",1,"id",1);
> But you cannot read it:
> select * from test_struct;
> Returns: java.io.IOException: java.io.IOException: Error reading file: hdfs://.../test_struct/delta_0000001_0000001_0000/bucket_00000
> Creating the table and inserting into it succeed, but reading the struct part of the table fails. All other columns (if there is more than one) can still be read, but the struct cannot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)