Posted to issues@hive.apache.org by "Gergely Fürnstáhl (Jira)" <ji...@apache.org> on 2022/06/08 12:17:00 UTC
[jira] [Comment Edited] (HIVE-26298) Selecting complex types on migrated iceberg table does not work

    [ https://issues.apache.org/jira/browse/HIVE-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551582#comment-17551582 ] 

Gergely Fürnstáhl edited comment on HIVE-26298 at 6/8/22 12:16 PM:
-------------------------------------------------------------------

Investigated a bit more: without schema.name-mapping.default it still uses the same Iceberg API and works correctly (for the given query's "subschema"):

[https://github.com/apache/iceberg/blob/90225d6c9413016d611e2ce5eff37db1bc1b4fc5/orc/src/main/java/org/apache/iceberg/orc/OrcIterable.java#L87]
{code:java}
Step completed: "thread=55e54b9c-8458-49f3-b345-a71abe886c30 HiveServer2-Handler-Pool: Thread-141", org.apache.iceberg.orc.OrcIterable.iterator(), line=81 bci=17
55e54b9c-8458-49f3-b345-a71abe886c30 HiveServer2-Handler-Pool: Thread-141[1] print nameMapping
 nameMapping = null
55e54b9c-8458-49f3-b345-a71abe886c30 HiveServer2-Handler-Pool: Thread-141[1] print schema
 schema = "table {
  4: int_to_array_array_map: optional map<int, list<list<int>>>
}"

Step completed: "thread=55e54b9c-8458-49f3-b345-a71abe886c30 HiveServer2-Handler-Pool: Thread-141", org.apache.iceberg.orc.OrcIterable.iterator(), line=89 bci=61
55e54b9c-8458-49f3-b345-a71abe886c30 HiveServer2-Handler-Pool: Thread-141[1] print nameMapping
 nameMapping = "[
  ([int_to_array_array_map] -> 4, [ ([key] -> 8), ([value] -> 9, [ ([element] -> 10, [ ([element] -> 11) ]) ]) ])
]"
 {code}
Compare this to what is printed when the name mapping table property is generated:

[https://github.com/apache/hive/blob/c9e12d84ca0c72732cd24aa0b160f474309f5de9/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java#L325]
{code:java}
Step completed: "thread=HiveServer2-Background-Pool: Thread-757", org.apache.iceberg.mr.hive.HiveIcebergMetaHook.preAlterTable(), line=326 bci=398
HiveServer2-Background-Pool: Thread-757[1] print preAlterTableProperties
 preAlterTableProperties = "org.apache.iceberg.mr.hive.HiveIcebergMetaHook$PreAlterTableProperties@105f9694"
HiveServer2-Background-Pool: Thread-757[1] print preAlterTableProperties.schema
 preAlterTableProperties.schema = "table {
  1: int_primitive: optional int
  2: int_array: optional list<int>
  4: int_array_array: optional list<list<int>>
  7: int_to_array_array_map: optional map<int, list<list<int>>>
}"
HiveServer2-Background-Pool: Thread-757[1] print nameMapping
 nameMapping = "[
  ([int_primitive] -> 1)
  ([int_array] -> 2, [ ([element] -> 3) ])
  ([int_array_array] -> 4, [ ([element] -> 6, [ ([element] -> 5) ]) ])
  ([int_to_array_array_map] -> 7, [ ([key] -> 10), ([value] -> 11, [ ([element] -> 9, [ ([element] -> 8) ]) ]) ])
]"
 {code}
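Seems like the top-level field ids are already wrong when preAlterTableProperties.schema is created: int_array_array gets id 4 and int_to_array_array_map gets id 7, while the schema used for the query above has int_to_array_array_map at id 4.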

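Edit:
The traversal order of the nested fields is different too: for int_to_array_array_map the mapping built at read time assigns key -> 8, value -> 9, element -> 10, element -> 11, while the generated table property assigns key -> 10, value -> 11, element -> 9, element -> 8.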
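
A minimal sketch of how the top-level ids above could end up different. This is plain Java with hypothetical names, not Hive or Iceberg code, and it assumes (based only on the debugger output above) that one side numbers all top-level columns before any nested field while the other numbers depth-first. It only models the top-level ids, not the order of the nested ones.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldIdOrderSketch {

    // nestedFieldCounts[i] = number of nested fields (list elements, map
    // keys/values) under top-level column i that also need an id.

    // Assumed strategy A: every top-level column is numbered first,
    // nested fields receive their ids afterwards.
    static List<Integer> topLevelFirst(int[] nestedFieldCounts) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < nestedFieldCounts.length; i++) {
            ids.add(i + 1);
        }
        return ids;
    }

    // Assumed strategy B: depth-first numbering, so the nested fields of a
    // column consume ids before the next top-level column gets one.
    static List<Integer> depthFirst(int[] nestedFieldCounts) {
        List<Integer> ids = new ArrayList<>();
        int next = 1;
        for (int nested : nestedFieldCounts) {
            ids.add(next);
            next += 1 + nested;
        }
        return ids;
    }

    public static void main(String[] args) {
        // int_primitive (0 nested), int_array (element),
        // int_array_array (element, element),
        // int_to_array_array_map (key, value, element, element)
        int[] arrayDemo = {0, 1, 2, 4};
        System.out.println(topLevelFirst(arrayDemo)); // [1, 2, 3, 4]
        System.out.println(depthFirst(arrayDemo));    // [1, 2, 4, 7]
    }
}
```

With the array_demo columns, the depth-first variant reproduces the 1, 2, 4, 7 seen in preAlterTableProperties.schema, and the top-level-first variant puts int_to_array_array_map at id 4 as in the query's schema.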


> Selecting complex types on migrated iceberg table does not work
> ---------------------------------------------------------------
>
>                 Key: HIVE-26298
>                 URL: https://issues.apache.org/jira/browse/HIVE-26298
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gergely Fürnstáhl
>            Priority: Major
>         Attachments: 00001-a5d522f4-a065-44e6-983b-ba66596b4332.metadata.json
>
>
> I am working on implementing NameMapping in Impala (mainly replicating Hive's functionality) and ran into the following issue:
> {code:java}
> CREATE TABLE array_demo
> (
>   int_primitive INT,
>   int_array ARRAY<INT>,
>   int_array_array ARRAY<ARRAY<INT>>,
>   int_to_array_array_Map MAP<INT,ARRAY<ARRAY<INT>>>
> )
> STORED AS ORC;
> INSERT INTO array_demo values (0, array(1), array(array(2), array(3,4)), map(5,array(array(6),array(7,8))));
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
>  {code}
> Converting to Iceberg:
> {code:java}
> ALTER TABLE array_demo SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
> select * from array_demo;
> INFO  : Compiling command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe): select * from array_demo
> INFO  : No Stats for default@array_demo, Columns: int_primitive, int_array, int_to_array_array_map, int_array_array
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:array_demo.int_primitive, type:int, comment:null), FieldSchema(name:array_demo.int_array, type:array<int>, comment:null), FieldSchema(name:array_demo.int_array_array, type:array<array<int>>, comment:null), FieldSchema(name:array_demo.int_to_array_array_map, type:map<int,array<array<int>>>, comment:null)], properties:null)
> INFO  : Completed compiling command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe); Time taken: 0.036 seconds
> INFO  : Executing command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe): select * from array_demo
> INFO  : Completed executing command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe); Time taken: 0.0 seconds
> INFO  : OK
> Error: java.io.IOException: java.lang.IllegalArgumentException: Can not promote MAP type to INTEGER (state=,code=0)
> select int_primitive from array_demo;
> +----------------+
> | int_primitive  |
> +----------------+
> | 0              |
> +----------------+
> 1 row selected (0.088 seconds)
>  {code}
> Removing schema.name-mapping.default solves it:
> {code:java}
> ALTER TABLE array_demo UNSET TBLPROPERTIES ('schema.name-mapping.default');
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
>  {code}
> Possible cause:
>  
> The name mapping generated and pushed into schema.name-mapping.default is different from the name mapping implied by the schema in the metadata.json (attached).


