Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/10 08:15:50 UTC

[GitHub] [iceberg] liubo1022126 opened a new issue #2576: (Hive) insert overwrite table xxx partition (pt='xxx') on iceberg table

liubo1022126 opened a new issue #2576:
URL: https://github.com/apache/iceberg/issues/2576


   I migrated tables from Hive to Iceberg using Spark 3, so the partition info is hidden in the DDL.
   
   When I run 
   
   > insert overwrite table xxx partition (pt='xxx')
   
   in the Flink or Spark SQL shell, it works, but when I run it in the Hive SQL shell, I get an error like the one below:
   
   > FAILED: ValidationFailureSemanticException table is not partitioned but partition spec exists: {pt=xxx}
   
   What can I do about this? Should I specify the partition in the DDL?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] marton-bod commented on issue #2576: (Hive) insert overwrite table xxx partition (pt='xxx') on iceberg table

Posted by GitBox <gi...@apache.org>.

marton-bod commented on issue #2576:
URL: https://github.com/apache/iceberg/issues/2576#issuecomment-838004158


   @butaozhang Thanks for getting back to us. All features developed in the `hive-iceberg-handler` module are tested only with the Tez execution engine: MR has been officially deprecated for Hive 4, so we're focusing on Tez. (This is the opposite of the situation with iceberg-hive-runtime, where only MR writes currently work due to limitations of the Tez/Hive versions that Iceberg depends on.) Can you please retry with Tez? Otherwise your setup seems correct.



[GitHub] [iceberg] butaozhang commented on issue #2576: (Hive) insert overwrite table xxx partition (pt='xxx') on iceberg table

Posted by GitBox <gi...@apache.org>.

butaozhang commented on issue #2576:
URL: https://github.com/apache/iceberg/issues/2576#issuecomment-837665198


   Here is the structure of the ice_tbl table:
   
   ```
   0: jdbc:hive2://hivenode:10000> show create table ice_tbl;
   +----------------------------------------------------+
   |                   createtab_stmt                   |
   +----------------------------------------------------+
   | CREATE EXTERNAL TABLE `ice_tbl`(                   |
   |   `i` int COMMENT 'from deserializer')             |
   | ROW FORMAT SERDE                                   |
   |   'org.apache.iceberg.mr.hive.HiveIcebergSerDe'    |
   | STORED BY                                          |
   |   'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'  |
   |                                                    |
   | LOCATION                                           |
   |   'hdfs://XXX/user/hive/warehouse/hiveice/ice_tbl' |
   | TBLPROPERTIES (                                    |
   |   'bucketing_version'='2',                         |
   |   'engine.hive.enabled'='true',                    |
   |   'external.table.purge'='TRUE',                   |
   |   'last_modified_by'='hive',                       |
   |   'last_modified_time'='1620698724',               |
   |   'metadata_location'='hdfs://XXX/user/hive/warehouse/hiveice/ice_tbl/metadata/00001-f664af14-126c-4e5e-9e3f-3b7fa356ff48.metadata.json',  |
   |   'previous_metadata_location'='hdfs://XXX/user/hive/warehouse/hiveice/ice_tbl/metadata/00000-33a14d2d-ec8b-4e6c-87df-c87fac954e9e.metadata.json',  |
   |   'table_type'='ICEBERG',                          |
   |   'transient_lastDdlTime'='1620698724')            |
   +----------------------------------------------------+
   
   ```



[GitHub] [iceberg] RussellSpitzer commented on issue #2576: (Hive) insert overwrite table xxx partition (pt='xxx') on iceberg table

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on issue #2576:
URL: https://github.com/apache/iceberg/issues/2576#issuecomment-836816786


   Copying @marton-bod's email from the dev mailing list for future reference:
   
   > 
   > Hi,
   > 
   > Thanks for reaching out! Unfortunately insert overwrite commands are not currently supported by the Hive engine in upstream Iceberg, since it's a feature difficult to implement without touching the Hive codebase as well. Currently we only have support for regular insert queries (using the mr engine, not tez).
   > 
   > We have recently created a new module in upstream Hive called "hive-iceberg-handler", which is supposed to be an alternative over time to using the iceberg-hive-runtime jar, providing an extended feature set and performance improvements. Support for insert overwrites has already been implemented there. Please note the module is still experimental and some of these newer features would require you to run Hive from the master branch due to core Hive API changes, but it might be a good candidate for experimentation. Here's the link for the module if you'd like to check out our ongoing work: https://github.com/apache/hive/tree/master/iceberg/iceberg-handler
   > 
   > As for the specific error message you're getting, it's because the HMS by design sees the Iceberg table as unpartitioned (to enable flexible partitioning down the line and due to how the Hive query planner works), even though the underlying Iceberg table is actually partitioned - hence the upstream error during the compilation phase.
   > 
   > Best,
   > Marton
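
   For comparison, the original report says the same kind of statement is accepted by Spark SQL, where the Iceberg table's partitioning is visible to the engine. A minimal sketch of that form follows; the source table `staging_tbl` and the columns `id` and `data` are hypothetical names, and the target table `xxx` is assumed to have the columns (id, data, pt):

   ```sql
   -- Spark SQL sketch, not Hive: static overwrite of a single partition.
   -- `staging_tbl`, `id`, and `data` are placeholder names for illustration.
   INSERT OVERWRITE TABLE xxx PARTITION (pt = 'xxx')
   SELECT id, data
   FROM staging_tbl;
   ```

   Running this statement in Hive with iceberg-hive-runtime still fails during compilation with the ValidationFailureSemanticException quoted in the issue, for the reason given in the email above.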



[GitHub] [iceberg] butaozhang edited a comment on issue #2576: (Hive) insert overwrite table xxx partition (pt='xxx') on iceberg table

Posted by GitBox <gi...@apache.org>.

butaozhang edited a comment on issue #2576:
URL: https://github.com/apache/iceberg/issues/2576#issuecomment-837659767


   Hi @RussellSpitzer @marton-bod. I have tested the `hive-iceberg-handler` module from the Hive master branch. Here are the test steps:
   1. **Create the Iceberg table using Hive Beeline:**
     CREATE EXTERNAL TABLE ice_tbl (i int) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
   2. **Run an insert statement:**
     set hive.execution.engine=mr;
     insert overwrite table ice_tbl values(1);
   3. **Run a select on ice_tbl, which returns no data:**
   
   ```
   0: jdbc:hive2://hivenode:10000> select * from ice_tbl;
   INFO  : Compiling command(queryId=hive_20210511100644_a0676d7a-11ac-4c69-8b03-ee081ec33f70): select * from ice_tbl
   INFO  : No Stats for default@ice_tbl, Columns: i
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:ice_tbl.i, type:int, comment:null)], properties:null)
   INFO  : Completed compiling command(queryId=hive_20210511100644_a0676d7a-11ac-4c69-8b03-ee081ec33f70); Time taken: 0.28 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing command(queryId=hive_20210511100644_a0676d7a-11ac-4c69-8b03-ee081ec33f70): select * from ice_tbl
   INFO  : Completed executing command(queryId=hive_20210511100644_a0676d7a-11ac-4c69-8b03-ee081ec33f70); Time taken: 0.0 seconds
   +------------+
   | ice_tbl.i  |
   +------------+
   +------------+
   No rows selected (0.36 seconds)
   ```
   
   So why is no data returned? Am I missing some Hive/Iceberg configuration?
   



[GitHub] [iceberg] butaozhang commented on issue #2576: (Hive) insert overwrite table xxx partition (pt='xxx') on iceberg table

Posted by GitBox <gi...@apache.org>.

butaozhang commented on issue #2576:
URL: https://github.com/apache/iceberg/issues/2576#issuecomment-838045977


   @marton-bod Thanks for the guidance. I tested it and it works! Here is the working config:
   ```
   set hive.vectorized.execution.enabled=false;
   set hive.execution.engine=tez;
   ```
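
   Combined with the earlier test steps, the full session might look like the sketch below. It assumes Hive is built from the master branch with the `hive-iceberg-handler` module available, as discussed in this thread; the table name and statements are the ones used in the earlier test.

   ```sql
   -- Settings reported to work in this thread: Tez engine, vectorization off.
   set hive.vectorized.execution.enabled=false;
   set hive.execution.engine=tez;

   -- Create the Iceberg-backed table through the Hive storage handler.
   CREATE EXTERNAL TABLE ice_tbl (i int)
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

   -- Overwrite the table contents, then read them back.
   insert overwrite table ice_tbl values(1);
   select * from ice_tbl;
   ```

   With `hive.execution.engine=mr`, the same statements were reported above to leave the table empty, so the engine setting is the key difference.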
   
   By the way, can Spark SQL (version 3.0) read Hive Iceberg tables? I tried adding hive-iceberg-handler-4.0.0-SNAPSHOT.jar to the Spark SQL classpath and querying the Hive Iceberg table, but it didn't succeed. The error is as follows:
   ```
   ./spark-sql --jars /data/hive-iceberg-handler-4.0.0-SNAPSHOT.jar
   
   spark-sql> select * from ice_tbl;
   21/05/11 16:03:37 ERROR thriftserver.SparkSQLDriver: Failed in [select * from ice_tbl]
   java.lang.ExceptionInInitializerError
           at org.apache.iceberg.mr.hive.HiveIcebergSerDe.initialize(HiveIcebergSerDe.java:135)
           at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
           at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450)
           at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437)
           at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
           at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
           at org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:641)
           at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:624)
           at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree2$1(HiveClientImpl.scala:431)
           at org.apache.spark.sql.hive.client.HiveClientImpl.convertHiveTableToCatalogTable(HiveClientImpl.scala:430)
           at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:424)
           at scala.Option.map(Option.scala:230)
   
           Caused by: java.lang.RuntimeException: Cannot find method: get
           at org.apache.iceberg.common.DynMethods$Builder.build(DynMethods.java:454)
           at org.apache.iceberg.common.DynMethods$Builder.buildStatic(DynMethods.java:522)
           at org.apache.iceberg.mr.hive.serde.objectinspector.IcebergObjectInspector.<clinit>(IcebergObjectInspector.java:46)
           ... 111 more
   
   ```

