Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/22 14:03:10 UTC

[GitHub] [iceberg] junsionzhang opened a new issue #2357: hive cannot get data from iceberg table

junsionzhang opened a new issue #2357:
URL: https://github.com/apache/iceberg/issues/2357


   I have set up Hive, Flink, and Iceberg.
   1) Create an Iceberg table using Flink:
   CREATE CATALOG hive_catalog WITH (
     'type'='iceberg',
     'catalog-type'='hive',
     'uri'='thrift://192.168.100.101:9083',
     'clients'='5',
     'property-version'='1',
     'warehouse'='hdfs://192.168.100.101:9000/hive/warehouse/iceberg'
   );
   CREATE TABLE hive_catalog.iceberg.sample (
       id BIGINT COMMENT 'unique id',
       data STRING
   );
   2) Insert data into sample:
   insert into sample values(1,'a'),(2,'b');
   insert into sample values(3,'c'),(4,'d');
   3) Launch Hive and execute SQL.
   Running 'show tables' works fine and I can see the table sample,
   **but running 'select * from sample' returns no data.**
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-804720129


   Sorry @junsionzhang, I was not clear enough:
   - Please provide the output for `set iceberg.mr.catalog` - I suspect the catalog is not set in your Hive environment.
   
   Based on the output of the `DESCRIBE`, I see that `iceberg.engine.hive.enabled` was not set to true when the table was created, so the `SerDe`, `InputFormat`, and `OutputFormat` are not set correctly.
   
   Please check the https://iceberg.apache.org/hive/ page for more details.
   
   Thanks,
   Peter


[GitHub] [iceberg] junsionzhang commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
junsionzhang commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-807874241


   > Sorry @junsionzhang, I was not clear enough:
   > 
   > * Please provide the output for `set iceberg.mr.catalog` - I suspect the catalog is not set in your Hive environment.
   > 
   > Based on the output of the `DESCRIBE`, I see that `iceberg.engine.hive.enabled` was not set to true when the table was created, so the `SerDe`, `InputFormat`, and `OutputFormat` are not set correctly.
   > 
   > Please check the https://iceberg.apache.org/hive/ page for more details.
   > 
   > Thanks,
   > Peter
   
   Thank you so much.
   1) When I run 'SET iceberg.mr.catalog;' for the first time after launching the hive command, it says 'iceberg.mr.catalog is undefined',
   but after running 'SET iceberg.mr.catalog=hive', the command 'SET iceberg.mr.catalog;' works fine and the output is 'iceberg.mr.catalog=hive'.
   2) I did put iceberg-hive-runtime-0.11.0.jar under $HIVE_HOME/lib.
   3) What should the SerDe, InputFormat, and OutputFormat be if everything is correct?


[GitHub] [iceberg] junsionzhang edited a comment on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
junsionzhang edited a comment on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-804577577


   > @junsionzhang:
   > 
   > * Is the catalog type set correctly? Could you please run `set iceberg.mr.catalog`?
   > * What is the output of the `DESCRIBE FORMATTED sample` from Hive?
   > 
   > Thanks,
   > Peter
   
   1) "set iceberg.mr.catalog=true;" works fine in Hive.
   2) The `DESCRIBE FORMATTED sample` result is as follows:
   
   Database:           	iceberg
   Owner:              	root
   CreateTime:         	Mon Mar 22 18:08:46 CST 2021
   LastAccessTime:     	Sun Jan 18 16:40:22 CST 1970
   Retention:          	2147483647
   Location:           	hdfs://192.168.100.101:9000/hive/warehouse/iceberg/iceberg.db/sample
   Table Type:         	EXTERNAL_TABLE
   Table Parameters:
   	EXTERNAL            	TRUE
   	metadata_location   	hdfs://192.168.100.101:9000/hive/warehouse/iceberg/iceberg.db/sample/metadata/00002-f60f3416-680e-41cc-9a85-ec11dd4b9642.metadata.json
   	numFiles            	1
   	previous_metadata_location	hdfs://192.168.100.101:9000/hive/warehouse/iceberg/iceberg.db/sample/metadata/00001-b50a0569-05c9-4655-9db8-0eed6fa5f866.metadata.json
   	table_type          	ICEBERG
   	totalSize           	826
   	transient_lastDdlTime	1616407726
   
   Storage Information
   SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   InputFormat:        	org.apache.hadoop.mapred.FileInputFormat
   OutputFormat:       	org.apache.hadoop.mapred.FileOutputFormat
   Compressed:         	No
   Num Buckets:        	0
   Bucket Columns:     	[]
   Sort Columns:       	[]
   Time taken: 0.648 seconds, Fetched: 30 row(s)


[GitHub] [iceberg] junsionzhang commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
junsionzhang commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-811561451


   @pvary Thank you so much, I will try again.


[GitHub] [iceberg] pvary commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-808104730


   The issue you have linked is for an older version of Iceberg. It should be possible to read the table from Hive with Iceberg 0.11.0.
   
   Based on your answers I see two problems:
   1. When you are trying to read a table stored in HiveCatalog, you should set the catalog to hive. This can be set in hive-site.xml, or you can issue the following command before running the query:
   ```
   SET iceberg.mr.catalog=hive
   ```
   2. When creating the table, `iceberg.engine.hive.enabled` should be set to true in the config. See the relevant part in the docs:
   > The value iceberg.engine.hive.enabled needs to be set to true and added to the Hive configuration file on the classpath of the application creating or modifying (altering, inserting etc.) the table.
   
   To check 1., you can issue `set iceberg.mr.catalog`; if the output is different from `hive`, you should set it manually for the session (as you have already done, based on your last comment).
   
   To check 2., you can issue `DESCRIBE FORMATTED <table_name>` and check whether the SerDe, InputFormat, and OutputFormat are set to the correct ones. If the table was created correctly, the output should be something like this:
   ```
   SerDe Library: org.apache.iceberg.mr.hive.HiveIcebergSerDe
   InputFormat: org.apache.iceberg.mr.hive.HiveIcebergInputFormat
   OutputFormat: org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
   ```
   
   This should be done at table creation. I am not too familiar with how Flink creates the table, but alternatively it could be done at creation / modification time by setting the `engine.hive.enabled` Iceberg table property to `true`.
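   
   A minimal sketch of that last option in Flink SQL, reusing the table definition from the original report. It assumes the Flink connector forwards the `WITH` clause to the Iceberg table properties and that the Hive catalog honors `engine.hive.enabled` when it registers the table; neither was verified in this thread. The table name `sample_hive_enabled` is hypothetical.
   ```sql
   -- Assumption: properties in WITH (...) are stored as Iceberg table properties.
   CREATE TABLE hive_catalog.iceberg.sample_hive_enabled (
       id BIGINT COMMENT 'unique id',
       data STRING
   ) WITH (
       'engine.hive.enabled'='true'
   );
   ```
   If this takes effect, `DESCRIBE FORMATTED sample_hive_enabled` in Hive should list the `org.apache.iceberg.mr.hive` classes shown above rather than `LazySimpleSerDe`.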


[GitHub] [iceberg] pvary commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-814689614


   I think your issue is more Flink related than Hive related. We should find out how to enable this property when the table is created through Flink.
   
   As for your second question: Yes, you can create Hive tables backed by an Iceberg table using Hive SQL DDL. There are some examples at the URL above.
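   
   As a rough sketch of such a DDL, assuming the Iceberg Hive runtime jar is on the Hive classpath and that this Iceberg/Hive combination supports creating tables through the storage handler (the table name `sample_from_hive` is hypothetical):
   ```sql
   -- Point Hive at the Hive catalog for this session, as discussed earlier in the thread.
   SET iceberg.mr.catalog=hive;
   -- Hypothetical table; STORED BY is expected to wire in the Iceberg SerDe and input/output formats.
   CREATE TABLE iceberg.sample_from_hive (
       id BIGINT,
       data STRING
   )
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
   ```
   Running `DESCRIBE FORMATTED sample_from_hive` afterwards is one way to confirm which SerDe and formats were registered.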


[GitHub] [iceberg] MINCWANG commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
MINCWANG commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-841821197


   @junsionzhang Hi, zhangjun. How did you solve the problem of creating the Iceberg table in Hive?


[GitHub] [iceberg] junsionzhang edited a comment on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
junsionzhang edited a comment on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-814162662


   The hive-site.xml is as follows:
   <property>
     <name>iceberg.mr.catalog</name>
     <value>hive</value>
   </property>
   <property>
     <name>iceberg.engine.hive.enabled</name>
     <value>true</value>
   </property>
   Both iceberg.mr.catalog and iceberg.engine.hive.enabled can be confirmed in the Hive command shell, but it still does not work; the DESCRIBE FORMATTED values are still not right:
   
   Storage Information
   SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   InputFormat:        	org.apache.hadoop.mapred.TextInputFormat
   OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat


[GitHub] [iceberg] pvary commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-804111038


   @junsionzhang:
   - Is the catalog type set correctly? Could you please run `set iceberg.mr.catalog`?
   - What is the output of the `DESCRIBE FORMATTED sample` from Hive?
   
   Thanks,
   Peter
   


[GitHub] [iceberg] junsionzhang commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
junsionzhang commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-814203967


   > the hive-site.xml is as follows:
   > 
   > iceberg.mr.catalog
   > hive
   > 
   > 
   > iceberg.engine.hive.enabled
   > true
   > 
   > Both iceberg.mr.catalog and iceberg.engine.hive.enabled can be confirmed in the Hive command shell, but it still does not work; the DESCRIBE FORMATTED values are still not right:
   > 
   > Storage Information
   > SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   > InputFormat: org.apache.hadoop.mapred.TextInputFormat
   > OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   
   By the way, I am really puzzled by the official website https://iceberg.apache.org/flink/#create-table.
   Can I use ordinary SQL to create the Iceberg table? How can I verify it?


[GitHub] [iceberg] junsionzhang commented on issue #2357: hive cannot get data from iceberg table

Posted by GitBox <gi...@apache.org>.
junsionzhang commented on issue #2357:
URL: https://github.com/apache/iceberg/issues/2357#issuecomment-808017300


   > Sorry @junsionzhang, I was not clear enough:
   > 
   > * Please provide the output for `set iceberg.mr.catalog` - I suspect the catalog is not set in your Hive environment.
   > 
   > Based on the output of the `DESCRIBE`, I see that `iceberg.engine.hive.enabled` was not set to true when the table was created, so the `SerDe`, `InputFormat`, and `OutputFormat` are not set correctly.
   > 
   > Please check the https://iceberg.apache.org/hive/ page for more details.
   > 
   > Thanks,
   > Peter
   
   It seems to be the same question as openinx's: same steps, same result. Hadoop works but Hive does not: https://github.com/apache/iceberg/issues/1684#issuecomment-720339705

