Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/09 11:22:26 UTC

[GitHub] [iceberg] cccs-jc opened a new issue, #5004: Recommended catalog implementation and known limitations

cccs-jc opened a new issue, #5004:
URL: https://github.com/apache/iceberg/issues/5004

   Iceberg supports various catalogs. Hive seems to be the most mature; however, to simplify our deployments we have been using the Hadoop catalog. We are now considering switching to the JDBC catalog.
   
   Are there differences or limitations between these catalogs? We're particularly interested in statistics. When using the Hive catalog you can generate statistics which Spark exploits. I believe this is not available with the Hadoop catalog. Is it available with the JDBC catalog?
   
   We are seeking recommendations on which catalog implementation to use: Hive, Hadoop, or JDBC?
   
   https://github.com/apache/iceberg/issues/2527
   Thanks
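
For readers comparing the three options, here is a minimal sketch of how each catalog is typically wired into Spark. The property keys (`type`, `catalog-impl`, `uri`, `warehouse`) follow the Iceberg Spark configuration documentation as far as I know; the helper `iceberg_catalog_conf`, the catalog name `demo`, and every URI and path below are hypothetical placeholders, not something taken from this issue.

```python
from typing import Dict, Optional

# Class names as published by the Iceberg project.
SPARK_CATALOG = "org.apache.iceberg.spark.SparkCatalog"
JDBC_CATALOG_IMPL = "org.apache.iceberg.jdbc.JdbcCatalog"


def iceberg_catalog_conf(
    name: str,
    kind: str,
    warehouse: str,
    uri: Optional[str] = None,
) -> Dict[str, str]:
    """Build Spark properties for one Iceberg catalog (hypothetical helper).

    kind is one of "hadoop", "hive", or "jdbc".
    """
    prefix = f"spark.sql.catalog.{name}"
    conf = {prefix: SPARK_CATALOG}

    if kind == "hadoop":
        # File-system catalog: only a warehouse location is required.
        conf[f"{prefix}.type"] = "hadoop"
    elif kind == "hive":
        # Hive Metastore catalog: the uri is the metastore thrift endpoint.
        # It can be left out when hive-site.xml already provides it.
        conf[f"{prefix}.type"] = "hive"
        if uri is not None:
            conf[f"{prefix}.uri"] = uri
    elif kind == "jdbc":
        # JDBC catalog: the uri is a JDBC connection string.
        # Credentials are omitted here on purpose.
        if uri is None:
            raise ValueError("the jdbc catalog needs a JDBC connection uri")
        conf[f"{prefix}.catalog-impl"] = JDBC_CATALOG_IMPL
        conf[f"{prefix}.uri"] = uri
    else:
        raise ValueError(f"unknown catalog kind: {kind}")

    conf[f"{prefix}.warehouse"] = warehouse
    return conf


if __name__ == "__main__":
    # Placeholder values; print the properties as spark-submit style flags.
    conf = iceberg_catalog_conf(
        "demo", "jdbc", "s3://my-bucket/warehouse",
        uri="jdbc:postgresql://db-host:5432/iceberg",
    )
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

Only the catalog properties differ between the three variants; the table format and the data source Spark reads through are the same in each case.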
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #5004: Recommended catalog implementation and known limitations

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #5004:
URL: https://github.com/apache/iceberg/issues/5004#issuecomment-1151336807

   I'm fairly sure the Hive statistics that the HMS can store won't be used for Iceberg tables. All of our stats come through the DataSource we provide, so they should be identical for all catalogs. I don't think the results of commands like "ANALYZE" will work properly on Iceberg tables ...




[GitHub] [iceberg] github-actions[bot] commented on issue #5004: Recommended catalog implementation and known limitations

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #5004:
URL: https://github.com/apache/iceberg/issues/5004#issuecomment-1363458302

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'




[GitHub] [iceberg] github-actions[bot] closed issue #5004: Recommended catalog implementation and known limitations

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #5004: Recommended catalog implementation and known limitations
URL: https://github.com/apache/iceberg/issues/5004




[GitHub] [iceberg] pvary commented on issue #5004: Recommended catalog implementation and known limitations

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #5004:
URL: https://github.com/apache/iceberg/issues/5004#issuecomment-1152138128

   > For Iceberg tables, the statistics in Hive / HMS wouldn't be used.
   
   I think this is correct for Spark, but **Hive queries are using the HMS statistics**.
   
   With the Hive 4.0.0 integration we still store the Hive statistics in HMS when the data is inserted through Hive. These statistics are still used for Hive query optimizations. They are correct in the sense that the basic statistics are updated whenever the table is updated through the HiveCatalog, and the column statistics are invalidated if the table is updated by another engine.
   
   That said - with Puffin in place - it would be good to get rid of these duplicate statistics and start using the correct Iceberg statistics across engines. There are plans to do that, but this is further down the road.




[GitHub] [iceberg] kbendick commented on issue #5004: Recommended catalog implementation and known limitations

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #5004:
URL: https://github.com/apache/iceberg/issues/5004#issuecomment-1151647936

   @RussellSpitzer is correct. For Iceberg tables, the statistics in Hive / HMS wouldn't be used.




[GitHub] [iceberg] github-actions[bot] commented on issue #5004: Recommended catalog implementation and known limitations

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #5004:
URL: https://github.com/apache/iceberg/issues/5004#issuecomment-1341784764

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

