Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/13 18:11:30 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request #1608: add Glue support for HiveCatalog

jackye1995 opened a new pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608


   - add the Glue implementation of Hive (v2) client, which is forked from the [EMR open source implementation](https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore)
   - dynamically load Hive client implementation in `HiveClientPool`
   - use DynamoDB for the locking support missing in Glue
   - add a basic site page about using Iceberg in the cloud
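
The "dynamically load Hive client implementation" item above can be illustrated with a minimal sketch of reflection-based loading. Everything here is an assumption for illustration: `MetastoreClient`, `DefaultMetastoreClient`, `AlternateMetastoreClient`, `ClientLoader`, and the `client.impl` key are hypothetical names, not the classes or configuration used in this PR or in Hive.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

// Hypothetical stand-in for a metastore client interface (not Hive's real API).
interface MetastoreClient {
  String name();
}

// Hypothetical default implementation, used when no class name is configured.
class DefaultMetastoreClient implements MetastoreClient {
  public DefaultMetastoreClient(Map<String, String> conf) {}

  @Override
  public String name() {
    return "default";
  }
}

// Hypothetical alternate implementation that a user could select by class name.
class AlternateMetastoreClient implements MetastoreClient {
  public AlternateMetastoreClient(Map<String, String> conf) {}

  @Override
  public String name() {
    return "alternate";
  }
}

final class ClientLoader {
  // Hypothetical configuration key naming the implementation class.
  static final String IMPL_KEY = "client.impl";

  private ClientLoader() {}

  // Looks up the configured class by name and calls its (Map) constructor.
  static MetastoreClient load(Map<String, String> conf) {
    String className = conf.getOrDefault(IMPL_KEY, DefaultMetastoreClient.class.getName());
    try {
      Class<? extends MetastoreClient> implClass =
          Class.forName(className).asSubclass(MetastoreClient.class);
      Constructor<? extends MetastoreClient> ctor = implClass.getDeclaredConstructor(Map.class);
      return ctor.newInstance(conf);
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot load client implementation: " + className, e);
    }
  }
}
```

With this shape, the calling code depends only on the interface, so an alternate client can live in a separate jar on the classpath.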


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-709690136


   > I've been talking with @jackye1995 in the Iceberg channel. Just to update anyone following here, my main concern is that this is a huge patch because it contains the implementation of Hive's thrift API for Glue. Jack is going to pare down the classes required so that we can see what is actually necessary for that approach and we can decide whether to build off of `BaseMetastoreClientOperations` or use Hive after that.
   > 
   > Also, this is a draft so we can look at the whole thing, but we will probably want to split it into multiple PRs. For example, the changes to allow injecting a different Hive client could be a stand-alone PR.
   
   I talked with a few folks today about the best way to proceed with these changes. I plan to submit the following PRs:
   1. `GlueCatalog`, which directly implements the `Catalog` interface and all the table operations
   2. a DynamoDB lock table for catalog commits
   3. a dynamic loader for the Hive client in `HiveClientPool`, so that EMR users can switch the Hive client implementation if they want; no client implementation classes will be added to Iceberg
   4. Spark and Flink integration points
   5. documentation
   
   Please let me know if there are any further concerns; otherwise I will close this PR later.
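
The lock table for catalog commits mentioned in the plan above can be sketched as an acquire-if-absent lock around a commit. This is only a model under stated assumptions: `InMemoryLockTable` and `LockedCommit` are hypothetical names, and a `ConcurrentHashMap` stands in for the remote table. A real DynamoDB-backed version would instead rely on a conditional write (for example, a put that succeeds only when the item does not already exist) and would also need lock expiry, which this sketch omits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a lock table keyed by table name.
final class InMemoryLockTable {
  private final Map<String, String> owners = new ConcurrentHashMap<>();

  // Succeeds only if nobody holds the lock; models a conditional "put if absent".
  boolean tryAcquire(String tableKey, String ownerId) {
    return owners.putIfAbsent(tableKey, ownerId) == null;
  }

  // Removes the lock only if it is still held by the given owner.
  boolean release(String tableKey, String ownerId) {
    return owners.remove(tableKey, ownerId);
  }
}

// Hypothetical helper that runs a commit action while holding the lock.
final class LockedCommit {
  private LockedCommit() {}

  // Returns false without running the change if the lock is already held.
  static boolean commit(
      InMemoryLockTable locks, String tableKey, String ownerId, Runnable change) {
    if (!locks.tryAcquire(tableKey, ownerId)) {
      return false;
    }
    try {
      change.run();
      return true;
    } finally {
      locks.release(tableKey, ownerId);
    }
  }
}
```

A caller whose commit returns false would typically retry after a delay; the retry policy is left out here.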




[GitHub] [iceberg] rdblue commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-709616189


   I've been talking with @jackye1995 in the Iceberg channel. Just to update anyone following here, my main concern is that this is a huge patch because it contains the implementation of Hive's thrift API for Glue. Jack is going to pare down the classes required so that we can see what is actually necessary for that approach and we can decide whether to build off of `BaseMetastoreClientOperations` or use Hive after that.
   
   Also, this is a draft so we can look at the whole thing, but we will probably want to split it into multiple PRs. For example, the changes to allow injecting a different Hive client could be a stand-alone PR.




[GitHub] [iceberg] jackye1995 commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-708930754


   > Hi @jackye1995. Thanks for taking this on. Have you seen this PR for integrating Nessie with Iceberg? I believe that the idea there is _partially_ that Nessie would also allow for AWS Glue to be used. #1587
   > 
   > However, by no means do I intend to say that this PR should not be moved forward. I think this is a valuable contribution as many people likely use AWS Glue.
   
   Yeah, I read about that project a few days ago. I think they can coexist, and I am focusing on the use case of people who only need Glue + Iceberg. The patch here will be greatly simplified based on community feedback, so I don't see any conflict in continuing with both approaches at the same time.




[GitHub] [iceberg] jackye1995 commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-709692251


   > Looks like a big patch, would be better to understand if we have a short doc to describe the core things.
   
   Sure, I can add a short doc. As I replied to Ryan, I will split this into smaller patches for the actual contribution. Let me close this PR, and I will attach a doc when I resubmit a smaller PR. Thank you.




[GitHub] [iceberg] jackye1995 closed pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
jackye1995 closed pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608


   




[GitHub] [iceberg] jackye1995 commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-709692340


   Will contribute in smaller PRs.




[GitHub] [iceberg] rymurr commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-710221106


   Hey @jackye1995, this is pretty exciting. Please cc me when the other PRs are submitted. I'm especially interested in the DynamoDB work and in whether there are any synergies with Nessie.




[GitHub] [iceberg] openinx commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-708983340


   Looks like a big patch; it would be easier to understand if we had a short doc describing the core things.




[GitHub] [iceberg] kbendick commented on pull request #1608: add Glue support for HiveCatalog

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1608:
URL: https://github.com/apache/iceberg/pull/1608#issuecomment-708906044


   Hi @jackye1995. Thanks for taking this on. Have you seen this PR for integrating Nessie with Iceberg? I believe that the idea there is _partially_ that Nessie would also allow for AWS Glue to be used. https://github.com/apache/iceberg/pull/1587
   
   However, by no means do I intend to say that this PR should not be moved forward. I think this is a valuable contribution as many people likely use AWS Glue.
   

