Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 13:51:34 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue, #2209: [datafusion-contrib] Support Stand Alone Hive MetaStore

tustvold opened a new issue, #2209:
URL: https://github.com/apache/arrow-datafusion/issues/2209

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Whilst [AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html) can act as a Hive compatible metastore, and #2206 tracks adding support for it, this is of course not an option for people running outside of AWS. It would be good to provide some story for these users.
   
   Glue also isn't actually a Hive metastore: its API is similar to, but different from, the Hive metastore's thrift-based API, which can be found [here](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift).
   
   **Describe the solution you'd like**
   
   I'm not very familiar with the Hive ecosystem, but it would appear that when people refer to the Hive metastore, they are referring to a client-side implementation, as opposed to a service like Zookeeper or Kafka.
   
   I found this exceptionally confusing, but there appear to be three options:
   
   1. An implementation using an embedded Derby database
   2. An implementation using a remote JDBC database
   3. An implementation using a remote Hive server
   
   I don't think 1. is useful for DataFusion's purposes, and 2. is likely tricky to implement without running into subtle incompatibilities with the Java version. I therefore think 3. is likely the best option. From my understanding, this effectively takes the implementation of 2. and runs it as a network service with a [thrift API](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift). This should be relatively straightforward to support and can likely share a lot of code with any AWS Glue support.
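
To make the code-sharing idea a little more concrete, here is a rough sketch of how a common abstraction might look. Everything in it is an assumption for illustration: `MetastoreClient`, `TableInfo`, `InMemoryMetastore` and `column_names` are hypothetical names, not part of DataFusion, Hive or Glue, and the in-memory type only stands in for a real thrift-backed or Glue-backed client.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified description of a table as a metastore might report it.
#[derive(Debug, Clone, PartialEq)]
pub struct TableInfo {
    pub database: String,
    pub name: String,
    /// Storage location of the table data (a URI string).
    pub location: String,
    /// (column name, type string) pairs.
    pub columns: Vec<(String, String)>,
}

/// Hypothetical trait that a thrift-based client and a Glue client could both implement.
pub trait MetastoreClient {
    fn list_tables(&self, database: &str) -> Result<Vec<String>, String>;
    fn get_table(&self, database: &str, table: &str) -> Result<TableInfo, String>;
}

/// In-memory stand-in for a real backend, used only to exercise the trait.
#[derive(Default)]
pub struct InMemoryMetastore {
    tables: HashMap<(String, String), TableInfo>,
}

impl InMemoryMetastore {
    /// Registers a table under its (database, name) key, replacing any previous entry.
    pub fn register(&mut self, info: TableInfo) {
        self.tables
            .insert((info.database.clone(), info.name.clone()), info);
    }
}

impl MetastoreClient for InMemoryMetastore {
    fn list_tables(&self, database: &str) -> Result<Vec<String>, String> {
        let mut names: Vec<String> = self
            .tables
            .keys()
            .filter(|(db, _)| db.as_str() == database)
            .map(|(_, name)| name.clone())
            .collect();
        // HashMap iteration order is unspecified, so sort for stable output.
        names.sort();
        Ok(names)
    }

    fn get_table(&self, database: &str, table: &str) -> Result<TableInfo, String> {
        self.tables
            .get(&(database.to_string(), table.to_string()))
            .cloned()
            .ok_or_else(|| format!("table {}.{} not found", database, table))
    }
}

/// Catalog-side logic written once against the trait, independent of the backend.
pub fn column_names(
    client: &dyn MetastoreClient,
    database: &str,
    table: &str,
) -> Result<Vec<String>, String> {
    let info = client.get_table(database, table)?;
    Ok(info.columns.into_iter().map(|(name, _)| name).collect())
}
```

With a shape like this, only the trait implementation would differ between a remote Hive metastore and Glue, while code such as `column_names` (and, presumably, the mapping to DataFusion's catalog types) could be shared.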
   
   **Describe alternatives you've considered**
   
   We could choose not to support this.
   

