Posted to notifications@kyuubi.apache.org by GitBox <gi...@apache.org> on 2022/08/17 11:26:06 UTC

[GitHub] [incubator-kyuubi] yikf opened a new issue, #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

yikf opened a new issue, #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the feature
   
   Hey everyone,
   
   In a modern database architecture, users may have a strong need for federated queries. Since a large amount of historical data lives in Hive warehouses, we plan to implement a Hive V2 DataSource based on Spark DataSource V2 to meet this need. For the discussion, see: https://lists.apache.org/thread/fq8ywr58rzf9bycflj1q4fl1xyz2rq2w
   
   To do this, we have the following subtasks:
   - initial implementation
   - support read code path
   - support write code path
   - documentation
   - support Hive client pool
   - support converting DataSourceV2Relation (a.k.a. HiveRelationV2) to LogicalRelation to improve performance
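   
   For the "support Hive client pool" subtask, one possible shape is a bounded pool that reuses metastore clients instead of opening a new connection for every catalog call. The Java sketch below is only an illustration under assumptions: `ClientPool` is a hypothetical name, and the pooled type `C` stands in for whatever Hive metastore client wrapper the connector ends up using.
   
   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;
   import java.util.concurrent.Semaphore;
   import java.util.function.Function;
   import java.util.function.Supplier;
   
   // Hypothetical generic pool; C would be a Hive metastore client wrapper.
   final class ClientPool<C extends AutoCloseable> {
       private final Supplier<C> factory;     // opens a new client (e.g. a metastore connection)
       private final Semaphore permits;       // caps how many clients are in use at once
       private final Deque<C> idle = new ArrayDeque<>();
   
       ClientPool(int maxClients, Supplier<C> factory) {
           this.factory = factory;
           this.permits = new Semaphore(maxClients);
       }
   
       // Borrows an idle client (or opens one), runs the action, then returns the client.
       <R> R withClient(Function<C, R> action) {
           permits.acquireUninterruptibly();  // blocks while maxClients are busy
           C client = null;
           try {
               synchronized (idle) {
                   client = idle.pollFirst();
               }
               if (client == null) {
                   client = factory.get();    // open a client only when none is idle
               }
               return action.apply(client);
           } finally {
               if (client != null) {
                   // A production pool would discard clients whose connection failed.
                   synchronized (idle) {
                       idle.addFirst(client);
                   }
               }
               permits.release();
           }
       }
   
       int idleCount() {
           synchronized (idle) {
               return idle.size();
           }
       }
   
       // Closes idle clients on shutdown; close errors are ignored (best effort).
       void closeIdle() {
           synchronized (idle) {
               for (C c : idle) {
                   try {
                       c.close();
                   } catch (Exception ignored) {
                       // best effort during shutdown
                   }
               }
               idle.clear();
           }
       }
   }
   ```
   
   The `Semaphore` caps concurrent clients while the idle deque lets sequential callers reuse one connection; a real pool would also need health checks and eviction of broken clients.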
   
   
   
   ### Motivation
   
   _No response_
   
   ### Describe the solution
   
   _No response_
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [incubator-kyuubi] yikf commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

Posted by GitBox <gi...@apache.org>.
yikf commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1219185523

   My idea is to plan for iterative support, especially with the release of Apache Spark 3.3. I'll add `support more V2 API` to the roadmap. What do you think?


[GitHub] [incubator-kyuubi] github-actions[bot] commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

github-actions[bot] commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1217893414

   Hello @yikf,
   Thanks for finding the time to report the issue!
   We really appreciate the community's efforts to improve Apache Kyuubi (Incubating).


[GitHub] [incubator-kyuubi] pan3793 commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

pan3793 commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1219243969

   Based on @yikf's comments, and given that it's a future-proof component, I think we can make it support Spark 3.3+ only.
   For example, the current Spark Hive writer has a long-standing issue: it generates too many small files. `RequiresDistributionAndOrdering` was introduced by SPARK-33779 in Spark 3.2 and enhanced by SPARK-37523, which will be available in Spark 3.4. We can leverage it to solve the small-file issue if we don't support Spark 3.1 and earlier versions.
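   
   A back-of-the-envelope model shows why a required distribution helps. This sketch does not call Spark's `RequiresDistributionAndOrdering`; it assumes each write task opens one file per distinct Hive partition value it receives, and uses round-robin placement as a stand-in for a write with no required distribution. Clustering rows by the partition column, which is what a clustered required distribution asks Spark to do before the write, sends each partition value to a single task. `SmallFileModel` is a hypothetical name.
   
   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   
   final class SmallFileModel {
       // Files written = sum over tasks of the distinct partition values each task sees,
       // assuming a task opens one file per partition value it writes to.
       static int filesWritten(List<List<String>> tasks) {
           int files = 0;
           for (List<String> taskRows : tasks) {
               files += new HashSet<>(taskRows).size();
           }
           return files;
       }
   
       private static List<List<String>> emptyTasks(int numTasks) {
           List<List<String>> tasks = new ArrayList<>();
           for (int t = 0; t < numTasks; t++) {
               tasks.add(new ArrayList<>());
           }
           return tasks;
       }
   
       // Rows spread with no regard to the partition column (assumed stand-in for
       // "no required distribution").
       static List<List<String>> roundRobin(List<String> partitionValues, int numTasks) {
           List<List<String>> tasks = emptyTasks(numTasks);
           for (int i = 0; i < partitionValues.size(); i++) {
               tasks.get(i % numTasks).add(partitionValues.get(i));
           }
           return tasks;
       }
   
       // Rows hash-clustered by partition value, so each value lands in exactly one task.
       static List<List<String>> clustered(List<String> partitionValues, int numTasks) {
           List<List<String>> tasks = emptyTasks(numTasks);
           for (String value : partitionValues) {
               tasks.get(Math.floorMod(value.hashCode(), numTasks)).add(value);
           }
           return tasks;
       }
   }
   ```
   
   With 400 rows interleaved over 4 partition values and 10 tasks, this model counts 20 files for the round-robin layout and 4 for the clustered one. The trade-off is that clustering can concentrate a large partition in a single task.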


[GitHub] [incubator-kyuubi] yikf commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

yikf commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1219242710

   > 1. What are we planning for multiple Spark version support for this feature?
   
   Yes, but the plan is to start support from Spark 3.2, since the Spark 3.2 V2 API is closer to complete. There are no plans for older versions.
   
   > 2. How do we support HMS with a lower version of Spark's built-in HMS client?
   
   The catalog plugin supports user-defined options. For example, if our catalog is v2.3 and the built-in catalog is v1.2, we can set:
   ```
   spark.sql.catalog.hivev2catalog.spark.sql.hive.metastore.version  2.3.7
   spark.sql.hive.metastore.version    1.2
   # ...plus other configurations, such as the metastore jars
   ```
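   
   For context, Spark builds a catalog plugin's options by taking every `spark.sql.catalog.<name>.<key>` entry and stripping the prefix, then passes them to `CatalogPlugin.initialize` as a `CaseInsensitiveStringMap`. The Java sketch below models that with plain maps; `CatalogConf` and its `resolve` helper, which falls back to the session-wide key when no catalog-scoped value exists, are hypothetical illustrations rather than the connector's documented behavior.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Optional;
   
   final class CatalogConf {
       // Models how Spark derives a catalog plugin's options: keep entries under
       // "spark.sql.catalog.<catalog>." and strip that prefix. (Spark wraps the result in a
       // case-insensitive map; a plain HashMap is used here for brevity.)
       static Map<String, String> catalogOptions(String catalog, Map<String, String> sparkConf) {
           String prefix = "spark.sql.catalog." + catalog + ".";
           Map<String, String> options = new HashMap<>();
           for (Map.Entry<String, String> entry : sparkConf.entrySet()) {
               String key = entry.getKey();
               if (key.startsWith(prefix) && key.length() > prefix.length()) {
                   options.put(key.substring(prefix.length()), entry.getValue());
               }
           }
           return options;
       }
   
       // Hypothetical lookup: prefer the catalog-scoped value, otherwise the session-wide one.
       static Optional<String> resolve(String catalog, String key, Map<String, String> sparkConf) {
           String scoped = catalogOptions(catalog, sparkConf).get(key);
           return Optional.ofNullable(scoped != null ? scoped : sparkConf.get(key));
       }
   }
   ```
   
   With the two settings shown above, `resolve("hivev2catalog", "spark.sql.hive.metastore.version", conf)` yields `2.3.7`, while any other catalog falls back to `1.2`.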
   > 3. For file scanning, what APIs will we choose, from Spark, Hive, or raw implementations of file format?
   
   We choose `org.apache.hadoop.mapred.InputFormat`, as the Spark V1 Hive data source does. We will also add a new optimizer rule that converts DataSourceV2Relation to LogicalRelation to improve performance; with it, the file scan path will use Spark's built-in FileFormat implementations (such as ORC and Parquet).
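   
   The proposed optimizer rule is a plan-tree rewrite: find the V2 Hive relation and, when the table's storage format has a Spark built-in `FileFormat`, replace it with a V1 `LogicalRelation`. The toy Java model below uses hypothetical node classes (`Plan`, `HiveV2Relation`, `V1Relation`, `FilterNode`) instead of Spark's `LogicalPlan` and `Rule`, only to show the shape of a bottom-up rewrite that changes matching leaves and keeps everything else; treating only ORC and Parquet as convertible is an assumption.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   import java.util.function.UnaryOperator;
   
   // Hypothetical, heavily simplified plan nodes; these are not Spark classes.
   abstract class Plan {
       final List<Plan> children;
   
       Plan(List<Plan> children) {
           this.children = children;
       }
   
       abstract Plan withChildren(List<Plan> newChildren);
   
       // Bottom-up transform: rewrite the children first, then offer this node to the rule.
       Plan transformUp(UnaryOperator<Plan> rule) {
           List<Plan> newChildren = new ArrayList<>();
           for (Plan child : children) {
               newChildren.add(child.transformUp(rule));
           }
           return rule.apply(withChildren(newChildren));
       }
   }
   
   // Leaf standing in for the DataSourceV2Relation over a Hive table.
   final class HiveV2Relation extends Plan {
       final String table;
       final String format;
   
       HiveV2Relation(String table, String format) {
           super(Collections.<Plan>emptyList());
           this.table = table;
           this.format = format;
       }
   
       @Override Plan withChildren(List<Plan> newChildren) { return this; }
       @Override public String toString() { return "HiveV2Relation(" + table + ", " + format + ")"; }
   }
   
   // Leaf standing in for a V1 LogicalRelation backed by a built-in FileFormat.
   final class V1Relation extends Plan {
       final String table;
       final String format;
   
       V1Relation(String table, String format) {
           super(Collections.<Plan>emptyList());
           this.table = table;
           this.format = format;
       }
   
       @Override Plan withChildren(List<Plan> newChildren) { return this; }
       @Override public String toString() { return "V1Relation(" + table + ", " + format + ")"; }
   }
   
   final class FilterNode extends Plan {
       final String condition;
   
       FilterNode(String condition, Plan child) {
           super(Collections.singletonList(child));
           this.condition = condition;
       }
   
       @Override Plan withChildren(List<Plan> newChildren) { return new FilterNode(condition, newChildren.get(0)); }
       @Override public String toString() { return "FilterNode(" + condition + ", " + children.get(0) + ")"; }
   }
   
   final class ConvertHiveV2Relation {
       // Assumed set of formats with a Spark built-in FileFormat counterpart.
       static final Set<String> CONVERTIBLE = new HashSet<>(Arrays.asList("orc", "parquet"));
   
       static Plan apply(Plan plan) {
           return plan.transformUp(node -> {
               if (node instanceof HiveV2Relation) {
                   HiveV2Relation r = (HiveV2Relation) node;
                   if (CONVERTIBLE.contains(r.format)) {
                       return new V1Relation(r.table, r.format);
                   }
               }
               return node;
           });
       }
   }
   ```
   
   Here a `FilterNode` over a Parquet `HiveV2Relation` becomes `FilterNode(x > 1, V1Relation(db.t, parquet))`, while a `textfile` table keeps the V2 `InputFormat` path.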


[GitHub] [incubator-kyuubi] pan3793 closed issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

pan3793 closed issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi
URL: https://github.com/apache/incubator-kyuubi/issues/3259


[GitHub] [incubator-kyuubi] yaooqinn commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

yaooqinn commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1219071742

   Where do the Catalog V2 APIs lie in your roadmap?


[GitHub] [incubator-kyuubi] yaooqinn commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

yaooqinn commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1219198821

   OK. Some additional questions:
   1) What are we planning for multiple Spark version support for this feature?
   2) How do we support HMS with a lower version of Spark's built-in HMS client?
   3) For file scanning, what APIs will we choose, from Spark, Hive, or raw implementations of file format?


[GitHub] [incubator-kyuubi] yaooqinn commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

yaooqinn commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1219113068

   yes


[GitHub] [incubator-kyuubi] yikf commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

yikf commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1219101012

   > where do the Catalog V2 APIs lay in your roadmap?
   
   I'm confused: does "V2 API" refer to things like `SupportsNamespaces` for the catalog, or `SupportsPushDown*` for file scans?


[GitHub] [incubator-kyuubi] yikf commented on issue #3259: [FEATURE] Support Hive V2 DataSource in Kyuubi

yikf commented on issue #3259:
URL: https://github.com/apache/incubator-kyuubi/issues/3259#issuecomment-1217897150

   cc @yaooqinn @pan3793 @zhaomin1423 @permanentstar

