Posted to notifications@kyuubi.apache.org by GitBox <gi...@apache.org> on 2022/08/09 14:33:09 UTC

[GitHub] [incubator-kyuubi] pan3793 opened a new pull request, #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

pan3793 opened a new pull request, #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211

   <!--
   Thanks for sending a pull request!
   
   Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/contributions.html
     2. If the PR is related to an issue in https://github.com/apache/incubator-kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'.
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'.
   -->
   
   ### _Why are the changes needed?_
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you add a feature, you can talk about the use case of it.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Supply documentation for the Kyuubi Spark TPC-H Connector.
   
   ### _How was this patch tested?_
   - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
   
   - [ ] Add screenshots for manual tests if appropriate
   
   - [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request
   




[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r942105723


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,76 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-|release|_2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-|release|_2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=kyuubi-spark-connector-tpch-|release|_2.12.jar``
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task
+
+Consider to reduce `spark.sql.catalog.tpch.read.maxPartitionBytes` if you want a higher parallelism.
 
 TPC-H Operations
 ----------------
+
+Listing databases under `tpch` catalog.
+
+.. code-block:: sql
+    SHOW DATABASES IN tpch;
+
+Listing tables under `tpch.sf1` database.
+
+.. code-block:: sql
+    SHOW TABLES IN tpch.sf1;
+
+Switch current database to `tpch.sf1` and run a query against it.
+
+.. code-block:: sql
+    USE tpch.sf1;
+    SELECT * FROM orders;
+
+.. _Official Documentation: https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf
+.. _Maven Central: https://repo1.maven.org/maven2/org/apache/kyuubi/kyuubi-spark-connector-tpch_2.12/

Review Comment:
   Because we have not released 1.6.0 yet.





[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r943111881


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,80 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+Goto `Try Kyuubi`_ to explore TPC-H data instantly!
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-\ |release|\ _2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set spark.jars=kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task, consider to reduce it if you want a higher parallelism.

Review Comment:
   yeah, editing





[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r943108162


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,80 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+Goto `Try Kyuubi`_ to explore TPC-H data instantly!
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-\ |release|\ _2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set spark.jars=kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task, consider to reduce it if you want a higher parallelism.

Review Comment:
   1. configurable
   2. yes
   3. invalid databases are ignored and a warning message is logged. The full list of valid databases is: sf0, tiny, sf1, sf10, sf30, sf100, sf300, sf1000, sf3000, sf10000, sf30000, sf100000
   4. different output in `desc tpch.sf1.orders` and `show create table tpch.sf1.orders` (see the sketch below)
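   
   A minimal Scala sketch of how points 3 and 4 can be observed; the local session, the `master("local[*]")` setting, and the explicit `useAnsiStringType` value are illustrative assumptions for a quick test, with the connector jar already on the classpath:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Local session for illustration only; in a Kyuubi deployment the catalog
   // configs come from the engine's Spark configuration instead.
   val spark = SparkSession.builder()
     .master("local[*]")
     .config("spark.sql.catalog.tpch", "org.apache.kyuubi.spark.connector.tpch.TPCHCatalog")
     .config("spark.sql.catalog.tpch.useAnsiStringType", "false") // STRING columns; "true" surfaces CHAR/VARCHAR
     .getOrCreate()
   
   // 3. The generated databases are the fixed scale factors (tiny, sf0, sf1, ..., sf100000).
   spark.sql("SHOW DATABASES IN tpch").show(truncate = false)
   
   // 4. Column types are reported differently by DESC and SHOW CREATE TABLE.
   spark.sql("DESC tpch.sf1.orders").show(truncate = false)
   spark.sql("SHOW CREATE TABLE tpch.sf1.orders").show(truncate = false)
   ```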





[GitHub] [incubator-kyuubi] yaooqinn commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r942100466


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,76 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-|release|_2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-|release|_2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=kyuubi-spark-connector-tpch-|release|_2.12.jar``
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task
+
+Consider to reduce `spark.sql.catalog.tpch.read.maxPartitionBytes` if you want a higher parallelism.
 
 TPC-H Operations
 ----------------
+
+Listing databases under `tpch` catalog.
+
+.. code-block:: sql
+    SHOW DATABASES IN tpch;
+
+Listing tables under `tpch.sf1` database.
+
+.. code-block:: sql
+    SHOW TABLES IN tpch.sf1;
+
+Switch current database to `tpch.sf1` and run a query against it.
+
+.. code-block:: sql
+    USE tpch.sf1;
+    SELECT * FROM orders;
+
+.. _Official Documentation: https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf

Review Comment:
   how about https://www.tpc.org/tpch/default5.asp





[GitHub] [incubator-kyuubi] yaooqinn commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r943104927


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,80 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+Goto `Try Kyuubi`_ to explore TPC-H data instantly!
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-\ |release|\ _2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set spark.jars=kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task, consider to reduce it if you want a higher parallelism.

Review Comment:
   ```suggestion
      # (required) Register a catalog named `tpch` for the spark engine.
      spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
      # (optional) Exclude database list from the catalog
      spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000
      # (optional) When true, use CHAR/VARCHAR; otherwise use STRING
      spark.sql.catalog.tpch.useAnsiStringType=false
      # (optional) Maximum bytes per task, consider reducing it if you want higher parallelism.
      spark.sql.catalog.tpch.read.maxPartitionBytes=134217728
   ```
   
   Questions:

   1. is the catalog name `tpch` hard-coded, or is it configurable?
   2. if configurable, are we able to register multiple tpch catalogs, such as tpch1, tpch2 ... (see the sketch below)?
   3. what happens if `excludeDatabases` contains invalid databases, and what are all the valid candidates?
   4. what are the differences in user experience between char/varchar and string?
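   
   A hedged sketch of what question 2 would look like if the answer is yes; in Spark's catalog plugin API the catalog name is just the `<name>` part of the `spark.sql.catalog.<name>` key, and the names `tpch1`/`tpch2` below are purely illustrative:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Two independent TPC-H catalogs registered under different (illustrative) names.
   val spark = SparkSession.builder()
     .master("local[*]")
     .config("spark.sql.catalog.tpch1", "org.apache.kyuubi.spark.connector.tpch.TPCHCatalog")
     .config("spark.sql.catalog.tpch2", "org.apache.kyuubi.spark.connector.tpch.TPCHCatalog")
     // e.g. hide the largest scale factors in the second catalog only
     .config("spark.sql.catalog.tpch2.excludeDatabases", "sf10000,sf30000")
     .getOrCreate()
   
   spark.sql("SHOW DATABASES IN tpch1").show()
   spark.sql("SHOW DATABASES IN tpch2").show()
   ```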





[GitHub] [incubator-kyuubi] pan3793 closed pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 closed pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H
URL: https://github.com/apache/incubator-kyuubi/pull/3211




[GitHub] [incubator-kyuubi] yaooqinn commented on pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#issuecomment-1210284199

   Can we add a screenshot in the PR description to verify?




[GitHub] [incubator-kyuubi] cfmcgrady commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r941986408


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,76 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-|release|_2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-|release|_2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=kyuubi-spark-connector-tpch-|release|_2.12.jar``
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task
+
+Consider to reduce `spark.sql.catalog.tpch.read.maxPartitionBytes` if you want a higher parallelism.
 
 TPC-H Operations
 ----------------
+
+Listing databases under `tpch` catalog.
+
+.. code-block:: sql
+    SHOW DATABASES IN tpch;
+
+Listing tables under `tpch.sf1` database.
+
+.. code-block:: sql
+    SHOW TABLES IN tpch.sf1;
+
+Switch current database to `tpch.sf1` and run a query against it.
+
+.. code-block:: sql
+    USE tpch.sf1;
+    SELECT * FROM orders;
+
+.. _Official Documentation: https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf
+.. _Maven Central: https://repo1.maven.org/maven2/org/apache/kyuubi/kyuubi-spark-connector-tpch_2.12/

Review Comment:
   The Maven Central link returns a 404 error.





[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r943108162


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,80 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+Goto `Try Kyuubi`_ to explore TPC-H data instantly!
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-\ |release|\ _2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set spark.jars=kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task, consider to reduce it if you want a higher parallelism.

Review Comment:
   1. configurable
   2. yes
   3. invalid databases are ignored and a warning message is logged. The full list of valid databases is: sf0, tiny, sf1, sf10, sf30, sf100, sf300, sf1000, sf3000, sf10000, sf30000, sf100000
   4. different output in `desc tpch.sf1.orders` and `show create table tpch.sf1.orders`





[GitHub] [incubator-kyuubi] codecov-commenter commented on pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#issuecomment-1209619822

   # [Codecov](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3211](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a7e6dc3) into [master](https://codecov.io/gh/apache/incubator-kyuubi/commit/c9cc9b7e5f9f2abd54dc6b00c43a54e3008e2b69?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c9cc9b7) will **decrease** coverage by `0.04%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3211      +/-   ##
   ============================================
   - Coverage     51.52%   51.48%   -0.05%     
     Complexity        6        6              
   ============================================
     Files           459      459              
     Lines         25556    25556              
     Branches       3545     3545              
   ============================================
   - Hits          13169    13157      -12     
   - Misses        11123    11135      +12     
     Partials       1264     1264              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...client/exception/RetryableKyuubiRestException.java](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXJlc3QtY2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reXV1YmkvY2xpZW50L2V4Y2VwdGlvbi9SZXRyeWFibGVLeXV1YmlSZXN0RXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/kyuubi/client/RetryableRestClient.java](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXJlc3QtY2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reXV1YmkvY2xpZW50L1JldHJ5YWJsZVJlc3RDbGllbnQuamF2YQ==) | `48.78% <0.00%> (-24.40%)` | :arrow_down: |
   | [...main/java/org/apache/kyuubi/client/RestClient.java](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXJlc3QtY2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reXV1YmkvY2xpZW50L1Jlc3RDbGllbnQuamF2YQ==) | `82.75% <0.00%> (-3.45%)` | :arrow_down: |
   | [...ache/kyuubi/operation/KyuubiOperationManager.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9vcGVyYXRpb24vS3l1dWJpT3BlcmF0aW9uTWFuYWdlci5zY2FsYQ==) | `80.82% <0.00%> (-1.37%)` | :arrow_down: |
   | [...n/scala/org/apache/kyuubi/engine/ProcBuilder.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9lbmdpbmUvUHJvY0J1aWxkZXIuc2NhbGE=) | `83.12% <0.00%> (-0.63%)` | :arrow_down: |
   | [...apache/kyuubi/engine/JpsApplicationOperation.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9lbmdpbmUvSnBzQXBwbGljYXRpb25PcGVyYXRpb24uc2NhbGE=) | `77.41% <0.00%> (ø)` | |
   | [...in/scala/org/apache/kyuubi/config/KyuubiConf.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLWNvbW1vbi9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9jb25maWcvS3l1dWJpQ29uZi5zY2FsYQ==) | `97.32% <0.00%> (+0.08%)` | :arrow_up: |
   | [...yuubi/server/metadata/jdbc/JDBCMetadataStore.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9zZXJ2ZXIvbWV0YWRhdGEvamRiYy9KREJDTWV0YWRhdGFTdG9yZS5zY2FsYQ==) | `89.96% <0.00%> (+0.69%)` | :arrow_up: |
   | [...rg/apache/kyuubi/ctl/cmd/log/LogBatchCommand.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLWN0bC9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9jdGwvY21kL2xvZy9Mb2dCYXRjaENvbW1hbmQuc2NhbGE=) | `80.00% <0.00%> (+2.00%)` | :arrow_up: |
   
   :mega: Codecov can now indicate which changes are the most critical in Pull Requests. [Learn more](https://about.codecov.io/product/feature/runtime-insights/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   




[GitHub] [incubator-kyuubi] yaooqinn commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r943111591


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,80 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+Goto `Try Kyuubi`_ to explore TPC-H data instantly!
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-\ |release|\ _2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set spark.jars=kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task, consider to reduce it if you want a higher parallelism.

Review Comment:
   can we elaborate on these a bit in comments for each config?





[GitHub] [incubator-kyuubi] yaooqinn commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r942135590


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,76 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-|release|_2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-|release|_2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=kyuubi-spark-connector-tpch-|release|_2.12.jar``
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task
+
+Consider to reduce `spark.sql.catalog.tpch.read.maxPartitionBytes` if you want a higher parallelism.

Review Comment:
   Can this be appended to L67?





[GitHub] [incubator-kyuubi] yaooqinn commented on pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#issuecomment-1210298711

   Maybe we can mention Try Kyuubi on this page.




[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r942684101


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,76 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-|release|_2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-|release|_2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=kyuubi-spark-connector-tpch-|release|_2.12.jar``
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task
+
+Consider to reduce `spark.sql.catalog.tpch.read.maxPartitionBytes` if you want a higher parallelism.

Review Comment:
   updated





[GitHub] [incubator-kyuubi] yaooqinn commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r943089872


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,80 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+Goto `Try Kyuubi`_ to explore TPC-H data instantly!
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-\ |release|\ _2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set spark.jars=kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:

Review Comment:
   Shall we mention where to set these configurations?
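   
   For a plain Spark application these catalog properties can be passed like any other Spark configuration, for example on the session builder as in the sketch below (an illustration only; in a Kyuubi deployment they would be supplied through whatever mechanism feeds the engine's Spark configuration):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Sketch only: assumes the kyuubi-spark-connector-tpch jar is already on the classpath.
   val spark = SparkSession.builder()
     .master("local[*]")
     .config("spark.sql.catalog.tpch", "org.apache.kyuubi.spark.connector.tpch.TPCHCatalog")
     .config("spark.sql.catalog.tpch.read.maxPartitionBytes", "134217728")
     .getOrCreate()
   
   spark.sql("SELECT count(*) FROM tpch.sf1.orders").show()
   ```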





[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r943108162


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,80 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+Goto `Try Kyuubi`_ to explore TPC-H data instantly!
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-\ |release|\ _2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set spark.jars=kyuubi-spark-connector-tpch-\ |release|\ _2.12.jar
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task, consider to reduce it if you want a higher parallelism.

Review Comment:
   1. configurable
   2. you are right
   3. invalid databases are ignored and a warning message is logged. The full list of valid databases is: sf0, tiny, sf1, sf10, sf30, sf100, sf300, sf1000, sf3000, sf10000, sf30000, sf100000
   4. different output in `desc tpch.sf1.orders` and `show create table tpch.sf1.orders`





[GitHub] [incubator-kyuubi] pan3793 commented on pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#issuecomment-1211697183

   Thanks, merging to master




[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3211: [Subtask] Connectors for Spark SQL Query Engine -> TPC-H

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3211:
URL: https://github.com/apache/incubator-kyuubi/pull/3211#discussion_r942107633


##########
docs/connector/spark/tpch.rst:
##########
@@ -16,19 +16,76 @@
 TPC-H
 =====
 
-TPC-DS Integration
+The TPC-H is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent
+data modifications. The queries and the data populating the database have been chosen to have broad industry-wide
+relevance.
+
+.. tip::
+   This article assumes that you have mastered the basic knowledge and operation of `TPC-H`_.
+   For the knowledge about TPC-H not mentioned in this article, you can obtain it from its `Official Documentation`_.
+
+This connector can be used to test the capabilities and query syntax of Spark without configuring access to an external
+data source. When you query a TPC-H table, the connector generates the data on the fly using a deterministic algorithm.
+
+TPC-H Integration
 ------------------
 
+To enable the integration of kyuubi spark sql engine and TPC-H through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Referencing the TPC-H connector :ref:`dependencies<spark-tpch-deps>`
+- Setting the spark catalog :ref:`configurations<spark-tpch-conf>`
+
 .. _spark-tpch-deps:
 
 Dependencies
 ************
 
+The **classpath** of kyuubi spark sql engine with TPC-H supported consists of
+
+1. kyuubi-spark-sql-engine-|release|_2.12.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of spark distribution
+3. kyuubi-spark-connector-tpch-|release|_2.12.jar, which can be found in the `Maven Central`_
+
+In order to make the TPC-H connector package visible for the runtime classpath of engines, we can use one of these methods:
+
+1. Put the TPC-H connector package into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=kyuubi-spark-connector-tpch-|release|_2.12.jar``
+
 .. _spark-tpch-conf:
 
 Configurations
 **************
 
+To add TPC-H tables as a catalog, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.tpch=org.apache.kyuubi.spark.connector.tpch.TPCHCatalog
+   spark.sql.catalog.tpch.excludeDatabases=sf10000,sf30000  # optional Exclude database list from the catalog
+   spark.sql.catalog.tpch.useAnsiStringType=false           # optional When true, use CHAR VARCHAR; otherwise use STRING
+   spark.sql.catalog.tpch.read.maxPartitionBytes=134217728  # optional Max data split size in bytes per task
+
+Consider to reduce `spark.sql.catalog.tpch.read.maxPartitionBytes` if you want a higher parallelism.
 
 TPC-H Operations
 ----------------
+
+Listing databases under `tpch` catalog.
+
+.. code-block:: sql
+    SHOW DATABASES IN tpch;
+
+Listing tables under `tpch.sf1` database.
+
+.. code-block:: sql
+    SHOW TABLES IN tpch.sf1;
+
+Switch current database to `tpch.sf1` and run a query against it.
+
+.. code-block:: sql
+    USE tpch.sf1;
+    SELECT * FROM orders;
+
+.. _Official Documentation: https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf

Review Comment:
   changed to https://www.tpc.org/tpch/


