Posted to commits@kyuubi.apache.org by ya...@apache.org on 2022/07/28 06:21:34 UTC

[incubator-kyuubi] branch master updated: [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB/TiKV

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git


The following commit(s) were added to refs/heads/master by this push:
     new da87ca55c [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB/TiKV
da87ca55c is described below

commit da87ca55cac36d61010b48ae514e814d4adeefaa
Author: zhouyifan279 <zh...@gmail.com>
AuthorDate: Thu Jul 28 14:21:25 2022 +0800

    [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB/TiKV
    
    ### _Why are the changes needed?_
    Close #3154
    
    ### _How was this patch tested?_
    - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
    
    - [ ] Add screenshots for manual tests if appropriate
    
    - [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request
    
    Closes #3155 from zhouyifan279/3154.
    
    Closes #3154
    
    682aaf58 [zhouyifan279] [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB
    4301ca44 [zhouyifan279] [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiDB
    65acabe6 [zhouyifan279] [KYUUBI #3154][Subtask] Connectors for Spark SQL Query Engine -> TiSpark
    
    Authored-by: zhouyifan279 <zh...@gmail.com>
    Signed-off-by: Kent Yao <ya...@apache.org>
---
 docs/connector/spark/index.rst   |   2 +-
 docs/connector/spark/tidb.rst    | 103 +++++++++++++++++++++++++++++++++++++++
 docs/connector/spark/tispark.rst |  36 --------------
 3 files changed, 104 insertions(+), 37 deletions(-)

diff --git a/docs/connector/spark/index.rst b/docs/connector/spark/index.rst
index a83a09860..7109edabb 100644
--- a/docs/connector/spark/index.rst
+++ b/docs/connector/spark/index.rst
@@ -37,6 +37,6 @@ purpose.
     iceberg
     kudu
     flink_table_store
-    tispark
+    tidb
     tpcds
     tpch
diff --git a/docs/connector/spark/tidb.rst b/docs/connector/spark/tidb.rst
new file mode 100644
index 000000000..bfda33262
--- /dev/null
+++ b/docs/connector/spark/tidb.rst
@@ -0,0 +1,103 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+`TiDB`_
+==========
+
+TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing
+(HTAP) workloads.
+
+TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV to answer complex OLAP
+queries. It combines the strengths of the Spark platform and the distributed TiKV
+cluster, and integrates seamlessly with TiDB to provide a one-stop HTAP solution for online
+transactions and analysis.
+
+.. tip::
+   This article assumes that you are familiar with the basic concepts and operation of TiDB and TiSpark.
+   For anything not covered in this article, refer to the TiDB `Official Documentation`_.
+
+By using Kyuubi, we can run SQL queries against TiDB/TiKV, which is more
+convenient, easier to understand, and easier to extend than using
+Spark directly to manipulate TiDB/TiKV.
+
+TiDB Integration
+-------------------
+
+To enable the integration of the Kyuubi Spark SQL engine and TiDB through
+the Apache Spark DataSource V2 and Catalog APIs, you need to:
+
+- Reference the TiSpark :ref:`dependencies`
+- Set the Spark extension and catalog :ref:`configurations`
+
+.. _dependencies:
+
+Dependencies
+************
+The classpath of the Kyuubi Spark SQL engine with TiDB support consists of:
+
+1. kyuubi-spark-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of the Spark distribution
+3. tispark-assembly-<spark.version>_<scala.version>-<tispark.version>.jar (example: tispark-assembly-3.2_2.12-3.0.1.jar), which can be found in `Maven Central`_
+
+To make the TiSpark packages visible on the runtime classpath of engines, use one of these methods:
+
+1. Put the TiSpark packages into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=/path/to/tispark-assembly``
+
+.. warning::
+   Check the compatibility between TiDB, TiSpark, and Spark versions, which is documented on the `TiSpark Environment setup`_ page.
+
+.. _configurations:
+
+Configurations
+**************
+
+To activate TiSpark, set the following configurations:
+
+.. code-block:: properties
+
+   spark.tispark.pd.addresses $pd_host:$pd_port
+   spark.sql.extensions org.apache.spark.sql.TiExtensions
+   spark.sql.catalog.tidb_catalog  org.apache.spark.sql.catalyst.catalog.TiCatalog
+   spark.sql.catalog.tidb_catalog.pd.addresses $pd_host:$pd_port
+
+The ``spark.tispark.pd.addresses`` and ``spark.sql.catalog.tidb_catalog.pd.addresses`` configurations
+accept multiple PD servers as a comma-separated list. Specify the port number for each of them.
+
+For example, when you have multiple PD servers on ``10.16.20.1,10.16.20.2,10.16.20.3`` with the port ``2379``,
+set the value to ``10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379``.
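+
+As an illustration, the full set of configurations for that three-node PD cluster might look
+like the sketch below. The IP addresses are only the example values used above, not defaults,
+and ``tidb_catalog`` is simply the catalog name chosen in this article.
+
+.. code-block:: properties
+
+   # Example values only: replace the addresses with those of your own PD servers
+   spark.tispark.pd.addresses 10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379
+   spark.sql.extensions org.apache.spark.sql.TiExtensions
+   spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
+   spark.sql.catalog.tidb_catalog.pd.addresses 10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379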
+
+TiDB Operations
+------------------
+
+Taking ``SELECT`` as an example,
+
+.. code-block:: sql
+
+   SELECT * FROM foo;
+
+Taking ``DELETE FROM`` as an example (Spark 3 added support for ``DELETE FROM`` queries to remove data from tables),
+
+.. code-block:: sql
+
+   DELETE FROM foo WHERE id >= 1 and id < 2;
+
+.. note::
+   As of TiSpark 3.0.1, TiSpark does not support ``CREATE TABLE`` or ``INSERT INTO/OVERWRITE`` operations
+   through the Apache Spark DataSource V2 and Catalog APIs.
+
+.. _Official Documentation: https://docs.pingcap.com/tidb/stable/overview
+.. _Maven Central: https://repo1.maven.org/maven2/com/pingcap/tispark/
+.. _TiSpark Environment setup: https://docs.pingcap.com/tidb/stable/tispark-overview#environment-setup
\ No newline at end of file
diff --git a/docs/connector/spark/tispark.rst b/docs/connector/spark/tispark.rst
deleted file mode 100644
index 12fb276b1..000000000
--- a/docs/connector/spark/tispark.rst
+++ /dev/null
@@ -1,36 +0,0 @@
-.. Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-
-..    http://www.apache.org/licenses/LICENSE-2.0
-
-.. Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
-
-`TiSpark`_
-==========
-
-TiSpark Integration
--------------------
-
-.. _dependencies:
-
-Dependencies
-************
-
-.. _configurations:
-
-Configurations
-**************
-
-
-TiSpark Operations
-------------------
-
-.. _TiSpark: https://docs.pingcap.com/tidb/dev/tispark-overview
\ No newline at end of file