Posted to notifications@kyuubi.apache.org by GitBox <gi...@apache.org> on 2022/07/21 08:39:54 UTC

[GitHub] [incubator-kyuubi] deadwind4 opened a new pull request, #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine

deadwind4 opened a new pull request, #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115

   ### _Why are the changes needed?_
   
   Add Iceberg connector doc for Spark SQL Engine
   
   
   ### _How was this patch tested?_
   - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
   
   - [ ] Add screenshots for manual tests if appropriate
   
   - [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine

Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#discussion_r927336021


##########
docs/connector/spark/iceberg.rst:
##########
@@ -16,22 +16,98 @@
 `Iceberg`_
 ==========
 
+Apache Iceberg is an open table format for huge analytic datasets.
+Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala
+using a high-performance table format that works just like a SQL table.
+
+.. tip::
+   This article assumes that you are familiar with the basics of `Iceberg`_ and how to operate it.
+   For anything about Iceberg that is not covered in this article,
+   refer to its `Official Documentation`_.
+
+By using Kyuubi, we can run SQL queries against Iceberg, which is more
+convenient, easier to understand, and easier to extend than using
+Spark directly to manipulate Iceberg.
 
 Iceberg Integration
 -------------------
 
+To enable the integration of the Kyuubi Spark SQL engine and Iceberg through
+the Apache Spark DataSource V2 and Catalog APIs, you need to:
+
+- Reference the Iceberg :ref:`dependencies`
+- Set the Spark extension and catalog :ref:`configurations`
+
 .. _dependencies:
 
 Dependencies
 ************
 
+The **classpath** of the Kyuubi Spark SQL engine with Iceberg support consists of
+
+1. kyuubi-spark-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of the Spark distribution
+3. iceberg-spark-runtime-<spark.version>_<scala.version>-<iceberg.version>.jar (example: iceberg-spark-runtime-3.2_2.12-0.14.0.jar), which can be found in `Maven Central`_
+
+To make the Iceberg packages visible on the runtime classpath of engines, we can use one of these methods:
+
+1. Put the Iceberg packages into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=/path/to/iceberg-spark-runtime``
+
+.. warning::
+   Mind the compatibility between Iceberg and Spark versions, which can be checked on the `Iceberg multi engine support`_ page.
+
 .. _configurations:
 
 Configurations
 **************
 
+To activate Iceberg functionality, we can set the following configurations:
+
+.. code-block:: properties
+
+   spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
+   spark.sql.catalog.spark_catalog.type=hive
+   spark.sql.catalog.spark_catalog.uri=thrift://metastore-host:port
+   spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
 
 Iceberg Operations
 ------------------
 
-.. _Iceberg: https://iceberg.apache.org/
\ No newline at end of file
+Taking ``CREATE TABLE`` as an example,
+
+.. code-block:: sql
+
+   CREATE TABLE foo (
+     id bigint COMMENT 'unique id',
+     data string)
+   USING iceberg;
+
+Taking ``SELECT`` as an example,
+
+.. code-block:: sql
+
+   SELECT * FROM foo;
+
+Taking ``INSERT`` as an example,
+
+.. code-block:: sql
+
+   INSERT INTO foo VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+Taking ``UPDATE`` as an example (Spark 3.1 added support for ``UPDATE`` queries that update matching rows in tables),
+
+.. code-block:: sql
+
+   UPDATE foo SET data = 'd', id = 4 WHERE id >= 3 and id < 4;
+
+Taking ``DELETE FROM`` as an example (Spark 3 added support for ``DELETE FROM`` queries that remove data from tables),
+
+.. code-block:: sql

Review Comment:
   Can we add `MERGE INTO` here as well?
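
   As a rough sketch of what such a `MERGE INTO` example might look like (this is not part of the PR): it assumes a hypothetical source table `bar` with the same `id` and `data` columns as `foo`, and that the Iceberg SQL extensions from the Configurations section are enabled.

   ```sql
   -- Hypothetical example: `bar` is an assumed source table, not defined in the PR.
   -- Upsert rows from `bar` into the Iceberg table `foo`, matching on `id`.
   MERGE INTO foo t
   USING (SELECT id, data FROM bar) s
   ON t.id = s.id
   -- Update the payload of rows that already exist in the target
   WHEN MATCHED THEN UPDATE SET t.data = s.data
   -- Insert rows that are only present in the source
   WHEN NOT MATCHED THEN INSERT (id, data) VALUES (s.id, s.data);
   ```

   Which Spark versions accept `MERGE INTO` for Iceberg tables should be checked against the Iceberg documentation before this goes into the doc.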





[GitHub] [incubator-kyuubi] yaooqinn commented on pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#issuecomment-1191253276

   Since the doc build has been failing recently due to upstream breaking changes (#3116), can we add a screenshot here to help verify?




[GitHub] [incubator-kyuubi] pan3793 commented on pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine

Posted by GitBox <gi...@apache.org>.
pan3793 commented on PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#issuecomment-1192406257

   Thanks, merging to master




[GitHub] [incubator-kyuubi] codecov-commenter commented on pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#issuecomment-1192188803

   # [Codecov](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3115](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (119be81) into [master](https://codecov.io/gh/apache/incubator-kyuubi/commit/f1312ea439d288009200a105db36aa8431f27f8a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f1312ea) will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3115      +/-   ##
   ============================================
   - Coverage     51.34%   51.32%   -0.02%     
     Complexity        6        6              
   ============================================
     Files           458      458              
     Lines         25388    25388              
     Branches       3536     3536              
   ============================================
   - Hits          13035    13031       -4     
     Misses        11113    11113              
   - Partials       1240     1244       +4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...g/apache/kyuubi/operation/BatchJobSubmission.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9vcGVyYXRpb24vQmF0Y2hKb2JTdWJtaXNzaW9uLnNjYWxh) | `77.63% <0.00%> (-1.25%)` | :arrow_down: |
   | [.../org/apache/kyuubi/session/KyuubiSessionImpl.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9zZXNzaW9uL0t5dXViaVNlc3Npb25JbXBsLnNjYWxh) | `77.27% <0.00%> (-1.14%)` | :arrow_down: |
   | [...n/scala/org/apache/kyuubi/engine/ProcBuilder.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9lbmdpbmUvUHJvY0J1aWxkZXIuc2NhbGE=) | `83.12% <0.00%> (-0.63%)` | :arrow_down: |
   
   




[GitHub] [incubator-kyuubi] pan3793 closed pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine

Posted by GitBox <gi...@apache.org>.
pan3793 closed pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
URL: https://github.com/apache/incubator-kyuubi/pull/3115

