Posted to notifications@kyuubi.apache.org by GitBox <gi...@apache.org> on 2022/07/21 08:39:54 UTC
[GitHub] [incubator-kyuubi] deadwind4 opened a new pull request, #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
deadwind4 opened a new pull request, #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115
### _Why are the changes needed?_
Add Iceberg connector doc for Spark SQL Engine
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org
[GitHub] [incubator-kyuubi] pan3793 commented on a diff in pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
Posted by GitBox <gi...@apache.org>.
pan3793 commented on code in PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#discussion_r927336021
##########
docs/connector/spark/iceberg.rst:
##########
@@ -16,22 +16,98 @@
`Iceberg`_
==========
+Apache Iceberg is an open table format for huge analytic datasets.
+Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala
+using a high-performance table format that works just like a SQL table.
+
+.. tip::
+ This article assumes that you have mastered the basic knowledge and operation of `Iceberg`_.
+ For the knowledge about Iceberg not mentioned in this article,
+ you can obtain it from its `Official Documentation`_.
+
+By using Kyuubi, we can run SQL queries against Iceberg, which is more
+convenient, easier to understand, and easier to extend than using
+Spark directly to manipulate Iceberg.
Iceberg Integration
-------------------
+To enable the integration of the Kyuubi Spark SQL engine and Iceberg through
+Apache Spark Datasource V2 and Catalog APIs, you need to:
+
+- Reference the Iceberg :ref:`dependencies`
+- Set the Spark extension and catalog :ref:`configurations`
+
.. _dependencies:
Dependencies
************
+The **classpath** of the Kyuubi Spark SQL engine with Iceberg support consists of
+
+1. kyuubi-spark-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of the Spark distribution
+3. iceberg-spark-runtime-<spark.version>_<scala.version>-<iceberg.version>.jar (example: iceberg-spark-runtime-3.2_2.12-0.14.0.jar), which can be found in the `Maven Central`_
+
+To make the Iceberg packages visible on the runtime classpath of engines, we can use one of these methods:
+
+1. Put the Iceberg packages into ``$SPARK_HOME/jars`` directly
+2. Set ``spark.jars=/path/to/iceberg-spark-runtime``
+
+.. warning::
+ Please mind the compatibility of different Iceberg and Spark versions, which can be confirmed on the page of `Iceberg multi engine support`_.
+
.. _configurations:
Configurations
**************
+To activate functionality of Iceberg, we can set the following configurations:
+
+.. code-block:: properties
+
+ spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
+ spark.sql.catalog.spark_catalog.type=hive
+ spark.sql.catalog.spark_catalog.uri=thrift://metastore-host:port
+ spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
Iceberg Operations
------------------
-.. _Iceberg: https://iceberg.apache.org/
\ No newline at end of file
+Taking ``CREATE TABLE`` as an example,
+
+.. code-block:: sql
+
+ CREATE TABLE foo (
+ id bigint COMMENT 'unique id',
+ data string)
+ USING iceberg;
+
+Taking ``SELECT`` as an example,
+
+.. code-block:: sql
+
+ SELECT * FROM foo;
+
+Taking ``INSERT`` as an example,
+
+.. code-block:: sql
+
+ INSERT INTO foo VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+Taking ``UPDATE`` as an example, Spark 3.1 added support for UPDATE queries that update matching rows in tables.
+
+.. code-block:: sql
+
+ UPDATE foo SET data = 'd', id = 4 WHERE id >= 3 and id < 4;
+
+Taking ``DELETE FROM`` as an example, Spark 3 added support for DELETE FROM queries to remove data from tables.
+
+.. code-block:: sql
Review Comment:
Can we add `MERGE INTO` here as well?
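For illustration only, a minimal sketch of what such a `MERGE INTO` example might look like. It is not taken from the PR: the source table `bar` is hypothetical and assumed to have the same schema as `foo`, and the statement is assumed to run with the Iceberg SQL extensions from the Configurations section enabled.

```sql
-- Hypothetical source table `bar` (id bigint, data string); not part of the PR.
MERGE INTO foo t
USING bar s
ON t.id = s.id
-- Rows present in both tables: overwrite the target's data column.
WHEN MATCHED THEN UPDATE SET t.data = s.data
-- Rows only present in the source: insert them as-is.
WHEN NOT MATCHED THEN INSERT *;
```

Rows of `bar` whose `id` already exists in `foo` update the existing row; all other rows are appended.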
[GitHub] [incubator-kyuubi] yaooqinn commented on pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
Posted by GitBox <gi...@apache.org>.
yaooqinn commented on PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#issuecomment-1191253276
Since the doc build has been failing recently due to upstream breaking changes (#3116), can we add a screenshot here to help verify?
[GitHub] [incubator-kyuubi] pan3793 commented on pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
Posted by GitBox <gi...@apache.org>.
pan3793 commented on PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#issuecomment-1192406257
Thanks, merging to master
[GitHub] [incubator-kyuubi] codecov-commenter commented on pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #3115:
URL: https://github.com/apache/incubator-kyuubi/pull/3115#issuecomment-1192188803
# [Codecov](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3115](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (119be81) into [master](https://codecov.io/gh/apache/incubator-kyuubi/commit/f1312ea439d288009200a105db36aa8431f27f8a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f1312ea) will **decrease** coverage by `0.01%`.
> The diff coverage is `n/a`.
```diff
@@ Coverage Diff @@
## master #3115 +/- ##
============================================
- Coverage 51.34% 51.32% -0.02%
Complexity 6 6
============================================
Files 458 458
Lines 25388 25388
Branches 3536 3536
============================================
- Hits 13035 13031 -4
Misses 11113 11113
- Partials 1240 1244 +4
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...g/apache/kyuubi/operation/BatchJobSubmission.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9vcGVyYXRpb24vQmF0Y2hKb2JTdWJtaXNzaW9uLnNjYWxh) | `77.63% <0.00%> (-1.25%)` | :arrow_down: |
| [.../org/apache/kyuubi/session/KyuubiSessionImpl.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9zZXNzaW9uL0t5dXViaVNlc3Npb25JbXBsLnNjYWxh) | `77.27% <0.00%> (-1.14%)` | :arrow_down: |
| [...n/scala/org/apache/kyuubi/engine/ProcBuilder.scala](https://codecov.io/gh/apache/incubator-kyuubi/pull/3115/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9lbmdpbmUvUHJvY0J1aWxkZXIuc2NhbGE=) | `83.12% <0.00%> (-0.63%)` | :arrow_down: |
[GitHub] [incubator-kyuubi] pan3793 closed pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
Posted by GitBox <gi...@apache.org>.
pan3793 closed pull request #3115: [KYUUBI #3069][DOC] Add Iceberg connector doc for Spark SQL Engine
URL: https://github.com/apache/incubator-kyuubi/pull/3115