Posted to commits@kyuubi.apache.org by ch...@apache.org on 2022/07/29 10:12:43 UTC

[incubator-kyuubi] branch master updated: [KYUUBI #3070][DOC] Add a doc of the Hudi connector for the Flink SQL Engine

This is an automated email from the ASF dual-hosted git repository.

chengpan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git


The following commit(s) were added to refs/heads/master by this push:
     new 38c7c1602 [KYUUBI #3070][DOC] Add a doc of the Hudi connector for the Flink SQL Engine
38c7c1602 is described below

commit 38c7c160252a514608145bcad9a9be1d47d89dea
Author: Luning Wang <wa...@gmail.com>
AuthorDate: Fri Jul 29 18:12:33 2022 +0800

    [KYUUBI #3070][DOC] Add a doc of the Hudi connector for the Flink SQL Engine
    
    ### _Why are the changes needed?_
    
    Add a doc of the Hudi connector for the Flink SQL Engine
    
    ### _How was this patch tested?_
    - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
    
    - [ ] Add screenshots for manual tests if appropriate
    
    - [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request
    
    Closes #3140 from deadwind4/hudi-flink-doc.
    
    Closes #3070
    
    69a2ac4d [Luning Wang] Rename dependencies
    d1a01fad [Luning Wang] [KYUUBI #3070][DOC] Add a doc of the Hudi connector for the Flink SQL Engine
    
    Authored-by: Luning Wang <wa...@gmail.com>
    Signed-off-by: Cheng Pan <ch...@apache.org>
---
 docs/connector/flink/hudi.rst | 117 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/docs/connector/flink/hudi.rst b/docs/connector/flink/hudi.rst
new file mode 100644
index 000000000..d32f0cda1
--- /dev/null
+++ b/docs/connector/flink/hudi.rst
@@ -0,0 +1,117 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+`Hudi`_
+========
+
+Apache Hudi (pronounced “hoodie”) is the next generation streaming data lake platform.
+Apache Hudi brings core warehouse and database functionality directly to a data lake.
+
+.. tip::
+   This article assumes that you have a basic knowledge of `Hudi`_ and its operation.
+   For anything about Hudi not covered in this article,
+   refer to its `Official Documentation`_.
+
+By using Kyuubi, we can run SQL queries against Hudi in a way that is more convenient,
+easier to understand, and easier to extend than manipulating Hudi with Flink directly.
+
+Hudi Integration
+----------------
+
+To enable the integration of the Kyuubi Flink SQL engine and Hudi through
+the Catalog APIs, you need to:
+
+- Reference the Hudi :ref:`dependencies<flink-hudi-deps>`
+
+.. _flink-hudi-deps:
+
+Dependencies
+************
+
+The **classpath** of the Kyuubi Flink SQL engine with Hudi support consists of
+
+1. kyuubi-flink-sql-engine-|release|.jar, the engine jar deployed with Kyuubi distributions
+2. a copy of the Flink distribution
+3. hudi-flink<flink.version>-bundle_<scala.version>-<hudi.version>.jar (example: hudi-flink1.14-bundle_2.12-0.11.1.jar), which can be found in `Maven Central`_
+
+To make the Hudi packages visible to the runtime classpath of the engine, we can use one of these methods:
+
+1. Put the Hudi packages into ``$FLINK_HOME/lib`` directly
+2. Set ``pipeline.jars=/path/to/hudi-flink-bundle``, e.g. via a ``SET`` statement as sketched below
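+
+As a minimal sketch of the second method (the jar path below is a placeholder and
+depends on where the bundle jar actually lives in your environment):
+
+.. code-block:: sql
+
+   -- point the session at the Hudi Flink bundle; adjust the path as needed
+   SET 'pipeline.jars' = 'file:///path/to/hudi-flink1.14-bundle_2.12-0.11.1.jar';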
+
+Hudi Operations
+---------------
+
+Taking ``Create Table`` as an example,
+
+.. code-block:: sql
+
+   CREATE TABLE t1 (
+     id INT PRIMARY KEY NOT ENFORCED,
+     name STRING,
+     price DOUBLE
+   ) WITH (
+     'connector' = 'hudi',
+     'path' = 's3://bucket-name/hudi/',
+     'table.type' = 'MERGE_ON_READ' -- creates a MERGE_ON_READ table; the default is COPY_ON_WRITE
+   );
+
+Taking ``Query Data`` as an example,
+
+.. code-block:: sql
+
+   SELECT * FROM t1;
+
+Taking ``Insert and Update Data`` as an example,
+
+.. code-block:: sql
+
+   INSERT INTO t1 VALUES (1, 'Lucas', 2.71828);
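+
+Because ``t1`` declares a primary key, Hudi writes are upserts by default, so updating
+a row is simply an insert with an existing key. A minimal sketch (the new price is
+illustrative):
+
+.. code-block:: sql
+
+   -- same primary key as before, so this updates the existing row in place
+   INSERT INTO t1 VALUES (1, 'Lucas', 3.14159);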
+
+Taking ``Streaming Query`` as an example,
+
+.. code-block:: sql
+
+   CREATE TABLE t1 (
+     uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
+     name VARCHAR(10),
+     age INT,
+     ts TIMESTAMP(3),
+     `partition` VARCHAR(20)
+   )
+   PARTITIONED BY (`partition`)
+   WITH (
+     'connector' = 'hudi',
+     'path' = '${path}',
+     'table.type' = 'MERGE_ON_READ',
+     'read.streaming.enabled' = 'true',  -- this option enables the streaming read
+     'read.start-commit' = '20210316134557', -- specifies the start commit instant time
+     'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s.
+   );
+
+   -- Then query the table in stream mode
+   SELECT * FROM t1;
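+
+If the session is not already running in streaming mode, you may need to switch it
+before issuing the streaming ``SELECT``. A minimal sketch using Flink's standard
+``execution.runtime-mode`` option (whether this is needed depends on how your
+session is configured):
+
+.. code-block:: sql
+
+   -- make sure the query is executed in streaming mode
+   SET 'execution.runtime-mode' = 'streaming';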
+
+Taking ``Delete Data`` as an example,
+
+A streaming query can delete data implicitly.
+When consuming data in a streaming query,
+the Hudi Flink source can also accept change logs from the underlying data source,
+and then apply UPDATE and DELETE operations at the per-row level.
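+
+For example, rows deleted in an upstream system can reach the Hudi table as DELETE
+change logs. The sketch below assumes a hypothetical CDC-backed source table
+``users_cdc``; the ``mysql-cdc`` connector and all of its options here are
+illustrative assumptions, not something this doc ships:
+
+.. code-block:: sql
+
+   -- a hypothetical CDC source emitting INSERT/UPDATE/DELETE change logs
+   CREATE TABLE users_cdc (
+     id INT PRIMARY KEY NOT ENFORCED,
+     name STRING,
+     price DOUBLE
+   ) WITH (
+     'connector' = 'mysql-cdc',
+     'hostname' = 'localhost',
+     'port' = '3306',
+     'username' = 'flink',
+     'password' = '******',
+     'database-name' = 'mydb',
+     'table-name' = 'users'
+   );
+
+   -- deletes from the upstream table are applied per row to the Hudi
+   -- table t1 from the Create Table example above
+   INSERT INTO t1 SELECT * FROM users_cdc;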
+
+
+.. _Hudi: https://hudi.apache.org/
+.. _Official Documentation: https://hudi.apache.org/docs/overview
+.. _Maven Central: https://mvnrepository.com/artifact/org.apache.hudi