You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by xu...@apache.org on 2022/04/11 11:09:27 UTC

[hudi] branch asf-site updated: Adding doc for procedures (#5285)

This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new b2a4085765 Adding doc for procedures (#5285)
b2a4085765 is described below

commit b2a408576591316a3750d4ef8ee95227aaf5b4dd
Author: huberylee <sh...@foxmail.com>
AuthorDate: Mon Apr 11 19:09:18 2022 +0800

    Adding doc for procedures (#5285)
    
    * [DOCS] Adding doc for clustering & compaction procedures
    
    * [DOCS] Adding procedures link in quick-start-guide
    
    Co-authored-by: XuQianJin-Stars <fo...@apache.com>
---
 website/docs/procedures.md        | 452 ++++++++++++++++++++++++++++++++++++++
 website/docs/quick-start-guide.md |  18 ++
 website/sidebars.js               |   9 +-
 3 files changed, 478 insertions(+), 1 deletion(-)

diff --git a/website/docs/procedures.md b/website/docs/procedures.md
new file mode 100644
index 0000000000..eb88fa2ccc
--- /dev/null
+++ b/website/docs/procedures.md
@@ -0,0 +1,452 @@
+---
+title: Procedures
+summary: "In this page, we introduce how to use procedures with Hudi."
+toc: true
+last_modified_at: 
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+Stored procedures available when use Hudi SparkSQL extensions in all spark's version.
+
+## Usage
+CALL supports passing arguments by name (recommended) or by position. Mixing position and named arguments is also supported.
+
+#### Named arguments
+All procedure arguments are named. When passing arguments by name, arguments can be in any order and any optional argument can be omitted.
+```
+CALL system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1, ... arg_name_n => arg_n)
+```
+#### Positional arguments
+When passing arguments by position, the arguments may be omitted if they are optional.
+```
+CALL system.procedure_name(arg_1, arg_2, ... arg_n)
+```
+*note:* The system here has no practical meaning, the complete procedure name is system.procedure_name.
+
+## Commit management
+
+### show_commits
+
+Show commits' info.
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| table          | String | Y        | None          | Hudi table name                      |
+| limit          | Int    | N        | 10            | Max number of records to be returned |
+
+**Output**
+
+| Output Name                  | Type   |
+|------------------------------|--------|
+| commit_time                  | String |
+| total_bytes_written          | Long   |
+| total_files_added            | Long   |
+| total_files_updated          | Long   |
+| total_partitions_written     | Long   |
+| total_records_written        | Long   |
+| total_update_records_written | Long   |
+| total_errors                 | Long   |
+
+**Example**
+
+```
+call show_commits(table => 'test_hudi_table', limit => 10);
+```
+
+| commit_time       | 	total_bytes_written     | total_files_added | total_files_updated | total_partitions_written | total_records_written | total_update_records_written | total_errors |
+|-------------------|--------------------------|-------------------|---------------------|--------------------------|-----------------------|------------------------------|--------------|
+| 20220216171049652 | 	432653                  | 0                 | 1                   | 1                        | 0                     | 0                            | 0            |
+| 20220216171027021 | 	435346                  | 1                 | 0                   | 1                        | 1                     | 	0                           | 0            |
+| 20220216171019361 | 	435349                  | 1                 | 0                   | 1                        | 1                     | 	0                           | 0            |
+
+### show_commits_metadata
+
+Show commits' metadata.
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| table          | String | Y        | None          | Hudi table name                      |
+| limit          | Int    | N        | 10            | Max number of records to be returned |
+
+**Output**
+
+| Output Name                     | Type   |
+|---------------------------------|--------|
+| commit_time                     | String |
+| action                          | String |
+| partition                       | String |
+| file_id                         | String |
+| previous_commit                 | String |
+| num_writes                      | Long   |
+| num_inserts                     | Long   |
+| num_deletes                     | Long   |
+| num_update_writes               | String |
+| total_errors                    | Long   |
+| total_log_blocks                | Long   |
+| total_corrupt_logblocks         | Long   |
+| total_rollback_blocks           | Long   |
+| total_log_records               | Long   |
+| total_updated_records_compacted | Long   |
+| total_bytes_written             | Long   |
+
+**Example**
+
+```
+call show_commits_metadata(table => 'test_hudi_table');
+```
+
+|    commit_time   | action  | partition     | file_id                                | previous_commit   | num_writes | num_inserts | num_deletes | num_update_writes | total_errors | total_log_blocks | total_corrupt_logblocks | total_rollback_blocks | total_log_records | total_updated_records_compacted | total_bytes_written|
+|----------------- |---------|---------------|----------------------------------------|-------------------|------------|-------------|-------------|-------------------|--------------|------------------|-------------------------|-----------------------|-------------------|---------------------------------|------------------- |
+|20220109225319449 | commit  | dt=2021-05-03 | d0073a12-085d-4f49-83e9-402947e7e90a-0 | null              | 1          | 1           | 0           | 0                 | 0            | 0 	              | 0                       | 0                     | 0                 | 0                               | 435349             |
+|20220109225311742 | commit  | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | 20220109214830592 | 1          | 1           | 0           | 0	                | 0            | 0                | 0                       | 0                     | 0                 | 0                               | 435340             |
+|20220109225301429 | commit  | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | 20220109214830592 | 1          | 1           | 0           | 0                 | 0	           | 0                | 0                       | 0                     | 0                 | 0                               | 435340             |
+|20220109214830592 | commit  | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | 20220109191631015 | 0          | 0           | 1           | 0	                | 0            | 0                | 0                       | 0                     | 0                 | 0                               | 432653             |
+|20220109214830592 | commit  | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | 20220109191648181 | 0          | 0           | 1           | 0	                | 0	           | 0                | 0                       | 0                     | 0                 | 0                               | 432653             |
+|20220109191648181 | commit  | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | null              | 1          | 1           | 0           | 0                 | 0            | 0                | 0                       | 0                     | 0                 | 0                               | 435341             |
+|20220109191631015 | commit  | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | null              | 1          | 1           | 0           | 0                 | 0            | 0                | 0                       | 0                     | 0                 | 0                               | 435341             |
+
+### rollback_to_instant
+
+Rollback a table to the commit that was current at some time.
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description     |
+|----------------|--------|----------|---------------|-----------------|
+| table          | String | Y        | None          | Hudi table name |
+
+**Output**
+
+| Output Name     | Type    |
+|-----------------|---------|
+| rollback_result | Boolean |
+
+**Example**
+
+Roll back test_hudi_table to one instant
+```
+call rollback_to_instant(table => 'test_hudi_table', instant_time => '20220109225319449');
+```
+
+| rollback_result |
+| :---------------|
+|    true         |
+
+### create_savepoints
+
+Create a savepoint to hudi's table.
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description     |
+|----------------|--------|----------|---------------|-----------------|
+| table          | String | Y        | None          | Hudi table name |
+| commit_Time    | String | Y        | None          | Commit time     |
+| user           | String | N        | ""            | User name       |
+| comments       | String | N        | ""            | Comments        |
+
+**Output**
+
+| Output Name             | Type    |
+|-------------------------|---------|
+| create_savepoint_result | Boolean |
+
+**Example**
+
+Roll back test_hudi_table to one instant
+```
+call create_savepoints(table => 'test_hudi_table', instant_time => '20220109225319449');
+```
+
+| create_savepoint_result |
+|:------------------------|
+| true                    |
+
+### delete_savepoints
+
+Delete a savepoint to hudi's table.
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description     |
+|----------------|--------|----------|---------------|-----------------|
+| table          | String | Y        | None          | Hudi table name |
+| instant_time   | String | Y        | None          | Instant time    |
+
+**Output**
+
+| Output Name             | Type    |
+|-------------------------|---------|
+| delete_savepoint_result | Boolean |
+
+**Example**
+
+Delete a savepoint to test_hudi_table
+```
+call delete_savepoints(table => 'test_hudi_table', instant_time => '20220109225319449');
+```
+
+| delete_savepoint_result |
+|:------------------------|
+| true                    |
+
+### rollback_savepoints
+
+Rollback a table to the commit that was current at some time.
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description     |
+|----------------|--------|----------|---------------|-----------------|
+| table          | String | Y        | None          | Hudi table name |
+| instant_time   | String | Y        | None          | Instant time    |
+
+**Output**
+
+| Output Name               | Type    |
+|---------------------------|---------|
+| rollback_savepoint_result | Boolean |
+
+**Example**
+
+Rollback test_hudi_table to one savepoint
+```
+call rollback_savepoints(table => 'test_hudi_table', instant_time => '20220109225319449');
+```
+
+| rollback_savepoint_result |
+|:--------------------------|
+| true                      |
+
+## Optimization table
+
+### run_clustering
+
+Trigger clustering on a hoodie table. By using partition predicates, clustering table can be run 
+with specified partitions, and you can also specify the order columns to sort data.
+
+:::note
+Newly clustering instant will be generated every call, and all pending clustering instants are executed.
+When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. If both 
+parameters are given, ``table`` will take effect.
+
+:::
+
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description                   |
+|----------------|--------|----------|---------------|-------------------------------|
+| table          | String | N        | None          | Name of table to be clustered |
+| path           | String | N        | None          | Path of table to be clustered |
+| predicate      | String | N        | None          | Predicate to filter partition |
+| order          | String | N        | None          | Order column split by `,`     |
+
+**Output**
+
+Empty
+
+**Example**
+
+Clustering test_hudi_table with table name
+```
+call run_clustering(table => 'test_hudi_table')
+```
+
+Clustering test_hudi_table with table path
+```
+call run_clustering(path => '/tmp/hoodie/test_hudi_table')
+```
+
+Clustering test_hudi_table with table name, predicate and order column
+```
+call run_clustering(table => 'test_hudi_table', predicate => 'ts <= 20220408L', order => 'ts')
+```
+
+### show_clustering
+
+Show pending clusterings on a hoodie table. 
+
+:::note
+When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. 
+If both parameters are given, ``table`` will take effect.
+
+:::
+
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| table          | String | N        | None          | Name of table to be clustered        |
+| path           | String | N        | None          | Path of table to be clustered        |
+| limit          | Int    | N        | None          | Max number of records to be returned |
+
+**Output**
+
+| Parameter Name | Type   | Required | Default Value | Description                           |
+|----------------|--------|----------|---------------|---------------------------------------|
+| timestamp      | String | N        | None          | Instant time                          |
+| groups         | Int    | N        | None          | Number of file groups to be processed |
+
+**Example**
+
+Show pending clusterings with table name
+```
+call show_clustering(table => 'test_hudi_table')
+```
+| timestamp         | groups | 
+|-------------------|--------|
+| 20220408153707928 | 2      |
+| 20220408153636963 | 3      |
+
+Show pending clusterings with table path
+```
+call show_clustering(path => '/tmp/hoodie/test_hudi_table')
+```
+| timestamp         | groups | 
+|-------------------|--------|
+| 20220408153707928 | 2      |
+| 20220408153636963 | 3      |
+
+Show pending clusterings with table name and limit
+```
+call show_clustering(table => 'test_hudi_table', limit => 1)
+```
+| timestamp         | groups | 
+|-------------------|--------|
+| 20220408153707928 | 2      |
+
+### run_compaction
+
+Schedule or run compaction on a hoodie table. 
+
+:::note
+For scheduling compaction, if `timestamp` is specified, new scheduled compaction will use given 
+timestamp as instant time. Otherwise, compaction will be scheduled by using current system time. 
+
+For running compaction, given ``timestamp`` must be a pending compaction instant time that 
+already exists, if it's not, exception will be thrown. Meanwhile, if ``timestamp``is specified 
+and there are pending compactions, all pending compactions will be executed without new compaction 
+instant generated. 
+
+When calling this procedure, one of parameters ``table`` and ``path``must be specified at least. 
+If both parameters are given, ``table`` will take effect.
+:::
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description                         |
+|----------------|--------|----------|---------------|-------------------------------------|
+| op             | String | N        | None          | Operation type, `RUN` or `SCHEDULE` |
+| table          | String | N        | None          | Name of table to be compacted       |
+| path           | String | N        | None          | Path of table to be compacted       |
+| timestamp      | String | N        | None          | Instant time                        |
+
+**Output**
+
+The output of `RUN` operation is `EMPTY`, the output of `SCHEDULE` as follow:
+
+| Parameter Name | Type   | Required  | Default Value | Description  |
+|----------------|--------|-----------|---------------|--------------|
+| instant        | String | N         | None          | Instant name |
+
+**Example**
+
+Run compaction with table name
+```
+call run_compaction(op => 'run', table => 'test_hudi_table')
+```
+
+Run compaction with table path
+```
+call run_compaction(op => 'run', path => '/tmp/hoodie/test_hudi_table')
+```
+
+Run compaction with table path and timestamp
+```
+call run_compaction(op => 'run', path => '/tmp/hoodie/test_hudi_table', timestamp => '20220408153658568')
+```
+
+Schedule compaction with table name
+```
+call run_compaction(op => 'schedule', table => 'test_hudi_table')
+```
+| instant           |
+|-------------------|
+| 20220408153650834 |
+
+Schedule compaction with table path
+```
+call run_compaction(op => 'schedule', path => '/tmp/hoodie/test_hudi_table')
+```
+| instant           |
+|-------------------|
+| 20220408153650834 |
+
+Schedule compaction with table path and timestamp
+```
+call run_compaction(op => 'schedule', path => '/tmp/hoodie/test_hudi_table', timestamp => '20220408153658568')
+```
+| instant           |
+|-------------------|
+| 20220408153658568 |
+
+### show_compaction
+
+Show all compactions on a hoodie table, in-flight or completed compactions are included, and result will
+be in reverse order according to trigger time. 
+
+:::note
+When calling this procedure, one of parameters ``table``and ``path`` must be specified at least. 
+If both parameters are given, ``table`` will take effect.
+:::
+
+**Input**
+
+| Parameter Name | Type   | Required | Default Value | Description                          |
+|----------------|--------|----------|---------------|--------------------------------------|
+| table          | String | N        | None          | Name of table to show compaction     |
+| path           | String | N        | None          | Path of table to show compaction     |
+| limit          | Int    | N        | None          | Max number of records to be returned |
+
+**Output**
+
+| Parameter Name | Type   | Required | Default Value | Description                           |
+|----------------|--------|----------|---------------|---------------------------------------|
+| timestamp      | String | N        | None          | Instant time                          |
+| action         | String | N        | None          | Action name of compaction             |
+| size           | Int    | N        | None          | Number of file slices to be compacted |
+
+**Example**
+
+Show compactions with table name
+```
+call show_compaction(table => 'test_hudi_table')
+```
+| timestamp         | action     | size    |
+|-------------------|------------|---------|
+| 20220408153707928 | compaction | 10      |
+| 20220408153636963 | compaction | 10      |
+
+Show compactions with table path
+```
+call show_compaction(path => '/tmp/hoodie/test_hudi_table')
+```
+| timestamp         | action     | size    |
+|-------------------|------------|---------|
+| 20220408153707928 | compaction | 10      |
+| 20220408153636963 | compaction | 10      |
+
+Show compactions with table name and limit
+```
+call show_compaction(table => 'test_hudi_table', limit => 1)
+```
+| timestamp         | action     | size    |
+|-------------------|------------|---------|
+| 20220408153707928 | compaction | 10      |
\ No newline at end of file
diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index a31328dd91..6446016254 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -1094,6 +1094,24 @@ alter table hudi_cow_pt_tbl drop partition (dt='2021-12-09', hh='10');
 Currently,  the result of `show partitions` is based on the filesystem table path. It's not precise when delete the whole partition data or drop certain partition directly.
 
 :::
+### Procedures
+**Syntax**
+```sql
+--Call procedure by positional arguments
+CALL system.procedure_name(arg_1, arg_2, ... arg_n)
+
+--Call procedure by named arguments
+CALL system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1, ... arg_name_n => arg_n)
+```
+**Examples**
+```sql
+--show commit's info
+call show_commits(table => 'test_hudi_table', limit => 10);
+```
+
+Call command has already support some commit procedures and table optimization procedures, 
+more details please refer to [procedures](/docs/next/procedures).
+
 
 ## Where to go from here?
 
diff --git a/website/sidebars.js b/website/sidebars.js
index fe4b0b4643..5589aece3f 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -37,7 +37,14 @@ module.exports = {
             type: 'category',
             label: 'How To',
             items: [
-                'table_management',
+                {
+                    type: 'category',
+                    label: 'SQL',
+                    items: [
+                        'table_management',
+                        'procedures'
+                    ],
+                },
                 'writing_data',
                 'hoodie_deltastreamer',
                 'querying_data',