Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/06 21:02:26 UTC

[GitHub] [hudi] xushiyan commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

xushiyan commented on code in PR #6256:
URL: https://github.com/apache/hudi/pull/6256#discussion_r964163474


##########
rfc/rfc-51/rfc-51.md:
##########
@@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below
 
 Note: the illustration here ignores all the Hudi metadata columns like `_hoodie_commit_time` in `before` and `after` columns.
 
-## Goals
+## Design Goals
 
 1. Support row-level CDC records generation and persistence;
 2. Support both MOR and COW tables;
 3. Support all the write operations;
 4. Support Spark DataFrame/SQL/Streaming Query;
 
-## Implementation
+## Configurations
 
-### CDC Architecture
+| key                                                 | default  | description                                                                                                                                      |
+|-----------------------------------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------|
+| hoodie.table.cdc.enabled                            | `false`  | The master switch of the CDC features. If `true`, writers and readers will respect CDC configurations and behave accordingly.                    |
+| hoodie.table.cdc.supplemental.logging               | `false`  | If `true`, persist the required information about the changed data, including `before`. If `false`, only `op` and record keys will be persisted. |
+| hoodie.table.cdc.supplemental.logging.include_after | `false`  | If `true`, persist `after` as well.                                                                                                              |

Review Comment:
   > A detailed explanation of how they work for different use cases should be given.
   
   @danny0405 Yes. As shown above, only one config, `hoodie.table.cdc.supplemental.logging.mode`, is to be exposed, and its values, which indicate different levels of logging verbosity, should be explained in the config docs. By default, only `op` and the record key are logged, as the minimal logging mode. I've updated the PR accordingly.
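The reply replaces the two boolean flags in the table with a single mode config whose values set the logging verbosity. To make those levels concrete, here is a minimal Python sketch of which fields each level would persist for one changed record, gated by the `hoodie.table.cdc.enabled` master switch. The mode value strings (`op_key_only`, `data_before`, `data_before_after`), the `record_key` field name, and the helper functions are hypothetical illustrations mapped onto the table's descriptions. They are not Hudi's actual API or on-disk layout.

```python
from enum import Enum
from typing import Any, Dict, Optional

# Config keys as named in the diff and the review reply.
ENABLED_KEY = "hoodie.table.cdc.enabled"
MODE_KEY = "hoodie.table.cdc.supplemental.logging.mode"


class SupplementalLoggingMode(Enum):
    """Hypothetical verbosity levels, ordered from least to most verbose."""
    OP_KEY_ONLY = "op_key_only"              # default: only `op` and the record key
    DATA_BEFORE = "data_before"              # additionally persist the `before` image
    DATA_BEFORE_AFTER = "data_before_after"  # persist both `before` and `after`


def is_cdc_enabled(conf: Dict[str, str]) -> bool:
    """Master switch: CDC stays off unless explicitly set to "true"."""
    return conf.get(ENABLED_KEY, "false").strip().lower() == "true"


def mode_from_config(conf: Dict[str, str]) -> SupplementalLoggingMode:
    """Resolve the logging mode, falling back to the minimal default."""
    return SupplementalLoggingMode(
        conf.get(MODE_KEY, SupplementalLoggingMode.OP_KEY_ONLY.value)
    )


def cdc_log_entry(
    mode: SupplementalLoggingMode,
    op: str,
    record_key: str,
    before: Optional[Dict[str, Any]],
    after: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build the (hypothetical) CDC entry persisted for one changed record."""
    entry: Dict[str, Any] = {"op": op, "record_key": record_key}
    if mode in (SupplementalLoggingMode.DATA_BEFORE,
                SupplementalLoggingMode.DATA_BEFORE_AFTER):
        entry["before"] = before
    if mode is SupplementalLoggingMode.DATA_BEFORE_AFTER:
        entry["after"] = after
    return entry


if __name__ == "__main__":
    conf = {ENABLED_KEY: "true"}  # mode left unset, so the minimal default applies
    if is_cdc_enabled(conf):
        print(cdc_log_entry(mode_from_config(conf), "u", "id1",
                            {"id": "id1", "price": 10}, {"id": "id1", "price": 12}))
        # prints {'op': 'u', 'record_key': 'id1'}
```

Under the minimal default, a reader would have to look up the `before`/`after` images elsewhere at query time. The more verbose levels trade extra storage on write for cheaper CDC reads.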



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
