Posted to commits@inlong.apache.org by "liaorui (via GitHub)" <gi...@apache.org> on 2023/03/20 08:16:32 UTC

[GitHub] [inlong-website] liaorui opened a new pull request, #726: [INLONG-725][Doc]complete dirty data archiving config for Doris connector

liaorui opened a new pull request, #726:
URL: https://github.com/apache/inlong-website/pull/726

   ### Prepare a Pull Request
   
   - Fixes #725
   
   ### Motivation
   
   The Doris connector has several dirty data archiving options, such as `dirty.ignore`, `dirty.side-output.connector`, and `dirty.side-output.labels`. This PR documents how to use them.
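
   As a rough illustration of where these options would sit, a Flink SQL sink definition might look like the sketch below. Only the three `dirty.*` option names come from this PR. The table name, columns, connector identifier, connection options, and every value are placeholder assumptions and should be checked against the merged Doris connector documentation.

   ```sql
   -- Hypothetical sink table: all names and values below are placeholders.
   CREATE TABLE doris_sink (
       id INT,
       name STRING
   ) WITH (
       'connector' = 'doris-inlong',                 -- assumed connector identifier
       'fenodes' = 'localhost:8030',                 -- assumed connection option
       'table.identifier' = 'test_db.test_table',    -- assumed connection option
       'username' = 'root',                          -- assumed connection option
       'password' = '',                              -- assumed connection option
       'dirty.ignore' = 'true',                      -- option named in this PR
       'dirty.side-output.connector' = '<archive-connector>',  -- option named in this PR, value not given here
       'dirty.side-output.labels' = '<labels>'       -- option named in this PR, value format not given here
   );
   ```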
   
   
   
   ### Modifications
   
   Updated the English and Chinese documentation for the Doris connector.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@inlong.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [inlong-website] yunqingmoswu commented on a diff in pull request #726: [INLONG-725][Doc] Dirty data archiving options description for Doris connector

Posted by "yunqingmoswu (via GitHub)" <gi...@apache.org>.
yunqingmoswu commented on code in PR #726:
URL: https://github.com/apache/inlong-website/pull/726#discussion_r1142836311


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/doris.md:
##########
@@ -294,7 +294,7 @@ TODO: 将在未来支持此功能。
 | sink.batch.size                   | 可选   | 10000             | int      | 单次写 BE 的最大行数                                                                                                                                                                                                                                                                                           |
 | sink.max-retries                  | 可选   | 1                 | int      | 写 BE 失败之后的重试次数                                                                                                                                                                                                                                                                                         |
 | sink.batch.interval               | 可选   | 10s               | string   | Flush 间隔时间,超过该时间后异步线程将缓存中数据写入 BE。 默认值为10秒,支持时间单位 ms、s、min、h和d。设置为0表示关闭定期写入。                                                                                                                                                                                                                            |
-| sink.properties.*                 | 可选   | (none)            | string   | Stream load 的导入参数<br /><br />例如:<br />'sink.properties.column_separator' = ', '<br />定义列分隔符<br /><br />'sink.properties.escape_delimiters' = 'true'<br />特殊字符作为分隔符,'\\x01'会被转换为二进制的0x01<br /><br /> 'sink.properties.format' = 'json'<br />'sink.properties.strip_outer_array' = 'true' <br />JSON格式导入 |
+| sink.properties.*                 | 可选   | (none)            | string   | Stream load 的导入参数<br /><br />例如:<br />'sink.properties.column_separator' = ', '<br />定义列分隔符<br /><br />'sink.properties.escape_delimiters' = 'true'<br />特殊字符作为分隔符,'\\x01'会被转换为二进制的0x01<br /><br /> 'sink.properties.format' = 'json'<br />'sink.properties.strip_outer_array' = 'true' <br />JSON格式导入<br /><br /> 'sink.properties.format' = 'csv'<br />CSV格式导入 |

Review Comment:
   Add a space between Chinese and English text.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/doris.md:
##########
@@ -303,6 +303,21 @@ TODO: 将在未来支持此功能。
 | sink.multiple.table-pattern       | 可选   | (none)            | string   | 多表写入时,从源端二进制数据中按照 `sink.multiple.table-pattern` 指定名称提取写入的表名。 `sink.multiple.enable` 为true时有效。                         |
 | sink.multiple.ignore-single-table-errors | 可选 | true         | boolean  | 多表写入时,是否忽略某个表写入失败。为 `true` 时,如果某个表写入异常,则不写入该表数据,其他表的数据正常写入。为 `false` 时,如果某个表写入异常,则所有表均停止写入。     |
 | inlong.metric.labels | 可选 | (none) | String | inlong metric 的标签值,该值的构成为groupId=`{groupId}`&streamId=`{streamId}`&nodeId=`{nodeId}`。|
+| sink.multiple.schema-update.policy | 可选 | (none) | string | 往doris表同步数据时,如果doris表不存在或字段长度超过限制,doris服务器会抛出异常。<br /><br /> 当该属性设置为`THROW_WITH_STOP`,异常会向上抛给Flink框架。Flink框架会自动重启任务,尝试恢复。<br /><br /> 当该属性设置为`STOP_PARTIAL`时,doris connector会忽略该表的写入,新数据不再往该表写入,其它表则正常同步。<br /><br /> 当该属性设置为`LOG_WITH_IGNORE`时,异常会打印到日志中,不会向上抛出。后续新数据到来时,继续尝试往该表写入。 |

Review Comment:
   Add spaces between Chinese and English text, and capitalize the first letter.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/doris.md:
##########
@@ -303,6 +303,21 @@ TODO: 将在未来支持此功能。
 | sink.multiple.table-pattern       | 可选   | (none)            | string   | 多表写入时,从源端二进制数据中按照 `sink.multiple.table-pattern` 指定名称提取写入的表名。 `sink.multiple.enable` 为true时有效。                         |
 | sink.multiple.ignore-single-table-errors | 可选 | true         | boolean  | 多表写入时,是否忽略某个表写入失败。为 `true` 时,如果某个表写入异常,则不写入该表数据,其他表的数据正常写入。为 `false` 时,如果某个表写入异常,则所有表均停止写入。     |
 | inlong.metric.labels | 可选 | (none) | String | inlong metric 的标签值,该值的构成为groupId=`{groupId}`&streamId=`{streamId}`&nodeId=`{nodeId}`。|
+| sink.multiple.schema-update.policy | 可选 | (none) | string | 往doris表同步数据时,如果doris表不存在或字段长度超过限制,doris服务器会抛出异常。<br /><br /> 当该属性设置为`THROW_WITH_STOP`,异常会向上抛给Flink框架。Flink框架会自动重启任务,尝试恢复。<br /><br /> 当该属性设置为`STOP_PARTIAL`时,doris connector会忽略该表的写入,新数据不再往该表写入,其它表则正常同步。<br /><br /> 当该属性设置为`LOG_WITH_IGNORE`时,异常会打印到日志中,不会向上抛出。后续新数据到来时,继续尝试往该表写入。 |
+| dirty.ignore | 可选 | (none)| boolean | 往doris表同步数据时,如果遇到错误和异常,通过该变量可以控制是否忽略脏数据。如果设置为`false`,则忽略脏数据,不归档。如果为`true`,则根据其它的`dirty.side-output.*`的配置决定如何归档数据。 |

Review Comment:
   Add spaces between Chinese and English text.





[GitHub] [inlong-website] dockerzhang merged pull request #726: [INLONG-725][Doc] Dirty data archiving options description for Doris connector

Posted by "dockerzhang (via GitHub)" <gi...@apache.org>.
dockerzhang merged PR #726:
URL: https://github.com/apache/inlong-website/pull/726

