Posted to commits@inlong.apache.org by "duowan1520 (via GitHub)" <gi...@apache.org> on 2023/03/25 10:09:20 UTC

[GitHub] [inlong] duowan1520 opened a new pull request, #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

duowan1520 opened a new pull request, #7694:
URL: https://github.com/apache/inlong/pull/7694

   ### Prepare a Pull Request
   - Fixes #7693 
   
   ### Motivation
   
   MySQL/Oracle/PostgreSQL CDC Connector supports specifying field synchronization


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@inlong.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [inlong] EMsnap commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "EMsnap (via GitHub)" <gi...@apache.org>.
EMsnap commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1488008662

   BTW, we are planning to export DDL to the sink connector so that it can apply the same action, such as adding a column.
   What if the user filters out column A and then alters column A with a DDL statement? Will the DDL on column A be filtered by the parameter you provide?




[GitHub] [inlong] e-mhui commented on a diff in pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "e-mhui (via GitHub)" <gi...@apache.org>.
e-mhui commented on code in PR #7694:
URL: https://github.com/apache/inlong/pull/7694#discussion_r1160419451


##########
inlong-sort/sort-connectors/mysql-cdc/src/main/java/org/apache/inlong/sort/cdc/mysql/source/reader/MySqlRecordEmitter.java:
##########
@@ -112,7 +119,8 @@ public void emitRecord(SourceRecord element, SourceOutput<T> output, MySqlSplitS
             for (TableChange tableChange : changes) {
                 splitState.asBinlogSplitState().recordSchema(tableChange.getId(), tableChange);
                 if (includeSchemaChanges) {
-                    outputDdlElement(element, output, splitState, tableChange);
+                    TableChange newTableChange = ColumnFilterUtil.createTableChange(tableChange, columnNameFilter);

Review Comment:
   This is where schema changes are output, but you have applied a column filter here, which may affect the accuracy of the emitted schema changes.
   





[GitHub] [inlong] duowan1520 commented on a diff in pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on code in PR #7694:
URL: https://github.com/apache/inlong/pull/7694#discussion_r1160569657


##########
inlong-sort/sort-connectors/mysql-cdc/src/main/java/org/apache/inlong/sort/cdc/mysql/source/reader/MySqlRecordEmitter.java:
##########
@@ -112,7 +119,8 @@ public void emitRecord(SourceRecord element, SourceOutput<T> output, MySqlSplitS
             for (TableChange tableChange : changes) {
                 splitState.asBinlogSplitState().recordSchema(tableChange.getId(), tableChange);
                 if (includeSchemaChanges) {
-                    outputDdlElement(element, output, splitState, tableChange);
+                    TableChange newTableChange = ColumnFilterUtil.createTableChange(tableChange, columnNameFilter);

Review Comment:
   Hi. There is a premise here: when the task sets the `debezium.column.include.list` / `debezium.column.exclude.list` parameter, the task only cares about the specified columns. DML and DDL changes to the filtered-out columns need to be ignored.
   
   Therefore, when the columnFilter is applied here, the schema retains only the columns required by the task. The downstream can then use the new schema to filter out of the DDL statement the columns the task does not care about (todo).
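
The idea described above, a schema change that keeps only the columns accepted by the column filter, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ColumnFilterSketch`, `TableSchema`, `ColumnNameFilter` and `filterColumns` are hypothetical stand-ins and do not reproduce the Debezium `TableChange` type or the `ColumnFilterUtil.createTableChange` method shown in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the Debezium
// or InLong types (TableChange, ColumnFilterUtil) referenced in the diff.
final class ColumnFilterSketch {

    /** Decides whether a column of a given table is kept. */
    interface ColumnNameFilter {
        boolean matches(String tableId, String columnName);
    }

    /** Minimal immutable schema snapshot: a table id plus ordered column names. */
    static final class TableSchema {
        final String tableId;
        final List<String> columns;

        TableSchema(String tableId, List<String> columns) {
            this.tableId = tableId;
            this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        }
    }

    /** Returns a copy of the schema that keeps only the columns accepted by the filter. */
    static TableSchema filterColumns(TableSchema original, ColumnNameFilter filter) {
        if (filter == null) {
            return original; // no filter configured: pass the schema through unchanged
        }
        List<String> kept = new ArrayList<>();
        for (String column : original.columns) {
            if (filter.matches(original.tableId, column)) {
                kept.add(column);
            }
        }
        return new TableSchema(original.tableId, kept);
    }

    public static void main(String[] args) {
        TableSchema full = new TableSchema("inventory.orders", Arrays.asList("id", "amount", "note"));
        // Keep every column except "note", similar in spirit to an exclude list.
        ColumnNameFilter excludeNote = (table, column) -> !"note".equals(column);
        System.out.println(filterColumns(full, excludeNote).columns);
    }
}
```

Returning a new snapshot rather than mutating the original means the unfiltered schema can still be recorded in the split state, as the unchanged `recordSchema` call in the diff does, while only the filtered view is emitted.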





[GitHub] [inlong] dockerzhang commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "dockerzhang (via GitHub)" <gi...@apache.org>.
dockerzhang commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1484916360

   @duowan1520 pls rebase from the master branch to fix the conflicts.




[GitHub] [inlong] duowan1520 commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1493204314

   > BTW, we are planning to export DDL to the sink connector so that it can apply the same action, such as adding a column. What if the user filters out column A and then alters column A with a DDL statement? Will the DDL on column A be filtered by the parameter you provide?
   
   @EMsnap 
   The current field filtering works by comparing the physical data columns with the TableChange. When the record is a DDL record, the current method cannot tell whether it should be filtered.
   
   After reading the related export-DDL PR, I have an additional idea: pass `debezium.table.include.list`, `debezium.table.exclude.list`, `debezium.column.include.list`, `debezium.column.exclude.list` and similar parameters to MySqlRecordEmitter so that it can decide whether the current DDL record is sent downstream.
   
   What do you think of this implementation idea?
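
As a rough sketch of the idea proposed above, the decision "should this DDL record go downstream?" could be derived from the four include/exclude lists like this. Everything here is an assumption made for illustration: the class `DdlForwardingSketch` and its methods are hypothetical, the lists are treated as comma-separated regular expressions matched against fully-qualified names (`db.table` and `db.table.column`), and an include list is assumed to take precedence over an exclude list. How MySqlRecordEmitter would actually receive these parameters or learn which column a DDL touches is not settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from InLong or Debezium.
final class DdlForwardingSketch {

    private final List<Pattern> tableInclude;
    private final List<Pattern> tableExclude;
    private final List<Pattern> columnInclude;
    private final List<Pattern> columnExclude;

    DdlForwardingSketch(String tableInclude, String tableExclude,
                        String columnInclude, String columnExclude) {
        this.tableInclude = compile(tableInclude);
        this.tableExclude = compile(tableExclude);
        this.columnInclude = compile(columnInclude);
        this.columnExclude = compile(columnExclude);
    }

    /** Compiles a comma-separated list of regular expressions; null or blank yields an empty list. */
    static List<Pattern> compile(String commaSeparated) {
        List<Pattern> patterns = new ArrayList<>();
        if (commaSeparated == null || commaSeparated.trim().isEmpty()) {
            return patterns;
        }
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed, Pattern.CASE_INSENSITIVE));
            }
        }
        return patterns;
    }

    private static boolean matchesAny(List<Pattern> patterns, String name) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Assumption: a non-empty include list wins; otherwise the exclude list applies. */
    private static boolean isSelected(String name, List<Pattern> include, List<Pattern> exclude) {
        if (!include.isEmpty()) {
            return matchesAny(include, name);
        }
        return !matchesAny(exclude, name);
    }

    /**
     * Decides whether a DDL record is forwarded.
     * alteredColumn may be null for table-level DDL that touches no single column.
     */
    boolean shouldForwardDdl(String tableId, String alteredColumn) {
        if (!isSelected(tableId, tableInclude, tableExclude)) {
            return false; // the whole table is filtered out
        }
        if (alteredColumn == null) {
            return true; // table-level change on a selected table
        }
        return isSelected(tableId + "." + alteredColumn, columnInclude, columnExclude);
    }

    public static void main(String[] args) {
        DdlForwardingSketch decision = new DdlForwardingSketch(
                "inventory\\.orders", null, null, "inventory\\.orders\\.note");
        System.out.println(decision.shouldForwardDdl("inventory.orders", "note"));
        System.out.println(decision.shouldForwardDdl("inventory.orders", "amount"));
        System.out.println(decision.shouldForwardDdl("inventory.audit", "id"));
    }
}
```

The sketch checks the table before the column so that a DDL on a filtered-out table is dropped without ever consulting the column lists.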




[GitHub] [inlong] EMsnap merged pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "EMsnap (via GitHub)" <gi...@apache.org>.
EMsnap merged PR #7694:
URL: https://github.com/apache/inlong/pull/7694




[GitHub] [inlong] duowan1520 commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1488567249

   > BTW, we are planning to export DDL to the sink connector so that it can apply the same action, such as adding a column. What if the user filters out column A and then alters column A with a DDL statement? Will the DDL on column A be filtered by the parameter you provide?
   
   The DDL statement seems to be generated before the getCanalData() method. I have decided to wait until your PR is merged before proceeding.




[GitHub] [inlong] duowan1520 commented on a diff in pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on code in PR #7694:
URL: https://github.com/apache/inlong/pull/7694#discussion_r1160569657


##########
inlong-sort/sort-connectors/mysql-cdc/src/main/java/org/apache/inlong/sort/cdc/mysql/source/reader/MySqlRecordEmitter.java:
##########
@@ -112,7 +119,8 @@ public void emitRecord(SourceRecord element, SourceOutput<T> output, MySqlSplitS
             for (TableChange tableChange : changes) {
                 splitState.asBinlogSplitState().recordSchema(tableChange.getId(), tableChange);
                 if (includeSchemaChanges) {
-                    outputDdlElement(element, output, splitState, tableChange);
+                    TableChange newTableChange = ColumnFilterUtil.createTableChange(tableChange, columnNameFilter);

Review Comment:
   There is a premise here: when the task sets the `debezium.column.include.list` / `debezium.column.exclude.list` parameter, the task only cares about the specified columns. DML and DDL changes to the filtered-out columns need to be ignored.
   
   Therefore, when the columnFilter is applied here, the schema retains only the columns required by the task. The downstream can then use the new schema to filter out of the DDL statement the columns the task does not care about (todo).





[GitHub] [inlong] duowan1520 commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1489551073

   > BTW, we are planning to export DDL to the sink connector so that it can apply the same action, such as adding a column. What if the user filters out column A and then alters column A with a DDL statement? Will the DDL on column A be filtered by the parameter you provide?
   
   I will get back to you once I understand the PR related to exporting DDL.




[GitHub] [inlong] EMsnap commented on pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "EMsnap (via GitHub)" <gi...@apache.org>.
EMsnap commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1499873340

   @emhui PTAL at this PR; it may be related to https://github.com/apache/inlong/pull/7750




[GitHub] [inlong] duowan1520 commented on a diff in pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on code in PR #7694:
URL: https://github.com/apache/inlong/pull/7694#discussion_r1160569657


##########
inlong-sort/sort-connectors/mysql-cdc/src/main/java/org/apache/inlong/sort/cdc/mysql/source/reader/MySqlRecordEmitter.java:
##########
@@ -112,7 +119,8 @@ public void emitRecord(SourceRecord element, SourceOutput<T> output, MySqlSplitS
             for (TableChange tableChange : changes) {
                 splitState.asBinlogSplitState().recordSchema(tableChange.getId(), tableChange);
                 if (includeSchemaChanges) {
-                    outputDdlElement(element, output, splitState, tableChange);
+                    TableChange newTableChange = ColumnFilterUtil.createTableChange(tableChange, columnNameFilter);

Review Comment:
   Hi. There is a premise here: when the task sets the `debezium.column.include.list` / `debezium.column.exclude.list` parameter, the task only cares about the specified columns. DML and DDL changes to the filtered-out columns need to be ignored.
   
   Therefore, when the columnFilter is applied here, the schema is expected to retain only the columns required by the task. The downstream can then use the new schema to filter out of the DDL statement the columns the task does not care about (todo).





[GitHub] [inlong] duowan1520 commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1486064655

   > @duowan1520 pls rebase from the master branch to fix the conflicts.
   
   ok.




[GitHub] [inlong] duowan1520 commented on pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1498379068

   > BTW, we are planning to export DDL to the sink connector so that it can apply the same action, such as adding a column. What if the user filters out column A and then alters column A with a DDL statement? Will the DDL on column A be filtered by the parameter you provide?
   
   Hi, the implementation has been adjusted. The columns that need to be filtered are now removed from the TableChange before debeziumDeserializationSchema deserializes it, so the downstream DDL handling can implement the corresponding filtering based on the TableChange.
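
One way the downstream could use the filtered TableChange, sketched here purely as an assumption, is to compare the filtered view of the schema before and after a DDL statement and forward the DDL only when that view changed. This addresses the "filter out column A, then alter column A" scenario from the question above. `FilteredSchemaDiffSketch` and its methods are hypothetical names; a schema is modelled simply as a map from column name to type, which is far simpler than a real TableChange.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: a schema is modelled as column name -> column type.
final class FilteredSchemaDiffSketch {

    /** Returns only the columns the task cares about, preserving their order. */
    static Map<String, String> visibleColumns(Map<String, String> columns, Predicate<String> keep) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : columns.entrySet()) {
            if (keep.test(entry.getKey())) {
                visible.put(entry.getKey(), entry.getValue());
            }
        }
        return visible;
    }

    /**
     * A DDL is relevant downstream only if the filtered schema differs before and after it.
     * Note: Map.equals ignores ordering, so a pure column reorder is not detected here.
     */
    static boolean changesVisibleSchema(Map<String, String> before,
                                        Map<String, String> after,
                                        Predicate<String> keep) {
        return !visibleColumns(before, keep).equals(visibleColumns(after, keep));
    }

    public static void main(String[] args) {
        Predicate<String> keep = column -> !"a".equals(column); // column "a" is filtered out

        Map<String, String> before = new LinkedHashMap<>();
        before.put("id", "BIGINT");
        before.put("a", "VARCHAR(10)");

        Map<String, String> alteredA = new LinkedHashMap<>(before);
        alteredA.put("a", "TEXT"); // ALTER on the filtered-out column

        Map<String, String> addedB = new LinkedHashMap<>(before);
        addedB.put("b", "INT"); // ADD COLUMN on a column that is kept

        System.out.println(changesVisibleSchema(before, alteredA, keep));
        System.out.println(changesVisibleSchema(before, addedB, keep));
    }
}
```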




[GitHub] [inlong] duowan1520 commented on a diff in pull request #7694: [INLONG-7693][Sort] MySQL CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on code in PR #7694:
URL: https://github.com/apache/inlong/pull/7694#discussion_r1160569657


##########
inlong-sort/sort-connectors/mysql-cdc/src/main/java/org/apache/inlong/sort/cdc/mysql/source/reader/MySqlRecordEmitter.java:
##########
@@ -112,7 +119,8 @@ public void emitRecord(SourceRecord element, SourceOutput<T> output, MySqlSplitS
             for (TableChange tableChange : changes) {
                 splitState.asBinlogSplitState().recordSchema(tableChange.getId(), tableChange);
                 if (includeSchemaChanges) {
-                    outputDdlElement(element, output, splitState, tableChange);
+                    TableChange newTableChange = ColumnFilterUtil.createTableChange(tableChange, columnNameFilter);

Review Comment:
   Hi. There is a premise here: when the task sets the `debezium.column.include.list` / `debezium.column.exclude.list` parameter, the task only cares about the specified columns. DML and DDL changes to the filtered-out columns need to be ignored.
   
   Therefore, when the columnFilter is applied here, the schema is expected to retain only the columns required by the task. The downstream can then use the new schema to filter out of the DDL statement the columns the task does not care about (todo).





[GitHub] [inlong] EMsnap commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "EMsnap (via GitHub)" <gi...@apache.org>.
EMsnap commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1484748721

   Can you add a Test for this case?




[GitHub] [inlong] duowan1520 commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "duowan1520 (via GitHub)" <gi...@apache.org>.
duowan1520 commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1484768150

   > Can you add a Test for this case?
   
   OK, I will do it.
   




[GitHub] [inlong] EMsnap commented on pull request #7694: [INLONG-7693][Sort] CDC Connector supports specifying field synchronization

Posted by "EMsnap (via GitHub)" <gi...@apache.org>.
EMsnap commented on PR #7694:
URL: https://github.com/apache/inlong/pull/7694#issuecomment-1488005320

   pls update the doc at the same time thx

