Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/27 09:12:21 UTC

[GitHub] [doris] Im-Manshushu opened a new issue, #11258: [Feature] JSON data is dynamically written to the Doris table

Im-Manshushu opened a new issue, #11258:
URL: https://github.com/apache/doris/issues/11258

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   The Flink connector should automatically write each record into the corresponding table by parsing the database name and table name embedded in the JSON data.
   
   
   
   ### Use case
   
   Data is collected in real time and consumed from Kafka through Flink. A single topic contains data for multiple tables in JSON format, and each record includes the database name, table name, field names, and field values.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Wilson-BT commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Wilson-BT commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198066436

   
   
   
   
   > > Yes, encapsulate such a data structure, dynamically load by spelling the URL on the sink side, and add a keyby operator before sink
   > 
   > At present, flink-doris-connector initiates the stream load of the table when the flink task starts, instead of doing the stream load when the upstream data is received. How to do this dynamic stream load? Please describe your design in detail~
   
   Many users put the canal logs of all tables in a business database into a single topic, which must be split before doris-flink-connector can consume it. The idea is to write one Flink task that synchronizes the entire database. Currently doris-flink-connector uses an HTTP input stream: each checkpoint opens one stream, and that stream is strongly bound to a single stream load URL. In this case we can only cache data on the Flink side: each table gets its own buffer bound to its table-specific stream load URL, and a threshold such as row count or batch size triggers the submit, just like doris-datax-writer.
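   The per-table buffering described above can be sketched in plain Java. The class and method names here (TableBufferMap, flushThreshold) are illustrative, not the connector's actual API:

```java
import java.util.*;

// Per-table buffering sketch: rows accumulate under a {db}_{table} key and the
// whole batch is handed to the stream load submitter once a row-count threshold
// is reached. Names are illustrative, not the connector's real API.
public class TableBufferMap {
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final int flushThreshold;

    public TableBufferMap(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Adds a JSON row; returns the flushed batch when the table's buffer is full, else null. */
    public List<String> add(String db, String table, String jsonRow) {
        String key = db + "_" + table;
        List<String> buf = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buf.add(jsonRow);
        if (buf.size() >= flushThreshold) {
            return buffers.remove(key); // batch goes to this table's stream load URL
        }
        return null;
    }
}
```

   A batch-size threshold (bytes) or a timer flush could be layered on in the same way; row count alone is shown for brevity.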
   




[GitHub] [doris] JNSimba commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
JNSimba commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1197622739

   > Customize the sink class, extends richsinkfunction and implement checkpointedfunction. In invoke dynamically sink through the properties in the custom entity object. The properties of the custom entity object are databasename, tablename, List<String>. If there is a need to dynamically insert multi table data, the user only needs to encapsulate this entity object, I'm not good at English. I don't know whether I can express it clearly
   
   So the idea is: encapsulate a data structure like <db, table, data>, substitute the corresponding URL at stream load time, and then stream load each table's data in turn on the sink side?
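   A minimal sketch of such a <db, table, data> envelope and the per-table URL it maps to. The class name DorisRecord and the method streamLoadUrl are hypothetical; the /api/{db}/{table}/_stream_load path is Doris's stream load endpoint:

```java
import java.util.List;

// Hypothetical <db, table, data> envelope: each record carries its own routing
// information, so the sink can build the stream load URL per table.
public class DorisRecord {
    public final String database;
    public final String table;
    public final List<String> rows; // JSON rows destined for database.table

    public DorisRecord(String database, String table, List<String> rows) {
        this.database = database;
        this.table = table;
        this.rows = rows;
    }

    /** Builds the stream load URL for this record's target table. */
    public String streamLoadUrl(String feHostPort) {
        return "http://" + feHostPort + "/api/" + database + "/" + table + "/_stream_load";
    }
}
```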




[GitHub] [doris] JNSimba commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
JNSimba commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198849851

   > Many users put all the canal logs of all tables in the business library into one topic, which needs to be distributed before they can use doris-flink-connector. His idea is to edit a task to synchronize the entire library. Because currently doris-flink-connector uses http inputstream, that is, a checkpoint opens a stream, and a streamLoad url is strongly bound. Therefore, the current doris-flink-connector architecture is not suitable for the entire library synchronization, because it will involve too many http long link. In this case, we can only use the old streamload batch mode: the flink side caches data, then a table generates a buffer, and binds the corresponding table-streamload-url, and sets a threshold, such as rows number or batch size to submit tasks, just like doris-datax-writer.
   
   However, the old batch-write mode of stream load may have several problems:
   1. A series of problems caused by an unreasonable cached batch size: if it is too small, frequent imports trigger the -235 error (too many tablet versions); if it is too large, Flink memory comes under pressure.
   2. It does not guarantee exactly-once semantics.
   
   




[GitHub] [doris] Im-Manshushu commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Im-Manshushu commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198922608

   Once the feature is implemented, a reference threshold can be documented for users, who can then tune the sink parallelism and checkpoint interval according to their data volume and latency requirements.
   
   
   
   
   
   
   




[GitHub] [doris] Wilson-BT commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Wilson-BT commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198067658

   > > Yes, encapsulate such a data structure, dynamically load by spelling the URL on the sink side, and add a keyby operator before sink
   > 
   > At present, flink-doris-connector initiates the stream load of the table when the flink task starts, instead of doing the stream load when the upstream data is received. How to do this dynamic stream load? Please describe your design in detail~
   
   Many users put the canal logs of all tables in a business database into a single topic, which must be split before doris-flink-connector can consume it. The idea is to write one Flink task that synchronizes the entire database. Currently doris-flink-connector uses an HTTP input stream: each checkpoint opens one stream, and that stream is strongly bound to a single stream load URL. In this case we can only cache data on the Flink side: each table gets its own buffer bound to its table-specific stream load URL, and a threshold such as row count or batch size triggers the submit, just like doris-datax-writer.




[GitHub] [doris] Im-Manshushu commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Im-Manshushu commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1197626618

   Yes: encapsulate such a data structure, build the stream load URL dynamically on the sink side, and add a keyBy operator before the sink.
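   What the keyBy buys the sink can be illustrated without Flink: records sharing a {db}_{table} key always land on the same parallel subtask, so each subtask owns complete buffers for its tables. Flink's real partitioning uses key-group hashing; the modulo below is a simplified stand-in, and the class name is hypothetical:

```java
// Simplified model of keyBy routing: same key -> same sink subtask, so one
// subtask holds the complete buffer for any given table.
public class TableKeySelector {
    /** The routing key: {db}_{table}, as in the proposed buffer map. */
    public static String key(String db, String table) {
        return db + "_" + table;
    }

    /** Stand-in for hash partitioning across sink subtasks (always deterministic). */
    public static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }
}
```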
   
   
   




[GitHub] [doris] Wilson-BT commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Wilson-BT commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198067162

   Many users put the canal logs of all tables in a business database into a single topic, which must be split before doris-flink-connector can consume it. The idea is to write one Flink task that synchronizes the entire database. Currently doris-flink-connector uses an HTTP input stream: each checkpoint opens one stream, and that stream is strongly bound to a single stream load URL. In this case we can only cache data on the Flink side: each table gets its own buffer bound to its table-specific stream load URL, and a threshold such as row count or batch size triggers the submit, just like doris-datax-writer.




[GitHub] [doris] Im-Manshushu commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Im-Manshushu commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1197588589

   Customize the sink class: extend RichSinkFunction and implement CheckpointedFunction. In invoke(), sink dynamically according to the properties of a custom entity object, whose properties are databaseName, tableName, and List<String> data. If there is a need to dynamically insert multi-table data, the user only needs to populate this entity object. (I'm not good at English; I'm not sure whether I expressed this clearly.)
   
   
   
   




[GitHub] [doris] Im-Manshushu commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Im-Manshushu commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198096957

   Yes, that's it: one Flink task can synchronize the data of the entire database.
   
   
   
   
   
   
   




[GitHub] [doris] Wilson-BT commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Wilson-BT commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198078387

   canal json ---> serialize to RowData ---> TableBufferMap<String, Buffer<String>>
   key is ```{db}_{table}```; each buffer holds rows like ```{"column_a":"value_a","column_b":"value_b", ...}```
   we can submit to http://xxx:xx/api/{db}/{table}/_stream_load when the buffer exceeds its size threshold, like doris-datax-writer.
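   The routing step of this pipeline, extracting the {db}_{table} key from a canal-JSON envelope, might look like the sketch below. A real implementation would use a proper JSON parser such as Jackson; the regex here is only illustrative, and the class name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Routing sketch: canal's JSON envelope carries "database" and "table" fields,
// which together form the buffer key used by the per-table buffer map.
public class CanalRouter {
    private static final Pattern DB = Pattern.compile("\"database\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern TBL = Pattern.compile("\"table\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the {db}_{table} buffer key for a canal-JSON message. */
    public static String bufferKey(String canalJson) {
        Matcher d = DB.matcher(canalJson);
        Matcher t = TBL.matcher(canalJson);
        if (d.find() && t.find()) {
            return d.group(1) + "_" + t.group(1);
        }
        throw new IllegalArgumentException("not a canal envelope: missing database/table");
    }
}
```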
   
   
   
   




[GitHub] [doris] Wilson-BT commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Wilson-BT commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1208139420

   Perhaps Doris could provide another HTTP interface for database-level sync: a special header ```-H 'table:xxx'``` would route each batch to the right table within the same database, and we could also reuse a single URL.
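   Such a database-level endpoint does not exist in Doris today; it is only the suggestion above. Assuming a hypothetical path like /api/{db}/_db_stream_load, a request with the proposed table header might be built as follows:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch of the *proposed* database-level interface: one URL per database, with
// a 'table' header routing each batch. Neither the path nor the header is a
// real Doris API today; both are the suggestion being discussed.
public class DbSyncRequest {
    public static HttpRequest build(String feHostPort, String db, String table, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + feHostPort + "/api/" + db + "/_db_stream_load"))
                .header("table", table) // routes the batch inside the database
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

   With this shape the sink would keep one HTTP endpoint per database instead of one per table, at the cost of a new server-side routing feature.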




[GitHub] [doris] Im-Manshushu commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Im-Manshushu commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198094741

   What you said is exactly right; that is my idea. My previous answer may not have described it clearly.
   
   




[GitHub] [doris] stalary commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
stalary commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1197553031

   Can you provide a brief description of your design?




[GitHub] [doris] JNSimba commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
JNSimba commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1197739788

   > Yes, encapsulate such a data structure, dynamically load by spelling the URL on the sink side, and add a keyby operator before sink
   
   At present, flink-doris-connector initiates the table's stream load when the Flink task starts, not when upstream data arrives. How would this dynamic stream load be done? Please describe your design in detail.




[GitHub] [doris] Xff686542 commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Xff686542 commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1208119148

   > Yes, encapsulate such a data structure, dynamically load by spelling the URL on the sink side, and add a keyby operator before sink
   
   That's right, we've done that for now




[GitHub] [doris] Im-Manshushu commented on issue #11258: [Feature] JSON data is dynamically written to the Doris table

Posted by GitBox <gi...@apache.org>.
Im-Manshushu commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1201967334

   > > Many users put all the canal logs of all tables in the business library into one topic, which needs to be distributed before they can use doris-flink-connector. His idea is to edit a task to synchronize the entire library. Because currently doris-flink-connector uses http inputstream, that is, a checkpoint opens a stream, and a streamLoad url is strongly bound. Therefore, the current doris-flink-connector architecture is not suitable for the entire library synchronization, because it will involve too many http long link. In this case, we can only use the old streamload batch mode: the flink side caches data, then a table generates a buffer, and binds the corresponding table-streamload-url, and sets a threshold, such as rows number or batch size to submit tasks, just like doris-datax-writer.
   > 
   > However, in the old version of stream load and batch writing, there may be several problems:
   > 
   > 1. A series of problems caused by the unreasonable setting of the cached batch size: For example, if it is too small, it will cause the -235 problem caused by frequent imports; if the setting is too large, the flink memory will be under pressure.
   > 2. And does not guarantee exactly-once semantics
   
   So will this capability of dynamically writing to Doris tables be added in a future version of flink-doris-connector? If so, in which version?

