Posted to commits@seatunnel.apache.org by "VincentSleepless (via GitHub)" <gi...@apache.org> on 2023/05/10 07:04:05 UTC

[GitHub] [incubator-seatunnel] VincentSleepless opened a new issue, #4729: [Feature][Connector-V2 File] connector-file-oss write mode optimization

VincentSleepless opened a new issue, #4729:
URL: https://github.com/apache/incubator-seatunnel/issues/4729

   ### Search before asking
   
   - [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   
   
   [Feature][Connector-V2 File] connector-file-oss write mode optimization
   
   In the current connector design, all file connectors go through the Hadoop FileSystem abstraction to read from and write to the specific storage (S3, OSS, local file, FTP, HDFS).
   
   When a sink writes data to certain object stores, such as S3 or Aliyun OSS, the default policy is to buffer temporary data in local files to save memory, but this can cause problems:
   1. The speed of every write task becomes bound by disk I/O performance, especially when integrating large files.
   2. Some source connectors checkpoint per split: when a task finishes a split, it triggers a checkpoint, and the file sink then uploads the local temp file to a temp directory in the storage and renames it to the configured path. The total size of the temp files is unpredictable.
   
   When a sink writes data to AWS S3, hadoop-aws can be configured to buffer data on local disk or in memory, so these problems can be avoided.
   https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
   
   ![image](https://github.com/apache/incubator-seatunnel/assets/15063109/f3e4eceb-b5f4-41f8-8546-e20be65b5842)
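
For illustration, the S3A buffering choice is made through Hadoop properties such as `fs.s3a.fast.upload.buffer` (`disk`, `array`, or `bytebuffer`). The sketch below only renders such properties as a `core-site.xml` style fragment; the helper name `render_hadoop_properties` and the chosen values are assumptions for the example, not recommended settings.

```python
import xml.etree.ElementTree as ET

# Illustrative values only: buffer upload blocks in off-heap memory
# instead of local disk. Property names come from the hadoop-aws docs;
# the sizes are assumptions and should be tuned per job.
S3A_MEMORY_BUFFER = {
    "fs.s3a.fast.upload.buffer": "bytebuffer",   # or "disk" / "array"
    "fs.s3a.multipart.size": "64M",              # size of each buffered block
    "fs.s3a.fast.upload.active.blocks": "4",     # blocks queued per stream
}


def render_hadoop_properties(props):
    """Render a dict of Hadoop properties as a <configuration> XML string.

    Hypothetical helper for this example; not part of Hadoop or SeaTunnel.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

With in-memory buffering, the memory held per output stream grows roughly with the block size times the number of active blocks, so these two values need to be sized together.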
   
   
   But when a sink writes data to Aliyun OSS, hadoop-aliyun has no option like the one in hadoop-aws; all data is buffered on local disk.
   https://hadoop.apache.org/docs/stable/hadoop-aliyun/tools/hadoop-aliyun/index.html
   
   ![image](https://github.com/apache/incubator-seatunnel/assets/15063109/d0798d5a-5a0e-4978-989d-aee71b7514e5)
   
   
   This problem can also occur in the engine's checkpoint storage and IMap storage.
   
   The best approach for data integration would be to buffer data in memory, which avoids the disk I/O performance problem. Could we abstract a common method for input/output streams in the file connectors, so that a particular storage can use its own SDK to read and write? Or should we try to support a buffer mode in hadoop-aliyun?
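
A minimal sketch of what such an abstraction could look like, assuming a writer that collects data into fixed-size blocks and hands each full block to a storage-specific upload callback. The class name `BlockBufferedWriter`, the `upload_part` callback, and the `buffer_mode` switch are all hypothetical; this is not the SeaTunnel or hadoop-aliyun API, only an illustration of memory versus disk buffering behind one interface.

```python
import io
import tempfile


class BlockBufferedWriter:
    """Collect writes into blocks and pass each block to `upload_part`.

    `upload_part(part_number, data)` stands in for a storage SDK call
    (for example a multipart-upload part request); it is an assumption.
    """

    def __init__(self, upload_part, block_size, buffer_mode="memory"):
        if buffer_mode not in ("memory", "disk"):
            raise ValueError("buffer_mode must be 'memory' or 'disk'")
        self._upload_part = upload_part
        self._block_size = block_size
        self._buffer_mode = buffer_mode
        self._buffer = self._new_buffer()
        self._buffered = 0       # bytes currently held in the buffer
        self._part_number = 0    # last part number handed to upload_part

    def _new_buffer(self):
        # Memory mode avoids disk I/O; disk mode bounds memory use.
        if self._buffer_mode == "memory":
            return io.BytesIO()
        return tempfile.TemporaryFile()

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:
            room = self._block_size - self._buffered
            chunk = view[:room]
            self._buffer.write(chunk)
            self._buffered += len(chunk)
            view = view[len(chunk):]
            if self._buffered == self._block_size:
                self._flush_block()

    def _flush_block(self):
        if self._buffered == 0:
            return
        self._buffer.seek(0)
        self._part_number += 1
        self._upload_part(self._part_number, self._buffer.read())
        self._buffer.close()
        self._buffer = self._new_buffer()
        self._buffered = 0

    def close(self):
        # Upload the final, possibly short, block and release the buffer.
        self._flush_block()
        self._buffer.close()
```

A caller would pass a callback that wraps the storage SDK and choose `buffer_mode` per job; the sink code itself stays the same in both modes, which is the point of the abstraction discussed above.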
   
   welcome to discuss~
   
   ### Usage Scenario
   
   connector-file-oss
   engine checkpoint-storage
   engine imap-storage
   
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] github-actions[bot] commented on issue #4729: [Feature][Connector-V2 File] connector-file-oss write mode optimization

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4729:
URL: https://github.com/apache/seatunnel/issues/4729#issuecomment-1586398356

   This issue has been automatically marked as stale because it has not had any activity in the last 30 days. It will be closed in the next 7 days if no further activity occurs.




[GitHub] [incubator-seatunnel] TyrantLucifer commented on issue #4729: [Feature][Connector-V2 File] connector-file-oss write mode optimization

Posted by "TyrantLucifer (via GitHub)" <gi...@apache.org>.
TyrantLucifer commented on issue #4729:
URL: https://github.com/apache/incubator-seatunnel/issues/4729#issuecomment-1544976934

   IMO, aliyun-oss already has a parameter to configure buffering, so this concern is unfounded. For more details, you can refer to the Aliyun Jindo SDK.




[GitHub] [seatunnel] github-actions[bot] closed issue #4729: [Feature][Connector-V2 File] connector-file-oss write mode optimization

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #4729: [Feature][Connector-V2 File] connector-file-oss write mode optimization
URL: https://github.com/apache/seatunnel/issues/4729




[GitHub] [seatunnel] github-actions[bot] commented on issue #4729: [Feature][Connector-V2 File] connector-file-oss write mode optimization

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4729:
URL: https://github.com/apache/seatunnel/issues/4729#issuecomment-1617039213

   This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.




[GitHub] [incubator-seatunnel] VincentSleepless commented on issue #4729: [Feature][Connector-V2 File] connector-file-oss write mode optimization

Posted by "VincentSleepless (via GitHub)" <gi...@apache.org>.
VincentSleepless commented on issue #4729:
URL: https://github.com/apache/incubator-seatunnel/issues/4729#issuecomment-1542290934

   
   The latest hadoop-aliyun code on the trunk branch supports buffering on disk or in memory:
   
   https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java
   
   ![image](https://github.com/apache/incubator-seatunnel/assets/15063109/77152133-cc62-4fde-8049-6b79acc3c566)
   
   
   We can try upgrading the hadoop-aliyun version to solve this problem once Apache Hadoop publishes its next release ~
   

