Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/07/02 08:21:27 UTC

[GitHub] [dolphinscheduler] jieguangzhou opened a new issue, #10738: [Enhancement][Task Plugin] Allows file transfer between tasks

jieguangzhou opened a new issue, #10738:
URL: https://github.com/apache/dolphinscheduler/issues/10738

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
   ### Description
   
   DolphinScheduler allows parameter transfer between tasks: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/parameter/context.html
   
   However, it does not allow file transfer between tasks. For example, I have two Python scripts that do some analysis work, and the second script processes the data produced by the first. Today I have to pass the file path as a parameter.
   
   Parameter passing does not work as expected if the two tasks do not run on the same worker, because the path does not point to the file on the second worker.
   
   I think if DolphinScheduler supports this feature, it would be a handy boost for scenarios such as data analysis and machine learning.
   
   ### Use case
   
   
   I think we can use the resource center as the file transfer store if the user has enabled it. For example, the task plugin could agree on a new path specification:
   1. Use `$from_remote(remote_path, local_path)` to download a file from remote_path to local_path before the task starts.
   2. Use `$to_remote(remote_path, local_path)` to upload a file from local_path to remote_path after the task finishes.
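
One way the placeholder handling could work is sketched below. This is only an illustration of the proposal, not an existing DolphinScheduler API: the function `resolve_placeholders`, the `Transfer` tuple, and the single-quote-only pattern are all hypothetical names and assumptions. It rewrites each placeholder to its local path and collects the transfers to perform before and after the task.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical pattern for the proposed placeholders.
# Assumption: both arguments are single-quoted string literals.
_PLACEHOLDER = re.compile(
    r"\$(from_remote|to_remote)\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


class Transfer(NamedTuple):
    remote_path: str
    local_path: str


def resolve_placeholders(command: str) -> Tuple[str, List[Transfer], List[Transfer]]:
    """Return (rewritten command, downloads to run before, uploads to run after)."""
    downloads: List[Transfer] = []
    uploads: List[Transfer] = []

    def _replace(match: "re.Match[str]") -> str:
        kind, remote_path, local_path = match.groups()
        target = downloads if kind == "from_remote" else uploads
        target.append(Transfer(remote_path, local_path))
        # The task itself only ever sees the local path.
        return local_path

    return _PLACEHOLDER.sub(_replace, command), downloads, uploads
```

With this sketch, the task command is executed with plain local paths, while the worker would be responsible for performing the collected downloads before the task starts and the uploads after it finishes.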
    
   This proposal was inspired by [AWS Sagemaker](https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html)
   
   ```python
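   import sagemaker
   from sagemaker.processing import ProcessingInput
   from sagemaker.workflow.parameters import ParameterString
   
   # default_bucket and local_path are assumed to be defined earlier.
   base_uri = f"s3://{default_bucket}/abalone"
   input_data_uri = sagemaker.s3.S3Uploader.upload(
       local_path=local_path,
       desired_s3_uri=base_uri,
   )
   input_data = ParameterString(
       name="InputData",
       default_value=input_data_uri,
   )
   
   # The pipeline parameter can be used directly as the input source.
   ProcessingInput(source=input_data, destination="/opt/ml/processing/input")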
   ```
   
   The above is the SageMaker example. If DolphinScheduler supported this feature, it should be even easier to use, for example:
   ```shell
   # Process the data, save the output to the local path output/demo.csv, and upload it to bucket1/demo.csv in the resource center after the task is done.
   python process_data.py --output=$to_remote('bucket1/demo.csv', 'output/demo.csv')
   ```
   
   ```shell
   # Download the data from "bucket1/demo.csv" in the resource center and save it to the local path "data/demo.csv".
   # The command that actually executes is then:
   # python analysis.py --input=data/demo.csv
   python analysis.py --input=$from_remote('bucket1/demo.csv', 'data/demo.csv')
   ```
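
The lifecycle implied by the two commands above (download inputs, run the task, upload outputs) could look roughly like the following sketch. It is a stand-in only: `run_with_transfers` is a hypothetical helper, and a local directory (`store_root`) plays the role of the resource center so that the example runs without any remote storage.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List, Tuple


def run_with_transfers(
    argv: List[str],
    store_root: Path,
    work_dir: Path,
    downloads: Iterable[Tuple[str, str]] = (),
    uploads: Iterable[Tuple[str, str]] = (),
) -> None:
    """Copy inputs in, run the command in work_dir, then copy outputs out.

    store_root is a local directory standing in for the resource center
    (an assumption made for this sketch). Each transfer is a
    (remote_path, local_path) pair, matching the proposed argument order.
    """
    for remote_path, local_path in downloads:
        target = work_dir / local_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(store_root / remote_path, target)

    # check=True raises if the task fails, so failed tasks upload nothing.
    subprocess.run(argv, cwd=work_dir, check=True)

    for remote_path, local_path in uploads:
        target = store_root / remote_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(work_dir / local_path, target)
```

Because each task only touches its own working directory and the shared store, the two tasks no longer need to run on the same worker.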
   
   
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] jieguangzhou closed issue #10738: [Enhancement][Task Plugin] Allows file transfer between tasks

Posted by GitBox <gi...@apache.org>.
jieguangzhou closed issue #10738: [Enhancement][Task Plugin] Allows file transfer between tasks
URL: https://github.com/apache/dolphinscheduler/issues/10738




[GitHub] [dolphinscheduler] jieguangzhou commented on issue #10738: [Enhancement][Task Plugin] Allows file transfer between tasks

Posted by GitBox <gi...@apache.org>.
jieguangzhou commented on issue #10738:
URL: https://github.com/apache/dolphinscheduler/issues/10738#issuecomment-1172860382

   I'm not sure I'll be able to implement this anytime soon. If anyone is interested in implementing it, that would be much appreciated.




[GitHub] [dolphinscheduler] zhongjiajie commented on issue #10738: [Enhancement][Task Plugin] Allows file transfer between tasks

Posted by "zhongjiajie (via GitHub)" <gi...@apache.org>.
zhongjiajie commented on issue #10738:
URL: https://github.com/apache/dolphinscheduler/issues/10738#issuecomment-1716832050

   Closed by https://github.com/apache/dolphinscheduler/pull/12552




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #10738: [Enhancement][Task Plugin] Allows file transfer between tasks

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #10738:
URL: https://github.com/apache/dolphinscheduler/issues/10738#issuecomment-1172859927

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * So that we can understand your request as soon as possible, please provide detailed information, the version, or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to the `#troubleshooting` channel.




[GitHub] [dolphinscheduler] SbloodyS commented on issue #10738: [Enhancement][Task Plugin] Allows file transfer between tasks

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #10738:
URL: https://github.com/apache/dolphinscheduler/issues/10738#issuecomment-1172871389

   I think this feature depends on configuration center #10283. Otherwise, it is impossible to determine which object to use to store the configuration during uploading and downloading.

