Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/06/09 13:51:01 UTC

[GitHub] [dolphinscheduler] complone opened a new issue, #10387: [Feature][dolphinscheduler-common] Support Git Protocol

complone opened a new issue, #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
   ### Description
   
   The resource center currently supports HDFS, S3, and local storage. Because all of them share the same way of uploading and reading files, supporting Git only requires adding a Git-based file storage implementation.
   
   When a user uploads a file to the resource center, the request reaches ```ResourcesController```, and the file operations are carried out by ```HadoopUtils``` and ```S3Utils```, the implementation classes of the ```StorageOperate``` interface.
   
   From my research, for the Git protocol Eclipse provides a Java client, ```org.eclipse.jgit```, that can support Git-based file storage.
   I will put together the storage operation implementation for the API based on the production-style examples here:
   [jgit-cookbook](https://github.com/centic9/jgit-cookbook)
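   
   To make the "add one more storage backend" idea concrete, here is a minimal sketch of a pluggable storage abstraction. The issue does not show the real method signatures of ```StorageOperate```, so the ```ResourceStorage``` interface, the ```LocalResourceStorage``` class, and their methods below are hypothetical names used only for illustration. A Git-backed class would presumably implement the same two methods and commit after each upload.
   
   ```java
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;

   // Hypothetical stand-in for a storage abstraction; NOT the real StorageOperate interface.
   interface ResourceStorage {
       void upload(String tenantCode, String fileName, byte[] content) throws IOException;

       byte[] download(String tenantCode, String fileName) throws IOException;
   }

   // Local-directory backend: each tenant gets its own sub-directory under root.
   final class LocalResourceStorage implements ResourceStorage {
       private final Path root;

       LocalResourceStorage(Path root) {
           this.root = root;
       }

       @Override
       public void upload(String tenantCode, String fileName, byte[] content) throws IOException {
           Path target = root.resolve(tenantCode).resolve(fileName);
           Files.createDirectories(target.getParent());
           Files.write(target, content);
       }

       @Override
       public byte[] download(String tenantCode, String fileName) throws IOException {
           return Files.readAllBytes(root.resolve(tenantCode).resolve(fileName));
       }
   }

   final class ResourceStorageDemo {
       public static void main(String[] args) throws IOException {
           ResourceStorage storage = new LocalResourceStorage(Files.createTempDirectory("resources"));
           storage.upload("tenant_a", "job.sql", "select 1;".getBytes(StandardCharsets.UTF_8));
           System.out.println(new String(storage.download("tenant_a", "job.sql"), StandardCharsets.UTF_8));
       }
   }
   ```
   
   Callers depend only on the interface, so switching backends would not change the controller code.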
   
   
   ### Use case
   
   The following are two simple git manipulation examples, which will be further expanded in combination with [jgit-cookbook](https://github.com/centic9/jgit-cookbook)
   
   
   git create
   ```
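   import java.io.File;
   import java.io.IOException;

   import org.eclipse.jgit.lib.Repository;
   import org.eclipse.jgit.storage.file.FileRepositoryBuilder;
   import org.eclipse.jgit.util.FileUtils;

   // RepositoryProvider and StorageOperateTransformException are the types proposed in this issue.
   public class RepositoryProviderExistingClientImpl implements RepositoryProvider {

       private final String[] clientPath;
       private final String tenantCode;

       public RepositoryProviderExistingClientImpl(String tenantCode, String[] clientPath) {
           this.clientPath = clientPath;
           this.tenantCode = tenantCode;
       }

       @Override
       public Repository getRepository() throws Exception {
           try {
               // Use <tenantCode>/<clientPath...> as the work tree of the repository.
               File workdir = getFile(tenantCode, clientPath);
               // build() does not initialize a repository on disk; call create() on the
               // result if the repository does not exist yet.
               return new FileRepositoryBuilder().setWorkTree(workdir).build();
           } catch (Exception ex) {
               // Keep the failure reason instead of discarding it.
               throw new StorageOperateTransformException(ex.getMessage());
           }
       }

       public File getFile(String tenantCode, String... pathComponents) throws IOException {
           File result = new File(tenantCode);
           for (String pathComponent : pathComponents) {
               result = new File(result, pathComponent);
           }
           // skipExisting = true: do not fail if the directory already exists.
           FileUtils.mkdirs(result, true);
           return result;
       }
   }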
   ```
   
   
   git clone
   ```
   import java.io.File;

   import org.eclipse.jgit.api.Git;
   import org.eclipse.jgit.lib.Repository;

   public class RepositoryProviderCloneImpl implements RepositoryProvider {

       private final String repoPath;
       private final String clientPath;

       public RepositoryProviderCloneImpl(String repoPath, String clientPath) {
           this.repoPath = repoPath;
           this.clientPath = clientPath;
       }

       @Override
       public Repository getRepository() throws Exception {
           File client = new File(clientPath);
           // mkdirs() also creates missing parent directories; an already existing
           // directory is not treated as an error.
           if (!client.mkdirs() && !client.isDirectory()) {
               throw new StorageOperateTransformException("Failed to create the local directory for the Git clone!");
           }

           try {
               // Clone the remote repository at repoPath into the local client directory.
               Git result = Git.cloneRepository()
                   .setURI(repoPath)
                   .setDirectory(client)
                   .call();
               return result.getRepository();
           } catch (Exception ex) {
               throw new StorageOperateTransformException(ex.getMessage());
           }
       }
   }
   ```
   
   ### Related issues
   
   Nope
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] complone commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
complone commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1181256441

   > @complone FYI, you may also refer to [airflow-code-editor](https://github.com/andreax79/airflow-code-editor) to see how workflow as code could be integrated with git. ![image](https://user-images.githubusercontent.com/34905992/178395668-71002a0b-14aa-484c-a0ef-daff1684dc31.png)
   
   Thank you very much for your help. Let me take a look first.




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1151148605

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * To help us understand your request as quickly as possible, please provide detailed information, the version, or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to the channel `#troubleshooting`.




[GitHub] [dolphinscheduler] complone commented on issue #10387: [Feature][dolphinscheduler-common] Support workflow review

Posted by GitBox <gi...@apache.org>.
complone commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1229171831

   > https://github.com/xtr1993/datacenter-git-client-demo.git
   
   Thank you very much for sharing your Git operation encapsulation logic. I will try to design logic that is compatible with DolphinScheduler.




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1167015046

   > IMHO, we could separate this issue into two parts:
   > 
   > 1. Add source control ability for resource center.
   > 2. Enable users to review workflow changes.
   > 
   > For the first one, since the resource center is based on HDFS / S3 / ..., we could add a log file in remote storage to store the operation log, commit hashes, etc., and combine each commit hash with the object name or tag. The S3 / HDFS read/write interface would be used to interact with this log file to ensure consistency. That way, we could enable source control not only for txt / sql / sh files but also for jar / tar files, and avoid bloating the remote git repo.
   > 
   > For the second one, we could add some kind of mapping function to map workflows into Python DAGs. Users would get a new version of the Python DAG each time they create or update a workflow. Based on that, we could add source control with the git protocol to let users review workflow changes and give them a better production experience.
   > 
   > WDYT @complone @xtr1993 @SbloodyS @zhongjiajie @davidzollo
   
   To clarify, the generated Python DAGs mentioned above are only for review purposes; we do not actually need to run them. Therefore, there is no need to change the current code logic, and we could simply add this as an auxiliary feature.
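   
   As a rough illustration of a review-only mapping, the sketch below renders a workflow's dependencies as sorted text lines using Airflow's `>>` dependency operator. The class `WorkflowReviewRenderer`, its `render` method, and the edge representation are hypothetical and not part of DolphinScheduler; the output is meant to be diffed in git, not executed.
   
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;

   // Hypothetical sketch: renders workflow dependencies as deterministic, diff-friendly text.
   final class WorkflowReviewRenderer {

       // Each edge is {upstreamTask, downstreamTask}; task names are assumed to be valid identifiers.
       static String render(String workflowName, List<String[]> edges) {
           List<String> lines = new ArrayList<>();
           for (String[] edge : edges) {
               lines.add(edge[0] + " >> " + edge[1]);
           }
           // Sort so that the output does not depend on the order in which edges were stored.
           Collections.sort(lines);
           StringBuilder out = new StringBuilder("# workflow: " + workflowName + "\n");
           for (String line : lines) {
               out.append(line).append('\n');
           }
           return out.toString();
       }

       public static void main(String[] args) {
           List<String[]> edges = new ArrayList<>();
           edges.add(new String[] {"transform", "load"});
           edges.add(new String[] {"extract", "transform"});
           System.out.print(render("daily_etl", edges));
       }
   }
   ```
   
   Because the lines are sorted, two versions of the same workflow differ only where a dependency was actually added or removed, which keeps the git diff small.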




[GitHub] [dolphinscheduler] ruanwenjun commented on issue #10387: [Feature][dolphinscheduler-common] Support workflow review

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1229205466

   As far as I can see, this issue only talks about adding a new Git-based resource implementation.
   
   I didn't see any detailed design for how to manage resource versions. Do we need to store the resource version in our database? 
   
   Also, this PR doesn't describe how the workflow would be stored in git (resource center). If we don't store the workflow in git, how can we review it? cc @davidzollo @caishunfeng 




[GitHub] [dolphinscheduler] davidzollo commented on issue #10387: [Feature][dolphinscheduler-common] Support workflow review

Posted by GitBox <gi...@apache.org>.
davidzollo commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1229212577

   > As far as I can see, this issue only talks about adding a new resource center implementation based on git.
   > 
   > I didn't see any detailed design for how to manage resource versions. Do we need to store the resource version in our database?
   > 
   > Also, this PR doesn't describe how the workflow would be stored in git (resource center). If we don't store the workflow in git, how can we review it? cc @davidzollo @caishunfeng
   
   Yes. As @EricGao888 said in https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1166904625, I think splitting this into two issues would be better.




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1181244208

   @complone FYI, you may also refer to [airflow-code-editor](https://github.com/andreax79/airflow-code-editor) to see how workflow as code could be integrated with git.
   ![image](https://user-images.githubusercontent.com/34905992/178395668-71002a0b-14aa-484c-a0ef-daff1684dc31.png)
   




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1169538152

   > > some kind of mapping function to map workflows into python DA
   > 
   > @EricGao888 Thank you very much for your reply. Regarding the second point, after discussing it with @davidzollo, the stronger demand in the community is for management based on the git protocol. The usual scenario is versioning the DAG each time a user modifies it. I will take some time soon to look at how airflow runs and generates DAGs, so that I can come up with a better design for version management.
   
   @complone Hi complone, thanks again for your effort. It would be fantastic if you could bring such a feature into DS. Instead of studying how airflow runs, I suggest spending some time on the syntax of airflow DAGs. We may not actually need to run the DAGs generated from workflows. The main purpose is to help users review and comment on changes, and Python DAGs are easier to review than graphs. 




[GitHub] [dolphinscheduler] complone commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
complone commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1178330362

   > The main purpose
   
   @EricGao888 Thanks for the additional context. In the meantime, I will read up on how airflow generates DAGs so that I can discuss it with you later.




[GitHub] [dolphinscheduler] complone commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
complone commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1151154239

   More people are welcome to add their scenarios and discuss them here.
   @SbloodyS Could you please join the discussion?




[GitHub] [dolphinscheduler] xtr1993 commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
xtr1993 commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1151855691

   I have implemented git-based code management in my project. Here is my business flowchart; I hope it helps:
   ![632e0dd7f1daf6d0329b084537a3e16](https://user-images.githubusercontent.com/38655011/172978837-7b9f9830-5a44-430f-b911-9b8b46ac2f87.png)
   




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1166904625

   IMHO, we could separate this issue into two parts:
   
   1. Add source control ability for resource center.
   2. Enable users to review workflow changes.
   
   For the first one, since the resource center is based on HDFS / S3 / ..., we could add a log file in remote storage to store the operation log, commit hashes, etc., and combine each commit hash with the object name or tag. The S3 / HDFS read/write interface would be used to interact with this log file to ensure consistency. That way, we could enable source control not only for txt / sql / sh files but also for jar / tar files, and avoid bloating the remote git repo.
   
   For the second one, we could add some kind of mapping function to map workflows into Python DAGs. Users would get a new version of the Python DAG each time they create or update a workflow. Based on that, we could add source control with the git protocol to let users review workflow changes and give them a better production experience.
   
   WDYT @complone @xtr1993 @SbloodyS @zhongjiajie @davidzollo 
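   
   A minimal sketch of the first part, under stated assumptions: an in-memory map stands in for the S3 / HDFS object store, a SHA-256 content hash stands in for the "commit hash", and each object is stored under `name@hash` with an append-only operation log next to it. The class `VersionedResourceStore` and all of its methods are hypothetical and only illustrate the idea.
   
   ```java
   import java.nio.charset.StandardCharsets;
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical sketch: an in-memory map stands in for the S3 / HDFS object store.
   final class VersionedResourceStore {
       private final Map<String, byte[]> objects = new HashMap<>();  // "name@hash" -> content
       private final List<String> operationLog = new ArrayList<>();  // lines of "PUT <hash> <name>"

       // Stores a new version, records it in the log, and returns the content hash.
       String put(String name, byte[] content) {
           String hash = sha256Hex(content);
           objects.put(name + "@" + hash, content);
           operationLog.add("PUT " + hash + " " + name);
           return hash;
       }

       // Reads one specific version by its hash, or returns null if it is unknown.
       byte[] get(String name, String hash) {
           return objects.get(name + "@" + hash);
       }

       // Scans the log backwards for the most recent version of name; null if never stored.
       String latestHash(String name) {
           for (int i = operationLog.size() - 1; i >= 0; i--) {
               String[] parts = operationLog.get(i).split(" ", 3);
               if (parts[2].equals(name)) {
                   return parts[1];
               }
           }
           return null;
       }

       List<String> log() {
           return new ArrayList<>(operationLog);
       }

       static String sha256Hex(byte[] data) {
           try {
               byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
               StringBuilder hex = new StringBuilder();
               for (byte b : digest) {
                   hex.append(String.format("%02x", b));
               }
               return hex.toString();
           } catch (NoSuchAlgorithmException e) {
               throw new IllegalStateException(e); // SHA-256 is available on every Java platform
           }
       }

       public static void main(String[] args) {
           VersionedResourceStore store = new VersionedResourceStore();
           store.put("udf.jar", "v1".getBytes(StandardCharsets.UTF_8));
           store.put("udf.jar", "v2".getBytes(StandardCharsets.UTF_8));
           System.out.println(store.log());
       }
   }
   ```
   
   Since the object bytes never enter a git repository, binary files such as jar / tar archives can be versioned this way without growing a remote repo.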




[GitHub] [dolphinscheduler] davidzollo commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
davidzollo commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1219542443

   @zhongjiajie do you have any ideas?
   
   




[GitHub] [dolphinscheduler] complone commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
complone commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1169532200

   > some kind of mapping function to map workflows into python DA
   
   @EricGao888 Thank you very much for your reply. Regarding the second point, after discussing it with @davidzollo, the stronger demand in the community is to manage, based on the git protocol, the version of the DAG each time a user modifies it. I will take some time to look into how generated DAGs are managed so that I can come up with a better design for version management.




[GitHub] [dolphinscheduler] complone commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
complone commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1166197304

   After preliminary research, I found that JGit, as a Git client, involves fairly heavy logic, and manually building a local repository through Git commands is not very friendly. So I found another way to upload and download files, based on the Gerrit [REST API](https://github.com/GerritCodeReview/gerrit).
   
   doc: https://gerrit-review.googlesource.com/Documentation/rest-api-projects.html
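   
   A small sketch of the download side, based on my reading of the linked documentation: the projects API offers a "get content" endpoint of the form `/projects/{project}/branches/{branch}/files/{file}/content`, with IDs URL-encoded and the body returned as a base64 string. Please verify this against the doc above before relying on it. The class `GerritFileContent`, its methods, and the host name in the usage note are hypothetical; the sketch only builds the URL and decodes a response body, and it leaves the HTTP call and authentication out.
   
   ```java
   import java.net.URLEncoder;
   import java.nio.charset.StandardCharsets;
   import java.util.Base64;

   // Hypothetical helper: builds the request URL and decodes the base64 response body.
   final class GerritFileContent {

       // Percent-encodes one URL path segment, so "/" inside an ID becomes %2F.
       static String encodeSegment(String value) {
           // URLEncoder writes spaces as "+", which is only valid in query strings.
           return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
       }

       static String contentUrl(String baseUrl, String project, String branch, String filePath) {
           return baseUrl + "/projects/" + encodeSegment(project)
               + "/branches/" + encodeSegment(branch)
               + "/files/" + encodeSegment(filePath) + "/content";
       }

       // Assumes the response body is the base64-encoded file content.
       static String decodeContent(String responseBody) {
           byte[] raw = Base64.getDecoder().decode(responseBody.trim());
           return new String(raw, StandardCharsets.UTF_8);
       }
   }
   ```
   
   For example, `GerritFileContent.contentUrl("https://gerrit.example.com", "my/project", "master", "sql/job.sql")` yields a URL in which the slashes inside the project name and the file path are encoded as `%2F`.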




[GitHub] [dolphinscheduler] xtr1993 commented on issue #10387: [Feature][dolphinscheduler-common] Support workflow review

Posted by GitBox <gi...@apache.org>.
xtr1993 commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1229171050

   I have implemented this function and have demo code. We can discuss it together:
   https://github.com/xtr1993/datacenter-git-client-demo.git




[GitHub] [dolphinscheduler] complone commented on issue #10387: [Feature][dolphinscheduler-common] Support Git Protocol

Posted by GitBox <gi...@apache.org>.
complone commented on issue #10387:
URL: https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1151946733

   @xtr1993 Thank you very much. I will do some research in the near future.

