Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2020/06/29 05:58:49 UTC

[GitHub] [incubator-dolphinscheduler] leehom opened a new issue #3072: [Feature] sharding or Paralleling task

leehom opened a new issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072


   Sharding or parallelizing tasks is an important feature for a scheduler. I have some ideas on how DS could implement sharding or parallelism easily:
   1. Change the DAG to a chain; we could call this dimension reduction.
   2. Add a parallelism property to each node of the chain.
   3. When building the chained tasks of the job, the API can build the DAG according to the parallelism of each node.
   Overall, this is something like Flink's stream graph, job graph, and execution graph.
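
   A minimal sketch of the three steps above, written in Python for brevity. The names `ChainNode` and `expand_chain`, the `name#index` task naming, and the all-to-all wiring between adjacent steps are assumptions made for illustration; none of them are DolphinScheduler APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainNode:
    """One step of the job chain (hypothetical type, not a DolphinScheduler class)."""
    name: str
    parallelism: int = 1


def expand_chain(chain):
    """Expand a linear chain into DAG tasks and edges.

    Each node becomes `parallelism` task instances named "<name>#<index>".
    Every instance of a step depends on every instance of the previous step
    (a simple all-to-all wiring, chosen here only for illustration).
    """
    tasks = []
    edges = []
    previous = []
    for node in chain:
        if node.parallelism < 1:
            raise ValueError(f"parallelism must be >= 1 for node {node.name!r}")
        current = [f"{node.name}#{i}" for i in range(node.parallelism)]
        tasks.extend(current)
        edges.extend((up, down) for up in previous for down in current)
        previous = current
    return tasks, edges


# The user describes a chain; the DAG is derived from the parallelism values.
chain = [ChainNode("extract", 1), ChainNode("transform", 3), ChainNode("load", 1)]
tasks, edges = expand_chain(chain)
```

   The point of the design is that the user only maintains the chain and one number per node, while the scheduler derives the wider graph, much as Flink derives an execution graph from a job graph.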


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-dolphinscheduler] leehom commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-657443727


   ![Sharding job - technical architecture](https://user-images.githubusercontent.com/721472/87289194-e89d1e80-c52e-11ea-84a7-f14c977dd2c6.png)
   





[GitHub] [incubator-dolphinscheduler] leehom commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-658120529


   If, as you said, DS is just a scheduler that does scheduling, why does DS support DAGs? What does a DAG do? Service arrangement. And what is service arrangement? Arranging services to do something together. Sharding arranges more services to do more things, and to do them quickly, doesn't it?
   So sharding is important for service arrangement, therefore important for DAGs, and ultimately important for DS.





[GitHub] [incubator-dolphinscheduler] Rubik-W commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
Rubik-W commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-656638984


   I think the main responsibilities of DS are scheduling and job execution.
   Data processing should be the responsibility of the node, e.g. DataX.
   
   
   --------------------
   DolphinScheduler (Incubator) Committer
   Hemin Wen  温合民
   wenhemin@apache.org
   --------------------
   
   
   leehom <no...@github.com> wrote on Fri, Jul 10, 2020 at 9:10 AM:
   
   > DS can be seen as a data processor, and sharding can improve performance across the whole data processing lifecycle. For example, when data is read from a database table, sharded tasks can read from the table in parallel; the same applies to data conversion, UDFs, and writing data to the database.
   





[GitHub] [incubator-dolphinscheduler] leehom edited a comment on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom edited a comment on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-656426207


   DS can be seen as a data processor, and sharding can improve performance across the whole data processing lifecycle.
   For example, when data is read from a database table, sharded tasks can read from the table in parallel;
   the same applies to data conversion, UDFs, and writing data to the database.
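
   A rough sketch of the table-input case, assuming the table has a numeric `id` column that can be split into contiguous ranges. The in-memory `TABLE` dictionary stands in for a real database, and `shard_ranges` and `read_shard` are hypothetical helpers, not part of DS.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_ranges(min_id, max_id, shards):
    """Split the inclusive id range [min_id, max_id] into up to `shards` contiguous ranges."""
    if shards < 1:
        raise ValueError("shards must be >= 1")
    total = max_id - min_id + 1
    if total <= 0:
        return []
    size, extra = divmod(total, shards)
    ranges = []
    start = min_id
    for i in range(shards):
        # The first `extra` shards take one additional id each.
        length = size + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((start, start + length - 1))
        start += length
    return ranges


# Stand-in for a database table: rows keyed by a numeric id (assumption).
TABLE = {row_id: f"row-{row_id}" for row_id in range(1, 11)}


def read_shard(id_range):
    """Read one shard; a real task would query the source table
    with a predicate such as `id BETWEEN low AND high`."""
    low, high = id_range
    return [TABLE[i] for i in range(low, high + 1) if i in TABLE]


ranges = shard_ranges(1, 10, 3)
with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    # map() preserves input order, so results line up with `ranges`.
    shard_rows = list(pool.map(read_shard, ranges))
rows = [row for shard in shard_rows for row in shard]
```

   In a scheduler, each range would presumably become one task instance, so the shard count plays the role of the node's parallelism.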





[GitHub] [incubator-dolphinscheduler] leehom edited a comment on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom edited a comment on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-658120529


   If, as you said, DS is just a scheduler that does scheduling, why does DS support DAGs? What does a DAG do? Service arrangement. And what is service arrangement? Arranging services to do something together. Sharding arranges more services to do things more quickly, doesn't it?
   So sharding is important for service arrangement, therefore important for DAGs, and ultimately important for DS.





[GitHub] [incubator-dolphinscheduler] leehom removed a comment on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom removed a comment on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-657313160


   If DS, as was said, just schedules jobs, then DS doesn't need to support DAGs. A DAG is a chain or graph in which data passes from one node to the next, and when data passes from one node to another, sharding is important for improving performance.
   
      





[GitHub] [incubator-dolphinscheduler] leehom commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-657403190


   I fully agree that data processing is the job's own concern, but sharding support would be the scheduler engine's concern. Below is what I said earlier; it is about how DS can support sharding or parallel tasks, not about data processing.
   
   > Sharding or parallelizing tasks is an important feature for a scheduler. I have some ideas on how DS could implement sharding or parallelism easily:
   > 
   > 1. Change the DAG to a chain; we could call this dimension reduction.
   > 2. Add a parallelism property to each node of the chain.
   > 3. When building the chained tasks of the job, the API can build the DAG according to the parallelism of each node.
   >    This is something like Flink's stream graph, job graph, and execution graph.
   
   





[GitHub] [incubator-dolphinscheduler] gabrywu commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
gabrywu commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-657316977


   > I think the main responsibilities of DS are scheduling and job execution. Data processing should be the responsibility of the node, e.g. DataX.
   >
   > leehom wrote on Fri, Jul 10, 2020 at 9:10 AM:
   >
   > > DS can be seen as a data processor, and sharding can improve performance across the whole data processing lifecycle. For example, when data is read from a database table, sharded tasks can read from the table in parallel; the same applies to data conversion, UDFs, and writing data to the database.
   
   Yes, data processing should be the responsibility of the job itself, not the scheduling framework. However, changing the DAG to a chain is a good idea.





[GitHub] [incubator-dolphinscheduler] leehom edited a comment on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom edited a comment on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-657403190


   
   ![Technical architecture - building the job](https://user-images.githubusercontent.com/721472/87282139-8b04d400-c526-11ea-8d2a-1e6f1cac8df8.png)
   I fully agree that data processing is the job's own concern, but sharding support would be the scheduler engine's concern. Below is what I said earlier; it is about how DS can support sharding or parallel tasks, not about data processing.
   
   > Sharding or parallelizing tasks is an important feature for a scheduler. I have some ideas on how DS could implement sharding or parallelism easily:
   > 
   > 1. Change the DAG to a chain; we could call this dimension reduction.
   > 2. Add a parallelism property to each node of the chain.
   > 3. When building the chained tasks of the job, the API can build the DAG according to the parallelism of each node.
   >    This is something like Flink's stream graph, job graph, and execution graph.
   
   





[GitHub] [incubator-dolphinscheduler] leehom edited a comment on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom edited a comment on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-657403190


   
   ![Technical architecture - building the job](https://user-images.githubusercontent.com/721472/87289118-cefbd700-c52e-11ea-8b9c-32e6ff094203.png)
   
   I fully agree that data processing is the job's own concern, but sharding support would be the scheduler engine's concern. Below is what I said earlier; it is about how DS can support sharding or parallel tasks, not about data processing.
   
   > Sharding or parallelizing tasks is an important feature for a scheduler. I have some ideas on how DS could implement sharding or parallelism easily:
   > 
   > 1. Change the DAG to a chain; we could call this dimension reduction.
   > 2. Add a parallelism property to each node of the chain.
   > 3. When building the chained tasks of the job, the API can build the DAG according to the parallelism of each node.
   >    This is something like Flink's stream graph, job graph, and execution graph.
   
   





[GitHub] [incubator-dolphinscheduler] yangyichao-mango commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
yangyichao-mango commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-658108391


   > I think the main responsibilities of DS are scheduling and job execution. Data processing should be the responsibility of the node, e.g. DataX.
   >
   > leehom wrote on Fri, Jul 10, 2020 at 9:10 AM:
   >
   > > DS can be seen as a data processor, and sharding can improve performance across the whole data processing lifecycle. For example, when data is read from a database table, sharded tasks can read from the table in parallel; the same applies to data conversion, UDFs, and writing data to the database.
   
   I agree with @Rubik-W.
   Can you describe the benefit of implementing sharding or parallel tasks in the scheduler engine?
   I think the responsibility of the scheduling engine is scheduling, and sharding should be within the scope of the data processing engine. In most cases, sharding in the scheduling system will probably not perform better than sharding in the data processing engine.





[GitHub] [incubator-dolphinscheduler] leehom commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-657313160


   If DS, as was said, just schedules jobs, then DS doesn't need to support DAGs. A DAG is a chain or graph in which data passes from one node to the next, and when data passes from one node to another, sharding is important for improving performance.
   
      





[GitHub] [incubator-dolphinscheduler] leehom edited a comment on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom edited a comment on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-656426207


   DS can be seen as a data processor, and sharding can improve performance across the whole data processing lifecycle.
   For example, when data is read from a database table, sharded tasks can read from the table in parallel;
   the same applies to data conversion, UDFs, and writing data to the database.





[GitHub] [incubator-dolphinscheduler] Rubik-W commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
Rubik-W commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-656089254


   Can you describe in detail the application scenarios for sharding and parallelizing?





[GitHub] [incubator-dolphinscheduler] yangyichao-mango edited a comment on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
yangyichao-mango edited a comment on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-658108391









[GitHub] [incubator-dolphinscheduler] leehom commented on issue #3072: [Feature] sharding or Paralleling task

Posted by GitBox <gi...@apache.org>.
leehom commented on issue #3072:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3072#issuecomment-656426207


   DS can be seen as a data processor, and sharding can improve performance across the whole data processing lifecycle. For example, when data is read from a database table, sharded tasks can read from the table in parallel; the same applies to data conversion, UDFs, and writing data to the database.

