Posted to user@tez.apache.org by Grandl Robert <rg...@yahoo.com> on 2014/09/26 02:54:07 UTC

communication models between tasks in different stages

Hi guys,

I want to better understand how input / output data flows between tasks in different vertices. Specifically, I am wondering what the communication patterns between tasks in different vertices are (many-to-many, many-to-one, one-to-many, ...).

Can someone explain how I can identify this for every pair of parent / child vertices? Is there a per-task property that lets me see that, or do I need to look at the types of the vertices?


Thanks,
Robert

RE: communication models between tasks in different stages

Posted by Bikas Saha <bi...@hortonworks.com>.
Scatter gather means the outputs are scattered into N shards by the
producers and N consumers gather each of the shards. This is the standard
MapReduce shuffle connection.
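
As an illustration only, here is a minimal Python sketch of that
scatter-gather pattern. It is a plain model, not Tez code: the function
names scatter and gather and the CRC32-based partitioner are assumptions
made up for this example, and real partitioning logic may differ.

```python
import zlib


def scatter(records, num_consumers):
    """Producer side: split one task's output into num_consumers shards."""
    shards = [[] for _ in range(num_consumers)]
    for key, value in records:
        # Hypothetical partitioner: a stable hash of the key picks the shard,
        # so equal keys from different producers land in the same shard index.
        index = zlib.crc32(key.encode("utf-8")) % num_consumers
        shards[index].append((key, value))
    return shards


def gather(all_shards, consumer_index):
    """Consumer side: collect shard number consumer_index from every producer."""
    gathered = []
    for producer_shards in all_shards:
        gathered.extend(producer_shards[consumer_index])
    return gathered


if __name__ == "__main__":
    # Two producer tasks, three consumer tasks.
    producers = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
    num_consumers = 3
    all_shards = [scatter(records, num_consumers) for records in producers]
    for j in range(num_consumers):
        print(j, gather(all_shards, j))
```

In this model every producer is connected to every consumer, but each
consumer only receives its own shard from each producer.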



You can create any custom pattern by deriving from the EdgeManagerPlugin
API.



*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Thursday, October 02, 2014 3:15 PM
*To:* user@tez.apache.org
*Subject:* Re: communication models between tasks in different stages



Jeff,



Thanks a lot for your answer. However, I still cannot completely
understand how this works just by looking at the
routeDataMovementEventToDestination methods in:

BroadcastEdgeManager

OneToOneEdgeManager

ScatterGatherEdgeManager



So for broadcast, I guess every input task has a connection with every
output task. One-to-one means every input task has a connection with only
one output task, on the assumption that the number of input tasks is equal
to or less than the number of output tasks. What about scatter-gather?



Thanks,

Robert



On Thursday, September 25, 2014 7:20 PM, Jianfeng (Jeff) Zhang <
jzhang@hortonworks.com> wrote:



Hi Robert,



   Please refer to EdgeManagerPlugin; there are several implementations for
different communication patterns (BroadcastEdgeManager,
OneToOneEdgeManager, ScatterGatherEdgeManager). Which one is used depends on
the DataMovementType you specify when you connect vertices.


Best Regards,

Jeff Zhang





On Fri, Sep 26, 2014 at 8:54 AM, Grandl Robert <rg...@yahoo.com> wrote:

Hi guys,



I want to better understand how input / output data flows between tasks in
different vertices. Specifically, I am wondering what the communication
patterns between tasks in different vertices are (many-to-many,
many-to-one, one-to-many, ...).



Can someone explain how I can identify this for every pair of parent /
child vertices? Is there a per-task property that lets me see that, or do
I need to look at the types of the vertices?



Thanks,

Robert




CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.


Re: communication models between tasks in different stages

Posted by Grandl Robert <rg...@yahoo.com>.
Jeff,
Thanks a lot for your answer. However, I still cannot completely understand how this works just by looking at the routeDataMovementEventToDestination methods in:
BroadcastEdgeManager
OneToOneEdgeManager
ScatterGatherEdgeManager
So for broadcast, I guess every input task has a connection with every output task. One-to-one means every input task has a connection with only one output task, on the assumption that the number of input tasks is equal to or less than the number of output tasks. What about scatter-gather?
Thanks,
Robert
 

     On Thursday, September 25, 2014 7:20 PM, Jianfeng (Jeff) Zhang <jz...@hortonworks.com> wrote:
   

 Hi Robert,
   Please refer to EdgeManagerPlugin; there are several implementations for different communication patterns (BroadcastEdgeManager, OneToOneEdgeManager, ScatterGatherEdgeManager). Which one is used depends on the DataMovementType you specify when you connect vertices.
Best Regards,
Jeff Zhang

On Fri, Sep 26, 2014 at 8:54 AM, Grandl Robert <rg...@yahoo.com> wrote:

Hi guys,
I want to better understand how input / output data flows between tasks in different vertices. Specifically, I am wondering what the communication patterns between tasks in different vertices are (many-to-many, many-to-one, one-to-many, ...).
Can someone explain how I can identify this for every pair of parent / child vertices? Is there a per-task property that lets me see that, or do I need to look at the types of the vertices?

Thanks,
Robert



   

Re: communication models between tasks in different stages

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
Hi Robert,

   Please refer to EdgeManagerPlugin; there are several implementations for
different communication patterns (BroadcastEdgeManager,
OneToOneEdgeManager, ScatterGatherEdgeManager). Which one is used depends on
the DataMovementType you specify when you connect vertices.
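
To make the difference between the three built-in patterns concrete, here
is a small Python model (not Tez code). The function name route and the
(source task, source output index, destination task) triple layout are
hypothetical. The equal-task-count requirement for one-to-one and the way
outputs are indexed are assumptions about how the edge managers behave, so
please check the actual EdgeManagerPlugin implementations before relying
on them.

```python
def route(movement_type, num_sources, num_destinations):
    """Return (source_task, source_output_index, destination_task) triples.

    A simplified model of which task talks to which for each movement type.
    """
    if movement_type == "ONE_TO_ONE":
        # Assumption: source task i feeds only destination task i,
        # which requires both vertices to have the same number of tasks.
        if num_sources != num_destinations:
            raise ValueError("one-to-one needs equal task counts in this model")
        return [(i, 0, i) for i in range(num_sources)]
    if movement_type == "BROADCAST":
        # Each source has a single output that every destination reads in full.
        return [(s, 0, d)
                for s in range(num_sources)
                for d in range(num_destinations)]
    if movement_type == "SCATTER_GATHER":
        # Each source produces one shard per destination;
        # destination d reads shard d from every source.
        return [(s, d, d)
                for s in range(num_sources)
                for d in range(num_destinations)]
    raise ValueError("unknown movement type: " + movement_type)


if __name__ == "__main__":
    for kind in ("ONE_TO_ONE", "BROADCAST", "SCATTER_GATHER"):
        print(kind, route(kind, 2, 2))
```

Broadcast and scatter-gather connect the same pairs of tasks in this
model; they differ in whether a destination reads the whole output or only
its own shard.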

Best Regards,
Jeff Zhang


On Fri, Sep 26, 2014 at 8:54 AM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi guys,
>
> I want to better understand how input / output data flows between tasks in
> different vertices. Specifically, I am wondering what the communication
> patterns between tasks in different vertices are (many-to-many,
> many-to-one, one-to-many, ...).
>
> Can someone explain how I can identify this for every pair of parent /
> child vertices? Is there a per-task property that lets me see that, or do
> I need to look at the types of the vertices?
>
> Thanks,
> Robert
>
