Posted to dev@tez.apache.org by Adrian Nicoara <ad...@microsoft.com.INVALID> on 2018/08/10 20:46:52 UTC

TEZ-1190 status

Hello,

I wanted to see if there has been any additional work on the following issue that has not been published yet:
https://issues.apache.org/jira/browse/TEZ-1190
This is one issue that we are currently running into.

Thank you in advance,
Adrian

Re: TEZ-1190 status

Posted by Gopal Vijayaraghavan <go...@apache.org>.
>  The named vertex approach seems the right way to go here. We took the patch in the JIRA and applied it to our branch. After fixing bugs and making some tweaks, the prototype does work. The change, however, is pretty big, and we want to contribute it back to trunk.
>  
> Are there any concerns if we proceed further with the named edge approach?

None from me at least - a fully functional patch is always welcome. I can reassign that patch to whoever wants to contribute this.

There are definitely parts of Hive which will break when a vertex:vertex adjacency list is not enough to figure out edge type/properties (i.e. input schema assumptions), but that is a problem for the Hive community to solve if it decides to start using parallel edges in query plans.
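
To illustrate the adjacency-list concern, here is a minimal sketch, not taken from Tez or Hive. The class, field, and vertex names (ParallelEdgeSketch, EdgeInfo, Map1, Reducer2) are hypothetical. It only shows that an index keyed by the vertex pair keeps one entry when two edges connect the same vertices, while an index keyed by edge name keeps both.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParallelEdgeSketch {

    /** Hypothetical edge description; field names are illustrative, not Tez's. */
    static final class EdgeInfo {
        final String name;
        final String source;
        final String destination;
        final String movementType;

        EdgeInfo(String name, String source, String destination, String movementType) {
            this.name = name;
            this.source = source;
            this.destination = destination;
            this.movementType = movementType;
        }
    }

    /** Index keyed by the vertex pair: a parallel edge overwrites the earlier one. */
    static Map<String, EdgeInfo> indexByVertexPair(List<EdgeInfo> edges) {
        Map<String, EdgeInfo> index = new LinkedHashMap<>();
        for (EdgeInfo e : edges) {
            index.put(e.source + "->" + e.destination, e);
        }
        return index;
    }

    /** Index keyed by edge name: both parallel edges survive. */
    static Map<String, EdgeInfo> indexByEdgeName(List<EdgeInfo> edges) {
        Map<String, EdgeInfo> index = new LinkedHashMap<>();
        for (EdgeInfo e : edges) {
            index.put(e.name, e);
        }
        return index;
    }

    public static void main(String[] args) {
        // Two edges between the same pair of vertices, with different properties.
        List<EdgeInfo> edges = Arrays.asList(
                new EdgeInfo("e1", "Map1", "Reducer2", "SCATTER_GATHER"),
                new EdgeInfo("e2", "Map1", "Reducer2", "BROADCAST"));

        System.out.println(indexByVertexPair(edges).size()); // prints 1
        System.out.println(indexByEdgeName(edges).size());   // prints 2
    }
}
```

Any code that looks up edge properties by (source, destination) alone would see only the last edge registered for that pair, which is the kind of assumption that would need revisiting.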

Cheers,
Gopal 



Re: TEZ-1190 status

Posted by Hitesh Sharma <hi...@microsoft.com.INVALID>.
We looked into this further and analyzed the two approaches discussed in the JIRA. There is no clear way to accomplish multiple edges between vertices using the dummy vertex approach. I don't see the VertexManager being involved in routing; it seems to be more for updating vertex properties such as source tasks and edges. With the dummy vertex approach we would also have challenges with regard to revocations and resiliency.


The named vertex approach seems the right way to go here. We took the patch in the JIRA and applied it to our branch. After fixing bugs and making some tweaks, the prototype does work. The change, however, is pretty big, and we want to contribute it back to trunk.


Are there any concerns if we proceed further with the named edge approach?


Thanks,

Hitesh

________________________________
From: Adrian Nicoara
Sent: Wednesday, August 22, 2018 5:05:53 PM
To: dev@tez.apache.org; Hitesh Sharma; . Anupam
Subject: RE: TEZ-1190 status

As a small follow up, here are parts of the code that I stumbled upon.
Each edge whose destination vertex has 0 tasks reports that no routing of events is required:
https://github.com/apache/tez/blob/4b9a7be1b98cff00c44e7d3ffb2486bb59ca6804/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java#L263-L264

That causes the EdgeManager code to be skipped:
https://github.com/apache/tez/blob/4b9a7be1b98cff00c44e7d3ffb2486bb59ca6804/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java#L487-L527

Is there another mechanism that the VertexManager uses to take action on data movement events, other than routing through the edge manager?

> -----Original Message-----
> From: Adrian Nicoara <ad...@microsoft.com.INVALID>
> Sent: Monday, August 13, 2018 1:43 PM
> To: dev@tez.apache.org; Hitesh Sharma <hi...@microsoft.com>; . Anupam
> <an...@microsoft.com>
> Subject: RE: TEZ-1190 status
>
> > -----Original Message-----
> > From: Gopal Vijayaraghavan <go...@apache.org>
> > Sent: Friday, August 10, 2018 2:20 PM
> > To: dev@tez.apache.org; Hitesh Sharma <hi...@microsoft.com>; .
> Anupam
> > <an...@microsoft.com>
> > Subject: Re: TEZ-1190 status
> >
> > As far as I know, that patch + design has been abandoned.
> >
> > There was a discussion about adding it without fundamentally changing
> > the DAGImpl (and task recovery etc).
> >
> > The VertexManager already allows you to have a vertex without a task
> > in it, without giving up on being a participant in the data movement events.
>
> This sounds very promising. Can you point me to any tests, or sample code,
> or the implementation code I should be looking at?
>
> > However, there are no immediate takers for the new approach (which is
> > additive, so needs no significant refactoring changes) and this
> > feature did not have anyone waiting for it.
>
> If it can be done with the above suggestion, I can pick up this work, if it is OK.

RE: TEZ-1190 status

Posted by Adrian Nicoara <ad...@microsoft.com.INVALID>.
As a small follow up, here are parts of the code that I stumbled upon.
Each edge whose destination vertex has 0 tasks reports that no routing of events is required:
https://github.com/apache/tez/blob/4b9a7be1b98cff00c44e7d3ffb2486bb59ca6804/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java#L263-L264

That causes the EdgeManager code to be skipped:
https://github.com/apache/tez/blob/4b9a7be1b98cff00c44e7d3ffb2486bb59ca6804/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java#L487-L527
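
To make the concern concrete, here is a minimal sketch that models only the behaviour as described above. It is not a copy of Edge.java. All names in it (ZeroTaskRoutingSketch, SketchEdge, CountingEdgeManager, routingNeeded) are hypothetical, and the broadcast-style routing is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ZeroTaskRoutingSketch {

    /** Hypothetical stand-in for an edge manager; counts how often it is consulted. */
    static final class CountingEdgeManager {
        int routeCalls = 0;

        /** Broadcast-style routing (an assumption): every destination task is a target. */
        List<Integer> route(int sourceTaskIndex, int destinationTasks) {
            routeCalls++;
            List<Integer> targets = new ArrayList<>();
            for (int i = 0; i < destinationTasks; i++) {
                targets.add(i);
            }
            return targets;
        }
    }

    /** Hypothetical edge that skips the edge manager when the destination has no tasks. */
    static final class SketchEdge {
        private final int destinationTasks;
        private final CountingEdgeManager edgeManager;

        SketchEdge(int destinationTasks, CountingEdgeManager edgeManager) {
            this.destinationTasks = destinationTasks;
            this.edgeManager = edgeManager;
        }

        /** Mirrors the described check: no destination tasks means no routing. */
        boolean routingNeeded() {
            return destinationTasks > 0;
        }

        List<Integer> sendDataMovementEvent(int sourceTaskIndex) {
            if (!routingNeeded()) {
                // The edge manager is never called on this path.
                return Collections.emptyList();
            }
            return edgeManager.route(sourceTaskIndex, destinationTasks);
        }
    }

    public static void main(String[] args) {
        CountingEdgeManager manager = new CountingEdgeManager();

        SketchEdge toEmptyVertex = new SketchEdge(0, manager);
        System.out.println(toEmptyVertex.sendDataMovementEvent(0)); // prints []
        System.out.println(manager.routeCalls);                     // prints 0

        SketchEdge toThreeTasks = new SketchEdge(3, manager);
        System.out.println(toThreeTasks.sendDataMovementEvent(0));  // prints [0, 1, 2]
        System.out.println(manager.routeCalls);                     // prints 1
    }
}
```

In this model, a custom edge manager attached to an edge into a 0-task vertex never sees the event, which is why the question below matters.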

Is there another mechanism that the VertexManager uses to take action on data movement events, other than routing through the edge manager?

> -----Original Message-----
> From: Adrian Nicoara <ad...@microsoft.com.INVALID>
> Sent: Monday, August 13, 2018 1:43 PM
> To: dev@tez.apache.org; Hitesh Sharma <hi...@microsoft.com>; . Anupam
> <an...@microsoft.com>
> Subject: RE: TEZ-1190 status
> 
> > -----Original Message-----
> > From: Gopal Vijayaraghavan <go...@apache.org>
> > Sent: Friday, August 10, 2018 2:20 PM
> > To: dev@tez.apache.org; Hitesh Sharma <hi...@microsoft.com>; .
> Anupam
> > <an...@microsoft.com>
> > Subject: Re: TEZ-1190 status
> >
> > As far as I know, that patch + design has been abandoned.
> >
> > There was a discussion about adding it without fundamentally changing
> > the DAGImpl (and task recovery etc).
> >
> > The VertexManager already allows you to have a vertex without a task
> > in it, without giving up on being a participant in the data movement events.
> 
> This sounds very promising. Can you point me to any tests, or sample code,
> or the implementation code I should be looking at?
> 
> > However, there are no immediate takers for the new approach (which is
> > additive, so needs no significant refactoring changes) and this
> > feature did not have anyone waiting for it.
> 
> If it can be done with the above suggestion, I can pick up this work, if it is OK.

RE: TEZ-1190 status

Posted by Adrian Nicoara <ad...@microsoft.com.INVALID>.
> -----Original Message-----
> From: Gopal Vijayaraghavan <go...@apache.org>
> Sent: Friday, August 10, 2018 2:20 PM
> To: dev@tez.apache.org; Hitesh Sharma <hi...@microsoft.com>; . Anupam
> <an...@microsoft.com>
> Subject: Re: TEZ-1190 status
> 
> As far as I know, that patch + design has been abandoned.
> 
> There was a discussion about adding it without fundamentally changing the
> DAGImpl (and task recovery etc).
> 
> The VertexManager already allows you to have a vertex without a task in it,
> without giving up on being a participant in the data movement events.

This sounds very promising. Can you point me to any tests, or sample code, or the implementation code I should be looking at?

> However, there are no immediate takers for the new approach (which is
> additive, so needs no significant refactoring changes) and this feature did not
> have anyone waiting for it.

If it can be done with the above suggestion, I can pick up this work, if it is OK.

Re: TEZ-1190 status

Posted by Gopal Vijayaraghavan <go...@apache.org>.
>    I wanted to see if there has been additional work regarding:
>    https://issues.apache.org/jira/browse/TEZ-1190
>    That has not been published yet.

As far as I know, that patch + design has been abandoned.

There was a discussion about adding it without fundamentally changing the DAGImpl (and task recovery etc).

The VertexManager already allows you to have a vertex without a task in it, without giving up on being a participant in the data movement events.

However, there are no immediate takers for the new approach (which is additive, so needs no significant refactoring changes) and this feature did not have anyone waiting for it.

Cheers,
Gopal