Posted to dev@tez.apache.org by Xuan Cao <ca...@yahoo.com.INVALID> on 2018/03/06 22:22:15 UTC

Tez statistics model

Hi,

 

This is Xuan with Microsoft. We are looking at the Tez statistics model and trying to extend it to support our custom workloads. Here are some observations about the few statistics objects in the current codebase:

1.   InputStatistics, OutputStatistics and VertexStatistics are interfaces in tez-api.

2.   TaskStatistics is a class in tez-runtime-internals.

3.   VertexStatisticsImpl is an inner class of VertexImpl implementing VertexStatistics in tez-dag.

4.   IOStatistics is a class in tez-runtime-internals and IOStatisticsImpl is a static inner class of VertexImpl extending IOStatistics but implementing InputStatistics and OutputStatistics.
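
The relationships listed above can be sketched roughly as follows. This is a simplified illustration only: the type names come from the list, but every method name and field here (for example add, addIO, addTaskStatistics, getStatistics) is an assumption made for the sketch and is not guaranteed to match the actual Tez signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the types named in the list; all members are
// illustrative assumptions, not the real Tez API.
final class StatisticsModelSketch {

    // tez-api: read-only views handed to applications.
    interface InputStatistics {
        long getDataSize();
        long getItemsProcessed();
    }

    interface OutputStatistics {
        long getDataSize();
        long getItemsProcessed();
    }

    interface VertexStatistics {
        InputStatistics getInputStatistics(String inputName);
        OutputStatistics getOutputStatistics(String outputName);
    }

    // tez-runtime-internals: mutable counters for one input or output.
    static class IOStatistics {
        private long dataSize;
        private long itemsProcessed;

        void add(long size, long items) {
            dataSize += size;
            itemsProcessed += items;
        }

        public long getDataSize() { return dataSize; }
        public long getItemsProcessed() { return itemsProcessed; }
    }

    // tez-runtime-internals: per-task container keyed by input/output name.
    static class TaskStatistics {
        private final Map<String, IOStatistics> ioStatistics = new HashMap<>();

        void addIO(String name, IOStatistics stats) { ioStatistics.put(name, stats); }
        Map<String, IOStatistics> getIOStatistics() { return ioStatistics; }
    }

    // tez-dag: both implementation classes are nested in VertexImpl.
    static class VertexImpl {

        // Static nested class: extends the internal class, yet also
        // implements both public interfaces from tez-api.
        static class IOStatisticsImpl extends IOStatistics
                implements InputStatistics, OutputStatistics { }

        // Inner (non-static) class implementing the public vertex-level view.
        class VertexStatisticsImpl implements VertexStatistics {
            private final Map<String, IOStatisticsImpl> ioStats = new HashMap<>();

            // Folds one task's counters into the vertex-level totals.
            void addTaskStatistics(TaskStatistics task) {
                for (Map.Entry<String, IOStatistics> e : task.getIOStatistics().entrySet()) {
                    ioStats.computeIfAbsent(e.getKey(), k -> new IOStatisticsImpl())
                           .add(e.getValue().getDataSize(), e.getValue().getItemsProcessed());
                }
            }

            @Override
            public InputStatistics getInputStatistics(String name) { return ioStats.get(name); }

            @Override
            public OutputStatistics getOutputStatistics(String name) { return ioStats.get(name); }
        }

        private final VertexStatisticsImpl statistics = new VertexStatisticsImpl();

        VertexStatisticsImpl getStatistics() { return statistics; }
    }
}
```

Note how the same IOStatisticsImpl object serves as both the input and the output view, and how the implementation classes are reachable only through VertexImpl, which is what makes them hard to extend from outside.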

 

Our intention is to extend VertexStatisticsImpl so that we can aggregate the custom payload in TaskStatistics. But the object model here seems rather confusing and inconsistent. We are planning to do some work in this area to improve it, but we are not sure whether any related work is already in progress and, if so, what its status is.
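
As a rough illustration of the kind of aggregation meant here, the sketch below shows one possible shape. It is hypothetical throughout: the generic TaskStatistics<P> stand-in, the VertexPayloadAggregator class, and the application-supplied merge function are assumptions for the example and do not exist in Tez in this form.

```java
import java.util.Optional;
import java.util.function.BinaryOperator;

// Hypothetical sketch of aggregating an application-defined payload
// from task-level statistics into a vertex-level value.
final class CustomPayloadAggregationSketch {

    // Stand-in for a task-level statistics object that carries a custom
    // payload of type P (the real TaskStatistics class is not generic).
    static final class TaskStatistics<P> {
        private final P customPayload;

        TaskStatistics(P customPayload) { this.customPayload = customPayload; }

        P getCustomPayload() { return customPayload; }
    }

    // Vertex-level aggregator: the application supplies the merge function,
    // so the framework does not need to understand the payload type.
    static final class VertexPayloadAggregator<P> {
        private final BinaryOperator<P> merger;
        private P aggregate; // stays null until the first payload arrives

        VertexPayloadAggregator(BinaryOperator<P> merger) { this.merger = merger; }

        // Called once per reporting task; tasks without a payload are skipped.
        void onTaskStatistics(TaskStatistics<P> stats) {
            P payload = stats.getCustomPayload();
            if (payload == null) {
                return;
            }
            aggregate = (aggregate == null) ? payload : merger.apply(aggregate, payload);
        }

        Optional<P> getAggregate() { return Optional.ofNullable(aggregate); }
    }
}
```

Keeping the merge logic in an application-supplied function is one way to let Pig, Hive, or other clients define their own payloads without changes to the vertex implementation.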

 

Regards,

Xuan Cao




Re: Tez statistics model

Posted by Jonathan Eagles <je...@gmail.com>.
It sounds like there is interest in this feature from others in the
community.

Xuan Cao, if you are willing to open a TEZ JIRA request with a description
of the features, we can start discussing requirements and solution ideas at
that time.

Regards,
jeagles

On Thu, Mar 8, 2018 at 11:55 PM, Eric Wohlstadter <wo...@cs.ubc.ca>
wrote:

> Hi Xuan, Jonathan,
>
> We have some use cases that could really benefit from the effort to
> "solidify and enhance the Tez Statistics API so that applications
> (pig/hive/scope/etc) can provide their own custom implementations".
>
> I'd be happy to contribute in some way.
>
> I think it would be a good idea to leverage standard libraries
> like Dropwizard Metrics: http://metrics.dropwizard.io/4.0.0/
>
> Dropwizard, for example, has a good number of off-the-shelf
> integrations: http://metrics.dropwizard.io/4.0.0/manual/third-party.html
>

Re: Tez statistics model

Posted by Eric Wohlstadter <wo...@cs.ubc.ca>.
Hi Xuan, Jonathan,

We have some use cases that could really benefit from the effort to
"solidify and enhance the Tez Statistics API so that applications
(pig/hive/scope/etc) can provide their own custom implementations".

I'd be happy to contribute in some way.

I think it would be a good idea to leverage standard libraries
like Dropwizard Metrics: http://metrics.dropwizard.io/4.0.0/

Dropwizard, for example, has a good number of off-the-shelf
integrations: http://metrics.dropwizard.io/4.0.0/manual/third-party.html

On Thu, Mar 8, 2018 at 1:14 PM, Jonathan Eagles <je...@gmail.com> wrote:

> Thanks for reaching out to us, Xuan Cao.
>
> Sounds like you are planning to use new custom statistics that a new custom
> Vertex can utilize to make better decisions. Are you planning to contribute
> these custom statistics changes to Tez? Or are we trying to solidify
> and enhance the Tez Statistics API so that applications
> (pig/hive/scope/etc) can provide their own custom implementations? This
> sounds like a great addition if I understand correctly. There are some
> similar requests (TEZ-1167, TEZ-764), so we can try to find a way to
> implement this that benefits the whole community.
>
> Regards,
> jeagles
>

Re: Tez statistics model

Posted by Jonathan Eagles <je...@gmail.com>.
Thanks for reaching out to us, Xuan Cao.

Sounds like you are planning to use new custom statistics that a new custom
Vertex can utilize to make better decisions. Are you planning to contribute
these custom statistics changes to Tez? Or are we trying to solidify
and enhance the Tez Statistics API so that applications
(pig/hive/scope/etc) can provide their own custom implementations? This
sounds like a great addition if I understand correctly. There are some
similar requests (TEZ-1167, TEZ-764), so we can try to find a way to
implement this that benefits the whole community.

Regards,
jeagles

On Tue, Mar 6, 2018 at 4:22 PM, Xuan Cao <ca...@yahoo.com.invalid> wrote:

>
> Hi,
>
>
>
> This is Xuan with Microsoft. We are looking at the Tez statistics model
> and trying to extend it to support our custom workloads. Here are some
> observations about the few statistics objects in the current codebase:
>
> 1.   InputStatistics, OutputStatistics and VertexStatistics are interfaces
> in tez-api.
>
> 2.   TaskStatistics is a class in tez-runtime-internals.
>
> 3.   VertexStatisticsImpl is an inner class of VertexImpl implementing
> VertexStatistics in tez-dag.
>
> 4.   IOStatistics is a class in tez-runtime-internals and
> IOStatisticsImpl is a static inner class of VertexImpl extending
> IOStatistics but implementing InputStatistics and OutputStatistics.
>
>
>
> Our intention is to extend VertexStatisticsImpl so that we can aggregate
> the custom payload in TaskStatistics. But the object model here seems
> rather confusing and inconsistent. We are planning to do some work in this
> area to improve it, but we are not sure whether any related work is already
> in progress and, if so, what its status is.
>
>
>
> Regards,
>
> Xuan Cao
>
>
>
>