You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by Todd Lipcon <to...@cloudera.com> on 2018/09/04 23:02:29 UTC

Adding profile info from Frontend

Hey folks,

I'm working on a patch to add some more diagnostics from the planning
process into query profiles.

Currently, only the planning "Timeline" is reported back as part of the
response to createExecRequest. As part of the fetch-on-demand catalog work
I'd like to start exposing various counters such as cache hit/miss counts,
time spent on remote calls to the catalog, etc. Even in the existing code
paths, I can see some similar metrics being useful.

My current thinking is to remove the 'timeline'  (TEventSequence) field
from TExecRequest and replace it with a full TRuntimeProfileNode. I'd then
add some capability in the Java side to fill in counters, etc, in this
structure.

Any concerns with this approach before I go down this path? Are there any
compatibility guarantees I need to uphold with the profile output of
queries?

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Adding profile info from Frontend

Posted by Todd Lipcon <to...@cloudera.com>.
I have a WIP patch up here in case anyone's interested in taking an early
look: https://gerrit.cloudera.org/c/11388/

-Todd


On Tue, Sep 4, 2018 at 5:02 PM, Todd Lipcon <to...@cloudera.com> wrote:

> On Tue, Sep 4, 2018 at 4:46 PM, Andrew Sherman <as...@cloudera.com>
> wrote:
>
>> Hi Todd,
>>
>> I'm going to put the patch up for review any minute. After I finish first
>> time setup on https://gerrit.cloudera.org
>
>
> OK, I guess that settles whose patch goes in first, then, because I
> haven't written a line of code yet :-D
>
> -Todd
>
>
>>
>>
>> -Andrew
>>
>> On Tue, Sep 4, 2018 at 4:43 PM Todd Lipcon <to...@cloudera.com> wrote:
>>
>> > On Tue, Sep 4, 2018 at 4:28 PM, Andrew Sherman <as...@cloudera.com>
>> > wrote:
>> >
>> > > Hi Todd,
>> > >
>> > > I am making a simple fix for
>> > > IMPALA-6568 <https://issues.apache.org/jira/browse/IMPALA-6568> add
>> > > missing
>> > > Query Compilation section to profiles.
>> > > which would add the Timeline to all responses to createExecRequest.
>> > >
>> > > It sounds like your change is more deep. If you go ahead with your
>> change
>> > > it  sounds like my change might be redundant.
>> > >
>> >
>> > I'm not sure if it's totally redundant. Do you have a WIP patch already?
>> > I'm not sure why the existing timeline doesn't show up in all statement
>> > types, so maybe some changes are needed related to that, and then those
>> > changes will still be necessary when exposing a full profile node?
>> >
>> > Agreed we're likely to conflict, at the least, though. Do you have an
>> > estimate of when your patch will be up for review so we can coordinate
>> > which one goes in first?
>> >
>> > -Todd
>> >
>> > >
>> > >
>> > > On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <to...@cloudera.com> wrote:
>> > >
>> > > > Hey folks,
>> > > >
>> > > > I'm working on a patch to add some more diagnostics from the
>> planning
>> > > > process into query profiles.
>> > > >
>> > > > Currently, only the planning "Timeline" is reported back as part of
>> the
>> > > > response to createExecRequest. As part of the fetch-on-demand
>> catalog
>> > > work
>> > > > I'd like to start exposing various counters such as cache hit/miss
>> > > counts,
>> > > > time spent on remote calls to the catalog, etc. Even in the existing
>> > code
>> > > > paths, I can see some similar metrics being useful.
>> > > >
>> > > > My current thinking is to remove the 'timeline'  (TEventSequence)
>> field
>> > > > from TExecRequest and replace it with a full TRuntimeProfileNode.
>> I'd
>> > > then
>> > > > add some capability in the Java side to fill in counters, etc, in
>> this
>> > > > structure.
>> > > >
>> > > > Any concerns with this approach before I go down this path? Are
>> there
>> > any
>> > > > compatibility guarantees I need to uphold with the profile output of
>> > > > queries?
>> > > >
>> > > > -Todd
>> > > > --
>> > > > Todd Lipcon
>> > > > Software Engineer, Cloudera
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>> >
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Adding profile info from Frontend

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Sep 4, 2018 at 4:46 PM, Andrew Sherman <as...@cloudera.com>
wrote:

> Hi Todd,
>
> I'm going to put the patch up for review any minute. After I finish first
> time setup on https://gerrit.cloudera.org


OK, I guess that settles whose patch goes in first, then, because I haven't
written a line of code yet :-D

-Todd


>
>
> -Andrew
>
> On Tue, Sep 4, 2018 at 4:43 PM Todd Lipcon <to...@cloudera.com> wrote:
>
> > On Tue, Sep 4, 2018 at 4:28 PM, Andrew Sherman <as...@cloudera.com>
> > wrote:
> >
> > > Hi Todd,
> > >
> > > I am making a simple fix for
> > > IMPALA-6568 <https://issues.apache.org/jira/browse/IMPALA-6568> add
> > > missing
> > > Query Compilation section to profiles.
> > > which would add the Timeline to all responses to createExecRequest.
> > >
> > > It sounds like your change is more deep. If you go ahead with your
> change
> > > it  sounds like my change might be redundant.
> > >
> >
> > I'm not sure if it's totally redundant. Do you have a WIP patch already?
> > I'm not sure why the existing timeline doesn't show up in all statement
> > types, so maybe some changes are needed related to that, and then those
> > changes will still be necessary when exposing a full profile node?
> >
> > Agreed we're likely to conflict, at the least, though. Do you have an
> > estimate of when your patch will be up for review so we can coordinate
> > which one goes in first?
> >
> > -Todd
> >
> > >
> > >
> > > On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <to...@cloudera.com> wrote:
> > >
> > > > Hey folks,
> > > >
> > > > I'm working on a patch to add some more diagnostics from the planning
> > > > process into query profiles.
> > > >
> > > > Currently, only the planning "Timeline" is reported back as part of
> the
> > > > response to createExecRequest. As part of the fetch-on-demand catalog
> > > work
> > > > I'd like to start exposing various counters such as cache hit/miss
> > > counts,
> > > > time spent on remote calls to the catalog, etc. Even in the existing
> > code
> > > > paths, I can see some similar metrics being useful.
> > > >
> > > > My current thinking is to remove the 'timeline'  (TEventSequence)
> field
> > > > from TExecRequest and replace it with a full TRuntimeProfileNode. I'd
> > > then
> > > > add some capability in the Java side to fill in counters, etc, in
> this
> > > > structure.
> > > >
> > > > Any concerns with this approach before I go down this path? Are there
> > any
> > > > compatibility guarantees I need to uphold with the profile output of
> > > > queries?
> > > >
> > > > -Todd
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Adding profile info from Frontend

Posted by Andrew Sherman <as...@cloudera.com>.
Hi Todd,

I'm going to put the patch up for review any minute. After I finish first
time setup on https://gerrit.cloudera.org

-Andrew

On Tue, Sep 4, 2018 at 4:43 PM Todd Lipcon <to...@cloudera.com> wrote:

> On Tue, Sep 4, 2018 at 4:28 PM, Andrew Sherman <as...@cloudera.com>
> wrote:
>
> > Hi Todd,
> >
> > I am making a simple fix for
> > IMPALA-6568 <https://issues.apache.org/jira/browse/IMPALA-6568> add
> > missing
> > Query Compilation section to profiles.
> > which would add the Timeline to all responses to createExecRequest.
> >
> > It sounds like your change is more deep. If you go ahead with your change
> > it  sounds like my change might be redundant.
> >
>
> I'm not sure if it's totally redundant. Do you have a WIP patch already?
> I'm not sure why the existing timeline doesn't show up in all statement
> types, so maybe some changes are needed related to that, and then those
> changes will still be necessary when exposing a full profile node?
>
> Agreed we're likely to conflict, at the least, though. Do you have an
> estimate of when your patch will be up for review so we can coordinate
> which one goes in first?
>
> -Todd
>
> >
> >
> > On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <to...@cloudera.com> wrote:
> >
> > > Hey folks,
> > >
> > > I'm working on a patch to add some more diagnostics from the planning
> > > process into query profiles.
> > >
> > > Currently, only the planning "Timeline" is reported back as part of the
> > > response to createExecRequest. As part of the fetch-on-demand catalog
> > work
> > > I'd like to start exposing various counters such as cache hit/miss
> > counts,
> > > time spent on remote calls to the catalog, etc. Even in the existing
> code
> > > paths, I can see some similar metrics being useful.
> > >
> > > My current thinking is to remove the 'timeline'  (TEventSequence) field
> > > from TExecRequest and replace it with a full TRuntimeProfileNode. I'd
> > then
> > > add some capability in the Java side to fill in counters, etc, in this
> > > structure.
> > >
> > > Any concerns with this approach before I go down this path? Are there
> any
> > > compatibility guarantees I need to uphold with the profile output of
> > > queries?
> > >
> > > -Todd
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Adding profile info from Frontend

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Sep 4, 2018 at 4:28 PM, Andrew Sherman <as...@cloudera.com>
wrote:

> Hi Todd,
>
> I am making a simple fix for
> IMPALA-6568 <https://issues.apache.org/jira/browse/IMPALA-6568> add
> missing
> Query Compilation section to profiles.
> which would add the Timeline to all responses to createExecRequest.
>
> It sounds like your change is more deep. If you go ahead with your change
> it  sounds like my change might be redundant.
>

I'm not sure if it's totally redundant. Do you have a WIP patch already?
I'm not sure why the existing timeline doesn't show up in all statement
types, so maybe some changes are needed related to that, and then those
changes will still be necessary when exposing a full profile node?

Agreed we're likely to conflict, at the least, though. Do you have an
estimate of when your patch will be up for review so we can coordinate
which one goes in first?

-Todd

>
>
> On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <to...@cloudera.com> wrote:
>
> > Hey folks,
> >
> > I'm working on a patch to add some more diagnostics from the planning
> > process into query profiles.
> >
> > Currently, only the planning "Timeline" is reported back as part of the
> > response to createExecRequest. As part of the fetch-on-demand catalog
> work
> > I'd like to start exposing various counters such as cache hit/miss
> counts,
> > time spent on remote calls to the catalog, etc. Even in the existing code
> > paths, I can see some similar metrics being useful.
> >
> > My current thinking is to remove the 'timeline'  (TEventSequence) field
> > from TExecRequest and replace it with a full TRuntimeProfileNode. I'd
> then
> > add some capability in the Java side to fill in counters, etc, in this
> > structure.
> >
> > Any concerns with this approach before I go down this path? Are there any
> > compatibility guarantees I need to uphold with the profile output of
> > queries?
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Adding profile info from Frontend

Posted by Andrew Sherman <as...@cloudera.com>.
Hi Todd,

I am making a simple fix for
IMPALA-6568 <https://issues.apache.org/jira/browse/IMPALA-6568> add missing
Query Compilation section to profiles.
which would add the Timeline to all responses to createExecRequest.

It sounds like your change is more deep. If you go ahead with your change
it  sounds like my change might be redundant.

Thanks

-Andrew




On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <to...@cloudera.com> wrote:

> Hey folks,
>
> I'm working on a patch to add some more diagnostics from the planning
> process into query profiles.
>
> Currently, only the planning "Timeline" is reported back as part of the
> response to createExecRequest. As part of the fetch-on-demand catalog work
> I'd like to start exposing various counters such as cache hit/miss counts,
> time spent on remote calls to the catalog, etc. Even in the existing code
> paths, I can see some similar metrics being useful.
>
> My current thinking is to remove the 'timeline'  (TEventSequence) field
> from TExecRequest and replace it with a full TRuntimeProfileNode. I'd then
> add some capability in the Java side to fill in counters, etc, in this
> structure.
>
> Any concerns with this approach before I go down this path? Are there any
> compatibility guarantees I need to uphold with the profile output of
> queries?
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Adding profile info from Frontend

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Sep 4, 2018 at 4:19 PM, Philip Zeyliger <ph...@cloudera.com> wrote:

> I'm certainly aware of tools that read TEventSequence and expect it to have
> certain structures. If we change the profiles significantly, we should
> insert a version sigil in there so that a client could at least figure out
> that they're looking at something new.
>

OK, I'll see if I can keep the existing timeline in the same spot in the
resulting profile (potentially with additional fields in that same profile
node)


>
> A bigger modeling conversation is the use of string keys everywhere. For
> the timeline, for example, it's a list of (description, time) pairs, with
> the descriptions being strings. An alternative would have been to put the
> strings in Thrift itself. The latter, I think, makes tooling easier to
> write (since you're calling "get_query_started_time()" instead of "for
> description, time in timelines: if description == 'query started': return
> time". I recognize working with Thrift is kind of a pain, but I want to
> throw the thought out there if we're adding things to profiles.
>

Agreed -- or a set of well-known fields defined as a thrift enum instead of
relying on strings.. but I dont want to bite off too much in one go here :)


>
> -- Philip
>
>
> On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <to...@cloudera.com> wrote:
>
> > Hey folks,
> >
> > I'm working on a patch to add some more diagnostics from the planning
> > process into query profiles.
> >
> > Currently, only the planning "Timeline" is reported back as part of the
> > response to createExecRequest. As part of the fetch-on-demand catalog
> work
> > I'd like to start exposing various counters such as cache hit/miss
> counts,
> > time spent on remote calls to the catalog, etc. Even in the existing code
> > paths, I can see some similar metrics being useful.
> >
> > My current thinking is to remove the 'timeline'  (TEventSequence) field
> > from TExecRequest and replace it with a full TRuntimeProfileNode. I'd
> then
> > add some capability in the Java side to fill in counters, etc, in this
> > structure.
> >
> > Any concerns with this approach before I go down this path? Are there any
> > compatibility guarantees I need to uphold with the profile output of
> > queries?
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Adding profile info from Frontend

Posted by Philip Zeyliger <ph...@cloudera.com>.
I'm certainly aware of tools that read TEventSequence and expect it to have
certain structures. If we change the profiles significantly, we should
insert a version sigil in there so that a client could at least figure out
that they're looking at something new.

A bigger modeling conversation is the use of string keys everywhere. For
the timeline, for example, it's a list of (description, time) pairs, with
the descriptions being strings. An alternative would have been to put the
strings in Thrift itself. The latter, I think, makes tooling easier to
write (since you're calling "get_query_started_time()" instead of "for
description, time in timelines: if description == 'query started': return
time". I recognize working with Thrift is kind of a pain, but I want to
throw the thought out there if we're adding things to profiles.

-- Philip


On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <to...@cloudera.com> wrote:

> Hey folks,
>
> I'm working on a patch to add some more diagnostics from the planning
> process into query profiles.
>
> Currently, only the planning "Timeline" is reported back as part of the
> response to createExecRequest. As part of the fetch-on-demand catalog work
> I'd like to start exposing various counters such as cache hit/miss counts,
> time spent on remote calls to the catalog, etc. Even in the existing code
> paths, I can see some similar metrics being useful.
>
> My current thinking is to remove the 'timeline'  (TEventSequence) field
> from TExecRequest and replace it with a full TRuntimeProfileNode. I'd then
> add some capability in the Java side to fill in counters, etc, in this
> structure.
>
> Any concerns with this approach before I go down this path? Are there any
> compatibility guarantees I need to uphold with the profile output of
> queries?
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>