You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by Bernard Quizon <be...@stellarloyalty.com> on 2016/09/26 06:36:30 UTC

Comprehensive Guide for EXPLAIN

Hi,


I know explain statement results are relatively easy to understand but is
there a comprehensive guide for it?


0: jdbc:phoenix:localhost> explain select count(1) from test.table where
insert_ts > now() - 30;

*+------------------------------------------+*

*| **                  PLAN                  ** |*

*+------------------------------------------+*

*| *CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER TEST.TABLE [0,'2016-08-27
06:27:17.719'] - [0,*]* |*

*| *    SERVER FILTER BY FIRST KEY ONLY     * |*

*| *    SERVER AGGREGATE INTO SINGLE ROW    * |*

*+------------------------------------------+ *



Like for example in the query above from what I understand, it fetches
everything from the server then applies the key range on client side.

But I’m thinking isn’t it less expensive if key range was already applied
on the server so less data will be transmitted, thus saving network and
memory overhead.

Or is my understanding wrong?


Thanks

Re: Comprehensive Guide for EXPLAIN

Posted by James Taylor <ja...@apache.org>.
Yes, the EXPLAIN plan documentation is a section in the Tuning Guide that
would be super valuable to fill in. It'd be much appreciated if you could
move that forward.

Thanks,
James

On Thu, Sep 29, 2016 at 10:52 AM, Peter Conrad <pc...@salesforce.com>
wrote:

> James:
>
> Should I work on putting together documentation on how to read an EXPLAIN
> plan? If so, what's the best way for me to get more information about it?
>
> Thanks,
> Peter
>
>
>
> Peter Conrad
>
> Staff Technical Writer: Infrastructure | salesforce.com
>
> Office: (415) 471-5265
>
>
> [image: http://www.salesforce.com/signature]
> <http://www.salesforce.com/signature>
>
> --
>
>
> On Mon, Sep 26, 2016 at 7:37 AM, James Taylor <ja...@apache.org>
> wrote:
>
> > Hi Bernard,
> > No, unfortunately there's no documentation for our explain plan
> > (PHOENIX-1481).
> >
> > In your example, it's not fetching everything from the server. If it
> > fetches everything from the server it would say "FULL TABLE SCAN". In
> this
> > case, it's doing a "RANGE SCAN" between these two key ranges:
> >
> >     [0,'2016-08-27 06:27:17.719'] - [0,*]
> >
> > I suspect your TEST.TABLE is salted, as the [0] above would represent the
> > salt byte. Maybe you're using a slightly older version of Phoenix too,
> > because AFAIR we show the salt bucket differently in later version. The
> > date is based on now() - 30. So the range is everything greater than that
> > date.
> >
> > The 1-CHUNK PARALLEL 1-WAY tells you how much parallelization the client
> is
> > doing and over how many scans a merge sort is being done over. Your table
> > must be only one region, or the
> >
> > Since you're only selecting columns contained in the row key, Phoenix is
> > pushing a FirstKeyOnlyFilter, that's what this represents:
> >
> >     SERVER FILTER BY FIRST KEY ONLY
> >
> > Then, since you're doing an ungrouped aggregation, this tells you that
> the
> > aggregation is happening on the server side:
> >
> >     SERVER AGGREGATE INTO SINGLE ROW
> >
> > Assuming your table is salted, the scan will be run on each salt bucket
> and
> > then the one row we get back from these scans will be summed together to
> > calculate the final COUNT(1).
> >
> > HTH. Thanks,
> >
> >     James
> >
> > On Sun, Sep 25, 2016 at 11:36 PM, Bernard Quizon <bernard.quizon@
> > stellarloyalty.com> wrote:
> >
> > > Hi,
> > >
> > >
> > > I know explain statement results are relatively easy to understand but
> is
> > > there a comprehensive guide for it?
> > >
> > >
> > > 0: jdbc:phoenix:localhost> explain select count(1) from test.table
> where
> > > insert_ts > now() - 30;
> > >
> > > *+------------------------------------------+*
> > >
> > > *| **                  PLAN                  ** |*
> > >
> > > *+------------------------------------------+*
> > >
> > > *| *CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER TEST.TABLE
> > [0,'2016-08-27
> > > 06:27:17.719'] - [0,*]* |*
> > >
> > > *| *    SERVER FILTER BY FIRST KEY ONLY     * |*
> > >
> > > *| *    SERVER AGGREGATE INTO SINGLE ROW    * |*
> > >
> > > *+------------------------------------------+ *
> > >
> > >
> > >
> > > Like for example in the query above from what I understand, it fetches
> > > everything from the server then applies the key range on client side.
> > >
> > > But I’m thinking isn’t it less expensive if key range was already
> applied
> > > on the server so less data will be transmitted, thus saving network and
> > > memory overhead.
> > >
> > > Or is my understanding wrong?
> > >
> > >
> > > Thanks
> > >
> >
>

Re: Comprehensive Guide for EXPLAIN

Posted by Peter Conrad <pc...@salesforce.com>.
James:

Should I work on putting together documentation on how to read an EXPLAIN
plan? If so, what's the best way for me to get more information about it?

Thanks,
Peter



Peter Conrad

Staff Technical Writer: Infrastructure | salesforce.com

Office: (415) 471-5265


[image: http://www.salesforce.com/signature]
<http://www.salesforce.com/signature>

--


On Mon, Sep 26, 2016 at 7:37 AM, James Taylor <ja...@apache.org>
wrote:

> Hi Bernard,
> No, unfortunately there's no documentation for our explain plan
> (PHOENIX-1481).
>
> In your example, it's not fetching everything from the server. If it
> fetches everything from the server it would say "FULL TABLE SCAN". In this
> case, it's doing a "RANGE SCAN" between these two key ranges:
>
>     [0,'2016-08-27 06:27:17.719'] - [0,*]
>
> I suspect your TEST.TABLE is salted, as the [0] above would represent the
> salt byte. Maybe you're using a slightly older version of Phoenix too,
> because AFAIR we show the salt bucket differently in later version. The
> date is based on now() - 30. So the range is everything greater than that
> date.
>
> The 1-CHUNK PARALLEL 1-WAY tells you how much parallelization the client is
> doing and over how many scans a merge sort is being done over. Your table
> must be only one region, or the
>
> Since you're only selecting columns contained in the row key, Phoenix is
> pushing a FirstKeyOnlyFilter, that's what this represents:
>
>     SERVER FILTER BY FIRST KEY ONLY
>
> Then, since you're doing an ungrouped aggregation, this tells you that the
> aggregation is happening on the server side:
>
>     SERVER AGGREGATE INTO SINGLE ROW
>
> Assuming your table is salted, the scan will be run on each salt bucket and
> then the one row we get back from these scans will be summed together to
> calculate the final COUNT(1).
>
> HTH. Thanks,
>
>     James
>
> On Sun, Sep 25, 2016 at 11:36 PM, Bernard Quizon <bernard.quizon@
> stellarloyalty.com> wrote:
>
> > Hi,
> >
> >
> > I know explain statement results are relatively easy to understand but is
> > there a comprehensive guide for it?
> >
> >
> > 0: jdbc:phoenix:localhost> explain select count(1) from test.table where
> > insert_ts > now() - 30;
> >
> > *+------------------------------------------+*
> >
> > *| **                  PLAN                  ** |*
> >
> > *+------------------------------------------+*
> >
> > *| *CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER TEST.TABLE
> [0,'2016-08-27
> > 06:27:17.719'] - [0,*]* |*
> >
> > *| *    SERVER FILTER BY FIRST KEY ONLY     * |*
> >
> > *| *    SERVER AGGREGATE INTO SINGLE ROW    * |*
> >
> > *+------------------------------------------+ *
> >
> >
> >
> > Like for example in the query above from what I understand, it fetches
> > everything from the server then applies the key range on client side.
> >
> > But I’m thinking isn’t it less expensive if key range was already applied
> > on the server so less data will be transmitted, thus saving network and
> > memory overhead.
> >
> > Or is my understanding wrong?
> >
> >
> > Thanks
> >
>

Re: Comprehensive Guide for EXPLAIN

Posted by James Taylor <ja...@apache.org>.
Hi Bernard,
No, unfortunately there's no documentation for our explain plan
(PHOENIX-1481).

In your example, it's not fetching everything from the server. If it
fetches everything from the server it would say "FULL TABLE SCAN". In this
case, it's doing a "RANGE SCAN" between these two key ranges:

    [0,'2016-08-27 06:27:17.719'] - [0,*]

I suspect your TEST.TABLE is salted, as the [0] above would represent the
salt byte. Maybe you're using a slightly older version of Phoenix too,
because AFAIR we show the salt bucket differently in later version. The
date is based on now() - 30. So the range is everything greater than that
date.

The 1-CHUNK PARALLEL 1-WAY tells you how much parallelization the client is
doing and over how many scans a merge sort is being done over. Your table
must be only one region, or the

Since you're only selecting columns contained in the row key, Phoenix is
pushing a FirstKeyOnlyFilter, that's what this represents:

    SERVER FILTER BY FIRST KEY ONLY

Then, since you're doing an ungrouped aggregation, this tells you that the
aggregation is happening on the server side:

    SERVER AGGREGATE INTO SINGLE ROW

Assuming your table is salted, the scan will be run on each salt bucket and
then the one row we get back from these scans will be summed together to
calculate the final COUNT(1).

HTH. Thanks,

    James

On Sun, Sep 25, 2016 at 11:36 PM, Bernard Quizon <bernard.quizon@
stellarloyalty.com> wrote:

> Hi,
>
>
> I know explain statement results are relatively easy to understand but is
> there a comprehensive guide for it?
>
>
> 0: jdbc:phoenix:localhost> explain select count(1) from test.table where
> insert_ts > now() - 30;
>
> *+------------------------------------------+*
>
> *| **                  PLAN                  ** |*
>
> *+------------------------------------------+*
>
> *| *CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER TEST.TABLE [0,'2016-08-27
> 06:27:17.719'] - [0,*]* |*
>
> *| *    SERVER FILTER BY FIRST KEY ONLY     * |*
>
> *| *    SERVER AGGREGATE INTO SINGLE ROW    * |*
>
> *+------------------------------------------+ *
>
>
>
> Like for example in the query above from what I understand, it fetches
> everything from the server then applies the key range on client side.
>
> But I’m thinking isn’t it less expensive if key range was already applied
> on the server so less data will be transmitted, thus saving network and
> memory overhead.
>
> Or is my understanding wrong?
>
>
> Thanks
>