You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2016/02/21 20:50:18 UTC

[jira] [Comment Edited] (PHOENIX-2702) Show estimate rows and bytes touched for explain plan.

    [ https://issues.apache.org/jira/browse/PHOENIX-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156191#comment-15156191 ] 

Lars Hofhansl edited comment on PHOENIX-2702 at 2/21/16 7:49 PM:
-----------------------------------------------------------------

Yeah I was looking at the plan output and code, when doing the SERIAL hint patch. I've also been thinking about how we could report size and row count for a tenant view, then realized we can get good estimates just through a bit more output on EXPLAIN :)

(the smaller the guideposts the better these estimates, and hence some additional need to separate parallelism from the guidepost count)

Generally I'm a fan of outputting plans in a way that can be machine processed. Either have an API, or output JSON or something (or both).

Re: calcite. We should sit down and come up with relative cost estimates per row (CPU) and per block (IO, which is way different for HDD vs SSD). In Phoenix we also need attach a cost to work done on the client and the server. All these costs factors should be configurable. Along with the actual estimates of rows/bytes touched this would be cool. (and of course we need "traditional" stats like selectivity, etc, etc).

Do you think I should hold off on a quick fix here which would simply a bit more text to the explain output (perhaps optionally), until we have the bigger questions answered?



was (Author: lhofhansl):
Yeah I was looking at the plan output and code, when doing the SERIAL hint patch. I've also been thinking about how we could report size and row count for a tenant view, then realized we can get good estimates just through a bit more output on EXPLAIN :)

Generally I'm a fan of outputting plans in a way that can be machine processed. Either have an API, or output JSON or something (or both).

Re: calcite. We should sit down and come up with relative cost estimates per row (CPU) and per block (IO, which is way different for HDD vs SSD). In Phoenix we also need attach a cost to work done on the client and the server. All these costs factors should be configurable. Along with the actual estimates of rows/bytes touched this would be cool. (and of course we need "traditional" stats like selectivity, etc, etc).

Do you think I should hold off on a quick fix here which would simply a bit more text to the explain output (perhaps optionally), until we have the bigger questions answered?


> Show estimate rows and bytes touched for explain plan.
> ------------------------------------------------------
>
>                 Key: PHOENIX-2702
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2702
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> We can already estimate the size of a table (both rows and uncompressed bytes) with q query like this:
> {code}
> SELECT physical_name AS table_name, SUM(guide_posts_row_count) AS est_rows, SUM(guilde_posts_width) AS est_size from SYSTEM.STATS GROUP BY physical_name;
> {code}
> During the planning phase we have more information, though. So we can report the actual numbers for a query during an explain since we have that info there anyway (we filtered the guidepost already with the key info provided in the query).
> I might whip up a quick patch for this.
> (Could also go further and add a est_count, est_size UDF for this, but that would be a bit harder to get hooked up at the right places, I think, and the meaning would be ambiguous)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)