You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Alok Singh <al...@cloudability.com> on 2015/10/03 02:35:59 UTC

Estimating the "cost" of a query

Is there a way to figure out how many rows/cells were scanned in
hbase perform a phoenix query? I tried using the explain command, but, it
is not clear how to estimate the number of rows touched by looking at the
explain plan. Essentially, I want to be able to report back to users the
"cost" of performing a phoenix query, where "cost" is some function of
rows/cells scanned.

Alok

alok@cloudability.com

Re: Estimating the "cost" of a query

Posted by Alok Singh <al...@cloudability.com>.
https://issues.apache.org/jira/browse/PHOENIX-2306

Alok

alok@cloudability.com

On Sat, Oct 3, 2015 at 12:37 PM, Alok Singh <al...@cloudability.com> wrote:

> Thanks James.
>
> I will create a jira ticket for exposing the APIs in a standard way.
>
> Alok
>
> Alok
>
> alok@cloudability.com
>
> On Fri, Oct 2, 2015 at 6:20 PM, James Taylor <ja...@apache.org>
> wrote:
>
>> Hi Alok,
>> Yes, you could calculate an estimate for this information, but it isn't
>> currently exposed through JDBC or through the explain plan (which would be
>> a good place for it to live). You'd need to dip down to the implementation
>> to get it. Something like this:
>>
>> PhoenixStatement statement =
>> connection.createStatement().unwrap(PhoenixStatement.class);
>> ResultSet rs = statement.executeQuery("EXPLAIN SELECT ...");
>> QueryPlan plan = statement.getQueryPlan();
>> List<KeyRange> ranges = plan.getSplits();
>>
>> Each KeyRange in ranges will be going over a configurable amount of bytes
>> (determined by phoenix.stats.guidepost.width
>> and/or phoenix.stats.guidepost.per.region), so a simple worst case estimate
>> would be to multiply the ranges.size() by this config value (using a
>> default of QueryServicesOptions.DEFAULT_STATS_GUIDEPOST_WIDTH_BYTES or
>> 300MB). If the query is a point lookup (which you can check with
>> plan.getContext().getScanRanges().isPointLookup()), then the cost would be
>> ranges.size() * average_row_size.
>>
>> Since these aren't exposed APIs, they're subject to change. Please file a
>> JIRA if you're interested in helping figure out what the "official" APIs
>> for this should be.
>>
>> HTH. Thanks,
>>
>>     James
>>
>> On Fri, Oct 2, 2015 at 5:35 PM, Alok Singh <al...@cloudability.com> wrote:
>>
>>> Is there a way to figure out how many rows/cells were scanned in
>>> hbase perform a phoenix query? I tried using the explain command, but, it
>>> is not clear how to estimate the number of rows touched by looking at the
>>> explain plan. Essentially, I want to be able to report back to users the
>>> "cost" of performing a phoenix query, where "cost" is some function of
>>> rows/cells scanned.
>>>
>>> Alok
>>>
>>> alok@cloudability.com
>>>
>>
>>
>

Re: Estimating the "cost" of a query

Posted by Alok Singh <al...@cloudability.com>.
Thanks James.

I will create a jira ticket for exposing the APIs in a standard way.

Alok

Alok

alok@cloudability.com

On Fri, Oct 2, 2015 at 6:20 PM, James Taylor <ja...@apache.org> wrote:

> Hi Alok,
> Yes, you could calculate an estimate for this information, but it isn't
> currently exposed through JDBC or through the explain plan (which would be
> a good place for it to live). You'd need to dip down to the implementation
> to get it. Something like this:
>
> PhoenixStatement statement =
> connection.createStatement().unwrap(PhoenixStatement.class);
> ResultSet rs = statement.executeQuery("EXPLAIN SELECT ...");
> QueryPlan plan = statement.getQueryPlan();
> List<KeyRange> ranges = plan.getSplits();
>
> Each KeyRange in ranges will be going over a configurable amount of bytes
> (determined by phoenix.stats.guidepost.width
> and/or phoenix.stats.guidepost.per.region), so a simple worst case estimate
> would be to multiply the ranges.size() by this config value (using a
> default of QueryServicesOptions.DEFAULT_STATS_GUIDEPOST_WIDTH_BYTES or
> 300MB). If the query is a point lookup (which you can check with
> plan.getContext().getScanRanges().isPointLookup()), then the cost would be
> ranges.size() * average_row_size.
>
> Since these aren't exposed APIs, they're subject to change. Please file a
> JIRA if you're interested in helping figure out what the "official" APIs
> for this should be.
>
> HTH. Thanks,
>
>     James
>
> On Fri, Oct 2, 2015 at 5:35 PM, Alok Singh <al...@cloudability.com> wrote:
>
>> Is there a way to figure out how many rows/cells were scanned in
>> hbase perform a phoenix query? I tried using the explain command, but, it
>> is not clear how to estimate the number of rows touched by looking at the
>> explain plan. Essentially, I want to be able to report back to users the
>> "cost" of performing a phoenix query, where "cost" is some function of
>> rows/cells scanned.
>>
>> Alok
>>
>> alok@cloudability.com
>>
>
>

Re: Estimating the "cost" of a query

Posted by James Taylor <ja...@apache.org>.
Hi Alok,
Yes, you could calculate an estimate for this information, but it isn't
currently exposed through JDBC or through the explain plan (which would be
a good place for it to live). You'd need to dip down to the implementation
to get it. Something like this:

PhoenixStatement statement =
connection.createStatement().unwrap(PhoenixStatement.class);
ResultSet rs = statement.executeQuery("EXPLAIN SELECT ...");
QueryPlan plan = statement.getQueryPlan();
List<KeyRange> ranges = plan.getSplits();

Each KeyRange in ranges will be going over a configurable amount of bytes
(determined by phoenix.stats.guidepost.width
and/or phoenix.stats.guidepost.per.region), so a simple worst case estimate
would be to multiply the ranges.size() by this config value (using a
default of QueryServicesOptions.DEFAULT_STATS_GUIDEPOST_WIDTH_BYTES or
300MB). If the query is a point lookup (which you can check with
plan.getContext().getScanRanges().isPointLookup()), then the cost would be
ranges.size() * average_row_size.

Since these aren't exposed APIs, they're subject to change. Please file a
JIRA if you're interested in helping figure out what the "official" APIs
for this should be.

HTH. Thanks,

    James

On Fri, Oct 2, 2015 at 5:35 PM, Alok Singh <al...@cloudability.com> wrote:

> Is there a way to figure out how many rows/cells were scanned in
> hbase perform a phoenix query? I tried using the explain command, but, it
> is not clear how to estimate the number of rows touched by looking at the
> explain plan. Essentially, I want to be able to report back to users the
> "cost" of performing a phoenix query, where "cost" is some function of
> rows/cells scanned.
>
> Alok
>
> alok@cloudability.com
>