You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Abhishek Gupta <ab...@gmail.com> on 2018/08/17 11:25:36 UTC

Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

Hi Team,

I am working on a use case where SQL aggregated queries are made such that
RVC cannot be used (aggregation on truncated primary key columns) instead
LIMIT-OFFSET has to be used. RVC is used for some user user cases

Currently I have disabled BLOCKCACHE for the table. I wanted to check if it
would be more performant to instead enable BLOCKCACHE on the table and pass
NO_CACHE hint for RVC queries because it uses non-LIMIT-OFFSET scans and
not pass NO_CACHE for the LIMIT-OFFSET queries so that for the subsequent
page calls can leverage prior page data in block cache.

Thanks,
Abhishek

Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

Posted by Thomas D'Silva <td...@salesforce.com>.
If the offset is very large you will end up scanning and filtering many
rows most of which won't be accessed again.
You need to test and see the performance of the NO_CACHE in your particular
use case to see if its helps.

On Wed, Aug 22, 2018 at 4:39 AM, Abhishek Gupta <ab...@gmail.com> wrote:

> The thing is that RVC will not work for case where the SQL query does
> aggregation on truncated primary key columns
> eg. SELECT pk1,
>         TO_CHAR(CONVERT_TZ(TO_DATE(time_col,'yyyyMMddHHmm'),
> 'UTC','Asia/Kolkata'),'yyyyMMdd') as truncated_time_sample,
>         SUM(col1) as agg_value,
>      FROM ... GROUP BY pk1, truncated_time_sample..LIMIT 1000 OFFSET 1000
>
> This is the reason to pick LIMIT OFFSET for pagination.
>
> From your answer it seems for OFFSET queries, all the previous pages rows
> are scanned aggregated to rows and rejected till offset. So in such a
> scenario wouldn't those rows in block cache help during the pagination
> duration.
>
> Thanks,
> Abhishek
>
> On Wed, Aug 22, 2018 at 12:07 AM Thomas D'Silva <td...@salesforce.com>
> wrote:
>
>> When you do an OFFSET Phoenix will scan rows and filter them out until it
>> reaches the offset count which can end up being very costly for large
>> offsets.
>> If you can use a RVC where the order matches the PK of the data table or
>> index the start key of the scan will be set based on the RVC, which is much
>> more efficient
>> (see http://phoenix.apache.org/paged.html).
>>
>> On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta <ab...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Could you help me understand how LIMIT OFFSET queries work under the
>>> hood in Phoenix, is the filtering out of rows done in heap or is there some
>>> sort of optimisation where it can skip at disk level.
>>> My idea about posting this question was to understand if the rows from
>>> paste pages of the query in block cache can optimize the subsequent page
>>> call that would use the cache hits for the filter out rows and not seek at
>>> disk.
>>>
>>> Thanks,
>>> Abhishek
>>>
>>> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva <td...@salesforce.com>
>>> wrote:
>>>
>>>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries,
>>>> since you will be reading and filtering out lots of rows on the server?
>>>> I guess using the block cache for RVC queries might help depending on
>>>> how many rows you read per query, you should be able to easily test this
>>>> out.
>>>>
>>>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta <ab...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I am working on a use case where SQL aggregated queries are made such
>>>>> that RVC cannot be used (aggregation on truncated primary key columns)
>>>>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>>>>
>>>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check
>>>>> if it would be more performant to instead enable BLOCKCACHE on the
>>>>> table and pass NO_CACHE hint for RVC queries because it uses
>>>>> non-LIMIT-OFFSET scans and not pass NO_CACHE for the LIMIT-OFFSET
>>>>> queries so that for the subsequent page calls can leverage prior page data
>>>>> in block cache.
>>>>>
>>>>> Thanks,
>>>>> Abhishek
>>>>>
>>>>
>>>>
>>

Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

Posted by Abhishek Gupta <ab...@gmail.com>.
The thing is that RVC will not work for case where the SQL query does
aggregation on truncated primary key columns
eg. SELECT pk1,
        TO_CHAR(CONVERT_TZ(TO_DATE(time_col,'yyyyMMddHHmm'),
'UTC','Asia/Kolkata'),'yyyyMMdd') as truncated_time_sample,
        SUM(col1) as agg_value,
     FROM ... GROUP BY pk1, truncated_time_sample..LIMIT 1000 OFFSET 1000

This is the reason to pick LIMIT OFFSET for pagination.

From your answer it seems for OFFSET queries, all the previous pages rows
are scanned aggregated to rows and rejected till offset. So in such a
scenario wouldn't those rows in block cache help during the pagination
duration.

Thanks,
Abhishek

On Wed, Aug 22, 2018 at 12:07 AM Thomas D'Silva <td...@salesforce.com>
wrote:

> When you do an OFFSET Phoenix will scan rows and filter them out until it
> reaches the offset count which can end up being very costly for large
> offsets.
> If you can use a RVC where the order matches the PK of the data table or
> index the start key of the scan will be set based on the RVC, which is much
> more efficient
> (see http://phoenix.apache.org/paged.html).
>
> On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta <ab...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Could you help me understand how LIMIT OFFSET queries work under the hood
>> in Phoenix, is the filtering out of rows done in heap or is there some sort
>> of optimisation where it can skip at disk level.
>> My idea about posting this question was to understand if the rows from
>> paste pages of the query in block cache can optimize the subsequent page
>> call that would use the cache hits for the filter out rows and not seek at
>> disk.
>>
>> Thanks,
>> Abhishek
>>
>> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva <td...@salesforce.com>
>> wrote:
>>
>>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries,
>>> since you will be reading and filtering out lots of rows on the server?
>>> I guess using the block cache for RVC queries might help depending on
>>> how many rows you read per query, you should be able to easily test this
>>> out.
>>>
>>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta <ab...@gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I am working on a use case where SQL aggregated queries are made such
>>>> that RVC cannot be used (aggregation on truncated primary key columns)
>>>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>>>
>>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check
>>>> if it would be more performant to instead enable BLOCKCACHE on the
>>>> table and pass NO_CACHE hint for RVC queries because it uses
>>>> non-LIMIT-OFFSET scans and not pass NO_CACHE for the LIMIT-OFFSET
>>>> queries so that for the subsequent page calls can leverage prior page data
>>>> in block cache.
>>>>
>>>> Thanks,
>>>> Abhishek
>>>>
>>>
>>>
>

Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

Posted by Thomas D'Silva <td...@salesforce.com>.
When you do an OFFSET Phoenix will scan rows and filter them out until it
reaches the offset count which can end up being very costly for large
offsets.
If you can use a RVC where the order matches the PK of the data table or
index the start key of the scan will be set based on the RVC, which is much
more efficient
(see http://phoenix.apache.org/paged.html).

On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta <ab...@gmail.com> wrote:

> Hi,
>
> Could you help me understand how LIMIT OFFSET queries work under the hood
> in Phoenix, is the filtering out of rows done in heap or is there some sort
> of optimisation where it can skip at disk level.
> My idea about posting this question was to understand if the rows from
> paste pages of the query in block cache can optimize the subsequent page
> call that would use the cache hits for the filter out rows and not seek at
> disk.
>
> Thanks,
> Abhishek
>
> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva <td...@salesforce.com>
> wrote:
>
>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries, since
>> you will be reading and filtering out lots of rows on the server?
>> I guess using the block cache for RVC queries might help depending on how
>> many rows you read per query, you should be able to easily test this out.
>>
>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta <ab...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> I am working on a use case where SQL aggregated queries are made such
>>> that RVC cannot be used (aggregation on truncated primary key columns)
>>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>>
>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check if
>>> it would be more performant to instead enable BLOCKCACHE on the table
>>> and pass NO_CACHE hint for RVC queries because it uses non-LIMIT-OFFSET
>>> scans and not pass NO_CACHE for the LIMIT-OFFSET queries so that for
>>> the subsequent page calls can leverage prior page data in block cache.
>>>
>>> Thanks,
>>> Abhishek
>>>
>>
>>

Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

Posted by Abhishek Gupta <ab...@gmail.com>.
Hi,

Could you help me understand how LIMIT OFFSET queries work under the hood
in Phoenix, is the filtering out of rows done in heap or is there some sort
of optimisation where it can skip at disk level.
My idea about posting this question was to understand if the rows from
paste pages of the query in block cache can optimize the subsequent page
call that would use the cache hits for the filter out rows and not seek at
disk.

Thanks,
Abhishek

On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva <td...@salesforce.com>
wrote:

> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries, since
> you will be reading and filtering out lots of rows on the server?
> I guess using the block cache for RVC queries might help depending on how
> many rows you read per query, you should be able to easily test this out.
>
> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta <ab...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I am working on a use case where SQL aggregated queries are made such
>> that RVC cannot be used (aggregation on truncated primary key columns)
>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>
>> Currently I have disabled BLOCKCACHE for the table. I wanted to check if
>> it would be more performant to instead enable BLOCKCACHE on the table
>> and pass NO_CACHE hint for RVC queries because it uses non-LIMIT-OFFSET
>> scans and not pass NO_CACHE for the LIMIT-OFFSET queries so that for the
>> subsequent page calls can leverage prior page data in block cache.
>>
>> Thanks,
>> Abhishek
>>
>
>

Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

Posted by Thomas D'Silva <td...@salesforce.com>.
Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries, since
you will be reading and filtering out lots of rows on the server?
I guess using the block cache for RVC queries might help depending on how
many rows you read per query, you should be able to easily test this out.

On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta <ab...@gmail.com> wrote:

> Hi Team,
>
> I am working on a use case where SQL aggregated queries are made such that
> RVC cannot be used (aggregation on truncated primary key columns) instead
> LIMIT-OFFSET has to be used. RVC is used for some user user cases
>
> Currently I have disabled BLOCKCACHE for the table. I wanted to check if
> it would be more performant to instead enable BLOCKCACHE on the table and
> pass NO_CACHE hint for RVC queries because it uses non-LIMIT-OFFSET scans
> and not pass NO_CACHE for the LIMIT-OFFSET queries so that for the
> subsequent page calls can leverage prior page data in block cache.
>
> Thanks,
> Abhishek
>