You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by "Michael.Calvin" <77...@qq.com> on 2014/07/24 04:00:40 UTC

How to use Paged Queries?

Using HBase 0.98-hadoop-1, phoenix 4.0.0-incubating.‍
I have a table(named moninorlog, pk is id) created by phoenix with 13 columns and 1 secondary-index(named interfaceid).
Then I tried query like "SELECT id,interfaceid FROM moninorlog WHERE (id,interfaceid) > (?,?) ORDER BY id,interfaceid LIMIT 10‍".
I ran this query twice in the same connection with same variables (even in same SreparedStatement object) , but they retuned the same results.
Did I missed something?


------------------
Michael.Calvin.Shi

Re: Re: How to use Paged Queries?

Posted by James Taylor <ja...@apache.org>.

For an example of this flow, see QueryMoreIT:
https://github.com/apache/phoenix/blob/master/phoenix-core/src/it/java/org/apache/phoenix/end2end/QueryMoreIT.java

On Fri, Jul 25, 2014 at 12:45 PM, James Taylor <ja...@apache.org> wrote:
> Yes, we have code at Salesforce that does exactly that. You really
> only need to cache 1) the row number ordinal and 2) the PK column
> values (and only if there's an ORDER BY in your query, as otherwise
> you can page through the results through the paging mechanism
> described in this thread). Then you can just use this info to
> formulate a query with an IN clause to get the rows back a page at a
> time.
>
> On Fri, Jul 25, 2014 at 10:53 AM, Nicolas Maillard
> <nm...@hortonworks.com> wrote:
>> Hey james
>>
>> Looking at your reply got me thinking. Could pagination of very large result
>> sets be set up in a temporary hbase table.
>> Say your query for a large dataset with a complex query, the result could be
>> "cached" in a temp table and served paginated where rowkeys would be
>> postions. This way you could re-ask in a short window of time for the parts
>> of the data.
>> Maybe with a cached hint in the query with a TTL?
>>
>>
>> On Thu, Jul 24, 2014 at 10:30 AM, Michael.Calvin <77...@qq.com> wrote:
>>>
>>> Understood now.
>>> Thanks for the help.
>>>
>>> ------------------
>>> Michael.Calvin.Shi
>>>
>>> ___________________________
>>> Hi Michael,
>>> You need to re-bind the variables with the id of the last row you get
>>> back from the SELECT statement (the 10th row in this case). Also, you
>>> only need to compare only the PK column in the WHERE clause, so you
>>> could do something like this:
>>>     SELECT id, interfaceid FROM moninorlog WHERE id > ?
>>>     ORDER BY id LIMIT 10
>>> This would allow you to step through your rows 10 at a time.
>>>
>>> You could also step through them ordered by interfaceid since you have
>>> a secondary index on it:
>>>     SELECT id, interfaceid FROM moninorlog WHERE (interfaceid,id) > (?,?)
>>>     ORDER BY interfaceid,id LIMIT 10
>>>
>>> You need to include both interfaceid and id in the WHERE clause since
>>> (presumably) interfaceid might repeat with the same value. The row
>>> value constructor syntax, (i.e. listing more than one column, like
>>> this: (?, ?,?) ) is only necessary if you have multiple PK columns.
>>>
>>> The important part of this paging mechanism is that it provides for an
>>> efficient way of "re-establishing" your cursor position.
>>>
>>> HTH. Thanks,
>>> James
>>>
>>> On Wed, Jul 23, 2014 at 7:00 PM, Michael.Calvin <77...@qq.com> wrote:
>>> > Using HBase 0.98-hadoop-1, phoenix 4.0.0-incubating.‍
>>> > I have a table(named moninorlog, pk is id) created by phoenix with 13
>>> > columns and 1 secondary-index(named interfaceid).
>>> > Then I tried query like "SELECT id,interfaceid FROM moninorlog WHERE
>>> > (id,interfaceid) > (?,?) ORDER BY id,interfaceid LIMIT 10‍".
>>> > I ran this query twice in the same connection with same variables (even
>>> > in
>>> > same SreparedStatement object) , but they retuned the same results.
>>> > Did I missed something?
>>> >
>>> > ------------------
>>> > Michael.Calvin.Shi
>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader of
>> this message is not the intended recipient, you are hereby notified that any
>> printing, copying, dissemination, distribution, disclosure or forwarding of
>> this communication is strictly prohibited. If you have received this
>> communication in error, please contact the sender immediately and delete it
>> from your system. Thank You.

Re: Re: How to use Paged Queries?

Posted by James Taylor <ja...@apache.org>.

Yes, we have code at Salesforce that does exactly that. You really
only need to cache 1) the row number ordinal and 2) the PK column
values (and only if there's an ORDER BY in your query, as otherwise
you can page through the results through the paging mechanism
described in this thread). Then you can just use this info to
formulate a query with an IN clause to get the rows back a page at a
time.

On Fri, Jul 25, 2014 at 10:53 AM, Nicolas Maillard
<nm...@hortonworks.com> wrote:
> Hey james
>
> Looking at your reply got me thinking. Could pagination of very large result
> sets be set up in a temporary hbase table.
> Say your query for a large dataset with a complex query, the result could be
> "cached" in a temp table and served paginated where rowkeys would be
> postions. This way you could re-ask in a short window of time for the parts
> of the data.
> Maybe with a cached hint in the query with a TTL?
>
>
> On Thu, Jul 24, 2014 at 10:30 AM, Michael.Calvin <77...@qq.com> wrote:
>>
>> Understood now.
>> Thanks for the help.
>>
>> ------------------
>> Michael.Calvin.Shi
>>
>> ___________________________
>> Hi Michael,
>> You need to re-bind the variables with the id of the last row you get
>> back from the SELECT statement (the 10th row in this case). Also, you
>> only need to compare only the PK column in the WHERE clause, so you
>> could do something like this:
>>     SELECT id, interfaceid FROM moninorlog WHERE id > ?
>>     ORDER BY id LIMIT 10
>> This would allow you to step through your rows 10 at a time.
>>
>> You could also step through them ordered by interfaceid since you have
>> a secondary index on it:
>>     SELECT id, interfaceid FROM moninorlog WHERE (interfaceid,id) > (?,?)
>>     ORDER BY interfaceid,id LIMIT 10
>>
>> You need to include both interfaceid and id in the WHERE clause since
>> (presumably) interfaceid might repeat with the same value. The row
>> value constructor syntax, (i.e. listing more than one column, like
>> this: (?, ?,?) ) is only necessary if you have multiple PK columns.
>>
>> The important part of this paging mechanism is that it provides for an
>> efficient way of "re-establishing" your cursor position.
>>
>> HTH. Thanks,
>> James
>>
>> On Wed, Jul 23, 2014 at 7:00 PM, Michael.Calvin <77...@qq.com> wrote:
>> > Using HBase 0.98-hadoop-1, phoenix 4.0.0-incubating.‍
>> > I have a table(named moninorlog, pk is id) created by phoenix with 13
>> > columns and 1 secondary-index(named interfaceid).
>> > Then I tried query like "SELECT id,interfaceid FROM moninorlog WHERE
>> > (id,interfaceid) > (?,?) ORDER BY id,interfaceid LIMIT 10‍".
>> > I ran this query twice in the same connection with same variables (even
>> > in
>> > same SreparedStatement object) , but they retuned the same results.
>> > Did I missed something?
>> >
>> > ------------------
>> > Michael.Calvin.Shi
>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader of
> this message is not the intended recipient, you are hereby notified that any
> printing, copying, dissemination, distribution, disclosure or forwarding of
> this communication is strictly prohibited. If you have received this
> communication in error, please contact the sender immediately and delete it
> from your system. Thank You.

Re: Re: How to use Paged Queries?

Posted by Nicolas Maillard <nm...@hortonworks.com>.

Hey james

Looking at your reply got me thinking. Could pagination of very large
result sets be set up in a temporary hbase table.
Say your query for a large dataset with a complex query, the result could
be "cached" in a temp table and served paginated where rowkeys would be
postions. This way you could re-ask in a short window of time for the parts
of the data.
Maybe with a cached hint in the query with a TTL?


On Thu, Jul 24, 2014 at 10:30 AM, Michael.Calvin <77...@qq.com> wrote:

> Understood now.
> Thanks for the help.
>
> ------------------
> Michael.Calvin.Shi
>
> ___________________________
> Hi Michael,
> You need to re-bind the variables with the id of the last row you get
> back from the SELECT statement (the 10th row in this case). Also, you
> only need to compare only the PK column in the WHERE clause, so you
> could do something like this:
>     SELECT id, interfaceid FROM moninorlog WHERE id > ?
>     ORDER BY id LIMIT 10
> This would allow you to step through your rows 10 at a time.
>
> You could also step through them ordered by interfaceid since you have
> a secondary index on it:
>     SELECT id, interfaceid FROM moninorlog WHERE (interfaceid,id) > (?,?)
>     ORDER BY interfaceid,id LIMIT 10
>
> You need to include both interfaceid and id in the WHERE clause since
> (presumably) interfaceid might repeat with the same value. The row
> value constructor syntax, (i.e. listing more than one column, like
> this: (?, ?,?) ) is only necessary if you have multiple PK columns.
>
> The important part of this paging mechanism is that it provides for an
> efficient way of "re-establishing" your cursor position.
>
> HTH. Thanks,
> James
>
> On Wed, Jul 23, 2014 at 7:00 PM, Michael.Calvin <77...@qq.com> wrote:
> > Using HBase 0.98-hadoop-1, phoenix 4.0.0-incubating.‍
> > I have a table(named moninorlog, pk is id) created by phoenix with 13
> > columns and 1 secondary-index(named interfaceid).
> > Then I tried query like "SELECT id,interfaceid FROM moninorlog WHERE
> > (id,interfaceid) > (?,?) ORDER BY id,interfaceid LIMIT 10‍".
> > I ran this query twice in the same connection with same variables (even
> in
> > same SreparedStatement object) , but they retuned the same results.
> > Did I missed something?
> >
> > ------------------
> > Michael.Calvin.Shi
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Re: How to use Paged Queries?

Posted by "Michael.Calvin" <77...@qq.com>.

Understood now.
Thanks for the help.


------------------
Michael.Calvin.Shi


___________________________


Hi Michael,
You need to re-bind the variables with the id of the last row you get
back from the SELECT statement (the 10th row in this case). Also, you
only need to compare only the PK column in the WHERE clause, so you
could do something like this:
    SELECT id, interfaceid FROM moninorlog WHERE id > ?
    ORDER BY id LIMIT 10
This would allow you to step through your rows 10 at a time.

You could also step through them ordered by interfaceid since you have
a secondary index on it:
    SELECT id, interfaceid FROM moninorlog WHERE (interfaceid,id) > (?,?)
    ORDER BY interfaceid,id LIMIT 10

You need to include both interfaceid and id in the WHERE clause since
(presumably) interfaceid might repeat with the same value. The row
value constructor syntax, (i.e. listing more than one column, like
this: (?, ?,?) ) is only necessary if you have multiple PK columns.

The important part of this paging mechanism is that it provides for an
efficient way of "re-establishing" your cursor position.

HTH. Thanks,
James

On Wed, Jul 23, 2014 at 7:00 PM, Michael.Calvin <77...@qq.com> wrote:
> Using HBase 0.98-hadoop-1, phoenix 4.0.0-incubating.‍
> I have a table(named moninorlog, pk is id) created by phoenix with 13
> columns and 1 secondary-index(named interfaceid).
> Then I tried query like "SELECT id,interfaceid FROM moninorlog WHERE
> (id,interfaceid) > (?,?) ORDER BY id,interfaceid LIMIT 10‍".
> I ran this query twice in the same connection with same variables (even in
> same SreparedStatement object) , but they retuned the same results.
> Did I missed something?
>
> ------------------
> Michael.Calvin.Shi

Re: How to use Paged Queries?

Posted by James Taylor <ja...@apache.org>.

Hi Michael,
You need to re-bind the variables with the id of the last row you get
back from the SELECT statement (the 10th row in this case). Also, you
only need to compare only the PK column in the WHERE clause, so you
could do something like this:
    SELECT id, interfaceid FROM moninorlog WHERE id > ?
    ORDER BY id LIMIT 10
This would allow you to step through your rows 10 at a time.

You could also step through them ordered by interfaceid since you have
a secondary index on it:
    SELECT id, interfaceid FROM moninorlog WHERE (interfaceid,id) > (?,?)
    ORDER BY interfaceid,id LIMIT 10

You need to include both interfaceid and id in the WHERE clause since
(presumably) interfaceid might repeat with the same value. The row
value constructor syntax, (i.e. listing more than one column, like
this: (?, ?,?) ) is only necessary if you have multiple PK columns.

The important part of this paging mechanism is that it provides for an
efficient way of "re-establishing" your cursor position.

HTH. Thanks,
James

On Wed, Jul 23, 2014 at 7:00 PM, Michael.Calvin <77...@qq.com> wrote:
> Using HBase 0.98-hadoop-1, phoenix 4.0.0-incubating.‍
> I have a table(named moninorlog, pk is id) created by phoenix with 13
> columns and 1 secondary-index(named interfaceid).
> Then I tried query like "SELECT id,interfaceid FROM moninorlog WHERE
> (id,interfaceid) > (?,?) ORDER BY id,interfaceid LIMIT 10‍".
> I ran this query twice in the same connection with same variables (even in
> same SreparedStatement object) , but they retuned the same results.
> Did I missed something?
>
> ------------------
> Michael.Calvin.Shi