You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2018/12/07 21:56:00 UTC
[jira] [Commented] (PHOENIX-4845) Support using Row Value Constructors in OFFSET clause to support paging in tables where the sort order of PK columns varies

    [ https://issues.apache.org/jira/browse/PHOENIX-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713324#comment-16713324 ] 

Lars Hofhansl commented on PHOENIX-4845:
----------------------------------------

> do not require a user exposed syntax for "start the scan exactly here"

Agree there. I'll add it's still a useful HBase feature to make available via Phoenix, but that's a different discussion.

I think the problem with the {{LIMIT 5,10}} or {{OFFSET 4 ROWS FETCH NEXT 4 ROWS ONLY}} is that HBase is very bad - by design - with fixed row offsets.

There simply is no notion of an Nth row in HBase, unless you start from the beginning, merge-sort all the HFiles, and count off N rows. There is, however, a notion in HBase to "seek to or before this specific key".
As [~tdsilva] points out, we cannot simply keep the HBase scanners open, they are designed to stay open for a single RPC request. At the very least the lease will time out on the server to conserve resources and allow compactions to remove old files.

This is what led to the discussions here, we need to express the paging in terms of the notion that HBase understands and can do efficiently.
So whatever that is, there has be state kept somewhere (server or client) that allows Phoenix to issue "seek to this key" request to HBase.

The cursor syntax is nice as long as we can express in HBase friendly terms.

> Support using Row Value Constructors in OFFSET clause to support paging in tables where the sort order of PK columns varies
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4845
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4845
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Thomas D'Silva
>            Priority: Major
>              Labels: DESC, SFDC
>         Attachments: PHOENIX-offset.txt
>
>
> RVCs along with the LIMIT clause are useful for efficiently paging through rows (see [http://phoenix.apache.org/paged.html]). This works well if the pk columns are sorted ascending, we can always use the > operator to query for the next batch of row. 
> However if the PK of a table is (A  DESC, B DESC) we cannot use the following query to page through the data
> {code:java}
> SELECT * FROM TABLE WHERE (A, B) > (?, ?) ORDER BY A DESC, B DESC LIMIT 20
> {code}
> Since the rows are sorted by A desc and then by B descending we need change the comparison order
> {code:java}
> SELECT * FROM TABLE WHERE (A, B) < (?, ?) ORDER BY A DESC, B DESC LIMIT 20
> {code}
> If the PK of a table contains columns with mixed sort order for eg (A  DESC, B) then we cannot use RVC to page through data. 
> If we supported using RVCs in the offset clause we could use the offset to set the start row of the scan. Clients would not have to have logic to determine the comparison operator. This would also support paging through data for tables where the PK columns are sorted in mixed order. 
> {code:java}
> SELECT * FROM TABLE ORDER BY A DESC, B LIMIT 20 OFFSET (?,?)
> {code}
> We would only allow using the offset if the rows are ordered by the sort order of the PK columns.
>  
> FYI [~jfernando_sfdc]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)