You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2019/05/16 14:19:00 UTC

[jira] [Commented] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys

    [ https://issues.apache.org/jira/browse/PHOENIX-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841396#comment-16841396 ] 

James Taylor commented on PHOENIX-5280:
---------------------------------------

How is this different than the skip scan (which can be forced through the /*+ SKIP_SCAN */ hint)?

> Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5280
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5280
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Daniel Wong
>            Priority: Minor
>
> Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys
> Recently a user has had an issue where they have a composite pk with 2 columns say (organizationId varchar, departmentId varchar).  They want to query all their data with a condition where department is fully qualified department.  Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% of the organization leading edge contains the qualified trailing edge.  However department = '123' is less than 5% of the total data in the table.
> Based on the explain plan today for this we would run a Round Robin Full Scan with a filter on departmentId='123'.
>  While one possible approach to not run a full table scan is to build an index on department. Another approach could be to construct a new version of a skipscan like filter to control this scan.  Essentially we could use 1 lookup to find the organizationId then additional skipscan for the trailing key.  This could be triggered with a sql syntax hint or in the future data driven.
> For a given region assume the data looks like this.
> ||organizationId||departmentId||
> |org1|100|
> |org4|100|
> |org4|101|
> |org4|123|
> |org5|100|
> |org5|123|
> First query the initial row in the region.  We get 'org1','100'.  From this we can construct the next rows of ['org1','123' - 'org1','123\x0').  After processing that block (in our case 0 rows) we would run to the row at or greater than  nextKey(current orgnaziationId),'123'.  This would give us org4,101.  We would then run to the row of 'org4','123'.  Essentially 1 step to find the orgId and then a scan of all the departments for that value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)