You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2018/03/20 20:21:00 UTC

[jira] [Comment Edited] (PHOENIX-4594) Perform binary search on guideposts during query compilation

    [ https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406950#comment-16406950 ] 

James Taylor edited comment on PHOENIX-4594 at 3/20/18 8:20 PM:
----------------------------------------------------------------

A little more info on this one:
- Changes will be isolated to {{BaseResultIterators.getParallelScans(byte[] startKey, byte[] stopKey)}} method
- We currently walk through the guideposts by decoding them using PrefixByteCodec.decode(decoder, input) which prefix encodes the byte[] of the guideposts (since there will be a lot of overlap of the bytes from gp to gp).
- I'm pretty sure the issue with the slowness is due to our linear search (since the time is increasing as the guidepost width decreases), but you might want to confirm that through profiling first.
- Assuming this is the case, we'll need to make a pass through all guideposts and put them into a List in which we can perform a binary search. That'll use more memory (unless you know of a way to binary search while keeping the data encoded), but only during the execution of this method, so perhaps it's ok.
- Take note of the conditions we're search for as we linearly traverse the gps as those will become binary searches. You should be able to prune the search space as you go, since we traverse from smallest to biggest gp.
- We need to determine if there's a guidepost in each region as we use that information to drive the timestamp value we return (essentially the "last updated" information).
- The ExplainPlanWithStatsEnabledIT has a pretty good set of tests that need to keep passing after this change.




was (Author: jamestaylor):
A little more info on this one:
- Changes will be isolated to {{BaseResultIterators.getParallelScans(byte[] startKey, byte[] stopKey)}} method
- I'm pretty sure the issue with the slowness is due to our linear search (since the time is increasing as the guidepost width decreases), but you might want to confirm that through profiling first.
- We currently walk through the guideposts by decoding them using PrefixByteCodec.decode(decoder, input) which prefix encodes the byte[] of the guideposts (since there will be a lot of overlap of the bytes from gp to gp).
- Instead, we'll need to make a pass through all guideposts and put them into a List in which we can perform a binary search. That'll use more memory (unless you know of a way to binary search while keeping the data encoded), but only during the execution of this method, so perhaps it's ok.
- Take note of the conditions we're search for as we linearly traverse the gps as those will become binary searches. You should be able to prune the search space as you go, since we traverse from smallest to biggest gp.
- We need to determine if there's a guidepost in each region as we use that information to drive the timestamp value we return (essentially the "last updated" information).
- The ExplainPlanWithStatsEnabledIT has a pretty good set of tests that need to keep passing after this change.



> Perform binary search on guideposts during query compilation
> ------------------------------------------------------------
>
>                 Key: PHOENIX-4594
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4594
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Abhishek Singh Chouhan
>            Priority: Major
>
> If there are many guideposts, performance will suffer during query compilation because we do a linear search of the guideposts to find the intersection with the scan ranges. Instead, in BaseResultIterators.getParallelScans() we should populate an array of guideposts and perform a binary search. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)