Posted to dev@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2016/02/21 19:29:18 UTC

[jira] [Commented] (PHOENIX-1701) Adapt guidepost selection at compile time

    [ https://issues.apache.org/jira/browse/PHOENIX-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156139#comment-15156139 ] 

Lars Hofhansl commented on PHOENIX-1701:
----------------------------------------

[~giacomotaylor], I don't think this depends on whether we use absolute byte values.
What I meant was that guideposts and the produced scans are mapped one to one (possibly I'm missing something, or this has changed recently).

I.e., when we find 10 guideposts for a query, we'll translate them into 10 scans, right? This in turn forces us to size the guideposts for the parallelism we want. Instead, we should be able to create small guideposts and combine them into larger ones as needed, so that the number of guideposts and the desired parallelism are independent. For that we need to be able to collect much smaller guideposts. We can always combine them, but we can't split them without assuming something about the key distribution.
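To make the combining idea concrete, here is a minimal Python sketch. It is not Phoenix code; `Chunk` and `combine_for_parallelism` are hypothetical names. It assumes the stats give us sorted, contiguous key ranges between adjacent fine-grained guideposts, and merges runs of them into at most the desired number of scans. Because ranges can only be merged, asking for more scans than we have guideposts caps out at the guidepost count:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    """Key range [start_key, end_key) between two adjacent guideposts (hypothetical type)."""
    start_key: bytes
    end_key: bytes


def combine_for_parallelism(chunks: List[Chunk], parallelism: int) -> List[Chunk]:
    """Merge adjacent fine-grained chunks into at most `parallelism` contiguous scan ranges.

    Chunks can only be merged, never split, so the result has
    min(parallelism, len(chunks)) ranges.
    """
    n = len(chunks)
    groups = min(parallelism, n)
    scans = []
    for g in range(groups):
        # Group g covers chunks[lo:hi]; integer division spreads any remainder evenly.
        lo = g * n // groups
        hi = (g + 1) * n // groups
        scans.append(Chunk(chunks[lo].start_key, chunks[hi - 1].end_key))
    return scans


# Ten fine-grained guideposts, used for two different parallelism targets.
fine = [Chunk(bytes([i]), bytes([i + 1])) for i in range(10)]
print(len(combine_for_parallelism(fine, 3)))   # 3 scans built from 10 guideposts
print(len(combine_for_parallelism(fine, 20)))  # 10 scans: guideposts cannot be split
```

The same collected stats serve both targets; only the merge step changes.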


> Adapt guidepost selection at compile time
> -----------------------------------------
>
>                 Key: PHOENIX-1701
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1701
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>
> Currently we tweak the guidepost width for the partition size we want; for example, it was just changed from 100mb to 300mb because FAST_DIFF is used by default.
> Instead it might be better to collect more guideposts (maybe even one every 10mb) and then combine them at compile time into larger chunks.
> If we store them correctly, adjacent guideposts would be stored in order in the stats table, and hence we would scan that table until we reach the chunk size we want.
> The more information we have, the better: we can combine smaller guideposts, but we cannot split larger ones because we lack the information.
> Just filing as a brainstorming issue for debate.
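A rough sketch of the size-based variant, again in Python and purely illustrative: it assumes guideposts are read from the stats table in key order as hypothetical `(key, estimated_bytes)` pairs, which is not the actual stats table schema, and `combine_by_size` is a made-up name. With 10mb guideposts, the same stats can yield 100mb or 300mb chunks, chosen at compile time:

```python
from typing import Iterable, List, Tuple

MB = 1024 * 1024


def combine_by_size(guideposts: Iterable[Tuple[bytes, int]], target_bytes: int) -> List[bytes]:
    """Pick chunk boundaries from fine-grained guideposts read in key order.

    Each pair is (guidepost_key, estimated_bytes), where estimated_bytes covers the
    range since the previous guidepost (hypothetical layout). Adjacent guideposts are
    accumulated until the running total reaches target_bytes; that guidepost's key
    becomes a boundary. Boundaries can only fall on existing guideposts.
    """
    boundaries = []
    running = 0
    for key, size in guideposts:
        running += size
        if running >= target_bytes:
            boundaries.append(key)
            running = 0
    return boundaries


# 30 guideposts collected every 10mb.
gps = [(b"row%02d" % i, 10 * MB) for i in range(30)]
print(combine_by_size(gps, 100 * MB))  # three boundaries: every 10th guidepost
print(combine_by_size(gps, 300 * MB))  # one boundary at the last guidepost
```

Any bytes after the last returned boundary would still need a final scan to the end of the range, and a target below 10mb simply returns every guidepost, since we can't go finer than what was collected.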



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)