Posted to issues@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2018/12/07 01:09:00 UTC

[jira] [Commented] (PHOENIX-4927) Disentangle the granularity of guidepost data from that of client cached guide post data

    [ https://issues.apache.org/jira/browse/PHOENIX-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712204#comment-16712204 ] 

Lars Hofhansl commented on PHOENIX-4927:
----------------------------------------

Nice. Thanks!

Maybe we can start with something simple: just introduce a per-client "granularity" property, a number indicating how many guideposts to combine. If set to (say) 5, we'll always combine 5 guideposts in the stats table into 1 guidepost cached at the client.

That way we could set the guidepost width to something like 10MB and then by default combine 10 guideposts for effective 100MB guideposts. But if the client needs more information, that "granularity" can be reduced all the way down to 1.
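
A minimal sketch of the combining step, not Phoenix code. It assumes guideposts are a sorted list of row keys that split the table into roughly equal-sized chunks; the function name and the list-of-bytes representation are hypothetical and chosen only for illustration.

```python
from typing import List


def combine_guideposts(guideposts: List[bytes], granularity: int) -> List[bytes]:
    """Coarsen fine-grained guideposts by keeping every `granularity`-th key.

    Assumption: `guideposts` is sorted and each key ends one fine-grained
    chunk. Keeping every N-th key merges N adjacent chunks into one. Any
    leftover keys at the tail are dropped, so the final coarse chunk simply
    runs to the end of the table.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    # Slice starts at index granularity-1 so each kept key closes a full group.
    return guideposts[granularity - 1::granularity]


# Example: 12 fine guideposts, combined 5 at a time.
fine = [bytes([ord("a") + i]) for i in range(12)]  # b"a" .. b"l"
coarse = combine_guideposts(fine, 5)               # keeps the 5th and 10th keys
```

With granularity 1 the client sees every guidepost from the stats table; larger values trade precision for a smaller client-side cache.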

Of course it's better to auto-tune this. For a 1PB table it makes little sense to keep over 100 million 10MB guideposts.

Most importantly, we realize that the guidepost data and the actual number of scans are independent.

The number of scans should depend on (1) how many servers have data for the table(s) in question, and (2) how much of the cluster's resources the client is willing (or allowed) to commit.
Once we have determined a good number of scans, *then* we use guidepost data to determine the start/stop keys for the scans.
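
A rough sketch of that two-step idea, again not Phoenix code. All names are hypothetical. Taking the minimum of the server count and a client-side parallelism cap is only one possible policy, assumed here for illustration; the second function then picks evenly spaced guideposts as scan boundaries.

```python
from typing import List, Optional, Tuple

ScanRange = Tuple[Optional[bytes], Optional[bytes]]  # (start, stop); None = open end


def choose_num_scans(servers_with_data: int, max_parallel_scans: int) -> int:
    """Assumed policy: one scan per server, capped by what the client may use."""
    return max(1, min(servers_with_data, max_parallel_scans))


def split_into_scans(guideposts: List[bytes], num_scans: int) -> List[ScanRange]:
    """Choose start/stop keys for `num_scans` scans from sorted guideposts."""
    if num_scans < 1:
        raise ValueError("num_scans must be >= 1")
    # k guideposts delimit k + 1 chunks; we cannot split finer than that.
    n_chunks = len(guideposts) + 1
    num_scans = min(num_scans, n_chunks)
    # Boundary i sits after (i * n_chunks) // num_scans chunks.
    boundaries = [
        guideposts[(i * n_chunks) // num_scans - 1] for i in range(1, num_scans)
    ]
    keys: List[Optional[bytes]] = [None, *boundaries, None]
    return list(zip(keys[:-1], keys[1:]))


guideposts = [b"b", b"d", b"f", b"h", b"j"]
n = choose_num_scans(servers_with_data=3, max_parallel_scans=8)
scans = split_into_scans(guideposts, n)
```

Here the guidepost count (5) never determines the scan count (3); the guideposts only supply candidate split keys once the count is fixed.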


> Disentangle the granularity of guidepost data from that of client cached guide post data
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4927
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4927
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Bin Shi
>            Assignee: Bin Shi
>            Priority: Major
>
> The expected behaviors:
>  # It should be possible to have 10MB guideposts for precision without forcing clients to cache that many guideposts.
>  # It should combine as many guideposts into chunks as needed for the particular query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)