Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2015/12/14 05:31:46 UTC

[jira] [Commented] (PHOENIX-2417) Compress memory used by row key byte[] of guideposts

    [ https://issues.apache.org/jira/browse/PHOENIX-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055402#comment-15055402 ] 

James Taylor commented on PHOENIX-2417:
---------------------------------------

Some more specifics:
- Isolate the change to GuidePostsInfo, as this is the in-memory representation for an entire table's worth of guideposts
- Change GuidePostsInfo {{List<byte[]> getGuidePosts()}} to instead return an {{Iterator<byte[]>}} as it's only iterated over anyway
- Add GuidePostsInfo {{int getGuidePostsCount()}} method to replace {{getGuidePosts().size()}} calls
- Modify GuidePostsInfo {{serializeGuidePostsInfo()}} to compress the byte[] guideposts using prefix compression
- Borrow or copy/paste the attached code as needed to implement the prefix compression
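
To make the points above concrete, here is a minimal sketch of what prefix compression with an iterator-based accessor could look like. It is not the attached code and not the actual GuidePostsInfo: the class name {{PrefixCompressedGuidePosts}}, the {{encode}} factory, and the record layout (shared prefix length, suffix length, suffix bytes) are all assumptions made for illustration. Only {{getGuidePosts()}} and {{getGuidePostsCount()}} mirror the method names proposed in the list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not the real Phoenix GuidePostsInfo.
final class PrefixCompressedGuidePosts {
    private final byte[] encoded; // all keys, prefix-compressed back to back
    private final int count;      // number of guideposts encoded

    private PrefixCompressedGuidePosts(byte[] encoded, int count) {
        this.encoded = encoded;
        this.count = count;
    }

    // Writes each key as (sharedPrefixLength, suffixLength, suffixBytes),
    // where the prefix is shared with the previous key in sorted order.
    static PrefixCompressedGuidePosts encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] previous = new byte[0];
        try {
            for (byte[] key : sortedKeys) {
                int shared = 0;
                int max = Math.min(previous.length, key.length);
                while (shared < max && previous[shared] == key[shared]) {
                    shared++;
                }
                // Fixed 4-byte ints keep the sketch short; a real
                // implementation would likely use variable-length ints.
                out.writeInt(shared);
                out.writeInt(key.length - shared);
                out.write(key, shared, key.length - shared);
                previous = key;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PrefixCompressedGuidePosts(bytes.toByteArray(), sortedKeys.size());
    }

    // Replaces getGuidePosts().size() without decoding anything.
    int getGuidePostsCount() {
        return count;
    }

    // Decodes lazily, so only one full key is materialized at a time.
    Iterator<byte[]> getGuidePosts() {
        final DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(encoded));
        return new Iterator<byte[]>() {
            private byte[] previous = new byte[0];
            private int returned = 0;

            @Override
            public boolean hasNext() {
                return returned < count;
            }

            @Override
            public byte[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    int shared = in.readInt();
                    int suffixLength = in.readInt();
                    // Copy the shared prefix from the previous key, then
                    // fill in the suffix read from the encoded bytes.
                    byte[] key = Arrays.copyOf(previous, shared + suffixLength);
                    in.readFully(key, shared, suffixLength);
                    previous = key;
                    returned++;
                    return key;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

Because guideposts are stored in sorted order, each key only pays for the bytes that differ from its predecessor. The trade-off in this sketch is that keys can only be read sequentially, which fits the observation above that the list is only iterated over anyway.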

> Compress memory used by row key byte[] of guideposts
> ----------------------------------------------------
>
>                 Key: PHOENIX-2417
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2417
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Samarth Jain
>             Fix For: 4.7.0
>
>
> We've found that smaller guideposts are better in terms of minimizing any increase in latency for point scans. However, this significantly increases the amount of memory used when caching the guideposts on the client. Guideposts are equidistant row keys in the form of raw byte[] which are likely to have a large percentage of their leading bytes in common (as they're stored in sorted order). We should use a simple compression technique to mitigate this. I noticed that Apache Parquet has a run-length encoding - perhaps we can use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)