You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/07/01 03:30:09 UTC

Re: [jira] [Commented] (ACCUMULO-473) Support binary search within RFile blocks

Can this behavior be simulated to get a feel for expected improvements?
On Jun 29, 2012 10:45 PM, "Keith Turner (JIRA)" <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/ACCUMULO-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404374#comment-13404374]
>
> Keith Turner commented on ACCUMULO-473:
> ---------------------------------------
>
> One other thing I like about building a transient per block index is that
> it cleanly seperates indexing from the relative key encoding.  We can
> improve the relative key encoding independant of indexing as long as the
> index can still jump to an arbitrary point.
>
> > Support binary search within RFile blocks
> > -----------------------------------------
> >
> >                 Key: ACCUMULO-473
> >                 URL: https://issues.apache.org/jira/browse/ACCUMULO-473
> >             Project: Accumulo
> >          Issue Type: Improvement
> >          Components: tserver
> >            Reporter: Keith Turner
> >            Assignee: Keith Turner
> >             Fix For: 1.5.0
> >
> >
> > RFiles store blocks of key values using relative encoding.  By default
> these blocks are small (100k).  To find a key in an RFile the index is used
> to find the block, then the block is scanned for the key.  It would be nice
> for iterators that do alot of seeking if a binary search were done instead
> of a scan.
> > Relative encoding is a form of compression that serializes a key
> relative to the last key.  For example if the row in a key is the same as
> the previous key, then the row is not stored again.  This works well with
> sorted data.   The current on disk format does not support random access
> within a block, because to read entry N all previous entries must be read.
>   One option would be to deserialize the block into a form that supports
> random access and then do binary search.  However this will consume memory
> and CPU.  If the relative encoding format could be modified to support
> random access, then binary search could be supported in a memory and CPU
> efficient way.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>