Posted to dev@accumulo.apache.org by "Eric Newton (Created) (JIRA)" <ji...@apache.org> on 2012/03/29 00:39:26 UTC

[jira] [Created] (ACCUMULO-501) RFile should store the key count in metadata

RFile should store the key count in metadata
--------------------------------------------

                 Key: ACCUMULO-501
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-501
             Project: Accumulo
          Issue Type: Improvement
            Reporter: Eric Newton
            Assignee: Eric Newton
             Fix For: 1.5.0


BulkImport estimates the number of keys in a file to be zero.  We already store the largest and smallest key in metadata, so I think we can afford to store the key count as well and use it to provide an estimate when we load the file into the tablet.  Perhaps if we know the start key is "a", the end key is "z", and the tablet's range is "a->m", we can just estimate 50% of the key count.

When a bulk file fits completely in a range, the key count estimate will be accurate.
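
A minimal sketch of the proportional estimate described above, not Accumulo code. The class name KeyCountEstimate, the helper position(), and the choice to interpolate on the first six bytes of each key are all assumptions made for illustration; unbounded tablet ranges are not handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Accumulo: estimates how many of a file's
// keys fall inside a tablet by interpolating on key bytes.
final class KeyCountEstimate {

    /** Maps the leading bytes of a key to a position in [0, 1) as a base-256 fraction. */
    static double position(byte[] key) {
        double pos = 0.0;
        double scale = 1.0 / 256.0;
        // Assumption: six bytes give enough resolution for a rough estimate.
        for (int i = 0; i < key.length && i < 6; i++) {
            pos += (key[i] & 0xff) * scale;
            scale /= 256.0;
        }
        return pos;
    }

    /** Scales the file's key count by the fraction of its key range that the tablet covers. */
    static long estimate(long fileKeyCount, byte[] fileFirst, byte[] fileLast,
                         byte[] tabletStart, byte[] tabletEnd) {
        double fileLo = position(fileFirst);
        double fileHi = position(fileLast);
        if (fileHi <= fileLo) {
            return fileKeyCount; // cannot interpolate; fall back to the whole count
        }
        double lo = Math.max(fileLo, position(tabletStart));
        double hi = Math.min(fileHi, position(tabletEnd));
        if (hi <= lo) {
            return 0; // no overlap between file and tablet
        }
        return Math.round(fileKeyCount * (hi - lo) / (fileHi - fileLo));
    }

    public static void main(String[] args) {
        byte[] a = "a".getBytes(StandardCharsets.UTF_8);
        byte[] m = "m".getBytes(StandardCharsets.UTF_8);
        byte[] z = "z".getBytes(StandardCharsets.UTF_8);
        // File spans "a".."z" with 1000 keys; tablet covers "a".."m".
        System.out.println(estimate(1000, a, z, a, m)); // prints 480
    }
}
```

With byte interpolation the "a->m" tablet gets 12/25 of the keys rather than exactly 50%, which is close enough for an estimate. When the file lies entirely inside the tablet, the ratio is 1 and the full count is returned.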

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ACCUMULO-501) RFile should store the key count in metadata

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241184#comment-13241184 ] 

Keith Turner commented on ACCUMULO-501:
---------------------------------------

One thing we have discussed before is storing a count in the index for each block.  Using this, a scan of the index for the region of the tablet that overlaps the tablet will give a fairly accurate count.

        

[jira] [Issue Comment Edited] (ACCUMULO-501) RFile should store the key count in metadata

Posted by "Keith Turner (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241184#comment-13241184 ] 

Keith Turner edited comment on ACCUMULO-501 at 3/29/12 12:41 PM:
-----------------------------------------------------------------

One thing we have discussed before is storing a count in the index for each block.  Using this, a scan of the index for the region of the tablet that overlaps the file will give a fairly accurate count.
                
      was (Author: kturner):
    One thing we have discussed before is storing a count in the index for each block.  Using this, a scan of the index for the region of the tablet that overlaps the tablet will give a fairly accurate count.
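
The per-block count idea in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not RFile's actual index format: IndexEntry, its fields, and the use of String keys are hypothetical, and each entry is assumed to record the last key of a data block plus the number of keys in that block.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a per-block index; not the real RFile classes.
final class IndexCountEstimate {

    /** One index entry: the last key in a data block and how many keys the block holds. */
    static final class IndexEntry {
        final String lastKey;
        final int keyCount;

        IndexEntry(String lastKey, int keyCount) {
            this.lastKey = lastKey;
            this.keyCount = keyCount;
        }
    }

    /**
     * Sums the counts of every block that may overlap the tablet range
     * (startExclusive, endInclusive]. A null bound means unbounded.
     * Blocks that only partly overlap are counted whole, so the result can
     * overestimate by up to one block at each edge of the range.
     */
    static long estimate(List<IndexEntry> index, String startExclusive, String endInclusive) {
        long total = 0;
        String prevLast = null; // last key of the previous block; null before the first block
        for (IndexEntry e : index) {
            boolean endsAfterStart =
                startExclusive == null || e.lastKey.compareTo(startExclusive) > 0;
            boolean beginsBeforeEnd =
                endInclusive == null || prevLast == null || prevLast.compareTo(endInclusive) < 0;
            if (endsAfterStart && beginsBeforeEnd) {
                total += e.keyCount;
            }
            prevLast = e.lastKey;
        }
        return total;
    }

    public static void main(String[] args) {
        List<IndexEntry> index = Arrays.asList(
            new IndexEntry("d", 100), new IndexEntry("h", 100), new IndexEntry("m", 100),
            new IndexEntry("s", 100), new IndexEntry("z", 100));
        // Tablet covers everything up to and including "m": the first three blocks.
        System.out.println(estimate(index, null, "m")); // prints 300
    }
}
```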

        

[jira] [Commented] (ACCUMULO-501) RFile should store the key count in metadata

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241194#comment-13241194 ] 

Keith Turner commented on ACCUMULO-501:
---------------------------------------

Actually, RFile already stores this info for each index entry; it's just a matter of using it.  It would be good to piggyback this computation on the scan of the index that bulk import is already doing, or have bulk import cache the index if multiple scans are done.  If the inner nodes of the index tree contain the sum of their children, then the computation can be made faster.
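
To illustrate why sums in the inner nodes would speed this up, here is a small sketch of a binary tree over per-block counts. It is not RFile's multi-level index: SummedIndexTree, Node, and the addressing of blocks by integer position are assumptions for the example. A query for a contiguous run of blocks only descends into subtrees that straddle the range boundary and reads the stored sum everywhere else.

```java
// Hypothetical structure, not the RFile index: a binary tree whose inner
// nodes store the total key count of the blocks beneath them.
final class SummedIndexTree {

    static final class Node {
        final int lo;      // first block position covered by this node
        final int hi;      // last block position covered by this node
        final long sum;    // total key count of blocks lo..hi
        final Node left;
        final Node right;

        Node(int lo, int hi, long sum, Node left, Node right) {
            this.lo = lo;
            this.hi = hi;
            this.sum = sum;
            this.left = left;
            this.right = right;
        }
    }

    /** Builds the tree over a non-empty array of per-block key counts. */
    static Node build(long[] counts) {
        if (counts.length == 0) {
            throw new IllegalArgumentException("at least one block is required");
        }
        return build(counts, 0, counts.length - 1);
    }

    private static Node build(long[] counts, int lo, int hi) {
        if (lo == hi) {
            return new Node(lo, hi, counts[lo], null, null); // leaf: one block
        }
        int mid = (lo + hi) >>> 1;
        Node l = build(counts, lo, mid);
        Node r = build(counts, mid + 1, hi);
        return new Node(lo, hi, l.sum + r.sum, l, r); // inner node: sum of children
    }

    /** Total key count of blocks from..to (inclusive), visiting O(log n) nodes. */
    static long rangeSum(Node n, int from, int to) {
        if (n == null || to < n.lo || from > n.hi) {
            return 0; // subtree lies entirely outside the range
        }
        if (from <= n.lo && n.hi <= to) {
            return n.sum; // subtree lies entirely inside: use the stored sum
        }
        return rangeSum(n.left, from, to) + rangeSum(n.right, from, to);
    }

    public static void main(String[] args) {
        Node root = build(new long[] {100, 200, 300, 400, 500});
        System.out.println(rangeSum(root, 1, 3)); // prints 900
    }
}
```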