Posted to dev@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2012/04/21 00:27:33 UTC

[jira] [Created] (ACCUMULO-550) Colocate rfile index entries within file

Keith Turner created ACCUMULO-550:
-------------------------------------

             Summary: Colocate rfile index entries within file
                 Key: ACCUMULO-550
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-550
             Project: Accumulo
          Issue Type: Improvement
          Components: tserver
            Reporter: Keith Turner
            Assignee: Keith Turner
             Fix For: 1.5.0


Before multi-level indexes were introduced, when an rfile was written its entire index was held in memory and written out when the file was closed.  With the introduction of multi-level indexes, each index block is written when it fills up, while the file is still being written.  This was done to handle the case where the index may not fit into memory.  As a result, index blocks are sprinkled throughout the file, so any operation that iterates over the entire index can be slow because it turns into a lot of random accesses.

One possible solution is to buffer index blocks up to some threshold and write many of them out at once.  This would make a scan of the index much faster, as it would turn into a set of sequential reads of large chunks of data.

Could buffer all blocks at a particular level and write them out when the parent index block fills up.
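
The buffering idea above can be illustrated with a toy layout model. This is only a sketch, not the RFile writer: the class name, method names and all numbers are hypothetical, and it models only data blocks and level-0 index blocks. It assumes one index entry per data block and counts how many separate contiguous runs of index blocks end up in the file, which roughly corresponds to the number of seeks a full index scan would need.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Accumulo code): shows where level-0 index
// blocks land in a file when they are written eagerly versus buffered.
final class IndexLayoutSketch {

    /**
     * Returns the on-disk block order: 'D' = data block, 'I' = level-0 index block.
     * One index entry is assumed per data block. Full index blocks are held in
     * memory until bufferedIndexBlocks of them are pending, then written together.
     * bufferedIndexBlocks == 1 models writing each index block as soon as it fills.
     */
    static List<Character> layout(int dataBlocks, int entriesPerIndexBlock, int bufferedIndexBlocks) {
        if (entriesPerIndexBlock < 1 || bufferedIndexBlocks < 1) {
            throw new IllegalArgumentException("block sizes must be >= 1");
        }
        List<Character> file = new ArrayList<>();
        int entries = 0; // entries in the index block currently being filled
        int pending = 0; // full index blocks buffered in memory, not yet written
        for (int i = 0; i < dataBlocks; i++) {
            file.add('D');
            entries++;
            if (entries == entriesPerIndexBlock) {
                entries = 0;
                pending++;
                if (pending == bufferedIndexBlocks) {
                    for (int j = 0; j < pending; j++) {
                        file.add('I');
                    }
                    pending = 0;
                }
            }
        }
        // On close, the partially filled block and anything still buffered is written.
        if (entries > 0) {
            pending++;
        }
        for (int j = 0; j < pending; j++) {
            file.add('I');
        }
        return file;
    }

    /** Counts contiguous runs of index blocks; each run is roughly one seek in an index scan. */
    static int indexRuns(List<Character> file) {
        int runs = 0;
        char prev = 'D';
        for (char c : file) {
            if (c == 'I' && prev != 'I') {
                runs++;
            }
            prev = c;
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println("eager runs:    " + indexRuns(layout(1000, 10, 1)));
        System.out.println("buffered runs: " + indexRuns(layout(1000, 10, 25)));
    }
}
```

With these made-up parameters (1000 data blocks, 10 entries per index block), eager writing produces 100 separate index runs, while buffering 25 index blocks at a time produces 4. The same 100 index blocks are written in both cases; only their placement changes.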

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ACCUMULO-550) Colocate rfile index entries within file

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259920#comment-13259920 ] 

Keith Turner commented on ACCUMULO-550:
---------------------------------------

I did some experiments and found that scans of the index were 4 to 5 times faster for a large file with contiguous index blocks.  This was with the files not in any cache (I discovered the purge command on the Mac).  When the index blocks were in the filesystem cache, the scan was about 1.5 times faster.
                

[jira] [Commented] (ACCUMULO-550) Colocate rfile index entries within file

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258631#comment-13258631 ] 

Keith Turner commented on ACCUMULO-550:
---------------------------------------

This may be a worthwhile change for 1.4.x.
                

[jira] [Resolved] (ACCUMULO-550) Colocate rfile index entries within file

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner resolved ACCUMULO-550.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4.1
    

[jira] [Commented] (ACCUMULO-550) Colocate rfile index entries within file

Posted by "Adam Fuchs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403879#comment-13403879 ] 

Adam Fuchs commented on ACCUMULO-550:
-------------------------------------

I noticed yesterday when working on ACCUMULO-652 that the index entries at level 1 and higher are still interspersed between level 0 blocks with the current technique. Is there value in keeping indexes at a given level greater than 0 close to each other, or is that overkill?
                

[jira] [Commented] (ACCUMULO-550) Collocate rfile index entries within file

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404130#comment-13404130 ] 

Keith Turner commented on ACCUMULO-550:
---------------------------------------

I do not think having the blocks > level 0 sprinkled between level 0 blocks is a problem for sequentially reading the index.  The blocks > level 0 are read so infrequently compared to the level 0 blocks that I suspect the cost of the occasional random read for a level 1 block is amortized away.
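
The amortization argument can be made concrete with some rough arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and the fan-out (index entries per index block) and block counts are assumed values, not figures from this issue. It counts the blocks at each index level and the share of block reads in a full index scan that touch a level above 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not Accumulo code) for estimating how rarely
// upper-level index blocks are read during a full index scan.
final class IndexLevelCounts {

    /** Blocks per index level, level 0 first, assuming each parent block holds fanOut children. */
    static long[] blocksPerLevel(long level0Blocks, int fanOut) {
        if (level0Blocks < 1 || fanOut < 2) {
            throw new IllegalArgumentException("need level0Blocks >= 1 and fanOut >= 2");
        }
        List<Long> counts = new ArrayList<>();
        long n = level0Blocks;
        counts.add(n);
        while (n > 1) {
            n = (n + fanOut - 1) / fanOut; // ceiling division
            counts.add(n);
        }
        long[] result = new long[counts.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = counts.get(i);
        }
        return result;
    }

    /** Fraction of all index block reads that hit a level above 0 in a full scan. */
    static double upperLevelFraction(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return (double) (total - counts[0]) / total;
    }

    public static void main(String[] args) {
        // Assumed numbers: 100,000 level-0 blocks, 1,000 entries per index block.
        long[] counts = blocksPerLevel(100_000, 1_000);
        System.out.println(Arrays.toString(counts));
        System.out.printf("upper-level share of block reads: %.4f%n", upperLevelFraction(counts));
    }
}
```

Under these assumed numbers there are 100 level-1 blocks and a single root block for 100,000 level-0 blocks, so roughly one read in a thousand is a potentially random read of an upper-level block.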
                

[jira] [Updated] (ACCUMULO-550) Collocate rfile index entries within file

Posted by "Adam Fuchs (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Fuchs updated ACCUMULO-550:
--------------------------------

    Summary: Collocate rfile index entries within file  (was: Colocate rfile index entries within file)
    