Posted to dev@accumulo.apache.org by "Keith Turner (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/11/02 19:33:32 UTC

[jira] [Issue Comment Edited] (ACCUMULO-112) Investigate partitioning in memory map by locality group

    [ https://issues.apache.org/jira/browse/ACCUMULO-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142395#comment-13142395 ] 

Keith Turner edited comment on ACCUMULO-112 at 11/2/11 6:32 PM:
----------------------------------------------------------------

I ran some tests with random data.  The data had the following format:

{noformat}
 <16 digit rand hex> <4 digit hex> <4 digit rand hex> <50 byte random value>
{noformat}

There were 32 column families, 0000 to 001f. 

For the experiment, 32,768 rows with 32 columns each were inserted, creating 1,048,576 entries.  The number of locality groups was varied and minor compaction times were recorded.  Column families were evenly divided among locality groups.  Below are the minor compaction times.
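
A minimal sketch of how test data in this shape could be generated, for anyone who wants to reproduce the experiment.  It is not the code used for the test.  The interpretation of the four fields as row, column family, column qualifier and value is an assumption, as are the function names and the contiguous assignment of families to locality groups (the test only states that families were evenly divided).

```python
import random

NUM_ROWS = 32_768      # rows inserted in the experiment
NUM_FAMILIES = 32      # column families 0000 to 001f
VALUE_BYTES = 50       # size of each random value


def generate_entries(num_rows=NUM_ROWS, num_families=NUM_FAMILIES, seed=0):
    """Yield (row, family, qualifier, value) tuples in the described format."""
    rng = random.Random(seed)
    for _ in range(num_rows):
        row = f"{rng.getrandbits(64):016x}"            # 16 digit random hex
        for fam in range(num_families):
            family = f"{fam:04x}"                      # 4 digit hex, 0000..001f
            qualifier = f"{rng.getrandbits(16):04x}"   # 4 digit random hex
            value = bytes(rng.getrandbits(8) for _ in range(VALUE_BYTES))
            yield row, family, qualifier, value


def assign_locality_groups(num_families, num_groups):
    """Evenly divide families among groups (contiguous ranges, an assumption)."""
    per_group = num_families // num_groups
    return {f"{fam:04x}": fam // per_group for fam in range(num_families)}
```

With the defaults this yields 32,768 * 32 = 1,048,576 entries; passing a small `num_rows` is enough to inspect the format.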

||Num Locality Groups||Minor Compaction Time||Relative Time||
|1 (default LG)|3.5 secs|1.0|
|4|6.4 secs|1.8|
|8|9.4 secs|2.7|
|16|16.4 secs|4.7|
|32|30.2 secs|8.6|
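
The relative time column is each measured time divided by the single locality group time.  A small sketch of that arithmetic, using only the numbers from the table (the function name is made up for illustration):

```python
# Minor compaction times in seconds, keyed by number of locality groups.
times_secs = {1: 3.5, 4: 6.4, 8: 9.4, 16: 16.4, 32: 30.2}


def relative_times(times, baseline_key=1):
    """Divide each time by the baseline time, rounded to one decimal place."""
    base = times[baseline_key]
    return {groups: round(t / base, 1) for groups, t in times.items()}
```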

Since the data was written to an unpartitioned in-memory map, the insert times should have been the same in every run.  Once the in-memory map is partitioned, it would be useful to track both ingest time and minor compaction time.

LZO was used for compression.

                
                  
> Investigate partitioning in memory map by locality group
> --------------------------------------------------------
>
>                 Key: ACCUMULO-112
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-112
>             Project: Accumulo
>          Issue Type: Task
>          Components: tserver
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.0
>
>
> Currently the in-memory map is not partitioned by locality group.  This could negatively impact scan and minor compaction performance.  I would like to run some experiments to understand the performance implications.  Partitioning by locality group could also hurt insert performance: the cost could go from O(log(R)+log(C)) to O(L * (log(R)+log(C))) in the worst case, where L is the number of locality groups, R is the number of rows, and C is the number of columns.  The worst case is a mutation that has a change for each locality group.
> Currently the in-memory map is a map of maps, like the following.
> {noformat}
>   map<row, map<col, val>>
> {noformat}
> It could conceptually be changed to one of the following.  The first is best for minor compactions and for scans that access only some locality groups.  The second is good for inserts where the mutation covers all locality groups, because the row is looked up only once.
> {noformat}
>   map<localityGroup, map<row, map<col, val>>>
> {noformat}
> {noformat}
>   map<row, map<localityGroup, map<col, val>>>
> {noformat}
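
The trade-off between the two layouts in the description can be sketched as follows.  This is only an illustration, not the tserver implementation: plain Python dicts stand in for the sorted maps, all function names and the `group_of` column-to-group mapping are hypothetical, and each insert function returns the number of row lookups it performed, which is the quantity the O(L * ...) estimate is about.

```python
def insert_flat(mem_map, row, cols):
    """Current layout, map<row, map<col, val>>: one row lookup per mutation."""
    mem_map.setdefault(row, {}).update(cols)
    return 1


def insert_group_first(mem_map, group_of, row, cols):
    """map<localityGroup, map<row, map<col, val>>>: one row lookup per group touched."""
    by_group = {}
    for col, val in cols.items():
        by_group.setdefault(group_of[col], {})[col] = val
    for group, group_cols in by_group.items():
        # The row is looked up again inside every group the mutation touches.
        mem_map.setdefault(group, {}).setdefault(row, {}).update(group_cols)
    return len(by_group)


def insert_row_first(mem_map, group_of, row, cols):
    """map<row, map<localityGroup, map<col, val>>>: the row is looked up once."""
    row_entry = mem_map.setdefault(row, {})
    for col, val in cols.items():
        row_entry.setdefault(group_of[col], {})[col] = val
    return 1


def scan_group(group_first_map, group):
    """With the group-first layout, a scan of one group never sees other groups."""
    return group_first_map.get(group, {})
```

For a mutation that changes a column in every locality group, `insert_group_first` performs L row lookups while `insert_row_first` performs one; in exchange, `scan_group` and a per-group minor compaction can read a single group's rows without skipping over the others.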
