Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/07/27 21:51:53 UTC

[jira] Updated: (HADOOP-1662) [hbase] Make region splits faster

     [ https://issues.apache.org/jira/browse/HADOOP-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1662:
--------------------------

    Attachment: mapfile_split.patch

Here is a split function implemented as a static method in MapFile so that it can get at the private MapFile index.  The patch includes a unit test.  It still needs testing in a loaded hbase.
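
To illustrate the idea of splitting at an index boundary, here is a minimal sketch.  It is not the code in mapfile_split.patch and it does not use the Hadoop MapFile API.  ToyMapFile, IndexEntry, write, split and firstKey are hypothetical names invented for this example, and the toy format (length-prefixed UTF strings plus an in-memory sparse index) is an assumption; a real MapFile stores its data and index as SequenceFiles with their own headers, which this sketch ignores.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT the Hadoop MapFile class.
class IndexSplitSketch {
    /** One sparse-index entry: a key and the byte offset where its record starts. */
    static final class IndexEntry {
        final String key;
        final long offset;
        IndexEntry(String key, long offset) { this.key = key; this.offset = offset; }
    }

    /** Stand-in for a map file: sorted records as raw bytes plus a sparse index. */
    static final class ToyMapFile {
        final byte[] data;
        final List<IndexEntry> index;
        ToyMapFile(byte[] data, List<IndexEntry> index) { this.data = data; this.index = index; }
    }

    /** Writes already-sorted keys, indexing every indexInterval-th record. */
    static ToyMapFile write(List<String> sortedKeys, int indexInterval) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        List<IndexEntry> index = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i++) {
            if (i % indexInterval == 0) {
                // out.size() is the number of bytes written so far, i.e. this record's offset.
                index.add(new IndexEntry(sortedKeys.get(i), out.size()));
            }
            out.writeUTF(sortedKeys.get(i));
            out.writeUTF("value-" + i);
        }
        out.flush();
        return new ToyMapFile(bytes.toByteArray(), index);
    }

    /**
     * Splits at the middle index entry by copying raw byte ranges.
     * No record is decoded; the index alone supplies the cut point.
     * Returns {lower, upper}; lower is empty if the index has a single entry.
     */
    static byte[][] split(ToyMapFile parent) {
        IndexEntry mid = parent.index.get(parent.index.size() / 2);
        int cut = (int) mid.offset;
        byte[] lower = Arrays.copyOfRange(parent.data, 0, cut);
        byte[] upper = Arrays.copyOfRange(parent.data, cut, parent.data.length);
        return new byte[][] { lower, upper };
    }

    /** Decodes only the first key of a child, to check where it starts. */
    static String firstKey(byte[] child) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(child)).readUTF();
    }

    public static void main(String[] args) throws IOException {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(String.format("row-%02d", i));
        }
        ToyMapFile parent = write(keys, 2);
        byte[][] children = split(parent);
        System.out.println("lower child bytes: " + children[0].length);
        System.out.println("upper child starts at key: " + firstKey(children[1]));
    }
}
```

The point of the sketch is only that the cut point comes from the index, so the split can be a byte copy rather than a record-by-record rewrite.  In a real split each child would also need its own file header and index.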

> [hbase] Make region splits faster
> ---------------------------------
>
>                 Key: HADOOP-1662
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1662
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>         Attachments: mapfile_split.patch
>
>
> HADOOP-1644 '[hbase] Compactions should take no longer than period between memcache flushes' is about making compactions run faster.  This issue is about making splits faster.  Currently a split reads a map file as input and, record by record, writes each record out to one of two new mapfiles.  This is too slow: ~30 seconds to split 120MB.  Google hints in the Bigtable paper that splitting is very fast because the split children feed off the split parent.  Primitive testing has splitting mapfiles using raw streams running 3 to 4 times faster than splitting on mapfile keys.
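
For reference, the per-record approach described in the issue can be sketched roughly as follows.  This is an illustration only, not hbase code: PerRecordSplitSketch, encode, splitPerRecord and countRecords are hypothetical names, and the toy record format (two length-prefixed UTF strings per record) is an assumption standing in for a real mapfile.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model of a per-record split, NOT hbase code.
class PerRecordSplitSketch {
    /** Serializes sorted records as (key, value) pairs of UTF strings. */
    static byte[] encode(SortedMap<String, String> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (Map.Entry<String, String> e : records.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.flush();
        return bytes.toByteArray();
    }

    /**
     * Decodes every record, compares its key with midKey, and re-encodes it
     * into the lower or upper child. Keys equal to midKey go to the upper child.
     */
    static byte[][] splitPerRecord(byte[] parent, String midKey) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(parent));
        ByteArrayOutputStream lowBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream highBytes = new ByteArrayOutputStream();
        DataOutputStream low = new DataOutputStream(lowBytes);
        DataOutputStream high = new DataOutputStream(highBytes);
        while (in.available() > 0) {
            String key = in.readUTF();
            String value = in.readUTF();
            DataOutputStream target = key.compareTo(midKey) < 0 ? low : high;
            target.writeUTF(key);
            target.writeUTF(value);
        }
        low.flush();
        high.flush();
        return new byte[][] { lowBytes.toByteArray(), highBytes.toByteArray() };
    }

    /** Counts the records in a child by decoding it end to end. */
    static int countRecords(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        int count = 0;
        while (in.available() > 0) {
            in.readUTF();
            in.readUTF();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        SortedMap<String, String> records = new TreeMap<>();
        records.put("a", "1");
        records.put("b", "2");
        records.put("c", "3");
        records.put("d", "4");
        byte[][] children = splitPerRecord(encode(records), "c");
        System.out.println("lower records: " + countRecords(children[0]));
        System.out.println("upper records: " + countRecords(children[1]));
    }
}
```

Every record is deserialized, compared and serialized again, which is the per-record work that a raw-stream copy avoids.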

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.