Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/07/30 19:13:53 UTC
[jira] Commented: (HADOOP-1662) [hbase] Make region splits faster

    [ https://issues.apache.org/jira/browse/HADOOP-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516445 ] 

stack commented on HADOOP-1662:
-------------------------------

Here's a proposal for another split mechanism, one that should work even faster than the above mapfile split patch.

Currently, on split, each of the parent region's store files is halved, with one daughter getting a copy of the top half of every store file and the other daughter a copy of the bottom half. Once the division completes, the parent region is deleted and all references to it in the --META-- table are replaced by references to the daughters. Dividing up the parent's store files between its daughters takes too long.
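To make the cost concrete, here is a minimal in-memory sketch of the current copy-based split. It is not the hbase code: a `NavigableMap` stands in for a store file, `CopySplit` is a hypothetical name, and placing the mid key in the top half is an assumption. The point is that every record is read and rewritten, so split time grows with the size of the parent.

```java
import java.util.Map;
import java.util.NavigableMap;

/**
 * Hypothetical model of the current copy-based split. A sorted map stands in
 * for a store file; each daughter receives a fresh copy of one half.
 */
final class CopySplit {
    private CopySplit() {}

    /** Copies keys strictly below midKey into bottom and all other keys into top. */
    static void split(NavigableMap<String, String> parent, String midKey,
                      NavigableMap<String, String> bottom,
                      NavigableMap<String, String> top) {
        for (Map.Entry<String, String> e : parent.entrySet()) {
            // Per-record read and rewrite: the work is proportional to the parent's size.
            if (e.getKey().compareTo(midKey) < 0) {
                bottom.put(e.getKey(), e.getValue());
            } else {
                top.put(e.getKey(), e.getValue());
            }
        }
    }
}
```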

The proposal below is a split method inspired by the suggestive tail of the 'Exploiting Immutability' section of the Google Bigtable paper: "...the immutability of SSTables enables us to split tablets quickly. Instead of generating a new set of SSTables for each child tablet, we let the child tablets share the SSTables of the parent tablet."

Rather than copying the top and bottom halves of the parent's store files to new store files in the daughter regions, on region mitosis have the daughters keep references to the parent. The references are undone at compaction time, when stores holding references to the parent are rewritten into new files under the daughter region.

Detail:

On region split, no longer delete the parent immediately. Instead, move the parent into a directory named 'split.parents'. Corralling split parents this way makes it easier to distinguish split parents from live regions.

Regions have subdirectories, one per column family. On split, add to the parent a subdirectory, at the same level as the column families, whose name is illegal for a column family: e.g. '.splits' or ':splits'. Into this directory write two empty files, each named for one of the daughter regions, so we have a means of relating parent to children.
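The directory bookkeeping described in the last two paragraphs might look roughly like the sketch below. It assumes a local filesystem via `java.nio.file` as a stand-in for HDFS (real hbase code would go through Hadoop's `FileSystem`), picks '.splits' from the two suggested names, and `SplitLayout`/`retireParent` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of the proposed on-disk layout, using a local directory
 * in place of HDFS. Directory names follow the proposal's examples.
 */
final class SplitLayout {
    static final String SPLIT_PARENTS = "split.parents";
    static final String SPLITS_DIR = ".splits";

    private SplitLayout() {}

    /**
     * Moves the parent region directory under split.parents, then records its
     * daughters as empty marker files in the parent's .splits subdirectory.
     * Returns the parent's new location.
     */
    static Path retireParent(Path tableDir, String parentName,
                             String daughterA, String daughterB) throws IOException {
        Path corral = Files.createDirectories(tableDir.resolve(SPLIT_PARENTS));
        Path moved = Files.move(tableDir.resolve(parentName), corral.resolve(parentName));
        Path splits = Files.createDirectories(moved.resolve(SPLITS_DIR));
        // The files stay empty: their names alone relate the parent to its children.
        Files.createFile(splits.resolve(daughterA));
        Files.createFile(splits.resolve(daughterB));
        return moved;
    }
}
```

Because the marker names cannot collide with a column family name, a reader scanning the region directory can tell them apart from store directories without extra metadata.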

Add to HMaster a cleanup thread that runs on a long period and examines the contents of 'split.parents'. For each split parent, and for each of its daughters, the thread checks whether the daughter still references the parent (see later for how the cleanup thread detects references). If not, it deletes that daughter's file from :splits. When both files have been removed (that is, neither daughter holds a reference to the parent any more), the cleanup thread removes the parent region.
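One pass of that cleanup thread could be sketched as follows. This is an illustration under assumptions, not HMaster code: it runs against a local directory, uses '.splits' as the marker directory name, and the `stillReferences` hook is hypothetical, standing in for whatever reference check is eventually used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of one pass over 'split.parents'. stillReferences
 * (daughter, parent) stands in for the reference check.
 */
final class SplitParentCleaner {
    private SplitParentCleaner() {}

    /** Returns the names of parent regions removed during this pass. */
    static List<String> cleanOnce(Path splitParents,
                                  BiPredicate<String, String> stillReferences) throws IOException {
        List<String> removed = new ArrayList<>();
        for (Path parent : list(splitParents)) {
            String parentName = parent.getFileName().toString();
            Path splits = parent.resolve(".splits");
            if (!Files.isDirectory(splits)) {
                continue; // No marker directory: leave this entry alone.
            }
            // One empty marker per daughter; drop it once that daughter no longer needs the parent.
            for (Path marker : list(splits)) {
                if (!stillReferences.test(marker.getFileName().toString(), parentName)) {
                    Files.delete(marker);
                }
            }
            // Neither daughter references the parent any more: remove the whole region.
            if (list(splits).isEmpty()) {
                deleteRecursively(parent);
                removed.add(parentName);
            }
        }
        return removed;
    }

    /** Lists a directory eagerly so entries can be deleted while looping. */
    private static List<Path> list(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    /** Deletes a directory tree, children before parents. */
    private static void deleteRecursively(Path root) throws IOException {
        List<Path> all;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse path order visits every descendant before its ancestor.
            all = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p);
        }
    }
}
```

Running the pass "on a long period" could be done with `ScheduledExecutorService.scheduleWithFixedDelay`; a fixed delay rather than a fixed rate keeps a slow pass from overlapping the next one.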

On region split, two daughter regions are written. One references the top halves of the parent region's store files, the other the bottom halves. Encode the reference to a parent store file in the name given to the daughter's referring store file. Currently, store files are named like this: mapfile.dat.3948616888538006163 for mapfiles and mapfile.info.3948616888538006163 for info files, where the numeric suffix is a random unique id. Name store files that reference a region of a parent store file as follows: mapfile.ref.REGION_NAME.[top|bottom]. E.g. mapfile.ref.region_hbaserepository,x1GAyQ6M_A2o2B8LpmlHKk==,7524499765167357666.8418484899696132011.top. These referencing files are empty.
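A sketch of building and parsing such names follows. The example name above carries a numeric id between the region name and the half, so this sketch assumes the layout mapfile.ref.REGION_NAME.FILE_ID.[top|bottom], with FILE_ID read as the parent store file's id; that reading, and the `StoreFileRef` class, are assumptions. Parsing works from the right so that region names may themselves contain commas or dots.

```java
import java.util.Locale;
import java.util.Objects;

/**
 * Hypothetical sketch of reference-file naming, assuming the layout
 * mapfile.ref.<region>.<fileId>.<top|bottom> read off the example name.
 */
final class StoreFileRef {
    enum Half { TOP, BOTTOM }

    static final String PREFIX = "mapfile.ref.";

    final String parentRegion;
    final long parentFileId;
    final Half half;

    StoreFileRef(String parentRegion, long parentFileId, Half half) {
        this.parentRegion = Objects.requireNonNull(parentRegion);
        this.parentFileId = parentFileId;
        this.half = Objects.requireNonNull(half);
    }

    /** Builds the name of the (empty) referencing file. */
    String fileName() {
        return PREFIX + parentRegion + "." + parentFileId + "." + half.name().toLowerCase(Locale.ROOT);
    }

    /** Parses a referencing file name; throws IllegalArgumentException if it is not one. */
    static StoreFileRef parse(String name) {
        if (!name.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a reference: " + name);
        }
        String rest = name.substring(PREFIX.length());
        // Split from the right: the last two dot-separated fields are the id and the half.
        int halfDot = rest.lastIndexOf('.');
        int idDot = halfDot < 0 ? -1 : rest.lastIndexOf('.', halfDot - 1);
        if (idDot <= 0) {
            throw new IllegalArgumentException("malformed reference: " + name);
        }
        Half half = Half.valueOf(rest.substring(halfDot + 1).toUpperCase(Locale.ROOT));
        long id = Long.parseLong(rest.substring(idDot + 1, halfDot));
        return new StoreFileRef(rest.substring(0, idDot), id, half);
    }
}
```

Plain data and info files (mapfile.dat.*, mapfile.info.*) fail the prefix check, so a loader can call `parse` only on names that start with mapfile.ref. and open everything else as before.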

Add an hbase mapfile subclass that does the right thing when it is passed a reference, proxying to the appropriate region of the backing parent file. Use this hbase mapfile subclass whenever hbase loads store files.
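The proxying idea can be illustrated with a sorted map standing in for the parent mapfile. This is not Hadoop's `MapFile.Reader` API: `HalfReader`, `get`, `seek` and `compact` are hypothetical names, and the convention that the bottom half holds keys below the split key while the top half holds the split key and everything after it is an assumption. Because the parent is immutable, each daughter can read through a view of it, and nothing is copied until a compaction rewrites the half into a file of its own.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a reader that proxies to one half of an immutable
 * parent store file, modeled here as a NavigableMap.
 */
final class HalfReader {
    enum Half { TOP, BOTTOM }

    private final NavigableMap<String, String> view;

    HalfReader(NavigableMap<String, String> parent, String splitKey, Half half) {
        // headMap/tailMap return views backed by the parent, so no records are copied here.
        NavigableMap<String, String> range = half == Half.TOP
                ? parent.tailMap(splitKey, true)
                : parent.headMap(splitKey, false);
        this.view = Collections.unmodifiableNavigableMap(range);
    }

    /** Returns the value for key, or null if the key is absent or outside this half. */
    String get(String key) {
        return view.get(key);
    }

    /** Returns the first entry in this half at or after key, or null if there is none. */
    Map.Entry<String, String> seek(String key) {
        return view.ceilingEntry(key);
    }

    /** Rewrites this half into a standalone map, as a compaction would, ending the reference. */
    NavigableMap<String, String> compact() {
        return new TreeMap<>(view);
    }
}
```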

> [hbase] Make region splits faster
> ---------------------------------
>
>                 Key: HADOOP-1662
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1662
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>         Attachments: mapfile_split.patch
>
>
> HADOOP-1644 '[hbase] Compactions should take no longer than period between memcache flushes' is about making compactions run faster. This issue is about making splits faster. Currently splits are done by reading a mapfile as input and, per record, writing out two new mapfiles. It's currently too slow: ~30 seconds to split 120MB. Google hints in the Bigtable paper that splitting is very fast because they let the split children feed off the split parent. Primitive testing has splitting mapfiles using raw streams running 3 to 4 times faster than splitting on mapfile keys.
