Posted to dev@hbase.apache.org by "Billy Pearson (JIRA)" <ji...@apache.org> on 2008/02/06 22:08:07 UTC

[jira] Commented: (HBASE-48) [hbase] Bulk load and dump tools

    [ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566338#action_12566338 ] 

Billy Pearson commented on HBASE-48:
------------------------------------

Wouldn't the best way to do this be a map that formats and sorts the data per column family, followed by a reduce that writes map files directly to the regions' columns?

That would skip the API and speed up loading of the data, and it would not matter so much whether we have 1 region or not, since all we would be doing is adding a map file to HDFS.
Of course, the map would have to know whether there is 1 region or 1000 and split the data correctly, but even if each map only produces a few lines of data per column family, the compactor will come along sooner or later and clean up and split where needed.
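
To make the splitting step concrete, here is a rough sketch in plain Python (not HBase or Hadoop code) of how a map stage might bucket cells by region and column family and sort each bucket before anything is written. The function names, the tuple layout, and the idea of passing in a sorted list of region start keys are all assumptions made for illustration; how the real job would discover region boundaries is not covered here.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_region(row_key, region_start_keys):
    """Return the index of the region whose key range contains row_key.

    Assumption: region_start_keys is sorted and its first entry is the
    empty string, so every row key falls into some region.
    """
    return bisect_right(region_start_keys, row_key) - 1


def partition_cells(cells, region_start_keys):
    """Group (row, family, qualifier, value) cells by (region, family).

    Each bucket is sorted by row and then qualifier, which is the order a
    sorted map file for that region and column family would need.
    """
    buckets = defaultdict(list)
    for row, family, qualifier, value in cells:
        region = assign_region(row, region_start_keys)
        buckets[(region, family)].append((row, qualifier, value))
    for entries in buckets.values():
        entries.sort()
    return dict(buckets)
```

With a single region, region_start_keys would just be [""], and every cell lands in region 0, so the same code handles the 1 region and the 1000 region case.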

So if we add 100 map files to one column, I would assume that it would slow reads down a little, having to sort through all the map files while scanning, but that would be a temporary speed problem.
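
As a rough illustration of that read cost, a scan over many sorted map files behaves like a k-way merge. The sketch below uses plain Python lists as stand-ins for map files (an assumption for illustration, not how HBase reads them), and scan_merged is a hypothetical name.

```python
import heapq


def scan_merged(map_files):
    """Yield entries in key order from several individually sorted lists.

    Each list stands in for one sorted map file. The merge keeps one
    candidate per file in a heap, so the per-entry cost grows with the
    number of files until they are compacted into fewer files.
    """
    return heapq.merge(*map_files)
```

With 100 small files the scan still returns entries in order, it just has more candidates to compare at each step, which fits the expectation that the slowdown goes away once the files are compacted.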


> [hbase] Bulk load and dump tools
> --------------------------------
>
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via the current APIs, particularly if the dataset is large and cell content is small, uploads can take a long time even when using many concurrent clients.
> PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools somehow write regions directly in HDFS.
