Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/09/23 19:16:15 UTC

[jira] Created: (HBASE-1861) Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)

Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
-----------------------------------------------------------------------------

                 Key: HBASE-1861
                 URL: https://issues.apache.org/jira/browse/HBASE-1861
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: mapreduce
    Affects Versions: 0.20.0
            Reporter: Jonathan Gray
             Fix For: 0.21.0


Add multi-family support to bulk upload tools from HBASE-48.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1861) Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)

Posted by "Ioannis Konstantinou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798375#action_12798375 ] 

Ioannis Konstantinou commented on HBASE-1861:
---------------------------------------------

Hi again. One thing I noticed during bulk upload (of a single column family) looks like a bug in the following scenario (correct me if I am wrong):
I have a mapper that reads input and emits KeyValue objects to be fed into the KeyValueSortReducer. The mapper emits several KeyValue objects for each row. For the same rowid, the KeyValue objects have different columnids.
The problem is the following: when these KeyValue objects (same rowid but different colids in the same column family) reach the reducer, the TreeSet used to sort the KeyValues keeps only the KeyValue that arrives last (it replaces all entries with the last one that reaches the reducer), as KeyValue.COMPARATOR seems to compare only the rowid!
Can I use a different Comparator? Must KeyValue objects with the same rowid be sorted before they are written to the HFile, or does this not matter?
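
To make the scenario above concrete, here is a minimal, self-contained sketch of how a sorted set behaves when its comparator looks at the row only. It does not use HBase at all: `Cell`, `ROW_ONLY` and `ROW_THEN_QUALIFIER` are hypothetical stand-ins invented for illustration, and the sketch says nothing about what KeyValue.COMPARATOR actually compares. One detail worth knowing: `java.util.TreeSet.add` leaves the set unchanged when an equal element is already present, so in this sketch it is the first cell of a row that survives, not the last.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class ComparatorSketch {
    // Hypothetical stand-in for a KeyValue: only a row id and a column qualifier.
    static final class Cell {
        private final String row;
        private final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }

        String row() { return row; }
        String qualifier() { return qualifier; }
    }

    // Looks at the row only, so all cells of one row are "equal" to a TreeSet.
    static final Comparator<Cell> ROW_ONLY = Comparator.comparing(Cell::row);

    // Looks at the row first, then the qualifier, so cells of one row stay distinct.
    static final Comparator<Cell> ROW_THEN_QUALIFIER =
        Comparator.comparing(Cell::row).thenComparing(Cell::qualifier);

    // Sorts the cells the way a reducer might: by adding them to a TreeSet.
    static TreeSet<Cell> sortCells(Comparator<Cell> comparator, List<Cell> cells) {
        TreeSet<Cell> sorted = new TreeSet<>(comparator);
        sorted.addAll(cells);
        return sorted;
    }

    // Two cells for row1 (different qualifiers) and one cell for row2.
    static List<Cell> sampleCells() {
        return Arrays.asList(
            new Cell("row1", "a"),
            new Cell("row1", "b"),
            new Cell("row2", "a"));
    }

    public static void main(String[] args) {
        System.out.println("row only:           "
            + sortCells(ROW_ONLY, sampleCells()).size() + " cells kept");
        System.out.println("row then qualifier: "
            + sortCells(ROW_THEN_QUALIFIER, sampleCells()).size() + " cells kept");
    }
}
```

If cells really were being collapsed per row, a comparator that also orders by column (and, in a real store, by timestamp) would be the kind of fix to look for.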




[jira] Commented: (HBASE-1861) Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)

Posted by "Ioannis Konstantinou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792011#action_12792011 ] 

Ioannis Konstantinou commented on HBASE-1861:
---------------------------------------------

Is anyone working on this?




[jira] Commented: (HBASE-1861) Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792031#action_12792031 ] 

stack commented on HBASE-1861:
------------------------------

Not to my knowledge. Thinking on it, this case is a little tougher than the single-family case.

1. In the single-family case, we just write single files and read the file metadata to create the region (we extract the file's start and end rows and use these to conjure the region description). In the multiple-family case, you'll somehow have to tie all the files in a region together, perhaps in metadata or with a file suffix or prefix. I was thinking that you'd keep a running tab on the size of the file in each family and then, as soon as any one file went over the region maximum file size limit, you'd rotate all files.
2. The loadtable.rb script would need to change to read across all files in a region to find the least first row and the greatest last row by looking at the metadata of every file.
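
The two points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how HFileOutputFormat or loadtable.rb work: the class `MultiFamilyRollSketch`, its `FileRange` type and all method names are hypothetical, row keys are modelled as Strings rather than byte arrays, and a real writer would presumably also wait for a row boundary before rotating.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiFamilyRollSketch {
    // Hypothetical summary of one family's file: its first and last row key.
    static final class FileRange {
        final String firstRow;
        final String lastRow;

        FileRange(String firstRow, String lastRow) {
            this.firstRow = firstRow;
            this.lastRow = lastRow;
        }
    }

    private final long maxFileSize;
    // Running tab of bytes written to the current file of each family.
    private final Map<String, Long> bytesPerFamily = new HashMap<>();
    private int rolls = 0;

    MultiFamilyRollSketch(long maxFileSize) {
        this.maxFileSize = maxFileSize;
    }

    // Records a write. Returns true when any one family's file has gone over
    // the limit, meaning the files of ALL families should be rotated together.
    boolean append(String family, long bytes) {
        long total = bytesPerFamily.merge(family, bytes, Long::sum);
        if (total > maxFileSize) {
            bytesPerFamily.clear(); // every family starts a fresh file
            rolls++;
            return true;
        }
        return false;
    }

    int rolls() {
        return rolls;
    }

    // Region bounds: the least first row and the greatest last row across
    // the files of all families. Both fields are null for an empty list.
    static FileRange regionBounds(List<FileRange> files) {
        String first = null;
        String last = null;
        for (FileRange f : files) {
            if (first == null || f.firstRow.compareTo(first) < 0) first = f.firstRow;
            if (last == null || f.lastRow.compareTo(last) > 0) last = f.lastRow;
        }
        return new FileRange(first, last);
    }

    public static void main(String[] args) {
        MultiFamilyRollSketch writer = new MultiFamilyRollSketch(100);
        writer.append("info", 60);
        writer.append("data", 30);
        System.out.println("rotate all files? " + writer.append("info", 50));

        FileRange bounds = regionBounds(Arrays.asList(
            new FileRange("b", "m"), new FileRange("a", "k"), new FileRange("c", "z")));
        System.out.println("region spans " + bounds.firstRow + " .. " + bounds.lastRow);
    }
}
```

Rotating every family at once keeps the files of one region aligned on the same row range, which is what lets the loader derive a single start and end row per region.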

If you want to discuss this issue more, put up some questions and I'll have a stab at them.  Thanks.

