Posted to user@hbase.apache.org by Michaela Buergle <Mi...@neofonie.de> on 2008/04/08 17:17:37 UTC

Save lists relating to records

Hi all,
I'm planning to save (and retrieve) potentially very long lists of
values, each list relating to one row in my HBase table. Have any of you
tried something similar with HBase?

Possible approaches that come to mind are:
- Insert a new row for each list item, duplicate the rest
- Create a column family for the list and insert each list item into a
new column in this family
- Write the whole list into one cell using custom formatting
- Use several tables + perform some kind of join

If you have any clues or experiences, I'd love to hear them.

micha

Re: Save lists relating to records

Posted by stack <st...@duboce.net>.
Michaela Buergle wrote:
> I think the HBase documentation could be more explicit on the subject of
> column families/columns.
>   

Help us out Micha.  Please identify how you think it should be better.  
You can also make a patch if so inclined.
Thanks,
St.Ack

Re: Save lists relating to records

Posted by Michaela Buergle <Mi...@neofonie.de>.
Ok, once again HQL got me confused :)
I just found out that I can easily get all columns for a family via the
API by just passing "familyName:" as an argument. Before, I thought it
was necessary to explicitly name each column because that is the case
from within the HQL shell.
Considering that, the multi-column approach will clearly be the favorite
for my list.
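
To illustrate the family-prefix lookup described above, here is a minimal
Python sketch that models a single row as a plain dict keyed by
"family:qualifier" names. It is not the HBase client API: the function
columns_in_family and the sample column names are made up for
illustration only.

```python
# Toy model of one row: column name ("family:qualifier") -> value.
# This is NOT the HBase client API; all names here are hypothetical.
row = {
    "info:name": "record-1",
    "list:0001": "apple",
    "list:0002": "banana",
    "list:0003": "cherry",
}

def columns_in_family(row, family_spec):
    """Return every column whose name starts with a 'family:' spec."""
    if not family_spec.endswith(":"):
        raise ValueError("expected a spec like 'list:'")
    return {name: value for name, value in row.items()
            if name.startswith(family_spec)}

list_columns = columns_in_family(row, "list:")
# Sorting by column name keeps list order because qualifiers are zero-padded.
items = [list_columns[name] for name in sorted(list_columns)]
```

Zero-padding the qualifiers is a design choice in this sketch, so that
sorting the column names as strings matches the intended list order.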

I think the HBase documentation could be more explicit on the subject of
column families/columns.

micha

Bryan Duxbury wrote:
> If you are using the getRow method, then you can get the whole row
> (including your list column) all at once. There is also an overload that
> allows you to select a desired subset of columns.
> -Bryan

Re: Save lists relating to records

Posted by Bryan Duxbury <br...@rapleaf.com>.
If you are using the getRow method, then you can get the whole row  
(including your list column) all at once. There is also an overload  
that allows you to select a desired subset of columns.
-Bryan

On Apr 8, 2008, at 8:47 AM, Michaela Buergle wrote:

> Thanks for your remarks!
> The multi-column approach does seem more HBase-y. I'm concerned however
> about performance when retrieving a large number of columns in order to
> get the whole list per record. What do you think?
>
> micha


Re: Save lists relating to records

Posted by Michaela Buergle <Mi...@neofonie.de>.
Thanks for your remarks!
The multi-column approach does seem more HBase-y. I'm concerned however
about performance when retrieving a large number of columns in order to
get the whole list per record. What do you think?

micha

Bryan Duxbury wrote:
> I think the approach of using a column family for the list and a column
> for each element is the way to go. It seems to be the most HBase-y way
> to lay the schema out.
> 
> You can of course use multiple tables if you want, but we have no joins
> of any kind implemented in HBase, so it'd be up to you to perform the
> join yourself in application code.
> 
> Custom formatting would work too, but then you pay the cost of being
> unable to look at, add, or manipulate items in the list individually, so
> I'd also stay away from that.
> 
> -Bryan


Re: Save lists relating to records

Posted by Bryan Duxbury <br...@rapleaf.com>.
I think the approach of using a column family for the list and a  
column for each element is the way to go. It seems to be the most  
HBase-y way to lay the schema out.
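
As a rough sketch of that layout, the Python below models a table as
nested dicts and stores each list element in its own column of a "list"
family. The helper names (add_item, remove_item, read_list) and the
zero-padded qualifier format are assumptions for illustration, not HBase
calls; a real client would issue the equivalent writes and deletes.

```python
# Toy model: table -> row key -> {"family:qualifier": value}.
# Hypothetical helpers; nothing here is part of the HBase API.
table = {}

def add_item(table, row_key, index, value, family="list"):
    """Store one list element in its own column, e.g. 'list:00000002'."""
    column = f"{family}:{index:08d}"
    table.setdefault(row_key, {})[column] = value

def remove_item(table, row_key, index, family="list"):
    """Delete a single element; the other columns are left untouched."""
    table[row_key].pop(f"{family}:{index:08d}", None)

def read_list(table, row_key, family="list"):
    """Rebuild the list by reading the family's columns in name order."""
    prefix = family + ":"
    row = table.get(row_key, {})
    return [row[c] for c in sorted(row) if c.startswith(prefix)]

for i, v in enumerate(["a", "b", "c"]):
    add_item(table, "row1", i, v)
remove_item(table, "row1", 1)
```

After these calls, read_list(table, "row1") returns the two remaining
elements, and neither of them had to be rewritten to remove the middle one.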

You can of course use multiple tables if you want, but we have no  
joins of any kind implemented in HBase, so it'd be up to you to  
perform the join yourself in application code.
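
A join in application code could look roughly like the following Python
sketch. Both "tables" are plain dicts, and the key scheme
"<record key>/<item index>" for the second table is an assumption made
up for this example, not an HBase convention.

```python
# Two toy "tables" as dicts: row key -> {column: value}. Not the HBase API.
records = {
    "r1": {"info:name": "first"},
    "r2": {"info:name": "second"},
}
# Second table, one row per list item, keyed "<record key>/<item index>".
list_items = {
    "r1/0000": {"item:value": "x"},
    "r1/0001": {"item:value": "y"},
    "r2/0000": {"item:value": "z"},
}

def join_record_with_items(records, list_items, record_key):
    """Application-side join: fetch the record, then collect its items."""
    record = records[record_key]
    prefix = record_key + "/"
    items = [row["item:value"]
             for key, row in sorted(list_items.items())
             if key.startswith(prefix)]
    return {"name": record["info:name"], "items": items}

joined = join_record_with_items(records, list_items, "r1")
```

The point of the sketch is that the matching of rows across the two
tables happens entirely in the client.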

Custom formatting would work too, but then you pay the cost of being
unable to look at, add, or manipulate items in the list individually,
so I'd also stay away from that.
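
To make that cost concrete, here is a small Python sketch of the
single-cell approach. JSON is used as the custom format purely as an
assumption, and the column name "data:list" and helper append_item are
hypothetical. Adding one item means reading, decoding, and rewriting the
entire cell.

```python
import json

# Toy model: one cell holds the whole list as a JSON string (assumed format).
row = {"data:list": json.dumps(["a", "b", "c"])}

def append_item(row, value, column="data:list"):
    """Appending requires a full read-modify-write of the cell."""
    items = json.loads(row[column])   # read and decode the whole list
    items.append(value)               # change a single element
    row[column] = json.dumps(items)   # re-encode and rewrite everything
    return len(row[column])           # size of the cell that was rewritten

rewritten_chars = append_item(row, "d")
```

The rewritten size grows with the length of the list, which is the price
of not being able to address items individually.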

-Bryan

On Apr 8, 2008, at 8:17 AM, Michaela Buergle wrote:

> Hi all,
> I'm planning to save (and retrieve) potentially very long lists of
> values, each list relating to one row in my HBase table. Have any of you
> tried something similar with HBase?
>
> Possible approaches that come to mind are:
> - Insert a new row for each list item, duplicate the rest
> - Create a column family for the list and insert each list item into a
> new column in this family
> - Write the whole list into one cell using custom formatting
> - Use several tables + perform some kind of join
>
> If you have any clues or experiences, I'd love to hear them.
>
> micha