Posted to user@hbase.apache.org by Stephen Boesch <ja...@gmail.com> on 2014/09/08 23:06:31 UTC

Nested data structures examples for HBase

While I am aware that HBase does not have native support for nested
structures, surely some of you have thought through this use case
carefully.

Our particular use case will likely have a single-digit number of nested
layers, with tens to hundreds of items in the lists at each level.

An example would be:

 top level:     300 items
 middle level:  1 to 100 items (a count of 1 may indicate a single value
                as opposed to a list)
 third level:   1 to 50 items
 fourth level:  1 to 20 items

The column names are likely known ahead of time, which may or may not
matter for HBase.  We could model the above structure in a Parquet file or
in Hive (with nested structs), but we would like to consider whether
HBase might also be an option.
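
To get a feel for the sizes involved, the level counts above can be multiplied out. This is only a rough sketch: it assumes every item at one level holds a list at the next level, which the post does not state, so the upper figure is a worst case rather than an expected size.

```python
from math import prod

# Items per level as given in the post: top, middle, third, fourth.
# Assumption (not from the post): every item fans out fully at each level.
max_items_per_level = [300, 100, 50, 20]
min_items_per_level = [300, 1, 1, 1]

max_leaves = prod(max_items_per_level)  # worst-case leaf values per record
min_leaves = prod(min_items_per_level)  # best case: single values below the top

print(f"leaf values per record: {min_leaves} to {max_leaves}")
```

If each leaf became its own cell, the worst case would be tens of millions of cells for a single record, which is worth keeping in mind when weighing the designs discussed in this thread.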

Re: Nested data structures examples for HBase

Posted by Stephen Boesch <ja...@gmail.com>.
Thanks, Demai. That HBaseCon 2012 presentation leaned pretty heavily on
nested data as a common HBase design pattern.  Nothing magic there, but it
is good to confirm it as the accepted norm.  We will proceed with our
considerations given the approach from that talk.

There was one moderately interesting point in the talk: create duplicate
copies of the nested data structures in separate column families, one per
sort order required by the application's query patterns.  The cost, of
course, is more disk space, but the faster access time is more important
than the additional disk usage.
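
The duplicate-copies idea can be sketched without an HBase cluster. The snippet below is an illustration only: the item fields, the family names `by_price` and `by_date`, and the `|` separator are all made up for the example. It relies on one fact about HBase, namely that columns within a row are kept in lexicographic qualifier order; `sorted()` stands in for that ordering here.

```python
# Hypothetical nested items belonging to one row; field names are made up.
items = [
    {"sku": "widget-red", "price_cents": 1250, "added": "2014-09-03"},
    {"sku": "widget-blue", "price_cents": 990, "added": "2014-09-07"},
    {"sku": "gadget", "price_cents": 10400, "added": "2014-08-21"},
]


def family_sorted_by(items, sort_key):
    """Build {qualifier: sku}. Each qualifier starts with the sort key, so
    plain lexicographic ordering of qualifiers yields the desired order."""
    return {f"{sort_key(item)}|{item['sku']}": item["sku"] for item in items}


row = {
    # Zero-pad numbers so that string order matches numeric order.
    "by_price": family_sorted_by(items, lambda item: f"{item['price_cents']:010d}"),
    # ISO dates already sort correctly as strings.
    "by_date": family_sorted_by(items, lambda item: item["added"]),
}


def read_in_order(family):
    """sorted() simulates the store returning qualifiers in byte order."""
    return [family[qualifier] for qualifier in sorted(family)]


cheapest_first = read_in_order(row["by_price"])
oldest_first = read_in_order(row["by_date"])
```

The same three items are stored twice, which is the disk-space cost mentioned above; in exchange, each query pattern reads its family front to back with no client-side sorting.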

In reply to Demai Ni <ni...@gmail.com>, 2014-09-08 15:59 GMT-07:00.

Re: Nested data structures examples for HBase

Posted by Demai Ni <ni...@gmail.com>.
There was a schema design talk at HBaseCon 2012
(http://www.slideshare.net/cloudera/5-h-base-schemahbasecon2012). There was
also a video, but the link to it from http://hbasecon.com/archive.html is
broken.

Anyway, if I remember correctly, the idea is to use the column family as
one layer and the qualifier as another layer, and play some tricks around
that. You may want to take a look to see whether it fits your needs. The
layers have to be simple; an alternative is to save a JSON document (or
another structured format) as the raw value.

Demai

In reply to Stephen Boesch <ja...@gmail.com>, Mon, Sep 8, 2014 at 2:06 PM.

Re: Nested data structures examples for HBase

Posted by Stephen Boesch <ja...@gmail.com>.
Thanks, Sean.  We have some internal requirements that will most likely
require us to stick with the native HBase APIs.  But the suggestion is
still appreciated; I was not aware of that project.

In reply to Sean Busbey <bu...@cloudera.com>, 2014-09-10 12:09 GMT-07:00.

Re: Nested data structures examples for HBase

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Stephen!

Have you taken a look at Apache Gora? It uses Avro for its data model,
which supports nested data structures, and can store in a variety of
backing stores, including HBase.

-Sean

In reply to Stephen Boesch <ja...@gmail.com>, Tue, Sep 9, 2014 at 4:20 PM.

Re: Nested data structures examples for HBase

Posted by Michael Segel <mi...@hotmail.com>.
Are you just kicking the tires or do you want to roll up your sleeves and do some work?

You have options: secondary indexes.

I don’t mean an inverted table, but things like Solr, Lucene, Elasticsearch…

The only downside is that, depending on what you index, you can see an explosion in the data being stored in HBase.

But that may be beyond you.  It’s a non-trivial task and, to be honest… a bit of ‘rocket science’.

It’s still doable…


In reply to Stephen Boesch <ja...@gmail.com>, Sep 9, 2014, at 10:20 PM.


Re: Nested data structures examples for HBase

Posted by Stephen Boesch <ja...@gmail.com>.
Thanks, Michael. Yes, cells are byte[]; therefore, storing JSON or other
document structures is always possible.  Our use cases include querying
individual elements in the structure, so that would require reconstituting
the documents and then parsing them for every row.  We are probably not
headed in the direction of HBase for those use cases, but we are trying to
make that determination after carefully considering the extent of the
mismatch.
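
The per-row parsing cost described above can be illustrated with a small stand-in for a table. This is a sketch under assumptions: the `table` dict, its row keys, and the document fields are hypothetical, and a real scan would go through the HBase client rather than a Python dict.

```python
import json

# Hypothetical table: row key -> one cell holding a whole JSON document.
table = {
    b"row1": json.dumps({"A": "x", "B": {"B1": "red", "B2": 3}}).encode(),
    b"row2": json.dumps({"A": "y", "B": {"B1": "blue", "B2": 7}}).encode(),
    b"row3": json.dumps({"A": "z", "B": {"B1": "red", "B2": 1}}).encode(),
}


def rows_where(table, path, wanted):
    """Return row keys whose document holds `wanted` at the nested `path`.
    Every cell is decoded and parsed in full, even to test a single field."""
    hits = []
    for row_key, blob in table.items():
        node = json.loads(blob)
        for name in path:
            # Walk down one level; give up if the path leaves the document.
            node = node.get(name) if isinstance(node, dict) else None
        if node == wanted:
            hits.append(row_key)
    return hits


red_rows = rows_where(table, ["B", "B1"], "red")
```

Note that `json.loads` runs once per row regardless of how small the queried element is, which is the mismatch being weighed here.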

In reply to Michael Segel <mi...@hotmail.com>, 2014-09-09 13:37 GMT-07:00.

Re: Nested data structures examples for HBase

Posted by Michael Segel <mi...@hotmail.com>.
You do realize that everything you store in HBase is a byte array, right? That is, each cell is a blob.

So you have the ability to create nested structures like… JSON records? ;-)

So, to your point: you can have a column A which represents a set of values.

This is one reason why you shouldn’t think of HBase in terms of being relational. In fact, for Hadoop, you really don’t want to think in terms of relational structures.
Think more in terms of hierarchical ones.

So yes, you can do what you want to do…

HTH

-Mike

In reply to Stephen Boesch <ja...@gmail.com>, Sep 8, 2014, at 10:06 PM.


Re: Nested data structures examples for HBase

Posted by Michael Segel <mi...@hotmail.com>.
@Wilm, 

Let me put it a different way… 

Think of a sales invoice. 

You can have columns for invoice_id, customer_id, customer_name, customer_billing_address (Nested structure), customer_contact# (nested structure), ship_to (nested structure)… 
And that’s the header information. 

Add to that the actual invoice line items… (row#, SKU#, description, qty, unit_price, line_price, tax-code) … [Note: this is also nested]

How do you have a single column family to handle all of that? 

Again, when you look at designs with respect to a real use case, you start to see where they fall apart. 

If we take a long look at what HBase is, and is not, we can start to see how we would want to model the data and how to better organize it.

I don’t want to morph this thread into a more theoretical discussion on design, but this isn’t a new thing.
Informix had project Arrowhead back in the late ’90s that got killed when Janet Perna bought them.  Had that project not been killed, the landscape would be very different.
(And that’s again another story. ;-)

But I digress.

The point I’m trying to make is that when you start to look at the data, where you would have a master/slave relationship in terms of the data, you can replace it with some sort of array/list structure in a single column, since everything is a blob.   (And again, there are areas where you can impose more constraints on HBase and make it either more into a relational model or into a hierarchical model, and this would again be a different discussion.)
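
As a rough illustration of the invoice example, the sketch below keeps scalar header fields as one cell each and packs every nested structure, including the list of line items, into a single blob per column. The sample values and the choice of JSON as the blob format are assumptions made for the example; any serialization would do, since a cell is just bytes.

```python
import json

# Hypothetical invoice; column names follow the message, values are made up.
invoice = {
    "invoice_id": "INV-0001",
    "customer_id": "C-42",
    "customer_name": "Example Co",
    "ship_to": {"street": "1 Main St", "city": "Springfield"},
    "line_items": [
        {"row": 1, "sku": "W-RED", "qty": 2, "unit_price": 12.5},
        {"row": 2, "sku": "G-BLU", "qty": 1, "unit_price": 104.0},
    ],
}


def to_cells(invoice):
    """Scalars become one cell each; nested dicts and lists become one
    JSON blob per column instead of a separate detail table."""
    cells = {}
    for column, value in invoice.items():
        if isinstance(value, (dict, list)):
            cells[column.encode()] = json.dumps(value).encode()
        else:
            cells[column.encode()] = str(value).encode()
    return cells


cells = to_cells(invoice)
line_items = json.loads(cells[b"line_items"])  # the detail rows, read back
```

Fetching one invoice returns its header and all line items in a single row read; the trade-off, raised elsewhere in the thread, is that questions spanning many invoices need a scan or an index.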

HTH

-Mike

In reply to Wilm Schumacher <wi...@cawoom.com>, Sep 10, 2014, at 10:25 PM.


Re: Nested data structures examples for HBase

Posted by Wilm Schumacher <wi...@cawoom.com>.

On 10.09.2014 at 22:25, Michael Segel wrote:
> Ok, but here’s the thing… you extrapolate the design out… each column
> with a subordinate record will get its own CF.
I disagree. Not by the proposed design. You could do it with one CF.

> Simple examples can go
> very bad when you move to real life.
I agree.

> Again you need to look at hierarchical databases and not think in
> terms of relational. To give you a really good example… look at a
> point of sale system in Pick/Revelation/U2 …
> 
> You are great at finding a specific customer’s order and what they
> ordered. You suck at telling me how many customers ordered that
> widget  in red.  during the past month’s promotion. (You’ll need to
> do a map/reduce for that. )
Correct, that's the downside of the suggestion. If you want to query
something like that ("give me all 'top-level columns' that have this
and that!"), you would have to run a map/reduce job. Or you need something
like an index. But that's a question only the thread owner can answer,
because we don't know what he's trying to accomplish. If there is a
chance that he wants to query something like that, my suggestion would be
a bad plan.

I think the thread owner now has 3 ideas for how to do what he was asking
for, with upsides and downsides. Now he has to decide what's the best plan
for the future.

Best wishes,

Wilm

Re: Nested data structures examples for HBase

Posted by Michael Segel <mi...@hotmail.com>.
OK, but here’s the thing… extrapolate the design out, and each column with a subordinate record will get its own CF.
Simple examples can go very bad when you move to real life.

Again, you need to look at hierarchical databases and not think in terms of relational.
To give you a really good example… look at a point-of-sale system in Pick/Revelation/U2…

You are great at finding a specific customer’s order and what they ordered.
You suck at telling me how many customers ordered that widget in red during the past month’s promotion.
(You’ll need to do a map/reduce for that.)

This is why you have to go into secondary indexing. (Which is a whole different ball of wax, from inverted tables to Solr.)

But to really grok HBase, you have to understand data structures and databases beyond relational.
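
The "widget in red" question can be sketched as a tiny map/reduce over every order row. This is an in-memory illustration, not a Hadoop job: the row-key layout `customer/order`, the item fields, and the sample data are hypothetical, and the date restriction from the example is left out to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical order rows: "customer/order" -> nested list of line items.
orders = {
    "cust1/order1": [{"sku": "widget", "color": "red"},
                     {"sku": "gadget", "color": "blue"}],
    "cust1/order2": [{"sku": "widget", "color": "red"}],
    "cust2/order1": [{"sku": "widget", "color": "blue"}],
    "cust3/order1": [{"sku": "widget", "color": "red"}],
}


def map_phase(row_key, line_items):
    """Emit (product, customer) for every matching line item in one row."""
    customer = row_key.split("/")[0]
    for item in line_items:
        if item["sku"] == "widget" and item["color"] == "red":
            yield ("widget-red", customer)


def reduce_phase(pairs):
    """Count distinct customers per emitted product key."""
    customers = defaultdict(set)
    for product, customer in pairs:
        customers[product].add(customer)
    return {product: len(found) for product, found in customers.items()}


# The map phase touches every row, which is the cost being pointed out.
pairs = [pair for key, items in orders.items() for pair in map_phase(key, items)]
counts = reduce_phase(pairs)
```

Looking up one customer's order is a single key access, while this count has to visit all rows; a secondary index trades extra stored data for avoiding that full pass.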

In reply to Wilm Schumacher <wi...@cawoom.com>, Sep 10, 2014, at 6:33 PM.


Re: Nested data structures examples for HBase

Posted by Wilm Schumacher <wi...@cawoom.com>.

On 10.09.2014 at 17:33, Michael Segel wrote:
> Because you really don’t want to do that since you need to keep the number of CFs low. 
In my example the number of CFs is 1, so this is not a problem.

Best wishes,

Wilm

Re: Nested data structures examples for HBase

Posted by Michael Segel <mi...@hotmail.com>.
Because you really don’t want to do that since you need to keep the number of CFs low. 

Again, you can store the data within the structure and index it. 

In reply to Wilm Schumacher <wi...@cawoom.com>, Sep 10, 2014, at 7:17 AM.


Re: Nested data structures examples for HBase

Posted by Stephen Boesch <ja...@gmail.com>.
Hi Wilm,
That is actually an interesting option: include the entire JSON path in
the column qualifier (cq).

In reply to Wilm Schumacher <wi...@cawoom.com>, 2014-09-09 23:17 GMT-07:00.

Re: Nested data structures examples for HBase

Posted by Wilm Schumacher <wi...@cawoom.com>.
As stated above, you can use JSON or something similar, which is always
possible. However, if you have to do that very often (and I think you
do, if you are using HBase ;) ), this could be a bad plan, because parsing
JSON is expensive in terms of CPU.

As I am relatively new to HBase (using it for perhaps a year and not
using most of the fancy features), perhaps my suggestion is not clever
... but why not use HBase directly?

If your structure is something like

{
	"A" : "A",
	"B" : {
		"B1" : "B1",
		"B2" : "B2"
	}
}

why not use qualifiers like "data:B,B1", where "data" is your column
family?

Your explanation of your problem seems to fit this idea perfectly, as
you are not interested in JSON-like behaviour (requesting B => getting
"{B1: "B1" , B2 : "B2"}"), but rather in having a defined structure (fixed
number of layers etc.).

So if you want to query "B=>B2", just add "B,B2" as the qualifier to the
get request and fire.
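
A minimal sketch of this flattening, assuming "," as the separator and string leaf values as in the example structure. The `flatten` helper is hypothetical, and the resulting dict only stands in for the cells of the single family "data"; no HBase call is made.

```python
def flatten(node, prefix=(), sep=","):
    """Yield (qualifier, value) byte pairs, one per leaf of a nested dict.
    The qualifier is the path from the root, joined with `sep`."""
    for name, value in node.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            yield from flatten(value, path, sep)
        else:
            yield sep.join(path).encode(), str(value).encode()


doc = {"A": "A", "B": {"B1": "B1", "B2": "B2"}}

# Cells of one row within the single column family "data".
cells = dict(flatten(doc))

# Querying "B=>B2" is then a lookup of exactly one qualifier.
b2 = cells[b"B,B2"]
```

Each leaf costs one cell and no parsing on read, which is the CPU saving argued for here.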

This is of course only possible if the queried names are known. If not,
you have to query the whole column family, which could get very big
given your stated requirements ... but it would still be possible.
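
When only the upper part of a path is known, a middle ground between one exact qualifier and the whole family is to select by qualifier prefix. The sketch below simulates that on a sorted list of qualifiers; the sample cells and the `with_prefix` helper are hypothetical. HBase ships a `ColumnPrefixFilter` for this kind of selection, but the sketch does not call it.

```python
from bisect import bisect_left

# Hypothetical cells of one row, using "," as the path separator.
cells = {b"A": b"A", b"B,B1": b"B1", b"B,B2": b"B2", b"C,C1": b"C1"}


def with_prefix(cells, prefix):
    """Return the cells whose qualifier starts with `prefix`, seeking to
    the first candidate in sorted order instead of testing every cell."""
    qualifiers = sorted(cells)
    start = bisect_left(qualifiers, prefix)
    found = {}
    for qualifier in qualifiers[start:]:
        if not qualifier.startswith(prefix):
            break  # sorted order: no later qualifier can match
        found[qualifier] = cells[qualifier]
    return found


under_b = with_prefix(cells, b"B,")
```

Including the separator in the prefix (`b"B,"` rather than `b"B"`) avoids matching an unrelated top-level name that merely starts with the same letters.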

However, by using a "," as separator, just as an example, the parsing of
the object into whatever you need should be very simple. And since you
stated that you just want to write data and query it directly, even
this cheap parsing shouldn't be required.
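
For the case where the nested object is wanted back, the parsing really is small. A minimal sketch, assuming "," as the separator and names that never contain it (otherwise some escaping scheme would be needed); `unflatten` is a hypothetical helper, not part of any HBase API.

```python
def unflatten(cells, sep=","):
    """Rebuild a nested dict from {qualifier: value} byte pairs by
    splitting each qualifier on `sep` and walking down the path."""
    root = {}
    for qualifier, value in cells.items():
        *parents, leaf = qualifier.decode().split(sep)
        node = root
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value.decode()
    return root


cells = {b"A": b"A", b"B,B1": b"B1", b"B,B2": b"B2"}
doc = unflatten(cells)
```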

This sounds much easier and much cheaper in terms of CPU usage to me
than the JSON, XML, or whatever plan.

Did I misunderstand your problem completely? Or does the plan outlined
above have flaws (a question for the HBase experts)?

Best wishes,

Wilm

In reply to Stephen Boesch <ja...@gmail.com>, 08.09.2014 at 23:06.