You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by Dima Spivak <ds...@cloudera.com> on 2015/02/12 21:12:23 UTC

Fwd: data base design question

Forwarding to users@, moving dev@ to bcc. People on the user list might be
more helpful here, Jignesh.

Cheers,
  Dima

---------- Forwarded message ----------
From: *Jignesh Patel* <ji...@gmail.com>
Date: Thursday, February 12, 2015
Subject: data base design question
To: "dev@hbase.apache.org" <de...@hbase.apache.org>

I have a requirement where I have to define two entities: orders and
results.
Now each order can have multiple results, for db design I will have two
options:

Option 1: Create an embedded entity of results and store it as list object
inside order tabel as one of the column field.
Option 2. Create a separate table of orders and results and build secondary
index inside solr where for given order id, multiple results ids are mapped.

Is there a better alternative than  above options for one to many
relationships?

Re: Fwd: data base design question

Posted by Wilm Schumacher <wi...@gmail.com>.

Hi,

Am 13.02.2015 um 04:08 schrieb Jignesh Patel:
> How about Option 1: Create an embedded entity of results and store it as
> list object inside order table as one of the column field.
the problem is, that a hbase cell value must be a byte array. Thus you
have to convert the "list object" to a byte array. One option could be
e.g. the json representation.

But now, when you want to look at the results for one order you have to
parse the e.g. json array. If you have to do that over and over again
you produce very high CPU costs. If the main task of your application is
to show or manipulate the results your main task on server side is to
parse and create json strings. This gets more and more problematic as
the number of results per order grow.

Another point is that by this design you cannot address a result directly.

But in the end: your plan would work. But there are some problems with that.

Best wishes

Wilm

Re: Fwd: data base design question

Posted by Jignesh Patel <ji...@gmail.com>.

How about Option 1: Create an embedded entity of results and store it as
list object inside order table as one of the column field.

On Thu, Feb 12, 2015 at 6:59 PM, Wilm Schumacher <wi...@gmail.com>
wrote:

> Hi,
>
> I think that Aloks solution and mine are kind of contrary.
>
> If you are able to make the one-to-many relationship by the key design
> in the result table then you wouldn't need the column family "results"
> in the order table. The "relation" would be intrinsic.
>
> If the key design in the results is somehow fixed by something else,
> than the column family solution would be necessary.
>
> But of course Aloks solution is somehow "cleaner", as you wouldn't need
> to save the relation to disk as it is intrinsic in the table. And you
> wouldn't produce problems with race conditions or the problem that the
> tables could end up in a erroneous state due to problems. E.g. deleting
> the result but leave the reference to it etc..
>
> And Aloks solution would be faster, as the other way you need to make a
> "Get" to get the rowkeys in the forst place, as Alok pointed out.
>
> Best wishes
>
> Wilm
>
> Am 12.02.2015 um 22:51 schrieb Alok Singh:
> > How are you going to access the results? Do you first lookup the order
> > and then the results? If so, you could do something like this:
> >
> > Table 1: Order
> > row_key => order_id
> > Column Family => order { columns: order.prop1, order.prop2....}
> >
> > Table 2: order_result
> > row_key => order_id:result_id
> > Column Family => result { columns: result.prop1, result.prop2....}
> >
> > To lookup a specific result, do a get(order_id + ":" + result_id) on
> > order_result.
> >
> > To get all results for an order, do a scan with startrow/endrow or a
> > use a prefix filter with order_id as the prefix.
> >
> > Alok
> >
> >
> > On Thu, Feb 12, 2015 at 1:23 PM, Wilm Schumacher
> > <wi...@gmail.com> wrote:
> >> Am 12.02.2015 um 21:12 schrieb Dima Spivak:
> >>> Is there a better alternative than  above options for one to many
> >>> relationships?
> >> you could use a column family in table 2 for that.
> >>
> >> table 1
> >> result1 data:foo => bar
> >> result2 data:foo => baz
> >> result3 data:foo => bar
> >> result4 data:foo => baz
> >> ....
> >>
> >> table 2
> >> order1 data:metadata => foobar , result:1 => result1, result:2 =>
> result2
> >> order1 data:metadata => foobaz , result:1 => result3, result:2 =>
> >> result4 , ....
> >>
> >> Seems to be the best plan.
> >>
> >> Best wishes
> >>
> >> Wilm
>
>

Re: Fwd: data base design question

Posted by Wilm Schumacher <wi...@gmail.com>.

Hi,

I think that Aloks solution and mine are kind of contrary.

If you are able to make the one-to-many relationship by the key design
in the result table then you wouldn't need the column family "results"
in the order table. The "relation" would be intrinsic.

If the key design in the results is somehow fixed by something else,
than the column family solution would be necessary.

But of course Aloks solution is somehow "cleaner", as you wouldn't need
to save the relation to disk as it is intrinsic in the table. And you
wouldn't produce problems with race conditions or the problem that the
tables could end up in a erroneous state due to problems. E.g. deleting
the result but leave the reference to it etc..

And Aloks solution would be faster, as the other way you need to make a
"Get" to get the rowkeys in the forst place, as Alok pointed out.

Best wishes

Wilm

Am 12.02.2015 um 22:51 schrieb Alok Singh:
> How are you going to access the results? Do you first lookup the order
> and then the results? If so, you could do something like this:
>
> Table 1: Order
> row_key => order_id
> Column Family => order { columns: order.prop1, order.prop2....}
>
> Table 2: order_result
> row_key => order_id:result_id
> Column Family => result { columns: result.prop1, result.prop2....}
>
> To lookup a specific result, do a get(order_id + ":" + result_id) on
> order_result.
>
> To get all results for an order, do a scan with startrow/endrow or a
> use a prefix filter with order_id as the prefix.
>
> Alok
>
>
> On Thu, Feb 12, 2015 at 1:23 PM, Wilm Schumacher
> <wi...@gmail.com> wrote:
>> Am 12.02.2015 um 21:12 schrieb Dima Spivak:
>>> Is there a better alternative than  above options for one to many
>>> relationships?
>> you could use a column family in table 2 for that.
>>
>> table 1
>> result1 data:foo => bar
>> result2 data:foo => baz
>> result3 data:foo => bar
>> result4 data:foo => baz
>> ....
>>
>> table 2
>> order1 data:metadata => foobar , result:1 => result1, result:2 => result2
>> order1 data:metadata => foobaz , result:1 => result3, result:2 =>
>> result4 , ....
>>
>> Seems to be the best plan.
>>
>> Best wishes
>>
>> Wilm

Re: Fwd: data base design question

Posted by Alok Singh <al...@gmail.com>.

How are you going to access the results? Do you first lookup the order
and then the results? If so, you could do something like this:

Table 1: Order
row_key => order_id
Column Family => order { columns: order.prop1, order.prop2....}

Table 2: order_result
row_key => order_id:result_id
Column Family => result { columns: result.prop1, result.prop2....}

To lookup a specific result, do a get(order_id + ":" + result_id) on
order_result.

To get all results for an order, do a scan with startrow/endrow or a
use a prefix filter with order_id as the prefix.

Alok


On Thu, Feb 12, 2015 at 1:23 PM, Wilm Schumacher
<wi...@gmail.com> wrote:
> Am 12.02.2015 um 21:12 schrieb Dima Spivak:
>> Is there a better alternative than  above options for one to many
>> relationships?
>
> you could use a column family in table 2 for that.
>
> table 1
> result1 data:foo => bar
> result2 data:foo => baz
> result3 data:foo => bar
> result4 data:foo => baz
> ....
>
> table 2
> order1 data:metadata => foobar , result:1 => result1, result:2 => result2
> order1 data:metadata => foobaz , result:1 => result3, result:2 =>
> result4 , ....
>
> Seems to be the best plan.
>
> Best wishes
>
> Wilm

Re: Fwd: data base design question

Posted by Wilm Schumacher <wi...@gmail.com>.

Am 12.02.2015 um 21:12 schrieb Dima Spivak:
> Is there a better alternative than  above options for one to many
> relationships?

you could use a column family in table 2 for that.

table 1
result1 data:foo => bar
result2 data:foo => baz
result3 data:foo => bar
result4 data:foo => baz
....

table 2
order1 data:metadata => foobar , result:1 => result1, result:2 => result2
order1 data:metadata => foobaz , result:1 => result3, result:2 =>
result4 , ....

Seems to be the best plan.

Best wishes

Wilm