Posted to user@hbase.apache.org by Sean Bigdatafun <se...@gmail.com> on 2011/01/02 03:36:28 UTC

Re: Schema Design

I think so. Unless you have some way to index the contract time (in HBase,
the only way to do so is to encode that information into your row key), you
have to run a MapReduce job that examines the contracts one by one.
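
As a rough illustration of the row-key idea, here is a minimal Java sketch.
It does not use the HBase client API: a TreeMap stands in for a table whose
rows are kept sorted by row key. The key layout (due date as yyyyMMdd, then
'|', then the contract id) and all class and method names are assumptions
made up for this example, and it ignores whether a contract was already paid.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory stand-in for a table whose rows are sorted by row key. */
final class DueDateRowKeySketch {
    // yyyyMMdd strings sort lexicographically in chronological order.
    private static final DateTimeFormatter KEY_DATE = DateTimeFormatter.BASIC_ISO_DATE;

    /** Hypothetical key layout: <dueDate as yyyyMMdd>|<contractId>. */
    static String rowKey(LocalDate dueDate, String contractId) {
        return dueDate.format(KEY_DATE) + "|" + contractId;
    }

    /** Returns the ids of contracts whose due date is strictly before today. */
    static List<String> overdueContractIds(SortedMap<String, String> table, LocalDate today) {
        // Exclusive upper bound: every key for an earlier date sorts before it.
        String stopKey = today.format(KEY_DATE);
        List<String> ids = new ArrayList<>();
        for (String key : table.headMap(stopKey).keySet()) {
            ids.add(key.substring(key.indexOf('|') + 1));
        }
        return ids;
    }

    public static void main(String[] args) {
        SortedMap<String, String> table = new TreeMap<>();
        table.put(rowKey(LocalDate.of(2010, 12, 15), "C11"), "info_for_11");
        table.put(rowKey(LocalDate.of(2010, 12, 30), "C12"), "info_for_12");
        table.put(rowKey(LocalDate.of(2011, 1, 2), "C13"), "info_for_13");
        table.put(rowKey(LocalDate.of(2011, 3, 1), "C14"), "info_for_14");

        // Contracts due on 2011-01-02 itself are not counted as overdue.
        System.out.println(overdueContractIds(table, LocalDate.of(2011, 1, 2)));
        // prints [C11, C12]
    }
}
```

With a key like this, finding overdue contracts becomes a range read over a
contiguous block of keys instead of a pass over every row. The trade-off is
that looking a contract up by id alone no longer maps to a single key.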

On Tue, Dec 28, 2010 at 4:46 PM, Valter Nogueira <vg...@gmail.com> wrote:

> And what about searching such contents?
>
> How do I search for overdue contracts?
>
> I could read every contract through MapReduce, select the overdue contracts,
> and build a table with them. Is that the right approach?
>
> Valter
>
> 2010/12/28 Sean Bigdatafun <se...@gmail.com>
>
> > I'd suggest a JSON object, XML, or a binary serialization format such as
> > Google Protocol Buffers or Facebook's Thrift.
> >
> > If you use any of those, you will have much better control over version
> > upgrades.
> >
> > On Tue, Dec 28, 2010 at 4:16 PM, Valter Nogueira <vgnogueira@gmail.com> wrote:
> >
> > > A contract has attributes such as NUMBER, TOTAL, ACCOUNT, and so on.
> > >
> > > When doing the following:
> > >
> > > row_key   ||  CF: Contract
> > > ---------------------------------------------------------------------
> > > valter        || 'C11' | info_for_11 | 'C12' | info_for_12
> > > ----------------------------------------------------------------------
> > >
> > > should info_for_NN be comma-separated values, XML, or a serialized Java
> > > object?
> > > On CF for contract attribute?
> > >
> > > Thanks,
> > >
> > > Valter
> > >
> > >
> > >
> > > 2010/12/28 Sean Bigdatafun <se...@gmail.com>
> > >
> > > > 1. customer_table:
> > > > row_key --> column_family : (customer --> contract)
> > > >
> > > > An example row,
> > > > row_key   ||  CF: Contract
> > > > ---------------------------------------------------------------------
> > > > valter        || 'C11' | info_for_11 | 'C12' | info_for_12
> > > > ---------------------------------------------------------------------
> > > >
> > > >
> > > > 2. contract_table:
> > > > row_key --> column_family (contract_id --> installment)
> > > >
> > > > An example row,
> > > > row_key   ||  CF: Installment
> > > > ---------------------------------------------------------------------
> > > > C11           || 'I_1001' | info_for_1001 | 'I_1002' | info_for_1002
> > > > ---------------------------------------------------------------------
> > > >
> > > >
> > > > On Tue, Dec 28, 2010 at 3:26 PM, Valter Nogueira <vgnogueira@gmail.com> wrote:
> > > >
> > > > > I have a small Java system that uses a relational database.
> > > > >
> > > > > Basically, the app has 3 entities: a CUSTOMER has many CONTRACTs, and
> > > > > each CONTRACT has many INSTALLMENTs.
> > > > >
> > > > > Very classic.
> > > > >
> > > > > I am trying to figure out the best approach to convert it to HBase.
> > > > >
> > > > > One way is to create a schema that mimics the relational schema.
> > > > >
> > > > > Another is to save the INSTALLMENTs in the CONTRACT table.
> > > > >
> > > > > I would like some advice on doing this.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Valter
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > --Sean
> > > >
> > >
> >
> >
> >
> > --
> > --Sean
> >
>



-- 
--Sean

Re: Schema Design

Posted by Sean Bigdatafun <se...@gmail.com>.
Not sure whether a secondary index would help his use case. Does anyone have
experience with that?
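
For reference, a hand-maintained secondary index could look roughly like the
sketch below. This is only an in-memory model, not HBase client code: two
TreeMaps stand in for a main table keyed by contract id and an index table
keyed by due date. The names (contracts, byDueDate, putContract, dueBefore)
and the yyyyMMdd date strings are assumptions invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** In-memory model of a main table plus a manually maintained index table. */
final class SecondaryIndexSketch {
    // Main table: contractId -> due date (stand-in for the full contract value).
    private final TreeMap<String, String> contracts = new TreeMap<>();
    // Index table: "<dueDate as yyyyMMdd>|<contractId>" -> empty value.
    private final TreeMap<String, String> byDueDate = new TreeMap<>();

    /** Writes the contract and keeps the index entry in step with it. */
    void putContract(String contractId, String dueDate) {
        String oldDueDate = contracts.put(contractId, dueDate);
        if (oldDueDate != null) {
            // The due date changed: drop the stale index entry.
            byDueDate.remove(oldDueDate + "|" + contractId);
        }
        byDueDate.put(dueDate + "|" + contractId, "");
    }

    /** Returns ids of contracts due strictly before the given yyyyMMdd date. */
    List<String> dueBefore(String date) {
        List<String> ids = new ArrayList<>();
        for (String key : byDueDate.headMap(date).keySet()) {
            ids.add(key.substring(key.indexOf('|') + 1));
        }
        return ids;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch db = new SecondaryIndexSketch();
        db.putContract("C11", "20101215");
        db.putContract("C12", "20110301");
        System.out.println(db.dueBefore("20110102")); // prints [C11]
    }
}
```

The main table stays keyed by contract id, so lookups by id are unchanged,
and the date query reads only the index. The cost is a second write per
update. In a real cluster the two writes would go to different rows or
tables and would not be atomic together, so the application would have to
cope with an index that is briefly out of date.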




On Sun, Jan 2, 2011 at 12:34 AM, Hari Sreekumar <hs...@clickable.com> wrote:

> Ultimately it depends on how you will be accessing your data. If you need to
> query on the contract time frequently, then this approach wouldn't be great.
> You have to identify the frequent queries and design the schema around
> them. What are your frequent queries like?
>
> Hari
>



-- 
--Sean

Re: Schema Design

Posted by Hari Sreekumar <hs...@clickable.com>.
Ultimately it depends on how you will be accessing your data. If you need to
query on the contract time frequently, then this approach wouldn't be great.
You have to identify the frequent queries and design the schema around
them. What are your frequent queries like?

Hari
