Posted to user@hbase.apache.org by stack <st...@duboce.net> on 2008/10/01 00:08:50 UTC

Re: Pigi project

Hey Antoni & Krzysztof:

Couple of things:

+ How does it work?  The indices in particular? (I suppose I'm 
interested in seeing the technical presentation).
+ Why the name Pigi?
+ What features do you need in hbase to support Pigi?
+ What Jim said regards the list (unless you wanted just two of us to 
see it first?).
+ Multivalue fields?  Is that cells in hbase-speak?
+ Distributed object cache?  How?  Sounds great.

Great stuff lads,
St.Ack



Jim Kellerman (POWERSET) wrote:
>
> In general, it is better to ask on the hbase-user mailing list because
> there are a number of people doing things similar to what you are doing
> and may be able to speak from experience. Neither Stack nor I really have
> much experience with this, and we may not give you as good an answer.
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
> *From:* Antoni ..... [mailto:antoni.kozielewski@gmail.com]
> *Sent:* Tuesday, September 30, 2008 2:32 PM
> *To:* stack@duboce.net; Jim Kellerman (POWERSET)
> *Subject:* Pigi project
>
>
> We are developing a big social network portal. We decided to use HBase as
> our data storage. During this work, we found that we need one-to-many
> relations between some objects. Because we didn't want to hardcode
> those relations between objects every time, we created a kind of ORM
> framework. We want to publish this framework as an open source project
> (we called it Pigi), but first of all we would like to ask you to view
> the attached presentation and tell us what you think about our solution.
>
> We would be grateful if you would assess the idea and comment on the
> road map.
>
> We are also working on a more technical presentation. It should be
> done in several days; we can of course mail it to you if you like.
>
> Simple use case presentation:
>         http://docs.google.com/Presentation?id=dhsz359t_2fgbm9x32
>
>
>  Antoni & Krzysztof
>

RE: Pigi project

Posted by Krzysztof Gałęcki <kr...@gmail.com>.

Hi

1. It is not a great advantage. But if you want to index, for example,
users (described by firstname, lastname, age) and you would like to execute
queries based on all combinations of those fields, then you have about 2^3
indexes (without ordering). Because of paging, each index can need as many
as 3 tables (we will describe this in the technical presentation). So
without ordering, you have 8*3 = 24 additional tables for 1 data table. I
would rather have 1 data table and 3 index tables. It is just clearer to
me, but if you like, you can have another table (or 3 tables) for each index.

2. At this stage we don't. It is an interesting feature, but I'm not sure
whether it is possible to ensure transactions.
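
The table count in point 1 can be checked with a short sketch. It
enumerates every subset of the three fields (the 2^3 figure includes the
empty combination, which is presumably why the message says "about") and
multiplies by the 3 tables per index mentioned above. The variable names
are illustrative only.

```python
from itertools import combinations

fields = ["firstname", "lastname", "age"]

# Every subset of the fields, from the empty one up to all three: 2**3 of them.
field_sets = [
    combo
    for size in range(len(fields) + 1)
    for combo in combinations(fields, size)
]

tables_per_index = 3  # upper bound quoted in the message ("because of paging")

print(len(field_sets))                     # number of indexes
print(len(field_sets) * tables_per_index)  # tables if each index gets its own
```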

Regards

Chriss

-----Original Message-----
From: Ding, Hui [mailto:hui.ding@sap.com] 
Sent: Wednesday, October 01, 2008 7:18 PM
To: hbase-user@hadoop.apache.org
Subject: RE: Pigi project

 This sounds really interesting. A few more questions if I may:

1. what do you see as the advantage of having one index table that
contains all, rather than having separate index tables?
2. do you ensure that update to the main table and the index table are
done in one transaction?

RE: Pigi project

Posted by "Ding, Hui" <hu...@sap.com>.

 This sounds really interesting. A few more questions if I may:

1. What do you see as the advantage of having one index table that
contains everything, rather than having separate index tables?
2. Do you ensure that updates to the main table and the index table are
done in one transaction?

Re: Pigi project

Posted by cu...@g.pl.

> cure@g.pl wrote:
>> ..
>> We will prepare a short technical presentation, but for now I'll
>> try to answer your questions:
>>
>>     1) How does it work?
>>
>>     The idea is based on the fact that row identifiers in an HBase table
>> are sorted lexicographically.
>>     For every 1:n relation Pigi maintains an additional table (index
>> table).
>>
>
> Is Clint Morgans' HBASE-883 work of use to you?
>

  Our solution is independent of this feature, but we will study it and
draw conclusions.


>
> Would suggest you air any proposals out here on the list -- maybe better
> up on hbase-dev -- because I'd guess you are not the only folks in need.
>
> Looking forward to more on your porcine project.

 We will release beta versions of the framework and a technical presentation
at the beginning of next week.

  Antony

Re: Pigi project

Posted by stack <st...@duboce.net>.

cure@g.pl wrote:
> ..
> We will prepare a short technical presentation, but for now I'll
> try to answer your questions:
>
>     1) How does it work?
>
>     The idea is based on the fact that row identifiers in an HBase table
> are sorted lexicographically.
>     For every 1:n relation Pigi maintains an additional table (index table).
>   

Is Clint Morgans' HBASE-883 work of use to you?

...
>
>       2) Why the name Pigi?
>
>                There is no specific reason..... :-)
>   

Logo should be easy enough I'd say.  Smile.

>       5) Distributed object cache?  How?  Sounds great.
>             In the future we will need to write a distributed cache (something
> like TreeCache) or use some existing solution.
>             We need it to reduce reads from HBase, as in Hibernate or
> any other cache.
>   

Would suggest you air any proposals out here on the list -- maybe better 
up on hbase-dev -- because I'd guess you are not the only folks in need.

Looking forward to more on your porcine project.

Great stuff,
St.Ack

Re: Pigi project

Posted by cu...@g.pl.

stack wrote:
> Hey Antoni & Krzysztof:
>
> Couple of things:
>
> + How does it work?  The indices in particular? (I suppose I'm
> interested in seeing the technical presentation).
> + Why the name Pigi?
> + What features do you need in hbase to support Pigi?
> + What Jim said regards the list (unless you wanted just two of us to
> see it first?).
> + Multivalue fields?  Is that cells in hbase-speak?
> + Distributed object cache?  How?  Sounds great.
>

   Hi


We will prepare a short technical presentation, but for now I'll
try to answer your questions:

    1) How does it work?

    The idea is based on the fact that row identifiers in an HBase table
are sorted lexicographically.
    For every 1:n relation, Pigi maintains an additional table (the index
table). Every row added to the child table causes a row to be inserted
into each index defined for that child object. The index table holds the
child object identifiers in sorted order.

    This order comes from the specially constructed row identifiers in the
index table. Each one contains:

        index name
        parent object id
        optional index parameters (for example: color of the car)
        optional ordering parameters (if we want to order results)
        child object id

    Because the index name is part of that id, many indexes can share one
index table (so in fact there is no need to create a separate table for
every index).

    Pigi helps to create and maintain such indexes. Otherwise the user
has to do it manually (probably individually for each 1:n relation).
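
A minimal sketch of that row id layout, assuming the parts are simply
concatenated in the order listed, with a separator between them. The
separator, the function name and the index names are hypothetical; the
message does not say how Pigi encodes them. A sorted Python list stands
in for the index table.

```python
SEP = "\x00"  # assumed separator; it sorts before any printable character


def index_row_id(index_name, parent_id, child_id,
                 index_params=(), ordering_params=()):
    """Join the parts of an index row id in the order listed above."""
    parts = [index_name, parent_id, *index_params, *ordering_params, child_id]
    return SEP.join(parts)


# Two hypothetical indexes kept in one "table". Python compares strings by
# code point, which matches byte order for the ASCII ids used here.
rows = sorted([
    index_row_id("carsByUser", "user7", "car2"),
    index_row_id("carsByUserAndColor", "user7", "car2", index_params=("red",)),
    index_row_id("carsByUser", "user3", "car9"),
    index_row_id("carsByUserAndColor", "user3", "car9", index_params=("blue",)),
])

for row in rows:
    print(row.split(SEP))
```

Because the index name comes first, rows of the two indexes never
interleave in the sorted order, which is what lets them share one table.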




    Indexes: our framework creates an additional table and puts all the
data it needs there. Indexing is done by constructing a composite rowId.
For example, we have these objects:

        - UserVO with fields: id, name, surname
        - CarVO with fields: id, userId, color

    Each user can have many cars, and one car has only one owner.

    We want to execute these queries:

        - find all cars by userId
        - find all cars by userId and color

    The framework maintains 2 indexes:

        - cars by userId, where the rowId in the index table contains
          the userId
        - cars by userId and color, where the rowId in the index table
          contains the userId and the color

    Index rows are ordered lexicographically, so for a descending index
the rowId will be "reversed".
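
The message does not say how the rowId is "reversed". One common way to
get descending results out of an ascending lexicographic sort is to store
the complement of a fixed-width value. The sketch below assumes that
technique, non-negative integer ordering values, and a hypothetical width
of 10 digits; none of these details come from Pigi itself.

```python
WIDTH = 10                 # assumed fixed width, in decimal digits
MAX_VALUE = 10**WIDTH - 1  # largest value that fits in WIDTH digits


def ascending_part(value):
    """Zero-pad so that string order matches numeric order."""
    return str(value).zfill(WIDTH)


def descending_part(value):
    """Store the complement so that larger values sort first."""
    return str(MAX_VALUE - value).zfill(WIDTH)


years = [1999, 2008, 2003]

# Sort by the encoded string, as a lexicographically ordered table would.
ascending = [y for _, y in sorted((ascending_part(y), y) for y in years)]
descending = [y for _, y in sorted((descending_part(y), y) for y in years)]

print(ascending)
print(descending)
```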

    When we want to change the color of a car, we only have to notify the
framework about the change in the CarVO object. The framework will update
all indexes of this object.
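
The car example can be sketched end to end as follows. This is an
in-memory illustration under stated assumptions, not Pigi's code:
`IndexTable`, `CarIndexer`, the index names and the separator are all
hypothetical, a sorted Python list stands in for the HBase index table,
and a dict stands in for the CarVO data table.

```python
import bisect

SEP = "\x00"  # assumed separator between row id parts


class IndexTable:
    """Sorted list of row ids, standing in for one shared index table."""

    def __init__(self):
        self.rows = []

    def put(self, *parts):
        row_id = SEP.join(parts)
        i = bisect.bisect_left(self.rows, row_id)
        if i == len(self.rows) or self.rows[i] != row_id:
            self.rows.insert(i, row_id)

    def delete(self, *parts):
        row_id = SEP.join(parts)
        i = bisect.bisect_left(self.rows, row_id)
        if i < len(self.rows) and self.rows[i] == row_id:
            del self.rows[i]

    def scan(self, *prefix_parts):
        """Child ids of all rows whose id starts with the given parts."""
        prefix = SEP.join(prefix_parts) + SEP
        i = bisect.bisect_left(self.rows, prefix)
        found = []
        while i < len(self.rows) and self.rows[i].startswith(prefix):
            found.append(self.rows[i].split(SEP)[-1])  # last part: child id
            i += 1
        return found


class CarIndexer:
    """Keeps both car indexes in step whenever a car is saved."""

    def __init__(self, table):
        self.table = table
        self.cars = {}  # car id -> (user id, color); stand-in for the data table

    def save(self, car_id, user_id, color):
        old = self.cars.get(car_id)
        if old is not None:
            # Remove the index rows built from the old field values first.
            old_user, old_color = old
            self.table.delete("carsByUser", old_user, car_id)
            self.table.delete("carsByUserAndColor", old_user, old_color, car_id)
        self.cars[car_id] = (user_id, color)
        self.table.put("carsByUser", user_id, car_id)
        self.table.put("carsByUserAndColor", user_id, color, car_id)


table = IndexTable()
indexer = CarIndexer(table)
indexer.save("car1", "u1", "red")
indexer.save("car2", "u1", "blue")
indexer.save("car3", "u2", "red")

print(table.scan("carsByUser", "u1"))
print(table.scan("carsByUserAndColor", "u1", "red"))

indexer.save("car1", "u1", "blue")  # the color of car1 changes
print(table.scan("carsByUserAndColor", "u1", "red"))
print(table.scan("carsByUserAndColor", "u1", "blue"))
```

The delete and the puts are separate operations here, so a failure
between them would leave the index out of step with the data; nothing in
this sketch makes the update atomic.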

    2) Why the name Pigi?

    There is no specific reason..... :-)

    3) What features do you need in hbase to support Pigi?

    Only the Java API: we use only scanners and simple gets; we don't use
filters.

    4) Multivalue fields?  Is that cells in hbase-speak?

    5) Distributed object cache?  How?  Sounds great.

    In the future we will need to write a distributed cache (something
like TreeCache) or use some existing solution. We need it to reduce reads
from HBase, as in Hibernate or any other cache.


   Antony