Posted to dev@openjpa.apache.org by Fernando Padilla <fe...@alum.mit.edu> on 2008/11/21 20:01:26 UTC

slices, collocation

So, now that I have some attention, I'll repost a question I sent out a 
month ago.

I want to build a connected data model, but I want to put objects in 
different databases.

Let's say I have 3 objects:

User (slice root)
  - name

Group (slice root)
  - name
  - users

Comment (slice grouped with group)
  - group
  - user
  - text


As you can see, they are all interrelated. But let's say I want to 
distribute Users and Groups across databases. They are related, but they 
can't be collocated.

So can you help me understand the "collocation" limitation of Slice, 
and a way to enhance it to remove this limitation (if I understand it 
properly)?



P.S. If I understand the limitation, I can't have a ManyToMany 
relationship from Group to Users, or a ManyToOne from Comment to User; 
instead I would have to keep a set of userIds and load each User object 
myself in code.

Re: slices, collocation

Posted by Fernando Padilla <fe...@alum.mit.edu>.

 >> Person p1 = em2.getReference(Person.class, p1Id);
 >> Address a1 = p1.getAddress();

So... thank you for confirming what we thought: this won't work. But is 
there any way I can get this to work? :) :) :)  As I keep saying, I'm 
happy to review the OpenJPA source and submit patches. Any ideas on 
how we could enable this capability?

This is what is so inhibiting. I want the code above to work without 
having to change p1.address into p1.addressId and then load the Address by hand:

Person p1 = em2.getReference(Person.class, p1Id);
Address a1 = em2.getReference(Address.class, p1.getAddressId());
logger.debug("a1: " + a1);

This three-line example doesn't look too bad, but it gets old quickly. :) 
We currently do this, and I'm willing to stick it out, but I was really 
hoping for a mature sharding technology that could handle this.

Any ideas?
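If the id-based work-around stays, the repetitive part can at least be centralized. A minimal sketch, assuming the entity keeps only the related entity's id (as with p1.getAddressId() above) and the caller supplies the lookup; `IdRef` is a hypothetical helper, not OpenJPA API. In a JPA application the lookup could be something like `id -> em2.find(Address.class, id)`.

```java
import java.util.function.Function;

// Hypothetical helper (not part of OpenJPA): stores only the id of a related
// entity and resolves it on demand through a caller-supplied lookup.
final class IdRef<T> {
    private final Long id;
    private T cached; // remembered after the first successful lookup

    IdRef(Long id) {
        this.id = id;
    }

    Long id() {
        return id;
    }

    // Returns the referenced entity, or null if the id is null or the lookup
    // finds nothing. A null result is not cached, so a later call retries.
    T get(Function<Long, T> lookup) {
        if (id == null) {
            return null;
        }
        if (cached == null) {
            cached = lookup.apply(id);
        }
        return cached;
    }
}
```

The cross-slice lookup stays explicit at the call site, which matches the situation described in this thread: the ORM will not follow the foreign key into another slice, so the application has to.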

Pinaki Poddar wrote:
>> Person p1 = em2.getReference(Person.class, p1Id);
>> Address a1 = p1.getAddress();
>> logger.debug("a1: " + a1);
> 
> If DistributionPolicy.distribute(p1) and DistributionPolicy.distribute(a1)
> return the same slice, then
>   assertNotNull(a1);
> else
>   assertNull(a1);
> 
> SlicePersistence.getSlice(x) returns, for any managed instance x, the
> name of the slice that stores x.
> 
>

Re: slices, collocation

Posted by Pinaki Poddar <pp...@apache.org>.

> Person p1 = em2.getReference(Person.class, p1Id);
> Address a1 = p1.getAddress();
> logger.debug("a1: " + a1);

If DistributionPolicy.distribute(p1) and DistributionPolicy.distribute(a1)
return the same slice, then
  assertNotNull(a1);
else
  assertNull(a1);

SlicePersistence.getSlice(x) returns, for any managed instance x, the
name of the slice that stores x.
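A toy model can make this rule concrete. It assumes each slice is an independent database and that resolving p1.getAddress() only searches the slice p1 was loaded from, because no database can join across slices. `CollocationDemo`, `Slice` and `loadAddressOf` are hypothetical names; the sketch mimics the observable behaviour described above, not Slice's internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not Slice internals: each slice is an independent
// database, and a foreign key can only be joined inside the slice it lives in.
final class CollocationDemo {
    static final class Slice {
        final Map<Long, Long> personAddressFk = new HashMap<>(); // Person id -> Address id
        final Map<Long, String> addresses = new HashMap<>();     // Address id -> street
    }

    // Simulates p.getAddress() for a Person loaded from personSlice: the foreign
    // key is read there, and the Address is looked up in that same slice only.
    static String loadAddressOf(Slice personSlice, long personId) {
        Long fk = personSlice.personAddressFk.get(personId);
        if (fk == null) {
            return null; // unknown person, or no address set
        }
        return personSlice.addresses.get(fk); // null if the row sits in another slice
    }

    public static void main(String[] args) {
        Slice s1 = new Slice();
        Slice s2 = new Slice();

        // Collocated: Person 1 and its Address 10 are both in S1.
        s1.personAddressFk.put(1L, 10L);
        s1.addresses.put(10L, "Main St");

        // Not collocated: Person 2 is in S1, but its Address 20 is in S2.
        s1.personAddressFk.put(2L, 20L);
        s2.addresses.put(20L, "Elm St");

        System.out.println(loadAddressOf(s1, 1L)); // prints Main St
        System.out.println(loadAddressOf(s1, 2L)); // prints null
    }
}
```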


-- 
View this message in context: http://n2.nabble.com/slices%2C-collocation-tp1563065p1578339.html
Sent from the OpenJPA Developers mailing list archive at Nabble.com.

Re: slices, collocation

Posted by Fernando Padilla <fe...@alum.mit.edu>.

Below is the precise use case I'm trying to get working reliably, while 
I try to understand the "collocation constraint". So you're saying that I 
can always distribute related objects into different slices if I call 
persist() on each one separately?

My question is: what should I expect when I call p1.getAddress() in 
another EntityManager, loading p1 fresh?

Person p1 = em2.getReference(Person.class, p1Id);
Address a1 = p1.getAddress();
logger.debug("a1: " + a1);

Pinaki Poddar wrote:

> Please note that Slice stores the transitive closure of X as it exists at
> the point of the persist() call. One can exploit that fact to store related
> instances in different slices.
> For example, assume Person p1 with a reference to Address a1, and that
> reference is annotated with CascadeType.PERSIST. The DistributionPolicy is
> implemented such that it returns different slices S1 and S2 for p1 and a1:
>   DistributionPolicy.distribute(p1) = S1
>   DistributionPolicy.distribute(a1) = S2
> 
> Now if we do the following, p1 and a1 will both be stored in slice S1:
>    Person p1 = new Person();
>    Address a1 = new Address();
>    p1.setAddress(a1);
>    em.persist(p1);
> 
> But you can store p1 and a1 in different slices as follows:
>    Person p1 = new Person();
>    em.persist(p1);
>    Address a1 = new Address();
>    p1.setAddress(a1);
>    em.persist(p1);
> 
> However, it then becomes the application's responsibility to reestablish
> the linkage between p1 and a1 when they are later realized in a different
> persistence context, because at the database level p1 is stored in slice
> S1 with a foreign key to a1, but the Address record that key refers to
> actually sits in slice S2.

Re: slices, collocation

Posted by Pinaki Poddar <pp...@apache.org>.

Hi,
   I appreciate your thoughtful observations and comments. I hope that your
project will benefit from Slice and that the design and implementation of
Slice will, in turn, improve from practical use cases and independent views.

> hmm.  So maybe I was too quick to say that the collocation constraint is
> too inhibiting. 

Constraints are always integral to design. Slice is not a panacea for
distributed data management, but it does address a specific usage where the
data model is amenable to horizontal partitioning. Such constraints (i.e. a
Constrained Tree Schema, if anyone prefers a fancier term) exist *naturally*
in many practical domains along temporal (e.g. PurchaseOrder per Month),
geographical (e.g. Homes per State) or personal (e.g. Preferences per User)
dimensions. By imposing that constraint up front, Slice attempts to meet its
goal well with a simple (but not simpler :) implementation that leverages
OpenJPA's highly extensible architecture (as an aside: Slice required only
a single-line code change in the entire OpenJPA codebase). Such a trade-off
between self-imposed constraints and efficiency is at the core of any
engineering design.

> Coming from my expectations of what a sharding ORM system would provide
> for me, it definitely is too constraining.  
> I know that with sharding you can never execute a join across databases, 

I am not able to follow the central argument for why you consider the
collocation constraint *too inhibiting*. In fact, the above two statements
seem to contradict each other and justify why Slice has imposed a constraint
on the data model.

> Just warning people that they have to be careful not to traverse relations
> that are not collocated would be fine; we're not children, after all :) :)

Please note that Slice stores the transitive closure of X as it exists at
the point of the persist() call. One can exploit that fact to store related
instances in different slices.
For example, assume Person p1 with a reference to Address a1, and that
reference is annotated with CascadeType.PERSIST. The DistributionPolicy is
implemented such that it returns different slices S1 and S2 for p1 and a1:
  DistributionPolicy.distribute(p1) = S1
  DistributionPolicy.distribute(a1) = S2

Now if we do the following, p1 and a1 will both be stored in slice S1:
   Person p1 = new Person();
   Address a1 = new Address();
   p1.setAddress(a1);
   em.persist(p1);

But you can store p1 and a1 in different slices as follows:
   Person p1 = new Person();
   em.persist(p1);
   Address a1 = new Address();
   p1.setAddress(a1);
   em.persist(p1);

However, it then becomes the application's responsibility to reestablish
the linkage between p1 and a1 when they are later realized in a different
persistence context, because at the database level p1 is stored in slice
S1 with a foreign key to a1, but the Address record that key refers to
actually sits in slice S2.
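The two persist sequences can be reproduced with a toy model of the rule as stated here: when persist() reaches a new instance, the policy picks its slice and every new instance reachable from it at that moment follows it there; an already-stored instance keeps its slice and only cascades. This is a hypothetical simulation (`ToySliceStore`, `Node` and the policy lambda are invented names) written to illustrate the described behaviour; it does not reproduce Slice's code or its DistributionPolicy signature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical toy model of "Slice stores the transitive closure of X as it
// exists at the point of persist()". Not OpenJPA code.
final class ToySliceStore {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    private final Function<Node, String> policy; // stands in for a DistributionPolicy
    private final Map<Node, String> stored = new HashMap<>(); // instance -> slice name

    ToySliceStore(Function<Node, String> policy) {
        this.policy = policy;
    }

    String sliceOf(Node n) {
        return stored.get(n);
    }

    void persist(Node root) {
        persist(root, new HashSet<>());
    }

    private void persist(Node n, Set<Node> visited) {
        if (!visited.add(n)) {
            return; // already handled during this call (guards against cycles)
        }
        if (!stored.containsKey(n)) {
            // New instance: the policy picks its slice, and every new instance
            // reachable from it (through other new instances) follows it there.
            storeClosure(n, policy.apply(n));
        } else {
            // Already stored: it keeps its slice; cascade to its references.
            for (Node ref : n.refs) {
                persist(ref, visited);
            }
        }
    }

    private void storeClosure(Node start, String slice) {
        Deque<Node> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (stored.containsKey(n)) {
                continue; // stored earlier: stays in its original slice
            }
            stored.put(n, slice);
            for (Node ref : n.refs) {
                work.push(ref);
            }
        }
    }
}
```

With a policy that maps p1 to S1 and a1 to S2, attaching a1 before the first persist(p1) puts both in S1, while persisting p1 first and then attaching a1 and persisting again leaves a1 in S2.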

> But like I said, we're taking a big bet that OpenJPA slices will fit our 
> scale out requirements.  So thank you!  This is an amazing head start, 
> and looks solidly built and coded.  So I'll keep thinking on this, the 
> limitations and possibilities :)  And my complaints are pretty minor in 
> the big picture.

Good luck.


-- 
View this message in context: http://n2.nabble.com/slices%2C-collocation-tp1563065p1577575.html

Re: slices, collocation

Posted by Fernando Padilla <fe...@alum.mit.edu>.

hmm.  So maybe I was too quick to say that the collocation constraint is 
too inhibiting.  Coming from my expectations of what a sharding ORM 
system would provide for me, it definitely is too constraining. But I 
promise to give it more thought; maybe it's still OK for other use cases. 
So I'll keep thinking about this.

But I'd ask you guys to think about the use cases that can't be 
implemented, and the usability costs, that the collocation constraint 
places on the system.

I know that with sharding you can never execute a join across databases, 
so fancier queries will not execute as expected. But baking that 
limitation of sharding into the data model itself seems like overdoing 
it. Just warning people that they have to be careful not to traverse 
relations that are not collocated would be fine; we're not children, 
after all :) :)

But like I said, we're taking a big bet that OpenJPA Slice will fit our 
scale-out requirements. So thank you! This is an amazing head start, 
and it looks solidly built and coded. So I'll keep thinking about this, 
both the limitations and the possibilities :)  And my complaints are pretty 
minor in the big picture.

For example, I have a work-around for the collocation constraint; I'm 
just seeing if we can make the system nicer and easier to use. My 
work-around is to store references to objects (their ids), not the 
objects themselves (cross-database joins are impossible anyway). Then our 
application loads the referenced objects as needed, so that we maintain 
the relations, not the ORM system.
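The id-based work-around might look like the sketch below. It assumes, hypothetically, that the slice holding an entity can be recomputed from its id (for example with a hash-based policy) and models each slice as an in-memory table; `IdResolver.resolve` is an invented name. Grouping ids by slice means one batch per slice rather than one lookup per id; in a real application each batch might become a single `IN` query against that slice.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch of application-maintained relations: an entity keeps
// related ids (e.g. a Group's userIds) and the application loads them itself.
final class IdResolver {
    // ids:       ids to load, e.g. the userIds stored on a Group
    // sliceOfId: assumed way to recompute which slice holds an id
    // slices:    slice name -> (id -> entity), an in-memory stand-in for databases
    static <T> Map<Long, T> resolve(Collection<Long> ids,
                                    Function<Long, String> sliceOfId,
                                    Map<String, Map<Long, T>> slices) {
        // Batch ids per slice; each batch stands in for one query to that slice.
        Map<String, List<Long>> bySlice = new TreeMap<>();
        for (Long id : ids) {
            bySlice.computeIfAbsent(sliceOfId.apply(id), k -> new ArrayList<>()).add(id);
        }
        Map<Long, T> found = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> batch : bySlice.entrySet()) {
            Map<Long, T> table = slices.getOrDefault(batch.getKey(), Map.of());
            for (Long id : batch.getValue()) {
                T entity = table.get(id);
                if (entity != null) {
                    found.put(id, entity); // ids that resolve to nothing are skipped
                }
            }
        }
        return found;
    }
}
```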

Fernando Padilla wrote:
> right, thank you :)
> 
> you have re-confirmed how I thought the collocation constraint worked, 
> and you also gave me a great motivation why the "replicated" feature 
> came about ( as a work around for the collocation constraint ).
> 
> So now we're back to square one.  Looking at my example use case, the 
> collocation constraint is still too inhibiting.  I want to get rid of 
> those requirements! :)
> 
> So if you wanted to remove that requirement, how would you go about it? 
>  What code would you look at, etc etc.  If I want to put work into 
> fixing this up, where should I begin to look, etc etc.  what are some 
> possible plans.. :) :) :)

Re: slices, collocation

Posted by Fernando Padilla <fe...@alum.mit.edu>.

Right, thank you :)

You have re-confirmed how I thought the collocation constraint worked, 
and you also gave me a great explanation of why the "replicated" feature 
came about (as a work-around for the collocation constraint).

So now we're back to square one. Looking at my example use case, the 
collocation constraint is still too inhibiting. I want to get rid of 
those requirements! :)

So if you wanted to remove that requirement, how would you go about it? 
What code would you look at? If I want to put work into fixing this, 
where should I begin to look? What are some possible plans? :) :) :)

Pinaki Poddar wrote:
> 
>   One key aspect of data distribution model used in Slice is that the
> distribution policy is based at instance level and *not* at class level.
> What it implies for your given scenario is that while User U1 instance can
> be persisted in Slice A, another User instance U2 can be stored in Slice B.
> So it is not necessary that all User instances are stored in one Slice and
> all Comment instances are in a different slice and so forth. 
> 
>   But what about related instances? For the sake of concreteness let us
> consider the following instances and relations:
>   User U1 belongs to Group G1 and has commented C11, C12, C13
>   User U2 belongs to Group G1 and has commented C21
> 
> The distribution policy determines that U1 and U2 are stored in Slice A and
> B respectively.
> The collocation constraint forces that any instance reachable from U1 (i.e.
> closure of U1 in Graph theory terms) is stored in Slice A and any instance
> reachable from U2 is stored in Slice B. Thus, C11, C12, C13 go to Slice A while
> C21 goes to Slice B.
> 
> Where does G1 go? G1 is reachable from both U1 and U2. The only current
> option is G1 is annotated as @Replicated and identical copies of G1 are
> stored in both Slice A and B. 
> 
> Of course, collocation constraint will prohibit G1 to have a relation to U1
> and U2. So, @Replicated mainly serves to model 'master' data i.e. data
> that are referred by many but itself refers none. However, the relationship
> is not completely lost. For example, a query such as 
>    select u from User u where u.group.name='G1'
> will fetch both U1 and U2 by executing parallel queries across Slice A and B
> and merging the results.

Re: slices, collocation

Posted by Pinaki Poddar <pp...@apache.org>.

  One key aspect of the data distribution model used in Slice is that the
distribution policy operates at the instance level and *not* at the class
level. For your scenario, this implies that while User instance U1 can be
persisted in Slice A, another User instance U2 can be stored in Slice B.
So it is not necessary that all User instances are stored in one slice and
all Comment instances in a different slice, and so forth.
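To illustrate instance-level (rather than class-level) placement, here is a minimal sketch of a policy that derives the slice from each instance's own id. `HashByIdPolicy` is a hypothetical class; it deliberately does not implement OpenJPA's `DistributionPolicy` interface, whose exact method signature should be checked against the OpenJPA version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical instance-level policy: the slice is computed from each
// instance's own id, so two instances of the same class (e.g. User U1 and
// User U2) can be stored in different slices. Not OpenJPA's DistributionPolicy.
final class HashByIdPolicy {
    private final List<String> slices;

    HashByIdPolicy(List<String> slices) {
        if (slices.isEmpty()) {
            throw new IllegalArgumentException("at least one slice is required");
        }
        this.slices = new ArrayList<>(slices);
    }

    // Deterministic: the same id always maps to the same slice name.
    // floorMod keeps the index non-negative for negative ids.
    String sliceFor(long id) {
        int index = (int) Math.floorMod(id, (long) slices.size());
        return slices.get(index);
    }
}
```

Determinism matters here: the same rule that placed an instance must be able to find it again, which is also what makes id-based lookups across slices feasible.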

  But what about related instances? For the sake of concreteness, let us
consider the following instances and relations:
  User U1 belongs to Group G1 and has commented C11, C12, C13
  User U2 belongs to Group G1 and has commented C21

The distribution policy determines that U1 and U2 are stored in Slices A and
B, respectively.
The collocation constraint forces any instance reachable from U1 (i.e. the
closure of U1, in graph-theory terms) to be stored in Slice A, and any
instance reachable from U2 to be stored in Slice B. Thus C11, C12, C13 go to
Slice A while C21 goes to Slice B.

Where does G1 go? G1 is reachable from both U1 and U2. The only current
option is to annotate G1's class (Group) as @Replicated, so that identical
copies of G1 are stored in both Slice A and Slice B.
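The closure argument can be checked mechanically. This sketch, with hypothetical names, computes each root's reachable set by breadth-first search and reports instances that fall into the closures of roots placed in different slices; for the relations above, G1 is the only such instance. It assumes G1 holds no references back to the users, which matches the restriction stated next.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the collocation constraint as graph reachability.
final class ClosureDemo {
    // Every instance reachable from root (including root) by following references.
    static Set<String> closure(Map<String, List<String>> refs, String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(root);
        while (!work.isEmpty()) {
            String n = work.poll();
            if (seen.add(n)) {
                work.addAll(refs.getOrDefault(n, List.of()));
            }
        }
        return seen;
    }

    // Instances reachable from roots placed in different slices: under the
    // collocation constraint they have no single home (candidates for @Replicated).
    static Set<String> conflicts(Map<String, List<String>> refs, Map<String, String> rootSlice) {
        Map<String, String> home = new HashMap<>();
        Set<String> conflicted = new TreeSet<>();
        for (Map.Entry<String, String> root : rootSlice.entrySet()) {
            for (String n : closure(refs, root.getKey())) {
                String previous = home.putIfAbsent(n, root.getValue());
                if (previous != null && !previous.equals(root.getValue())) {
                    conflicted.add(n);
                }
            }
        }
        return conflicted;
    }
}
```

If G1 also referenced U1 and U2, each user's closure would include the other user and all comments, and everything would conflict; that is why replicated master data is restricted to outgoing-free "leaf" entities.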

Of course, the collocation constraint prohibits G1 from having a relation to
U1 and U2. So @Replicated mainly serves to model 'master' data, i.e. data
that is referred to by many but itself refers to none. However, the
relationship is not completely lost. For example, a query such as
   select u from User u where u.group.name = 'G1'
will fetch both U1 and U2 by executing parallel queries across Slices A and B
and merging the results.
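The query path described here can be sketched with plain collections. This toy assumes each slice keeps its own users plus a replicated copy of the groups (the effect of @Replicated), evaluates the same filter in every slice concurrently, and concatenates the partial results. All names are hypothetical; Slice performs the fan-out and merge itself for JPQL, and nothing here reproduces its implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical toy model, not Slice's query engine: each slice holds its own
// users plus a replicated copy of the groups, so the u.group.name join can be
// evaluated locally inside every slice.
final class ParallelQueryDemo {
    static final class User {
        final String name;
        final String groupId;

        User(String name, String groupId) {
            this.name = name;
            this.groupId = groupId;
        }
    }

    static final class Slice {
        final List<User> users = new ArrayList<>();
        final Map<String, String> groups = new HashMap<>(); // group id -> group name
    }

    // @Replicated-style write: the same group row is stored in every slice.
    static void replicateGroup(List<Slice> slices, String groupId, String groupName) {
        for (Slice s : slices) {
            s.groups.put(groupId, groupName);
        }
    }

    // Roughly "select u from User u where u.group.name = :name": run the query in
    // every slice concurrently, then concatenate partial results in slice order.
    static List<String> usersInGroup(List<Slice> slices, String groupName) {
        if (slices.isEmpty()) {
            return new ArrayList<>();
        }
        ExecutorService pool = Executors.newFixedThreadPool(slices.size());
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (Slice s : slices) {
                parts.add(pool.submit(() -> {
                    List<String> hits = new ArrayList<>();
                    for (User u : s.users) {
                        // Local join: only this slice's copy of the group is consulted.
                        if (groupName.equals(s.groups.get(u.groupId))) {
                            hits.add(u.name);
                        }
                    }
                    return hits;
                }));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : parts) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying slices", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed on a slice", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Removing the group copy from one slice makes that slice's users invisible to the query, which is why master data such as G1 is replicated rather than stored once.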

-- 
View this message in context: http://n2.nabble.com/slices%2C-collocation-tp1563065p1569339.html