Posted to user@cayenne.apache.org by Richard Frovarp <rf...@apache.org> on 2011/11/17 00:00:49 UTC

Multithreaded application trouble with contexts

I'm trying to write a multithreaded crawler using Cayenne. I previously 
had it working with Torque.

I'm writing different information out to the database (and Solr). Some 
of the information is used by multiple threads, and should only be 
created if it doesn't already exist in the db. Outgoing links is the one 
that is giving trouble. Many of our pages point to the same link, so it 
should use that same reference in the database if one exists. If one 
does not exist, it should create it. Further actions should check for 
existence.

If I don't commit the context frequently enough, it starts attempting to 
insert duplicate URLs. I have that fixed, but now am getting this sort 
of message:

Cannot set object as destination of relationship toResource because it 
is in a different ObjectContext

What's the best strategy for doing frequent updates to the database with 
multiple threads?

I am beginning to think I'm headed down the wrong path and should switch 
to something else completely to store this data, such as NoSQL.

Re: Multithreaded application trouble with contexts

Posted by Richard Frovarp <rf...@apache.org>.
On 11/16/2011 09:15 PM, Robert Zeigler wrote:
> You said "I have that fixed" but you don't say how; also, is the exception you are seeing now related to the shared URLs?
>
> In any event, you need a call to ObjectContext.localObject to resolve the error you are seeing.
>
> Robert

I used very heavy synchronization with one ObjectContext with child 
contexts. The exception was related to the shared URLs. I want to 
retrieve the row related to the URL if one already exists, otherwise 
create a row and return that.

Using ObjectContext.localObject got me past the different ObjectContext 
problem.
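
For readers unfamiliar with the error, here is a toy model of why it happens and what moving an object into the local context achieves. This is only a sketch: ToyContext, ToyObject, newObject and setTarget are hypothetical stand-ins written for illustration and are not Cayenne classes or methods. The real ObjectContext.localObject has a signature that depends on the Cayenne version, so check the Javadoc for the release in use.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, NOT Cayenne's API.
final class ToyContext {
    // Objects registered in this context, keyed by id.
    private final Map<Integer, ToyObject> registered = new HashMap<>();

    // Creates a new object that belongs to this context.
    ToyObject newObject(int id) {
        ToyObject created = new ToyObject(id, this);
        registered.put(id, created);
        return created;
    }

    // Returns this context's own copy of an object that may belong to
    // another context, matching by id. A rough analogue of localObject.
    ToyObject localObject(ToyObject other) {
        return registered.computeIfAbsent(other.id, id -> new ToyObject(id, this));
    }
}

final class ToyObject {
    final int id;
    final ToyContext context;
    ToyObject target; // a to-one relationship

    ToyObject(int id, ToyContext context) {
        this.id = id;
        this.context = context;
    }

    // Refuses to relate objects that belong to different contexts,
    // mimicking the error quoted in the original post.
    void setTarget(ToyObject destination) {
        if (destination.context != this.context) {
            throw new IllegalStateException(
                "Cannot set object as destination of relationship: different context");
        }
        this.target = destination;
    }
}
```

In this model, page.setTarget(link) throws when link was created in another context, while page.setTarget(workerContext.localObject(link)) succeeds because the destination now belongs to the same context as page.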

Using Michael's warning about threads sharing contexts, and creating a 
synchronized block to handle adds got me the rest of the way.

So, now everything is working, and I'm not getting any errors or any 
deadlocks.

Thanks Robert and Michael.

Richard

Re: Multithreaded application trouble with contexts

Posted by Robert Zeigler <ro...@roxanemy.com>.
You said "I have that fixed" but you don't say how; also, is the exception you are seeing now related to the shared URLs? 

In any event, you need a call to ObjectContext.localObject to resolve the error you are seeing. 

Robert

On Nov 16, 2011, at 5:00 PM, Richard Frovarp wrote:

> I'm trying to write a multithreaded crawler using Cayenne. I previously had it working with Torque.
> 
> I'm writing different information out to the database (and Solr). Some of the information is used by multiple threads, and should only be created if it doesn't already exist in the db. Outgoing links is the one that is giving trouble. Many of our pages point to the same link, so it should use that same reference in the database if one exists. If one does not exist, it should create it. Further actions should check for existence.
> 
> If I don't commit the context frequently enough, it starts attempting to insert duplicate URLs. I have that fixed, but now am getting this sort of message:
> 
> Cannot set object as destination of relationship toResource because it is in a different ObjectContext
> 
> What's the best strategy for doing frequent updates to the database with multiple threads?
> 
> I am beginning to think I'm headed down the wrong path and should switch to something else completely to store this data, such as NoSQL.


Re: Multithreaded application trouble with contexts

Posted by Michael Gentry <mg...@masslight.net>.
Hi Richard,
When to commit is really dependent upon the application, but a general
guide would be whenever your object graph is in a state that you want
to commit the changes.  Cayenne's DataContext (and supporting stack)
is thread-safe in that an application can have numerous DataContexts
doing queries and commits, but you shouldn't have multiple threads
sharing a DataContext without some kind of thread-safe controls over
it.

It sounds like your crawler needs to commit the URLs every time it
adds one, but even then I could see a situation where multiple threads
could insert the same URL.  Creating a thread-safe singleton that used
a single DataContext might be the best approach for you.  The
singleton would have addURL() type methods and those would manipulate
the DataContext in a thread-safe way and ensure no duplicates.
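
A minimal sketch of the singleton idea, under stated assumptions: the class name UrlRegistry, the integer ids, and the HashMap standing in for the URL table are all hypothetical, and no Cayenne classes are used. In a real version the map lookup would be a query against the singleton's single DataContext and the insert would be followed by a commit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the HashMap stands in for the URL table that a
// single DataContext would manage. No Cayenne classes are used here.
final class UrlRegistry {
    private static final UrlRegistry INSTANCE = new UrlRegistry();

    private final Map<String, Integer> idsByUrl = new HashMap<>();
    private int nextId = 1;

    private UrlRegistry() {
    }

    static UrlRegistry getInstance() {
        return INSTANCE;
    }

    // The lookup and the insert run under one lock, so two threads can
    // never both conclude that the same URL is missing.
    synchronized int addURL(String url) {
        Integer existing = idsByUrl.get(url);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        idsByUrl.put(url, id); // a real version would insert and commit here
        return id;
    }

    // Number of distinct URLs stored so far.
    synchronized int size() {
        return idsByUrl.size();
    }
}
```

Crawler threads would call UrlRegistry.getInstance().addURL(link) and get back the same id for the same link no matter which thread saw it first. The price is that all adds are serialized through one lock, which is usually acceptable when the check-and-insert is short compared with fetching pages.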

mrg


On Wed, Nov 16, 2011 at 6:00 PM, Richard Frovarp <rf...@apache.org> wrote:
> What's the best strategy for doing frequent updates to the database with
> multiple threads?