Posted to users@jackrabbit.apache.org by Phillip Rhodes <sp...@rhoderunner.com> on 2007/04/20 16:20:52 UTC

SimpleDBPersistenceManager file cache?

hi everyone,
Sorry for the basic question... 

Because applications on multiple machines need to be able to grab jackrabbit stored content from the repository, I am using the SimpleDBPersistenceManager, with externalBLOBs set to false.  I realize that if I set externalBlobs to true, the blobs will be stored on the filesystem, but this makes the content "sticky", that is, accessible by only the machine that has it on the filesystem.
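[Editor's note: for reference, the setup described corresponds roughly to a persistence-manager section like the following in repository.xml. This is a sketch, not taken from the thread; the driver, url, user, and password values are placeholders for your own database.]

```xml
<PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
  <!-- placeholder JDBC connection details -->
  <param name="driver" value="org.postgresql.Driver"/>
  <param name="url" value="jdbc:postgresql://dbhost:5432/jackrabbit"/>
  <param name="user" value="jcruser"/>
  <param name="password" value="secret"/>
  <param name="schema" value="postgresql"/>
  <param name="schemaObjectPrefix" value="jcr_"/>
  <!-- store blobs in the database rather than on the local filesystem -->
  <param name="externalBLOBs" value="false"/>
</PersistenceManager>
```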

My question is:  Do I need to implement my own file caching for my application?  For example, if my application repeatedly retrieves a binary file from the jackrabbit repository, does jackrabbit cache the retrieved binary file on the filesystem for subsequent requests, or does each request cause the same blob to be read from the db?

No problem either way, I just need to know, since I have 20,000 users with each user having their own collection of images on a very high traffic site.

Thank you everyone for all your help.  It's been great.

Phillip

Re: SimpleDBPersistenceManager file cache?

Posted by David Nuescheler <da...@gmail.com>.
Hi Phillip,

You may want to look into "clustering" or into using RMI.
Jackrabbit "clustering" allows you to have multiple Jackrabbit
instances on different machines running off of the same
backing store.
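[Editor's note: the JR 1.3 clustering David mentions is enabled by giving each node a <Cluster> element in its repository.xml, with a shared journal. A minimal sketch, with placeholder connection details and a per-node id, might look like:]

```xml
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <!-- placeholder JDBC connection details; all nodes share this journal -->
    <param name="driver" value="org.postgresql.Driver"/>
    <param name="url" value="jdbc:postgresql://dbhost:5432/jackrabbit"/>
    <param name="user" value="jcruser"/>
    <param name="password" value="secret"/>
    <!-- local file tracking the last revision this node has seen -->
    <param name="revision" value="${rep.home}/revision.log"/>
  </Journal>
</Cluster>
```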

regards,
david

Re: SimpleDBPersistenceManager file cache?

Posted by Miro Walker <mi...@gmail.com>.
Phillip,

As far as I'm aware, the configuration you're talking about is not
supported. The search indexes and shared item state cache on each
jackrabbit instance will end up inconsistent, and you'll start
getting exceptions. By way of analogy, think of setting up two
databases to point at the same filesystem - it's not going to work...

Miro

On 4/20/07, Phillip Rhodes <sp...@rhoderunner.com> wrote:
> Miro,
>
> I thought that storing my jackrabbit data inside the database would allow multiple jackrabbit instances on different machines to access the same backing store (the db is the authoritative source).
>
> I understand that write operations on one machine will not update the cache on the other machines, but the write operations should update the blob in the db, so that other machines can pick up the content from the db (once their caches expire).
>
> Can you let me know if my beliefs above are wrong?  In the meantime, I will do some testing, and investigate jcr-rmi, etc...
>
> Thanks.
> Phillip

Re: SimpleDBPersistenceManager file cache?

Posted by Phillip Rhodes <sp...@rhoderunner.com>.
Miro,

I thought that storing my jackrabbit data inside the database would allow multiple jackrabbit instances on different machines to access the same backing store (the db is the authoritative source).

I understand that write operations on one machine will not update the cache on the other machines, but the write operations should update the blob in the db, so that other machines can pick up the content from the db (once their caches expire).

Can you let me know if my beliefs above are wrong?  In the meantime, I will do some testing, and investigate jcr-rmi, etc...

Thanks.
Phillip


----- Original Message -----
From: "Miro Walker" <mi...@gmail.com>
To: users@jackrabbit.apache.org
Sent: Friday, April 20, 2007 11:05:15 AM (GMT-0500) America/New_York
Subject: Re: SimpleDBPersistenceManager file cache?

Hi Phillip,

Why would putting blobs in the database have a bearing on which
machines can access them? Because of the way that Jackrabbit's caching
works, you can't ordinarily access the same backing store (database or
other) from multiple jackrabbit instances on different machines (as
write operations on one machine will not be reflected in the cache on
the other machine(s)). If you want to do this, I believe you might
need to:

* use jcr-rmi or webdav or some other custom remoting protocol to
allow multiple client machines to access a single jackrabbit server
instance; or
* possibly look into clustering multiple jackrabbit servers using the
new capabilities of JR 1.3.

The latter is pretty new technology, so comes with the obvious warnings :-).

Note also that blobs and workspace data are not the only pieces of
transactional data that can be stored on the filesystem. You may also
want to look at the DBFileSystem stuff to allow you to store
workspace.xml files in there too, otherwise newly created workspaces
will only exist on the local filesystem.

Cheers,

Miro


Re: SimpleDBPersistenceManager file cache?

Posted by Miro Walker <mi...@gmail.com>.
Hi Phillip,

Why would putting blobs in the database have a bearing on which
machines can access them? Because of the way that Jackrabbit's caching
works, you can't ordinarily access the same backing store (database or
other) from multiple jackrabbit instances on different machines (as
write operations on one machine will not be reflected in the cache on
the other machine(s)). If you want to do this, I believe you might
need to:

* use jcr-rmi or webdav or some other custom remoting protocol to
allow multiple client machines to access a single jackrabbit server
instance; or
* possibly look into clustering multiple jackrabbit servers using the
new capabilities of JR 1.3.

The latter is pretty new technology, so comes with the obvious warnings :-).

Note also that blobs and workspace data are not the only pieces of
transactional data that can be stored on the filesystem. You may also
want to look at the DBFileSystem stuff to allow you to store
workspace.xml files in there too, otherwise newly created workspaces
will only exist on the local filesystem.
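[Editor's note: the DbFileSystem Miro refers to is configured with a <FileSystem> element in repository.xml; a sketch along these lines, with placeholder connection details, would keep the repository's filesystem artifacts in the database as he suggests:]

```xml
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
  <!-- placeholder JDBC connection details -->
  <param name="driver" value="org.postgresql.Driver"/>
  <param name="url" value="jdbc:postgresql://dbhost:5432/jackrabbit"/>
  <param name="user" value="jcruser"/>
  <param name="password" value="secret"/>
  <param name="schema" value="postgresql"/>
  <param name="schemaObjectPrefix" value="fs_"/>
</FileSystem>
```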

Cheers,

Miro

Re: SimpleDBPersistenceManager file cache?

Posted by Stefan Guggisberg <st...@gmail.com>.
hi phillip,

On 4/20/07, Phillip Rhodes <sp...@rhoderunner.com> wrote:
> hi everyone,
> Sorry for the basic question...

no problem and not a basic question at all ;-)

>
> Because applications on multiple machines need to be able to grab jackrabbit stored content from the repository, I am using the SimpleDBPersistenceManager, with externalBLOBs set to false.  I realize that if I set externalBlobs to true, the blobs will be stored on the filesystem, but this makes the content "sticky", that is, accessible by only the machine that has it on the filesystem.
>
> My question is:  Do I need to implement my own file caching for my application?  For example, if my application repeatedly retrieves a binary file from the jackrabbit repository, does jackrabbit cache the retrieved binary file on the filesystem for subsequent requests, or does each request cause the same blob to be read from the db?

it's the former. items read from the persistence layer are cached in
jackrabbit. this includes binary properties (i.e. blob values).
furthermore, blobs read from a db are internally spooled to a temp
file. once the binary property associated with that blob/temp file is
evicted from the cache, the temp file will be deleted.

therefore, repeatedly requesting the same binary property will in
general not cause the same blob to be read repeatedly from the db
(assuming it is still in the cache, of course).

cheers
stefan

>
> No problem either way, I just need to know, since I have 20,000 users with each user having their own collection of images on a very high traffic site.
>
> Thank you everyone for all your help.  It's been great.
>
> Phillip