Posted to users@jackrabbit.apache.org by Dennis van der Laan <d....@rug.nl> on 2009/12/11 11:29:23 UTC

bundle PM versus db PM

Hi,

There's a bundle.OraclePersistenceManager and a
db.OraclePersistenceManager. I think I read somewhere the bundle PM is
newer and you might want to migrate towards this. The clustering
documentation on the wiki shows an example using the
db.OraclePersistenceManager. Should we use the bundle PM instead of the
db PM? What's the difference? And should the javadoc not contain some
information about how these two implementations relate to one another?

Regards,
Dennis

-- 
Dennis van der Laan


Re: bundle PM versus db PM (and clustering)

Posted by Alexander Klimetschek <ak...@day.com>.
On Mon, Dec 14, 2009 at 10:16, Dennis van der Laan
<d....@rug.nl> wrote:
> Alexander said he thinks a FileSystem is not used in conjunction with a
> SimpleDB PM (my configuration at the moment), so it would be safe to
> just switch to a local filesystem implementation. He pointed me to [1],
> which says FileSystem is only used by some parts of Jackrabbit.
> I checked the database and I see entries in the globally shared
> filesystem (records for /meta/rootUUID, /meta/rep.properties,
> /namespaces/ns_reg.properties, /namespaces/ns_idx.properties and
> /nodetypes/custom_nodetypes.xml).

Yes, that's a reason why the FileSystem is still required. Since these
files are per-repository (or per-node), they must not be shared in the
database when a cluster is used. That was the definition of "private"
(see below). Using a LocalFileSystem for that is the simplest
solution; however, you need to back these files up as well.

Also, these files contain data that changes rarely, so this part is
not performance-critical.

> The persistence manager FAQ [2] says BundlePMs are usually the fastest,
> and are used in conjunction with either a LocalFileSystem or a
> DbFileSystem, so according to this it seems a FileSystem is still needed.

A persistence manager is free to choose whether it uses the configured
Jackrabbit FileSystem or not. That's another reason why it is still in
the repository.xml configuration. However, none of the modern bundle PMs
use it (except for storing blobs, which they shouldn't do, as a
datastore is better for that).


> The clustering documentation [3] states each cluster node needs its own
> (private) FileSystem and all nodes must store their data in the same
> globally accessible location.

As noted above: "private" means a separate schema if DbFileSystem is
used on the same db, or simply a LocalFileSystem.
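To make "private" concrete, here is a small Java sketch that prints a FileSystem element for each cluster node. It assumes all nodes share one database and keep their FileSystem tables apart through DbFileSystem's schemaObjectPrefix parameter (a separate database schema or user would be another way to get the same separation). The fs_<nodeId>_ naming scheme, the node ids and the JDBC URL are hypothetical placeholders, and the other connection parameters (driver, user, password, schema) are left out. The second option is a LocalFileSystem under the node's own repository home, which then has to be included in backups.

```java
/**
 * Sketch only: one way to give each cluster node a "private" Jackrabbit
 * FileSystem. The fs_<nodeId>_ prefix scheme and the JDBC URL are
 * hypothetical; driver/user/password/schema params are omitted.
 */
class PrivateFileSystemConfig {

    /** Option 1: DbFileSystem on the shared database, with per-node tables. */
    static String dbFileSystem(String clusterNodeId, String jdbcUrl) {
        return "<FileSystem class=\"org.apache.jackrabbit.core.fs.db.DbFileSystem\">\n"
             + "  <param name=\"url\" value=\"" + jdbcUrl + "\"/>\n"
             // A distinct table prefix per node keeps its files out of the other nodes' tables.
             + "  <param name=\"schemaObjectPrefix\" value=\"fs_" + clusterNodeId + "_\"/>\n"
             + "</FileSystem>";
    }

    /** Option 2: LocalFileSystem below the node's own repository home. */
    static String localFileSystem() {
        // Each cluster node has its own ${rep.home}, so this path is node-local.
        return "<FileSystem class=\"org.apache.jackrabbit.core.fs.local.LocalFileSystem\">\n"
             + "  <param name=\"path\" value=\"${rep.home}/repository\"/>\n"
             + "</FileSystem>";
    }

    public static void main(String[] args) {
        String url = "jdbc:oracle:thin:@dbhost:1521:jcr"; // placeholder URL
        for (String node : new String[] {"node1", "node2"}) {
            System.out.println(dbFileSystem(node, url));
        }
        System.out.println(localFileSystem());
    }
}
```

Both nodes point at the same database URL; only the prefix differs, which is exactly the "private but on the same db" case described above.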

> When I have one cluster node with a repository which already has some
> data, and I add a new node to the cluster with a different DbFileSystem,
> after the new node has updated its state and is in sync with the
> repository, when accessing the repository through the new node I get an
> exception saying the node does not know my custom namespace
> configuration. When using the same DbFileSystem configuration for all
> nodes, I don't get this exception and all seems to work well... But it
> doesn't feel right, because I don't know what the effects might be. If I
> used a database bundle PM in a cluster setup, would I need a shared
> (Db)FileSystem, or would it be better to use a local filesystem, and
> would I have to configure my custom nodetypes on every cluster node
> separately?

Nodetypes and namespaces are not synchronized over the cluster (IIRC),
so your local cluster startup code has to register them (if not
present yet) upon start of the repository.
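The "register them if not present yet" step could look roughly like this. To keep the sketch runnable without a repository, NamespaceRegistryLike and InMemoryNamespaceRegistry are hypothetical stand-ins whose two methods mirror javax.jcr.NamespaceRegistry's getURIs() and registerNamespace(prefix, uri); the prefix and URI are made up. Custom node types would need an analogous check, and how to register them depends on the Jackrabbit version in use.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in mirroring two methods of javax.jcr.NamespaceRegistry. */
interface NamespaceRegistryLike {
    String[] getURIs();
    void registerNamespace(String prefix, String uri);
}

/** Simplified in-memory registry for the sketch (no validation, no remapping rules). */
class InMemoryNamespaceRegistry implements NamespaceRegistryLike {
    private final Map<String, String> prefixToUri = new HashMap<>();

    public String[] getURIs() {
        return prefixToUri.values().toArray(new String[0]);
    }

    public void registerNamespace(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
    }
}

class NamespaceStartup {
    /** Registers uri under prefix unless the uri is already known; returns true if it registered. */
    static boolean ensureNamespace(NamespaceRegistryLike registry, String prefix, String uri) {
        for (String known : registry.getURIs()) {
            if (known.equals(uri)) {
                return false; // already present, e.g. from an earlier start of this node
            }
        }
        registry.registerNamespace(prefix, uri);
        return true;
    }

    public static void main(String[] args) {
        NamespaceRegistryLike registry = new InMemoryNamespaceRegistry();
        String uri = "http://example.com/myapp/1.0"; // hypothetical custom namespace
        System.out.println(ensureNamespace(registry, "myapp", uri)); // true: first start registers it
        System.out.println(ensureNamespace(registry, "myapp", uri)); // false: later starts skip it
    }
}
```

Because the check is idempotent, the same startup code can run unchanged on every cluster node, every time it starts.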

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: bundle PM versus db PM (and clustering)

Posted by Dennis van der Laan <d....@rug.nl>.
Hi,
>>> classes may explain HOW they work, but it's unclear for me which one is
>>> the most appropriate one to use.
>>>       
>
> See http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
>
>   
Thanks Thomas. I saw this, too. But when browsing all the information on
the Wiki, I really get confused about which information is up-to-date,
if there is any. A problem a lot of open source projects suffer from, I
guess.
Alexander said he thinks a FileSystem is not used in conjunction with a
SimpleDB PM (my configuration at the moment), so it would be safe to
just switch to a local filesystem implementation. He pointed me to [1],
which says FileSystem is only used by some parts of Jackrabbit.
I checked the database and I see entries in the globally shared
filesystem (records for /meta/rootUUID, /meta/rep.properties,
/namespaces/ns_reg.properties, /namespaces/ns_idx.properties and
/nodetypes/custom_nodetypes.xml).

The persistence manager FAQ [2] says BundlePMs are usually the fastest,
and are used in conjunction with either a LocalFileSystem or a
DbFileSystem, so according to this it seems a FileSystem is still needed.

The clustering documentation [3] states each cluster node needs its own
(private) FileSystem and all nodes must store their data in the same
globally accessible location.

When I have one cluster node with a repository which already has some
data, and I add a new node to the cluster with a different DbFileSystem,
after the new node has updated its state and is in sync with the
repository, when accessing the repository through the new node I get an
exception saying the node does not know my custom namespace
configuration. When using the same DbFileSystem configuration for all
nodes, I don't get this exception and all seems to work well... But it
doesn't feel right, because I don't know what the effects might be. If I
used a database bundle PM in a cluster setup, would I need a shared
(Db)FileSystem, or would it be better to use a local filesystem, and
would I have to configure my custom nodetypes on every cluster node
separately?

Thanks,
Dennis

-- 
Dennis van der Laan


Re: bundle PM versus db PM

Posted by Thomas Müller <th...@day.com>.
Hi,

>> classes may explain HOW they work, but it's unclear for me which one is
>> the most appropriate one to use.

See http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ

Regards,
Thomas

Re: bundle PM versus db PM

Posted by Guo Du <mr...@gmail.com>.
On Sun, Dec 13, 2009 at 1:25 PM, Dennis van der Laan
<d....@rug.nl> wrote:
> classes may explain HOW they work, but it's unclear for me which one is
> the most appropriate one to use. I guess, if there are more read-actions
Bundle PM is more appropriate in most cases because it reduces the
number of database queries, which we know are VERY expensive. And since
we normally access multiple properties of a node, it makes sense to have
them all loaded together.

> or more large item state changes, the Bundle PM would be the most
> efficient, and if there are more small item state changes (single
> property changes?), the SimpleDB PM would be more efficient, correct?
SimpleDB PM could be faster in some special cases. It's easy to switch
back and forth, so you can run some tests to find out which works better
for your application.
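The write side of Dennis's guess can be illustrated with a deliberately crude model in Java: when one existing property value changes, a bundle PM rewrites the whole serialized bundle (node plus all its properties), while a SimpleDB PM rewrites only the row for that property. This is a toy, not a measurement; it ignores caching, batching, binaries, serialization overhead and index updates, and all byte sizes are hypothetical.

```java
import java.util.Arrays;

/**
 * Toy model (not a measurement): bytes written to the database when the
 * value of ONE existing property changes. Sizes are hypothetical.
 */
class SinglePropertyUpdateModel {

    enum Pm { BUNDLE, SIMPLE_DB }

    static int bytesWritten(Pm pm, int nodeStateBytes, int[] propertyBytes, int changed) {
        if (pm == Pm.SIMPLE_DB) {
            // Only the row holding the changed property state is rewritten.
            return propertyBytes[changed];
        }
        // Bundle: the node and all its properties share one row,
        // so the whole bundle is serialized and written again.
        int total = nodeStateBytes;
        for (int b : propertyBytes) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] props = new int[10];
        Arrays.fill(props, 100); // 10 hypothetical properties of ~100 bytes each
        System.out.println(bytesWritten(Pm.BUNDLE, 50, props, 0));    // 1050
        System.out.println(bytesWritten(Pm.SIMPLE_DB, 50, props, 0)); // 100
    }
}
```

The model only shows why frequent single-property updates on nodes with many properties might favour the SimpleDB PM; whether that outweighs the bundle PM's cheaper reads depends on the real workload, which is why testing both, as suggested above, is the reliable way to decide.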

-Guo

Re: bundle PM versus db PM

Posted by Dennis van der Laan <d....@rug.nl>.
Guo Du wrote:
> On Fri, Dec 11, 2009 at 10:29 AM, Dennis van der Laan
> <d....@rug.nl> wrote:
>   
>> db PM? What's the difference? And should the javadoc not contain some
>> information about how these two implementations relate to one another?
>>     
>
> BundlePersistenceManager: The state and all property states of one
> node are stored together in one record.
> SimpleDbPersistenceManager: persists ItemState and NodeReferences objects
> using a simple custom binary serialization format and a very basic
> non-normalized database schema.
>
> The difference for a single node is that you get one row with
> BundlePersistenceManager and possibly multiple rows with
> SimpleDbPersistenceManager.
>
> You may find more details from following two javadocs:
> org.apache.jackrabbit.core.persistence.bundle.AbstractBundlePersistenceManager
> org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager
>   
Thanks Guo Du, but I had already found the javadoc. The point is, both
classes may explain HOW they work, but it's unclear for me which one is
the most appropriate one to use. I guess, if there are more read-actions
or more large item state changes, the Bundle PM would be the most
efficient, and if there are more small item state changes (single
property changes?), the SimpleDB PM would be more efficient, correct?

Dennis

-- 
Dennis van der Laan


Re: bundle PM versus db PM

Posted by Guo Du <mr...@gmail.com>.
On Fri, Dec 11, 2009 at 10:29 AM, Dennis van der Laan
<d....@rug.nl> wrote:
> db PM? What's the difference? And should the javadoc not contain some
> information about how these two implementations relate to one another?

BundlePersistenceManager: The state and all property states of one
node are stored together in one record.
SimpleDbPersistenceManager: persists ItemState and NodeReferences objects
using a simple custom binary serialization format and a very basic
non-normalized database schema.

The difference for a single node is that you get one row with
BundlePersistenceManager and possibly multiple rows with
SimpleDbPersistenceManager.
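A simplified Java illustration of the two layouts: the bundle layout produces one row per node, while the SimpleDB layout produces one node row plus one row per property, so reading a node with N properties touches 1 row in the first case and 1 + N rows in the second. The row key formats below are hypothetical labels; the real table names, binary serialization, and the reference and blob tables are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified illustration of the two storage layouts. Key formats are
 * hypothetical; the real schemas are not modeled.
 */
class RowLayoutSketch {

    /** Bundle layout: a single row holding the node and all its properties. */
    static List<String> bundleRows(String nodeId, List<String> propertyNames) {
        List<String> rows = new ArrayList<>();
        rows.add("BUNDLE[" + nodeId + "] = node + " + propertyNames.size() + " properties");
        return rows;
    }

    /** SimpleDB layout: one node row plus one row per property state. */
    static List<String> simpleDbRows(String nodeId, List<String> propertyNames) {
        List<String> rows = new ArrayList<>();
        rows.add("NODE[" + nodeId + "]");
        for (String name : propertyNames) {
            rows.add("PROP[" + nodeId + "/" + name + "]");
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> props = Arrays.asList("title", "author", "created");
        String id = "node-1"; // hypothetical node id
        System.out.println(bundleRows(id, props));   // 1 row
        System.out.println(simpleDbRows(id, props)); // 4 rows
    }
}
```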

You may find more details from following two javadocs:
org.apache.jackrabbit.core.persistence.bundle.AbstractBundlePersistenceManager
org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager

-Guo