Posted to dev@jackrabbit.apache.org by Thomas Müller <th...@day.com> on 2010/03/12 10:22:55 UTC

[jr3] Store journal as nodes

Currently the journal (cluster journal and event journal) is stored
using a separate storage mechanism.

I think it should be stored using the 'normal' storage mechanism.

Advantages:
- Simplifies the architecture (especially for clustering)
- Events and node data are in the same transaction, which improves
reliability and performance

Regards,
Thomas

Re: [jr3] Store journal as nodes

Posted by Ian Boston <ie...@tfd.co.uk>.
On 15 Mar 2010, at 15:04, Thomas Müller wrote:

> Hi,
> 
>> wasn't the journal added to be separate from the persistence manager
>> implementation and allow for a "fast" exchange of master/slave node
>> information and latest revisions? Or is this separation not useful?
> 
> I'm not sure what the reasons for the current implementation were.

IIRC, the journal was introduced to support cluster-wide replication of state changes (and that alone). If you have append-only persistence with versions of a master root node, then you effectively have a journal.
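
To make that concrete, here is a minimal sketch of the idea, not Jackrabbit code. The class and method names (AppendOnlyStore, commit, changesSince) are made up for illustration, and the store is held in memory only. Each commit appends a new root revision, and reading the revisions after the last one seen gives a journal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only store; every commit appends one root revision. */
final class AppendOnlyStore {

    /** One committed root revision and the paths it changed. */
    static final class Revision {
        final long number;
        final List<String> changedPaths;

        Revision(long number, List<String> changedPaths) {
            this.number = number;
            this.changedPaths = List.copyOf(changedPaths);
        }
    }

    private final List<Revision> revisions = new ArrayList<>();

    /** Appends a new revision and returns its number (1-based). */
    synchronized long commit(List<String> changedPaths) {
        long number = revisions.size() + 1;
        revisions.add(new Revision(number, changedPaths));
        return number;
    }

    /** Journal view: every revision newer than the one the caller last saw. */
    synchronized List<Revision> changesSince(long lastSeen) {
        int from = (int) Math.min(Math.max(lastSeen, 0), revisions.size());
        return List.copyOf(revisions.subList(from, revisions.size()));
    }
}
```

A cluster node that remembers the last revision it processed would call changesSince with that number to catch up, which is all a journal reader needs.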

For cluster-wide replication of state changes, IMHO something based on transmission before persistence should be used. That would allow each node in a cluster to maintain a journal for the entire cluster alongside the append-only storage.

I did a prototype of this for the 1.x APIs a while back, based on JGroups with master election. It looked reasonable and performed at about the I/O speed of the disk, but I haven't tried it for real.

Ian

Re: [jr3] Store journal as nodes

Posted by Thomas Müller <th...@day.com>.
Hi,

> wasn't the journal added to be separate from the persistence manager
> implementation and allow for a "fast" exchange of master/slave node
> information and latest revisions? Or is this separation not useful?

I'm not sure what the reasons for the current implementation were. Of
course the event journal needs to be fast. However, the "persistence
manager" (main storage, whatever we want to call it) also needs to be
fast.

If it is possible to use the same storage mechanism for both
persistent data and the journal, then that would simplify the
architecture (also with regards to transactions). If this is a
problem, we will use a separate mechanism.

Regards,
Thomas

Re: [jr3] Store journal as nodes

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Mar 12, 2010 at 10:22, Thomas Müller <th...@day.com> wrote:
> Currently the journal (cluster journal and event journal) is stored
> using a separate storage mechanism.
>
> I think it should be stored using the 'normal' storage mechanism.
>
> Advantages:
> - Simplifies the architecture (especially for clustering)
> - Events and node data are in the same transaction, which improves
> reliability and performance

Generally I am in favor of that, but I have one question: wasn't the
journal added to be separate from the persistence manager
implementation and allow for a "fast" exchange of master/slave node
information and latest revisions? Or is this separation not useful?

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: [jr3] Store journal as nodes

Posted by Thomas Müller <th...@day.com>.
Hi,

> In the case of a cluster DB journal, the hostname of the DB connection.

The hostname of the database (if a database is used) and the database
name need to be known when creating the repository object. Storing it
in a 'repository.xml' file is possible, but it's just an unnecessary
indirection. If you keep this information in the repository.xml file,
where do you store the path of the repository.xml file? If the user
name and password need to be protected (not stored as plain text) how
do you do that? Using yet another indirection (JNDI)?

I suggest passing the database URL (or whatever storage you use) when
creating the repository object. Example (using a helper method; just
an example): RepositoryFactoryImpl.openRepository("jdbc:postgresql:repo",
"user", "password");

If you want to use a repository.xml file (that only contains the
database connection information) you can of course. But do you really
need an XML file for the database URL, the user name, and the
password? Especially if the user name and password are things that
normally should not be stored in a file?

Speaking about databases: do you know of a database where you need to
store the location of the database files in an XML file? I guess there
are some databases where you *can* do that, but I don't know any where
you *have to*.

>>> Configuration should be editable without booting the repository.
>> Why?
> Again, for a DB store, if the DB host changes after repository shutdown, we
> should be able to configure the repository to use a different DB host,
> as we can in the current repository.xml.

The current repository.xml file contains much more than just the
database connection settings. It contains the search index
configuration (or at least part of it), file system configuration,
cluster configuration, data store configuration, security
configuration, workspace configuration (for some the version store),
etc. All that, except for the database connection settings, can be
stored in the repository itself, which simplifies things.

> It's a feature of some application servers to manage cluster configurations.

I don't see a problem here. They can.

> I would prefer to leave the complexity out of the default standalone deployment.

I like to keep things as simple as possible. The repository.xml and
workspace.xml files are not required; they actually make things more
complicated than necessary (especially, but not only, when clustering
is used).

Regards,
Thomas

Re: [jr3] Store journal as nodes

Posted by Guo Du <mr...@gmail.com>.
On Fri, Mar 12, 2010 at 10:52 AM, Thomas Müller <th...@day.com> wrote:
>> cluster host information.
> What is that exactly?
In the case of a cluster DB journal, the hostname of the DB connection.

>> Configuration should be editable without booting the repository.
> Why?
Again, for a DB store, if the DB host changes after repository shutdown, we
should be able to configure the repository to use a different DB host,
as we can in the current repository.xml.

> One reason is that it possibly affects multiple cluster nodes (fulltext
> index configuration, workspace names, security, data store
> configuration, cluster configuration, node type registry). Would you
> rather copy the changed XML configuration files manually to all other
> cluster nodes when there are changes?
It's a feature of some application servers to manage cluster
configurations. Nice to have if it doesn't cost too much. I would prefer
to leave the complexity out of the default standalone deployment.

-Guo

Re: [jr3] Store journal as nodes

Posted by Thomas Müller <th...@day.com>.
Hi,

> cluster host information.

What is that exactly?

> Configuration should be editable without booting the repository.

Why?

> Even if we store it in the repository, we should have an easy way to override it without booting
> the repository.

Yes, that would be an option: settings in
RepositoryFactory.getRepository(Map parameters) could override the
configuration stored in the repository.
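
As a rough illustration of that override order (a sketch only; the class name EffectiveConfig and the setting keys used with it are made up and are not part of any Jackrabbit or JCR API):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that layers factory parameters over stored settings. */
final class EffectiveConfig {

    private EffectiveConfig() {
    }

    /**
     * Starts from the configuration stored in the repository and lets any
     * parameter passed at construction time replace the stored value.
     */
    static Map<String, String> resolve(Map<String, String> storedInRepository,
                                       Map<String, String> factoryParameters) {
        Map<String, String> effective = new HashMap<>(storedInRepository);
        effective.putAll(factoryParameters); // explicit parameters win
        return effective;
    }
}
```

Settings that are not passed explicitly keep the value stored in the repository, so the override stays the exception rather than the rule.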

> Why do we need configuration changes to be transactional?

One reason is that it possibly affects multiple cluster nodes (fulltext
index configuration, workspace names, security, data store
configuration, cluster configuration, node type registry). Would you
rather copy the changed XML configuration files manually to all other
cluster nodes when there are changes?

> It doesn't make sense to make security configuration changes transactional because normally we edit the configuration when the repository is offline.

*Currently* you do, because there is no other way.

Regards,
Thomas

Re: [jr3] Store journal as nodes

Posted by Guo Du <mr...@gmail.com>.
On Fri, Mar 12, 2010 at 10:29 AM, Thomas Müller <th...@day.com> wrote:
> The rest of the configuration (fulltext index configuration for
> example, workspace names, security, data store configuration, cluster
> configuration, node type registry) should be in the repository (as
Configuration should be editable without booting the repository. Even if we
store it in the repository, we should have an easy way to override it without
booting the repository, for example for cluster host information.

> system nodes) in the normal case. This is to simplify the system and
> to make configuration changes transactional.
Why do we need configuration changes to be transactional? It doesn't make sense
to make security configuration changes transactional because normally
we edit the configuration when the repository is offline.

-Guo

Re: [jr3] Store journal as nodes

Posted by Thomas Müller <th...@day.com>.
Hi,

> (except logging

Yes, I think SLF4J is fine.

> and configuration, probably

Some information needs to be available when the repository is
constructed, or at the latest when logging in: What storage backend to
use, and how to connect to the storage backend.

The rest of the configuration (fulltext index configuration for
example, workspace names, security, data store configuration, cluster
configuration, node type registry) should be in the repository (as
system nodes) in the normal case. This is to simplify the system and
to make configuration changes transactional.

There may be ways to override that (for example when constructing
the repository object), but that should be the exception. I think it
doesn't make sense to keep the xml configuration files.

> What do you mean by "'normal' storage mechanism" ?

I mean the data should be stored in the same place as the node data.
Unless we find it is a performance problem, I would try to store the
events as "node bundles" of some kind (possibly multiple events plus
regular nodes in the same bundle). For the "micro kernel" it could
look exactly like a normal node.
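
A minimal sketch of what I have in mind, under stated assumptions: the class NodeStore, the path "/system/journal", and the string encoding of nodes and events are all made up for illustration, and the store is an in-memory map rather than a real backend. The point is only that the events are written as one more node in the same bundle as the changed nodes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical store in which journal entries are just nodes under a system path. */
final class NodeStore {

    /** Assumed location of journal entries; not an existing Jackrabbit path. */
    static final String JOURNAL_ROOT = "/system/journal";

    private final Map<String, String> nodes = new TreeMap<>();
    private long revision = 0;

    /**
     * Writes the changed nodes and the events describing them as one bundle,
     * so a reader never sees node data without its events or the reverse.
     */
    synchronized long commit(Map<String, String> changedNodes, List<String> events) {
        long rev = ++revision;
        Map<String, String> bundle = new TreeMap<>(changedNodes);
        // The journal entry is stored like any other node in the bundle.
        bundle.put(JOURNAL_ROOT + "/" + rev, String.join("\n", events));
        nodes.putAll(bundle);
        return rev;
    }

    /** Reads a node; journal entries are read the same way as content. */
    synchronized String read(String path) {
        return nodes.get(path);
    }
}
```

For the storage layer there is then only one kind of write, which is where the simplification for transactions would come from.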

> Is it nodes and properties, in which case I fear further performance issues in this area.

If it does turn out to be a performance problem, we will change it of course.

Regards,
Thomas

Re: [jr3] Store journal as nodes

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

IIRC this would be in line with another discussion in the [jr3] arena, to
have a common low-level persistence upon which all the other parts of
Jackrabbit requiring some form of persistence (except logging and
configuration, probably) build.

In this context, I would think it to be a good idea to consolidate this
data in the common persistence.

On 12.03.2010 10:22, Thomas Müller wrote:
> Currently the journal (cluster journal and event journal) is stored
> using a separate storage mechanism.
> 
> I think it should be stored using the 'normal' storage mechanism.

What do you mean by "'normal' storage mechanism" ?

Is it nodes and properties? In that case I fear further performance
issues in this area.

If it is the common persistence layer, then definitely +1.

Regards
Felix

> 
> Advantages:
> - Simplifies the architecture (especially for clustering)
> - Events and node data are in the same transaction, which improves
> reliability and performance
> 
> Regards,
> Thomas
>