Posted to users@jackrabbit.apache.org by Medha C Sutaria <ms...@csc.com> on 2009/11/06 08:39:47 UTC
jackrabbit configuration in clustered environment
Hi,
We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It
uses the following configuration -
JCRHook + PersistenceManager + File system
The file system is stored on a local folder of the same machine where the
application is running.
Now we want to cluster our application, for which we need to share the
jackrabbit repository. We looked at a lot of options, but
couldn't zero in on the best one. We need to decide which
combination of configurations is the best for clustering and high
performance -
JCRHook vs FileSystemHook?
PersistenceManager vs Datastore?
FileSystem vs Database?
if filesystem, sharing the file system? or using SAN?
More concerns -
1. To select a different solution, migration of current documents to the
new solutions has to be done
2. The repository is increasing exponentially. The file system size is
already 10 GB. Does jackrabbit support such large repositories? At what
point will the performance start degrading?
3. What will it take to upgrade the jackrabbit version from 1.4 to 1.6?
Will it require any migration?
Thanks and regards,
Medha Sutaria
Re: jackrabbit configuration in clustered environment
Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Nov 6, 2009 at 13:22, Medha C Sutaria <ms...@csc.com> wrote:
> Medha - we use BundleFsPersistenceManager.
Note that the FS-based persistence managers don't guarantee any consistency.
See http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
>> - FileSystem (element in repository.xml) is not important anymore,
>> does not influence performance
> Is this the tag you are talking about? If so, isn't this what decides
> whether we store data in a DB or on LocalFileSystem?
> <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
> <param name="path" value="${rep.home}/repository" />
> </FileSystem>
No, it doesn't. As I mentioned, it is legacy and not important. Where
your data is stored depends on the persistence manager (central
persistence component), datastore (if used, either db or file) and
also search index (file only if enabled) and the clustering journal.
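A rough sketch of how those components map onto repository.xml, heavily abbreviated: required elements such as Security, Workspaces, Versioning and the FileSystem entries are left out, the index path is an assumed, typical value rather than something taken from the posted configuration, and the journal parameters are only listed in a comment.

```xml
<Repository>
  <!-- Datastore (optional): large binaries only, file- or db-based -->
  <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="${rep.home}/repository/datastore"/>
  </DataStore>
  <Workspace name="${wsp.name}">
    <!-- Persistence manager: where nodes and properties are stored -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager"/>
    <!-- Search index: Lucene files on disk (path is an assumption) -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index"/>
    </SearchIndex>
  </Workspace>
  <!-- Clustering journal: only present when clustering is enabled -->
  <Cluster id="node_1" syncDelay="5">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <!-- revision, driver, url, user, password, schema, schemaObjectPrefix -->
    </Journal>
  </Cluster>
</Repository>
```

The repository-level FileSystem element does not appear in this list, which is the point being made: changing it does not move the content.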
> Will adding this tag in our repository.xml make the repository usable with
> SAN? (our repository.xml attached)
> <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
> <param name="path" value="${rep.home}/repository/datastore"/>
> <param name="minRecordLength" value="1000"/>
> </DataStore>
The datastore only stores large binaries. If you want to share your
repository, you need clustering anyway.
See http://wiki.apache.org/jackrabbit/DataStore and
http://wiki.apache.org/jackrabbit/Clustering
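For illustration only, a FileDataStore on shared storage might look like the fragment below. The mount point /mnt/san/jackrabbit/datastore is a made-up path; each cluster node would have to see the same directory at the path it configures. This shares only the large binaries, not the rest of the repository.

```xml
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <!-- hypothetical shared mount, must resolve to the same directory on every node -->
  <param name="path" value="/mnt/san/jackrabbit/datastore"/>
  <!-- binaries smaller than this many bytes are kept in the persistence manager -->
  <param name="minRecordLength" value="1000"/>
</DataStore>
```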
>> See here http://wiki.apache.org/jackrabbit/BackupAndMigration for
>> some options.
> I checked out these options in the past. I've learned that there's a problem
> in migrating versions. That part of the repository tree is secured and
> cannot be exported to an xml file. We needed migration of data when we tried
> to use DbFileSystem instead of LocalFileSystem. I guess we don't need to
> migrate when changing other configuration, e.g. when adding a datastore?
Have you tried the (fairly new, since 1.6) RepositoryCopier mentioned
at the end of that wiki page?
http://jackrabbit.apache.org/api/1.6/org/apache/jackrabbit/core/RepositoryCopier.html
> Can you suggest which is the best configuration for clustering (based on
> performance and large repositories)?
See the mentioned wiki pages. A database with good and fast clustering
is important. For fast writes and streaming of large binaries, a shared
FileDataStore is optimal, though that depends on the speed of the SAN.
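A minimal per-node sketch of that combination, assuming MySQL as the shared database. The host db.example.com and the database name jcr are placeholders, credentials are left blank, and the prefixes simply follow the commented-out sample in the posted repository.xml. Each node needs its own Cluster id, and the Versioning persistence manager would have to point at the same database in the same way.

```xml
<!-- inside <Workspace name="${wsp.name}">; the same idea applies to <Versioning> -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
  <!-- placeholder host and database name -->
  <param name="url" value="jdbc:mysql://db.example.com/jcr"/>
  <param name="user" value=""/>
  <param name="password" value=""/>
  <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_"/>
</PersistenceManager>

<!-- directly under <Repository>; id must be unique per cluster node -->
<Cluster id="node_1" syncDelay="5">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision"/>
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://db.example.com/jcr"/>
    <param name="user" value=""/>
    <param name="password" value=""/>
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="J_C_"/>
  </Journal>
</Cluster>
```

A shared FileDataStore would sit alongside this under the Repository element; see the Clustering wiki page for the full list of requirements.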
Regards,
Alex
--
Alexander Klimetschek
alexander.klimetschek@day.com
Re: jackrabbit configuration in clustered environment
Posted by Medha C Sutaria <ms...@csc.com>.
My repository.xml (I don't know how to attach a file to this thread):
<?xml version="1.0"?>
<Repository>
  <!-- added to use datastore for larger files -->
  <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="${rep.home}/repository/datastore"/>
    <param name="minRecordLength" value="1000"/>
  </DataStore>
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${rep.home}/repository" />
  </FileSystem>
  <!--
    Database File System (Cluster Configuration)
    This is sample configuration for mysql persistence that can be used for
    clustering Jackrabbit. For other databases, change the connection,
    credentials, and schema settings.
  -->
  <!--<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://localhost/jcr" />
    <param name="user" value="" />
    <param name="password" value="" />
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="J_R_FS_"/>
  </FileSystem>-->
  <Security appName="Jackrabbit">
    <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
    <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
      <param name="anonymousId" value="anonymous" />
    </LoginModule>
  </Security>
  <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="liferay" />
  <Workspace name="${wsp.name}">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${wsp.home}" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager" />
    <!--
      Database File System and Persistence (Cluster Configuration)
      This is sample configuration for mysql persistence that can be used for
      clustering Jackrabbit. For other databases, change the connection,
      credentials, and schema settings.
    -->
    <!--<PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
      <param name="driver" value="com.mysql.jdbc.Driver" />
      <param name="url" value="jdbc:mysql://localhost/jcr" />
      <param name="user" value="" />
      <param name="password" value="" />
      <param name="schema" value="mysql" />
      <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
      <param name="externalBLOBs" value="false" />
    </PersistenceManager>
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://localhost/jcr" />
      <param name="user" value="" />
      <param name="password" value="" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
    </FileSystem>-->
  </Workspace>
  <Versioning rootPath="${rep.home}/version">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${rep.home}/version" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager" />
    <!--
      Database File System and Persistence (Cluster Configuration)
      This is sample configuration for mysql persistence that can be used for
      clustering Jackrabbit. For other databases, change the connection,
      credentials, and schema settings.
    -->
    <!--<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://localhost/jcr" />
      <param name="user" value="" />
      <param name="password" value="" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="J_V_FS_"/>
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
      <param name="driver" value="com.mysql.jdbc.Driver" />
      <param name="url" value="jdbc:mysql://localhost/jcr" />
      <param name="user" value="" />
      <param name="password" value="" />
      <param name="schema" value="mysql" />
      <param name="schemaObjectPrefix" value="J_V_PM_" />
      <param name="externalBLOBs" value="false" />
    </PersistenceManager>-->
  </Versioning>
  <!--
    Cluster Configuration
    This is sample configuration for mysql persistence that can be used for
    clustering Jackrabbit. For other databases, change the connection,
    credentials, and schema settings.
  -->
  <!--<Cluster id="node_1" syncDelay="5">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision"/>
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://localhost/jcr"/>
      <param name="user" value=""/>
      <param name="password" value=""/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="J_C_"/>
    </Journal>
  </Cluster>-->
</Repository>
Medha C Sutaria/FSG/CSC@CSC wrote on 11/06/2009 05:52 PM
To: users@jackrabbit.apache.org (please respond to users@jackrabbit.apache.org)
Subject: Re: jackrabbit configuration in clustered environment
Thanks Alex for the prompt reply! Your answers have cleared up several
things. A few more questions inline -
Alexander Klimetschek <ak...@day.com> wrote on 11/06/2009 02:40:38 PM:
> On Fri, Nov 6, 2009 at 08:39, Medha C Sutaria <ms...@csc.com> wrote:
> > We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It
> > uses the following configuration -
> > JCRHook + PersistenceManager + File system
> >...
> > JCRHook vs FileSystemHook?
> > PersistenceManager vs Datastore?
> > FileSystem vs Database?
> > if filesystem, sharing the file system? or using SAN?
>
> You need to be more specific. Which persistence manager are you using?
Medha - we use BundleFsPersistenceManager.
>
> Quick notes:
> - bundle based persistence managers are best
> - local dbs (like derby or h2) have better performance than remote dbs
Medha - Any idea about MySQL? I saw some posts about table locking and
concurrent access issues while retrieving/updating files.
> - datastore will only be used for large binaries; using filedatastore
> is a better choice than storing the binaries in a database (using a db
> pm)
> - FileSystem (element in repository.xml) is not important anymore,
> does not influence performance
Is this the tag you are talking about? If so, isn't this what
decides whether we store data in a DB or on LocalFileSystem?
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/repository" />
</FileSystem>
> - JCRHook seems to be a proprietary liferay component, so we
> (jackrabbit devs) cannot give you any information on this
> - if you do clustering and use the datastore, you will need a shared
> file system, SAN is typically the best (but know your network
> performance)
Will adding this tag in our repository.xml make the repository usable with
SAN? (our repository.xml attached)
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
<param name="path" value="${rep.home}/repository/datastore"/>
<param name="minRecordLength" value="1000"/>
</DataStore>
>
> > 1. To select a different solution, migration of current documents to the
> > new solutions has to be done
>
> See here http://wiki.apache.org/jackrabbit/BackupAndMigration for
> some options.
I checked out these options in the past. I've learned that there's a
problem in migrating versions. That part of the repository tree is secured
and cannot be exported to an xml file. We needed migration of data when we
tried to use DbFileSystem instead of LocalFileSystem. I guess we don't
need to migrate when changing other configuration, e.g. when adding a
datastore?
>
> > 2. The repository is increasing exponentially. The file system size is
> > already 10 GB. Does jackrabbit support such large repositories? At what
> > point will the performance start degrading?
>
> Depends on what configuration you actually use.
Can you suggest which is the best configuration for clustering (based on
performance and large repositories)?
>
> > 3. What will it take to upgrade the jackrabbit version from 1.4 to 1.6?
> > Will it require any migration?
>
> AFAIK nothing would be required from 1.4 to 1.6. Minor version numbers
> are meant to be backwards compatible in Jackrabbit.
This sounds great!
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
Re: jackrabbit configuration in clustered environment
Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Nov 6, 2009 at 08:39, Medha C Sutaria <ms...@csc.com> wrote:
> We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It
> uses the following configuration -
> JCRHook + PersistenceManager + File system
>...
> JCRHook vs FileSystemHook?
> PersistenceManager vs Datastore?
> FileSystem vs Database?
> if filesystem, sharing the file system? or using SAN?
You need to be more specific. Which persistence manager are you using?
Quick notes:
- bundle based persistence managers are best
- local dbs (like derby or h2) have better performance than remote dbs
- datastore will only be used for large binaries; using filedatastore
is a better choice than storing the binaries in a database (using a db
pm)
- FileSystem (element in repository.xml) is not important anymore,
does not influence performance
- JCRHook seems to be a proprietary liferay component, so we
(jackrabbit devs) cannot give you any information on this
- if you do clustering and use the datastore, you will need a shared
file system, SAN is typically the best (but know your network
performance)
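For reference, a bundle persistence manager on an embedded local database, which is what the first two notes point to, is typically configured along these lines. This is a sketch based on the stock Jackrabbit configuration, not on the poster's setup. Note that an embedded Derby database is local to one node, so on its own it does not fit a clustered deployment.

```xml
<!-- inside <Workspace name="${wsp.name}"> -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager">
  <!-- embedded Derby database created inside the workspace directory -->
  <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
  <param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>
```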
> 1. To select a different solution, migration of current documents to the
> new solutions has to be done
See here http://wiki.apache.org/jackrabbit/BackupAndMigration for some options.
> 2. The repository is increasing exponentially. The file system size is
> already 10 GB. Does jackrabbit support such large repositories? At what
> point will the performance start degrading?
Depends on what configuration you actually use.
> 3. What will it take to upgrade the jackrabbit version from 1.4 to 1.6?
> Will it require any migration?
AFAIK nothing would be required from 1.4 to 1.6. Minor version numbers
are meant to be backwards compatible in Jackrabbit.
Regards,
Alex
--
Alexander Klimetschek
alexander.klimetschek@day.com