Posted to users@jackrabbit.apache.org by Medha C Sutaria <ms...@csc.com> on 2009/11/06 08:39:47 UTC

jackrabbit configuration in clustered environment

Hi,

We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It 
uses the following configuration -
JCRHook + PersistenceManager + File system

The file system is stored on a local folder of the same machine where the 
application is running.

Now we want to cluster our application, for which we need to share the 
Jackrabbit repository. We looked at many possible solutions but could not 
settle on the best one. We need to decide which combination of 
configurations is best for clustering and high performance -
JCRHook vs FileSystemHook?
PersistenceManager vs Datastore?
FileSystem vs Database?
if filesystem, sharing the file system? or using SAN?

More concerns -
1. To select a different solution, migration of current documents to the 
new solutions has to be done
2. The repository is increasing exponentially. The file system size is 
already 10 GB. Does jackrabbit support such large repositories? At what 
point will the performance start degrading?
3. What will it take to upgrade the Jackrabbit version from 1.4 to 1.6? 
Will it require any migration?

Thanks and regards,
Medha Sutaria

Re: jackrabbit configuration in clustered environment

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Nov 6, 2009 at 13:22, Medha C Sutaria <ms...@csc.com> wrote:
> Medha - we use BundleFsPersistenceManager.

Note that the FS-based persistence managers don't guarantee any consistency.

See http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ

>> - FileSystem (element in repository.xml) is not important anymore,
>> does not influence performance
> Is this the tag you are talking about? If yes, isn't this what decides
> whether we store data in the DB or on the LocalFileSystem?
> <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>         <param name="path" value="${rep.home}/repository" />
> </FileSystem>

No, it doesn't. As I mentioned, it is legacy and not important. Where
your data is stored depends on the persistence manager (the central
persistence component), the datastore (if used, either db or file), the
search index (file only, if enabled) and the clustering journal.

> Will adding this tag in our repository.xml make the repository usable with
> SAN? (our repository.xml attached)
> <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
>         <param name="path" value="${rep.home}/repository/datastore"/>
>         <param name="minRecordLength" value="1000"/>
> </DataStore>

The datastore only stores large binaries. If you want to share your
repository, you need clustering anyway.

See http://wiki.apache.org/jackrabbit/DataStore and
http://wiki.apache.org/jackrabbit/Clustering
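
As a rough sketch of the shared-datastore part only: every cluster node
would point its FileDataStore at the same directory on the shared storage.
The /mnt/san/jackrabbit/datastore path below is a made-up mount point, not
something from this thread; the class and parameter names are the ones from
the configuration quoted above.

```xml
<!-- Sketch only: /mnt/san/jackrabbit/datastore is a hypothetical SAN mount
     that has to be reachable from every cluster node. -->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="/mnt/san/jackrabbit/datastore"/>
        <param name="minRecordLength" value="1000"/>
</DataStore>
```

On its own this only shares the large binaries. Node and property data
still go through the persistence manager, and the nodes still need the
cluster journal to see each other's changes.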

>> See here http://wiki.apache.org/jackrabbit/BackupAndMigration for
>> some options.
> I checked out these options in the past. I've learned that there's a problem
> in migrating versions. That part of the repository tree is secured and
> cannot be exported to an xml file. We needed migration of data when we tried
> to use DbFileSystem instead of LocalFileSystem. I guess we don't need to
> migrate in case of changing other configuration? Eg. using datastore?

Have you tried the (fairly new, since 1.6) RepositoryCopier mentioned
at the end of that wiki page?

http://jackrabbit.apache.org/api/1.6/org/apache/jackrabbit/core/RepositoryCopier.html

> Can you suggest which is the best configuration for clustering (based on
> performance and large repositories)?

See the mentioned wiki pages. A database with good and fast clustering
is important. For fast writes and streaming of large binaries, a shared
FileDataStore is optimal, though that depends on the speed of the SAN.
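
To make the database half of this concrete, here is a minimal sketch of a
workspace persistence manager that uses the bundle-based MySQL variant
instead of BundleFsPersistenceManager. The host name dbhost, the jcr
database and the empty credentials are placeholders, and the parameters are
taken from the commented-out sample in the posted repository.xml, so treat
it as a starting point rather than a tested setup.

```xml
<!-- Sketch only: connection values are placeholders, not from this thread. -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
        <param name="driver" value="com.mysql.jdbc.Driver"/>
        <param name="url" value="jdbc:mysql://dbhost/jcr"/>
        <param name="user" value=""/>
        <param name="password" value=""/>
        <param name="schema" value="mysql"/>
        <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_"/>
        <param name="externalBLOBs" value="false"/>
</PersistenceManager>
```

Note that existing workspaces keep their own workspace.xml, so changing the
template in repository.xml only affects workspaces created afterwards, and
switching the persistence manager does not move existing content by itself.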

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: jackrabbit configuration in clustered environment

Posted by Medha C Sutaria <ms...@csc.com>.
My repository.xml (I don't know how to attach a file to this thread):

<?xml version="1.0"?>

<Repository>

        <!-- added to use datastore for larger files -->
        <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
                <param name="path" value="${rep.home}/repository/datastore"/>
                <param name="minRecordLength" value="1000"/>
        </DataStore>

        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
                <param name="path" value="${rep.home}/repository" />
        </FileSystem>

        <!--
        Database File System (Cluster Configuration)

        This is sample configuration for mysql persistence that can be used for
        clustering Jackrabbit. For other databases, change the connection,
        credentials, and schema settings.
        -->

        <!--<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
                <param name="driver" value="com.mysql.jdbc.Driver"/>
                <param name="url" value="jdbc:mysql://localhost/jcr" />
                <param name="user" value="" />
                <param name="password" value="" />
                <param name="schema" value="mysql"/>
                <param name="schemaObjectPrefix" value="J_R_FS_"/>
        </FileSystem>-->

        <Security appName="Jackrabbit">
                <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
                <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
                        <param name="anonymousId" value="anonymous" />
                </LoginModule>
        </Security>
        <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="liferay" />
        <Workspace name="${wsp.name}">
                <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
                        <param name="path" value="${wsp.home}" />
                </FileSystem>
                <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager" />

                <!--
                Database File System and Persistence (Cluster Configuration)

                This is sample configuration for mysql persistence that can be used for
                clustering Jackrabbit. For other databases, change the connection,
                credentials, and schema settings.
                -->

                <!--<PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
                        <param name="driver" value="com.mysql.jdbc.Driver" />
                        <param name="url" value="jdbc:mysql://localhost/jcr" />
                        <param name="user" value="" />
                        <param name="password" value="" />
                        <param name="schema" value="mysql" />
                        <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
                        <param name="externalBLOBs" value="false" />
                </PersistenceManager>
                <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
                        <param name="driver" value="com.mysql.jdbc.Driver"/>
                        <param name="url" value="jdbc:mysql://localhost/jcr" />
                        <param name="user" value="" />
                        <param name="password" value="" />
                        <param name="schema" value="mysql"/>
                        <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
                </FileSystem>-->
        </Workspace>
        <Versioning rootPath="${rep.home}/version">
                <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
                        <param name="path" value="${rep.home}/version" />
                </FileSystem>
                <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager" />

                <!--
                Database File System and Persistence (Cluster Configuration)

                This is sample configuration for mysql persistence that can be used for
                clustering Jackrabbit. For other databases, change the connection,
                credentials, and schema settings.
                -->

                <!--<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
                        <param name="driver" value="com.mysql.jdbc.Driver"/>
                        <param name="url" value="jdbc:mysql://localhost/jcr" />
                        <param name="user" value="" />
                        <param name="password" value="" />
                        <param name="schema" value="mysql"/>
                        <param name="schemaObjectPrefix" value="J_V_FS_"/>
                </FileSystem>
                <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
                        <param name="driver" value="com.mysql.jdbc.Driver" />
                        <param name="url" value="jdbc:mysql://localhost/jcr" />
                        <param name="user" value="" />
                        <param name="password" value="" />
                        <param name="schema" value="mysql" />
                        <param name="schemaObjectPrefix" value="J_V_PM_" />
                        <param name="externalBLOBs" value="false" />
                </PersistenceManager>-->
        </Versioning>

        <!--
        Cluster Configuration

        This is sample configuration for mysql persistence that can be used for
        clustering Jackrabbit. For other databases, change the connection,
        credentials, and schema settings.
        -->

        <!--<Cluster id="node_1" syncDelay="5">
                <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
                        <param name="revision" value="${rep.home}/revision"/>
                        <param name="driver" value="com.mysql.jdbc.Driver"/>
                        <param name="url" value="jdbc:mysql://localhost/jcr"/>
                        <param name="user" value=""/>
                        <param name="password" value=""/>
                        <param name="schema" value="mysql"/>
                        <param name="schemaObjectPrefix" value="J_C_"/>
                </Journal>
        </Cluster>-->
</Repository>






Re: jackrabbit configuration in clustered environment

Posted by Medha C Sutaria <ms...@csc.com>.
Thanks, Alex, for the prompt reply! Your answers have cleared up several 
things. A few more questions inline -

Alexander Klimetschek <ak...@day.com> wrote on 11/06/2009 02:40:38 PM:

> On Fri, Nov 6, 2009 at 08:39, Medha C Sutaria <ms...@csc.com> wrote:
> > We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It
> > uses the following configuration -
> > JCRHook + PersistenceManager + File system
> >...
> > JCRHook vs FileSystemHook?
> > PersistenceManager vs Datastore?
> > FileSystem vs Database?
> > if filesystem, sharing the file system? or using SAN?
> 
> You need to be more specific. Which persistence manager are you using?

Medha - we use BundleFsPersistenceManager.

> 
> Quick notes:
> - bundle based persistence managers are best
> - local dbs (like derby or h2) have better performance than remote dbs

Medha - Any idea about MySQL? I saw some posts about table locking and 
concurrent access issues while retrieving/updating files.

> - datastore will only be used for large binaries; using filedatastore
> is a better choice than storing the binaries in a database (using a db
> pm)
> - FileSystem (element in repository.xml) is not important anymore,
> does not influence performance
Is this the tag you are talking about? If yes, isn't this what decides 
whether we store data in the DB or on the LocalFileSystem?
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository" />
</FileSystem>

> - JCRHook seems to be a proprietary liferay component, so we
> (jackrabbit devs) cannot give you any information on this
> - if you do clustering and use the datastore, you will need a shared
> file system, SAN is typically the best (but know your network
> performance)
Will adding this tag in our repository.xml make the repository usable with 
SAN? (our repository.xml attached)  
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore"> 
        <param name="path" value="${rep.home}/repository/datastore"/> 
        <param name="minRecordLength" value="1000"/> 
</DataStore>

> 
> > 1. To select a different solution, migration of current documents to the
> > new solutions has to be done
> 
> See here http://wiki.apache.org/jackrabbit/BackupAndMigration for 
> some options.
I checked out these options in the past. I've learned that there's a 
problem in migrating versions. That part of the repository tree is secured 
and cannot be exported to an xml file. We needed migration of data when we 
tried to use DbFileSystem instead of LocalFileSystem. I guess we don't 
need to migrate in case of changing other configuration? Eg. using 
datastore?

> 
> > 2. The repository is increasing exponentially. The file system size is
> > already 10 GB. Does jackrabbit support such large repositories? At what
> > point will the performance start degrading?
> 
> Depends on what configuration you actually use.
Can you suggest which is the best configuration for clustering (based on 
performance and large repositories)?

> 
> > 3. What will it take to upgrade the jackrabbit version from 1.4 to 1.6?
> > Will it require any migration?
> 
> AFAIK nothing would be required from 1.4 to 1.6. Minor version numbers
> are meant to be backwards compatible in Jackrabbit.
This sounds great!
> 
> Regards,
> Alex
> 
> -- 
> Alexander Klimetschek
> alexander.klimetschek@day.com

Re: jackrabbit configuration in clustered environment

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Nov 6, 2009 at 08:39, Medha C Sutaria <ms...@csc.com> wrote:
> We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It
> uses the following configuration -
> JCRHook + PersistenceManager + File system
>...
> JCRHook vs FileSystemHook?
> PersistenceManager vs Datastore?
> FileSystem vs Database?
> if filesystem, sharing the file system? or using SAN?

You need to be more specific. Which persistence manager are you using?

Quick notes:
- bundle based persistence managers are best
- local dbs (like derby or h2) have better performance than remote dbs
- datastore will only be used for large binaries; using filedatastore
is a better choice than storing the binaries in a database (using a db
pm)
- FileSystem (element in repository.xml) is not important anymore,
does not influence performance
- JCRHook seems to be a proprietary liferay component, so we
(jackrabbit devs) cannot give you any information on this
- if you do clustering and use the datastore, you will need a shared
file system, SAN is typically the best (but know your network
performance)
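
A sketch of what the clustering part could look like, based on the
Clustering wiki page and the commented-out sample in the repository.xml
posted in this thread. The dbhost URL and the empty credentials are
placeholders. Each node needs its own id, while the journal database is the
same for all nodes.

```xml
<!-- Sketch only: use a different id on every node (node_1, node_2, ...);
     the journal parameters point all nodes at the same database. -->
<Cluster id="node_1">
        <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
                <param name="revision" value="${rep.home}/revision"/>
                <param name="driver" value="com.mysql.jdbc.Driver"/>
                <param name="url" value="jdbc:mysql://dbhost/jcr"/>
                <param name="user" value=""/>
                <param name="password" value=""/>
                <param name="schema" value="mysql"/>
                <param name="schemaObjectPrefix" value="J_C_"/>
        </Journal>
</Cluster>
```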

> 1. To select a different solution, migration of current documents to the
> new solutions has to be done

See here http://wiki.apache.org/jackrabbit/BackupAndMigration for some options.

> 2. The repository is increasing exponentially. The file system size is
> already 10 GB. Does jackrabbit support such large repositories? At what
> point will the performance start degrading?

Depends on what configuration you actually use.

> 3. What will it take to upgrade the jackrabbit version from 1.4 to 1.6?
> Will it require any migration?

AFAIK nothing would be required from 1.4 to 1.6. Minor version numbers
are meant to be backwards compatible in Jackrabbit.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com