Posted to users@jackrabbit.apache.org by ni...@emeter.com on 2010/11/16 08:32:58 UTC

Multiple instances of repository

Hi,

I am using Jackrabbit as the JCR implementation in my project. I am running Jackrabbit within my application, in the same JVM.
The application reads content from the repository and also writes some content to it.
There could be multiple concurrent instances of my application running on the same or different machines.
I have a configuration file for Jackrabbit and a single repository home.
As soon as one instance of the application is up and running, I can't start another, because the first instance creates a lock file in the repository home.
After some searching I learned about running Jackrabbit in clustered mode.
My question is: even in this case I will have to specify a different repository home for every run, right?
That means I have to build the repository home path at run time, because at compile time I don't know how many instances will be run.
This is a standalone Java application, and in theory any number of instances can be run.
If I have to specify a different repository path for every run anyway, will Jackrabbit work even without clustering?
After all, the .lock file will be different for each run, since the repository home is different.
I know I am missing something here, please help me.
I am attaching my conf file to this mail.

Thanks,
Nikhil


Re: Multiple instances of repository

Posted by Justin Edelson <ju...@justinedelson.com>.
On Wed, Nov 17, 2010 at 12:05 PM,  <ni...@emeter.com> wrote:
> So I will have to run a cluster configuration on this machine1, because I will have two independent JVMs hitting
> the same repository?
Yes.

> I really don't want to run cluster nodes on a single machine just so that different JVMs can access the repository.
> That doesn't look right. I am sure there are better ways to solve this issue.
Although I suspect this isn't typical, there's nothing wrong with
this. Multiple JVMs = cluster nodes; doesn't really matter if they're
on the same physical machine or multiple physical machines.

Justin
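
As a rough illustration of treating each JVM as its own cluster node, the sketch below gives every process a private repository home and a unique node id before the repository is started. The system property name is the one used in the code quoted further down in this thread; the class name, the "jackrabbit-homes" base directory and the use of a random UUID in place of System.nanoTime() are assumptions made for this example. Actually starting the embedded repository is left out so that the sketch runs with the JDK alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/** Prepares a per-JVM repository home and cluster node id (illustrative only). */
class ClusterNodeBootstrap {

    /** System property used in this thread to supply the cluster node id. */
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    /** Builds a node id that differs on every call; a UUID also avoids clashes across machines. */
    static String newNodeId() {
        return "node_" + UUID.randomUUID();
    }

    /** Creates (if needed) and returns the private repository home for the given node id. */
    static Path prepareHome(Path baseDir, String nodeId) throws IOException {
        Path home = baseDir.resolve(nodeId);
        Files.createDirectories(home);
        return home;
    }

    public static void main(String[] args) throws IOException {
        String nodeId = newNodeId();
        // Set the property before the repository is created.
        System.setProperty(NODE_ID_PROPERTY, nodeId);

        // Hypothetical base directory; each JVM gets its own subdirectory and thus its own .lock file.
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "jackrabbit-homes");
        Path home = prepareHome(base, nodeId);

        System.out.println("node id: " + nodeId);
        System.out.println("repository home: " + home);
        // The embedded repository would be started here with the shared repository.xml and this home.
    }
}
```

Note that a fresh repository home also means a fresh search index directory, which is the cost Robert points out in his reply quoted below.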

>
> Any ideas will be of great help.
>
> -Nikhil
>
>
> -----Original Message-----
> From: justinedelson@gmail.com [mailto:justinedelson@gmail.com] On Behalf Of Justin Edelson
> Sent: Wednesday, November 17, 2010 12:12 AM
> To: users@jackrabbit.apache.org
> Subject: Re: Multiple instances of repository
>
> Nikhil-
> I think you should rethink your architecture. It really doesn't make
> sense to be bringing repository instances up only for a 2-4 minute
> job. Instead, you should think about using the Command pattern and
> package your "applications" as executable jobs which can be run inside
> a long-running VM against a local repository instance (i.e. making
> in-process calls instead of RMI or DavEx).
>
> This is where something like OSGi and Apache Sling can be *very*
> helpful, but there are obviously other ways to add/remove jobs at
> runtime. See, for example, Sling's Scheduler support:
> http://sling.apache.org/site/scheduler-service-commons-scheduler.html
>
> Justin
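
To make the Command pattern suggestion above a little more concrete, here is a minimal sketch of a long-running host that owns the repository and runs short jobs in-process. RepositoryJob, JobHost and the in-memory map standing in for the repository are all invented for this illustration; a real host would hand each job a JCR session instead of a map.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** A command: one unit of work that runs against the shared, in-process repository. */
interface RepositoryJob {
    String run(Map<String, String> repository) throws Exception;
}

/** Long-running host that owns the repository and executes submitted jobs. */
class JobHost implements AutoCloseable {
    // Stand-in for the embedded repository (assumption for this sketch).
    private final Map<String, String> repository = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    /** Queues a job; the job runs inside this JVM, so no RMI or DavEx call is involved. */
    Future<String> submit(RepositoryJob job) {
        Callable<String> task = () -> job.run(repository);
        return workers.submit(task);
    }

    /** Stops accepting jobs; already submitted jobs are allowed to finish. */
    @Override
    public void close() {
        workers.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (JobHost host = new JobHost()) {
            // A short "2-4 minute" application becomes a job object handed to the host.
            host.submit(repo -> { repo.put("/docs/report", "done"); return "stored"; }).get();
            System.out.println(host.submit(repo -> repo.get("/docs/report")).get());
        }
    }
}
```

The point of the design is that only the host JVM ever opens the repository, so short-lived jobs never become cluster nodes of their own.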
>
> On Tue, Nov 16, 2010 at 5:16 AM,  <ni...@emeter.com> wrote:
>> Thanks for your inputs, they are really helpful.
>>
>> So is my application not a good candidate for Jackrabbit?
>>
>> The other option I had was to use Jackrabbit in client-server mode. In this case I would access the repository over RMI. But the Jackrabbit documentation mentions that RMI is not optimized for performance and that I should use an embedded repository instance in my application code for better performance.
>>
>> I can remove the search functionality from these cluster nodes, because their life span will be very short. The application takes 2-4 minutes to do its job, and I don't think we really need search for these nodes.
>>
>> But my question is: should I really use the clustering feature? Cluster nodes should normally have a longer life span, but in this case the nodes will have a very short life span of 2-4 minutes.
>> I find it hard to treat these short-lived applications as cluster nodes.
>>
>> Thanks,
>> Nikhil
>>
>> -----Original Message-----
>> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
>> Sent: Tuesday, November 16, 2010 3:33 PM
>> To: users@jackrabbit.apache.org
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> I don't know whether that (setProperty) will work, but you have another problem. The Lucene search index is always stored in the file system, and AFAIK each repository home needs its own index directories (so you have a set of index files for each cluster node). If you create a new cluster node, you have to wait a long time until the index is built, depending on the data in your repository (if you have tons of data, you have to wait a week or longer).
>>
>> The tables of the FS and PM will be shared between all cluster nodes - that works.
>>
>> Kind regards, Robert
>>
>> -----Original Message-----
>> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
>> Sent: Tuesday, November 16, 2010 10:54 AM
>> To: users@jackrabbit.apache.org
>> Subject: RE: Multiple instances of repository
>>
>> Since there could be any number of instances, I can't decide the cluster id beforehand.
>> Hence I have the following code, which creates a cluster id at run time:
>>
>> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());
>>
>> Similarly, the repositoryHome path is generated at run time.
>>
>> But do I also need separate tables for the workspace file system? I have the following configuration for my workspace. Is it correct? Will the tables for the workspace FS and PersistenceManager be shared between all the nodes, or will they be different for each node?
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE Repository
>>          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>>          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>>
>> <Repository>
>>
>>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>>         <param name="driver" value="javax.naming.InitialContext"/>
>>         <param name="url" value="jdbc/amiDBDataSource"/>
>>         <param name="databaseType" value="oracle"/>
>>         <param name="copyWhenReading" value="true"/>
>>         <param name="tablePrefix" value=""/>
>>         <param name="schemaObjectPrefix" value="J_R_DS_"/>
>>         <param name="schemaCheckEnabled" value="false"/>
>>     </DataStore>
>>
>>     <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>         <param name="driver" value="javax.naming.InitialContext"/>
>>         <param name="url" value="jdbc/amiDBDataSource"/>
>>         <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>>         <param name="schema" value="oracle"/>
>>         <param name="schemaObjectPrefix" value="J_R_FS_"/>
>>         <param name="schemaCheckEnabled" value="false"/>
>>     </FileSystem>
>>
>>     <Security appName="Jackrabbit">
>>         <SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager" />
>>         <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
>>         <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>>             <param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider" />
>>         </LoginModule>
>>     </Security>
>>
>>     <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />
>>
>>     <Workspace name="${wsp.name}">
>>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle"/>
>>             <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </FileSystem>
>>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <param name="tableSpace" value="" />
>>             <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle" />
>>             <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
>>             <param name="externalBLOBs" value="false" />
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </PersistenceManager>
>>         <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>             <param name="path" value="${wsp.home}/index"/>
>>             <param name="supportHighlighting" value="true"/>
>>         </SearchIndex>
>>     </Workspace>
>>
>>     <Versioning rootPath="${rep.home}/version">
>>
>>         <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle"/>
>>             <param name="schemaObjectPrefix" value="J_V_FS_"/>
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </FileSystem>
>>         <!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>>         <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <param name="tableSpace" value="" />
>>             <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>>             <param name="schema" value="oracle" />
>>             <param name="schemaObjectPrefix" value="J_V_PM_" />
>>             <param name="externalBLOBs" value="false" />
>>             <param name="schemaCheckEnabled" value="false"/>
>>         </PersistenceManager>
>>
>>     </Versioning>
>>
>>     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>         <param name="path" value="${rep.home}/search/index"/>
>>         <param name="supportHighlighting" value="true"/>
>>     </SearchIndex>
>>
>>     <Cluster syncDelay="2000">
>>         <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>>             <param name="revision" value="${rep.home}/revision.log" />
>>             <param name="driver" value="javax.naming.InitialContext"/>
>>             <param name="url" value="jdbc/amiDBDataSource"/>
>>             <param name="schemaObjectPrefix" value="J_R_" />
>>             <param name="databaseType" value="oracle"/>
>>         </Journal>
>>     </Cluster>
>>
>> </Repository>
>>
>> Thanks,
>> Nikhil
>> -----Original Message-----
>> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
>> Sent: Tuesday, November 16, 2010 2:42 PM
>> To: users@jackrabbit.apache.org
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> you need clustering, because all of your instances should access the same repository.
>>
>> What you need is separate repository homes for each instance. In my use case I have an installation directory for each instance, so the repository home is located below this directory.
>>
>> You have to make sure that each instance also has its own repository.xml, because you need to define different cluster IDs.
>>
>> And you have to define a cluster section in the repository.xml where the journal is located, which is necessary for synchronization:
>>
>>    <Cluster id="node1" syncDelay="5000">
>>      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>>        <param name="driver" value="javax.naming.InitialContext"/>
>>        <param name="url" value="jdbc/amiDBDataSource"/>
>>          ...
>>      </Journal>
>>    </Cluster>
>>
>> Kind regards, Robert
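
One way to give each instance its own repository.xml, as described above, is to keep a single template and stamp the cluster id into a copy inside each instance's home. The sketch below does this with plain text substitution; the "@CLUSTER_ID@" placeholder, the class name and the id validation rule are assumptions made for this example, not Jackrabbit features.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Writes a per-instance repository.xml from a shared template (illustrative only). */
class InstanceConfigWriter {

    /** Hypothetical marker expected in the template, e.g. <Cluster id="@CLUSTER_ID@" ...>. */
    static final String PLACEHOLDER = "@CLUSTER_ID@";

    static Path writeConfig(String template, Path instanceHome, String clusterId) throws IOException {
        // Restrict ids to characters that are safe inside an XML attribute value.
        if (!clusterId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unsafe cluster id: " + clusterId);
        }
        if (!template.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("template has no " + PLACEHOLDER + " placeholder");
        }
        Files.createDirectories(instanceHome);
        Path target = instanceHome.resolve("repository.xml");
        Files.write(target, template.replace(PLACEHOLDER, clusterId).getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A one-line stand-in for the real template file.
        String template = "<Cluster id=\"" + PLACEHOLDER + "\" syncDelay=\"5000\"/>";
        Path home = Paths.get(System.getProperty("java.io.tmpdir"), "instance-node1");
        System.out.println("wrote " + writeConfig(template, home, "node1"));
    }
}
```

Everything else in the file (journal, file system and persistence manager settings) stays identical across instances, so only the id and the repository home differ.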
>>
>> -----Original Message-----
>> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
>> Sent: Tuesday, November 16, 2010 9:37 AM
>> To: users@jackrabbit.apache.org
>> Subject: RE: Multiple instances of repository
>>
>> Thanks for replying. I need a little more help to understand things completely.
>> Let me elaborate a bit on my usage scenario. I am also attaching my repository.xml file to this mail. Please let me know if you want to know more about my environment.
>>
>> In my case, I want to keep all the data in one database, and I want to use Jackrabbit as the JCR layer over this database.
>> I have Jackrabbit embedded in my application, so the repository starts up as part of the application.
>> The application reads some files from the repository and also inserts some data into it.
>> There could be two instances of the application: app1 running on machine1 and app2 running on machine2.
>> So my application instances are separate, and I can create multiple repository homes to avoid the locking problem, but I still want to insert the data from these applications into the same database tables.
>> So if all the application instances use the same repository configuration file and each specifies its own repository home,
>> will that work in my case? Will there be any consistency issues?
>>
>> When you say separate data stores and separate persistence managers, do you mean separate repository configuration files, or separate database tables for the data stores and persistence managers?
>>
>> My instances and the repositories operate separately from each other, but they still need to share the data. Data inserted by one application instance should be visible to the other instances. So, as I understand it, they should all be inserting data into the same tables.
>>
>> Thanks,
>> Nikhil
>>
>> -----Original Message-----
>> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
>> Sent: Tuesday, November 16, 2010 1:22 PM
>> To: users@jackrabbit.apache.org
>> Subject: AW: Multiple instances of repository
>>
>> Hi Nikhil,
>>
>> if you want to use clustering, you have to define a repository home for each cluster node.
>>
>> Clustering is necessary if you want to have the same data/indexes on all cluster nodes - the keyword is synchronization.
>>
>> If your instances and the repositories operate separately from each other, you don't need clustering. Separate repository homes, data stores and persistence managers will do the job.
>>
>> Kind regards, Robert
>>

AW: Multiple instances of repository

Posted by "Seidel. Robert" <Ro...@aeb.de>.
Hi Nikhil,

Why don't you just use one 24x7 server JVM hitting the Jackrabbit repository? If one of the hundred JVMs wants something from the repository, it makes a web service call to your server instance, which gets the job done.

Kind regards, Robert
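
A minimal sketch of that idea, assuming a plain HTTP endpoint built on the JDK's bundled com.sun.net.httpserver package: one long-running JVM holds the content and answers lookups, so the other JVMs never open the repository themselves. The RepositoryGateway class, the /content path and the in-memory map standing in for the repository are all made up for this illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Single server JVM that fronts the repository for all other JVMs (illustrative only). */
class RepositoryGateway {
    // Stand-in for content read through a JCR session (assumption for this sketch).
    private final Map<String, String> content = new ConcurrentHashMap<>();
    private final HttpServer server;

    /** Binds to the loopback interface; port 0 lets the OS pick a free port. */
    RepositoryGateway(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/content", this::handle);
    }

    void put(String path, String value) {
        content.put(path, value);
    }

    /** Starts serving and returns the port actually in use. */
    int start() {
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    /** Answers /content/<path> with the stored value, or 404 if nothing is stored there. */
    private void handle(HttpExchange exchange) throws IOException {
        try {
            String path = exchange.getRequestURI().getPath().substring("/content".length());
            String value = content.get(path);
            byte[] body = (value == null ? "not found" : value).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(value == null ? 404 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        } finally {
            exchange.close();
        }
    }
}
```

A client JVM would then fetch something like http://127.0.0.1:PORT/content/docs/a with any HTTP client. Writes, authentication and error handling are deliberately left out.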

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Wednesday, November 17, 2010 6:06 PM
To: users@jackrabbit.apache.org
Subject: RE: Multiple instances of repository

I am really thankful for all the suggestions.
I am not an expert in application architecture, and the answers are a great help.

Justin, as you suggested, I think the architecture needs to change.
Let's say I restructure my application, call it app1, so that it is a 24x7 type of application.
It will wait for jobs, and some scheduler (Quartz, maybe) will hand it job instances to run.
Now this application 'app1' can run on two different machines (in a clustered environment), and in that case the two Jackrabbit repository instances should be configured as a cluster, right?
But I will also have a web application that hits the repository. Right now it only reads content from the repository, but in future it might write to it as well. This web application can also run on machine1 and machine2.
So on machine1 I will have one web application and one 24x7 application, both hitting the Jackrabbit repository.
So I will have to run a cluster configuration on this machine1, because I will have two independent JVMs hitting the same repository?
I really don't want to run cluster nodes on a single machine just so that different JVMs can access the repository. That doesn't look right. I am sure there are better ways to solve this issue.

Any ideas will be of great help.

-Nikhil


-----Original Message-----
From: justinedelson@gmail.com [mailto:justinedelson@gmail.com] On Behalf Of Justin Edelson
Sent: Wednesday, November 17, 2010 12:12 AM
To: users@jackrabbit.apache.org
Subject: Re: Multiple instances of repository

Nikhil-
I think you should rethink you're architecture. It really doesn't make
sense to be bringing repository instances up only for a 2-4 minute
job. Instead, you should think about using the Command pattern and
package your "applications" as executable jobs which can be run inside
a long-running VM against a local repository instance (i.e. making
in-process calls instead of RMI or DavEx).

This is where something like OSGi and Apache Sling can be *very*
helpful, but there are obviously other ways to add/remove jobs at
runtime. See, for example, Sling's Scheduler support:
http://sling.apache.org/site/scheduler-service-commons-scheduler.html

Justin

On Tue, Nov 16, 2010 at 5:16 AM,  <ni...@emeter.com> wrote:
> Thanks for your inputs, they are really helpful.
>
> Well, so does my application is not a good candidate to use jackrabbit.
>
> The other option, I had was to use jackrabbit in client-server mode. In this case I will be accessing the repository from RMI. But in the jackrabbit documents it has been mentioned that RMI is not optimized for performance and I should use embedded repository instance in my application code for better performance.
>
> I can remove the search functionality from these clusters, because the life span of these will be very short. The application will take 2-4 minutes to do its job and I don't think we really need search for these clusters.
>
> But my question is, should I really use the clustering feature. I mean cluster nodes should normally have a longer life span. But here in this case the nodes will have very short life span 2-4 minutes.
> I am kind of finding it hard to use these short span applications as cluster nodes.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 3:33 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> I don't know if it will work (setProperty), but you have another problem. The Lucene search index is always saved in the file system. And afaik, each repository home needs its own index directories (so you have the index files for each cluster). If you make a new cluster, you have to wait for a long time till the index is built, depending on the data in your repository (if you have tons of data, you have to wait a week or longer).
>
> The tables of the FS and PM will be shared between all cluster nodes - that works.
>
> Kindly regards, Robert
>
> -----Ursprüngliche Nachricht-----
> Von: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Gesendet: Dienstag, 16. November 2010 10:54
> An: users@jackrabbit.apache.org
> Betreff: RE: Multiple instances of repository
>
> Since there could be n number of instances. So I can't decide the cluster id beforehand.
> Hence I have the following code that creates a cluster id at run time.
>
> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());
>
> Similarly the repositoryHome path is generated at run time.
>
> But do I also need separate tables for workspace file system? I have the following configuration for my workspace. Is it correct? The tables for the workspace FS and PersistenceManager will be shared between all the nodes or will these tables will be different?
>
> <?xml version="1.0"?>
> <!DOCTYPE Repository
>          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>
> <Repository>
>
>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>                <param name="driver" value="javax.naming.InitialContext"/>
>                <param name="url" value="jdbc/amiDBDataSource"/>
>                <param name="databaseType" value="oracle"/>
>        <param name="copyWhenReading" value="true"/>
>        <param name="tablePrefix" value=""/>
>        <param name="schemaObjectPrefix" value="J_R_DS_"/>
>        <param name="schemaCheckEnabled" value="false"/>
>    </DataStore>
>
>        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>                <param name="driver" value="javax.naming.InitialContext"/>
>                <param name="url" value="jdbc/amiDBDataSource"/>
>                <!-- The following value must oracle for oracle server this is not the same as the database schema -->
>                <param name="schema" value="oracle"/>
>                <param name="schemaObjectPrefix" value="J_R_FS_"/>
>                <param name="schemaCheckEnabled" value="false"/>
>        </FileSystem>
>
>        <Security appName="Jackrabbit">
>                <SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager" />
>                <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
>                <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>                        <param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider" />
>                </LoginModule>
>        </Security>
>
>        <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />
>
>        <Workspace name="${wsp.name}">
>        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>                        <param name="driver" value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <!-- The following value must oracle for oracle server this is not the same as the database schema -->
>                        <param name="schema" value="oracle"/>
>                        <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
>                        <param name="schemaCheckEnabled" value="false"/>
>                </FileSystem>
>                <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>                        <param name="driver" value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <param name="tableSpace" value="" />
>                        <!-- The following value must oracle for oracle server this is not the same as the database schema -->
>                        <param name="schema" value="oracle" />
>                        <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
>                        <param name="externalBLOBs" value="false" />
>                        <param name="schemaCheckEnabled" value="false"/>
>                </PersistenceManager>
>                <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>            <param name="path" value="${wsp.home}/index"/>
>            <param name="supportHighlighting" value="true"/>
>        </SearchIndex>
>        </Workspace>
>
>        <Versioning rootPath="${rep.home}/version">
>
>                <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>                        <param name="driver" value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <!-- The following value must oracle for oracle server this is not the same as the database schema -->
>                        <param name="schema" value="oracle"/>
>                        <param name="schemaObjectPrefix" value="J_V_FS_"/>
>                        <param name="schemaCheckEnabled" value="false"/>
>                </FileSystem>
>                <!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>                <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>                        <param name="driver" value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <param name="tableSpace" value="" />
>                        <!-- The following value must oracle for oracle server this is not the same as the database schema -->
>                        <param name="schema" value="oracle" />
>                        <param name="schemaObjectPrefix" value="J_V_PM_" />
>                        <param name="externalBLOBs" value="false" />
>                        <param name="schemaCheckEnabled" value="false"/>
>                </PersistenceManager>
>
>        </Versioning>
>
>    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>        <param name="path" value="${rep.home}/search/index"/>
>        <param name="supportHighlighting" value="true"/>
>    </SearchIndex>
>
>        <Cluster syncDelay="2000">
>                <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>                <param name="revision" value="${rep.home}/revision.log" />
>                        <param name="driver" value="javax.naming.InitialContext"/>
>                        <param name="url" value="jdbc/amiDBDataSource"/>
>                        <param name="schemaObjectPrefix" value="J_R_" />
>                        <param name="databaseType" value="oracle"/>
>                </Journal>
>        </Cluster>
>
> </Repository>
>
> Thanks,
> Nikhil
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 2:42 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> you need clustering, because all of your instances should access the same repository.
>
> What you need is separate repository homes for each instance. In my use case I have an installation directory for each instance, so the repository home is located below this directory.
>
> You have to make sure, that each instance has also its own repository.xml because you need to define different clusterIDs.
>
> And you have to define a cluster section in the repository.xml where the journal is located, which is necessary for synchronization:
>
>    <Cluster id="node1" syncDelay="5000">
>      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>        <param name="driver" value="javax.naming.InitialContext"/>
>        <param name="url" value="jdbc/amiDBDataSource"/>
>          ...
>      </Journal>
>    </Cluster>
>
> Kindly regards, Robert
>
> -----Ursprüngliche Nachricht-----
> Von: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Gesendet: Dienstag, 16. November 2010 09:37
> An: users@jackrabbit.apache.org
> Betreff: RE: Multiple instances of repository
>
> Thanks for replying back. I will need little more help to understand the things completely.
> I will just elaborate a bit more on my usage scenario. I am also attaching my repository.xml file with this mail. Please let me know if you want to know more about my environment.
>
> In my case, I want to keep all the data in one database and I want to use jackrabbit as JCR over this database.
> I have the jackrabbit embedded in my application so the repository gets-up as part of the application.
> Now this application reads some files from repository and also inserts some data in repository.
> There could be two instances of the application app1 running on machine1 and app2 running on machine2.
> So my application instances are different and I can create multiple repository homes to avoid the locking problem but I still wants to insert the data from these applications in same database tables.
> So if all the application instances use the same repository configuration file and specify their own repository home.
> Will that work in my case? Will there be any consistency issues?
>
> When you say separate data store and separate persistence managers, you mean separate repository configuration file or separate database tables for data stores and persistence managers.
>
> My instances and the repositories operate separately from each other but they still want to share the data. The data inserted by one application instance should be visible to other instance. So they all should be inserting the data in same tables, that's what my understanding is.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 1:22 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> if you want to use clustering, you have to define a repository home for each cluster node.
>
> Clustering is necessary, if you want to have the same data/indexes at all cluster nodes - the key word is synchronization.
>
> If your instances and the repositories operate separately from each other, you don't need clustering. Separate repository homes, data stores and persistence managers will do the job.
>
> Kindly regards, Robert

RE: Multiple instances of repository

Posted by ni...@emeter.com.
I am really thankful for all the suggestions.
I am not an expert in application architecture, and the answers are a great help.

Justin, as you suggested, I think the architecture needs to change.
Let's say I restructure my application, call it app1, so that it runs 24x7.
It will wait for a job, and perhaps a scheduler (Quartz, maybe) will hand it a job instance to run.
Now this application app1 can run on two different machines (in a clustered environment), and in that case the two Jackrabbit repository instances should be configured as a cluster, right?
But I will also have a web application that hits the repository. Right now it only reads content from the repository, but in future it might write to it as well. This web application can also run on machine1 and machine2.
So on machine1 I will have one web application and one 24x7 application, and both will be hitting the Jackrabbit repository.
So will I have to run a cluster configuration on machine1, because two independent JVMs will be hitting the same repository?
I really don't want to run cluster nodes on a single machine just so that different JVMs can access the repository. That doesn't look right. I am sure there are better ways to solve this.

Any ideas will be of great help.

-Nikhil
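
A minimal sketch of the per-JVM setup discussed in this thread, assuming each JVM that embeds the repository picks its own cluster node id and its own repository home at startup. The property name org.apache.jackrabbit.core.cluster.node_id is the one quoted in this thread; the class name, the "node-" prefix and the base directory are hypothetical. A UUID is used here instead of System.nanoTime(), since nanoTime values are only meaningful within a single JVM and could in principle coincide across processes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

final class ClusterNodeBootstrap {
    // System property quoted in this thread for supplying the cluster node id.
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    // Builds an id that differs on every call; the "node-" prefix is arbitrary.
    static String newNodeId() {
        return "node-" + UUID.randomUUID();
    }

    // Creates <baseDir>/<nodeId> so that every JVM gets a private home directory
    // (and therefore its own .lock file and search index).
    static Path prepareHome(Path baseDir, String nodeId) throws IOException {
        return Files.createDirectories(baseDir.resolve(nodeId));
    }

    public static void main(String[] args) throws IOException {
        String nodeId = newNodeId();
        System.setProperty(NODE_ID_PROPERTY, nodeId);

        // Hypothetical base directory; any writable location would do.
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "jackrabbit-homes");
        Path home = prepareHome(base, nodeId);

        System.out.println("node id: " + nodeId);
        System.out.println("repository home: " + home);
    }
}
```

The resulting home path would then be handed to whatever code starts the embedded repository; that step is left out because it depends on the application.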


-----Original Message-----
From: justinedelson@gmail.com [mailto:justinedelson@gmail.com] On Behalf Of Justin Edelson
Sent: Wednesday, November 17, 2010 12:12 AM
To: users@jackrabbit.apache.org
Subject: Re: Multiple instances of repository

Nikhil-
I think you should rethink your architecture. It really doesn't make
sense to be bringing repository instances up only for a 2-4 minute
job. Instead, you should think about using the Command pattern and
package your "applications" as executable jobs which can be run inside
a long-running VM against a local repository instance (i.e. making
in-process calls instead of RMI or DavEx).

This is where something like OSGi and Apache Sling can be *very*
helpful, but there are obviously other ways to add/remove jobs at
runtime. See, for example, Sling's Scheduler support:
http://sling.apache.org/site/scheduler-service-commons-scheduler.html

Justin

On Tue, Nov 16, 2010 at 5:16 AM,  <ni...@emeter.com> wrote:
> Thanks for your inputs, they are really helpful.
>
> Well, so is my application not a good candidate for Jackrabbit?
>
> The other option I had was to use Jackrabbit in client-server mode. In that case I would access the repository over RMI. But the Jackrabbit documentation mentions that RMI is not optimized for performance and that I should embed the repository instance in my application code for better performance.
>
> I can remove the search functionality from these cluster nodes, because their life span will be very short. The application takes 2-4 minutes to do its job, and I don't think we really need search for these nodes.
>
> But my question is whether I should really use the clustering feature. Cluster nodes should normally have a longer life span, but in this case the nodes will live for only 2-4 minutes.
> I find it hard to see these short-lived applications as cluster nodes.
>
> Thanks,
> Nikhil
>
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 3:33 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> I don't know if it will work (setProperty), but you have another problem. The Lucene search index is always stored in the file system, and as far as I know each repository home needs its own index directories (so you have a set of index files for each cluster node). If you add a new cluster node, you have to wait until its index is built, which can take a long time depending on the amount of data in your repository (with tons of data, a week or longer).
>
> The tables of the FS and PM will be shared between all cluster nodes - that works.
>
> Kindly regards, Robert
>
> -----Original Message-----
> From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
> Sent: Tuesday, November 16, 2010 10:54
> To: users@jackrabbit.apache.org
> Subject: RE: Multiple instances of repository
>
> Since there could be any number of instances, I can't decide the cluster id beforehand.
> Hence I have the following code, which creates a cluster id at run time.
>
> System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());
>
> Similarly, the repositoryHome path is generated at run time.
>
> But do I also need separate tables for the workspace file system? I have the following configuration for my workspace. Is it correct? Will the tables for the workspace FS and PersistenceManager be shared between all the nodes, or will they be different?
>
> <?xml version="1.0"?>
> <!DOCTYPE Repository
>     PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
>     "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
>
> <Repository>
>
>   <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>     <param name="driver" value="javax.naming.InitialContext"/>
>     <param name="url" value="jdbc/amiDBDataSource"/>
>     <param name="databaseType" value="oracle"/>
>     <param name="copyWhenReading" value="true"/>
>     <param name="tablePrefix" value=""/>
>     <param name="schemaObjectPrefix" value="J_R_DS_"/>
>     <param name="schemaCheckEnabled" value="false"/>
>   </DataStore>
>
>   <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>     <param name="driver" value="javax.naming.InitialContext"/>
>     <param name="url" value="jdbc/amiDBDataSource"/>
>     <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>     <param name="schema" value="oracle"/>
>     <param name="schemaObjectPrefix" value="J_R_FS_"/>
>     <param name="schemaCheckEnabled" value="false"/>
>   </FileSystem>
>
>   <Security appName="Jackrabbit">
>     <SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager" />
>     <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
>     <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>       <param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider" />
>     </LoginModule>
>   </Security>
>
>   <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />
>
>   <Workspace name="${wsp.name}">
>     <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>       <param name="driver" value="javax.naming.InitialContext"/>
>       <param name="url" value="jdbc/amiDBDataSource"/>
>       <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>       <param name="schema" value="oracle"/>
>       <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
>       <param name="schemaCheckEnabled" value="false"/>
>     </FileSystem>
>     <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>       <param name="driver" value="javax.naming.InitialContext"/>
>       <param name="url" value="jdbc/amiDBDataSource"/>
>       <param name="tableSpace" value="" />
>       <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>       <param name="schema" value="oracle" />
>       <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
>       <param name="externalBLOBs" value="false" />
>       <param name="schemaCheckEnabled" value="false"/>
>     </PersistenceManager>
>     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>       <param name="path" value="${wsp.home}/index"/>
>       <param name="supportHighlighting" value="true"/>
>     </SearchIndex>
>   </Workspace>
>
>   <Versioning rootPath="${rep.home}/version">
>
>     <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>       <param name="driver" value="javax.naming.InitialContext"/>
>       <param name="url" value="jdbc/amiDBDataSource"/>
>       <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>       <param name="schema" value="oracle"/>
>       <param name="schemaObjectPrefix" value="J_V_FS_"/>
>       <param name="schemaCheckEnabled" value="false"/>
>     </FileSystem>
>     <!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
>     <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
>       <param name="driver" value="javax.naming.InitialContext"/>
>       <param name="url" value="jdbc/amiDBDataSource"/>
>       <param name="tableSpace" value="" />
>       <!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
>       <param name="schema" value="oracle" />
>       <param name="schemaObjectPrefix" value="J_V_PM_" />
>       <param name="externalBLOBs" value="false" />
>       <param name="schemaCheckEnabled" value="false"/>
>     </PersistenceManager>
>
>   </Versioning>
>
>   <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>     <param name="path" value="${rep.home}/search/index"/>
>     <param name="supportHighlighting" value="true"/>
>   </SearchIndex>
>
>   <Cluster syncDelay="2000">
>     <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>       <param name="revision" value="${rep.home}/revision.log" />
>       <param name="driver" value="javax.naming.InitialContext"/>
>       <param name="url" value="jdbc/amiDBDataSource"/>
>       <param name="schemaObjectPrefix" value="J_R_" />
>       <param name="databaseType" value="oracle"/>
>     </Journal>
>   </Cluster>
>
> </Repository>
>
> Thanks,
> Nikhil
> -----Original Message-----
> From: Seidel. Robert [mailto:Robert.Seidel@aeb.de]
> Sent: Tuesday, November 16, 2010 2:42 PM
> To: users@jackrabbit.apache.org
> Subject: AW: Multiple instances of repository
>
> Hi Nikhil,
>
> you need clustering, because all of your instances should access the same repository.
>
> What you need is separate repository homes for each instance. In my use case I have an installation directory for each instance, so the repository home is located below this directory.
>
> You have to make sure, that each instance has also its own repository.xml because you need to define different clusterIDs.
>
> And you have to define a cluster section in the repository.xml where the journal is located, which is necessary for synchronization:
>
>    <Cluster id="node1" syncDelay="5000">
>      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
>        <param name="driver" value="javax.naming.InitialContext"/>
>        <param name="url" value="jdbc/amiDBDataSource"/>
>          ...
>      </Journal>
>    </Cluster>
>
> Kindly regards, Robert

Re: Multiple instances of repository

Posted by Justin Edelson <ju...@justinedelson.com>.
Nikhil-
I think you should rethink your architecture. It really doesn't make
sense to be bringing repository instances up only for a 2-4 minute
job. Instead, you should think about using the Command pattern and
package your "applications" as executable jobs which can be run inside
a long-running VM against a local repository instance (i.e. making
in-process calls instead of RMI or DavEx).

This is where something like OSGi and Apache Sling can be *very*
helpful, but there are obviously other ways to add/remove jobs at
runtime. See, for example, Sling's Scheduler support:
http://sling.apache.org/site/scheduler-service-commons-scheduler.html

Justin
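
A rough sketch of the Command-pattern idea above, assuming a single long-running JVM that owns the repository and runs short jobs in-process. ContentStore is only a hypothetical stand-in for the embedded repository instance, and RepositoryJob and JobHost are illustrative names, not part of Jackrabbit or Sling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the one long-lived repository instance.
final class ContentStore {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    void write(String path, String value) { nodes.put(path, value); }

    String read(String path) { return nodes.get(path); }
}

// The command: one short-lived unit of work, packaged as an object.
interface RepositoryJob {
    void execute(ContentStore store) throws Exception;
}

// The long-running host: owns the store and runs submitted jobs in-process.
final class JobHost implements AutoCloseable {
    private final ContentStore store = new ContentStore();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Runs the job on a worker thread; the Future reports completion or failure.
    Future<?> submit(RepositoryJob job) {
        return workers.submit(() -> {
            job.execute(store);
            return null;
        });
    }

    ContentStore store() { return store; }

    @Override
    public void close() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
    }
}

final class CommandPatternSketch {
    public static void main(String[] args) throws Exception {
        try (JobHost host = new JobHost()) {
            // A 2-4 minute job would become one of these objects
            // instead of a separate JVM with its own repository.
            host.submit(store -> store.write("/jobs/import-1", "done")).get();
            System.out.println(host.store().read("/jobs/import-1")); // prints "done"
        }
    }
}
```

With this shape only the host process starts a repository, so the short jobs never need their own repository home or cluster node id.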


AW: Multiple instances of repository

Posted by "Seidel. Robert" <Ro...@aeb.de>.
Hi Nikhil,

you can also set up one server that accesses the Jackrabbit repository and have all instances use this server (for example via web service calls). A single Jackrabbit instance can serve multiple sessions/connections (one session per client). In this case you have only one Jackrabbit instance, so no clustering is necessary.

Kind regards, Robert

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Tuesday, 16 November 2010 11:17
To: users@jackrabbit.apache.org
Subject: RE: Multiple instances of repository

Thanks for your input, it is really helpful.

Well, so is my application not a good candidate for Jackrabbit?

The other option I had was to use Jackrabbit in client-server mode. In that case I would access the repository over RMI. But the Jackrabbit documentation mentions that RMI is not optimized for performance and that I should use an embedded repository instance in my application code for better performance.

I can remove the search functionality from these cluster nodes, because their life span will be very short. The application will take 2-4 minutes to do its job, and I don't think we really need search for these cluster nodes.

But my question is: should I really use the clustering feature? I mean, cluster nodes should normally have a longer life span. But in this case the nodes will have a very short life span of 2-4 minutes.
I am finding it hard to see these short-lived applications as cluster nodes.

Thanks,
Nikhil

-----Original Message-----
From: Seidel. Robert [mailto:Robert.Seidel@aeb.de] 
Sent: Tuesday, November 16, 2010 3:33 PM
To: users@jackrabbit.apache.org
Subject: AW: Multiple instances of repository

Hi Nikhil,

I don't know if it will work (setProperty), but you have another problem. The Lucene search index is always stored in the file system. And afaik, each repository home needs its own index directories (so you have a set of index files for each cluster node). If you add a new cluster node, you have to wait a long time until the index is built, depending on the data in your repository (if you have tons of data, you have to wait a week or longer).

The tables of the FS and PM will be shared between all cluster nodes - that works.

Kind regards, Robert

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Tuesday, 16 November 2010 10:54
To: users@jackrabbit.apache.org
Subject: RE: Multiple instances of repository

Since there could be any number of instances, I can't decide the cluster id beforehand.
Hence I have the following code, which creates a cluster id at run time:

System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());

Similarly, the repositoryHome path is generated at run time.

But do I also need separate tables for the workspace file system? I have the following configuration for my workspace. Is it correct? Will the tables for the workspace FS and PersistenceManager be shared between all the nodes, or will these tables be different?

<?xml version="1.0"?>
<!DOCTYPE Repository
          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">

<Repository>

    <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="jdbc/amiDBDataSource"/>
        <param name="databaseType" value="oracle"/>
        <param name="copyWhenReading" value="true"/>
        <param name="tablePrefix" value=""/>
        <param name="schemaObjectPrefix" value="J_R_DS_"/>
        <param name="schemaCheckEnabled" value="false"/>
    </DataStore>

    <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="jdbc/amiDBDataSource"/>
        <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
        <param name="schema" value="oracle"/>
        <param name="schemaObjectPrefix" value="J_R_FS_"/>
        <param name="schemaCheckEnabled" value="false"/>
    </FileSystem>

    <Security appName="Jackrabbit">
        <SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager" />
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider" />
        </LoginModule>
    </Security>

    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />

    <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="jdbc/amiDBDataSource"/>
            <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
            <param name="schema" value="oracle"/>
            <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
            <param name="schemaCheckEnabled" value="false"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="jdbc/amiDBDataSource"/>
            <param name="tableSpace" value="" />
            <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
            <param name="schema" value="oracle" />
            <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
            <param name="externalBLOBs" value="false" />
            <param name="schemaCheckEnabled" value="false"/>
        </PersistenceManager>
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="supportHighlighting" value="true"/>
        </SearchIndex>
    </Workspace>

    <Versioning rootPath="${rep.home}/version">

        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="jdbc/amiDBDataSource"/>
            <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
            <param name="schema" value="oracle"/>
            <param name="schemaObjectPrefix" value="J_V_FS_"/>
            <param name="schemaCheckEnabled" value="false"/>
        </FileSystem>
        <!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="jdbc/amiDBDataSource"/>
            <param name="tableSpace" value="" />
            <!-- The following value must be oracle for an Oracle server; this is not the same as the database schema -->
            <param name="schema" value="oracle" />
            <param name="schemaObjectPrefix" value="J_V_PM_" />
            <param name="externalBLOBs" value="false" />
            <param name="schemaCheckEnabled" value="false"/>
        </PersistenceManager>

    </Versioning>

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/search/index"/>
        <param name="supportHighlighting" value="true"/>
    </SearchIndex>

    <Cluster syncDelay="2000">
        <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
            <param name="revision" value="${rep.home}/revision.log" />
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="jdbc/amiDBDataSource"/>
            <param name="schemaObjectPrefix" value="J_R_" />
            <param name="databaseType" value="oracle"/>
        </Journal>
    </Cluster>

</Repository>

Thanks,
Nikhil
-----Original Message-----
From: Seidel. Robert [mailto:Robert.Seidel@aeb.de] 
Sent: Tuesday, November 16, 2010 2:42 PM
To: users@jackrabbit.apache.org
Subject: AW: Multiple instances of repository

Hi Nikhil,

you need clustering, because all of your instances should access the same repository.

What you need is a separate repository home for each instance. In my use case I have an installation directory for each instance, so the repository home is located below this directory.

You have to make sure that each instance also has its own repository.xml, because you need to define different cluster IDs.

And you have to define a cluster section in the repository.xml where the journal is located, which is necessary for synchronization:

    <Cluster id="node1" syncDelay="5000">
      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="jdbc/amiDBDataSource"/>
	  ...  
      </Journal>
    </Cluster>

Kind regards, Robert

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Tuesday, 16 November 2010 09:37
To: users@jackrabbit.apache.org
Subject: RE: Multiple instances of repository

Thanks for replying. I will need a little more help to understand things completely.
Let me elaborate a bit more on my usage scenario. I am also attaching my repository.xml file to this mail. Please let me know if you want to know more about my environment.

In my case, I want to keep all the data in one database and use Jackrabbit as the JCR layer over this database.
I have Jackrabbit embedded in my application, so the repository starts up as part of the application.
This application reads some files from the repository and also inserts some data into the repository.
There could be two instances of the application: app1 running on machine1 and app2 running on machine2.
So my application instances are different and I can create multiple repository homes to avoid the locking problem, but I still want to insert the data from these applications into the same database tables.
So what if all the application instances use the same repository configuration file and each specifies its own repository home?
Will that work in my case? Will there be any consistency issues?

When you say separate data stores and separate persistence managers, do you mean separate repository configuration files, or separate database tables for data stores and persistence managers?

My instances and the repositories operate separately from each other, but they still need to share the data. The data inserted by one application instance should be visible to the other instances. So they should all be inserting the data into the same tables; that is my understanding.

Thanks,
Nikhil
 
-----Original Message-----
From: Seidel. Robert [mailto:Robert.Seidel@aeb.de] 
Sent: Tuesday, November 16, 2010 1:22 PM
To: users@jackrabbit.apache.org
Subject: AW: Multiple instances of repository

Hi Nikhil,

if you want to use clustering, you have to define a repository home for each cluster node.

Clustering is necessary if you want to have the same data/indexes at all cluster nodes - the key word is synchronization.

If your instances and the repositories operate separately from each other, you don't need clustering. Separate repository homes, data stores and persistence managers will do the job.

Kind regards, Robert

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Tuesday, 16 November 2010 08:33
To: users@jackrabbit.apache.org
Subject: Multiple instances of repository

Hi,

I am using Jackrabbit as the JCR implementation in my project. I am running Jackrabbit within my application, in the same JVM.
The application reads content from the repository and also writes some content to the repository.
There could be multiple concurrent instances of my application running on the same or different machines.
I have a configuration file for Jackrabbit and a single repository home for Jackrabbit.
Now, as soon as one instance of the application is up and running, I can't run another instance, because the first instance creates a lock file in the repository home.
After some searching I learned about running Jackrabbit in clustered mode.
My question is: even in this case I will have to specify a different repository home for every run, right?
That means I should build the repository home path at run time, because at compile time I don't know how many instances will be run.
This is a standalone Java application and, theoretically, any number of instances can be run.
If I have to specify a different repository path for every run, will Jackrabbit then work even without clustering?
After all, the .lock file will be different for different runs, since the repository home is different.
I know I am missing something here, please help me.
I am attaching my conf file to this mail.

Thanks,
Nikhil
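
[Editorial sketch] The lock-file behaviour described in the question above can be illustrated with plain java.nio file locking. This is only a sketch of the general technique, under the assumption that one exclusive lock on a ".lock" file guards each repository home; it is not Jackrabbit's own implementation, and the class name HomeLock and its methods are hypothetical. It shows why a second instance is refused on the same home but accepted on a different one. Note that separate homes only avoid the lock conflict; they do not synchronize the instances.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one exclusive lock per repository home directory. */
class HomeLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private HomeLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a held lock, or null if <home>/.lock is already locked. */
    static HomeLock tryAcquire(Path home) throws IOException {
        Files.createDirectories(home);
        FileChannel ch = FileChannel.open(home.resolve(".lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock l = ch.tryLock(); // null if another process holds it
            if (l != null) {
                return new HomeLock(ch, l);
            }
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this JVM.
        }
        ch.close();
        return null;
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        // Demo in a single JVM; real instances would be separate processes.
        Path base = Files.createTempDirectory("repo-homes");
        HomeLock first = tryAcquire(base.resolve("homeA"));
        HomeLock sameHome = tryAcquire(base.resolve("homeA"));
        HomeLock otherHome = tryAcquire(base.resolve("homeB"));
        System.out.println("first instance on homeA started:  " + (first != null));
        System.out.println("second instance on homeA started: " + (sameHome != null));
        System.out.println("second instance on homeB started: " + (otherHome != null));
        if (otherHome != null) otherHome.close();
        if (first != null) first.close();
    }
}
```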


RE: Multiple instances of repository

Posted by ni...@emeter.com.
Thanks for your input, it is really helpful.

Well, so is my application not a good candidate for Jackrabbit?

The other option I had was to use Jackrabbit in client-server mode. In that case I would access the repository over RMI. But the Jackrabbit documentation mentions that RMI is not optimized for performance and that I should use an embedded repository instance in my application code for better performance.

I can remove the search functionality from these cluster nodes, because their life span will be very short. The application will take 2-4 minutes to do its job, and I don't think we really need search for these cluster nodes.

But my question is: should I really use the clustering feature? I mean, cluster nodes should normally have a longer life span. But in this case the nodes will have a very short life span of 2-4 minutes.
I am finding it hard to see these short-lived applications as cluster nodes.

Thanks,
Nikhil


AW: Multiple instances of repository

Posted by "Seidel. Robert" <Ro...@aeb.de>.
Hi Nikhil,

I don't know if it will work (setProperty), but you have another problem. The Lucene search index is always stored in the file system. And afaik, each repository home needs its own index directories (so you have a set of index files for each cluster node). If you add a new cluster node, you have to wait a long time until the index is built, depending on the data in your repository (if you have tons of data, you have to wait a week or longer).

The tables of the FS and PM will be shared between all cluster nodes - that works.

Kind regards, Robert
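
[Editorial sketch] The run-time node id and per-instance repository home discussed in this thread could be set up roughly as follows. This is a minimal sketch, not a tested Jackrabbit setup: the system property name is the one used in the thread, while the class ClusterNodeSetup, its method names and the use of a random UUID instead of System.nanoTime() are assumptions made for illustration (nanoTime is only meaningful within one JVM, so two instances could in principle produce the same value). Whether Jackrabbit picks up the property this way was left open in the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper; class and method names are illustrative only. */
class ClusterNodeSetup {

    // System property name as used in the thread.
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    /** Builds an id that does not depend on the JVM-local nanoTime clock. */
    static String newNodeId() {
        return "node_" + UUID.randomUUID().toString().replace("-", "");
    }

    /** Creates baseDir/nodeId as this instance's private repository home. */
    static Path createRepositoryHome(Path baseDir, String nodeId) throws IOException {
        return Files.createDirectories(baseDir.resolve(nodeId));
    }

    public static void main(String[] args) throws IOException {
        String nodeId = newNodeId();
        // Set the property before the repository is started.
        System.setProperty(NODE_ID_PROPERTY, nodeId);

        // Assumption: a temp directory stands in for the real base directory.
        Path base = Files.createTempDirectory("jackrabbit-homes");
        Path home = createRepositoryHome(base, nodeId);

        System.out.println("cluster node id: " + nodeId);
        System.out.println("repository home: " + home);
    }
}
```

Because every new home starts with an empty search index directory, the index build time mentioned above applies to each freshly created home.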

-----Ursprüngliche Nachricht-----
Von: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com] 
Gesendet: Dienstag, 16. November 2010 10:54
An: users@jackrabbit.apache.org
Betreff: RE: Multiple instances of repository

Since there could be n number of instances. So I can't decide the cluster id beforehand.
Hence I have the following code that creates a cluster id at run time.

System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());

Similarly the repositoryHome path is generated at run time.

But do I also need separate tables for workspace file system? I have the following configuration for my workspace. Is it correct? The tables for the workspace FS and PersistenceManager will be shared between all the nodes or will these tables will be different?

<?xml version="1.0"?>
<!DOCTYPE Repository
          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">

<Repository>

     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
 		<param name="driver" value="javax.naming.InitialContext"/>
  		<param name="url" value="jdbc/amiDBDataSource"/>
		<param name="databaseType" value="oracle"/>  		
       	<param name="copyWhenReading" value="true"/>
        <param name="tablePrefix" value=""/>
        <param name="schemaObjectPrefix" value="J_R_DS_"/>
        <param name="schemaCheckEnabled" value="false"/> 
    </DataStore>

	<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
  		<param name="driver" value="javax.naming.InitialContext"/>
  		<param name="url" value="jdbc/amiDBDataSource"/>
		<!-- The following value must oracle for oracle server this is not the same as the database schema -->
		<param name="schema" value="oracle"/>
		<param name="schemaObjectPrefix" value="J_R_FS_"/>
		<param name="schemaCheckEnabled" value="false"/> 
	</FileSystem>

	<Security appName="Jackrabbit">
		<SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager" />
		<AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
		<LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
			<param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider" />
		</LoginModule>
	</Security>
	
	<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />
	
	<Workspace name="${wsp.name}">
	<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<!-- The following value must oracle for oracle server this is not the same as the database schema -->
			<param name="schema" value="oracle"/>
			<param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
			<param name="schemaCheckEnabled" value="false"/> 
		</FileSystem>
		<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<param name="tableSpace" value="" />
			<!-- The following value must oracle for oracle server this is not the same as the database schema -->
			<param name="schema" value="oracle" />
			<param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
			<param name="externalBLOBs" value="false" />
			<param name="schemaCheckEnabled" value="false"/> 
		</PersistenceManager>
		<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="supportHighlighting" value="true"/>
        </SearchIndex>
	</Workspace>
	
	<Versioning rootPath="${rep.home}/version">
		
		<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<!-- The following value must oracle for oracle server this is not the same as the database schema -->
			<param name="schema" value="oracle"/>
			<param name="schemaObjectPrefix" value="J_V_FS_"/>
			<param name="schemaCheckEnabled" value="false"/> 
		</FileSystem>
		<!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
		<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<param name="tableSpace" value="" />
			<!-- The following value must oracle for oracle server this is not the same as the database schema -->
			<param name="schema" value="oracle" />
			<param name="schemaObjectPrefix" value="J_V_PM_" />
			<param name="externalBLOBs" value="false" />
			<param name="schemaCheckEnabled" value="false"/> 
		</PersistenceManager>

	</Versioning>
	
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/search/index"/>
        <param name="supportHighlighting" value="true"/>
    </SearchIndex>
    
	<Cluster syncDelay="2000">
  		<Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
    		<param name="revision" value="${rep.home}/revision.log" />
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
  			<param name="schemaObjectPrefix" value="J_R_" />
  			<param name="databaseType" value="oracle"/>
  		</Journal>
	</Cluster> 
	  
</Repository>

Thanks,
Nikhil
-----Original Message-----
From: Seidel. Robert [mailto:Robert.Seidel@aeb.de] 
Sent: Tuesday, November 16, 2010 2:42 PM
To: users@jackrabbit.apache.org
Subject: AW: Multiple instances of repository

Hi Nikhil,

You need clustering, because all of your instances should access the same repository.

What you need is a separate repository home for each instance. In my use case I have an installation directory for each instance, so the repository home is located below this directory.

You have to make sure that each instance also has its own repository.xml, because you need to define different cluster IDs.

You also have to define a cluster section in the repository.xml that contains the journal definition, which is necessary for synchronization:

    <Cluster id="node1" syncDelay="5000">
      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="jdbc/amiDBDataSource"/>
	  ...  
      </Journal>
    </Cluster>

Kindly regards, Robert

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Tuesday, November 16, 2010 09:37
To: users@jackrabbit.apache.org
Subject: RE: Multiple instances of repository

Thanks for replying. I need a little more help to understand things completely.
I will elaborate a bit more on my usage scenario. I am also attaching my repository.xml file to this mail. Please let me know if you want to know more about my environment.

In my case, I want to keep all the data in one database and use Jackrabbit as the JCR layer over this database.
I have Jackrabbit embedded in my application, so the repository starts up as part of the application.
The application reads some files from the repository and also inserts some data into the repository.
There could be two instances of the application: app1 running on machine1 and app2 running on machine2.
So my application instances are different, and I can create multiple repository homes to avoid the locking problem, but I still want to insert the data from these applications into the same database tables.
If all the application instances use the same repository configuration file and each specifies its own repository home,
will that work in my case? Will there be any consistency issues?

When you say separate data stores and separate persistence managers, do you mean separate repository configuration files, or separate database tables for the data stores and persistence managers?

My instances and their repositories operate separately from each other, but they still need to share the data. The data inserted by one application instance should be visible to the other instances. So they should all be inserting the data into the same tables; that is my understanding.

Thanks,
Nikhil
 
-----Original Message-----
From: Seidel. Robert [mailto:Robert.Seidel@aeb.de] 
Sent: Tuesday, November 16, 2010 1:22 PM
To: users@jackrabbit.apache.org
Subject: AW: Multiple instances of repository

Hi Nikhil,

If you want to use clustering, you have to define a repository home for each cluster node.

Clustering is necessary if you want to have the same data/indexes at all cluster nodes; the key word is synchronization.

If your instances and their repositories operate separately from each other, you don't need clustering. Separate repository homes, data stores and persistence managers will do the job.

Kindly regards, Robert

-----Original Message-----
From: nikhil.agrawal@emeter.com [mailto:nikhil.agrawal@emeter.com]
Sent: Tuesday, November 16, 2010 08:33
To: users@jackrabbit.apache.org
Subject: Multiple instances of repository

Hi,

I am using Jackrabbit as the JCR implementation in my project. I am running Jackrabbit within my application, in the same JVM.
The application reads content from the repository and also writes some content to the repository.
There could be multiple concurrent instances of my application running on the same or different machines.
I have a configuration file for Jackrabbit and a single repository home.
As soon as one instance of the application is up and running, I can't run another instance, because the first instance creates a lock file in the repository home.
After some searching I learned about running Jackrabbit in clustered mode.
My question is: even in this case I will have to specify a different repository home for every run, right?
That means I should build the repository home path at run time, because at compile time I don't know how many instances will be run.
This is a standalone Java application, and theoretically any number of instances can be run.
If I have to specify a different repository path for every run anyway, will Jackrabbit work even without clustering?
After all, the .lock file will be different for different runs, as the repository home is different.
I know I am missing something here; please help me.
I am attaching my conf file to this mail.

Thanks,
Nikhil


RE: Multiple instances of repository

Posted by ni...@emeter.com.
Since there could be any number of instances, I can't decide the cluster id beforehand.
Hence I have the following code, which creates a cluster id at run time.

System.setProperty("org.apache.jackrabbit.core.cluster.node_id", "cluster_id"+System.nanoTime());

Similarly the repositoryHome path is generated at run time.
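
The two run-time steps described here (a generated cluster id and a generated repository home) might be combined roughly as in the sketch below. It is only an illustration under stated assumptions: the class name InstanceBootstrap, the "jackrabbit-homes" base directory and the use of a random UUID instead of System.nanoTime() are not from this thread. The UUID is swapped in because nanoTime values are only meaningful within a single JVM, so two JVMs could in principle produce the same number. The sketch uses JDK classes only and does not start a repository.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

class InstanceBootstrap {
    // System property name taken from the message above.
    static final String NODE_ID_PROPERTY = "org.apache.jackrabbit.core.cluster.node_id";

    /** Builds a cluster id that differs on every call (assumption: UUID instead of nanoTime). */
    static String newNodeId() {
        return "cluster_id_" + UUID.randomUUID();
    }

    /** Creates (if missing) and returns a per-instance repository home below baseDir. */
    static Path createRepositoryHome(Path baseDir, String nodeId) throws IOException {
        return Files.createDirectories(baseDir.resolve(nodeId));
    }

    public static void main(String[] args) throws IOException {
        String nodeId = newNodeId();
        // Must be set before the repository is started so the cluster section can pick it up.
        System.setProperty(NODE_ID_PROPERTY, nodeId);

        // Hypothetical base directory; any writable location would do.
        Path baseDir = Paths.get(System.getProperty("java.io.tmpdir"), "jackrabbit-homes");
        Path home = createRepositoryHome(baseDir, nodeId);

        System.out.println("cluster node id: " + nodeId);
        System.out.println("repository home: " + home);
    }
}
```

Deriving the home directory from the same id keeps the two values in step, so each running instance gets its own .lock file.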

But do I also need separate tables for the workspace file system? I have the following configuration for my workspace. Is it correct? Will the tables for the workspace FS and PersistenceManager be shared between all the nodes, or will these tables be different for each node?

<?xml version="1.0"?>
<!DOCTYPE Repository
          PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
          "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">

<Repository>

     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
 		<param name="driver" value="javax.naming.InitialContext"/>
  		<param name="url" value="jdbc/amiDBDataSource"/>
		<param name="databaseType" value="oracle"/>  		
       	<param name="copyWhenReading" value="true"/>
        <param name="tablePrefix" value=""/>
        <param name="schemaObjectPrefix" value="J_R_DS_"/>
        <param name="schemaCheckEnabled" value="false"/> 
    </DataStore>

	<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
  		<param name="driver" value="javax.naming.InitialContext"/>
  		<param name="url" value="jdbc/amiDBDataSource"/>
		<!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
		<param name="schema" value="oracle"/>
		<param name="schemaObjectPrefix" value="J_R_FS_"/>
		<param name="schemaCheckEnabled" value="false"/> 
	</FileSystem>

	<Security appName="Jackrabbit">
		<SecurityManager class="repository.jcr.jackrabbit.EipSecurityManager" />
		<AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
		<LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
			<param name="principalProvider" value="repository.jcr.jackrabbit.EipPrincipalProvider" />
		</LoginModule>
	</Security>
	
	<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="eip" />
	
	<Workspace name="${wsp.name}">
	<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
			<param name="schema" value="oracle"/>
			<param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
			<param name="schemaCheckEnabled" value="false"/> 
		</FileSystem>
		<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<param name="tableSpace" value="" />
			<!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
			<param name="schema" value="oracle" />
			<param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
			<param name="externalBLOBs" value="false" />
			<param name="schemaCheckEnabled" value="false"/> 
		</PersistenceManager>
		<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="supportHighlighting" value="true"/>
        </SearchIndex>
	</Workspace>
	
	<Versioning rootPath="${rep.home}/version">
		
		<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
			<param name="schema" value="oracle"/>
			<param name="schemaObjectPrefix" value="J_V_FS_"/>
			<param name="schemaCheckEnabled" value="false"/> 
		</FileSystem>
		<!-- Change to Oracle Class <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager"> -->
		<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
			<param name="tableSpace" value="" />
			<!-- The following value must be "oracle" for an Oracle server; this is not the same as the database schema -->
			<param name="schema" value="oracle" />
			<param name="schemaObjectPrefix" value="J_V_PM_" />
			<param name="externalBLOBs" value="false" />
			<param name="schemaCheckEnabled" value="false"/> 
		</PersistenceManager>

	</Versioning>
	
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/search/index"/>
        <param name="supportHighlighting" value="true"/>
    </SearchIndex>
    
	<Cluster syncDelay="2000">
  		<Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
    		<param name="revision" value="${rep.home}/revision.log" />
 			<param name="driver" value="javax.naming.InitialContext"/>
  			<param name="url" value="jdbc/amiDBDataSource"/>
  			<param name="schemaObjectPrefix" value="J_R_" />
  			<param name="databaseType" value="oracle"/>
  		</Journal>
	</Cluster> 
	  
</Repository>

Thanks,
Nikhil


AW: Multiple instances of repository

Posted by "Seidel. Robert" <Ro...@aeb.de>.
Hi Nikhil,

You need clustering, because all of your instances should access the same repository.

What you need is a separate repository home for each instance. In my use case I have an installation directory for each instance, so the repository home is located below this directory.

You have to make sure that each instance also has its own repository.xml, because you need to define different cluster IDs.

You also have to define a cluster section in the repository.xml that contains the journal definition, which is necessary for synchronization:

    <Cluster id="node1" syncDelay="5000">
      <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="jdbc/amiDBDataSource"/>
	  ...  
      </Journal>
    </Cluster>

Kindly regards, Robert
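
One way to give each instance its own repository.xml with a distinct cluster id, as suggested above, is to keep a single template and write a copy into each instance directory. The sketch below is an assumption-laden illustration, not something taken from this thread: the class ClusterConfigWriter, the @CLUSTER_ID@ placeholder and the directory layout are all hypothetical, and the template in main is a fragment rather than a complete configuration. It uses JDK classes only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ClusterConfigWriter {
    // Hypothetical placeholder used in the template; it is not a Jackrabbit variable.
    static final String PLACEHOLDER = "@CLUSTER_ID@";

    /** Returns the template with the placeholder replaced by the given cluster id. */
    static String render(String template, String clusterId) {
        // Restrict ids to characters that are safe inside an XML attribute value.
        if (!clusterId.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("unsafe cluster id: " + clusterId);
        }
        if (!template.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("template has no " + PLACEHOLDER);
        }
        return template.replace(PLACEHOLDER, clusterId);
    }

    /** Writes the rendered configuration to instanceDir/repository.xml and returns its path. */
    static Path writeConfig(Path instanceDir, String template, String clusterId) throws IOException {
        Files.createDirectories(instanceDir);
        Path target = instanceDir.resolve("repository.xml");
        Files.write(target, render(template, clusterId).getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Fragment only; a real template would hold the complete repository.xml.
        String template = "<Cluster id=\"" + PLACEHOLDER + "\" syncDelay=\"5000\"></Cluster>";
        Path instanceDir = Paths.get(System.getProperty("java.io.tmpdir"), "instance-node1");
        Path written = writeConfig(instanceDir, template, "node1");
        System.out.println("wrote " + written);
    }
}
```

Keeping one template means the database and journal settings stay identical across instances, and only the cluster id differs.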



RE: Multiple instances of repository

Posted by ni...@emeter.com.
Thanks for replying. I need a little more help to understand things completely.
I will elaborate a bit more on my usage scenario. I am also attaching my repository.xml file to this mail. Please let me know if you want to know more about my environment.

In my case, I want to keep all the data in one database and use Jackrabbit as the JCR layer over this database.
I have Jackrabbit embedded in my application, so the repository starts up as part of the application.
The application reads some files from the repository and also inserts some data into the repository.
There could be two instances of the application: app1 running on machine1 and app2 running on machine2.
So my application instances are different, and I can create multiple repository homes to avoid the locking problem, but I still want to insert the data from these applications into the same database tables.
If all the application instances use the same repository configuration file and each specifies its own repository home,
will that work in my case? Will there be any consistency issues?

When you say separate data stores and separate persistence managers, do you mean separate repository configuration files, or separate database tables for the data stores and persistence managers?

My instances and their repositories operate separately from each other, but they still need to share the data. The data inserted by one application instance should be visible to the other instances. So they should all be inserting the data into the same tables; that is my understanding.

Thanks,
Nikhil
 


AW: Multiple instances of repository

Posted by "Seidel. Robert" <Ro...@aeb.de>.
Hi Nikhil,

If you want to use clustering, you have to define a repository home for each cluster node.

Clustering is necessary if you want to have the same data/indexes at all cluster nodes; the key word is synchronization.

If your instances and their repositories operate separately from each other, you don't need clustering. Separate repository homes, data stores and persistence managers will do the job.

Kindly regards, Robert
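
To make "synchronization" a little more concrete, the toy model below shows the general idea of a shared journal: every write is appended to a common log, and each node remembers the last revision it has applied and replays whatever it missed. This is a deliberately simplified, in-memory sketch and not Jackrabbit's actual implementation; the class names, the record format and the explicit sync() call are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class JournalSketch {
    /** Shared, append-only change log; stands in for the journal kept in the database. */
    static class Journal {
        private final List<String> records = new ArrayList<>();

        /** Appends a change and returns the new global revision (number of records so far). */
        synchronized long append(String change) {
            records.add(change);
            return records.size();
        }

        /** Returns a copy of all records appended after the given revision. */
        synchronized List<String> readAfter(long revision) {
            return new ArrayList<>(records.subList((int) revision, records.size()));
        }
    }

    /** One cluster node with its own local state and its own revision counter. */
    static class Node {
        final String id;
        final List<String> applied = new ArrayList<>(); // stands in for local caches/indexes
        long localRevision = 0;
        private final Journal journal;

        Node(String id, Journal journal) {
            this.id = id;
            this.journal = journal;
        }

        /** Records a change in the shared journal, then brings this node up to date. */
        void write(String change) {
            journal.append(id + ": " + change);
            sync();
        }

        /** Replays every journal record this node has not applied yet. */
        void sync() {
            List<String> missed = journal.readAfter(localRevision);
            applied.addAll(missed);
            localRevision += missed.size();
        }
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        Node node1 = new Node("node1", journal);
        Node node2 = new Node("node2", journal);

        node1.write("add /content/a");
        System.out.println("node2 before sync: " + node2.applied);
        node2.sync(); // in a real cluster this would happen periodically, cf. syncDelay
        System.out.println("node2 after sync:  " + node2.applied);
    }
}
```

Without a shared log of this kind, instances with separate repository homes have no way to learn about each other's writes, which is why independent homes alone are not enough when the data must be shared.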
