Posted to users@jackrabbit.apache.org by Erik Froese <er...@hallwaytech.com> on 2011/02/14 16:47:27 UTC

Indexing in a separate thread on startup

Hey everyone,

Is there a way to configure the Lucene that comes with Jackrabbit so
that it creates the initial indexes in a separate thread when starting
up?

I found this:
http://wiki.apache.org/jackrabbit/Search

and this:
http://wiki.apache.org/jackrabbit/IndexingConfiguration

But neither provides a way to have the indexing run in the background
while the rest of the bundles start up.

We have some development and staging servers with tens of thousands of
nodes in jackrabbit and we wipe and re-install the application server
area quite frequently. Since most of our other bundles depend on
jackrabbit being active they all wait for the Lucene indexing to
complete. This takes up to 20 minutes the first time the server comes
up after a re-install.

I tried copying the index files from install to install but jackrabbit
complains that the default workspace directory exists already.

14.02.2011 10:46:05.964 *ERROR* [Repository Pinger]
org.apache.sling.jcr.jackrabbit.server acquireRepository: Repository
problem starting repository from
file:/PATH/TO/sling/jackrabbit/repository.xml in
/PATH/TO/sling/jackrabbit
(org.apache.jackrabbit.core.config.ConfigurationException: Workspace
directory already exists: default)
org.apache.jackrabbit.core.config.ConfigurationException: Workspace
directory already exists: default

Any pointers?

Thanks
Erik

Re: Indexing in a separate thread on startup

Posted by Erik Froese <er...@hallwaytech.com>.
Not thinking it through, maybe? I just did exactly that and it worked
just fine. I've adjusted our build process and we're good to go with
much shorter builds.

Thanks for pointing that out, Alex,
Erik

On Tue, Feb 15, 2011 at 2:54 AM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> On 15.02.11 01:33, "Erik Froese" <er...@hallwaytech.com> wrote:
>>I need to copy in the indexes for the default workspace in
>>jackrabbit/workspaces/default/index/ as well as in
>>jackrabbit/repository/index. I can't do that unless the directory
>>exists, which means jackrabbit must have come up already to create it.
>
> What prevents you from copying the entire jackrabbit directory?
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel

Re: Indexing in a separate thread on startup

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 15.02.11 01:33, "Erik Froese" <er...@hallwaytech.com> wrote:
>I need to copy in the indexes for the default workspace in
>jackrabbit/workspaces/default/index/ as well as in
>jackrabbit/repository/index. I can't do that unless the directory
>exists, which means jackrabbit must have come up already to create it.

What prevents you from copying the entire jackrabbit directory?

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel
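
For readers who want to script the suggestion above, here is a minimal sketch of copying a whole repository directory (configuration, workspaces and both index directories together) into a fresh install. It is an illustration only, not something posted in the thread: the class name RepositorySnapshotCopy and the method copyTree are hypothetical, it uses only the standard java.nio.file API (Java 7 or later), and it assumes the source repository is shut down while the copy runs and that the target location is not in use by a running instance.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: copies an entire directory tree, e.g. a stopped
// jackrabbit directory including workspaces/default/index and repository/index.
public class RepositorySnapshotCopy {

    /** Recursively copies every directory and file under source into target. */
    public static void copyTree(final Path source, final Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Recreate the directory at the same relative position under target.
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                // Copy the file, overwriting any file already present at the destination.
                Files.copy(file, target.resolve(source.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RepositorySnapshotCopy <source-dir> <target-dir>");
            return;
        }
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Copying the tree as a unit means the workspace directory and its index arrive together, rather than dropping index files into a workspace directory that the repository has not created itself.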





Re: Indexing in a separate thread on startup

Posted by Erik Froese <er...@hallwaytech.com>.
Hey Alex,

Thanks for the clarification. I'm not sure how to copy the indexes in
there since Jackrabbit seems to not like the fact that the workspace
folder already exists. It prevents the rest of the application from
coming up.

The culprit is org.apache.jackrabbit.core.config.RepositoryConfig on
line ~712 (Jackrabbit 2.1.3).

I need to copy in the indexes for the default workspace in
jackrabbit/workspaces/default/index/ as well as in
jackrabbit/repository/index. I can't do that unless the directory
exists, which means jackrabbit must have come up already to create it.

Erik

On Mon, Feb 14, 2011 at 11:59 AM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> On 14.02.11 16:47, "Erik Froese" <er...@hallwaytech.com> wrote:
>>Is there a way to configure the Lucene that comes with Jackrabbit so
>>that it creates the initial indexes in a separate thread when starting
>>up?
>
> AFAIK no. I think this is because the JCR spec requires that the index for
> queries is up-to-date upon a save() (and thus also on restart).
>
> But if the repository is empty, this should be very quick. If you start
> with a copy of an existing (larger) repository, you can copy the search
> index as well, so the lucene index isn't recreated from the repo upon
> start.
>
>>I tried copying the index files from install to install but jackrabbit
>>complains that the default workspace directory exists already.
>>
>>14.02.2011 10:46:05.964 *ERROR* [Repository Pinger]
>>org.apache.sling.jcr.jackrabbit.server acquireRepository: Repository
>>problem starting repository from
>>file:/PATH/TO/sling/jackrabbit/repository.xml in
>>/PATH/TO/sling/jackrabbit
>>(org.apache.jackrabbit.core.config.ConfigurationException: Workspace
>>directory already exists: default)
>>org.apache.jackrabbit.core.config.ConfigurationException: Workspace
>>directory already exists: default
>
> I don't know how this issue comes up, but copying over should be possible.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel

Re: Indexing in a separate thread on startup

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 14.02.11 16:47, "Erik Froese" <er...@hallwaytech.com> wrote:
>Is there a way to configure the Lucene that comes with Jackrabbit so
>that it creates the initial indexes in a separate thread when starting
>up?

AFAIK no. I think this is because the JCR spec requires that the index for
queries is up-to-date upon a save() (and thus also on restart).

But if the repository is empty, this should be very quick. If you start
with a copy of an existing (larger) repository, you can copy the search
index as well, so the lucene index isn't recreated from the repo upon
start.

>I tried copying the index files from install to install but jackrabbit
>complains that the default workspace directory exists already.
>
>14.02.2011 10:46:05.964 *ERROR* [Repository Pinger]
>org.apache.sling.jcr.jackrabbit.server acquireRepository: Repository
>problem starting repository from
>file:/PATH/TO/sling/jackrabbit/repository.xml in
>/PATH/TO/sling/jackrabbit
>(org.apache.jackrabbit.core.config.ConfigurationException: Workspace
>directory already exists: default)
>org.apache.jackrabbit.core.config.ConfigurationException: Workspace
>directory already exists: default

I don't know how this issue comes up, but copying over should be possible.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel