Posted to dev@jackrabbit.apache.org by Christoph Kiehl <ki...@subshell.com> on 2006/08/25 19:26:03 UTC

Jackrabbits own FileSystem and unit tests

Hi,

I'm trying to modify Jackrabbit to work _only_ in memory and persist _nothing_ 
in the file system. I want this for testing purposes, where I don't want any 
files being created; besides, memory should be the fastest. My test 
data sets aren't that large anyway, so memory usage is not a concern.

My first approach was to use an InMemPersistenceManager, a RAMDirectory for 
lucene and my own MemoryFileSystem based on Jackrabbit's FileSystem. But I found 
that InMemPersistenceManager uses the file system as well. And apparently it's 
quite difficult to get PersistentIndex to use RAMDirectories. To make it even 
harder, the configured FileSystem isn't used for all data Jackrabbit persists. 
Whether this FileSystem is used or not depends on the "configRootPath" attribute, 
which must be set in the repository.xml but is not allowed by the DTD.
I still don't get what this FileSystem is used for. Maybe someone with deeper 
knowledge of the system could explain it to me? Or is it just there for 
historical reasons?
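For reference, the kind of configuration I'm experimenting with looks roughly 
like this (a sketch only: MemoryFileSystem is my own class, the 
InMemPersistenceManager package name may differ between Jackrabbit versions, and 
required elements such as Security and Versioning are omitted):

```xml
<Repository>
  <!-- custom in-memory file system (my own class, not part of Jackrabbit) -->
  <FileSystem class="com.example.MemoryFileSystem"/>
  <Workspace name="default">
    <FileSystem class="com.example.MemoryFileSystem"/>
    <!-- in-memory persistence; note that blobs are still written to a
         LocalFileSystem even with persistent=false -->
    <PersistenceManager class="org.apache.jackrabbit.core.state.mem.InMemPersistenceManager">
      <param name="persistent" value="false"/>
    </PersistenceManager>
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex"/>
  </Workspace>
</Repository>
```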

However, maybe someone has a better strategy to get a lightweight repository 
which can be quickly initialized and is usable in unit tests. I don't like the 
idea of mocking the JCR API because that way our tests become too unnatural and complex.

Looking forward to some ideas on how to best test the interface from your own 
code to Jackrabbit ;)

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Nicolas wrote:

> An idea might be to use a FS persisted only in RAM and mount it: in pure
> performance this wouldn't be that good but you can have something working
> quite easily (I don't know what you want to test so maybe this is not
> relevant).
> 
> This would work on Linux. This is a quick hack but easy to deploy.

Nice idea, but I would prefer tests that run on any platform with no 
prerequisites other than Java, since these tests are executed on various platforms.

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Nicolas <nt...@gmail.com>.
Hi,

An idea might be to use a FS persisted only in RAM and mount it: in pure
performance this wouldn't be that good but you can have something working
quite easily (I don't know what you want to test so maybe this is not
relevant).

This would work on Linux. This is a quick hack but easy to deploy.

Cheers
Nico
my blog! http://www.deviant-abstraction.net !!

Re: Jackrabbits own FileSystem and unit tests

Posted by Marcel Reutegger <ma...@gmx.net>.
sorry for the late response but I was on paternity leave...

Christoph Kiehl wrote:
> But isn't it necessary for the index data to be committed to the 
> database/pm to get a transactional index? I mean if you commit the index 
> changes from the redo.log in a new transaction you don't really gain 
> anything compared to the current solution regarding transactional index 
> behavior, do you?

I expect there will be a performance gain. rather than committing the 
inverted index, which possibly requires segment optimizations in 
lucene, simply appending to a log is much faster.
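The append-only idea can be sketched like this (a simplified stand-in in plain 
Java, not Jackrabbit's actual redo.log format):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Simplified redo log: each committed transaction appends one line per
// operation; on restart the log is replayed into the in-memory index.
public class RedoLog {
    private final Path file;

    public RedoLog(Path file) { this.file = file; }

    // Appending is a cheap sequential write, unlike rewriting whole
    // index segments on every commit.
    public void append(List<String> operations) throws IOException {
        Files.write(file, operations, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Replay returns all logged operations in commit order.
    public List<String> replay() throws IOException {
        if (!Files.exists(file)) return Collections.emptyList();
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}
```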

regards
  marcel

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>> Ok. To get this working, you have to create at least one segment per 
>> transaction, right?
> 
> not necessarily. as an optimization the current implementation uses the 
> redo.log to keep track of index modifications that were only done in 
> memory. this means that at the end of a transaction there won't 
> necessarily be a new index segment on disk.

But isn't it necessary for the index data to be committed to the database/pm to 
get a transactional index? I mean if you commit the index changes from the 
redo.log in a new transaction you don't really gain anything compared to the 
current solution regarding transactional index behavior, do you?

>> And index merging could be done in background?
> index merging *is* already done in the background.

Yes, of course.

>> Sounds really interesting. But if the blob values are cached locally 
>> they have to be downloaded on startup first before the index starts to 
>> be fast.
> 
> correct.

Hm, for our case this would mean downloading about 10GB on each restart :( 
Might take a while ;)

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Marcel Reutegger <ma...@gmx.net>.
lots of answers...

Christoph Kiehl wrote:
> Ok. To get this working, you have to create at least one segment per 
> transaction, right?

not necessarily. as an optimization the current implementation uses 
the redo.log to keep track of index modifications that were only done 
in memory. this means that at the end of a transaction there won't 
necessarily be a new index segment on disk.

> And index merging could be done in background?

index merging *is* already done in the background.

> Sounds really interesting. But if the blob values are cached locally 
> they have to be downloaded on startup first before the index starts to 
> be fast.

correct.

> Or does the blob cache survive restarts?

no, it doesn't.

regards
  marcel

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>> I like the idea of having a transactional index but I don't think it's 
>> a good idea to read this index from a binary property in a database, 
>> because in our case we've got a fairly large repository where we got 
>> index files with a size of 40MB. As far as I understand you have to 
>> transfer 40MB to the database on every index change that gets 
>> committed. Am I right?
> 
> In general, this is correct. but lucene is designed in a way that it 
> never modifies an existing index file. if you have a 40 MB index segment 
> file and you delete a document within that index, lucene will simply 
> update a small other file which is kept along the index called 
> <segment-name>.del. Adding a new document to an existing index segment 
> is not possible, in that case lucene will create a new segment.

Ok. To get this working, you have to create at least one segment per 
transaction, right? And index merging could be done in background? Sounds really 
interesting. But if the blob values are cached locally they have to be 
downloaded on startup first before the index starts to be fast. Or does the blob 
cache survive restarts? Lots of questions ;)

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Marcel Reutegger <ma...@gmx.net>.
Christoph Kiehl wrote:
> I like the idea of having a transactional index but I don't think it's a 
> good idea to read this index from a binary property in a database, 
> because in our case we've got a fairly large repository where we got 
> index files with a size of 40MB. As far as I understand you have to 
> transfer 40MB to the database on every index change that gets committed. 
> Am I right?

In general, this is correct. but lucene is designed in a way that it 
never modifies an existing index file. if you have a 40 MB index 
segment file and you delete a document within that index, lucene will 
simply update a small separate file, kept alongside the index, called 
<segment-name>.del. Adding a new document to an existing index segment 
is not possible; in that case lucene will create a new segment.

in practice this would mean that an index segment file is read from 
the persistence manager (e.g. database) only once and then deleted at 
some point when the index merger creates a new index segment from 
existing ones.

regards
  marcel

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>>> in the long term I would like to put the index in the repository 
>>> itself. but that means that the repository (at least internally) has 
>>> to support random access on its binary resources.
>>
>> Hm, sounds interesting, but wouldn't this only work with file system 
>> based persistence managers?
> 
> no, the index would simply store its files as jcr resource nodes and any 
> persistence manager must be able to handle binary properties.

Okay ;) I didn't mean technical limitations, more practical limitations.

> Jackrabbit already does some optimization. when a binary property is 
> read it is either copied into memory if it is smaller than 64k or 
> spooled to a temp file. Further access to that property is then either 
> directly from memory or a local file.
> 
> finally, what is it good for?
> 
> - jackrabbit does not need a separate location for the index anymore, 
> it's all in the repository itself.
> - the index will be fully transactional. See issue JCR-204

I like the idea of having a transactional index, but I don't think it's a good 
idea to read this index from a binary property in a database, because in our 
case we've got a fairly large repository with index files of about 40MB in size. 
As far as I understand, you have to transfer 40MB to the database on every 
index change that gets committed. Am I right?

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Marcel Reutegger <ma...@gmx.net>.
Christoph Kiehl wrote:
>> in the long term I would like to put the index in the repository 
>> itself. but that means that the repository (at least internally) has 
>> to support random access on its binary resources.
> 
> Hm, sounds interesting, but wouldn't this only work with file system 
> based persistence managers?

no, the index would simply store its files as jcr resource nodes and 
any persistence manager must be able to handle binary properties.

> I mean, I can't imagine how to get an index 
> with decent performance using a database based persistence manager in 
> the near future? Do you already have an idea how to achieve the needed 
> performance?

Jackrabbit already does some optimization. when a binary property is 
read it is either copied into memory if it is smaller than 64k or 
spooled to a temp file. Further access to that property is then either 
directly from memory or a local file.

what's missing is random access support on the BLOBFileValue. This 
class currently only offers a plain InputStream.
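The 64k spooling described above can be sketched like this (illustrative names, 
not the actual BLOBFileValue implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the small-vs-large binary handling: values under a threshold
// (64k in Jackrabbit) stay in memory, larger ones are spooled to a temp
// file so repeated reads don't hit the persistence manager again.
public class SpooledBinary {
    private static final int THRESHOLD = 64 * 1024;

    private byte[] inMemory;   // set when the value fits in memory
    private Path spoolFile;    // set when the value was spooled to disk

    public SpooledBinary(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
            if (buf.size() > THRESHOLD) break;   // too big for memory
        }
        if (n == -1) {
            inMemory = buf.toByteArray();        // small value: keep in memory
        } else {
            spoolFile = Files.createTempFile("blob", ".tmp");
            try (OutputStream out = Files.newOutputStream(spoolFile)) {
                buf.writeTo(out);                // what was buffered so far
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
            }
        }
    }

    public boolean isInMemory() { return inMemory != null; }

    // Repeated reads come from memory or the local file, not the PM.
    public InputStream getStream() throws IOException {
        return isInMemory() ? new ByteArrayInputStream(inMemory)
                            : Files.newInputStream(spoolFile);
    }
}
```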

finally, what is it good for?

- jackrabbit does not need a separate location for the index anymore, 
it's all in the repository itself.
- the index will be fully transactional. See issue JCR-204

regards
  marcel

Re: Jackrabbits own FileSystem and unit tests

Posted by Nicolas <nt...@gmail.com>.
> Hm, sounds interesting, but wouldn't this only work with file system based
> persistence managers? I mean, I can't imagine how to get an index with decent
> performance using a database based persistence manager in the near future?
> Do you already have an idea how to achieve the needed performance?

Hi,

Why would we put the index in the repository? What would be the use?

BR,
Nico
my blog! http://www.deviant-abstraction.net !!

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Jukka Zitting wrote:
> On 8/31/06, Christoph Kiehl <ki...@subshell.com> wrote:
>> Hm, I think all these folders & files:
>> [...]
>> could be created using jr's FileSystem.
> 
> Sounds good to me. I assume the main performance problem with Lucene
> is with the Lucene Directories and not with the extra files written by
> the search index.

Yeah, I totally agree. And I like your idea of using a decorator to move the 
locking out of RepositoryImpl. What do the others think?

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 8/31/06, Christoph Kiehl <ki...@subshell.com> wrote:
> Hm, I think all these folders & files:
> [...]
> could be created using jr's FileSystem.

Sounds good to me. I assume the main performance problem with Lucene
is with the Lucene Directories and not with the extra files written by
the search index.

> The only problem I see is with the ".lock" file. This file requires FileSystem
> to implement a proper mechanism to create lock files. But this could be done.

The .lock mechanism in RepositoryImpl is currently totally independent
of all the other functionality of that class, so I'd actually like to
refactor locking into a separate decorator class (or something) like
LockRepository. It could work a bit like TransientRepository, using a
RepositoryFactory to instantiate the underlying RepositoryImpl once
the lock on the file has been acquired.

An in-memory repository instance could just use RepositoryImpl
directly without worrying about the lock file, since there is by
definition no chance of concurrent access.
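A rough sketch of that decorator idea (all names hypothetical; the real 
RepositoryImpl and TransientRepository are more involved):

```java
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Stands in for the factory that would create the real RepositoryImpl.
interface RepositoryFactory {
    AutoCloseable createRepository() throws Exception;
}

// Decorator that owns the .lock file and only instantiates the
// underlying repository once the exclusive lock has been acquired.
public class LockRepository implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;
    private final AutoCloseable delegate;

    public LockRepository(Path home, RepositoryFactory factory) throws Exception {
        // Acquire the lock first; fail fast if another process
        // already owns this repository home.
        channel = FileChannel.open(home.resolve(".lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        lock = channel.tryLock();
        if (lock == null) {
            channel.close();
            throw new IllegalStateException("repository home is locked: " + home);
        }
        delegate = factory.createRepository();  // safe: we hold the lock
    }

    @Override
    public void close() throws Exception {
        delegate.close();
        lock.release();
        channel.close();
    }
}
```

An in-memory repository would bypass this decorator and use the factory 
directly, since no lock file is needed.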

BR,

Jukka Zitting

-- 
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Hi Marcel,

> as mentioned before the current search index implementation does not 
> work on the jackrabbit FileSystem abstraction. The reason is lack of 
> support for random access on the FileSystemResource which is a 
> performance killer when lucene is running on top of that.

Hm, I think all these folders & files:

 >> ./repository
 >> ./repository/index
 >> ./repository/index/_0
 >> ./repository/index/indexes
 >> ./repository/index/ns_mappings.properties
 >> ./repository/index/redo.log
 >> ./version
 >> ./version/blobs
 >> ./workspaces
 >> ./workspaces/default
 >> ./workspaces/default/blobs
 >> ./workspaces/default/index
 >> ./workspaces/default/index/_0
 >> ./workspaces/default/index/indexes
 >> ./workspaces/default/index/redo.log
 >> ./workspaces/default/workspace.xml
 >> ./.lock

could be created using jr's FileSystem. The only problem I see is with the 
".lock" file. This file requires FileSystem to implement a proper mechanism to 
create lock files. But this could be done.
The indexes themselves can use whatever they like, e.g. FSDirectory or RAMDirectory.
This way no files would be written to disk when I use a memory-based 
file system.

> in the long term I would like to put the index in the repository itself. 
> but that means that the repository (at least internally) has to support 
> random access on its binary resources.

Hm, sounds interesting, but wouldn't this only work with file system based 
persistence managers? I mean, I can't imagine how to get an index with decent 
performance using a database based persistence manager in the near future. Do 
you already have an idea how to achieve the needed performance?

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Christoph,

the search index implementation uses a collection of lucene indexes 
for performance reasons. To keep track of the indexes in use, it 
creates a file called "indexes" containing the index names.

the file ns_mappings.properties is a mapping of namespaces to index 
internal prefixes to save storage space.

finally, the redo.log is required to turn lucene into a transactional 
index.

as mentioned before, the current search index implementation does not 
work on the jackrabbit FileSystem abstraction. The reason is the lack of 
support for random access on the FileSystemResource, which is a 
performance killer when lucene is running on top of it.

in the long term I would like to put the index in the repository 
itself. but that means that the repository (at least internally) has 
to support random access on its binary resources.

regards
  marcel

Christoph Kiehl wrote:
> I've got a configuration running where I use my MemoryFileSystem, 
> InMemoryPersistenceManager and a lucene RAMDirectory, but there jr still 
> creates the following structure in the file system:
> 
> ./repository
> ./repository/index
> ./repository/index/_0
> ./repository/index/indexes
> ./repository/index/ns_mappings.properties
> ./repository/index/redo.log
> ./version
> ./version/blobs
> ./workspaces
> ./workspaces/default
> ./workspaces/default/blobs
> ./workspaces/default/index
> ./workspaces/default/index/_0
> ./workspaces/default/index/indexes
> ./workspaces/default/index/redo.log
> ./workspaces/default/workspace.xml
> ./repository.xml
> ./.lock


Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Stefan Guggisberg wrote:

> jr does have a 'physical' home directory. i don't think that we should 
> change
> that. that doesn't necessarily mean that any files are written to that
> directory.

Why does jr need a 'physical' home directory? And how do I prevent 
jr from writing files to this directory? I've got a configuration running where 
I use my MemoryFileSystem, InMemoryPersistenceManager and a lucene RAMDirectory, 
but jr still creates the following structure in the file system:

./repository
./repository/index
./repository/index/_0
./repository/index/indexes
./repository/index/ns_mappings.properties
./repository/index/redo.log
./version
./version/blobs
./workspaces
./workspaces/default
./workspaces/default/blobs
./workspaces/default/index
./workspaces/default/index/_0
./workspaces/default/index/indexes
./workspaces/default/index/redo.log
./workspaces/default/workspace.xml
./repository.xml
./.lock

Is there anything else I need to configure?

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Stefan Guggisberg <st...@gmail.com>.
On 8/30/06, Christoph Kiehl <ki...@subshell.com> wrote:
> Stefan Guggisberg wrote:
>
> >> There are already working solutions for 2. and 3. So, as you wrote,
> >> the right
> >> way would be to refactor jackrabbits core to use an own storage. This
> >> could even
> >> be the current FileSystem, while this FileSystem lacks a proper locking
> >> mechanism which probably could be added.
> >
> > locking is imo not required in the jr FileSystem abstraction since jr
> > (or the
> > relevant subcomponent) *owns* the file system.
>
> With locking I meant the ".lock" file jr creates on startup
> (RepositoryImpl.acquireRepositoryLock). This file is created using java.io and
> java.nio. If we could abstract this functionality into FileSystem there would be
> no need for jr to access the filesystem directly to create a ".lock" file.
>
> >> Would that work? The mentioned persistence managers could still use an
> >> own
> >> instance of the Jackrabbit FileSystem, while they might be better off
> >> to use the
> >> file system directly. I mean there is no sense in telling an XML
> >> persistence
> >> manager to use memory or database based FileSystem. You could as well
> >> exchange
> >> the persistence manager for a memory or database persistence manager.
> >
> > i disagree. you might e.g. want to use the XMLPersistenceManager on a
> > DbPersistenceManager.
>
> You probably mean "on a Db_FileSystem_", don't you? I still don't think it's a

yup, i meant DbFileSystem

> good idea to do this, because you probably won't get a good performance using
> this constellation. But if someone really needs it ...
>
> > there are imo 3 issues which make it currently impossible to run jr
> > exclusively in memory, i.e. without leaving any traces on the disk:
> >
> > 1. there's obviously an issue with the InMemPersistenceManager;
> >   it uses a LocalFileSystem to persist blobs even with 'persistent=false'
> >
> >   feel free to open a jira issue for this.
>
> Ok, I will open a ticket.
>
> > 2. lucene doesn't use the jr FileSystem abstraction (there are valid
> > reasons
> >    for this and they were explained on the mailing list).
> >
> >    you seem to have found a workaround for this. another option could
> > perhaps
> >    be to disable SearchIndex entirely.
>
> But which component should do the queries? Or would you implement a QueryHandler
> which simply traverse all nodes to get query results (performance?)? For me
> using a lucene RAMDirectory is okay. I already added a parameter "persistent" to
> SearchIndex which creates a PersistentIndex based on a RAMDirectory if set to
> "true". I would be glad to provide a patch.
>
> > 3. there's currently no InMemFileSystem available in jr. since you wrote
> > one
> >    we'd be very interested if you'd consider contributing it ;-)
>
> I implemented a very simple one based on a Map just to get something working.
> But I would be happy to contribute it if there is demand.
>
> > with the above 3 issues resolved it should imo be possible to run
> > jackrabbit
> > entirely in memory.
>
> If I do a search for java.io.File in jackrabbit core I found it to be used e.g. in:
>
> SessionImpl.createAccessManager()
> RepositoryConfig.init()
> RepositoryConfig.internalCreateWorkspaceConfig()
>
> I think these occurrences have to be eliminated as well to make jackrabbit run
> entirely in memory. Am I wrong?

yes ;-)

re:  SessionImpl.createAccessManager()

jr does have a 'physical' home directory. i don't think that we should change
that. that doesn't necessarily mean that any files are written to that
directory.

the other references to java.io.File are within conditional blocks which, 
depending on your configuration, are never executed.

cheers
stefan

>
> Cheers,
> Christoph
>
>

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Stefan Guggisberg wrote:

>> There are already working solutions for 2. and 3. So, as you wrote, 
>> the right
>> way would be to refactor jackrabbits core to use an own storage. This 
>> could even
>> be the current FileSystem, while this FileSystem lacks a proper locking
>> mechanism which probably could be added.
> 
> locking is imo not required in the jr FileSystem abstraction since jr 
> (or the
> relevant subcomponent) *owns* the file system.

With locking I meant the ".lock" file jr creates on startup 
(RepositoryImpl.acquireRepositoryLock). This file is created using java.io and 
java.nio. If we could abstract this functionality into FileSystem there would be 
no need for jr to access the filesystem directly to create a ".lock" file.
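Such a lock primitive behind the FileSystem abstraction could look roughly like 
this (method names made up for illustration; shown with a trivial in-memory 
implementation mirroring java.nio's tryLock semantics):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a lock primitive that could live behind the FileSystem
// abstraction, so RepositoryImpl wouldn't need java.io/java.nio
// directly to manage the ".lock" file.
public class MemLockManager {
    private final Set<String> locked = ConcurrentHashMap.newKeySet();

    // Returns a release handle, or null if the path is already locked,
    // mirroring java.nio.channels.FileChannel#tryLock semantics.
    public AutoCloseable tryLock(String path) {
        if (!locked.add(path)) return null;
        return () -> locked.remove(path);
    }
}
```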

>> Would that work? The mentioned persistence managers could still use an 
>> own
>> instance of the Jackrabbit FileSystem, while they might be better off 
>> to use the
>> file system directly. I mean there is no sense in telling an XML 
>> persistence
>> manager to use memory or database based FileSystem. You could as well 
>> exchange
>> the persistence manager for a memory or database persistence manager.
> 
> i disagree. you might e.g. want to use the XMLPersistenceManager on a
> DbPersistenceManager.

You probably mean "on a Db_FileSystem_", don't you? I still don't think it's a 
good idea to do this, because you probably won't get good performance from 
this combination. But if someone really needs it ...

> there are imo 3 issues which make it currently impossible to run jr
> exclusively in memory, i.e. without leaving any traces on the disk:
> 
> 1. there's obviously an issue with the InMemPersistenceManager;
>   it uses a LocalFileSystem to persist blobs even with 'persistent=false'
> 
>   feel free to open a jira issue for this.

Ok, I will open a ticket.

> 2. lucene doesn't use the jr FileSystem abstraction (there are valid 
> reasons
>    for this and they were explained on the mailing list).
> 
>    you seem to have found a workaround for this. another option could 
> perhaps
>    be to disable SearchIndex entirely.

But which component should do the queries? Or would you implement a QueryHandler 
which simply traverses all nodes to get query results (performance?)? For me, 
using a lucene RAMDirectory is okay. I already added a parameter "persistent" to 
SearchIndex which creates a PersistentIndex based on a RAMDirectory if set to 
"true". I would be glad to provide a patch.

> 3. there's currently no InMemFileSystem available in jr. since you wrote 
> one
>    we'd be very interested if you'd consider contributing it ;-)

I implemented a very simple one based on a Map just to get something working. 
But I would be happy to contribute it if there is demand.
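The rough shape of it (an illustrative sketch, not the actual contribution; 
method names only loosely follow Jackrabbit's FileSystem interface):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A map-based in-memory file system: paths map to byte arrays,
// so nothing ever touches the disk.
public class MemoryFileSystem {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    public boolean exists(String path) { return files.containsKey(path); }

    public InputStream getInputStream(String path) throws FileNotFoundException {
        byte[] data = files.get(path);
        if (data == null) throw new FileNotFoundException(path);
        return new ByteArrayInputStream(data);
    }

    // Buffers writes and publishes the file into the map on close().
    public OutputStream getOutputStream(String path) {
        return new ByteArrayOutputStream() {
            @Override public void close() {
                files.put(path, toByteArray());
            }
        };
    }

    public void deleteFile(String path) { files.remove(path); }
}
```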

> with the above 3 issues resolved it should imo be possible to run 
> jackrabbit
> entirely in memory.

If I search for java.io.File in jackrabbit core, I find it used e.g. in:

SessionImpl.createAccessManager()
RepositoryConfig.init()
RepositoryConfig.internalCreateWorkspaceConfig()

I think these occurrences have to be eliminated as well to make jackrabbit run 
entirely in memory. Am I wrong?

Cheers,
Christoph


Re: Jackrabbits own FileSystem and unit tests

Posted by Stefan Guggisberg <st...@gmail.com>.
On 8/28/06, Christoph Kiehl <ki...@subshell.com> wrote:
> Jukka Zitting wrote:
>
> >> I still don't get what this FileSystem is used for. Maybe someone with
> >> deeper
> >> knowledge of the system could explain it to me? Or is it just there for
> >> historical reasons?
> >
> > It is used extensively by the Object and XML persistence managers, but
> > the more "modern" database persistence managers generally ignore the
> > configured FileSystem for anything else than locally stored binary
> > properties.
>
> I think this file system abstraction idea does not work for all parts of Jackrabbit
> as the current process in development shows. Every persistence manager and index
> implementation knows best how to save its data and it's sufficient to configure
> their storage layer through the repository configuration.
>
> As I understand right now there are basically three places where data is persisted:
>
> 1. Jackrabbit core (repository.xml, workspace.xml, locking)
> 2. Persistence manager
> 3. Search index
>
> There are already working solutions for 2. and 3. So, as you wrote, the right
> way would be to refactor jackrabbits core to use an own storage. This could even
> be the current FileSystem, while this FileSystem lacks a proper locking
> mechanism which probably could be added.

locking is imo not required in the jr FileSystem abstraction since jr (or the
relevant subcomponent) *owns* the file system.

> Would that work? The mentioned persistence managers could still use an own
> instance of the Jackrabbit FileSystem, while they might be better off to use the
> file system directly. I mean there is no sense in telling an XML persistence
> manager to use memory or database based FileSystem. You could as well exchange
> the persistence manager for a memory or database persistence manager.

i disagree. you might e.g. want to use the XMLPersistenceManager on a
DbPersistenceManager.

back to your original use case (which i think is a valid one)...

there are imo 3 issues which make it currently impossible to run jr
exclusively in memory, i.e. without leaving any traces on the disk:

1. there's obviously an issue with the InMemPersistenceManager;
   it uses a LocalFileSystem to persist blobs even with 'persistent=false'

   feel free to open a jira issue for this.

2. lucene doesn't use the jr FileSystem abstraction (there are valid reasons
    for this and they were explained on the mailing list).

    you seem to have found a workaround for this. another option could perhaps
    be to disable SearchIndex entirely.

3. there's currently no InMemFileSystem available in jr. since you wrote one
    we'd be very interested if you'd consider contributing it ;-)

with the above 3 issues resolved it should imo be possible to run jackrabbit
entirely in memory.

cheers
stefan


>
> Cheers,
> Christoph
>
>
>

Re: Jackrabbits own FileSystem and unit tests

Posted by Christoph Kiehl <ki...@subshell.com>.
Jukka Zitting wrote:

>> I still don't get what this FileSystem is used for. Maybe someone with 
>> deeper
> >> knowledge of the system could explain it to me? Or is it just there for
>> historical reasons?
> 
> It is used extensively by the Object and XML persistence managers, but
> the more "modern" database persistence managers generally ignore the
> configured FileSystem for anything else than locally stored binary
> properties.

I think this file system abstraction idea does not work for all parts of Jackrabbit, 
as the current development process shows. Every persistence manager and index 
implementation knows best how to store its data, and it's sufficient to configure 
their storage layer through the repository configuration.

As I understand right now there are basically three places where data is persisted:

1. Jackrabbit core (repository.xml, workspace.xml, locking)
2. Persistence manager
3. Search index

There are already working solutions for 2. and 3. So, as you wrote, the right 
way would be to refactor jackrabbit's core to use its own storage. This could even 
be the current FileSystem, though this FileSystem lacks a proper locking 
mechanism, which could probably be added.
Would that work? The mentioned persistence managers could still use their own 
instance of the Jackrabbit FileSystem, though they might be better off using the 
file system directly. I mean, there is no sense in telling an XML persistence 
manager to use a memory or database based FileSystem. You could just as well 
exchange the persistence manager for a memory or database persistence manager.

Cheers,
Christoph



Re: Jackrabbits own FileSystem and unit tests

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 8/25/06, Christoph Kiehl <ki...@subshell.com> wrote:
> I'm trying to modify Jackrabbit to work _only_ in memory and persist _nothing_
> in the file system. I want this for testing purposes, where I don't like any
> files being created besides the fact that memory should be the fastest. My test
> data sets aren't that large anyway so memory usage is not a concern.

I think that's impossible with the current Jackrabbit. There was some
discussion quite a while ago about implementing a simple in-memory
repository for testing purposes, but that idea never went anywhere. I
think a more realistic plan for implementing that would be to go
through and refactor all the filesystem dependencies in Jackrabbit.

> I still don't get what this FileSystem is used for. Maybe someone with deeper
> knowledge of the system could explain it to me? Or is it just there for
> historical reasons?

It is used extensively by the Object and XML persistence managers, but
the more "modern" database persistence managers generally ignore the
configured FileSystem for anything else than locally stored binary
properties.

> However, maybe someone has a better strategy to get a lightweight repository
> which could be quickly initialized and is usable in unit tests. I don't like the
> idea to mock the JCR API because this way our tests become too unnatural and complex.

Agreed, the JCR API is quite difficult to mock properly for any
non-trivial test case.

BR,

Jukka Zitting

-- 
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development