Posted to users@jackrabbit.apache.org by freak182 <em...@gmail.com> on 2009/09/19 03:49:42 UTC

FileDataStore vs DatabaseDataStore

Hello,

What is your recommended data store for a production environment? Which
data store is easiest to manage in terms of backup and restore, disk space
used, and fast access (given the same hardware specs)?

Has anyone implemented this: "Theoretically the data store could be split to
different directories / hard drives."?

thanks a lot.
cheers.
-- 
View this message in context: http://www.nabble.com/FileDataStore-vs-DatabaseDataStore-tp25517799p25517799.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: FileDataStore vs DatabaseDataStore

Posted by Thomas Müller <th...@day.com>.
Hi,

I have extended the wiki a bit: http://wiki.apache.org/jackrabbit/DataStore

"Because the data store is append-only, the FileDataStore is
guaranteed to be consistent after a crash (unlike the
BundleFsPersistenceManager). It is usually faster than the database
data store, and the preferred choice unless you have strict
operational reasons to put everything into a database."

> has anyone implemented this "Theoretically the data store could be split to
> different directories / hard drives."?

No, but "currently this can be done manually moving directories to
different disks and by creating softlinks"
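
The manual procedure quoted above (move a directory to another disk, then
leave a softlink behind) could be scripted roughly as follows. This is only a
sketch under assumptions: the class and method names are hypothetical and not
part of Jackrabbit, the repository is assumed to be stopped while the
directory is moved, and the file system must support symbolic links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Jackrabbit class.
final class DataStoreRelocator {

    /**
     * Copies one data store subdirectory to targetDir (possibly on another
     * disk), deletes the original, and replaces it with a symbolic link.
     */
    static void relocate(Path subDir, Path targetDir) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(subDir)) {
            // Files.walk lists a directory before its children.
            entries = walk.collect(Collectors.toList());
        }
        for (Path source : entries) {
            Path dest = targetDir.resolve(subDir.relativize(source).toString());
            if (Files.isDirectory(source)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(source, dest, StandardCopyOption.COPY_ATTRIBUTES);
            }
        }
        // Delete in reverse order so children are removed before their parents.
        for (int i = entries.size() - 1; i >= 0; i--) {
            Files.delete(entries.get(i));
        }
        // The old location now points at the new one.
        Files.createSymbolicLink(subDir, targetDir.toAbsolutePath());
    }
}
```

A copy followed by a delete is used instead of a plain move because moving a
non-empty directory across different disks is generally not supported by
Files.move.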

> point in time recovery

The data store is append-only and does not "need" point-in-time
recovery. Just restore all files you find :-) To free up disk space
run the data store garbage collection.

Regards,
Thomas

Re: FileDataStore vs DatabaseDataStore

Posted by Gamba <ho...@handelshof.de>.
Hi,

We're thinking about using a DBDataStore with MySQL. But with more than 80 GB
in production we still don't know how MySQL would handle the backup. Maybe
a FileDataStore is more efficient, considering file-access performance
and backup?

Regards,
Gamba







Re: H2 Persistence Manager problem

Posted by Thomas Fromm <tf...@inubit.com>.
Hi,

On Monday 19 October 2009 17:20, Thomas Müller wrote:
> I can't reproduce the problem. Everything works for me. I guess you
> have an old Jackrabbit jar file somewhere in your classpath. My test
> case:


I found the problem.
The problem was a missing .cnd file that contains the node type
definitions. :-/

Sorry for the inconvenience!

--tf

Re: H2 Persistence Manager problem

Posted by Thomas Fromm <tf...@inubit.com>.
> I can't reproduce the problem. Everything works for me. I guess you
> have an old Jackrabbit jar file somewhere in your classpath. My test
> case:

The only difference I can see at first sight is that I use
RepositoryConfig.create(...) to create the configuration and pass it as a
parameter to TransientRepository.

There are no older versions of Jackrabbit or of the other listed components
inside this Tomcat...
Hm.

Re: H2 Persistence Manager problem

Posted by Thomas Müller <th...@day.com>.
Hi,

I can't reproduce the problem. Everything works for me. I guess you
have an old Jackrabbit jar file somewhere in your classpath. My test
case:

import javax.jcr.*;
import org.apache.jackrabbit.core.TransientRepository;

public class Test {
    public static void main(String... args) throws Exception {
        System.setProperty("log4j.configuration", "file:log4j.xml");
        Repository repository = new TransientRepository();
        Session session = repository.login(
                new SimpleCredentials("", "".toCharArray()));
        try {
            Node root = session.getRootNode();
            // Reuse the "test" node if it already exists, otherwise create it.
            Node n = root.hasNode("test") ? root.getNode("test") : root.addNode("test");
            n.setProperty("test", "abc");
            session.save();
        } finally {
            // Closing the last session shuts the transient repository down.
            session.logout();
        }
    }
}

Jar files:

commons-collections-3.1.jar
commons-io-1.4.jar
concurrent-1.3.4.jar
h2-1.2.121.jar
jackrabbit-api-1.6.0.jar
jackrabbit-core-1.6.0.jar
jackrabbit-jcr-commons-1.6.0.jar
jackrabbit-spi-1.6.0.jar
jackrabbit-spi-commons-1.6.0.jar
jackrabbit-text-extractors-1.6.0.jar
jcr-1.0.jar
log4j-1.2.14.jar
lucene-core-2.4.1.jar
slf4j-api-1.5.3.jar
slf4j-log4j12-1.5.3.jar

repository.xml: the same as yours, except I replaced

<!--	<LoginModule
class="com.inubit.ibis.server.repository.IBISLoginModule"/> -->

with

        <LoginModule
class="org.apache.jackrabbit.core.security.simple.SimpleLoginModule">
           <param name="anonymousId" value="anonymous"/>
           <param name="adminId" value="admin"/>
        </LoginModule>

Log output:

19.10.2009 17:13:18 *INFO * RepositoryImpl: Starting repository...
(RepositoryImpl.java, line 261)
19.10.2009 17:13:18 *INFO * LocalFileSystem: LocalFileSystem
initialized at path repository/repository (LocalFileSystem.java, line
166)
19.10.2009 17:13:18 *INFO * NodeTypeRegistry: no custom node type
definitions found (NodeTypeRegistry.java, line 852)
19.10.2009 17:13:18 *INFO * LocalFileSystem: LocalFileSystem
initialized at path repository/version (LocalFileSystem.java, line
166)
19.10.2009 17:13:19 *INFO * ConnectionRecoveryManager: Database: H2 /
1.2.121 (2009-10-11) (ConnectionRecoveryManager.java, line 345)
19.10.2009 17:13:19 *INFO * ConnectionRecoveryManager: Driver: H2 JDBC
Driver / 1.2.121 (2009-10-11) (ConnectionRecoveryManager.java, line
346)
19.10.2009 17:13:19 *INFO * BundleDbPersistenceManager: version:
checking workspace consistency... (BundleDbPersistenceManager.java,
line 805)
19.10.2009 17:13:19 *INFO * BundleDbPersistenceManager: version:
checked 1/1 bundles. (BundleDbPersistenceManager.java, line 958)
19.10.2009 17:13:19 *INFO * RepositoryImpl: initializing workspace
'ibis'... (RepositoryImpl.java, line 1918)
19.10.2009 17:13:19 *INFO * LocalFileSystem: LocalFileSystem
initialized at path repository/workspaces/ibis (LocalFileSystem.java,
line 166)
19.10.2009 17:13:19 *INFO * ConnectionRecoveryManager: Database: H2 /
1.2.121 (2009-10-11) (ConnectionRecoveryManager.java, line 345)
19.10.2009 17:13:19 *INFO * ConnectionRecoveryManager: Driver: H2 JDBC
Driver / 1.2.121 (2009-10-11) (ConnectionRecoveryManager.java, line
346)
19.10.2009 17:13:19 *INFO * BundleDbPersistenceManager: ibis: checking
workspace consistency... (BundleDbPersistenceManager.java, line 805)
19.10.2009 17:13:19 *INFO * BundleDbPersistenceManager: ibis: checked
3/3 bundles. (BundleDbPersistenceManager.java, line 958)
19.10.2009 17:13:19 *INFO * SearchIndex: Index initialized:
repository/repository/index Version: 3 (SearchIndex.java, line 540)
19.10.2009 17:13:19 *INFO * SearchIndex: Index initialized:
repository/workspaces/ibis/index Version: 3 (SearchIndex.java, line
540)
19.10.2009 17:13:19 *INFO * RepositoryImpl: workspace 'ibis'
initialized (RepositoryImpl.java, line 1922)
19.10.2009 17:13:19 *INFO * RepositoryImpl: Repository started
(RepositoryImpl.java, line 366)
19.10.2009 17:13:19 *INFO * TransientRepository: Transient repository
initialized (TransientRepository.java, line 252)
19.10.2009 17:13:19 *INFO * SimpleSecurityManager: init: using
Repository LoginModule configuration for Jackrabbit
(SimpleSecurityManager.java, line 114)
19.10.2009 17:13:19 *INFO * RepositoryImpl: SecurityManager = class
org.apache.jackrabbit.core.security.simple.SimpleSecurityManager
(RepositoryImpl.java, line 441)
19.10.2009 17:13:19 *INFO * TransientRepository: Session opened
(TransientRepository.java, line 327)
19.10.2009 17:13:19 *INFO * TransientRepository: Session closed
(TransientRepository.java, line 422)
19.10.2009 17:13:19 *INFO * RepositoryImpl: Shutting down
repository... (RepositoryImpl.java, line 1093)
19.10.2009 17:13:19 *INFO * IndexMerger: IndexMerger terminated
(IndexMerger.java, line 341)
19.10.2009 17:13:19 *INFO * SearchIndex: Index closed:
repository/repository/index (SearchIndex.java, line 728)
19.10.2009 17:13:19 *INFO * RepositoryImpl: shutting down workspace
'ibis'... (RepositoryImpl.java, line 2068)
19.10.2009 17:13:19 *INFO * ObservationDispatcher: Notification of
EventListeners stopped. (ObservationDispatcher.java, line 106)
19.10.2009 17:13:19 *INFO * IndexMerger: IndexMerger terminated
(IndexMerger.java, line 341)
19.10.2009 17:13:19 *INFO * SearchIndex: Index closed:
repository/workspaces/ibis/index (SearchIndex.java, line 728)
19.10.2009 17:13:19 *INFO * RepositoryImpl: workspace 'ibis' has been
shutdown (RepositoryImpl.java, line 2074)
19.10.2009 17:13:19 *INFO * RepositoryImpl: Repository has been
shutdown (RepositoryImpl.java, line 1185)
19.10.2009 17:13:19 *INFO * TransientRepository: Transient repository
shut down (TransientRepository.java, line 262)

Regards,
Thomas

Re: H2 Persistence Manager problem

Posted by Thomas Fromm <tf...@inubit.com>.
> Log and configuration file attached.

I forgot to add the org.apache.jackrabbit log from startup...




Re: H2 Persistence Manager problem

Posted by Thomas Fromm <tf...@inubit.com>.
On Monday 19 October 2009 15:50, Thomas Müller wrote:
> So this exception occurred when creating the repository, right? The
> repository directory is empty before you start the application?

Both right.

> Could you post the log file, the configuration (repository.xml and all
> workspace.xml files), as well as the list of Jackrabbit jar files you
> use?

Log and configuration file attached.

Used Jars:
jackrabbit-core-1.6.0.jar
jackrabbit-jcr-commons-1.6.0.jar
jackrabbit-api-1.6.0.jar
jackrabbit-text-extractors-1.6.0.jar
jackrabbit-spi-commons-1.6.0.jar
jackrabbit-spi-1.6.0.jar
jcr-1.0.jar


Re: H2 Persistence Manager problem

Posted by Thomas Müller <th...@day.com>.
Hi,

> It is a fresh installation :-|

So this exception occurred when creating the repository, right? The
repository directory is empty before you start the application?

Could you post the log file, the configuration (repository.xml and all
workspace.xml files), as well as the list of Jackrabbit jar files you
use?

Regards,
Thomas

Re: H2 Persistence Manager problem

Posted by Thomas Fromm <tf...@inubit.com>.
> the persistence manager. It looks like the built-in nodetype
> "jcr:referenceable" doesn't exist in your installation. Could you try
> using a new (fresh) repository?

It is a fresh installation :-|

Until now I have used the XMLPersistenceManager, but now I want to try the H2 one.

As the FileSystem I use org.apache.jackrabbit.core.fs.local.LocalFileSystem.


Re: H2 Persistence Manager problem

Posted by Thomas Müller <th...@day.com>.
Hi,

I'm not sure what the problem is, but I don't think it is related to
the persistence manager. It looks like the built-in nodetype
"jcr:referenceable" doesn't exist in your installation. Could you try
using a new (fresh) repository?

Regards,
Thomas


On Mon, Oct 19, 2009 at 2:03 PM, Thomas Fromm <tf...@inubit.com> wrote:
>
>> today I configured H2 for both persistence managers (versioning and
>> workspace), but at startup I get:
>
> The Jackrabbit version used is 1.6.
>

Re: H2 Persistence Manager problem

Posted by Thomas Fromm <tf...@inubit.com>.
> today I configured H2 for both persistence managers (versioning and
> workspace), but at startup I get:

The Jackrabbit version used is 1.6.

H2 Persistence Manager problem

Posted by Thomas Fromm <tf...@inubit.com>.
Hi,

today I configured H2 for both persistence managers (versioning and 
workspace), but at startup I get:

Cannot instantiate persistence manager org.apache.jackrabbit.core.persistence.bundle.H2PersistenceManager: {http://www.jcp.org/jcr/mix/1.0}referenceable
javax.jcr.RepositoryException: Cannot instantiate persistence manager org.apache.jackrabbit.core.persistence.bundle.H2PersistenceManager: {http://www.jcp.org/jcr/mix/1.0}referenceable: {http://www.jcp.org/jcr/mix/1.0}referenceable
        at org.apache.jackrabbit.core.RepositoryImpl.createPersistenceManager(RepositoryImpl.java:1328)
        at org.apache.jackrabbit.core.RepositoryImpl.createVersionManager(RepositoryImpl.java:459)
        at org.apache.jackrabbit.core.RepositoryImpl.<init>(RepositoryImpl.java:319)
...

Caused by: javax.jcr.nodetype.NoSuchNodeTypeException: {http://www.jcp.org/jcr/mix/1.0}referenceable
        at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.getEffectiveNodeType(NodeTypeRegistry.java:1024)
        at org.apache.jackrabbit.core.nodetype.NodeTypeRegistry.getEffectiveNodeType(NodeTypeRegistry.java:471)
        at org.apache.jackrabbit.core.persistence.bundle.AbstractBundlePersistenceManager.init(AbstractBundlePersistenceManager.java:416)
        at org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager.init(BundleDbPersistenceManager.java:598)
        at org.apache.jackrabbit.core.persistence.bundle.H2PersistenceManager.init(H2PersistenceManager.java:91)
        at org.apache.jackrabbit.core.RepositoryImpl.createPersistenceManager(RepositoryImpl.java:1324)
        ... 33 more


The configuration e.g. for versioning looks like this:

 <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.H2PersistenceManager">
     <param name="bundleCacheSize" value="8"/>
     <param name="consistencyCheck" value="true"/>
     <param name="consistencyFix" value="true"/>
     <param name="minBlobSize" value="16384"/>
     <param name="driver" value="org.h2.Driver"/>
     <param name="url" value="jdbc:h2:file:${rep.home}/version/database"/>
     <param name="user" value="sa"/>
     <param name="password" value=""/>
     <param name="schema" value="h2"/>
     <param name="schemaObjectPrefix" value=""/>
     <param name="errorHandling" value=""/>
     <param name="lockTimeout" value="10000"/>
 </PersistenceManager>

Any ideas how to fix the problem?

--tf

Re: PersistenceManager question

Posted by Thomas Müller <th...@day.com>.
Hi,

> Is it possible to connect to the H2 database read-only later on to gain the
> best performance for the read only content repository then?

This is an H2-specific question; I suggest asking in the H2 Google Group.
H2 does support read-only databases in multiple ways:
- Set the file permission to read-only.
- Create a new database user and grant that user only read access.

> to gain the best performance for the read only content

Read-only access is not much faster than read-write access.

Regards,
Thomas

Re: PersistenceManager question

Posted by Lukas Zapletal <lu...@zapletalovi.com>.
>> What's the best combination ATM to store the repository in the file system?
> 
> If you want to use Jackrabbit, I suggest using a database persistence
> manager with an embedded database such as the H2 Database Engine:
> http://www.h2database.com and the file data store:
> http://wiki.apache.org/jackrabbit/DataStore

Is it possible to connect to the H2 database read-only later on to gain 
the best performance for the read only content repository then?

LZ

-- 
Lukas Zapletal
Please do not respond directly but
to the list or use this contact:
http://lukas.zapletalovi.com


Re: PersistenceManager question

Posted by Thomas Müller <th...@day.com>.
Hi,

> What's the best combination ATM to store the repository in the file system?

If you want to use Jackrabbit, I suggest using a database persistence
manager with an embedded database such as the H2 Database Engine:
http://www.h2database.com and the file data store:
http://wiki.apache.org/jackrabbit/DataStore

Regards,
Thomas

PersistenceManager question

Posted by Thomas Fromm <tf...@inubit.com>.
Hi,

Currently I use the XMLPersistenceManager in combination with LocalFileSystem.

Now I want to change the configuration to reduce the number of files created
in the file system and to decrease the risk of corrupted storage due to JVM
crashes and so on. I also don't need the nodes to be stored as XML.

In the PersistenceManager FAQ
http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
the XMLPersistenceManager is marked as "obsolete".

What's the best combination ATM to store the repository in the file system?

--tf

Re: FileDataStore vs DatabaseDataStore

Posted by Torgeir Veimo <to...@netenviron.com>.
2009/9/21 Thomas Fromm <tf...@inubit.com>:
> I think I have to look at it a little more closely; it would be nice to have a
> repository with fewer files and a ruggedly designed file storage :-).

What about Day Software's CRX tar persistence manager? Wasn't it
append only and recently released as open source?


-- 
-Tor

Re: FileDataStore vs DatabaseDataStore

Posted by Thomas Müller <th...@day.com>.
Hi,

> I thought the DataStore works together with the persistence manager, just as
> optional storage used for large binaries.

Yes it does. You can use any DataStore implementation with any
PersistenceManager implementation. See also:

http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
http://wiki.apache.org/jackrabbit/DataStore

> I think I have to look at it a little more closely; it would be nice to have a
> repository with fewer files and a ruggedly designed file storage :-).

I believe the FileDataStore is a "ruggedly designed file storage". If
you want fewer files, you could increase the minRecordLength setting
(in which case more data is stored in the persistence manager).

> What about Day Software's CRX tar persistence manager?
> Wasn't it append only and recently released as open source?

Yes, it's append only, but not (yet) open source. See also:
http://dev.day.com/microsling/content/blogs/main/tarpm.html

Regards,
Thomas

Re: FileDataStore vs DatabaseDataStore

Posted by Thomas Fromm <tf...@inubit.com>.
On Monday 21 September 2009 10:42, Jukka Zitting wrote:

> Both of the above issues are problems with the file based persistence
> managers, but not with the FileDataStore. The data store feature
> avoids such problems by design.

I thought the DataStore works together with the persistence manager, just as
optional storage used for large binaries.

I think I have to look at it a little more closely; it would be nice to have a
repository with fewer files and a ruggedly designed file storage :-).





Re: FileDataStore vs DatabaseDataStore

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Sep 21, 2009 at 10:01 AM, Thomas Fromm <tf...@inubit.com> wrote:
> You can take snapshots with your SAN storage. But nice things like
> point-in-time recovery you only get with a database ;-).
>
> Besides this, the file storage is prone to JVM crashes (then you have lots of
> corrupted files... often enough the storage doesn't work anymore after
> this happens). Because of this I recommend that my customers use a database in
> production environments.

Both of the above issues are problems with the file based persistence
managers, but not with the FileDataStore. The data store feature
avoids such problems by design.

BR,

Jukka Zitting

Re: FileDataStore vs DatabaseDataStore

Posted by Thomas Fromm <tf...@inubit.com>.
> > Which data store is easiest to manage in terms of backup and restore, disk
> > space used, and fast access (given the same hardware specs)?
>
> FileDataStore wins on all these counts. Backup/restore can be done by
> just copying the files, and incremental backups are also possible.
> Only minimal extra disk space (intermediate directories and the file
> inodes) is required beyond your binary content. And you can access the
> data at file system speed.

You can take snapshots with your SAN storage. But nice things like
point-in-time recovery you only get with a database ;-).

Besides this, the file storage is prone to JVM crashes (then you have lots of
corrupted files... often enough the storage doesn't work anymore after
this happens). Because of this I recommend that my customers use a database in
production environments.

Re: FileDataStore vs DatabaseDataStore

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Sat, Sep 19, 2009 at 3:49 AM, freak182 <em...@gmail.com> wrote:
> What is your recommended data store for a production environment?

I would recommend using FileDataStore unless you have strict
operational reasons to put everything into a database. Due to the
append-only nature of the data store feature, FileDataStore is not
affected by the transaction limitations of our file-based persistence
managers. And since the data is stored directly on the file system,
you don't need to worry about the complexities or performance
limitations of an intermediate database.

> Which data store is easiest to manage in terms of backup and restore, disk
> space used, and fast access (given the same hardware specs)?

FileDataStore wins on all these counts. Backup/restore can be done by
just copying the files, and incremental backups are also possible.
Only minimal extra disk space (intermediate directories and the file
inodes) is required beyond your binary content. And you can access the
data at file system speed.
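
To illustrate the point about incremental backups: because existing files in
the data store are not rewritten once they are stored, a backup only has to
copy the files that are not yet in the backup directory. The following is a
minimal sketch of that idea, not a Jackrabbit tool; the class name, method
name, and directory arguments are hypothetical, and it assumes no files are
being written while the copy runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Jackrabbit class.
final class IncrementalDataStoreBackup {

    /**
     * Copies every regular file under dataStoreDir that is missing in
     * backupDir, keeping the relative directory layout.
     * Returns the number of files copied in this run.
     */
    static int backup(Path dataStoreDir, Path backupDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dataStoreDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int copied = 0;
        for (Path source : files) {
            Path dest = backupDir.resolve(dataStoreDir.relativize(source).toString());
            if (Files.exists(dest)) {
                continue; // already backed up in an earlier run
            }
            Files.createDirectories(dest.getParent());
            Files.copy(source, dest, StandardCopyOption.COPY_ATTRIBUTES);
            copied++;
        }
        return copied;
    }
}
```

Running the method a second time copies only files added since the previous
run. Files removed from the data store by garbage collection stay in the
backup directory, so the backup would need its own cleanup.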

> has anyone implemented this "Theoretically the data store could be split to
> different directories / hard drives."?

I haven't heard about anyone doing something like that. Note that you
can also achieve similar functionality by using ZFS or another file
system that natively allows you to extend the storage capacity by
adding new disks to the volume.

BR,

Jukka Zitting