Posted to users@jackrabbit.apache.org by jd...@21technologies.com on 2006/12/15 23:10:57 UTC

Storing data in a non-public schema and a postgres persistence manager

Hi,
I have two questions: 
First, is there a way to configure Jackrabbit to store data somewhere
other than the default public schema of a database?  The "schema" argument
in the repository configuration file refers to the type of schema, not to
a named schema within the database.  The only way I can control where data
goes in the database is by changing the schema object prefix.  It would be
nice if I could set up different schemas within the same database for the
different repositories that I have set up for testing.

Second, I've continued experimenting with Jackrabbit performance under
PostgreSQL.  Because Jackrabbit uses the bytea data type to store blobs in
the database, it performs poorly, creating a large memory footprint that
depends on the size of the data being put into the database.  This
footprint requires a JVM heap of anywhere from 4 to 8 times the size of
the data being loaded or stored.  Several articles describe PostgreSQL's
problems with blobs and the memory consumption that results from using the
bytea data type.  It appears that these memory problems can be avoided if
I use the LargeObject API instead of the bytea data type to represent my
blobs (see http://jdbc.postgresql.org/documentation/82/binary-data.html),
although I have not tried it yet.  I am considering building a
PostgresPersistenceManager by extending SimpleDBPersistenceManager that
will use LargeObjects instead of bytea.  Has anybody tried using
LargeObjects with Jackrabbit instead of bytea?  Is there a reason this
approach won't work?  Why did Jackrabbit use bytea to begin with?

Thanks for your help,
Joe.

Re: Storing data in a non-public schema and a postgres persistence manager

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 12/16/06, jdente@21technologies.com <jd...@21technologies.com> wrote:
> First, is there a way to configure Jackrabbit to store data someplace
> other than the default public schema of a database?  The "schema" argument
> in the repository configuration file refers to the type of schema.  The
> only way I can control where data goes in the database is by changing the
> schema object prefix.  It would be nice if I could setup different schemas
> within the same database for different repositories that I have setup for
> testing.

I assume you mean PostgreSQL schemas? Perhaps you could achieve that
by using a repository prefix like "schemaA." (note the dot at the
end).
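
A minimal sketch of why the trailing dot might work, assuming the
persistence manager forms each table name by simply prepending the
configured schema object prefix to a base name. The class, the method, and
the base name "NODE" are hypothetical illustrations, not Jackrabbit code,
and whether PostgreSQL accepts every resulting statement has not been
verified here.

```java
public class SchemaPrefixSketch {

    /**
     * Hypothetical helper: builds an object name the way a prefix-based
     * persistence manager is assumed to, by plain string concatenation.
     */
    public static String qualify(String schemaObjectPrefix, String baseName) {
        return schemaObjectPrefix + baseName;
    }

    public static void main(String[] args) {
        // A prefix ending in a dot turns the table name into a
        // schema-qualified name.
        System.out.println(qualify("schemaA.", "NODE"));

        // An ordinary prefix only changes the table name itself.
        System.out.println(qualify("TEST_", "NODE"));
    }
}
```

With an ordinary prefix such as "TEST_" the same concatenation only renames
the table, which matches the behavior described in the question.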

> I am considering building a PostgresPersistenceManager by extending
> SimpleDBPersistenceManager that will use LargeObjects instead of bytea.
> Has anybody tried using LargeObjects with Jackrabbit instead of bytea?
> Is there a reason this approach won't work?  Why did Jackrabbit use bytea
> to begin with?

Using LargeObjects seems reasonable with PostgreSQL, though I really
consider it a deficiency of PostgreSQL that a custom mechanism is
needed for efficient handling of binary columns.

The database persistence manager was originally written for databases
where binary columns can be used efficiently, and using bytea is the easiest
way to make the default implementation work with PostgreSQL. A custom
implementation that uses LargeObjects would certainly be possible, see
for example the OraclePersistenceManager that also uses custom
processing for blob fields.
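
To illustrate why a chunked API can avoid the heap growth described in the
question, here is a sketch in plain Java that copies a stream through a
fixed-size buffer, so memory use depends on the buffer size rather than on
the blob size. It uses only java.io; the class name ChunkedBlobCopy and the
8192-byte chunk size are assumptions for illustration, and the OutputStream
merely stands in for whatever large-object write call a custom persistence
manager would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedBlobCopy {

    /**
     * Copies everything from in to out through one reusable buffer of
     * chunkSize bytes and returns the number of bytes copied.
     */
    public static long copy(InputStream in, OutputStream out, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        // Only chunkSize bytes are held by this method at any time,
        // regardless of how large the source stream is.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams are used here only to keep the demo runnable;
        // a real blob would come from a file or a repository stream.
        byte[] blob = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(blob), sink, 8192);
        System.out.println("copied " + copied + " bytes");
    }
}
```

A persistence manager built on this pattern would pass each chunk to the
database as it is read instead of materializing the whole blob as a single
byte array first.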

BR,

Jukka Zitting