Posted to users@jackrabbit.apache.org by lujie <lj...@126.com> on 2009/01/06 12:54:31 UTC

Extends SharedItemstateManager for special application?

Hi,
   I have seen many issues related to Jackrabbit's in-memory caches: for
example, only one Jackrabbit instance can access the repository, cache
synchronization, concurrent reads, concurrent writes, ISMLocking issues, etc.
  Can I extend the SharedItemStateManager? My application uses the
Spring Framework and Hibernate, and I think putting the cache into
Hibernate might avoid many of these problems.
  Any ideas?

        regards 

                            lujie
-- 
View this message in context: http://www.nabble.com/Extends-SharedItemstateManager-for-special-application--tp21309023p21309023.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Extends SharedItemstateManager for special application?

Posted by Thomas Müller <th...@day.com>.
Hi,

> only one jackrabbit can access the repository, cache
> synchronization, concurrent read, concurrent write, ismlocking issue, etc.


Using a different cache mechanism will not change all that. The cache is
only one part of the architecture of Jackrabbit.


>  Can i extends the SharedItemStateManager, my application use
> springframework and hibernate, i think maybe putting the cache into
> hibernate can avoid so much problems.
>  any ideas?


Sorry, I don't think that changing the cache mechanism will help.

Regards,
Thomas

Re: Jackrabbit RMI Performance

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Jan 9, 2009 at 8:59 AM, Angela Schreiber <an...@day.com> wrote:
>> - using the SPI interface, which is kind of a "batch-optimized" and
>> smaller API compared to JCR, but provides the same feature set
>
> just for clarification: the SPI is an internal layer
> and not meant to be used *instead* of the JCR API.

Oh, yes, you are right, sorry for any confusion.

It is very useful in the long run to stick with the JCR API and not
use workarounds (like direct access to the files in the datastore,
accessing the db directly, using a "similar" API like SPI, etc.),
because JCR is a good standard and it keeps your source independent
from the application.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Jackrabbit RMI Performance

Posted by Angela Schreiber <an...@day.com>.
> - using the SPI interface, which is kind of a "batch-optimized" and
> smaller API compared to JCR, but provides the same feature set

just for clarification: the SPI is an internal layer
and not meant to be used *instead* of the JCR API.

if you are looking for JCR in a remoted environment
and decide to take advantage of the SPI you are
still encouraged to use the JCR API. the difference is
that you will then use the jackrabbit-jcr2spi JCR
implementation instead of jackrabbit-core.

nevertheless:
i will put an extended version of the spi2dav into
the sandbox as soon as i get along (actually i promised
to have that done in january). it uses json blocks
for batch-read and batch-write... anyway... i will write
some lines when i'm done...

angela

Re: Jackrabbit RMI Performance

Posted by Alexander Klimetschek <ak...@day.com>.
On Wed, Jan 7, 2009 at 4:09 PM, Kurz Wolfgang <wo...@gwvs.de> wrote:
> I started working with Jackrabbit a while back and it seems really cool but atm I am using the RMI support which seems really slow.
>
> When I do a query the Query takes like 90 ms which is fine but then I fill my objects with the node properties which takes like forever.

Yes, RMI is slow over the network since each call is a network
roundtrip. It is just the simplest way to enable a complete JCR
remoting protocol.
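A rough back-of-the-envelope sketch shows why this adds up. The node count, the properties per node, the 1 ms round-trip latency and the class name are all made-up assumptions, not measurements, and transfer time is ignored:

```java
// Back-of-the-envelope estimate; all numbers below are hypothetical.
final class RoundTripEstimate {

    // Total time when every remote call pays one network round trip.
    static double totalMillis(long remoteCalls, double roundTripMillis) {
        return remoteCalls * roundTripMillis;
    }

    public static void main(String[] args) {
        long nodes = 200;              // assumed query result size
        long propertiesPerNode = 10;   // assumed properties read per node
        double roundTripMillis = 1.0;  // assumed LAN round-trip latency

        // One call per property, as with a call-by-call remoting layer.
        double perCall = totalMillis(nodes * propertiesPerNode, roundTripMillis);
        // One call that returns everything, as with a batched protocol.
        double batched = totalMillis(1, roundTripMillis);

        System.out.println("per-property calls: " + perCall + " ms");
        System.out.println("single batched call: " + batched + " ms");
    }
}
```

Under these assumptions the time spent waiting on the network grows linearly with the number of properties read, which is why the query itself can be fast while filling the objects is slow.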

> Someone told me that clustering would be a way to have Jackrabbit available to all the applications and use JNDI to access the Repository but I have like no idea if that would help:-)

Clustering will definitely help if you have many read operations,
since the contents of the repository will be replicated on each
cluster node, i.e. server. That includes the search as well. Heavy
writing to a cluster might have a small impact on performance,
depending on the underlying database that does the data clustering
(see http://wiki.apache.org/jackrabbit/Clustering for the requirements
for clustering).

> Anyone have some Hints performance wise?

Depending on what the application should do, there are some other
remoting options:

- using webdav (http://jackrabbit.apache.org/jackrabbit-jcr-server.html)
- using the http interface that Apache Sling puts on top of JCR
(http://incubator.apache.org/sling/)
- using the SPI interface, which is kind of a "batch-optimized" and
smaller API compared to JCR, but provides the same feature set (see
SPI components on
http://jackrabbit.apache.org/jackrabbit-components.html)

SPI is a good generic solution, but not 100% polished for remoting yet
(it allows remoting via WebDAV and via RMI, but this time on the
optimized SPI interfaces, which should give far fewer network
round trips).

Pure webdav works for many document-level cases, but webdav clients in
Java are not so good (AFAIK). Sling offers multiple ways to access the
JCR via HTTP using a JSON format and webdav as well. You can easily
extend Sling to build optimized remoting calls specific for your
application.

Regards,
Alex


--
Alexander Klimetschek
alexander.klimetschek@day.com

Re: JackRabbit1.5 PersistenceManager and DataStore

Posted by Thomas Müller <th...@day.com>.
Hi,

> Persistence Manager class-->
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager

I didn't test it with IBM DB2 so I'm not sure.

> With one of the attributes
> <param name="minBlobSize" value="4096"/>

This is not necessary if you use the DataStore.

> <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
>  <param name="path" value="${rep.home}/repository/datastore"/>
>  <param name="minRecordLength" value="100"/>
> </DataStore>

Neither parameter is required; I would not define them. See also
http://wiki.apache.org/jackrabbit/DataStore (recently updated) "All
configuration options are optional".

> Is my assumption correct? I have read the wiki, but what's the real
> difference between "minBlobSize" and "minRecordLength", how do they differ?

I believe they don't actually differ much technically, only that one
is for the BLOB Store and the other for the DataStore.
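
Roughly, both settings act as a size threshold: smaller values stay inline, larger ones go to the external store. A simplified sketch of that idea follows. The class and method names are made up, this is not the actual Jackrabbit code, and whether the real comparison is "<" or "<=" is an assumption:

```java
// Simplified model of a size threshold; not Jackrabbit's actual code.
final class BinaryPlacement {

    enum Location { INLINE, EXTERNAL }

    // Values shorter than the threshold stay inline with the node data;
    // everything else goes to the external store (BLOB store or DataStore).
    // Whether the real comparison is < or <= is an assumption here.
    static Location place(long lengthInBytes, long minLength) {
        return lengthInBytes < minLength ? Location.INLINE : Location.EXTERNAL;
    }

    public static void main(String[] args) {
        long minRecordLength = 100;  // value from the configuration above
        System.out.println(place(40, minRecordLength));         // a short value
        System.out.println(place(10 * 1024, minRecordLength));  // a 10 KB PDF
    }
}
```

With PDFs of 10 KB and more, every document is far above either threshold, so the exact value matters little in this case.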

> With this configuration above, will all the time meta data properties will
> be saved in the database and the PDF's (stream objects) on the file system?

Yes.

Regards,
Thomas

JackRabbit1.5 PersistenceManager and DataStore

Posted by Vijay Pandey <VP...@mdes.ms.gov>.
Hi,

Our project has been using Jackrabbit 1.0.1, and we would now like to migrate to
Jackrabbit 1.5. I have a few questions about the DataStore in Jackrabbit.

Current 1.0.1 configuration:

Persistence Manager class--> SimpleDbPersistenceManager (
externalBlobs=true, database is DB2). In the repository we basically
store PDF documents (stream objects) ranging in size from 10 KB to around 100 KB,
and we associate around 10-14 properties (string and date types,
metadata) with each. With externalBlobs set to true, the actual PDFs are stored in
the file system (SAN).

<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
</FileSystem>

Now, with the Jackrabbit 1.5 configuration (with a DataStore, since we still want to
store the PDFs on the file system), will the following configuration
suffice?

Persistence Manager class-->
org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager

With one of the attributes
<param name="minBlobSize" value="4096"/> 

<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <param name="path" value="${rep.home}/repository/datastore"/>
  <param name="minRecordLength" value="100"/>
</DataStore>

So if we use the DataStore, the externalBlobs property will never be used (as we
have set up the DataStore), and since
the DataStore is a FileDataStore, the PDFs (streams) will be stored on
the file system.

Is my assumption correct? I have read the wiki, but what is the real
difference between "minBlobSize" and "minRecordLength"?
With the configuration above, will the metadata properties always
be saved in the database and the PDFs (stream objects) on the file system?

Thanks
Vijay


Re: Jackrabbit RMI Performance

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Wed, Jan 7, 2009 at 4:09 PM, Kurz Wolfgang <wo...@gwvs.de> wrote:
> ...When I do a query the Query takes like 90 ms which is fine but then I fill my objects
> with the node properties which takes like forever....

As Alex says, the Sling http/json interface would very probably be
much faster for this - basically you make one HTTP request to a URL
ending in ".query.json", with the query parameters (language,
statement, paging params), and you get back JSON data with all node
values. See [1] for examples.
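
A minimal sketch of building such a URL is below. The host, the content path and the class name are hypothetical, and the parameter names (queryType, statement, offset, rows) are an assumption from memory that should be checked against the test in [1]:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: parameter names are assumptions, verify them against [1].
final class SlingQueryUrl {

    // Builds "<base><path>.query.json?..." with URL-encoded query parameters.
    static String build(String baseUrl, String path, String queryType,
                        String statement, int offset, int rows) {
        return baseUrl + path + ".query.json"
                + "?queryType=" + encode(queryType)
                + "&statement=" + encode(statement)
                + "&offset=" + offset
                + "&rows=" + rows;
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical local Sling instance and content path.
        System.out.println(build("http://localhost:8080", "/content",
                "xpath", "//*", 0, 10));
    }
}
```

The point is that a single GET on such a URL replaces the query call plus one remote call per property.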

If you want to try this, see [2] for how to get started with Sling.
IIRC the current Sling release does not include this query feature, so
you'd need to build the current Sling trunk and use the launchpad/app
module for example.

> ...I have the Jackrabbit Webapplication running as a Server providing the RMI support.
> I am using the same application on more than one server so I thought having the
> Jackrabbit Service on a own machine would be a good idea. But now I am not so sure anymore...

That would work fine using the Sling http/json interface, assuming it
implements all the JCR operations that you need.

> ...Someone told me that clustering would be a way to have Jackrabbit available to all
> the applications and use JNDI to access the Repository but I have like no idea if
> that would help:-)...

With JNDI people usually mean accessing a single Repository object
from several webapps, this works well but all webapps must run in the
same JVM.

-Bertrand

[1] http://svn.apache.org/repos/asf/incubator/sling/trunk/launchpad/testing/src/test/java/org/apache/sling/launchpad/webapp/integrationtest/JsonQueryServletTest.java

[2] http://incubator.apache.org/sling/site/discover-sling-in-15-minutes.html

Jackrabbit RMI Performance

Posted by Kurz Wolfgang <wo...@gwvs.de>.
Hello everyone,

I started working with Jackrabbit a while back and it seems really cool, but at the moment I am using the RMI support, which seems really slow.

When I run a query, the query itself takes about 90 ms, which is fine, but then I fill my objects with the node properties, which takes forever.

I have the Jackrabbit web application running as a server providing the RMI support.
I am using the same application on more than one server, so I thought having the Jackrabbit service on its own machine would be a good idea. But now I am not so sure anymore.

Someone told me that clustering would be a way to have Jackrabbit available to all the applications, using JNDI to access the repository, but I have no idea if that would help :-)

Does anyone have some performance hints?

Thanks a lot in advance!




Re: Extends SharedItemstateManager for special application?

Posted by lujie <lj...@126.com>.
Hi,
   these are fragments from the post
http://www.nabble.com/JackRabbit-Caching%3A-BundleCache-vs-ItemManager-vs-CacheManager-td18190879.html#a18211906:
   To summarise what we're seeing, the potential bottlenecks we think we're
seeing, and how we worked around them. Please note I'm not 100% familiar with the
JackRabbit design so some conclusions may be wrong:

 1) application uses Session to read a Node Property
 2) SessionImpl delegates to ItemManager
 3) ItemManager synch on a itemCache (Contention Point 1: Session Wide)
 4) On cache miss, ItemManager ultimately delegates to an SISM
 5) SISM synchs on ISMLocking (Contention Point 2: Global or per item
depending on DefaultISM or FineGrainedISM implementation)
 6) On cache miss, SISM delegates to persistence manager
 7) AbstractBundlePersistenceManager synchs on itself (Contention Point 3:
On persistence Manager)
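
The chain above can be pictured with a toy model. The class names are hypothetical and this is not Jackrabbit code: each layer simply holds one monitor while it reads, which is a simplification of the contention points listed in the post:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of the read path described above; not Jackrabbit code.
final class LayeredRead {

    // One cache layer guarded by a single monitor. On a miss it delegates
    // to the next layer while still holding its own lock.
    static final class Layer {
        private final Map<String, String> cache = new HashMap<>();
        private final Function<String, String> next;
        int misses;

        Layer(Function<String, String> next) {
            this.next = next;
        }

        synchronized String read(String id) {
            String value = cache.get(id);
            if (value == null) {
                misses++;
                value = next.apply(id);  // falls through to the layer below
                cache.put(id, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        // The lambda stands in for the database behind the persistence manager.
        Layer persistence = new Layer(id -> "value-of-" + id);
        Layer shared = new Layer(persistence::read);   // stands in for the SISM
        Layer session = new Layer(shared::read);       // stands in for the ItemManager

        session.read("/a/prop");
        session.read("/a/prop");  // served by the top layer, lower locks untouched
        System.out.println("misses: session=" + session.misses
                + " shared=" + shared.misses
                + " persistence=" + persistence.misses);
    }
}
```

In this model a hit in an upper layer never touches the locks below it, which matches the observation that a larger cache moved the bottleneck from one contention point to the next.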

In some cases our web application will read 2,000 or 3,000 Node properties
to deliver a single page request.

Initially we saw 7) as a bottleneck:
 - can JackRabbit leverage multiple database connections if its synched on a
single persistence manager?
 - we resolved this by configuring a large BundleCache

We then saw 5) as a bottleneck:
 - it seems as each node property is an item every property read contends on
ISMLocking. Is that correct? Is there scope for reading properties/lazy
loading in bulk for item?
 - we partly resolved this by moving from an "pooled session per view"
pattern to a "shared session per view" pattern

We now see contention occasionally on 3). 

I see just three synchronized sections:
1. synchronized(cache) in SharedItemStateManager
2. synchronized ISMLocking
3. synchronized in the persistence manager

All of them exist to keep Jackrabbit's cache consistent.
If I do not want to use Jackrabbit's cache, then the three synchronized sections can
be made non-synchronized, and ISMLocking is no longer a must.
Any ideas?
    regards.

                  lujie
-- 
View this message in context: http://www.nabble.com/Extends-SharedItemstateManager-for-special-application--tp21309023p21309717.html