Posted to dev@jackrabbit.apache.org by "Martijn Hendriks (JIRA)" <ji...@apache.org> on 2009/11/25 08:36:39 UTC

[jira] Created: (JCR-2407) Make the disk space used by cached binary properties configurable

Make the disk space used by cached binary properties configurable
-----------------------------------------------------------------

                 Key: JCR-2407
                 URL: https://issues.apache.org/jira/browse/JCR-2407
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core
    Affects Versions: 2.0-beta1
            Reporter: Martijn Hendriks


Binary properties which are in Jackrabbit's caches (e.g. in the SharedItemStateManager) are stored on disk in the temp dir. This can cause problems on small temporary file systems, as Jackrabbit does not limit the size of the binary properties on disk. There is one way to influence this indirectly: make the Jackrabbit cache sizes smaller (via the CacheManager). It would be helpful in some cases if an upper bound on the disk usage could be configured.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782373#action_12782373 ] 

Thomas Mueller commented on JCR-2407:
-------------------------------------

So you are using a blob store? Jackrabbit 2.0 uses a data store by default, see also http://wiki.apache.org/jackrabbit/DataStore

Did you try using a data store? I believe the problem doesn't apply when using a file data store. The database data store does create some temporary files, but only if and when you actually read from the stream, and until the stream is fully read.
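
To illustrate the "temporary file only until the stream is fully read" behaviour described above, here is a minimal sketch of a stream that removes its backing temp file on end of stream or on close. The class name SelfDeletingFileInputStream is hypothetical, and this is not Jackrabbit's actual implementation, only one way such a stream could look.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;

/** Hypothetical stream that deletes its backing temp file once it is fully read or closed. */
final class SelfDeletingFileInputStream extends FilterInputStream {
    private final File file;
    private boolean closed;

    SelfDeletingFileInputStream(File file) throws IOException {
        super(new FileInputStream(file));
        this.file = file;
    }

    @Override
    public int read() throws IOException {
        if (closed) {
            return -1;
        }
        int b = super.read();
        if (b < 0) {
            close(); // end of stream reached: release the temp file right away
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (closed) {
            return -1;
        }
        int n = super.read(buf, off, len);
        if (n < 0) {
            close(); // end of stream reached: release the temp file right away
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        try {
            super.close();
        } finally {
            file.delete(); // the temp file lives no longer than the stream
        }
    }
}
```

With this approach the disk usage is bounded by the streams that are currently open, not by what a cache happens to hold.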

I don't think the problem is related to database connection pooling.


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Stefan Guggisberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782345#action_12782345 ] 

Stefan Guggisberg commented on JCR-2407:
----------------------------------------

deletion of the temp files is triggered by gc. rather than limiting the disk space we should investigate why the temp files aren't collected.
maybe there are places where the reference is not cleared. 

i assume the problem only occurs in certain configurations where the binary values are stored in a db (pm or data store).

it might also be related to the size of the temp files (smaller binary values should be kept entirely in memory).
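
A minimal sketch of the "small values in memory, large values in a temp file" idea mentioned above. The class SpooledBinary and the threshold parameter maxInMemory are hypothetical names chosen for this example; they are not taken from the Jackrabbit code base. It assumes Java 11 or later (for InputStream.readNBytes) and a threshold below Integer.MAX_VALUE.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

/** Hypothetical holder: small values stay in memory, larger ones are spooled to a temp file. */
final class SpooledBinary {
    private final byte[] inMemory; // non-null for small values
    private final File tempFile;   // non-null for large values

    private SpooledBinary(byte[] inMemory, File tempFile) {
        this.inMemory = inMemory;
        this.tempFile = tempFile;
    }

    static SpooledBinary spool(InputStream in, int maxInMemory) throws IOException {
        // Read one byte more than the threshold to find out whether the value fits.
        byte[] head = in.readNBytes(maxInMemory + 1);
        if (head.length <= maxInMemory) {
            return new SpooledBinary(head, null);
        }
        File file = File.createTempFile("bin", ".tmp");
        file.deleteOnExit();
        try (OutputStream out = Files.newOutputStream(file.toPath())) {
            out.write(head);    // bytes already consumed from the source
            in.transferTo(out); // remainder of the source stream
        }
        return new SpooledBinary(null, file);
    }

    boolean isInMemory() {
        return inMemory != null;
    }

    /** Bytes currently held on disk; 0 for in-memory values or after discard(). */
    long diskFootprint() {
        return tempFile == null ? 0 : tempFile.length();
    }

    InputStream getStream() throws IOException {
        return inMemory != null
                ? new ByteArrayInputStream(inMemory)
                : Files.newInputStream(tempFile.toPath());
    }

    void discard() {
        if (tempFile != null) {
            tempFile.delete();
        }
    }
}
```

A higher threshold trades heap for fewer temp files, so the value would have to be tuned against the configured cache memory.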


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Martijn Hendriks (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782349#action_12782349 ] 

Martijn Hendriks commented on JCR-2407:
---------------------------------------

We've had problems with this once or twice before, and as far as I could see the cleaning up by GC and shut-down hook worked fine. The issue was that the SharedItemStateManager just kept a lot of binary properties in its cache whose contents were stored on disk in a small filesystem (via BlobInTempFile instances). We usually use configurations where the blobs are stored through the PM in the database.
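
To make the interaction between GC-based cleanup and the cache concrete, here is a minimal sketch using java.lang.ref.Cleaner (Java 9+). GcTrackedTempFile is a hypothetical class for illustration; it is an assumption about how GC-triggered deletion can be wired up, not a copy of what BlobInTempFile does.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ref.Cleaner;

/** Hypothetical temp-file handle whose file is deleted once the handle is garbage collected. */
final class GcTrackedTempFile {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must not reference the GcTrackedTempFile itself,
    // otherwise the handle would never become unreachable.
    private static final class DeleteAction implements Runnable {
        private final File file;

        DeleteAction(File file) {
            this.file = file;
        }

        @Override
        public void run() {
            file.delete();
        }
    }

    private final File file;
    private final Cleaner.Cleanable cleanable;

    GcTrackedTempFile() throws IOException {
        file = File.createTempFile("bin", ".tmp");
        cleanable = CLEANER.register(this, new DeleteAction(file));
    }

    File getFile() {
        return file;
    }

    /** Explicit early cleanup; without it the file is deleted some time after GC. */
    void discard() {
        cleanable.clean();
    }
}
```

As long as a cache holds a strong reference to such a handle, the cleanup action never runs, which matches the behaviour described above: the GC cleanup works, but the files stay on disk for as long as the entries stay cached.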



[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Martijn Hendriks (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782831#action_12782831 ] 

Martijn Hendriks commented on JCR-2407:
---------------------------------------

After having a look at the code I also think that this problem is specific to deployments that use a BlobStore and not the DataStore. The DbDataStore seems to cache nothing on the local file system (except when copyWhileReading is true, but even then the temp file is deleted after closing the stream). So repeated reads just stream the blob every time from the database? If so, isn't this a performance issue?

I can clearly see the performance advantages of caching the DB blob in the temp dir. Can we add information about the disk usage to the Cache interface, similar to the getMemoryUsed method? The PropertyState class should then also get a method similar to calculateMemoryFootprint for the disk footprint. The CacheManager and Cache implementations can then use this additional information to evict binary properties if necessary. This might fit in quite cleanly in the current design :)
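
A rough sketch of what such disk-aware eviction could look like. DiskBoundedCache, Item, diskFootprint, getDiskUsed and maxDiskBytes are all hypothetical names for this example and do not exist in Jackrabbit; the sketch also ignores the fact that the SharedItemStateManager cache plays a role in isolation, so it only shows the bookkeeping.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that bounds the total disk footprint of its entries. */
final class DiskBoundedCache<K> {

    /** Hypothetical stand-in for a cached item that may be backed by a temp file. */
    interface Item {
        long diskFootprint(); // bytes held on disk, 0 if purely in memory
        void discard();       // release the disk space
    }

    private final long maxDiskBytes;
    private long diskUsed;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, Item> entries = new LinkedHashMap<>(16, 0.75f, true);

    DiskBoundedCache(long maxDiskBytes) {
        this.maxDiskBytes = maxDiskBytes;
    }

    /** Disk counterpart of a memory usage figure. */
    long getDiskUsed() {
        return diskUsed;
    }

    Item get(K key) {
        return entries.get(key);
    }

    void put(K key, Item item) {
        Item old = entries.put(key, item);
        if (old != null) {
            diskUsed -= old.diskFootprint();
            if (old != item) {
                old.discard();
            }
        }
        diskUsed += item.diskFootprint();

        // Evict least recently used entries until the bound holds again.
        // The entry just added is the most recent one and is never evicted here.
        Iterator<Map.Entry<K, Item>> it = entries.entrySet().iterator();
        while (diskUsed > maxDiskBytes && entries.size() > 1 && it.hasNext()) {
            Item eldest = it.next().getValue();
            diskUsed -= eldest.diskFootprint();
            it.remove();
            eldest.discard();
        }
    }
}
```

The sketch assumes that an item's disk footprint does not change while it is cached; a single item larger than the bound is still admitted, so the limit is a soft one.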


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Julien Poffet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782375#action_12782375 ] 

Julien Poffet commented on JCR-2407:
------------------------------------

My source repository wasn't configured with a data store, so all my files are stored as blobs in my database. What I'm doing now is extracting all the files from this configuration into a new one that is set up with a data store.

Maybe the streams aren't closed correctly at read time?


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Martijn Hendriks (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790699#action_12790699 ] 

Martijn Hendriks commented on JCR-2407:
---------------------------------------

Another thing is that the AbstractBundlePersistenceManager has a cache which is not managed by the CacheManager. If the BlobStore is used, then binary properties in the bundle cache are stored in the temp file system.


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Julien Poffet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782369#action_12782369 ] 

Julien Poffet commented on JCR-2407:
------------------------------------

Following up on Stefan's comment, I think it would be nice to have an option to read the blobs on demand. In my situation I do not really care about performance. I'm just making a full scan of a repository through WebDAV to read the content and re-import it into a new repository. My source repository is about 60 GB, so I don't want 60 GB of blobs cached in the temporary directory... As it is now, all the blobs are spooled to the local file system but deleted only when I stop Tomcat. I tried to tune the cache manager, with no better result...



[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Stefan Guggisberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782361#action_12782361 ] 

Stefan Guggisberg commented on JCR-2407:
----------------------------------------

> [...] where the blobs are stored through the PM in the database.

this explains it. blobs read from a db are spooled to the local file system for performance reasons, that's by design.

we could e.g. change the caching behavior of the SharedItemStateManager to not 'cache' PropertyState instances with temp-file based binary values. but that seems a bit ugly.

another option would be to read the binary value on demand. this would however impact performance and would require tying the Value implementation directly to the persistence layer. we deliberately avoided this in the past since that would significantly compromise the current design.
 
BTW: please note that the SharedItemStateManager 'cache' is not just a cache in the traditional sense but also an integral part of jackrabbit's isolation level support (read committed).



[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782380#action_12782380 ] 

Jukka Zitting commented on JCR-2407:
------------------------------------

> I don't think the problem is related to database connection pooling.

The fact that we now use only a single database connection is the core reason why we need to pull the full binary already during the PersistenceManager.load() call instead of streaming it to the client directly from the database.

We can sort out why the temp files are not being reclaimed earlier without involving connection pools, but we do need them if we want to avoid the temp files entirely.



[jira] Updated: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Julien Poffet (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Poffet updated JCR-2407:
-------------------------------

    Attachment: workspace.xml
                repository.xml


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Stefan Guggisberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782400#action_12782400 ] 

Stefan Guggisberg commented on JCR-2407:
----------------------------------------

> The fact that we now use only a single database connection is the core reason for why we need to pull the full binary already during the PersistenceManager.load() call instead of streaming it to the client directly from the database.

no, the main reason was to allow repeated reads without requiring server round-trips... performance of blob handling in db's is usually very slooooooow (at least in my experience).



[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782402#action_12782402 ] 

Thomas Mueller commented on JCR-2407:
-------------------------------------

> The fact that we now use only a single database connection is the core

From what I heard, the main reason for the temp file was to detach the SharedItemStateManager from the persistence manager / blob store.

Again, the problem only occurs when using the blob store (unless I'm wrong). We want to deprecate the blob store anyway, right? So would it make sense to change the blob store implementation once database connection pooling is implemented?


[jira] Commented: (JCR-2407) Make the disk space used by cached binary properties configurable

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782348#action_12782348 ] 

Thomas Mueller commented on JCR-2407:
-------------------------------------

This sounds more like a bug than like an improvement. 

In any case, a reproducible test case would help a lot. Also, we would need to know the configuration (repository.xml and workspace.xml) and the Jackrabbit version where it occurs.
