Posted to dev@jackrabbit.apache.org by "Martin Böttcher (Created JIRA)" <ji...@apache.org> on 2011/10/11 10:28:29 UTC

[jira] [Created] (JCR-3107) Speed up hierarchy cache initialization

Speed up hierarchy cache initialization
---------------------------------------

                 Key: JCR-3107
                 URL: https://issues.apache.org/jira/browse/JCR-3107
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core
            Reporter: Martin Böttcher


Initializing a workspace can take a long time when there is a large number of nodes and search indexes are involved. The reason is that the CachingIndexReader is set up in chunks of a fixed size (currently 400K) to reduce the memory footprint. As soon as the number of documents exceeds this limit, some operations (namely traversing complete indexes) are performed again and again.

It seems that the current algorithm "initializeParents" in the CachingIndexReader class can't be optimized without increasing memory consumption. A promising approach is therefore to persist the "state" of this class (namely its main member array and map) and reload it on startup.

The "load" of the state can be done implicitly in the initialization phase of the cache. The right point in time to call the "save" operation is much less obvious. I tried the "doClose" method of the class and it seems sufficient.
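
The persist-and-reload idea can be sketched roughly as follows. This is only an illustration under assumptions, not the attached patch: the class name ParentCacheState, its fields, and the file layout are hypothetical, and the real CachingIndexReader members may look different. The sketch writes an int array and a map to a file and reads them back, returning null when the file is missing or does not match the expected document count, so that a caller could fall back to the regular initialization.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the cached hierarchy state; not the Jackrabbit class. */
final class ParentCacheState {
    /** Parent document number for each document, -1 if unknown (assumed layout). */
    final int[] parents;
    /** Document number -> id of a parent living in another index (assumed layout). */
    final Map<Integer, String> foreign;

    ParentCacheState(int[] parents, Map<Integer, String> foreign) {
        this.parents = parents;
        this.foreign = foreign;
    }

    /** Writes the array and the map to the given file. */
    void save(File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(parents.length);
            for (int p : parents) {
                out.writeInt(p);
            }
            out.writeInt(foreign.size());
            for (Map.Entry<Integer, String> e : foreign.entrySet()) {
                out.writeInt(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    /**
     * Reads a previously saved state. Returns null if the file is missing or
     * was written for a different number of documents, so the caller can
     * fall back to the normal (slow) initialization.
     */
    static ParentCacheState load(File file, int expectedDocs) throws IOException {
        if (!file.isFile()) {
            return null;
        }
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int n = in.readInt();
            if (n != expectedDocs) {
                return null;
            }
            int[] parents = new int[n];
            for (int i = 0; i < n; i++) {
                parents[i] = in.readInt();
            }
            int m = in.readInt();
            Map<Integer, String> foreign = new HashMap<>();
            for (int i = 0; i < m; i++) {
                int doc = in.readInt();
                foreign.put(doc, in.readUTF());
            }
            return new ParentCacheState(parents, foreign);
        }
    }
}
```

In such a sketch, load would be tried first during cache initialization, and save would be called from wherever the "save" point ends up being.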

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Martin Böttcher (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Böttcher updated JCR-3107:
---------------------------------

    Attachment:     (was: JCR-3107.patch)
    
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Martin Böttcher
>

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Martin Böttcher (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Böttcher updated JCR-3107:
---------------------------------

    Attachment: JCR-3107.patch

proposed patch
                
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Martin Böttcher
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Stefan Guggisberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Guggisberg updated JCR-3107:
-----------------------------------

    Component/s: query
    
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>            Reporter: Martin Böttcher
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Resolved] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Alex Parvulescu (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Parvulescu resolved JCR-3107.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.3.2

good work, thanks!
applied the patch with some modifications

fixed in revision 1064058
                
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>            Reporter: Martin Böttcher
>             Fix For: 2.3.2
>
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Martin Böttcher (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Böttcher updated JCR-3107:
---------------------------------

    Attachment: JCR-3107.patch

new proposed patch
                
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Martin Böttcher
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Commented] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Alex Parvulescu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126541#comment-13126541 ] 

Alex Parvulescu commented on JCR-3107:
--------------------------------------

added some logs in revision 1182824
                
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>            Reporter: Martin Böttcher
>             Fix For: 2.3.2
>
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Julian Reschke (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Reschke updated JCR-3107:
--------------------------------

    Fix Version/s: 2.2.11
    
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>            Reporter: Martin Böttcher
>             Fix For: 2.2.11, 2.3.2
>
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Martin Böttcher (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Böttcher updated JCR-3107:
---------------------------------

    Status: Patch Available  (was: Open)

proposed patch
                
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Martin Böttcher
>

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Martin Böttcher (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Böttcher updated JCR-3107:
---------------------------------

    Status: Open  (was: Patch Available)

revoked due to results of discussion with Marcel
                
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Martin Böttcher
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Issue Comment Edited] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Alex Parvulescu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126474#comment-13126474 ] 

Alex Parvulescu edited comment on JCR-3107 at 10/13/11 1:02 PM:
----------------------------------------------------------------

good work, thanks!
applied the patch with some modifications

fixed in revision 1182761

(edit: corrected the revision number)
                
      was (Author: alex.parvulescu):
    good work, thanks!
applied the patch with some modifications

fixed in revision 1064058
                  
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>            Reporter: Martin Böttcher
>             Fix For: 2.3.2
>
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Commented] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Martin Böttcher (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125004#comment-13125004 ] 

Martin Böttcher commented on JCR-3107:
--------------------------------------

Writing the array every time the "doClose" method is invoked may cause runtime issues, because the array can be many megabytes in size. If the file is touched every time (even if its contents haven't changed), a backup mechanism will need to handle this persistence file again and again.

Therefore it seems better to write the file only once, as soon as the initializer has finished.
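
A minimal sketch of the "write only once" idea, under assumptions: the class WriteOnceStateFile and its method names are hypothetical and not taken from the patch. The state is written when the initializer reports completion, and the close path deliberately never touches the file, so its modification time stays stable for backup tools.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper that persists the cache state at most once. */
final class WriteOnceStateFile {
    private final Path file;
    private final AtomicBoolean written = new AtomicBoolean(false);

    WriteOnceStateFile(Path file) {
        this.file = file;
    }

    /**
     * Intended to be called when the initializer has finished.
     * Returns true if this call wrote the file, false if it was already written.
     */
    boolean initializerFinished(byte[] state) throws IOException {
        if (!written.compareAndSet(false, true)) {
            return false; // already persisted, leave the file untouched
        }
        try {
            Files.write(file, state);
        } catch (IOException e) {
            written.set(false); // allow a later retry after a failed write
            throw e;
        }
        return true;
    }

    /** Close path: intentionally does not write, so the file is not touched again. */
    void close() {
        // nothing to persist here
    }
}
```

The AtomicBoolean guard is one possible design choice; it keeps the sketch safe if the completion callback were ever invoked from more than one thread.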
                
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Martin Böttcher
>         Attachments: JCR-3107.patch
>
>

       

[jira] [Updated] (JCR-3107) Speed up hierarchy cache initialization

Posted by "Martin Böttcher (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Böttcher updated JCR-3107:
---------------------------------

    Comment: was deleted

(was: proposed patch)
    
> Speed up hierarchy cache initialization
> ---------------------------------------
>
>                 Key: JCR-3107
>                 URL: https://issues.apache.org/jira/browse/JCR-3107
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Martin Böttcher
>