Posted to dev@jackrabbit.apache.org by "Christoph Kiehl (JIRA)" <ji...@apache.org> on 2007/07/30 17:17:52 UTC

[jira] Created: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Avoid using BitSets in ChildAxisQuery to minimize memory usage
--------------------------------------------------------------

                 Key: JCR-1041
                 URL: https://issues.apache.org/jira/browse/JCR-1041
             Project: Jackrabbit
          Issue Type: Improvement
          Components: query
    Affects Versions: 1.3
            Reporter: Christoph Kiehl
            Assignee: Christoph Kiehl


When running ChildAxisQueries on large indexes, the internal BitSet instance (hits) may consume a lot of memory because the BitSet is always as large as IndexReader.maxDoc(). In our case we had a query consisting of 7 ChildAxisQueries, which combined to a total of 14MB. Since we have multiple users executing this query simultaneously, this caused an out-of-memory error.
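
To put the numbers above in perspective: a java.util.BitSet needs one bit per addressable document, so its footprint follows maxDoc() and not the number of hits. The sketch below is only illustrative. The index size of 16,000,000 documents is an assumption chosen so that seven BitSets come to roughly 14 MB, and it assumes one BitSet per ChildAxisQuery, which the report does not state explicitly. The class and method names are hypothetical.

```java
import java.util.BitSet;

class BitSetFootprint {
    /** Approximate payload bytes of a BitSet sized to address maxDoc document numbers. */
    static long bitSetBytes(int maxDoc) {
        // BitSet.size() reports the number of bits of backing storage in use.
        return new BitSet(maxDoc).size() / 8L;
    }

    public static void main(String[] args) {
        int maxDoc = 16_000_000; // assumed index size, not taken from the report
        long perQuery = bitSetBytes(maxDoc);
        System.out.println("one BitSet: " + perQuery + " bytes");     // 2000000
        System.out.println("7 BitSets:  " + 7 * perQuery + " bytes"); // 14000000
    }
}
```

The cost is the same whether a query matches one child node or a million, which is why concurrent executions add up so quickly.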

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518422 ] 

Jukka Zitting commented on JCR-1041:
------------------------------------

How about an alternative implementation that uses just a TreeSet of Integers instead of BitSets? A sparse set would scale with the number of query results instead of the size of the entire workspace.
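
As a rough illustration of the sparse-set idea, the sketch below stores only the matching document numbers in a TreeSet. The document numbers and the helper name nextHit are made up for the example; this is not code from Jackrabbit.

```java
import java.util.TreeSet;

class SparseHits {
    /** Returns the smallest hit >= target, or -1 if there is none. */
    static int nextHit(TreeSet<Integer> hits, int target) {
        Integer candidate = hits.ceiling(target);
        return candidate == null ? -1 : candidate;
    }

    public static void main(String[] args) {
        // Hypothetical document numbers matched in a large index.
        TreeSet<Integer> hits = new TreeSet<>();
        hits.add(9_500_000);
        hits.add(42);
        hits.add(1_200_345);

        // Storage grows with the number of hits (3 here), not with the index size.
        System.out.println(hits.size());
        // The set keeps hits in ascending order, so a forward lookup is cheap.
        System.out.println(nextHit(hits, 1_000_000));
    }
}
```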


[jira] Commented: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Posted by "Christoph Kiehl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518568 ] 

Christoph Kiehl commented on JCR-1041:
--------------------------------------

Well, according to various sources (http://martin.nobilitas.com/java/sizeof.html and http://www.javaworld.com/javaworld/javatips/jw-javatip130.html?page=2), an Integer instance needs 16 bytes, while a primitive int needs 4 bytes. This is why I would prefer an array of ints (and that is not even counting the TreeSet overhead: it is backed by a TreeMap, which wraps every Integer in an Entry instance).
The AdaptingHits class is only needed for corner cases where contextQuery result sets get really large. You could instead use ArrayHits directly to get what a TreeSet would achieve while using less memory.
But I thought we could use those Hits classes in a few other places as well, such as DescendantSelfAxisQuery, where large contextQuery results are much more common.
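
A minimal sketch of the int-array alternative follows. It is a hypothetical class written for illustration, not the ArrayHits class from the patch, whose API is not shown in this thread. With the figures quoted above, 100,000 hits would take about 400 KB as ints versus at least 1.6 MB as Integer objects, before any TreeMap entry overhead.

```java
import java.util.Arrays;

/** Hypothetical growable int-array hit list; not the ArrayHits class from the patch. */
class IntArrayHits {
    private int[] docs = new int[16];
    private int size = 0;
    private boolean sorted = true;

    /** Records a matching document number. */
    void add(int doc) {
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2); // doubling keeps appends amortized O(1)
        }
        if (size > 0 && doc < docs[size - 1]) {
            sorted = false; // out-of-order insert, sort lazily before the next read
        }
        docs[size++] = doc;
    }

    /** Returns the smallest recorded document number >= target, or -1 if none. */
    int skipTo(int target) {
        if (!sorted) {
            Arrays.sort(docs, 0, size);
            sorted = true;
        }
        int idx = Arrays.binarySearch(docs, 0, size, target);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first element greater than target
        }
        return idx < size ? docs[idx] : -1;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        IntArrayHits hits = new IntArrayHits();
        hits.add(30);
        hits.add(10);
        hits.add(20);
        System.out.println(hits.skipTo(15));
    }
}
```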


[jira] Updated: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-1041:
-------------------------------

          Component/s: jackrabbit-core
    Affects Version/s:     (was: 1.3)


[jira] Updated: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Posted by "Christoph Kiehl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Kiehl updated JCR-1041:
---------------------------------

    Attachment: avoid_using_bitsets.patch

This patch introduces a few new classes for the handling of hits. In essence this patch does three things:

1. It uses int arrays instead of a BitSet to reduce memory consumption. If a BitSet would be more efficient than an int array, it switches to using a BitSet instance.
2. The nameTestHits are created lazily by wrapping the nameTestScorer in a Hits instance.
3. The check whether the index of a child node is valid is only performed when the corresponding hit is requested.
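
Point 1 could be sketched roughly as follows. This is a hypothetical class written for illustration, not the AdaptingHits class from the patch. The switch-over rule (32 bits per stored int versus one bit per document in the index) is an assumption about when a BitSet becomes the cheaper form, and the linear contains() is a simplification.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical adaptive hit set; not the AdaptingHits class from the patch. */
class AdaptiveHits {
    private final int maxDoc;
    private int[] docs = new int[8];
    private int size = 0;
    private BitSet bits; // stays null while the int array is the cheaper form

    AdaptiveHits(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void add(int doc) {
        if (bits != null) {
            bits.set(doc);
            return;
        }
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2);
        }
        docs[size++] = doc;
        // Assumed rule: each stored int costs 32 bits, a BitSet costs maxDoc bits.
        if (32L * size > maxDoc) {
            bits = new BitSet(maxDoc);
            for (int i = 0; i < size; i++) {
                bits.set(docs[i]);
            }
            docs = null; // release the array once the BitSet has taken over
        }
    }

    boolean contains(int doc) {
        if (bits != null) {
            return bits.get(doc);
        }
        for (int i = 0; i < size; i++) { // linear scan keeps the sketch short
            if (docs[i] == doc) {
                return true;
            }
        }
        return false;
    }

    boolean usesBitSet() {
        return bits != null;
    }

    public static void main(String[] args) {
        AdaptiveHits hits = new AdaptiveHits(64);
        hits.add(1);
        hits.add(2);
        System.out.println(hits.usesBitSet());
        hits.add(3);
        System.out.println(hits.usesBitSet());
    }
}
```

With a small result set the structure stays an int array, so memory follows the number of hits. Only a dense result set pays the maxDoc-sized cost.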

In our internal tests the query execution performance didn't suffer, and it is much better for ChildAxisQueries with a large number of matching child nodes.


[jira] Resolved: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Posted by "Christoph Kiehl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Kiehl resolved JCR-1041.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4

Committed the suggested patch, with license headers added, in revision 566042.


[jira] Commented: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Posted by "Christoph Kiehl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524997 ] 

Christoph Kiehl commented on JCR-1041:
--------------------------------------

Fixed a little bug in ChildAxisScorer.skipTo() in revision 572885 as suggested by Ard Schrijvers in http://www.nabble.com/AIOOBE-in-ChildAxisScorer-in-jackrabbit-trunk-tf4376105.html


[jira] Commented: (JCR-1041) Avoid using BitSets in ChildAxisQuery to minimize memory usage

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519649 ] 

Marcel Reutegger commented on JCR-1041:
---------------------------------------

Very nice work.

Can you please add a copyright header to the new files? And then feel free to commit the changes.
