Posted to dev@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2010/03/01 16:37:05 UTC

[jira] Created: (JCR-2524) Reduce memory usage of DocIds

Reduce memory usage of DocIds
-----------------------------

                 Key: JCR-2524
                 URL: https://issues.apache.org/jira/browse/JCR-2524
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core
            Reporter: Marcel Reutegger
            Priority: Minor


Implementations of DocId are used to cache parent-child relations of nodes in the index. There are usually many duplicate objects, because a DocId instance identifies the parent of a node in the index; sibling nodes therefore all have DocIds with the same value. Currently a new DocId instance is created for each node. Caching the most recently used DocIds and reusing them might help to reduce memory usage. Furthermore, some DocIds could be represented with a short instead of an int when possible.
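A minimal sketch of the caching idea, assuming a hypothetical PlainDocId that wraps only a parent document number and a hypothetical DocIdCache built on LinkedHashMap's access order for LRU eviction. Neither class is the actual Jackrabbit code; they only illustrate how siblings that look up the same parent number could share one instance while it stays cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a DocId that refers to a parent by document number. */
final class PlainDocId {
    final int docNumber;

    PlainDocId(int docNumber) {
        this.docNumber = docNumber;
    }
}

/** Hypothetical LRU cache that hands out shared PlainDocId instances. */
final class DocIdCache {
    private final Map<Integer, PlainDocId> cache;

    DocIdCache(final int maxEntries) {
        // accessOrder = true: iteration order is least recently used first
        this.cache = new LinkedHashMap<Integer, PlainDocId>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, PlainDocId> eldest) {
                // evict the least recently used entry once the cache is over capacity
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached instance for docNumber, creating one only on a miss. */
    PlainDocId get(int docNumber) {
        return cache.computeIfAbsent(docNumber, PlainDocId::new);
    }
}
```

With this design, all children of one parent that are resolved close together in time receive the same PlainDocId object, so the heap holds one instance per recently used parent instead of one per child.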

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated JCR-2524:
----------------------------------

    Attachment: JCR-2524.patch

Updated patch with described changes.



[jira] Commented: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839717#action_12839717 ] 

Marcel Reutegger commented on JCR-2524:
---------------------------------------

Some memory stats from a real-life system: a fully populated DocId cache for 300,000 nodes consumes about 6 MB of heap.
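A quick back-of-the-envelope check of that figure, assuming that "MB" means 2^20 bytes and that the cache holds one entry per node (neither assumption is stated in the comment):

```latex
\frac{6\ \text{MB}}{300\,000\ \text{nodes}}
  = \frac{6 \times 2^{20}\ \text{bytes}}{3 \times 10^{5}\ \text{nodes}}
  = \frac{6\,291\,456}{300\,000}
  \approx 21\ \text{bytes per node}
```

Under those assumptions, roughly one small 16-byte object plus a 4-byte reference per node on a 32-bit JVM would account for most of this.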



[jira] Commented: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839855#action_12839855 ] 

Marcel Reutegger commented on JCR-2524:
---------------------------------------

Forgot to mention that the proposed patch reduces the memory usage to about a third.



[jira] Commented: (JCR-2524) Reduce memory usage of DocIds

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840074#action_12840074 ] 

Thomas Mueller commented on JCR-2524:
-------------------------------------

> Caching the most recently used DocIds and reuse them might help to reduce the memory usage

+1

> DocIds that could be represented with a short instead of an int

According to my test, this will not reduce memory usage: http://h2database.com/p.html#da4c6a321d0dc84a2b7b96cdbf468a47

For the Sun JVM (JDK 1.5, 32-bit), objects with a single field of type boolean, byte, short, char, int, or long all need 16 bytes. A small BigInteger uses 56 bytes, a small BigDecimal uses 32 bytes (it probably reuses the same BigInteger internally), a String uses 24 bytes, and a plain Object uses 8 bytes.

For JDK 1.6 (32-bit and 64-bit) the numbers differ slightly: 20 bytes for a plain Object and 24 bytes for an object with a single field of any type from boolean to long.

For JDK 1.5 (64-bit) the numbers differ again: 16 bytes for a plain Object and 24 bytes for an object with a single field of any type from boolean to long.
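The reason a short field does not help can be illustrated with a simplified layout model: object size = header + fields, rounded up to a multiple of 8 bytes. This is a sketch under assumed header sizes (8 bytes on a 32-bit JVM, 16 bytes on a 64-bit JVM without compressed references); it ignores field packing and other JVM details and does not reproduce every measurement above (for example the 20-byte plain Object on JDK 1.6).

```java
/** Hypothetical, simplified model of object size with 8-byte alignment. */
final class ObjectSizeModel {
    private static final int ALIGNMENT = 8;

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static int align(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Estimated size of an object with the given header and field bytes. */
    static int objectSize(int headerBytes, int fieldBytes) {
        return align(headerBytes + fieldBytes);
    }

    public static void main(String[] args) {
        int header32 = 8;   // assumed header size, 32-bit JVM
        int header64 = 16;  // assumed header size, 64-bit JVM without compressed references
        System.out.println("32-bit, short field: " + objectSize(header32, 2)); // 16
        System.out.println("32-bit, int field:   " + objectSize(header32, 4)); // 16
        System.out.println("32-bit, long field:  " + objectSize(header32, 8)); // 16
        System.out.println("64-bit, int field:   " + objectSize(header64, 4)); // 24
    }
}
```

Because the 2 bytes saved by a short are swallowed by alignment padding, both variants end up the same size in this model, which matches the 16-byte and 24-byte figures reported for JDK 1.5.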


> Reduce memory usage of DocIds
> -----------------------------
>
>                 Key: JCR-2524
>                 URL: https://issues.apache.org/jira/browse/JCR-2524
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Marcel Reutegger
>            Priority: Minor
>         Attachments: JCR-2524.patch
>
>
> Implementations of DocIds are used to cache parent child relations of nodes in the index. Usually there are a lot of duplicate objects because a DocId instance is used to identify the parent of a node in the index. That is, sibling nodes will all have DocIds with the same value. Currently a new DocId instance is created for each node. Caching the most recently used DocIds and reuse them might help to reduce the memory usage. Furthermore there are DocIds that could be represented with a short instead of an int when possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated JCR-2524:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated JCR-2524:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.1.0
           Status: Resolved  (was: Patch Available)

Committed new patch and added a test case to improve test coverage of CachingIndexReader.

svn revision: 924677



[jira] Commented: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841737#action_12841737 ] 

Marcel Reutegger commented on JCR-2524:
---------------------------------------

Hmm, you are right. I should have looked more closely at what the memory analyzer reported.

Here's another idea:

- Use int arrays and create PlainDocIds on the fly (possibly using cached instances).
- A special value in the int array marks the presence of a UUIDDocId; UUIDDocIds are held in a separate map.
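A minimal sketch of this idea, assuming hypothetical stand-ins PlainRef and UuidRef for PlainDocId and UUIDDocId, and hypothetical sentinel values -1 (no parent cached) and -2 (parent stored as a UUID). Parents are kept in one int slot per document, and a PlainRef object is only allocated when a caller asks for it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stand-ins for Jackrabbit's DocId subclasses. */
interface ParentRef {}

final class PlainRef implements ParentRef {
    final int doc;
    PlainRef(int doc) { this.doc = doc; }
}

final class UuidRef implements ParentRef {
    final UUID uuid;
    UuidRef(UUID uuid) { this.uuid = uuid; }
}

/** Parent table backed by an int[]; PlainRef objects are created on demand. */
final class ParentTable {
    static final int UNKNOWN = -1;      // hypothetical sentinel: no parent cached
    static final int UUID_MARKER = -2;  // hypothetical sentinel: parent is in uuidParents

    private final int[] parents;        // one slot per document number
    private final Map<Integer, UuidRef> uuidParents = new HashMap<>();

    ParentTable(int maxDoc) {
        parents = new int[maxDoc];
        Arrays.fill(parents, UNKNOWN);
    }

    void setParentDoc(int doc, int parentDoc) {
        parents[doc] = parentDoc;
        uuidParents.remove(doc);        // drop any stale UUID entry for this doc
    }

    void setParentUuid(int doc, UUID parent) {
        parents[doc] = UUID_MARKER;
        uuidParents.put(doc, new UuidRef(parent));
    }

    /** Returns the parent reference for doc, or null if none is cached. */
    ParentRef getParent(int doc) {
        int p = parents[doc];
        if (p == UUID_MARKER) {
            return uuidParents.get(doc);
        }
        if (p == UNKNOWN) {
            return null;
        }
        return new PlainRef(p);         // created on the fly; an LRU cache could supply it instead
    }
}
```

The trade-off is a short-lived allocation per lookup in exchange for a compact steady state: for 300,000 documents the int[] itself needs about 1.2 MB (300,000 × 4 bytes), and only the presumably rarer UUID-identified parents pay for a map entry and an object.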




[jira] Updated: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated JCR-2524:
----------------------------------

    Attachment: JCR-2524.patch

Proposed patch.



[jira] Commented: (JCR-2524) Reduce memory usage of DocIds

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847282#action_12847282 ] 

Marcel Reutegger commented on JCR-2524:
---------------------------------------

Removed System.out debug calls in test class.

svn revision: 925141
