Posted to users@jackrabbit.apache.org by aasoj j <aa...@gmail.com> on 2009/05/27 03:40:46 UTC

Scalability issues while storing large number of small properties

Hi,

I am a new Jackrabbit user. We are using the repository to store our
application's data, and we are running into scalability issues.

Our application has a huge number of properties, around 1 million. These
properties are distributed across versionable Jackrabbit nodes, each node
having around 50 properties and 15 child nodes. Each property has a unique
50-character value. We use MySQL for persistence.

While creating the tree our application crashed. The indexes grew to more
than 4.5 GB. Later, when we tried to remove the root node, the indexes grew
to 15 GB and the application crashed again. As we plan to use the search
functionality, we cannot disable the indexes.

The actual data is around 100 MB. As Jackrabbit is a content store, I am
sure it can support this data size. Please provide some pointers or
suggestions to fix the problem.

Regards
aasoj

Re: Scalability issues while storing large number of small properties

Posted by Thomas Müller <th...@day.com>.
Hi,

> Our application has a huge number of properties, around 1 million

Each distinct property name (and node name) is kept in memory in
Jackrabbit (in the StringIndex), so you may run into memory problems in
any case. Could you try not to use a lot of distinct property names? In
addition to that, I suggest not creating nodes with many thousands of
properties.
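
As a rough illustration of why the number of distinct names matters, here
is a toy model of a name index in Java. It is only a sketch:
NameIndexSketch and its methods are hypothetical and are not Jackrabbit's
StringIndex implementation. It simply assumes, as described above, that
one in-memory entry is kept per distinct name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a name index: one in-memory entry per distinct name.
// Hypothetical illustration only, not Jackrabbit's StringIndex.
class NameIndexSketch {
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the id for a name, assigning a new id the first time it is seen.
    int idFor(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = ids.size();
            ids.put(name, id);
        }
        return id;
    }

    int size() {
        return ids.size();
    }

    // Registers the property names of `nodes` nodes with `propsPerNode`
    // properties each. With sharedNames, every node reuses the same set of
    // names; otherwise each node gets its own unique property names.
    static int entriesAfter(int nodes, int propsPerNode, boolean sharedNames) {
        NameIndexSketch index = new NameIndexSketch();
        for (int n = 0; n < nodes; n++) {
            for (int p = 0; p < propsPerNode; p++) {
                String name = sharedNames ? "prop" + p : "node" + n + "_prop" + p;
                index.idFor(name);
            }
        }
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println("shared names: " + entriesAfter(1000, 50, true));
        System.out.println("unique names: " + entriesAfter(1000, 50, false));
    }
}
```

With 1000 nodes of 50 properties each, the shared-name variant ends up
with 50 entries and the unique-name variant with 50,000.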

Regards,
Thomas

Re: Scalability issues while storing large number of small properties

Posted by Alexander Klimetschek <ak...@day.com>.
On Wed, May 27, 2009 at 2:12 PM, Alexander Klimetschek <ak...@day.com> wrote:
> the entire repository will be put into the memory, with some overhead
> (albeit I am wondering why the factor is around 150 (100MB -> 15GB)).

Ahm, yes, 15 GB was the size of the index, not of the heap. So it's
probably a lot less, and that makes more sense. As Ian suggested, try to
increase the heap size for the JVM. 512 MB or even 1 GB is a minimum,
practically speaking.
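
To confirm how much heap the JVM actually received, a small check like the
following may help. It is a minimal sketch: the class name HeapCheck and
the 512 MB threshold passed in main are assumptions chosen to match the
figure mentioned above. Runtime.maxMemory() is the standard Java call that
reports the maximum heap the JVM will attempt to use, and -Xmx is the
standard option for setting it (for example, java -Xmx1g HeapCheck).

```java
// Prints the maximum heap available to this JVM and compares it with a
// minimum. The class name and the threshold are illustrative assumptions.
class HeapCheck {

    // Maximum heap in megabytes, as reported by the running JVM.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True if the given heap size (in MB) is at least the given minimum.
    static boolean meetsMinimum(long heapMb, long minimumMb) {
        return heapMb >= minimumMb;
    }

    public static void main(String[] args) {
        long heapMb = maxHeapMb();
        System.out.println("Max heap: " + heapMb + " MB");
        if (!meetsMinimum(heapMb, 512)) {
            System.out.println("Heap is below 512 MB; consider raising -Xmx.");
        }
    }
}
```

When the repository runs inside a servlet container, the same check has to
run inside that container's JVM, since the option must be set on the
process that hosts Jackrabbit.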

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Scalability issues while storing large number of small properties

Posted by Alexander Klimetschek <ak...@day.com>.
On Wed, May 27, 2009 at 3:40 AM, aasoj j <aa...@gmail.com> wrote:
> Our application has a huge number of properties, around 1 million. These
> properties are distributed across versionable Jackrabbit nodes, each node
> having around 50 properties and 15 child nodes. Each property has a unique
> 50-character value. We use MySQL for persistence.
>
> While creating the tree our application crashed. The indexes grew to more
> than 4.5 GB. Later, when we tried to remove the root node, the indexes grew
> to 15 GB and the application crashed again. As we plan to use the search
> functionality, we cannot disable the indexes.

That the index can grow that much could be "ok", because with all
those unique values it probably blows up a lot. But this is just my
assumption, I don't know the exact behaviour of the Lucene index in
that case.

I guess the OutOfMemoryError is not related to the index, but rather to
the removal of the whole tree. Removing nodes is currently memory-bound,
as the whole removal process happens in the transient part of the
session, which is kept solely in memory. If you delete the root node,
the entire repository will be loaded into memory, with some overhead
(although I am wondering why the factor is around 150 (100 MB -> 15 GB)).
As a workaround, you could remove smaller subtrees and call save() in
between.
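
The workaround can be sketched with a toy model. ToyNode and ToySession
below are hypothetical stand-ins written for this illustration, not the
javax.jcr API; with a real repository the corresponding calls would be
Node.getNodes(), Node.remove() and Session.save(). The sketch only shows
how saving after each subtree bounds the number of unsaved changes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "remove smaller subtrees and call save() in between".
// ToyNode and ToySession are hypothetical stand-ins, not the javax.jcr API.
class BatchedRemovalSketch {

    static class ToyNode {
        final List<ToyNode> children = new ArrayList<>();

        ToyNode addChild() {
            ToyNode child = new ToyNode();
            children.add(child);
            return child;
        }

        // Number of nodes in this subtree, including this node.
        int size() {
            int n = 1;
            for (ToyNode c : children) n += c.size();
            return n;
        }
    }

    // Tracks how many removed nodes are held as unsaved (transient) changes.
    static class ToySession {
        int pending;
        int maxPending;
        int saves;

        void markRemoved(int nodes) {
            pending += nodes;
            maxPending = Math.max(maxPending, pending);
        }

        void save() {
            pending = 0;
            saves++;
        }
    }

    // Removes each child subtree of root separately, saving after each one.
    static void removeChildrenInBatches(ToyNode root, ToySession session) {
        // Iterate over a copy so removing from root.children is safe.
        for (ToyNode child : new ArrayList<>(root.children)) {
            session.markRemoved(child.size());
            root.children.remove(child);
            session.save();
        }
    }

    // Builds a root with `fanout` children, each with `fanout` children.
    static ToyNode buildTree(int fanout) {
        ToyNode root = new ToyNode();
        for (int i = 0; i < fanout; i++) {
            ToyNode child = root.addChild();
            for (int j = 0; j < fanout; j++) child.addChild();
        }
        return root;
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        removeChildrenInBatches(buildTree(15), session);
        System.out.println("saves=" + session.saves
                + " maxPending=" + session.maxPending);
    }
}
```

In this toy run at most 16 removed nodes are pending at any time, compared
with 240 if all child subtrees were removed before a single save.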

BTW, I guess the use case for removing the whole tree is just for
development, where you want to reimport or repopulate your repository.
If you delete everything anyway, you can just wipe your database and
the contents of the workspace directory (except for the workspace.xml
file).

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Scalability issues while storing large number of small properties

Posted by Alexander Klimetschek <ak...@day.com>.
On Mon, Jun 1, 2009 at 11:43 PM, aasoj j <aa...@gmail.com> wrote:
> Repository configuration:
>
>     <Workspace>
>                ....
>        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>            <param name="path" value="${wsp.home}/index"/>
>            <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
>            <param name="extractorPoolSize" value="2"/>
>            <param name="supportHighlighting" value="true"/>
>            <param name="analyzer" value="repository.jackrabbit.EmptyAnalyzer"/>
>        </SearchIndex>
>    </Workspace>

Did you change the workspace.xml?

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Scalability issues while storing large number of small properties

Posted by aasoj j <aa...@gmail.com>.
Hi,

Thanks for your suggestions.

I allocated 1 GB of memory and my application still crashed every time
while the tree was being created. I then tried disabling the indexes by
using my own analyzer and filter. To my surprise, the application crashed
even then.

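package repository.jackrabbit;

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Analyzer that emits no tokens for any field it is asked to analyze.
public class EmptyAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new EmptyFilter(new KeywordTokenizer(reader));
    }

    // Filter that discards its input and immediately reports end of stream.
    private static class EmptyFilter extends TokenFilter {
        EmptyFilter(TokenStream input) {
            super(input);
        }
        public Token next(Token result) throws IOException {
            return null;  // null signals that there are no more tokens
        }
    }
}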

Repository configuration:

    <Workspace>
                ....
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
            <param name="extractorPoolSize" value="2"/>
            <param name="supportHighlighting" value="true"/>
            <param name="analyzer" value="repository.jackrabbit.EmptyAnalyzer"/>
        </SearchIndex>
    </Workspace>


Some log messages:

2009-06-02 02:55:33 INFO org.apache.jackrabbit.core.persistence.bundle.util.LRUNodeIdCache: (70) num=142/10240 hits=9830 miss=170
2009-06-02 02:57:48 INFO org.apache.jackrabbit.core.persistence.bundle.util.LRUNodeIdCache: (70) num=157/10240 hits=9840 miss=160
2009-06-02 02:57:48 INFO org.apache.jackrabbit.core.persistence.bundle.util.LRUNodeIdCache: (70) num=157/10240 hits=19840 miss=160
2009-06-02 02:57:51 INFO org.apache.jackrabbit.core.persistence.bundle.util.LRUNodeIdCache: (70) num=157/10240 hits=22795 miss=7205
2009-06-02 02:57:51 INFO org.apache.jackrabbit.core.persistence.bundle.util.BundleCache: (106) num=3 mem=3638k max=8192k avg=1242050 hits=9839 miss=161
2009-06-02 02:57:51 INFO org.apache.jackrabbit.core.persistence.bundle.util.LRUNodeIdCache: (70) num=157/10240 hits=22795 miss=17205
2009-06-02 02:57:51 INFO org.apache.jackrabbit.core.persistence.bundle.util.BundleCache: (106) num=3 mem=3638k max=8192k avg=1242050 hits=19839 miss=161
.....


And the error:

2009-06-02 02:57:53 ERROR org.apache.jackrabbit.core.query.lucene.MultiIndex: (1170) Unable to commit volatile index
java.io.IOException: Java heap space
    at org.apache.jackrabbit.core.query.lucene.Util.createIOException(Util.java:108)
    at org.apache.jackrabbit.core.query.lucene.AbstractIndex.addDocuments(AbstractIndex.java:206)
    at org.apache.jackrabbit.core.query.lucene.VolatileIndex.commitPending(VolatileIndex.java:162)
    at org.apache.jackrabbit.core.query.lucene.VolatileIndex.commit(VolatileIndex.java:140)
    at org.apache.jackrabbit.core.query.lucene.PersistentIndex.copyIndex(PersistentIndex.java:124)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex$VolatileCommit.execute(MultiIndex.java:1955)



Best regards
aasoj


Re: Scalability issues while storing large number of small properties

Posted by Ian Boston <ie...@tfd.co.uk>.
It might be worth giving it more memory, both perm space and heap. It's a
bit hard to see how much you have given it so far, but under load or
with a large index space (that's the number of terms, not the number of
items) I would not be surprised to have to give the JVM 1G. (YMMV)

How much memory have you given it?
Ian




Re: Scalability issues while storing large number of small properties

Posted by aasoj j <aa...@gmail.com>.
Hi Ian,

In case it helps, this is what I have observed:

Heap
 PSYoungGen      total 61504K, used 16471K [0xb0d70000, 0xb6420000, 0xb7f30000)
  eden space 39488K, 40% used [0xb0d70000,0xb1ce7b70,0xb3400000)
  from space 22016K, 2% used [0xb4ea0000,0xb4f3e398,0xb6420000)
  to   space 24640K, 0% used [0xb3400000,0xb3400000,0xb4c10000)
 PSOldGen        total 315840K, used 270946K [0x77f30000, 0x8b3a0000, 0xb0d70000)
  object space 315840K, 85% used [0x77f30000,0x887c88a8,0x8b3a0000)
 PSPermGen       total 41088K, used 23034K [0x73f30000, 0x76750000, 0x77f30000)
  object space 41088K, 56% used [0x73f30000,0x755aebd0,0x76750000)

java.lang.OutOfMemoryError: Java heap space
Dumping heap to /home/y/logs/
/home/y/logs/yjava_tomcat ...
Unable to create /home/y/logs/
/home/y/logs/yjava_tomcat: No such file or directory
Exception in thread "RMI TCP Connection(idle)" Exception in thread
"RMI TCP Connection(idle)"
Exception in thread "Thread-1" Exception in thread "RMI TCP
Connection(idle)" Exception in thread
"RMI TCP Connection(idle)" Exception in thread "http-0.0.0.0-4080-Acceptor-0"
java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)" Exception in thread
"RMI TCP Connection(idle)"
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding.encode(StringCoding.java:266)
        at java.lang.String.getBytes(String.java:947)
        at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
        at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
        at java.io.File.exists(File.java:733)
        at org.apache.log4j.helpers.FileWatchdog.checkAndConfigure(FileWatchdog.java:76)
        at org.apache.log4j.helpers.FileWatchdog.run(FileWatchdog.java:107)
java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)"
java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)" Exception in thread
"RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)"
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)"
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)"
java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)"
java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)"
java.lang.OutOfMemoryError: Java heap space

Regards
aasoj


Re: Scalability issues while storing large number of small properties

Posted by Ian Boston <ie...@tfd.co.uk>.
Hi,
Out of interest, what was the nature of the crash?
Did the JVM just freeze, or was there a traceback?
Ian