
Posted to users@jackrabbit.apache.org by Andrey Adamovich <su...@yahoo.com> on 2008/01/23 17:51:20 UTC

NameFactoryImpl$NameImpl

Hello guys!

We have implemented a JCR facade for our portal system, based on Jackrabbit. The facade delegates its calls to the Jackrabbit repository; if the data is not available there, it requests the data from a legacy CMS and inserts it into Jackrabbit.

The structure of the repository is similar to:

/root/<level1>/<level2>/<level3>/<level4>/<level5>/<id>/<sub_id>/<DocumentNode1>/<DocumentNode2>/textProperty

The number of possible path values at each level ranges from 2 to 20.
id and sub_id are unique identifiers of the document; the document structure is stored under them, with a maximum depth of 5. Many documents share the same structure and property names.

We ran into some performance bottlenecks, so I profiled our application with YourKit Java Profiler. I noticed many duplicate strings on the heap, and almost all of those duplicates are held by instances of the org.apache.jackrabbit.spi.commons.name.NameFactoryImpl$NameImpl class.

After several portal page requests (about 1000 JR requests), I took a memory snapshot and found that the string "root" is held by NameImpl instances about 12000 times, which wastes about 2 MB. Other strings, holding the repository level names and property names, had between 3000 and 11000 duplicates each. The total calculated waste is about 50 MB, and that is after relatively few requests.

It is probably not the only memory/performance bottleneck, and it could also be that our app is doing something wrong, but it would be good to get some ideas on this from you guys.

After leaving the server idle for a while (6-8 hours), I took another memory snapshot. The number of duplicates had dropped slightly, but not by much; most of the duplicate strings had not been garbage-collected.

I have also looked at the source of NameFactoryImpl$NameImpl and found that it uses String.intern() for the namespace, but not for the local name. That is wise in general, but may not work well when Jackrabbit is handling a large number of requests.
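To illustrate the difference that interning makes, here is a minimal sketch. InternedName is a hypothetical stand-in written for this example; it is not Jackrabbit's actual NameImpl and only mimics the idea of a namespace plus a local name. Unlike the behaviour described above, this sketch interns both parts.

```java
// Hypothetical stand-in for a name holder; NOT Jackrabbit's NameImpl.
final class InternedName {
    private final String namespaceUri;
    private final String localName;

    InternedName(String namespaceUri, String localName) {
        // intern() returns the canonical instance from the JVM string pool,
        // so equal values end up sharing a single String object.
        this.namespaceUri = namespaceUri.intern();
        this.localName = localName.intern();
    }

    String getNamespaceUri() { return namespaceUri; }
    String getLocalName() { return localName; }
}

final class InternDemo {
    public static void main(String[] args) {
        // new String(...) forces distinct objects, much like strings
        // produced by parsing a path on every request would be.
        String a = new String("root");
        String b = new String("root");
        System.out.println(a == b); // false: two separate copies

        InternedName n1 = new InternedName("", a);
        InternedName n2 = new InternedName("", b);
        // true: both names now refer to the same pooled instance
        System.out.println(n1.getLocalName() == n2.getLocalName());
    }
}
```

Interning every local name would also pool the unique id and sub_id values, which is presumably why it is avoided for local names in general.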

Therefore I have several questions that some of you may be able to help me with:

1) Is there a way to implement a different name creation strategy? I see that NameFactory is an interface, but how would I plug in a different implementation adapted to my repository structure, so that the string "root" is not stored 12000 times or more?

2) Can someone explain to me how the JR cache manager works, and could this leak be caused by the cache manager storing too many states? Does the size of the JR cache depend on the number of live sessions? Would it be wise to disable it, or at least limit it?

Best regards,

Andrey


Re: NameFactoryImpl$NameImpl

Posted by Thomas Mueller <th...@gmail.com>.

Hi,

Would it be possible for you to create a simple, standalone test case
(that means, no external dependencies except Jackrabbit core, a single
class with a main method, similar to the FirstHop examples)? Best
would be if the memory problem can be reproduced using the standard
configuration; if not, could you also send the configuration you use?

Getting rid of the duplicate Strings would be fairly easy (using a
simple string cache), but we need to be sure we have a test case so we
know we solve the right problem.
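As a rough idea of what such a string cache could look like, here is a minimal sketch. The class name StringCache and its methods are made up for this example and are not part of Jackrabbit; where it would be hooked into name creation is left open.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not a Jackrabbit class: maps each distinct string
// value to one canonical instance so that duplicates can be collected.
final class StringCache {
    private final ConcurrentMap<String, String> cache =
            new ConcurrentHashMap<String, String>();

    /** Returns the canonical instance for the given value. */
    String get(String s) {
        if (s == null) {
            return null;
        }
        // putIfAbsent returns the previously stored instance, or null
        // if this call stored s as the first instance of that value.
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct values currently held. */
    int size() {
        return cache.size();
    }
}
```

Note that this sketch holds strong references and never evicts anything, so unique values such as ids would accumulate; a real version would probably need a size limit or weak references.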

Thanks,
Thomas

