Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2009/03/26 22:47:50 UTC
[jira] Issue Comment Edited: (PDFBOX-441) remove CosName nameMap cache
[ https://issues.apache.org/jira/browse/PDFBOX-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689485#action_12689485 ]
Andreas Lehmkühler edited comment on PDFBOX-441 at 3/26/09 2:45 PM:
--------------------------------------------------------------------
I had a look at the code, and there are many getPDFName calls like the one Sean already mentioned above. If we replace all of these with references to static constants from COSName (some of which have to be added first), we can considerably reduce the number of accesses to the map.
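To illustrate why constants help, here is a minimal sketch using a hypothetical `Name` class modeled loosely on COSName (it is not the PDFBox source, and it assumes Java 8 or later). The generic `get` lookup takes a class-wide lock and reads the map on every call, while a static constant such as `FONT_DESC` is resolved once during class initialization and is afterwards only a field read.

```java
import java.util.HashMap;
import java.util.Map;

public class NameConstantsDemo {

    // Hypothetical stand-in for COSName (not the real PDFBox class).
    static final class Name {
        private static final Map<String, Name> CACHE = new HashMap<>();

        // Constants are looked up once, when the class is initialized;
        // afterwards, code that uses them never touches the map or its lock.
        static final Name TYPE = get("Type");
        static final Name FONT_DESC = get("FontDescriptor");

        private final String value;

        private Name(String value) {
            this.value = value;
        }

        // Generic lookup: every call takes the class lock and reads the map.
        static synchronized Name get(String value) {
            return CACHE.computeIfAbsent(value, Name::new);
        }

        String value() {
            return value;
        }
    }

    public static void main(String[] args) {
        Name viaLookup = Name.get("FontDescriptor"); // locks and reads the map
        Name viaConstant = Name.FONT_DESC;            // plain static field read
        System.out.println(viaLookup == viaConstant); // same cached instance
    }
}
```

Running it prints `true`: both paths yield the same cached instance, but only the first one pays for the synchronized lookup.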
Furthermore, we should follow Chris's suggestions from PDFBOX-351 to replace the HashMap with a WeakHashMap, so that the garbage collector can reclaim names that are no longer in use.
I fear we can't get rid of the cache altogether, because some code (e.g. in BaseParser) makes a lot of generic getPDFName calls. Without a cache, these calls might produce an OutOfMemoryError when really large documents are parsed.
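A minimal sketch of keeping the cache while letting the GC reclaim unused names, using a hypothetical `WeakName` class rather than the real COSName. One pitfall is worth noting: in a plain `WeakHashMap<String, Name>` each value strongly references its own key string, so entries would never be cleared. Wrapping the values in `WeakReference`s, as the `WeakHashMap` Javadoc suggests for this situation, avoids that.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakNameCacheDemo {

    // Hypothetical COSName-like class whose cache does not pin instances in memory.
    static final class WeakName {
        // Keys are held weakly by WeakHashMap; values are wrapped in WeakReference
        // because each WeakName strongly references its own key string, which
        // would otherwise keep every entry alive forever.
        private static final Map<String, WeakReference<WeakName>> CACHE = new WeakHashMap<>();

        private final String value;

        private WeakName(String value) {
            this.value = value;
        }

        // WeakHashMap is not thread-safe, so access is still synchronized.
        static synchronized WeakName get(String value) {
            WeakReference<WeakName> ref = CACHE.get(value);
            WeakName name = (ref == null) ? null : ref.get();
            if (name == null) {
                name = new WeakName(value);
                // Drop any stale entry (its WeakReference was cleared) so the
                // stored key is this instance's own string and stays reachable
                // exactly as long as the WeakName itself.
                CACHE.remove(value);
                CACHE.put(name.value, new WeakReference<>(name));
            }
            return name;
        }

        String value() {
            return value;
        }
    }

    public static void main(String[] args) {
        WeakName a = WeakName.get("Type");
        WeakName b = WeakName.get("Type");
        // While 'a' is strongly reachable, lookups return the cached instance.
        System.out.println(a == b);
        // Once no caller holds a "Type" instance any more, the GC may clear
        // the WeakReference, and WeakHashMap drops the entry on a later access.
    }
}
```

Running `main` prints `true`. When exactly an unused entry disappears depends on the garbage collector, so the sketch does not try to demonstrate the eviction itself.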
was (Author: lehmi):
I had a look at the code, and there are many getPDFName calls like the one Sean already mentioned above. If we replace all of these with references to static constants from COSName (some of which have to be added first), we can considerably reduce the number of accesses to the map.
Furthermore, we should follow Josh's suggestions from PDFBOX-351 to replace the HashMap with a WeakHashMap, so that the garbage collector can reclaim names that are no longer in use.
I fear we can't get rid of the cache altogether, because some code (e.g. in BaseParser) makes a lot of generic getPDFName calls. Without a cache, these calls might produce an OutOfMemoryError when really large documents are parsed.
> remove CosName nameMap cache
> ----------------------------
>
> Key: PDFBOX-441
> URL: https://issues.apache.org/jira/browse/PDFBOX-441
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 0.7.3
> Reporter: Sean Bridges
> Priority: Minor
>
> The CosName class keeps a cache of all instances created in a static synchronized map. I am guessing this is for performance reasons, to avoid creating objects, but in our system it is causing performance problems. We are running 7 threads extracting text from PDFs, and we see heavy contention when reading from nameMap.
> The CosName map is also a potential memory leak, which forces users to clear it periodically, as noted in PDFBOX-351.
> Can nameMap be removed altogether?
> At the least, if PDSimpleFont replaced
>     COSName.getPDFName( "FontDescriptor" )
> with
>     COSName.FONT_DESC
> it would reduce contention.
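The contention reported in the issue comes from every lookup going through one synchronized map. An alternative not proposed in this thread, shown only as a hedged sketch with a hypothetical `ConcurrentName` class, is a `java.util.concurrent.ConcurrentHashMap`, whose retrievals generally do not block. Note that this addresses contention only: like the original HashMap, it still retains every name ever created, so on its own it does not fix the leak described in PDFBOX-351.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentNameDemo {

    // Hypothetical COSName-like class; not the PDFBox implementation.
    static final class ConcurrentName {
        private static final ConcurrentMap<String, ConcurrentName> CACHE =
                new ConcurrentHashMap<>();

        private final String value;

        private ConcurrentName(String value) {
            this.value = value;
        }

        static ConcurrentName get(String value) {
            // Fast path: retrievals on ConcurrentHashMap generally do not block,
            // so threads that hit an existing name do not contend on a lock.
            ConcurrentName name = CACHE.get(value);
            if (name != null) {
                return name;
            }
            // Slow path: computeIfAbsent applies the constructor at most once
            // per key, even if several threads miss at the same time.
            return CACHE.computeIfAbsent(value, ConcurrentName::new);
        }

        String value() {
            return value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 7; // the thread count mentioned in the issue
        ConcurrentName[] seen = new ConcurrentName[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> seen[slot] = ConcurrentName.get("FontDescriptor"));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // join() makes each worker's write to 'seen' visible here
        }
        boolean allSame = true;
        for (ConcurrentName n : seen) {
            allSame &= (n == seen[0]);
        }
        System.out.println(allSame);
    }
}
```

Running it prints `true`: all seven threads receive the same instance without serializing on a single lock for cache hits.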
--
This message is automatically generated by JIRA.