Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2009/03/26 22:47:50 UTC

[jira] Issue Comment Edited: (PDFBOX-441) remove CosName nameMap cache

    [ https://issues.apache.org/jira/browse/PDFBOX-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689485#action_12689485 ] 

Andreas Lehmkühler edited comment on PDFBOX-441 at 3/26/09 2:45 PM:
--------------------------------------------------------------------

I had a look at the code, and there are many getPDFName calls like the one Sean already mentioned above. If we replace all of these with references to static constants from COSName (some of which have to be added first), we can greatly reduce the number of accesses to the map.
Furthermore, we should follow Chris's suggestion from PDFBOX-351 to replace the HashMap with a WeakHashMap, so that the GC can reclaim names that are no longer used.
I fear we can't get rid of the cache entirely, because some code (e.g. in BaseParser) makes a lot of generic getPDFName calls. Without a cache, these calls might cause an OutOfMemoryError when really large documents are parsed.
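A minimal sketch of both ideas, using a hypothetical stand-in class (`PdfName`, its `of` factory and its constants are illustrative, not PDFBox's actual COSName API). Hot-path names become static constants resolved once, and the cache holds its values through `WeakReference`s so the GC can reclaim names nothing uses anymore. The values are wrapped because `WeakHashMap` only weakens its keys: each name object refers to its own key string, so a plain `WeakHashMap<String, PdfName>` would never drop an entry. Every lookup still takes one lock, so this targets the memory leak; the constants are what cut down lock traffic.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for COSName (not the PDFBox API): an interned, immutable PDF name.
final class PdfName {
    // Keys are weak (WeakHashMap) and values are wrapped in WeakReference,
    // because WeakHashMap holds values strongly and each PdfName refers to
    // its own key string.
    private static final Map<String, WeakReference<PdfName>> CACHE =
            new WeakHashMap<String, WeakReference<PdfName>>();

    // Hot-path names resolved once at class-load time; code that uses these
    // constants never touches the synchronized cache.
    // Declared after CACHE because static initializers run in textual order.
    public static final PdfName FONT_DESC = of("FontDescriptor");
    public static final PdfName TYPE = of("Type");

    private final String name;

    private PdfName(String name) {
        this.name = name;
    }

    // Returns the canonical instance for the name, creating it on a miss.
    public static PdfName of(String name) {
        synchronized (CACHE) {
            WeakReference<PdfName> ref = CACHE.get(name);
            PdfName existing = (ref == null) ? null : ref.get();
            if (existing != null) {
                return existing;
            }
            PdfName created = new PdfName(name);
            // Remove any stale entry first: put() on an existing key keeps the
            // old key object, but the entry must be keyed by the string the new
            // instance holds, or it could be expunged while still in use.
            CACHE.remove(name);
            CACHE.put(name, new WeakReference<PdfName>(created));
            return created;
        }
    }

    public String getName() {
        return name;
    }

    @Override
    public String toString() {
        return "/" + name;
    }
}
```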

      was (Author: lehmi):
    I had a look at the code, and there are many getPDFName calls like the one Sean already mentioned above. If we replace all of these with references to static constants from COSName (some of which have to be added first), we can greatly reduce the number of accesses to the map.
Furthermore, we should follow Josh's suggestion from PDFBOX-351 to replace the HashMap with a WeakHashMap, so that the GC can reclaim names that are no longer used.
I fear we can't get rid of the cache entirely, because some code (e.g. in BaseParser) makes a lot of generic getPDFName calls. Without a cache, these calls might cause an OutOfMemoryError when really large documents are parsed.
  
> remove CosName nameMap cache
> ----------------------------
>
>                 Key: PDFBOX-441
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-441
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 0.7.3
>            Reporter: Sean Bridges
>            Priority: Minor
>
> The COSName class keeps a cache of all instances created in a static synchronized map. I am guessing this is for performance reasons, to avoid creating objects, but in our system it is causing performance problems. We are running 7 threads extracting text from PDFs, and we can see a lot of lock contention when reading from nameMap.
> The COSName map is also a potential memory leak, which forces users to clear it periodically, as noted in PDFBOX-351.
> Can nameMap be removed altogether?
> At the least, if PDSimpleFont replaced
>   COSName.getPDFName("FontDescriptor")
> with
>   COSName.FONT_DESC
> it would reduce contention.
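The contention Sean describes comes from every lookup, cache hits included, acquiring the same lock. One alternative that nobody in this thread proposed, shown here only as a hedged sketch, is a `ConcurrentHashMap`-backed cache: retrievals generally do not block, and `putIfAbsent` keeps one canonical instance per name when threads race on a miss. The class `CachedName` is hypothetical; its `getPDFName` and `FONT_DESC` merely mirror the COSName members named above. Like the original HashMap, this map never releases entries, so it addresses contention but not the leak from PDFBOX-351.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for COSName, backed by a ConcurrentHashMap
// instead of a static synchronized map (not the PDFBox API).
final class CachedName {
    private static final ConcurrentMap<String, CachedName> NAME_MAP =
            new ConcurrentHashMap<String, CachedName>();

    // Hot-path name as a constant, as suggested for PDSimpleFont.
    public static final CachedName FONT_DESC = getPDFName("FontDescriptor");

    private final String name;

    private CachedName(String name) {
        this.name = name;
    }

    public static CachedName getPDFName(String name) {
        // Fast path: ConcurrentHashMap.get does not take a map-wide lock,
        // so threads that hit the cache do not wait on each other.
        CachedName existing = NAME_MAP.get(name);
        if (existing != null) {
            return existing;
        }
        // Slow path: two threads may both build a candidate; putIfAbsent
        // lets exactly one win, and both callers return the winner.
        CachedName created = new CachedName(name);
        CachedName winner = NAME_MAP.putIfAbsent(name, created);
        return (winner != null) ? winner : created;
    }

    public String getName() {
        return name;
    }
}
```

Using the constant in hot paths still pays off with this design, since a static field read skips hashing the string altogether.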
