Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2012/06/11 18:25:43 UTC

[jira] [Commented] (PDFBOX-1337) Improve PDFOperator performance on multithreading environment

    [ https://issues.apache.org/jira/browse/PDFBOX-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292863#comment-13292863 ] 

Timo Boehme commented on PDFBOX-1337:
-------------------------------------

While the current implementation has the problem that get and put are not atomic, so the same operator might be put twice, I can't see why this should result in a deadlock. Clearly, synchronizing both operations within a single block seems to be a better solution. Nevertheless, could you please provide a stack trace of the blocked threads? The thread shown is RUNNABLE, and line 76 is the return statement in the current version. Could you also please test whether this problem still occurs with the recent 1.7.0 version?
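A minimal sketch of the single-block synchronization suggested above. It assumes a hypothetical OperatorCache class standing in for PDFOperator (only the caching logic is shown, not the real class) and Java 7 or later for generics and the diamond operator. Because get and put happen under one lock, two threads asking for the same new operator cannot both create and store an instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for PDFOperator; only the caching logic is sketched. */
final class OperatorCache {
    // Plain HashMap: every access below happens while holding its monitor.
    private static final Map<String, OperatorCache> OPERATORS = new HashMap<>();

    private final String name;

    private OperatorCache(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    static OperatorCache getOperator(String operator) {
        // "ID" and "BI" are never cached, mirroring the code quoted in the issue.
        if (operator.equals("ID") || operator.equals("BI")) {
            return new OperatorCache(operator);
        }
        // get and put form one atomic step, so each name maps to exactly one instance.
        synchronized (OPERATORS) {
            OperatorCache op = OPERATORS.get(operator);
            if (op == null) {
                op = new OperatorCache(operator);
                OPERATORS.put(operator, op);
            }
            return op;
        }
    }
}
```

The trade-off is that every lookup, including cache hits, takes the lock, which is the contention the reporter measured; correctness is simple to reason about, though.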
                
> Improve PDFOperator performance on multithreading environment
> -------------------------------------------------------------
>
>                 Key: PDFBOX-1337
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1337
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Utilities
>    Affects Versions: 1.6.0
>            Reporter: Alexis
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With more than 6 threads, the API PDFOperator#getOperator(String operator) still blocks:
> Sample with 48 threads:
> "pool-1-thread-46" - Thread t@72
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.pdfbox.util.PDFOperator.getOperator(PDFOperator.java:76)
> 	at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:441)
> 	at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46)
> 	at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:175)
> 	at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
> 	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
> I propose removing the synchronization of the "operators" attribute and synchronizing only
> the put operation. (This optimization saves 30 percent of the time.)
> public class PDFOperator
> {
>     [...]
>     // private static Map operators = Collections.synchronizedMap( new HashMap() );
>     private static Map operators = new HashMap();
>     [...]
>     public static PDFOperator getOperator( String operator )
>     {
>         PDFOperator operation = null;
>         if( operator.equals( "ID" ) || operator.equals( "BI" ) )
>         {
>             //we can't cache the ID operators.
>             operation = new PDFOperator( operator );
>         }
>         else
>         {
>             operation = (PDFOperator)operators.get(operator);
>             if( operation == null )
>             {
>               synchronized (operators) {
>                 operation = (PDFOperator)operators.get(operator);
>                 if ( operation == null ) {
>                   operation = new PDFOperator( operator );
>                   operators.put( operator, operation );
>                 }
>               }
>             }
>         }
>         return operation;
>     }
>     [...]
> }
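The proposal above still reads the plain HashMap outside any lock. The HashMap Javadoc states that if multiple threads access a HashMap concurrently and at least one of them modifies it structurally, it must be synchronized externally, so that first unsynchronized get() is not safe even with the double check. One possible alternative (not taken from this thread) is a ConcurrentHashMap, which gives lock-free reads and atomic insertion. This sketch assumes Java 8 or later (for computeIfAbsent) and uses a hypothetical ConcurrentOperatorCache class in place of PDFOperator.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for PDFOperator, backed by a ConcurrentHashMap. */
final class ConcurrentOperatorCache {
    private static final ConcurrentMap<String, ConcurrentOperatorCache> OPERATORS =
            new ConcurrentHashMap<>();

    private final String name;

    private ConcurrentOperatorCache(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    static ConcurrentOperatorCache getOperator(String operator) {
        // "ID" and "BI" are never cached, mirroring the code quoted in the issue.
        if (operator.equals("ID") || operator.equals("BI")) {
            return new ConcurrentOperatorCache(operator);
        }
        // Cache hits do not lock; for a missing key, computeIfAbsent runs the
        // constructor at most once and publishes the result atomically.
        return OPERATORS.computeIfAbsent(operator, ConcurrentOperatorCache::new);
    }
}
```

Since equals() is not overridden, comparing results with == shows whether concurrent callers received the same cached instance.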
