Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/04/16 17:13:00 UTC

[jira] [Commented] (PDFBOX-4514) inefficient use of synchronized in PDICCBased.java

    [ https://issues.apache.org/jira/browse/PDFBOX-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819286#comment-16819286 ] 

Tilman Hausherr commented on PDFBOX-4514:
-----------------------------------------

The {{synchronized(LOG)}} is there because of weird effects when calling {{ICC_Profile.getInstance()}} from several threads on Linux; see PDFBOX-3267, PDFBOX-3641, and
https://bugs.openjdk.java.net/browse/JDK-8058973
https://bugs.openjdk.java.net/browse/JDK-6986863
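
To illustrate why the lock object matters here, the following is a minimal sketch, not the actual PDICCBased code. The class IccProfileLoader, the field GLOBAL_LOCK (standing in for the static LOG) and the helper loadInParallel are hypothetical names made up for this example. It assumes each thread works on its own loader instance, as would be the case when different documents are processed in parallel.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a class like PDICCBased; not PDFBox code.
final class IccProfileLoader {
    // One lock for the whole JVM (plays the role of the static LOG field).
    private static final Object GLOBAL_LOCK = new Object();

    private final byte[] profileBytes;

    IccProfileLoader(byte[] profileBytes) {
        this.profileBytes = profileBytes;
    }

    // Serializes ICC_Profile.getInstance() across ALL loader instances.
    ICC_Profile loadWithGlobalLock() {
        synchronized (GLOBAL_LOCK) {
            return ICC_Profile.getInstance(profileBytes);
        }
    }

    // Serializes only calls made on the SAME instance. Two threads holding
    // different instances can still be inside getInstance() concurrently.
    ICC_Profile loadWithInstanceLock() {
        synchronized (this) {
            return ICC_Profile.getInstance(profileBytes);
        }
    }

    // Loads the built-in sRGB profile from several threads, one loader per
    // thread, using the global lock. Returns the number of profiles loaded.
    static int loadInParallel(int threads) {
        byte[] srgb = ICC_Profile.getInstance(ColorSpace.CS_sRGB).getData();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<ICC_Profile>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                IccProfileLoader loader = new IccProfileLoader(srgb);
                Callable<ICC_Profile> task = loader::loadWithGlobalLock;
                futures.add(pool.submit(task));
            }
            int loaded = 0;
            for (Future<ICC_Profile> future : futures) {
                if (future.get() != null) {
                    loaded++;
                }
            }
            return loaded;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Profiles loaded: " + loadInParallel(4));
    }
}
```

With synchronized (this), threads that work on different documents hold different instances, so nothing would stop them from entering {{ICC_Profile.getInstance()}} at the same time. That is the situation the shared lock is meant to prevent, at the cost of the contention described in the issue.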

> inefficient use of synchronized in PDICCBased.java
> --------------------------------------------------
>
>                 Key: PDFBOX-4514
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4514
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Jason
>            Priority: Minor
>
> PDICCBased.java synchronizes on a static variable, e.g. synchronized (LOG). It doesn't look to me like it really needs to do it this way. This is very inefficient when multiple threads process different PDFs at the same time. Changing it to synchronized (this) will improve performance.
> [https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191]
> Sample code that simulates multiple threads processing different PDFs at the same time:
>  
> {code:java}
> public static void main(String[] args) throws IOException {
>   for (int i = 0; i < 10; i++) { // just run multiple times
>     doWork();
>   }
> }
> private static void doWork() throws IOException {
>   long startTime = System.currentTimeMillis();
>   String pdfFilename = "<absolute path to your pdf file>"; // replace this with your test file
>   System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
>   PDDocument document = PDDocument.load(new File(pdfFilename));
>   List<PDDocument> pdfPages = new Splitter().split(document);
>   Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();
>   for (int i = 0; i < pdfPages.size(); i++) {
>     pdfPagesWithIndex.put(i, pdfPages.get(i));
>   }
>   // multiple threads running in parallel
>   pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
>     try {
>       processPDF(entry.getKey(), entry.getValue());
>     } catch (Exception e) {
>       System.out.println(e);
>     }
>   });
>   System.out.println("Conversion time: " + (System.currentTimeMillis() - startTime));
>   try {
>     document.close();
>   } catch (IOException ignored) {
>   }
> }
> private static void processPDF(int index, PDDocument pdfPage) throws IOException {
>   PDFRenderer renderer = new PDFRenderer(pdfPage);
>   try {
>     renderer.renderImageWithDPI(0, 180, ImageType.RGB);
>   } catch (IOException e) {
>     System.out.println(e);
>   }
>   try {
>     pdfPage.close();
>   } catch (IOException ignored) {
>   }
> }
> {code}
> I observed that changing synchronized (LOG) to synchronized (this) gives the above code maybe a 20-30% reduction in latency. In a thread dump, I can see many threads blocked on synchronized (LOG).
>  


