Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2019/06/17 08:18:01 UTC

[jira] [Comment Edited] (PDFBOX-4559) Parse error reading document from several threads

    [ https://issues.apache.org/jira/browse/PDFBOX-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865403#comment-16865403 ] 

Timo Boehme edited comment on PDFBOX-4559 at 6/17/19 8:17 AM:
--------------------------------------------------------------

I think we have to look at the different levels at which streams are created and used with regard to thread safety. The base implementation for our memory paging, ScratchFile, is thread safe according to its Javadoc (at least it was meant to be :)). However, the RandomAccess instances (ScratchFileBuffer) created from it are not, because reads and writes can be mixed (and so far the API did not support parallel access to a single instance). RandomAccessInputStream is only a thin layer on top of a RandomAccessRead, which here is a ScratchFileBuffer. The first step would be to switch the ScratchFileBuffer into a read-only mode (or to have a small wrapper implementing RandomAccessRead that allows only thread-safe read access).

However, even this might not help in this case: using a single RandomAccessInputStream from multiple threads will lead to errors (even if its methods were synchronized), because no thread would see a sequential stream of input bytes when other threads read some of the bytes in between.
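
To illustrate why synchronization alone is not enough, here is a minimal sketch using plain java.io rather than PDFBox classes. The names SharedStreamInterleaving and readAlternately are made up for this example. Two readers are simulated by strict alternation in one thread, so every read() call is atomic, which is the best that synchronized methods could offer. Still, neither reader gets a sequential view of the data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
final class SharedStreamInterleaving {

    /**
     * Simulates two readers taking turns on ONE shared stream.
     * Returns what reader A (index 0) and reader B (index 1) each observed.
     */
    static String[] readAlternately(String data) {
        ByteArrayInputStream shared =
                new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII));
        StringBuilder seenByA = new StringBuilder();
        StringBuilder seenByB = new StringBuilder();
        boolean turnA = true;
        int b;
        // Each read() advances the single shared position, whoever calls it.
        while ((b = shared.read()) != -1) {
            (turnA ? seenByA : seenByB).append((char) b);
            turnA = !turnA;
        }
        return new String[] { seenByA.toString(), seenByB.toString() };
    }

    public static void main(String[] args) {
        String[] seen = readAlternately("(ab)");
        System.out.println("A saw: " + seen[0]);
        System.out.println("B saw: " + seen[1]);
    }
}
```

With the input "(ab)", reader A sees "(b" and reader B sees "a)". A parser fed reader B's bytes would meet a closing parenthesis that it never saw opened. This is only an analogy and not a confirmed diagnosis of the error reported in this issue.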

For thread-safe access, the RandomAccessInputStream has to be created on request by the specific thread and method that wants to read the data. Thus the COSInputStream would have to store the thread-safe RandomAccessRead implementation (as it already does indirectly for the ScratchFileBuffer underlying the RandomAccessInputStream) and would provide a method that creates a new RandomAccessInputStream each time one is needed (the stream being only a small access wrapper for the data).
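
A rough sketch of that idea, with plain java.io as a stand-in: ReadOnlyData and createView() are hypothetical names, not PDFBox API. They play the role of the shared thread-safe RandomAccessRead and of the method that creates a stream on request. Each reader gets its own cheap view, with its own position, over the same immutable bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of PDFBox.
final class PerThreadViews {

    /** Stand-in for a thread-safe, read-only data holder. */
    static final class ReadOnlyData {
        private final byte[] bytes; // never modified after construction

        ReadOnlyData(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        /** Each call returns a fresh stream with its own read position. */
        ByteArrayInputStream createView() {
            // The array is wrapped, not copied, so a view is cheap to create.
            return new ByteArrayInputStream(bytes);
        }
    }

    /** Lets several parallel readers each read the shared data completely. */
    static List<String> readInParallel(String data, int readers) {
        ReadOnlyData shared = new ReadOnlyData(data.getBytes(StandardCharsets.US_ASCII));
        return IntStream.range(0, readers).parallel()
                .mapToObj(i -> {
                    ByteArrayInputStream view = shared.createView(); // per reader
                    StringBuilder seen = new StringBuilder();
                    int b;
                    while ((b = view.read()) != -1) {
                        seen.append((char) b);
                    }
                    return seen.toString();
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        readInParallel("(ab) cd", 4).forEach(System.out::println);
    }
}
```

Because no read position is shared, every reader sees the full, sequential content regardless of how the threads are scheduled.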

 


> Parse error reading document from several threads
> -------------------------------------------------
>
>                 Key: PDFBOX-4559
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4559
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Documentation, Rendering
>    Affects Versions: 2.0.15
>         Environment: Oracle Java 8 update125 on both Mac OS X and centos
>            Reporter: Jack
>            Priority: Major
>              Labels: concurrency, multithreading, type1, type1font
>         Attachments: test.pdf
>
>
> I got the following error while running simple parallel rendering code. However, the error does not occur when I change parallelStream() to a sequential stream(). Interestingly, both approaches render exactly the same images. I saw a possibly related ticket, PDFBOX-3654, but it seems that issue was fixed. Are there any other related bugs?
> *Sample code*:
> {code:java}
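> import java.io.File;
> import java.io.IOException;
> import java.util.List;
> import org.apache.pdfbox.multipdf.Splitter;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> PDDocument document = PDDocument.load(new File(pdfFilename));
> // Split into one single-page PDDocument per page.
> List<PDDocument> pdfPages = new Splitter().split(document);
> pdfPages.parallelStream().forEach(page -> {
>     try {
>         PDFRenderer renderer = new PDFRenderer(page);
>         renderer.renderImageWithDPI(0, 180, ImageType.RGB); // change dpi to your number
>     } catch (IOException e) {
>         System.out.println(e);
>     }
>     try {
>         page.close(); // each split page is its own PDDocument
>     } catch (IOException ignored) {
>     }
> });
> try {
>     document.close();
> } catch (IOException ignored) {
> }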
> {code}
>  
> *Error log*:
> {noformat}
> ERROR [PDType1Font] Can't read the embedded Type1 font POAEND+Gotham-Book
> java.io.IOException: unexpected closing parenthesis
>  at org.apache.fontbox.type1.Type1Lexer.readToken(Type1Lexer.java:123) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Lexer.nextToken(Type1Lexer.java:75) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.readValue(Type1Parser.java:398) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.readOtherSubrs(Type1Parser.java:707) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.parseBinary(Type1Parser.java:550) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:64) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Font.createWithSegments(Type1Font.java:85) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:262) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:869) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:505) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:479) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:152) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:265) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:314) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:243) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:229) ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
> WARN [PDType1Font] Using fallback font Helvetica for POAEND+Gotham-Book
> {noformat}


