Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/10/17 16:13:00 UTC

[jira] [Commented] (PDFBOX-3966) Operator not found in resources

    [ https://issues.apache.org/jira/browse/PDFBOX-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207855#comment-16207855 ] 

Tilman Hausherr commented on PDFBOX-3966:
-----------------------------------------

I'm reluctant to change that one without seeing the file. Usually missing resources mean that the page is completely messed up. What happens when you try to display that file?

> Operator not found in resources
> -------------------------------
>
>                 Key: PDFBOX-3966
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3966
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>            Reporter: Jorge Spinsanti
>
> I got an exception while extracting HTML from a PDF. The source PDF is not available.
> {code}
> Main cause:
> 	org.apache.tika.exception.TikaException: Unable to extract PDF content
> 	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
> 	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:167)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	....
> Caused by: java.io.IOException: name for 'gs' operator not found in resources: /R8
> 	at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:54)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
> 	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
> 	at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
> 	at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
> 	at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
> 	at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
> 	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
> 	... 27 more
> {code}
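
The IOException in the trace is raised when a content stream's 'gs' operator names a graphics state (/R8 here) that is not defined in the page resources. The sketch below is a simplified, self-contained Java model of that lookup, not PDFBox's actual code: the class GsLookupSketch, the method applyGraphicsStates, the lenient flag and the /R7 entry are all hypothetical and exist only for illustration. It contrasts failing on the first missing name with skipping the operator and continuing, which is roughly the kind of change under discussion.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GsLookupSketch {

    /**
     * Simplified model (an assumption, not the PDFBox implementation):
     * each entry of gsOperands is the name operand of one 'gs' operator,
     * and extGStateNames holds the names defined under /ExtGState.
     * Returns the names that resolved, in order.
     */
    public static List<String> applyGraphicsStates(List<String> gsOperands,
                                                   Set<String> extGStateNames,
                                                   boolean lenient) throws IOException {
        List<String> applied = new ArrayList<>();
        for (String name : gsOperands) {
            if (!extGStateNames.contains(name)) {
                if (lenient) {
                    // Hypothetical lenient mode: ignore the operator and go on.
                    continue;
                }
                // Strict mode: same message shape as in the reported trace.
                throw new IOException(
                        "name for 'gs' operator not found in resources: /" + name);
            }
            applied.add(name);
        }
        return applied;
    }

    public static void main(String[] args) {
        // Made-up page: /R7 is defined, /R8 is referenced but missing.
        Set<String> resources = new HashSet<>(Arrays.asList("R7"));
        List<String> operands = Arrays.asList("R7", "R8");

        try {
            System.out.println("lenient: " + applyGraphicsStates(operands, resources, true));
            applyGraphicsStates(operands, resources, false);
        } catch (IOException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

In this model the lenient call returns only the names that resolved, while the strict call stops at /R8. Whether skipping is safe for a real file depends on what else is wrong with the page, which is why the actual PDF is needed before changing the behaviour.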


