Posted to dev@pdfbox.apache.org by "Erich C. Beyrent (JIRA)" <ji...@apache.org> on 2010/06/02 16:19:38 UTC
[jira] Commented: (PDFBOX-697) Error: Expected an integer type, actual='' -
[ https://issues.apache.org/jira/browse/PDFBOX-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874589#action_12874589 ]
Erich C. Beyrent commented on PDFBOX-697:
-----------------------------------------
I'm also running into this issue. Here are the exceptions being thrown:
SEVERE: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unable to extract PDF content
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF content
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
... 20 more
Caused by: java.io.IOException: Error: Expected an integer type, actual=''
at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1100)
at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:579)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
... 25 more
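Reading the trace above from the bottom up, the Solr and Tika exceptions are wrappers; the innermost cause is the java.io.IOException raised in BaseParser.readInt. A minimal sketch of unwrapping such a chain when logging follows. The class name RootCause and the stand-in exceptions in main are hypothetical illustrations, not part of Solr, Tika, or PDFBox:

```java
import java.io.IOException;

public class RootCause {

    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause; the second check guards
        // against a throwable that reports itself as its own cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the three layers seen in the trace:
        // request-level wrapper -> extraction wrapper -> parser I/O error.
        IOException parseError =
                new IOException("Error: Expected an integer type, actual=''");
        Exception extraction =
                new Exception("Unable to extract PDF content", parseError);
        RuntimeException request = new RuntimeException(extraction);

        Throwable root = rootCause(request);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Logging only the root cause this way makes it easier to see that the failure originates in the PDFBox parser rather than in the Solr or Tika layers.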
> Error: Expected an integer type, actual='' -
> --------------------------------------------
>
> Key: PDFBOX-697
> URL: https://issues.apache.org/jira/browse/PDFBOX-697
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.1.0
> Environment: Windows XP
> Reporter: ik Glop
> Attachments: gridBase 014.pdf, sample_password_BRILD.pdf
>
>
> Hello,
> The following exception is thrown when attempting to parse a PDF document that was password-protected using Adobe Acrobat 9 Pro:
> Apr 7, 2010 2:53:22 PM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: Stop reading corrupt stream
> 07-Apr-2010 14:53:22.829: WARNING: java.io.IOException: Error: Expected an integer type, actual=''
> at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1275)
> at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
> at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
> at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1100)
> at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:579)
> I have the following libraries:
> fontbox-1.1.0.jar
> pdfbox-1.1.0.jar
> A sample password-protected file is attached.
> I would appreciate it if someone could help.
> Thank you
> iglop
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.