Posted to dev@pdfbox.apache.org by "Erich C. Beyrent (JIRA)" <ji...@apache.org> on 2010/06/02 16:19:38 UTC

[jira] Commented: (PDFBOX-697) Error: Expected an integer type, actual='' -

    [ https://issues.apache.org/jira/browse/PDFBOX-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874589#action_12874589 ] 

Erich C. Beyrent commented on PDFBOX-697:
-----------------------------------------

I'm also running into this issue. Here are the exceptions being thrown:

SEVERE: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unable to extract PDF content
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
	at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF content
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
	... 20 more
Caused by: java.io.IOException: Error: Expected an integer type, actual=''
	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
	at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
	at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
	at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1100)
	at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:579)
	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
	at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
	... 25 more
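For context on the innermost frame: per the PDF specification, a compressed object stream begins with N pairs of integers (object number, byte offset), and the trace shows PDFObjectStreamParser.parse reading them through BaseParser.readInt. The sketch below is a simplified, hypothetical model (the class ObjectStreamHeaderSketch and its methods are illustrative and are not PDFBox code). It shows how an empty or non-numeric decoded stream yields an error of the same shape, actual=''.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of reading an object-stream header.
// This is NOT PDFBox's BaseParser; it only mimics the error message shape.
public class ObjectStreamHeaderSketch {

    private final String data; // decoded (decrypted + decompressed) stream text
    private int pos = 0;

    public ObjectStreamHeaderSketch(String decodedStream) {
        this.data = decodedStream;
    }

    // Skips whitespace, collects a run of digits, and parses it as an int.
    // If no digits are found (end of data or a non-digit byte), the token is empty.
    int readInt() throws IOException {
        while (pos < data.length() && Character.isWhitespace(data.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < data.length() && Character.isDigit(data.charAt(pos))) {
            pos++;
        }
        String token = data.substring(start, pos);
        if (token.isEmpty()) {
            throw new IOException("Error: Expected an integer type, actual='" + token + "'");
        }
        return Integer.parseInt(token);
    }

    // Reads n (objectNumber, offset) pairs from the start of the stream.
    public List<int[]> readHeader(int n) throws IOException {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int objNum = readInt();
            int offset = readInt();
            pairs.add(new int[] {objNum, offset});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Well-formed header for N = 2: objects 12 and 13 at offsets 0 and 5.
        try {
            List<int[]> ok = new ObjectStreamHeaderSketch("12 0 13 5 <<>> <<>>").readHeader(2);
            System.out.println("parsed " + ok.size() + " pairs"); // prints "parsed 2 pairs"
        } catch (IOException e) {
            System.out.println("unexpected: " + e.getMessage());
        }
        // Empty decoded data, e.g. after a failed decompression.
        try {
            new ObjectStreamHeaderSketch("").readHeader(2);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "Error: Expected an integer type, actual=''"
        }
    }
}
```

In this sketch a non-digit byte (such as scrambled data) produces the same empty actual='' value as truly empty data, because only digits are collected into the token.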


> Error: Expected an integer type, actual='' -
> --------------------------------------------
>
>                 Key: PDFBOX-697
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-697
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.1.0
>         Environment: Windows XP
>            Reporter: ik Glop
>         Attachments: gridBase 014.pdf, sample_password_BRILD.pdf
>
>
> Hello,
> The following exception is thrown when parsing a PDF document that was password-protected with Adobe Acrobat 9 Pro:
> Apr 7, 2010 2:53:22 PM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: Stop reading corrupt stream
> 07-Apr-2010 14:53:22.829: WARNING: java.io.IOException: Error: Expected an integer type, actual=''
> at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1275)
> at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
> at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
> at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1100)
> at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:579)
> I have the following libraries:
> fontbox-1.1.0.jar
> pdfbox-1.1.0.jar
> A sample password-protected file is attached.
> I would appreciate it if someone could help.
> Thank you
> iglop
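The original log shows FlateFilter giving up ("Stop reading corrupt stream") immediately before readInt fails. One plausible explanation, assumed here and not confirmed in this thread, is that the encrypted stream bytes were not decrypted correctly, so the zlib data handed to the Flate decoder is garbage and decodes to nothing. The hypothetical CorruptFlateSketch below uses only java.util.zip to illustrate that effect, with a byte-wise XOR standing in for a wrong decryption; it is not how PDFBox's FlateFilter is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration: zlib data that was scrambled (standing in for a
// wrong decryption) decodes to an empty result when the decoder stops on error.
public class CorruptFlateSketch {

    // Compresses bytes with zlib, the format used by /FlateDecode.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses; on corrupt or truncated input, stops and returns what was decoded so far.
    static byte[] inflateLeniently(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // input ended before the zlib stream was complete
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            // Corrupt data: give up, similar in spirit to "Stop reading corrupt stream".
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] header = "12 0 13 5 ".getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = deflate(header);

        // Correct bytes round-trip.
        String ok = new String(inflateLeniently(compressed), StandardCharsets.US_ASCII);
        System.out.println("intact: '" + ok + "'"); // prints "intact: '12 0 13 5 '"

        // Scramble every byte to simulate a wrong decryption key.
        byte[] scrambled = compressed.clone();
        for (int i = 0; i < scrambled.length; i++) {
            scrambled[i] ^= 0x5A;
        }
        String bad = new String(inflateLeniently(scrambled), StandardCharsets.US_ASCII);
        System.out.println("scrambled: '" + bad + "'"); // prints "scrambled: ''"
    }
}
```

The scrambled input fails zlib's header check before any bytes are produced, so the parser that reads the object-stream header would be handed empty text.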
