Posted to users@jackrabbit.apache.org by David Moss <mo...@googlemail.com> on 2006/12/20 15:19:38 UTC

PdfTextFilter throws IOException on certain PDF documents

I'm trying to add the document iBATIS-SqlMaps-2_en.pdf to my repository, but
I think indexing the document fails.  Searching for words that appear in the
document does not return it as a result, and my logs show the following
error.

exception initializing reader
org.apache.jackrabbit.core.query.PdfTextFilter$1: java.io.IOException:
Error: Expected hex number, actual=' 0'
java.lang.Throwable: Warning: You did not close the PDF Document
at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)

I've inserted other PDFs without any problem, but this one seems to be
different.  I believe the error is raised in the background when
session.save() is called. Any ideas what's going wrong?

The file is attached.