Posted to java-user@lucene.apache.org by Madhu <ma...@sraindia.com> on 2007/08/30 14:54:01 UTC
Lucene indexing for pdf files
Hi all,
I am indexing PDF documents using pdfbox 7.4. It works fine for some PDF
files, but for Japanese PDF files it throws the exception below:
caught a class java.io.IOException
with message: Unknown encoding for 'UniJIS-UCS2-H'
Can anyone tell me how to set the encoding when reading PDF files?
Regards,
Madhu
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene indexing for pdf files
Posted by Steven Rowe <sa...@syr.edu>.
Hi Madhu,
Madhu wrote:
> I am indexing PDF documents using pdfbox 7.4. It works fine for some PDF
> files, but for Japanese PDF files it throws the exception below:
>
> caught a class java.io.IOException
> with message: Unknown encoding for 'UniJIS-UCS2-H'
>
> Can anyone tell me how to set the encoding when reading PDF files?
This question will get much better and quicker answers from PDFBox
mailing lists/forums. The SF forums look much more active than the
mailing lists:
http://sourceforge.net/forum/?group_id=78314
Steve
--
Steve Rowe
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp