Posted to java-user@lucene.apache.org by Madhu <ma...@sraindia.com> on 2007/08/30 14:54:01 UTC

Lucene indexing for pdf files

Hi all...

I am indexing PDF documents using PDFBox 7.4. It works fine for some PDF
files, but for Japanese PDF files it throws the exception below.

caught a class java.io.IOException
 with message: Unknown encoding for 'UniJIS-UCS2-H'

Can anyone help me with how to set the encoding while reading PDF files?

Regards,
Madhu



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene indexing for pdf files

Posted by Steven Rowe <sa...@syr.edu>.
Hi Madhu,

Madhu wrote:
> I am indexing PDF documents using PDFBox 7.4. It works fine for some PDF
> files, but for Japanese PDF files it throws the exception below.
> 
> caught a class java.io.IOException
>  with message: Unknown encoding for 'UniJIS-UCS2-H'
> 
> Can anyone help me with how to set the encoding while reading PDF files?

This question will get much better and quicker answers from the PDFBox
mailing lists or forums.  The SourceForge forums look much more active
than the mailing lists:

   http://sourceforge.net/forum/?group_id=78314

Steve

-- 
Steve Rowe
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
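
For context on the error in this thread: UniJIS-UCS2-H is one of the predefined
CJK CMap names in the PDF specification, not a Java charset name, so the fix
most likely has to come from the PDF library itself. Until then, one way to
keep an indexing run going is to skip the files whose text extraction fails and
log them for later. The sketch below is only an illustration under stated
assumptions: TolerantExtraction, TextExtractor and extractAll are hypothetical
names, not part of PDFBox or Lucene, and the extractor is a stand-in for
whatever call actually turns a PDF into text.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of PDFBox or Lucene. */
class TolerantExtraction {

    /** Stand-in for the real PDF-to-text call (assumption, not a library API). */
    interface TextExtractor {
        String extract(String path) throws IOException;
    }

    /**
     * Extracts text from each path in order. A file whose extraction throws
     * IOException is recorded in 'failed' and skipped; the remaining files
     * are still processed.
     */
    static Map<String, String> extractAll(List<String> paths,
                                          TextExtractor extractor,
                                          List<String> failed) {
        Map<String, String> textByPath = new LinkedHashMap<>();
        for (String path : paths) {
            try {
                textByPath.put(path, extractor.extract(path));
            } catch (IOException e) {
                // Keep the reason so the file can be revisited later.
                failed.add(path + ": " + e.getMessage());
            }
        }
        return textByPath;
    }

    public static void main(String[] args) {
        // Fake extractor that mimics the failure reported in this thread.
        TextExtractor fake = path -> {
            if (path.contains("japanese")) {
                throw new IOException("Unknown encoding for 'UniJIS-UCS2-H'");
            }
            return "text of " + path;
        };
        List<String> failed = new ArrayList<>();
        Map<String, String> extracted =
                extractAll(Arrays.asList("english.pdf", "japanese.pdf"), fake, failed);
        System.out.println("extracted: " + extracted.keySet());
        System.out.println("skipped:   " + failed);
    }
}
```

The text returned for each successful file can then be handed to the indexer as
usual, while the skipped list shows which PDFs still need a working encoding.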
