Posted to dev@poi.apache.org by Peter Margetiak <pe...@gmail.com> on 2014/06/09 00:45:37 UTC
Apache POI XWPFDocument to Pdf format
Hello devs!
I am a developer too, and I want to integrate your library into my
project (compared to other converter libraries, it gives me the best results).
But I found an interesting bug in the DOCX-to-PDF converter, which
I am not able to fix without your help.
I used the sample code, with the font encoding explicitly changed to windows-1250
(from the page:
https://code.google.com/p/xdocreport/wiki/XWPFConverterPDFViaIText):
// 1) Load DOCX into XWPFDocument
InputStream in = new FileInputStream(new File("HelloWord.docx"));
XWPFDocument document = new XWPFDocument(in);
// 2) Prepare Pdf options
PdfOptions options = PdfOptions.create().fontEncoding("windows-1250");
// 3) Convert XWPFDocument to Pdf
OutputStream out = new FileOutputStream(new File("HelloWord.pdf"));
PdfConverter.getInstance().convert(document, out, options);
The .docx document contains just these characters:
Aáäbcčdďeéfghiíjklĺľmnňoóôpqrsštťuúvwxyýzž
Interestingly, on my Windows 8 machine with JDK 7 it works without any problem;
the PDF is OK.
On my Ubuntu server with Oracle (Sun) JDK 7, it skips some characters and the
result looks like this:
Aáäbcdeéfghiíjklmnoóôpqrsštuúvwxyýzž
Please, can you tell me what I am doing wrong?
I have spent a lot of time on this problem.
Thank you very much for any help.
With best regards,
Peter Margetiak
Re: Apache POI XWPFDocument to Pdf format
Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 9 Jun 2014, Peter Margetiak wrote:
> // 1) Load DOCX into XWPFDocument
> InputStream in = new FileInputStream(new File("HelloWord.docx"));
> XWPFDocument document = new XWPFDocument(in);
You're better off loading from the File directly; it uses less memory than
going via an input stream.
> // 2) Prepare Pdf options
> PdfOptions options = PdfOptions.create().fontEncoding("windows-1250");
>
> // 3) Convert XWPFDocument to Pdf
> OutputStream out = new FileOutputStream(new File("HelloWord.pdf"));
> PdfConverter.getInstance().convert(document, out, options);
None of these classes come from Apache POI, so I'm minded to blame one of
those...
> Interestingly, on my Windows 8 machine with JDK 7 it works without any problem;
> the PDF is OK. On my Ubuntu server with Oracle (Sun) JDK 7, it skips some characters
> and the result looks like this: Aáäbcdeéfghiíjklmnoóôpqrsštuúvwxyýzž
If you ditch all the XWPF bits and just do it with plain text, does that
work? My hunch is it's either a bug in the PDF library you're using, or
you're missing some key fonts on your Linux box.
Nick
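
One way to narrow this down, offered as a rough sketch and not as anything
posted in the thread: compare the two strings quoted above to see exactly
which characters went missing, then check whether each one can be encoded in
windows-1250 (the encoding passed to fontEncoding) and in windows-1252. The
class name DroppedChars and both helper methods are hypothetical, written only
for this illustration. Only java.nio.charset from the JDK is used, and the
strings are written with Unicode escapes so the result does not depend on the
source file encoding.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Apache POI or XDocReport.
class DroppedChars {

    // Returns the characters of 'expected' that are absent from 'actual',
    // assuming 'actual' is 'expected' with some characters deleted.
    static List<Character> dropped(String expected, String actual) {
        List<Character> missing = new ArrayList<>();
        int j = 0; // position in 'actual'
        for (int i = 0; i < expected.length(); i++) {
            char c = expected.charAt(i);
            if (j < actual.length() && actual.charAt(j) == c) {
                j++; // character survived the conversion
            } else {
                missing.add(c); // character was skipped
            }
        }
        return missing;
    }

    // True if the named charset has a byte representation for 'c'.
    static boolean encodable(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // Text in the .docx, as quoted in the original post.
        String expected = "A\u00E1\u00E4bc\u010Dd\u010Fe\u00E9fghi\u00EDjkl"
                + "\u013A\u013Emn\u0148o\u00F3\u00F4pqrs\u0161t\u0165u\u00FAvwxy\u00FDz\u017E";
        // Text that appeared in the PDF on the Ubuntu server.
        String actual = "A\u00E1\u00E4bcde\u00E9fghi\u00EDjklmno\u00F3\u00F4pqrs"
                + "\u0161tu\u00FAvwxy\u00FDz\u017E";

        for (char c : dropped(expected, actual)) {
            System.out.printf("U+%04X windows-1250: %b, windows-1252: %b%n",
                    (int) c,
                    encodable("windows-1250", c),
                    encodable("windows-1252", c));
        }
    }
}
```

If the dropped characters turn out to be encodable in windows-1250 but not in
windows-1252, that would be consistent with the missing-fonts hunch (for
example, a fallback font that only covers Western European glyphs), though it
does not prove it.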