Posted to dev@poi.apache.org by Peter Margetiak <pe...@gmail.com> on 2014/06/09 00:45:37 UTC

Apache POI XWPFDocument to Pdf format

Hello devs!

I am a developer too, and I want to integrate your library into my 
project (compared to other converter libraries, it gives me the best 
results).

But I found an interesting bug in the DOCX to PDF converter, which 
I am not able to fix without your help.
I used the sample code, with the font encoding explicitly changed to 
windows-1250 (from the page: 
https://code.google.com/p/xdocreport/wiki/XWPFConverterPDFViaIText):

// 1) Load DOCX into XWPFDocument
InputStream in = new FileInputStream(new File("HelloWord.docx"));
XWPFDocument document = new XWPFDocument(in);

// 2) Prepare Pdf options
PdfOptions options = PdfOptions.create().fontEncoding("windows-1250");

// 3) Convert XWPFDocument to Pdf
OutputStream out = new FileOutputStream(new File("HelloWord.pdf"));
PdfConverter.getInstance().convert(document, out, options);

The .docx document contains just these characters:
Aáäbcčdďeéfghiíjklĺľmnňoóôpqrsštťuúvwxyýzž

Interestingly, on my Windows 8 machine with JDK 7 it works without any 
problem and the PDF is OK.
On my Ubuntu server with Oracle (Sun) JDK 7, it skips some characters 
and the result looks like this:
Aáäbcdeéfghiíjklmnoóôpqrsštuúvwxyýzž
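
For reference, the two strings above can be compared mechanically. The
following is a minimal Java sketch, not part of the original message;
the class name DroppedChars and its helpers are hypothetical. It lists
the characters missing from the Ubuntu output (č ď ĺ ľ ň ť) and checks
each one against two charsets. All six are encodable in windows-1250
but not in windows-1252, while every accented character that survived
exists in both. That pattern would be consistent with a Western-only
encoding or font being applied somewhere on the Linux side, though this
is only a hypothesis. The strings use \u escapes so that the source
compiles regardless of the file encoding.

```java
import java.nio.charset.Charset;

class DroppedChars {
    // Test line from the .docx, written with escapes for the accented letters.
    static final String EXPECTED =
        "A\u00E1\u00E4bc\u010Dd\u010Fe\u00E9fghi\u00EDjkl\u013A\u013Emn\u0148"
        + "o\u00F3\u00F4pqrs\u0161t\u0165u\u00FAvwxy\u00FDz\u017E";

    // Text reported from the PDF produced on the Ubuntu server.
    static final String ACTUAL =
        "A\u00E1\u00E4bcde\u00E9fghi\u00EDjklmn"
        + "o\u00F3\u00F4pqrs\u0161tu\u00FAvwxy\u00FDz\u017E";

    // Returns the characters of 'expected' that are absent from 'actual',
    // assuming 'actual' is 'expected' with some characters removed.
    static String dropped(String expected, String actual) {
        StringBuilder missing = new StringBuilder();
        int j = 0;
        for (int i = 0; i < expected.length(); i++) {
            char c = expected.charAt(i);
            if (j < actual.length() && actual.charAt(j) == c) {
                j++;                 // character survived the conversion
            } else {
                missing.append(c);   // character was skipped
            }
        }
        return missing.toString();
    }

    // True if the named charset has a byte representation for 'c'.
    static boolean encodable(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        for (char c : dropped(EXPECTED, ACTUAL).toCharArray()) {
            System.out.printf("U+%04X windows-1250=%b windows-1252=%b%n",
                (int) c, encodable("windows-1250", c), encodable("windows-1252", c));
        }
    }
}
```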

Please, can you tell me what I am doing wrong?

I have spent a lot of time on this problem.
Thank you very much for any help.

With best regards,
Peter Margetiak



Re: Apache POI XWPFDocument to Pdf format

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 9 Jun 2014, Peter Margetiak wrote:
> // 1) Load DOCX into XWPFDocument
> InputStream in = new FileInputStream(new File("HelloWord.docx"));
> XWPFDocument document = new XWPFDocument(in);

You're better off loading from the File directly; it uses less memory 
than going via an InputStream.

> // 2) Prepare Pdf options
> PdfOptions options = PdfOptions.create().fontEncoding("windows-1250");
>
> // 3) Convert XWPFDocument to Pdf
> OutputStream out = new FileOutputStream(new File("HelloWord.pdf"));
> PdfConverter.getInstance().convert(document, out, options);

None of these classes come from Apache POI, so I'm minded to blame one of 
those...

> Interestingly, on my Windows 8 machine with JDK 7 it works without any 
> problem and the PDF is OK. On my Ubuntu server with Oracle (Sun) JDK 7, 
> it skips some characters and the result looks like this: 
> Aáäbcdeéfghiíjklmnoóôpqrsštuúvwxyýzž

If you ditch all the XWPF bits and just do it with plain text, does that 
work? My hunch is that it's either a bug in the PDF library you're using, 
or you're missing some key fonts on your Linux box.
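
One rough way to test the missing-fonts hunch is to ask the JVM on each
machine which installed fonts can display the dropped characters. A
minimal sketch follows, using only java.awt. The class name
FontCoverage is hypothetical, and it is an assumption that the fonts
visible to AWT are the relevant ones, since the PDF library may locate
fonts through its own mechanism.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

class FontCoverage {
    // The six characters missing from the Ubuntu output, as escapes.
    static final String SAMPLE = "\u010D\u010F\u013A\u013E\u0148\u0165";

    // Returns the names of the given fonts that have a glyph for every
    // character in 'text'. canDisplayUpTo returns -1 on full coverage.
    static List<String> covering(String text, Font[] fonts) {
        List<String> names = new ArrayList<String>();
        for (Font f : fonts) {
            if (f.canDisplayUpTo(text) == -1) {
                names.add(f.getFontName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Fonts the JVM can see on this machine (works in headless mode).
        Font[] installed =
            GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        List<String> ok = covering(SAMPLE, installed);
        System.out.println(ok.size() + " of " + installed.length
            + " installed fonts cover the sample");
        for (String name : ok) {
            System.out.println("  " + name);
        }
    }
}
```

Running it on both the Windows and the Ubuntu machine and comparing the
lists would show whether the font used in the document is present on
the server with the needed glyphs.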

Nick