Posted to users@pdfbox.apache.org by Vaishali Mahajan <va...@gmail.com> on 2023/03/20 05:40:05 UTC
PDF to Word Conversion
Hi,
I am creating a PDF-to-Word conversion application using the PDFBox .NET version.
I can extract all the text from a PDF, but without its formatting. I want to
preserve the text formatting as well as all images when converting a PDF to a
Word file. Please guide me.
Thanks
Re: PDF to Word Conversion
Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,
There is no official PDFBox .NET version. There are only unofficial ones
based on old PDFBox versions.
PDF has no "formatting" in the way HTML does. Glyphs are placed at
specified positions, sometimes one character at a time.
Some products try to reconstruct paragraphs from these positions. PDFBox
attempts this too, though not perfectly; see PDFText2HTML.java.
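To illustrate why this reconstruction is heuristic, here is a minimal sketch of grouping positioned glyphs into lines. It is not PDFBox code and not the algorithm PDFText2HTML uses: the Glyph class, the toLines method and both thresholds are hypothetical names and values chosen for the example, and y is assumed to be measured from the top of the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GlyphLineGrouper {

    /** Hypothetical stand-in for a positioned glyph; not a PDFBox class. */
    static final class Glyph {
        final String text;
        final double x;      // left edge of the glyph
        final double y;      // baseline, assumed to be measured from the top of the page
        final double width;  // advance width of the glyph

        Glyph(String text, double x, double y, double width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Groups glyphs into lines: glyphs whose baseline is within yTolerance of the
     * first glyph of the current row share a line, and a horizontal gap larger
     * than gapThreshold between neighbours is rendered as a space.
     */
    static List<String> toLines(List<Glyph> glyphs, double yTolerance, double gapThreshold) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        // Step 1: bucket glyphs into rows by baseline.
        List<List<Glyph>> rows = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(g.y - last.get(0).y) <= yTolerance) {
                last.add(g);
            } else {
                List<Glyph> row = new ArrayList<>();
                row.add(g);
                rows.add(row);
            }
        }

        // Step 2: order each row left to right and guess where the spaces are.
        List<String> lines = new ArrayList<>();
        for (List<Glyph> row : rows) {
            row.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            Glyph prev = null;
            for (Glyph g : row) {
                if (prev != null && g.x - (prev.x + prev.width) > gapThreshold) {
                    sb.append(' ');
                }
                sb.append(g.text);
                prev = g;
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    /** Sample data: glyphs arrive out of order and with a slightly uneven baseline. */
    static List<String> sampleLines() {
        List<Glyph> glyphs = Arrays.asList(
            new Glyph("i", 16, 100.2, 3),
            new Glyph("H", 10, 100, 6),
            new Glyph("a", 24, 100, 5),   // 5 units after "i" ends, so a space is inserted
            new Glyph("l", 29, 100, 3),
            new Glyph("l", 32, 100, 3),
            new Glyph("N", 10, 114, 6),
            new Glyph("e", 16, 114, 5),
            new Glyph("x", 21, 114, 5),
            new Glyph("t", 26, 114, 3));
        return toLines(glyphs, 1.0, 2.0);
    }

    public static void main(String[] args) {
        sampleLines().forEach(System.out::println);   // prints "Hi all" then "Next"
    }
}
```

Both thresholds are guesses that depend on font size and layout, which is one reason reconstructed text rarely matches the original exactly.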
To get the images, see ExtractImages.java and PrintImageLocations.java.
One would have to combine all of this, and the result would still not
look very close to the original PDF.
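As a rough idea of what "combining" could mean, the sketch below merges already extracted text lines and image placeholders into a single top-to-bottom sequence that a Word writer could then emit in order. Everything in it is hypothetical (the Item class, the inReadingOrder method, the sample file name); it uses no PDFBox API and assumes positions have already been converted to a top-left origin, whereas default PDF user space has its origin at the bottom left.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PageFlowSketch {

    /** Hypothetical page item: a text line or an image placeholder with its position. */
    static final class Item {
        final boolean image;   // true for an image, false for a text line
        final String content;  // the text, or an image file name
        final double top;      // distance from the top of the page (assumed already converted)
        final double left;     // distance from the left edge of the page

        Item(boolean image, String content, double top, double left) {
            this.image = image;
            this.content = content;
            this.top = top;
            this.left = left;
        }
    }

    /** Orders items top to bottom, then left to right, and renders each as one string. */
    static List<String> inReadingOrder(List<Item> items) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble((Item i) -> i.top)
                              .thenComparingDouble(i -> i.left));
        List<String> flow = new ArrayList<>();
        for (Item i : sorted) {
            flow.add(i.image ? "[image: " + i.content + "]" : i.content);
        }
        return flow;
    }

    /** Sample data in arbitrary order, as separate text and image passes might produce it. */
    static List<String> sampleFlow() {
        return inReadingOrder(Arrays.asList(
            new Item(false, "Second paragraph", 300, 72),
            new Item(true, "figure1.png", 200, 72),
            new Item(false, "Title", 50, 72)));
    }

    public static void main(String[] args) {
        // prints "Title", "[image: figure1.png]", "Second paragraph", one per line
        sampleFlow().forEach(System.out::println);
    }
}
```

A simple vertical sort like this breaks down on multi-column pages or text wrapped around images, so it only approximates the layout.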
Tilman