Posted to user@poi.apache.org by murtaza ali <mu...@softinn.com.pk> on 2010/04/06 10:21:00 UTC

Query Regarding POI API - Is DOC to HTML conversion possible?

Hi,

I would like to ask whether it is possible to convert an Office document to
HTML using the Apache POI API.

I would appreciate your answer, and I will be obliged.

Regards,

Murtaza Ali

Re: Query Regarding POI API - Is DOC to HTML conversion possible?

Posted by MSB <ma...@tiscali.co.uk>.
Assuming that you want to convert from the older binary format, there is no
single method that you can call to output the document as HTML. HWPF is not yet
mature and there are areas that it cannot handle successfully, so how far you
get will depend on the complexity of the document. You could use the API to
extract information from the file and then 're-assemble' it into a marked-up
format, but you would have to write the code that controls that process and
handles the creation of the HTML.
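
As a rough illustration of that 're-assembly' step, here is a minimal sketch in
plain Java. It assumes the paragraph text has already been pulled out of the
document as strings (with HWPF, WordExtractor.getParagraphText() is one way to
get them); the class and method names below are hypothetical and are not part
of POI. It only produces plain <p> elements and makes no attempt to carry over
fonts, tables or images.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of POI: turns already-extracted paragraph
// text into a very simple HTML page.
final class ParagraphsToHtml {

    // Escape the characters that have a special meaning in HTML.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap each non-empty paragraph in a <p> element.
    static String toHtml(String title, List<String> paragraphs) {
        StringBuilder html = new StringBuilder();
        html.append("<html>\n<head><title>")
            .append(escape(title))
            .append("</title></head>\n<body>\n");
        for (String p : paragraphs) {
            // Extracted text usually carries a trailing paragraph mark; trim it.
            String text = p.trim();
            if (text.isEmpty()) {
                continue;
            }
            html.append("<p>").append(escape(text)).append("</p>\n");
        }
        html.append("</body>\n</html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; in a real program these strings would come from POI.
        List<String> paragraphs =
                Arrays.asList("First paragraph.\r", "", "a < b & c\r");
        System.out.print(toHtml("Example", paragraphs));
    }
}
```

Because you write the markup yourself, you decide exactly which tags are
produced, at the cost of handling every formatting feature by hand.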

If you want a simple API that can convert one file format into another, it
may be best to take a look at JODConverter -
http://www.artofsolving.com/opensource/jodconverter. It uses OpenOffice to
convert between different file formats and may be the sort of tool you are
looking for. The one thing that I do not know is how the conversion is
performed; that is, what sort of HTML markup it will produce and how much
control you have over that markup. I do know that many people are dismayed by
the HTML that Word itself produces and by the fact that it is not possible to
control the markup the application outputs. That is where writing the code
yourself may prove advantageous: using POI to strip the information from the
source file and, possibly, something like the HTMLEditorKit that is included
in the core Java distribution to build the HTML.
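
To give an idea of what the HTMLEditorKit route might look like, the sketch
below builds an HTMLDocument one paragraph at a time and then serialises it. It
uses only javax.swing.text.html classes from the JDK; the class name and the
sample strings are made up for illustration, and the text passed in is assumed
to be already escaped for HTML. The kit decides the indentation and line
breaks of the output, so treat this as a starting point rather than a finished
converter.

```java
import java.io.StringWriter;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical sketch: assembles HTML with the JDK's HTMLEditorKit.
final class EditorKitSketch {

    // Append each string as a <p> element, then write the document out.
    // The strings are assumed to be HTML-escaped already.
    static String build(String... paragraphs) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        for (String p : paragraphs) {
            // Insert at the current end of the document.
            kit.insertHTML(doc, doc.getLength(), "<p>" + p + "</p>", 0, 0, null);
        }
        StringWriter out = new StringWriter();
        kit.write(out, doc, 0, doc.getLength());
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in data; in a real program the text would come from POI.
        System.out.println(build("Alpha", "Beta"));
    }
}
```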

Yours

Mark B

