Posted to dev@forrest.apache.org by Ross Gardler <rg...@apache.org> on 2005/06/29 19:04:08 UTC

Converting HTML to XDoc

I've been struggling for a couple of days with this. I wonder if someone 
can help.

I need to convert an HTML document to XDoc (or XHTML2). I'm using the 
html2document.xsl in our SVN as a starting point but am thinking that it 
may be a dead end.

The problem is that the html2document.xsl stylesheet assumes that the 
HTML document has been authored in a structured way, that is, <h2> 
always follows <h1>, <h3> always follows <h2>, etc. Unfortunately that is 
not the case in many of the documents I have to work with.

Does anyone know of a stylesheet (or other means) that will do the job?

I know I can do it with a custom generator, but I thought I'd ask if 
there is another solution first.
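
To make the problem concrete, here is a rough sketch (in Python, not 
Forrest code) of the nesting logic such a custom step would need. It 
assumes the HTML body has already been flattened into (tag, text) pairs 
in document order; the function name nest_sections and that input shape 
are made up for illustration. A heading closes every open section at the 
same or a deeper level, so an <h3> directly after an <h1> simply becomes 
a child of the <h1> section.

```python
import re
import xml.etree.ElementTree as ET

# Matches the HTML heading tags h1..h6 and captures the level.
HEADING = re.compile(r"^h([1-6])$")


def nest_sections(items):
    """Turn a flat list of (tag, text) pairs into nested <section> elements.

    Hypothetical helper: the input shape is an assumption, not something
    html2document.xsl or Forrest provides.
    """
    body = ET.Element("body")
    # Stack of (heading_level, element); level 0 is the body itself.
    stack = [(0, body)]
    for tag, text in items:
        match = HEADING.match(tag)
        if match:
            level = int(match.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        else:
            # Non-heading content goes into the innermost open section.
            ET.SubElement(stack[-1][1], tag).text = text
    return body


if __name__ == "__main__":
    sample = [("h1", "A"), ("p", "x"), ("h3", "B"), ("h2", "C"), ("p", "y")]
    print(ET.tostring(nest_sections(sample), encoding="unicode"))
```

Because the stack compares heading levels rather than expecting each 
level to follow the previous one, skipped levels do not break the 
nesting. Parsing the real HTML into those pairs is left out here.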

Ross