Posted to general@incubator.apache.org by David Crossley <cr...@apache.org> on 2006/01/31 23:29:56 UTC

[docs] whitespace and indenting of xml

Martin Sebor wrote:
> 
> I'm a little distressed to see the conversion process has messed
> up the formatting of the original HTML that I manually maintained
> for readability. Specifically, many of the terminating tags (such
> as </p>) are not indented as they ought to be and instead are in
> column 1. I don't suppose there is an easy way to regenerate the
> page so as to preserve more of the original formatting, is there?

I tried my best to format stuff automatically
as part of the Forrest output process. If it
were raw XML serialiser output then it would have
been even worse. No, we cannot retain the original
formatting.

I know that it is not good enough.

Someone could run all documents through something
like HTML Tidy or Henning's CodeWrestler or perhaps
some XSL.

I would be pleased to see how they do this, because
I want to add this ability to our future tools.

On many projects I have seen messy source documents
cause grief with svn diffs: too much clutter and
inconsistent whitespace.
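
One rough way to tell whether a change is only whitespace clutter is
to compare the two versions with whitespace ignored. This is a minimal
sketch, assuming a diff that supports -w (GNU and BSD diff both do).
The function name whitespace_only is made up for illustration.

```shell
# Hypothetical helper: succeeds when two files differ, but only in whitespace.
whitespace_only() {
  # Byte-identical files are not a "whitespace-only change".
  if cmp -s "$1" "$2"; then
    return 1
  fi
  # -w ignores all whitespace within lines; diff exits 0 when nothing else differs.
  diff -w "$1" "$2" > /dev/null
}
```

For example, "whitespace_only old.html new.html && echo formatting only"
(both file names are placeholders). Added or removed blank lines still
count as differences here.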

-David

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [docs] whitespace and indenting of xml

Posted by Leo Simons <ma...@leosimons.com>.
How to clean up HTML...

Using shell with xmllint (yes, ugly shortcuts below):

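  export cmd="xmllint --html --format"
  # run the redirect in a per-file shell; a bare '>' is consumed by the outer shell
  find . -name '*.html' -exec sh -c '$cmd "$1" > "$1.new" && mv "$1.new" "$1"' sh \{\} \;
  # remove leftovers from files that failed to parse
  find . -name '*.html.new' -exec rm \{\} \;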

(--html is not strictly necessary if you have valid XML, e.g. you
can format XML as follows:

  export cmd="xmllint --format"
  # run the redirect in a per-file shell so each file gets its own .new
  find . -name '*.xml' -exec sh -c '$cmd "$1" > "$1.new" && mv "$1.new" "$1"' sh \{\} \;
  # remove leftovers from files that failed to parse
  find . -name '*.xml.new' -exec rm \{\} \;
)
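
The write-to-.new-then-replace idea above can also be wrapped in a small
function, so the original is only overwritten when the formatter
succeeds. This is a sketch, not a polished tool: the name
reformat_in_place is hypothetical, and it assumes a formatter that
takes a file name as its last argument and writes to stdout.

```shell
# Hypothetical helper: run a stdout-writing formatter over each file,
# replacing the file only when the formatter exits successfully.
# Usage: reformat_in_place "formatter and options" file...
reformat_in_place() {
  fmt=$1
  shift
  for f in "$@"; do
    tmp="$f.new"
    # $fmt is deliberately unquoted so "xmllint --format" splits into words.
    if $fmt "$f" > "$tmp"; then
      mv "$tmp" "$f"
    else
      echo "skipped $f: formatter failed" >&2
      rm -f "$tmp"
    fi
  done
}
```

For example: reformat_in_place "xmllint --html --format" *.html
Files the formatter rejects are left untouched and no .new file
stays behind.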

Using shell with tidy:

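  # -m: modify in place, -i: indent, -c: replace presentational markup with CSS,
  # -q: quiet. (Not -e: that implies "markup: no", so nothing is written back.)
  export cmd="tidy -m -i -c -q"
  find . -name '*.html' -exec $cmd \{\} \;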

In Ant you would create a <fileset> and pass it to an <apply> task
(<exec> runs a command once and does not accept a fileset) to do
much the same.

Both tools have some more interesting options.

- LSD




Re: [docs] whitespace and indenting of xml

Posted by Yoav Shapira <yo...@apache.org>.
> Someone could run all documents through something
> like HTML Tidy or Henning's CodeWrestler or perhaps
> some XSL.


It shouldn't be hard to integrate JTidy either by itself from the
command line or as an Ant task:
http://jtidy.sourceforge.net/howto.html.

Yoav
