Posted to dev@forrest.apache.org by Jeff Turner <je...@apache.org> on 2003/01/24 11:13:39 UTC

XHTML 2 intermediate format (Re: Letting through raw HTML)

On Fri, Jan 24, 2003 at 09:38:13AM +0100, Nicola Ken Barozzi wrote:
...
> >So I'd like to introduce behaviour where:
> >
> >content/xdocs/*.html
> >
> >is treated as well-formed XML (hence in xdocs), and just has the menu and
> >tabs added.
> >
> >Does this sound decent?
> 
> To be fair, I think it sucks. It's a big fat hole in our SOC.
>
> That said, I also think that keeping a documentDTD11 that's 
> -almostbutnotquite- HTML sucks even more, and takes away lots of 
> flexibility. We're trying to push an elephant (html, docbook, whatever) 
> through a narrow hole (document11).

True.

> The solution IMHO would be to switch to XHTML. It doesn't have sections? 
> I had proposed to follow XHTML2 which has them, and has all HTML features.
> 
> So we would have:
> 
>  - XHTML2WD
>  - DOCBOOK
>  - WIKI
>  - HTML
> 
> Then as an intermediate format
> 
>  - XHTML2 ?
>  - DOCBOOK ?

XHTML 2 sounds like the best bet for an intermediate format, because:

 - It's structurally closest to HTML, so the xhtml22html.xsl stylesheet
   would be simple.
 - There are already Docbook -> XHTML stylesheets, so supporting Docbook as
   a source format should be quite easy.

It's also pretty good as a 'source' format too.  Non-proprietary,
politically neutral, familiar to users..

> Then as output
> 
>  - DOCBOOK

Who wants Docbook output?

>  - HTML
>  - TEXT

- XML + CSS

> Finally we would stop maintaining a DTD that was created to surpass HTML 
> deficiencies, at a time when HTML is going forward faster.

Hooray..


To: www-html-request@w3.org
Subject: subscribe


http://lists.w3.org/Archives/Public/www-html/


See you there.

--Jeff

> See my previous post for a comparison of DocumentDTD and XHTML2WD to see 
> the differences.
> 
> -- 
> Nicola Ken Barozzi                   nicolaken@apache.org
>             - verba volant, scripta manent -
>    (discussions get forgotten, just code remains)
> ---------------------------------------------------------------------
> 
>

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Joshua P. Dady wrote:
> Steven Noels wrote:
> 
>> +1 for your rationale - but I _really_ would like _not_ to use DTDs 
>> nor Schemas for formally defining our mid-tier format. We are 
>> suffering (my bad) due to the use of DTDs for docv11 already.
> 
> It seems like as good a time as any to throw some of my loose change into 
> the discussion.

:-)

> I rather liked the idea of an "HTML-Light" intermediate format, even if 
> I had to tweak the site2html bits of the sitemap when I added a new 
> source type (because I wanted to modify the CSS rules used by the final 
> HTML, and I document didn't give me a way to say that).  Having a DTD 
> for the intermediate format has the advantage of being an easy way for 
> newcomers such as myself to get their bearings, as opposed to, say, 
> trying to learn the intermediate format by reading a bunch of XSLT. Then 
> there's the big blinking red arrow that pops up when your source file 
> was valid, but an intermediate file wasn't.  8)

Why, IMHO, is Steven advocating a non-DTD intermediate format? Because he 
wants HTML things not in DTD11 to pass anyway. Right?

Using HTML as an intermediate format will basically make this 
unnecessary, and we will have validation possible there too.

DocBook as an intermediate format is too heavy ATM and too semantically 
rich. Yes, we will lose semantics with XHTML2, yes, it's not perfect.
But IMHO we will retain all the semantics we want.

An intermediate format will *always* lose semantics, because the 
original document can have infinite schemas. The balance is finding a 
format that suits what we want to present as a result and keeping it 
simple. And using a common source format as the intermediate format also 
helps.

I don't want to make massive changes to DocumentDTD11 files and convert 
all to XHTML2. Heck, XHTML2 is not here yet.

But I'd like to start changing elements one by one to make it more 
similar to the current XHTML2 proposal. We already have a second 
proposal, and can see what is still being discussed and what will 
probably remain.

I'll shortly start proposing changes to DocumentDTD11 so as to bring it 
nearer to XHTML2. After the 0.3 release, of course.

Here is the original comparison:
http://marc.theaimsgroup.com/?l=forrest-dev&m=102909917505608&w=2

   '-----------------'---------------------------------------'
   |  XHTML2 WD2     |       current document11 DTD          |
   '-----------------'---------------------------------------'
  *XHTML  Structure Module*

      html              document
      head              header
      title             title
      body              body

These are just name changes and I'd not do them now.

   *XHTML Text Module*

      abbr              * (requested by users via dictionary links)
      acronym           * (requested by users via dictionary links)
      address           * (requested by users)
      blockquote        *
      cite              *
      br  (deprecated)  br
      code              code
      dfn               * (requested by users via dictionary links)
      div               footer, legal  (requested by skinners)
      em                em
      h                 title
      kbd               * (needed, currently we misuse "code" instead)
      line              br
      p                 p
      |                 fixme   (with class attribute)
      |                 note    (with class attribute)
      |                 warning (with class attribute)
      pre               source
      quote             *
      samp              * (needed, currently we misuse "source" instead)
      section           section
      span              *  (requested by skinners)
      strong            strong
      var               * (needed, currently we misuse "code" instead)

Here we have two types of tags: ones that we lack, and ones that html2 
lacks.
I'd start with adding the ones we lack and find useful. Those we lack 
but are not really needed can stay off for now.

Then we can change the ones that we have and that html lacks.

These
                <fixme>
                <note>
                <warning>
will become
                <p class="fixme">
                <p class="note">
                <p class="warning">
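
As a rough sketch of that rename (not anything in Forrest today), here is 
how it could be automated. The sample fragment and the function name 
admonitions_to_p are invented for illustration; only the three element 
names and the class values come from the mapping above.

```python
import xml.etree.ElementTree as ET

# Hypothetical doc-v11 fragment, made up for this example.
SOURCE = """<body>
  <note>Remember to validate.</note>
  <warning>This API may change.</warning>
  <p>Ordinary paragraph.</p>
</body>"""

# The three doc-v11 elements that become <p class="...">.
ADMONITIONS = {"fixme", "note", "warning"}


def admonitions_to_p(root):
    """Rename <fixme>/<note>/<warning> to <p class="...">, in place."""
    for el in root.iter():
        if el.tag in ADMONITIONS:
            # Keep the old element name as the class value, then rename.
            el.set("class", el.tag)
            el.tag = "p"
    return root


root = admonitions_to_p(ET.fromstring(SOURCE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

Existing attributes and child content are left untouched, so the change 
is a pure rename plus one added attribute.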

    *XHTML Hypertext Module*

      a                 link (already decided to reduce)
      |                 jump (already decided to reduce)
      |                 fork (already decided to reduce)
      |                 anchor

This is a change that users should like and I'd like to do it.
Use <a> everywhere.

   * XHTML List Module*

      dl                dl
      dt                dt
      dd                dd
      nl                * (basically makes multilinks possible, very cool)
      name              * (part of nl spec)
      ol                ol
      ul                ul
      li                li

We just have multilinks here. They have already been proposed, can be 
done eventually, not fundamental now.

    *XHTML Linking Module*

      link element      book.xml

      *XHTML Metainformation Module*

       meta             abstract (never used)
       |                authors
       |                person
       |                subtitle (never used)
       |                type     (never used)
       |                version
       |                notice   (never used)

Here we can switch to use meta tags, and we can add links in the page 
for navigation. Mozilla uses them, it's very cool, and can make reading 
paths.
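
A minimal sketch of what "switch to use meta tags" could look like, 
assuming a doc-v11 header whose <person> elements carry a name attribute 
and assuming the XHTML2-style <meta name="...">text</meta> form. The 
sample data and the function name header_to_head are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical doc-v11 header; the attribute layout is an assumption.
SOURCE = """<header>
  <title>Using Forrest</title>
  <authors>
    <person name="Jane Doe" email="jane@example.org"/>
  </authors>
  <version>0.3</version>
</header>"""


def header_to_head(header):
    """Build a <head> with <meta> children from a doc-v11 style <header>."""
    head = ET.Element("head")
    ET.SubElement(head, "title").text = header.findtext("title", default="")
    # One <meta name="author"> per <person>.
    for person in header.iter("person"):
        meta = ET.SubElement(head, "meta", name="author")
        meta.text = person.get("name")
    # <version> becomes a meta entry as well, if present.
    version = header.findtext("version")
    if version:
        meta = ET.SubElement(head, "meta", name="version")
        meta.text = version
    return head


head = header_to_head(ET.fromstring(SOURCE))
print(ET.tostring(head, encoding="unicode"))
```

The never-used elements (abstract, subtitle, type, notice) are simply not 
mapped here; they could be handled the same way if anyone needs them.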

     *XHTML Object Module*

       object           img
       |                icon   (never used)
       |                figure (never used)

I'd leave this off for now, and simply switch to using img everywhere.

   *XHTML Presentation Module*

       hr               *
       sub              sub
       sup              sup

Easy to add.

    *XHTML Tables Module*

      caption           caption
      table             table
      tbody             *
      td                td
      th                th
      thead             * (needed, currently misusing caption)
      tfoot             * (needed, currently misusing other tags)
      tr                tr

Easy to add too.

So it seems that there are not many changes to make.
Yes, HTML has more features than these, like xforms and such, but they 
would be additional features to add once this is OK.

So, does it seem sensible? One step at a time.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by "Joshua P. Dady" <jp...@indecisive.com>.

Steven Noels wrote:
> +1 for your rationale - but I _really_ would like _not_ to use DTDs nor 
> Schemas for formally defining our mid-tier format. We are suffering (my 
> bad) due to the use of DTDs for docv11 already.

It seems like as good a time as any to throw some of my loose change into 
the discussion.

I rather liked the idea of an "HTML-Light" intermediate format, even if 
I had to tweak the site2html bits of the sitemap when I added a new 
source type (because I wanted to modify the CSS rules used by the final 
HTML, and I document didn't give me a way to say that).  Having a DTD 
for the intermediate format has the advantage of being an easy way for 
newcomers such as myself to get their bearings, as opposed to, say, 
trying to learn the intermediate format by reading a bunch of XSLT. 
Then there's the big blinking red arrow that pops up when your source 
file was valid, but an intermediate file wasn't.  8)

--
Joshua P. Dady

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Steven Noels <st...@outerthought.org>.

Jeff Turner wrote:

> I think XHTML2 is the best candidate.
> 
> My understanding is that XHTML 1.1 and above are broken into modules, and
> it is possible to cleanly extend XHTML by adding new modules (eg SVG).
> So for example, if we wanted to include metadata, we'd throw some RDF
> into the <head> tag and call it a module.  As an intermediate format,
> XHTML2 would be just a base on which to build.

Isn't it fun when people come up with the same conclusion across 
timezones and on the opposite side of the planet? We have been composing 
our answer at the same time - I hadn't yet read your answer. :-)

+1 for your rationale - but I _really_ would like _not_ to use DTDs nor 
Schemas for formally defining our mid-tier format. We are suffering (my 
bad) due to the use of DTDs for docv11 already.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Miles Elam wrote:
> By the way, in case it wasn't clear, I have boundless respect for all of 
> you.  My posts were merely for the sake of counterpoints and submitting 
> options for discussion.

We know, your intentions are very clear from your mails, it's very 
evident that you are actively helping in the discussion with very 
precise, concrete and interesting points.

:-)

> P.S.  I have (X)HTML+CSS skins lying around doing nothing useful.  Are 
> they wanted?  

Let us see, maybe it can be interesting. No promises though ;-)

> Am I waiting for the intermediate layer to be finalized 
> before working on the basic XSLT?

No, we will be doing changes in a gradual manner, and have a transition 
system. The goal IMO is to have them automated starting after 0.3

> I'm afraid I'll need a bit of 
> guidance in this area; I'm a relative newcomer, and while I want to 
> help, I'm still a bit fuzzy as to where any help would do the most good 
> and won't be obsoleted by imminent changes.

Help is appreciated in any area, don't worry. What you read on this list 
is what we all know about the future. This is the nice thing about OS, all 
is in the open :-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Steven Noels <st...@outerthought.org>.

Miles Elam wrote:

> By the way, in case it wasn't clear, I have boundless respect for all of 
> you.  My posts were merely for the sake of counterpoints and submitting 
> options for discussion.

We know you are not Robert Simmons. ;-/

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Miles Elam <mi...@geekspeak.org>.

By the way, in case it wasn't clear, I have boundless respect for all of 
you.  My posts were merely for the sake of counterpoints and submitting 
options for discussion.

Jeff, Steven, Nicola, et al., I realize you guys have been doing the 
heavy lifting for a while now, and believe me, I put great stock in your 
opinions.

Just so you all know.  :)

- Miles

P.S.  I have (X)HTML+CSS skins lying around doing nothing useful.  Are 
they wanted?  Am I waiting for the intermediate layer to be finalized 
before working on the basic XSLT?  I'm afraid I'll need a bit of 
guidance in this area; I'm a relative newcomer, and while I want to 
help, I'm still a bit fuzzy as to where any help would do the most good 
and won't be obsoleted by imminent changes.

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Miles Elam <mi...@geekspeak.org>.

Jeff Turner wrote:

>Any format used by only one tool is too proprietary.
>
More than one tool:
http://www.dpawson.co.uk/docbook/reference.html#d12e96

I would think that the Linux Documentation Project would be pretty 
sensitive to a proprietary solution.  Something else, here's a bit of 
irony in Forrest/Cocoon-preferred technology: James Clark's RelaxNG 
specs are backed by OASIS and are written in DocBook.  ;-)

>>Familiarity to users is indeed an issue, but most web designers don't
>>use XHTML 1.0 yet let alone the backwards-incompatible XHTML2 which
>>does away with <br>, <img>, <h1> - <h6>, requires the use of CSS for
>>display styling, etc.  The W3C is more popular than OASIS, but then
>>Microsoft is more popular than the W3C.  How big is big enough?
>>    
>>
>
>In my mind, bigger than Apache is 'big enough'.
>

Better dump RelaxNG then.  ;-)

>There are two issues here:
>
>1) Assuming we have an intermediate format, is XHTML2 (or Docbook?)
>suitable.
>

I would think a lingua franca layer is necessary.  Both fit the role of 
Esperanto well enough I guess.  ;-)

>2) Is XHTML2 an appropriate 'source' format.
>

Shouldn't *anything* be an appropriate source format?  Given enough 
development time, even PDFs could be parsed and dumped into a pipeline, 
right?

>For 1), I can't see how Docbook could make a decent intermediate format.
>It's not designed for that.  It's too 'semantic'.  For example, say we
>invent a source syntax for describing directory hierarchies:
>
><dir id="somedir">
>  <file id="README.txt" desc="README file"/>
>  <file id="build.xml" desc="Ant build file"/>
>  <dir id="src">
>    <dir id="java" desc="Java Source code">
>    </dir>
>  </dir>
></dir>
>
>How can we possibly transform this into Docbook?
>
<section>
  <title><filename class="directory">somedir</filename></title>
  <section>
    <title><filename>README.txt</filename></title>
    <para>README file</para>
  </section>
  <section>
    <title><filename>build.xml</filename></title>
    <para>Ant build file</para>
    <!-- Usage data and fully marked up examples could go here -->
  </section>
  <section>
    <title><filename class="directory">src</filename></title>
    <section>
      <title><filename class="directory">java</filename></title>
      <para>Java Source code</para>
    </section>
  </section>
</section>
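
For what it's worth, the mapping I did by hand above is mechanical 
enough to script. A minimal sketch, assuming the invented <dir>/<file> 
syntax from Jeff's example; the function name to_section is made up, and 
this is only one possible mapping, not a recommended one.

```python
import xml.etree.ElementTree as ET

# The invented directory-listing syntax quoted above.
SOURCE = """<dir id="somedir">
  <file id="README.txt" desc="README file"/>
  <file id="build.xml" desc="Ant build file"/>
  <dir id="src">
    <dir id="java" desc="Java Source code">
    </dir>
  </dir>
</dir>"""


def to_section(node):
    """Map one <dir> or <file> element to a DocBook <section>, recursively."""
    section = ET.Element("section")
    title = ET.SubElement(section, "title")
    filename = ET.SubElement(title, "filename")
    filename.text = node.get("id")
    if node.tag == "dir":
        # Directories are marked the same way as in the hand-written version.
        filename.set("class", "directory")
    if node.get("desc"):
        ET.SubElement(section, "para").text = node.get("desc")
    # Nested <dir>/<file> children become nested sections.
    for child in node:
        section.append(to_section(child))
    return section


docbook = to_section(ET.fromstring(SOURCE))
print(ET.tostring(docbook, encoding="unicode"))
```

Entries without a desc attribute get a title and no <para>, which matches 
the "somedir" and "src" sections above.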

Although I freely admit, I'm abusing the model somewhat.  XHTML2's <nl 
/> would almost certainly be better in many instances, but that's a 
primarily navigational element.  Would you end up with something 
fundamentally better with XHTML2?

>Forrest's doc-v11 format suffers the same problem.  We resort to abusing
>tags like <code> and <table> to indicate a certain presentation.
>

On the bright side, the previous example is semantically a hierarchy and 
even a hierarchy of filesystem elements.  The advantage of DocBook is 
that there is no layout.  Using alternate tags to indicate a certain 
presentation doesn't really apply.

>Whatever the intermediate format is, it must contain *less* semantics and
>*more* presentation than source formats.  However it cannot contain more
>'presentation' than the destination format (HTML, PDF), so it cannot be
>something like XSLFO.  Our intermediate format must sit in the middle of
>a gradient:
>
>
>SEMANTIC                                        PRESENTATIONAL
>
>authors
>HR-XML                                
>Docbook                       /---> HTML
>doc-v11   >---->  Intermediate
>myformat                      '---> XSL:FO: ---> PDF
>...
>

Fair enough.  More on my rebuttal below.  ;-)

>So, what XML format can encapsulate the presentational aspects of all our
>'source' formats (resumes, project docs, user manuals, etc) yet isn't
>*too* well defined that we can't transform it into HTML and XSL:FO?
>

Hunh?  Too well defined that we can't transform it?  That makes precious 
little sense to me.  If I have a semantically rich document telling me 
that an element is a legal notice (copyright info for example) with an 
itemized list of requirements, what prevents me from transforming that 
into strictly presentational views?  In fact, by splitting it off and 
clearly identifying the items in context, it makes it easier to make 
decisions like "does the copyright notice only appear centered on the 
last page by itself or in a small blurb at the footer of each page?"  I 
honestly don't see the issue here.

As for transforming from a more presentational base (eg. Wiki), you are 
in much the same boat with either Simplified DocBook or XHTML2.  Also, 
XHTML2 is still being actively worked on by committee.  DocBook has been 
around for ten years (as long as HTML?) and has stabilized greatly.  Not 
only is it their policy to announce changes in advance of making those 
changes, but deprecated items are publicly announced at the next 
release and only actually removed at the following release.  (There is 
some info about the direction of DocBook 6 even though DocBook 5 hasn't 
been released yet.)  If you browse the DocBook element documentation, 
you'll see references to future versions, where deficient items are 
scheduled for replacement and by what, and where new items have just 
appeared (and what they replaced/augmented).

>I think XHTML2 is the best candidate.
>
>My understanding is that XHTML 1.1 and above are broken into modules, and
>it is possible to cleanly extend XHTML by adding new modules (eg SVG).
>So for example, if we wanted to include metadata, we'd throw some RDF
>into the <head> tag and call it a module.  As an intermediate format,
>XHTML2 would be just a base on which to build.
>

Cleanly extend, yes, but it's not without effort.  They made a 
completely separate DTD for XHTML+MathML+SVG.  On the bright side, the 
W3C has already done it for SVG+MathML, but I don't see it being a five 
minute affair AND the W3C has been working with DTDs for these 
aggregations.  YMMV I think.  Be careful that you aren't extending 
XHTML2 to the point where it's just DocBook in pieces.

Now, all that said, let me clarify that I think XHTML2 would make an 
excellent selection (it could be a whole hell of a lot worse).  I think 
the disagreement is in the goal.  Jeff, you are viewing the gradient 
from start document to end presentation.  I was looking at the middle 
tier as a new common start point.  From your point of view, the 
semantics degrade in favor of layout.  For me, I saw it as enforcing a 
minimum semantic threshold.  I know...  I know...  I can hear the 
snickering already.  But hear me out.

If someone writes the following in Wiki:

!!!Semantically rich middle tiers

!!Coming from mostly presentational starting points

The above, when converted to XHTML2 (and DocBook), would have to nest 
items in a hierarchy somehow.  Moving from one semantically-poor 
document to another of lesser or equal richness (poverty?) is not 
fitting to XHTML2; rather, it's more fitting to XHTML 1.x and HTML's 
concepts of <h1> to <h6>.  In order to fit this into the ostensibly 
richer semantic models, you must convert to a hierarchy.  This is, in 
effect, adding fairly rigid semantics and structure where previously it 
was weak.

{{{<section>
  <h>Semantically rich middle tiers</h>
  <section>
    <h>Coming from mostly presentational starting points</h>
    <p>The above, when converted to XHTML2 (and DocBook), would have to 
nest items in a hierarchy somehow.  Moving from one semantically-poor 
document to another of lesser or equal richness (poverty?) is not 
fitting to XHTML2.  Rather, it's more fitting to XHTML 1.x and HTML's 
concepts of &lt;h1> to &lt;h6>.  In order to fit this into the 
ostensibly richer semantic models, you must convert to hierarchy.</p>
    <code><!-- Aaaaaaaah!  Recursion!!!!  ;-)  --></code>
  </section>
  <!-- snip content that hasn't been written yet -->
</section>}}}
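
To make the "convert to a hierarchy" step concrete, here is a minimal 
sketch of nesting flat wiki headings into <section>/<h>. It assumes that 
more '!' characters mean a higher-level heading (so '!!!' is outermost), 
as in the example above, and that every other non-blank line is a 
paragraph. The function name and sample text are made up.

```python
import xml.etree.ElementTree as ET

# Shortened version of the wiki text above.
WIKI = """!!!Semantically rich middle tiers
!!Coming from mostly presentational starting points
The above would have to nest items in a hierarchy somehow.
!!Pulling information from backend content revision stores/CMS
Perhaps you want to display metainformation."""


def wiki_to_sections(text):
    """Nest flat '!!!'/'!!'/'!' headings into XHTML2-style <section>/<h>."""
    root = ET.Element("body")
    # Stack of (depth, element); depth 0 is the document body.
    stack = [(0, root)]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        bangs = len(line) - len(line.lstrip("!"))
        if bangs:
            # Assumption: '!!!' is depth 1, '!!' depth 2, '!' depth 3.
            depth = max(1, 4 - bangs)
            # Close any open sections at the same or a deeper level.
            while stack[-1][0] >= depth:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "h").text = line[bangs:].strip()
            stack.append((depth, section))
        else:
            # Plain text goes into the innermost open section.
            ET.SubElement(stack[-1][1], "p").text = line
    return root


tree = wiki_to_sections(WIKI)
print(ET.tostring(tree, encoding="unicode"))
```

The stack is where the "fairly rigid semantics" get added: the converter 
has to decide which section each heading and paragraph belongs to, 
something the wiki source never states.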

!!Pulling information from backend content revision stores/CMS

Perhaps you want to display metainformation that does not exist in the 
source document?  How do you organize a revision history, document 
versioning, and version notes in XHTML2 cleanly?  As time moves on, 
WebDAV, CVS, Subversion, Xindice etc. backends will be the norm for 
documentation systems, not the exception -- and it will happen 
relatively soon (already?).  Document ownership can take a large role as 
well.  And while you can have <meta name="author">Jeff Turner</meta>, 
that doesn't get displayed -- which means you'll need to have it again 
in the content area (body).

Hmmm...  That brings up an interesting thought.  If you are correct in 
the gradient model, then it's correct for the meta information to go 
into the content area.  Of course, this creates other drawbacks.  If, 
indeed, the middle tier is a semi-presentational model, then changing 
that presentation -- which will happen often to some extent from 
organization to organization -- will require changes to every type of 
input document and every output format.  Do purely semantic solutions 
have this drawback?

Well, ain't this a fine "how do you do?"  I'm much more amicable toward 
XHTML2 as the middle tier, but having that middle tier be non-trivially 
presentational is IMHO a bad idea.

Whether you have

<article>
  <articleinfo>
    <author/>
  </articleinfo>
</article>

or

<html>
  <meta name="author">Jeff Turner</meta>
  <body>
    <span id="author">Jeff Turner</span>
  </body>
</html>

it matters little -- just a question of aesthetics and personal 
preference.  However, if

<html>
  <body>
    <span class="toprightcorner">Jeff Turner</span>
  </body>
</html>

ever shows up, make way for the pathway to pain.

Wait...did I have a point?  I can't remember anymore.  Damn these stream 
of consciousness emails...

- Miles

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Jeff Turner <je...@apache.org>.

On Fri, Jan 24, 2003 at 01:56:44PM -0800, Miles Elam wrote:
...
> >It's also pretty good as a 'source' format too.  Non-proprietary,
> >politically neutral, familiar to users..
> >
> How proprietary is too proprietary?

Any format used by only one tool is too proprietary.

> DocBook is backed by the non-profit group OASIS
> (http://www.oasis-open.org/).  Many of the W3C specs had strong
> influence from industry heavyweights.  (Wasn't CSS originally a
> Microsoft proposal?)  When you say "politically neutral," I think of
> things like XML Schema.

Ew :)

> Familiarity to users is indeed an issue, but most web designers don't
> use XHTML 1.0 yet let alone the backwards-incompatible XHTML2 which
> does away with <br>, <img>, <h1> - <h6>, requires the use of CSS for
> display styling, etc.  The W3C is more popular than OASIS, but then
> Microsoft is more popular than the W3C.  How big is big enough?

In my mind, bigger than Apache is 'big enough'.

> XHTML2 and Simplified DocBook are deceptively close in many respects.  
> For example, where in XHTML2 you would write
> 
>  <section>
>    <h>Section Title</h>
>    <p>section content</p>
>    <section>
>      <h>Subsection Title</h>
>      <p>subsection content</p>
>    </section>
>  </section>
> 
> in DocBook you would write
> 
>  <section>
>    <title>Section Title</title>
>    <para>section content</para>
>    <section>
>      <title>Subsection Title</title>
>      <para>subsection content</para>
>    </section>
>  </section>
> 
> The stylesheets to convert between the two in this case are trivial.  But 
> XHTML lacks many items in DocBook especially with regard to meta 
> information.  As an example
> 
>  <article>
>    <articleinfo>
>      <title>Why I like DocBook</title>
>      <subtitle>Although XHTML2 isn't bad either</subtitle>
>      <pubdate>2003-01-24T12:34:00-08:00</pubdate>
>      <authorgroup>
>        <author>
>          <firstname>John</firstname>
>          <surname>Doe</surname>
>          <honorific>PhD</honorific>
>          <affiliation>DocBook Examples, Inc.</affiliation>
>          <jobtitle>Example fodder</jobtitle>
>          <email>jdoe@imaginary.com</email>
>        </author>
>        <author>
>          <firstname>Miles</firstname>
>          <surname>Elam</surname>
>          <email>miles@avoidingspamharvesting.com</email>
>        </author>
>      </authorgroup>
>      <copyright>
>        <year>2002</year>
>        <holder>Miles Elam</holder>
>      </copyright>
>      <legalnotice>
>        The content presented here is the property of DocBook Examples, Inc.
>        Duplication without written consent is forbidden.
>      </legalnotice>
>      <revhistory>
>        <revision>
>          <revnumber>1.0</revnumber>
>          <date>2003-01-24</date>
>          <authorinitials>ME</authorinitials>
>          <revremark>Initial Revision</revremark>
>        </revision>
>        <revision>
>          <revnumber>1.1</revnumber>
>          <date>2003-01-24</date>
>          <authorinitials>ME</authorinitials>
>          <revremark>Fixed well-formedness errors and made spelling 
> corrections</revremark>
>        </revision>
>      </revhistory>
>      <abstract>
>        <para>A full example of the benefits (drawbacks?) of using 
> Simplified DocBook</para>
>      </abstract>
>      <keywordset>
>        <keyword>simplified docbook</keyword>
>        <keyword>docbook</keyword>
>        <keyword>middle tier</keyword>
>        <keyword>meta information</keyword>
>        <keyword>semantic content</keyword>
>      </keywordset>
>    </articleinfo>
>    <!-- *snip content* -->
>  </article>
> 
> Going down the list, title is obviously handled by (X)HTML and items 
> such as subtitle, pubdate, legalnotice can be handled roughly with a 
> series of meta tags (assuming of course that meta names don't conflict 
> with browser display behavior).  Legal notices are commonly held in the 
> final XSLT transformation for site-wide consistency.  Then again, with 
> things like an abstract and a revision history (either manually entered 
> or if the document is pulled from CVS or some CMS backend), XHTML falls 
> short.  You could specify a "class" attribute to the first section 
> specifying that it's an abstract, of course.  And this assumes that 
> people go through the effort of entering the extra metadata in the first 
> place.  Then again, not every tag in DocBook needs to be used.  DocBook 
> also has references published under the Free Documentation License like 
> this (http://www.docbook.org/tdg/simple/en/html/sdocbook.html) for its 
> various elements so you wouldn't be in the same boat found now.  
> (Granted that XHTML2 is likely to have far more articles, books, and 
> tutorials in the future.)
> 
> In the end, with first tiers like Wiki, you most likely won't have this 
> meta information, but since only a small subset of XHTML2 would be used 
> as well, it's a wash.  If DocBook is your start and XHTML is your lingua 
> franca, you lose information before you get to your presentation layer 
> (meta tags don't display on the page) or it loses its semantic meaning 
> (just another bunch of <p> tags in the body).  Once again, you have the 
> option of using ids and classes to simulate it, but do you want the CSS 
> stylesheets dependent upon definitions in the middle tier when there's 
> another transformation(s) coming?  There's a difference between starting 
> with a limited set of information and limiting your set of information.
> 
> In addition, XHTML is strictly tailored to web display (not necessarily 
> a bad thing), but it limits your choices for alternate display.  There 
> are HTML to FO and HTML to PDF converters, but as things move further 
> away from <font> and <i> tags, these tools that don't understand CSS 
> will make those output PDFs quite bland and sometimes unusable.  If you 
> are going to have to put some extra legwork for XHTML2 + CSS to PDF 
> anyway, it doesn't save much effort over Simplified DocBook.  And full 
> DocBook lends itself well to complete compilations (aggregation of 
> articles and notes into volumes and books) whereas XHTML does not; in 
> other words, there's a clear migration path for the future if needs and 
> functionality become more complex.
> 
> Also, DocBook has reference XSL stylesheets for output to both HTML and 
> XSL:FO and instructions for customization 
> (http://docbook.sourceforge.net/release/xsl/current/doc/).
> 
> >- XML + CSS
> >
> As before, once you dump some of your semantic meaning, this becomes 
> more difficult.  Also, if you are already at XHTML2, why would you want 
> to fall back to a non-layout oriented markup as the final display step?


I think this thread illustrates why OSS works despite lack of formal
design work.  All these intelligent contributions forcing one to think :)


There are two issues here:

1) Assuming we have an intermediate format, is XHTML2 (or Docbook?)
suitable.
2) Is XHTML2 an appropriate 'source' format.


2) Doesn't matter for now.  I imagine we'd support both, or *at least*
Docbook.

For 1), I can't see how Docbook could make a decent intermediate format.
It's not designed for that.  It's too 'semantic'.  For example, say we
invent a source syntax for describing directory hierarchies:

<dir id="somedir">
  <file id="README.txt" desc="README file"/>
  <file id="build.xml" desc="Ant build file"/>
  <dir id="src">
    <dir id="java" desc="Java Source code">
    </dir>
  </dir>
</dir>

How can we possibly transform this into Docbook?

Forrest's doc-v11 format suffers the same problem.  We resort to abusing
tags like <code> and <table> to indicate a certain presentation.

Whatever the intermediate format is, it must contain *less* semantics and
*more* presentation than source formats.  However it cannot contain more
'presentation' than the destination format (HTML, PDF), so it cannot be
something like XSLFO.  Our intermediate format must sit in the middle of
a gradient:


SEMANTIC                                        PRESENTATIONAL

authors
HR-XML                                
Docbook                       /---> HTML
doc-v11   >---->  Intermediate
myformat                      '---> XSL:FO ---> PDF
...


So, what XML format can encapsulate the presentational aspects of all our
'source' formats (resumes, project docs, user manuals, etc) yet isn't
*so* well defined that we can't transform it into HTML and XSL:FO?

I think XHTML2 is the best candidate.
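
To make that concrete, here is a rough XSLT 1.0 sketch of how the
made-up <dir>/<file> syntax above could be turned into purely
presentational, XHTML2-style nested lists.  The choice of <ul>, <li> and
<code> is an assumption for illustration, and the XHTML2 namespace is
left out for brevity; this is not an existing Forrest stylesheet.

```xml
<?xml version="1.0"?>
<!-- Hypothetical dir2xhtml2.xsl: a sketch only, not part of Forrest. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap the top-level <dir> in a list. -->
  <xsl:template match="/">
    <ul>
      <xsl:apply-templates/>
    </ul>
  </xsl:template>

  <!-- Each <dir> or <file> becomes one list item. -->
  <xsl:template match="dir | file">
    <li>
      <code><xsl:value-of select="@id"/></code>
      <!-- Show the description only when one was given. -->
      <xsl:if test="@desc">
        <xsl:text>: </xsl:text>
        <xsl:value-of select="@desc"/>
      </xsl:if>
      <!-- Recurse into child entries, if any, as a nested list. -->
      <xsl:if test="dir | file">
        <ul>
          <xsl:apply-templates select="dir | file"/>
        </ul>
      </xsl:if>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

The output says nothing about "directories" any more, only about lists
and code spans, which is the loss of semantics an intermediate format is
expected to accept.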

My understanding is that XHTML 1.1 and above are broken into modules, and
it is possible to cleanly extend XHTML by adding new modules (eg SVG).
So for example, if we wanted to include metadata, we'd throw some RDF
into the <head> tag and call it a module.  As an intermediate format,
XHTML2 would be just a base on which to build.


--Jeff


> Anyway, there's my petition for Simplified DocBook in the middle tier.
> 
> - Miles
> 
>

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Steven Noels <st...@outerthought.org>.

Miles Elam wrote:

> How proprietary is too proprietary?  DocBook is backed by the non-profit 
> group OASIS (http://www.oasis-open.org/).  Many of the W3C specs had 
> strong influence from industry heavyweights.  (Wasn't CSS originally a 
> Microsoft proposal?)  When you say "politically neutral," I think of 
> things like XML Schema.  Familiarity to users is indeed an issue, but 
> most web designers don't use XHTML 1.0 yet let alone the 
> backwards-incompatible XHTML2 which does away with <br>, <img>, <h1> - 
> <h6>, requires the use of CSS for display styling, etc.  The W3C is more 
> popular than OASIS, but then Microsoft is more popular than the W3C.  
> How big is big enough?

We should look at the merits of the formats instead of the endorsing 
institute. I would trust neither the W3C nor OASIS to be free of 
industry influence. Quite the opposite, as a matter of fact.

> XHTML2 and Simplified DocBook are deceptively close in many respects.  

<snip type="good analysis"/>

I'm stuck with the idea that the mid-tier format serves only one 
purpose: being _rendered_ across a number of different formats (RSS, 
(X)HTML, PDF). As such, I doubt whether we need semantically-rich markup 
such as (s)docbook, if that means we lose some typical 'web' constructs 
that _can_ be translated into paper layout (dare I say: forms).

That mid-tier format doesn't necessarily need to be meta-fied, since it 
serves only one purpose: being translated into the result format. It is 
the bland cake that gets decorated with a skin.

That being said, I have strong reservations towards XHTML2, it being a 
focal point for heavy politics, and still having several issues (see 
http://www.w3.org/TR/2002/WD-xhtml2-20021218/). But still, it feels 
better as a mid-tier format than (s)docbook.

Coolness would be if we come up with an RNG grammar defining our own 
mid-tier format, which would be augmented XHTML2, with good anchors for 
skinning, and some decent metastuff.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org

Re: XHTML 2 intermediate format (Re: Letting through raw HTML)

Posted by Miles Elam <mi...@geekspeak.org>.

Jeff Turner wrote:

>On Fri, Jan 24, 2003 at 09:38:13AM +0100, Nicola Ken Barozzi wrote:
>
>>The solution IMHO would be to switch to XHTML. It doesn't have sections? 
>>I had proposed to follow XHTML2 which has them, and has all HTML features.
>
>XHTML 2 sounds like the best bet for an intermediate format, because:
>
> - It's structurally closest to HTML, so the xhtml22html.xsl stylesheet
>   would be simple.
> - There's already Docbook -> XHTML stylesheets, so supporting Docbook as
>   a source format should be quite easy.
>
>It's also pretty good as a 'source' format too.  Non-proprietary,
>politically neutral, familiar to users..
>
How proprietary is too proprietary?  DocBook is backed by the non-profit 
group OASIS (http://www.oasis-open.org/).  Many of the W3C specs had 
strong influence from industry heavyweights.  (Wasn't CSS originally a 
Microsoft proposal?)  When you say "politically neutral," I think of 
things like XML Schema.  Familiarity to users is indeed an issue, but 
most web designers don't use XHTML 1.0 yet let alone the 
backwards-incompatible XHTML2 which does away with <br>, <img>, <h1> - 
<h6>, requires the use of CSS for display styling, etc.  The W3C is more 
popular than OASIS, but then Microsoft is more popular than the W3C.  
How big is big enough?

XHTML2 and Simplified DocBook are deceptively close in many respects.  
For example, where in XHTML2 you would write

  <section>
    <h>Section Title</h>
    <p>section content</p>
    <section>
      <h>Subsection Title</h>
      <p>subsection content</p>
    </section>
  </section>

in DocBook you would write

  <section>
    <title>Section Title</title>
    <para>section content</para>
    <section>
      <title>Subsection Title</title>
      <para>subsection content</para>
    </section>
  </section>
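
The two snippets differ only in element names.  As a minimal sketch
(XSLT 1.0, namespaces omitted, and covering only the three elements
shown above rather than DocBook in general), the DocBook-to-XHTML2
direction could look like this:

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: handles only section, title and para. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- DocBook <section> maps straight onto XHTML2 <section>. -->
  <xsl:template match="section">
    <section>
      <xsl:apply-templates/>
    </section>
  </xsl:template>

  <!-- A section's <title> becomes the XHTML2 <h> heading. -->
  <xsl:template match="section/title">
    <h><xsl:apply-templates/></h>
  </xsl:template>

  <!-- <para> becomes <p>. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```
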

A stylesheet to convert between the two is trivial in this case.  But 
XHTML lacks many items found in DocBook, especially with regard to meta 
information.  As an example:

  <article>
    <articleinfo>
      <title>Why I like DocBook</title>
      <subtitle>Although XHTML2 isn't bad either</subtitle>
      <pubdate>2003-01-24T12:34:00-08:00</pubdate>
      <authorgroup>
        <author>
          <firstname>John</firstname>
          <surname>Doe</surname>
          <honorific>PhD</honorific>
          <affiliation>
            <jobtitle>Example fodder</jobtitle>
            <orgname>DocBook Examples, Inc.</orgname>
          </affiliation>
          <email>jdoe@imaginary.com</email>
        </author>
        <author>
          <firstname>Miles</firstname>
          <surname>Elam</surname>
          <email>miles@avoidingspamharvesting.com</email>
        </author>
      </authorgroup>
      <copyright>
        <year>2002</year>
        <holder>Miles Elam</holder>
      </copyright>
      <legalnotice>
        <para>The content presented here is the property of DocBook
        Examples, Inc.  Duplication without written consent is
        forbidden.</para>
      </legalnotice>
      <revhistory>
        <revision>
          <revnumber>1.0</revnumber>
          <date>2003-01-24</date>
          <authorinitials>ME</authorinitials>
          <revremark>Initial Revision</revremark>
        </revision>
        <revision>
          <revnumber>1.1</revnumber>
          <date>2003-01-24</date>
          <authorinitials>ME</authorinitials>
          <revremark>Fixed well-formedness errors and made spelling corrections</revremark>
        </revision>
      </revhistory>
      <abstract>
        <para>A full example of the benefits (drawbacks?) of using Simplified DocBook</para>
      </abstract>
      <keywordset>
        <keyword>simplified docbook</keyword>
        <keyword>docbook</keyword>
        <keyword>middle tier</keyword>
        <keyword>meta information</keyword>
        <keyword>semantic content</keyword>
      </keywordset>
    </articleinfo>
    <!-- *snip content* -->
  </article>

Going down the list, title is obviously handled by (X)HTML and items 
such as subtitle, pubdate, legalnotice can be handled roughly with a 
series of meta tags (assuming of course that meta names don't conflict 
with browser display behavior).  Legal notices are commonly held in the 
final XSLT transformation for site-wide consistency.  Then again, with 
things like an abstract and a revision history (either manually entered 
or if the document is pulled from CVS or some CMS backend), XHTML falls 
short.  You could specify a "class" attribute to the first section 
specifying that it's an abstract, of course.  And this assumes that 
people go through the effort of entering the extra metadata in the first 
place.  Then again, not every tag in DocBook needs to be used.  DocBook 
also has reference documentation published under the Free Documentation 
License, such as this one 
(http://www.docbook.org/tdg/simple/en/html/sdocbook.html), covering its 
various elements, so you wouldn't be in the same boat you are in now.  
(Granted, XHTML2 is likely to have far more articles, books, and 
tutorials in the future.)
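
The meta-tag and class-attribute workarounds mentioned above might look
roughly like the fragment below.  The meta names and the "abstract"
class value are made up for this sketch, and the <meta> syntax follows
XHTML 1.x; the XHTML2 draft may well differ.

```xml
<html>
  <head>
    <title>Why I like DocBook</title>
    <!-- Hypothetical meta names; nothing standardizes these. -->
    <meta name="subtitle" content="Although XHTML2 isn't bad either"/>
    <meta name="pubdate" content="2003-01-24T12:34:00-08:00"/>
    <meta name="legalnotice"
          content="Duplication without written consent is forbidden."/>
  </head>
  <body>
    <!-- The abstract survives only as a class hint on a section. -->
    <section class="abstract">
      <h>Abstract</h>
      <p>A full example of the benefits (drawbacks?) of using
         Simplified DocBook</p>
    </section>
    <!-- There is no obvious home for authorgroup or revhistory. -->
  </body>
</html>
```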

In the end, with first tiers like Wiki, you most likely won't have this 
meta information, but since only a small subset of XHTML2 would be used 
as well, it's a wash.  If DocBook is your start and XHTML is your lingua 
franca, you lose information before you get to your presentation layer 
(meta tags don't display on the page) or it loses its semantic meaning 
(just another bunch of <p> tags in the body).  Once again, you have the 
option of using ids and classes to simulate it, but do you want the CSS 
stylesheets dependent upon definitions in the middle tier when there are 
one or more further transformations coming?  There's a difference 
between starting with a limited set of information and limiting your set 
of information.

In addition, XHTML is strictly tailored to web display (not necessarily 
a bad thing), but it limits your choices for alternate display.  There 
are HTML to FO and HTML to PDF converters, but as things move further 
away from <font> and <i> tags, these tools that don't understand CSS 
will make those output PDFs quite bland and sometimes unusable.  If you 
are going to have to put in some extra legwork for XHTML2 + CSS to PDF 
anyway, it doesn't save much effort over Simplified DocBook.  And full 
DocBook lends itself well to complete compilations (aggregation of 
articles and notes into volumes and books) whereas XHTML does not; in 
other words, there's a clear migration path for the future if needs and 
functionality become more complex.

Also, DocBook has reference XSL stylesheets for output to both HTML and 
XSL:FO and instructions for customization 
(http://docbook.sourceforge.net/release/xsl/current/doc/).

>- XML + CSS
>
As before, once you dump some of your semantic meaning, this becomes 
more difficult.  Also, if you are already at XHTML2, why would you want 
to fall back to a non-layout oriented markup as the final display step?

Anyway, there's my petition for Simplified DocBook in the middle tier.

- Miles