Posted to dev@forrest.apache.org by Peter Hargreaves <su...@pdh-online.info> on 2003/12/29 17:03:32 UTC

Documentv20 --> DocBook

Hi, Folks,

The following is to strengthen the case for DocBook (or some other media
independent DTD) as the central document type.

1) The central document type should be independent of the final media
type. Yes/No?

2) The central document type must be styleable for any target media type
such as XHTML, PDF, DOC, WML, etc.

3) The central document type should not be biased toward any one of the
possible target media types.

4) The skin should be the very first place that presentational, layout
and target media languages (such as XHTML, FO, WML, PDF, DOC, etc.) are
used.

5) Some legacy source content DTDs are derived from target media
languages but they need not influence the choice of central DTD.

6) Other source content DTDs are virtually independent of the target
media type (e.g. DocBook)

7) A source content type such as DocBook is very rich in document
structure and semantic markup. The style sheets (or skins) for DocBook
feed on that richness.

8) A central document type, standing between content and skin, must not
filter out richness of meaning of document structure, or it will
undermine the presentational possibilities.

9) A central document type must therefore be very rich in its
description of document structure and meaning - without bias toward
media type.

10) The central document type might be user-choosable, if Forrest can
cope with the complexity of this?

11) Yes, you've guessed, I propose DocBook for the central document
type, starting with a very simplified version.

12) Why re-invent the wheel?

For those not familiar, here is an extract from Norman Walsh's DocBook:
The Definitive Guide:
"DocBook provides a system for writing structured documents using SGML
or XML. It is particularly well-suited to books and papers about
computer hardware and software, though it is by no means limited to
them. DocBook is a document type definition (DTD). Because it is a large
and robust DTD, and because its main structures correspond to the
general notion of what constitutes a book, DocBook has been adopted by a
large and growing community of authors. DocBook is supported “out of the
box” by a number of commercial tools, and support for it is rapidly
growing in a number of free software environments. In short, DocBook is
an easy-to-understand and widely used DTD. Dozens of organizations use
DocBook for millions of pages of documentation, in various print and
online formats, worldwide."

Anyone else on this wavelength?

Peter.
--

Re: Documentv20 --> DocBook

Posted by Ross Gardler <rg...@wkwyw.net>.

Robert Koberg wrote:
> Hi,
> 
> Ross Gardler wrote:
> 
>>
>> As a result it can all be captured in the class attribute. How it is 
>> presented is then up to the rendering engine, which is exactly what 
>> should happen. I have to admit, I was surprised to find that I ended up 
>> agreeing with this view, but agree I did once I tried to justify my 
>> case with what I thought were rock solid use cases!
> 
> 
> 
> I see a few problems if using an (unconstrained) class attribute. First, 
> how is the ~schema~ communicated to the users? Are they a preset list or 
> can they be anything? How do you ensure validity?

OK, I'll do my best to communicate what I have interpreted from the 
archives; I am sure someone else will chime in if I get this wrong.

XHTML is for the intermediate representation between the user's schema 
and the display:

         XSLT           XSLT + CSS
Source -------> XHTML ------------> Display

The idea is that we can have X number of source schemas and only one skin.
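
As a rough illustration of that idea (a sketch only, not Forrest code), the following Python uses two made-up source schemas and hypothetical converter functions. Each converter maps its own schema to a shared XHTML-like form, and the single skin only ever sees that form. The element names, class values and the `skin` function are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source converters: each maps its own schema to the
# shared XHTML-like intermediate form.
def faq_to_xhtml(src):
    body = ET.Element("body")
    for item in src.findall("faq"):
        ET.SubElement(body, "p", {"class": "question"}).text = item.findtext("question")
        ET.SubElement(body, "p", {"class": "answer"}).text = item.findtext("answer")
    return body

def howto_to_xhtml(src):
    body = ET.Element("body")
    for step in src.findall("step"):
        ET.SubElement(body, "p", {"class": "step"}).text = step.text
    return body

# One converter per source schema, keyed by the source root element name.
CONVERTERS = {"faqs": faq_to_xhtml, "howto": howto_to_xhtml}

def skin(body):
    # The single skin only ever sees the intermediate format.
    return "\n".join(f"[{p.get('class')}] {p.text}" for p in body.iter("p"))

def render(source_xml):
    src = ET.fromstring(source_xml)
    return skin(CONVERTERS[src.tag](src))

faq = "<faqs><faq><question>Why XHTML?</question><answer>One skin.</answer></faq></faqs>"
howto = "<howto><step>Install</step><step>Run</step></howto>"
print(render(faq))
print(render(howto))
```

Adding another source schema here means adding one converter; the skin does not change.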

> 
> If they are preset, why not just make them elements and define them in 
> the schema. As for transforming, this:

This is in the source schema, not the intermediate schema.

> If the value of the class attribute can be anything then how would you 
> keep it consistent (or actually, valid)?

At the intermediate stage the only reason we need "semantic" info such 
as (in my case) "slide", is because we display it differently. Therefore 
it should be a class.

> Speaking from a gui dev point of view how would you present the choices 
> available to a user? How would you ensure that the user is inserting a 
> 'valid' element (e.g. there is a policy that the 'byline' comes after 
> the title. Or an 'answer' comes after a 'question' for an faq type 
> content piece.)?

The semantic constraints are in the source schema. The display 
constraints are in the skin. To use your latest example of an FAQ answer 
after a question: we currently have the FAQ DTD, which is converted to 
XDoc and then to PDF, HTML or whatever.

The only thing that will change is that the FAQ DTD will be converted 
to XHTML instead of XDoc before converting into whatever other format.

>>> 9) A central document type must therefore be very rich in its
>>> description of document structure and meaning - without bias toward
>>> media type.
> 
> 
> I would suggest that you use XHTML but take out all structural (html, 
> head, body, etc) and div/span. Instead of div/span, use what would have 
> been the classname. Everything would be defined in the schema so new 
> authors can know what they have to work with. There shouldn't be 
> structural elements because the content piece might just be one piece in 
> a rendering.

But this means that we have X number of intermediate formats. The whole 
point is to use a single intermediate format so that any number of 
source formats can be rendered successfully with any Forrest skin.

>> Quite the reverse. It should be as simple as possible, semantic 
>> meaning has no place at the presentational layer, it is only 
>> presentation that is important.
>>
>> Can you give us a use case in which we need semantic meaning at the 
>> intermediate stage in order to do anything *other* than affect how the 
>> data is presented?
> 
> 
> validation to ensure contracts are upheld from/for a variety of 
> different points.

We don't need to validate the intermediate format since it is generated 
from a valid source format.

Ross

Re: Documentv20 --> DocBook

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Robert Koberg wrote:

> Hi,
> 
> Ross Gardler wrote:
> 
>> As a result it can all be captured in the class attribute. How it is 
>> presented is then up to the rendering engine, which is exactly what 
>> should happen. I have to admit, I was surprised to find that I ended up 
>> agreeing with this view, but agree I did once I tried to justify my 
>> case with what I thought were rock solid use cases!
> 
> I see a few problems if using an (unconstrained) class attribute. First, 
> how is the ~schema~ communicated to the users? 

Implicitly.

> Are they a preset list or can they be anything? 

Anything.

> How do you ensure validity?

This is the point, they are *all* valid. It's unconstrained metadata.

The important thing is that class attributes are *extra* metadata, that 
can be ignored.

For example, Forrest will use:

   <p class="note">A note.</p>

and will render it as a note.

But another renderer can easily show it as a paragraph, and it's a 
good-enough approximation. It's still valid, and the semantic loss is 
not much.

Or it can render it as

note:
  A note.

by using the class attribute as a moniker.

The base semantics are in the normal tags, and class attributes are 
*extra* semantics. XHTML is made so that they are not necessary to convey 
the information, but enhance it.
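
To make the two behaviours concrete, here is a minimal Python sketch (not Forrest code; both renderer functions are hypothetical). One renderer ignores the class attribute entirely, the other uses it as a label when present.

```python
import xml.etree.ElementTree as ET

def render_plain(elem):
    # A renderer that knows nothing about classes: a note is just a paragraph.
    return elem.text

def render_with_moniker(elem):
    # A renderer that uses the class value as a label, if one is present.
    cls = elem.get("class")
    return f"{cls}:\n  {elem.text}" if cls else elem.text

p = ET.fromstring('<p class="note">A note.</p>')
print(render_plain(p))
print(render_with_moniker(p))
```

Both renderers accept the same input, and neither fails when the class is missing or unknown to it.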

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: Documentv20 --> DocBook

Posted by Robert Koberg <ro...@koberg.com>.

Hi,

Ross Gardler wrote:

> 
> As a result it can all be captured in the class attribute. How it is 
> presented is then up to the rendering engine, which is exactly what 
> should happen. I have to admit, I was surprised to find that I ended up 
> agreeing with this view, but agree I did once I tried to justify my case 
> with what I thought were rock solid use cases!

I see a few problems if using an (unconstrained) class attribute. First, 
how is the ~schema~ communicated to the users? Are they a preset list or 
can they be anything? How do you ensure validity?

If they are preset, why not just make them elements and define them in 
the schema? As for transforming, this:
<xsl:template match="note"/>
<xsl:template match="footnote"/>

seems cleaner than:
<xsl:template match="div">
   <xsl:choose>
     <xsl:when test="@class='note'"/>
     <xsl:when test="@class='footnote'"/>
   </xsl:choose>
</xsl:template>

If the value of the class attribute can be anything then how would you 
keep it consistent (or actually, valid)?

Speaking from a gui dev point of view how would you present the choices 
available to a user? How would you ensure that the user is inserting a 
'valid' element (e.g. there is a policy that the 'byline' comes after 
the title. Or an 'answer' comes after a 'question' for an faq type 
content piece.)?
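
To illustrate the kind of check being asked about, here is a small Python sketch. It assumes a preset list of class names and one ordering policy (an 'answer' must directly follow a 'question'). The list `KNOWN_CLASSES` and the function `check_classes` are hypothetical and not part of any schema discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical preset list; the thread does not define one.
KNOWN_CLASSES = {"note", "footnote", "question", "answer"}

def check_classes(root):
    """Return a list of problems found in class usage, in document order."""
    problems = []
    previous = None
    for elem in root.iter():
        cls = elem.get("class")
        if cls is None:
            continue
        if cls not in KNOWN_CLASSES:
            problems.append(f"unknown class: {cls}")
        if cls == "answer" and previous != "question":
            problems.append("answer without a preceding question")
        previous = cls
    return problems

good = ET.fromstring('<div><p class="question">Q</p><p class="answer">A</p></div>')
bad = ET.fromstring('<div><p class="answer">A</p><p class="sldie">x</p></div>')
print(check_classes(good))
print(check_classes(bad))
```

With elements defined in a schema, a validating parser would do this work; with free-form class values, something like the above has to be written and maintained separately.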

> 
>> 9) A central document type must therefore be very rich in its
>> description of document structure and meaning - without bias toward
>> media type.

I would suggest that you use XHTML but take out all structural (html, 
head, body, etc) and div/span. Instead of div/span, use what would have 
been the classname. Everything would be defined in the schema so new 
authors can know what they have to work with. There shouldn't be 
structural elements because the content piece might just be one piece in 
a rendering.

> 
> 
> Quite the reverse. It should be as simple as possible, semantic meaning 
> has no place at the presentational layer, it is only presentation that is 
> important.
> 
> Can you give us a use case in which we need semantic meaning at the 
> intermediate stage in order to do anything *other* than affect how the 
> data is presented?

validation to ensure contracts are upheld from/for a variety of 
different points.

best,
-Rob

Re: Documentv20 --> DocBook

Posted by Ross Gardler <rg...@apache.org>.

Ferdinand Soethe wrote:
> Different angle (not connected to the above)
> 
> In terms of editing I think we should encourage people to use a
> semantic markup language (such as docbook) as a source format for all
> the reasons that Peter Hargreaves pointed out since all longer living
> documents (and their authors) will, in the long run, benefit from
> such an approach.

I don't think Forrest should encourage its users to use any particular 
source format. We can't begin to understand everyone's use cases. 
However, if you mean that, in general, we encourage users to find and 
use an appropriate source format for their use case, then I would agree.

Of course, we need to support all those formats as input plugins. This 
will be much easier when we have an internal XHTML format.

Ross

Re: Documentv20 --> DocBook

Posted by Ferdinand Soethe <sa...@soethe.net>.

Thanks for explaining that. The transformation part is a strong point
for xhtml and so is the standard conformance.

Different angle (not connected to the above)

In terms of editing I think we should encourage people to use a
semantic markup language (such as docbook) as a source format for all
the reasons that Peter Hargreaves pointed out since all longer living
documents (and their authors) will, in the long run, benefit from
such an approach.

Mind you, even discussions on this mailing list would benefit from
semantic markup when I think of applying a search engine to the archives.

--
Ferdinand Soethe

Re: Documentv20 --> DocBook

Posted by Ross Gardler <rg...@apache.org>.

Ferdinand Soethe wrote:
> Ross Gardler wrote:
> 
> the move to a
> RG> subset of XHTML2 is only to enable us to leverage emerging XHTML2 
> RG> editors
> 
> If that is so, would it not make more sense to only support XHTML2 as
> an input format and stick with documentv-xx for our internal format?
> 
> Especially now that xhtml-2 support could be easily offered as a
> plugin?

Document v2.0 is not the same as the XHTML2 subset we want to support. 
Plus, we don't know how XHTML2 may change in the future; by adopting 
XHTML2 as an internal format we remove the need to have XDoc track any 
changes in XHTML2.

Since XDoc brings no additional value, why keep the overhead?

Perhaps even more importantly, I "forgot" another key reason for using 
XHTML as the internal format:

Many source formats already provide a set of XSL sheets for transforming 
into XHTML. If we use that as our internal format we simplify our 
transformation pipeline. Consider the docbook scenario in this original 
thread:

Currently, to support the full Docbook format we would have to do:

docbook -> XHTML -> XDoc -> Output format

However, if we adopt XHTML as our internal format this becomes:

docbook -> XHTML -> Output format

The same extra step would apply if we chose to support XHTML2 as an input 
format while keeping XDoc as the internal format:

XHTML2 -> XDoc -> Output format

instead of:

XHTML2 -> Output format

So the transformation is more efficient and we need only maintain one 
set of stylesheets and no schema definitions. Again, if XDoc is bringing 
no benefit over XHTML, why increase our maintenance requirements in 
order to continue its support?
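
The stage counts can be tallied with a few lines of Python. This is only a restatement of the pipelines listed in this message, reading the XHTML2 case as going through XDoc while XDoc is the internal format and straight to output once XHTML is. The dictionary names are made up for the example.

```python
# Pipelines as listed in this message; each "->" is one transformation stage.
WITH_XDOC = {
    "docbook": ["docbook", "XHTML", "XDoc", "Output format"],
    "XHTML2": ["XHTML2", "XDoc", "Output format"],
}
WITH_XHTML = {
    "docbook": ["docbook", "XHTML", "Output format"],
    "XHTML2": ["XHTML2", "Output format"],
}

def stages(pipeline):
    # Number of transformations is one less than the number of formats.
    return len(pipeline) - 1

for source in WITH_XDOC:
    print(source, stages(WITH_XDOC[source]), "->", stages(WITH_XHTML[source]))
```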

Finally, users like things to be standards compliant - we can remove a 
proprietary schema and replace it with an open standard, which has got to 
be good.

Ross

Re: Documentv20 --> DocBook

Posted by Ferdinand Soethe <sa...@soethe.net>.

Ross Gardler wrote:

the move to a
RG> subset of XHTML2 is only to enable us to leverage emerging XHTML2 
RG> editors

If that is so, would it not make more sense to only support XHTML2 as
an input format and stick with documentv-xx for our internal format?

Especially now that xhtml-2 support could be easily offered as a
plugin?

--
Ferdinand Soethe

Re: Documentv20 --> DocBook

Posted by Ross Gardler <rg...@apache.org>.

Ferdinand Soethe wrote:
> Nicola Ken Barozzi wrote:
> 
> NKB> Ross Gardler wrote:
> NKB> ...
> 
>>>My point is, *no* (usable) intermediate format will be so expressive
>>>that it can accommodate all users.
>>>
>>>On the XHTML side of things, the following text from the XHTML working
>>>draft convinces me that XHTML should be the intermediate format:
>>>
>>>"The XHTML family is designed with general user agent interoperability
>>>in mind. Through a new user agent and document profiling mechanism,
>>>servers, proxies, and user agents will be able to perform best effort
>>>content transformation. Ultimately, it will be possible to develop 
>>>XHTML-conforming content that is usable by any XHTML-conforming user
>>>agent."
>>>
>>>If I am going to lose some semantic information I want to be sure that
>>>the language I am using is so generic that I don't lose any 
>>>presentational information regardless of the media type. That is what
>>>XHTML is designed for.
>>>
>>>Am I making any sense?
> 
> 
> NKB> Well said, we should put this part up on the site to explain why we use
> NKB> xhtml, it's exactly the point :-)
> 
> 
> Has this happened yet?

Do you mean has it been documented? Sadly, no. It really would help if 
it were documented; that particular discussion has happened many times 
on this list. It is only because of your willingness to read archives 
that you haven't needed to ask the same questions - if only everyone were 
so diligent ;-)

If we do document this it needs to be done in a way that justifies our 
document schema since the move to XHTML is not underway yet.

The argument is just as valid for our current format; the move to a 
subset of XHTML2 is only to enable us to leverage emerging XHTML2 
editors. When Forrest started, XHTML2 was unspecified and XHTML1 was too 
close to HTML in that it had lots of tags that were being misused to 
denote style. Since XHTML2 is modular, we can opt not to use those 
elements, and it appears to be mature enough to actually use in a 
production system.

Ross

Re: Documentv20 --> DocBook

Posted by Ferdinand Soethe <ma...@soethe.net>.

Nicola Ken Barozzi wrote:

NKB> Ross Gardler wrote:
NKB> ...
>> My point is, *no* (usable) intermediate format will be so expressive
>> that it can accommodate all users.
>> 
>> On the XHTML side of things, the following text from the XHTML working
>> draft convinces me that XHTML should be the intermediate format:
>> 
>> "The XHTML family is designed with general user agent interoperability
>> in mind. Through a new user agent and document profiling mechanism,
>> servers, proxies, and user agents will be able to perform best effort
>> content transformation. Ultimately, it will be possible to develop 
>> XHTML-conforming content that is usable by any XHTML-conforming user
>> agent."
>> 
>> If I am going to lose some semantic information I want to be sure that
>> the language I am using is so generic that I don't lose any 
>> presentational information regardless of the media type. That is what
>> XHTML is designed for.
>> 
>> Am I making any sense?

NKB> Well said, we should put this part up on the site to explain why we use
NKB> xhtml, it's exactly the point :-)

Has this happened yet? I totally agree that this and some other
pieces from Ross's mail should become part of the documentation as he
makes a very convincing case for the use of XHTML while at the same
time explaining why the central document format does not have to have
the semantic expressiveness of docbook.

Reading this mail I - for the first time - realized that the decision
in no way means that "Forrest is not about semantic markup" but that it
is about "not needing semantic markup in the central document format"
while still supporting it in source documents.

I guess the point would become even more obvious if the central
document format was just that and not also an option for writers to
create their documents in. (Not saying that it shouldn't!).

Ferdinand

Re: Documentv20 --> DocBook

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Ross Gardler wrote:
...
> My point is, *no* (usable) intermediate format will be so expressive 
> that it can accommodate all users.
> 
> On the XHTML side of things, the following text from the XHTML working 
> draft convinces me that XHTML should be the intermediate format:
> 
> "The XHTML family is designed with general user agent interoperability 
> in mind. Through a new user agent and document profiling mechanism, 
> servers, proxies, and user agents will be able to perform best effort 
> content transformation. Ultimately, it will be possible to develop 
> XHTML-conforming content that is usable by any XHTML-conforming user 
> agent."
> 
> If I am going to lose some semantic information I want to be sure that 
> the language I am using is so generic that I don't lose any 
> presentational information regardless of the media type. That is what 
> XHTML is designed for.
> 
> Am I making any sense?

Well said, we should put this part up on the site to explain why we use 
xhtml, it's exactly the point :-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: Documentv20 --> DocBook

Posted by Ross Gardler <rg...@wkwyw.net>.

Peter Hargreaves wrote:
>>All we are proposing is moving from our existing proprietary docbook DTD 
>>to a subset of the much more widely available, used and accepted XHTML.
>>
>>Currently XML > XDOC > PDF works just fine, what will change if we go to 
>>XHTML?
> 
> 
> Now I think the penny has dropped!
> 
> I think you are saying that the Forrest XHTML intermediate stage
> represents a media independent presentation layer. So, in the case of
> something written in DocBook - it is transformed into XHTML - as was
> understood. Although something written in DocBook (or other source)
> should be free of presentational information, the transformation to
> XHTML necessarily makes decisions about presentation. In other words,
> the transformation aims to add presentational information, but in a
> media independent way. Have I got it?

Yes, or at least that is exactly how I understand it to be.

> I'll assume I'm now understanding the XHTML strategy better. I think
> you've made it clear that Forrest XHTML is not the same as XHTML. But why
> not? Is it because XHTML is not what it sets out to be? Is it because of
> the clumsy implications of css? To what degree is it possible to have
> media independent presentational instructions? Perhaps just for a subset
> of devices? I have reservations, I don't know.

It is still XHTML because XHTML is designed to be modular. If you don't 
need some part of XHTML then don't use that module. XHTML does have the 
limitation of having to be backward compatible with HTML, so some of the 
modules are not as media independent as Forrest needs.

> So, I can now see that Forrest must pursue its FXHTML ambition (F as in
> Forrest I hastily add ;-). And I must discard my DocBook style sheets
> and instead write DocBook > FXHTML transformations.

I believe this is the best route, yes. This is because other people will 
be writing docbook -> XHTML and so you can assist with that effort but 
focus mainly on your individual skin to control the final presentation. 
It should make your life much simpler. Note, there is already a docbook 
--> xdoc stylesheet, not complete yet but a good start.
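
For a feel of what a docbook -> XHTML mapping does, here is a deliberately tiny Python sketch. It is not the existing stylesheet and handles only four DocBook-like elements; the function name and the choice of `div`, `h1`..`h6` and `class="note"` as targets are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def section_to_xhtml(section, level=1):
    """Map a DocBook-like <section> to a <div class="section"> (illustrative only)."""
    div = ET.Element("div", {"class": "section"})
    for child in section:
        if child.tag == "title":
            # Nesting depth picks the heading level, capped at h6.
            ET.SubElement(div, f"h{min(level, 6)}").text = child.text
        elif child.tag == "para":
            ET.SubElement(div, "p").text = child.text
        elif child.tag == "note":
            # The richer source element survives only as a class value.
            ET.SubElement(div, "p", {"class": "note"}).text = child.findtext("para")
        elif child.tag == "section":
            div.append(section_to_xhtml(child, level + 1))
    return div

src = ET.fromstring(
    "<section><title>Intro</title><para>Hello.</para>"
    "<note><para>Careful.</para></note></section>"
)
print(ET.tostring(section_to_xhtml(src), encoding="unicode"))
```

The skin then only needs to know about `div`, headings, `p` and a handful of class values, whatever the source format was.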

Ross

Re: Documentv20 --> DocBook

Posted by Peter Hargreaves <su...@pdh-online.info>.

On Tue, 2003-12-30 at 04:13, Ross Gardler wrote:
> Peter Hargreaves wrote:
> > On Mon, 2003-12-29 at 22:37, Ross Gardler wrote:
> > 
> >>Peter Hargreaves wrote:
> >>
> 
> <snip what="XHTML isn't about presentation"/>
> 
> > Maybe the subset of XHTML adopted for Forrest could be a media
> > independent DTD like DocBook is.
> > 
> > What do you mean by presentational information? For me
> > <header>...</header> or <footer>...</footer> would be presentational
> > information because its meaning is about layout, and it assumes a media
> > type like paper that has headers and footers. However,
> > <title>...</title> is not presentational because its meaning is not
> > about layout. The skin or style sheet for the media called paper can
> > choose to put a title in the footer of the page or in the header.
> 
> Agreed.
> 
> Take a look at the comparison of XDoc and XHTML that Nicola Ken did 
> quite some time ago and I reposted at the start of this thread 
> (http://nagoya.apache.org/eyebrowse/ReadMsg?listName=forrest-dev@xml.apache.org&msgNo=9131). 
> That is the subset we are proposing, it does not (or should not) include 
> any presentational information.
> 
> >>>9) A central document type must therefore be very rich in its
> >>>description of document structure and meaning - without bias toward
> >>>media type.
> >>
> >>Quite the reverse. It should be as simple as possible, semantic meaning 
> >>has no place at the presentational layer, it is only presentation that is 
> >>important.
> > 
> > If it's simple then it will undermine the meaning of the original markup
> > - so the final presentation will be limited.
> 
> Can you give an example of why the presentation will suffer?

If my source document has <article>, <chapter>, <section>, etc. I want
my skin to style them as those things. But wait, I think a penny has
dropped. See below...
 
> >>Can you give us a use case in which we need semantic meaning at the 
> >>intermediate stage in order to do anything *other* than effect how the 
> >>data is presented.
> > 
> > 
> > Of course ALL markup meanings at the intermediate stage can be used to
> > affect the final presentation. But their meanings should not be about
> > presentation or layout, So, role="MSc" is OK but align="right" is not.
> > The skin stylist should decide left or right and in any case the target
> > media might not have a left or a right. e.g. Audio.
> 
> Agreed, and XHTML does not allow align="right" whilst it does allow 
> class="MSc".
> 
> >>
> >>"its main structures correspond to the general notion of what 
> >>constitutes a book"
> > 
> > 
> > Ah, excellent point! Note that "book" has two meanings:
> > 
> > 1) The media type called book. I.E. a blank hard back with a binding and
> > blank white pages.
> > 
> > 2) The content type called book. I.E. a novel or work of reference that
> > may or may not be delivered via the media type called book.
> 
> Yes, but my point is that not everything can be represented as a book. A 
> book is, usually, linear. This section comes after that section. Not all 
> materials are, my training materials for example. Read my comment that 
> followed the above.

It's the media called book that is linear. Content called book is linear
if it is a novel, but not if it is reference material. But, yes I agree
that not everything can be represented as a book - that is why I cruelly
suggested alternative intermediate DTDs. Penny still dropping...

> >>- I use forrest for training materials. Docbook *could* be used as a 
> >>source format but it is not ideal in my domain. I use other formats as 
> >>my source that have been designed for the purpose. By converting to 
> >>docbook as an intermediate format I will be losing semantic information, 
> >>which is something you say I shouldn't do, but something I am now 
> >>comfortable with (see above).
> > 
> > 
> > When an author writes a document or book using a DTD that is truly
> > media independent - he will have no control of style, layout or
> > presentation and should not think of such things. His document might be
> > delivered through any of many different style sheets for each of many
> > different media types. So, for the author, the cost of delivering
> > content to many media types is that he can no longer use style as part
> > of his message.
> 
> What you say is true, but what you say is "When an author writes a 
> document or book using a DTD that is truly media independent".
> 
> We are not talking about writing the document or book in XHTML, we are 
> talking about writing in the most suitable DTD and converting to XHTML 
> as an intermediate format for Forrest. Forrest then gives us the style 
> information without having to worry about it.

When I said write a document or book, I did mean anything. But I take
your point - the penny is dropping.

> I will not write my training materials in XHTML or Docbook. I will write 
> them in a specially defined DTD, at that point I am not, as you say, 
> concerned with presentation. I'm not looking for an intermediate format 
> that gives me semantic meaning because...
> 
> >>My point is, *no* (usable) intermediate format will be so expressive 
> >>that it can accommodate all users.
> > 
> > 
> > Quite. Equally true for whatever intermediate DTD you choose.
> 
> Intermediate is what I said, read again ;-)

I think you are assuming a single intermediate format. I'm mischievously
still considering alternative ones ;-)

> >>On the XHTML side of things, the following text from the XHTML working 
> >>draft convinces me that XHTML should be the intermediate format:
> >>
> >>"The XHTML family is designed with general user agent interoperability 
> >>in mind. Through a new user agent and document profiling mechanism, 
> >>servers, proxies, and user agents will be able to perform best effort 
> >>content transformation. Ultimately, it will be possible to develop 
> >>XHTML-conforming content that is usable by any XHTML-conforming user agent."
> > 
> > 
> > Yes, an excellent and interesting point. But, hasn't XML sort of
> > displaced this approach to some degree? Maybe it will catch up again?
> 
> XHTML *is* an XML defined language. XML is nothing more than a way to 
> *define* markup languages. It itself is not a markup language. Therefore 
> XML has *enabled* this approach rather than displaced it - or am I 
> missing your point?
> 
> > XML is designed for XML > FO > PDF for instance. But if you try to do
> > XML > XHTML > FO > PDF you screw it all up.
> 
> OK, this is very important. Why will it get screwed up?

Because my existing DocBook content will not render using my existing
DocBook style sheets - that are very rich and choose to use the meaning
of docbook markup to determine presentation.

> All we are proposing is moving from our existing proprietary docbook DTD 
> to a subset of the much more widely available, used and accepted XHTML.
> 
> Currently XML > XDOC > PDF works just fine, what will change if we go to 
> XHTML?

Now I think the penny has dropped!

I think you are saying that the Forrest XHTML intermediate stage
represents a media independent presentation layer. So, in the case of
something written in DocBook - it is transformed into XHTML - as was
understood. Although something written in DocBook (or other source)
should be free of presentational information, the transformation to
XHTML necessarily makes decisions about presentation. In other words,
the transformation aims to add presentational information, but in a
media independent way. Have I got it?

I'll assume I'm now understanding the XHTML strategy better. I think
you've made it clear that Forrest XHTML is not the same as XHTML. But why
not? Is it because XHTML is not what it sets out to be? Is it because of
the clumsy implications of css? To what degree is it possible to have
media independent presentational instructions? Perhaps just for a subset
of devices? I have reservations, I don't know.

So, I can now see that Forrest must pursue its FXHTML ambition (F as in
Forrest I hastily add ;-). And I must discard my DocBook style sheets
and instead write DocBook > FXHTML transformations.

OK I understand - I'll think about it. Thanks for your patience.

Peter.
> 
> Ross
--

Re: Documentv20 --> DocBook

Posted by Ross Gardler <rg...@wkwyw.net>.

Peter Hargreaves wrote:
> On Mon, 2003-12-29 at 22:37, Ross Gardler wrote:
> 
>>Peter Hargreaves wrote:
>>

<snip what="XHTML isn't about presentation"/>

> Maybe the subset of XHTML adopted for Forrest could be a media
> independent DTD like DocBook is.
> 
> What do you mean by presentational information? For me
> <header>...</header> or <footer>...</footer> would be presentational
> information because its meaning is about layout, and it assumes a media
> type like paper that has headers and footers. However,
> <title>...</title> is not presentational because its meaning is not
> about layout. The skin or style sheet for the media called paper can
> choose to put a title in the footer of the page or in the header.

Agreed.

Take a look at the comparison of XDoc and XHTML that Nicola Ken did 
quite some time ago and I reposted at the start of this thread 
(http://nagoya.apache.org/eyebrowse/ReadMsg?listName=forrest-dev@xml.apache.org&msgNo=9131). 
That is the subset we are proposing, it does not (or should not) include 
any presentational information.

>>>9) A central document type must therefore be very rich in its
>>>description of document structure and meaning - without bias toward
>>>media type.
>>
>>Quite the reverse. It should be as simple as possible, semantic meaning 
>>has no place at the presentational layer, it is only presentation that is 
>>important.
> 
> If it's simple then it will undermine the meaning of the original markup
> - so the final presentation will be limited.

Can you give an example of why the presentation will suffer?

> 
>>Can you give us a use case in which we need semantic meaning at the 
>>intermediate stage in order to do anything *other* than affect how the 
>>data is presented.
> 
> 
> Of course ALL markup meanings at the intermediate stage can be used to
> affect the final presentation. But their meanings should not be about
> presentation or layout, So, role="MSc" is OK but align="right" is not.
> The skin stylist should decide left or right and in any case the target
> media might not have a left or a right. e.g. Audio.

Agreed, and XHTML does not allow align="right" whilst it does allow 
class="MSc".
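
A minimal sketch of that distinction, assuming a small hand-picked set of layout attributes: it drops attributes such as align from an element and leaves class alone. The PRESENTATIONAL_ATTRS set and the strip_presentational function are hypothetical names for this example only; neither comes from Forrest or from an XHTML DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of attributes treated as presentational in this sketch.
PRESENTATIONAL_ATTRS = {"align", "bgcolor", "width", "height", "border"}


def strip_presentational(xml_text: str) -> str:
    """Return the markup with layout attributes removed and class kept."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Copy the attribute names first so we can delete while looping.
        for name in list(elem.attrib):
            if name in PRESENTATIONAL_ATTRS:
                del elem.attrib[name]
    return ET.tostring(root, encoding="unicode")


sample = '<p class="MSc" align="right">Thesis abstract</p>'
cleaned = strip_presentational(sample)
print(cleaned)
```

Whether the text ends up on the left, on the right, or is read aloud is then left to whatever skin consumes the class value.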

>>
>>"its main structures correspond to the general notion of what 
>>constitutes a book"
> 
> 
> Ah, excellent point! Note that "book" has two meanings:
> 
> 1) The media type called book, i.e. a blank hardback with a binding and
> blank white pages.
> 
> 2) The content type called book, i.e. a novel or work of reference that
> may or may not be delivered via the media type called book.

Yes, but my point is that not everything can be represented as a book. A 
book is, usually, linear: this section comes after that section. Not all 
materials are; my training materials, for example, are not. Read my 
comment that followed the above.

>>- I use forrest for training materials. Docbook *could* be used as a 
>>source format but it is not ideal in my domain. I use other formats as 
>>my source that have been designed for the purpose. By converting to 
>>docbook as an intermediate format I will be losing semantic information, 
>>which is something you say I shouldn't do, but something I am now 
>>comfortable with (see above).
> 
> 
> When an author writes a document or book using a DTD that is truly
> media independent - he will have no control of style, layout or
> presentation and should not think of such things. His document might be
> delivered through any of many different style sheets for each of many
> different media types. So, for the author, the cost of delivering
> content to many media types is that he can no longer use style as part
> of his message.

What you say is true, but your premise is "When an author writes a 
document or book using a DTD that is truly media independent".

We are not talking about writing the document or book in XHTML, we are 
talking about writing in the most suitable DTD and converting to XHTML 
as an intermediate format for Forrest. Forrest then gives us the style 
information without having to worry about it.

I will not write my training materials in XHTML or DocBook. I will write 
them in a specially defined DTD; at that point I am not, as you say, 
concerned with presentation. I'm not looking for an intermediate format 
that gives me semantic meaning because...

>>My point is, *no* (usable) intermediate format will be so expressive 
>>that it can accommodate all users.
> 
> 
> Quite. Equally true for whatever intermediate DTD you choose.

Intermediate is what I said, read again ;-)

>>On the XHTML side of things, the following text from the XHTML working 
>>draft convinces me that XHTML should be the intermediate format:
>>
>>"The XHTML family is designed with general user agent interoperability 
>>in mind. Through a new user agent and document profiling mechanism, 
>>servers, proxies, and user agents will be able to perform best effort 
>>content transformation. Ultimately, it will be possible to develop 
>>XHTML-conforming content that is usable by any XHTML-conforming user agent."
> 
> 
> Yes, an excellent and interesting point. But, hasn't XML sort of
> displaced this approach to some degree? Maybe it will catch up again?

XHTML *is* an XML defined language. XML is nothing more than a way to 
*define* markup languages. It itself is not a markup language. Therefore 
XML has *enabled* this approach rather than displaced it - or am I 
missing your point?

> XML is designed for XML > FO > PDF for instance. But if you try to do
> XML > XHTML > FO > PDF you screw it all up.

OK, this is very important. Why will it get screwed up?

All we are proposing is moving from our existing proprietary XDoc DTD 
to a subset of the much more widely available, used and accepted XHTML.

Currently XML > XDOC > PDF works just fine, what will change if we go to 
XHTML?

Ross

Re: Documentv20 --> DocBook

Posted by Peter Hargreaves <su...@pdh-online.info>.

On Mon, 2003-12-29 at 22:37, Ross Gardler wrote:
> Peter Hargreaves wrote:
> > The following is to strengthen the case for DocBook (or some other media
> > independent DTD) as the central document type.
> > 
> 
> <snip what="points 1-4 saying we need a generic media independent format"/>
> 
> > 4) The skin should be the very first time that presentational, layout
> > and target media languages (such as XHTML, fo, WML, PDF, DOC,etc) are
> > used.
> 
> True.
> 
> XHTML doesn't contain presentational information. It does guide 
> presentation, but it doesn't dictate it. It is the final rendering into 
> the target media language that has the actual presentation information 
> (such as XHTML+CSS, note the '+CSS', PDF, DOC etc.)

Maybe the subset of XHTML adopted for Forrest could be a media
independent DTD like DocBook is.

What do you mean by presentational information? For me
<header>...</header> or <footer>...</footer> would be presentational
information because its meaning is about layout, and it assumes a media
type like paper that has headers and footers. However,
<title>...</title> is not presentational because its meaning is not
about layout. The skin or style sheet for the media called paper can
choose to put a title in the footer of the page or in the header.
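
To make the title example concrete, here is a small sketch, assuming two imaginary skins that receive the same media-independent document. The function names paper_skin and screen_skin, the <document>/<body> element names and the dictionary layout are all assumptions for illustration; they are not Forrest skins.

```python
import xml.etree.ElementTree as ET


def paper_skin(doc: ET.Element) -> dict:
    """Hypothetical paper skin: puts the title in the running footer."""
    return {
        "header": "",
        "body": doc.findtext("body"),
        "footer": doc.findtext("title"),
    }


def screen_skin(doc: ET.Element) -> dict:
    """Hypothetical screen skin: puts the same title in the header."""
    return {
        "header": doc.findtext("title"),
        "body": doc.findtext("body"),
        "footer": "",
    }


# The source says nothing about layout; it only marks up what the title is.
doc = ET.fromstring(
    "<document><title>Forrest Primer</title><body>Welcome.</body></document>"
)
paper = paper_skin(doc)
screen = screen_skin(doc)
print(paper["footer"], "|", screen["header"])
```

The markup is identical in both cases; only the skin decides where the title lands.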

> Just because it is called X*HTML* doesn't mean it has to be rendered in 
> a browser.

No, but if it is inspired by the real XHTML spec then it will be
particularly suitable for a subset of final client types - and might not
be so suitable for others.

> 
> > 5) Some legacy source content DTDs are derived from target media
> > languages but they need not influence the choice of central DTD.
> 
> True, but the XHTML Vs Docbook discussion has been had many times in the 
> past, my original post was prompted because I have been reading the 
> archives a lot lately and whenever XHTML Vs Docbook comes up XHTML seems 
> to win out. For good reasons, let's see if it still does...
> 
> <snip what="points 6-7 saying DocBook is media independent and 
> semantically rich"/>
> 
> > 8) A central document type, standing between content and skin, must not
> > filter out richness of meaning of document structure, or it will
> > undermine the presentational possibilities.
> 
> I have recently been convinced that at the Forrest intermediate level all 
> the semantic information I *thought* I needed is actually presentational 
> information (see 
> http://nagoya.apache.org/eyebrowse/BrowseList?listName=forrest-dev@xml.apache.org&by=thread&from=572241)
> 
> As a result it can all be captured in the class attribute. How it is 
> presented is then up to the rendering engine, which is exactly what 
> should happen. I have to admit, I was surprised to find that I ended up 
> agreeing with this view, but agree I did once I tried to justify my case 
> with what I thought were rock solid use cases!
> 
> > 9) A central document type must therefore be very rich in its
> > description of document structure and meaning - without bias toward
> > media type.
> 
> Quite the reverse. It should be as simple as possible. Semantic meaning 
> has no place at the presentational layer; it is only presentation that is 
> important.

If it's simple then it will undermine the meaning of the original markup
- so the final presentation will be limited.
> 
> Can you give us a use case in which we need semantic meaning at the 
> intermediate stage in order to do anything *other* than affect how the 
> data is presented?

Of course ALL markup meanings at the intermediate stage can be used to
affect the final presentation. But their meanings should not be about
presentation or layout. So, role="MSc" is OK but align="right" is not.
The skin stylist should decide left or right and in any case the target
media might not have a left or a right. e.g. Audio.

> 
> (He He, this is exactly what Jeff asked me to do and I thought I could 
> do it, now I'm a convert - ex-smokers are always the worst :-)).
> 
> > 10) The central document type might be user chooseable, if Forrest can
> > cope with the complexity of this?
> 
> Interesting... what would Forrest gain from this? We have user 
> selectable source formats, isn't that what is important from the user 
> perspective?
> 
> Having said that, you can easily change your local xmaps in order to use 
> a different intermediate format for certain documents if you find the need.
> 
> > For those not familiar, here is an extract from Norman Walsh's DocBook:
> The Definitive Guide:
> > "DocBook provides a system for writing structured documents using SGML
> > or XML. It is particularly well-suited to books and papers about
> > computer hardware and software, though it is by no means limited to
> > them. DocBook is a document type definition (DTD). Because it is a large
> > and robust DTD, and because its main structures correspond to the
> > general notion of what constitutes a book, DocBook has been adopted by a
> > large and growing community of authors. DocBook is supported “out of the
> > box” by a number of commercial tools, and support for it is rapidly
> > growing in a number of free software environments. In short, DocBook is
> > an easy-to-understand and widely used DTD. Dozens of organizations use
> > DocBook for millions of pages of documentation, in various print and
> > online formats, worldwide."
> 
> For me this indicates it can be used as a source format, but not as an 
> intermediate format. The key phrases for me are:
> 
> "particularly well-suited to books and papers about computer hardware 
> and software"

"though it is by no means limited to them"

> and
> 
> "its main structures correspond to the general notion of what 
> constitutes a book"

Ah, excellent point! Note that "book" has two meanings:

1) The media type called book, i.e. a blank hardback with a binding and
blank white pages.

2) The content type called book, i.e. a novel or work of reference that
may or may not be delivered via the media type called book.

> - I use forrest for training materials. Docbook *could* be used as a 
> source format but it is not ideal in my domain. I use other formats as 
> my source that have been designed for the purpose. By converting to 
> docbook as an intermediate format I will be losing semantic information, 
> which is something you say I shouldn't do, but something I am now 
> comfortable with (see above).

When an author writes a document or book using a DTD that is truly
media independent - he will have no control of style, layout or
presentation and should not think of such things. His document might be
delivered through any of many different style sheets for each of many
different media types. So, for the author, the cost of delivering
content to many media types is that he can no longer use style as part
of his message.

> My point is, *no* (usable) intermediate format will be so expressive 
> that it can accommodate all users.

Quite. Equally true for whatever intermediate DTD you choose.

> On the XHTML side of things, the following text from the XHTML working 
> draft convinces me that XHTML should be the intermediate format:
> 
> "The XHTML family is designed with general user agent interoperability 
> in mind. Through a new user agent and document profiling mechanism, 
> servers, proxies, and user agents will be able to perform best effort 
> content transformation. Ultimately, it will be possible to develop 
> XHTML-conforming content that is usable by any XHTML-conforming user agent."

Yes, an excellent and interesting point. But, hasn't XML sort of
displaced this approach to some degree? Maybe it will catch up again?

XML is designed for XML > FO > PDF for instance. But if you try to do
XML > XHTML > FO > PDF you screw it all up.

> If I am going to lose some semantic information I want to be sure that 
> the language I am using is so generic that I don't lose any 
> presentational information regardless of the media type. That is what 
> XHTML is designed for.

I think I've answered this one.

> Am I making any sense?
> 
> Ross

Yes lots of sense - good discussion - interesting points.

Peter.
--

Re: Documentv20 --> DocBook

Posted by Ross Gardler <rg...@wkwyw.net>.

Peter Hargreaves wrote:
> The following is to strengthen the case for DocBook (or some other media
> independent DTD) as the central document type.
> 

<snip what="points 1-4 saying we need a generic media independent format"/>

> 4) The skin should be the very first time that presentational, layout
> and target media languages (such as XHTML, fo, WML, PDF, DOC,etc) are
> used.

True.

XHTML doesn't contain presentational information. It does guide 
presentation, but it doesn't dictate it. It is the final rendering into 
the target media language that has the actual presentation information 
(such as XHTML+CSS, note the '+CSS', PDF, DOC etc.)

Just because it is called X*HTML* doesn't mean it has to be rendered in 
a browser.

> 5) Some legacy source content DTDs are derived from target media
> languages but they need not influence the choice of central DTD.

True, but the XHTML Vs Docbook discussion has been had many times in the 
past, my original post was prompted because I have been reading the 
archives a lot lately and whenever XHTML Vs Docbook comes up XHTML seems 
to win out. For good reasons, let's see if it still does...

<snip what="points 6-7 saying DocBook is media independent and 
semantically rich"/>

> 8) A central document type, standing between content and skin, must not
> filter out richness of meaning of document structure, or it will
> undermine the presentational possibilities.

I have recently been convinced that at the Forrest intermediate level all 
the semantic information I *thought* I needed is actually presentational 
information (see 
http://nagoya.apache.org/eyebrowse/BrowseList?listName=forrest-dev@xml.apache.org&by=thread&from=572241)

As a result it can all be captured in the class attribute. How it is 
presented is then up to the rendering engine, which is exactly what 
should happen. I have to admit, I was surprised to find that I ended up 
agreeing with this view, but agree I did once I tried to justify my case 
with what I thought were rock solid use cases!
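
As a rough sketch of what "captured in the class attribute" could look like, assume a made-up training-material vocabulary (<exercise>, <objective>, <hint>) being flattened into XHTML-style elements. The ELEMENT_MAP table and the to_intermediate function are hypothetical and only illustrate the idea; they are not how Forrest performs the conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source elements to generic XHTML elements.
ELEMENT_MAP = {"exercise": "div", "objective": "p", "hint": "p"}


def to_intermediate(source: ET.Element) -> ET.Element:
    """Convert a source element to an XHTML-style element.

    The original element name is kept as the class value, so a skin
    can still tell an objective from a hint.
    """
    target = ET.Element(ELEMENT_MAP.get(source.tag, "div"), {"class": source.tag})
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(to_intermediate(child))
    return target


source_doc = ET.fromstring(
    "<exercise><objective>Build a site</objective><hint>Start small</hint></exercise>"
)
xhtml = ET.tostring(to_intermediate(source_doc), encoding="unicode")
print(xhtml)
```

The element names are gone, but each one survives as a class value that the rendering engine is free to style, ignore, or speak.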

> 9) A central document type must therefore be very rich in its
> description of document structure and meaning - without bias toward
> media type.

Quite the reverse. It should be as simple as possible. Semantic meaning 
has no place at the presentational layer; it is only presentation that is 
important.

Can you give us a use case in which we need semantic meaning at the 
intermediate stage in order to do anything *other* than affect how the 
data is presented?

(He He, this is exactly what Jeff asked me to do and I thought I could 
do it, now I'm a convert - ex-smokers are always the worst :-)).

> 10) The central document type might be user chooseable, if Forrest can
> cope with the complexity of this?

Interesting... what would Forrest gain from this? We have user 
selectable source formats, isn't that what is important from the user 
perspective?

Having said that, you can easily change your local xmaps in order to use 
a different intermediate format for certain documents if you find the need.

> For those not familiar, here is an extract from Norman Walsh's DocBook:
> The Definitive Guide:
> "DocBook provides a system for writing structured documents using SGML
> or XML. It is particularly well-suited to books and papers about
> computer hardware and software, though it is by no means limited to
> them. DocBook is a document type definition (DTD). Because it is a large
> and robust DTD, and because its main structures correspond to the
> general notion of what constitutes a book, DocBook has been adopted by a
> large and growing community of authors. DocBook is supported “out of the
> box” by a number of commercial tools, and support for it is rapidly
> growing in a number of free software environments. In short, DocBook is
> an easy-to-understand and widely used DTD. Dozens of organizations use
> DocBook for millions of pages of documentation, in various print and
> online formats, worldwide."

For me this indicates it can be used as a source format, but not as an 
intermediate format. The key phrases for me are:

"particularly well-suited to books and papers about computer hardware 
and software"

and

"its main structures correspond to the general notion of what 
constitutes a book"

- I use forrest for training materials. Docbook *could* be used as a 
source format but it is not ideal in my domain. I use other formats as 
my source that have been designed for the purpose. By converting to 
docbook as an intermediate format I will be losing semantic information, 
which is something you say I shouldn't do, but something I am now 
comfortable with (see above).

My point is, *no* (usable) intermediate format will be so expressive 
that it can accommodate all users.

On the XHTML side of things, the following text from the XHTML working 
draft convinces me that XHTML should be the intermediate format:

"The XHTML family is designed with general user agent interoperability 
in mind. Through a new user agent and document profiling mechanism, 
servers, proxies, and user agents will be able to perform best effort 
content transformation. Ultimately, it will be possible to develop 
XHTML-conforming content that is usable by any XHTML-conforming user agent."

If I am going to lose some semantic information I want to be sure that 
the language I am using is so generic that I don't lose any 
presentational information regardless of the media type. That is what 
XHTML is designed for.

Am I making any sense?

Ross