Posted to user@jspwiki.apache.org by Frank Jennings <Fr...@Sun.COM> on 2009/02/23 15:48:08 UTC

JSPWiki to DocBook

Dear all,

I was searching the list for information on producing structured content 
from the wiki pages. I couldn't find any.

I developed this small standalone tool to produce DocBook content from 
the JSPWiki pages:
http://code.google.com/p/wits-parser/

Read Me is here:
http://code.google.com/p/wits-parser/wiki/ReadMe

I don't know if it will be of any use to people in this list. I would 
like to know if you really have a strong business case for converting 
wiki to structured documents.

Regards
Frank J

Re: JSPWiki to DocBook

Posted by "lgilardoni61@gmail.com" <lg...@gmail.com>.

Hi Frank,
   happy to see your work. I searched for something like this myself 
without finding much (and have not yet had time to build anything 
myself), so I will certainly have a look.

As for the business case ... yes, although I don't know how strong it is. 
Anyway, the story is more or less like this:
- use a wiki to prepare a book (or technical doc or ...); ideally you 
can use it to lay down the main content, and use the hypertext structure 
to keep working notes (possibly also notes that should end up in the 
final material) plus anything you need for cooperative work
- at the end, be able to easily move the relevant material to a 
structured document (Word/LaTeX/DocBook ... that's probably the easy part)

I feel the hard part would be to discriminate between the 'real' structure 
that goes from wiki to doc and the 'support material'. I would expect that 
a manually crafted index page could point to the main content (each 
referenced page a chapter, or some other granularity), but how can you then 
specify which links should end up in the final doc and which shouldn't?
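
The index-page idea above can be sketched roughly as follows. This is only an
illustration, not part of any existing tool: the function name chapter_pages
and the sample index page are hypothetical, and the regular expression covers
only the common JSPWiki link forms [PageName] and [text|PageName]. It treats
every internal page linked from the index as a chapter, so it does not answer
the open question of how to mark links inside the chapters themselves.

```python
import re

# Matches [ ... ] but not the escaped form [[ ... ] and not plugins [{ ... }].
LINK = re.compile(r"(?<!\[)\[([^\[\]{}]+)\]")

def chapter_pages(index_markup):
    """Return the wiki pages linked from an index page, in order, without repeats."""
    pages = []
    for body in LINK.findall(index_markup):
        # [text|PageName] puts the target second; [PageName] is the target itself.
        target = body.split("|")[1] if "|" in body else body
        target = target.strip()
        # Skip external URLs and footnote markers such as [1] or [#1].
        if "://" in target or target.isdigit() or target.startswith("#"):
            continue
        if target not in pages:
            pages.append(target)
    return pages

# Hypothetical index page used only for this example.
index = """!Book outline
* [Introduction]
* [Getting started|Setup]
* [Project home|http://example.org/]
* [{TableOfContents}]
* [[not a link]
* [Setup]
"""

print(chapter_pages(index))
```

Each returned page name could then be fetched and converted as one chapter;
links found inside those pages would still need some separate convention.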

Frank Jennings wrote:
> Dear all,
>
> I was searching the list for information on producing structured 
> content from the wiki pages. I couldn't find any.
>
> I developed this small standalone tool to produce DocBook content from 
> the JSPWiki pages:
> http://code.google.com/p/wits-parser/
>
> Read Me is here:
> http://code.google.com/p/wits-parser/wiki/ReadMe
>
> I don't know if it will be of any use to people in this list. I would 
> like to know if you really have a strong business case for converting 
> wiki to structured documents.
>
> Regards
> Frank J

Re: JSPWiki to DocBook

Posted by Murray Altheim <mu...@altheim.com>.

Frank Jennings wrote:
> Dear all,
> 
> I was searching the list for information on producing structured content 
> from the wiki pages. I couldn't find any.
> 
> I developed this small standalone tool to produce DocBook content from 
> the JSPWiki pages:
> http://code.google.com/p/wits-parser/
> 
> Read Me is here:
> http://code.google.com/p/wits-parser/wiki/ReadMe
> 
> I don't know if it will be of any use to people in this list. I would 
> like to know if you really have a strong business case for converting 
> wiki to structured documents.

Hi Frank,

When I was still at Sun we did a lot of DocBook and HTML/XHTML stuff,
as Sun's documentation is largely in DocBook (well, a DocBook subset
called SolBook). So I know DocBook very well and have no criticisms
of its use.

When transforming DocBook to XHTML one loses much of the structure,
and the only reasonable way of maintaining some of it is to populate
the 'class' attribute values of <div>, <span>, <p> and other elements
to mimic the original DocBook element types. This is similar
to what people now call "microformats" (i.e., it was done many years
before that term was coined). You could of course transform all of
DocBook to simply <div> and <span> elements with the 'class' attributes
being the original DocBook element types and a CSS stylesheet to suit.
This would in effect be more appropriate than the tag abuse of forcing
DocBook's semantics into XHTML's. But HTML/XHTML has such a long
history of abuse that its semantics aren't very strong anyway, in
terms of normative practice.
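
As a rough illustration of the <div>/<span> approach described above, here is
a minimal Python sketch using only the standard library. The function name
to_generic_xhtml and the INLINE set are assumptions made for this example; a
real mapping would need a complete list of DocBook inline elements and would
also have to carry attributes across, which this sketch drops.

```python
import xml.etree.ElementTree as ET

# Assumption: a small sample of DocBook inline elements; everything else
# is treated as a block and becomes a <div>.
INLINE = {"emphasis", "literal", "command", "filename"}

def to_generic_xhtml(elem):
    """Copy elem as a <div> or <span> whose class is the DocBook element name."""
    out = ET.Element("span" if elem.tag in INLINE else "div", {"class": elem.tag})
    out.text = elem.text      # text before the first child
    out.tail = elem.tail      # text following this element in its parent
    for child in elem:
        out.append(to_generic_xhtml(child))
    return out

docbook = ET.fromstring(
    "<sect1><title>Install</title>"
    "<para>Run <command>ant</command> first.</para></sect1>"
)
html = ET.tostring(to_generic_xhtml(docbook), encoding="unicode")
print(html)
```

The original element names survive only as class values, so a CSS stylesheet
keyed on .sect1, .title, .para and so on would be needed for presentation.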

One of the issues with transforming XHTML to DocBook is that one has
almost no structure to work with. There's none of the containment and
almost none of the required sequences or optional structures one finds
in DocBook. It's going from chaos to structure, and implying structure
where none is extant is a bit of tag abuse as well. With the wiki the
markup is at least a bit more regularized since it is itself a
transformation from the wiki markup. We can imply *some* of the
structures.

What I *might* recommend is looking at transforming the XHTML output
of JSPWiki into a tighter XHTML-based document type. If you look at
what is available in ISO HTML the design is actually somewhat similar
to DocBook, i.e., there's a set of numbered divisions (<DIV1> through
<DIV6>) with numbered headings for each. This is about as much real
structure as one finds in HTML/XHTML anyway and there's no tag abuse.

   Information technology — Document description and processing
   languages — HyperText Markup Language (HTML). ISO/IEC 15445:2000(E)
   https://www.cs.tcd.ie/15445/15445.HTML

   User's Guide to ISO/IEC 15445:2000 HyperText Markup Language (HTML)
   https://www.cs.tcd.ie/15445/UG.HTML

The relevant part of the ISO HTML DTD is

   <!-- The following marked section is informative only -->
   <![ %Preparation; [
   <!ELEMENT Pre-HTML    - -  (HEAD, BODY) >
   <!ATTLIST Pre-HTML %i18n;  -- Internationalization DIR and LANG -->
   <!ELEMENT BODY        - O  ((%block;)*,(H1,DIV1)* ) +(DEL|INS) >
   <!ELEMENT H1          - -  (%text;)+ >
   <!ELEMENT DIV1        O O  ((%block;)*, (H2,DIV2)* ) >
   <!ELEMENT H2          - -  (%text;)+ >
   <!ELEMENT DIV2        O O  ((%block;)*, (H3,DIV3)* ) >
   <!ELEMENT H3          - -  (%text;)+ >
   <!ELEMENT DIV3        O O  ((%block;)*, (H4,DIV4)* ) >
   <!ELEMENT H4          - -  (%text;)+ >
   <!ELEMENT DIV4        O O  ((%block;)*, (H5,DIV5)* ) >
   <!ELEMENT H5          - -  (%text;)+ >
   <!ELEMENT DIV5        O O  ((%block;)*, (H6,DIV6)* ) >
   <!ELEMENT H6          - -  (%text;)+ >
   <!ELEMENT DIV6        O O  ((%block;)*) >
   ]]>

You can see how the divisions and headings mimic DocBook. The headings
could either precede the division or be the first child element. I
personally think ISO HTML should have put the heading inside of the
division since the heading is for that division. But no matter.

Now, I'm not actually suggesting use of ISO HTML since (a) it's SGML
rather than XML based, so it's incompatible with XHTML, and (b) it
uses uppercase element type names, and (c) I don't actually recommend
using <DIV1> through <DIV6> (possibly <div class="sect1"> through
<div class="sect6"> instead?). Point is, this can all be done within
the existing XHTML DTD.
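
To make the <div class="sectN"> idea concrete, here is a minimal Python sketch
that takes a flat run of headings and blocks and wraps it in nested divisions.
It is an illustration under stated assumptions, not JSPWiki code: the function
name nest_sections is hypothetical, the input is assumed to be well-formed and
free of XML namespaces, and the heading is placed inside its division rather
than before it as ISO HTML does.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"h([1-6])$")

def nest_sections(body):
    """Wrap each heading and the content after it in <div class="sectN">."""
    root = ET.Element("body")
    stack = [(0, root)]                      # (heading level, open container)
    for child in list(body):
        m = HEADING.match(child.tag)
        if m:
            level = int(m.group(1))
            # Close every open division at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div", {"class": f"sect{level}"})
            stack.append((level, div))
        # Headings and ordinary blocks both go into the innermost open division.
        stack[-1][1].append(child)
    return root

flat = ET.fromstring(
    "<body><h1>Intro</h1><p>a</p><h2>Detail</h2><p>b</p><h1>Next</h1></body>"
)
nested = ET.tostring(nest_sections(flat), encoding="unicode")
print(nested)
```

Content that appears before the first heading stays directly under <body>, and
a skipped level (an <h3> straight after an <h1>) simply yields a sect3 inside
the sect1; a stricter converter might reject or renumber such input.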

If you actually wanted a more restrictive XHTML DTD for an output
structure mimicking ISO HTML's hierarchy, I'm willing to contribute
some time writing an XHTML module to do this (I might even have one
somewhere from when I did that work back in the late 90s). That is, if
you decided you wanted to do this and got to the point of needing it.

To answer your question more directly, we've been looking into an
archive format for content coming off the wiki and have considered
DocBook, but are more likely to go with validated XHTML since it
more closely fits with the semantics of the wiki's output markup.

Murray

...........................................................................
Murray Altheim <murray09 at altheim dot com>                       ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

       Boundless wind and moon - the eye within eyes,
       Inexhaustible heaven and earth - the light beyond light,
       The willow dark, the flower bright - ten thousand houses,
       Knock at any door - there's one who will respond.
                                       -- The Blue Cliff Record