Posted to dev@forrest.apache.org by Jeff Turner <je...@apache.org> on 2002/12/27 14:43:32 UTC

[RT] Entities in XML docs

Hi,

Stylebook has a nice feature whereby a project can create a file,
entities.ent, containing XML entity definitions for use in project XML
files.  Here is a sample from Xalan's entities.ent:

>>>>>>>>>>
<?xml encoding="US-ASCII"?>

<!ENTITY xslt "Xalan">
<!ENTITY xslt4j "Xalan-Java">
<!ENTITY xslt4j2 "Xalan-Java 2">
<!ENTITY xslt4j-dist "xalan-j_2_4_D1">
<!ENTITY xslt4j-dist-bin "&xslt4j-dist;-bin">
<!ENTITY xslt4j-dist-src "&xslt4j-dist;-src">
<!ENTITY xslt4j-current "&xslt4j; version 2.4.D1">
<!ENTITY xslt4j-distdir "http://xml.apache.org/dist/xalan-j/">
<!ENTITY xml4j "Xerces-Java">
<!ENTITY xml4j1 "Xerces-Java 1">
<!ENTITY xml4j2 "Xerces-Java 2">
<!ENTITY xml4j-used "&xml4j; 2.0.1">
<!ENTITY xml4j-jar "xercesImpl.jar">
<!ENTITY xslt4c "Xalan-C++">
<!ENTITY xml4c "Xerces-C++">
<!ENTITY download "The &xslt4j-current; download from xml.apache.org includes &xml4j-jar; from &xml4j-used; and xml-apis.jar. For version
information about the contents of xml-apis.jar, see the JAR manifest.">

<!ENTITY xsltcwhatsnewhead '<li><link anchor="xsltc">XSLTC</link></li>'>

<<<<<<<<<<<<

This entities.ent file is automatically included in the book.dtd, through this
PEref:

<!ENTITY % externalEntity SYSTEM "sbk:/sources/entities.ent">
%externalEntity;


Reusing snippets of content like this seems a pretty nice feature.  In Forrest,
we have a couple of options to get the same effect:


1) Emulate the Stylebook solution in document-v11.dtd:

<!ENTITY % externalEntity SYSTEM "context://entities.ent">
%externalEntity;

Currently, this just results in an 'unknown protocol: context' error.
Which is odd, because I thought the XML parser would have an
EntityResolver set that understands Cocoon protocols.  Or is this just
wishful thinking?

The problem with this general approach is that XML docs can no longer be
validated outside Cocoon, eg from a catalog-aware editor.  IMHO that
makes this approach unacceptable.


2) Tell users to do it themselves.  Each XML file would have something like:

<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
"document-v11.dtd" [
<!ENTITY % local-ents SYSTEM "entities.ent">
%local-ents;
]>

<document>
  ...
</document>

Simple, effective, and doesn't lock users into using only Forrest.  Only problem
is, it assumes rather more XML knowledge than I'd expect most doc editors would
have.  I think this should be our default solution, unless something better
comes up..
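
To see what the internal-subset approach gives a doc author, here is a minimal Python sketch (not Forrest code). It assumes the declarations are written directly into the internal subset instead of being pulled in from entities.ent via %local-ents;, because the standard-library parser used here does not fetch external parameter entities. The entity names come from the Xalan sample above; the document text is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: entities are declared inline in the internal subset.
DOC = """<!DOCTYPE document [
<!ENTITY xslt4j "Xalan-Java">
<!ENTITY xslt4j-current "&xslt4j; version 2.4.D1">
<!ENTITY xml4j "Xerces-Java">
<!ENTITY xml4j-used "&xml4j; 2.0.1">
]>
<document>The &xslt4j-current; download includes &xml4j-used;.</document>"""


def expanded_text(xml_source):
    """Parse the document and return the root element's text.

    The parser expands the internal entities (including the nested
    references) before the application ever sees the text.
    """
    return ET.fromstring(xml_source).text


if __name__ == "__main__":
    print(expanded_text(DOC))
```

The point of the sketch is that expansion happens inside the parser, so any XML tool sees the final text without knowing about Forrest.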


3) Avoid XML entities altogether.

3.1) Use XInclude.  Eg, given an entities.xml file:

<entities>
  <entity id="xml4j">Xerces-Java</entity>
  <entity id="xml4j1">Xerces-Java 1</entity>
  <entity id="xml4j2">Xerces-Java 2</entity>

  <entity id="xslt4j-current">
    <xi:include href="#xslt4j"/> version 2.4.D1
  </entity>
  <entity id="download">
    <p>
      The <xi:include href="#xslt4j-current"/> download includes ...
    </p>
  </entity>
</entities>

to include an entity, we'd use:

<xi:include href="../entities.xml#download"/>

With a SimpleMappingMetaModule we can simplify that to 

<xi:include href="res:download"/>

This method has the limitation that values cannot be included halfway inside an
attribute.  Eg, we couldn't have

<s1 title="The <xi:include href="#xml4j"/> project">
  ...
</s1>

Another disadvantage is that it imposes XInclude (and namespaces) on docs.  We
currently have a DTD based architecture that can't really handle namespaces.

It is also a PITA having to modify the DTD to support xi:include.  Do we define
it as an inline or block-level element?  We really need both.  Then when users
want to use Docbook, they must first hack the DTD to allow xi:include.


3.2) We implement a SearchReplaceTransformer, which replaces ${variables} with
values.  Eg, entities.xml:

<entities>
  <xml4j>Xerces-Java</xml4j>
  <xml4j1>Xerces-Java 1</xml4j1>
  <xml4j2>Xerces-Java 2</xml4j2>

  <xslt4j-current>
    ${xslt4j} version 2.4.D1
  </xslt4j-current>

  <download>
    <p>
      The ${xslt4j-current} download includes ...
    </p>
  </download>
</entities>

This seems a lot more intuitive than XInclude, and doesn't require modifying
DTDs.  We could go all the way and use one of the expression languages in
Jakarta Commons, like jexl[1].
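
A rough sketch of the ${variable} idea, written in Python rather than as a Cocoon transformer. Everything here is an assumption made for illustration: the function names, the choice to leave unknown variables untouched, and the reduced entities.xml (text-only values, with an xslt4j entry added so the nested reference resolves). It has no guard against definitions that refer to each other in a loop.

```python
import re
import xml.etree.ElementTree as ET

# Reduced, text-only version of the entities.xml idea (hypothetical content).
ENTITIES_XML = """<entities>
  <xml4j>Xerces-Java</xml4j>
  <xslt4j>Xalan-Java</xslt4j>
  <xslt4j-current>${xslt4j} version 2.4.D1</xslt4j-current>
</entities>"""

# Matches ${name}; the name may not contain '$', '{' or '}'.
VARIABLE = re.compile(r"\$\{([^${}]+)\}")


def load_entities(xml_source):
    """Build a name -> value table from the children of <entities>."""
    root = ET.fromstring(xml_source)
    return {child.tag: (child.text or "").strip() for child in root}


def substitute(text, table):
    """Replace each ${name} with its value, expanding nested references."""
    def lookup(match):
        name = match.group(1)
        if name not in table:
            return match.group(0)  # unknown variable: leave it as written
        # Values may themselves contain ${...}; no loop protection here.
        return substitute(table[name], table)
    return VARIABLE.sub(lookup, text)


if __name__ == "__main__":
    table = load_entities(ENTITIES_XML)
    print(substitute("The ${xslt4j-current} download includes ${xml4j}.", table))
```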


Are there any more options I haven't thought of?


My current preference is to go with 3.2, and implement it with InputModules, the
same way LinkRewriterTransformer works.  Using XInclude would involve less
coding, but the DTD problems would be too horrible..

Thoughts?


--Jeff


[1] http://jakarta.apache.org/commons/sandbox/jexl/

Re: [RT] Entities in XML docs

Posted by Steven Noels <st...@outerthought.org>.

Joerg Pietschmann wrote:

> As for XSD support, I thought more of poor Steven wrestling with
> this expensive XMetal thingie, which is a bit agnostic when it comes
> to RNG. But then, if *Steven* asks Corel for RNG support and pulls
> the ASF weight properly, they might even listen... 

I strongly advise anyone to just go and talk to their tools vendor.

I managed to (co-)influence XMLSpy into doing a (half-decent) job of
supporting OASIS catalogs already, and have been bugging Stylus about it
too. Pulling the 'being an Apache committer' trick did help.

I think the old XMetal gang is just happy that they still are allowed to 
continue development by the new Corel bosses :-<

<snip/>

> Well, the above might be more appropriate on a weblog than on forrest-dev.

I sure would become a reader then ;)

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

Re: [RT] Entities in XML docs

Posted by Joerg Pietschmann <j3...@yahoo.de>.

On Monday 30 December 2002 22:04, you wrote:
> XMLSchema suc**
Compared to DTDs, XSD is a big leap, and even in a vaguely
right direction. That they ended up in the middle of a tar pit is,
well, lack of good karma or something.

As for XSD support, I thought more of poor Steven wrestling with
this expensive XMetal thingie, which is a bit agnostic when it comes
to RNG. But then, if *Steven* asks Corel for RNG support and pulls
the ASF weight properly, they might even listen... 

> But vendors and managers don't care if a technology is good or not, they
> go for what they understand, and 'recommended by W3C' today means 'good
> for you', so they'll use it.

Amen. Straying OT again, it seems most CTOs still think of the W3C as
the independent consortium led by individuals with great visions and
without commercial interests. Unfortunately the W3C is now dominated
by the big software vendors, and they have other priorities:
- leverage the branding power of the standard body
- get the standards as compatible as possible with their own current
  and planned products
- create some buzz causing customers to move to these products.
If a standard turns out to be bloated and overcomplicated, so what?
This is even an opportunity to sell more tools, lock out would-be
competitors who can't afford 500+ on staff just for testing, and to "add
value" (read: proprietary extensions).

> > Amazing how much your thoughts diverged from Jeff's.
> Well, I wish somebody commented on them though.
The point appears to be moot now.

> Yes, I know that. Still I can't believe it's not possible to do a better
> job on such important pieces of the XML model, especially when there is
> a wonderful community of experts on xml-dev that have very clear and
> precise visions.
The XML-DEV isn't infallible either, and I doubt the list as a whole would
have done a better job. Good standards appear to be driven by a *small*
group with a common vision and a really good understanding of the
whole subject. On XML-DEV there are enough "names" who don't
see the whole picture, and probably don't *want* to see it. They are, of
course, very good at doing post mortem analysis.

It would be interesting to have an analysis why for example the SCSI
standard series grew organically without major problems and even
survived hitting hard physical limits, while SQL or some XML-related
standards turned into a big ball of mud.
A guess: SCSI had SoC built in right from the start and applied it
rigorously each time new abstractions were needed. But this is not the
whole magic, SQL1 wasn't that bad at SoC either.
Ironically: SQL couldn't get clear of a seemingly small snag called
"datetype", which had 99.9% hidden in the mud (a commonly accepted
algebra for manipulating and comparing dates and timespans expressed
in terms of calendar dates). This is what XQuery can bring us now,
regardless of what bad you can say of it otherwise...
Well, the above might be more appropriate on a weblog than on forrest-dev.

J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Stefano Mazzocchi <st...@apache.org>.

Joerg Pietschmann wrote:
> On Monday 30 December 2002 04:07, you wrote:
> 
>>JClark's RNG validator works as a SAX filter
> 
> Good news!

Yep.

>>Still, the issue is: do we *really* want to maintain the structure of
>>our documents in both DTDs and RNGs at the same time?
> 
> The issue is: Can you get away with dumping DTDs and forcing
> RNG on everyone?

I'll tell you one thing: for sure I won't get fired for that :)

 > If there is enough documentation which tells
> users where to get command line RNG validators and how to plug
> them into various authoring environments, then why not? Well,
> the industry went for XSD (prematurely, I'm about to think), so it
> may be wise to cater for XSD for a while too...

XMLSchema sucks ass. Period. I have yet to hear *one* person who
thinks the opposite.

Damn, even my internal spies in W3C tell me there are rumors that 
somebody is talking about using RNG as XMLSchema 2.0. This should tell 
us something.

Imagine that RNG was a W3C recommendation and XMLSchema came out of
Oracle and was submitted to Oasis, would we be having any doubt at all?

But vendors and managers don't care if a technology is good or not, they 
go for what they understand, and 'recommended by W3C' today means 'good 
for you', so they'll use it.

My point is: *we* don't need to be influenced by them if we have a tool
that falls back and gives them what they need.

This is exactly why JClark is spending his time writing Trang because he 
figured out that automation is the only way to route around the W3C 
political bullshit on the schema realm.

He wants to balance the picture so that people will write the schema in 
the semantics they like the best and the tools will translate them at need.

This is what I'm suggesting for Forrest in the long run.

>>>An easy implementation doesn't mean there are no problems.
> 
> [Notes and answers snipped]
 >
> Amazing how much your thoughts diverged from Jeff's.

Well, I wish somebody commented on them though.

>>It's scary to see that the only people that actually *get it* are those
>>who are not sitting in an expert group.
> 
> I've been in two standardization commissions. It's no fun at all.

Yes, I know that. Still I can't believe it's not possible to do a better 
job on such important pieces of the XML model, especially when there is
a wonderful community of experts on xml-dev that have very clear and
precise visions.

> It is already bad if there are managers throwing around the market
> power of the organisations they represent, but experts are even worse:
> everybody (including me) has his (never seen a woman there :) pet
> mechanism/architecture/syntax, nurtured for half a professional life,
> which *must* get into the standard for the benefit of mankind,
> regardless of other losses.

Yep.

> It seems odd that there are occasionally standards which are good
> right from the start, like SCSI or XSLT/XPath 1.0. Even more odd
> that SCSI didn't run into the "second system syndrome" (XPath 2.0
> probably will).

Yes, it will, influenced by XQuery and the ego-problems in the XSL WG

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------

Re: [RT] Entities in XML docs

Posted by Joerg Pietschmann <j3...@yahoo.de>.

On Monday 30 December 2002 04:07, you wrote:
> JClark's RNG validator works as a SAX filter
Good news!

> Still, the issue is: do we *really* want to maintain the structure of
> our documents in both DTDs and RNGs at the same time?
The issue is: Can you get away with dumping DTDs and forcing
RNG on everyone? If there is enough documentation which tells
users where to get command line RNG validators and how to plug
them into various authoring environments, then why not? Well,
the industry went for XSD (prematurely, I'm about to think), so it
may be wise to cater for XSD for a while too...

> > An easy implementation doesn't mean there are no problems.
[Notes and answers snipped]
Amazing how much your thoughts diverged from Jeff's.

> It's scary to see that the only people that actually *get it* are those
> who are not sitting in an expert group.
I've been in two standardization commissions. It's no fun at all.
It is already bad if there are managers throwing around the market
power of the organisations they represent, but experts are even worse:
everybody (including me) has his (never seen a woman there :) pet
mechanism/architecture/syntax, nurtured for half a professional life,
which *must* get into the standard for the benefit of mankind,
regardless of other losses.

It seems odd that there are occasionally standards which are good
right from the start, like SCSI or XSLT/XPath 1.0. Even more odd
that SCSI didn't run into the "second system syndrome" (XPath 2.0
probably will).

J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Joerg Pietschmann <j3...@yahoo.de>.

On Monday 30 December 2002 13:06, Jeff wrote:
[snip]

Just a few clarifications:

> And &&foo;;.  Ie, it's not a valid variable, 
The problem is &&foo;; looks intrinsically bad, but ${${foo}} doesn't.
You have to sell this point to the users who might confuse this with
"recursive substitution". Believe me, this will happen.

> >   <!DOCTYPE foo [
> >     <!ELEMENT foo (a)>
> >     <!ELEMENT a (#PCDATA)>]>
> >   <foo>${foo}</foo>
> >   and foo expands to <a>bar</a>.
>
> That looks valid to me.  I assume you meant foo to expand to <b>bar</b>
> or something.
The point was that the document is *invalid* before ${foo} is
substituted, thereby possibly causing problems with schema directed
editors, and is valid only after the substitution.

> Options I see:
...
> My preferences are 1), 3) and 2.2).
You forgot 2.2b) Use <xi:include> for including XML snippets, and too
bad about validation.
BTW for 3), I don't see in what sense it ties Forrest to Xerces: it's the
easiest way to hook into Xerces, but you also can turn off parser
validation and put a SAX driven RNG (or DTD) validator in the pipeline
after the XInclude processor. Basically, it's the same as 1), just a
concrete, near term implementation.

> ... write thick books explaining..
If I'd been the author of only *one* of these thick best-selling books... ;)

J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Jeff Turner <je...@apache.org>.

On Sun, Dec 29, 2002 at 08:38:04PM +0100, Joerg Pietschmann wrote:
> On Sunday 29 December 2002 04:47, Jeff Turner wrote:
> > That was Stefano's suggestion: that we do text-only expansion for now
> > (element expansion is still possible with xinclude), and when we migrate
> > to a decent schema language we can think about removing the text-only
> > restriction.
> 
> Why not migrate to either a more powerful schema language
> or another validation process right now?

Can do, if it solves more problems than it creates.

> AFAIR your proposal was meant as a mechanism to supplant
> XML entities, in particular in contexts where it is hard for users
> to get their entity definitions into the DTD.
> The problem you want to avoid is that a document with <xi:include>
> or <nn:replace> would not validate.
> 
> Entities work because they are part of the DTD against which the
> parser validates and because the parser expands them before
> examining the context for validation.
> In any other approach, the parser does not know about the
> substitutions to be made. Because the validation is, historically,
> still an integral part of the parsing step, rather than a separate
> step, this may cause problems. This is independent of whether
> the substitution is done by XInclude, an XSLT replacing
> <nn:replace> elements or ${} substitution.
> This doesn't mean we can't solve the problem: Run a processor
> doing the expansion, then a validator. If performance doesn't
> matter all that much, an intermediate file can be used. Unfortunately,
> I don't know of any validator taking a SAX event stream as input
> for better performance, but I'm sure if the need arises, someone
> will take care of this. The only remaining problem is schema-directed
> editors.

Yes, I think that is generally understood to be the long-term goal.

> > I don't fully understand why we can't give users the option to shoot
> > themselves in the foot by including elements, but implementation-wise
> > there's little difference (two different InputModules).
> An easy implementation doesn't mean there are no problems.
> 1. Entity expansion is recursive. Is ${} expansion recursive too?
>   Like foo -> ${bar} and bar -> baz.

Don't see why not.  It's quite useful:

projName = Forrest
proj = ${projName} v0.3.1

>   How do you avoid loops? <evil grin>

Two options:
 1) Loop detection algorithm, like XInclude.  Can't be that hard can it?
 2) Same way entity expansion avoids loops: a variable value cannot
 contain an undeclared variable reference.
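
A minimal sketch of option 1 (loop detection), in Python and purely illustrative. It assumes variables live in a plain name -> value table and tracks the chain of names currently being expanded; the names `expand` and `ExpansionLoop` are made up here. Option 2 (refusing references to not-yet-declared variables) is not shown.

```python
import re

# Matches ${name}; the name may not contain '$', '{' or '}'.
VARIABLE = re.compile(r"\$\{([^${}]+)\}")


class ExpansionLoop(ValueError):
    """Raised when a variable ends up referring to itself."""


def expand(text, table, active=()):
    """Recursively expand ${name} references, refusing circular chains.

    `active` holds the names being expanded on the current path, so a
    name that shows up again on its own path is a loop.
    """
    def lookup(match):
        name = match.group(1)
        if name in active:
            raise ExpansionLoop(" -> ".join(active + (name,)))
        if name not in table:
            return match.group(0)  # unknown variable: leave it as written
        return expand(table[name], table, active + (name,))
    return VARIABLE.sub(lookup, text)


if __name__ == "__main__":
    ok = {"projName": "Forrest", "proj": "${projName} v0.3.1"}
    print(expand("${proj}", ok))

    looping = {"foo": "${bar}", "bar": "${foo}"}
    try:
        expand("${foo}", looping)
    except ExpansionLoop as err:
        print("loop detected:", err)
```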

> 2. Is something like ${${foo}} allowed, supposing "foo" is substituted by
>   "bar" and "bar" by "baz"? Don't forget to explain the difference to
>   recursive expansion as in 1.

Same as if we had:

<!ENTITY foo "bar">
<!ENTITY bar "baz">

And &&foo;;.  Ie, it's not a valid variable, so print a warning and
ignore it.

> 3. An XML file with a ${} substituted by a subtree with mandatory
>   elements at the place is not valid. For example
>   <!DOCTYPE foo [
>     <!ELEMENT foo (a)>
>     <!ELEMENT a (#PCDATA)>]>
>   <foo>${foo}</foo>
>   and foo expands to <a>bar</a>.

That looks valid to me.  I assume you meant foo to expand to <b>bar</b>
or something.

>   That's the point of restricting substitutions to text.

Say we have a layered set of operations:

1) parse XML+ns
2) variable substitution
3) validation

Then why should anything in step 2 care about step 3?  Why should the
variable substituter have to worry about if the result is valid?
XInclude doesn't worry about this.  It deals with infosets, not PSVIs.
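
The layering can be sketched as three independent steps. This is a toy in Python, under loud assumptions: `is_valid` is a hand-written stand-in for the two-line DTD in point 3 (the standard library has no DTD validator), and `substitute` only handles an element whose entire text is a single ${name}. It shows the same document failing the check before step 2 and passing it afterwards.

```python
import re
import xml.etree.ElementTree as ET

VARIABLE = re.compile(r"\$\{([^${}]+)\}")


def substitute(root, table):
    """Step 2: replace element text of the form ${name} with an XML snippet."""
    for elem in list(root.iter()):  # snapshot, so inserted nodes are not revisited
        match = VARIABLE.fullmatch((elem.text or "").strip())
        if match and match.group(1) in table:
            elem.text = None
            elem.append(ET.fromstring(table[match.group(1)]))


def is_valid(root):
    """Step 3: stand-in for <!ELEMENT foo (a)> and <!ELEMENT a (#PCDATA)>."""
    children = list(root)
    return (
        root.tag == "foo"
        and not (root.text or "").strip()
        and len(children) == 1
        and children[0].tag == "a"
        and len(children[0]) == 0
    )


if __name__ == "__main__":
    root = ET.fromstring("<foo>${foo}</foo>")    # step 1: parse
    print("valid before substitution:", is_valid(root))
    substitute(root, {"foo": "<a>bar</a>"})      # step 2: substitute
    print("valid after substitution:", is_valid(root))  # step 3: validate
```

Nothing in `substitute` knows about the content model; validity is only a question once step 3 runs.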

> 4. Elements in ${} substitution get their namespaces from the repository,
>  I think. Like if foo -> <nn:a>, the binding for the nn prefix is taken from
>  the repository XML file rather than from the document where ${foo}
>  occurs. XInclude has the same problem, but then, the XInclude spec
>  takes care of this aspect.
>  Well, namespaces and entities mix even less well.

Well I never.  First they tell me Santa isn't real, and now you say
namespaces will cause problems.  Keeping namespace consistency *sounds*
relatively easy, but perhaps it's time for me to read the XInclude spec
properly :)

> Last but not least I think giving users plenty of means to shoot themselves
> in the foot is not a very good approach, even if the users demand them.

See above about layering.  Kapow.. no more foot, but at least we still
have SoC :) :) I love bad puns.

> Read through the discussions about <xsl:script> on the XSL list for some
> arguments.

I remember lots of (non-Java) implementors complaining (quite rightly)
about having to implement Javascript..

> > > XML editors
> > vim + xmllint
> External validation, can be handled easily.
>
> > > - Write a customized toolset.
> > ?
> The processor doing the substitution, perhaps catalogue support, cross
> references, authoring support. Someone might also want to have a
> processor working outside Cocoon.

Yes, inventing some new ${variable} syntax ties the XML to Forrest, which
isn't nice.

So what do we do?

Options I see:

 1) Abandon DTDs and move to a properly layered system, where we can use
    <xi:include> elements (or any other mechanism), and have them replaced
    *before* validation.

 2) Stick with DTDs and the implication that validation occurs before
    variable substitution.

  2.1) Use XInclude, and simply hack any DTDs we need to support the
       xi:include element.  We could provide specially modified driver
       DTDs for things like Docbook, so users don't need to figure out
       which %peref; to modify themselves.

  2.2) Use ${variables} for including XML snippets, and too bad about
       validation.

  2.3) Use ${variables}, but as text-only replacers, so the validity of
       the XML is preserved.  Easy to implement.  Not very useful, as
       I'd imagine many inclusions would be small XML snippets, like
       paragraphs and <link>s.

 3) Tie Forrest to Xerces, and use an XNI XInclude processor:

   "XInclude Processor
     An XNI parser component can be written to handle XInclude by
     analyzing the streaming information set and automatically inserting
     the contents of referenced links into the event stream. By adding
     this component to the parser pipeline before the validator, included
     content would appear transparent to the validator as if that content
     was in the original document. "
     - http://xml.apache.org/xerces2-j/xni.html

My preferences are 1), 3) and 2.2).


> > Just like the C preprocessor, It is an opt-in solution to a practical
> > problem.
> I've seen simple "solutions to practical problems" used and getting into
> deep doo-doo in the long term much too often. This kind of pragmatism
> brought us BASIC, file name suffixes denoting the content format, Tag
> Soup and the unmentionable abominations related to what's commonly
> called gHorribleKludge on XML-DEV. I still think the world would be a
> better place if such aberrations had been avoided. Also, propagators
> of "pragmatic solutions" tend to walk on to the next buzz, leaving the
> mess to others to clean up. :-/

Where would the XML industry be if it weren't for kludges to support,
document, hype, anti-hype, complain about on XML-DEV, code around, write
thick books explaining..

> J.Pietschmann

Thought-provoking email :)  Thanks, it probably saved the project a
time-consuming detour.


--Jeff

Re: [RT] Entities in XML docs

Posted by Stefano Mazzocchi <st...@apache.org>.

Joerg Pietschmann wrote:
> On Sunday 29 December 2002 04:47, Jeff Turner wrote:
> 
>>That was Stefano's suggestion: that we do text-only expansion for now
>>(element expansion is still possible with xinclude), and when we migrate
>>to a decent schema language we can think about removing the text-only
>>restriction.
> 
> 
> Why not migrate to either a more powerful schema language
> or another validation process right now?
> AFAIR your proposal was meant as a mechanism to supplant
> XML entities, in particular in contexts where it is hard for users
> to get their entity definitions into the DTD.
> The problem you want to avoid is that a document with <xi:include>
> or <nn:replace> would not validate.
> 
> Entities work because they are part of the DTD against which the
> parser validates and because the parser expands them before
> examining the context for validation.
> In any other approach, the parser does not know about the
> substitutions to be made. Because the validation is, historically,
> still an integral part of the parsing step, rather than a separate
> step, this may cause problems. This is independent of whether
> the substitution is done by XInclude, an XSLT replacing
> <nn:replace> elements or ${} substitution.
> This doesn't mean we can't solve the problem: Run a processor
> doing the expansion, then a validator. If performance doesn't
> matter all that much, an intermediate file can be used. Unfortunately,
> I don't know of any validator taking a SAX event stream as input
> for better performance, but I'm sure if the need arises, someone
> will take care of this.

JClark's RNG validator works as a SAX filter

http://www.thaiopensource.com/relaxng/jing.html

Still, the issue is: do we *really* want to maintain the structure of 
our documents in both DTDs and RNGs at the same time?

But wait: JClark has a working RNG -> DTD/XMLSchema converter.

http://www.thaiopensource.com/relaxng/trang.html

NOTE: this conversion is *intrinsically* lossy since RNG is *more* 
powerful in some areas than DTD and XMLSchema, but the tool tries to 
guess what's the best thing to do. Quite impressive internal design, to 
be honest, like all JClark's work.

> The only remaining problem is schema-directed editors.

We might:

  1) have our documentation structure described as RNG
  2) use Trang to convert it to both DTDs and XMLSchemas (so that users 
can use whichever fits them)
  3) we write a Jing-validating transformer and validate at the stage of 
the pipeline that we like.

NOTE: this path is totally orthogonal to the 'token expansion' one.

>>I don't fully understand why we can't give users the option to shoot
>>themselves in the foot by including elements, but implementation-wise
>>there's little difference (two different InputModules).
> 
> An easy implementation doesn't mean there are no problems.
> 1. Entity expansion is recursive. Is ${} expansion recursive too?

I would say no. No recursion for token expansion.

>   Like foo -> ${bar} and bar -> baz.
>   How do you avoid loops? <evil grin>

Exactly.

> 2. Is something like ${${foo}} allowed, supposing "foo" is substituted by
>   "bar" and "bar" by "baz"? Don't forget to explain the difference to
>   recursive expansion as in 1.

That should not be allowed. Only one pass of token expansion will be 
performed.

> 3. An XML file with a ${} substituted by a subtree with mandatory
>   elements at the place is not valid. For example
>   <!DOCTYPE foo [
>     <!ELEMENT foo (a)>
>     <!ELEMENT a (#PCDATA)>]>
>   <foo>${foo}</foo>
>   and foo expands to <a>bar</a>.
>   That's the point of restricting substitutions to text.

Exactly. Token expansion should be limited to text and will escape
anything into text (so if you had nested elements, they will end up
escaped like in a big CDATA section).

> 4. Elements in ${} substitution get their namespaces from the repository,
>  I think. Like if foo -> <nn:a>, the binding for the nn prefix is taken from
>  the repository XML file rather than from the document where ${foo}
>  occurs. XInclude has the same problem, but then, the XInclude spec
>  takes care of this aspect.
>  Well, namespaces and entities mix even less well.

The above fixes this as well.

> Last but not least I think giving users plenty of means to shoot themselves
> in the foot is not a very good approach, even if the users demand them.

There has been *no* demand for things that let users shoot themselves in
the foot. The demand is: I want to store text tokens in one place and use
them all over so that updating is easier.

The use of ${*:*} variables with copying-over fallback allows that with 
very little hassle and doesn't create future problems if:

  1) token expansion is a single pass (no recursion, loops or other 
weird things)
  2) expanded tokens are escaped (no internal structure allowed)

Also, I see no need for escaping syntax since these variables will
probably happen inside code pieces and those normally need CDATA 
escaping anyway for < > and &.
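
The two rules above can be sketched as follows. This is an illustrative Python version, not the proposed Cocoon code: it assumes plain ${name} variables looked up in a dict (the ${*:*} module prefix is left out), and `expand_once` is a made-up name. It does not attempt to detect or reject the ${${foo}} form.

```python
import re
from xml.sax.saxutils import escape

# Matches ${name}; the name may not contain '$', '{' or '}'.
VARIABLE = re.compile(r"\$\{([^${}]+)\}")


def expand_once(text, table):
    """Single-pass token expansion with escaped values.

    1) Replacement values are never rescanned, so there is no recursion
       and no possibility of loops.
    2) Values are escaped, so markup in a value ends up as literal text.
    Unknown variables are copied over unchanged.
    """
    def lookup(match):
        name = match.group(1)
        if name not in table:
            return match.group(0)  # copy-over fallback
        return escape(table[name])
    return VARIABLE.sub(lookup, text)


if __name__ == "__main__":
    table = {"foo": "${bar}", "bar": "baz", "link": "<a>bar</a>"}
    print(expand_once("${foo} ${link} ${nope}", table))
```

With this table, ${foo} becomes the literal text ${bar} rather than baz, which is exactly the single-pass behaviour described.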

> Read through the discussions about <xsl:script> on the XSL list for some
> arguments.
> 
> 
>>>XML editors
>>
>>vim + xmllint
> 
> External validation, can be handled easily.
> 
> 
>>>- Write a customized toolset.
>>
>>?
> 
> The processor doing the substitution, perhaps catalogue support, cross
> references, authoring support. Someone might also want to have a
> processor working outside Cocoon.
> 
> 
>>Just like the C preprocessor, It is an opt-in solution to a practical
>>problem.
> 
> I've seen simple "solutions to practical problems" used and getting into
> deep doo-doo in the long term much too often. This kind of pragmatism
> brought us BASIC, file name suffixes denoting the content format, Tag
> Soup and the unmentionable abominations related to what's commonly
> called gHorribleKludge on XML-DEV. I still think the world would be a
> better place if such aberrations had been avoided. Also, propagators
> of "pragmatic solutions" tend to walk on to the next buzz, leaving the
> mess to others to clean up. :-/

No shit.

Look at what FS can do to you:

http://lists.xml.org/archives/xml-dev/200211/msg00467.html

It's scary to see that the only people that actually *get it* are those 
who are not sitting in an expert group.

Sometimes it's better to just say: "screw the damn W3C" and do your own
simple KISS stuff and have a user community keep you honest about it.

Will the W3C ever get this? well, hopefully some of them read xml-dev. :-)

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------

Re: [RT] Entities in XML docs

Posted by Joerg Pietschmann <j3...@yahoo.de>.

On Sunday 29 December 2002 04:47, Jeff Turner wrote:
> That was Stefano's suggestion: that we do text-only expansion for now
> (element expansion is still possible with xinclude), and when we migrate
> to a decent schema language we can think about removing the text-only
> restriction.

Why not migrate to either a more powerful schema language
or another validation process right now?
AFAIR your proposal was meant as a mechanism to supplant
XML entities, in particular in contexts where it is hard for users
to get their entity definitions into the DTD.
The problem you want to avoid is that a document with <xi:include>
or <nn:replace> would not validate.

Entities work because they are part of the DTD against which the
parser validates and because the parser expands them before
examining the context for validation.
In any other approach, the parser does not know about the
substitutions to be made. Because the validation is, historically,
still an integral part of the parsing step, rather than a separate
step, this may cause problems. This is independent of whether
the substitution is done by XInclude, an XSLT replacing
<nn:replace> elements or ${} substitution.
This doesn't mean we can't solve the problem: Run a processor
doing the expansion, then a validator. If performance doesn't
matter all that much, an intermediate file can be used. Unfortunately,
I don't know of any validator taking a SAX event stream as input
for better performance, but I'm sure if the need arises, someone
will take care of this. The only remaining problem is schema-directed
editors.

> I don't fully understand why we can't give users the option to shoot
> themselves in the foot by including elements, but implementation-wise
> there's little difference (two different InputModules).
An easy implementation doesn't mean there are no problems.
1. Entity expansion is recursive. Is ${} expansion recursive too?
  Like foo -> ${bar} and bar -> baz.
  How do you avoid loops? <evil grin>
2. Is something like ${${foo}} allowed, supposing "foo" is substituted by
  "bar" and "bar" by "baz"? Don't forget to explain the difference to
  recursive expansion as in 1.
3. An XML file with a ${} substituted by a subtree with mandatory
  elements at the place is not valid. For example
  <!DOCTYPE foo [
    <!ELEMENT foo (a)>
    <!ELEMENT a (#PCDATA)>]>
  <foo>${foo}</foo>
  and foo expands to <a>bar</a>.
  That's the point of restricting substitutions to text.
4. Elements in ${} substitution get their namespaces from the repository,
 I think. Like if foo -> <nn:a>, the binding for the nn prefix is taken from
 the repository XML file rather than from the document where ${foo}
 occurs. XInclude has the same problem, but then, the XInclude spec
 takes care of this aspect.
 Well, namespaces and entities mix even less well.
Last but not least I think giving users plenty of means to shoot themselves
in the foot is not a very good approach, even if the users demand them.
Read through the discussions about <xsl:script> on the XSL list for some
arguments.
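
Questions 1 and 2 can be made concrete with a minimal sketch, assuming a
plain dictionary as the substitution table. The function and its
behaviour are hypothetical, one possible answer rather than anything
specified in this thread:

```python
import re

REF = re.compile(r"\$\{(\w+)\}")  # matches an innermost ${name}


def expand(text, table, active=()):
    """Expand ${name} recursively; raise ValueError on a definition loop."""

    def replace(match):
        name = match.group(1)
        if name in active:
            raise ValueError("loop: " + " -> ".join(active + (name,)))
        if name not in table:
            return match.group(0)  # unknown names stay as written
        # Expand the value itself, remembering which names are in progress.
        return expand(table[name], table, active + (name,))

    return REF.sub(replace, text)


# Case 1: the value of foo refers to bar.
print(expand("${foo}", {"foo": "${bar}", "bar": "baz"}))   # baz
# Case 2: the name itself is computed; one pass only resolves the inner part.
print(expand("${${foo}}", {"foo": "bar", "bar": "baz"}))   # ${bar}
```

Recursion over values (case 1) reaches "baz", while the nested reference
(case 2) stops at "${bar}" unless the result is scanned again. Whether to
rescan is exactly the kind of decision a spec would have to make. A table
such as {"a": "${b}", "b": "${a}"} raises ValueError instead of looping.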

> > XML editors
> vim + xmllint
External validation, can be handled easily.

> > - Write a customized toolset.
> ?
The processor doing the substitution, perhaps catalogue support, cross
references, authoring support. Someone might also want to have a
processor working outside Cocoon.

> Just like the C preprocessor, it is an opt-in solution to a practical
> problem.
I've seen simple "solutions to practical problems" get their users into
deep doo-doo in the long term much too often. This kind of pragmatism
brought us BASIC, file name suffixes denoting the content format, Tag
Soup and the unmentionable abominations related to what's commonly
called gHorribleKludge on XML-DEV. I still think the world would be a
better place if such aberrations had been avoided. Also, propagators
of "pragmatic solutions" tend to walk on to the next buzz, leaving the
mess to others to clean up. :-/

J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Steven Noels <st...@outerthought.org>.

Joerg Pietschmann wrote:

> Can we have a survey on which XML editors people (doc editors)
> use, and how much they rely on on-the-fly XML validation respectively
> DTD direction of the editor?
> I use Emacs+PSGML, and while PSGML is DTD directed, I can insert
> any elements without problems, full validation is an external process
> anyway.

I use the heavily DTD/XMLSchema-directed editor XMetal. Besides, I also 
use Stylus and (sometimes) XMLSpy.

> As you might have guessed, I usually oppose the creation of new
> languages, even (especially!) simple ones like ${} substitution. Every
> new language means
> - Full spec. BTW your proposal is heavily underspecified. E.g. what is the
>   prefix in ${some:ambiguous:stuff}?
> - Unexpected interactions with other mechanisms, potential for name
>   clashes etc.
> - Write a customized toolset.
> - Dependency on the availability of said tools in other environments, like
>   IDEs, other toolsets etc.
> You know, I'm not really a fan of "specification by implementation", look
> what happened to the "easy and intuitive" C preprocessor.

Wise words - thanks, Joerg!

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

Re: [RT] Entities in XML docs

Posted by Jeff Turner <je...@apache.org>.

On Sat, Dec 28, 2002 at 05:33:05PM +0100, Joerg Pietschmann wrote:
> On Saturday 28 December 2002 16:52, you wrote:
> > It has the same problems as <xi:include>, ie requires editing DTDs so
> > that nn:replace is recognized.
> 
> I'm not sure whether this is an argument: the doc with ${} abbrevs
> might be invalid according to the DTD too because of missing
> mandatory child elements, unless you restrict the replacement
> to pure text, no elements.

That was Stefano's suggestion: that we do text-only expansion for now
(element expansion is still possible with xinclude), and when we migrate
to a decent schema language we can think about removing the text-only
restriction.

I don't fully understand why we can't give users the option to shoot
themselves in the foot by including elements, but implementation-wise
there's little difference (two different InputModules).

> The solution is of course to expand the replacements before
> validating. At this point the abbreviation syntax doesn't matter.
> Ok, this doesn't exactly work out of the box with current tools,
> and is certainly bad for DTD directed editors.
>
> If you start relaxing the schema you can allow <xi:include> or
> <nn:replace/> as well, in particular if you use a schema language
> which allows elements wholesale from a namespace, like XSD
> (dunno about RNG). This might even facilitate editor support
> for abbreviations.

> Can we have a survey on which XML editors people (doc editors)
> use, and how much they rely on on-the-fly XML validation respectively
> DTD direction of the editor?

vim + xmllint

> I use Emacs+PSGML, and while PSGML is DTD directed, I can insert
> any elements without problems, full validation is an external process
> anyway.
> 
> As you might have guessed, I usually oppose the creation of new
> languages, even (especially!) simple ones like ${} substitution. Every
> new language means
> - Full spec. BTW your proposal is heavily underspecified. E.g. what is the
>   prefix in ${some:ambiguous:stuff}?

I guess with InputModules, 'some' is the prefix.

> - Unexpected interactions with other mechanisms, potential for name
>   clashes etc.

Practically I don't think this will be a problem:

 - The ${*:*} syntax is sufficiently unique to avoid most conflicts.
 - If 'scheme' in ${scheme:variable} is not a known scheme, the string
   isn't replaced.
 - If 'variable' in ${scheme:variable} isn't defined, the string isn't
   replaced.
 - If ${scheme:variable} IS meaningful to the user, then they can escape
   it with $${scheme:variable}, or some other syntax.
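
The four rules above can be sketched as follows. This is a minimal
illustration, assuming each scheme is a plain dictionary standing in for
an InputModule; SCHEMES, REF and substitute are hypothetical names:

```python
import re

# Hypothetical schemes; each dict stands in for one InputModule.
SCHEMES = {"site": {"name": "Forrest"}}

# An optional extra '$' in front marks an escaped reference.
REF = re.compile(r"(\$?)\$\{(\w+):([\w.-]+)\}")


def substitute(text, schemes=SCHEMES):
    """Apply ${scheme:variable} substitution following the rules listed above."""

    def replace(match):
        escaped, scheme, variable = match.groups()
        if escaped:
            return match.group(0)[1:]  # $${a:b} becomes a literal ${a:b}
        values = schemes.get(scheme)
        if values is None or variable not in values:
            return match.group(0)      # unknown scheme or variable: untouched
        return values[variable]

    return REF.sub(replace, text)


print(substitute("Welcome to ${site:name}"))   # Welcome to Forrest
print(substitute("${ant:home}"))               # ${ant:home}
print(substitute("$${site:name}"))             # ${site:name}
```

A reference without a scheme prefix, such as ${some-file-set} in an Ant
sample, never matches the pattern and is left alone.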

> - Write a customized toolset.

?

> - Dependency on the availability of said tools in other environments, like
>   IDEs, other toolsets etc.

My editor doesn't do anything special with C #includes either.

> You know, I'm not really a fan of "specification by implementation", look
> what happened to the "easy and intuitive" C preprocessor.

Just like the C preprocessor, it is an opt-in solution to a practical
problem.

--Jeff

> J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Joerg Pietschmann <j3...@yahoo.de>.

On Saturday 28 December 2002 16:52, you wrote:
> It has the same problems as <xi:include>, ie requires editing DTDs so
> that nn:replace is recognized.

I'm not sure whether this is an argument: the doc with ${} abbrevs
might be invalid according to the DTD too because of missing
mandatory child elements, unless you restrict the replacement
to pure text, no elements.

The solution is of course to expand the replacements before
validating. At this point the abbreviation syntax doesn't matter.
Ok, this doesn't exactly work out of the box with current tools,
and is certainly bad for DTD directed editors.

If you start relaxing the schema you can allow <xi:include> or
<nn:replace/> as well, in particular if you use a schema language
which allows elements wholesale from a namespace, like XSD
(dunno about RNG). This might even facilitate editor support
for abbreviations.

Can we have a survey on which XML editors people (doc editors)
use, and how much they rely on on-the-fly XML validation respectively
DTD direction of the editor?
I use Emacs+PSGML, and while PSGML is DTD directed, I can insert
any elements without problems, full validation is an external process
anyway.

As you might have guessed, I usually oppose the creation of new
languages, even (especially!) simple ones like ${} substitution. Every
new language means
- Full spec. BTW your proposal is heavily underspecified. E.g. what is the
  prefix in ${some:ambiguous:stuff}?
- Unexpected interactions with other mechanisms, potential for name
  clashes etc.
- Write a customized toolset.
- Dependency on the availability of said tools in other environments, like
  IDEs, other toolsets etc.
You know, I'm not really a fan of "specification by implementation", look
what happened to the "easy and intuitive" C preprocessor.

J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Jeff Turner <je...@apache.org>.

On Sat, Dec 28, 2002 at 03:13:45PM +0100, Joerg Pietschmann wrote:
> On Friday 27 December 2002 14:43, you wrote:
> > 3.2) We implement a SearchReplaceTransformer, which replaces ${variables}
> > with values.  Eg, entities.xml:
> Ouw!
> 
> > Are there any more options I haven't thought of?
> It might be more robust to stick with XML elements, even
> though they are more verbose:
>  <nn:replace name="variables"/>

It has the same problems as <xi:include>, ie requires editing DTDs so
that nn:replace is recognized.

> instead of ${variables}
> You avoid some quoting issues, think of the poor guys writing
> Ant code samples:
>  <code>&lt;file-set base="${some-file-set}"/></code>
> What will they get?

I'd imagine that if no 'some-file-set' variable is found, the variable
ref wouldn't be substituted, and the user would get an unmodified
'${some-file-set}'.  Using Stefano's ${pfx:name} syntax reduces the
chance of unexpected name collisions a fair bit, as we can check the
'pfx' before substituting.

--Jeff

> J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Joerg Pietschmann <j3...@yahoo.de>.

On Friday 27 December 2002 14:43, you wrote:
> 3.2) We implement a SearchReplaceTransformer, which replaces ${variables}
> with values.  Eg, entities.xml:
Ouw!

> Are there any more options I haven't thought of?
It might be more robust to stick with XML elements, even
though they are more verbose:
 <nn:replace name="variables"/>
instead of ${variables}
You avoid some quoting issues, think of the poor guys writing
Ant code samples:
 <code>&lt;file-set base="${some-file-set}"/></code>
What will they get?

J.Pietschmann

Re: [RT] Entities in XML docs

Posted by Steven Noels <st...@outerthought.org>.

Jeff Turner wrote:

> My current preference is to go with 3.2, and implement it with InputModules, the
> same way LinkRewriterTransformer works.  Using XInclude would involve less
> coding, but the DTD problems would be too horrible..

Related thought: would 'entities' be required to be well-formed in the 
sense of external XML entities?

Would they be read and injected as if they are XML, or just strings?

Gotchas when going full monty: they can have multiple root elements but 
still be valid when included.

Somehow related thought, but too scared to think about: i18n.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

RE: site.xml -> was -> RE: [RT] Entities in XML docs

Posted by Robert Koberg <ro...@koberg.com>.

Hey there,

> -----Original Message-----
> From: Jeff Turner [mailto:jefft@apache.org]
> Sent: Saturday, December 28, 2002 7:40 AM

> On Sat, Dec 28, 2002 at 05:36:32AM -0800, Robert Koberg wrote:
> ...
> > > > <page id="dreams"/>
> > >
> > > Less typing :)  And trying to treat all URL-addressable parts of the site
> > > in the same way.  It shouldn't matter if a node is a directory, file or
> > > #anchor.  In a linkmap, they're all just "things to link to".
> > >
> > > > To me, this allows for 'grouping' of IDs at the element level.
> > >
> > > How do you mean, grouping?
> >
> > [assuming?? that there will be metadata in the site.xml page and folder elements
> > and so you can't simply test for children, but even still, it is an extra test
> > in the transform]
>
> Oh I see.  Yes, I did need to use the 'has children == folder' rule to
> generate book.xml, which will indeed break with metadata.
>
> > I mean I want to know explicitly if something is a page or a folder. For example
> > in the lsb site's nav we have folder icons before folder labels and page icons
> > before page labels. If I know it is a page I can just:
> >
> > <xsl:template match="page">
>
> Though, it would be just as easy to make the node type an attribute, and
> match on that:
>
> <xsl:template match="*[@dc:format='Page']">

Sure, but who is doing more typing now? :) That is:

one:
<page id="dreams">

and several:
<xsl:template match="page" mode="nav">
<xsl:template match="page" mode="snailtrail">
<xsl:template match="page" mode="path_builder">
etc

looks prettier this way too, but basically you are right, it would not matter if
there is an attribute for this.

More importantly, however, if you use any element name as the ID you cannot
validate the document, as you mentioned previously. Well you could but the
schema could not easily be reused (for non-apache-like sites). I would think
this is a showstopper.

>
> >   <xsl:variable name="href">
> > <!-- travels up and down the tree to find ../'s and path -->
> >     <xsl:call-template name="page_path_builder"/>
> >   </xsl:variable>
> >   <a href="{$href}">
> >     <img src="{$relative_path}images/page_icon.gif"/>
> >     <xsl:value-of select="@label"/>
> >   </a>
> > </xsl:template>
> >
> > In building the href at generation time, I know that since it is a page I will
> > use (depending on site prefs) either the page ID or page label (replacing things
> > like spaces, :, ', etc) and then concatenate the file extension.
>
> Well if you stick to generic attributes, instead of *page* Id and
> label, then you can just glue the href's together and see what you end
> up with :) Eg, with:
>
> <site>
>   <primer label="Forrest Primer" href="primer.html">
>     <cvs href="#cvs"/>
>   </primer>
> </site>
>
> Then <link href="site:cvs"> gets translated to <a href="primer.html#cvs">

What happens if you have:

<site>
  <primer label="Forrest Primer" href="primer.html">
    <cvs href="#cvs"/>
  </primer>
  <old_primer label="Old Forrest Primer" href="old_primer.html">
    <cvs href="#cvs"/>
  </old_primer>
</site>

You need unique IDs.
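
A minimal sketch of how such a clash could be detected, assuming element
names are the lookup keys; ambiguous_names is a hypothetical helper, not
part of Forrest:

```python
from collections import Counter
import xml.etree.ElementTree as ET

SITE = """
<site>
  <primer label="Forrest Primer" href="primer.html">
    <cvs href="#cvs"/>
  </primer>
  <old_primer label="Old Forrest Primer" href="old_primer.html">
    <cvs href="#cvs"/>
  </old_primer>
</site>
"""


def ambiguous_names(site_xml):
    """Return element names that occur more than once and so cannot be unique keys."""
    root = ET.fromstring(site_xml)
    counts = Counter(node.tag for node in root.iter())
    return sorted(name for name, n in counts.items() if n > 1)


print(ambiguous_names(SITE))   # ['cvs']
```

With this input a reference like site:cvs has two candidate targets, so
some rule (first match, full path, or an error) would be needed.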

>
> The original idea with site.xml was that it is a totally abstract
> representation of the site's information content.  Eg, it should be
> possible to replace the filesystem with a Xindice database, and have only
> the source URIs in site.xml change.  Say we have a FAQ entry:
>
> <site>
>   <faq>
>     <how_can_I_help />
>     <build_problems />
>     <useless_docs />
>   </faq>
> </site>
>
> One day, each entry might be mapped to an XML node:
>
> <site>
>   <faq src="faq.xml">
>     <how_can_I_help src="#xpointer(/faqs/question[@id='how_can_I_help'])"/>
>     <build_problems src="#xpointer(/faqs/question[@id='build_problems'])"/>
>     <useless_docs src="#xpointer(/faqs/question[@id='useless_docs'])"/>
>   </faq>
> </site>
>
> Then, by only changing @src attributes, we could map to Xindice:
>
> <site href="xmldb:xindice://localhost:4080/db/website">
>   <faq src="faq">
>     <how_can_I_help src="#/faqs/question[@id='how_can_I_help']"/>
>     <build_problems src="#/faqs/question[@id='build_problems']"/>
>     <useless_docs src="#/faqs/question[@id='useless_docs']"/>
>   </faq>
> </site>
>

I would do this currently by using an alternate URIResolver, but I am very
interested in your approach.


>
> So that's all very nice, but it's turning out to be not very
> practical.  Even to generate book.xml, I had to add these horrible
> non-addressable 'category' elements for grouping nodes:

I am not following this ?

>
> <getting-involved label="Getting Involved">
>   <contrib label="Contributing" href="contrib.html"/>
>   <CVS label="CVS"
>     href="http://cvs.apache.org/viewcvs/xml-forrest/"/>
>   <mail-lists label="Mail lists" href="mail-lists.html"/>
>   <mail-archives label="Mail Archives"
>     href="mail-archives.html"/>
>   <bugs label="Bugs and Issues"
>     href="http://issues.cocoondev.org/jira/secure/BrowseProject.jspa?id=10000"/>
> </getting-involved>
>
>
> > If it is a folder, I will just append index.{html | jsp | php} and be done.  I
> > have a property in a folder_conf element that tells me the index_page - this is
> > copied at generation time to index.html.
>
> Oh yes.  index_page is another thing we really need a way to indicate.
> At the very least, it can be present in menus of subdirectories as a
> '../' link.

site_index at the top config level is nice too. Think of special holiday
promotions or some such thing.

>
> > - Or perhaps I want to create a pager (<< 1 2 3 4 >>) to have each 'page' in a
> > directory show up in the horizontal list, but I don't want child folders.
> >
> > - Or I want to create a site map/index page that shows the site structure with
> > meaningful icons/colors
> >
> > - Or I might want to offer a folder with individual page views or the option to
> > see all the pages (not folders) aggregated into one page view.
> >
> > - Or I might want to create a folder index page from a folder's pages using
> > dc:titles and dc:descriptions
>
> mm :)  Good ideas..
>
> > I don't see how to do the above without extra xsl:choose's or xsl:if's
>
> Or *[@dc:format='whatever'] I assume.

Sure, but you are looking at too many things to find what you need. First you
match all child elements and then have to check the appropriate attribute. The
way I am advocating would just check the element name.

>
> > > > On book.xml - why is this needed anymore? Cannot the site.xml be
> > > > used in its place?
> > >
> > > Yes, book.xml isn't necessary anymore (in the linkmap CVS branch).
> > > It's still kept around as an intermediate format (see
> > > site2book.xsl) so that if necessary, users can specify it directly
> > > rather than generate from site.xml.  There are various cases where
> > > the desired menu is not the same as that generated from site.xml.
> > > In Forrest's own site, we could not generate these menus from
> > > site.xml:
> > >
> > > http://xml.apache.org/forrest/community/index.html
> > > http://xml.apache.org/forrest/community/howto/index.html
> > >
> > > Whether these pages show good menu design is another question :)
> >
> >
> > That is why you should always storyboard out the site/project before
> > setting the contracts in stone :)
>
> Our customer pays very poorly ;P

oh yea, those bastards :)

>
> > > > On the metadata front, I have been adopting a mix of Dublin Core
> > > > and mixing in the stuff my tool requires. For example, at the
> > > > bottom is a snippet of what I am currently using in the site.xml
> > > > [1].
> > >
> > > Nice!  RDF, Dublin Core.. I see an opportunity for more shameless
> > > LSB-copying ;)
> >
> > I would love it! I am trying to bend toward forrest so I can
> > eventually publish a forrest site. But I need the metadata for a
> > flexible storyboarding process.
> >
> >
> > >
> > > I'm not sure I understand it fully though..
> > >
> > > > <lsb:folder name="en-us">
> > > >     <lsb:folder_conf>
> > > >       <rdf:Description about="folder.dcxml">
> > > ...
> > > >       </rdf:Description>
> > > >     </lsb:folder_conf>
> > > >     <lsb:page_conf>
> > > >       <rdf:Description about="preamble">
> > > ...
> > > >       </rdf:Description>
> > > >     </lsb:page_conf>
> > >
> > > I gather this is describing a directory 'en-us', and a file
> > > en-us/preamble?  What is 'folder.dcxml'?
> >
> >
> > I started out creating a metadata file (*.dcxml) for each resource
> > on the site and at app startup I would crawl the metadata and
> > aggregate those into one site.xml. I found that to be a CVS
> > nightmare given the fact that I allow pages and folders to be moved
> > around. So I went back to just having the static site.xml and at
> > generation time I either include the metadata inline (page level) or
> > write it to a file (folder, binary, ??).
>
> I like the idea of storing a RDF file in each directory, providing
> metadata for those files (and overall directory metadata).  What was
> the CVS nightmare?  .dcxml files needing to be updated on every file
> move?


I am trying to automate as much as possible. Would it be a good idea to use java
to control CVS to remove dirs/files -> commit and add dirs/files -> commit? I am
not good at this type of thing but I understand that you should not script
commits???

If a developer user (as opposed to an editor/author) wants to grab the latest
from CVS (chroot jail) I would want them to have the latest. But even if I
postponed the commit on the server it would require someone/thing to do the
commit and make sure everything is OK (there might not be a developer user in a
project).

By using site.xml for updates/changes I don't have to worry as much. Then at
generation time the metadata gets written out as individual files or included
in the HTML.

>
> > The lsb:folder tells me the location of the *.dcxml (perhaps I
> > should use *.rdf...) and the rdf:Description tells me the file name.
>
> In that case, <rdf:Description about="folder.dcxml"> means "here's
> some metadata about a file containing metadata about the folder",
> which doesn't sound right?

The metadata for the folder would only exist in the generation output. Meanwhile
it lives in the site.xml.


>
> > > I don't really understand how a
> > > directory could be considered to have a title, subject etc.  Is that just
> > > indicating what the directory should contain?
> >
> > It is a test site.xml that is using things I 'might' want to play with. But as a
> > solid case, like I mentioned above, you might want to have a folder offer
> > individual pages or one inclusive, aggregated view. In the latter case the
> > folder is actually a page.
>
> I see, makes sense.
>
> > But you could create your schema to include anything you want and
> > perhaps setting hardcoded values for some items.
> >
> >
> > >
> > > If there is a lsb:folder, shouldn't there be a lsb:page too?
> >
> >
> > My thinking (which could easily change) was that lsb:folder's are a virtual
> > representation of the folder-file system as it should be after generation. The
> > lsb:folder_conf holds meta info about a folder (including navigation - lsb:nav -
> > items). The lsb:page_conf, among other things, describes one or more possible
> > page views (don't know if I am using dc:format correctly...):
> >
> > <rng:optional>
> >   <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
> >     <rng:value type="token">text/html</rng:value>
> >   </rng:element>
> > </rng:optional>
> > <rng:optional>
> >   <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
> >     <rng:value type="token">text/plain</rng:value>
> >   </rng:element>
> > </rng:optional>
> > <rng:optional>
> >   <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
> >     <rng:value type="token">application/pdf</rng:value>
> >   </rng:element>
> > </rng:optional>
> >
> > I represent these in a form and let the user 'check' which views to generate.
> > Still working on this...
>
> I don't know how this 'configure the site generation' part would fit
> in with Cocoon.  Perhaps when Cocoon blocks arrive, we could have a
> 'add PDF block' checkbox which adds the *.pdf rules.


Yea, I will probably have to bend more in this direction.

>
> > > Is it necessary to have the intermediate *_conf elements?  Why not just
> > > have <lsb:folder> and <rdf:Description> directly inside it?
> >
> > I want to know what the thing's group is to ease template matching :)
>
> I don't understand.  What XPath expression is possible with:
>
> <lsb:folder name="en-us">
>    <lsb:folder_conf>
>       <rdf:Description about="folder.dcxml">
>
> But not with:
>
> <lsb:folder name="en-us">
>       <rdf:Description about="folder.dcxml">
>
> If rdf:Description is the only child of lsb:folder, you could just do
> match="lsb:folder/rdf:Description".

I guess we are debating personal preferences that can be handled in many ways. I
simply like to group things semantically. rdf:Description, for me, is not
enough information. It's like

<div class="note">This is a note</div>
vs
<note>This is a note</note>

But what you are proposing by using keys as element names goes too far, in my
opinion, because it cannot be validated.


best,
-Rob

Re: site.xml -> was -> RE: [RT] Entities in XML docs

Posted by Jeff Turner <je...@apache.org>.

On Sat, Dec 28, 2002 at 05:36:32AM -0800, Robert Koberg wrote:
...
> > > <page id="dreams"/>
> >
> > Less typing :)  And trying to treat all URL-addressable parts of the site
> > in the same way.  It shouldn't matter if a node is a directory, file or
> > #anchor.  In a linkmap, they're all just "things to link to".
> >
> > > To me, this allows for 'grouping' of IDs at the element level.
> >
> > How do you mean, grouping?
> 
> [assuming?? that there will be metadata in the site.xml page and folder elements
> and so you can't simply test for children, but even still, it is an extra test
> in the transform]

Oh I see.  Yes, I did need to use the 'has children == folder' rule to
generate book.xml, which will indeed break with metadata.

> I mean I want to know explicitly if something is a page or a folder. For example
> in the lsb site's nav we have folder icons before folder labels and page icons
> before page labels. If I know it is a page I can just:
> 
> <xsl:template match="page">

Though, it would be just as easy to make the node type an attribute, and
match on that:

<xsl:template match="*[@dc:format='Page']">

>   <xsl:variable name="href">
> <!-- travels up and down the tree to find ../'s and path -->
>     <xsl:call-template name="page_path_builder"/>
>   </xsl:variable>
>   <a href="{$href}">
>     <img src="{$relative_path}images/page_icon.gif"/>
>     <xsl:value-of select="@label"/>
>   </a>
> </xsl:template>
> 
> In building the href at generation time, I know that since it is a page I will
> use (depending on site prefs) either the page ID or page label (replacing things
> like spaces, :, ', etc) and then concatenate the file extension.

Well if you stick to generic attributes, instead of *page* Id and
label, then you can just glue the href's together and see what you end
up with :) Eg, with:

<site>
  <primer label="Forrest Primer" href="primer.html">
    <cvs href="#cvs"/>
  </primer>
</site>

Then <link href="site:cvs"> gets translated to <a href="primer.html#cvs">
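
The "glue the href's together" rule can be sketched like this. It is a
minimal illustration, assuming the element name is the lookup key and
that href attributes are concatenated from the root down; resolve is a
hypothetical helper, not the actual linkmap code:

```python
import xml.etree.ElementTree as ET

SITE = """
<site>
  <primer label="Forrest Primer" href="primer.html">
    <cvs href="#cvs"/>
  </primer>
</site>
"""


def resolve(site_xml, name):
    """Concatenate href attributes from the root down to the first element called name."""
    root = ET.fromstring(site_xml)

    def walk(node, prefix):
        path = prefix + node.get("href", "")
        if node.tag == name:
            return path
        for child in node:
            found = walk(child, path)
            if found is not None:
                return found
        return None  # name not found below this node

    return walk(root, "")


print(resolve(SITE, "cvs"))      # primer.html#cvs
print(resolve(SITE, "primer"))   # primer.html
```

Elements without an href contribute nothing to the path, and a name that
does not occur yields None.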

The original idea with site.xml was that it is a totally abstract
representation of the site's information content.  Eg, it should be
possible to replace the filesystem with a Xindice database, and have only
the source URIs in site.xml change.  Say we have a FAQ entry:

<site>
  <faq>
    <how_can_I_help />
    <build_problems />
    <useless_docs />
  </faq>
</site>

One day, each entry might be mapped to an XML node:

<site>
  <faq src="faq.xml">
    <how_can_I_help src="#xpointer(/faqs/question[@id='how_can_I_help'])"/>
    <build_problems src="#xpointer(/faqs/question[@id='build_problems'])"/>
    <useless_docs src="#xpointer(/faqs/question[@id='useless_docs'])"/>
  </faq>
</site>

Then, by only changing @src attributes, we could map to Xindice:

<site href="xmldb:xindice://localhost:4080/db/website">
  <faq src="faq">
    <how_can_I_help src="#/faqs/question[@id='how_can_I_help']"/>
    <build_problems src="#/faqs/question[@id='build_problems']"/>
    <useless_docs src="#/faqs/question[@id='useless_docs']"/>
  </faq>
</site>


So that's all very nice, but it's turning out to be not very
practical.  Even to generate book.xml, I had to add these horrible
non-addressable 'category' elements for grouping nodes:

<getting-involved label="Getting Involved">
  <contrib label="Contributing" href="contrib.html"/>
  <CVS label="CVS"
    href="http://cvs.apache.org/viewcvs/xml-forrest/"/>
  <mail-lists label="Mail lists" href="mail-lists.html"/>
  <mail-archives label="Mail Archives"
    href="mail-archives.html"/>
  <bugs label="Bugs and Issues"
    href="http://issues.cocoondev.org/jira/secure/BrowseProject.jspa?id=10000"/>
</getting-involved>


> If it is a folder, I will just append index.{html | jsp | php} and be done.  I
> have a property in a folder_conf element that tells me the index_page - this is
> copied at generation time to index.html.

Oh yes.  index_page is another thing we really need a way to indicate.
At the very least, it can be present in menus of subdirectories as a
'../' link.

> - Or perhaps I want to create a pager (<< 1 2 3 4 >>) to have each 'page' in a
> directory show up in the horizontal list, but I don't want child folders.
> 
> - Or I want to create a site map/index page that shows the site structure with
> meaningful icons/colors
> 
> - Or I might want to offer a folder with individual page views or the option to
> see all the pages (not folders) aggregated into one page view.
> 
> - Or I might want to create a folder index page from a folder's pages using
> dc:titles and dc:descriptions

mm :)  Good ideas..

> I don't see how to do the above without extra xsl:choose's or xsl:if's

Or *[@dc:format='whatever'] I assume.

> > > On book.xml - why is this needed anymore? Cannot the site.xml be
> > > used in its place?
> >
> > Yes, book.xml isn't necessary anymore (in the linkmap CVS branch).
> > It's still kept around as an intermediate format (see
> > site2book.xsl) so that if necessary, users can specify it directly
> > rather than generate from site.xml.  There are various cases where
> > the desired menu is not the same as that generated from site.xml.
> > In Forrest's own site, we could not generate these menus from
> > site.xml:
> >
> > http://xml.apache.org/forrest/community/index.html
> > http://xml.apache.org/forrest/community/howto/index.html
> >
> > Whether these pages show good menu design is another question :)
> 
> 
> That is why you should always storyboard out the site/project before
> setting the contracts in stone :)

Our customer pays very poorly ;P

> > > On the metadata front, I have been adopting a mix of Dublin Core
> > > and mixing in the stuff my tool requires. For example, at the
> > > bottom is a snippet of what I am currently using in the site.xml
> > > [1].
> >
> > Nice!  RDF, Dublin Core.. I see an opportunity for more shameless
> > LSB-copying ;)
> 
> I would love it! I am trying to bend toward forrest so I can
> eventually publish a forrest site. But I need the metadata for a
> flexible storyboarding process.
> 
> 
> >
> > I'm not sure I understand it fully though..
> >
> > > <lsb:folder name="en-us">
> > >     <lsb:folder_conf>
> > >       <rdf:Description about="folder.dcxml">
> > ...
> > >       </rdf:Description>
> > >     </lsb:folder_conf>
> > >     <lsb:page_conf>
> > >       <rdf:Description about="preamble">
> > ...
> > >       </rdf:Description>
> > >     </lsb:page_conf>
> >
> > I gather this is describing a directory 'en-us', and a file
> > en-us/preamble?  What is 'folder.dcxml'?
> 
> 
> I started out creating a metadata file (*.dcxml) for each resource
> on the site and at app startup I would crawl the metadata and
> aggregate those into one site.xml. I found that to be a CVS
> nightmare given the fact that I allow pages and folders to be moved
> around. So I went back to just having the static site.xml and at
> generation time I either include the metadata inline (page level) or
> write it to a file (folder, binary, ??).

I like the idea of storing a RDF file in each directory, providing
metadata for those files (and overall directory metadata).  What was
the CVS nightmare?  .dcxml files needing to be updated on every file
move?

> The lsb:folder tells me the location of the *.dcxml (perhaps I
> should use *.rdf...) and the rdf:Description tells me the file name.

In that case, <rdf:Description about="folder.dcxml"> means "here's
some metadata about a file containing metadata about the folder",
which doesn't sound right?

> > I don't really understand how a
> > directory could be considered to have a title, subject etc.  Is that just
> > indicating what the directory should contain?
> 
> It is a test site.xml that is using things I 'might' want to play with. But as a
> solid case, like I mentioned above, you might want to have a folder offer
> individual pages or one inclusive, aggregated view. In the latter case the
> folder is actually a page.

I see, makes sense.

> But you could create your schema to include anything you want and
> perhaps setting hardcoded values for some items.
> 
> 
> >
> > If there is a lsb:folder, shouldn't there be a lsb:page too?
> 
> 
> My thinking (which could easily change) was that lsb:folder's are a virtual
> representation of the folder-file system as it should be after generation. The
> lsb:folder_conf holds meta info about a folder (including navigation - lsb:nav -
> items). The lsb:page_conf, among other things, describes one or more possible
> page views (don't know if I am using dc:format correctly...):
> 
> <rng:optional>
>   <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
>     <rng:value type="token">text/html</rng:value>
>   </rng:element>
> </rng:optional>
> <rng:optional>
>   <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
>     <rng:value type="token">text/plain</rng:value>
>   </rng:element>
> </rng:optional>
> <rng:optional>
>   <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
>     <rng:value type="token">application/pdf</rng:value>
>   </rng:element>
> </rng:optional>
> 
> I represent these in a form and let the user 'check' which views to generate.
> Still working on this...

I don't know how this 'configure the site generation' part would fit
in with Cocoon.  Perhaps when Cocoon blocks arrive, we could have a
'add PDF block' checkbox which adds the *.pdf rules.

> > Is it necessary to have the intermediate *_conf elements?  Why not just
> > have <lsb:folder> and <rdf:Description> directly inside it?
> 
> I want to know what the thing's group is to ease template matching :)

I don't understand.  What XPath expression is possible with:

<lsb:folder name="en-us">
   <lsb:folder_conf>
      <rdf:Description about="folder.dcxml">

But not with:

<lsb:folder name="en-us">
      <rdf:Description about="folder.dcxml">

If rdf:Description is the only child of lsb:folder, you could just do
match="lsb:folder/rdf:Description".
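
To make the comparison concrete, here is a rough sketch in Python (standard
library only) of selecting rdf:Description elements that sit directly inside
lsb:folder, with no *_conf wrapper. The lsb namespace URI is made up for the
example, since the real one isn't given in this thread, and the flattened
layout is only the hypothetical one being discussed.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "lsb": "http://example.org/lsb",  # hypothetical namespace URI
}

# Flattened layout: both descriptions are direct children of lsb:folder.
FLAT = ET.fromstring(
    '<lsb:folder name="en-us" xmlns:lsb="http://example.org/lsb" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description about="folder.dcxml"/>'
    '<rdf:Description about="preamble"/>'
    '</lsb:folder>'
)

# Direct children only: the ElementTree analogue of lsb:folder/rdf:Description.
descriptions = FLAT.findall("rdf:Description", NS)
abouts = [d.get("about") for d in descriptions]
```

Note that in the flattened form the folder's and the page's descriptions match
the same expression, so telling them apart would need a look at @about or
lsb:type.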

> > Anyway, great stuff.. at the very least, Forrest needs a version
> > attribute for site.xml so we can evolve the file to these heights.
> 
> 
> Thanks for the kind words. I hope others will come to the table soon. It is
> really powerful/flexible. I really like the direction your branch is taking. I
> have not had time to play with it yet, what with Xmas and I am leaving tomorrow
> to go down to Austin TX for a Robert Earl Keen New Years eve show :) (see him if
> you ever get a chance - awesome live!), so I probably won't have time till early
> Jan. Hopefully there will be some kind of metadata impl by then :) just joking,
> you have been doing an amazing amount of work!

:) Thanks.. reinventing the site.xml wheel is fun.

--Jeff

> best,
> -Rob
> 
> >
> >
> > --Jeff
>

RE: site.xml -> was -> RE: [RT] Entities in XML docs

Posted by Robert Koberg <ro...@koberg.com>.

Hi again,

> -----Original Message-----
> From: Jeff Turner [mailto:jefft@apache.org]
> Sent: Saturday, December 28, 2002 1:37 AM

> On Fri, Dec 27, 2002 at 07:52:03AM -0800, Robert Koberg wrote:
> ...
> > > :) Think of site.xml as a small database:
> > >
> > > PAGE_ID   LABEL        HREF           TIMESTAMP
> > > -------   -----        ----           ---------
> > > dreams    Dream list   dreams.html       ...
> > > faq       FAQs         faq.html
> > > toc       ToC          doclist.html
> > > changes   Changes      changes.html
> > > todo      Todo         todo.html
> > >
> > > PAGE_ID is the primary key, and therefore deserves greater syntactic
> > > importance than all the other attributes.  Seems most natural to make the
> > > primary key the element name:
> >
> > > I agree that PAGE_ID is used as the primary key but why is it more natural
> > > to be the element name? Why is it better than:
> >
> > <page id="dreams"/>
>
> Less typing :)  And trying to treat all URL-addressable parts of the site
> in the same way.  It shouldn't matter if a node is a directory, file or
> #anchor.  In a linkmap, they're all just "things to link to".
>
> > To me, this allows for 'grouping' of IDs at the element level.
>
> How do you mean, grouping?

[assuming?? that there will be metadata in the site.xml page and folder elements
and so you can't simply test for children, but even still, it is an extra test
in the transform]

I mean I want to know explicitly if something is a page or a folder. For example
in the lsb site's nav we have folder icons before folder labels and page icons
before page labels. If I know it is a page I can just:

<xsl:template match="page">
  <xsl:variable name="href">
<!-- travels up and down the tree to find ../'s and path -->
    <xsl:call-template name="page_path_builder"/>
  </xsl:variable>
  <a href="{$href}">
    <img src="{$relative_path}images/page_icon.gif"/>
    <xsl:value-of select="@label"/>
  </a>
</xsl:template>

In building the href at generation time, I know that since it is a page I will
use (depending on site prefs) either the page ID or page label (replacing things
like spaces, :, ', etc) and then concatenate the file extension.

If it is a folder, I will just append index.{html | jsp | php} and be done.  I
have a property in a folder_conf element that tells me the index_page - this is
copied at generation time to index.html.

- Or perhaps I want to create a pager (<< 1 2 3 4 >>) to have each 'page' in a
directory show up in the horizontal list, but I don't want child folders.

- Or I want to create a site map/index page that shows the site structure with
meaningful icons/colors

- Or I might want to offer a folder with individual page views or the option to
see all the pages (not folders) aggregated into one page view.

- Or I might want to create a folder index page from a folder's pages using
dc:titles and dc:descriptions

I don't see how to do the above without extra xsl:choose's or xsl:if's
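
As a rough illustration of the href and pager cases above, here is the same
dispatch-on-element-name idea sketched in Python instead of XSLT. The
<folder>/<page> element names, the id/label attributes, and the helper names
are all made up for the example; this is not the actual LSB code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical site.xml fragment with explicit <folder> and <page> elements.
SITE = ET.fromstring(
    '<folder id="en-us" label="Index">'
    '<page id="preamble" label="Preamble"/>'
    '<page id="article_1" label="Article I: Powers"/>'
    '<folder id="amendments" label="Amendments"/>'
    '</folder>'
)

def slug(text):
    # Replace runs of spaces, colons, apostrophes, etc. with one underscore.
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", text).strip("_")

def href_for(node, use_label=False, ext=".html"):
    # Dispatch on the element name, much as a template match would.
    if node.tag == "folder":
        return node.get("id") + "/index" + ext
    if node.tag == "page":
        name = slug(node.get("label")) if use_label else node.get("id")
        return name + ext
    raise ValueError("unknown node type: " + node.tag)

def pager(folder):
    # Only child pages appear in the pager; child folders are skipped.
    return [href_for(p) for p in folder.findall("page")]
```

With the node type in the element name, the pager is a single select on
"page"; with ID-named elements it would need a test on every child.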

> >
> > On book.xml - why is this needed anymore? Cannot the site.xml be used in its
> > place?
>
> Yes, book.xml isn't necessary anymore (in the linkmap CVS branch).  It's
> still kept around as an intermediate format (see site2book.xsl) so that
> if necessary, users can specify it directly rather than generate from
> site.xml.  There are various cases where the desired menu is not the same
> as that generated from site.xml.  In Forrest's own site, we could not
> generate these menus from site.xml:
>
> http://xml.apache.org/forrest/community/index.html
> http://xml.apache.org/forrest/community/howto/index.html
>
> Whether these pages show good menu design is another question :)

That is why you should always storyboard out the site/project before setting the
contracts in stone :)

>
> > On the metadata front, I have been adopting a mix of Dublin Core and mixing
> > in the stuff my tool requires. For example, at the bottom is a snippet of
> > what I am currently using in the site.xml [1].
>
> Nice!  RDF, Dublin Core.. I see an opportunity for more shameless
> LSB-copying ;)

I would love it! I am trying to bend toward forrest so I can eventually publish
a forrest site. But I need the metadata for a flexible storyboarding process.

>
> I'm not sure I understand it fully though..
>
> > <lsb:folder name="en-us">
> >     <lsb:folder_conf>
> >       <rdf:Description about="folder.dcxml">
> ...
> >       </rdf:Description>
> >     </lsb:folder_conf>
> >     <lsb:page_conf>
> >       <rdf:Description about="preamble">
> ...
> >       </rdf:Description>
> >     </lsb:page_conf>
>
> I gather this is describing a directory 'en-us', and a file
> en-us/preamble?  What is 'folder.dcxml'?

I started out creating a metadata file (*.dcxml) for each resource on the site
and at app startup I would crawl the metadata and aggregate those into one
site.xml. I found that to be a CVS nightmare given the fact that I allow pages
and folders to be moved around. So I went back to just having the static
site.xml and at generation time I either include the metadata inline (page
level) or write it to a file (folder, binary, ??). The lsb:folder tells me the
location of the *.dcxml (perhaps I should use *.rdf...) and the rdf:Description
tells me the file name.

> I don't really understand how a
> directory could be considered to have a title, subject etc.  Is that just
> indicating what the directory should contain?

It is a test site.xml that is using things I 'might' want to play with. But as a
solid case, like I mentioned above, you might want to have a folder offer
individual pages or one inclusive, aggregated view. In the latter case the
folder is actually a page. But you could create your schema to include anything
you want and perhaps setting hardcoded values for some items.

>
> If there is a lsb:folder, shouldn't there be a lsb:page too?

My thinking (which could easily change) was that lsb:folder's are a virtual
representation of the folder-file system as it should be after generation. The
lsb:folder_conf holds meta info about a folder (including navigation - lsb:nav -
items). The lsb:page_conf, among other things, describes one or more possible
page views (don't know if I am using dc:format correctly...):

<rng:optional>
  <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rng:value type="token">text/html</rng:value>
  </rng:element>
</rng:optional>
<rng:optional>
  <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rng:value type="token">text/plain</rng:value>
  </rng:element>
</rng:optional>
<rng:optional>
  <rng:element name="format" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rng:value type="token">application/pdf</rng:value>
  </rng:element>
</rng:optional>

I represent these in a form and let the user 'check' which views to generate.
Still working on this...

>
> Is it necessary to have the intermediate *_conf elements?  Why not just
> have <lsb:folder> and <rdf:Description> directly inside it?

I want to know what the thing's group is to ease template matching :)

>
> Anyway, great stuff.. at the very least, Forrest needs a version
> attribute for site.xml so we can evolve the file to these heights.

Thanks for the kind words. I hope others will come to the table soon. It is
really powerful/flexible. I really like the direction your branch is taking. I
have not had time to play with it yet, what with Xmas and I am leaving tomorrow
to go down to Austin TX for a Robert Earl Keen New Years eve show :) (see him if
you ever get a chance - awesome live!), so I probably won't have time till early
Jan. Hopefully there will be some kind of metadata impl by then :) just joking,
you have been doing an amazing amount of work!

best,
-Rob

>
>
> --Jeff

Re: site.xml -> was -> RE: [RT] Entities in XML docs

Posted by Jeff Turner <je...@apache.org>.

On Fri, Dec 27, 2002 at 07:52:03AM -0800, Robert Koberg wrote:
...
> > :) Think of site.xml as a small database:
> >
> > PAGE_ID   LABEL        HREF           TIMESTAMP
> > -------   -----        ----           ---------
> > dreams    Dream list   dreams.html       ...
> > faq       FAQs         faq.html
> > toc       ToC          doclist.html
> > changes   Changes      changes.html
> > todo      Todo         todo.html
> >
> > PAGE_ID is the primary key, and therefore deserves greater syntactic
> > importance than all the other attributes.  Seems most natural to make the
> > primary key the element name:
> 
> I agree that PAGE_ID is used as the primary key but why is it more natural to be
> the element name? Why is it better than:
> 
> <page id="dreams"/>

Less typing :)  And trying to treat all URL-addressable parts of the site
in the same way.  It shouldn't matter if a node is a directory, file or
#anchor.  In a linkmap, they're all just "things to link to".

> To me, this allows for 'grouping' of IDs at the element level.

How do you mean, grouping?

> Just trying to understand.
> 
> On book.xml - why is this needed anymore? Cannot the site.xml be used in its
> place?

Yes, book.xml isn't necessary anymore (in the linkmap CVS branch).  It's
still kept around as an intermediate format (see site2book.xsl) so that
if necessary, users can specify it directly rather than generate from
site.xml.  There are various cases where the desired menu is not the same
as that generated from site.xml.  In Forrest's own site, we could not
generate these menus from site.xml:

http://xml.apache.org/forrest/community/index.html
http://xml.apache.org/forrest/community/howto/index.html

Whether these pages show good menu design is another question :)

> On the metadata front, I have been adopting a mix of Dublin Core and mixing in
> the stuff my tool requires. For example, at the bottom is a snippet of what I am
> currently using in the site.xml [1].

Nice!  RDF, Dublin Core.. I see an opportunity for more shameless
LSB-copying ;)

I'm not sure I understand it fully though..

> <lsb:folder name="en-us">
>     <lsb:folder_conf>
>       <rdf:Description about="folder.dcxml">
...
>       </rdf:Description>
>     </lsb:folder_conf>
>     <lsb:page_conf>
>       <rdf:Description about="preamble">
...
>       </rdf:Description>
>     </lsb:page_conf>

I gather this is describing a directory 'en-us', and a file
en-us/preamble?  What is 'folder.dcxml'?  I don't really understand how a
directory could be considered to have a title, subject etc.  Is that just
indicating what the directory should contain?

If there is a lsb:folder, shouldn't there be a lsb:page too?

Is it necessary to have the intermediate *_conf elements?  Why not just
have <lsb:folder> and <rdf:Description> directly inside it?

Anyway, great stuff.. at the very least, Forrest needs a version
attribute for site.xml so we can evolve the file to these heights.


--Jeff

> Below that I have included a schema for the
> page level [2] (I have schemas for config, folder and content as well, if
> interested). I build a 'properties' form from the schema and if existing,
> populate it with the current metadata (very handy). Some resources indicate that
> they are just copied while the default would be to transform. I am trying
> 
> 
> <snip/>
> 
> 
> [1] snippet from new site.xml
> ...
> <lsb:folder name="css" copy="true" />
> <lsb:folder name="en-us">
>     <lsb:folder_conf>
>       <rdf:Description about="folder.dcxml">
>         <dc:title>"We the people...."</dc:title>
>         <dc:subject>US English version of US Constitution</dc:subject>
>         <dc:description>Main site folder</dc:description>
>         <dc:coverage>USA</dc:coverage>
>         <dc:creator>Robert Koberg</dc:creator>
>         <dc:publisher>liveSTORYBOARD</dc:publisher>
>         <dc:contributor>Iva Koberg</dc:contributor>
>         <dc:rights>Open Source :)</dc:rights>
>         <dc:date.created>2002-12-06</dc:date.created>
>         <dc:date.modified>2002-12-07</dc:date.modified>
>         <dc:format>Folder</dc:format>
>         <dc:identifier>en_us</dc:identifier>
>         <dc:language>en-us</dc:language>
>         <lsb:col name="left" />
>         <lsb:col name="wide" />
>         <lsb:col name="right" />
>         <lsb:css>default</lsb:css>
>         <lsb:displ_label>true</lsb:displ_label>
>         <lsb:expand>false</lsb:expand>
>         <lsb:index_page>preamble</lsb:index_page>
>         <lsb:label>Index</lsb:label>
>         <lsb:name>en-us</lsb:name>
>         <lsb:nav>preamble</lsb:nav>
>         <lsb:nav>Article_I</lsb:nav>
>         <lsb:nav>Article_II</lsb:nav>
>         <lsb:nav>etc...</lsb:nav>
>         <lsb:pager>true</lsb:pager>
>         <lsb:snailtrail>true</lsb:snailtrail>
>         <lsb:type>folder</lsb:type>
>         <lsb:xsl>xsl:default</lsb:xsl>
>       </rdf:Description>
>     </lsb:folder_conf>
>     <lsb:page_conf>
>       <rdf:Description about="preamble">
>         <dc:title>US Constitution Preamble</dc:title>
>         <dc:subject>Inalienable rights</dc:subject>
>         <dc:description>The preamble to the US Constitution</dc:description>
>         <dc:coverage>USA</dc:coverage>
>         <dc:creator>Robert Koberg</dc:creator>
>         <dc:publisher>liveSTORYBOARD</dc:publisher>
>         <dc:contributor>Iva Koberg</dc:contributor>
>         <dc:rights>Open Source :)</dc:rights>
>         <dc:date.created>2002-12-06</dc:date.created>
>         <dc:date.modified>2002-12-27</dc:date.modified>
>         <dc:format>text/html</dc:format>
>         <dc:format>text/plain</dc:format>
>         <dc:format>application/pdf</dc:format>
>         <dc:identifier>preamble</dc:identifier>
>         <dc:language>en-us</dc:language>
>         <lsb:css>default</lsb:css>
>         <lsb:displ_label>true</lsb:displ_label>
>         <lsb:generate>true</lsb:generate>
>         <lsb:label>Preamble</lsb:label>
>         <lsb:metadata>false</lsb:metadata>
>         <lsb:print_friendly>true</lsb:print_friendly>
>         <lsb:toc>false</lsb:toc>
>         <lsb:type>page</lsb:type>
>         <lsb:xsl>xsl:default</lsb:xsl>
>         <lsb:col name="left" />
>         <lsb:col name="wide">
>           <dc:source>preamble_content</dc:source>
>         </lsb:col>
>         <lsb:col name="right" />
>       </rdf:Description>
>     </lsb:page_conf>
> ....
<snip rng schema>

RE: [RT] Entities in XML docs

Posted by Robert Koberg <ro...@koberg.com>.

Hi,

> -----Original Message-----
> From: Jeff Turner [mailto:jefft@apache.org]
> Sent: Friday, December 27, 2002 10:54 PM
<snip/>
>
> > An attribute type of NMTOKEN also exists. I'm not sure whether you can
> > declare 'any' attribute in RNG while still specifying the type being
> > NMTOKEN or something similar, but 'someone will tell me' ;-)
>
> There's a NMTOKEN datatype in XSD or something that we could use.

Something like?
...
<rng:oneOrMore>
  <rng:element
    name="language"
    a:default="en-us"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rng:choice>
      <rng:value a:label="US English" type="NMTOKEN">en-us</rng:value>
      <rng:value a:label="Bulgarian" type="NMTOKEN">bg</rng:value>
    </rng:choice>
  </rng:element>
</rng:oneOrMore>
...
>
> > >Finally, we could never have a RNG or DTD for site.xml, because it is
> > >intended to be arbitrarily extended, vertically by whatever page
> > >classification scheme the user wants, and horizontally with whatever
> > >page metadata the user wants.  There could be attributes for
> > >timestamps, access levels, difficulty levels, related pages, bogosity
> > >readings, anything.  At best, we could have a Schematron enforcing the
> > >presence of minimal metadata, ie @href.  Even @label is optional, eg:
> > >
> > >  <primer label="Forrest Primer" href="primer.html">
> > >    <cvs href="#cvs"/>
> > >  </primer>
> >
> > In my mind and practice, I only use Schematron for things which can't be
> > expressed in other grammar languages, i.e. context- or value-dependent
> > values or models, like:
> >
> > the contentmodel for element c depends on the value of the attribute b
> > attached to some element a
> >
> > and even then, I wonder how that would look in Schematron :-s
>
> Oh well, any old schema language.. in the end, we still have an
> arbitrarily structured file where the only thing validatable is metadata.

Well, if content pieces have links that use IDs (or element names as IDs) from
the site.xml you can use schematron or just XSLT and document() to validate
links.
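
For what it's worth, the check itself is small. Below is a minimal sketch in
Python rather than XSLT, assuming element names in site.xml act as IDs and
that content links carry a "site:" prefix. The prefix, the <site> root and the
<link> element are assumptions for the example, not an agreed Forrest
convention.

```python
import xml.etree.ElementTree as ET

# site.xml entries as discussed in this thread, under an assumed <site> root.
SITE = ET.fromstring(
    '<site>'
    '<dreams label="Dream list" href="dreams.html"/>'
    '<faq label="FAQs" href="faq.html"/>'
    '<primer label="Forrest Primer" href="primer.html">'
    '<cvs href="#cvs"/>'
    '</primer>'
    '</site>'
)

# A content document whose internal links use a hypothetical "site:" prefix.
DOC = ET.fromstring(
    '<document><body>'
    '<link href="site:faq">FAQ</link>'
    '<link href="site:cvs">CVS</link>'
    '<link href="site:roadmap">Roadmap</link>'
    '<link href="http://xml.apache.org/">External</link>'
    '</body></document>'
)

def known_ids(site):
    # Every element name below the root counts as an ID.
    return {el.tag for el in site.iter() if el is not site}

def broken_links(doc, site, prefix="site:"):
    # Report prefixed links whose target is not an element name in site.xml.
    ids = known_ids(site)
    hrefs = [a.get("href", "") for a in doc.iter("link")]
    return [h for h in hrefs if h.startswith(prefix) and h[len(prefix):] not in ids]
```

External links are ignored; only the prefixed ones are checked against the
site.xml element names.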

I am not clear on how to make an element name a validatable ID in RNG.


>
> Thinking about it, RDF might be a good way to express metadata about
> files in a website.  Didn't Tim Bray or someone recently come up with a
> Notation that Doesn't Suck?  I can't seem to find it.  Anyone here know
> anything about RDF? :)  Is it usable by mortals to store metadata about a
> site?

This is not exactly what you are asking for but perhaps checkout:
http://dublincore.org/documents/2002/07/31/dcmes-xml/

<snip/>

best,
-Rob

Re: [RT] Entities in XML docs

Posted by Jeff Turner <je...@apache.org>.

On Fri, Dec 27, 2002 at 06:02:16PM +0100, Steven Noels wrote:
...
> >Looking at the definition for NMTOKEN, this also provides some convenient
> >restrictions to the primary key value, which otherwise we'd have to enforce
> >with regexps:
> >
> >Nmtoken  ::= (NameChar)+
> >NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | 
> >Extender
> 
> Hm... we somehow collectively know we'll use RelaxNG instead of DTDs for 
> the validation of 'technical' resources (i.e. not documents), and given 
> the prospect of pluggable datatypes in RNG, we can force any type of 
> validation we want for both element content and attributes values.

True

> An attribute type of NMTOKEN also exists. I'm not sure whether you can 
> declare 'any' attribute in RNG while still specifying the type being 
> NMTOKEN or something similar, but 'someone will tell me' ;-)

There's a NMTOKEN datatype in XSD or something that we could use.

> >Finally, we could never have a RNG or DTD for site.xml, because it is
> >intended to be arbitrarily extended, vertically by whatever page
> >classification scheme the user wants, and horizontally with whatever
> >page metadata the user wants.  There could be attributes for
> >timestamps, access levels, difficulty levels, related pages, bogosity
> >readings, anything.  At best, we could have a Schematron enforcing the
> >presence of minimal metadata, ie @href.  Even @label is optional, eg:
> >
> >  <primer label="Forrest Primer" href="primer.html">
> >    <cvs href="#cvs"/>
> >  </primer>
> 
> In my mind and practice, I only use Schematron for things which can't be 
> expressed in other grammar languages, i.e. context- or value-dependent 
> values or models, like:
> 
> the contentmodel for element c depends on the value of the attribute b 
> attached to some element a
> 
> and even then, I wonder how that would look in Schematron :-s

Oh well, any old schema language.. in the end, we still have an
arbitrarily structured file where the only thing validatable is metadata.

Thinking about it, RDF might be a good way to express metadata about
files in a website.  Didn't Tim Bray or someone recently come up with a
Notation that Doesn't Suck?  I can't seem to find it.  Anyone here know
anything about RDF? :)  Is it usable by mortals to store metadata about a
site?

> >><shortcut name="xml4j">Xerces-Java</shortcut>
> >>
> >>These are shortcuts, aren't they? - http://radio.userland.com/shortcuts
> >
> >
> >Yes, shortcuts, macros etc.  If we go with ${foo} syntax, 'variable' sounds
> >most natural.
> 
> ${} is everywhere... I don't mind, but maybe {{}} would be kewl too...? 

Anything really.. it should be specifiable in the sitemap.  If ${foo}
can't be resolved it should be left unmodified.

> We should define escape characters too, if only for our own documentation.

$${variable}?
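
Roughly, the behaviour I have in mind could be sketched like this (Python for
brevity; the function name and the set of characters allowed in a variable
name are assumptions, and a real implementation would presumably be a Cocoon
transformer):

```python
import re

# An optional escaping '$', followed by ${name}.
_VAR = re.compile(r"(\$?)\$\{([A-Za-z0-9_.-]+)\}")

def expand(text, variables):
    def repl(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            # $${name} is the escape form: emit a literal ${name}.
            return "${" + name + "}"
        # Unresolved variables are left exactly as written.
        return variables.get(name, match.group(0))
    return _VAR.sub(repl, text)
```

So expand("${xml4j}", {"xml4j": "Xerces-Java"}) gives "Xerces-Java", an
unknown ${foo} passes through untouched, and $${xml4j} comes out as the
literal text ${xml4j}.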

--Jeff

> </Steven>
> -- 
> Steven Noels                            http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> Read my weblog at              http://radio.weblogs.com/0103539/
> stevenn at outerthought.org                stevenn at apache.org
>

Re: [RT] Entities in XML docs

Posted by Steven Noels <st...@outerthought.org>.

Jeff Turner wrote:

> On Fri, Dec 27, 2002 at 03:02:28PM +0100, Steven Noels wrote:

>>Seems OK to me - fairly similar to the LinkMapTransformer. I'm still 
>>stuck with the obsession we shouldn't use element names as lookup keys, 
>>however, so:
> 
> 
> :) Think of site.xml as a small database:
> 
> PAGE_ID   LABEL        HREF           TIMESTAMP
> -------   -----        ----           ---------
> dreams    Dream list   dreams.html       ...
> faq       FAQs         faq.html
> toc       ToC          doclist.html
> changes   Changes      changes.html
> todo      Todo         todo.html
> 
> PAGE_ID is the primary key, and therefore deserves greater syntactic
> importance than all the other attributes.  Seems most natural to make the
> primary key the element name:

OK on the key thing.

>  <dreams label="Dream list" href="dreams.html"/>
>  <faq label="FAQs" href="faq.html"/>
>  <toc label="ToC" href="doclist.html"/>
>  <changes label="Changes" href="changes.html"/>
>  <todo label="Todo" href="todo.html"/>
> 
> Looking at the definition for NMTOKEN, this also provides some convenient
> restrictions to the primary key value, which otherwise we'd have to enforce
> with regexps:
> 
> Nmtoken  ::= (NameChar)+
> NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender

Hm... we somehow collectively know we'll use RelaxNG instead of DTDs for 
the validation of 'technical' resources (i.e. not documents), and given 
the prospect of pluggable datatypes in RNG, we can force any type of 
validation we want for both element content and attributes values.

An attribute type of NMTOKEN also exists. I'm not sure whether you can 
declare 'any' attribute in RNG while still specifying the type being 
NMTOKEN or something similar, but 'someone will tell me' ;-)
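
Whatever the schema language ends up being, the NMTOKEN restriction quoted
above is easy to approximate. A small Python sketch, ASCII only: it
deliberately ignores the non-ASCII Letter, CombiningChar and Extender classes,
so it is an approximation and not a conformant check, and the function name is
made up.

```python
import re

# ASCII-only approximation of: Nmtoken ::= (NameChar)+
_NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")

def is_nmtoken_ascii(value):
    # True when the whole string consists of ASCII NameChars.
    return _NMTOKEN_ASCII.fullmatch(value) is not None
```

One caveat: an XML element name has to be a Name, which is stricter than
Nmtoken because it cannot start with a digit, '.' or '-'. A key like "2002"
is a valid NMTOKEN but not a valid element name.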

> Finally, we could never have a RNG or DTD for site.xml, because it is intended
> to be arbitrarily extended, vertically by whatever page classification scheme
> the user wants, and horizontally with whatever page metadata the user wants.
> There could be attributes for timestamps, access levels, difficulty levels,
> related pages, bogosity readings, anything.  At best, we could have a
> Schematron enforcing the presence of minimal metadata, ie @href.  Even @label
> is optional, eg:
> 
>   <primer label="Forrest Primer" href="primer.html">
>     <cvs href="#cvs"/>
>   </primer>

In my mind and practice, I only use Schematron for things which can't be 
expressed in other grammar languages, i.e. context- or value-dependent 
values or models, like:

the contentmodel for element c depends on the value of the attribute b 
attached to some element a

and even then, I wonder how that would look in Schematron :-s

>><shortcut name="xml4j">Xerces-Java</shortcut>
>>
>>These are shortcuts, aren't they? - http://radio.userland.com/shortcuts
> 
> 
> Yes, shortcuts, macros etc.  If we go with ${foo} syntax, 'variable' sounds
> most natural.

${} is everywhere... I don't mind, but maybe {{}} would be kewl too...? 
We should define escape characters too, if only for our own documentation.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

site.xml -> was -> RE: [RT] Entities in XML docs

Posted by Robert Koberg <ro...@koberg.com>.

Hi Jeff - great work!

Catching up on the recent threads...(hopefully this post is seen as a
contribution?) I have a few questions/comments inline.

> -----Original Message-----
> From: Jeff Turner [mailto:jefft@apache.org]
> Sent: Friday, December 27, 2002 6:48 AM

> :) Think of site.xml as a small database:
>
> PAGE_ID   LABEL        HREF           TIMESTAMP
> -------   -----        ----           ---------
> dreams    Dream list   dreams.html       ...
> faq       FAQs         faq.html
> toc       ToC          doclist.html
> changes   Changes      changes.html
> todo      Todo         todo.html
>
> PAGE_ID is the primary key, and therefore deserves greater syntactic
> importance than all the other attributes.  Seems most natural to make the
> primary key the element name:

I agree that PAGE_ID is used as the primary key but why is it more natural to be
the element name? Why is it better than:

<page id="dreams"/>

To me, this allows for 'grouping' of IDs at the element level. Just trying to
understand.

On book.xml - why is this needed anymore? Cannot the site.xml be used in its
place?

On the metadata front, I have been adopting a mix of Dublin Core and mixing in
the stuff my tool requires. For example, at the bottom is a snippet of what I am
currently using in the site.xml [1]. Below that I have included a schema for the
page level [2] (I have schemas for config, folder and content as well, if
interested). I build a 'properties' form from the schema and if existing,
populate it with the current metadata (very handy). Some resources indicate that
they are just copied while the default would be to transform. I am trying


<snip/>


[1] snippet from new site.xml
...
<lsb:folder name="css" copy="true" />
<lsb:folder name="en-us">
    <lsb:folder_conf>
      <rdf:Description about="folder.dcxml">
        <dc:title>"We the people...."</dc:title>
        <dc:subject>US English version of US Constitution</dc:subject>
        <dc:description>Main site folder</dc:description>
        <dc:coverage>USA</dc:coverage>
        <dc:creator>Robert Koberg</dc:creator>
        <dc:publisher>liveSTORYBOARD</dc:publisher>
        <dc:contributor>Iva Koberg</dc:contributor>
        <dc:rights>Open Source :)</dc:rights>
        <dc:date.created>2002-12-06</dc:date.created>
        <dc:date.modified>2002-12-07</dc:date.modified>
        <dc:format>Folder</dc:format>
        <dc:identifier>en_us</dc:identifier>
        <dc:language>en-us</dc:language>
        <lsb:col name="left" />
        <lsb:col name="wide" />
        <lsb:col name="right" />
        <lsb:css>default</lsb:css>
        <lsb:displ_label>true</lsb:displ_label>
        <lsb:expand>false</lsb:expand>
        <lsb:index_page>preamble</lsb:index_page>
        <lsb:label>Index</lsb:label>
        <lsb:name>en-us</lsb:name>
        <lsb:nav>preamble</lsb:nav>
        <lsb:nav>Article_I</lsb:nav>
        <lsb:nav>Article_II</lsb:nav>
        <lsb:nav>etc...</lsb:nav>
        <lsb:pager>true</lsb:pager>
        <lsb:snailtrail>true</lsb:snailtrail>
        <lsb:type>folder</lsb:type>
        <lsb:xsl>xsl:default</lsb:xsl>
      </rdf:Description>
    </lsb:folder_conf>
    <lsb:page_conf>
      <rdf:Description about="preamble">
        <dc:title>US Constitution Preamble</dc:title>
        <dc:subject>Inalienable rights</dc:subject>
        <dc:description>The preamble to the US Constitution</dc:description>
        <dc:coverage>USA</dc:coverage>
        <dc:creator>Robert Koberg</dc:creator>
        <dc:publisher>liveSTORYBOARD</dc:publisher>
        <dc:contributor>Iva Koberg</dc:contributor>
        <dc:rights>Open Source :)</dc:rights>
        <dc:date.created>2002-12-06</dc:date.created>
        <dc:date.modified>2002-12-27</dc:date.modified>
        <dc:format>text/html</dc:format>
        <dc:format>text/plain</dc:format>
        <dc:format>application/pdf</dc:format>
        <dc:identifier>preamble</dc:identifier>
        <dc:language>en-us</dc:language>
        <lsb:css>default</lsb:css>
        <lsb:displ_label>true</lsb:displ_label>
        <lsb:generate>true</lsb:generate>
        <lsb:label>Preamble</lsb:label>
        <lsb:metadata>false</lsb:metadata>
        <lsb:print_friendly>true</lsb:print_friendly>
        <lsb:toc>false</lsb:toc>
        <lsb:type>page</lsb:type>
        <lsb:xsl>xsl:default</lsb:xsl>
        <lsb:col name="left" />
        <lsb:col name="wide">
          <dc:source>preamble_content</dc:source>
        </lsb:col>
        <lsb:col name="right" />
      </rdf:Description>
    </lsb:page_conf>
....

[2] Page level RNG schema:

<rng:grammar xmlns:a="http://livestoryboard.com/schemas/annotations/2.0/"
xmlns:rng="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <rng:start>
    <rng:element name="RDF"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rng:element name="Description"
xmlns:rdf="http://dublincore.org/resources/faq/">
        <a:h>Metadata</a:h>
        <rng:div a:id="dc">
          <a:h>Dublin Core</a:h>
          <rng:element name="title" xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token">
              <rng:param name="maxLength">100</rng:param>
            </rng:data>
          </rng:element>
          <rng:element name="subject"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token">
              <rng:param name="maxLength">256</rng:param>
            </rng:data>
          </rng:element>
          <rng:element name="description"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token">
              <rng:param name="maxLength">256</rng:param>
            </rng:data>
          </rng:element>
          <rng:element name="coverage"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token">
              <rng:param name="maxLength">100</rng:param>
            </rng:data>
          </rng:element>
          <rng:oneOrMore>
            <rng:element name="creator"
xmlns:dc="http://purl.org/dc/elements/1.1/">
              <rng:data type="token">
                <rng:param name="maxLength">100</rng:param>
              </rng:data>
            </rng:element>
          </rng:oneOrMore>
          <rng:element name="publisher"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token">
              <rng:param name="maxLength">100</rng:param>
            </rng:data>
          </rng:element>
          <rng:oneOrMore>
            <rng:element name="contributor"
xmlns:dc="http://purl.org/dc/elements/1.1/">
              <rng:data type="token">
                <rng:param name="maxLength">100</rng:param>
              </rng:data>
            </rng:element>
          </rng:oneOrMore>
          <rng:element name="rights"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token">
              <rng:param name="maxLength">100</rng:param>
            </rng:data>
          </rng:element>
          <rng:element name="date.created"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token"></rng:data>
          </rng:element>
          <rng:element name="date.modified"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="token"></rng:data>
          </rng:element>
          <rng:optional>
            <rng:element name="format"
xmlns:dc="http://purl.org/dc/elements/1.1/">
              <rng:value type="token">text/html</rng:value>
            </rng:element>
          </rng:optional>
          <rng:optional>
            <rng:element name="format"
xmlns:dc="http://purl.org/dc/elements/1.1/">
              <rng:value type="token">text/plain</rng:value>
            </rng:element>
          </rng:optional>
          <rng:optional>
            <rng:element name="format"
xmlns:dc="http://purl.org/dc/elements/1.1/">
              <rng:value type="token">application/pdf</rng:value>
            </rng:element>
          </rng:optional>
          <rng:element name="identifier"
xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rng:data type="ID"></rng:data>
          </rng:element>
          <rng:oneOrMore>
            <rng:element name="language"
xmlns:dc="http://purl.org/dc/elements/1.1/">
              <rng:data type="token"></rng:data>
            </rng:element>
          </rng:oneOrMore>
        </rng:div>
        <rng:div a:id="lsb">
          <a:h>liveSTORYBOARD</a:h>
          <rng:optional>
            <rng:element name="col"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
              <rng:attribute name="name">
                <rng:value type="token">left</rng:value>
              </rng:attribute>
              <rng:oneOrMore>
                <rng:element name="source"
xmlns:dc="http://purl.org/dc/elements/1.1/">
                  <rng:data type="IDREF"></rng:data>
                </rng:element>
              </rng:oneOrMore>
            </rng:element>
          </rng:optional>
          <rng:optional>
            <rng:element name="col"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
              <rng:attribute name="name">
                <rng:value type="token">wide</rng:value>
              </rng:attribute>
              <rng:oneOrMore>
                <rng:element name="source"
xmlns:dc="http://purl.org/dc/elements/1.1/">
                  <rng:data type="IDREF"></rng:data>
                </rng:element>
              </rng:oneOrMore>
            </rng:element>
          </rng:optional>
          <rng:optional>
            <rng:element name="col"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
              <rng:attribute name="name">
                <rng:value type="token">right</rng:value>
              </rng:attribute>
              <rng:oneOrMore>
                <rng:element name="source"
xmlns:dc="http://purl.org/dc/elements/1.1/">
                  <rng:data type="IDREF"></rng:data>
                </rng:element>
              </rng:oneOrMore>
            </rng:element>
          </rng:optional>
          <rng:element name="css"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:choice>
              <rng:value type="NMTOKEN">default</rng:value>
              <rng:value type="NMTOKEN">optional</rng:value>
            </rng:choice>
          </rng:element>
          <rng:element name="displ_label"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:data type="boolean"></rng:data>
          </rng:element>
          <rng:element name="generate"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:data type="boolean"></rng:data>
          </rng:element>
          <rng:element name="label"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:data type="token"></rng:data>
          </rng:element>
          <rng:element name="metadata"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:data type="boolean"></rng:data>
          </rng:element>
          <rng:element name="print_friendly"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:data type="boolean"></rng:data>
          </rng:element>
          <rng:element name="toc"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:data type="boolean"></rng:data>
          </rng:element>
          <rng:element name="type"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:value type="token">page</rng:value>
          </rng:element>
          <rng:element name="xsl"
xmlns:lsb="http://livestoryboard.com/schemas/config/2.0/">
            <rng:choice>
              <rng:value type="NMTOKEN">xsl:default</rng:value>
              <rng:value type="NMTOKEN">xsl:homepage</rng:value>
              <rng:value type="NMTOKEN">xsl:index</rng:value>
              <rng:value type="NMTOKEN">xsl:sitemap</rng:value>
              <rng:value type="NMTOKEN">xsl:news</rng:value>
              <rng:value type="NMTOKEN">xsl:faqs</rng:value>
              <rng:value type="NMTOKEN">xsl:jobs</rng:value>
              <rng:value type="NMTOKEN">xsl:blog</rng:value>
            </rng:choice>
          </rng:element>
        </rng:div>
      </rng:element>
    </rng:element>
  </rng:start>
</rng:grammar>


>
> --Jeff

Re: [RT] Entities in XML docs

Posted by Jeff Turner <je...@apache.org>.

On Fri, Dec 27, 2002 at 03:02:28PM +0100, Steven Noels wrote:
> Jeff Turner wrote:
> 
> >3.2) We implement a SearchReplaceTransformer, which replaces ${variables} 
> >with
> >values.  Eg, entities.xml:
> >
> ><entities>
> >  <xml4j>Xerces-Java</xml4j>
> >  <xml4j1>Xerces-Java 1</xml4j1>
> >  <xml4j2>Xerces-Java 2</xml4j2>
> >
> >  <xslt4j-current>
> >    ${xslt4j} version 2.4.D1
> >  </xslt4j-current>
> >
> >  <download>
> >    <p>
> >      The ${xslt4j-current} download includes ...
> >    </p>
> >  </download>
> ></entities>
> >
> >This seems a lot more intuitive than XInclude, and doesn't require 
> >modifying
> >DTDs.  We could go all the way and use one of the expression languages in
> >Jakarta Commons, like jexl[1].
> >
> >
> >Are there any more options I haven't thought of?
> >
> >
> >My current preference is to go with 3.2, and implement it with 
> >InputModules, the
> >same way LinkRewriterTransformer works.  Using XInclude would involve less
> >coding, but the DTD problems would be too horrible..
> >
> >Thoughts?
> 
> Seems OK to me - fairly similar to the LinkMapTransformer. I'm still 
> stuck with the obsession we shouldn't use element names as lookup keys, 
> however, so:

:) Think of site.xml as a small database:

PAGE_ID   LABEL        HREF           TIMESTAMP
-------   -----        ----           ---------
dreams    Dream list   dreams.html       ...
faq       FAQs         faq.html
toc       ToC          doclist.html
changes   Changes      changes.html
todo      Todo         todo.html

PAGE_ID is the primary key, and therefore deserves greater syntactic
importance than all the other attributes.  Seems most natural to make the
primary key the element name:

 <dreams label="Dream list" href="dreams.html"/>
 <faq label="FAQs" href="faq.html"/>
 <toc label="ToC" href="doclist.html"/>
 <changes label="Changes" href="changes.html"/>
 <todo label="Todo" href="todo.html"/>

Looking at the definition for NMTOKEN, this also provides some convenient
restrictions to the primary key value, which otherwise we'd have to enforce
with regexps:

Nmtoken  ::= (NameChar)+
NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender


Finally, we could never have a RNG or DTD for site.xml, because it is intended
to be arbitrarily extended, vertically by whatever page classification scheme
the user wants, and horizontally with whatever page metadata the user wants.
There could be attributes for timestamps, access levels, difficulty levels,
related pages, bogosity readings, anything.  At best, we could have a
Schematron enforcing the presence of minimal metadata, ie @href.  Even @label
is optional, eg:

  <primer label="Forrest Primer" href="primer.html">
    <cvs href="#cvs"/>
  </primer>
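
The "small database" view above can be sketched as follows. This is only an illustration in Python, not Forrest code: the <site> root element and the site_rows() helper are assumptions made for the example, and the rows are taken from the snippets in this message.

```python
import xml.etree.ElementTree as ET

# Hypothetical site.xml, assembled from the examples in this message.
# The <site> root element is an assumption.
SITE_XML = """
<site>
  <dreams label="Dream list" href="dreams.html"/>
  <faq label="FAQs" href="faq.html"/>
  <primer label="Forrest Primer" href="primer.html">
    <cvs href="#cvs"/>
  </primer>
</site>
"""

def site_rows(xml_text):
    """Return {page_id: attributes} for every element below the root."""
    root = ET.fromstring(xml_text)
    rows = {}
    for elem in root.iter():
        if elem is root:
            continue
        # The element name plays the role of the primary key.
        rows[elem.tag] = dict(elem.attrib)
    return rows

rows = site_rows(SITE_XML)
print(rows["faq"]["href"])        # faq.html
print(rows["cvs"].get("label"))   # None, because @label is optional
```

Because the key is the element name, a lookup needs no schema knowledge beyond "every element may carry @href", which matches the point that site.xml is meant to be extended freely.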


> <shortcut name="xml4j">Xerces-Java</shortcut>
> 
> These are shortcuts, aren't they? - http://radio.userland.com/shortcuts

Yes, shortcuts, macros etc.  If we go with ${foo} syntax, 'variable' sounds
most natural.


--Jeff

Re: [RT] Entities in XML docs

Posted by Steven Noels <st...@outerthought.org>.

Jeff Turner wrote:

> 3.2) We implement a SearchReplaceTransformer, which replaces ${variables} with
> values.  Eg, entities.xml:
> 
> <entities>
>   <xml4j>Xerces-Java</xml4j>
>   <xml4j1>Xerces-Java 1</xml4j1>
>   <xml4j2>Xerces-Java 2</xml4j2>
> 
>   <xslt4j-current>
>     ${xslt4j} version 2.4.D1
>   </xslt4j-current>
> 
>   <download>
>     <p>
>       The ${xslt4j-current} download includes ...
>     </p>
>   </download>
> </entities>
> 
> This seems a lot more intuitive than XInclude, and doesn't require modifying
> DTDs.  We could go all the way and use one of the expression languages in
> Jakarta Commons, like jexl[1].
> 
> 
> Are there any more options I haven't thought of?
> 
> 
> My current preference is to go with 3.2, and implement it with InputModules, the
> same way LinkRewriterTransformer works.  Using XInclude would involve less
> coding, but the DTD problems would be too horrible..
> 
> Thoughts?

Seems OK to me - fairly similar to the LinkMapTransformer. I'm still 
stuck with the obsession we shouldn't use element names as lookup keys, 
however, so:

<shortcut name="xml4j">Xerces-Java</shortcut>

These are shortcuts, aren't they? - http://radio.userland.com/shortcuts

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

Re: [RT] Entities in XML docs

Posted by Steven Noels <st...@outerthought.org>.

Stefano Mazzocchi wrote:

> But there is *one* thing where entities are very handy: char expansion.
> 
> Stuff like
> 
>  &raquo;
>  &copy;
>  &nbsp;
> 
> are something it will be hard to get along without.

My typical advice: if you can't find it in the list of (5) predefined 
entities, and you can't find it on your keyboard or your XML editor 
messes up, use *character references* instead. They are ugly but portable.

If we encourage people to use non-XML constructs for things for which we 
have perfectly valid counterparts in the spec, we alienate them from 
possible portability: reusing a document across XML processing 
environments which uses &#xA9; instead of ${ch:copy} is a nice example. 
Admittedly &#xA9; looks ugly, but it will work in each decent XML 
processing environment, whereas our shortcuts won't.
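
To make the character-reference advice concrete, here is a small sketch in Python that rewrites HTML-style named entities into hexadecimal character references. The to_char_refs() helper is hypothetical; it relies only on the standard html.entities table and leaves the five predefined XML entities, and any name it does not know, untouched.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser understands without a DTD.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def to_char_refs(text):
    """Replace named entities such as &copy; with references such as &#xA9;."""
    def replace(match):
        name = match.group(1)
        if name in PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined and unknown names alone
        return "&#x{:X};".format(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

print(to_char_refs("&copy; 2002 &raquo; A &amp; B"))
# &#xA9; 2002 &#xBB; A &amp; B
```

The output parses in any XML processor without a DTD, which is the portability argument made above.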

> And this is the reason why I think that having something like
> 
>  ${ch:raquo}
>  ${ch:copy}
>  ${ch:nbsp}
> 
> will make it possible to have the same functionality without the need for 
> a DTD.

Yes for text fragments, no for glyphs outside our keyboard range.

I like your position on what XML standards one should use, BTW. 
Unfortunately, the amount of RNG-implementing tools is still quite small.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

Re: [RT] Entities in XML docs

Posted by Stefano Mazzocchi <st...@apache.org>.

Steven Noels wrote:
> Stefano Mazzocchi wrote:
> 
> (comments & snips)
> 
>>> This method has the limitation that values cannot be included halfway 
>>> inside an
>>> attribute.  Eg, we couldn't have
>>>
>>> <s1 title="The <xi:include href="#xml4j"/> project">
>>>   ...
>>> </s1>
>>
>>
>>
>> This is actually a good limitation because it brings out a design pattern 
>> for schema creation: don't use attributes for anything that might 
>> require token expansion.
> 
> 
> Yep. Layman's terms: if you need structure inside, don't use attributes.

Well, no, a little more than that.

By 'token expansion' I mean that you might want to have values inside 
your attribute which are not "immediate" but require infoset modification.

In short, attributes should always be infoset-immutable. If they aren't, 
they should be elements.

Examples:

  here it makes sense to use 'year' as an attribute

  <events year="2002">
   <event by="stefano" when="23 December">I did this and that</event>
   ...
  </events>

  here it doesn't

  <legal copyright-clause="Copyright 2002. All Rights Reserved.">

This doesn't remove all the element vs. attribute concerns, but gives 
another reason to choose one instead of the other.

>>  1) provide a namespace-like prefix for variables
>>
>>  example
>>
>>    <p>Copyright ${char:copy} ${project:year} ${project:owner}. All 
>> Rights Reserved.</p>
>>
>>  then we can associate different fragment collections, some of which 
>> can be inherited across projects.
>>
>>  NOTE: the ${char:copy} will be *MUCH* handy when we get rid of DTDs
> 
> 
> Please expand? I don't parse that sentence...

I hate entities. I think XML shouldn't have them. I think schemas should 
*NOT* mess with the infoset. I'm totally aligned with James Clark on 
this. Big time.

In the future, I see the XML world buying more and more into this. XML 
is a syntax to describe a structure. It should not validate; it should 
only ensure syntactic correctness. Validation belongs to another layer. 
Infoset manipulation/expansion belongs to yet another layer.

In short, this is my view of the future of XML

  infoset description = XML + namespaces - DTD
  infoset validation = Relax NG (or whatever name they end up calling it)
  infoset expansion = XInclude + XPointer
  infoset transformation = XSLT + XPath - document()

where '-' means "thou shall not use!". The above model is the only one 
that enforces SoC.

Since I'll try to push all XML projects I work on to follow the above 
model in the future more and more closely, I'm already thinking of 
better ways to avoid the use of DTD *and* document(), which I consider 
harmful.

Cocoon already makes it hard for you to use document(), so DTDs are the 
next step.

But there is *one* thing where entities are very handy: char expansion.

Stuff like

  &raquo;
  &copy;
  &nbsp;

are something it will be hard to get along without.

And this is the reason why I think that having something like

  ${ch:raquo}
  ${ch:copy}
  ${ch:nbsp}

will make it possible to have the same functionality without the need for 
a DTD.

>>  2) make the token expander copy-over those variables that are not found.
>>
>>  This allows us to avoid the need to escape stuff since normally 
>> variable names don't include a namespace-like prefix and if they do, 
>> they can be escaped with normal CDATA sections.
>>
>>> Are there any more options I haven't thought of?
>>
>>
>>
>> Use Ant filtering. That's how this works on Cocoon right now, but it 
>> requires ant to preprocess all xdocs and this is not an optimal 
>> solution but a hacky one.
> 
> 
> We are trying to shift away from Ant dependencies, since these won't 
> work in a webapp context. We have been using Ant-filtered copying but it 
> has bitten us already.

I totally agree. I was just listing it for sake of completeness.

> The 'namespace' thing might as well be expanded to support i18n, I 
> assume? Since this touches content and we don't want to put anything in 
> the way (nor do we support anything special) for i18n, we must be sure 
> this won't clash with multi-lingual sites either.
> 
> If anyone wants to play with a multilingual collection, check out 
> http://cvs.cocoondev.org/cgi-bin/viewcvs.cgi/xreporter/src/documentation/content/xdocs/?cvsroot=xreporter 
> 
> 
> This Transformer should be at the end of the pipeline, since a skin 
> could also contain shortcuts.

Yes, totally.

>>> My current preference is to go with 3.2, and implement it with 
>>> InputModules, the
>>> same way LinkRewriterTransformer works.  Using XInclude would involve 
>>> less
>>> coding, but the DTD problems would be too horrible..
>>
>>
>>
>> Like I showed above, I do see in the future the need for forrest to 
>> support xinclude of document fragments, but this is a separate concern 
>> from the inclusion of string tokens.
> 
> 
> I wouldn't mind them containing document fragments, if we are aware of 
> possible issues 
> (http://marc.theaimsgroup.com/?l=forrest-dev&m=104100883222616&w=2). 
> Consider them being entities on steroids.

I would.

I think that ${} expansion should be used for stuff that is to be 
considered a characters() SAX event or a CDATA section. In short: 
unstructured text.

Even if it is structured text (say an XML snippet), it will be 
considered an escaped text string and angle brackets will be passed along 
escaped.

On the other hand, if you want to insert document fragments you'll have 
to use xinclude. That will guarantee separation between structured and 
unstructured infoset expansion.
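
A rough sketch of what "text-only" expansion means in practice, written in Python for brevity. It works on serialized text rather than on SAX events, and the VALUES table and the expand_as_text() helper are made up for the example; the only point is that a value containing markup comes out escaped.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical variable values; the second one contains markup on purpose.
VALUES = {
    "project:name": "Forrest",
    "project:snippet": "<b>bold</b> & more",
}

def expand_as_text(text, values=VALUES):
    """Expand ${prefix:name} tokens, always treating values as character data."""
    def replace(match):
        key = match.group(1)
        if key not in values:
            return match.group(0)  # unknown tokens pass through unchanged
        # Escaping keeps the value unstructured: no new elements can appear.
        return escape(values[key])
    return re.sub(r"\$\{([^{}]+)\}", replace, text)

print(expand_as_text("<p>${project:snippet}</p>"))
# <p>&lt;b&gt;bold&lt;/b&gt; &amp; more</p>
```

Anything that needs to arrive as real elements would go through xinclude instead, as described above.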

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------

Re: [RT] Entities in XML docs

Posted by Steven Noels <st...@outerthought.org>.

Stefano Mazzocchi wrote:

(comments & snips)

>> This method has the limitation that values cannot be included halfway 
>> inside an
>> attribute.  Eg, we couldn't have
>>
>> <s1 title="The <xi:include href="#xml4j"/> project">
>>   ...
>> </s1>
> 
> 
> This is actually a good limitation because it brings out a design pattern 
> for schema creation: don't use attributes for anything that might 
> require token expansion.

Yep. Layman's terms: if you need structure inside, don't use attributes.

>  1) provide a namespace-like prefix for variables
> 
>  example
> 
>    <p>Copyright ${char:copy} ${project:year} ${project:owner}. All 
> Rights Reserved.</p>
> 
>  then we can associate different fragment collections, some of which can 
> be inherited across projects.
> 
>  NOTE: the ${char:copy} will be *MUCH* handy when we get rid of DTDs

Please expand? I don't parse that sentence...

>  2) make the token expander copy-over those variables that are not found.
> 
>  This allows us to avoid the need to escape stuff since normally 
> variable names don't include a namespace-like prefix and if they do, 
> they can be escaped with normal CDATA sections.
> 
>> Are there any more options I haven't thought of?
> 
> 
> Use Ant filtering. That's how this works on Cocoon right now, but it 
> requires ant to preprocess all xdocs and this is not an optimal solution 
> but a hacky one.

We are trying to shift away from Ant dependencies, since these won't 
work in a webapp context. We have been using Ant-filtered copying but it 
has bitten us already.

The 'namespace' thing might as well be expanded to support i18n, I 
assume? Since this touches content and we don't want to put anything in 
the way (nor do we support anything special) for i18n, we must be sure 
this won't clash with multi-lingual sites either.

If anyone wants to play with a multilingual collection, check out 
http://cvs.cocoondev.org/cgi-bin/viewcvs.cgi/xreporter/src/documentation/content/xdocs/?cvsroot=xreporter

This Transformer should be at the end of the pipeline, since a skin 
could also contain shortcuts.

>> My current preference is to go with 3.2, and implement it with 
>> InputModules, the
>> same way LinkRewriterTransformer works.  Using XInclude would involve 
>> less
>> coding, but the DTD problems would be too horrible..
> 
> 
> Like I showed above, I do see in the future the need for forrest to 
> support xinclude of document fragments, but this is a separate concern 
> from the inclusion of string tokens.

I wouldn't mind them containing document fragments, if we are aware of 
possible issues 
(http://marc.theaimsgroup.com/?l=forrest-dev&m=104100883222616&w=2). 
Consider them being entities on steroids.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

Re: [RT] Entities in XML docs

Posted by Stefano Mazzocchi <st...@apache.org>.

Jeff Turner wrote:
> Hi,
> 
> Stylebook has a nice feature whereby a project can create a file,
> entities.ent, containing XML entity definitions for use in project XML
> files.  Here is a sample from Xalan's entities.ent:
> 
> 
> <?xml encoding="US-ASCII"?>
> 
> <!ENTITY xslt "Xalan">
> <!ENTITY xslt4j "Xalan-Java">
> <!ENTITY xslt4j2 "Xalan-Java 2">
> <!ENTITY xslt4j-dist "xalan-j_2_4_D1">
> <!ENTITY xslt4j-dist-bin "&xslt4j-dist;-bin">
> <!ENTITY xslt4j-dist-src "&xslt4j-dist;-src">
> <!ENTITY xslt4j-current "&xslt4j; version 2.4.D1">
> <!ENTITY xslt4j-distdir "http://xml.apache.org/dist/xalan-j/">
> <!ENTITY xml4j "Xerces-Java">
> <!ENTITY xml4j1 "Xerces-Java 1">
> <!ENTITY xml4j2 "Xerces-Java 2">
> <!ENTITY xml4j-used "&xml4j; 2.0.1">
> <!ENTITY xml4j-jar "xercesImpl.jar">
> <!ENTITY xslt4c "Xalan-C++">
> <!ENTITY xml4c "Xerces-C++">
> <!ENTITY download "The &xslt4j-current; download from xml.apache.org includes &xml4j-jar; from &xml4j-used; and xml-apis.jar. For version
> information about the contents of xml-apis.jar, see the JAR manifest.">
> 
> <!ENTITY xsltcwhatsnewhead '<li><link anchor="xsltc">XSLTC</link></li>'>
> 
> <<<<<<<<<<<<
> 
> This entities.ent file is automatically included in the book.dtd, through this
> PEref:
> 
> <!ENTITY % externalEntity SYSTEM "sbk:/sources/entities.ent">
> %externalEntity;
> 
> 
> Reusing snippets of content like this seems a pretty nice feature.  In Forrest,
> we have a couple of options to get the same effect:
> 
> 
> 1) Emulate the Stylebook solution in document-v11.dtd:
> 
> <!ENTITY % externalEntity SYSTEM "context://entities.ent">
> %externalEntity;
> 
> Currently, this just results in an 'unknown protocol: context' error.
> Which is odd, because I thought the XML parser would have an
> EntityResolver set that understands Cocoon protocols.  Or is this just
> wishful thinking?
> 
> The problem with this general approach is that XML docs can no longer be
> validated outside Cocoon, eg from a catalog-aware editor.  IMHO that
> makes this approach unacceptable.

Agreed.

> 2) Tell users to do it themselves.  Each XML file would have something like:
> 
> <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
> "document-v11.dtd" [
> <!ENTITY % local-ents SYSTEM "entities.ent">
> %local-ents;
> ]>
> 
> <document>
>   ...
> </document>
> 
> Simple, effective, and doesn't lock users into using only Forrest.  Only problem
> is, it assumes rather more XML knowledge than I'd expect most doc editors would
> have.  I think this should be our default solution, unless something better
> comes up..

Yes. Right now we could use this to move Xerces and Xalan to forrest but 
for sure it's good to think about a better and more long-term acceptable 
solution.
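
For readers less familiar with option 2, here is a minimal sketch of how entities declared in a document's internal DTD subset get expanded by an ordinary parser, shown with Python's standard library. The declarations are inlined instead of being pulled in from entities.ent, because as far as I know the stdlib parser does not fetch external entities by default; the document content is made up for the example.

```python
import xml.etree.ElementTree as ET

# Entities declared directly in the internal subset (an assumption for this
# sketch; the proposal above loads them from entities.ent instead).
DOC = """<!DOCTYPE document [
  <!ENTITY xml4j "Xerces-Java">
  <!ENTITY xml4j-used "&xml4j; 2.0.1">
]>
<document><p>Built with &xml4j-used;.</p></document>"""

root = ET.fromstring(DOC)
# Nested references are resolved when the outer entity is used.
print(root.find("p").text)   # Built with Xerces-Java 2.0.1.
```

This is the portability benefit of option 2: nothing here is specific to Forrest or Cocoon.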

> 
> 3) Avoid XML entities altogether.
> 
> 3.1) Use XInclude.  Eg, given an entities.xml file:
> 
> <entities>
>   <entity id="xml4j">Xerces-Java</entity>
>   <entity id="xml4j1">Xerces-Java 1</entity>
>   <entity id="xml4j2">Xerces-Java 2</entity>
> 
>   <entity id="xslt4j-current">
>     <xi:include href="#xslt4j"/> version 2.4.D1
>   </entity>
>   <entity id="download">
>     <p>
>       The <xi:include href="#xslt4j-current"/> download includes ...
>     </p>
>   </entity>
> </entities>
> 
> to include an entity, we'd use:
> 
> <xi:include href="../entities.xml#download"/>
> 
> With a SimpleMappingMetaModule we can simplify that to 
> 
> <xi:include href="res:download"/>
> 
> This method has the limitation that values cannot be included halfway inside an
> attribute.  Eg, we couldn't have
> 
> <s1 title="The <xi:include href="#xml4j"/> project">
>   ...
> </s1>

This is actually a good limitation because it brings out a design pattern 
for schema creation: don't use attributes for anything that might 
require token expansion.

We already agreed that sections should have a <title> element and 
attributes should be left for 'element-related' stuff, mostly non 
content related but element-property related.
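
The limitation is a matter of well-formedness, which a quick check with any XML parser shows. The sketch below uses Python's standard library; is_well_formed() is a helper invented for the example, and the second document shows the element-based alternative with a <title> child.

```python
import xml.etree.ElementTree as ET

# An element inside an attribute value: '<' is not allowed there.
BAD = '<s1 title="The <xi:include href="#xml4j"/> project"/>'

# The same content carried by a <title> element instead.
GOOD = (
    '<s1 xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<title>The <xi:include href="#xml4j"/> project</title>'
    '</s1>'
)

def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(BAD))    # False
print(is_well_formed(GOOD))   # True
```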

> Another disadvantage is that it imposes XInclude (and namespaces) on docs.  We
> currently have a DTD based architecture that can't really handle namespaces.

Yep.

> It is also a PITA having to modify the DTD to support xi:include.  Do we define
> it as an inline or block-level element?  We really need both.  Then when users
> want to use Docbook, they must first hack the DTD to allow xi:include.

You are touching the nerve of the XML model right there.

This reminds me of the pre/post-schema-infoset discussion. Big mess 
because you are not taking into consideration the fact that if we use 
xinclude we could also include document fragments, each one of which 
could have multiple namespaced content.

For example, say you use Forrest for your writings and you want to use 
the 'Creative Commons' license (www.creativecommons.org), a nice feature 
to have would be something like this

  <document>
   ....
   <legal>
    <xi:include href="res:license"/>
   </legal>
  </document>

where the license fragment is (this is real!)

<a href="http://creativecommons.org/licenses/by/1.0">
  <img alt="Creative Commons License" border="0" 
src="http://creativecommons.org/images/public/somerights.gif" /></a><br />
This work is licensed under a
<a href="http://creativecommons.org/licenses/by/1.0">Creative Commons 
License</a>.

<rdf:RDF xmlns="http://web.resource.org/cc/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="">
<license rdf:resource="http://creativecommons.org/licenses/by/1.0" />
</Work>

<License rdf:about="http://creativecommons.org/licenses/by/1.0">
    <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
    <requires rdf:resource="http://web.resource.org/cc/Notice" />
</License>

</rdf:RDF>

But another concern is: when do we do validation? before the infoset 
expansion or after?

> 3.2) We implement a SearchReplaceTransformer, which replaces ${variables} with
> values.  Eg, entities.xml:
> 
> <entities>
>   <xml4j>Xerces-Java</xml4j>
>   <xml4j1>Xerces-Java 1</xml4j1>
>   <xml4j2>Xerces-Java 2</xml4j2>
> 
>   <xslt4j-current>
>     ${xslt4j} version 2.4.D1
>   </xslt4j-current>
> 
>   <download>
>     <p>
>       The ${xslt4j-current} download includes ...
>     </p>
>   </download>
> </entities>
> 
> This seems a lot more intuitive than XInclude, and doesn't require modifying
> DTDs.  We could go all the way and use one of the expression languages in
> Jakarta Commons, like jexl[1].

Yes, this would be very handy for simple token expansion. But I fear 
collisions for documentation that includes lots of ${} stuff.

A solution for this is:

  1) provide a namespace-like prefix for variables

  example

    <p>Copyright ${char:copy} ${project:year} ${project:owner}. All 
Rights Reserved.</p>

  then we can associate different fragment collections, some of which 
can be inherited across projects.

  NOTE: the ${char:copy} will be *MUCH* handy when we get rid of DTDs

  2) make the token expander copy-over those variables that are not found.

  This allows us to avoid the need to escape stuff since normally 
variable names don't include a namespace-like prefix and if they do, 
they can be escaped with normal CDATA sections.
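
The two rules above can be sketched roughly like this. It is a Python illustration only, not the proposed Cocoon transformer: the COLLECTIONS table, its values, and the expand() helper are assumptions for the example.

```python
import re

# Hypothetical fragment collections, keyed by prefix.
COLLECTIONS = {
    "char": {"copy": "\u00a9", "raquo": "\u00bb", "nbsp": "\u00a0"},
    "project": {"year": "2002", "owner": "The Apache Software Foundation"},
}

# Only ${prefix:name} is treated as a variable; a bare ${name} never matches.
TOKEN = re.compile(r"\$\{([A-Za-z][\w.-]*):([A-Za-z][\w.-]*)\}")

def expand(text, collections=COLLECTIONS):
    """Expand known prefixed variables and copy everything else over."""
    def replace(match):
        prefix, name = match.groups()
        value = collections.get(prefix, {}).get(name)
        # Rule 2: variables that are not found are copied over untouched.
        return match.group(0) if value is None else value
    return TOKEN.sub(replace, text)

print(expand("Copyright ${char:copy} ${project:year} ${project:owner}."))
print(expand("Run ${basedir}/build.sh with ${ant:home} set"))
```

The second call returns its input unchanged, since ${basedir} has no prefix and ${ant:home} is not in any collection; that is how documentation full of ${} text would survive without escaping.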

> Are there any more options I haven't thought of?

Use Ant filtering. That's how this works on Cocoon right now, but it 
requires ant to preprocess all xdocs and this is not an optimal solution 
but a hacky one.

> My current preference is to go with 3.2, and implement it with InputModules, the
> same way LinkRewriterTransformer works.  Using XInclude would involve less
> coding, but the DTD problems would be too horrible..

Like I showed above, I do see in the future the need for forrest to 
support xinclude of document fragments, but this is a separate concern 
from the inclusion of string tokens.

I would suggest we go with 3.2 with the prefixed variables that I 
outlined above, but we use them for text-only expansion and avoid using 
them for document fragments.

This removes all pre/post-schema-infoset problems that we'll tackle when 
we have a Relax-based validation system in the future. Right now it's 
not such a high priority.

As for projects that use entity expansion for document fragments, well, 
they will be forced to update their docs. Disturbing, I know, but 
entities shouldn't have been used for that anyway. We cannot enforce bad 
practices at this stage or we'll have a *VERY* hard time when we need to 
validate a post-fragment-aggregation stage later on down the road.

Thoughts?

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------