Posted to users@daffodil.apache.org by Mike Beckerle <mb...@apache.org> on 2023/02/13 17:40:55 UTC

Page on DFDL Schema Style Guidance

I created a wiki page to document some best practices we've gradually
accumulated with experience building large DFDL schemas.

Comments are welcome

https://cwiki.apache.org/confluence/display/DAFFODIL/DFDL+Schema+Style+Guide

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com

Re: Page on DFDL Schema Style Guidance

Posted by Mike Beckerle <mb...@apache.org>.
I'll add the rationale for elementFormDefault="unqualified", which is:

There's no need for every child element to have a namespace (hence prefix),
when the tree they are part of has a prefix somewhere further up the tree
which makes the identity of those child elements unambiguous.

I wish we had known this when we created TDML, for example.

As for using xmlns="the target namespace" together with
elementFormDefault="qualified", the problem is that it doesn't compose
well, as you anticipated.

To use such a schema in a context with some other default namespace, you
have to bind your namespace to some other prefix, at which point you will
get every element, including every child element, carrying a prefix.

The right compromise seems to be:

(a) don't define global elements that users *must* use; just define types
and/or groups instead.
(b) use elementFormDefault="unqualified".
(c) avoid the default namespace binding, except in a language like XSD or
TDML, where it is just a convenience to reduce prefix clutter and won't
show up in instance documents.
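
To make (a) through (c) concrete, here is a minimal sketch. The file name
nums.dfdl.xsd, the names NumsType, count, value, and nums, and the
urn:example:host namespace are all made up for illustration, and the DFDL
format properties a real schema needs are omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- nums.dfdl.xsd: hypothetical component schema (names are placeholders) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.com"
           targetNamespace="http://example.com"
           elementFormDefault="unqualified">

  <!-- (a) Export a type, not a global element that users must use. -->
  <xs:complexType name="NumsType">
    <xs:sequence>
      <!-- (b) Local elements are unqualified, so they carry no prefix. -->
      <xs:element name="count" type="xs:int"/>
      <xs:element name="value" type="xs:int" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

A schema that uses it would declare its own global element in its own
namespace, and only that element would need a prefix in the instance
document:

```xml
<!-- Hypothetical using schema: the one global element lives here. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.com"
           targetNamespace="urn:example:host"
           elementFormDefault="unqualified">
  <xs:import namespace="http://example.com" schemaLocation="nums.dfdl.xsd"/>
  <xs:element name="nums" type="ex:NumsType"/>
</xs:schema>

<!-- Instance document: (c) no default namespace binding is needed. -->
<h:nums xmlns:h="urn:example:host">
  <count>2</count>
  <value>1</value>
  <value>2</value>
</h:nums>
```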





On Mon, Feb 13, 2023 at 2:02 PM Interrante, John A (GE Research, US) <
John.Interrante@ge.com> wrote:

> I have also transitioned away from defining element references to defining
> complex types and only one global element on my own, so I agree completely
> with these best practices.
>
> You've recommended that DFDL schemas should use
> elementFormDefault="unqualified" in the style guide without giving a reason
> for it.  Can you please expand that one sentence to include your rationale,
> such as that the XML instance document would become larger because every
> single element in it would have a prefix even when all the elements are
> already in the same target namespace?
>
> Interestingly enough, I was experimenting with the ex_nums.dfdl.xsd schema
> a while ago when testing it with the C code generator and I happened to
> change it to use elementFormDefault="qualified", targetNamespace="
> http://example.com", and xmlns="http://example.com" all together.  I
> found out that the combination of these 3 attributes together was the
> secret sauce to outputting the simplest XML instance document yet:
>
> <ex_nums xmlns="http://example.com">
>   <array>
> ...
>   </fixed>
> </ex_nums>
>
> The root element had an xmlns attribute and no prefix.  No other elements
> had either an xmlns attribute or a prefix.  Normally, a schema uses both a
> target namespace and a prefix, and the XML instance document ends up
> looking like:
>
> <ex:ex_nums xmlns:ex="http://example.com">
>   <array>
> ...
>   </fixed>
> </ex:ex_nums>
>
> Which is not too bad, and I'm sure that makes it easier to combine
> elements from multiple schemas with different target namespaces and
> prefixes in the same XML instance document.  The magic combination above
> probably works only for a single example schema and its instance document
> and can't scale to large schemas.  If you try the magic combination but
> change xmlns="http://example.com" to xmlns:ex="http://example.com", you
> go from the smallest XML instance document to the largest XML instance
> document:
>
> <ex:ex_nums xmlns:ex="http://example.com">
>   <ex:array>
> ...
>   </ex:fixed>
> </ex:ex_nums>
>
> Every single element gets a prefix, so that's probably your reason for
> recommending elementFormDefault="unqualified".
>
> John

RE: Page on DFDL Schema Style Guidance

Posted by "Interrante, John A (GE Research, US)" <Jo...@ge.com>.
I have also transitioned away from defining element references to defining complex types and only one global element on my own, so I agree completely with these best practices.  

You've recommended that DFDL schemas should use elementFormDefault="unqualified" in the style guide without giving a reason for it.  Can you please expand that one sentence to include your rationale, such as that the XML instance document would become larger because every single element in it would have a prefix even when all the elements are already in the same target namespace?

Interestingly enough, I was experimenting with the ex_nums.dfdl.xsd schema a while ago when testing it with the C code generator and I happened to change it to use elementFormDefault="qualified", targetNamespace="http://example.com", and xmlns="http://example.com" all together.  I found out that the combination of these 3 attributes together was the secret sauce to outputting the simplest XML instance document yet:

<ex_nums xmlns="http://example.com">
  <array>
...
  </fixed>
</ex_nums>

The root element had an xmlns attribute and no prefix.  No other elements had either an xmlns attribute or a prefix.  Normally, a schema uses both a target namespace and a prefix, and the XML instance document ends up looking like:

<ex:ex_nums xmlns:ex="http://example.com">
  <array>
...
  </fixed>
</ex:ex_nums>

Which is not too bad, and I'm sure that makes it easier to combine elements from multiple schemas with different target namespaces and prefixes in the same XML instance document.  The magic combination above probably works only for a single example schema and its instance document and can't scale to large schemas.  If you try the magic combination but change xmlns="http://example.com" to xmlns:ex="http://example.com", you go from the smallest XML instance document to the largest XML instance document:

<ex:ex_nums xmlns:ex="http://example.com">
  <ex:array>
...
  </ex:fixed>
</ex:ex_nums>

Every single element gets a prefix, so that's probably your reason for recommending elementFormDefault="unqualified".

John

