Posted to dev@cocoon.apache.org by David Crossley <cr...@indexgeo.com.au> on 2003/03/04 15:10:23 UTC
sitemap validation is broken
I started to bring back some of the stuff from the
old build.xml that conducted validation of the
important configuration files.
However, I had to comment out the sitemap validation
because some recent changes to webapp/sitemap.xmap
cause errors.
The <map:transformer> used to have children like
<use-session-info> etc. but these have been recently
changed to <use-session-parameters> etc. However,
the sitemap.rng still has the former. Also, the code in
o.a.c.transformation.TraxTransformer still uses the
former. Are these just typos in the new sitemap.xmap?
There is also a problem with validation of cocoon.roles
where precept.xroles adds a new undefined attribute
for "default-hint".
--David
Re: validation of config during build (Was: Re: sitemap validation is broken)
Posted by Niclas Hedhman <ni...@internuscorp.com>.
On Friday 07 March 2003 19:32, Stefano Mazzocchi wrote:
> I'm more and more considering sitemap validation harmful.
> I propose to blast the sitemap validation altogether.
Hmmm... It must be this OS thing ;o)
You should perhaps have added, it is a Concern of the Creator, not the
Executor.
I agree
a. XML validation at "runtime" just does not give anything understandable back
to the user if there are errors.
b. Document (sitemap included) validation belongs to creation process. No need
to re-do the work...
Niclas
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> Well, I thought I made it clear : although I've not considered the
> technical details now, I would like to integrate schema-driven syntax
> checks (I avoid the ambiguous "validation" word) _inside_ the
> treeprocessor (i.e. at sitemap load-time) to be sure that the sitemap is
> correct since we cannot assume each user will perform pre-runtime checks.
>
> The technical details I'm referring to are how we can get meaningful
> messages from schema-driven syntax check, so that we can display them to
> the user.
>
> The benefits of this approach are IMO multiple :
> - runtime checks, offline checks and schema-driven editors use a single
> definition of the sitemap grammar,
> - since this grammar becomes an integral part of the sitemap engine, it
> ensures its consistency and long-term maintenance.
>
> Deal ?
Yes, but only if the sitemap RNG schema stops trying to validate the
component parameters and accepts transparently what is given in a
namespace which is not the sitemap's.
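A minimal sketch of that "accept foreign namespaces transparently" rule, written as a plain Java DOM walk rather than as RNG. The class name SitemapNamespaceCheck and the short KNOWN list are hypothetical and only illustrate the idea; the namespace URI is the one sitemaps bind to the map: prefix. Anything outside that namespace (component parameters such as <use-session-parameters> included) is accepted without inspection.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Cocoon.
class SitemapNamespaceCheck {
    static final String SITEMAP_NS = "http://apache.org/cocoon/sitemap/1.0";

    // Illustrative subset of sitemap element names, not the full grammar.
    static final Set<String> KNOWN = Set.of(
        "sitemap", "components", "generators", "generator",
        "transformers", "transformer", "pipelines", "pipeline",
        "match", "generate", "transform", "serialize");

    static List<String> check(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> problems = new ArrayList<>();
        walk(doc.getDocumentElement(), problems);
        return problems;
    }

    private static void walk(Element element, List<String> problems) {
        if (!SITEMAP_NS.equals(element.getNamespaceURI())) {
            // Not ours (another namespace, or none): accept the whole subtree as-is.
            return;
        }
        if (!KNOWN.contains(element.getLocalName())) {
            problems.add("unknown sitemap element <map:" + element.getLocalName() + ">");
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, problems);
            }
        }
    }
}
```

With this shape, renaming a component parameter never breaks the check; only a misspelled map: element is reported.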
Schematron validation (Re: validation of config during build)
Posted by Jeff Turner <je...@apache.org>.
On Sat, Mar 08, 2003 at 10:03:31AM +0100, Sylvain Wallez wrote:
> Steven Noels wrote:
>
> >Stefano Mazzocchi wrote:
> >
> >>Steven Noels wrote:
> >
> >>>Looking at the history of sitemap-v06.rng, I can't see this has been
> >>>happening a lot. Quite contrarily, some (myself included) have been
> >>>advocating to relax it even further. But dropping it will
> >>>effectively kill the small circle of people interested in
> >>>maintaining such a thing.
> >>>
> >>>Reasonable?
> >>
> >>
> >><read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>
> >>
> >>is that clear enough? should I repeat it?
> >>
> >>I'm suggesting to remove the validation target from the build system
> >>and improve the way treeprocessor handles errors.
> >>
> >>As I said, i don't care *how* this is done, as long as the error
> >>messages that users receive are much more meaningful than those silly
> >>"System ID not found" when an attribute name is wrong.
> >
> >
> >I'm going to be stubborn about this: _if_ we drop the target (I was
> >already aware of you not pushing to drop the schema, no problem here),
> >then the few people who care about the schema won't be warned about
> >required changes anymore.
> >
> >I don't see any relation between the grammar, where and when it should
> >be used, and the lack of exception handling code in the tree processor.
> >
> >But since we are the only ones who care to continue this thread, let's
> >drop it. I'm going to check what Sylvain has to say about it.
>
>
> Well, I thought I made it clear : although I've not considered the
> technical details now, I would like to integrate schema-driven syntax
> checks (I avoid the ambiguous "validation" word) _inside_ the
> treeprocessor (i.e. at sitemap load-time) to be sure that the sitemap is
> correct since we cannot assume each user will perform pre-runtime checks.
>
> The technical details I'm referring to are how we can get meaningful
> messages from schema-driven syntax check, so that we can display them to
> the user.
>
> The benefits of this approach are IMO multiple :
> - runtime checks, offline checks and schema-driven editors use a single
> definition of the sitemap grammar,
> - since this grammar becomes an integral part of the sitemap engine, it
> ensures its consistency and long-term maintenance.
My rather unhelpful 2c (knee-deep in Forrest ATM):
For validating an incrementally composed structure like the sitemap,
it might be better to use a rule-based language like Schematron, rather
than RelaxNG/XSD. Reasons being:
- In Schematron, the XML is considered valid by default, and
subsequently constrained. This assumption makes sense for the
sitemap, because people will always be defining new components outside
our control.
- Schematron handles co-occurrence constraints, e.g. "only check for
map:generator/driver if @src contains 'XMLDBGenerator'". With RNG I
think we'd have to say "anything goes" in user-extensible sections.
- Schematron schemas combine very naturally. We could have a schema per
block, validating just that block's elements, and apply the block
schemas iteratively. With RNG, merging block schemas would be *much*
harder.
- Jing error messages are completely awful.
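The first three points can be sketched in plain Java with javax.xml.xpath. This is not Schematron itself, only an imitation of its rule model under stated assumptions: the RuleBasedCheck class, its Rule type and the rule text are hypothetical. A document is valid by default; each rule selects context nodes and asserts something about them, and rule lists from several blocks combine by simple concatenation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of Schematron-style rules; not a Schematron engine.
class RuleBasedCheck {
    static final String SITEMAP_NS = "http://apache.org/cocoon/sitemap/1.0";

    // For every node selected by 'context', the XPath 'test' must be true.
    static final class Rule {
        final String context;
        final String test;
        final String message;
        Rule(String context, String test, String message) {
            this.context = context;
            this.test = test;
            this.message = message;
        }
    }

    static List<String> apply(String xml, List<Rule> rules) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "map".equals(prefix) ? SITEMAP_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return SITEMAP_NS.equals(uri) ? "map" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return SITEMAP_NS.equals(uri)
                    ? Collections.singletonList("map").iterator()
                    : Collections.<String>emptyIterator();
            }
        });

        List<String> failures = new ArrayList<>();
        for (Rule rule : rules) {
            NodeList nodes = (NodeList) xpath.evaluate(rule.context, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                Boolean ok = (Boolean) xpath.evaluate(rule.test, nodes.item(i), XPathConstants.BOOLEAN);
                if (!ok) {
                    failures.add(rule.message);
                }
            }
        }
        return failures;
    }
}
```

The co-occurrence example from the second point would then be one rule with context //map:generator[contains(@src, 'XMLDBGenerator')] and test driver; generators of any other kind are simply never selected, so they pass untouched.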
Perhaps a Schematron variant subsetted to support STXPath [1] would be
the best Cocoon validation system.
--Jeff
(who had a long and fascinating talk with Rick Jelliffe a few weeks ago,
and is now a convert, at least until I meet James Clark)
[1] http://www.xml.com/pub/a/2003/02/26/stx.html
> Deal ?
>
> Sylvain
>
> --
> Sylvain Wallez Anyware Technologies
> http://www.apache.org/~sylvain http://www.anyware-tech.com
> { XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
>
>
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Steven Noels wrote:
> Stefano Mazzocchi wrote:
>
>> Steven Noels wrote:
>
>>> Looking at the history of sitemap-v06.rng, I can't see this has been
>>> happening a lot. Quite contrarily, some (myself included) have been
>>> advocating to relax it even further. But dropping it will
>>> effectively kill the small circle of people interested in
>>> maintaining such a thing.
>>>
>>> Reasonable?
>>
>>
>> <read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>
>>
>> is that clear enough? should I repeat it?
>>
>> I'm suggesting to remove the validation target from the build system
>> and improve the way treeprocessor handles errors.
>>
>> As I said, i don't care *how* this is done, as long as the error
>> messages that users receive are much more meaningful than those silly
>> "System ID not found" when an attribute name is wrong.
>
>
> I'm going to be stubborn about this: _if_ we drop the target (I was
> already aware of you not pushing to drop the schema, no problem here),
> then the few people who care about the schema won't be warned about
> required changes anymore.
>
> I don't see any relation between the grammar, where and when it should
> be used, and the lack of exception handling code in the tree processor.
>
> But since we are the only ones who care to continue this thread, let's
> drop it. I'm going to check what Sylvain has to say about it.
Well, I thought I made it clear : although I've not considered the
technical details now, I would like to integrate schema-driven syntax
checks (I avoid the ambiguous "validation" word) _inside_ the
treeprocessor (i.e. at sitemap load-time) to be sure that the sitemap is
correct since we cannot assume each user will perform pre-runtime checks.
The technical details I'm referring to are how we can get meaningful
messages from schema-driven syntax check, so that we can display them to
the user.
The benefits of this approach are IMO multiple :
- runtime checks, offline checks and schema-driven editors use a single
definition of the sitemap grammar,
- since this grammar becomes an integral part of the sitemap engine, it
ensures its consistency and long-term maintenance.
Deal ?
Sylvain
--
Sylvain Wallez Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:
> Steven Noels wrote:
>> Looking at the history of sitemap-v06.rng, I can't see this has been
>> happening a lot. Quite contrarily, some (myself included) have been
>> advocating to relax it even further. But dropping it will effectively
>> kill the small circle of people interested in maintaining such a thing.
>>
>> Reasonable?
>
>
> <read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>
>
> is that clear enough? should I repeat it?
>
> I'm suggesting to remove the validation target from the build system and
> improve the way treeprocessor handles errors.
>
> As I said, i don't care *how* this is done, as long as the error
> messages that users receive are much more meaningful than those silly
> "System ID not found" when an attribute name is wrong.
I'm going to be stubborn about this: _if_ we drop the target (I was
already aware of you not pushing to drop the schema, no problem here),
then the few people who care about the schema won't be warned about
required changes anymore.
I don't see any relation between the grammar, where and when it should
be used, and the lack of exception handling code in the tree processor.
But since we are the only ones who care to continue this thread, let's
drop it. I'm going to check what Sylvain has to say about it.
Cheers,
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org stevenn at apache.org
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Stefano Mazzocchi <st...@apache.org>.
Steven Noels wrote:
> Stefano Mazzocchi wrote:
>
>> Did I say that I consider having a sitemap schema descriptor harmful?
>>
>> No, damn, I just said that I consider using that schema to validate
>> the sitemap harmful.
>
>
> Let's agree that there exist multiple levels of validation, and that
> each of them has its own merits. Coincidentally however, XML grammars
> are also used to drive editors, and since the result of this editing is
> fed into Java code, it better tries to attain the same level of
> validation, as close as possible, if at all possible.
>
> 'Which' schema do you mean here...: sitemap-v06.rng, or _any_ XSD/RNG
> grammar at all? Sorry - just want to know.
I have a hard time explaining myself today, could be this new operating
system.
>>>> Example, try
>>>>
>>>> <generate uri="..."/>
>>>>
>>>> where the uri attribute is not allowed in generate (should be
>>>> 'src'), the treeprocessor totally ignores this and sends the empty
>>>> string to the parser, resulting in the error
>>>>
>>>> System ID not found!
>>>>
>>>> Sitemap validation has stopped us from fixing the error messaging
>>>> capabilities on mistakes.
>>>
>>>
>>>
>>>
>>> I don't parse this: in what way does the sitemap validation relieve
>>> somebody of the task of properly handling exceptions on the code level?
>>
>>
>>
>> The level of error-checking of the treeprocessor isn't really that
>> pretty and know why? because validation removed most of the mistakes
>> that *us* developers do... but when users don't validate, they come up
>> with *weird* error messages that don't give them *any* clue whatsoever
>> on how to fix the problem.
>
>
> Agree on the user aspect. But I don't follow the logic that the lack of
> error-checking in code is _caused_ by the validation process. That's
> just too fast to conclude.
>
>> My reasoning is that if we didn't have validation, we would see the
>> same mistakes the users see and fix the treeprocessor instead of
>> patching more and more the validation phase.
>
>
> Looking at the history of sitemap-v06.rng, I can't see this has been
> happening a lot. Quite contrarily, some (myself included) have been
> advocating to relax it even further. But dropping it will effectively
> kill the small circle of people interested in maintaining such a thing.
>
> Reasonable?
<read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>
is that clear enough? should I repeat it?
I'm suggesting to remove the validation target from the build system and
improve the way treeprocessor handles errors.
As I said, I don't care *how* this is done, as long as the error
messages that users receive are much more meaningful than those silly
"System ID not found" when an attribute name is wrong.
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:
> Did I say that I consider having a sitemap schema descriptor harmful?
>
> No, damn, I just said that I consider using that schema to validate the
> sitemap harmful.
Let's agree that there exist multiple levels of validation, and that
each of them has its own merits. Coincidentally however, XML grammars
are also used to drive editors, and since the result of this editing is
fed into Java code, it better tries to attain the same level of
validation, as close as possible, if at all possible.
'Which' schema do you mean here...: sitemap-v06.rng, or _any_ XSD/RNG
grammar at all? Sorry - just want to know.
>>> Example, try
>>>
>>> <generate uri="..."/>
>>>
>>> where the uri attribute is not allowed in generate (should be 'src'),
>>> the treeprocessor totally ignores this and sends the empty string to
>>> the parser, resulting in the error
>>>
>>> System ID not found!
>>>
>>> Sitemap validation has stopped us from fixing the error messaging
>>> capabilities on mistakes.
>>
>>
>>
>> I don't parse this: in what way does the sitemap validation relieve
>> somebody of the task of properly handling exceptions on the code level?
>
>
> The level of error-checking of the treeprocessor isn't really that
> pretty and know why? because validation removed most of the mistakes
> that *us* developers do... but when users don't validate, they come up
> with *weird* error messages that don't give them *any* clue whatsoever
> on how to fix the problem.
Agree on the user aspect. But I don't follow the logic that the lack of
error-checking in code is _caused_ by the validation process. That's
just too fast to conclude.
> My reasoning is that if we didn't have validation, we would see the same
> mistakes the users see and fix the treeprocessor instead of patching
> more and more the validation phase.
Looking at the history of sitemap-v06.rng, I can't see this has been
happening a lot. Quite contrarily, some (myself included) have been
advocating to relax it even further. But dropping it will effectively
kill the small circle of people interested in maintaining such a thing.
Reasonable?
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org stevenn at apache.org
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Stefano Mazzocchi <st...@apache.org>.
Steven Noels wrote:
> Stefano Mazzocchi wrote:
>
>> I'm more and more considering sitemap validation harmful.
>>
>> why:
>>
>> 1) the sitemap logic is too hard to be validated from any validation
>> language (it requires java runtime capabilities)
>>
>> 2) it reduces the effort of clean and meaningful error messages in the
>> treeprocessor
>
>
> 'Interesting' perspective, to say the least.
>
> Some thoughts:
>
> 1) http://outerthought.net/downloads/sitemap.pdf and
> http://outerthought.net/downloads/sitemap_a4_poster.pdf
>
> cat /usr/local/apache/logs/access_log | grep sitemap.pdf | wc -l -> 1825
> downloads in 3 months (dec-jan-feb). Add some 2500 in the 4 months
> preceding that period. And another 2500 for the poster version, brings
> us to a total of 975 downloads / month for Bruno's sitemap poster.
>
> ... which means there's a _vested_ interest in trying to understand
> the sitemap, and people are even willing to look at some graphical
> depiction of it in order to understand.
>
> 2) In our experience, when we confront people with the sitemap, they are
> bewildered until we give them a copy of Pollo with the sitemap grammar
> loaded into it and some very basic customization
> (http://pollo.sourceforge.net/sitemap1.png). I assume the same happens
> when people see Sunbow. Needless to say, having 3 different grammars for
> the sitemap (XSD, RNG and a Pollo-specific grammar) is a major PITA,
> so some rationalization is more than appropriate.
>
> 3) Some days ago when investigating
> http://marc.theaimsgroup.com/?t=104643526200004&r=1&w=2, I encountered
> some way to 'address' a matched group of a matcher pattern when nesting
> matchers which I never heard of, and already forgot about it ATM. :-( I
> can say for myself that I do a reasonable effort in keeping up with
> new-things-Cocoon, but it was something I clearly missed. I'm pretty
> sure it is only 'documented in code' or on the mailing list somewhere.
Did I say that I consider having a sitemap schema descriptor harmful?
No, damn, I just said that I consider using that schema to validate the
sitemap harmful.
>> Example, try
>>
>> <generate uri="..."/>
>>
>> where the uri attribute is not allowed in generate (should be 'src'),
>> the treeprocessor totally ignores this and sends the empty string to
>> the parser, resulting in the error
>>
>> System ID not found!
>>
>> Sitemap validation has stopped us from fixing the error messaging
>> capabilities on mistakes.
>
>
> I don't parse this: in what way does the sitemap validation relieve
> somebody of the task of properly handling exceptions on the code level?
The level of error-checking of the treeprocessor isn't really that
pretty and know why? because validation removed most of the mistakes
that *us* developers do... but when users don't validate, they come up
with *weird* error messages that don't give them *any* clue whatsoever
on how to fix the problem.
My reasoning is that if we didn't have validation, we would see the same
mistakes the users see and fix the treeprocessor instead of patching
more and more the validation phase.
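What a treeprocessor-side check for the <generate uri="..."/> case could look like, as a minimal sketch. The AttributeCheck class and the allowed-attribute set passed in are hypothetical and do not reflect the actual treeprocessor code; the point is only that the attribute mistake is named before any empty string reaches the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not the real treeprocessor API.
class AttributeCheck {
    /** Returns null when the attributes are fine, otherwise a readable message. */
    static String describeProblem(String element, Map<String, String> attrs,
                                  String required, Set<String> allowed) {
        List<String> parts = new ArrayList<>();
        if (!attrs.containsKey(required)) {
            parts.add("missing required attribute '" + required + "'");
        }
        // Sorted so the message is stable regardless of map iteration order.
        Set<String> unknown = new TreeSet<>(attrs.keySet());
        unknown.removeAll(allowed);
        if (!unknown.isEmpty()) {
            parts.add("unknown attribute(s) " + unknown);
        }
        return parts.isEmpty() ? null : "<" + element + ">: " + String.join("; ", parts);
    }
}
```

Called with element "map:generate", attributes {uri=...}, required "src" and allowed {src, type, label}, this yields "<map:generate>: missing required attribute 'src'; unknown attribute(s) [uri]" instead of a parser error about a system ID.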
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:
> I'm more and more considering sitemap validation harmful.
>
> why:
>
> 1) the sitemap logic is too hard to be validated from any validation
> language (it requires java runtime capabilities)
>
> 2) it reduces the effort of clean and meaningful error messages in the
> treeprocessor
'Interesting' perspective, to say the least.
Some thoughts:
1) http://outerthought.net/downloads/sitemap.pdf and
http://outerthought.net/downloads/sitemap_a4_poster.pdf
cat /usr/local/apache/logs/access_log | grep sitemap.pdf | wc -l -> 1825
downloads in 3 months (dec-jan-feb). Add some 2500 in the 4 months
preceding that period. And another 2500 for the poster version, brings
us to a total of 975 downloads / month for Bruno's sitemap poster.
... which means there's a _vested_ interest in trying to understand
the sitemap, and people are even willing to look at some graphical
depiction of it in order to understand.
2) In our experience, when we confront people with the sitemap, they are
bewildered until we give them a copy of Pollo with the sitemap grammar
loaded into it and some very basic customization
(http://pollo.sourceforge.net/sitemap1.png). I assume the same happens
when people see Sunbow. Needless to say, having 3 different grammars for
the sitemap (XSD, RNG and a Pollo-specific grammar) is a major PITA,
so some rationalization is more than appropriate.
3) Some days ago when investigating
http://marc.theaimsgroup.com/?t=104643526200004&r=1&w=2, I encountered
some way to 'address' a matched group of a matcher pattern when nesting
matchers which I never heard of, and already forgot about it ATM. :-( I
can say for myself that I do a reasonable effort in keeping up with
new-things-Cocoon, but it was something I clearly missed. I'm pretty
sure it is only 'documented in code' or on the mailing list somewhere.
> Example, try
>
> <generate uri="..."/>
>
> where the uri attribute is not allowed in generate (should be 'src'),
> the treeprocessor totally ignores this and sends the empty string to the
> parser, resulting in the error
>
> System ID not found!
>
> Sitemap validation has stopped us from fixing the error messaging
> capabilities on mistakes.
I don't parse this: in what way does the sitemap validation relieve
somebody of the task of properly handling exceptions on the code level?
> I propose to blast the sitemap validation altogether.
OK. I know I'm sounding harsh and I don't mean to: it's just one of
those discussions I have had so many times already in my own little company,
being the only XML-head with two (much smarter) Java-heads. We had the
same thing with the xReporter report grammar, which admittedly is only
really handled and interpreted in Java code, yet our initial customer
wanted to have a proper XML grammar for it.
Why that? For editing purposes. People want to use XML editors for
editing the sitemap, and these tools _can_ provide proper guidance when
configured with a grammar. I know we are heading towards your pet peeve
discussion (*) of pre/post validation Infosets and the various ways each
of the available grammars suck at grasping these concepts, but still I
very much believe people will be grateful for anything (apart from
Java(doc/code)) that guides them during the creation of an XML document,
or at the least offers them some validation prior to loading the thing
into Cocoon and see what Cocoon makes out of it.
(*) I must, as this discussion is one of my favorite pet peeves, too ;-)
I agree there is a significant amount of overlap and various levels of
underspecification for-the-sake-of-simplicity when having both some XML
grammar and executable code which interprets XML orthogonally to this
grammar, but still I'm very much +1 for some reasonable quality XML
grammar, if only to help out our users.
If not, why don't we just specify the sitemap in some own-cooked grammar
like:
match pattern="news/**"
    match pattern="news/1999/**"
        generate src="oldcontent/news/{1}.html" type="html"
        transform src="styles/old2new.xsl"
    match pattern="news/20*/**"
        generate src="docs/news/20{1}/{2}.xml"
    transform src="news2html.xsl"
    serialize
Gee - I must have been reading too much Python code lately ;-)
Sorry if I sound offensive, I really don't mean to - but it's a personal
pet peeve ;-)
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org stevenn at apache.org
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:
> Sylvain Wallez wrote:
>
>> Stefano Mazzocchi wrote:
>>
>>> Sylvain Wallez wrote:
>>
>>
>> <snip/>
>>
>>>> So it seems to me validation is good to easily write a syntax
>>>> checker and let the java code in treeprocessor concentrate on more
>>>> detailed "semantic" validation.
>>>
>>>
>>> too bad this is not done.
>>
>>
>> So what about *requiring* schema-validation to happen each time a
>> sitemap is loaded, i.e. have the use of a validating parser be
>> hard-coded in the treeprocessor. This schema-validation phase would
>> be a part of the global consistency checks performed by the
>> treeprocessor, implemented by tools adequate for this task.
>
>
> I'm only concerned about getting useful error messages out of sitemap
> loading. I don't care how this is achieved.
>
>>>> Now the problem, AFAIU, comes more from the fact that we're trying
>>>> to validate not only the sitemap, but also the configuration of
>>>> each component, which may take very various forms and obey to some
>>>> complicated logic.
>>>
>>>
>>> yes, but this is an argument on how the sitemap descriptor is
>>> defined. Why does everybody think that validation and schemas are
>>> synonyms?
>>
>>
>> I don't think they are synonyms, but that a schema is an adequate
>> tool to easily perform the first phase of a global validation process.
>
>
> From a developer's point of view, sure. From an error message
> readability point of view, I strongly doubt it since treeprocessor can
> have much better and meaningful error messages than any validation stage.
>
> I'm not being negative, I'm just trying to reduce the number of
> misconfiguration questions that will happen on cocoon-users as soon as
> we release cocoon 2.1
>
OK. I understand your concern. Let's recap the various concerns on the
table in this area :
- you (and we, this was also one of my goals in the treeprocessor) want
meaningful messages
- I like a first validation phase driven by a schema for its ease of
development,
- Steven wants schema-driven editors.
So IMO, what we need is find a way for a schema syntax validator to give
some meaningful messages for what it is in charge of, which is
controlling that elements and attributes are the one the sitemap engine
is waiting for.
Taking your previous example, having "invalid 'uri' parameter at
foo/sitemap.xmap:145:28" seems meaningful to me. Once this phase is
successful, it's the treeprocessor responsibility to output messages
like "Cannot find a generator named 'bar' at foo/sitemap.xmap:145:15"
(as it already does today).
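A sketch of how the two phases and the file:line:column form could fit together, using only the SAX Locator from the JDK. The TwoPhaseSitemapCheck class, the allowed attribute set and the declared-generator set are hypothetical, and a hand-written attribute check stands in here for the schema-driven phase. The file name is passed in for display because a parser may expand the system ID to an absolute URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the treeprocessor implementation.
class TwoPhaseSitemapCheck extends DefaultHandler {
    static final String SITEMAP_NS = "http://apache.org/cocoon/sitemap/1.0";
    // Assumed attribute set for map:generate, for illustration only.
    static final Set<String> GENERATE_ATTRS = Set.of("src", "type", "label");

    private final String fileName;
    private final Set<String> declaredGenerators;
    private final List<String> messages = new ArrayList<>();
    private Locator locator;

    TwoPhaseSitemapCheck(String fileName, Set<String> declaredGenerators) {
        this.fileName = fileName;
        this.declaredGenerators = declaredGenerators;
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!SITEMAP_NS.equals(uri) || !"generate".equals(localName)) {
            return;
        }
        String where = fileName + ":"
            + (locator == null ? "?:?" : locator.getLineNumber() + ":" + locator.getColumnNumber());
        // Phase 1: syntax, i.e. only attributes the engine is waiting for.
        boolean syntaxOk = true;
        for (int i = 0; i < atts.getLength(); i++) {
            if (!GENERATE_ATTRS.contains(atts.getLocalName(i))) {
                messages.add("invalid '" + atts.getLocalName(i) + "' attribute at " + where);
                syntaxOk = false;
            }
        }
        // Phase 2: semantics, only once the syntax is known to be sound.
        String type = atts.getValue("", "type");
        if (syntaxOk && type != null && !declaredGenerators.contains(type)) {
            messages.add("Cannot find a generator named '" + type + "' at " + where);
        }
    }

    static List<String> check(String fileName, String xml, Set<String> declaredGenerators)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        TwoPhaseSitemapCheck handler = new TwoPhaseSitemapCheck(fileName, declaredGenerators);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.messages;
    }
}
```

The position reported by a Locator is typically the end of the start tag, so the column may differ from where the offending attribute begins; the line is usually enough to find it.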
Thoughts ?
Sylvain
--
Sylvain Wallez Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> Stefano Mazzocchi wrote:
>
>> Sylvain Wallez wrote:
>
>
> <snip/>
>
>>> So it seems to me validation is good to easily write a syntax checker
>>> and let the java code in treeprocessor concentrate on more detailed
>>> "semantic" validation.
>>
>>
>>
>> too bad this is not done.
>
>
>
> So what about *requiring* schema-validation to happen each time a
> sitemap is loaded, i.e. have the use of a validating parser be
> hard-coded in the treeprocessor. This schema-validation phase would be a
> part of the global consistency checks performed by the treeprocessor,
> implemented by tools adequate for this task.
I'm only concerned about getting useful error messages out of sitemap
loading. I don't care how this is achieved.
>
>>> Now the problem, AFAIU, comes more from the fact that we're trying to
>>> validate not only the sitemap, but also the configuration of each
>>> component, which may take very various forms and obey to some
>>> complicated logic.
>>
>>
>>
>> yes, but this is an argument on how the sitemap descriptor is defined.
>> Why does everybody think that validation and schemas are synonyms?
>
>
>
> I don't think they are synonyms, but that a schema is an adequate tool
> to easily perform the first phase of a global validation process.
From a developer's point of view, sure. From an error message
readability point of view, I strongly doubt it since treeprocessor can
have much better and meaningful error messages than any validation stage.
I'm not being negative, I'm just trying to reduce the number of
misconfiguration questions that will happen on cocoon-users as soon as
we release Cocoon 2.1.
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:
> Sylvain Wallez wrote:
<snip/>
>> So it seems to me validation is good to easily write a syntax checker
>> and let the java code in treeprocessor concentrate on more detailed
>> "semantic" validation.
>
>
> too bad this is not done.
So what about *requiring* schema-validation to happen each time a
sitemap is loaded, i.e. have the use of a validating parser be
hard-coded in the treeprocessor. This schema-validation phase would be a
part of the global consistency checks performed by the treeprocessor,
implemented by tools adequate for this task.
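A minimal sketch of such a load-time validation step, using the javax.xml.validation API from later JDKs. Two assumptions are swapped in: the JDK only guarantees W3C XML Schema support, so the sketch validates against an XSD rather than the RNG grammar discussed here (RELAX NG would need an additional library such as Jing), and the LoadTimeValidation class is hypothetical. Errors are collected with their line and column rather than aborting on the first one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical illustration; the treeprocessor does not contain this class.
class LoadTimeValidation {
    static List<String> validate(String schemaXsd, String documentXml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(schemaXsd)));
        Validator validator = schema.newValidator();

        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                // Warnings are ignored in this sketch.
            }
            @Override public void error(SAXParseException e) {
                errors.add(format(e)); // recoverable: keep going and collect
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: nothing sensible left to check
            }
        });
        validator.validate(new StreamSource(new StringReader(documentXml)));
        return errors;
    }

    static String format(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```

A sitemap loader could run this first and hand the document to its own semantic checks only when the returned list is empty.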
>> Now the problem, AFAIU, comes more from the fact that we're trying to
>> validate not only the sitemap, but also the configuration of each
>> component, which may take very various forms and obey to some
>> complicated logic.
>
>
> yes, but this is an argument on how the sitemap descriptor is defined.
> Why does everybody think that validation and schemas are synonyms?
I don't think they are synonyms, but that a schema is an adequate tool
to easily perform the first phase of a global validation process.
Sylvain
--
Sylvain Wallez Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> Stefano Mazzocchi wrote:
>
>> David Crossley wrote:
>
>
> <snip/>
>
>>> -- o --
>>> The other purpose of my original message was to raise an alarm
>>> about the change in parameter names in the sitemap, which now
>>> do not correspond with code o.a.c.transformation.TraxTransformer
>>
>>
>>
>> I'm more and more considering sitemap validation harmful.
>>
>> why:
>>
>> 1) the sitemap logic is too hard to be validated from any validation
>> language (it requires java runtime capabilities)
>>
>> 2) it reduces the effort of clean and meaningful error messages in the
>> treeprocessor
>>
>> Example, try
>>
>> <generate uri="..."/>
>>
>> where the uri attribute is not allowed in generate (should be 'src'),
>> the treeprocessor totally ignores this and sends the empty string to
>> the parser, resulting in the error
>>
>> System ID not found!
>>
>> Sitemap validation has stopped us from fixing the error messaging
>> capabilities on mistakes.
>>
>> I propose to blast the sitemap validation altogether.
>
>
>
> I don't follow you : a schema, although it cannot fully validate a
> sitemap, can easily check syntax inconsistencies like <map:generate
> uri="..."/>. The checks performed by the treeprocessor come at a lower
> level, such as checking that a used component (type="...") exists,
> check variable expansion syntax, etc.
Yes, this is *exactly* the reasoning that makes the treeprocessor error
messages meaningful only if you hit a spot that is not validatable by
the sitemap.
You are assuming people validate the sitemap before entering it in the
system.
Unfortunately, this is not automated internally and, externally, only
few people do.
Result: the level of error message friendliness for treeprocessor syntax
errors is poor and users are more often misled by them than helped.
> So it seems to me validation is good to easily write a syntax checker
> and let the java code in treeprocessor concentrate on more detailed
> "semantic" validation.
Too bad this is not done.
> Now the problem, AFAIU, comes more from the fact that we're trying to
> validate not only the sitemap, but also the configuration of each
> component, which may take widely varying forms and obey some
> complicated logic.
Yes, but this is an argument about how the sitemap descriptor is defined.
Why does everybody think that validation and schemas are synonyms?
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:
> David Crossley wrote:
<snip/>
>> -- o --
>> The other purpose of my original message was to raise an alarm
>> about the change in parameter names in the sitemap, which now
>> do not correspond with code o.a.c.transformation.TraxTransformer
>
>
> I'm more and more considering sitemap validation harmful.
>
> why:
>
> 1) the sitemap logic is too hard to be validated from any validation
> language (it requires Java runtime capabilities)
>
> 2) it reduces the effort put into clean and meaningful error messages
> in the treeprocessor
>
> Example, try
>
> <generate uri="..."/>
>
> where the uri attribute is not allowed in generate (should be 'src'),
> the treeprocessor totally ignores this and sends the empty string to
> the parser, resulting in the error
>
> System ID not found!
>
> Sitemap validation has stopped us from fixing the error messaging
> capabilities on mistakes.
>
> I propose to blast the sitemap validation altogether.
I don't follow you: a schema, although it cannot fully validate a
sitemap, can easily check syntax inconsistencies like <map:generate
uri="..."/>. The checks performed by the treeprocessor come at a lower
level, such as checking that a used component (type="...") exists,
checking variable expansion syntax, etc.
So it seems to me validation is good to easily write a syntax checker
and let the Java code in treeprocessor concentrate on more detailed
"semantic" validation.
Now the problem, AFAIU, comes more from the fact that we're trying to
validate not only the sitemap, but also the configuration of each
component, which may take widely varying forms and obey some
complicated logic.
Something I've been thinking of for a long time (but, as usual, never had
the time to make real) is a "CheckableConfiguration": a special
implementation of the Configuration which would track usage of its data
and could be queried after use for unused elements or attributes.
This would allow a very simple but complete validation:
- build the sitemap (the whole file is read into a Configuration object),
- look up each component in <map:components> once to be sure they have
been configured,
- check your CheckableConfiguration for unused items.
Every unused item is a potential syntax error, and you can report them
all at once.
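To make the idea concrete, here is a minimal sketch of such a usage-tracking
configuration. It is only an illustration under assumptions: the class below
is standalone and hypothetical, it does not implement the real Avalon
Configuration interface, and the method names (attribute, child,
unusedItems) are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a configuration node that remembers which of its data was read. */
final class CheckableConfiguration {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<CheckableConfiguration> children = new ArrayList<>();
    private final Set<String> usedAttributes = new LinkedHashSet<>();
    private boolean used = false;

    CheckableConfiguration(String name) {
        this.name = name;
    }

    /** Builder-style helper (assumed): adds an attribute as it appeared in the file. */
    CheckableConfiguration attribute(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    /** Builder-style helper (assumed): adds a child element. */
    CheckableConfiguration child(CheckableConfiguration c) {
        children.add(c);
        return this;
    }

    /** Reading an attribute marks it as used, whether or not it exists. */
    String getAttribute(String key) {
        usedAttributes.add(key);
        return attributes.get(key);
    }

    /** Returns the first child with the given name and marks it as used, or null. */
    CheckableConfiguration getChild(String childName) {
        for (CheckableConfiguration c : children) {
            if (c.name.equals(childName)) {
                c.used = true;
                return c;
            }
        }
        return null;
    }

    /** Collects the paths of all elements and attributes that were never read. */
    List<String> unusedItems() {
        List<String> out = new ArrayList<>();
        collect("", out);
        return out;
    }

    private void collect(String parentPath, List<String> out) {
        String path = parentPath + "/" + name;
        for (String key : attributes.keySet()) {
            if (!usedAttributes.contains(key)) {
                out.add(path + "/@" + key);
            }
        }
        for (CheckableConfiguration c : children) {
            if (c.used) {
                c.collect(path, out);          // descend into elements that were read
            } else {
                out.add(path + "/" + c.name);  // a whole element nobody asked for
            }
        }
    }
}
```

With this, a component that asks for "src" on an element that only carries
"uri" leaves the uri attribute unread, so it shows up in unusedItems()
together with every other leftover, instead of being silently ignored.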
Thoughts ?
Sylvain
--
Sylvain Wallez Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Re: validation of config during build (Was: Re: sitemap validation
is broken)
Posted by Stefano Mazzocchi <st...@apache.org>.
David Crossley wrote:
> Steven Noels wrote:
>
>>David Crossley wrote:
>>
>>
>>>The <map:transformer> used to have children like
>>><use-session-info> etc. but these have been recently
>>>changed to <use-session-parameters> etc. However,
>>>the sitemap.rng still has the former. Also, the code in
>>>o.a.c.transformation.TraxTransformer still uses the
>>>former. Are these just typos in the new sitemap.xmap?
>>
>>This brings
>>http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=103847911212458&w=2
>>back to mind. What 'level' of validation are we able to obtain if we try
>>to put everything into one grammar, which must fit each and every
>>component...? Hm. Bruno is on skiing holiday ATM, I'll refer him to this
>>when he's back.
>
>
> I look forward to Bruno's expertise and help on this
> and other matters.
>
> Yes, i agree with the need for a more clever, component-based
> approach as soon as possible.
>
> In the meantime, we need something to help keep the new build
> on the rails. Already errors have crept in.
>
> I just added the few basic validations that we had in the
> old build:
> * cocoon.roles - quite strict structural validation.
> * cocoon.xconf - extremely loose validation, could be improved.
> * sitemap.xmap - cumbersome, but works for now.
>
> The one thing that is missing at the moment is the ability
> to have validate.config=false to get around any immediate
> build issues. I will try to add that today. (However i am away
> for one week holiday, then catch-up.)
>
> So the sitemap validation and the roles validation are
> commented-out in the current build. Pity, because that does
> not put the build issues in everyone's face.
>
> -- o --
> The other purpose of my original message was to raise an alarm
> about the change in parameter names in the sitemap, which now
> do not correspond with code o.a.c.transformation.TraxTransformer
I'm more and more considering sitemap validation harmful.
why:
1) the sitemap logic is too hard to be validated from any validation
language (it requires Java runtime capabilities)
2) it reduces the effort put into clean and meaningful error messages in
the treeprocessor
Example, try
<generate uri="..."/>
where the uri attribute is not allowed in generate (should be 'src'),
the treeprocessor totally ignores this and sends the empty string to the
parser, resulting in the error
System ID not found!
Sitemap validation has stopped us from fixing the error messaging
capabilities on mistakes.
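For illustration, the kind of load-time check that would give a readable
message instead of "System ID not found!" could look roughly like the sketch
below. It is not the treeprocessor code: the class name is hypothetical, and
the list of allowed attributes (src, type, label) is an assumption made for
the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

/** Hypothetical sketch: report unknown attributes on an element by name. */
final class GenerateAttributeCheck {
    // Assumed for illustration only; not taken from the real sitemap grammar.
    private static final Set<String> ALLOWED =
            new TreeSet<>(Arrays.asList("src", "type", "label"));

    /** Parses a single element and returns one readable message per unknown attribute. */
    static List<String> check(String xml) {
        Element element;
        try {
            element = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
        List<String> problems = new ArrayList<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            String attrName = attrs.item(i).getNodeName();
            if (!ALLOWED.contains(attrName)) {
                problems.add("<" + element.getNodeName() + ">: unknown attribute '"
                        + attrName + "'; expected one of " + ALLOWED);
            }
        }
        return problems;
    }
}
```

Called on <generate uri="..."/>, this names the offending attribute and lists
the accepted ones, which is the kind of message a user can act on.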
I propose to blast the sitemap validation altogether.
validation of config during build (Was: Re: sitemap validation is
broken)
Posted by David Crossley <cr...@indexgeo.com.au>.
Steven Noels wrote:
> David Crossley wrote:
>
> > The <map:transformer> used to have children like
> > <use-session-info> etc. but these have been recently
> > changed to <use-session-parameters> etc. However,
> > the sitemap.rng still has the former. Also, the code in
> > o.a.c.transformation.TraxTransformer still uses the
> > former. Are these just typos in the new sitemap.xmap?
>
> This brings
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=103847911212458&w=2
> back to mind. What 'level' of validation are we able to obtain if we try
> to put everything into one grammar, which must fit each and every
> component...? Hm. Bruno is on skiing holiday ATM, I'll refer him to this
> when he's back.
I look forward to Bruno's expertise and help on this
and other matters.
Yes, i agree with the need for a more clever, component-based
approach as soon as possible.
In the meantime, we need something to help keep the new build
on the rails. Already errors have crept in.
I just added the few basic validations that we had in the
old build:
* cocoon.roles - quite strict structural validation.
* cocoon.xconf - extremely loose validation, could be improved.
* sitemap.xmap - cumbersome, but works for now.
The one thing that is missing at the moment is the ability
to have validate.config=false to get around any immediate
build issues. I will try to add that today. (However i am away
for one week holiday, then catch-up.)
So the sitemap validation and the roles validation are
commented-out in the current build. Pity, because that does
not put the build issues in everyone's face.
-- o --
The other purpose of my original message was to raise an alarm
about the change in parameter names in the sitemap, which now
do not correspond with code o.a.c.transformation.TraxTransformer
--David
Re: sitemap validation is broken
Posted by Steven Noels <st...@outerthought.org>.
David Crossley wrote:
> The <map:transformer> used to have children like
> <use-session-info> etc. but these have been recently
> changed to <use-session-parameters> etc. However,
> the sitemap.rng still has the former. Also, the code in
> o.a.c.transformation.TraxTransformer still uses the
> former. Are these just typos in the new sitemap.xmap?
This brings
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=103847911212458&w=2
back to mind. What 'level' of validation are we able to obtain if we try
to put everything into one grammar, which must fit each and every
component...? Hm. Bruno is on skiing holiday ATM, I'll refer him to this
when he's back.
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org stevenn at apache.org