Posted to dev@cocoon.apache.org by David Crossley <cr...@indexgeo.com.au> on 2003/03/05 03:02:16 UTC

validation of config during build (Was: Re: sitemap validation is broken)

Steven Noels wrote:
> David Crossley wrote:
> 
> > The <map:transformer> used to have children like
> > <use-session-info> etc. but these have been recently
> > changed to <use-session-parameters> etc. However,
> > the sitemap.rng still has the former. Also, the code in
> > o.a.c.transformation.TraxTransformer still uses the
> > former. Are these just typos in the new sitemap.xmap?
> 
> This brings 
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=103847911212458&w=2 
> back to mind. What 'level' of validation are we able to obtain if we try 
> to put everything into one grammar, which must fit each and every 
> component...? Hm. Bruno is on skiing holiday ATM, I'll refer him to this 
> when he's back.

I look forward to Bruno's expertise and help on this
and other matters.

Yes, I agree with the need for a more clever, component-based
approach as soon as possible.

In the meantime, we need something to help keep the new build
on the rails. Already errors have crept in.

I just added the few basic validations that we had in the
old build:
* cocoon.roles - quite strict structural validation.
* cocoon.xconf - extremely loose validation, could be improved.
* sitemap.xmap - cumbersome, but works for now.

The one thing that is missing at the moment is the ability
to set validate.config=false to get around any immediate
build issues. I will try to add that today. (However, I am away
on holiday for one week, and will then need to catch up.)

So the sitemap validation and the roles validation are
commented-out in the current build. Pity, because that does
not put the build issues in everyone's face.

                         -- o --
The other purpose of my original message was to raise an alarm
about the change in parameter names in the sitemap, which now
do not correspond with the code in o.a.c.transformation.TraxTransformer

--David


Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Niclas Hedhman <ni...@internuscorp.com>.
On Friday 07 March 2003 19:32, Stefano Mazzocchi wrote:

> I'm more and more considering sitemap validation harmful.

> I propose to blast the sitemap validation altogether.

Hmmm... It must be this OS thing ;o)
You should perhaps have added, it is a Concern of the Creator, not the 
Executor.

I agree:
a. XML validation at runtime just does not give anything understandable back 
to the user if there are errors.

b. Document (sitemap included) validation belongs to the creation process. No need 
to redo the work...


Niclas

Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:

> Well, I thought I made it clear : although I've not considered the 
> technical details now, I would like to integrate schema-driven syntax 
> checks (I avoid the ambiguous "validation" word) _inside_ the 
> treeprocessor (i.e. at sitemap load-time) to be sure that the sitemap is 
> correct since we cannot assume each user will perform pre-runtime checks.
> 
> The technical details I'm referring to are how we can get meaningful 
> messages from schema-driven syntax checks, so that we can display them to 
> the user.
> 
> The benefits of this approach are IMO multiple :
> - runtime checks, offline checks and schema-driven editors use a single 
> definition of the sitemap grammar,
> - since this grammar becomes an integral part of the sitemap engine, it 
> ensures its consistency and long-term maintenance.
> 
> Deal ?

Yes, but only if the sitemap RNG schema stops trying to validate the 
component parameters and accepts transparently what is given in a 
namespace which is not the sitemap's.
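
A rough Python sketch of that condition, purely illustrative: it is not the RNG schema and not Cocoon code. The attribute table is a made-up subset, the `urn:example:component-config` namespace is hypothetical, and the sitemap namespace URI is assumed. Elements in the sitemap namespace are checked; anything in another namespace is accepted without being inspected.

```python
import xml.etree.ElementTree as ET

# Assumed sitemap namespace URI.
SITEMAP_NS = "http://apache.org/cocoon/sitemap/1.0"

# Hypothetical, deliberately tiny table of attributes allowed per sitemap element.
ALLOWED_ATTRS = {
    "match": {"pattern", "type"},
    "generate": {"src", "type"},
    "transform": {"src", "type"},
    "serialize": {"type"},
}


def check_sitemap(xml_text):
    """Return a list of error strings; foreign-namespace subtrees are never inspected."""
    errors = []
    prefix = "{" + SITEMAP_NS + "}"

    def visit(elem):
        if not elem.tag.startswith(prefix):
            return  # not a sitemap element: accept the whole subtree as given
        name = elem.tag[len(prefix):]
        allowed = ALLOWED_ATTRS.get(name)
        if allowed is not None:
            for attr in elem.attrib:
                if attr.startswith("{"):
                    continue  # namespaced attribute: also accepted as given
                if attr not in allowed:
                    errors.append(
                        "unexpected attribute '%s' on <map:%s>" % (attr, name))
        for child in elem:
            visit(child)

    visit(ET.fromstring(xml_text))
    return errors


SAMPLE = """
<map:match xmlns:map="http://apache.org/cocoon/sitemap/1.0"
           xmlns:x="urn:example:component-config" pattern="news/**">
  <map:generate uri="news.xml"/>
  <map:transform src="news2html.xsl">
    <x:use-session-parameters anything="goes"/>
  </map:transform>
  <map:serialize/>
</map:match>
"""

print(check_sitemap(SAMPLE))
```

Only the misspelled `uri` attribute on `map:generate` is reported; the component parameter in the foreign namespace passes through untouched.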



Schematron validation (Re: validation of config during build)

Posted by Jeff Turner <je...@apache.org>.
On Sat, Mar 08, 2003 at 10:03:31AM +0100, Sylvain Wallez wrote:
> Steven Noels wrote:
> 
> >Stefano Mazzocchi wrote:
> >
> >>Steven Noels wrote:
> >
> >>>Looking at the history of sitemap-v06.rng, I can't see this has been 
> >>>happening a lot. Quite contrarily, some (myself included) have been 
> >>>advocating to relax it even further. But dropping it will 
> >>>effectively kill the small circle of people interested in 
> >>>maintaining such a thing.
> >>>
> >>>Reasonable?
> >>
> >>
> >><read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>
> >>
> >>is that clear enough? should I repeat it?
> >>
> >>I'm suggesting to remove the validation target from the build system 
> >>and improve the way treeprocessor handles errors.
> >>
> >>As I said, I don't care *how* this is done, as long as the error 
> >>messages that users receive are much more meaningful than those silly 
> >>"System ID not found" when an attribute name is wrong.
> >
> >
> >I'm going to be stubborn about this: _if_ we drop the target (I was 
> >already aware of you not pushing to drop the schema, no problem here), 
> >then the few people who care about the schema won't be warned about 
> >required changes anymore.
> >
> >I don't see any relation between the grammar, where and when it should 
> >be used, and the lack of exception handling code in the tree processor.
> >
> >But since we are the only ones who care to continue this thread, let's 
> >drop it. I'm going to check what Sylvain has to say about it.
> 
> 
> Well, I thought I made it clear : although I've not considered the 
> technical details now, I would like to integrate schema-driven syntax 
> checks (I avoid the ambiguous "validation" word) _inside_ the 
> treeprocessor (i.e. at sitemap load-time) to be sure that the sitemap is 
> correct since we cannot assume each user will perform pre-runtime checks.
> 
> The technical details I'm referring to are how we can get meaningful 
> messages from schema-driven syntax checks, so that we can display them to 
> the user.
> 
> The benefits of this approach are IMO multiple :
> - runtime checks, offline checks and schema-driven editors use a single 
> definition of the sitemap grammar,
> - since this grammar becomes an integral part of the sitemap engine, it 
> ensures its consistency and long-term maintenance.

My rather unhelpful 2c (knee-deep in Forrest ATM):

For validating an incrementally composed structure like the sitemap,
it might be better to use a rule-based language like Schematron, rather
than RelaxNG/XSD.  Reasons being:

 - In Schematron, the XML is considered valid by default, and
   subsequently constrained.  This assumption makes sense for the
   sitemap, because people will always be defining new components outside
   our control.

 - Schematron handles co-occurrence constraints, e.g. "only check for
   map:generator/driver if @src contains 'XMLDBGenerator'".  With RNG I
   think we'd have to say "anything goes" in user-extensible sections.

 - Schematron schemas combine very naturally.  We could have a schema per
   block, validating just that block's elements, and apply the block
   schemas iteratively.  With RNG, merging block schemas would be *much*
   harder.

 - Jing error messages are completely awful.

Perhaps a Schematron variant subsetted to support STXPath [1] would be
the best Cocoon validation system.  
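
To illustrate the rule-based idea, here is a small Python sketch. It is not Schematron itself and not Cocoon code; the rule format, the rule sets, the un-namespaced `driver` child, and the sitemap namespace URI are all assumptions made for the example. The document is valid by default, each block contributes its own list of rules, and the rule sets are applied one after another.

```python
import xml.etree.ElementTree as ET

# Assumed sitemap namespace URI.
NS = {"map": "http://apache.org/cocoon/sitemap/1.0"}

# A rule is (context path, assertion on each matched element, failure message).
# Hypothetical core rule: every generator declaration carries a name.
CORE_RULES = [
    (".//map:generator",
     lambda e: "name" in e.attrib,
     "generator needs a name"),
]

# Hypothetical rule shipped by one block: a co-occurrence constraint that
# only applies when @src mentions XMLDBGenerator.
XMLDB_BLOCK_RULES = [
    (".//map:generator",
     lambda e: "XMLDBGenerator" not in e.get("src", "")
     or e.find("driver") is not None,
     "XMLDBGenerator needs a <driver> child"),
]


def validate(xml_text, *rule_sets):
    """Apply each rule set in turn; anything no rule mentions stays valid."""
    root = ET.fromstring(xml_text)
    failures = []
    for rules in rule_sets:
        for context, test, message in rules:
            for elem in root.findall(context, NS):
                if not test(elem):
                    failures.append(message)
    return failures


SAMPLE = """
<map:generators xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:generator name="file" src="org.apache.cocoon.generation.FileGenerator"/>
  <map:generator name="xmldb" src="org.apache.cocoon.generation.XMLDBGenerator"/>
  <map:generator name="mine" src="com.example.MyGenerator"><whatever/></map:generator>
</map:generators>
"""

print(validate(SAMPLE, CORE_RULES))
print(validate(SAMPLE, CORE_RULES, XMLDB_BLOCK_RULES))
```

The user-defined `com.example.MyGenerator` with its unknown child is never flagged, because no rule constrains it; adding the block's rule set only adds that block's constraints.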


--Jeff

(who had a long and fascinating talk with Rick Jelliffe a few weeks ago,
and is now a convert, at least until I meet James Clark)


[1] http://www.xml.com/pub/a/2003/02/26/stx.html

> Deal ?
> 
> Sylvain
> 
> -- 
> Sylvain Wallez                                  Anyware Technologies
> http://www.apache.org/~sylvain           http://www.anyware-tech.com
> { XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
> 
> 

Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Steven Noels wrote:

> Stefano Mazzocchi wrote:
>
>> Steven Noels wrote:
>
>>> Looking at the history of sitemap-v06.rng, I can't see this has been 
>>> happening a lot. Quite contrarily, some (myself included) have been 
>>> advocating to relax it even further. But dropping it will 
>>> effectively kill the small circle of people interested in 
>>> maintaining such a thing.
>>>
>>> Reasonable?
>>
>>
>> <read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>
>>
>> is that clear enough? should I repeat it?
>>
>> I'm suggesting to remove the validation target from the build system 
>> and improve the way treeprocessor handles errors.
>>
>> As I said, I don't care *how* this is done, as long as the error 
>> messages that users receive are much more meaningful than those silly 
>> "System ID not found" when an attribute name is wrong.
>
>
> I'm going to be stubborn about this: _if_ we drop the target (I was 
> already aware of you not pushing to drop the schema, no problem here), 
> then the few people who care about the schema won't be warned about 
> required changes anymore.
>
> I don't see any relation between the grammar, where and when it should 
> be used, and the lack of exception handling code in the tree processor.
>
> But since we are the only ones who care to continue this thread, let's 
> drop it. I'm going to check what Sylvain has to say about it.


Well, I thought I made it clear : although I've not considered the 
technical details now, I would like to integrate schema-driven syntax 
checks (I avoid the ambiguous "validation" word) _inside_ the 
treeprocessor (i.e. at sitemap load-time) to be sure that the sitemap is 
correct since we cannot assume each user will perform pre-runtime checks.

The technical details I'm referring to are how we can get meaningful 
messages from schema-driven syntax checks, so that we can display them to 
the user.

The benefits of this approach are IMO multiple :
- runtime checks, offline checks and schema-driven editors use a single 
definition of the sitemap grammar,
- since this grammar becomes an integral part of the sitemap engine, it 
ensures its consistency and long-term maintenance.

Deal ?

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }



Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:
> Steven Noels wrote:


>> Looking at the history of sitemap-v06.rng, I can't see this has been 
>> happening a lot. Quite contrarily, some (myself included) have been 
>> advocating to relax it even further. But dropping it will effectively 
>> kill the small circle of people interested in maintaining such a thing.
>>
>> Reasonable?
> 
> 
> <read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>
> 
> is that clear enough? should I repeat it?
> 
> I'm suggesting to remove the validation target from the build system and 
> improve the way treeprocessor handles errors.
> 
> As I said, I don't care *how* this is done, as long as the error 
> messages that users receive are much more meaningful than those silly 
> "System ID not found" when an attribute name is wrong.

I'm going to be stubborn about this: _if_ we drop the target (I was 
already aware of you not pushing to drop the schema, no problem here), 
then the few people who care about the schema won't be warned about 
required changes anymore.

I don't see any relation between the grammar, where and when it should 
be used, and the lack of exception handling code in the tree processor.

But since we are the only ones who care to continue this thread, let's 
drop it. I'm going to check what Sylvain has to say about it.

Cheers,

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org


Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Stefano Mazzocchi <st...@apache.org>.
Steven Noels wrote:
> Stefano Mazzocchi wrote:
> 
>> Did I say that I consider having a sitemap schema descriptor harmful?
>>
>> No, damn, I just said that I consider using that schema to validate 
>> the sitemap harmful.
> 
> 
> Let's agree that there exist multiple levels of validation, and that 
> each of them has its own merits. Coincidentally however, XML grammars 
> are also used to drive editors, and since the result of this editing is 
> fed into Java code, it better tries to attain the same level of 
> validation, as close as possible, if at all possible.
> 
> 'Which' schema do you mean here...: sitemap-v06.rng, or _any_ XSD/RNG 
> grammar at all? Sorry - just want to know.

I have a hard time explaining myself today, could be this new operating 
system.

>>>> Example, try
>>>>
>>>>  <generate uri="..."/>
>>>>
>>>> where the uri attribute is not allowed in generate (should be 
>>>> 'src'), the treeprocessor totally ignores this and sends the empty 
>>>> string to the parser, resulting in the error
>>>>
>>>>  System ID not found!
>>>>
>>>> Sitemap validation has stopped us from fixing the error messaging 
>>>> capabilities on mistakes.
>>>
>>>
>>>
>>>
>>> I don't parse this: in what way does the sitemap validation relieve 
>>> somebody of the task of properly handling exceptions on the code level?
>>
>>
>>
>> The level of error-checking of the treeprocessor isn't really that 
>> pretty, and you know why? Because validation removed most of the mistakes 
>> that *we* developers make... but when users don't validate, they come up 
>> with *weird* error messages that don't give them *any* clue whatsoever 
>> on how to fix the problem.
> 
> 
> Agree on the user aspect. But I don't follow the logic that the lack of 
> error-checking in code is _caused_ by the validation process. That's 
> just too fast to conclude.
> 
>> My reasoning is that if we didn't have validation, we would see the 
>> same mistakes the users see and fix the treeprocessor instead of 
>> patching more and more the validation phase.
> 
> 
> Looking at the history of sitemap-v06.rng, I can't see this has been 
> happening a lot. Quite contrarily, some (myself included) have been 
> advocating to relax it even further. But dropping it will effectively 
> kill the small circle of people interested in maintaining such a thing.
> 
> Reasonable?

<read my lips> I AM NOT SUGGESTING TO DROP THE SCHEMA!!! </read my lips>

is that clear enough? should I repeat it?

I'm suggesting to remove the validation target from the build system and 
improve the way treeprocessor handles errors.

As I said, I don't care *how* this is done, as long as the error 
messages that users receive are much more meaningful than those silly 
"System ID not found" when an attribute name is wrong.


Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:

> Did I say that I consider having a sitemap schema descriptor harmful?
> 
> No, damn, I just said that I consider using that schema to validate the 
> sitemap harmful.

Let's agree that there exist multiple levels of validation, and that 
each of them has its own merits. Coincidentally however, XML grammars 
are also used to drive editors, and since the result of this editing is 
fed into Java code, it better tries to attain the same level of 
validation, as close as possible, if at all possible.

'Which' schema do you mean here...: sitemap-v06.rng, or _any_ XSD/RNG 
grammar at all? Sorry - just want to know.

>>> Example, try
>>>
>>>  <generate uri="..."/>
>>>
>>> where the uri attribute is not allowed in generate (should be 'src'), 
>>> the treeprocessor totally ignores this and sends the empty string to 
>>> the parser, resulting in the error
>>>
>>>  System ID not found!
>>>
>>> Sitemap validation has stopped us from fixing the error messaging 
>>> capabilities on mistakes.
>>
>>
>>
>> I don't parse this: in what way does the sitemap validation relieve 
>> somebody of the task of properly handling exceptions on the code level?
> 
> 
> The level of error-checking of the treeprocessor isn't really that 
> pretty, and you know why? Because validation removed most of the mistakes 
> that *we* developers make... but when users don't validate, they come up 
> with *weird* error messages that don't give them *any* clue whatsoever 
> on how to fix the problem.

Agree on the user aspect. But I don't follow the logic that the lack of 
error-checking in code is _caused_ by the validation process. That's 
just too fast to conclude.

> My reasoning is that if we didn't have validation, we would see the same 
> mistakes the users see and fix the treeprocessor instead of patching 
> more and more the validation phase.

Looking at the history of sitemap-v06.rng, I can't see this has been 
happening a lot. Quite contrarily, some (myself included) have been 
advocating to relax it even further. But dropping it will effectively 
kill the small circle of people interested in maintaining such a thing.

Reasonable?

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org


Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Stefano Mazzocchi <st...@apache.org>.
Steven Noels wrote:
> Stefano Mazzocchi wrote:
> 
>> I'm more and more considering sitemap validation harmful.
>>
>> why:
>>
>> 1) the sitemap logic is too hard to be validated from any validation 
>> language (it requires java runtime capabilities)
>>
>> 2) it reduces the effort of clean and meaningful error messages in the 
>> treeprocessor
> 
> 
> 'Interesting' perspective, to say the least.
> 
> Some thoughts:
> 
> 1) http://outerthought.net/downloads/sitemap.pdf and 
> http://outerthought.net/downloads/sitemap_a4_poster.pdf
> 
> cat /usr/local/apache/logs/access_log | grep sitemap.pdf | wc -l -> 1825 
>  downloads in 3 months (dec-jan-feb). Add some 2500 in the 4 months 
> preceding that period. And another 2500 for the poster version, brings 
> us to a total of 975 downloads / month for Bruno's sitemap poster.
> 
> ... which means there's a _vested_ interest in trying to understand 
> the sitemap, and people are even willing to look at some graphical 
> depiction of it in order to understand.
> 
> 2) In our experience, when we confront people with the sitemap, they are 
> bewildered until we give them a copy of Pollo with the sitemap grammar 
> loaded into it and some very basic customization 
> (http://pollo.sourceforge.net/sitemap1.png). I assume the same happens 
> when people see Sunbow. Needless to say, having 3 different grammars for 
> the sitemap (XSD, RNG and a Pollo-specific grammar) is a major PITA, 
> so some rationalization is more than appropriate.
> 
> 3) Some days ago when investigating 
> http://marc.theaimsgroup.com/?t=104643526200004&r=1&w=2, I encountered 
> some way to 'address' a matched group of a matcher pattern when nesting 
> matchers which I never heard of, and already forgot about it ATM. :-( I 
> can say for myself that I make a reasonable effort in keeping up with 
> new-things-Cocoon, but it was something I clearly missed. I'm pretty 
> sure it is only 'documented in code' or on the mailing list somewhere.

Did I say that I consider having a sitemap schema descriptor harmful?

No, damn, I just said that I consider using that schema to validate the 
sitemap harmful.

>> Example, try
>>
>>  <generate uri="..."/>
>>
>> where the uri attribute is not allowed in generate (should be 'src'), 
>> the treeprocessor totally ignores this and sends the empty string to 
>> the parser, resulting in the error
>>
>>  System ID not found!
>>
>> Sitemap validation has stopped us from fixing the error messaging 
>> capabilities on mistakes.
> 
> 
> I don't parse this: in what way does the sitemap validation relieve 
> somebody of the task of properly handling exceptions on the code level?

The level of error-checking of the treeprocessor isn't really that 
pretty, and you know why? Because validation removed most of the mistakes 
that *we* developers make... but when users don't validate, they come up 
with *weird* error messages that don't give them *any* clue whatsoever 
on how to fix the problem.

My reasoning is that if we didn't have validation, we would see the same 
mistakes the users see and fix the treeprocessor instead of patching 
more and more the validation phase.




Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:

> I'm more and more considering sitemap validation harmful.
> 
> why:
> 
> 1) the sitemap logic is too hard to be validated from any validation 
> language (it requires java runtime capabilities)
> 
> 2) it reduces the effort of clean and meaningful error messages in the 
> treeprocessor

'Interesting' perspective, to say the least.

Some thoughts:

1) http://outerthought.net/downloads/sitemap.pdf and 
http://outerthought.net/downloads/sitemap_a4_poster.pdf

cat /usr/local/apache/logs/access_log | grep sitemap.pdf | wc -l -> 1825 
  downloads in 3 months (dec-jan-feb). Add some 2500 in the 4 months 
preceding that period. And another 2500 for the poster version, brings 
us to a total of 975 downloads / month for Bruno's sitemap poster.
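
For the record, the arithmetic behind that figure, as a tiny Python check. It assumes the 2500 poster downloads cover the same seven months as the PDF counts, which the total above implies but the log line does not show.

```python
# Download counts quoted above.
pdf_last_three_months = 1825      # dec-jan-feb, from the access_log count
pdf_previous_four_months = 2500   # "some 2500" in the 4 months before
poster_downloads = 2500           # poster version; assumed to span the same 7 months

months = 3 + 4
total = pdf_last_three_months + pdf_previous_four_months + poster_downloads
per_month = total / months

print(total, per_month)
```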

... which means there's a _vested_ interest in trying to understand 
the sitemap, and people are even willing to look at some graphical 
depiction of it in order to understand.

2) In our experience, when we confront people with the sitemap, they are 
bewildered until we give them a copy of Pollo with the sitemap grammar 
loaded into it and some very basic customization 
(http://pollo.sourceforge.net/sitemap1.png). I assume the same happens 
when people see Sunbow. Needless to say, having 3 different grammars for 
the sitemap (XSD, RNG and a Pollo-specific grammar) is a major PITA, 
so some rationalization is more than appropriate.

3) Some days ago when investigating 
http://marc.theaimsgroup.com/?t=104643526200004&r=1&w=2, I encountered 
some way to 'address' a matched group of a matcher pattern when nesting 
matchers which I never heard of, and already forgot about it ATM. :-( I 
can say for myself that I make a reasonable effort in keeping up with 
new-things-Cocoon, but it was something I clearly missed. I'm pretty 
sure it is only 'documented in code' or on the mailing list somewhere.

> Example, try
> 
>  <generate uri="..."/>
> 
> where the uri attribute is not allowed in generate (should be 'src'), 
> the treeprocessor totally ignores this and sends the empty string to the 
> parser, resulting in the error
> 
>  System ID not found!
> 
> Sitemap validation has stopped us from fixing the error messaging 
> capabilities on mistakes.

I don't parse this: in what way does the sitemap validation relieve 
somebody of the task of properly handling exceptions on the code level?

> I propose to blast the sitemap validation altogether.

OK. I know I'm sounding harsh and I don't mean to: it's just one of 
these discussions I have had so many times already in my own little company, 
being the only XML-head with two (much smarter) Java-heads. We had the 
same thing with the xReporter report grammar, which admittedly is only 
really handled and interpreted in Java code, yet our initial customer 
wanted to have a proper XML grammar for it.

Why that? For editing purposes. People want to use XML editors for 
editing the sitemap, and these tools _can_ provide proper guidance when 
configured with a grammar. I know we are heading towards your pet peeve 
discussion (*) of pre/post validation Infosets and the various ways each 
of the available grammars suck at grasping these concepts, but still I 
very much believe people will be grateful for anything (apart from 
Java(doc/code)) that guides them during the creation of an XML document, 
or at the least offers them some validation prior to loading the thing 
into Cocoon and see what Cocoon makes out of it.

(*) I must admit this discussion is one of my favorite pet peeves, too ;-)

I agree there is a significant amount of overlap and various levels of 
underspecification for-the-sake-of-simplicity when having both some XML 
grammar and executable code which interprets XML orthogonally to this 
grammar, but still I'm very much +1 for some reasonable quality XML 
grammar, if only to help out our users.

If not, why don't we just specify the sitemap in some own-cooked grammar 
like:

match pattern="news/**"
   match pattern="news/1999/**"
     generate src="oldcontent/news/{1}.html" type="html"
     transform src="styles/old2new.xsl"
   match pattern="news/20*/**"
     generate src="docs/news/20{1}/{2}.xml"
   transform src="news2html.xsl"
   serialize

Gee - I must have been reading too much Python code lately ;-)

Sorry if I sound offensive, I really don't mean to - but it's a personal 
pet peeve ;-)

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at            http://blogs.cocoondev.org/stevenn/
stevenn at outerthought.org                stevenn at apache.org


Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:

> Sylvain Wallez wrote:
>
>> Stefano Mazzocchi wrote:
>>
>>> Sylvain Wallez wrote:
>>
>>
>> <snip/>
>>
>>>> So it seems to me validation is good to easily write a syntax 
>>>> checker and let the java code in treeprocessor concentrate on more 
>>>> detailed "semantic" validation.
>>>
>>>
>>> too bad this is not done.
>>
>>
>> So what about *requiring* schema-validation to happen each time a 
>> sitemap is loaded, i.e. have the use of a validating parser be 
>> hard-coded in the treeprocessor. This schema-validation phase would 
>> be a part of the global consistency checks performed by the 
>> treeprocessor,  implemented by tools adequate for this task.
>
>
> I'm only concerned about getting useful error messages out of sitemap 
> loading. I don't care how this is achieved.
>
>>>> Now the problem, AFAIU, comes more from the fact that we're trying 
>>>> to validate not only the sitemap, but also the configuration of 
>>>> each component, which may take very various forms and obey to some 
>>>> complicated logic.
>>>
>>>
>>> yes, but this is an argument on how the sitemap descriptor is 
>>> defined. Why does everybody think that validation and schemas are 
>>> synonyms?
>>
>>
>> I don't think they are synonyms, but that a schema is an adequate 
>> tool to easily perform the first phase of a global validation process.
>
>
> From a developer's point of view, sure. From an error message 
> readability point of view, I strongly doubt it since treeprocessor can 
> have much better and meaningful error messages than any validation stage.
>
> I'm not being negative, I'm just trying to reduce the number of 
> misconfiguration questions that will happen on cocoon-users as soon as 
> we release cocoon 2.1
>
OK. I understand your concern. Let's recap the various concerns on the 
table in this area :
- you (and we, this was also one of my goals in the treeprocessor) want 
meaningful messages
- I like a first validation phase driven by a schema for its ease of 
development,
- Steven wants schema-driven editors.

So IMO, what we need is to find a way for a schema syntax validator to give 
meaningful messages for what it is in charge of, which is 
checking that elements and attributes are the ones the sitemap engine 
expects.

Taking your previous example, having "invalid 'uri' parameter at 
foo/sitemap.xmap:145:28" seems meaningful to me. Once this phase is 
successful, it's the treeprocessor's responsibility to output messages 
like "Cannot find a generator named 'bar' at foo/sitemap.xmap:145:15" 
(as it already does today).
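
A minimal sketch of the two phases with located messages, written in Python with the standard SAX parser rather than in the treeprocessor's Java. The attribute table, the list of known generators, the `file` default type, and the sample are hypothetical; the only point is that both kinds of message can carry a file:line:column position.

```python
import xml.sax

# Hypothetical syntax table: attributes each sitemap element accepts.
ALLOWED_ATTRS = {
    "map:generate": {"src", "type"},
    "map:transform": {"src", "type"},
    "map:serialize": {"type"},
}

# Hypothetical set of declared generator names, for the semantic phase.
KNOWN_GENERATORS = {"file", "html"}


class SitemapChecker(xml.sax.ContentHandler):
    def __init__(self, path):
        super().__init__()
        self.path = path
        self.errors = []
        self.locator = None

    def setDocumentLocator(self, locator):
        self.locator = locator  # supplied by the parser before parsing starts

    def _where(self):
        return "%s:%d:%d" % (self.path,
                             self.locator.getLineNumber(),
                             self.locator.getColumnNumber())

    def startElement(self, name, attrs):
        allowed = ALLOWED_ATTRS.get(name)
        if allowed is None:
            return  # element not covered by this sketch
        # Phase 1: syntax, i.e. are the attributes the expected ones?
        bad = [a for a in attrs.getNames()
               if not a.startswith("xmlns") and a not in allowed]
        for attr in bad:
            self.errors.append(
                "invalid '%s' attribute at %s" % (attr, self._where()))
        # Phase 2: semantics, only once the syntax is fine.
        if not bad and name == "map:generate":
            gen_type = attrs.get("type", "file")  # assumed default type
            if gen_type not in KNOWN_GENERATORS:
                self.errors.append(
                    "Cannot find a generator named '%s' at %s"
                    % (gen_type, self._where()))


def check(xml_bytes, path):
    handler = SitemapChecker(path)
    xml.sax.parseString(xml_bytes, handler)
    return handler.errors


SAMPLE = b"""<map:pipeline xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:generate uri="news.xml"/>
  <map:generate type="bar" src="news.xml"/>
</map:pipeline>"""

errors = check(SAMPLE, "foo/sitemap.xmap")
for message in errors:
    print(message)
```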

Thoughts ?

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }



Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> Stefano Mazzocchi wrote:
> 
>> Sylvain Wallez wrote:
> 
> 
> <snip/>
> 
>>> So it seems to me validation is good to easily write a syntax checker 
>>> and let the java code in treeprocessor concentrate on more detailed 
>>> "semantic" validation.
>>
>>
>>
>> too bad this is not done.
> 
> 
> 
> So what about *requiring* schema-validation to happen each time a 
> sitemap is loaded, i.e. have the use of a validating parser be 
> hard-coded in the treeprocessor. This schema-validation phase would be a 
> part of the global consistency checks performed by the treeprocessor,  
> implemented by tools adequate for this task.

I'm only concerned about getting useful error messages out of sitemap 
loading. I don't care how this is achieved.

> 
>>> Now the problem, AFAIU, comes more from the fact that we're trying to 
>>> validate not only the sitemap, but also the configuration of each 
>>> component, which may take very various forms and obey to some 
>>> complicated logic.
>>
>>
>>
>> yes, but this is an argument on how the sitemap descriptor is defined. 
>> Why does everybody think that validation and schemas are synonyms?
> 
> 
> 
> I don't think they are synonyms, but that a schema is an adequate tool 
> to easily perform the first phase of a global validation process.

From a developer's point of view, sure. From an error message 
readability point of view, I strongly doubt it since treeprocessor can 
have much better and meaningful error messages than any validation stage.

I'm not being negative, I'm just trying to reduce the number of 
misconfiguration questions that will happen on cocoon-users as soon as 
we release cocoon 2.1


Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:

> Sylvain Wallez wrote:

<snip/>

>> So it seems to me validation is good to easily write a syntax checker 
>> and let the java code in treeprocessor concentrate on more detailed 
>> "semantic" validation.
>
>
> too bad this is not done.


So what about *requiring* schema-validation to happen each time a 
sitemap is loaded, i.e. have the use of a validating parser be 
hard-coded in the treeprocessor. This schema-validation phase would be a 
part of the global consistency checks performed by the treeprocessor,  
implemented by tools adequate for this task.

>> Now the problem, AFAIU, comes more from the fact that we're trying to 
>> validate not only the sitemap, but also the configuration of each 
>> component, which may take very various forms and obey to some 
>> complicated logic.
>
>
> yes, but this is an argument on how the sitemap descriptor is defined. 
> Why does everybody think that validation and schemas are synonyms?


I don't think they are synonyms, but that a schema is an adequate tool 
to easily perform the first phase of a global validation process.

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }



Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> Stefano Mazzocchi wrote:
> 
>> David Crossley wrote:
> 
> 
> <snip/>
> 
>>>                          -- o --
>>> The other purpose of my original message was to raise an alarm
>>> about the change in parameter names in the sitemap, which now
>>> do not correspond with code o.a.c.transformation.TraxTransformer
>>
>>
>>
>> I'm more and more considering sitemap validation harmful.
>>
>> why:
>>
>> 1) the sitemap logic is too hard to be validated from any validation 
>> language (it requires java runtime capabilities)
>>
>> 2) it reduces the effort put into clean and meaningful error messages 
>> in the treeprocessor
>>
>> Example, try
>>
>>  <generate uri="..."/>
>>
>> where the uri attribute is not allowed in generate (should be 'src'), 
>> the treeprocessor totally ignores this and sends the empty string to 
>> the parser, resulting in the error
>>
>>  System ID not found!
>>
>> Sitemap validation has stopped us from fixing the error messaging 
>> capabilities on mistakes.
>>
>> I propose to blast the sitemap validation altogether.
> 
> 
> 
> I don't follow you: a schema, although it cannot fully validate a 
> sitemap, can easily check syntax inconsistencies like <map:generate 
> uri="..."/>. The checks performed by the treeprocessor come at a lower 
> level, such as checking that a used component (type="...") exists, 
> checking variable expansion syntax, etc.

Yes, this is *exactly* the reasoning that makes the treeprocessor error 
messages meaningful only if you hit a spot that is not validatable by 
the sitemap.

You are assuming people validate the sitemap before entering it in the 
system.

Unfortunately, this is not automated internally and, externally, only 
few people do.

Result: the level of error message friendliness of treeprocessor syntax 
errors is poor and users are more often misled by them than helped.

> So it seems to me validation is good to easily write a syntax checker 
> and let the java code in treeprocessor concentrate on more detailed 
> "semantic" validation.

too bad this is not done.

> Now the problem, AFAIU, comes more from the fact that we're trying to 
> validate not only the sitemap, but also the configuration of each 
> component, which may take widely varying forms and obey some 
> complicated logic.

yes, but this is an argument on how the sitemap descriptor is defined. 
Why does everybody think that validation and schemas are synonyms?


Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:

> David Crossley wrote:

<snip/>

>>                          -- o --
>> The other purpose of my original message was to raise an alarm
>> about the change in parameter names in the sitemap, which now
>> do not correspond with code o.a.c.transformation.TraxTransformer
>
>
> I'm more and more considering sitemap validation harmful.
>
> why:
>
> 1) the sitemap logic is too hard to be validated from any validation 
> language (it requires java runtime capabilities)
>
> 2) it reduces the effort put into clean and meaningful error messages 
> in the treeprocessor
>
> Example, try
>
>  <generate uri="..."/>
>
> where the uri attribute is not allowed in generate (should be 'src'), 
> the treeprocessor totally ignores this and sends the empty string to 
> the parser, resulting in the error
>
>  System ID not found!
>
> Sitemap validation has stopped us from fixing the error messaging 
> capabilities on mistakes.
>
> I propose to blast the sitemap validation altogether.


I don't follow you: a schema, although it cannot fully validate a 
sitemap, can easily check syntax inconsistencies like <map:generate 
uri="..."/>. The checks performed by the treeprocessor come at a lower 
level, such as checking that a used component (type="...") exists, 
checking variable expansion syntax, etc.

So it seems to me validation is good to easily write a syntax checker 
and let the java code in treeprocessor concentrate on more detailed 
"semantic" validation.

Now the problem, AFAIU, comes more from the fact that we're trying to 
validate not only the sitemap, but also the configuration of each 
component, which may take widely varying forms and obey some 
complicated logic.

Something I've been thinking about for a long time (but, as usual, never 
had the time to make real) is a "CheckableConfiguration": a special 
implementation of Configuration which would track usage of its data 
and could be queried after use for unused elements or attributes.

This would allow a very simple but complete validation:
- build the sitemap (the whole file is read into a Configuration object),
- look up each component in <map:components> once, to be sure they have 
been configured,
- check your CheckableConfiguration for unused items.
Every unused item is a potential syntax error, and you can report them 
all at once.
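
To make the idea concrete, here is a rough sketch, assuming a much 
simpler data model than the real Avalon Configuration interface (only 
named elements, attributes and children). Everything in it, including 
the method names attribute(), child() and unusedItems(), is 
hypothetical and only illustrates the usage tracking.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the Avalon Configuration interface, just the
// usage-tracking idea reduced to attributes and child elements.
final class CheckableConfiguration {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<CheckableConfiguration> children = new ArrayList<>();
    private final Set<String> usedAttributes = new LinkedHashSet<>();
    private boolean used;

    CheckableConfiguration(String name) {
        this.name = name;
    }

    // Builder-style setters used while the file is being read.
    CheckableConfiguration attribute(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    CheckableConfiguration child(CheckableConfiguration c) {
        children.add(c);
        return this;
    }

    // Reading an attribute marks it as used (returns null if absent).
    String getAttribute(String key) {
        usedAttributes.add(key);
        return attributes.get(key);
    }

    // Returns the first child with the given name and marks it as used.
    CheckableConfiguration getChild(String childName) {
        for (CheckableConfiguration c : children) {
            if (c.name.equals(childName)) {
                c.used = true;
                return c;
            }
        }
        return null;
    }

    // Collects the path of every attribute or element that was never read.
    // The root element itself is treated as used.
    List<String> unusedItems() {
        List<String> out = new ArrayList<>();
        collect("/" + name, out);
        return out;
    }

    private void collect(String path, List<String> out) {
        for (String key : attributes.keySet()) {
            if (!usedAttributes.contains(key)) {
                out.add(path + "/@" + key);
            }
        }
        for (CheckableConfiguration c : children) {
            String childPath = path + "/" + c.name;
            if (!c.used) {
                out.add(childPath); // the whole subtree was ignored
            } else {
                c.collect(childPath, out);
            }
        }
    }
}
```

After the components have been configured, a single call to 
unusedItems() on the root would list every attribute or element that no 
code ever asked for, such as a mistyped uri attribute.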

Thoughts?

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }



Re: validation of config during build (Was: Re: sitemap validation is broken)

Posted by Stefano Mazzocchi <st...@apache.org>.
David Crossley wrote:
> Steven Noels wrote:
> 
>>David Crossley wrote:
>>
>>
>>>The <map:transformer> used to have children like
>>><use-session-info> etc. but these have been recently
>>>changed to <use-session-parameters> etc. However,
>>>the sitemap.rng still has the former. Also, the code in
>>>o.a.c.transformation.TraxTransformer still uses the
>>>former. Are these just typos in the new sitemap.xmap?
>>
>>This brings 
>>http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=103847911212458&w=2 
>>back to mind. What 'level' of validation are we able to obtain if we try 
>>to put everything into one grammar, which must fit each and every 
>>component...? Hm. Bruno is on skiing holiday ATM, I'll refer him to this 
>>when he's back.
> 
> 
> I look forward to Bruno's expertise and help on this
> and other matters.
> 
> Yes, i agree with the need for a more clever, component-based
> approach as soon as possible.
> 
> In the meantime, we need something to help keep the new build
> on the rails. Already errors have crept in.
> 
> I just added the few basic validations that we had in the
> old build:
> * cocoon.roles - quite strict structural validation.
> * cocoon.xconf - extremely loose validation, could be improved.
> * sitemap.xmap - cumbersome, but works for now.
> 
> The one thing that is missing at the moment is the ability
> to have validate.config=false to get around any immediate
> build issues. I will try to add that today. (However i am away
> for one week holiday, then catch-up.)
> 
> So the sitemap validation and the roles validation are
> commented-out in the current build. Pity, because that does
> not put the build issues in everyone's face.
> 
>                          -- o --
> The other purpose of my original message was to raise an alarm
> about the change in parameter names in the sitemap, which now
> do not correspond with code o.a.c.transformation.TraxTransformer

I'm more and more considering sitemap validation harmful.

why:

1) the sitemap logic is too hard to be validated from any validation 
language (it requires java runtime capabilities)

2) it reduces the effort put into clean and meaningful error messages 
in the treeprocessor

Example, try

  <generate uri="..."/>

where the uri attribute is not allowed in generate (should be 'src'), 
the treeprocessor totally ignores this and sends the empty string to the 
parser, resulting in the error

  System ID not found!

Sitemap validation has stopped us from fixing the error messaging 
capabilities on mistakes.
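
As an illustration of the kind of check meant here, a minimal Java 
sketch follows. It is an assumption, not treeprocessor code: the class 
SitemapAttributeCheck, the method check() and the attribute set given 
for generate are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the treeprocessor: rejects attributes
// that a sitemap statement does not understand instead of ignoring them.
final class SitemapAttributeCheck {

    // Assumed attribute set for <map:generate>, for illustration only.
    static final Set<String> GENERATE_ATTRIBUTES =
            new LinkedHashSet<>(Arrays.asList("type", "src", "label"));

    // Throws with a readable message on the first attribute that is not
    // in the allowed set; returns normally otherwise.
    static void check(String element, Map<String, String> attrs, Set<String> allowed) {
        for (String name : attrs.keySet()) {
            if (!allowed.contains(name)) {
                throw new IllegalArgumentException(
                        "Unknown attribute '" + name + "' on <" + element
                        + ">; allowed attributes are " + allowed);
            }
        }
    }
}
```

With a check like this, the uri mistake would be reported by name 
instead of surfacing later as "System ID not found!".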

I propose to blast the sitemap validation altogether.