Posted to xml-commons-dev@xerces.apache.org by Neil Graham <ne...@ca.ibm.com> on 2003/05/06 00:38:47 UTC

Re: [ANN] XInclude processor for xml-commons

Hi Stefano,

Sorry about the delay; I was really hoping that Andy Clark (the "Father of
XNI") would pick up this thread.  He'd do a much better job of contrasting
XNI with SAX, and especially giving you the historical angle (I came on the
scene a bit after XNI had gotten out of the crib, so by then the decision
to break from SAX was already made).

> Can you tell us why SAX is not powerful enough for what you need?

Well it's turned out to be pretty handy to be able to pass "augmentations"
along with the events representing the infoset.  SAX doesn't have any kind
of configuration-management infrastructure:  If you want to set up a
pipeline of XMLFilters then you have to manage propagation of features and
properties to the components that comprise the pipeline on your own; XNI
sorts this out.  SAX is also somewhat impoverished for people who care a
great deal about the lexical layout of DTD's; there isn't much you can't
get in this respect from XNI.  Although SAX 1.1 extensions (still in
alpha!) could rectify this, as it stands it's not possible to determine the
version or encoding of a document with "stable" SAX interfaces.
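
The extensions mentioned above are presumably the SAX2 extensions 1.1, whose org.xml.sax.ext.Locator2 interface adds getXMLVersion() and getEncoding(). A minimal sketch of probing for it defensively, assuming a JAXP parser whose locator happens to implement the interface; the class name DeclarationProbe and the helper probe() are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records what the parser reports about the XML declaration.
class DeclarationProbe extends DefaultHandler {
    private Locator locator;
    String version;   // stays null if the parser does not expose it
    String encoding;  // stays null if the parser does not expose it

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Locator2 is optional, so test for it instead of casting blindly.
        // Querying here (not in setDocumentLocator) means the declaration has been read.
        if (version == null && locator instanceof Locator2) {
            Locator2 l2 = (Locator2) locator;
            version = l2.getXMLVersion();
            encoding = l2.getEncoding();
        }
    }

    /** Parses the bytes and returns "version/encoding" as reported by the parser. */
    static String probe(byte[] xml) {
        try {
            DeclarationProbe handler = new DeclarationProbe();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), handler);
            return handler.version + "/" + handler.encoding;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(probe(doc));
    }
}
```

With a parser that only offers the plain Locator, both fields simply stay null, which is the situation described above for the "stable" interfaces.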

> It does. But Cocoon is *entirely* built around SAX. Moving to XNI is an
> incredibly difficult task. Nobody will do that just for the sake of it.

I can understand that.

> And since Cocoon almost never validates and cocoon already has xinclude
> transformers built around SAX. I really don't see a need for such a
> massive transition.

I think my question revolved around Joerg's assertion that the spec thinks
of XInclude processing being done before validation.  If you don't do
validation of any kind anyway, then clearly this won't disturb you; but if
you did--or wanted to follow as closely to the spec as possible--then you'd
be confronted by the fact that most SAX parsers have validation built in,
not built as a separate module that you can drop an XInclude processor in
front of.

> Now, please, help us understand: what are the differences between XNI
> and SAX and what would we gain basing the entire cocoon architecture
> around XNI events instead of SAX events?

At this stage, I'm afraid the biggest pro might be that SAX just doesn't
seem to be all that healthy these days; I don't know if you lurk on its
development lists, but they've been pretty much dead for quite some time...

But I'll leave the hard job of selling XNI to Andy; it works well for us,
but perhaps SAX is good enough for what you need it to do.  Cocoon being
the huge project it is, I certainly wouldn't blame you for needing some
very solid reasons to migrate to a different pipeline framework!

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com

Stefano Mazzocchi <stefano@apache.org>
04/30/2003 11:04 AM

To:       cocoon-dev@xml.apache.org
cc:       commons-dev@xml.apache.org, xerces-j-dev@xml.apache.org
Subject:  Re: [ANN] XInclude processor for xml-commons

on 4/29/03 6:41 PM Neil Graham wrote:

> I see what you're saying, I think:  you like the idea of a componentized
> parser that operates in a pipeline, but you wish that SAX had been used as
> the glue for such a beast, rather than its own API.  Right?
>
> And if SAX had seemed rich and complete enough, and had it supported the
> kind of configuration-management facilities that are needed for such an
> arrangement, I imagine that's the road we would have gone down.  But it
> isn't/doesn't, so what was there to do?  :)

Can you tell us why SAX is not powerful enough for what you need?
Caution: I'm not sarcastic, just curious.

> So I guess I'd turn the question around:  If using SAX means you're stuck
> with monolithic-looking parsers, but XNI would give you a parser with all
> kinds of flexibility, maybe using XNI might merit consideration?  :)

It does. But Cocoon is *entirely* built around SAX. Moving to XNI is an
incredibly difficult task. Nobody will do that just for the sake of it.
And since Cocoon almost never validates and cocoon already has xinclude
transformers built around SAX. I really don't see a need for such a
massive transition.

Of course, this might well be blindness on our side or simply
ignorance of the XNI paradigms.

> Sure
> it would bind you to a specific parser (until other parsers begin to use
> XNI :)) ), but if you write your own SAX-glued parser then you're stuck to
> that anyway...

SAX is much more used than XNI and this will continue to be so for a
long while, independently of its technical merits. This is a fact we
simply cannot forget to take into consideration.

>>Dumb proposal: why not a XNI2SAX Filter that uses an XNI processor for
>>SAX?
>
>
> We already have one:  org.apache.xerces.parsers.SAXParser; heck, it even
> understands both SAX1 and SAX2!  :)  The trouble is that, once you're
> emitting SAX, you're not in the Xerces pipeline world anymore; so this guy
> is only useful at the end of such a pipeline (whatever XNI components that
> pipeline has in it).  So there's no way of putting a Xerces validator after
> that component, for instance (to do so would kind of defeat the point of
> XNI).  Does that explain anything?

Yes, and it's obvious that such an XNI2SAX filter already exists, because
Xerces has to emit SAX events.

Now, please, help us understand: what are the differences between XNI
and SAX and what would we gain basing the entire cocoon architecture
around XNI events instead of SAX events?

Also, what would be the cons?

TIA for this.

--
Stefano.

Re: [ANN] XInclude processor for xml-commons

Posted by "J.Pietschmann" <j3...@yahoo.de>.

Stefano Mazzocchi wrote:
> Yes. I can only speak for myself (and I encourage others to speak up if
> I'm mistaken) but I never felt the need for something better. (I do have
> some issues with the fact that SAX is lossy in respect of the original
> whitespace between attributes and attribute order, but that's not an
> issue for cocoon)

If
1. XPath/XSLT 2.0 proves to be the Next Big Thing and
2. people start to match on *datatypes* rather than element names and/or
3. Cocoon gets used as web services provider,
then you'll get pressure to adopt something which can actually
supply said data types from the schema validator or XQuery generator
to the XSLT processor or SOAP serializer.

J.Pietschmann

Re: [ANN] XInclude processor for xml-commons

Posted by Stefano Mazzocchi <st...@apache.org>.

on 5/8/03 2:54 PM Andy Clark wrote:

Andy,

thanks for replying. This is very useful information for us.

I did have a problem with SAX that you mentioned: attributes being
read-only. I also think that an event that includes metadata
capabilities might turn out to be *very* useful for us in the future.

> Also, XNI fulfills a different need than SAX. XNI is
> for building different kinds of parsers whereas SAX
> is for communicating document information to the app.
> XNI could also be used for this purpose but that is
> not the primary goal. And we've never marketed XNI
> that way.

Do you see any technological disadvantage in replacing SAX with XNI?

-- 
Stefano.

Re: [ANN] XInclude processor for xml-commons

Posted by Andy Clark <an...@apache.org>.

Stefano Mazzocchi wrote:
>> Well it's turned out to be pretty handy to be able to
>> pass "augmentations" along with the events representing
>> the infoset.
>
> Can you give an explicit example of this? That would
> help us understand better.

In general, the augmentations allow us to extend the
existing framework (in certain ways) without breaking
the method prototypes. But they also have other uses.
For example, in my NekoHTML parser, I use them as a
way of communicating whether elements were specified
in the original document stream or were synthesized
by the tag balancer.

However, the augmentations will most often be used
for passing the PSVI information. And back when we
were debating augmentations, this was the major topic
of discussion. I pushed back against adding XML Schema
related information to XNI in favor of a more generic
system that allowed us to add XML Schema information
as well as any other information we wanted.
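
The idea can be modelled in a few lines. This is a simplified sketch and does not use the actual Xerces or NekoHTML classes: the types Augs and ElementHandler, the key "sketch:synthesized" and the toy balancer are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AugmentationSketch {
    /** Hypothetical stand-in for a per-event bag of extra information. */
    static final class Augs {
        private final Map<String, Object> items = new HashMap<>();
        Augs put(String key, Object value) { items.put(key, value); return this; }
        Object get(String key) { return items.get(key); }
    }

    /** Hypothetical handler: the extra parameter rides along with every event,
     *  so new kinds of information need no new method signatures. */
    interface ElementHandler {
        void startElement(String name, Augs augs);
    }

    static final String SYNTHESIZED = "sketch:synthesized"; // made-up key

    /** Toy "tag balancer": emits a synthesized html element if the input lacks one. */
    static void balance(List<String> tags, ElementHandler next) {
        if (!tags.contains("html")) {
            next.startElement("html", new Augs().put(SYNTHESIZED, Boolean.TRUE));
        }
        for (String tag : tags) {
            next.startElement(tag, new Augs().put(SYNTHESIZED, Boolean.FALSE));
        }
    }

    /** Downstream consumer that reads the augmentation attached to each event. */
    static List<String> describe(List<String> tags) {
        List<String> out = new ArrayList<>();
        balance(tags, (name, augs) -> out.add(name
                + (Boolean.TRUE.equals(augs.get(SYNTHESIZED)) ? " (synthesized)" : " (from source)")));
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe(Arrays.asList("body", "p"))) {
            System.out.println(line);
        }
    }
}
```

A component that knows nothing about the key can ignore it and pass the bag along untouched, which is what keeps the scheme generic.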

>> SAX doesn't have any kind of configuration-management
>> infrastructure:  If you want to set up a pipeline of
>> XMLFilters then you have to manage propagation of
>> features and properties to the components that comprise
>> the pipeline on your own;
>
> Hmmm, I see, but why do you pass such information thru
> the pipeline and not as a pipeline context?

When the information is directly associated with the data
being passed through the pipeline, then it makes sense to
pass that information via augmentations of that data. In
addition, XNI is designed so that components can alter
the data passed through the pipeline. If an individual
component were to remove information from the pipeline,
it may not be able to understand which associated items
of information to remove if that information is only
passed via a pipeline context.

And it should be noted that we also have the concept
of a pipeline context within XNI. If a component in the
pipeline is interested in the overall settings of the
pipeline, then it can implement XMLComponent. Then,
before the parse it is notified and passed a reference
to the component manager which stores the context. So
it requests just those settings that it understands.
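
The notification pattern described here can be sketched as follows. It is a simplified model, not the XNI interfaces themselves: Settings, Component, the two example components and the "sketch:" feature identifiers are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ContextSketch {
    /** Hypothetical context object holding the pipeline-wide settings. */
    static final class Settings {
        private final Map<String, Boolean> features = new HashMap<>();
        void setFeature(String id, boolean on) { features.put(id, on); }
        boolean getFeature(String id) { return features.getOrDefault(id, false); }
    }

    /** Hypothetical component contract: notified once before each parse. */
    interface Component {
        void reset(Settings settings);
    }

    static final class Validator implements Component {
        boolean enabled;
        @Override
        public void reset(Settings settings) {
            // Pull only the setting this component understands.
            enabled = settings.getFeature("sketch:validation");
        }
    }

    static final class Includer implements Component {
        boolean enabled;
        @Override
        public void reset(Settings settings) {
            enabled = settings.getFeature("sketch:xinclude");
        }
    }

    /** The pipeline owner notifies every component; nobody forwards settings by hand. */
    static void prepare(Settings settings, List<? extends Component> pipeline) {
        for (Component component : pipeline) {
            component.reset(settings);
        }
    }

    public static void main(String[] args) {
        Settings settings = new Settings();
        settings.setFeature("sketch:validation", true);
        Validator validator = new Validator();
        Includer includer = new Includer();
        prepare(settings, Arrays.asList(includer, validator));
        System.out.println("validator=" + validator.enabled + ", includer=" + includer.enabled);
    }
}
```

The settings are set once on the context, and each component reads what it cares about when notified, so adding a component does not require touching the others.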

>> SAX is also somewhat impoverished for people who care
>> a great deal about the lexical layout of DTD's; there
>> isn't much you can't get in this respect from XNI.
>> Although SAX 1.1 extensions (still in alpha!) could
>> rectify this, as it stands it's not possible to determine
>> the version or encoding of a document with "stable" SAX
>> interfaces.
> 
> I see. But the very fact that I, for one, never noticed those
> limitations might seem to advocate against such a move.

Neil is correct when he said that SAX is "impoverished".
And that is one of the major reasons that we made XNI in
the first place -- SAX simply didn't communicate the
information that we needed. Another problem was that
the information was read-only (e.g. Attributes). This
situation may change but at the time we were designing
XNI it was insufficient for our needs.
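
On the SAX side, the usual workaround for the read-only Attributes interface is to copy the attributes before changing them. A minimal sketch using the standard org.xml.sax.helpers classes; the filter name StampFilter, the added "seen" attribute and the helper seenValue() are invented for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that adds one attribute to every element it passes along.
class StampFilter extends XMLFilterImpl {
    StampFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Attributes has no setters, so copy into a mutable AttributesImpl first.
        AttributesImpl copy = new AttributesImpl(atts);
        copy.addAttribute("", "seen", "seen", "CDATA", "true");
        super.startElement(uri, localName, qName, copy);
    }

    /** Runs the filter over the document and returns the root element's "seen" value. */
    static String seenValue(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            StampFilter filter = new StampFilter(factory.newSAXParser().getXMLReader());
            String[] result = new String[1];
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    if (result[0] == null) {
                        result[0] = a.getValue("seen");
                    }
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return result[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(seenValue("<doc a='1'/>"));
    }
}
```

Every filter that wants to alter attributes pays for a copy per element, which gives a feel for why mutable event data was attractive when building parser components.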

Also, XNI fulfills a different need than SAX. XNI is
for building different kinds of parsers whereas SAX
is for communicating document information to the app.
XNI could also be used for this purpose but that is
not the primary goal. And we've never marketed XNI
that way.

-- 
Andy Clark * andyc@apache.org

Re: [ANN] XInclude processor for xml-commons

Posted by Stefano Mazzocchi <st...@apache.org>.

on 5/5/03 5:38 PM Neil Graham wrote:

> Hi Stefano,

Hi Neil,

thanks very much for answering this.

> Sorry about the delay; I was really hoping that Andy Clark (the "Father of
> XNI") would pick up this thread.  He'd do a much better job of contrasting
> XNI with SAX, and especially giving you the historical angle (I came on the
> scene a bit after XNI had gotten out of the crib, so by then the decision
> to break from SAX was already made).

 :-)

>>Can you tell us why SAX is not powerful enough for what you need?
> 
> 
> Well it's turned out to be pretty handy to be able to pass "augmentations"
> along with the events representing the infoset. 

Can you give an explicit example of this? That would help us understand
better.

> SAX doesn't have any kind
> of configuration-management infrastructure:  If you want to set up a
> pipeline of XMLFilters then you have to manage propagation of features and
> properties to the components that comprise the pipeline on your own; 

Hmmm, I see, but why do you pass such information thru the pipeline and
not as a pipeline context?

> XNI
> sorts this out.  SAX is also somewhat impoverished for people who care a
> great deal about the lexical layout of DTD's; there isn't much you can't
> get in this respect from XNI.  Although SAX 1.1 extensions (still in
> alpha!) could rectify this, as it stands it's not possible to determine the
> version or encoding of a document with "stable" SAX interfaces.

I see. But the very fact that I, for one, never noticed those
limitations might seem to advocate against such a move.

>>It does. But Cocoon is *entirely* built around SAX. Moving to XNI is an
>>incredibly difficult task. Nobody will do that just for the sake of it.
> 
> 
> I can understand that.
> 
> 
>>And since Cocoon almost never validates and cocoon already has xinclude
>>transformers built around SAX. I really don't see a need for such a
>>massive transition.
> 
> 
> I think my question revolved around Joerg's assertion that the spec thinks
> of XInclude processing being done before validation.  

I'm a follower of the JClark-ish school of thought that validation
should be orthogonal with respect to the infoset. I heard that this school
of thought is penetrating the validation circles at W3C, which is, IMO,
a good thing, even if it moves the problem onto a pipeline definition
language, and the XPipe note, well, it cries out for abuse.

So, for now, what I personally advocate is to avoid the use of
infoset-messing validation as much as possible.

> If you don't do
> validation of any kind anyway, then clearly this won't disturb you; but if
> you did--or wanted to follow as closely to the spec as possible--then you'd
> be confronted by the fact that most SAX parsers have validation built in,
> not built as a separate module that you can drop an XInclude processor in
> front of.

Yes, that is the reason why I advocate not validating at all with
infoset-messing schemas. In fact, for schema validation needs, I
advocate the use of a RelaxNG SAX filter; along with the cocoon
sitemap syntax, you can then precisely describe how your pipeline behaves
(and you don't have those nasty pre/post-schema-infoset issues).
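
The shape of such a pipeline, with validation switched off in the parser and done by a separate SAX stage, can be sketched with the JDK alone. W3C XML Schema is swapped in here only because the JDK ships a validator for it; a RELAX NG filter such as Jing would occupy the same position. The class ValidationStage and the method validate() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ValidationStage {
    /** Returns the validation errors reported for the document (empty if valid). */
    static List<String> validate(String xsd, String xml) {
        try {
            ValidatorHandler validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidatorHandler();
            List<String> problems = new ArrayList<>();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // the parser itself does no validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Any other SAX stage (an XInclude transformer, say) could be
            // inserted between the reader and the validator at this point.
            reader.setContentHandler(validator);
            reader.parse(new InputSource(new StringReader(xml)));
            return problems;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='count' type='xs:int'/></xs:schema>";
        System.out.println(validate(xsd, "<count>42</count>").isEmpty());
        System.out.println(validate(xsd, "<count>abc</count>").isEmpty());
    }
}
```

Because the validator is just another handler in the chain, where it runs relative to inclusion is decided by how the pipeline is wired, not by the parser.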

>>Now, please, help us understand: what are the differences between XNI
>>and SAX and what would we gain basing the entire cocoon architecture
>>around XNI events instead of SAX events?
> 
> 
> At this stage, I'm afraid the biggest pro might be that SAX just doesn't
> seem to be all that healthy these days; 

While I don't question your assertion, I think it might be misleading:
there are APIs which became solid out of lack of necessity to improve.
From a parser-writer POV SAX might seem dead while from a non-validating
SAX user POV (as a cocoon developer is), SAX is just complete.

But I do see your point.

> I don't know if you lurk on its
> development lists, but they've been pretty much dead for quite some time...

No, I don't, nor ever felt the need to, to be honest.

> But I'll leave the hard job of selling XNI to Andy; it works well for us,
> but perhaps SAX is good enough for what you need it to do.  

Yes. I can only speak for myself (and I encourage others to speak up if
I'm mistaken) but I never felt the need for something better. (I do have
some issues with the fact that SAX is lossy in respect of the original
whitespace between attributes and attribute order, but that's not an
issue for cocoon)

> Cocoon being
> the huge project it is, I certainly wouldn't blame you for needing some
> very solid reasons to migrate to a different pipeline framework!

Yes. We would need *incredibly* solid arguments to do such a transition
without risking a huge fork that would kill us. This is why I asked: in
all honesty, it's much easier for us to turn off parser validation
entirely and provide a Jing SAX filter than moving everything to XNI to
make internal Xerces filters into cocoon pipeline components.

And if the need ever emerged, we could use the XNI/SAX adapters that
already exist, like we do for Andy's HTML parser.

Anyway, thanks for answering. There hasn't been much communication
between the cocoon and xerces communities but both are well founded on
xml-event-driven pipelines, even if seen from different points of view,
so any idea/suggestion/criticism exchange can only be a good thing for both.

Thanks.

-- 
Stefano.

Re: [ANN] XInclude processor for xml-commons

Posted by Stefano Mazzocchi <st...@apache.org>.

on 5/5/03 5:38 PM Neil Graham wrote:

> Hi Stefano,

Hi Neil,

thanks very much for answering this.

> Sorry about the delay; I was really hoping that Andy Clark (the "Father of
> XNI") would pick up this thread.  He'd do a much better job of contrasting
> XNI with SAX, and especially giving you the historical angle (I came on the
> scene a bit after XNI had gotten out of the crib, so by then the decision
> to break from SAX was already made).

 :-)

>>Can you tell us why SAX is not powerful enough for what you need?
> 
> 
> Well it's turned out to be pretty handy to be able to pass "augmentations"
> along with the events representing the infoset. 

Can you make an explicit example of this? that would help us understand
better.

> SAX doesn't have any kind
> of configuration-management infrastructure:  If you want to set up a
> pipeline of XMLFilters then you have to manage propagation of features and
> properties to the components that comprise the pipeline on your own; 

Hmmm, I see, but why do you pass such information thru the pipeline and
not as a pipeline context?

> XNI
> sorts this out.  SAX is also somewhat impoverished for people who care a
> great deal about the lexical layout of DTD's; there isn't much you can't
> get in this respect from XNI.  Although SAX 1.1 extentions (still in
> alpha!) could rectify this, as it stands it's not possible to determine the
> version or encoding of a document with "stable" SAX interfaces.

I see. But the very fact that I, for one, never noticed those
limitations might seem to advocate against such a move.

>>It does. But Cocoon is *entirely* built around SAX. Moving to XNI is an
>>incredibly difficult task. Nobody will do that just for the sake of it.
> 
> 
> I can understand that.
> 
> 
>>And since Cocoon almost never validates and cocoon already has xinclude
>>transformers built around SAX. I really don't see a need for such a
>>massive transition.
> 
> 
> I think my question revolved around Joerg's assertion that the spec thinks
> of XInclude processing being done before validation.  

I'm a follower of the JClark-ish school of thought that validation
should be orthogonal on respect of the infoset. I heard that this school
of thought is penetrating the validation circles at W3C, which is, IMO,
a good thing even if it moves the problem on the a pipeline definition
language and the XPipe note, well, it cries for abuse.

So, for now, what I personally advocate is to avoid the use of
infoset-messing validation as much as possible.

> If you don't do
> validation of any kind anyway, then clearly this won't disturb you; but if
> you did--or wanted to follow as closely to the spec as possible--then you'd
> be confronted by the fact that most SAX parsers have validation built in,
> not built as a separate module that you can drop an XInclude processor in
> front of.

Yes, that is the reason why I advocate not to validate at all using
infoset-messing schematas. In fact, for schema validation needs, I
advocate the use of a RelaxNG SAX filter and along with the cocoon
sitemap syntax, you can precisely describe how your pipeline behaves
(and you don't have those nasty pre/post-schema-infoset issues)

>>Now, please, help us understand: what are the differences between XNI
>>and SAX and what would we gain basing the entire cocoon architecture
>>around XNI events instead of SAX events?
> 
> 
> At this stage, I'm afraid the biggest pro might be that SAX just doesn't
> seem to be all that healthy these days; 

While I don't question your assertion, I think it might be misleading:
there are APIs which became solid out of lack of necessity to improve.
>From a parser-writer POV SAX might seem dead while from a non-validating
SAX user POV (as a cocoon developer is), SAX is just complete.

But I do see your point.

> I don't know if you lurk on its
> development lists, but they've been pretty much dead for quite some time...

No, I don't, nor ever felt the need to, to be honest.

> But I'll leave the hard job of selling XNI to Andy; it works well for us,
> but perhaps SAX is good enough for what you need it to do.  

Yes. I can only speak for myself (and I encouradge others to speak up if
I'm mistaken) but I never felt the need for something better. (I do have
some issues with the fact that SAX is lossy in respect of the original
whitespace between attributes and attribute order, but that's not an
issue for cocoon)

> Cocoon being
> the huge project it is, I certainly wouldn't blame you for needing some
> very solid reasons to migrate to a different pipeline framework!

Yes. we would need *incredibly* solid arguments to do such a transition
without risking a huge fork that would kill us. This is why I asked: in
all honesty, it's much easier for us to turn off parser validation
entirely and provide a Jing SAX filter than moving everything to XNI to
make internal Xerces filters into cocoon pipeline components.

And if the need ever emerged, we could use the XNI/SAX adapters that
already exist, like we do for Andy's HTML parser.

Anyway, thanks for anwering. There hasn't been much communication
between the cocoon and xerces communities but both are well funded on
xml-event-driven pipelines, even if seen from different points of view,
so any idea/suggestion/criticism exchange can only be a good thing for both.

Thanks.

-- 
Stefano.

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: [ANN] XInclude processor for xml-commons

Posted by Stefano Mazzocchi <st...@apache.org>.

on 5/5/03 5:38 PM Neil Graham wrote:

> Hi Stefano,

Hi Neil,

thanks very much for answering this.

> Sorry about the delay; I was really hoping that Andy Clark (the "Father of
> XNI") would pick up this thread.  He'd do a much better job of contrasting
> XNI with SAX, and especially giving you the historical angle (I came on the
> scene a bit after XNI had gotten out of the crib, so by then the decision
> to break from SAX was already made).

 :-)

>>Can you tell us why SAX is not powerful enough for what you need?
> 
> 
> Well it's turned out to be pretty handy to be able to pass "augmentations"
> along with the events representing the infoset. 

Can you make an explicit example of this? that would help us understand
better.

> SAX doesn't have any kind
> of configuration-management infrastructure:  If you want to set up a
> pipeline of XMLFilters then you have to manage propagation of features and
> properties to the components that comprise the pipeline on your own; 

Hmmm, I see, but why do you pass such information thru the pipeline and
not as a pipeline context?

> XNI
> sorts this out.  SAX is also somewhat impoverished for people who care a
> great deal about the lexical layout of DTD's; there isn't much you can't
> get in this respect from XNI.  Although SAX 1.1 extensions (still in
> alpha!) could rectify this, as it stands it's not possible to determine the
> version or encoding of a document with "stable" SAX interfaces.

I see. But the very fact that I, for one, never noticed those
limitations might seem to advocate against such a move.

>>It does. But Cocoon is *entirely* built around SAX. Moving to XNI is an
>>incredibly difficult task. Nobody will do that just for the sake of it.
> 
> 
> I can understand that.
> 
> 
>>And since Cocoon almost never validates and cocoon already has xinclude
>>transformers built around SAX. I really don't see a need for such a
>>massive transition.
> 
> 
> I think my question revolved around Joerg's assertion that the spec thinks
> of XInclude processing being done before validation.  

I'm a follower of the JClark-ish school of thought that validation
should be orthogonal with respect to the infoset. I heard that this school
of thought is penetrating the validation circles at W3C, which is, IMO,
a good thing, even if it moves the problem onto a pipeline definition
language, and the XPipe note, well, it cries for abuse.

So, for now, what I personally advocate is to avoid the use of
infoset-messing validation as much as possible.

> If you don't do
> validation of any kind anyway, then clearly this won't disturb you; but if
> you did--or wanted to follow as closely to the spec as possible--then you'd
> be confronted by the fact that most SAX parsers have validation built in,
> not built as a separate module that you can drop an XInclude processor in
> front of.

Yes, that is the reason why I advocate not validating at all with
infoset-messing schemas. In fact, for schema validation needs, I
advocate the use of a RelaxNG SAX filter; along with the cocoon
sitemap syntax, you can precisely describe how your pipeline behaves
(and you don't have those nasty pre/post-schema-infoset issues).
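
To make the "validation as just another SAX filter" idea concrete, here
is a minimal sketch, not Cocoon or Jing code. The class names
FilterPipelineSketch and AllowedElementsFilter are hypothetical, and the
filter only checks element names against a fixed set; it is a toy
stand-in for a real RelaxNG filter. The point is the position of the
check: the parser itself does not validate, and the checking stage sits
wherever you choose to put it in the filter chain.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterPipelineSketch {

    /** Hypothetical toy "validator": rejects any element whose name is not allowed. */
    static class AllowedElementsFilter extends XMLFilterImpl {
        private final Set<String> allowed;

        AllowedElementsFilter(XMLReader parent, Set<String> allowed) {
            super(parent);
            this.allowed = allowed;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (!allowed.contains(qName)) {
                throw new SAXException("unexpected element: " + qName);
            }
            // Pass the event on unchanged; this stage adds nothing to the stream.
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Returns true if the document passes through the whole pipeline. */
    static boolean accepts(String xml, Set<String> allowed) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // parser-level validation stays off
        XMLReader parser = factory.newSAXParser().getXMLReader();

        // Any other filter (an XInclude stage, say) could be chained in
        // before this one; the checking stage is simply the last filter.
        XMLReader pipeline = new AllowedElementsFilter(parser, allowed);
        pipeline.setContentHandler(new DefaultHandler());
        try {
            pipeline.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

Calling accepts("<a><b/></a>", Set.of("a", "b")) lets the document
through, while a document containing any other element name is rejected
by the filter rather than by the parser.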

>>Now, please, help us understand: what are the differences between XNI
>>and SAX and what would we gain basing the entire cocoon architecture
>>around XNI events instead of SAX events?
> 
> 
> At this stage, I'm afraid the biggest pro might be that SAX just doesn't
> seem to be all that healthy these days; 

While I don't question your assertion, I think it might be misleading:
there are APIs which became solid out of lack of necessity to improve.
From a parser-writer POV SAX might seem dead while from a non-validating
SAX user POV (as a cocoon developer is), SAX is just complete.

But I do see your point.

> I don't know if you lurk on its
> development lists, but they've been pretty much dead for quite some time...

No, I don't, nor ever felt the need to, to be honest.

> But I'll leave the hard job of selling XNI to Andy; it works well for us,
> but perhaps SAX is good enough for what you need it to do.  

Yes. I can only speak for myself (and I encourage others to speak up if
I'm mistaken) but I never felt the need for something better. (I do have
some issues with the fact that SAX is lossy with respect to the original
whitespace between attributes and attribute order, but that's not an
issue for cocoon.)

> Cocoon being
> the huge project it is, I certainly wouldn't blame you for needing some
> very solid reasons to migrate to a different pipeline framework!

Yes. We would need *incredibly* solid arguments to do such a transition
without risking a huge fork that would kill us. This is why I asked: in
all honesty, it's much easier for us to turn off parser validation
entirely and provide a Jing SAX filter than moving everything to XNI to
make internal Xerces filters into cocoon pipeline components.

And if the need ever emerged, we could use the XNI/SAX adapters that
already exist, like we do for Andy's HTML parser.

Anyway, thanks for answering. There hasn't been much communication
between the cocoon and xerces communities but both are well founded on
xml-event-driven pipelines, even if seen from different points of view,
so any idea/suggestion/criticism exchange can only be a good thing for both.

Thanks.

-- 
Stefano.

Re: [ANN] XInclude processor for xml-commons

Posted by "J.Pietschmann" <j3...@yahoo.de>.

Neil Graham wrote:
> I think my question revolved around Joerg's assertion that the spec thinks
> of XInclude processing being done before validation.

Sorry, I wrote
 >>> The REC itself is deliberately neutral.
XInclude is defined in terms of "merging infosets", and it is (briefly
and non-normatively) discussed what this could mean if validation
occurred before or afterwards.

I personally see there are use cases for either case:
1. XInclude as a substitute for external entities for physically
   structuring documents. This is useful if the document should
   be validated against a schema, where entities are not organically
   available. Naturally, the post-XInclude infoset has to be
   validated.
2. Validation before XInclude. This could in turn mean the including
   document is validated, or the included documents, all documents
   or arbitrary combinations. This use case happens for formats which
   are meant to aggregate content, for example imagine a portal where
   you aggregate vastly different content, like an article, stock
   quotes and ad streams. While each piece of aggregated content may
   have an attached schema and therefore may be easy to validate,
   the aggregated content may prove more difficult in this respect
   (imagine if they use different schema languages).

Therefore parsers with an integrated XInclude stage (before validation)
are useful (remember libxml), but standalone XInclude processors make
sense as well.
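
For the first use case, a rough sketch of a parser with an integrated
XInclude stage, using the JAXP setXIncludeAware switch. The class name
XIncludeFirstSketch, the file names and the portal/quote vocabulary are
made up for illustration, and the sketch assumes the JAXP implementation
in use supports XInclude. Whatever validation is wanted for this use case
would then be run on the merged document that comes back.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XIncludeFirstSketch {

    static final String XI_NS = "http://www.w3.org/2001/XInclude";

    /** Parses a file with XInclude processing done inside the parser. */
    static Document parseWithXInclude(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // XInclude elements are recognised by namespace
        factory.setXIncludeAware(true);   // merge included infosets while parsing
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(file);
    }

    /** Writes two small made-up documents and returns the merged result. */
    static Document mergedExample() throws Exception {
        Path dir = Files.createTempDirectory("xinclude-sketch");
        Files.write(dir.resolve("quotes.xml"),
                "<quote symbol=\"XYZ\">42</quote>".getBytes(StandardCharsets.UTF_8));
        Path portal = dir.resolve("portal.xml");
        Files.write(portal,
                ("<portal xmlns:xi=\"" + XI_NS + "\">"
                        + "<xi:include href=\"quotes.xml\"/>"
                        + "</portal>").getBytes(StandardCharsets.UTF_8));
        // The relative href is resolved against the location of portal.xml.
        return parseWithXInclude(portal.toFile());
    }
}
```

After parsing, the xi:include element is gone and the quote element from
the second file sits in its place. Note that the processor may add an
xml:base attribute to included elements, which a strict schema applied
afterwards would have to allow.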

J.Pietschmann