
Posted to j-dev@xerces.apache.org by Elena Litani <el...@ca.ibm.com> on 2003/03/20 21:50:48 UTC

[PROPOSAL: XNI CHANGE]: entity resolution based on namespaces

While attempting to fix the encoding problem Neil raised, I believe we
should also address a flaw in XNI with regard to namespace
resolution.

As it stands today, xni.XMLEntityResolver notifies the user of the system
and public IDs present in a DTD. During XML Schema resolution, Xerces uses
the entity resolver's system ID parameter to pass the location of a schema
file specified in the xsi:schemaLocation attribute.
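For reference, the value of xsi:schemaLocation is a whitespace-separated
list of pairs, each a namespace URI followed by a location hint. Below is a
minimal sketch, not Xerces code (the class name SchemaLocationHints is
hypothetical), of splitting such a value. Each location half is the kind of
string that ends up in the resolver's system ID parameter; when the
attribute is absent, nothing but the namespace is left to go on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Xerces API: splits an xsi:schemaLocation
// value into namespace -> location pairs, preserving document order.
final class SchemaLocationHints {
    static Map<String, String> parse(String value) {
        Map<String, String> hints = new LinkedHashMap<>();
        String trimmed = value == null ? "" : value.trim();
        if (trimmed.isEmpty()) {
            return hints; // no hints at all: only a namespace could help
        }
        String[] tokens = trimmed.split("\\s+");
        // Tokens alternate namespace, location; an odd trailing token
        // has no partner and is ignored in this sketch.
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            hints.put(tokens[i], tokens[i + 1]);
        }
        return hints;
    }
}
```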

However, given that the "xsi" attributes are just hints, sometimes
documents don't have any location attributes, and in this case it is
important to be able to pass the namespace URI via the entity resolver
to allow the user to provide a schema document.

By definition, XML Schema <import>s don't have to specify the
location attributes either, so this is yet another case in which a
namespace should be passed to the user.

Finally, some Xerces users have tables (similar to the XML Catalog
approach) that map a namespace to a schema document. These tables are
used to override the original schema named in the document. To know
which schema document to pick, they need to receive the namespace of
the root element in the EntityResolver.
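A minimal sketch of such a user-side table, assuming a simple in-memory
map. The class and method names are hypothetical and not part of Xerces.
Note that it can only do its job if the root element's namespace actually
reaches the resolver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical user-side table, in the spirit of an XML Catalog, mapping a
// namespace URI to the schema document that should replace the one named
// in the instance document. Not a Xerces API.
final class SchemaOverrideTable {
    private final Map<String, String> overrides = new HashMap<>();

    void put(String namespace, String schemaLocation) {
        overrides.put(namespace, schemaLocation);
    }

    // Returns the override for the root element's namespace if one exists;
    // otherwise keeps the document's own hint (which may be null).
    String choose(String rootNamespace, String documentHint) {
        String override = rootNamespace == null ? null : overrides.get(rootNamespace);
        return override != null ? override : documentHint;
    }
}
```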

As XNI currently works, to access the namespace the user would need to
get the XMLResourceIdentifier (via the XMLEntityResolver callback), cast
it to xni.grammars.XMLGrammarDescription, query the type of grammar and,
if the grammar represents XML Schema, cast it to
xni.grammars.XMLSchemaDescription to retrieve the namespace.
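To make the cost concrete, here is a sketch of that cast chain. The
interfaces below are simplified stand-ins named after the XNI types, not
the real org.apache.xerces classes (those declare many more methods); the
accessor getTargetNamespace() and the XML_SCHEMA constant are assumed to
mirror the Xerces schema description. NamespaceExtractor is a hypothetical
helper.

```java
// Simplified stand-ins for the XNI types named above, reduced to the
// methods this sketch needs.
interface XMLResourceIdentifier {
    String getPublicId();
    String getLiteralSystemId();
}

interface XMLGrammarDescription extends XMLResourceIdentifier {
    String XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    String getGrammarType();
}

interface XMLSchemaDescription extends XMLGrammarDescription {
    String getTargetNamespace();
}

final class NamespaceExtractor {
    // The chain described above: identifier -> grammar description ->
    // check the grammar type -> schema description -> namespace.
    static String namespaceOf(XMLResourceIdentifier id) {
        if (id instanceof XMLGrammarDescription) {
            XMLGrammarDescription desc = (XMLGrammarDescription) id;
            if (XMLGrammarDescription.XML_SCHEMA.equals(desc.getGrammarType())
                    && desc instanceof XMLSchemaDescription) {
                return ((XMLSchemaDescription) desc).getTargetNamespace();
            }
        }
        return null; // not a schema request, so no namespace is reachable
    }
}
```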

Given the importance of XML Namespaces, this seems like overkill.

Thus, I suggest we fix XNI instead.

The PROPOSAL
------------
Add two new methods to xni.XMLResourceIdentifier:

public void setNamespace(String namespace);
public String getNamespace();

I believe this is unlikely to break anyone, since I doubt that any of
our users actually provide their own implementation of
XMLResourceIdentifier.
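With the proposal in place, the same lookup needs no casts. A sketch under
stated assumptions: the stand-in interface below carries only the two
proposed methods plus getLiteralSystemId(), and ResourceIdentifierImpl and
NamespaceResolver are hypothetical names used for illustration, not Xerces
classes.

```java
// Stand-in for xni.XMLResourceIdentifier with the two proposed methods;
// the real interface declares more accessors.
interface XMLResourceIdentifier {
    String getLiteralSystemId();
    void setNamespace(String namespace);
    String getNamespace();
}

// Hypothetical mutable implementation, as a parser might populate it
// before invoking the resolver callback.
final class ResourceIdentifierImpl implements XMLResourceIdentifier {
    private final String literalSystemId;
    private String namespace;

    ResourceIdentifierImpl(String literalSystemId) {
        this.literalSystemId = literalSystemId;
    }

    @Override public String getLiteralSystemId() { return literalSystemId; }
    @Override public void setNamespace(String namespace) { this.namespace = namespace; }
    @Override public String getNamespace() { return namespace; }
}

// Hypothetical callback shape: the resolver reads the namespace directly
// and returns the location of the schema document to use.
interface NamespaceResolver {
    String resolve(XMLResourceIdentifier id);
}
```

For example, a resolver backed by a java.util.Map can be the one-line
lambda id -> table.getOrDefault(id.getNamespace(), id.getLiteralSystemId()),
with no instanceof checks.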

Thank you,
-- 
Elena Litani / IBM Toronto

PS: Note that another solution could have been to specify that the
namespace is passed as the publicId in the resolveEntity() method.
Unfortunately, for years users have relied on the publicId being present
only if there is a DTD grammar, so if we were to make such a change in
Xerces behavior, the risk of breaking someone would be extremely high.

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: [PROPOSAL: XNI CHANGE]: entity resolution based on namespaces

Posted by Neeraj Bajaj <ne...@sun.com>.


Elena Litani wrote:

>Given the importance of the XML Namespaces, this seems like an overkill.
>
>Thus, I suggest we fix XNI instead.
>
>The PROPOSAL
>------------
>Adding 2 new methods to the xni.XMLResourceIdentifier:
>
>public void setNamespace(String namespace);
>public String getNamespace();
>  
>
+1. It's a long-standing need of the user community.


Neeraj




Re: [PROPOSAL: XNI CHANGE]: entity resolution based on namespaces

Posted by Andy Clark <an...@apache.org>.

Elena Litani wrote:
> public void setNamespace(String namespace);
> public String getNamespace();

Given your arguments, this change makes sense. And I also
appreciate the fact that it doesn't break old applications.

-- 
Andy Clark * andyc@apache.org



Re: While we're discussing XNI changes...

Posted by Andy Clark <an...@apache.org>.

Joseph Kesselman wrote:
> I presume the Xerces team is already keeping an eye on this topic, since 
> some other parsers have implemented true pull systems. But I figured I'd
> toss it out for brainstorming and let folks tell me where I'm mistaken... 
> <smile/>

I am the Apache representative to JSR-173, which is working
on an XML pull-parsing API for Java. Granted, I joined the
process rather late, but I have been following (and thinking
about) pull-parsing for quite a while. But I have concerns
about how such a beast would be implemented in Xerces.

There are a lot of nice properties to a pull-parsing API
from the application developer's point of view. However, when trying
to make a modular, configurable parser around this paradigm,
you quickly run into problems. From what I see, the easiest
way around these problems is to make your parser a single,
monolithic component (as is the case in the existing pull-
parser implementations). Something we've tried to avoid in
Xerces2...

But as this paradigm increases in popularity, it
is clear that we'll need to put more thought and effort
into a native Xerces implementation.

-- 
Andy Clark * andyc@apache.org


While we're discussing XNI changes...

Posted by Joseph Kesselman <ke...@us.ibm.com>.

Actually, while we're talking about alternatives to XNI... Events are 
wonderful for UI and other realtime-driven stuff... but I am starting to
conclude that they're the wrong model for parsing. I'm becoming more and
more interested in genuine pull-parser APIs (essentially, treat the parser 
as an iterator with a next-node operator that either yields an accessor 
object or IS an accessor object for the node's properties).
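A toy illustration of that iterator shape, assuming a drastically
simplified input language (start tags, end tags and text only; no
attributes, entities or error recovery). TinyPullParser is hypothetical;
the point is that next() advances exactly one node and the parser object
itself serves as the accessor for that node, with pos being the state the
parser keeps between calls.

```java
// Hypothetical pull-parser sketch. Handles only <name>, </name> and text;
// attributes, comments, self-closing tags and entities are out of scope.
final class TinyPullParser {
    enum Kind { START, END, TEXT, EOF }

    private final String input;
    private int pos;       // parser state saved between next() calls
    private Kind kind;     // kind of the current node
    private String value;  // element name or character data

    TinyPullParser(String input) { this.input = input; }

    // Advances to the next node and reports its kind.
    Kind next() {
        if (pos >= input.length()) {
            kind = Kind.EOF;
            value = null;
            return kind;
        }
        if (input.charAt(pos) == '<') {
            int close = input.indexOf('>', pos);
            if (close < 0) throw new IllegalStateException("unterminated tag at " + pos);
            boolean isEnd = input.charAt(pos + 1) == '/';
            value = input.substring(pos + (isEnd ? 2 : 1), close);
            kind = isEnd ? Kind.END : Kind.START;
            pos = close + 1;
        } else {
            int open = input.indexOf('<', pos);
            int end = open < 0 ? input.length() : open;
            value = input.substring(pos, end);
            kind = Kind.TEXT;
            pos = end;
        }
        return kind;
    }

    // Accessors for the current node's properties.
    Kind getKind() { return kind; }
    String getName() { return kind == Kind.START || kind == Kind.END ? value : null; }
    String getText() { return kind == Kind.TEXT ? value : null; }
}
```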

This approach has several benefits:

1) The iterator model is a lot easier to treat as a "tokenizer", 
simplifying its use in traditional recursive-descent grammars and the like 
where next-token requests may occur in multiple places.

2) The use of an accessor object allows more scope for "lazy" evaluation. 
We already get some of those benefits by passing the list of attributes as 
an object so they can simply be skipped over if they aren't examined, so 
there might not be a great deal of gain here -- EXCEPT in the case of 
serializing some other data representation; in that situation there might 
be significant advantages to not preparing the node name (for example) 
until it's called for. In some sense, this combines the advantages of the 
DOM approach with those of the event systems; it makes writing a 
thin-layer adapter much, much easier.

3) If someone really wants an event stream, it isn't hard to write a 
driver loop which pulls nodes from the iterator and generates events. It's 
much harder, as we've seen, to take an event-based system such as SAX or 
XNI and "throttle" it to yield one event at a time.

4) The pull approach can be generalized to cover processing models other 
than parsing-in-document-order. I'm investigating using something along 
these lines to implement the other XPath axes.
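Point 3 can be sketched as follows. The Iterator stands in for a pull
parser's next-node operation, and Node, NodeKind, EventHandler and
PushDriver are hypothetical types rather than SAX or XNI interfaces. The
whole push adapter is one loop.

```java
import java.util.Iterator;

// Hypothetical node and event types for the sketch.
enum NodeKind { START, END, TEXT }

final class Node {
    final NodeKind kind;
    final String data; // element name or character data

    Node(NodeKind kind, String data) {
        this.kind = kind;
        this.data = data;
    }
}

interface EventHandler {
    void startElement(String name);
    void endElement(String name);
    void characters(String text);
}

final class PushDriver {
    // Pulls every node from the iterator and dispatches it as an event.
    static void drive(Iterator<Node> pull, EventHandler handler) {
        while (pull.hasNext()) {
            Node n = pull.next();
            switch (n.kind) {
                case START: handler.startElement(n.data); break;
                case END:   handler.endElement(n.data); break;
                case TEXT:  handler.characters(n.data); break;
            }
        }
    }
}
```

Going the other way, throttling a push parser so that it yields one event
at a time generally requires buffering events or running the parser on a
separate thread, which is the asymmetry point 3 describes.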


Downside: Operating as an iterator would require that the parser save 
state between calls to next-node. On the other hand, in the event approach 
we're generally asking the application code to save state between events. 
I'm not convinced that the iterator approach involves more computation; it 
certainly seems to involve less coding effort for the user.
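To illustrate the trade-off, here is a hypothetical comparison (events
reduced to the strings "start" and "end") that computes the maximum element
depth. In push style the handler has to carry depth in fields between
callbacks; in pull style the application can keep that state on its own
call stack, while the parser (not shown here) owns the position state.

```java
import java.util.Iterator;

// Hypothetical contrast; not taken from any parser API.
final class DepthExamples {
    // Push style: the handler keeps its own state in fields between callbacks.
    static final class MaxDepthHandler {
        private int depth;
        private int maxDepth;

        void startElement() { depth++; maxDepth = Math.max(maxDepth, depth); }
        void endElement()   { depth--; }
        int result()        { return maxDepth; }
    }

    static int pushStyle(Iterable<String> events) {
        MaxDepthHandler h = new MaxDepthHandler();
        for (String e : events) { // this loop plays the role of the parser
            if (e.equals("start")) h.startElement(); else h.endElement();
        }
        return h.result();
    }

    // Pull style: called just after an element's "start" was consumed;
    // nesting is tracked by recursion instead of handler fields.
    static int pullElement(Iterator<String> it) {
        int deepestChild = 0;
        while (it.hasNext()) {
            if (it.next().equals("end")) break;
            deepestChild = Math.max(deepestChild, pullElement(it));
        }
        return 1 + deepestChild;
    }
}
```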



I presume the Xerces team is already keeping an eye on this topic, since 
some other parsers have implemented true pull systems. But I figured I'd
toss it out for brainstorming and let folks tell me where I'm mistaken... 
<smile/>


______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more. 
"may'ron DaroQbe'chugh vaj bIrIQbej"  ("Put down the squeezebox and nobody 
gets hurt.")

