Posted to j-users@xerces.apache.org by Aleksandar Milanovic <am...@galdosinc.com> on 2002/09/24 22:12:18 UTC

sorting out prefixes

Hi All,

Our app must ensure that all the incoming XML data uses the same namespace
bindings as those used in the data already persisted (i.e. the prefixes must
match). In addition, it is required that all namespace declarations be on
the document element. Perhaps I should mention that we are well aware that
the Namespaces in XML spec makes the choice of prefixes arbitrary and the location
of namespace declarations flexible, but we have our reasons for doing this
(mainly for simplicity of data processing).

We've implemented a "brute-force" algorithm for setting the prefixes to
the desired values, but it turned out, not surprisingly, that on very large
documents (megabytes in size) it was too slow. We're currently exploring the
two optimization techniques listed below. I'd appreciate it if you'd comment
on them and even suggest alternatives:

1. Make use of the Xerces symbol table and replace all string comparisons
with reference comparisons. Since prefix setting involves a large number of
string comparisons, we expect this to improve performance significantly.
It is not clear to us, however, whether Xerces uses a symbol table by
default and, if so, how it can be accessed.

2. Adjust the prefixes at parsing time by using our own content handlers.
This might be a more efficient solution, but also a bit more difficult to
implement.
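
A minimal sketch of what option 2 could look like, assuming a SAX pipeline
with an XMLFilterImpl sitting between the parser and the application's
handler. The class names PrefixRewriter and PrefixRewriterDemo, and the
URI-to-prefix map, are hypothetical and not from the original post. The
sketch only rewrites element and attribute qualified names; it passes
prefix-mapping events through unchanged and does not hoist declarations to
the document element, which a real implementation would also need to do.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: rewrites qualified names to one agreed prefix per namespace URI. */
class PrefixRewriter extends XMLFilterImpl {
    // Namespace URI -> required prefix; "" means "use the default namespace".
    private final Map<String, String> prefixByUri;

    PrefixRewriter(Map<String, String> prefixByUri) {
        this.prefixByUri = prefixByUri;
    }

    private String rename(String uri, String localName, String qName) {
        String prefix = prefixByUri.get(uri);
        if (prefix == null) return qName; // unknown namespace: leave untouched
        return prefix.isEmpty() ? localName : prefix + ":" + localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            // Attributes cannot use the default namespace, so only non-empty prefixes apply.
            String prefix = prefixByUri.get(fixed.getURI(i));
            if (prefix != null && !prefix.isEmpty()) {
                fixed.setQName(i, prefix + ":" + fixed.getLocalName(i));
            }
        }
        super.startElement(uri, localName, rename(uri, localName, qName), fixed);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, rename(uri, localName, qName));
    }
}

class PrefixRewriterDemo {
    /** Parses xml through the filter and returns the element qNames seen downstream. */
    static List<String> elementNames(String xml, Map<String, String> prefixByUri) {
        List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so URIs and local names are reported
            PrefixRewriter filter = new PrefixRewriter(prefixByUri);
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    seen.add(qName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<a:root xmlns:a='urn:example:x'><b:item xmlns:b='urn:example:x'/></a:root>";
        System.out.println(elementNames(xml, Map.of("urn:example:x", "x")));
    }
}
```

Because the lookup is keyed on the namespace URI, the incoming prefix never
has to be compared at all, and the document is processed in a single
streaming pass instead of being rewritten after it has been built.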


Thanks,
Alex


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org

RE: sorting out prefixes

Posted by Joseph Kesselman <ke...@us.ibm.com>.

Resolving colliding prefix names -- and doing reasonable things when 
prefix declarations may not exist at all -- is what the DOM3 namespace 
resolution and normalization algorithms are all about. 
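
As a rough illustration of the "declarations may not exist at all" case,
here is a sketch using the DOM Level 3 API as it ships in current Java
releases (Document.normalizeDocument() with the "namespaces" parameter).
The class and method names are hypothetical. It assumes the underlying DOM
implementation performs namespace fixup during normalization, as the
Xerces-derived one bundled with the JDK is expected to.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceFixupDemo {
    /**
     * Builds a one-element document whose root is in a namespace but carries no
     * xmlns attribute, normalizes it, and returns the value of the declaration
     * found on the root for the given prefix afterwards ("" if none is present).
     */
    static String declaredUri(String prefix, String uri) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(uri, prefix + ":root");
            doc.appendChild(root);

            // "namespaces" is on by default; set explicitly here for clarity.
            doc.getDomConfig().setParameter("namespaces", Boolean.TRUE);
            doc.normalizeDocument();

            return root.getAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, prefix);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredUri("gml", "urn:example:gml"));
    }
}
```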

What they don't do is merge/discard declarations. Note that it is _NOT_ 
always a good idea to merge prefixes if the user went to the trouble of 
asserting them separately, since some specs (XPath being one of them, 
_grrrrrr_) use prefixes in text content, where there's no reliable way 
to find and fix them.

______________________________________
Joe Kesselman  / IBM Research


RE: sorting out prefixes

Posted by Aleksandar Milanovic <am...@galdosinc.com>.

Thanks, I will have a look at it.

We're not really dealing with non-namespace-aware tools, but merely trying
to "simplify" the use of namespaces. The data that is persisted is also
returned to client applications on request. The level of sophistication of
XML processing of these clients is not always high. Also, internally, we
need to merge the incoming data with the existing data. Problems arise when
prefix names collide, which is often the case with the default namespace
binding. Of course, we can ensure that there is no prefix collision by
putting namespace declarations all over the place, but we're trying to avoid
this because it inflates the document size and it is not obvious if client
apps would be able to process it.

Alex

> -----Original Message-----
> From: Joseph Kesselman [mailto:keshlam@us.ibm.com]
> Sent: Tuesday, September 24, 2002 1:15 PM
> To: xerces-j-user@xml.apache.org
> Subject: Re: sorting out prefixes
>
>
> Best suggestion I can offer is to look at the sample algorithms for
> namespace normalization in the DOM Level 3 working draft, and adapt those
> for your needs, discarding declarations for namespaces already in scope
> and regenerating prefixes appropriately.
>
> Personal opinion: You're going the wrong direction. Rather than catering
> to broken, non-namespace-aware tools, fix those tools.
>
> ______________________________________
> Joe Kesselman  / IBM Research

Re: sorting out prefixes

Posted by Joseph Kesselman <ke...@us.ibm.com>.

Best suggestion I can offer is to look at the sample algorithms for 
namespace normalization in the DOM Level 3 working draft, and adapt those 
for your needs, discarding declarations for namespaces already in scope 
and regenerating prefixes appropriately.
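
The "discarding declarations for namespaces already in scope" part could be
sketched roughly as follows. This is not the DOM Level 3 sample algorithm
itself, just a small recursive walk over a namespace-aware DOM; the class
and method names are hypothetical. It only removes a declaration when an
ancestor already binds the same prefix to the same URI, so prefixes used in
text content keep resolving the same way. It does not regenerate or merge
prefixes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RedundantDeclarations {
    /** Removes xmlns attributes that repeat a binding already in scope; returns the count removed. */
    static int prune(Element element, Map<String, String> inScope) {
        Map<String, String> scope = new HashMap<>(inScope); // prefix -> URI visible here
        List<Attr> redundant = new ArrayList<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (!XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) continue;
            // "xmlns" declares the default namespace; "xmlns:p" declares prefix p.
            String prefix = attr.getPrefix() == null ? "" : attr.getLocalName();
            if (attr.getValue().equals(scope.get(prefix))) {
                redundant.add(attr);
            } else {
                scope.put(prefix, attr.getValue());
            }
        }
        // Remove after the loop: the attribute map is live.
        for (Attr attr : redundant) element.removeAttributeNode(attr);

        int removed = redundant.size();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) removed += prune((Element) child, scope);
        }
        return removed;
    }

    /** Convenience wrapper for the demo: parses xml and prunes from the document element down. */
    static int pruneXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return prune(doc.getDocumentElement(), new HashMap<>());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pruneXml(
            "<r xmlns:a='urn:x'><c xmlns:a='urn:x' xmlns:b='urn:y'><d xmlns:a='urn:z'/></c></r>"));
    }
}
```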

Personal opinion: You're going the wrong direction. Rather than catering 
to broken, non-namespace-aware tools, fix those tools.

______________________________________
Joe Kesselman  / IBM Research
