Posted to j-dev@xerces.apache.org by Norman Walsh <nd...@nwalsh.com> on 2000/08/07 10:24:34 UTC

Re: entityResolver problem in readers/DefaultEntityHandler.java (and patch)

/ Andy Clark <an...@apache.org> was heard to say:
| Norman Walsh wrote:
[...]

(Sorry I didn't reply sooner, things got busy and I did some travelling.)

| > If the entity in fSource contains external parameter entities 
| > with relative system identifiers, on the next pass through this 
| > routine, the rs.systemId will still be the old, pre-resolver 
| > fSystemId and will not correctly reflect the location of the 
| > fSource. That's bad.
| 
| I don't think it's bad because a document should NOT know that
| a system identifier it is using is going to be resolved somewhere
| else and make all subsequent relative URIs based on the resolved
| URI.

Documents don't know anything; this is strictly a parser issue.
RFC 2396 makes it clear that processors *are* supposed to know when
redirection has occurred:

  5.1.3. Base URI from the Retrieval URI

     [...] Note that if the retrieval was the
     result of a redirected request, the last URI used (i.e., that which
     resulted in the actual retrieval of the document) is the base URI.

And Xerces doesn't get this right yet either, though my patches don't
seem to fix this. Try to validate this document with a Xerces parser:

  <!DOCTYPE para SYSTEM "http://nwalsh.com/cgi-bin/dtdmoved">
  <para>foo</para>
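
To make the base URI rule concrete, here is a minimal sketch in Java using
java.net.URI. The URIs and the relative system identifier "para.ent" are
hypothetical, chosen only for illustration. They are not what the dtdmoved
script actually redirects to. The sketch resolves the same relative
identifier against the requested URI and against the URI that the
redirection ended at.

```java
import java.net.URI;

public class BaseUriSketch {
    // Resolve a relative system identifier against a base URI.
    static URI resolve(String base, String relative) {
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        // Hypothetical URIs, for illustration only.
        String requested = "http://example.org/cgi-bin/dtdmoved";
        String redirectedTo = "http://example.org/dtds/v2/para.dtd";
        String relative = "para.ent";

        // Base = the URI that was asked for (ignores the redirection).
        System.out.println(resolve(requested, relative));
        // prints http://example.org/cgi-bin/para.ent

        // Base = the last URI used for retrieval (RFC 2396, section 5.1.3).
        System.out.println(resolve(redirectedTo, relative));
        // prints http://example.org/dtds/v2/para.ent
    }
}
```

Only the second result points at a location next to the DTD that was
actually retrieved.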

| In short, if one identifier is resolved, then all subsequent
| relative URIs should also be resolved by the entity resolver.

The entity resolver never sees relative URIs.

| Or
| just always use absolute URIs in your documents. I favor the
| latter approach.

In light of the fact that the parser *is* supposed to understand some
forms of redirection, and because the entity resolver can be
considered another form of redirection, and because any other behavior
on the part of the parser makes it impossible to use URI schemes that
are not hierarchical to refer to the top-level of a (significant class
of) documents or DTDs, I'm forced to favor the former approach.

Since this is not strictly speaking a Xerces issue, I've started a
similar thread on xml-dev, in the hopes of getting community consensus
on what the right approach is.

I feel strongly that entity resolvers must be allowed to be viewed by
the parser as redirection, because I feel that any other view breaks the
web in fundamental ways.

If the entity resolver wants to substitute one resource for another
*without* implying redirection, it can simply return the InputSource
with the same systemIdentifier that it started with. But if it changes
the systemIdentifier, *it has performed redirection*. IMHO.
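
The distinction can be sketched with two SAX entity resolvers. This is an
illustration of the view argued above, not a description of what Xerces
currently does. The REMOTE and LOCAL identifiers, the inline DTD text, and
the systemIdAfter helper are all hypothetical.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class RedirectingResolverSketch {
    // Hypothetical identifiers, for illustration only.
    static final String REMOTE = "http://example.org/dtds/para.dtd";
    static final String LOCAL = "file:///usr/local/share/dtds/para.dtd";

    // Substitutes the content but keeps the system identifier it was
    // given: no redirection is implied.
    static final EntityResolver SUBSTITUTE = (publicId, systemId) -> {
        if (!REMOTE.equals(systemId)) return null; // let the parser open it
        InputSource src =
            new InputSource(new StringReader("<!ELEMENT para (#PCDATA)>"));
        src.setSystemId(systemId);
        return src;
    };

    // Returns an InputSource with a different system identifier: under
    // the view argued here, this is redirection.
    static final EntityResolver REDIRECT = (publicId, systemId) ->
        REMOTE.equals(systemId) ? new InputSource(LOCAL) : null;

    // Hypothetical helper: the base a parser would use for later relative
    // system identifiers if it treated the resolver as redirection.
    static String systemIdAfter(EntityResolver resolver, String systemId) {
        try {
            InputSource src = resolver.resolveEntity(null, systemId);
            return src == null ? systemId : src.getSystemId();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(systemIdAfter(SUBSTITUTE, REMOTE)); // base stays REMOTE
        System.out.println(systemIdAfter(REDIRECT, REMOTE));   // base becomes LOCAL
    }
}
```

With SUBSTITUTE, relative identifiers inside the DTD would still resolve
against the original location. With REDIRECT, they would resolve against
the local copy.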

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@East.Sun.COM | Life is an irritation--Tucker Case
XML Technology Center     | (Christopher Moore)
Sun Microsystems, Inc.    |