Posted to dev@cocoon.apache.org by Jeremy Aston <je...@yahoo.co.uk> on 2002/09/08 04:52:29 UTC

Possible Entity Resolving Bug

Hi,

I have been doing some work with entity catalogs and have noticed some
interesting behaviour.  It appears system ids are being ignored if used with
a DOCTYPE declaration; only public ids are picked up.  I can successfully
map something like:

<!DOCTYPE person PUBLIC "-//PIGBITE//DTD Person V1.0//EN"
"http://www.pigbite.com/dtd/person.dtd">

using an entry like

PUBLIC "-//PIGBITE//DTD Person V1.0//EN" "dtd/person.dtd"

But 

<!DOCTYPE person SYSTEM "http://www.pigbite.com/dtd/person.dtd">

does not get mapped even if 

SYSTEM "http://www.pigbite.com/dtd/person.dtd" "dtd/person.dtd"

is in the local catalog.

I've noted that if a systemId is used to reference an entity - e.g. 

<!ENTITY jez SYSTEM "http://www.pigbite.com/dtd/jez.txt">

and a corresponding map is in the catalog file:

SYSTEM "http://www.pigbite.com/dtd/jez.txt"    "jez.txt"

then this is resolved correctly.

This does not appear to be a fundamental catalog problem.  I modified the
entity catalog tests to test the above scenarios and everything worked.
Interestingly, the tests force the publicId and systemId args in the
resolveEntity() method call.  However, further checks on what gets passed to
resolveEntity in org.apache.cocoon.components.resolver.ResolverImpl show
that in the case of DOCTYPE the systemId arg is always NULL, regardless of
whether a systemId is present either on its own or along with a publicId.  In
the case of an ENTITY declaration the systemId and the publicId (if present)
are always passed correctly.  It would appear that whichever bit of the doc
parser picks up the entity references is not doing it as I would have
expected.
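
The kind of check described above can be reproduced outside Cocoon.  The
following is a minimal sketch, not the debug code that was added to
ResolverImpl: it assumes only a JAXP parser on the classpath, and the class
name ResolverArgsProbe and the method probe() are made up for illustration.
It records the publicId and systemId that the parser hands to
resolveEntity() for a DOCTYPE that carries only a system id.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: shows what the parser passes to the entity resolver.
public class ResolverArgsProbe {

    /** Parses xml and returns "publicId|systemId" for each resolveEntity() call. */
    public static List<String> probe(String xml) throws Exception {
        List<String> calls = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver((publicId, systemId) -> {
            calls.add(publicId + "|" + systemId);
            // Return an empty entity so that nothing is fetched over the network.
            return new InputSource(new StringReader(""));
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return calls;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<!DOCTYPE person SYSTEM \"http://www.pigbite.com/dtd/person.dtd\">"
                + "<person/>";
        System.out.println(probe(doc));
    }
}
```

A parser showing the behaviour reported here would be expected to record
null in place of the system id; what a given parser records depends on its
version.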

The catalog samples work fine, mainly because all the examples follow the
above rules.  I know the local catalog is being loaded and have tested that
using the xml-commons tools, my modified cocoon test build and the fact that
the resolution does take place if the "right" rules are followed.  Trawling
the lists I noted a previous message along similar lines that had no
response, and I could not find anything else.  Is this behaviour correct or
is it a bug?  I've looked at several other sites, including the OASIS spec,
and nothing seems to shed any light on it (other than implying that it
should work as expected), so if someone can advise I can either stop trying
something that is not meant to happen or raise a bugzilla and do some more
debugging.

FYI I am using 2.0.3, JDK1.4, JAXP.  I have not yet tried it against the
current HEAD code or using the XML catalog format (although I am sure the
catalog itself is fine)

TIA

jez



RE: Possible Entity Resolving Bug

Posted by David Crossley <cr...@indexgeo.com.au>.
Jeremy Aston wrote:
>
> Thanks for the response.  I'm glad it's not just me and my ever expanding
> understanding of entity catalogs that have gone loopy!  For the sake of
> brevity I'm cutting the previous stuff and just responding to this bit....
>
> >> This does not appear to be a fundamental catalog problem.  I modified the
> >> entity catalog tests to test the above scenarios and everything worked.
> 
> David Crossley wrote:
> >I am not sure what you mean here. Are you saying that you
> >can get the SystemId resolved via "./build.sh test" whereas
> >it will not work via "./build.sh docs"?
> 
> Basically I modified the ResolverImplTestCase class to add in some tests
> that would establish if a system id could be resolved if there was no public
> id passed along with it.  The two existing test cases just test for
> available and non-available entities and both supply a public id.  My tests
> check HTTP, URN and file URIs, each of them having a match in the catalog
> to the same s.o.i.  None of the tests supply a public id.  As an example:
> 
<snip good test-case example/>
> 
> The catalog entry (in the test class) for this is:
> 
> "SYSTEM \"urn:x-pigbite:person\"                  \"person.dtd\"\n" +
> 
> Add in the dtd to the test folder, run build test and everything works fine.
> What this proves is that IF you call (for example) resolveEntity( null,
> "urn:x-pigbite:person" ); then this can be resolved OK.  A simple bit of
> debug in the ResolverImpl.resolveEntity() method proves that when attempting to
> resolve a DOCTYPE SYSTEM identifier it is ALWAYS passed as null (regardless
> of whether a public id exists or not), yet when using a system identifier with
> ENTITY it will get passed through and this can be resolved.
> 
> My feeling is that the xml-commons code is fine (the tests prove it); it
> is simply getting passed incorrect data.  Question is, is this Cocoon or
> parser specific?  I am more than happy to chase this down, so - if you
> concur - I will raise a bug and get on to it.

Please do. It would be grand if you could also add a patch
to get your additional test cases into CVS. Cocoon is
generally lacking in that area.

> I have not noticed any problems running build docs but you have made me
> curious as to if there are any issues there!

You do not see problems there because nothing in the
build docs will exercise the bug. Try changing the
document type declaration on one of the xdocs instances
to use a systemID and it will break. That is how i
tested your hypothesis.

> Do you have anything in mind
> that might be wrong?

No. Let us start with your new test cases and then see
what is different to ResolverImpl.java

Aha. I just checked Xerces-J release notes and found
something relevant.
http://xml.apache.org/xerces2-j/releases.html
----
2.0.1
Fixed an entity resolution bug: we passed null as the
system ID to the entity resolver.  [Sandy Gao]
----

I see that Cocoon-2.0.3 (your bug-report branch) is
still using Xerces-2.0.0

I have not yet had time to try Cocoon-2.1 to see if
this same issue exists there. I do see that Carsten
upgraded head CVS to Xerces-2.1 last week, so perhaps
the bug is gone there.
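
Since the behaviour depends on which parser is actually loaded, it can help
to confirm that first.  This is a small sketch under stated assumptions: it
uses only standard JAXP and reflection calls, the class name ParserInfo is
hypothetical, and the version string comes from the jar manifest, so it may
well be null if the parser's jar does not declare one.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports which SAX parser implementation JAXP selected.
public class ParserInfo {

    public static String describe() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Class<?> impl = parser.getClass();

        // Where the implementation class was loaded from, if the JVM reports it.
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "(no code source reported)"
                : source.getLocation().toString();

        // Implementation-Version from the jar manifest; may be null.
        Package pkg = impl.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();

        return impl.getName() + " from " + location + ", version " + version;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```

Running this with the same classpath as the failing build should show
whether the parser jar in use is the one that was expected.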

> If I remember correctly, docs is the only bit where
> validation is forced...

It used to do validation. Unfortunately we had to
switch it off due to an obscure bug ...
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6200
Parser failure with validate=true when processing stylesheet

> Thanks for the xml-commons pointers btw - in the process of doing this
> searching I cottoned on to that earlier this weekend so if it looks like the
> scope is moving outside of Cocoon then Mr Walsh et al will be hearing from
> me ;-).

We should first explore the Cocoon situation. Unfortunately,
i have no time, so am pleased that you see the importance of
getting this fixed. I will try to help out.
--David



---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


RE: Possible Entity Resolving Bug

Posted by Jeremy Aston <je...@yahoo.co.uk>.
People may recall that I raised a possible entity resolving bug in 2.0.3
where if you specified a public id and system id and then wanted to use a
catalog for entity resolution you would never see the system id.  Well, I
have finally got round to doing some more checking and, thanks to some
suggestions by Vadim and David, I have tracked the problem down to a known
bug in Xerces 2.0.0 ( bugzilla
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6138 ).

The problem was fixed some time ago in 2.0.1 and above by a mod to
org.apache.xerces.impl.XMLEntityManager.

Hope that helps anyone who may have experienced the same problems as me.

Regards

Jez




__________________________________________________
Do You Yahoo!?
Everything you'll ever need on one web page
from News and Sport to Email and Music Charts
http://uk.my.yahoo.com



RE: Possible Entity Resolving Bug

Posted by Jeremy Aston <je...@yahoo.co.uk>.
Hi David,

Thanks for the response.  I'm glad it's not just me and my ever expanding
understanding of entity catalogs that have gone loopy!  For the sake of
brevity I'm cutting the previous stuff and just responding to this bit....

>> This does not appear to be a fundamental catalog problem.  I modified the
>> entity catalog tests to test the above scenarios and everything worked.

>I am not sure what you mean here. Are you saying that you
>can get the SystemId resolved via "./build.sh test" whereas
>it will not work via "./build.sh docs"?

Basically I modified the ResolverImplTestCase class to add in some tests
that would establish if a system id could be resolved if there was no public
id passed along with it.  The two existing test cases just test for
available and non-available entities and both supply a public id.  My tests
check HTTP, URN and file URIs, each of them having a match in the catalog
to the same s.o.i.  None of the tests supply a public id.  As an example:

    /**
     * JUnit test case:
     * ask for an entity using only a systemId, expressed as a URN.
     * No publicId is supplied, so only the catalog SYSTEM entry can match.
     *
     * @exception  Exception  if the resolver fails or a stream cannot be closed
     */
    public void testResolveURNSystemIdentity() throws Exception {
        assertNotNull("ResolverImpl is null", resolverImpl);

        String public_id = null;
        String system_id = "urn:x-pigbite:person";
        InputSource is = resolverImpl.resolveEntity(public_id, system_id);
        assertNotNull("InputSource is null for " +
                "'" + public_id + "'" + ", " +
                "'" + system_id + "'", is);

        // close the entity stream, otherwise removing it will fail
        // (note that normally the parser would handle this)
        java.io.Reader entity_r = is.getCharacterStream();
        if (entity_r != null) {
            entity_r.close();
        }
        java.io.InputStream entity_is = is.getByteStream();
        if (entity_is != null) {
            entity_is.close();
        }
    }


The catalog entry (in the test class) for this is:

"SYSTEM \"urn:x-pigbite:person\"                  \"person.dtd\"\n" +

Add in the dtd to the test folder, run build test and everything works fine.
What this proves is that IF you call (for example) resolveEntity( null,
"urn:x-pigbite:person" ); then this can be resolved OK.  A simple bit of
debug in the ResolverImpl.resolveEntity() method proves that when attempting to
resolve a DOCTYPE SYSTEM identifier it is ALWAYS passed as null (regardless
of whether a public id exists or not), yet when using a system identifier with
ENTITY it will get passed through and this can be resolved.

My feeling is that the xml-commons code is fine (the tests prove it); it
is simply getting passed incorrect data.  Question is, is this Cocoon or
parser specific?  I am more than happy to chase this down, so - if you
concur - I will raise a bug and get on to it.

I have not noticed any problems running build docs but you have made me
curious as to if there are any issues there!  Do you have anything in mind
that might be wrong?  If I remember correctly, docs is the only bit where
validation is forced...

Thanks for the xml-commons pointers btw - in the process of doing this
searching I cottoned on to that earlier this weekend so if it looks like the
scope is moving outside of Cocoon then Mr Walsh et al will be hearing from
me ;-).

Regards

Jeremy






Re: Possible Entity Resolving Bug

Posted by David Crossley <cr...@indexgeo.com.au>.
Jeremy Aston wrote:
> Hi,
> 
> I have been doing some work with entity catalogs and have noticed some
> interesting behaviour.  It appears system ids are being ignored if used with
> a DOCTYPE declaration; only public ids are picked up.  I can successfully
> map something like:
> 
> <!DOCTYPE person PUBLIC "-//PIGBITE//DTD Person V1.0//EN"
> "http://www.pigbite.com/dtd/person.dtd">
> 
> using an entry like
> 
> PUBLIC "-//PIGBITE//DTD Person V1.0//EN" "dtd/person.dtd"
> 
> But 
> 
> <!DOCTYPE person SYSTEM "http://www.pigbite.com/dtd/person.dtd">
> 
> does not get mapped even if 
> 
> SYSTEM "http://www.pigbite.com/dtd/person.dtd" "dtd/person.dtd"
> 
> is in the local catalog.

I just did some tests and i can confirm that there is
something amiss. (Careful, the http... bits in your
examples are getting mangled by our mail readers...)
----
In the XML instance ...
<!DOCTYPE person SYSTEM "missing-person.dtd">

In the OASIS catalog ...
SYSTEM "missing-person.dtd" "person.dtd"
----
Yes, it does fail to resolve the systemId.
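
The pair of declarations above can be emulated without a catalog file.  This
is a rough sketch under stated assumptions: a plain map stands in for the
catalog's SYSTEM entries (it is not the xml-commons resolver), the class name
SystemMapDemo and the inline DTD text are hypothetical, and an absolute URI
is used as the system id because a parser may expand a relative one such as
"missing-person.dtd" against the document's base URI before calling the
resolver.  If the parser passes the system id through, the document
validates against the mapped DTD; if the lookup misses, validation reports
errors.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a catalog: maps system ids to replacement DTD text.
public class SystemMapDemo {

    static final Map<String, String> SYSTEM_MAP = Map.of(
            "http://www.pigbite.com/dtd/person.dtd",
            "<!ELEMENT person (#PCDATA)>");

    /** Returns true if xml validates against whatever DTD the map supplies. */
    public static boolean validates(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        EntityResolver resolver = (publicId, systemId) -> {
            String dtd = (systemId == null) ? null : SYSTEM_MAP.get(systemId);
            // A miss yields an empty DTD, so nothing is fetched over the network.
            return new InputSource(new StringReader(dtd == null ? "" : dtd));
        };
        reader.setEntityResolver(resolver);

        boolean[] valid = {true};
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                valid[0] = false; // validity errors are recoverable; just record them
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return valid[0];
    }

    public static void main(String[] args) throws Exception {
        String doc = "<!DOCTYPE person SYSTEM \"http://www.pigbite.com/dtd/person.dtd\">"
                + "<person>jez</person>";
        System.out.println(validates(doc));
    }
}
```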

> I've noted that if a systemId is used to reference an entity - e.g. 
> 
> <!ENTITY jez SYSTEM "http://www.pigbite.com/dtd/jez.txt">
> 
> and a corresponding map is in the catalog file:
> 
> SYSTEM "http://www.pigbite.com/dtd/jez.txt"    "jez.txt"
> 
> then this is resolved correctly.

Agreed. The Cocoon Sample catalog-demo declares an entity
this way and all is well.

> This does not appear to be a fundamental catalog problem.  I modified the
> entity catalog tests to test the above scenarios and everything worked.

I am not sure what you mean here. Are you saying that you
can get the SystemId resolved via "./build.sh test" whereas
it will not work via "./build.sh docs"?

> Interestingly, the tests force the publicId and systemId args in the
> resolveEntity() method call.  However, further checks on what gets passed to
> resolveEntity in org.apache.cocoon.components.resolver.ResolverImpl show
> that in the case of DOCTYPE the systemId arg is always NULL, regardless of
> whether a systemId is present either on its own or along with a publicId.  In
> the case of an ENTITY declaration the systemId and the publicId (if present)
> are always passed correctly.  It would appear that whichever bit of the doc
> parser picks up the entity references is not doing it as I would have
> expected.
> 
> The catalog samples work fine, mainly because all the examples follow the
> above rules.  I know the local catalog is being loaded and have tested that
> using the xml-commons tools, my modified cocoon test build and the fact that
> the resolution does take place if the "right" rules are followed.  Trawling
> the lists I noted a previous message along similar lines that had no
> response, and I could not find anything else.  Is this behaviour correct or
> is it a bug?  I've looked at several other sites, including the OASIS spec,
> and nothing seems to shed any light on it (other than implying that it
> should work as expected), so if someone can advise I can either stop trying
> something that is not meant to happen or raise a bugzilla and do some more
> debugging.

Please do explore it further. The problem may be in
Cocoon's implementation of the entity resolver.

The actual resolver development can be discussed
on the xml-commons-dev mailing list.
http://xml.apache.org/commons/
http://marc.theaimsgroup.com/?l=xml-commons-dev

> FYI I am using 2.0.3, JDK1.4, JAXP.  I have not yet tried it against the
> current HEAD code or using the XML catalog format (although I am sure the
> catalog itself is fine)

I have not yet tried against HEAD, though i expect that it
will exhibit the same behaviour.
--David


