Posted to dev@cocoon.apache.org by Steven Noels <st...@outerthought.org> on 2002/01/22 15:13:12 UTC

entity resolver woes

Hi all,

Has anyone experienced problems with the Resolver when simply generating
and immediately (re-)serializing an XML document that refers to a PUBLIC
DTD which includes a number of external parameter entities?

From what I can see now, the comments inside the external entities are
inserted as-is in the stream, unfortunately not inside a local
declaration subset but immediately after the SYSTEM id:

C:\bin\win32>nc localhost 8080
GET /hagal/content/courses/bloblo.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//Outerthought//DTD Site document//EN"
"document.dtd"<!--
 The rules attribute defines which rules to draw between cells:
	[...] (snipped a lot of comments originating from the external
parameter entities)
     ============================================================== -->>
<document>
<info>
    <title>Empty document</title><displaytitle>empty</displaytitle>
</info>
<body><topic><para>blablabla</para></topic>
</body></document>

As you can see, the comments from inside the referenced entities are not
put inside a local declaration subset, which I would expect to look like
this:

<!DOCTYPE foo PUBLIC "pub-id" "system-id" [
<!-- comments should be inserted here I suppose -->
]>
<document>
[etc]
</document>

If this is a bug, it is a Resolver bug, I guess.
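
One possible reading of the output above, sketched in Python rather than
Cocoon's Java: SAX2's LexicalHandler reports comments found in the DTD
between its startDTD and endDTD events. A serializer that opens the
DOCTYPE on startDTD, closes it on endDTD, and writes every comment it
receives in between would produce exactly this kind of output. The class
and method names below are hypothetical stand-ins and not Cocoon code,
and the event order is an assumption, not taken from a real parser run.

```python
# Illustrative only: a toy serializer driven by SAX-like lexical events.
# Event names mirror SAX2 LexicalHandler (startDTD, comment, endDTD);
# the classes themselves are hypothetical and are not Cocoon code.


class NaiveSerializer:
    """Writes every comment verbatim, wherever it arrives."""

    def __init__(self):
        self.out = []

    def start_dtd(self, name, public_id, system_id):
        # The closing '>' of the DOCTYPE is deferred until end_dtd.
        self.out.append(
            '<!DOCTYPE %s PUBLIC "%s" "%s"' % (name, public_id, system_id)
        )

    def comment(self, text):
        self.out.append('<!--%s-->' % text)

    def end_dtd(self):
        self.out.append('>\n')

    def start_element(self, name):
        self.out.append('<%s>' % name)

    def end_element(self, name):
        self.out.append('</%s>' % name)

    def result(self):
        return ''.join(self.out)


class DtdAwareSerializer(NaiveSerializer):
    """Drops comments that arrive between start_dtd and end_dtd."""

    def __init__(self):
        super().__init__()
        self.in_dtd = False

    def start_dtd(self, name, public_id, system_id):
        self.in_dtd = True
        super().start_dtd(name, public_id, system_id)

    def comment(self, text):
        if not self.in_dtd:
            super().comment(text)

    def end_dtd(self):
        self.in_dtd = False
        super().end_dtd()


def replay(serializer):
    # Assumed event order for a document whose external DTD has a comment.
    serializer.start_dtd(
        'document', '-//Outerthought//DTD Site document//EN', 'document.dtd'
    )
    serializer.comment(
        ' The rules attribute defines which rules to draw between cells: '
    )
    serializer.end_dtd()
    serializer.start_element('document')
    serializer.end_element('document')
    return serializer.result()


broken = replay(NaiveSerializer())    # comment lands inside the DOCTYPE
fixed = replay(DtdAwareSerializer())  # DOCTYPE is closed cleanly
```

Dropping the DTD comments is only one option; writing them into a
bracketed subset would be another, at the cost of copying external
content into every serialized document.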

Steven Noels
http://outerthought.org/
(+32)478 292900


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


RE: entity resolver woes

Posted by Steven Noels <st...@outerthought.org>.
OK, getting back from the trenches...

I plugged in the LogTransformer and got the resulting SAX stream dump
(attached, sorry).

I see a lot of events about entities & comments being reported which are
not part of the document, but of the external parameter entities called
from within the DTD.

The browser output is also attached - clearly invalid XML...
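
To tell which reported comments belong to the document and which come
from the DTD, the entity boundaries in the event stream can be used. In
SAX2, LexicalHandler's startEntity/endEntity report the external DTD
subset under the name "[dtd]" and parameter entities under names that
begin with "%". The Python sketch below groups comments by the innermost
open entity. The tuple format, the function name and the entity name
"%tables" are made up for illustration; the sample stream is
hand-written and is not the attached dump.

```python
from collections import defaultdict


def comments_by_entity(events):
    """Group comment events by the innermost entity that was open.

    `events` is a list of tuples such as ('start_entity', '[dtd]'),
    ('comment', 'text'), ('end_entity', '[dtd]'). Comments seen while
    no entity is open are filed under the key 'document'.
    """
    stack = []
    grouped = defaultdict(list)
    for event in events:
        kind = event[0]
        if kind == 'start_entity':
            stack.append(event[1])
        elif kind == 'end_entity':
            stack.pop()
        elif kind == 'comment':
            origin = stack[-1] if stack else 'document'
            grouped[origin].append(event[1])
        # All other event kinds are ignored by this sketch.
    return dict(grouped)


# Hand-written sample stream; the entity name '%tables' is hypothetical.
sample = [
    ('start_dtd', 'document'),
    ('start_entity', '[dtd]'),
    ('comment', 'top of document.dtd'),
    ('start_entity', '%tables'),
    ('comment', 'The rules attribute defines which rules to draw'),
    ('end_entity', '%tables'),
    ('end_entity', '[dtd]'),
    ('end_dtd',),
    ('start_element', 'document'),
    ('comment', 'body comment'),
    ('end_element', 'document'),
]
grouped = comments_by_entity(sample)
```

Only the comments filed under 'document' belong in the serialized body;
everything filed under "[dtd]" or a "%" name originated in the DTD.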

Regards,

</Steven>


RE: entity resolver woes

Posted by Steven Noels <st...@outerthought.org>.
Getting back to cocoon-dev land:

I did some further testing with some input from Michael...

As I feared, once I put an identity transformation stylesheet into my
pipeline, my problem was solved.

I've been browsing some code to find the difference in handling SAX
events between the AbstractTextSerializer and AbstractXMLPipe - but I
could use a couple of extra eyes :-)

My wild assumptions are that:

1) either this is a Resolver bug, i.e. it incorrectly handles comments
inside external entities and inserts them incorrectly in the output
stream
2) or it throws some (weird?) SAX events (cf. 1) that the Serializer is
unable to handle, but the XMLPipe is handling correctly (based on my
understanding that XMLPipe is indeed the event-forwarder)
3) or I'm touching grounds way too difficult for my weary brain to handle
:-)
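
If assumption 2 is the right one, a workaround short of a full XSLT step
could be a small event filter placed in front of the serializer. The
Python sketch below forwards every event except comments seen between
the start and end of the DTD. All names are hypothetical and only
loosely modelled on the idea of an event-forwarding pipe; this is not
Cocoon's API, and it has not been checked against the actual
AbstractXMLPipe or AbstractTextSerializer code.

```python
# Hypothetical event filter; names are illustrative, not Cocoon's API.


class RecordingConsumer:
    """Stand-in for a downstream serializer: records what it receives."""

    def __init__(self):
        self.events = []

    def start_dtd(self, name, public_id, system_id):
        self.events.append(('start_dtd', name))

    def end_dtd(self):
        self.events.append(('end_dtd',))

    def comment(self, text):
        self.events.append(('comment', text))

    def start_element(self, name):
        self.events.append(('start_element', name))

    def end_element(self, name):
        self.events.append(('end_element', name))


class DtdCommentFilter:
    """Forwards every event except comments seen inside the DTD."""

    def __init__(self, consumer):
        self.consumer = consumer
        self.in_dtd = False

    def start_dtd(self, name, public_id, system_id):
        self.in_dtd = True
        self.consumer.start_dtd(name, public_id, system_id)

    def end_dtd(self):
        self.in_dtd = False
        self.consumer.end_dtd()

    def comment(self, text):
        # Comments from the DTD are swallowed; body comments pass through.
        if not self.in_dtd:
            self.consumer.comment(text)

    def start_element(self, name):
        self.consumer.start_element(name)

    def end_element(self, name):
        self.consumer.end_element(name)


sink = RecordingConsumer()
pipe = DtdCommentFilter(sink)
pipe.start_dtd(
    'document', '-//Outerthought//DTD Site document//EN', 'document.dtd'
)
pipe.comment(' comment from an external parameter entity ')
pipe.end_dtd()
pipe.start_element('document')
pipe.comment(' comment in the document body ')
pipe.end_element('document')
```

After this run, sink.events holds the DTD start and end, the element
events and the body comment, but not the comment from the DTD.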

Regards,

</Steven>





Re: entity resolver woes

Posted by David Crossley <cr...@indexgeo.com.au>.
Michael Hartle wrote:
> Steven Noels wrote:
> >Has anyone experienced problems with the Resolver when simply generating
> >& immediately (re-)serializing an XML document, when this document
> >refers to a PUBLIC DTD which includes a number of external parameter
> >entities.
> >
> >From what i can see now, the comments inside the external entities are
> >inserted as-is in the stream, unfortunately not inside a local
> >declaration subset but immediately after the SYSTEM id:
> 
> <snip/>
>
> I recently had the same problem when I made a pipeline which returns the 
> DTD-conforming main content.xml file from OpenOffice zip archives; 
> trying to aggregate several content.xml from another Cocoon server via 
> HTTP, the parser told me that the content.xml files are not well-formed 
> anymore, thus showing exactly the same resolution behaviour as you 
> described. IIRC, there has been a thread not long ago where someone 
> (maybe David Crossley) posted that this is resolver and/or 
> parser-related, but I am not sure.

No, I do not know anything about this and do not recall such a
thread (but hey, there are so many :-)

I will be doing some paid work with Cocoon tomorrow on files
that match Steven's description, so I will try to reproduce the
behaviour. I do recall once seeing XML comments from a DTD
being passed into the stream, but it did not cause the issue
that you reported.
--David
 






RE: entity resolver woes

Posted by Steven Noels <st...@outerthought.org>.
> -----Original Message-----
> From: Michael Hartle [mailto:mhartle@hartle-klug.com]
> Sent: Tuesday, 22 January 2002 15:22
> To: cocoon-dev@xml.apache.org
> Subject: Re: entity resolver woes
>
> Steven Noels wrote:
>
> > Hi all,
> >
> > Has anyone experienced problems with the Resolver when simply
> > generating & immediately (re-)serializing an XML document, when this
> > document refers to a PUBLIC DTD which includes a number of external
> > parameter entities.
> >
> > From what i can see now, the comments inside the external entities
> > are inserted as-is in the stream, unfortunately not inside a local
> > declaration subset but immediately after the SYSTEM id:
>
> I recently had the same problem when I made a pipeline which returns
> the DTD-conforming main content.xml file from OpenOffice zip archives;
> trying to aggregate several content.xml from another Cocoon server via
> HTTP, the parser told me that the content.xml files are not well-formed
> anymore, thus showing exactly the same resolution behaviour as you
> described. IIRC, there has been a thread not long ago where someone
> (maybe David Crossley) posted that this is resolver and/or
> parser-related, but I am not sure.

Oh well, seems like a bug indeed.

Since the Resolver is responsible for replacing references with their
respective entities, I guess it is a Resolver issue then...?

Norm, any ideas? I've double-checked the XML spec and comments should be
inside the local subset, too.

Regards,

Steven Noels
http://outerthought.org/
(+32)478 292900




Re: entity resolver woes

Posted by Michael Hartle <mh...@hartle-klug.com>.
Steven Noels wrote:

>Hi all,
>
>Has anyone experienced problems with the Resolver when simply generating
>& immediately (re-)serializing an XML document, when this document
>refers to a PUBLIC DTD which includes a number of external parameter
>entities.
>
>From what i can see now, the comments inside the external entities are
>inserted as-is in the stream, unfortunately not inside a local
>declaration subset but immediately after the SYSTEM id:
>
I recently had the same problem when I made a pipeline which returns the 
DTD-conforming main content.xml file from OpenOffice zip archives; 
trying to aggregate several content.xml from another Cocoon server via 
HTTP, the parser told me that the content.xml files are not well-formed 
anymore, thus showing exactly the same resolution behaviour as you 
described. IIRC, there has been a thread not long ago where someone 
(maybe David Crossley) posted that this is resolver and/or 
parser-related, but I am not sure.

Best regards,

Michael Hartle,
Hartle & Klug GbR

