Posted to j-users@xerces.apache.org by Jean Georges PERRIN <jg...@jgp.net> on 2004/02/19 09:30:27 UTC
Parser to solve XHTML errors?
Hi,
I have a fairly complex problem with Xerces-J 2.5.0. I'll try to describe it
simply.
My goal is to parse XHTML files. Compliant files parse fine, but I have
problems with non-compliant XHTML files, such as those containing <input ...>
instead of <input .../>.
I have rebuilt the Xerces library in Eclipse.
I wrote a small test in a test environment, and it looks like Xerces is able
to correct such XML errors, which surprised me. However, some features fail
to be set, such as
http://apache.org/xml/features/validation/schema/augment-psvi,
http://xml.org/sax/features/external-general-entities...
Even so, parsing completes, and my XHTML mistakes get corrected...
Now, when I use this Xerces jar in Tomcat (I have removed all other
references to any XML parser), the features are set correctly but the
document fails to parse.
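One way to narrow down the difference between the Eclipse test and Tomcat is to log, in each environment, which parser implementation is active and which feature URIs it accepts. The sketch below is a swapped-in approach: it uses the standard JAXP DocumentBuilderFactory.setFeature rather than the org.apache.xerces.parsers.DOMParser used in this thread, so it reports on whatever implementation JAXP resolves, which may not be the class created explicitly. FeatureProbe and probe are hypothetical names. With the Xerces jar on the classpath, the equivalent direct check would be calling setFeature(uri, true) on a DOMParser and catching SAXNotRecognizedException / SAXNotSupportedException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical diagnostic helper (not part of Xerces): reports which feature
// URIs the JAXP DocumentBuilderFactory on the current classpath accepts.
final class FeatureProbe {

    static Map<String, Boolean> probe(String... featureUris) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String uri : featureUris) {
            // Fresh factory per feature so one rejected feature cannot affect the next.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            try {
                factory.setFeature(uri, true);
                result.put(uri, Boolean.TRUE);
            } catch (ParserConfigurationException e) {
                // Thrown when the underlying parser does not recognize or support the feature.
                result.put(uri, Boolean.FALSE);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Which implementation did JAXP pick in this environment?
        System.out.println("Factory: " + DocumentBuilderFactory.newInstance().getClass().getName());
        probe("http://apache.org/xml/features/validation/schema/augment-psvi",
              "http://xml.org/sax/features/external-general-entities")
            .forEach((uri, ok) -> System.out.println(uri + " -> " + (ok ? "supported" : "NOT supported")));
    }
}
```

Running the same main in both environments and diffing the output shows whether the two setups are really using the same parser.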
I simply do not understand why the behavior differs between the command line
and Tomcat. I explicitly create a new
org.apache.xerces.parsers.DOMParser:
parser = new org.apache.xerces.parsers.DOMParser();
When I use a jar from the binary distributions (2.4.0, 2.5.0 and 2.6.1),
parsing fails in both environments.
Could some Sun XML parser be overriding my use of Xerces (like the built-in
version of Xalan)?
Is there a way to find out the location of the JAR files the program is
actually using? I am pretty sure that I am using the right classpath
(otherwise it would fail), but who knows...
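A class's own .class file can be looked up as a resource, and the returned URL shows which jar (or JDK runtime image) the class was actually loaded from. The sketch below uses only the standard Class.getResource call; ClassOrigin and describeOrigin are hypothetical names. Printing the JAXP factory class also helps answer whether a JDK-bundled parser is being picked up instead of Xerces.

```java
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic helper: reports where a class's bytecode came from.
final class ClassOrigin {

    /** Returns the URL of the .class file backing the given class, or a note if none is found. */
    static String describeOrigin(Class<?> c) {
        // A leading "/" makes the resource name absolute: package path plus class file name.
        String resource = "/" + c.getName().replace('.', '/') + ".class";
        URL url = c.getResource(resource);
        return url != null ? url.toString() : "(class file not found for " + c.getName() + ")";
    }

    public static void main(String[] args) {
        // Which JAXP implementation is active, and which jar or runtime image provides it?
        Class<?> factoryClass = DocumentBuilderFactory.newInstance().getClass();
        System.out.println(factoryClass.getName() + " <- " + describeOrigin(factoryClass));
        // When the Xerces jar is on the classpath, passing
        // org.apache.xerces.parsers.DOMParser.class here shows which Xerces jar is in use.
    }
}
```

The URL typically starts with jar:file: for classes loaded from a jar, which makes a stray parser jar in Tomcat's class loaders easy to spot.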
Any help or hints would be really appreciated...
TIA
Jean Georges
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org
RE: Parser to solve XHTML errors?
Posted by Jean Georges PERRIN <jg...@jgp.net>.
Strangely enough, that's how it behaves...
I can't understand why or how!
When I swap in the distribution binaries from 2.4.0, 2.5.0 and 2.6.1,
parsing fails (as expected).
When I use the Xerces 2.5.0 build I made in Eclipse (without Ant), it works...
This thing is really driving me mad!
Jean Georges
> -----Original Message-----
> From: Andy Clark [mailto:andyc@apache.org]
> Sent: Thursday, February 19, 2004 19:25
> To: xerces-j-user@xml.apache.org
> Subject: Re: Parser to solve XHTML errors?
Re: Parser to solve XHTML errors?
Posted by Andy Clark <an...@apache.org>.
Jean Georges PERRIN wrote:
> I have made a small test in a test environment, and it looks like Xerces is
> able to correct such xml errors. I was surprised. However, some features are
Assuming that the input documents are erroneous as you say,
I don't see how Xerces could parse them correctly. For
example, the following XML code:
<body>
<img src='me.gif'>
<p>Hello there</p>
</body>
can be parsed up to the end body tag. At that point, the
end tag the parser sees does not match the currently open
tag, <img>. Since img was not closed properly for XML (i.e.
<img ... />), the following paragraph is treated as a child
of the img element. Parsing then fails outright with a
well-formedness error when it reaches the end body tag in
the example.
If your documents can be parsed with Xerces I would be
surprised and it would mean there is a bug in Xerces.
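The behavior described above can be checked with a short program. As an assumption for the sake of a self-contained example, it uses the JDK's built-in JAXP parser rather than the Xerces 2.x jars discussed in the thread; WellFormednessDemo and check are hypothetical names. The unclosed <img> version should be rejected with a fatal error reported at the </body> line, while the <img .../> version parses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class WellFormednessDemo {

    /** Returns null if the text is well-formed XML, otherwise where and why parsing failed. */
    static String check(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler (which prints to stderr) with one that
            // simply aborts the parse on fatal well-formedness errors.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<body>\n<img src='me.gif'>\n<p>Hello there</p>\n</body>";
        String fixed = "<body>\n<img src='me.gif'/>\n<p>Hello there</p>\n</body>";
        for (String doc : new String[] {broken, fixed}) {
            String problem = check(doc);
            System.out.println(problem == null ? "well-formed" : problem);
        }
    }
}
```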
--
Andy Clark * andyc@apache.org