
Posted to j-users@xerces.apache.org by Jean Georges PERRIN <jg...@jgp.net> on 2004/02/19 09:30:27 UTC

Parser to solve XHTML errors?

Hi,

I have a fairly complex problem to describe with Xerces-J 2.5.0. I'll try to
make it simple.

My goal is to parse XHTML files. Compliant files parse fine; my problem is
with non-compliant XHTML files, such as ones with <input ...> instead of
<input .../>.

I have rebuilt the Xerces library in Eclipse.

I ran a small test in a test environment, and it looks like Xerces is
able to correct such XML errors. I was surprised. However, some features
fail to be set, such as
http://apache.org/xml/features/validation/schema/augment-psvi,
http://xml.org/sax/features/external-general-entities...
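
One way to see which of these features a given parser accepts is to set them one at a time and record the outcome. The sketch below is only an illustration: it goes through the standard JAXP DocumentBuilderFactory rather than the org.apache.xerces.parsers.DOMParser class used in this thread (an assumption made so that it runs without a Xerces jar), and the class name FeatureProbe is hypothetical. Whether a feature is accepted depends on the parser implementation found on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper: reports which parser features the JAXP factory accepts.
class FeatureProbe {
    static Map<String, String> probe(String... features) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String feature : features) {
            // Use a fresh factory per feature so one rejection cannot affect the next check.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            try {
                factory.setFeature(feature, true);
                results.put(feature, "accepted");
            } catch (ParserConfigurationException e) {
                results.put(feature, "rejected: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // The two feature URIs named in the message above.
        Map<String, String> results = probe(
            "http://apache.org/xml/features/validation/schema/augment-psvi",
            "http://xml.org/sax/features/external-general-entities");
        results.forEach((name, outcome) -> System.out.println(name + " -> " + outcome));
    }
}
```

Running the same probe from the command line and inside Tomcat would show whether the two environments really resolve to the same parser configuration.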

But the parse itself runs fine, and it fixes my XHTML mistakes...

Now, when I use this Xerces jar in Tomcat (I have removed all other
references to any XML parser), the features are set correctly but the
document fails to parse.

I simply do not understand why the behaviour differs between the command
line and Tomcat. I use an explicit call to create a new
org.apache.xerces.parsers.DOMParser:

      parser = new org.apache.xerces.parsers.DOMParser();

When I use a library from the binary distribution (2.4.0, 2.5.0 and 2.6.1)
it fails in both cases.

Is there some Sun XML parser that is overriding my use of Xerces (like the
built-in version of Xalan)?

Is there a way to know the location of the JAR files the program is using? I
am pretty sure that I am using the right classpath (otherwise it fails) but
who knows...
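
A class can usually report where it was loaded from through its protection domain, which answers the "which JAR" question directly. The following is a minimal sketch, not a definitive diagnostic: the class name WhichJar and the method locate are hypothetical, and classes bundled with the JRE often have no code source, so the sketch falls back to a resource lookup.

```java
import java.net.URL;
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper: prints where parser classes were loaded from.
class WhichJar {
    /** Returns the location a class was loaded from, or a note if it cannot be determined. */
    static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src != null && src.getLocation() != null) {
            return src.getLocation().toString();
        }
        // Classes bundled with the JRE usually have no CodeSource; try a resource lookup instead.
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        URL url = cls.getResource(resource);
        return url != null ? url.toString() : "(unknown, probably bundled with the JRE)";
    }

    public static void main(String[] args) throws Exception {
        // The parser implementation that JAXP picks at runtime.
        Class<?> jaxpImpl = DocumentBuilderFactory.newInstance().newDocumentBuilder().getClass();
        System.out.println(jaxpImpl.getName() + " <- " + locate(jaxpImpl));

        // The class named in this message; it resolves only if a Xerces jar is on the classpath.
        try {
            Class<?> xerces = Class.forName("org.apache.xerces.parsers.DOMParser");
            System.out.println(xerces.getName() + " <- " + locate(xerces));
        } catch (ClassNotFoundException e) {
            System.out.println("org.apache.xerces.parsers.DOMParser is not on the classpath");
        }
    }
}
```

Calling locate from inside the web application as well as from the command-line test would show whether both really load the parser from the same jar.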

Any help / hint will be really appreciated...

TIA

Jean Georges 



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org

RE: Parser to solve XHTML errors?

Posted by Jean Georges PERRIN <jg...@jgp.net>.

Strangely enough, that's how it behaves...

I can't understand why & how!

When I swap in the distribution binaries from 2.4.0, 2.5.0 and 2.6.1,
it fails (as expected).

When I use the Xerces 2.5.0 build I made in Eclipse (without Ant), it works...

This thing is really driving me mad!

Jean Georges

> -----Original Message-----
> From: Andy Clark [mailto:andyc@apache.org]
> Sent: Thursday, February 19, 2004 19:25
> To: xerces-j-user@xml.apache.org
> Subject: Re: Parser to solve XHTML errors?
> 
> Jean Georges PERRIN wrote:
> > I have made a small test in a test environment, and it looks like Xerces is
> > able to correct such xml errors. I was surprised. However, some features are
> 
> Assuming that the input documents are erroneous as you say,
> I don't see how Xerces could parse them correctly. For
> example, the following XML code:
> 
>    <body>
>      <img src='me.gif'>
>      <p>Hello there</p>
>    </body>
> 
> can be parsed up to the end body tag. It's at that point
> that the end tag it sees does not equal the currently open
> tag, <img>. Since img did not end properly for XML (e.g.
> <img ... />), the following paragraph would appear as a
> child of the img tag. However, it will fail absolutely
> due to a well-formedness constraint when it hits the end
> body tag in the example.
> 
> If your documents can be parsed with Xerces I would be
> surprised and it would mean there is a bug in Xerces.
> 
> --
> Andy Clark * andyc@apache.org
> 



Re: Parser to solve XHTML errors?

Posted by Andy Clark <an...@apache.org>.

Jean Georges PERRIN wrote:
> I have made a small test in a test environment, and it looks like Xerces is
> able to correct such xml errors. I was surprised. However, some features are

Assuming that the input documents are erroneous as you say,
I don't see how Xerces could parse them correctly. For
example, the following XML code:

   <body>
     <img src='me.gif'>
     <p>Hello there</p>
   </body>

can be parsed up to the end body tag. It's at that point
that the end tag it sees does not match the currently open
tag, <img>. Since img did not end properly for XML (e.g.
<img ... />), the following paragraph would appear as a
child of the img tag. However, it will fail absolutely
due to a well-formedness constraint when it hits the end
body tag in the example.
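
The behaviour described above can be checked with a short program. This is a sketch under stated assumptions: it uses the standard JAXP DocumentBuilder rather than the Xerces DOMParser class directly, the class name WellFormednessDemo is hypothetical, and the exact error text depends on the parser in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo: a conforming XML parser rejects the unclosed <img> tag.
class WellFormednessDemo {
    /** Returns null if the text parses, otherwise the position and message of the fatal error. */
    static String parseError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Rethrow fatal errors without the default handler's console output.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String broken = "<body>\n  <img src='me.gif'>\n  <p>Hello there</p>\n</body>";
        String fixed  = "<body>\n  <img src='me.gif'/>\n  <p>Hello there</p>\n</body>";
        System.out.println("unclosed img: " + parseError(broken));
        System.out.println("closed img:   " + parseError(fixed));
    }
}
```

With a conforming parser, the first call returns an error message and the second returns null.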

If your documents can be parsed with Xerces I would be
surprised and it would mean there is a bug in Xerces.

-- 
Andy Clark * andyc@apache.org
