Posted to c-dev@xerces.apache.org by Marc Seldin <ms...@bbn.com> on 2004/09/23 17:17:50 UTC

Making Xerces less strict?

While in theory it makes good sense to have a strict parser, in the world of
my clients, getting them to produce generally well-formed documents is
difficult enough. I've been getting a number of complaints about the "invalid
character" error; usually they've included some control character, like an
ASCII 18. (http://xml.apache.org/xerces-c/faq-parse.html#faq-20)

Is there any way to make the Xerces parser less strict? If not, I'd like to
put in a feature request for this. It would really make the world a happier,
shinier place.


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


RE: XML Schema bug fixes for 2.6? (was: Making Xerces less strict?)

Posted by Scott Cantor <ca...@osu.edu>.
> As a suggestion for the entire mailing list, why don't we use the "vote
> for this bug" feature of Jira? At least we can have a feeling of 
> what the users really think is urgent to fix.

Yep, I certainly did. Of course, I can't vote for my own. ;-)

This is exactly what I was asking, essentially. If people aren't using the
schema support, the tiny few of us who are have to decide whether we have
the resources to fix the bugs we find ourselves, or make some other choices.

-- Scott




RE: XML Schema bug fixes for 2.6? (was: Making Xerces less strict?)

Posted by Alberto Massari <am...@progress.com>.
As a suggestion for the entire mailing list, why don't we use the "vote for 
this bug" feature of Jira? At least we can have a feeling of what the users 
really think is urgent to fix.

Alberto

At 15.38 23/09/2004 -0400, Scott Cantor wrote:
> > While it's no guarantee, referencing specific bug numbers that are
> > show-stoppers for you will certainly help increase the likelihood of
> > them getting fixed. Patches, of course, help even more ;)
>
>I'm sure, but since somebody else already pointed to one and got deafening
>silence, I had sort of lost hope.
>
>My top two (and the dealbreakers for me at the moment):
>
>http://nagoya.apache.org/jira/browse/XERCESC-1240
>http://nagoya.apache.org/jira/browse/XERCESC-1197
>
>If I had any time, I probably would have supplied a fix for mine. I may take
>another look, but I didn't get far trying to find the relevant code.
>
>The first one I think has a proposed fix within the entry.
>
>What I was mostly after was some sense as to whether there was a plan to
>look at the open bug list before cutting a new release and see if any quick
>fixes were possible.
>
>-- Scott




RE: XML Schema bug fixes for 2.6? (was: Making Xerces less strict?)

Posted by Scott Cantor <ca...@osu.edu>.
> While it's no guarantee, referencing specific bug numbers that are 
> show-stoppers for you will certainly help increase the likelihood of 
> them getting fixed. Patches, of course, help even more ;)

I'm sure, but since somebody else already pointed to one and got deafening
silence, I had sort of lost hope.

My top two (and the dealbreakers for me at the moment):

http://nagoya.apache.org/jira/browse/XERCESC-1240
http://nagoya.apache.org/jira/browse/XERCESC-1197

If I had any time, I probably would have supplied a fix for mine. I may take
another look, but I didn't get far trying to find the relevant code.

The first one I think has a proposed fix within the entry.

What I was mostly after was some sense as to whether there was a plan to
look at the open bug list before cutting a new release and see if any quick
fixes were possible.

-- Scott




Re: XML Schema bug fixes for 2.6? (was: Making Xerces less strict?)

Posted by James Berry <ja...@jberry.us>.
On Sep 23, 2004, at 12:16 PM, Scott Cantor wrote:

> While the list is discussing the importance of adhering to standards, is
> there any hope of getting a few simple XML schema bugs fixed before the
> next release (e.g. xml:lang is unusable, anyType doesn't handle arbitrary
> attributes)? Most of them don't even get an acknowledgement in the bug
> tracker.
>
> It's reaching critical...I effectively have to choose between Xerces and
> validation now.

Scott,

While it's no guarantee, referencing specific bug numbers that are 
show-stoppers for you will certainly help increase the likelihood of 
them getting fixed. Patches, of course, help even more ;)

James.




RE: Making Xerces less strict?

Posted by Scott Cantor <ca...@osu.edu>.
> Yes, resources are a major issue. What people should realize as well is 
> that we can't just go and check a patch in. We have to ensure it's OK 
> with regard to the appropriate standard. As everyone knows, the schema 
> specs are not the most readable documents in the world :), so schema 
> bugs face this problem more than other DOM-related bugs.

I totally understand that, believe me. If I sounded a little harsh
originally, I apologize; it wasn't my intent to criticize so much as to
elicit some sense as to where the schema issues fit on the list of work.

Xerces-J simply has better schema support right now, and if -C isn't able to
devote the resources to catch up, that's something I have to deal with
somehow.

It seemed like a good time to ask the question, with a new release
apparently in the offing.

Speaking for myself, I think it would be nice to see some kind of policy
about making sure that all open bugs are at least reviewed (and if possible
commented on) before a major release. That way people with major bugs know
whether they can count on a fix or whether they need to really devote their
own time to it.

-- Scott




Re: Making Xerces less strict?

Posted by Gareth Reakes <ga...@parthenoncomputing.com>.
Hi,

On 23 Sep 2004, at 20:30, david_n_bertoni@us.ibm.com wrote:

>> While the list is discussing the importance of adhering
>> to standards, is there any hope of getting a few simple
>> XML schema bugs fixed before the next release (e.g.
>> xml:lang is unusable, anyType doesn't handle arbitrary
>> attributes)? Most of them don't even get an acknowledgement
>> in the bug tracker.
>
> Well, there is a difference, at least philosophically, between bugs and
> deliberate non-conformance.  I'm not a committer on the project, but I
> suspect it's just an issue of resources.  Bugs, after all, don't fix
> themselves -- they require human intervention.  Since the code is there
> for you, why not dive in and try to fix a few of them?  I'm sure your
> contributions will be welcome.


Yes, resources are a major issue. What people should realize as well is 
that we can't just go and check a patch in. We have to ensure it's OK 
with regard to the appropriate standard. As everyone knows, the schema 
specs are not the most readable documents in the world :), so schema 
bugs face this problem more than other DOM-related bugs.


Gareth


> Dave
>
--
Gareth Reakes, Managing Director      Parthenon Computing
+44-1865-811184                  http://www.parthcomp.com




RE: Making Xerces less strict?

Posted by Scott Cantor <ca...@osu.edu>.
> Well, there is a difference, at least philosophically, between bugs and 
> deliberate non-conformance.

There's a similarity when the "solution" becomes hacking schemas to get
things to work. That way lies madness, same as this.

> I'm not a committer on the project, but I suspect it's just an issue of
> resources.  Bugs, after all, don't fix themselves -- they require human
> intervention.

Of course, but getting a sense as to whether the schema support is on the
critical list is important in choosing what to do. One bug begets another.
I'm well aware that nobody validates; I've been told that enough. So the
question is, can I realistically continue to? Should I?

> Since the code is there 
> for you, why not dive in and try to fix a few of them?  I'm sure your 
> contributions will be welcome.

Believe me, I've tried. It's not like one can digest the validator without
serious effort, and I'm busy writing my own projects too. Nobody can expect
to fix all the bugs in all the libraries they use, obviously, so getting a
sense as to whether that's my only way forward is important; if that's my
choice, I'll have to weigh the time I can spend.

As I noted in my other response, one of the bugs I hit *has* a fix proposed.

Anyway, the point was that the talk about a new release made me wonder if a
certain set of bugs are just "not on the radar screen", which is why I
asked. Somebody else asked a couple of weeks back about similar bugs and got
no answer, so I had sort of taken that for an answer. It's not a big deal
per se if there just aren't enough people using that code to justify the
resources to fix it vs. other things, but it's important as a user to know.

-- Scott




RE: Making Xerces less strict?

Posted by da...@us.ibm.com.
> While the list is discussing the importance of adhering
> to standards, is there any hope of getting a few simple
> XML schema bugs fixed before the next release (e.g.
> xml:lang is unusable, anyType doesn't handle arbitrary
> attributes)? Most of them don't even get an acknowledgement
> in the bug tracker.

Well, there is a difference, at least philosophically, between bugs and 
deliberate non-conformance.  I'm not a committer on the project, but I 
suspect it's just an issue of resources.  Bugs, after all, don't fix 
themselves -- they require human intervention.  Since the code is there 
for you, why not dive in and try to fix a few of them?  I'm sure your 
contributions will be welcome.

Dave



RE: Making Xerces less strict?

Posted by Scott Cantor <ca...@osu.edu>.
> It's true that Microsoft's parser allows these characters, but it's 
> non-standard behavior.  If you really want to use XML, it's best to avoid 
> relying on such non-standard behavior.

While the list is discussing the importance of adhering to standards, is
there any hope of getting a few simple XML schema bugs fixed before the next
release (e.g. xml:lang is unusable, anyType doesn't handle arbitrary
attributes)? Most of them don't even get an acknowledgement in the bug
tracker.

It's reaching critical...I effectively have to choose between Xerces and
validation now.

-- Scott




RE: Making Xerces less strict?

Posted by da...@us.ibm.com.
> A preprocessing service is not a bad workaround, and I
> do have sympathy for the notion of standards compliance.
> Still, I see this problem (control characters) often enough
> that I question the XML standard and Xerces' compliance with it.
> XML Spy does not accept these characters, though Microsoft's
> DOM parser does. Is there any reason I can give my users other
> than that it is verboten? Is there some good reason for these
> characters to be off limits?

They are absolutely off-limits in XML 1.0, so any "preprocessing" you do 
will have to encode them in some application-specific way (base64 is often 
used).  Whether or not there's a "good reason," I can't tell you, because 
I don't know the history, and "good" implies a value judgement I'm not 
willing to make.
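
As a rough illustration of what such a preprocessing step would have to
look for, here is a minimal C++ sketch of the Char production from section
2.2 of the XML 1.0 recommendation. The names isXml10Char and
findForbiddenControl are made up for this example and are not part of
Xerces-C; the byte scan assumes the input is ASCII or UTF-8.

```cpp
#include <cstddef>
#include <string>

// True if code point cp matches the XML 1.0 "Char" production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXml10Char(unsigned long cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: returns the byte offset of the first control byte
// below 0x20 that XML 1.0 forbids (anything but tab, LF, CR), or
// std::string::npos if there is none. Assumes ASCII or UTF-8, where a
// byte below 0x80 is always a complete code point, so a plain byte scan
// is enough for this class of characters.
std::size_t findForbiddenControl(const std::string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(text[i]);
        if (b < 0x20 && !isXml10Char(b)) {
            return i;
        }
    }
    return std::string::npos;
}
```

For the ASCII 18 case from the start of this thread, findForbiddenControl
would report the offset of the offending byte, and the application could
then strip, replace, or encode it before handing the document to the
parser.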

They are now allowed in XML 1.1, although they must be represented using 
numeric character references.  However, XML 1.1 is a very recent 
recommendation, so finding tools that support it may be difficult.
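
To make the XML 1.1 route concrete, here is a small C++ sketch that
rewrites control characters in text content as numeric character
references. The function name escapeControlsForXml11 is invented for this
example and is not a Xerces-C API. It assumes ASCII or UTF-8 character
data (not markup), and the resulting document would still have to declare
version="1.1" and be read by a parser that supports XML 1.1.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: replaces C0 control bytes (0x01-0x1F other than
// tab, LF, CR) and DEL (0x7F) with numeric character references such as
// "&#x12;", which is how XML 1.1 requires them to be written.
// NUL (0x00) is not legal even in XML 1.1; it is passed through unchanged
// here and has to be dealt with separately. C1 controls (0x80-0x9F) are
// multi-byte in UTF-8 and are not handled by this sketch.
std::string escapeControlsForXml11(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char ch : text) {
        unsigned char b = static_cast<unsigned char>(ch);
        bool restricted =
            (b >= 0x01 && b <= 0x1F && b != 0x09 && b != 0x0A && b != 0x0D) ||
            b == 0x7F;
        if (restricted) {
            char ref[8];  // longest output is "&#x7F;" plus the terminator
            std::snprintf(ref, sizeof ref, "&#x%X;", static_cast<unsigned>(b));
            out += ref;
        } else {
            out += ch;
        }
    }
    return out;
}
```

Ordinary text, tabs, and line breaks pass through untouched, so the
function can be applied to every text value before it is serialized.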

It's true that Microsoft's parser allows these characters, but it's 
non-standard behavior.  If you really want to use XML, it's best to avoid 
relying on such non-standard behavior.

Dave



RE: Making Xerces less strict?

Posted by Marc Seldin <ms...@bbn.com>.
A preprocessing service is not a bad workaround, and I do have sympathy for
the notion of standards compliance. Still, I see this problem (control
characters) often enough that I question the XML standard and Xerces'
compliance with it. XML Spy does not accept these characters, though
Microsoft's DOM parser does. Is there any reason I can give my users other
than that it is verboten? Is there some good reason for these characters to
be off limits?

> -----Original Message-----
> From: Dean Roddey [mailto:droddey@charmedquark.com]
> Sent: Thursday, September 23, 2004 1:19 PM
> To: xerces-c-dev@xml.apache.org
> Subject: RE: Making Xerces less strict?
> 
> Ditto the other answers. The whole point of a strict standard is that you
> know that the documents you have (once they go through a compliant parser)
> should go through any compliant parser. HTML turned into a race to the
> bottom, where this browser couldn't parse that page but the person who
> created the page says, "But it works for me." XML cannot allow that,
> because it is used not only for visual markup, where a failure to comply
> is a visual annoyance, but also for semantic information and data, where
> failures are far more serious. It would make the world a very muddled,
> confused place, where evildoers can create confusion and despair.
> 
> -------------------------------------
> Dean Roddey
> The Charmed Quark Controller
> droddey@charmedquark.com
> www.charmedquark.com
> 




RE: Making Xerces less strict?

Posted by Dean Roddey <dr...@charmedquark.com>.
Ditto the other answers. The whole point of a strict standard is that you
know that the documents you have (once they go through a compliant parser)
should go through any compliant parser. HTML turned into a race to the
bottom, where this browser couldn't parse that page but the person who
created the page says, "But it works for me." XML cannot allow that,
because it is used not only for visual markup, where a failure to comply
is a visual annoyance, but also for semantic information and data, where
failures are far more serious. It would make the world a very muddled,
confused place, where evildoers can create confusion and despair.

-------------------------------------
Dean Roddey
The Charmed Quark Controller
droddey@charmedquark.com
www.charmedquark.com
 


-----Original Message-----
From: Marc Seldin [mailto:mseldin@bbn.com] 
Sent: Thursday, September 23, 2004 8:18 AM
To: xerces-c-dev@xml.apache.org
Subject: Making Xerces less strict?


While in theory it makes good sense to have a strict parser, in the world of
my clients, getting them to produce generally well-formed documents is
difficult enough. I've been getting a number of complaints about the "invalid
character" error; usually they've included some control character, like an
ASCII 18. (http://xml.apache.org/xerces-c/faq-parse.html#faq-20)

Is there any way to make the Xerces parser less strict? If not, I'd like to
put in a feature request for this. It would really make the world a happier,
shinier place.

