Posted to c-dev@xerces.apache.org by Dean Roddey <dr...@portal.com> on 2000/12/22 22:55:18 UTC
RE: Validation of NMTOKEN strings

I believe that the XMLReader::isXXXChar() methods are statics, so though they
are 'internal' you could certainly use them if you want to, without having
to make your own copy. The issue is whether the parser team is willing to
support them as public API.

Overall, it wouldn't make much difference if an XMLCharType class were
created in Framework to encapsulate the character checking stuff (i.e. to
hold that table and provide those static APIs.) But I believe that some
performance tricks are played in the reader using direct access to that
array and such, which might have to be given up in order to move that code
out of the reader. And any problems it didn't cause today, it might cause
later.

However, on the other side of the argument, eventually there will be some
type of revalidation of DOM trees, and this type of functionality will have
to be available to that system anyway I guess.

If such an XMLCharType class were provided, I'd also argue for adding some
convenience methods to it, such as "isLegalNmToken()", so that people don't
end up writing those string scanning wrappers over and over again, and so
that they avoid all of the calls into and out of the lower level APIs. The
reader couldn't use those, since it does such things char by char anyway,
but it's always likely to be a special case.
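
To make that concrete, here is a rough sketch of what such a class might
look like. It is only an illustration, not the actual Xerces-C code: the
class does not exist in this form, char16_t stands in for XMLCh, and the
names kNameCharMask, table() and build() are made up for the example. The
ranges are the NameChar production from XML 1.0 Fifth Edition, which is a
stand-in for the reader's internal table and not a copy of it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical class, NOT part of the Xerces-C API.
class XMLCharType {
public:
    // Table lookup for a single UTF-16 code unit.
    static bool isNameChar(char16_t c) {
        return (table()[c] & kNameCharMask) != 0;
    }

    // NMTOKEN ::= (NameChar)+, so the string must be non-empty and
    // every character must satisfy NameChar.
    static bool isLegalNmToken(const std::u16string& s) {
        if (s.empty()) return false;
        for (std::size_t i = 0; i < s.size(); ++i) {
            const char16_t c = s[i];
            if (isNameChar(c)) continue;
            // [#x10000-#xEFFFF] arrives as a surrogate pair: a high
            // surrogate in D800..DB7F followed by any low surrogate.
            if (c >= 0xD800 && c <= 0xDB7F && i + 1 < s.size() &&
                s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                ++i;  // consume the low surrogate as well
                continue;
            }
            return false;
        }
        return true;
    }

private:
    // Only one flag bit is used here; the other bits are free for
    // further character classes.
    static constexpr std::uint8_t kNameCharMask = 0x01;

    struct Range { std::uint32_t first, last; };

    // One flag byte per BMP code unit, built once on first use.
    static const std::array<std::uint8_t, 0x10000>& table() {
        static const std::array<std::uint8_t, 0x10000> t = build();
        return t;
    }

    static std::array<std::uint8_t, 0x10000> build() {
        // BMP part of NameChar, XML 1.0 Fifth Edition (assumption: the
        // real reader table may follow a different edition of the spec).
        static const Range ranges[] = {
            {'-', '-'}, {'.', '.'}, {'0', '9'}, {':', ':'},
            {'A', 'Z'}, {'_', '_'}, {'a', 'z'}, {0xB7, 0xB7},
            {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x300, 0x36F},
            {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
            {0x203F, 0x2040}, {0x2070, 0x218F}, {0x2C00, 0x2FEF},
            {0x3001, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}
        };
        std::array<std::uint8_t, 0x10000> t{};  // zero-initialised
        for (const Range& r : ranges)
            for (std::uint32_t c = r.first; c <= r.last; ++c)
                t[c] |= kNameCharMask;
        return t;
    }
};
```

A caller would then just write XMLCharType::isLegalNmToken(str) instead of
looping over isNameChar() by hand. Going through the table() accessor on
every lookup is exactly the sort of indirection the reader avoids today by
touching the array directly.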

--------------
Dean Roddey
Software Geek Extraordinaire
Portal, Inc
droddey@portal.com



-----Original Message-----
From: Bob Kline [mailto:bkline@rksystems.com]
Sent: Friday, December 22, 2000 2:06 PM
To: xerces-c-dev@xml.apache.org
Subject: Validation of NMTOKEN strings


I have software which needs to determine whether a given string matches
the XML grammar's production for NMTOKEN.  My first (admittedly
crude) version used the test

   if (!iswalpha(c) && !iswdigit(c) && !wcschr(L".-_:", c))

on each character in the string to detect characters which did not meet
the NameChar production.  This was close enough for a first cut, but it
ignores part of the NameChar production, and makes some assumptions
about what iswalpha and iswdigit are doing that are worse than
unreliable.

For my second cut I looked for a public method in xerces-c which would
do the job properly.  I didn't find such a public method, but I did find
the internal code which does the trick (XMLReader::isNameChar).  So I
copied the table and mask that I needed and produced a much more correct
implementation of my function.

What I'd really like to do is find this publicly exposed somewhere, so I
can pick up bug fixes and changes to the definition of NMTOKEN, should
such ever occur.  Am I just not looking in the right place, or is this
really not exposed anywhere?

Thanks!

-- 
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

