Posted to j-dev@xerces.apache.org by Aleksander Slominski <as...@cs.indiana.edu> on 2002/07/22 05:37:45 UTC
xmlpull api [Re: [Announce] The CyberNeko Tools for XNI 2002.07.17 Available]
Andy Clark wrote:
> Elliotte Rusty Harold wrote:
> >> the only thing that could possibly make XMLPULL API not
> >> 100% compatible with XML 1.0 is when PROCESS DOCDECL feature
> >> [...]
> >
> > That's a very big one. A parser should not be allowed to turn off
> > processing of the internal DTD subset at all. And to make not processing
> > it the default?! That's just wrong.
>
> Well, you gotta look at the intended purpose of these types
> of parsers. If I remember correctly, the XPP work was started
> because of SOAP which subsets XML syntax and doesn't allow a
> DOCTYPE line at all.
That means the existing implementations were designed to concentrate
on size (like kXML2) or speed (like MXP1), but there can be many
other implementations ...
> > Worse yet, according to http://www.xmlpull.org/impls.shtml neither of
> > the existing implementations even allows you to set that feature to true.
>
> I think Aleksander has code to use Xerces2 as the driver
> for the push API. So, if used that way then it should be
> able to check the DTD just like Xerces. And when I finish
> my API for the CyberNeko tools, the default impl will be
> driven by Xerces so it should have no problem in that
> regard.
Exactly: the API is one thing and an implementation is completely another.
As long as each implementation is described correctly, users can make
informed choices.
> > I've also heard it claimed recently that the parsers aren't doing all
> > the name character checking they're supposed to, though I haven't
>
> No wonder they're so fast. ;) This is one of the big
> checks that implementors would love to remove from their
> inner loop. Xerces, being fully compliant, can't do that
> and suffers some performance hits.
In MXP1 I use a lookup table for char values below LOOKUP_MAX (0x400
in the code below) and if statements for the rest. I am putting the
relevant part of the MXP1 code below and
welcome comments on it (especially if you find anything
wrong with the functions!).
> Just about any XML parser can be written to go fast if
> they don't do all of the work. For example, removing
> character checking, avoiding DTD parsing and processing,
> not implementing XML Schema, etc. But, depending on the
> situation, these are all perfectly acceptable choices.
Well, I think that MXP1 does all of the XML parsing, and I am slowly
improving it to the level of a non-validating parser. The only
remaining incompatibilities I know about are DTD parsing and
XML 1.0 character set support (I am a bit hesitant about adding
it, as I like XML 1.1 much more ...).
thanks,
alek
PS. Here is a fragment of MXParser. Please comment if you think
that I am missing something important when looking at what is
required in http://www.w3.org/TR/xml11/#sec2.3 (thanks in advance!)

    // chars below LOOKUP_MAX are classified by table lookup,
    // everything else by explicit range checks
    protected static final int LOOKUP_MAX = 0x400;
    protected static final char LOOKUP_MAX_CHAR = (char)LOOKUP_MAX;
    protected static boolean lookupNameStartChar[] = new boolean[ LOOKUP_MAX ];
    protected static boolean lookupNameChar[] = new boolean[ LOOKUP_MAX ];

    private static final void setName(char ch)
    { lookupNameChar[ ch ] = true; }

    // every name start char is also a valid name char
    private static final void setNameStart(char ch)
    { lookupNameStartChar[ ch ] = true; setName(ch); }

    static {
        // chars allowed at the start of a name
        setNameStart(':');
        for (char ch = 'A'; ch <= 'Z'; ++ch) setNameStart(ch);
        setNameStart('_');
        for (char ch = 'a'; ch <= 'z'; ++ch) setNameStart(ch);
        for (char ch = '\u00c0'; ch <= '\u02FF'; ++ch) setNameStart(ch);
        for (char ch = '\u0370'; ch <= '\u037d'; ++ch) setNameStart(ch);
        for (char ch = '\u037f'; ch < '\u0400'; ++ch) setNameStart(ch);
        // chars allowed only after the first position
        setName('-');
        setName('.');
        for (char ch = '0'; ch <= '9'; ++ch) setName(ch);
        setName('\u00b7');
        for (char ch = '\u0300'; ch <= '\u036f'; ++ch) setName(ch);
    }

    private final static boolean isNameStartChar(char ch) {
        return (ch < LOOKUP_MAX_CHAR && lookupNameStartChar[ ch ])
            || (ch >= LOOKUP_MAX_CHAR && ch <= '\u2027')
            || (ch >= '\u202A' && ch <= '\u218F')
            || (ch >= '\u2800' && ch <= '\uFFEF')
            ;
    }

    // above LOOKUP_MAX the accepted ranges are the same as for name start chars
    private final static boolean isNameChar(char ch) {
        return (ch < LOOKUP_MAX_CHAR && lookupNameChar[ ch ])
            || (ch >= LOOKUP_MAX_CHAR && ch <= '\u2027')
            || (ch >= '\u202A' && ch <= '\u218F')
            || (ch >= '\u2800' && ch <= '\uFFEF')
            ;
    }

    // XML whitespace (production S): space, line feed, carriage return, tab
    protected boolean isS(char ch) {
        return (ch == ' ' || ch == '\n' || ch == '\r' || ch == '\t');
    }
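
For anyone who wants to experiment with the fragment above, here is a minimal, self-contained sketch of how the two predicates could be combined to check a whole name. It is not taken from MXP1: the class name NameCheckSketch and the helpers mark, inUpperRanges and isName are hypothetical, and only the character ranges are copied from the fragment.

```java
// Hypothetical sketch, not part of MXP1. The character ranges are copied
// from the MXParser fragment; all names in this class are illustrative.
final class NameCheckSketch {
    private static final int LOOKUP_MAX = 0x400;
    private static final boolean[] NAME_START = new boolean[LOOKUP_MAX];
    private static final boolean[] NAME = new boolean[LOOKUP_MAX];

    // Marks the inclusive range [from, to] in the given lookup table.
    private static void mark(boolean[] table, int from, int to) {
        for (int c = from; c <= to; c++) table[c] = true;
    }

    static {
        mark(NAME_START, ':', ':');
        mark(NAME_START, 'A', 'Z');
        mark(NAME_START, '_', '_');
        mark(NAME_START, 'a', 'z');
        mark(NAME_START, 0x00C0, 0x02FF);
        mark(NAME_START, 0x0370, 0x037D);
        mark(NAME_START, 0x037F, LOOKUP_MAX - 1);
        // Every name start char is also a name char.
        System.arraycopy(NAME_START, 0, NAME, 0, LOOKUP_MAX);
        mark(NAME, '-', '-');
        mark(NAME, '.', '.');
        mark(NAME, '0', '9');
        mark(NAME, 0x00B7, 0x00B7);
        mark(NAME, 0x0300, 0x036F);
    }

    // Range checks for chars at or above LOOKUP_MAX, same for both predicates.
    private static boolean inUpperRanges(char ch) {
        return (ch >= LOOKUP_MAX && ch <= 0x2027)
            || (ch >= 0x202A && ch <= 0x218F)
            || (ch >= 0x2800 && ch <= 0xFFEF);
    }

    static boolean isNameStartChar(char ch) {
        return ch < LOOKUP_MAX ? NAME_START[ch] : inUpperRanges(ch);
    }

    static boolean isNameChar(char ch) {
        return ch < LOOKUP_MAX ? NAME[ch] : inUpperRanges(ch);
    }

    // A name is one name start char followed by zero or more name chars.
    static boolean isName(String s) {
        if (s.isEmpty() || !isNameStartChar(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "soap:Envelope", "_id", "9lives", "a b", "" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isName(s));
        }
    }
}
```

One thing the sketch makes easy to see: the last range, taken unchanged from the fragment, spans the surrogate block (0xD800 to 0xDFFF), so each half of a surrogate pair is accepted on its own when the check works on Java chars rather than code points.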