Posted to j-dev@xerces.apache.org by Olaf Kittelmann <ki...@elmedia.de> on 2002/01/07 18:09:40 UTC
Ambiguous parsing behaviour / whitespace problem

Hi everybody,
I have a problem with Xerces when parsing my own markup language and its
whitespace. I am trying to read initialization data for a complex object
structure from XML, parsing it with Xerces.
I have a class "StructureParser" that uses an XMLReader, with an inner class
as the ContentHandler.
In a static initializer I specify org.apache.xerces.parsers.SAXParser as my
SAX driver, set up my debugging and set validation to false.

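private static boolean debug; // assumed field; its declaration was not shown

static {
    // XMLReaderFactory.createXMLReader() instantiates the class named here
    System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");

    // overrides any -Ddebug given on the command line
    System.setProperty("debug", "false");

    // an upper-case DEBUG property takes precedence over the lower-case one
    String strDebug = System.getProperty("DEBUG");
    if (strDebug == null)
        strDebug = System.getProperty("debug");

    if (strDebug != null && strDebug.equalsIgnoreCase("true"))
        debug = true;
    else
        debug = false;
}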
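For comparison, here is a sketch of the same setup written against the standard JAXP factory instead of the org.xml.sax.driver property. ReaderSetup and its methods are hypothetical names, and the sketch assumes you accept whatever SAX implementation JAXP finds on the classpath (the JDK's built-in one if nothing else is present). It also switches validation off explicitly on the reader through the SAX2 feature http://xml.org/sax/features/validation, which is one place the "set validation to false" step could live.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: builds a non-validating SAX2 XMLReader through JAXP
// instead of naming a driver class in the org.xml.sax.driver property.
final class ReaderSetup {
    // Standard SAX2 feature URI that switches DTD validation on or off.
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    private ReaderSetup() {}

    static XMLReader newNonValidatingReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Make the choice explicit on the reader itself as well.
            reader.setFeature(VALIDATION, false);
            return reader;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create XMLReader", e);
        }
    }

    // Reports whether the reader currently has DTD validation switched on.
    static boolean validationEnabled(XMLReader reader) {
        try {
            return reader.getFeature(VALIDATION);
        } catch (SAXException e) { // SAXNotRecognizedException or SAXNotSupportedException
            throw new IllegalStateException("validation feature not available", e);
        }
    }
}
```

Calling ReaderSetup.validationEnabled(reader) from both the test main method and the servlet is a quick way to confirm that the flag really is off in each environment.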

I wrote a main method for testing that takes the path to my XML file as an
argument, passes it to my XMLReader and parses it.

Everything works fine: characters() is called when there is character data,
and ignorableWhitespace() is called when there is only whitespace.
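SAX reports whitespace through ignorableWhitespace() when the parser knows from a content model (here, the DTD) that the enclosing element has element-only content; a parser that does not read or use that content model delivers the same characters through characters(). The harness below is a sketch for comparing environments, assuming a JAXP-provided non-validating parser is acceptable for testing; WhitespaceTally and parse() are hypothetical names. It counts both callbacks, so parsing the same document in each environment shows which path is taken.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: counts how text reaches the application,
// via characters() or via ignorableWhitespace().
class WhitespaceTally extends DefaultHandler {
    int characterCalls; // calls to characters()
    int ignorableCalls; // calls to ignorableWhitespace()
    int totalLength;    // chars delivered through either callback
    final StringBuilder text = new StringBuilder(); // trimmed non-blank chunks

    @Override
    public void characters(char[] ch, int start, int length) {
        characterCalls++;
        totalLength += length;
        String chunk = new String(ch, start, length).trim();
        if (!chunk.isEmpty()) {
            text.append(chunk);
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableCalls++;
        totalLength += length;
    }

    // Parses an in-memory document with a non-validating reader.
    static WhitespaceTally parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            WhitespaceTally tally = new WhitespaceTally();
            reader.setContentHandler(tally);
            reader.parse(new InputSource(new StringReader(xml)));
            return tally;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For example, WhitespaceTally.parse("<r>\n  <a>x</a>\n</r>") has no DTD, so all five characters inside the root element, whitespace included, arrive through characters() and ignorableCalls stays 0. Adding a DOCTYPE with an element-only declaration for r gives the parser the information it needs to report the whitespace as ignorable instead; the total length stays the same either way.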

The call stack in the debugger looks like this:

de.elmedia.StructureParser$AbmlHandler.ignorableWhitespace(char[], int, int) line: 415
org.apache.xerces.parsers.SAXParser(org.apache.xerces.parsers.AbstractSAXParser).ignorableWhitespace(org.apache.xerces.xni.XMLString) line: 404
org.apache.xerces.impl.xs.XMLSchemaValidator.ignorableWhitespace(org.apache.xerces.xni.XMLString) line: 479
org.apache.xerces.impl.XMLNamespaceBinder.ignorableWhitespace(org.apache.xerces.xni.XMLString) line: 612
org.apache.xerces.impl.dtd.XMLDTDValidator.characters(org.apache.xerces.xni.XMLString) line: 836
org.apache.xerces.impl.XMLDocumentScannerImpl(org.apache.xerces.impl.XMLDocumentFragmentScannerImpl).scanContent() line: 836
org.apache.xerces.impl.XMLDocumentScannerImpl$ContentDispatcher(org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: 1379
org.apache.xerces.impl.XMLDocumentScannerImpl(org.apache.xerces.impl.XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: 328
org.apache.xerces.parsers.DTDXSParserConfiguration(org.apache.xerces.parsers.StandardParserConfiguration).parse(boolean) line: 479
org.apache.xerces.parsers.DTDXSParserConfiguration(org.apache.xerces.parsers.StandardParserConfiguration).parse(org.apache.xerces.xni.parser.XMLInputSource) line: 521
org.apache.xerces.parsers.SAXParser(org.apache.xerces.parsers.XMLParser).parse(org.apache.xerces.xni.parser.XMLInputSource) line: 148
org.apache.xerces.parsers.SAXParser(org.apache.xerces.parsers.AbstractSAXParser).parse(org.xml.sax.InputSource) line: 972


For my real purpose I am using a servlet that does pretty much the same
thing: it creates a StructureParser (so the static initializer is executed)
and passes the same XML document to it, with the XMLReader set not to
validate. But now the SAX parser only triggers the characters() method, with
strings that look like "| ".

I can still trim the strings and only process the ones that really contain
characters, but what I am interested in is: why the heck does Xerces behave
differently when I use exactly the same steps to set it up?
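The trimming workaround mentioned above can be made more robust by buffering text per element, because SAX may split a single text node across several characters() calls. The sketch below assumes that text only matters in leaf elements such as LinkObjekt (text that precedes a child element is discarded); TextCollector and collect() are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers characters() chunks per element and keeps
// only text that is non-blank after trimming. Whitespace reported through
// ignorableWhitespace() is dropped by DefaultHandler's no-op implementation.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // drop whitespace that preceded this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // one text node may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String t = buffer.toString().trim();
        if (!t.isEmpty()) {
            texts.add(qName + "=" + t);
        }
        buffer.setLength(0);
    }

    static List<String> collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TextCollector collector = new TextCollector();
            reader.setContentHandler(collector);
            reader.parse(new InputSource(new StringReader(xml)));
            return collector.texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For a trimmed-down version of the sample document (Struktur, two Kategorie levels, two LinkObjekt elements) this yields [LinkObjekt=Services, LinkObjekt=search], whether the whitespace between elements was reported as characters or as ignorable whitespace.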

This time the call stack looks like this:

de.elmedia.StructureParser$AbmlHandler.characters(char[], int, int) line: 87
org.apache.xerces.parsers.SAXParser.characters(char[], int, int) line: 1574
org.apache.xerces.validators.common.XMLValidator.processWhitespace(char[], int, int) line: 654
org.apache.xerces.readers.UTF8Reader.scanContent(org.apache.xerces.utils.QName) line: 2246
org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(boolean) line: 1145
org.apache.xerces.framework.XMLDocumentScanner.parseSome(boolean) line: 380
org.apache.xerces.parsers.SAXParser(org.apache.xerces.framework.XMLParser).parse(org.xml.sax.InputSource) line: 908



So how can this be? Why are classes from the org.apache.xerces.framework
package used in the servlet, and the org.apache.xerces.impl ones in the
application?
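The two traces go through different class hierarchies: org.apache.xerces.framework, .readers and .validators appear only in the servlet trace, while the command-line trace goes through org.apache.xerces.impl and the XNI types. Those package layouts belong to Xerces-J 1.x and 2.x respectively, which suggests that each environment loads a different Xerces build. One hedged way to check is to print where the parser class was loaded from in both places; ParserOrigin and its methods are hypothetical names.

```java
import java.net.URL;
import java.security.CodeSource;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical diagnostic helper: reports a class name together with the
// jar or directory the class was loaded from.
final class ParserOrigin {
    private ParserOrigin() {}

    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        URL where = (src == null) ? null : src.getLocation();
        // Classes loaded by the bootstrap loader (e.g. java.lang.Object) have no code source.
        return c.getName() + " loaded from "
                + (where == null ? "<bootstrap>" : where.toString());
    }

    // Describes the XMLReader implementation that JAXP returns in the current environment.
    static String describeDefaultReader() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return describe(reader.getClass());
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create XMLReader", e);
        }
    }
}
```

For example, logging ParserOrigin.describe(Class.forName("org.apache.xerces.parsers.SAXParser")) from both the main method and the servlet would show whether the two environments pick up that class from different jar locations.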


My XML source looks like this (nothing fancy):

<?xml version ="1.0"?>
<!DOCTYPE Struktur SYSTEM "Struktur.dtd">
<Struktur>
<Kategorie RootTemplate="Services.html">
<LinkObjekt ID="" pic="">Services</LinkObjekt>
<Kategorie RootTemplate="seach.html">
<LinkObjekt ID="" pic="">search</LinkObjekt>
</Kategorie>
<Kategorie RootTemplate="cart.html"></Kategorie>

...........

.......
</Struktur>