Posted to c-users@xerces.apache.org by Ben Griffin <be...@redsnapper.net> on 2009/03/11 15:48:32 UTC
LoadGrammar Error?
Okay - I've been staring at this for four days now.
Here is a small example of what is bugging me:
-----------------
// Note: transcode() and X() are helpers defined elsewhere (not shown);
// transcoder is an XMLLCPTranscoder* declared elsewhere.
class Err : public DOMErrorHandler {
public:
    bool handleError(const xercesc::DOMError& domError) {
        std::cerr << transcode(domError.getMessage());
        return true;
    }
};

int main(int argc, char *argv[]) {
    XMLPlatformUtils::Initialize();
    transcoder = XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);
    std::string grammar_str =
        "<xs:schema targetNamespace=\"http://my.org/blah\" "
        "xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" >"
        "<xs:attribute name=\"box\" fixed=\"true\" /></xs:schema>";
    XMLCh* grammar_file = transcoder->transcode(grammar_str.c_str());
    Grammar::GrammarType grammar_type = Grammar::SchemaGrammarType;
    DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(X("LS"));
    DOMLSParser* parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
    DOMConfiguration* dc = parser->getDomConfig();
    Err* errorHandler = new Err();
    dc->setParameter(XMLUni::fgDOMErrorHandler, errorHandler);
    dc->setParameter(XMLUni::fgXercesUseCachedGrammarInParse, true);
    dc->setParameter(XMLUni::fgXercesSchema, true);
    dc->setParameter(XMLUni::fgXercesCacheGrammarFromParse, true);
    dc->setParameter(XMLUni::fgDOMValidate, true);
    DOMLSInput* input = ((DOMImplementationLS*)impl)->createLSInput();
    input->setStringData(grammar_file);
    parser->loadGrammar(input, grammar_type, true);
    // [...]
}
-----------------------------------------------
An error is being thrown by IGXMLScanner::scanStartTagNS because
fQNameBuf is not being loaded by ReaderMgr.getQName because
isFirstNCNameChar is returning false.
if (!fReaderMgr.getQName(fQNameBuf, &prefixColonPos)) {
    if (fQNameBuf.isEmpty())
        emitError(XMLErrs::ExpectedElementName); // <-- Error thrown here.
    else
// false is being returned by XMLReader::isFirstNCNameChar:
inline bool XMLReader::isFirstNCNameChar(const XMLCh toCheck) const {
return (((fgCharCharsTable[toCheck] & gFirstNameCharMask) != 0)
&& (toCheck != chColon));
}
The reason is that the schema characters in fCharBuf have been
converted twice. (note that this is little-endian)
(what follows is the start of a memory dump of the fCharBuf )
3c 00 00 00 78 00 00 00 73 00 00 00 3a 00 00 00
73 00 00 00 63 00 00 00 68 00 00 00 65 00 00 00
6d 00 00 00 61 00 00 00 20 00 00 00 74 00 00 00
61 00 00 00 72 00 00 00 67 00 00 00 65 00 00 00
74 00 00 00 4e 00 00 00 61 00 00 00 6d 00 00 00
65 00 00 00 73 00 00 00 70 00 00 00 61 00 00 00
#0 0x00fe3453 in xercesc_3_0::Wrapper4DOMLSInput::makeStream at Wrapper4DOMLSInput.cpp:132
#1 0x01011e7b in xercesc_3_0::ReaderMgr::createReader at ReaderMgr.cpp:365
#2 0x0100d6f7 in xercesc_3_0::IGXMLScanner::scanReset at IGXMLScanner2.cpp:1362
#3 0x01003c1b in xercesc_3_0::IGXMLScanner::scanDocument at IGXMLScanner.cpp:197
#4 0x0105b587 in xercesc_3_0::AbstractDOMParser::parse at AbstractDOMParser.cpp:535
#5 0x01008845 in xercesc_3_0::IGXMLScanner::loadXMLSchemaGrammar at IGXMLScanner2.cpp:2085
#6 0x00ffee5f in xercesc_3_0::IGXMLScanner::loadGrammar at IGXMLScanner.cpp:3005
#7 0x010616c9 in xercesc_3_0::DOMLSParserImpl::loadGrammar at DOMLSParserImpl.cpp:935
// So here we see the culprit -
BinInputStream* Wrapper4DOMLSInput::makeStream() const {
    // The LSParser will use the LSInput object to determine how to read data.
    // The LSParser will look at the different inputs specified in the LSInput
    // in the following order to know which one to read from, the first one
    // that is not null and not an empty string will be used:
    //   1. LSInput.characterStream
    //   2. LSInput.byteStream
    //   3. LSInput.stringData
    //   4. LSInput.systemId
    //   5. LSInput.publicId
    InputSource* binStream=fInputSource->getByteStream();
    if(binStream)
        return binStream->makeStream();
    const XMLCh* xmlString=fInputSource->getStringData();
    if(xmlString)
    {
        MemBufInputSource is((const XMLByte*)xmlString, XMLString::stringLen(xmlString)*sizeof(XMLCh), "", false, getMemoryManager()); // <--!!!! what?!
        is.setCopyBufToStream(false);
        return is.makeStream();
    }
-----------------------------------------------
First of all, the fact that this function looks at the byteStream first
MUST be a bug.
Secondly, the characterStream is being CONVERTED, when it should
already be an XMLCh* (as defined everywhere else).
Or am I missing a trick?
Re: LoadGrammar Error?
Posted by Alberto Massari <am...@datadirect.com>.
Hi Ben,
the cast in the MemBufInputSource is fine, as it is simply a wrapper for
a bunch of bytes, regardless of which encoding they are using. The only
thing that can be done to avoid your case (a missing XML header in the
string) is to add the call
is.setEncoding(XMLUni::fgXMLChEncodingString);
after the creation of the object.
Alberto
Re: LoadGrammar Error?
Posted by Ben Griffin <be...@redsnapper.net>.
Alberto, thanks for your time.
On 11 Mar 2009, at 15:46, Alberto Massari wrote:
> Hi Ben,
> 1) why do you think that Wrapper4LSInput shouldn't look at the
> byteStream? The specs list this order
Okay - I see that there is no LSInput.characterStream, which is (sort
of) fair enough, so I agree that the order is therefore correct.
>
> 2) the stringData is not being converted: MemBufInputSource works on
> a byte stream, so it needs a cast and a size computed by multiplying
> sizeof(XMLCh) by the length (in UTF-16 chars) of the string.
Well, here I have to disagree. Look at the (fragment of) makeStream
below:
BinInputStream* Wrapper4DOMLSInput::makeStream() const {
    // The LSParser will use the LSInput object to determine how to read data.
    // The LSParser will look at the different inputs specified in the LSInput
    // in the following order to know which one to read from, the first one
    // that is not null and not an empty string will be used:
    //   1. LSInput.characterStream
    //   2. LSInput.byteStream
    //   3. LSInput.stringData
    //   4. LSInput.systemId
    //   5. LSInput.publicId
    InputSource* binStream=fInputSource->getByteStream();
    if(binStream)
        return binStream->makeStream();
--->    const XMLCh* xmlString=fInputSource->getStringData();
    // xmlString is a XMLCh*, as created using LSInput->setStringData()
    if(xmlString)
    {
-->     MemBufInputSource is((const XMLByte*)xmlString, XMLString::stringLen(xmlString)*sizeof(XMLCh), "", false, getMemoryManager());
        // So why is it being CAST into XMLByte here?
        // And now "is" is being instantiated as if the xmlString is a XMLByte* ....
        is.setCopyBufToStream(false);
        return is.makeStream();
        // ...which makes a BinInputStream* from "is"
Now, THAT goes on to instantiate an XMLReader, which does an initial load
of raw bytes:
refreshRawBuffer();
and then uses an XMLRecognizer to test the encoding. HANG ON -
this is meant to be XMLCh...
... anyway... That should be FINE if it returns the same encoding as a
XMLCh.
So, being a XMLCh*, the grammar starts (in terms of bytes) with 3c 00
XMLRecognizer::basicEncodingProbe( const XMLByte* const rawBuffer
                                 , const XMLSize_t rawByteCount)
Because this doesn't actually know about non-BOM UTF-16BE or UTF-16LE
(i.e. the XMLCh encoding), it is going to return "UTF-8".
Likewise, because the grammar string does not have an <?xml ..> declaration
(which is legal), the XMLRecognizer is going to fail.
As you can imagine, once the BinInputStream has been identified as
UTF-8, there really is no turning back.
Sure enough, now AbstractDOMParser::startDocument() calls
fDocument->setInputEncoding(fScanner->getReaderMgr()->getCurrentEncodingStr());
Just in time for
IGXMLScanner::scanDocument(const InputSource& src) to call
scanStartTagNS(gotData)
This then hits trouble at (!fReaderMgr.getQName(fQNameBuf,
&prefixColonPos)), which returns empty,
and the empty buffer will emit an Error.
>
> As for the error you see, are you sure your
> transcoder->transcode(grammar_str.c_str()) is actually generating a
> string of XMLCh? Could you post its code?
My transcoder?
XMLLCPTranscoder* transcoder = XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);
>
Best regards
Ben.
Re: LoadGrammar Error?
Posted by Alberto Massari <am...@datadirect.com>.
Hi Ben,
1) why do you think that Wrapper4LSInput shouldn't look at the
byteStream? The specs list this order
1. LSInput.characterStream <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-characterStream>
2. LSInput.byteStream <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-byteStream>
3. LSInput.stringData <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-stringData>
4. LSInput.systemId <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-systemId>
5. LSInput.publicId <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-publicId>
and the first item, characterStream (of type LSReader) is not available
in Xerces-C++, as allowed by the specs (LSReader is an Object, so its
purpose is to allow the use of java.lang.String).
2) the stringData is not being converted: MemBufInputSource works on a
byte stream, so it needs a cast and a size computed by multiplying
sizeof(XMLCh) by the length (in UTF-16 chars) of the string.
As for the error you see, are you sure your
transcoder->transcode(grammar_str.c_str()) is actually generating a
string of XMLCh? Could you post its code?
Alberto