Posted to general@xerces.apache.org by ro...@us.ibm.com on 2000/02/22 01:13:12 UTC

Xerces-C: A bug in internal entity processing!



Boy is my face red. Long ago, the internal character format was always
UTF-16. As of 1.1.0/3.1.0, it floats to wchar_t on almost all platforms.
This change exposed a bug that I'm about to fix, but I wanted to let
everyone know about it.

Internal entity values are transcoded in the process of transcoding the DTD
subset that they are in. Therefore, they are stored internally as XMLCh.
However, to keep everything orthogonal, the readers created for internal
entities are treated no differently than those for external ones. So they
still create a transcoder. However, that transcoder is supposed to
effectively do nothing but pass the data through from the raw buffer to the
cooked buffer.

But, if you look at the code in ReaderMgr::createIntEntReader(), it looks
like this:

    XMLReader* retVal = new XMLReader
    (
        sysId
        , 0
        , new BinMemInputStream
          (
            (const XMLByte*)dataBuf
            , dataLen * sizeof(XMLCh)
            , copyBuf ? BinMemInputStream::BufOpt_Copy
                        : BinMemInputStream::BufOpt_Reference
          )
        , XMLRecognizer::nameForEncoding(XMLRecognizer::Def_UTF16)
        , refFrom
        , type
        , XMLReader::Source_Internal
    );

Note that it's passing XMLRecognizer::Def_UTF16 as the encoding!! This is
either UTF-16(LE) or UTF-16(BE) according to the endianness of the local
machine. However, now that XMLCh is floating, it can be 32 bits wide, in
which case the buffer holds 32-bit units but the transcoder reads them as
16-bit ones. This causes internal entities to be incorrectly interpreted on
those systems.

I will be fixing this by creating a simple XMLChTranscoder and adding a
special encoding name for it, perhaps something like "$Xerces-XMLCh$", that
will be recognized by the TransService (and not conflict with any real
encoding name likely to ever exist). The TransService will then always
return that special XMLCh transcoder for internal entities.
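
As a rough illustration of what "do nothing but pass the data through" means, here is a minimal sketch of a pass-through step. It is not the actual XMLChTranscoder or the Xerces transcoder interface: InternalCh and passThrough are hypothetical names, and the signature is an assumption chosen only to show the raw-to-cooked copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for XMLCh; whatever width wchar_t has locally.
typedef wchar_t InternalCh;

// Copy whole characters from the raw byte buffer to the cooked character
// buffer without any conversion. Returns the number of characters written
// and reports how many source bytes were consumed.
std::size_t passThrough(const unsigned char* src, std::size_t srcBytes,
                        InternalCh* dst, std::size_t maxChars,
                        std::size_t& bytesEaten)
{
    // Only complete characters can be moved; a trailing partial one waits.
    const std::size_t available = srcBytes / sizeof(InternalCh);
    const std::size_t count = std::min(available, maxChars);

    std::memcpy(dst, src, count * sizeof(InternalCh));
    bytesEaten = count * sizeof(InternalCh);
    return count;
}
```

Because the copy is sized by sizeof(InternalCh) rather than a fixed 16 bits, the same code behaves correctly whether the internal character type is 16 or 32 bits wide.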

Oh well... ya lives and ya learns.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
roddey@us.ibm.com