Posted to c-dev@xerces.apache.org by Matthew Lovett <ML...@uk.ibm.com> on 2001/09/03 17:25:38 UTC

Why are some ebcdic codepages 'disallowed'?

Hi all,

I'm trying to parse a document with encoding 'ibm-37', using the ICU
transcoder.  The ICU data DLL I use does have an entry for ibm-37, but the
code in TransService.cpp never lets it get that far.  It seems that
TransService has a list of encodings that it deliberately disallows:

//  gDisallow1
//  gDisallowX
//      These are a small set of encoding names that, for temporary reasons,
//      we disallow at this time.
//
//  gDisallowList
//  gDisallowListSize
//      An array of the disallow strings, for easier searching below.
//

Is there anything to stop me commenting out the code which implements this
check?  Should I raise a bug report so that the code is chopped out of the
next Xerces release?

Here's the code which checks the list, from
XMLTransService::makeNewTranscoderFor():

    //
    //  For now, we have a little list of encodings that we disallow
    //  explicitly. So let's check for them up front. They all start with
    //  IBM, so we can do a quick check to see if we should even do
    //  anything at all.
    //
    if (XMLString::startsWith(upBuf, gDisallowPre))
    {
        for (unsigned int index = 0; index < gDisallowListSize; index++)
        {
            // If it's one of our guys, then pretend we don't understand it
            if (!XMLString::compareString(upBuf, gDisallowList[index]))
                return 0;
        }
    }
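
For anyone who wants to experiment with this check outside of Xerces, here
is a minimal standalone sketch of the same prefix-then-list logic using
std::string. The names kDisallowPrefix, kDisallowList and isDisallowed are
hypothetical, and the list contents are placeholders: the thread only
establishes that 'ibm-37' is rejected, not what the real gDisallowList
holds.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// Hypothetical stand-ins for gDisallowPre / gDisallowList. The real list
// contents are not shown in this thread; these entries are assumptions.
static const std::string kDisallowPrefix = "IBM";
static const std::array<std::string, 2> kDisallowList = {"IBM-37", "IBM037"};

// Upper-case a copy of the encoding name, playing the role of upBuf.
static std::string toUpper(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return name;
}

// Returns true if the name would be rejected before any transcoder lookup.
bool isDisallowed(const std::string& encodingName) {
    const std::string up = toUpper(encodingName);

    // Quick exit: every disallowed name starts with the common prefix.
    if (up.compare(0, kDisallowPrefix.size(), kDisallowPrefix) != 0)
        return false;

    // Otherwise look for an exact match in the list.
    return std::find(kDisallowList.begin(), kDisallowList.end(), up)
           != kDisallowList.end();
}
```

With these placeholder entries, isDisallowed("ibm-37") is true, while a
name that starts with IBM but is not in the list falls through to the
normal transcoder lookup.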

Does anyone recall why the encodings were disallowed?

Thanks in advance,

Matt Lovett




---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Why are some ebcdic codepages 'disallowed'?

Posted by Dean Roddey <dr...@charmedquark.com>.
It's because that is not actually a well-defined EBCDIC code page. I think
that it has two different code tables on two different variants of IBM
operating systems. This is a historical problem that the parser couldn't do
anything about, so it had to deal with it as best it could, which was by
creating encoding names that are unambiguous and enforcing the use of those.

The problem is that, if the parser is going to support one of these
ambiguous encodings, it must pick one table or the other. If you then give
the parser a file that is perfectly legal according to your version of your
IBM OS, the parser might reject it as illegal because the table it picked
is the opposite of the one your OS uses.

It ends up being an unwinnable situation, so we had to step around it. You
should use one of the unambiguous EBCDIC encoding names.
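
To make the ambiguity concrete, here is a small illustrative sketch, not
Xerces code. It models one encoding name with two variant tables and shows
the same bytes decoding differently. The thread does not say which byte
positions actually differ, so the choice of the 0x15/0x25 line-end bytes
is purely an assumption, and Variant, decodeByte and decode are made-up
names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Two illustrative variants of a single code page name. Which positions
// really differ is not stated in the thread; the 0x15/0x25 disagreement
// below is an assumption chosen only for demonstration.
enum class Variant { A, B };

// Decode a handful of EBCDIC bytes to ASCII. Anything this sketch does not
// treat as a known character for the given variant is shown as '?'.
char decodeByte(std::uint8_t b, Variant v) {
    switch (b) {
        case 0x4C: return '<';   // same in both variants
        case 0x6E: return '>';   // same in both variants
        case 0x81: return 'a';   // same in both variants
        case 0x15: return v == Variant::A ? '\n' : '?';  // line end only in A
        case 0x25: return v == Variant::B ? '\n' : '?';  // line end only in B
        default:   return '?';
    }
}

// Decode a whole buffer with the chosen variant.
std::string decode(const std::vector<std::uint8_t>& bytes, Variant v) {
    std::string out;
    for (std::uint8_t b : bytes)
        out += decodeByte(b, v);
    return out;
}
```

A file written on a system using variant A and ending in byte 0x15 reads
back with a line end under variant A, but with an unexpected character
under variant B. A parser that hard-codes one table cannot tell which of
the two the author meant.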

--------------------------
Dean Roddey
The Charmed Quark Controller
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"If it don't have a control port, don't buy it!"




