Posted to c-dev@xerces.apache.org by "Scott Cantor (JIRA)" <xe...@xml.apache.org> on 2017/07/12 22:56:00 UTC
[jira] [Resolved] (XERCESC-2054) Grammar serialization not portable (integer size / alignment issue)

     [ https://issues.apache.org/jira/browse/XERCESC-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Cantor resolved XERCESC-2054.
-----------------------------------
    Resolution: Duplicate

This was already reported, and there are some additional comments over in the original bug, so closing this.

> Grammar serialization not portable (integer size / alignment issue)
> -------------------------------------------------------------------
>
>                 Key: XERCESC-2054
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2054
>             Project: Xerces-C++
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.1.4
>         Environment: Linux CentOS-7 (64bit), Windows 7 (64bit)
>            Reporter: Oliver Moeller
>
> Apologies if this is a known issue, but I have not found it by conventional
> means (i.e., Google and searching through the bug database here).
> I found that the serialisation/deserialisation (here: of grammars) is not as portable as it (IMHO) should be.
> The problem happens in XSerializeEngine::readString() when
> the length of the string is taken from the associated BinInputStream as
> "unsigned long":
>     /***
>      * Check if any data written
>      ***/
>     unsigned long tmp;
>     *this>>tmp;
> On Windows 7 x64 (MSVS 2012), this takes 4 bytes off the head of the stream,
> but on CentOS 7 x64 (g++ 4.8.3), it takes 8 bytes.
> As a consequence, a BinInputStream carefully encoded on Windows (e.g. putting
> it into a char array with
>   examples/cxx/tree/embedded/grammar-input-stream.cxx
> which is a common xsd example)
> will fail when "reading" it on the Linux box, because everything from the first
> string on is garbage.
> Moreover, this will (probably) give no meaningful error message, just a
> thrown "XSerializationException", because at some point the reader will (probably)
> misinterpret wchar data as length information and try to read a next string
> that is millions of bytes long (according to the misunderstood BinInputStream).
> The BinInputStream will then run out of bytes.
> A similar issue concerns the *alignment* of the data by data type, which is applied in all >> operations: this is (necessarily) very
> platform dependent.
> It would be a big improvement if Xerces encoded the (de)serialization
> in a platform/compiler-independent manner. The purpose after all *IS* to be portable, right?
> E.g., the serialisation engine could always use integers of known byte width
> (e.g.: #include <inttypes.h> -> use uint32_t) instead of "unsigned long".
> Also, the alignment issue should be addressed; it is hard to predict
> what restrictions apply for the compiler (or even processor) in use: some cannot read an integer from a memory address that is not 4-byte aligned.
> E.g., the data could be copied (to a properly aligned item initialized by 0s)
> before doing the cast to an integer type.
> In any case, the number of bytes to be read next from the BinInputStream should always be platform-independent.
> (Of course, the write operations have to follow the same business logic.)
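
The fixed-width suggestion in the report can be sketched roughly as follows. This is a minimal illustration and not Xerces-C++ code: the helper names writeU32 and readU32 are hypothetical, a std::vector stands in for the stream, and the little-endian byte order is an assumption chosen for the example. Assembling the value byte by byte sidesteps both the differing width of "unsigned long" and any unaligned integer read.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not part of the Xerces-C++ API.

// Append a 32-bit value as exactly four bytes, least significant byte first.
void writeU32(std::vector<unsigned char>& out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFFu));
    }
}

// Read four bytes starting at 'pos' and advance 'pos' past them.
// Returns false (leaving 'pos' and 'value' untouched) if fewer than
// four bytes remain in the buffer.
bool readU32(const std::vector<unsigned char>& in, std::size_t& pos,
             std::uint32_t& value) {
    if (pos > in.size() || in.size() - pos < 4) {
        return false;
    }
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        // Each byte is widened first, so no unaligned integer load occurs.
        result |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    value = result;
    pos += 4;
    return true;
}
```

Because writer and reader both fix the width at four bytes and the byte order at little-endian, the number of bytes consumed no longer depends on the compiler's definition of "unsigned long".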

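The misread described in the report can also be reproduced in isolation. The width difference comes from the data models: "unsigned long" is 4 bytes on 64-bit Windows (LLP64) and 8 bytes on 64-bit Linux (LP64). The sketch below does not call Xerces-C++; readLE is a hypothetical helper that models consuming a given number of bytes as a little-endian integer, and the buffer layout (a 4-byte length followed by 16-bit characters) is an assumption made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: interpret 'width' bytes (at most 8) as a
// little-endian unsigned integer.
std::uint64_t readLE(const unsigned char* bytes, std::size_t width) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    return value;
}

// Assumed layout: the length 5 stored in 4 bytes, followed by the
// 16-bit characters 'h' (0x68) and 'e' (0x65).
const unsigned char kStream[8] = {0x05, 0x00, 0x00, 0x00,
                                  0x68, 0x00, 0x65, 0x00};

// Writer and reader agree on 4 bytes: the length is recovered.
std::uint64_t lengthAsWritten() { return readLE(kStream, 4); }

// Reader consumes 8 bytes: character data is folded into the "length".
std::uint64_t lengthAsMisread() { return readLE(kStream, 8); }
```

With this layout, the 8-byte read absorbs the first two characters into the upper half of the value, which yields the kind of absurdly large string length the report describes.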


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org