Posted to dev@daffodil.apache.org by Mike Beckerle <mb...@tresys.com> on 2017/10/20 18:18:21 UTC

Packed and Zoned Design details

Our initial goal for the packed/zoned number support is to be able to run the IBM-created schemas on github for ISO8583 format, and for IBM4690_TLOG format, as well as run all the scala-debug tests we have that use the packed/zoned and other features that we've not yet implemented.


What we need:


So, ISO8583 uses zoned numbers. IBM4690_TLOG uses "ibm4690Packed" binary numbers.


Neither uses plain old "packed" decimal numbers, but in the daffodil-test-ibm1 module (tests contributed by IBM) there are tests that use regular old "packed" decimal. So we should implement that and get those tests to run also, as part of interoperability demonstration.

In the interest of reducing implementation risk, we should start by implementing the smallest viable subset of the functionality.

And the JTOPEN library appears to be far from comprehensive, so if we use it, we can only implement a starter set of functionality. However it is a good place to start.

For example, we can assume (and check) that the dfdl:byteOrder is always bigEndian, which is all that the JTOPEN library supports. Later we can implement and test littleEndian variants.

Some specifics:


For dfdl:binaryNumberRep="ibm4690Packed"


JTOPEN doesn't support this. In this variant of packed and in the IBM4690_TLOG format, the bytes for the number are isolated by use of delimiters (delimited binary - meaning the delimiter is known to be something that cannot appear in the bytes of a packed number!)


We do support binary delimited for hexBinary data today, so the isolation of the bytes can be based on that code.


We can either write our own parser/unparser for ibm4690Packed, or massage the bytes into packed form and then call the JTOPEN routine for packed.
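To make the "write our own parser" option concrete, here is a minimal sketch in Python (not Daffodil code). It assumes the ibm4690Packed layout as I understand it from the DFDL description: one BCD digit per nibble, an optional leading 0xD nibble for negative values, and an optional 0xF pad nibble on the left to fill out the first byte. The function name is hypothetical, and the layout should be checked against the DFDL specification before relying on it.

```python
def decode_ibm4690_packed(data: bytes) -> int:
    """Decode ibm4690Packed bytes into an int (illustrative sketch).

    Assumed layout, left to right: optional 0xF pad nibble,
    optional 0xD sign nibble (negative), then one BCD digit per nibble.
    """
    if not data:
        raise ValueError("empty ibm4690Packed value")

    # Split every byte into its high and low nibble.
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)

    pos = 0
    if nibbles[pos] == 0xF:  # left padding to fill a whole byte
        pos += 1

    negative = False
    if pos < len(nibbles) and nibbles[pos] == 0xD:  # leading minus sign
        negative = True
        pos += 1

    digits = nibbles[pos:]
    if not digits or any(d > 9 for d in digits):
        raise ValueError("invalid ibm4690Packed nibble in " + data.hex())

    value = int("".join(str(d) for d in digits))
    return -value if negative else value
```

The bytes are assumed to have been isolated already (for example by the delimiter scanning mentioned above), so the function only interprets nibbles.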


DFDL requires 4-bit alignment for all the packed number types. So in the alignmentInBits lazy val in the Daffodil schema compiler, a check needs to be made for the packed binaryNumberRep cases to ensure this 4-bit alignment.


For dfdl:binaryNumberRep="packed"


There is no DFDL property for specifying leading sign for packed. Sign is always the final/last nibble of the last byte.

I used to think that, as with zoned, you could specify whether you wanted the sign leading or trailing for packed, but some web searching suggests only trailing sign nibbles for the "packed" representation (what Cobol calls Computational-3 or Comp-3).

But note that ibm4690Packed is a variant of packed with leading sign.


Initially we can require the dfdl:binaryPackedSignCodes to be specified, but only accept "C D F C" as the 4 nibbles - assuming this is what the JTOPEN library implements.


For dfdl:binaryNumberCheckPolicy, 'strict' is specified by IBM4690_TLOG, but 'lax' is specified by the ISO8583 schema. So both must be supported, but initially we can implement strict, and add lax later.
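As an illustration of the packed rules above, here is a small Python sketch (not Daffodil or JTOPEN code) that decodes big-endian packed decimal with the sign in the last nibble. The strict table is derived from the "C D F C" sign codes (positive, negative, unsigned, zero). The lax table is an assumption based on my recollection of the DFDL rule that A, C, E and F are accepted as positive and B and D as negative. The names are hypothetical.

```python
# Sign nibble -> multiplier. Strict accepts only the nibbles named in
# binaryPackedSignCodes "C D F C" (positive, negative, unsigned, zero).
STRICT_SIGNS = {0xC: 1, 0xD: -1, 0xF: 1}

# Assumed lax set: A, C, E, F positive; B, D negative.
LAX_SIGNS = {0xA: 1, 0xC: 1, 0xE: 1, 0xF: 1, 0xB: -1, 0xD: -1}


def decode_packed(data: bytes, policy: str = "strict") -> int:
    """Decode packed decimal (Comp-3 style) bytes into an int."""
    if not data:
        raise ValueError("empty packed value")
    if policy not in ("strict", "lax"):
        raise ValueError("unknown check policy: " + policy)
    signs = STRICT_SIGNS if policy == "strict" else LAX_SIGNS

    # High nibble first, then low nibble, for every byte.
    nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]

    # The sign is always the last nibble of the last byte.
    *digits, sign = nibbles
    if sign not in signs:
        raise ValueError("sign nibble %X not allowed under %s" % (sign, policy))
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble in " + data.hex())

    return signs[sign] * int("".join(str(d) for d in digits))
```

For example, the bytes 0x12 0x3D decode to -123 under either policy, while 0x12 0x3A is rejected under strict and accepted as 123 under the assumed lax table.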


For dfdl:textNumberRep="zoned"


JTOPEN only supports trailing overpunched sign.


So if the dfdl:textNumberPattern shows a sign "+" location, it must be after the final digit. The ISO8583 schema doesn't do this. It shows a leading + sign. However, all the data is actually unsigned, so there is no overpunched minus sign, and whether the "+" is first or last doesn't matter.


Here's a link to the variations in Cobol for specifying numbers with "Usage Display" which means "text numbers"


https://supportline.microfocus.com/Documentation/books/rd60/lhpdf40m.htm


I include this link only by way of showing that Cobol data can have many more variants than the JTOPEN library supports. Also of note is that Cobol's default behavior for Usage Display is zoned trailing sign. Cobol code must add the clause "SIGN TRAILING SEPARATE" or "SIGN LEADING SEPARATE" to get a textNumberRep='standard' number.

Since JTOPEN will not support our functional needs, we will have to rewrite it and either contribute back to JTOPEN, or write our own library that is more flexible. I would prefer to implement the bare minimum here that will let us handle the github DFDL schemas for ISO8583 and IBM4690_TLOG.


For zoned, the dfdl:textZonedSignStyle of 'asciiStandard' is the only one needed for ISO8583 or IBM4690_TLOG formats, as these use iso-8859-1 and us-ascii encodings, so both ascii.


No TDML or unit tests exercise EBCDIC zoned data currently, so we can initially focus on ascii only. We do claim to support EBCDIC encoding and do support it for textNumberRep='standard', so we do need to support it for textNumberRep 'zoned'. This will result in a string that needs to be interpreted according to textZonedSignStyle of 'asciiTranslatedEBCDIC'.
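To show what interpreting a zoned string according to textZonedSignStyle could look like, here is a hedged Python sketch (not Daffodil code) for the trailing overpunched sign case only. The character tables are my understanding of the two styles: for 'asciiStandard', '0'-'9' are positive and 'p'-'y' are negative; for 'asciiTranslatedEBCDIC', '{' and 'A'-'I' are positive and '}' and 'J'-'R' are negative. Treating a plain digit in the last position as unsigned/positive for both styles is an assumption, and all names are hypothetical.

```python
def _build_table(pos_chars: str, neg_chars: str) -> dict:
    """Map each overpunch character to (digit, sign multiplier)."""
    table = {}
    # Assumption: a plain digit in the sign position means unsigned/positive.
    for digit, ch in enumerate("0123456789"):
        table[ch] = (digit, 1)
    for digit, ch in enumerate(pos_chars):
        table[ch] = (digit, 1)
    for digit, ch in enumerate(neg_chars):
        table[ch] = (digit, -1)
    return table


OVERPUNCH = {
    "asciiStandard": _build_table("0123456789", "pqrstuvwxy"),
    "asciiTranslatedEBCDIC": _build_table("{ABCDEFGHI", "}JKLMNOPQR"),
}


def decode_zoned_trailing(text: str, style: str = "asciiStandard") -> int:
    """Decode a zoned decimal string whose last character carries the sign."""
    if not text:
        raise ValueError("empty zoned value")
    table = OVERPUNCH[style]

    body, last = text[:-1], text[-1]
    if any(c not in "0123456789" for c in body):
        raise ValueError("non-digit before the sign position: " + text)
    if last not in table:
        raise ValueError("invalid overpunch character: " + repr(last))

    digit, sign = table[last]
    return sign * int(body + str(digit))
```

With these tables, "12s" under 'asciiStandard' and "12L" under 'asciiTranslatedEBCDIC' both decode to -123. A leading overpunched sign would need the same lookup applied to the first character instead.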

