Posted to dev@daffodil.apache.org by GitBox <gi...@apache.org> on 2019/06/24 20:33:54 UTC
[GitHub] [incubator-daffodil] bsloane1650 commented on a change in pull
request #245: Added X-DFDL-5-BIT-DFI-1661-DUI-001 char encoding
URL: https://github.com/apache/incubator-daffodil/pull/245#discussion_r296903319
##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/X_DFDL_MIL_STD.scala
##########
@@ -43,7 +43,15 @@ object BitsCharset6BitDFI264DUI001 extends {
object BitsCharset6BitDFI311DUI002 extends {
override val name = "X-DFDL-6-BIT-DFI-311-DUI-002"
override val bitWidthOfACodeUnit = 6
- override val decodeString = """\u00A0ABCDEFGHIJKLMNOPQRSTuVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD \uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD0123456789\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
+ override val decodeString = """\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD \uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD0123456789\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
+ override val replacementCharCode = 0x0
+ override val requiredBitOrder = BitOrder.LeastSignificantBitFirst
+} with BitsCharsetNonByteSize
+
+object BitsCharset5BitDFI1661DUI001 extends {
+ override val name = "X-DFDL-5-BIT-DFI-1661-DUI-001"
+ override val bitWidthOfACodeUnit = 5
+ override val decodeString = """\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
Review comment:
I am trying to write a test case for this. What I came up with is:
```
<xs:element name="fiveBitDFI1661DUI001" type="xs:string" dfdl:encoding="X-DFDL-5-BIT-DFI-1661-DUI-001"
dfdl:bitOrder="leastSignificantBitFirst" dfdl:byteOrder="littleEndian"/>
<tdml:parserTestCase name="fiveBitDFI1661DUI001" root="fiveBitDFI1661DUI001" model="enc1" description="X-DFDL-5-BIT-DFI-1661-DUI-001">
<tdml:document>
<tdml:documentPart type="bits" bitOrder="LSBFirst" byteOrder="RTL">
00000
00001
00010
00011
00100
00101
00110
00111
01000
01001
01010
01011
01100
01101
01110
01111
10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
11011
11100
11101
11110
11111
</tdml:documentPart>
</tdml:document>
<tdml:infoset>
<tdml:dfdlInfoset>
<!--
Note, the space below is actually \u00A0 no-break space
-->
<tns:fiveBitDFI1661DUI001><![CDATA[ ABCDEFGHIJKLMNOPQRSTUVWXYZ�����]]></tns:fiveBitDFI1661DUI001>
</tdml:dfdlInfoset>
</tdml:infoset>
</tdml:parserTestCase>
```
This fails, with the actual result of the parse being:
```
<ex:fiveBitDFI1661DUI001 xmlns:ex="http://example.com">�����ZYXWVUTSRQPONMLKJIHGFEDCBA </ex:fiveBitDFI1661DUI001>
```
Note that the string has the characters in reverse order.
Given my experience in the area, I assume the problem is with my understanding of LSBF bit ordering.
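One way to check that assumption is a small model of the bit stream. The sketch below is not Daffodil's implementation; it assumes that `byteOrder="RTL"` makes the rightmost 8 written bits byte 0, and that an LSB-first stream consumes each byte starting from its least significant bit. `LsbfSketch` and its helpers are hypothetical names used only for illustration.

```scala
// Hypothetical model of an LSB-first bit stream; NOT Daffodil's implementation.
object LsbfSketch {
  // Decode table copied from the diff: 27 defined characters, then 5 replacement characters.
  val decodeString: String = "\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "\uFFFD" * 5

  // Code units in the order the test document writes them: 00000, 00001, ..., 11111.
  val written: Seq[String] =
    (0 until 32).map(i => ("00000" + Integer.toBinaryString(i)).takeRight(5))

  // Assumption: with byteOrder="RTL", the rightmost 8 written bits form byte 0.
  def bytesRightToLeft(bits: String): Seq[Int] = {
    require(bits.length % 8 == 0, "sketch only handles whole bytes")
    bits.grouped(8).toSeq.reverse.map(Integer.parseInt(_, 2))
  }

  // LSB-first: within each byte, the least significant bit enters the stream first.
  def lsbFirstStream(bytes: Seq[Int]): Seq[Int] =
    bytes.flatMap(b => (0 until 8).map(i => (b >> i) & 1))

  // The first stream bit of each code unit is that unit's least significant bit.
  def codeUnits(stream: Seq[Int], width: Int): Seq[Int] =
    stream.grouped(width).map { group =>
      group.zipWithIndex.map { case (bit, i) => bit << i }.sum
    }.toSeq

  def decode(bits: String): String =
    codeUnits(lsbFirstStream(bytesRightToLeft(bits)), 5).map(decodeString(_)).mkString
}
```

Under these assumptions, `LsbfSketch.decode(LsbfSketch.written.mkString)` yields the decode table reversed, which matches the actual parse result above, while `LsbfSketch.decode(LsbfSketch.written.reverse.mkString)` yields it in the expected order. If the model holds, the test document would need its code units listed last-to-first.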
As a secondary concern, Daffodil is outputting non-breaking space as   . This is technically correct, but it is not clear whether this is desirable.
Also, I would expect this test to fail to round-trip because several "undefined" code units are mapped to U+FFFD. Once I get the parse-only test working, I will add a second test that round-trips without those characters.
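The round-trip concern can be illustrated with a minimal sketch. The diff does not show the replacement code for the 5-bit charset, so the value below is an assumption borrowed from the 6-bit charset (`0x0`); `RoundTripSketch`, `encode`, and `decode` are hypothetical names, not Daffodil APIs.

```scala
// Hypothetical sketch of why code units mapped to U+FFFD cannot round-trip.
object RoundTripSketch {
  // Decode table copied from the diff.
  val decodeString: String = "\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "\uFFFD" * 5

  // Assumption: unmappable characters unparse to code 0, as in the 6-bit charset.
  val replacementCharCode: Int = 0

  def decode(code: Int): Char = decodeString(code)

  // U+FFFD is treated as unmappable, so it encodes to the replacement code.
  def encode(c: Char): Int = {
    val i = decodeString.indexOf(c)
    if (i >= 0 && c != '\uFFFD') i else replacementCharCode
  }

  // Code units whose decoded character does not encode back to the same code.
  val lossy: Seq[Int] = (0 until 32).filter(code => encode(decode(code)) != code)
}
```

In this model `RoundTripSketch.lossy` contains exactly the code units 27 through 31, so a round-trip test would have to leave those out.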