Posted to dev@daffodil.apache.org by GitBox <gi...@apache.org> on 2019/06/24 20:33:54 UTC

[GitHub] [incubator-daffodil] bsloane1650 commented on a change in pull request #245: Added X-DFDL-5-BIT-DFI-1661-DUI-001 char encoding

bsloane1650 commented on a change in pull request #245: Added X-DFDL-5-BIT-DFI-1661-DUI-001 char encoding
URL: https://github.com/apache/incubator-daffodil/pull/245#discussion_r296903319
 
 

 ##########
 File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/X_DFDL_MIL_STD.scala
 ##########
 @@ -43,7 +43,15 @@ object BitsCharset6BitDFI264DUI001 extends {
 object BitsCharset6BitDFI311DUI002 extends {
   override val name = "X-DFDL-6-BIT-DFI-311-DUI-002"
   override val bitWidthOfACodeUnit = 6
-  override val decodeString = """\u00A0ABCDEFGHIJKLMNOPQRSTuVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD \uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD0123456789\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
+  override val decodeString = """\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD \uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD0123456789\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
+  override val replacementCharCode = 0x0
+  override val requiredBitOrder = BitOrder.LeastSignificantBitFirst
+} with BitsCharsetNonByteSize
+
+object BitsCharset5BitDFI1661DUI001 extends {
+  override val name = "X-DFDL-5-BIT-DFI-1661-DUI-001"
+  override val bitWidthOfACodeUnit = 5
+  override val decodeString = """\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD"""
 
 Review comment:
   I am trying to write a test case for this. Here is what I came up with:
   
   ```
     <xs:element name="fiveBitDFI1661DUI001" type="xs:string" dfdl:encoding="X-DFDL-5-BIT-DFI-1661-DUI-001"
       dfdl:bitOrder="leastSignificantBitFirst" dfdl:byteOrder="littleEndian"/>
   
     <tdml:parserTestCase name="fiveBitDFI1661DUI001" root="fiveBitDFI1661DUI001" model="enc1" description="X-DFDL-5-BIT-DFI-1661-DUI-001">
       <tdml:document>
         <tdml:documentPart type="bits" bitOrder="LSBFirst" byteOrder="RTL">
         00000
         00001
         00010
         00011
         00100
         00101
         00110
         00111
         01000
         01001
         01010
         01011
         01100
         01101
         01110
         01111
         10000
         10001
         10010
         10011
         10100
         10101
         10110
         10111
         11000
         11001
         11010
         11011
         11100
         11101
         11110
         11111
         </tdml:documentPart>
       </tdml:document>
       <tdml:infoset>
         <tdml:dfdlInfoset>
           <!--
           Note, the space below is actually \u00A0 no-break space
           -->
           <tns:fiveBitDFI1661DUI001><![CDATA[ ABCDEFGHIJKLMNOPQRSTUVWXYZ&#xFFFD;&#xFFFD;&#xFFFD;&#xFFFD;&#xFFFD;]]></tns:fiveBitDFI1661DUI001>
         </tdml:dfdlInfoset>
       </tdml:infoset>
     </tdml:parserTestCase>
   ```
   
   This fails, with the actual result of the parse being:
   ```
   <ex:fiveBitDFI1661DUI001 xmlns:ex="http://example.com">&#xFFFD;&#xFFFD;&#xFFFD;&#xFFFD;&#xFFFD;ZYXWVUTSRQPONMLKJIHGFEDCBA&#xA0;</ex:fiveBitDFI1661DUI001>
   ```
   
   Note that the string has the characters in reverse order.
   Given my experience in the area, I assume the problem is with my understanding of LSBF bit ordering.
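   
   One possible explanation, sketched below under assumptions I have not confirmed against the TDML runner: if `byteOrder="RTL"` means the rightmost 8 bits of the document text are byte 0 of the data, and `LSBFirst` means the first bit of each 5-bit code unit is its least significant bit, then the first code unit read is the last group written. The object and helper names are made up for illustration; only `decodeString` comes from the diff.
   
   ```scala
   object Lsbf5BitSketch {
     // Decode table copied from the diff: code 0 is U+00A0, codes 1..26 are A..Z,
     // codes 27..31 are U+FFFD.
     val decodeString: String = "\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ" + ("\uFFFD" * 5)
   
     // The 32 five-bit groups 00000..11111, in the order the test writes them.
     val written: Seq[String] =
       (0 until 32).map(i => (4 to 0 by -1).map(b => (i >> b) & 1).mkString)
   
     // ASSUMPTION: with byteOrder="RTL" the rightmost 8 bits of the text form
     // byte 0 of the data, the next 8 bits to the left form byte 1, and so on.
     def bytesRightToLeft(bits: String): Vector[Int] =
       bits.reverse.grouped(8).map(g => Integer.parseInt(g.reverse, 2)).toVector
   
     // ASSUMPTION: LSB-first decoding. Bit n of the stream is bit (n % 8) of
     // byte (n / 8), counted from the least significant end, and the first bit
     // of each 5-bit code unit is that unit's least significant bit.
     def decode(bytes: Vector[Int]): String = {
       def bit(n: Int): Int = (bytes(n / 8) >> (n % 8)) & 1
       val units = bytes.length * 8 / 5
       (0 until units).map { k =>
         val code = (0 until 5).map(j => bit(5 * k + j) << j).sum
         decodeString.charAt(code)
       }.mkString
     }
   
     def main(args: Array[String]): Unit = {
       val asWritten = decode(bytesRightToLeft(written.mkString))
       val flipped   = decode(bytesRightToLeft(written.reverse.mkString))
       println(asWritten == decodeString.reverse) // groups as written: reversed string
       println(flipped == decodeString)           // groups written 11111 first: table order
     }
   }
   ```
   
   Under those assumptions the reversed output is what the model predicts, and writing the groups from `11111` down to `00000` would produce the string in table order.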
   
   As a secondary concern, Daffodil is outputting the no-break space as &#xA0; . This is technically correct, but it is not clear whether it is desirable.
   
   Also, I would expect this test to fail to round-trip because several "undefined" code points are all mapped to U+FFFD. Once I get the parse-only test working, I will add a second test that round-trips without those characters.
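   
   To illustrate the round-trip concern, here is a small sketch. The `encode` function is hypothetical: it picks the first code whose decoded character matches, which is only one choice an inverse table could make. Daffodil's real unparser may pick a different code for U+FFFD, but whichever single code it picks, at most one of the five undefined codes can survive decode-then-encode.
   
   ```scala
   object RoundTripSketch {
     // Decode table copied from the diff.
     val decodeString: String = "\u00A0ABCDEFGHIJKLMNOPQRSTUVWXYZ" + ("\uFFFD" * 5)
   
     // HYPOTHETICAL encoder: first code whose decoded character equals c.
     def encode(c: Char): Int = decodeString.indexOf(c.toInt)
   
     // Codes that come back unchanged after decode followed by encode.
     val stable: Seq[Int] =
       (0 until 32).filter(code => encode(decodeString.charAt(code)) == code)
   
     def main(args: Array[String]): Unit = {
       println("stable codes: " + stable.mkString(","))
       println("lost codes:   " + (0 until 32).diff(stable).mkString(","))
     }
   }
   ```
   
   With this particular encoder, codes 0 through 27 are stable and codes 28 through 31 all collapse to 27, so a round-trip test needs input that avoids the undefined codes.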

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services