Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2020/02/07 16:14:42 UTC

Why is dfdl:lengthUnits=bytes required with lengthKind=prefixed?

Hi Folks,

My input is this:

8John Doe

I used lengthKind=prefixed in my schema. See below. Notice that on the element declaration for "name" I specify dfdl:lengthUnits="bytes". I originally specified "characters" instead of "bytes" and that resulted in an error (unconsumed data). This is so counterintuitive.

First, why do I even need to specify lengthUnits on "name"? Second, although I don't show it, I tried putting lengthUnits on the simpleType and it didn't matter what value I assigned to lengthUnits ... I thought you always had to specify lengthUnits whenever you specify lengthKind=explicit ... apparently not. I couldn't find any explanation of this in the specification.

/Roger

<xs:element name="input">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="name"
                type="xs:string"
                dfdl:lengthKind="prefixed"
                dfdl:lengthUnits="bytes"
                dfdl:prefixLengthType="prefix-type"
                dfdl:prefixIncludesPrefixLength="no"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:simpleType name="prefix-type"
    dfdl:lengthKind="explicit"
    dfdl:length="1">
    <xs:restriction base="xs:integer" />
</xs:simpleType>



Re: Why is dfdl:lengthUnits=bytes required with lengthKind=prefixed?

Posted by Steve Lawrence <sl...@apache.org>.
It looks like we might have a bug with dfdl:lengthUnits="characters".
I'm seeing odd behavior too. I'll take a look into this.

And you do need lengthUnits on both the simple type and the name
element. Maybe you already have lengthUnits defined in the default
format, and when you added it to the simpleType you used the same value,
so it was effectively the same?

When you think about lengthKind="prefixed", it might help to think of it
as syntactic sugar for this:

  <xs:sequence>
    <xs:element name="length" type="prefix-type" />
    <xs:element name="name"
      type="xs:string"
      dfdl:lengthKind="explicit"
      dfdl:length="{ ../length }"
      ... />
  </xs:sequence>

So it's really two separate parses. We first parse a "length" field that
results in an integer, and it needs all the normal properties related to
integer parsing, including length/lengthKind/lengthUnits/etc. We then
parse the "name" field that uses the resulting number of the "length"
field as the length. But that number is unitless (e.g. 8). We use the
lengthUnits of "name" to determine how to interpret that number.
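
As a rough sketch of those two parses against the input "8John Doe",
with lengthUnits written out on both elements (this assumes the default
format supplies a text representation and a single-byte encoding such
as ASCII, neither of which the original schema shows):

  <xs:sequence>
    <!-- parse 1: consumes 1 byte, the character "8", giving the integer 8 -->
    <xs:element name="length" type="xs:integer"
      dfdl:lengthKind="explicit"
      dfdl:length="1"
      dfdl:lengthUnits="bytes" />
    <!-- parse 2: the unitless 8 is read as 8 bytes, i.e. "John Doe" -->
    <xs:element name="name" type="xs:string"
      dfdl:lengthKind="explicit"
      dfdl:length="{ ../length }"
      dfdl:lengthUnits="bytes" />
  </xs:sequence>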

When you think about it like this, it may become clearer why
lengthUnits is required on both the simple type and the "name"
element: they really are two different elements with different properties.

Note that you can mix and match lengthUnits. So, for example, the prefix
length could have lengthUnits="bits", and the element could have
lengthUnits="bytes". So when we parse "8", we could interpret that as 8
bits or 8 bytes depending on the lengthUnits property of "name".
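
For instance, a mixed-units variant might look something like the
following. This is only an illustration, not the schema from the
original question: it assumes a binary prefix (a single byte holding the
value 8) rather than the text digit "8", it assumes the default format
supplies the remaining binary properties, and the type name
"bit-prefix-type" is made up.

  <!-- the prefix itself is 8 bits wide -->
  <xs:simpleType name="bit-prefix-type"
    dfdl:representation="binary"
    dfdl:lengthKind="explicit"
    dfdl:length="8"
    dfdl:lengthUnits="bits">
    <xs:restriction base="xs:unsignedInt" />
  </xs:simpleType>

  <!-- the value read from the prefix is counted in bytes -->
  <xs:element name="name"
    type="xs:string"
    dfdl:lengthKind="prefixed"
    dfdl:lengthUnits="bytes"
    dfdl:prefixLengthType="bit-prefix-type"
    dfdl:prefixIncludesPrefixLength="no" />

Here the 8 in dfdl:length describes the width of the prefix in bits,
while the 8 read from the data gives the length of "name" in bytes.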


