
Posted to users@daffodil.apache.org by "Gedvilas, Brett L2" <BR...@UCDENVER.EDU> on 2018/09/12 18:14:16 UTC

DFDL Schema help

Hi everyone,


I am a new Daffodil user looking for input on a DFDL schema definition I'm trying to create. I'm working with some binary physics data whose format can loosely be described as fields that are aggregated together and packed into a single 32-bit integer before being written to memory. The gist of the issue is that because not all fields fall on byte boundaries, pieces of a field get jumbled if you read the data as a linear stream from memory. This is best illustrated by a simple example:


Consider the following 32-bit hex value: 0x90 00 20 01


The problem arises because the values that have meaning in context are 0x9 (consisting of 4 bits), 0x0002 (16 bits), and finally 0x001 (12 bits).


When this value gets stored in memory on a little-endian architecture, we see the following: 0x01 20 00 90. Reading those fields (4, 16, and 12 bits) as a linear bit stream from memory yields 0x0, 0x1200, and 0x090, which are clearly incorrect.
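
A minimal Python sketch of this misread, for illustration only. The helper `read_bits_msb_first` is a hypothetical name, not a Daffodil API; it simply treats the bytes as one linear bit stream and slices off fields from the most significant end.

```python
def read_bits_msb_first(data: bytes, widths):
    """Split a byte string into consecutive bit fields, most significant bit first."""
    total_bits = len(data) * 8
    value = int.from_bytes(data, "big")  # view the bytes as one linear bit stream
    fields, consumed = [], 0
    for width in widths:
        shift = total_bits - consumed - width
        fields.append((value >> shift) & ((1 << width) - 1))
        consumed += width
    return fields


word = 0x90002001
in_memory = word.to_bytes(4, "little")  # byte order as stored: 01 20 00 90
logical = word.to_bytes(4, "big")       # byte order as written on paper: 90 00 20 01

misread = read_bits_msb_first(in_memory, [4, 16, 12])
intended = read_bits_msb_first(logical, [4, 16, 12])
print([hex(v) for v in misread])   # ['0x0', '0x1200', '0x90']
print([hex(v) for v in intended])  # ['0x9', '0x2', '0x1']
```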


The simplest approach I can envision is to read in the entire 32-bit value and then extract the correct fields with masks and bit shifts. Is there a more straightforward solution to this problem? Or does anyone have experience or insights solving this issue using Daffodil?
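
For reference, the mask-and-shift approach might look like the following Python sketch. The function name `unpack_word` and the field names `a`, `b`, `c` are assumptions made for the example; the field widths (4, 16, 12 bits) come from the example value above.

```python
import struct


def unpack_word(raw: bytes):
    """Read one little-endian 32-bit word, then pull out the three packed fields."""
    (word,) = struct.unpack("<I", raw)  # "<I" = little-endian unsigned 32-bit
    a = (word >> 28) & 0xF      # top 4 bits
    b = (word >> 12) & 0xFFFF   # middle 16 bits
    c = word & 0xFFF            # low 12 bits
    return a, b, c


print(unpack_word(bytes([0x01, 0x20, 0x00, 0x90])))  # (9, 2, 1)
```

This works, but the masks and shifts live outside the schema, which is what a declarative DFDL description would avoid.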


Thanks!


Brett

Re: DFDL Schema help

Posted by Steve Lawrence <sl...@apache.org>.

Maybe the XML was filtered out by an overaggressive spam filter. Here's
a link to a GitHub gist instead:

https://gist.github.com/stevedlawrence/691c4c7db664f2678524e8ac8f7195ad

- Steve

On 09/12/2018 03:57 PM, Gedvilas, Brett L2 wrote:
> Hi Steve,
> 
> 
> Thanks for the quick reply, that appears to be exactly what I'm looking for! Is 
> there any chance you could try sending me the example.dfdl.xsd file again? The 
> attachment didn't seem to make it through correctly.
> 
> 
> -Brett

Re: DFDL Schema help

Posted by "Gedvilas, Brett L2" <BR...@UCDENVER.EDU>.

Hi Steve,

Thanks for the quick reply, that appears to be exactly what I'm looking for! Is there any chance you could try sending me the example.dfdl.xsd file again? The attachment didn't seem to make it through correctly.

-Brett

________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Wednesday, September 12, 2018 1:04:39 PM
To: users@daffodil.apache.org; Gedvilas, Brett L2
Subject: Re: DFDL Schema help

Hi Brett,

The recent 2.2.0 release adds a feature that does just what you need,
called "data layering". It's not officially part of the DFDL spec, but
the proposal is found here:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75979671

Essentially, what you'll want to do is specify a data layer transform
of "fourbyteswap" on your data. This layer transform swaps the bytes of
each 4-byte chunk for the given length of data, effectively making them
big-endian-like. You can then parse the individual fields using a
bigEndian byteOrder and explicit bit lengths. I've attached an example
schema that parses your 4 bytes of example data to give you an idea of
what such a schema would look like.

The data in data.bin is:

  0x01 20 00 90

To parse with the daffodil CLI, you can run:

  daffodil parse -s example.dfdl.xsd data.bin

The resulting XML infoset should be:

  <Data>
    <a>9</a>
    <b>2</b>
    <c>1</c>
  </Data>

- Steve


Re: DFDL Schema help

Posted by Steve Lawrence <sl...@apache.org>.

Hi Brett,

The recent 2.2.0 release adds a feature that does just what you need,
called "data layering". It's not officially part of the DFDL spec, but
the proposal is found here:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75979671

Essentially, what you'll want to do is specify a data layer transform
of "fourbyteswap" on your data. This layer transform swaps the bytes of
each 4-byte chunk for the given length of data, effectively making them
big-endian-like. You can then parse the individual fields using a
bigEndian byteOrder and explicit bit lengths. I've attached an example
schema that parses your 4 bytes of example data to give you an idea of
what such a schema would look like.

The data in data.bin is:

  0x01 20 00 90

To parse with the daffodil CLI, you can run:

  daffodil parse -s example.dfdl.xsd data.bin

The resulting XML infoset should be:

  <Data>
    <a>9</a>
    <b>2</b>
    <c>1</c>
  </Data>
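
As a rough illustration of the byte swap described above, here is a minimal Python sketch. It only simulates the effect of the transform on the example bytes; it is not Daffodil code, and the helper names `four_byte_swap` and `parse_big_endian_fields` are made up for this example. Rejecting input whose length is not a multiple of 4 is an assumption of the sketch, not documented layer behavior.

```python
def four_byte_swap(data: bytes) -> bytes:
    """Reverse the bytes inside each 4-byte chunk."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")  # assumption of this sketch
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


def parse_big_endian_fields(data: bytes, widths):
    """Read consecutive bit fields, most significant bit first."""
    value = int.from_bytes(data, "big")
    remaining = len(data) * 8
    fields = []
    for width in widths:
        remaining -= width
        fields.append((value >> remaining) & ((1 << width) - 1))
    return fields


swapped = four_byte_swap(bytes([0x01, 0x20, 0x00, 0x90]))
a, b, c = parse_big_endian_fields(swapped, [4, 16, 12])
print(swapped.hex(), a, b, c)  # 90002001 9 2 1
```

After the swap the bytes read 90 00 20 01, so the 4-, 16-, and 12-bit fields come out as 9, 2, and 1, matching the infoset above.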

- Steve


Re: DFDL Schema help

Posted by "Gedvilas, Brett L2" <BR...@UCDENVER.EDU>.

Thanks for the input, Mike. I had definitely been misinterpreting how DFDL applies the leastSignificantBitFirst property to the data stream, but I think it makes sense now. I appreciate the help.


Brett

________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, September 13, 2018 5:23:11 AM
To: users@daffodil.apache.org; Mike Beckerle
Subject: Re: DFDL Schema help

Good call, Mike. That's going to be more efficient and probably is the
right representation of the data. Here is a schema gist of what Mike
describes:

https://gist.github.com/stevedlawrence/1404e03a313ff63cd0bad8c79d0ae267

- Steve


Re: DFDL Schema help

Posted by Steve Lawrence <sl...@apache.org>.

Good call, Mike. That's going to be more efficient and probably is the
right representation of the data. Here is a schema gist of what Mike
describes:

https://gist.github.com/stevedlawrence/1404e03a313ff63cd0bad8c79d0ae267

- Steve

On 09/12/2018 09:13 PM, Mike Beckerle wrote:
> Layering will work, but this problem is simpler than that.
> 
> 
> Pretty sure this is just dfdl:bitOrder="leastSignificantBitFirst" with
> dfdl:byteOrder="littleEndian" data.
> 
> 
> However, the order of the fields is reversed from the way you are thinking. The
> first field is the 12-bit field, the second is the 16-bit field, and the third
> is the 4-bit field. This should illustrate what I mean:
> 
> 
>      Byte 4    Byte 3    Byte 2    Byte 1
> Hex: 9    0    0    0    2    0    0    1
> Bits 1001 0000 0000 0000 0010 0000 0000 0001
> f1                            xxxx xxxx xxxx
> f2        yyyy yyyy yyyy yyyy
> f3   zzzz
> 
> 
> ...mike beckerle
> 
> Tresys

Re: DFDL Schema help

Posted by Mike Beckerle <mb...@tresys.com>.

Layering will work, but this problem is simpler than that.


Pretty sure this is just dfdl:bitOrder="leastSignificantBitFirst" with dfdl:byteOrder="littleEndian" data.


However, the order of the fields is reversed from the way you are thinking. The first field is the 12-bit field, the second is the 16-bit field, and the third is the 4-bit field. This should illustrate what I mean:


     Byte 4    Byte 3    Byte 2    Byte 1
Hex: 9    0    0    0    2    0    0    1
Bits 1001 0000 0000 0000 0010 0000 0000 0001
f1                            xxxx xxxx xxxx
f2        yyyy yyyy yyyy yyyy
f3   zzzz
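
The field order in the diagram can be checked with a small Python sketch. This is only a model of least-significant-bit-first parsing over little-endian data, not Daffodil's implementation, and `parse_lsb_first` is a hypothetical helper name.

```python
def parse_lsb_first(data: bytes, widths):
    """Consume bit fields starting at the least significant bit of the first byte."""
    value = int.from_bytes(data, "little")
    fields = []
    for width in widths:
        fields.append(value & ((1 << width) - 1))  # take the low `width` bits
        value >>= width                            # then drop them
    return fields


# Bytes in memory order; widths in the order the fields appear in the data.
f1, f2, f3 = parse_lsb_first(bytes([0x01, 0x20, 0x00, 0x90]), [12, 16, 4])
print(f1, f2, f3)  # 1 2 9
```

Read this way, the 12-bit field comes first and the 4-bit field last, with no byte swapping needed.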


...mike beckerle

Tresys
