Posted to users@daffodil.apache.org by "Gedvilas, Brett L2" <BR...@UCDENVER.EDU> on 2018/09/12 18:14:16 UTC
DFDL Schema help
Hi everyone,
I am a new Daffodil user and I was looking for input on a DFDL schema definition I'm trying to create. I'm working with some binary physics data whose format can loosely be described as several fields packed together into a single 32-bit integer before being written to memory. Because not all fields fall on byte boundaries, pieces of a field get jumbled if you read the data as a linear stream from memory. This is best illustrated by a simple example:
Consider the following 32-bit hex value: 0x90 00 20 01
The problem is that the values that are meaningful in context are 0x9 (4 bits), 0x0002 (16 bits), and 0x001 (12 bits).
When this value is stored in memory on a little-endian architecture, we see the following bytes: 0x01 20 00 90. Reading the 4-, 16-, and 12-bit fields sequentially from that memory yields 0x0, 0x1200, and 0x090, which are clearly incorrect.
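To make the failure concrete, here is a minimal Python sketch (an illustration, not from the original thread; `read_fields_msb_first` is a hypothetical name). It assumes the naive reader walks the bytes in address order and takes the most significant bit of each byte first, which reproduces the wrong values listed above.

```python
def read_fields_msb_first(data: bytes, widths: list[int]) -> list[int]:
    """Read consecutive bit fields from data in memory (address) order,
    taking the most significant bit of each byte first."""
    total_bits = len(data) * 8
    stream = int.from_bytes(data, "big")  # byte 0 supplies the top 8 bits
    values, pos = [], 0
    for width in widths:
        shift = total_bits - pos - width
        values.append((stream >> shift) & ((1 << width) - 1))
        pos += width
    return values


# The 32-bit value 0x90002001 as stored in memory on a little-endian machine.
memory = bytes([0x01, 0x20, 0x00, 0x90])
fields = read_fields_msb_first(memory, [4, 16, 12])
# Zero-pad each value to its width in hex digits.
print([f"0x{v:0{(w + 3) // 4}x}" for v, w in zip(fields, [4, 16, 12])])
# prints ['0x0', '0x1200', '0x090']
```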
The simplest approach I can envision is to read the entire 32-bit value and then extract the correct fields with masks and bit shifts. Is there a more straightforward solution to this problem? Or does anyone have experience or insights solving this issue using Daffodil?
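For comparison, the mask-and-shift workaround described above might look like this minimal Python sketch. The helper name `unpack_word` and the bit layout (4 high bits, then 16, then the low 12) are assumptions taken from the example values, not from any library.

```python
import struct


def unpack_word(memory: bytes) -> tuple[int, int, int]:
    """Hypothetical helper: decode one little-endian 32-bit word into (a, b, c).

    Assumed layout, most significant bits first:
      a = bits 31..28 (4 bits), b = bits 27..12 (16 bits), c = bits 11..0 (12 bits)
    """
    (word,) = struct.unpack("<I", memory[:4])  # bytes 01 20 00 90 -> 0x90002001
    a = (word >> 28) & 0xF
    b = (word >> 12) & 0xFFFF
    c = word & 0xFFF
    return a, b, c


print(unpack_word(bytes([0x01, 0x20, 0x00, 0x90])))  # prints (9, 2, 1)
```

This works, but the layout lives in code rather than in the schema, which is what the replies below avoid.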
Thanks!
Brett
Re: DFDL Schema help
Posted by Steve Lawrence <sl...@apache.org>.
Maybe the XML was filtered out by an over-aggressive spam filter. Here's
a link to a GitHub gist instead:
https://gist.github.com/stevedlawrence/691c4c7db664f2678524e8ac8f7195ad
- Steve
On 09/12/2018 03:57 PM, Gedvilas, Brett L2 wrote:
Re: DFDL Schema help
Posted by "Gedvilas, Brett L2" <BR...@UCDENVER.EDU>.
Hi Steve,
Thanks for the quick reply; that appears to be exactly what I'm looking for! Is there any chance you could try sending me the example.dfdl.xsd file again? The attachment didn't seem to make it through correctly.
-Brett
________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Wednesday, September 12, 2018 1:04:39 PM
To: users@daffodil.apache.org; Gedvilas, Brett L2
Subject: Re: DFDL Schema help
Re: DFDL Schema help
Posted by Steve Lawrence <sl...@apache.org>.
Hi Brett,
The recent 2.2.0 release adds a feature that does just what you need,
called "data layering". It's not officially part of the DFDL spec, but
the proposal is found here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75979671
Essentially, what you'll want to do is specify a data layer transform
of "fourbyteswap" on your data. This layer transform reverses the byte
order of each 4-byte chunk for the given length of data, effectively making
them big-endian-like. You can then parse the individual fields using a
bigEndian byteOrder and explicit bit lengths. I've attached an example
schema that parses your 4 bytes of example data to give you an idea of
what such a schema would look like.
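The effect of the layer can be modeled outside Daffodil to see why it works. This Python sketch is only a model, not Daffodil's layer implementation: `four_byte_swap` and `read_fields_big_endian` are hypothetical names, and it assumes the data length is a multiple of 4 and that the fields after the swap are read most significant bit first.

```python
def four_byte_swap(data: bytes) -> bytes:
    """Reverse the byte order within each 4-byte chunk.

    Assumption: len(data) is a multiple of 4; how Daffodil's layer handles
    a trailing partial chunk is not modeled here.
    """
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


def read_fields_big_endian(data: bytes, widths: list[int]) -> list[int]:
    """Read consecutive bit fields MSB-first from big-endian data."""
    stream = int.from_bytes(data, "big")
    pos = len(data) * 8
    values = []
    for width in widths:
        pos -= width
        values.append((stream >> pos) & ((1 << width) - 1))
    return values


swapped = four_byte_swap(bytes([0x01, 0x20, 0x00, 0x90]))  # -> 90 00 20 01
print(read_fields_big_endian(swapped, [4, 16, 12]))  # prints [9, 2, 1]
```

The swap restores the value's logical byte order, so the fields can then be declared in the order the original post lists them.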
The data in data.bin is:
0x01 20 00 90
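To reproduce this input locally, a short Python script (an illustration, not part of the original thread; `write_example` is a hypothetical name) can create data.bin by packing 0x90002001 as a little-endian unsigned 32-bit integer, which gives exactly these four bytes.

```python
import struct
from pathlib import Path


def write_example(path: str = "data.bin") -> bytes:
    """Write the example 32-bit value the way a little-endian machine stores it."""
    data = struct.pack("<I", 0x90002001)  # bytes 01 20 00 90
    Path(path).write_bytes(data)
    return data


if __name__ == "__main__":
    print(write_example().hex(" "))  # prints 01 20 00 90
```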
To parse with the Daffodil CLI, you can run:
daffodil parse -s example.dfdl.xsd data.bin
The resulting XML infoset should be:
<Data>
<a>9</a>
<b>2</b>
<c>1</c>
</Data>
- Steve
On 09/12/2018 02:14 PM, Gedvilas, Brett L2 wrote:
Re: DFDL Schema help
Posted by "Gedvilas, Brett L2" <BR...@UCDENVER.EDU>.
Thanks for the input, Mike. I had definitely been misinterpreting how DFDL was applying the leastSignificantBitFirst property to the data stream, but I think it makes sense now. I appreciate the help.
Brett
________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, September 13, 2018 5:23:11 AM
To: users@daffodil.apache.org; Mike Beckerle
Subject: Re: DFDL Schema help
Re: DFDL Schema help
Posted by Steve Lawrence <sl...@apache.org>.
Good call, Mike. That's going to be more efficient and probably is the
right representation of the data. Here is a schema gist of what Mike
describes:
https://gist.github.com/stevedlawrence/1404e03a313ff63cd0bad8c79d0ae267
- Steve
On 09/12/2018 09:13 PM, Mike Beckerle wrote:
Re: DFDL Schema help
Posted by Mike Beckerle <mb...@tresys.com>.
Layering will work, but this problem is simpler than that.
Pretty sure this is just dfdl:bitOrder="leastSignificantBitFirst" with dfdl:byteOrder="littleEndian" data.
However, the order of the fields is reversed from the way you are thinking. The first field is the 12-bit field, the second is the 16-bit field, and the third is the 4-bit field. This should illustrate what I mean:
      Byte 4    Byte 3    Byte 2    Byte 1
Hex:  9    0    0    0    2    0    0    1
Bits: 1001 0000 0000 0000 0010 0000 0000 0001
f1:                            xxxx xxxx xxxx
f2:        yyyy yyyy yyyy yyyy
f3:   zzzz
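The diagram can be checked with a small Python model. This is a simplified illustration, not Daffodil's implementation, and `read_fields_lsbf_le` is a hypothetical name. It assumes that under leastSignificantBitFirst with littleEndian, stream bit k is bit (k % 8) of byte k // 8, counted from the least significant bit, so the buffer behaves like one little-endian integer whose low-order bits are consumed first.

```python
def read_fields_lsbf_le(data: bytes, widths: list[int]) -> list[int]:
    """Simplified model of bitOrder="leastSignificantBitFirst" with
    byteOrder="littleEndian": low-order bits of the little-endian
    value are consumed first."""
    stream = int.from_bytes(data, "little")  # bytes 01 20 00 90 -> 0x90002001
    values, pos = [], 0
    for width in widths:
        values.append((stream >> pos) & ((1 << width) - 1))
        pos += width
    return values


memory = bytes([0x01, 0x20, 0x00, 0x90])
print(read_fields_lsbf_le(memory, [12, 16, 4]))  # prints [1, 2, 9]
```

The three values match the layering example's 9, 2, and 1, just read in the opposite order, with no byte swap needed.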
...mike beckerle
Tresys
________________________________
From: Gedvilas, Brett L2 <BR...@UCDENVER.EDU>
Sent: Wednesday, September 12, 2018 2:14 PM
To: users@daffodil.apache.org
Subject: DFDL Schema help