
Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2018/12/14 13:37:45 UTC

Daffodil always processes the chunks in input files from left to right (low memory address to high memory address) ... True?

Hello DFDL community,

[Definition] Epiphany: a sudden, intuitive perception of or insight into the reality or essential meaning of something.

I just had two epiphanies. I want to confirm that my epiphanies are correct.

The epiphanies are with regard to parsing (processing) binary files.

Epiphany #1: Daffodil always processes input files in chunks, from low memory address to high memory address (left to right, top to bottom, as viewed in a hex editor).

Epiphany #2: Within a chunk, Daffodil may process the chunk's pieces in a different order. For example, suppose a chunk is 2 bytes and you have told Daffodil that "bytes are in little-endian order." Daffodil will then treat the second byte (higher memory address) as the most significant byte and the first byte (lower memory address) as the least significant byte. The numeric value of the two bytes is the right byte followed by the left byte (that is, right byte × 256 + left byte). You might think of that as Daffodil processing the two bytes from right to left within that chunk. But remember, chunks are always processed left to right (low memory address to high memory address).
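To make the two epiphanies concrete, here is a minimal Python sketch (not Daffodil code). It assumes a hypothetical file made of consecutive 2-byte little-endian unsigned integers, and the function name parse_le_uint16_chunks is invented for illustration. Chunks are consumed at increasing offsets, while within each chunk the byte at the higher address supplies the high-order bits.

```python
def parse_le_uint16_chunks(data: bytes) -> list[int]:
    """Parse consecutive 2-byte little-endian unsigned integers.

    Chunks are consumed from low offset to high offset (Epiphany #1);
    within each chunk the byte at the higher address is the most
    significant one (Epiphany #2). Trailing bytes that do not form a
    complete chunk are ignored in this sketch.
    """
    values = []
    offset = 0
    while offset + 2 <= len(data):
        chunk = data[offset:offset + 2]   # isolate the next chunk, left to right
        low, high = chunk[0], chunk[1]    # first (left) byte is least significant
        values.append(high * 256 + low)   # right byte followed by left byte
        offset += 2                       # always advance toward higher addresses
    return values


data = bytes([0x34, 0x12, 0x78, 0x56])
print([hex(v) for v in parse_le_uint16_chunks(data)])  # ['0x1234', '0x5678']
```

Python's built-in int.from_bytes(chunk, "little") computes the same value; the explicit high * 256 + low is spelled out only to show which byte carries which weight.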

Are my two epiphanies correct?

/Roger

Re: Daffodil always processes the chunks in input files from left to right (low memory address to high memory address) ... True?

Posted by Mike Beckerle <mb...@tresys.com>.

Yes, this is correct.


I would call "chunks" the result of "length isolation"; that is, one of the first things Daffodil does is isolate an atomic piece of data by determining its length.


Interpreting the data, i.e., creating a "value" from it, then might do any variety of things with those bytes: reversing the byte order, or even something as radical as feeding them to a decompressor or shuffling the bits.
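The two phases described above (isolate by length, then interpret) can be sketched in Python as follows. This is not how Daffodil is implemented: the 1-byte length prefix is a hypothetical format chosen only so the isolation step has something to read, and the names isolate, reverse_bits, and interpretations are invented for this example. DFDL offers several ways to determine a length, none of which this sketch models.

```python
import zlib


def isolate(data: bytes, offset: int) -> tuple[bytes, int]:
    """Phase 1, length isolation: read a 1-byte length prefix (a hypothetical
    format for this sketch) and return the isolated bytes plus the offset
    just past them."""
    length = data[offset]
    start = offset + 1
    end = start + length
    if end > len(data):
        raise ValueError("not enough data for the declared length")
    return data[start:end], end


def reverse_bits(b: bytes) -> bytes:
    """Reverse the bit order within each byte."""
    return bytes(int(f"{x:08b}"[::-1], 2) for x in b)


# Phase 2, interpretation: the same isolated bytes can become a value
# in very different ways.
interpretations = {
    "big-endian int": lambda b: int.from_bytes(b, "big"),
    "little-endian int": lambda b: int.from_bytes(b, "little"),
    "bit-reversed bytes": reverse_bits,
    "zlib-decompressed": zlib.decompress,
}

plain = bytes([0x01, 0x02])
compressed = zlib.compress(b"hello")
data = bytes([len(plain)]) + plain + bytes([len(compressed)]) + compressed

chunk1, offset = isolate(data, 0)       # chunks are still isolated left to right
chunk2, offset = isolate(data, offset)
print(interpretations["big-endian int"](chunk1))      # 258
print(interpretations["little-endian int"](chunk1))   # 513
print(interpretations["bit-reversed bytes"](chunk1))  # b'\x80@'
print(interpretations["zlib-decompressed"](chunk2))   # b'hello'
```

Once the bytes are isolated, nothing has yet committed to a meaning for them; byte reversal, bit reversal, and decompression are simply different functions applied during interpretation.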


Sometimes it is easier to visualize raw data in a layout other than the typical in-order, left-to-right hex dump.


A basic bytes-in-order hex dump is fine if the data is textual, or binary with bigEndian byte order.


For binary littleEndian data, not so much: the bytes of numeric values are all "backwards" from what your eye wants to see.
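A quick Python illustration of why the in-order dump reads naturally only for big-endian data: the same 32-bit value, 0x12345678, stored both ways and printed in address order. (bytes.hex with a separator argument requires Python 3.8 or later.)

```python
value = 0x12345678
big = value.to_bytes(4, "big")        # most significant byte at the lowest address
little = value.to_bytes(4, "little")  # least significant byte at the lowest address

# An in-order hex dump prints bytes from low address to high address.
print(big.hex(" "))     # 12 34 56 78  (reads like the number)
print(little.hex(" "))  # 78 56 34 12  ("backwards")
```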


I've always been a fan of a layout where byte 1 is in the lower-right corner and addresses increase to the left and upward, so that higher addresses are, well, higher. This layout makes little-endian and least-significant-bit-first data much easier to visualize, though all text strings show up backward, of course. I don't have a tool that lays data out this way (though one would be pretty easy to write). A good data debugger (which I will write *someday*) will let you flip raw data around like this easily.
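Since such a tool would be pretty easy to write, here is one hypothetical Python sketch of that layout. The function name reversed_dump and the default row width are assumptions, not an existing tool. Offset 0 lands in the lower-right corner, addresses grow to the left and upward, and each row's starting offset is printed at its right edge.

```python
def reversed_dump(data: bytes, width: int = 8) -> str:
    """Hex dump with offset 0 in the lower-right corner; addresses increase
    to the left and upward. Hypothetical sketch, not an existing tool."""
    rows = []
    for start in range(0, len(data), width):
        row = data[start:start + width]
        # Highest address on the left, lowest on the right.
        cells = [f"{b:02x}" for b in reversed(row)]
        # Pad a short final row on the left so the columns stay aligned.
        cells = ["  "] * (width - len(row)) + cells
        # Label each row with its lowest offset, on the right-hand side.
        rows.append(" ".join(cells) + f"  {start:08x}")
    # Higher addresses go at the top.
    return "\n".join(reversed(rows))


data = (0x12345678).to_bytes(4, "little") + b"ABCD"
print(reversed_dump(data, width=4))
# 44 43 42 41  00000004
# 12 34 56 78  00000000
```

With a 4-byte little-endian integer followed by the text ABCD, the integer reads naturally as 12 34 56 78 while the text appears as 44 43 42 41 (DCBA), which is exactly the trade-off described above.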


If you are working with data created on a PC/Intel-based platform, you are probably dealing with little-endian data, so a standard in-order hex dump makes reading the numeric values painful because their bytes are reversed.


-mike beckerle
