Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2020/04/13 11:28:44 UTC

Is leadingSkip and trailingSkip used (exclusively) with binary data formats that contain an island of text?

Hi Folks,

I can't imagine a text data format in which the text doesn't start until after n bytes (or n bits). That is, I can't see the need for leadingSkip in a text data format. Is there a use case for leadingSkip in text data formats? If leadingSkip doesn't apply to text data formats, then why do I have to specify it?

However, I can imagine an island of text embedded in a binary data format: the text doesn't start until after n bytes (or n bits). Yes?

Likewise, I can't imagine a text data format in which there is some text and then the next text is after n bytes (or n bits). That is, I can't see the need for trailingSkip in a text data format. Is there a use case for trailingSkip in text data formats? If trailingSkip doesn't apply to text data formats, then why do I have to specify it?

However, I can imagine islands of text embedded in a binary data format: there is an island of text and then the next island of text doesn't start until after n bytes (or n bits). Yes?

/Roger

 

Re: Is leadingSkip and trailingSkip used (exclusively) with binary data formats that contain an island of text?

Posted by "Beckerle, Mike" <mb...@tresys.com>.
leadingSkip and trailingSkip simply reflect the way some format specifications are written.

You are correct that these are about formats with unused "holes" in them.

Sometimes the spec document tells you how long the fields are and, where there are unused areas, the distance to the next element. That's the style that leadingSkip/trailingSkip are really about.
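
To make that style concrete, here is a minimal Python sketch (not DFDL, and not how Daffodil implements it) of reading fixed-length fields where the layout gives each field's length plus the unused bytes before and after it. The Field type, the parse function, and the sample record are all made up for illustration, and the sketch ignores alignment, which DFDL also takes into account.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """Hypothetical field description: content length plus unused bytes around it."""
    name: str
    length: int              # bytes of real content
    leading_skip: int = 0    # unused bytes before the content
    trailing_skip: int = 0   # unused bytes after the content


def parse(data: bytes, fields: list[Field]) -> dict[str, bytes]:
    """Walk the record left to right, stepping over the unused holes."""
    pos = 0
    out = {}
    for f in fields:
        pos += f.leading_skip                    # step over the hole before the field
        out[f.name] = data[pos:pos + f.length]   # take the field content
        pos += f.length + f.trailing_skip        # step over the hole after the field
    return out


# Illustrative layout: a 4-byte id followed by 2 unused bytes,
# then 1 unused byte before a 5-byte name.
layout = [Field("id", 4, trailing_skip=2), Field("name", 5, leading_skip=1)]
record = b"0042" + b"\x00\x00" + b"\x00" + b"ALICE"
result = parse(record, layout)
```

Nothing in the sketch cares whether the field content is text or binary; the skips only describe distance.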

Other formats define "elements" with names like "spare" which consume the 'unused' data.

Other formats tell you the starting location of each field. This style ("offset oriented") isn't currently supported directly by DFDL, so you have to convert the offsets into length information. Direct support for it is planned as a DFDL v2.0 feature.
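
The conversion from offsets to lengths is plain arithmetic. As a rough sketch, assuming each field is given as a (name, offset, length) tuple in bytes and the total record length is known, the hypothetical helper below turns an offset-oriented layout into per-field leading skips plus one final trailing skip. The function name and the tuple shape are my own, not anything from DFDL or Daffodil.

```python
def offsets_to_skips(fields, record_length):
    """Convert (name, offset, length) tuples into (name, leading_skip, length).

    Returns the converted layout and the number of unused bytes left
    after the last field (a trailing skip for the whole record).
    """
    layout = []
    pos = 0  # first byte not yet accounted for
    for name, offset, length in sorted(fields, key=lambda f: f[1]):
        if offset < pos:
            raise ValueError(f"field {name!r} overlaps the previous field")
        layout.append((name, offset - pos, length))  # gap becomes a leading skip
        pos = offset + length
    if pos > record_length:
        raise ValueError("fields extend past the end of the record")
    return layout, record_length - pos


# Illustrative 16-byte record: id at offset 0, name at offset 7.
layout, trailing = offsets_to_skips([("id", 0, 4), ("name", 7, 5)], 16)
```

Here the 3-byte gap between the end of id (byte 4) and the start of name (byte 7) becomes a leading skip on name, and the 4 bytes after name become the trailing skip.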

I would agree with you that I've only ever seen these leadingSkip/trailingSkip notions used in binary data formats. But the difference between binary and text can be subtle.

An important property for DFDL is an A + B composition rule. If you can describe A, and you can describe B, then if you concatenate A-described data with B-described data, you can describe the result. So if A is textual and B is binary, then A+B is a blend. One can of course then surround that mixture with yet more layers of other formats.

I have seen commercial data sets which truly looked like a COBOL-style mainframe data record concatenated onto log output from some Perl-based web application. The two were directly juxtaposed in the data, and that pairing was repeated for every record: each record was a composition of two utterly different formats. This was a commercial data set you had to pay a supplier for.
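
As a loose analogy for the A + B rule (again in Python rather than DFDL, with every name invented for the example), think of each format description as a function that takes the data and a position and returns a value plus the new position. Concatenation is then just running A and starting B where A stopped, regardless of whether either one is text or binary.

```python
import struct


def parse_text_line(data, pos):
    """Format A (textual): ASCII text up to a newline."""
    end = data.index(b"\n", pos)
    return data[pos:end].decode("ascii"), end + 1


def parse_binary_count(data, pos):
    """Format B (binary): a 2-byte big-endian unsigned integer."""
    (value,) = struct.unpack_from(">H", data, pos)
    return value, pos + 2


def concat(parse_a, parse_b):
    """Describe A followed directly by B, given descriptions of A and of B."""
    def parse_ab(data, pos):
        a, pos = parse_a(data, pos)
        b, pos = parse_b(data, pos)  # B starts exactly where A ended
        return (a, b), pos
    return parse_ab


# A text line juxtaposed with a binary integer, treated as one record.
parse_record = concat(parse_text_line, parse_binary_count)
data = b"GET /index\n" + struct.pack(">H", 513)
value, end = parse_record(data, 0)
```

Because concat returns something with the same shape as its inputs, the result can itself be composed again, which is the "more layers" point above.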

The A+B composition property prevents DFDL from dividing the world into text formats, with properties exclusive to text, and binary formats, with properties exclusive to binary. The world of data formats is much too messy for such clean distinctions.