Posted to users@daffodil.apache.org by Roger L Costello <co...@mitre.org> on 2020/09/15 10:40:46 UTC

Have you ever used more than one dfdl:representation in a DFDL schema?

Hi Folks,

A file contains a long series of text data and at the end is binary data. The binary data is not encoded as base64 text or anything like that. It is raw, unfiltered, unencoded binary data.

Is it a text file or a binary file?

Should the DFDL schema specify representation="text" or representation="binary"? 

Or, should the DFDL schema specify representation="text" for the text part and then switch to representation="binary" for the binary part?

All this time I have been thinking that a DFDL schema will have one dfdl:representation. But perhaps I am wrong. Have you ever used more than one dfdl:representation in a DFDL schema? If yes, then are the files it specifies some kind of a hybrid between text and binary?

/Roger

Re: Have you ever used more than one dfdl:representation in a DFDL schema?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
I have seen commercial data sets, literally data you buy and pay big bucks for from financial data companies, that looked exactly like a bunch of Perl log output (all text, lots of semicolon separators) concatenated with COBOL-oriented binary mainframe data (packed decimal, EBCDIC characters) in each record of the data set.

Hence each record has one part with dfdl:representation="text" and another part with dfdl:representation="binary". The records did not even use the same character set encoding throughout.

This is one of the reasons that DFDL has a composition principle that "if you can describe A, and you can describe B, you can describe A concatenated to B."
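
A minimal sketch of what that composition can look like inside a single schema, assuming the remaining format properties come from a default dfdl:format defined elsewhere. The element names Record, id, and amount are hypothetical and not taken from any real data set:

  <xs:element name="Record">
    <xs:complexType>
      <xs:sequence>
        <!-- text part: an integer written as characters, ended by ';' -->
        <xs:element name="id" type="xs:int"
                    dfdl:representation="text"
                    dfdl:lengthKind="delimited"
                    dfdl:terminator=";" />
        <!-- binary part: the same simple type, stored as raw bytes -->
        <xs:element name="amount" type="xs:int"
                    dfdl:representation="binary"
                    dfdl:lengthKind="implicit" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Because dfdl:representation is set per element here, the two parts of the record can sit side by side in one sequence.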




Re: Have you ever used more than one dfdl:representation in a DFDL schema?

Posted by Steve Lawrence <sl...@apache.org>.
I don't think I'd say it's text or binary; I'd consider it as having
mixed representation. Though I'll admit I don't think I've actually seen
such a format. It's almost always one or the other.

However, if you do run into this case, an alternative to choosing one of
those as the default and then switching at some point might be to create
separate schemas for the text part and the binary part, and then
including those in a main schema, for example:

file_format.dfdl.xsd:

  <xs:include schemaLocation="file_format_text.dfdl.xsd" />
  <xs:include schemaLocation="file_format_bin.dfdl.xsd" />

  <xs:element name="FileFormat">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="TextPart" />
        <xs:element ref="BinaryPart" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

This way file_format_text.dfdl.xsd and file_format_bin.dfdl.xsd can each
have a different default representation property, which scopes over that
schema file so you never have to specify it again.
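
As a rough sketch of that idea, each included file might carry its own
default format annotation along these lines. Only the representation
property is shown; the other format properties a real schema needs, and
the contents of the two elements, are left out and would have to be
filled in:

file_format_text.dfdl.xsd:

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- default for every element declared in this file -->
      <dfdl:format representation="text" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="TextPart">
    <!-- text fields go here -->
  </xs:element>

file_format_bin.dfdl.xsd:

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- default for every element declared in this file -->
      <dfdl:format representation="binary" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="BinaryPart">
    <!-- binary fields go here -->
  </xs:element>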

- Steve
