Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/02/19 17:16:09 UTC

With the tremendous agility that DFDL provides, what is the role of XML? What is the role of binary?

Hello DFDL community,

DFDL gives us tremendous agility - we can quickly and easily transform binary to XML and XML to binary.

Binary, with its conciseness, is beautiful for moving data.

XML, with its vast tool suite, is beautiful for processing data.

What do you see as the role of XML? The role of binary? 

Use binary when moving data, use XML when processing data?

Most images (JPEG, GIF, PNG, etc.) are binary and are processed in their binary form. So XML isn't necessarily the ideal form for processing data.

I am eager to hear your thoughts/opinions/comments on this subject.

/Roger

Re: With the tremendous agility that DFDL provides, what is the role of XML? What is the role of binary?

Posted by "Beckerle, Mike" <mb...@tresys.com>.
Interesting question.


So if you have legacy/pre-existing data formats, then the use case for DFDL is clear.

So I read your question as really asking: what are the use cases for DFDL in "new" applications?

I think new applications that are inventing file formats may end up using DFDL if the application authors are too lazy to use, say, XML as the file format. If they just do whatever is easiest to write out from their favorite programming language, then they're going to get an ad-hoc file format, and in the future if some *other* software wants to read that file, then DFDL is a tool of choice.

But it is preferable if new applications that invent file formats do so purposefully and use a standard text-oriented representation like XML. (Could be JSON too, but lack of a schema language for JSON makes it far less desirable IMHO.)

The exception is when speed/space concerns make the overhead of XML too high.

There is an environmental argument against using XML. Consider all the wasted CPU cycles in the world dealing with XML's verbose and redundant structure. Given that computers use lots of energy, the "Carbon Footprint" of XML on a global scale is something to think about. Makes me wish EXI would catch on more. I also wish XML would just allow a non-verbose close tag like <foo>value</>, where the end tag doesn't have to repeat the open tag. This would reduce XML's overhead to something much closer to JSON or Lisp S-expressions. But I digress.
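To make the close-tag point concrete, here is a rough sketch in Python that serializes one small record three ways and compares the character counts. The record, its field names, and the helper functions are made up for illustration, and the `</>` form is the hypothetical syntax wished for above, not valid XML, so it is built by plain string formatting. It only measures size, not CPU cycles, and the values are chosen so that no escaping is needed.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; field names and values are illustrative only.
record = {"id": "42", "name": "widget", "price": "9.99"}


def to_xml(rec):
    """Serialize the record as ordinary XML with full close tags."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_short_close(rec):
    """Serialize with the hypothetical '</>' close tag (NOT valid XML)."""
    fields = "".join(f"<{key}>{value}</>" for key, value in rec.items())
    return f"<record>{fields}</>"


xml_text = to_xml(record)
short_text = to_short_close(record)
json_text = json.dumps(record, separators=(",", ":"))

for label, text in [("XML", xml_text), ("short close", short_text), ("JSON", json_text)]:
    print(f"{label:12} {len(text):3} chars  {text}")
```

For this particular record the short-close form lands between full XML and compact JSON in size. How much is saved in general depends on how long the element names are relative to the values.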

But ignoring all that, there are cases where use of an expensive data format like XML just won't allow you to achieve the goal of your software. The two cases I know of where something like XML is unacceptable and one might prefer a dense binary data format are cutting-edge supercomputing applications, where every bit counts in space/speed if the application is going to work at all, and ultra-low-power computing, where every bit counts because just compressing and decompressing data consumes too much battery power.

But even then, a standard binary format like EXI (binary XML - same infoset as XML, just denser binary representation) may be preferable to an ad-hoc file format with a DFDL schema.

Lastly, another use case I've found for DFDL is what I call "CSV-like" data files.

These arise when human beings will be editing data files by hand. I have a lot of experience with "CSV" data files that aren't at all well behaved as true CSV data files are supposed to be. Given a spreadsheet program like MS-Excel, people will create a spreadsheet document with all sorts of headers and sections on a sheet. Then they'll export that sheet as "CSV" and claim the file is CSV data.

These sorts of "CSV-like" files are often full of inconsistencies. Empty cells are sometimes empty strings, sometimes all-whitespace strings, and sometimes various markers like "--" or "N/A" or "none".

A DFDL schema can be written which handles all these human inconsistency factors, skipping section headers, standardizing "--", "N/A", etc. The result is a well-behaved XML data set from an inconsistent human-edited CSV-like data file.
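As an illustration of the kind of cleanup meant here, the following is a minimal Python sketch, not a DFDL schema, that applies the same two rules to a made-up sample: skip section headers and map the various "empty" markers to a single missing value. The marker set, the sample data, and the rule that a section header is a row with at most one non-empty cell are all assumptions for the example; real files would need their own rules.

```python
import csv
import io

# Assumed set of markers that people use for "no value" (compared lowercased).
MISSING_MARKERS = {"", "--", "n/a", "none"}


def normalize_cell(cell):
    """Map empty, all-whitespace, and marker cells to None; trim the rest."""
    text = cell.strip()
    return None if text.lower() in MISSING_MARKERS else text


def is_section_header(row):
    """Assumption: a section header (or blank row) has at most one non-empty cell."""
    return sum(1 for cell in row if cell.strip()) <= 1


def clean_rows(text):
    """Return the data rows of a CSV-like text with headers skipped and cells normalized."""
    cleaned = []
    for row in csv.reader(io.StringIO(text)):
        if is_section_header(row):
            continue
        cleaned.append([normalize_cell(cell) for cell in row])
    return cleaned


# Made-up sample of a human-edited "CSV" export.
SAMPLE = """Quarterly Report,,
name,qty,note
widget,3,--
gadget,   ,N/A
,,
Section B,,
gizmo,7,none
"""

for row in clean_rows(SAMPLE):
    print(row)
```

Note that the header rule here would also drop a genuine data row that happens to have only one filled-in cell, which is exactly the sort of ambiguity that makes these files awkward to handle.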

-mike beckerle
Tresys Technology


