Posted to dev@daffodil.apache.org by Mike Beckerle <mb...@tresys.com> on 2018/03/06 17:02:54 UTC
Re: Please review & discuss - draft proposal for how to do base64, foldedLines, etc.

Received some excellent feedback on the proposal as presented on the Wiki from Steve Hanson of IBM.


The feedback was mostly very supportive of the proposal. He suggested one change: that we avoid the term "streaming" and stick with "layering" throughout the terminology, because "streaming" already has strong connotations.


Throughout the long history of DFDL, the term "layering" has always been used for concepts like these, where a transformation must be applied to the data before parsing (or after unparsing).


We do use the term "data stream" or just "stream" as a direction-independent way of referring to the data being parsed (input stream) or unparsed (output stream), but "streaming" connotes processing in a manner consistent with an unbounded stream, using a small/finite memory footprint.


While layering can be done in a streaming manner or not, the point of layering is different, as it is about the algorithmic transformations, not the memory footprint nor length-boundedness of the data stream.
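
As a rough illustration of what "layering" means here (a transformation applied before parsing and reversed after unparsing), the following Python sketch may help. It is not Daffodil code: the inner format (a 1-byte length prefix followed by ASCII text) and all function names are assumptions made up for this example, and base64 simply stands in for any layer.

```python
import base64


def parse_payload(data: bytes) -> dict:
    # Hypothetical inner format: 1-byte length prefix, then ASCII text.
    length = data[0]
    return {"text": data[1:1 + length].decode("ascii")}


def unparse_payload(infoset: dict) -> bytes:
    # Inverse of parse_payload for the same hypothetical format.
    text = infoset["text"].encode("ascii")
    return bytes([len(text)]) + text


def parse_layered(raw: bytes) -> dict:
    # The layer is removed (base64-decoded) before the inner parse runs.
    return parse_payload(base64.b64decode(raw))


def unparse_layered(infoset: dict) -> bytes:
    # The layer is applied (base64-encoded) after the inner unparse runs.
    return base64.b64encode(unparse_payload(infoset))


wire = unparse_layered({"text": "hello"})
print(wire)
print(parse_layered(wire))
```

Note that nothing in this sketch is about memory footprint or unbounded input; it only shows the algorithmic transformation, which is the distinction drawn above.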


So I've updated the proposal to use the terms "layer", "layered", and "layering", and to use the term "stream" minimally.


The updated proposal now lives on the Apache Daffodil Wiki here:


https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layer+Annotations+for+Base64+and+other+Layered+Transformations


-Mike Beckerle

Tresys


________________________________
From: Mike Beckerle
Sent: Friday, January 5, 2018 6:28:23 PM
To: dev@daffodil.apache.org
Subject: Re: Please review & discuss - draft proposal for how to do base64, foldedLines, etc.


Updated proposal attached.

________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, January 4, 2018 9:28:51 AM
To: Mike Beckerle
Subject: Re: Please review & discuss - draft proposal for how to do base64, foldedLines, etc.

<-- snip -->

> 2) What about options for a transform? For example, you might want to
> specify a gzip stream to do something like --best or --fast to favor
> compression size vs speed. Or what variation of base64 should be used.
> Might also be used to describe how errors should be handled specific to a
> transform. For example, base64 can ignore garbage characters when
> decoding, but that might want to be a processing error in some cases.
>
> I guess this could be a single option with space separated key/value
> pairs, e.g.
>
>    daf:streamTransformOptions="base64_ignore_garbage=yes base64_variant=rfc1421"
>
> That's very extensible, but might not be consistent with the rest of
> DFDL. Maybe we need specific options for each stream transform, e.g.
>
>    daf:streamTransformBase64IgnoreGarbage="yes"
>    daf:streamTransformBase64Variant="rfc1421"
>    ..
>
> MikeB: My suggestion would be to make these parameters part of the algorithm
> name for now. E.g., daf:streamTransform="base64Best" or
> daf:streamTransform="base64_ignore_garbage".
>
> We're going to need a way to specify many of these stream transforms.
> Specifying gzip with options and naming it something new better not be very
> hard. So perhaps that is good enough for now.
>
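
For reference, the single-attribute form with space-separated key/value pairs quoted above could be split apart roughly as follows. This is only a sketch in Python, not anything Daffodil implements; the function name is hypothetical, and the option keys are taken verbatim from the example in the quoted text.

```python
def parse_transform_options(value: str) -> dict:
    """Split a whitespace-separated list of key=value pairs into a dict."""
    options = {}
    # str.split() with no argument splits on any run of whitespace,
    # so a value that was wrapped across lines is handled too.
    for pair in value.split():
        key, sep, val = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed option: {pair!r}")
        options[key] = val
    return options


opts = parse_transform_options(
    "base64_ignore_garbage=yes base64_variant=rfc1421"
)
print(opts)
```

One limitation of this form is that option values cannot themselves contain whitespace unless some quoting rule is added.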

My only (minor) concern with this is that if something had multiple
options, the combinations of names could expand quickly. But probably
not worth worrying about until that actually happens--it may not be an
issue in practice.
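
To put a number on that concern, here is a small Python sketch that enumerates the names needed if every option combination gets its own transform name. The option set is purely illustrative: only "rfc1421" and the ignore-garbage idea come from the discussion above; "rfc2045", the line_length option, and the naming scheme are assumptions for the example.

```python
from itertools import product

# Hypothetical options for a base64 transform; values are illustrative only.
options = {
    "variant": ["rfc1421", "rfc2045"],
    "ignore_garbage": ["yes", "no"],
    "line_length": ["64", "76"],
}

# One distinct transform name per combination of option values.
names = ["base64_" + "_".join(combo) for combo in product(*options.values())]

print(len(names))  # 2 * 2 * 2 = 8
for name in names:
    print(name)
```

The count is the product of the number of values per option, so each additional two-valued option doubles the number of names.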

Everything else above sounds good.