Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/11/09 13:05:19 UTC

Is it okay to officially publish a DFDL schema that produces warnings on valid input data?

Hi Folks,

Suppose you are creating the official, standard DFDL schema for a data format. Would you be okay with officially releasing a schema that generates warnings on data that is valid?

Here's an example. The RFC for CSV (RFC 4180) says that CSV files consist of records separated by newlines. Each record consists of fields separated by commas. The last record may or may not end with a newline.

Suppose the last record of a CSV file ends with a newline. My DFDL schema generates this warning:

[warning] Left over data. Consumed 1680 bit(s) with at least 16 bit(s) remaining.

I am thinking that this warning is okay. Why? Because when the last record ends with a newline, the file really does have left over data - the newline on the last record. So, a warning is not unreasonable.

Well, that's what I think. I might be thinking wrongly. What do you think? Would you ever officially release a DFDL schema that generates warnings on valid input data?

/Roger

Re: Is it okay to officially publish a DFDL schema that produces warnings on valid input data?

Posted by "Beckerle, Mike" <mb...@tresys.com>.
I would avoid this.

One thing you need to take a position on is whether, on unparsing, you generate this final newline, or omit it, or try to preserve whatever the file had originally.

Choosing to always generate it, or to always omit it, is canonicalization.

I suggest adding this:

<choice>
  <sequence dfdl:initiator="%NL;" />
  <sequence />
</choice>

Add it at the end of the schema, after the repeating row element.

This will absorb and discard any final newline.

If you want to preserve the final newline, then you have to model it as data: change the first branch of the choice above into an element named 'finalNewLine' that carries the initiator and has type string with explicit length 0.
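
A minimal sketch of that variant might look like the following. It is written with the conventional xs: prefix for XML Schema, and it assumes that "explicit length 0" is expressed as dfdl:lengthKind="explicit" with dfdl:length="0". It also assumes that the remaining properties (encoding, length units, and the newline written on unparse) come from the schema's default format.

<xs:choice>
  <!-- Branch 1: a final newline is present. The element itself has
       zero-length content; only its initiator consumes the newline. -->
  <xs:element name="finalNewLine" type="xs:string"
              dfdl:initiator="%NL;"
              dfdl:lengthKind="explicit" dfdl:length="0" />
  <!-- Branch 2: no final newline. Nothing is consumed and nothing
       is added to the infoset. -->
  <xs:sequence />
</xs:choice>

With a model like this, an empty finalNewLine element should appear in the infoset only when the file ended with a newline, so unparsing can write the newline back when the element is present and omit it otherwise.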

