Posted to users@daffodil.apache.org by Roger L Costello <co...@mitre.org> on 2021/05/19 15:13:24 UTC

Is generateEscapeBlock="always" working correctly?

Hi Folks,

Suppose I have comma-separated records and the second field of each record may have an escape block (quote is the begin and end escape character). Here's a sample instance:

a,b,c
d,"e,f,g",h

Clearly the second field of the first record does not need the escape block.

Parsing produces:

<csv>
    <record>
        <field>a</field>
        <field>b</field>
        <field>c</field>
    </record>
    <record>
        <field>d</field>
        <field>e,f,g</field>
        <field>h</field>
    </record>
</csv>

If the DFDL schema specifies generateEscapeBlock="always", then as I understand it, unparsing will always wrap the second field in the escape block, even when the field's content doesn't require it. Do I understand correctly? If so, the output of unparsing should be:

a,"b",c
d,"e,f,g",h

Notice that the second field of the first record is now escaped even though it's not needed.

If my understanding is correct, then Daffodil v3.0 is not behaving correctly since it produces:

a,b,c
d,"e,f,g",h

Thoughts?

/Roger

Re: Is generateEscapeBlock="always" working correctly?

Posted by Steve Lawrence <sl...@apache.org>.
Your understanding is correct, and it should unparse with the second
field always surrounded in quotes, e.g.:

  a,"b",c
  d,"e,f,g",h

I've created a test schema/data and I do get this correct behavior on
Daffodil 3.0.0 and the recently released 3.1.0. There might be something
more subtle going on if this isn't working. Can you send your schema so
we can take a look?
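
For reference while comparing schemas, here is a minimal sketch of an
escape scheme of the kind discussed in this thread. It is not the schema
from either message: the scheme name "quoteBlock" and the values chosen
for escapeEscapeCharacter and extraEscapedCharacters are assumptions made
for illustration only.

```xml
<!-- Hypothetical escape scheme; the name "quoteBlock" is an assumption. -->
<dfdl:defineEscapeScheme name="quoteBlock">
  <dfdl:escapeScheme
      escapeKind="escapeBlock"
      escapeBlockStart="&quot;"
      escapeBlockEnd="&quot;"
      escapeEscapeCharacter="&quot;"
      extraEscapedCharacters=""
      generateEscapeBlock="always"/>
  <!-- With generateEscapeBlock="whenNeeded", a value such as b would be
       expected to unparse without the surrounding quotes. -->
</dfdl:defineEscapeScheme>
```

A scheme defined this way only takes effect on elements whose
dfdl:escapeSchemeRef points at it (with whatever namespace prefix the
schema uses), so that reference is worth checking on the second field.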

