Posted to users@daffodil.apache.org by Attila Horvath <at...@gmail.com> on 2022/03/08 12:20:55 UTC

help processing missing optional data

ALCON

I'm having difficulty parsing an input ASCII file containing optional
records as follows:







Required line 1... fixed length <CR><LF>
Required line 2... fixed length <CR><LF>
Optional line 3... fixed length <CR><LF>
Optional comment... 1 of n <CR><LF>
Optional comment... 2 of n <CR><LF>
Optional comment... n of n <CR><LF>
Required line 4... fixed length <CR><LF>
:::

I've tried various <xs:choice ... /> and lookahead approaches
unsuccessfully and admittedly getting frustrated. <8(

For example, I'm able to parse 'Optional line 3' *when it exists* BUT I
can't construct DFDL script to process the file successfully when line 3 is
not present.

Can someone pls point me to an example/link illustrating best way to
process this if/when optional lines are NOT present.

Thx in advance
Attila

Re: help processing missing optional data

Posted by Steve Lawrence <sl...@apache.org>.
There are a few possible approaches, depending on your data and on how
you know whether a line is one of your required lines or an optional line.

In your data, how do you know whether an optional line is missing?
If you could post an example of actual data, we could probably suggest
a more specific approach.

That said, a general common approach is something like this:

   <xs:element name="RequiredThing" ... />
   <xs:element name="OptionalThing" minOccurs="0" ...>
     <xs:annotation>
       <xs:appinfo source="http://www.ogf.org/dfdl/">
         <dfdl:discriminator test="{ ... }"/>
       </xs:appinfo>
     </xs:annotation>
   </xs:element>

This approach sets minOccurs of the optional thing to zero so that it
may or may not exist. We also add a discriminator to test whether this is
actually the optional thing. With this approach, Daffodil will
speculatively try to parse the optional thing. If the parse fails, or
the discriminator evaluates to false (meaning this isn't actually the
optional thing), Daffodil will backtrack and try to parse the next
thing. The actual test depends on how you determine whether a line is an
optional line.
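
As a minimal sketch of this approach applied to line-oriented data like
that in the original question: the fragment below assumes, purely for
illustration, that the optional line can be recognized by a leading
'OPT' marker and comment lines by a leading 'CMT' marker. The element
names and both markers are hypothetical, the lines are treated as
CRLF-terminated rather than fixed length to keep the sketch short, and
properties such as encoding are assumed to come from a default
dfdl:format defined elsewhere in the schema.

```xml
<!-- Fragment of a DFDL schema, not a complete schema on its own. -->
<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
             xmlns:fn="http://www.w3.org/2005/xpath-functions">

  <!-- Required line: always present. -->
  <xs:element name="Line2" type="xs:string"
    dfdl:lengthKind="delimited" dfdl:terminator="%CR;%LF;"/>

  <!-- Optional line: parsed speculatively, kept only if the
       discriminator is true. 'OPT' is a hypothetical marker. -->
  <xs:element name="Line3" type="xs:string" minOccurs="0" maxOccurs="1"
    dfdl:occursCountKind="implicit"
    dfdl:lengthKind="delimited" dfdl:terminator="%CR;%LF;">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:starts-with(., 'OPT') }"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <!-- Zero or more comment lines: each occurrence is tested the same
       way, and the array ends at the first line that fails the test.
       'CMT' is a hypothetical marker. -->
  <xs:element name="Comment" type="xs:string"
    minOccurs="0" maxOccurs="unbounded"
    dfdl:occursCountKind="implicit"
    dfdl:lengthKind="delimited" dfdl:terminator="%CR;%LF;">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ fn:starts-with(., 'CMT') }"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <!-- Required line: receives whatever line failed the tests above. -->
  <xs:element name="Line4" type="xs:string"
    dfdl:lengthKind="delimited" dfdl:terminator="%CR;%LF;"/>
</xs:sequence>
```

If the real data has no such markers, the discriminator test would need
to be replaced with whatever condition actually distinguishes the lines.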

In the snippet you just posted, length kind pattern is slightly
unintuitive: a non-match is not an error, it is a successful parse of
zero length. Usually users want a non-match to be an error, so you would
need to do something like this to get that behavior:

   <xs:element name="test" type="xs:string" minOccurs="0"
     dfdl:lengthKind="pattern"
     dfdl:lengthPattern="...">
     <xs:annotation>
       <xs:appinfo source="http://www.ogf.org/dfdl/">
         <dfdl:discriminator test="{ . != '' }"/>
       </xs:appinfo>
     </xs:annotation>
   </xs:element>

As above, the test field is optional because minOccurs=0, and we have a
discriminator to determine whether we actually parsed the test field or
whether it was something else. In this case, the discriminator checks
that the value of the test element is not the empty string (an empty
string means the pattern didn't match anything). If it is the empty
string, Daffodil knows the test element doesn't actually exist, so it
backtracks and tries parsing the same data with whatever is next in the
schema.
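
To make the backtracking concrete, here is a rough sketch that plugs the
lengthPattern from the posted snippet into that element and follows it
with a required element. The name NextRequired and its CRLF terminator
are hypothetical, and default format properties are assumed to be
defined elsewhere. Whether this particular regular expression is the
right one depends on the real data.

```xml
<!-- Fragment of a DFDL schema, not a complete schema on its own. -->
<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- Optional: a zero-length (non-matching) result fails the
       discriminator, so the element is treated as absent. -->
  <xs:element name="test" type="xs:string" minOccurs="0"
    dfdl:occursCountKind="implicit"
    dfdl:lengthKind="pattern"
    dfdl:lengthPattern=".*?(?=(THIS IS A CONTINUATION MESSAGE, PART))">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:discriminator test="{ . != '' }"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <!-- Hypothetical next element: after a backtrack it is parsed
       starting from the same position that "test" was tried at. -->
  <xs:element name="NextRequired" type="xs:string"
    dfdl:lengthKind="delimited" dfdl:terminator="%CR;%LF;"/>
</xs:sequence>
```

With this arrangement no element named test appears in the infoset when
the pattern matches nothing, so anything that later refers to ./test
would need to allow for it being absent.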

- Steve

On 3/8/22 7:37 AM, Attila Horvath wrote:
> PS:...
> 
> Case in point:...
> 
> <xs:element name="test" type="xs:string" dfdl:lengthKind="pattern"
> dfdl:lengthPattern=".*?(?=(THIS IS A CONTINUATION MESSAGE, PART))"/>
> <xs:choice>
> <xs:choice dfdl:choiceDispatchKey="{./test}">
> 
> When 'Optional line 3' is NOT there, I get an error on 2nd '<xs:choice.../>'
> because it resolves to an empty string.
> 
> <8(
> 
> 
> 


Re: help processing missing optional data

Posted by Attila Horvath <at...@gmail.com>.
PS:...

Case in point:...

<xs:element name="test" type="xs:string" dfdl:lengthKind="pattern"
dfdl:lengthPattern=".*?(?=(THIS IS A CONTINUATION MESSAGE, PART))"/>
<xs:choice>
<xs:choice dfdl:choiceDispatchKey="{./test}">

When 'Optional line 3' is NOT there, I get an error on 2nd '<xs:choice.../>'
because it resolves to an empty string.

<8(



