Posted to users@daffodil.apache.org by Roger L Costello <co...@mitre.org> on 2021/12/15 13:30:51 UTC

Have you seen a data format like this? Pair of nested lists with no punctuation after each outer list.

Hi Folks,

Have you seen a data format like this: there is a pair of nested lists -- an outerList whose items are each an innerList. Both lists can repeat an arbitrary number of times. There is no punctuation (separator) after each outerList item, but there is punctuation at the end of the whole input.

For example, suppose the data format consists of input data that is a series of "A" characters and at the end is a "Z" character. Here are sample inputs:

Z
AZ
AAZ
AAAZ
AAAAZ
...

Here is a grammar for this:
------------------------------------------------------------
start: outerList 'Z'

outerList:   /* empty */
    |             outerList outerListItem

outerListItem: innerList

innerList:  /* empty */
   |             innerList innerListItem

innerListItem:  'A'
------------------------------------------------------------
So, this input:

	AAAZ

could be due to one outerListItem and three innerListItems, or two outerListItems and (0,3), (1,2), (2,1), or (3,0) innerListItems, or ...
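
To make the counting concrete, here is a minimal sketch (not part of the original post) that enumerates the ways a run of "A" characters can be distributed over a fixed number of outerListItems. The function name splits and its parameters are hypothetical, chosen only for illustration. Because an innerList may be empty, the number of outerListItems is not bounded by the input, so the grammar admits unboundedly many parses of the same string.

```python
from itertools import product


def splits(total, groups):
    """Return every ordered way to spread `total` 'A's over `groups` outer items.

    Each tuple holds the innerList length for each outerListItem, in order.
    Zero is allowed because the grammar permits an empty innerList.
    """
    return [
        counts
        for counts in product(range(total + 1), repeat=groups)
        if sum(counts) == total
    ]


if __name__ == "__main__":
    print(splits(3, 1))       # [(3,)]
    print(splits(3, 2))       # [(0, 3), (1, 2), (2, 1), (3, 0)]
    print(len(splits(3, 3)))  # 10
```

For AAAZ, two outerListItems give exactly the four splits listed above, and every additional outerListItem adds more.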

It is my understanding that this is rare but does exist. A book that I am reading says:

	In practice, it's pretty rare to have a pair of nested
	lists with no punctuation. It's confusing to parsers,
	and it's confusing to humans, too.

Have you seen a data format like this?

/Roger

Re: Have you seen a data format like this? Pair of nested lists with no punctuation after each outer list.

Posted by Mike Beckerle <mb...@apache.org>.
Can't say I have seen this.

Of the various parses that are possible in this seemingly ambiguous
situation, which one is correct? Or is any one of them correct?

If you create a DFDL schema for this, you'll always get one outer and
one inner sequence, where the inner sequence has the maximum number of
elements in it.
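
As a rough illustration of that greedy outcome, here is a minimal Python sketch. It is not Daffodil code and not a DFDL schema. The function greedy_parse is hypothetical, and the rule that the outer loop stops once an item consumes no input is an assumption made for this sketch, not a statement about how any DFDL processor is implemented.

```python
def greedy_parse(data):
    """Parse a run of 'A's followed by a single 'Z', taking the longest inner list first.

    Returns a list of outer items, each a list of 'A' characters.
    Assumption for this sketch: an outer item that consumes no input
    is dropped and ends the outer list.
    """
    pos = 0
    outer = []
    while True:
        start = pos
        inner = []
        # The inner list takes as many 'A's as it can.
        while pos < len(data) and data[pos] == "A":
            inner.append("A")
            pos += 1
        if pos == start:
            break  # no input consumed, so stop repeating the outer list
        outer.append(inner)
    # Whatever remains must be exactly the terminating 'Z'.
    if data[pos:] != "Z":
        raise ValueError("expected a single 'Z' at the end of the input")
    return outer


if __name__ == "__main__":
    print(greedy_parse("AAAZ"))  # [['A', 'A', 'A']]
```

With this rule the first outer item swallows every "A", so the other splits of the input are never produced.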
