Posted to commits@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2022/08/18 21:09:00 UTC

[jira] [Commented] (DAFFODIL-2692) Add lengthKind 'valuePattern' which uses regex to match allowed data values

    [ https://issues.apache.org/jira/browse/DAFFODIL-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581540#comment-17581540 ] 

Mike Beckerle commented on DAFFODIL-2692:
-----------------------------------------

Recent email discussion suggests this idea should evolve beyond just a value pattern: the pattern should be expressed by way of the XSD facets on an element.

See: [https://lists.apache.org/thread/3t3j72k9z6dcpyb4pqsh5mh5lg53mgt9]

The name 'valuePattern' may not be well-chosen given this different direction.

The crux of all this is the realization that the value patterns must be regexes with longest-match semantics for alternatives. XSD pattern facet matching has this behavior; the dfdl:lengthKind 'pattern' regex engine does NOT. Instead, it tries alternatives in left-to-right order.

(Also: Apache Xerces has an XSD validator whose regex matcher for pattern facets implements this longest-match behavior, so there is at least one open-source code base, under an acceptable license, that could supply this regex engine.)
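
To make the difference concrete, here is a minimal sketch in Python. It is only an illustration: Python's re module stands in for a left-to-right (first alternative wins) engine, and the longest-match behavior is emulated by brute force, looking for the longest prefix that the whole pattern can match. The function names are hypothetical, and this is not how Xerces or Daffodil implement matching.

```python
import re


def leftmost_first_length(pattern, data):
    """Length of the match when alternatives are tried left to right.

    Python's re engine, used here as a stand-in, takes the first
    alternative that succeeds.
    """
    m = re.match(pattern, data)
    return len(m.group(0)) if m else None


def longest_match_length(pattern, data):
    """Length of the longest prefix of data that the whole pattern matches.

    Brute-force emulation of longest-match semantics: try prefixes from
    longest to shortest and require each one to match in full.
    """
    compiled = re.compile(pattern)
    for end in range(len(data), -1, -1):
        if compiled.fullmatch(data[:end]):
            return end
    return None


# With the alternatives written as "a|ab", the two semantics disagree
# on the data "abc".
print(leftmost_first_length("a|ab", "abc"))  # 1: "a" is tried first and wins
print(longest_match_length("a|ab", "abc"))   # 2: "ab" is the longest match
```

Under longest-match semantics the order in which alternatives are written does not change the result, which is what one would expect from a pattern that declares the set of allowed values.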

> Add lengthKind 'valuePattern' which uses regex to match allowed data values
> ---------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2692
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2692
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: Back End, Front End
>    Affects Versions: 3.3.0
>            Reporter: Mike Beckerle
>            Priority: Major
>
> Existing dfdl:lengthKind 'pattern' uses the pattern to determine the length. No match means length 0.
> People want to use regular expression (or regex) matches differently from this. They want to specify the allowed data patterns, with no match meaning parse error. 
> This should be added as a dfdlx experimental feature to develop experience with it. 
> A few design issues: we need to decide if this pattern includes nil values in its syntax, or if those get added as allowed value patterns automatically. It is simpler if we define this to require that the regex pattern specify all possible data patterns that are accepted, whether they become nilled elements, or elements with values. That, however, requires one to redundantly express the dfdl:nilValue information.
> There may also be an interaction with properties like dfdl:emptyValueDelimiterPolicy and the empty representation. I.e., does the pattern have to allow for a zero-length successful match in order for the data to be zero-length?
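
The contrast drawn in the quoted description (no match meaning length 0 versus no match meaning parse error) can be sketched as follows. This is a hedged illustration only: the function names and the ParseError class are hypothetical, Python's re module stands in for the regex engine, and nothing here reflects Daffodil's actual API.

```python
import re


class ParseError(Exception):
    """Hypothetical error type standing in for a parse error."""


def length_kind_pattern(pattern, data):
    """Existing behavior as described: the match determines the length,
    and no match means length 0."""
    m = re.match(pattern, data)
    return len(m.group(0)) if m else 0


def value_pattern(pattern, data):
    """Proposed behavior as described: the pattern states the allowed
    data, and no match is a parse error."""
    m = re.match(pattern, data)
    if m is None:
        raise ParseError(f"data does not match pattern {pattern!r}")
    return m.group(0)


# No match: length 0 under the existing rule, an error under the proposal.
print(length_kind_pattern(r"[0-9]+", "abc"))  # 0

# A pattern that allows a zero-length match succeeds with empty data,
# which is distinct from failing to match at all.
print(repr(value_pattern(r"[0-9]*", "abc")))  # ''
```

The last call shows why the zero-length question matters: under the proposed semantics, a successful match of length zero and a failed match are different outcomes, whereas the existing rule reports length 0 for both.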



--
This message was sent by Atlassian Jira
(v8.20.10#820010)