Posted to commits@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2021/03/04 02:01:00 UTC

[jira] [Commented] (DAFFODIL-2474) how to deal with Control chars and newlines in pattern.

    [ https://issues.apache.org/jira/browse/DAFFODIL-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294932#comment-17294932 ] 

Mike Beckerle commented on DAFFODIL-2474:
-----------------------------------------

Pull request: https://github.com/apache/daffodil/pull/495

> how to deal with Control chars and newlines in pattern.
> -------------------------------------------------------
>
>                 Key: DAFFODIL-2474
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2474
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Clean Ups, QA
>    Affects Versions: 3.0.0
>            Reporter: Mike Beckerle
>            Assignee: Mike Beckerle
>            Priority: Major
>
> Major because this issue was raised by a user, and it took me hours to figure it out!
> We need to clean up some code and add tests showing how to do seemingly obvious things with XSD pattern facets that are in fact quite tricky. We have gotten them wrong before in real schemas.
> E.g., use a pattern facet to restrict a string to characters whose code points are at most 0x7F (i.e., ASCII).
> This turns out to be tricky because of XML-illegal characters combined with XML attribute-value normalization.
> The correct pattern facet definition is this:
> ```
> <xs:pattern value="[&#xE000;-&#xE008;\t\n&#xE00B;&#xE00C;\r&#xE00E;-&#xE01F;&#x20;-&#x7F;]*"/>
> ```
> (This must all be on a single line.)
> Various other combinations do NOT work. For example, you can't replace the \n with "&#x0A;", because XML attribute normalization will remove it.
> Nor can you use Daffodil's "&#xE00A;": when Xerces-based full validation runs, the data will contain 0x0A, not 0xE00A, so Xerces validation will fail. You have to use \n here.

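The character class in the facet from the issue can be mirrored in Python's `re` module to see what it accepts. This is a minimal sketch, not Daffodil or Xerces code. `re` only stands in for the XSD regex engine, and the helper `remap_xml_illegal` is hypothetical. The sketch assumes, as the pattern's U+E000 ranges imply, that each XML-illegal C0 control character c appears in the infoset as U+E000 + c, while tab, LF, and CR appear as themselves. The last check illustrates the point about "&#xE00A;": a validator that sees a real 0x0A rejects data checked against the U+E00A variant.

```python
import re

# The xs:pattern character class from the issue, translated to Python escapes.
# U+E000..U+E01F (minus E009, E00A, E00D) stand in for XML-illegal C0 controls;
# tab, LF, and CR are legal in XML and are listed as themselves.
CORRECT = re.compile(
    "[\ue000-\ue008\t\n\ue00b\ue00c\r\ue00e-\ue01f\x20-\x7f]*"
)

# The variant the issue warns against: U+E00A in place of \n.
WITH_E00A = re.compile(
    "[\ue000-\ue008\t\ue00a\ue00b\ue00c\r\ue00e-\ue01f\x20-\x7f]*"
)


def remap_xml_illegal(text: str) -> str:
    """Hypothetical helper: shift XML-illegal C0 controls into U+E000..U+E01F.

    Assumes the convention implied by the pattern: code point c < 0x20 becomes
    U+E000 + c, except tab, LF, and CR, which are legal XML and left alone.
    """
    return "".join(
        chr(0xE000 + ord(ch)) if ord(ch) < 0x20 and ch not in "\t\n\r" else ch
        for ch in text
    )


def is_valid(pattern: re.Pattern, text: str) -> bool:
    # XSD patterns are implicitly anchored at both ends, hence fullmatch.
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    data = "id=7\x01\tok\r\n"          # raw data containing a 0x01 control char
    infoset = remap_xml_illegal(data)  # 0x01 becomes U+E001; \t, \r, \n stay
    print(is_valid(CORRECT, infoset))        # True
    print(is_valid(CORRECT, "caf\u00e9"))    # False: U+00E9 is above 0x7F
    print(is_valid(WITH_E00A, infoset))      # False: the real 0x0A is rejected
```

Using `fullmatch` rather than `search` matters: an XSD pattern must match the whole value, so an unanchored search would wrongly accept any string with a valid prefix.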

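XML attribute-value normalization (XML 1.0, section 3.3.3) can be observed with any conforming parser. The sketch below uses Python's standard-library `xml.etree.ElementTree`, which Daffodil does not use; it only illustrates the XML rule, and the helper `parsed_value` is hypothetical. A literal tab or line break inside a `value` attribute comes back as a space, whereas the two-character escapes `\t` and `\n` are ordinary text to the XML parser, so they survive into the pattern string that the XSD regex engine later interprets.

```python
import xml.etree.ElementTree as ET


def parsed_value(attr_text: str) -> str:
    """Hypothetical helper: parse <xs:pattern value="..."/> and return the
    attribute value as the XML parser reports it (after normalization)."""
    doc = (
        '<xs:pattern xmlns:xs="http://www.w3.org/2001/XMLSchema" '
        f'value="{attr_text}"/>'
    )
    return ET.fromstring(doc).get("value")


if __name__ == "__main__":
    # Literal tab and newline characters are each normalized to a space.
    print(repr(parsed_value("[a\tb\nc]")))    # '[a b c]'
    # Backslash escapes are plain text to the XML parser and pass through.
    print(repr(parsed_value(r"[a\tb\nc]")))   # '[a\\tb\\nc]'
```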

--
This message was sent by Atlassian Jira
(v8.3.4#803005)