Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/01/27 14:57:57 UTC

[GitHub] [incubator-daffodil] mbeckerle commented on pull request #481: ADD test for "%%" to bypass existing test for "%NL"

mbeckerle commented on pull request #481:
URL: https://github.com/apache/incubator-daffodil/pull/481#issuecomment-768342231


   Yeah, regexes are really hard without tools like myregextester and the explain-regex sites.
   
   Here's a regex that matches an even number of % (possibly zero), followed by a single %, followed by either NL or ES, followed by ";".
   That is, it matches an "unescaped NL" or "unescaped ES".
   
   ```
   (?:%%)*(%(?:NL|ES)\;)
   ```
   You can of course generalize the NL|ES to the set of DFDL entities you want it to exclude. 
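
One way to do that generalization is to build the alternation from a list of entity names. The sketch below is in Python purely for illustration (Daffodil itself is not written in Python), and `ENTITY_NAMES` and `build_unescaped_entity_regex` are hypothetical names, not anything from the pull request. It assumes the names should be matched literally, so they are passed through `re.escape`.

```python
import re

# Hypothetical entity list for illustration only; substitute the DFDL
# entities you actually want to detect. Not taken from the pull request.
ENTITY_NAMES = ["NL", "ES", "WSP", "WSP*", "WSP+"]


def build_unescaped_entity_regex(names):
    """Return a compiled regex of the form (?:%%)*(%(?:A|B|...);)."""
    # Longest names first so "WSP*" is tried before "WSP"; re.escape keeps
    # characters such as "*" and "+" literal instead of regex operators.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"(?:%%)*(%(?:" + alternatives + r");)")


entity_regex = build_unescaped_entity_regex(ENTITY_NAMES)

# "%XX;" is not in the list, so it is not reported.
found = [m.group(1) for m in entity_regex.finditer("a%WSP*;b%%%NL;c%XX;d")]
print(found)  # ['%WSP*;', '%NL;']
```

Escaping the names matters as soon as the list contains something like `WSP*`, where an unescaped `*` would change the meaning of the pattern.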
   
   Here's a test string:
   ```
   %NL;abcd%%ES;efgh%NLijkl%%%%%NL;mnop
   ```
   And here's the matchAll output as reported by myregextester.com:
   ```
   $matches Array:
   (
       [0] => Array
           (
               [0] => %NL;
               [1] => %ES;
               [2] => %%%%%NL;
           )
   
       [1] => Array
           (
               [0] => %NL;
               [1] => %ES;
               [2] => %NL;
           )
   
   )
   ```
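
For anyone without access to the site, the same result can be reproduced locally. This is a minimal Python sketch, assuming Python's `re` module treats this pattern the same way the PHP-style tester does (it does for this simple pattern); the variable names are mine. The redundant backslash before `;` is dropped, which does not change the meaning.

```python
import re

pattern = re.compile(r"(?:%%)*(%(?:NL|ES);)")
test_string = "%NL;abcd%%ES;efgh%NLijkl%%%%%NL;mnop"

matches = list(pattern.finditer(test_string))
full_matches = [m.group(0) for m in matches]  # the tester's [0] array
captured = [m.group(1) for m in matches]      # the tester's [1] array
print(full_matches)  # ['%NL;', '%ES;', '%%%%%NL;']
print(captured)      # ['%NL;', '%ES;', '%NL;']

# Variant, based on an assumption about intent rather than on the PR:
# a negative lookbehind stops a match from starting in the middle of a
# run of "%", so the "%ES;" inside "%%ES;" is no longer reported.
strict = re.compile(r"(?<!%)(?:%%)*(%(?:NL|ES);)")
strict_matches = [m.group(0) for m in strict.finditer(test_string)]
print(strict_matches)  # ['%NL;', '%%%%%NL;']
```

Note that the second match, `%ES;`, comes from `%%ES;` in the test string: the regex is unanchored, so it can start matching at the second `%`. If `%%ES;` is supposed to count as an escaped percent followed by literal text, something like the lookbehind variant may be needed.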
   And here's the explanation of this regex, per
   http://rick.measham.id.au/paste/explain.pl?regex=%28%3F%3A%25%25%29*%28%25%28%3F%3ANL%7CES%29%5C%3B%29
   ```
   NODE                     EXPLANATION
   --------------------------------------------------------------------------------
     (?:                      group, but do not capture (0 or more times
                              (matching the most amount possible)):
   --------------------------------------------------------------------------------
       %%                       '%%'
   --------------------------------------------------------------------------------
     )*                       end of grouping
   --------------------------------------------------------------------------------
     (                        group and capture to \1:
   --------------------------------------------------------------------------------
       %                        '%'
   --------------------------------------------------------------------------------
       (?:                      group, but do not capture:
   --------------------------------------------------------------------------------
         NL                       'NL'
   --------------------------------------------------------------------------------
        |                        OR
   --------------------------------------------------------------------------------
         ES                       'ES'
   --------------------------------------------------------------------------------
       )                        end of grouping
   --------------------------------------------------------------------------------
       \;                       ';'
   --------------------------------------------------------------------------------
     )                        end of \1
   ```
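
The breakdown above can also be kept next to the pattern itself by writing the regex in verbose mode. This is a sketch in Python using `re.VERBOSE`; the name `annotated` and the sample input are illustrative assumptions, and the pattern is otherwise the same as the one explained above.

```python
import re

annotated = re.compile(
    r"""
    (?: %% )*        # zero or more escaped percent signs ("%%"), not captured
    (                # start of capture group 1
      %              # the single percent that introduces the entity
      (?: NL | ES )  # entity name, not captured
      ;              # terminating semicolon
    )                # end of capture group 1
    """,
    re.VERBOSE,
)

m = annotated.search("xx%%%NL;yy")
print(m.group(0))  # %%%NL;
print(m.group(1))  # %NL;
```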

