Posted to users@daffodil.apache.org by John Dziurlaj <jo...@turnout.rocks> on 2022/10/17 19:52:59 UTC

Variable tokens in DFDL

I am attempting to create a DFDL schema for an HL7-derived language. HL7 has a feature whereby all the delimiters in a message are declared in the first few bytes of the input itself. For example, MSH|^~\& specifies the delimiters in this order: field separator |, component separator ^, repetition separator ~, escape character \, subcomponent separator &.

Can a DFDL schema be written that does not hardcode these tokens?
John Dziurłaj

Re: Variable tokens in DFDL

Posted by Steve Lawrence <sl...@apache.org>.

Yep. The key is to use DFDL variables.

You first need to define the variable using dfdl:defineVariable in the 
schema annotation, providing the name, type, and optional default value, 
e.g.:

   <schema ...>
     <annotation>
       <appinfo source="http://www.ogf.org/dfdl/">
         ...
         <dfdl:defineVariable name="FieldSep" type="xs:string"
                              defaultValue="|"/>
         ...
       </appinfo>
     </annotation>
     ...
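
For HL7 the same declaration would presumably be repeated once per 
delimiter. A minimal sketch, assuming made-up variable names (only 
FieldSep comes from the example above) and taking the defaults from the 
MSH|^~\& example in the question:

```xml
<annotation>
  <appinfo source="http://www.ogf.org/dfdl/">
    <!-- One variable per HL7 delimiter. Every name except FieldSep is
         hypothetical and chosen only for illustration. -->
    <dfdl:defineVariable name="FieldSep" type="xs:string"
                         defaultValue="|"/>
    <dfdl:defineVariable name="ComponentSep" type="xs:string"
                         defaultValue="^"/>
    <dfdl:defineVariable name="RepetitionSep" type="xs:string"
                         defaultValue="~"/>
    <dfdl:defineVariable name="EscapeChar" type="xs:string"
                         defaultValue="\"/>
    <!-- A literal ampersand has to be escaped in an XML attribute. -->
    <dfdl:defineVariable name="SubcomponentSep" type="xs:string"
                         defaultValue="&amp;"/>
  </appinfo>
</annotation>
```

The defaults are optional. They only matter if some expression reads a 
variable before anything has set it.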

Then on the element that parses that delimiter you use dfdl:setVariable 
to set it based on the parsed value, e.g.:

   <element name="FieldSeparator" type="xs:string"
            dfdl:lengthKind="explicit" dfdl:length="1">
     <annotation>
       <appinfo source="http://www.ogf.org/dfdl/">
         <dfdl:setVariable ref="FieldSep" value="{.}" />
       </appinfo>
     </annotation>
   </element>
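
The remaining HL7 delimiters could be captured the same way, one 
single-character element each, in the order they appear after the field 
separator. This is only a sketch: the element names and the variable 
names ComponentSep and RepetitionSep are hypothetical, they assume 
matching dfdl:defineVariable declarations exist, and they assume the 
format in scope measures length in characters:

```xml
<!-- Second delimiter in the header: the component separator. -->
<element name="ComponentSeparator" type="xs:string"
         dfdl:lengthKind="explicit" dfdl:length="1">
  <annotation>
    <appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:setVariable ref="ComponentSep" value="{ . }"/>
    </appinfo>
  </annotation>
</element>

<!-- Third delimiter in the header: the repetition separator. -->
<element name="RepetitionSeparator" type="xs:string"
         dfdl:lengthKind="explicit" dfdl:length="1">
  <annotation>
    <appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:setVariable ref="RepetitionSep" value="{ . }"/>
    </appinfo>
  </annotation>
</element>

<!-- The escape character and subcomponent separator would follow the
     same pattern with their own variables. -->
```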

And finally, specify an expression that evaluates to that variable 
on the appropriate sequence, e.g.:

   <sequence dfdl:separator="{ $FieldSep }">
     ...
   </sequence>
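
One thing to watch for, as far as I understand the variable rules in the 
DFDL spec: a variable cannot be set once it has been read, so the 
elements that carry dfdl:setVariable need to be parsed before any 
sequence whose separator expression reads the variable. A rough sketch 
of that layering, assuming hypothetical element names (Field, Component), 
a hypothetical ComponentSep variable, and a format in scope that 
supplies the remaining properties (delimited lengths, occursCountKind, 
separator position, and so on):

```xml
<sequence>
  <!-- Header elements with dfdl:setVariable go here. This outer
       sequence has no variable-driven separator, so the variables
       are set before anything reads them. -->

  <!-- Body: fields split on the field separator parsed above. -->
  <sequence dfdl:separator="{ $FieldSep }">
    <element name="Field" maxOccurs="unbounded">
      <complexType>
        <!-- Each field is split again on the component separator. -->
        <sequence dfdl:separator="{ $ComponentSep }">
          <element name="Component" type="xs:string"
                   maxOccurs="unbounded"/>
        </sequence>
      </complexType>
    </element>
  </sequence>
</sequence>
```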

For a real world example, the EDIFACT schema does something very 
similar. Here is the Daffodil development branch for that schema:

   https://github.com/DFDLSchemas/EDIFACT/tree/daffodil-dev

Everything I mentioned is in the src/main/resource/EDIFACT-Common/ 
files. It is a bit more complicated, though: it defines new formats 
that use the variable, and the sequence uses dfdl:ref to refer to those 
formats instead of setting the separator directly to the variable as 
above. It also has a slightly more complicated setVariable expression 
that sets the variables conditionally. But the core idea is the same.
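
To illustrate the named-format indirection in general terms, here is a 
sketch. It is not copied from the EDIFACT schema: the format names 
FieldSeparated and BaseFormat are hypothetical, and depending on the 
schema's namespaces the references may need a prefix:

```xml
<annotation>
  <appinfo source="http://www.ogf.org/dfdl/">
    <!-- A named format that inherits everything from a hypothetical
         BaseFormat and takes its separator from the variable. -->
    <dfdl:defineFormat name="FieldSeparated">
      <dfdl:format ref="BaseFormat" separator="{ $FieldSep }"/>
    </dfdl:defineFormat>
  </appinfo>
</annotation>

<!-- Elsewhere in the schema, a sequence picks up the separator by
     referring to the named format instead of naming the variable. -->
<sequence dfdl:ref="FieldSeparated">
  <!-- child elements of the sequence go here -->
</sequence>
```

The benefit of doing it this way is that the expression lives in one 
place, so the sequences do not each repeat it.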

On 10/17/22 3:52 PM, John Dziurlaj wrote:
> I am attempting to create a DFDL schema for a HL7 derived language. HL7 
> has a feature that all the delimiters in a message are mapped using the 
> first few bytes of the input itself. For example: MSH|^~\& specifies 
> the various delimiters, in order of field separator | component 
> separator ^, repetition separator ~, escape character \, 
> subcomponent separator &.
> 
> Can an DFDL schema be produced that does not hardcode these tokens?
> 
> John Dziurłaj
>