Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/08/27 16:43:21 UTC

Does the dollar sign mean "end of file"?

Hello DFDL community,

My input is this:

Hello, World Blah
Broccoli
3ABC

I want it parsed to this:

<input>
  <A>Hello, World</A>
  <B> Blah</B>
  <C>Broccoli
3ABC</C>
</input>

That is, the first field is exactly 12 characters. The second field extends up to the newline. The third field is the rest.

Below is my DFDL schema. It produces this result:

<input>
  <A>Hello, World</A>
  <B> Blah</B>
  <C></C>
</input>

along with a warning message saying that a bunch of bytes remain.

Why do I get that result instead of the desired result?  /Roger

<xs:element name="input">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="A" type="xs:string"
                        dfdl:lengthKind="explicit"
                        dfdl:length="12"
                        dfdl:lengthUnits="characters" />
            <xs:element name="B" type="xs:string"
                        dfdl:lengthKind="delimited"
                        dfdl:terminator="%NL;" />
            <xs:element name="C" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".+?(?=$)" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

Re: Does the dollar sign mean "end of file"?

Posted by Steve Lawrence <sl...@apache.org>.
The issue here is that the dot character doesn't match newlines. So your
expression is essentially just looking for one or more non-newline
characters up until the end of the data. Your field has a newline, so the
regular expression fails there, doesn't match, and results in a
zero-length string.
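
To see this outside of Daffodil, here is a minimal Java sketch. It assumes
the length pattern is tried anchored at the start of the remaining data
(modeled here with Matcher.lookingAt()) and that java.util.regex treats
the dot and $ the same way for this case. The class name, the helper
matchedLength, and the sample string are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotNewlineDemo {
    // Length of the match anchored at the start of data, or 0 when the
    // pattern does not match there (standing in for the zero-length result).
    static int matchedLength(String regex, String data) {
        Matcher m = Pattern.compile(regex).matcher(data);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        // Assumed remainder of the input once A and B have been consumed.
        String rest = "Broccoli\n3ABC";
        // The dot stops at the newline and $ is not satisfied there,
        // so the pattern as a whole fails to match.
        System.out.println(matchedLength(".+?(?=$)", rest)); // prints 0
    }
}
```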

If you want dot to match a newline, you can put the "(?s)" flag before
the regex.

You can also simplify the expression a bit. You don't need to make the
dot match non-greedy, and the $ doesn't need to be in a forward
lookahead. So the following should work and is a bit more compact:

 dfdl:lengthPattern="(?s).+$"

That will match one or more characters (including newlines) up until the
end of the data.
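
A quick way to sanity-check the difference is the Java sketch below. It
again assumes an anchored match (Matcher.lookingAt()) and that
java.util.regex is a fair stand-in for the regex engine here. The class
name, the helper matchedLength, and the sample string are illustrative
only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Length of the match anchored at the start of data, or 0 when the
    // pattern does not match there.
    static int matchedLength(String regex, String data) {
        Matcher m = Pattern.compile(regex).matcher(data);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        // Assumed remainder of the input once A and B have been consumed.
        String rest = "Broccoli\n3ABC";
        // Without (?s) the dot cannot cross the newline: no match.
        System.out.println(matchedLength(".+?(?=$)", rest)); // prints 0
        // With (?s) the dot also matches newlines, so .+ runs to the end.
        System.out.println(matchedLength("(?s).+$", rest));  // prints 13
    }
}
```

Here 13 is the full length of the two remaining lines, newline included.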


