Posted to commits@daffodil.apache.org by "Steve Lawrence (Jira)" <ji...@apache.org> on 2020/09/22 12:04:00 UTC
[jira] [Comment Edited] (DAFFODIL-2399) Error diagnostics output even though there is an infoset

    [ https://issues.apache.org/jira/browse/DAFFODIL-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199658#comment-17199658 ] 

Steve Lawrence edited comment on DAFFODIL-2399 at 9/22/20, 12:03 PM:
---------------------------------------------------------------------

Was able to create a minimal example that reproduces the issue. Here is the schema:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>

<xs:schema
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="GeneralFormat"
        lengthKind="delimited" 
        separatorSuppressionPolicy="trailingEmpty" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="record"/>
      </xs:sequence>
    </xs:complexType> 
  </xs:element>

  <xs:element name="record" dfdl:initiator="record" dfdl:terminator="%NL;">
    <xs:complexType>
      <xs:sequence dfdl:separator="|" dfdl:separatorPosition="prefix">
        <xs:sequence dfdl:separator="~" dfdl:separatorPosition="infix">
          <xs:element name="field1" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
        <xs:sequence dfdl:separator="~" dfdl:separatorPosition="infix">
          <xs:element name="field2" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
        <xs:sequence dfdl:separator="~" dfdl:separatorPosition="infix">
          <xs:element name="field3" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
    
</xs:schema>
{code}

Here is test data (with a newline at the end):
{code}
record|field1
{code}

Parsing using the CLI results in the following output. Note that it prints the correct infoset followed by multiple parse errors.

{code}
$ daffodil parse  -s schema.dfdl.xsd data.txt
<?xml version="1.0" encoding="UTF-8"?>
<message>
  <record>
    <field1>field1</field1>
  </record>
</message>
[error] Parse Error: Failed to parse prefix separator. Cause: Parse Error: Found out of scope delimiter: '%NL;' '␊'
Schema context: sequence[1] Location line 27 column 8 in file:/schema.dfdl.xsd
Data location was preceding byte 13.
Schema context: sequence[1] Location line 27 column 8 in file:/schema.dfdl.xsd
Data location was preceding byte 13
[error] Parse Error: Found out of scope delimiter: '%NL;' '␊'
Schema context: sequence[1] Location line 27 column 8 in file:/schema.dfdl.xsd
Data location was preceding byte 13
[error] Parse Error: Failed to parse prefix separator. Cause: Parse Error: Found out of scope delimiter: '%NL;' '␊'
Schema context: sequence[1] Location line 27 column 8 in file:/schema.dfdl.xsd
Data location was preceding byte 13.
Schema context: sequence[1] Location line 27 column 8 in file:/schema.dfdl.xsd
Data location was preceding byte 13
[error] Parse Error: Found out of scope delimiter: '%NL;' '␊'
Schema context: sequence[1] Location
{code}

I have confirmed that Daffodil does not think there are any parse errors. It thinks this is a successful parse. It's unclear where the diagnostics are coming from.
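
For reference, the expectation of a clean parse could be written as a TDML test roughly like the one below. This is only a sketch: the suite and test names are made up, and the {{model}} path assumes the schema above is saved as schema.dfdl.xsd next to the TDML file. Per the issue description, the TDML runner does not detect the extra diagnostics, so a test like this would be expected to pass even though the CLI prints errors.
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>

<!-- Hypothetical suite and test names, for illustration only -->
<tdml:testSuite suiteName="daffodil2399"
  xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData">

  <tdml:parserTestCase name="recordWithOneField" root="message"
    model="schema.dfdl.xsd">
    <!-- Same test data as above, including the trailing newline -->
    <tdml:document><![CDATA[record|field1
]]></tdml:document>
    <tdml:infoset>
      <tdml:dfdlInfoset>
        <message>
          <record>
            <field1>field1</field1>
          </record>
        </message>
      </tdml:dfdlInfoset>
    </tdml:infoset>
  </tdml:parserTestCase>

</tdml:testSuite>
{code}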

Also note that if you copy/paste additional inner sequences with the tilde separator and an incremented field name, you get more errors.
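
For illustration, an extra inner sequence of that kind would look like the following, added after the field3 sequence inside the outer prefix-separated sequence (field4 is simply the next name in the pattern):
{code:xml}
        <!-- Additional copy of the inner sequence; only the element name changes -->
        <xs:sequence dfdl:separator="~" dfdl:separatorPosition="infix">
          <xs:element name="field4" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
{code}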


> Error diagnostics output even though there is an infoset 
> ---------------------------------------------------------
>
>                 Key: DAFFODIL-2399
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2399
>             Project: Daffodil
>          Issue Type: Bug
>            Reporter: Steve Lawrence
>            Priority: Major
>
> The HL7 schema (currently not public) currently parses successfully and outputs an infoset, but also outputs a bunch of Parse Error diagnostics, making it appear as if the parse failed. The TDML runner does not detect these additional diagnostics, so when run via a TDML runner the tests pass. But when run via the CLI it's very clear something is wrong. Testing with older versions of Daffodil, this appears to go back to at least 2.4.0, so this is a very old bug. Additionally, when using the new SAX API, the error diagnostic is only output once. Need to figure out why SAX errors are different from non-SAX.
> This is potentially multiple separate issues, but I'll keep this as one issue until I can create a minimal test case to reproduce it and can figure out what's actually going on. Creating this ticket for now so we don't forget about this issue.


