You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/03/19 19:20:24 UTC

[GitHub] [daffodil] mbeckerle opened a new pull request #516: Added test for DAFFODIL-2486 discriminator bug.

mbeckerle opened a new pull request #516:
URL: https://github.com/apache/daffodil/pull/516


   DAFFODIL-2486
   
   This is a very serious bug. Ran into it while improving CSV-like data examples for DFDL training materials. 
   
   These are really simple schemas. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r598764854



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">

Review comment:
       1 through 5 yes.
   6 - the flaw in DFDL language is that this is so hard to figure out. One of the separatorSuppressionPolicies should correspond to "works exactly like terminators" if "postfix", and "infix" means no last terminator, and "prefix" means add an initiator before the first.
   7 - yes, I believe so, but it will also allow some things to be considered well-formed that a user might think are malformed.
   Such as allowing empty elements in places where the user wouldn't want them. 
   8 - yes that policy would disallow things like excess empty elements such as sparse arrays allowed by separatorSuppressionPolicy trailingEmpty, anyEmpty, or trailingEmptyStrict
   9 - This is a guess that having two kinds of uncertainty points might be a way to fix the problem and achieve (7). 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r597966157



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">
+            <xs:complexType>
+              <xs:sequence>
+                <xs:element name="lastName" type="xs:string"
+                            dfdl:terminator=","/>
+                <xs:sequence>
+                  <xs:annotation>
+                    <xs:appinfo source="http://www.ogf.org/dfdl/">
+                      <!--
+                      This discriminator should discriminate the closest
+                      point of uncertainty, which should be the record array element.
+
+                      If we parse the record to this discriminator, then
+                      any subsequent error parsing the remaining fields (e.g., such as
+                      the date being incorrect format, should fail the whole parse, not
+                      just terminate the array.
+                      -->
+                      <dfdl:discriminator test="{ fn:true() }"/>
+                    </xs:appinfo>
+                  </xs:annotation>
+                </xs:sequence>
+                <xs:sequence dfdl:separator=",">
+                  <xs:element name="firstName" type="xs:string"/>
+                  <xs:element name="middleName" type="xs:string"/>
+                  <xs:element name="DOB" type="xs:date"
+                              dfdl:calendarPattern="MM/dd/yyyy"
+                              dfdl:calendarPatternKind="explicit"/>
+                </xs:sequence>
+              </xs:sequence>
+            </xs:complexType>
+          </xs:element>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+  </tdml:defineSchema>
+
+  <!--
+  This test just illustrates that the above schema does parse well-formed CSV-like data
+  including the final date element.
+  -->
+  <tdml:parserTestCase name="nameDOB_test1" model="s1">
+    <tdml:document><![CDATA[last,first,middle,DOB
+smith,robert,brandon,03/24/1988
+johnson,john,henry,01/23/1986
+jones,arya,cat,02/19/1986
+]]></tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <ex:file xmlns:ex="http://example.com">
+          <record>
+            <lastName>smith</lastName>
+            <firstName>robert</firstName>
+            <middleName>brandon</middleName>
+            <DOB>1988-03-24</DOB>
+          </record>
+          <record>
+            <lastName>johnson</lastName>
+            <firstName>john</firstName>
+            <middleName>henry</middleName>
+            <DOB>1986-01-23</DOB>
+          </record>
+          <record>
+            <lastName>jones</lastName>
+            <firstName>arya</firstName>
+            <middleName>cat</middleName>
+            <DOB>1986-02-19</DOB>
+          </record>
+        </ex:file>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+  </tdml:parserTestCase>
+
+  <!--
+    This test illustrates that because of the malformed date in the
+    final row of data, the schema will deem the whole file malformed.
+
+    Bug DAFFODIL-2486 reports that the discriminator in the schema
+    for this data does not seem to work.
+
+    The test should end with a complaint about the DOB element, which is a date.
+    Until that bug is fixed, this ends with "left over data".
+    Because it backtracks and terminates the record array based on the failure
+    to parse the date.
+    That shouldn't happen because of the discriminator.
+    -->
+  <tdml:parserTestCase name="nameDOB_test_bad_1" root="file"
+                       model="s1">
+    <tdml:document><![CDATA[last,first,middle,DOB
+smith,robert,brandon,03/24/1988
+johnson,john,henry,01/23/1986
+jones,arya,cat,1986-02-19
+]]></tdml:document>

Review comment:
       I can't believe I hit this snag, while actually writing up a TL;DR essay in a comment on another bug about exactly this problem!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] tuxji commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

tuxji commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r598307595



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">

Review comment:
       I'm not quite sure what is being said here, so I'll ask for clarification.
   1. The goal of the schema is to fail the parse if an error occurs in the second, third, or fourth column of any line, yes?
   2. An error should occur if a line has less than 4 columns, yes?
   3. An error should occur if a line has more than 4 columns, yes?
   4. An error should occur if the date in the 4th column can't be parsed, yes?
   5. The terminator schema works, but the separator schema doesn't work as intended, yes?
   6. Here's where I'm confused: is the problem a bug in Daffodil or a flaw in the DFDL 1.0 specification?
   7. Should the separator schema work if the bug is fixed in Daffodil?
   8. Or should the separator schema work if DFDL 1.0 allows the schema to say 
   dfdl:separatorSuppressionPolicy="disabled" to avoid creating a second point of uncertainty?
   9. Or does the problem go deeper; should the separator schema work if DFDL 1.0 allows separators to create second class points of uncertainty that don't interact with first class points of uncertainty created by discriminators?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r597984051



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">

Review comment:
       I have thought for a long time that DFDL was missing an important dfdl:separatorSuppressionPolicy which should probably be the most common one which is "disabled", meaning that if the sequence contains 4 children there will be 3 separators (plus 1 if prefix or postfix). That is, empty-element suppression isn't even attempted. This is equivalent to adding a terminator to all the children of the sequence but the last (for infix).
   
   Unfortunately separatorSuppressionPolicy 'never' doesn't mean this. It means that arrays first off must be bounded maxOccurs, and there will always be separators for that many occurrences. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r597971124



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">

Review comment:
       So you can't just put two discriminators, because only one is allowed at a given annotation point. 
   
   The 2nd point of uncertainty is quite problematic. The separatorSuppression stuff really shouldn't add first class points of uncertainty. They're related in terms of the control flow of the parser, but likely should be some sort of kind that doesn't interact with discriminators.
   
   This separatorSuppressionPolicy stuff is a very good reason to model things with terminators, not separators. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r597980825



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">

Review comment:
       You are correct that if the malformed date is first row, the failure is detected. 
   I also made a variant that avoids separators, uses only terminators. 
   Added tests that show this works, detects the malformed date in last row, as expected.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r598776358



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,226 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="none">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">
+            <xs:complexType>
+              <xs:sequence>
+                <xs:element name="lastName" type="xs:string"
+                            dfdl:terminator=","/>
+                <xs:sequence>
+                  <xs:annotation>
+                    <xs:appinfo source="http://www.ogf.org/dfdl/">
+                      <!--
+                      This discriminator should discriminate the closest
+                      point of uncertainty, which should be the record array element.
+
+                      If we parse the record to this discriminator, then
+                      any subsequent error parsing the remaining fields (e.g., such as
+                      the date being incorrect format, should fail the whole parse, not
+                      just terminate the array.
+                      -->
+                      <dfdl:discriminator test="{ fn:true() }"/>

Review comment:
       I had a thought this morning, I wonder if there is a "more right" way to model the bejavior this is trying to achieved. It seems the goal here is to say "if there is a row, and something is invalid, then that should fail the entire file". But really what this says, is "if we successfully parse the first thing, and something after it is invalid, ten that should fail the entire file".
   
   So I wonder if we really want a pattern discriminator on the record element that looks something like
   ```xml
   <dfdl:disriminator testKind="pattern" test="." />
   ```
   So if we can read at least one character, then we know the row must exist. It's possible this need to be a bit more complex (e.g. read one character that's not a CR or LF or something), but the idea is that we determine if a row exists before even trying to parse anything.
   
   It doesn't change anything related to these tests, these tests are still good and show valid use cases that don't work so should be merged. But might be an alternative approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle merged pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle merged pull request #516:
URL: https://github.com/apache/daffodil/pull/516


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r597943896



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">
+            <xs:complexType>
+              <xs:sequence>
+                <xs:element name="lastName" type="xs:string"
+                            dfdl:terminator=","/>
+                <xs:sequence>
+                  <xs:annotation>
+                    <xs:appinfo source="http://www.ogf.org/dfdl/">
+                      <!--
+                      This discriminator should discriminate the closest
+                      point of uncertainty, which should be the record array element.
+
+                      If we parse the record to this discriminator, then
+                      any subsequent error parsing the remaining fields (e.g., such as
+                      the date being incorrect format, should fail the whole parse, not
+                      just terminate the array.
+                      -->
+                      <dfdl:discriminator test="{ fn:true() }"/>
+                    </xs:appinfo>
+                  </xs:annotation>
+                </xs:sequence>
+                <xs:sequence dfdl:separator=",">
+                  <xs:element name="firstName" type="xs:string"/>
+                  <xs:element name="middleName" type="xs:string"/>
+                  <xs:element name="DOB" type="xs:date"
+                              dfdl:calendarPattern="MM/dd/yyyy"
+                              dfdl:calendarPatternKind="explicit"/>
+                </xs:sequence>
+              </xs:sequence>
+            </xs:complexType>
+          </xs:element>
+        </xs:sequence>
+      </xs:complexType>
+    </xs:element>
+
+  </tdml:defineSchema>
+
+  <!--
+  This test just illustrates that the above schema does parse well-formed CSV-like data
+  including the final date element.
+  -->
+  <tdml:parserTestCase name="nameDOB_test1" model="s1">
+    <tdml:document><![CDATA[last,first,middle,DOB
+smith,robert,brandon,03/24/1988
+johnson,john,henry,01/23/1986
+jones,arya,cat,02/19/1986
+]]></tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <ex:file xmlns:ex="http://example.com">
+          <record>
+            <lastName>smith</lastName>
+            <firstName>robert</firstName>
+            <middleName>brandon</middleName>
+            <DOB>1988-03-24</DOB>
+          </record>
+          <record>
+            <lastName>johnson</lastName>
+            <firstName>john</firstName>
+            <middleName>henry</middleName>
+            <DOB>1986-01-23</DOB>
+          </record>
+          <record>
+            <lastName>jones</lastName>
+            <firstName>arya</firstName>
+            <middleName>cat</middleName>
+            <DOB>1986-02-19</DOB>
+          </record>
+        </ex:file>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+  </tdml:parserTestCase>
+
+  <!--
+    This test illustrates that because of the malformed date in the
+    final row of data, the schema will deem the whole file malformed.
+
+    Bug DAFFODIL-2486 reports that the discriminator in the schema
+    for this data does not seem to work.
+
+    The test should end with a complaint about the DOB element, which is a date.
+    Until that bug is fixed, this ends with "left over data".
+    Because it backtracks and terminates the record array based on the failure
+    to parse the date.
+    That shouldn't happen because of the discriminator.
+    -->
+  <tdml:parserTestCase name="nameDOB_test_bad_1" root="file"
+                       model="s1">
+    <tdml:document><![CDATA[last,first,middle,DOB
+smith,robert,brandon,03/24/1988
+johnson,john,henry,01/23/1986
+jones,arya,cat,1986-02-19
+]]></tdml:document>

Review comment:
       This doesn't roundtrip on windows due to the newlines. Maybe disable roundtrip or use %LF;s in the data/




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on pull request #516:
URL: https://github.com/apache/daffodil/pull/516#issuecomment-803988403


   +1 to the new changes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r597984440



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">

Review comment:
       For DFDL v1.0 I think this means that one should model separation of sequences with terminators unless one wants some sort of suppression behavior. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r597941549



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,150 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="onePass">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">

Review comment:
       +1
   
   I was curious to see if the CLI debugger would help shine some light on what's going on, and it actually was somewhat helpful now that it shows PoU's! Here's what's happening at a high level:
   
   The first time a record is parsed we create a single PoU when the record element is created. The discriminator correctly resolves that PoU.
   
   For the second and subsequent records, we actually create two PoU's when a record is parsed: one is for the record, one is for the the outer sequence (maybe related to separator suppression policy?). When the disciminator evaluates to true, it resolves only the record PoU but not the the sequence PoU. The existance of that outer PoU allows the parse to succeed and results in the left over data error.
   
   So adding a second discriminator does allow this to work, though that probably shouldn't be required.
   
   It also means that if there is an error parsing the first record, then you do get the expected error because there's only a single PoU for the first record. So whatever the solution, it's probably also worth adding a test where the error is on the first record, since there clearly is some difference in behavior between first and subsequent records right now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #516:
URL: https://github.com/apache/daffodil/pull/516#discussion_r598815519



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section07/discriminators/discriminator2.tdml
##########
@@ -0,0 +1,226 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<testSuite suiteName="NameDOB"
+           xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:fn="http://www.w3.org/2005/xpath-functions"
+           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+           xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+           xmlns:ex="http://example.com"
+           xmlns:tns="http://example.com"
+           defaultRoundTrip="none">
+
+  <tdml:defineSchema name="s1">
+
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
+
+    <dfdl:format
+      ref="tns:GeneralFormat"
+      representation="text"
+      encoding="ASCII"
+      lengthKind="delimited"
+      separator=""
+      separatorPosition="infix"
+    />
+
+    <!--
+    Schema for simple CSV-like file containing 4 columns, the last of which is a date.
+
+    What makes this non-trivial is use of discriminator after the first column.
+    If we can parse a first column, then the remaining columns MUST be present, and an error
+    in parsing them should be fatal.
+
+    -->
+    <xs:element name="file" dfdl:initiator="last,first,middle,DOB%NL;%WSP*;">
+      <xs:complexType>
+        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
+          <xs:element name="record" maxOccurs="unbounded">
+            <xs:complexType>
+              <xs:sequence>
+                <xs:element name="lastName" type="xs:string"
+                            dfdl:terminator=","/>
+                <xs:sequence>
+                  <xs:annotation>
+                    <xs:appinfo source="http://www.ogf.org/dfdl/">
+                      <!--
+                      This discriminator should discriminate the closest
+                      point of uncertainty, which should be the record array element.
+
+                      If we parse the record to this discriminator, then
+                      any subsequent error parsing the remaining fields (e.g., such as
+                      the date being incorrect format, should fail the whole parse, not
+                      just terminate the array.
+                      -->
+                      <dfdl:discriminator test="{ fn:true() }"/>

Review comment:
       Yeah, I thought of this on the weekend also. A pattern discriminator would be better, as it could be at the very start of the record. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on pull request #516: Added test for DAFFODIL-2486 discriminator bug.

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on pull request #516:
URL: https://github.com/apache/daffodil/pull/516#issuecomment-803169227


   @stevedlawrence please take a look again. I found another significant bug, so added tests to characterize that as well. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org