Posted to dev@daffodil.apache.org by "Sloane, Brandon" <bs...@tresys.com> on 2019/12/18 21:16:28 UTC

Inconsistent CI results

After merging in my DataValue PR, I noticed some inconsistent results in our automated testing.

Prior to merging, all tests passed.

After merging, there was an error in 2 of the configurations:


  *   Java 11, Scala 2.11, Windows-latest https://github.com/apache/incubator-daffodil/runs/355023220

  *   Java 11, Scala 2.12, Windows-latest https://github.com/apache/incubator-daffodil/runs/355023183


After re-running the checks, the above 2 configurations passed, but there was an error on a third configuration:

  *   Java 9, Scala 2.11, Ubuntu-latest https://github.com/apache/incubator-daffodil/runs/355200232

The 2.11 Windows failure was a timeout fetching an external resource, which is a reasonable cause of a transient failure.

The 2 remaining failures were both a failure to read file:/D:/a/incubator-daffodil/incubator-daffodil/daffodil-test/target/scala-2.11/test-classes/org/apache/daffodil/section06/namespaces/multi_base_09.dfdl.xsd. This file claims to use the UTF-16BE encoding (eyeballing it in a hex dump, this appears to be accurate). If this were a deterministic failure, I would blame encoding issues, but I am not sure what to make of the non-deterministic aspect.
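
For anyone who wants to repeat the hex check without eyeballing it, here is a minimal sketch in Python of sniffing the encoding family from the first bytes of an XML file. The function name and the sample document are made up for illustration; it only looks at byte-order marks and the layout of the leading "<?", ignores UTF-32, and is not how any of the libraries involved actually detect encodings.

```python
import codecs

def sniff_xml_encoding(head: bytes) -> str:
    """Guess an XML file's encoding family from its first bytes (hypothetical helper)."""
    # Byte-order marks take priority when present.
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    # No BOM: check how the leading "<?" is laid out in the first four bytes.
    if head.startswith(b"\x00<\x00?"):
        return "UTF-16BE"
    if head.startswith(b"<\x00?\x00"):
        return "UTF-16LE"
    # Otherwise fall back to the XML default.
    return "UTF-8"

# Illustrative sample, encoded without a BOM.
sample = '<?xml version="1.0" encoding="UTF-16BE"?>'.encode("utf-16-be")
print(sniff_xml_encoding(sample[:4]))  # UTF-16BE
```

In practice one would pass the first few bytes read from the .xsd file in binary mode instead of the in-memory sample.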

Thoughts?


Brandon T. Sloane

Associate, Services

bsloane@tresys.com | tresys.com

Re: Inconsistent CI results

Posted by Steve Lawrence <sl...@apache.org>.
I've created a WIP pull request that switches from XmlStreamReader to
scala-xml's EncodingHeuristics to detect the charset.

https://github.com/apache/incubator-daffodil/pull/306

All tests pass, and I haven't been able to retrigger the failure, but I
also don't really know what caused the issue before, and I've only seen
it once today with the XmlStreamReader.

I think we should periodically trigger this PR to rebuild and see if the
tests ever fail again. If not, we can be somewhat confident that the
issue is somehow related to the XmlStreamReader.
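
As a rough guide to how many clean rebuilds "somewhat confident" would take, here is a back-of-the-envelope sketch in Python. It assumes each build fails independently with a fixed probability while the bug is still present; the 10% failure rate is an invented figure, not something measured from our CI.

```python
import math

def clean_builds_needed(failure_rate: float, confidence: float) -> int:
    # If the bug were still present, the chance of n clean builds in a row
    # is (1 - failure_rate) ** n. Find the smallest n that pushes this
    # below 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))

# Assumed numbers: 10% of builds hit the failure, 95% confidence wanted.
print(clean_builds_needed(0.10, 0.95))  # 29
```

The rarer the failure, the more rebuilds are needed before a clean streak means much.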


On 12/20/19 11:25 AM, Steve Lawrence wrote:
> I created a branch on my fork with a little extra logging, and I don't
> think this is Xerces now.
> 
> The issue appears to be in the DaffodilConstructingLoader. In that
> constructor, we're creating an XmlStreamReader and calling getEncoding.
> Normally that returns UTF-16BE for these tests, but when the tests fail,
> it returns UTF-8. So for some reason something is racy there and
> XmlStreamReader isn't detecting the encoding correctly sometimes...
> 
> 
> On 12/19/19 5:55 PM, Steve Lawrence wrote:
>> On 12/19/19 12:09 PM, Dave Fisher wrote:
>>>
>>>
>>>> On Dec 18, 2019, at 1:57 PM, Steve Lawrence <sl...@apache.org> wrote:
>>>>
>>>> Unfortunately, this error happens from time to time, and we haven't been
>>>> able to track it down. Primarily because I don't think anyone has been
>>>> able to reliably reproduce it. I know I've never actually seen it
>>>> outside of the CI.
>>>>
>>>> The bug for this is https://issues.apache.org/jira/browse/DAFFODIL-1908
>>>>
>>>> I think the assumption is there is some kind of non-thread-safe code in
>>>> Xerces (or something that parses the XML) and it hits some race condition
>>>> that prevents it from detecting that the file is UTF-16, and so can't
>>>> parse the file correctly.
>>>
>>> If you think that this is a Xerces issue then I’d ask on the Xerces dev list.
>>>
>>> Regards,
>>> Dave
>>>
>>
>> I'm actually not entirely convinced it's Xerces yet. The SDE is
>> happening because DaffodilXMLLoader.load is returning null. Looking at
>> that function, it can return null in two different ways:
>>
>> xercesAdapter.load(inputSource)
>>
>>   and
>>
>> constructingLoader.load()
>>
>> The first is used for validation, the second actually loads the XML.
>> Based on the error it's not clear which is failing, but the
>> constructingLoader is Daffodil code.
>>
>> Interestingly, the DaffodilConstructingLoader constructor is maybe a
>> little suspicious:
>>
>> https://github.com/apache/incubator-daffodil/blob/master/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala#L75-L87
>>
>> That code is using Apache Commons IO's XmlStreamReader to detect the encoding
>> in the constructor. Considering the issue appears to be related to not
>> detecting UTF-16, the issue might be in there as well.
>>
>> So there are lots of places where the issue could be: Xerces, Apache Commons, or
>> Daffodil.
>>
> 


Re: Inconsistent CI results

Posted by Steve Lawrence <sl...@apache.org>.
I created a branch on my fork with a little extra logging, and I don't
think this is Xerces now.

The issue appears to be in the DaffodilConstructingLoader. In that
constructor, we're creating an XmlStreamReader and calling getEncoding.
Normally that returns UTF-16BE for these tests, but when the tests fail,
it returns UTF-8. So for some reason something is racy there and
XmlStreamReader isn't detecting the encoding correctly sometimes...
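
One way to hunt for this kind of nondeterminism is to call the detector on the same bytes from many threads and tally the answers. The sketch below shows the shape of such a harness in Python; `detect` is a made-up stand-in, not XmlStreamReader, and since it is a pure function it will always agree with itself. A real reproduction attempt would need the equivalent loop on the JVM around the actual detection call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def detect(data: bytes) -> str:
    # Stand-in detector (hypothetical): BOM-less UTF-16BE XML starts with 00 3C 00 3F.
    return "UTF-16BE" if data.startswith(b"\x00<\x00?") else "UTF-8"

def stress(detector, data: bytes, runs: int = 2000, workers: int = 8) -> Counter:
    # Call the detector on the same input from several threads and count each answer.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(detector, [data] * runs))

# Illustrative input, encoded without a BOM.
schema = '<?xml version="1.0" encoding="UTF-16BE"?><x/>'.encode("utf-16-be")
tally = stress(detect, schema)
print(tally)  # more than one key would mean the detector is not deterministic
```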


On 12/19/19 5:55 PM, Steve Lawrence wrote:
> On 12/19/19 12:09 PM, Dave Fisher wrote:
>>
>>
>>> On Dec 18, 2019, at 1:57 PM, Steve Lawrence <sl...@apache.org> wrote:
>>>
>>> Unfortunately, this error happens from time to time, and we haven't been
>>> able to track it down. Primarily because I don't think anyone has been
>>> able to reliably reproduce it. I know I've never actually seen it
>>> outside of the CI.
>>>
>>> The bug for this is https://issues.apache.org/jira/browse/DAFFODIL-1908
>>>
>>> I think the assumption is there is some kind of non-thread-safe code in
>>> Xerces (or something that parses the XML) and it hits some race condition
>>> that prevents it from detecting that the file is UTF-16, and so can't
>>> parse the file correctly.
>>
>> If you think that this is a Xerces issue then I’d ask on the Xerces dev list.
>>
>> Regards,
>> Dave
>>
> 
> I'm actually not entirely convinced it's Xerces yet. The SDE is
> happening because DaffodilXMLLoader.load is returning null. Looking at
> that function, it can return null in two different ways:
> 
> xercesAdapter.load(inputSource)
> 
>   and
> 
> constructingLoader.load()
> 
> The first is used for validation, the second actually loads the XML.
> Based on the error it's not clear which is failing, but the
> constructingLoader is Daffodil code.
> 
> Interestingly, the DaffodilConstructingLoader constructor is maybe a
> little suspicious:
> 
> https://github.com/apache/incubator-daffodil/blob/master/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala#L75-L87
> 
> That code is using Apache Commons IO's XmlStreamReader to detect the encoding
> in the constructor. Considering the issue appears to be related to not
> detecting UTF-16, the issue might be in there as well.
> 
> So there are lots of places where the issue could be: Xerces, Apache Commons, or
> Daffodil.
> 


Re: Inconsistent CI results

Posted by Steve Lawrence <st...@gmail.com>.
On 12/19/19 12:09 PM, Dave Fisher wrote:
> 
> 
>> On Dec 18, 2019, at 1:57 PM, Steve Lawrence <sl...@apache.org> wrote:
>>
>> Unfortunately, this error happens from time to time, and we haven't been
>> able to track it down. Primarily because I don't think anyone has been
>> able to reliably reproduce it. I know I've never actually seen it
>> outside of the CI.
>>
>> The bug for this is https://issues.apache.org/jira/browse/DAFFODIL-1908
>>
>> I think the assumption is there is some kind of non-thread-safe code in
>> Xerces (or something that parses the XML) and it hits some race condition
>> that prevents it from detecting that the file is UTF-16, and so can't
>> parse the file correctly.
> 
> If you think that this is a Xerces issue then I’d ask on the Xerces dev list.
> 
> Regards,
> Dave
> 

I'm actually not entirely convinced it's Xerces yet. The SDE is
happening because DaffodilXMLLoader.load is returning null. Looking at
that function, it can return null in two different ways:

xercesAdapter.load(inputSource)

  and

constructingLoader.load()

The first is used for validation, the second actually loads the XML.
Based on the error it's not clear which is failing, but the
constructingLoader is Daffodil code.

Interestingly, the DaffodilConstructingLoader constructor is maybe a
little suspicious:

https://github.com/apache/incubator-daffodil/blob/master/daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala#L75-L87

That code is using Apache Commons IO's XmlStreamReader to detect the encoding
in the constructor. Considering the issue appears to be related to not
detecting UTF-16, the issue might be in there as well.

So there are lots of places where the issue could be: Xerces, Apache Commons, or
Daffodil.
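
To tell the two null paths apart, the loader could log which stage came back empty. The following is only a sketch of that idea in Python, not the actual DaffodilXMLLoader logic: the `load` function, its `validate` and `construct` parameters, and the logger name are all hypothetical, with the two callables playing the roles of xercesAdapter.load and constructingLoader.load.

```python
import logging

log = logging.getLogger("loader-sketch")

def load(source, validate, construct):
    # Stage 1: validation pass (plays the role of xercesAdapter.load).
    if validate(source) is None:
        log.warning("validation stage returned None for %s", source)
        return None, "validate"
    # Stage 2: building the tree (plays the role of constructingLoader.load).
    node = construct(source)
    if node is None:
        log.warning("construction stage returned None for %s", source)
        return None, "construct"
    return node, None

# Simulate a run where validation succeeds but construction comes back empty.
result, failed_stage = load("schema.xsd", lambda s: object(), lambda s: None)
print(failed_stage)  # construct
```

With something like this in place, a failing CI log would name the stage instead of only reporting that the load returned null.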

Re: Inconsistent CI results

Posted by Dave Fisher <wa...@apache.org>.

> On Dec 18, 2019, at 1:57 PM, Steve Lawrence <sl...@apache.org> wrote:
> 
> Unfortunately, this error happens from time to time, and we haven't been
> able to track it down. Primarily because I don't think anyone has been
> able to reliably reproduce it. I know I've never actually seen it
> outside of the CI.
> 
> The bug for this is https://issues.apache.org/jira/browse/DAFFODIL-1908
> 
> I think the assumption is there is some kind of non-thread-safe code in
> Xerces (or something that parses the XML) and it hits some race condition
> that prevents it from detecting that the file is UTF-16, and so can't
> parse the file correctly.

If you think that this is a Xerces issue then I’d ask on the Xerces dev list.

Regards,
Dave

> 
> When I see this, I usually just trigger a new build and it goes away so
> we at least can confirm that the new commit didn't cause any new problems.
> 
> 
> On 12/18/19 4:16 PM, Sloane, Brandon wrote:
>> After merging in my DataValue PR, I noticed some inconsistent results in our automated testing.
>> 
>> Prior to merging, all tests passed.
>> 
>> After merging, there was an error in 2 of the configurations:
>> 
>> 
>>  *   Java 11, Scala 2.11, Windows-latest https://github.com/apache/incubator-daffodil/runs/355023220
>> 
>>  *   Java 11, Scala 2.12, Windows-latest https://github.com/apache/incubator-daffodil/runs/355023183
>> 
>> 
>> After re-running the checks, the above 2 configurations passed, but there was an error on a third configuration:
>> 
>>  *   Java 9, Scala 2.11, Ubuntu-latest https://github.com/apache/incubator-daffodil/runs/355200232
>> 
>> The 2.11 Windows failure was a timeout fetching an external resource, which is a reasonable cause of a transient failure.
>> 
>> The 2 remaining failures were both a failure to read file:/D:/a/incubator-daffodil/incubator-daffodil/daffodil-test/target/scala-2.11/test-classes/org/apache/daffodil/section06/namespaces/multi_base_09.dfdl.xsd. This file claims to use the UTF-16BE encoding (eyeballing it in a hex dump, this appears to be accurate). If this were a deterministic failure, I would blame encoding issues, but I am not sure what to make of the non-deterministic aspect.
>> 
>> Thoughts?
>> 
>> 
>> Brandon T. Sloane
>> 
>> Associate, Services
>> 
>> bsloane@tresys.com | tresys.com
>> 
> 


Re: Inconsistent CI results

Posted by Steve Lawrence <sl...@apache.org>.
Unfortunately, this error happens from time to time, and we haven't been
able to track it down. Primarily because I don't think anyone has been
able to reliably reproduce it. I know I've never actually seen it
outside of the CI.

The bug for this is https://issues.apache.org/jira/browse/DAFFODIL-1908

I think the assumption is there is some kind of non-thread-safe code in
Xerces (or something that parses the XML) and it hits some race condition
that prevents it from detecting that the file is UTF-16, and so can't
parse the file correctly.

When I see this, I usually just trigger a new build and it goes away so
we at least can confirm that the new commit didn't cause any new problems.


On 12/18/19 4:16 PM, Sloane, Brandon wrote:
> After merging in my DataValue PR, I noticed some inconsistent results in our automated testing.
> 
> Prior to merging, all tests passed.
> 
> After merging, there was an error in 2 of the configurations:
> 
> 
>   *   Java 11, Scala 2.11, Windows-latest https://github.com/apache/incubator-daffodil/runs/355023220
> 
>   *   Java 11, Scala 2.12, Windows-latest https://github.com/apache/incubator-daffodil/runs/355023183
> 
> 
> After re-running the checks, the above 2 configurations passed, but there was an error on a third configuration:
> 
>   *   Java 9, Scala 2.11, Ubuntu-latest https://github.com/apache/incubator-daffodil/runs/355200232
> 
> The 2.11 Windows failure was a timeout fetching an external resource, which is a reasonable cause of a transient failure.
> 
> The 2 remaining failures were both a failure to read file:/D:/a/incubator-daffodil/incubator-daffodil/daffodil-test/target/scala-2.11/test-classes/org/apache/daffodil/section06/namespaces/multi_base_09.dfdl.xsd. This file claims to use the UTF-16BE encoding (eyeballing it in a hex dump, this appears to be accurate). If this were a deterministic failure, I would blame encoding issues, but I am not sure what to make of the non-deterministic aspect.
> 
> Thoughts?
> 
> 
> Brandon T. Sloane
> 
> Associate, Services
> 
> bsloane@tresys.com | tresys.com
>