You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Alex Amato <aj...@google.com> on 2019/01/22 23:34:56 UTC

FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

I've seen this fail in a few different PRs for different contributors, and
its causing some issues during the presubmit process.. This is a
multithreadred test with a lot of sleeps, so it looks a bit suspicious as
the source of the problem.
https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/

I filed a JIRA for this issue:
https://jira.apache.org/jira/browse/BEAM-6491?filter=-2

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Jeff Klukas <jk...@mozilla.com>.
Thanks for the review, Alex. I pushed an additional commit to add
commentary and to explicitly pass a copy option that indicates we want to
preserve attributes like timestamps.

On Wed, Jan 23, 2019 at 12:23 PM Alex Amato <aj...@google.com> wrote:

> Thanks Jeff, I reviewed your PR with one suggestion to add a comment to
> make the test more clear. I am assuming the modified times get copied, not
> re-timestamped on copy, which is why your method works.
> Otherwise looks good to me
>
> On Wed, Jan 23, 2019 at 5:49 AM Jeff Klukas <jk...@mozilla.com> wrote:
>
>> Posted https://github.com/apache/beam/pull/7599
>>
>> That PR follows suggestion #4. I chose that route because it maintains
>> the PAssert containsInAnyOrder check which seems easier to read and more
>> straight-forward than PAssert satisfies.
>>
>> Do let me know if you disagree and I can switch back to Eugene's
>> suggestion #1.
>>
>> On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas <jk...@mozilla.com> wrote:
>>
>>> Suggestion #4: Create source files outside the writer thread, and then
>>> copy them from a source directory to the watched directory. That should
>>> atomically write the file with the already known lastModificationTime.
>>>
>>> On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas <jk...@mozilla.com> wrote:
>>>
>>>> I'll work on getting a PR together this morning, probably following
>>>> Eugene's suggestion #1.
>>>>
>>>> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> Alex, the only way to implement my suggestion #1 (that I know of)
>>>>> would be to write to a file and read it back.
>>>>> I don't have good example for #2.
>>>>>
>>>>> Eugene's suggestion no. 1 seems like a good idea. There are some
>>>>> example
>>>>> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
>>>>> in the codebase.
>>>>>
>>>>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <ki...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Yeah the "List<MatchResult.Metadata> expected" is constructed
>>>>>> from Files.getLastModifiedTime() calls before the files are actually
>>>>>> modified, the code is basically unconditionally broken rather than merely
>>>>>> flaky.
>>>>>>
>>>>>> There's several easy options:
>>>>>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>>>>>> assertThat().contains() inside that, with the list constructed at time the
>>>>>> assertion is applied rather than declared.
>>>>>> 2) Implement a Matcher<Metadata> that ignores last modified time and
>>>>>> use that
>>>>>>
>>>>>> Jeff - your option #3 is unfortunately also race-prone, because the
>>>>>> code may match the files after they have been written but before
>>>>>> setLastModifiedTime was called.
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <je...@klukas.net> wrote:
>>>>>>
>>>>>>> Another option:
>>>>>>>
>>>>>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>>>>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>>>>>> for each file and we can use those same static values in our expected
>>>>>>> result. I think that would also eliminate the race condition.
>>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <aj...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Udi, is there a good example for either of these?
>>>>>>>> #1 - seems like you have to rewrite your assertion logic without
>>>>>>>> the PAssert? Is there some way to capture the pipeline output and iterate
>>>>>>>> over it? The pattern I have seen for this in the past also has thread
>>>>>>>> safety issues (Using a DoFn at the end of the pipeline to add the output to
>>>>>>>> a collection is not safe since the collection can be executed concurrently)
>>>>>>>> #2 - Would BigqueryMatcher be a good example for this? which is
>>>>>>>> used in BigQueryTornadoesIT.java Or is there another example you would
>>>>>>>> suggest looking at for reference?
>>>>>>>>
>>>>>>>>    - I guess to this you need to implement the SerializableMatcher
>>>>>>>>    interface and use the matcher as an option in the pipeline options.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Some options:
>>>>>>>>> - You could wait to assert until after p.waitForFinish().
>>>>>>>>> - You could PAssert using SerializableMatcher and allow any
>>>>>>>>> lastModifiedTime.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +Jeff, Eugene,
>>>>>>>>>>
>>>>>>>>>> Hi Jeff and Eugene,
>>>>>>>>>>
>>>>>>>>>> I've noticed that Jeff's PR
>>>>>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>>>>>>>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>>>>>>>> test check in a thread safe way. I believe this to be the source of the
>>>>>>>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>>>>>>>> test)?
>>>>>>>>>>
>>>>>>>>>> I added some details to this JIRA issue explaining in full
>>>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've seen this fail in a few different PRs for different
>>>>>>>>>>> contributors, and its causing some issues during the presubmit process..
>>>>>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>>>>>>>>>> suspicious as the source of the problem.
>>>>>>>>>>>
>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>>>>>>
>>>>>>>>>>> I filed a JIRA for this issue:
>>>>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Alex Amato <aj...@google.com>.
Thanks Jeff, I reviewed your PR with one suggestion to add a comment to
make the test more clear. I am assuming the modified times get copied, not
re-timestamped on copy, which is why your method works.
Otherwise looks good to me

On Wed, Jan 23, 2019 at 5:49 AM Jeff Klukas <jk...@mozilla.com> wrote:

> Posted https://github.com/apache/beam/pull/7599
>
> That PR follows suggestion #4. I chose that route because it maintains the
> PAssert containsInAnyOrder check which seems easier to read and more
> straight-forward than PAssert satisfies.
>
> Do let me know if you disagree and I can switch back to Eugene's
> suggestion #1.
>
> On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas <jk...@mozilla.com> wrote:
>
>> Suggestion #4: Create source files outside the writer thread, and then
>> copy them from a source directory to the watched directory. That should
>> atomically write the file with the already known lastModificationTime.
>>
>> On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas <jk...@mozilla.com> wrote:
>>
>>> I'll work on getting a PR together this morning, probably following
>>> Eugene's suggestion #1.
>>>
>>> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Alex, the only way to implement my suggestion #1 (that I know of) would
>>>> be to write to a file and read it back.
>>>> I don't have good example for #2.
>>>>
>>>> Eugene's suggestion no. 1 seems like a good idea. There are some
>>>> example
>>>> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
>>>> in the codebase.
>>>>
>>>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <ki...@google.com>
>>>> wrote:
>>>>
>>>>> Yeah the "List<MatchResult.Metadata> expected" is constructed
>>>>> from Files.getLastModifiedTime() calls before the files are actually
>>>>> modified, the code is basically unconditionally broken rather than merely
>>>>> flaky.
>>>>>
>>>>> There's several easy options:
>>>>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>>>>> assertThat().contains() inside that, with the list constructed at time the
>>>>> assertion is applied rather than declared.
>>>>> 2) Implement a Matcher<Metadata> that ignores last modified time and
>>>>> use that
>>>>>
>>>>> Jeff - your option #3 is unfortunately also race-prone, because the
>>>>> code may match the files after they have been written but before
>>>>> setLastModifiedTime was called.
>>>>>
>>>>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <je...@klukas.net> wrote:
>>>>>
>>>>>> Another option:
>>>>>>
>>>>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>>>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>>>>> for each file and we can use those same static values in our expected
>>>>>> result. I think that would also eliminate the race condition.
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <aj...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Udi, is there a good example for either of these?
>>>>>>> #1 - seems like you have to rewrite your assertion logic without the
>>>>>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>>>>>> it? The pattern I have seen for this in the past also has thread safety
>>>>>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>>>>>> collection is not safe since the collection can be executed concurrently)
>>>>>>> #2 - Would BigqueryMatcher be a good example for this? which is used
>>>>>>> in BigQueryTornadoesIT.java Or is there another example you would suggest
>>>>>>> looking at for reference?
>>>>>>>
>>>>>>>    - I guess to this you need to implement the SerializableMatcher
>>>>>>>    interface and use the matcher as an option in the pipeline options.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> Some options:
>>>>>>>> - You could wait to assert until after p.waitForFinish().
>>>>>>>> - You could PAssert using SerializableMatcher and allow any
>>>>>>>> lastModifiedTime.
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +Jeff, Eugene,
>>>>>>>>>
>>>>>>>>> Hi Jeff and Eugene,
>>>>>>>>>
>>>>>>>>> I've noticed that Jeff's PR
>>>>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>>>>>>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>>>>>>> test check in a thread safe way. I believe this to be the source of the
>>>>>>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>>>>>>> test)?
>>>>>>>>>
>>>>>>>>> I added some details to this JIRA issue explaining in full
>>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I've seen this fail in a few different PRs for different
>>>>>>>>>> contributors, and its causing some issues during the presubmit process..
>>>>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>>>>>>>>> suspicious as the source of the problem.
>>>>>>>>>>
>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>>>>>
>>>>>>>>>> I filed a JIRA for this issue:
>>>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Jeff Klukas <jk...@mozilla.com>.
Posted https://github.com/apache/beam/pull/7599

That PR follows suggestion #4. I chose that route because it maintains the
PAssert containsInAnyOrder check which seems easier to read and more
straight-forward than PAssert satisfies.

Do let me know if you disagree and I can switch back to Eugene's suggestion
#1.

On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas <jk...@mozilla.com> wrote:

> Suggestion #4: Create source files outside the writer thread, and then
> copy them from a source directory to the watched directory. That should
> atomically write the file with the already known lastModificationTime.
>
> On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas <jk...@mozilla.com> wrote:
>
>> I'll work on getting a PR together this morning, probably following
>> Eugene's suggestion #1.
>>
>> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Alex, the only way to implement my suggestion #1 (that I know of) would
>>> be to write to a file and read it back.
>>> I don't have good example for #2.
>>>
>>> Eugene's suggestion no. 1 seems like a good idea. There are some example
>>> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
>>> in the codebase.
>>>
>>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <ki...@google.com>
>>> wrote:
>>>
>>>> Yeah the "List<MatchResult.Metadata> expected" is constructed
>>>> from Files.getLastModifiedTime() calls before the files are actually
>>>> modified, the code is basically unconditionally broken rather than merely
>>>> flaky.
>>>>
>>>> There's several easy options:
>>>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>>>> assertThat().contains() inside that, with the list constructed at time the
>>>> assertion is applied rather than declared.
>>>> 2) Implement a Matcher<Metadata> that ignores last modified time and
>>>> use that
>>>>
>>>> Jeff - your option #3 is unfortunately also race-prone, because the
>>>> code may match the files after they have been written but before
>>>> setLastModifiedTime was called.
>>>>
>>>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <je...@klukas.net> wrote:
>>>>
>>>>> Another option:
>>>>>
>>>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>>>> for each file and we can use those same static values in our expected
>>>>> result. I think that would also eliminate the race condition.
>>>>>
>>>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <aj...@google.com> wrote:
>>>>>
>>>>>> Thanks Udi, is there a good example for either of these?
>>>>>> #1 - seems like you have to rewrite your assertion logic without the
>>>>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>>>>> it? The pattern I have seen for this in the past also has thread safety
>>>>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>>>>> collection is not safe since the collection can be executed concurrently)
>>>>>> #2 - Would BigqueryMatcher be a good example for this? which is used
>>>>>> in BigQueryTornadoesIT.java Or is there another example you would suggest
>>>>>> looking at for reference?
>>>>>>
>>>>>>    - I guess to this you need to implement the SerializableMatcher
>>>>>>    interface and use the matcher as an option in the pipeline options.
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> Some options:
>>>>>>> - You could wait to assert until after p.waitForFinish().
>>>>>>> - You could PAssert using SerializableMatcher and allow any
>>>>>>> lastModifiedTime.
>>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +Jeff, Eugene,
>>>>>>>>
>>>>>>>> Hi Jeff and Eugene,
>>>>>>>>
>>>>>>>> I've noticed that Jeff's PR
>>>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>>>>>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>>>>>> test check in a thread safe way. I believe this to be the source of the
>>>>>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>>>>>> test)?
>>>>>>>>
>>>>>>>> I added some details to this JIRA issue explaining in full
>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've seen this fail in a few different PRs for different
>>>>>>>>> contributors, and its causing some issues during the presubmit process..
>>>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>>>>>>>> suspicious as the source of the problem.
>>>>>>>>>
>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>>>>
>>>>>>>>> I filed a JIRA for this issue:
>>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Jeff Klukas <jk...@mozilla.com>.
Suggestion #4: Create source files outside the writer thread, and then copy
them from a source directory to the watched directory. That should
atomically write the file with the already known lastModificationTime.

On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas <jk...@mozilla.com> wrote:

> I'll work on getting a PR together this morning, probably following
> Eugene's suggestion #1.
>
> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <eh...@google.com> wrote:
>
>> Alex, the only way to implement my suggestion #1 (that I know of) would
>> be to write to a file and read it back.
>> I don't have good example for #2.
>>
>> Eugene's suggestion no. 1 seems like a good idea. There are some example
>> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
>> in the codebase.
>>
>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <ki...@google.com>
>> wrote:
>>
>>> Yeah the "List<MatchResult.Metadata> expected" is constructed
>>> from Files.getLastModifiedTime() calls before the files are actually
>>> modified, the code is basically unconditionally broken rather than merely
>>> flaky.
>>>
>>> There's several easy options:
>>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>>> assertThat().contains() inside that, with the list constructed at time the
>>> assertion is applied rather than declared.
>>> 2) Implement a Matcher<Metadata> that ignores last modified time and use
>>> that
>>>
>>> Jeff - your option #3 is unfortunately also race-prone, because the code
>>> may match the files after they have been written but before
>>> setLastModifiedTime was called.
>>>
>>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <je...@klukas.net> wrote:
>>>
>>>> Another option:
>>>>
>>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>>> for each file and we can use those same static values in our expected
>>>> result. I think that would also eliminate the race condition.
>>>>
>>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <aj...@google.com> wrote:
>>>>
>>>>> Thanks Udi, is there a good example for either of these?
>>>>> #1 - seems like you have to rewrite your assertion logic without the
>>>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>>>> it? The pattern I have seen for this in the past also has thread safety
>>>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>>>> collection is not safe since the collection can be executed concurrently)
>>>>> #2 - Would BigqueryMatcher be a good example for this? which is used
>>>>> in BigQueryTornadoesIT.java Or is there another example you would suggest
>>>>> looking at for reference?
>>>>>
>>>>>    - I guess to this you need to implement the SerializableMatcher
>>>>>    interface and use the matcher as an option in the pipeline options.
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> Some options:
>>>>>> - You could wait to assert until after p.waitForFinish().
>>>>>> - You could PAssert using SerializableMatcher and allow any
>>>>>> lastModifiedTime.
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +Jeff, Eugene,
>>>>>>>
>>>>>>> Hi Jeff and Eugene,
>>>>>>>
>>>>>>> I've noticed that Jeff's PR
>>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>>>>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>>>>> test check in a thread safe way. I believe this to be the source of the
>>>>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>>>>> test)?
>>>>>>>
>>>>>>> I added some details to this JIRA issue explaining in full
>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've seen this fail in a few different PRs for different
>>>>>>>> contributors, and its causing some issues during the presubmit process..
>>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>>>>>>> suspicious as the source of the problem.
>>>>>>>>
>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>>>
>>>>>>>> I filed a JIRA for this issue:
>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Jeff Klukas <jk...@mozilla.com>.
I'll work on getting a PR together this morning, probably following
Eugene's suggestion #1.

On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <eh...@google.com> wrote:

> Alex, the only way to implement my suggestion #1 (that I know of) would be
> to write to a file and read it back.
> I don't have good example for #2.
>
> Eugene's suggestion no. 1 seems like a good idea. There are some example
> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
> in the codebase.
>
> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <ki...@google.com>
> wrote:
>
>> Yeah the "List<MatchResult.Metadata> expected" is constructed
>> from Files.getLastModifiedTime() calls before the files are actually
>> modified, the code is basically unconditionally broken rather than merely
>> flaky.
>>
>> There's several easy options:
>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>> assertThat().contains() inside that, with the list constructed at time the
>> assertion is applied rather than declared.
>> 2) Implement a Matcher<Metadata> that ignores last modified time and use
>> that
>>
>> Jeff - your option #3 is unfortunately also race-prone, because the code
>> may match the files after they have been written but before
>> setLastModifiedTime was called.
>>
>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <je...@klukas.net> wrote:
>>
>>> Another option:
>>>
>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>> for each file and we can use those same static values in our expected
>>> result. I think that would also eliminate the race condition.
>>>
>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <aj...@google.com> wrote:
>>>
>>>> Thanks Udi, is there a good example for either of these?
>>>> #1 - seems like you have to rewrite your assertion logic without the
>>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>>> it? The pattern I have seen for this in the past also has thread safety
>>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>>> collection is not safe since the collection can be executed concurrently)
>>>> #2 - Would BigqueryMatcher be a good example for this? which is used in
>>>> BigQueryTornadoesIT.java Or is there another example you would suggest
>>>> looking at for reference?
>>>>
>>>>    - I guess to this you need to implement the SerializableMatcher
>>>>    interface and use the matcher as an option in the pipeline options.
>>>>
>>>>
>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> Some options:
>>>>> - You could wait to assert until after p.waitForFinish().
>>>>> - You could PAssert using SerializableMatcher and allow any
>>>>> lastModifiedTime.
>>>>>
>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com> wrote:
>>>>>
>>>>>> +Jeff, Eugene,
>>>>>>
>>>>>> Hi Jeff and Eugene,
>>>>>>
>>>>>> I've noticed that Jeff's PR
>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>>>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>>>> test check in a thread safe way. I believe this to be the source of the
>>>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>>>> test)?
>>>>>>
>>>>>> I added some details to this JIRA issue explaining in full
>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I've seen this fail in a few different PRs for different
>>>>>>> contributors, and its causing some issues during the presubmit process..
>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>>>>>> suspicious as the source of the problem.
>>>>>>>
>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>>
>>>>>>> I filed a JIRA for this issue:
>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>
>>>>>>>
>>>>>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Udi Meiri <eh...@google.com>.
Alex, the only way to implement my suggestion #1 (that I know of) would be
to write to a file and read it back.
I don't have good example for #2.

Eugene's suggestion no. 1 seems like a good idea. There are some example
<https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
in the codebase.

On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <ki...@google.com>
wrote:

> Yeah the "List<MatchResult.Metadata> expected" is constructed
> from Files.getLastModifiedTime() calls before the files are actually
> modified, the code is basically unconditionally broken rather than merely
> flaky.
>
> There's several easy options:
> 1) Use PAssert.that().satisfies() instead of .contains(), and use
> assertThat().contains() inside that, with the list constructed at time the
> assertion is applied rather than declared.
> 2) Implement a Matcher<Metadata> that ignores last modified time and use
> that
>
> Jeff - your option #3 is unfortunately also race-prone, because the code
> may match the files after they have been written but before
> setLastModifiedTime was called.
>
> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <je...@klukas.net> wrote:
>
>> Another option:
>>
>> #3 Have the writer thread call Files.setLastModifiedTime explicitly after
>> each File.write. Then the lastModifiedMillis can be a stable value for each
>> file and we can use those same static values in our expected result. I
>> think that would also eliminate the race condition.
>>
>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <aj...@google.com> wrote:
>>
>>> Thanks Udi, is there a good example for either of these?
>>> #1 - seems like you have to rewrite your assertion logic without the
>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>> it? The pattern I have seen for this in the past also has thread safety
>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>> collection is not safe since the collection can be executed concurrently)
>>> #2 - Would BigqueryMatcher be a good example for this? which is used in
>>> BigQueryTornadoesIT.java Or is there another example you would suggest
>>> looking at for reference?
>>>
>>>    - I guess to this you need to implement the SerializableMatcher
>>>    interface and use the matcher as an option in the pipeline options.
>>>
>>>
>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Some options:
>>>> - You could wait to assert until after p.waitForFinish().
>>>> - You could PAssert using SerializableMatcher and allow any
>>>> lastModifiedTime.
>>>>
>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com> wrote:
>>>>
>>>>> +Jeff, Eugene,
>>>>>
>>>>> Hi Jeff and Eugene,
>>>>>
>>>>> I've noticed that Jeff's PR
>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>>> test check in a thread safe way. I believe this to be the source of the
>>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>>> test)?
>>>>>
>>>>> I added some details to this JIRA issue explaining in full
>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com> wrote:
>>>>>
>>>>>> I've seen this fail in a few different PRs for different
>>>>>> contributors, and its causing some issues during the presubmit process..
>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>>>>> suspicious as the source of the problem.
>>>>>>
>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>
>>>>>> I filed a JIRA for this issue:
>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>
>>>>>>
>>>>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Eugene Kirpichov <ki...@google.com>.
Yeah the "List<MatchResult.Metadata> expected" is constructed
from Files.getLastModifiedTime() calls before the files are actually
modified, the code is basically unconditionally broken rather than merely
flaky.

There's several easy options:
1) Use PAssert.that().satisfies() instead of .contains(), and use
assertThat().contains() inside that, with the list constructed at time the
assertion is applied rather than declared.
2) Implement a Matcher<Metadata> that ignores last modified time and use
that

Jeff - your option #3 is unfortunately also race-prone, because the code
may match the files after they have been written but before
setLastModifiedTime was called.

On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <je...@klukas.net> wrote:

> Another option:
>
> #3 Have the writer thread call Files.setLastModifiedTime explicitly after
> each File.write. Then the lastModifiedMillis can be a stable value for each
> file and we can use those same static values in our expected result. I
> think that would also eliminate the race condition.
>
> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <aj...@google.com> wrote:
>
>> Thanks Udi, is there a good example for either of these?
>> #1 - seems like you have to rewrite your assertion logic without the
>> PAssert? Is there some way to capture the pipeline output and iterate over
>> it? The pattern I have seen for this in the past also has thread safety
>> issues (Using a DoFn at the end of the pipeline to add the output to a
>> collection is not safe since the collection can be executed concurrently)
>> #2 - Would BigqueryMatcher be a good example for this? which is used in
>> BigQueryTornadoesIT.java Or is there another example you would suggest
>> looking at for reference?
>>
>>    - I guess to this you need to implement the SerializableMatcher
>>    interface and use the matcher as an option in the pipeline options.
>>
>>
>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Some options:
>>> - You could wait to assert until after p.waitForFinish().
>>> - You could PAssert using SerializableMatcher and allow any
>>> lastModifiedTime.
>>>
>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com> wrote:
>>>
>>>> +Jeff, Eugene,
>>>>
>>>> Hi Jeff and Eugene,
>>>>
>>>> I've noticed that Jeff's PR
>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>>>> a race condition in this test, but its not clear exactly how to add Jeff's
>>>> test check in a thread safe way. I believe this to be the source of the
>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>> test)?
>>>>
>>>> I added some details to this JIRA issue explaining in full
>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>
>>>>
>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com> wrote:
>>>>
>>>>> I've seen this fail in a few different PRs for different contributors,
>>>>> and its causing some issues during the presubmit process.. This is a
>>>>> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
>>>>> the source of the problem.
>>>>>
>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>
>>>>> I filed a JIRA for this issue:
>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>
>>>>>
>>>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Alex Amato <aj...@google.com>.
Thanks Udi, is there a good example for either of these?
#1 - seems like you have to rewrite your assertion logic without the
PAssert? Is there some way to capture the pipeline output and iterate over
it? The pattern I have seen for this in the past also has thread safety
issues (Using a DoFn at the end of the pipeline to add the output to a
collection is not safe since the collection can be executed concurrently)
#2 - Would BigqueryMatcher be a good example for this? which is used in
BigQueryTornadoesIT.java Or is there another example you would suggest
looking at for reference?

   - I guess to this you need to implement the SerializableMatcher
   interface and use the matcher as an option in the pipeline options.


On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:

> Some options:
> - You could wait to assert until after p.waitForFinish().
> - You could PAssert using SerializableMatcher and allow any
> lastModifiedTime.
>
> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com> wrote:
>
>> +Jeff, Eugene,
>>
>> Hi Jeff and Eugene,
>>
>> I've noticed that Jeff's PR
>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
>> a race condition in this test, but its not clear exactly how to add Jeff's
>> test check in a thread safe way. I believe this to be the source of the
>> flakeyness Do you have any suggestions Eugene (since you authored this
>> test)?
>>
>> I added some details to this JIRA issue explaining in full
>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>
>>
>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com> wrote:
>>
>>> I've seen this fail in a few different PRs for different contributors,
>>> and its causing some issues during the presubmit process.. This is a
>>> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
>>> the source of the problem.
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>
>>> I filed a JIRA for this issue:
>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>
>>>
>>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Udi Meiri <eh...@google.com>.
Some options:
- You could wait to assert until after p.waitForFinish().
- You could PAssert using SerializableMatcher and allow any
lastModifiedTime.

On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <aj...@google.com> wrote:

> +Jeff, Eugene,
>
> Hi Jeff and Eugene,
>
> I've noticed that Jeff's PR
> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> introduced
> a race condition in this test, but its not clear exactly how to add Jeff's
> test check in a thread safe way. I believe this to be the source of the
> flakeyness Do you have any suggestions Eugene (since you authored this
> test)?
>
> I added some details to this JIRA issue explaining in full
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com> wrote:
>
>> I've seen this fail in a few different PRs for different contributors,
>> and its causing some issues during the presubmit process.. This is a
>> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
>> the source of the problem.
>>
>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>
>> I filed a JIRA for this issue:
>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>
>>
>>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Alex Amato <aj...@google.com>.
+Jeff, Eugene,

Hi Jeff and Eugene,

I've noticed that Jeff's PR
<https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884>
introduced
a race condition in this test, but its not clear exactly how to add Jeff's
test check in a thread safe way. I believe this to be the source of the
flakeyness Do you have any suggestions Eugene (since you authored this
test)?

I added some details to this JIRA issue explaining in full
https://jira.apache.org/jira/browse/BEAM-6491?filter=-2


On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <aj...@google.com> wrote:

> I've seen this fail in a few different PRs for different contributors, and
> its causing some issues during the presubmit process.. This is a
> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
> the source of the problem.
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>
> I filed a JIRA for this issue:
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

Posted by Ruoyun Huang <ru...@google.com>.
+1  getting the same issue as well.

I saw there were @Ignore on those tests before. If it is not critical and
just caused by the way how we do test, does it make sense to put those
@Ignores back until it's resolved?


On Tue, Jan 22, 2019 at 3:35 PM Alex Amato <aj...@google.com> wrote:

> I've seen this fail in a few different PRs for different contributors, and
> its causing some issues during the presubmit process.. This is a
> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
> the source of the problem.
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>
> I filed a JIRA for this issue:
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
>

-- 
================
Ruoyun  Huang