You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Jan Lukavský <je...@seznam.cz> on 2019/05/09 14:42:34 UTC

Unexpected behavior of StateSpecs

Hi,

I have spent several hour digging into strange issue with DirectRunner, 
that manifested as non-deterministic run of pipeline. The pipeline 
contains basically only single stateful ParDo, which adds elements into 
state and after some timeout flushes these elements into output. The 
issues was, that sometimes (very often) when the timer fired, the state 
appeared to be empty, although I actually added something into the 
state. I will skip details, but the problem boils down to the fact, that 
StateSpecs hash Coder into hashCode - e.g.

     @Override
     public int hashCode() {
       return Objects.hash(getClass(), coder);
     }

in ValueStateSpec. Now, when Coder doesn't have hashCode and equals 
implemented (and there are some of those in the codebase itself - e.g. 
SchemaCoder), it all blows up in a very hard-to-debug manner. So the 
proposal is - either to add abstract hashCode and equals to Coder, or 
don't hash the Coder into hashCode of StateSpecs (we can generate unique 
ID for each StateSpec instance for example).

Any thoughts about which path to follow? Or maybe both? :)

Jan



Re: Unexpected behavior of StateSpecs

Posted by Jan Lukavský <je...@seznam.cz>.
Hi Anton,

yes, if the keyCoder doesn't have proper hashCode and equals, then it 
would manifest exactly as described.

Jan

On 5/9/19 6:28 PM, Anton Kedin wrote:
> Does it look similar to 
> https://issues.apache.org/jira/browse/BEAM-6813 ? I also stumbled on a 
> problem with a state in DirectRunner but wasn't able to figure it out 
> yet: 
> https://lists.apache.org/thread.html/dae8b605a218532c085a0eea4e71338eae51922c26820f37b24875c0@%3Cdev.beam.apache.org%3E
>
> Regards,
> Anton
>
> *From: *Jan Lukavský <je.ik@seznam.cz <ma...@seznam.cz>>
> *Date: *Thu, May 9, 2019 at 8:13 AM
> *To: * <dev@beam.apache.org <ma...@beam.apache.org>>
>
>     Because of the use of hashCode in StateSpecs, I'd say that it is.
>     But it is not obvious. That's why I'd suggest to make it abstract
>     on Coder, so that all implementations have to override it. That's
>     a simple solution, but the question is - should hashCode of Coder
>     be used that way? I think that StateSpec instances should be equal
>     only to itself. Then the hashCode can be stored in the instance, e.g.
>
>       private final int hashCode = System.identityHashCode(this)
>
>     and returned in hashCode(). There would be no need for Coder to
>     implement hashCode anymore (if there aren't any other cases, where
>     it is needed, in which case it would still be better to add
>     abstract hashCode and equals methods on Coder).
>
>     Jan
>
>     On 5/9/19 5:04 PM, Reuven Lax wrote:
>>     Is a valid hashCode on Coder part of our contract or not? If it
>>     is, then the lack of hashCode on SchemaCoder is simply a bug.
>>
>>     On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je.ik@seznam.cz
>>     <ma...@seznam.cz>> wrote:
>>
>>         Hi,
>>
>>         I have spent several hour digging into strange issue with
>>         DirectRunner,
>>         that manifested as non-deterministic run of pipeline. The
>>         pipeline
>>         contains basically only single stateful ParDo, which adds
>>         elements into
>>         state and after some timeout flushes these elements into
>>         output. The
>>         issues was, that sometimes (very often) when the timer fired,
>>         the state
>>         appeared to be empty, although I actually added something
>>         into the
>>         state. I will skip details, but the problem boils down to the
>>         fact, that
>>         StateSpecs hash Coder into hashCode - e.g.
>>
>>              @Override
>>              public int hashCode() {
>>                return Objects.hash(getClass(), coder);
>>              }
>>
>>         in ValueStateSpec. Now, when Coder doesn't have hashCode and
>>         equals
>>         implemented (and there are some of those in the codebase
>>         itself - e.g.
>>         SchemaCoder), it all blows up in a very hard-to-debug manner.
>>         So the
>>         proposal is - either to add abstract hashCode and equals to
>>         Coder, or
>>         don't hash the Coder into hashCode of StateSpecs (we can
>>         generate unique
>>         ID for each StateSpec instance for example).
>>
>>         Any thoughts about which path to follow? Or maybe both? :)
>>
>>         Jan
>>
>>

Re: Unexpected behavior of StateSpecs

Posted by Anton Kedin <ke...@google.com>.
Does it look similar to https://issues.apache.org/jira/browse/BEAM-6813 ? I
also stumbled on a problem with a state in DirectRunner but wasn't able to
figure it out yet:
https://lists.apache.org/thread.html/dae8b605a218532c085a0eea4e71338eae51922c26820f37b24875c0@%3Cdev.beam.apache.org%3E

Regards,
Anton

*From: *Jan Lukavský <je...@seznam.cz>
*Date: *Thu, May 9, 2019 at 8:13 AM
*To: * <de...@beam.apache.org>

Because of the use of hashCode in StateSpecs, I'd say that it is. But it is
> not obvious. That's why I'd suggest to make it abstract on Coder, so that
> all implementations have to override it. That's a simple solution, but the
> question is - should hashCode of Coder be used that way? I think that
> StateSpec instances should be equal only to itself. Then the hashCode can
> be stored in the instance, e.g.
>
>   private final int hashCode = System.identityHashCode(this)
>
> and returned in hashCode(). There would be no need for Coder to implement
> hashCode anymore (if there aren't any other cases, where it is needed, in
> which case it would still be better to add abstract hashCode and equals
> methods on Coder).
>
> Jan
> On 5/9/19 5:04 PM, Reuven Lax wrote:
>
> Is a valid hashCode on Coder part of our contract or not? If it is, then
> the lack of hashCode on SchemaCoder is simply a bug.
>
> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Hi,
>>
>> I have spent several hour digging into strange issue with DirectRunner,
>> that manifested as non-deterministic run of pipeline. The pipeline
>> contains basically only single stateful ParDo, which adds elements into
>> state and after some timeout flushes these elements into output. The
>> issues was, that sometimes (very often) when the timer fired, the state
>> appeared to be empty, although I actually added something into the
>> state. I will skip details, but the problem boils down to the fact, that
>> StateSpecs hash Coder into hashCode - e.g.
>>
>>      @Override
>>      public int hashCode() {
>>        return Objects.hash(getClass(), coder);
>>      }
>>
>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>> implemented (and there are some of those in the codebase itself - e.g.
>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>> proposal is - either to add abstract hashCode and equals to Coder, or
>> don't hash the Coder into hashCode of StateSpecs (we can generate unique
>> ID for each StateSpec instance for example).
>>
>> Any thoughts about which path to follow? Or maybe both? :)
>>
>> Jan
>>
>>
>>

Re: Unexpected behavior of StateSpecs

Posted by Jan Lukavský <je...@seznam.cz>.
Hi Lukasz,

I've created JIRA issue [1] and PR [2].

Jan

[1] https://issues.apache.org/jira/browse/BEAM-7269

[2] https://github.com/apache/beam/pull/8555

On 5/10/19 7:39 PM, Lukasz Cwik wrote:
> That seems like the correct fix as well. We could open up a PR and see 
> what the tests catch as a first pass for understanding the implications.
>
> On Fri, May 10, 2019 at 9:31 AM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     Hm, yes, the fix might be also in fixing hashCode and equals of
>     SimpleStateTag, so that it doesn't hash and compare the StateSpec,
>     but only the StructureId. That looks like best option to me. But
>     I'm not sure about other implications this might have.
>
>     Jan
>
>     On 5/10/19 5:43 PM, Reuven Lax wrote:
>>     Ok so this sounds like a bug in the DirectRunner then?
>>
>>     *From: *Lukasz Cwik <lcwik@google.com <ma...@google.com>>
>>     *Date: *Fri, May 10, 2019 at 8:38 AM
>>     *To: *dev
>>
>>         StateSpec should not be used as a key within any maps. We
>>         should use the logical name of the StateSpec relative to the
>>         DoFn as its id and should only be using that id for
>>         comparisons/lookups.
>>
>>         On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je.ik@seznam.cz
>>         <ma...@seznam.cz>> wrote:
>>
>>             I'm not sure. Generally it affects any runner that uses
>>             HashMap to store StateSpec.
>>
>>             Jan
>>
>>             On 5/9/19 6:32 PM, Reuven Lax wrote:
>>>             Is this specific to the DirectRunner, or does it affect
>>>             other runners?
>>>
>>>             On Thu, May 9, 2019 at 8:13 AM Jan Lukavský
>>>             <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
>>>
>>>                 Because of the use of hashCode in StateSpecs, I'd
>>>                 say that it is. But it is not obvious. That's why
>>>                 I'd suggest to make it abstract on Coder, so that
>>>                 all implementations have to override it. That's a
>>>                 simple solution, but the question is - should
>>>                 hashCode of Coder be used that way? I think that
>>>                 StateSpec instances should be equal only to itself.
>>>                 Then the hashCode can be stored in the instance, e.g.
>>>
>>>                   private final int hashCode =
>>>                 System.identityHashCode(this)
>>>
>>>                 and returned in hashCode(). There would be no need
>>>                 for Coder to implement hashCode anymore (if there
>>>                 aren't any other cases, where it is needed, in which
>>>                 case it would still be better to add abstract
>>>                 hashCode and equals methods on Coder).
>>>
>>>                 Jan
>>>
>>>                 On 5/9/19 5:04 PM, Reuven Lax wrote:
>>>>                 Is a valid hashCode on Coder part of our contract
>>>>                 or not? If it is, then the lack of hashCode on
>>>>                 SchemaCoder is simply a bug.
>>>>
>>>>                 On Thu, May 9, 2019 at 7:42 AM Jan Lukavský
>>>>                 <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
>>>>
>>>>                     Hi,
>>>>
>>>>                     I have spent several hour digging into strange
>>>>                     issue with DirectRunner,
>>>>                     that manifested as non-deterministic run of
>>>>                     pipeline. The pipeline
>>>>                     contains basically only single stateful ParDo,
>>>>                     which adds elements into
>>>>                     state and after some timeout flushes these
>>>>                     elements into output. The
>>>>                     issues was, that sometimes (very often) when
>>>>                     the timer fired, the state
>>>>                     appeared to be empty, although I actually added
>>>>                     something into the
>>>>                     state. I will skip details, but the problem
>>>>                     boils down to the fact, that
>>>>                     StateSpecs hash Coder into hashCode - e.g.
>>>>
>>>>                          @Override
>>>>                          public int hashCode() {
>>>>                            return Objects.hash(getClass(), coder);
>>>>                          }
>>>>
>>>>                     in ValueStateSpec. Now, when Coder doesn't have
>>>>                     hashCode and equals
>>>>                     implemented (and there are some of those in the
>>>>                     codebase itself - e.g.
>>>>                     SchemaCoder), it all blows up in a very
>>>>                     hard-to-debug manner. So the
>>>>                     proposal is - either to add abstract hashCode
>>>>                     and equals to Coder, or
>>>>                     don't hash the Coder into hashCode of
>>>>                     StateSpecs (we can generate unique
>>>>                     ID for each StateSpec instance for example).
>>>>
>>>>                     Any thoughts about which path to follow? Or
>>>>                     maybe both? :)
>>>>
>>>>                     Jan
>>>>
>>>>

Re: Unexpected behavior of StateSpecs

Posted by Lukasz Cwik <lc...@google.com>.
That seems like the correct fix as well. We could open up a PR and see what
the tests catch as a first pass for understanding the implications.

On Fri, May 10, 2019 at 9:31 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hm, yes, the fix might be also in fixing hashCode and equals of
> SimpleStateTag, so that it doesn't hash and compare the StateSpec, but only
> the StructureId. That looks like best option to me. But I'm not sure about
> other implications this might have.
>
> Jan
> On 5/10/19 5:43 PM, Reuven Lax wrote:
>
> Ok so this sounds like a bug in the DirectRunner then?
>
> *From: *Lukasz Cwik <lc...@google.com>
> *Date: *Fri, May 10, 2019 at 8:38 AM
> *To: *dev
>
> StateSpec should not be used as a key within any maps. We should use the
>> logical name of the StateSpec relative to the DoFn as its id and should
>> only be using that id for comparisons/lookups.
>>
>> On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> I'm not sure. Generally it affects any runner that uses HashMap to store
>>> StateSpec.
>>>
>>> Jan
>>> On 5/9/19 6:32 PM, Reuven Lax wrote:
>>>
>>> Is this specific to the DirectRunner, or does it affect other runners?
>>>
>>> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Because of the use of hashCode in StateSpecs, I'd say that it is. But
>>>> it is not obvious. That's why I'd suggest to make it abstract on Coder, so
>>>> that all implementations have to override it. That's a simple solution, but
>>>> the question is - should hashCode of Coder be used that way? I think that
>>>> StateSpec instances should be equal only to itself. Then the hashCode can
>>>> be stored in the instance, e.g.
>>>>
>>>>   private final int hashCode = System.identityHashCode(this)
>>>>
>>>> and returned in hashCode(). There would be no need for Coder to
>>>> implement hashCode anymore (if there aren't any other cases, where it is
>>>> needed, in which case it would still be better to add abstract hashCode and
>>>> equals methods on Coder).
>>>>
>>>> Jan
>>>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>>>
>>>> Is a valid hashCode on Coder part of our contract or not? If it is,
>>>> then the lack of hashCode on SchemaCoder is simply a bug.
>>>>
>>>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have spent several hour digging into strange issue with
>>>>> DirectRunner,
>>>>> that manifested as non-deterministic run of pipeline. The pipeline
>>>>> contains basically only single stateful ParDo, which adds elements
>>>>> into
>>>>> state and after some timeout flushes these elements into output. The
>>>>> issues was, that sometimes (very often) when the timer fired, the
>>>>> state
>>>>> appeared to be empty, although I actually added something into the
>>>>> state. I will skip details, but the problem boils down to the fact,
>>>>> that
>>>>> StateSpecs hash Coder into hashCode - e.g.
>>>>>
>>>>>      @Override
>>>>>      public int hashCode() {
>>>>>        return Objects.hash(getClass(), coder);
>>>>>      }
>>>>>
>>>>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>>>>> implemented (and there are some of those in the codebase itself - e.g.
>>>>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>>>>> proposal is - either to add abstract hashCode and equals to Coder, or
>>>>> don't hash the Coder into hashCode of StateSpecs (we can generate
>>>>> unique
>>>>> ID for each StateSpec instance for example).
>>>>>
>>>>> Any thoughts about which path to follow? Or maybe both? :)
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>>

Re: Unexpected behavior of StateSpecs

Posted by Jan Lukavský <je...@seznam.cz>.
Hm, yes, the fix might be also in fixing hashCode and equals of 
SimpleStateTag, so that it doesn't hash and compare the StateSpec, but 
only the StructureId. That looks like best option to me. But I'm not 
sure about other implications this might have.

Jan

On 5/10/19 5:43 PM, Reuven Lax wrote:
> Ok so this sounds like a bug in the DirectRunner then?
>
> *From: *Lukasz Cwik <lcwik@google.com <ma...@google.com>>
> *Date: *Fri, May 10, 2019 at 8:38 AM
> *To: *dev
>
>     StateSpec should not be used as a key within any maps. We should
>     use the logical name of the StateSpec relative to the DoFn as its
>     id and should only be using that id for comparisons/lookups.
>
>     On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je.ik@seznam.cz
>     <ma...@seznam.cz>> wrote:
>
>         I'm not sure. Generally it affects any runner that uses
>         HashMap to store StateSpec.
>
>         Jan
>
>         On 5/9/19 6:32 PM, Reuven Lax wrote:
>>         Is this specific to the DirectRunner, or does it affect other
>>         runners?
>>
>>         On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je.ik@seznam.cz
>>         <ma...@seznam.cz>> wrote:
>>
>>             Because of the use of hashCode in StateSpecs, I'd say
>>             that it is. But it is not obvious. That's why I'd suggest
>>             to make it abstract on Coder, so that all implementations
>>             have to override it. That's a simple solution, but the
>>             question is - should hashCode of Coder be used that way?
>>             I think that StateSpec instances should be equal only to
>>             itself. Then the hashCode can be stored in the instance, e.g.
>>
>>               private final int hashCode = System.identityHashCode(this)
>>
>>             and returned in hashCode(). There would be no need for
>>             Coder to implement hashCode anymore (if there aren't any
>>             other cases, where it is needed, in which case it would
>>             still be better to add abstract hashCode and equals
>>             methods on Coder).
>>
>>             Jan
>>
>>             On 5/9/19 5:04 PM, Reuven Lax wrote:
>>>             Is a valid hashCode on Coder part of our contract or
>>>             not? If it is, then the lack of hashCode on SchemaCoder
>>>             is simply a bug.
>>>
>>>             On Thu, May 9, 2019 at 7:42 AM Jan Lukavský
>>>             <je.ik@seznam.cz <ma...@seznam.cz>> wrote:
>>>
>>>                 Hi,
>>>
>>>                 I have spent several hour digging into strange issue
>>>                 with DirectRunner,
>>>                 that manifested as non-deterministic run of
>>>                 pipeline. The pipeline
>>>                 contains basically only single stateful ParDo, which
>>>                 adds elements into
>>>                 state and after some timeout flushes these elements
>>>                 into output. The
>>>                 issues was, that sometimes (very often) when the
>>>                 timer fired, the state
>>>                 appeared to be empty, although I actually added
>>>                 something into the
>>>                 state. I will skip details, but the problem boils
>>>                 down to the fact, that
>>>                 StateSpecs hash Coder into hashCode - e.g.
>>>
>>>                      @Override
>>>                      public int hashCode() {
>>>                        return Objects.hash(getClass(), coder);
>>>                      }
>>>
>>>                 in ValueStateSpec. Now, when Coder doesn't have
>>>                 hashCode and equals
>>>                 implemented (and there are some of those in the
>>>                 codebase itself - e.g.
>>>                 SchemaCoder), it all blows up in a very
>>>                 hard-to-debug manner. So the
>>>                 proposal is - either to add abstract hashCode and
>>>                 equals to Coder, or
>>>                 don't hash the Coder into hashCode of StateSpecs (we
>>>                 can generate unique
>>>                 ID for each StateSpec instance for example).
>>>
>>>                 Any thoughts about which path to follow? Or maybe
>>>                 both? :)
>>>
>>>                 Jan
>>>
>>>

Re: Unexpected behavior of StateSpecs

Posted by Reuven Lax <re...@google.com>.
Ok so this sounds like a bug in the DirectRunner then?

*From: *Lukasz Cwik <lc...@google.com>
*Date: *Fri, May 10, 2019 at 8:38 AM
*To: *dev

StateSpec should not be used as a key within any maps. We should use the
> logical name of the StateSpec relative to the DoFn as its id and should
> only be using that id for comparisons/lookups.
>
> On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> I'm not sure. Generally it affects any runner that uses HashMap to store
>> StateSpec.
>>
>> Jan
>> On 5/9/19 6:32 PM, Reuven Lax wrote:
>>
>> Is this specific to the DirectRunner, or does it affect other runners?
>>
>> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
>>> is not obvious. That's why I'd suggest to make it abstract on Coder, so
>>> that all implementations have to override it. That's a simple solution, but
>>> the question is - should hashCode of Coder be used that way? I think that
>>> StateSpec instances should be equal only to itself. Then the hashCode can
>>> be stored in the instance, e.g.
>>>
>>>   private final int hashCode = System.identityHashCode(this)
>>>
>>> and returned in hashCode(). There would be no need for Coder to
>>> implement hashCode anymore (if there aren't any other cases, where it is
>>> needed, in which case it would still be better to add abstract hashCode and
>>> equals methods on Coder).
>>>
>>> Jan
>>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>>
>>> Is a valid hashCode on Coder part of our contract or not? If it is, then
>>> the lack of hashCode on SchemaCoder is simply a bug.
>>>
>>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have spent several hour digging into strange issue with DirectRunner,
>>>> that manifested as non-deterministic run of pipeline. The pipeline
>>>> contains basically only single stateful ParDo, which adds elements into
>>>> state and after some timeout flushes these elements into output. The
>>>> issues was, that sometimes (very often) when the timer fired, the state
>>>> appeared to be empty, although I actually added something into the
>>>> state. I will skip details, but the problem boils down to the fact,
>>>> that
>>>> StateSpecs hash Coder into hashCode - e.g.
>>>>
>>>>      @Override
>>>>      public int hashCode() {
>>>>        return Objects.hash(getClass(), coder);
>>>>      }
>>>>
>>>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>>>> implemented (and there are some of those in the codebase itself - e.g.
>>>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>>>> proposal is - either to add abstract hashCode and equals to Coder, or
>>>> don't hash the Coder into hashCode of StateSpecs (we can generate
>>>> unique
>>>> ID for each StateSpec instance for example).
>>>>
>>>> Any thoughts about which path to follow? Or maybe both? :)
>>>>
>>>> Jan
>>>>
>>>>
>>>>

Re: Unexpected behavior of StateSpecs

Posted by Lukasz Cwik <lc...@google.com>.
StateSpec should not be used as a key within any maps. We should use the
logical name of the StateSpec relative to the DoFn as its id and should
only be using that id for comparisons/lookups.

On Fri, May 10, 2019 at 1:07 AM Jan Lukavský <je...@seznam.cz> wrote:

> I'm not sure. Generally it affects any runner that uses HashMap to store
> StateSpec.
>
> Jan
> On 5/9/19 6:32 PM, Reuven Lax wrote:
>
> Is this specific to the DirectRunner, or does it affect other runners?
>
> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
>> is not obvious. That's why I'd suggest to make it abstract on Coder, so
>> that all implementations have to override it. That's a simple solution, but
>> the question is - should hashCode of Coder be used that way? I think that
>> StateSpec instances should be equal only to itself. Then the hashCode can
>> be stored in the instance, e.g.
>>
>>   private final int hashCode = System.identityHashCode(this)
>>
>> and returned in hashCode(). There would be no need for Coder to implement
>> hashCode anymore (if there aren't any other cases, where it is needed, in
>> which case it would still be better to add abstract hashCode and equals
>> methods on Coder).
>>
>> Jan
>> On 5/9/19 5:04 PM, Reuven Lax wrote:
>>
>> Is a valid hashCode on Coder part of our contract or not? If it is, then
>> the lack of hashCode on SchemaCoder is simply a bug.
>>
>> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi,
>>>
>>> I have spent several hour digging into strange issue with DirectRunner,
>>> that manifested as non-deterministic run of pipeline. The pipeline
>>> contains basically only single stateful ParDo, which adds elements into
>>> state and after some timeout flushes these elements into output. The
>>> issues was, that sometimes (very often) when the timer fired, the state
>>> appeared to be empty, although I actually added something into the
>>> state. I will skip details, but the problem boils down to the fact, that
>>> StateSpecs hash Coder into hashCode - e.g.
>>>
>>>      @Override
>>>      public int hashCode() {
>>>        return Objects.hash(getClass(), coder);
>>>      }
>>>
>>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>>> implemented (and there are some of those in the codebase itself - e.g.
>>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>>> proposal is - either to add abstract hashCode and equals to Coder, or
>>> don't hash the Coder into hashCode of StateSpecs (we can generate unique
>>> ID for each StateSpec instance for example).
>>>
>>> Any thoughts about which path to follow? Or maybe both? :)
>>>
>>> Jan
>>>
>>>
>>>

Re: Unexpected behavior of StateSpecs

Posted by Jan Lukavský <je...@seznam.cz>.
I'm not sure. Generally it affects any runner that uses HashMap to store 
StateSpec.

Jan

On 5/9/19 6:32 PM, Reuven Lax wrote:
> Is this specific to the DirectRunner, or does it affect other runners?
>
> On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     Because of the use of hashCode in StateSpecs, I'd say that it is.
>     But it is not obvious. That's why I'd suggest to make it abstract
>     on Coder, so that all implementations have to override it. That's
>     a simple solution, but the question is - should hashCode of Coder
>     be used that way? I think that StateSpec instances should be equal
>     only to itself. Then the hashCode can be stored in the instance, e.g.
>
>       private final int hashCode = System.identityHashCode(this)
>
>     and returned in hashCode(). There would be no need for Coder to
>     implement hashCode anymore (if there aren't any other cases, where
>     it is needed, in which case it would still be better to add
>     abstract hashCode and equals methods on Coder).
>
>     Jan
>
>     On 5/9/19 5:04 PM, Reuven Lax wrote:
>>     Is a valid hashCode on Coder part of our contract or not? If it
>>     is, then the lack of hashCode on SchemaCoder is simply a bug.
>>
>>     On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je.ik@seznam.cz
>>     <ma...@seznam.cz>> wrote:
>>
>>         Hi,
>>
>>         I have spent several hour digging into strange issue with
>>         DirectRunner,
>>         that manifested as non-deterministic run of pipeline. The
>>         pipeline
>>         contains basically only single stateful ParDo, which adds
>>         elements into
>>         state and after some timeout flushes these elements into
>>         output. The
>>         issues was, that sometimes (very often) when the timer fired,
>>         the state
>>         appeared to be empty, although I actually added something
>>         into the
>>         state. I will skip details, but the problem boils down to the
>>         fact, that
>>         StateSpecs hash Coder into hashCode - e.g.
>>
>>              @Override
>>              public int hashCode() {
>>                return Objects.hash(getClass(), coder);
>>              }
>>
>>         in ValueStateSpec. Now, when Coder doesn't have hashCode and
>>         equals
>>         implemented (and there are some of those in the codebase
>>         itself - e.g.
>>         SchemaCoder), it all blows up in a very hard-to-debug manner.
>>         So the
>>         proposal is - either to add abstract hashCode and equals to
>>         Coder, or
>>         don't hash the Coder into hashCode of StateSpecs (we can
>>         generate unique
>>         ID for each StateSpec instance for example).
>>
>>         Any thoughts about which path to follow? Or maybe both? :)
>>
>>         Jan
>>
>>

Re: Unexpected behavior of StateSpecs

Posted by Reuven Lax <re...@google.com>.
Is this specific to the DirectRunner, or does it affect other runners?

On Thu, May 9, 2019 at 8:13 AM Jan Lukavský <je...@seznam.cz> wrote:

> Because of the use of hashCode in StateSpecs, I'd say that it is. But it
> is not obvious. That's why I'd suggest to make it abstract on Coder, so
> that all implementations have to override it. That's a simple solution, but
> the question is - should hashCode of Coder be used that way? I think that
> StateSpec instances should be equal only to itself. Then the hashCode can
> be stored in the instance, e.g.
>
>   private final int hashCode = System.identityHashCode(this)
>
> and returned in hashCode(). There would be no need for Coder to implement
> hashCode anymore (if there aren't any other cases, where it is needed, in
> which case it would still be better to add abstract hashCode and equals
> methods on Coder).
>
> Jan
> On 5/9/19 5:04 PM, Reuven Lax wrote:
>
> Is a valid hashCode on Coder part of our contract or not? If it is, then
> the lack of hashCode on SchemaCoder is simply a bug.
>
> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Hi,
>>
>> I have spent several hour digging into strange issue with DirectRunner,
>> that manifested as non-deterministic run of pipeline. The pipeline
>> contains basically only single stateful ParDo, which adds elements into
>> state and after some timeout flushes these elements into output. The
>> issues was, that sometimes (very often) when the timer fired, the state
>> appeared to be empty, although I actually added something into the
>> state. I will skip details, but the problem boils down to the fact, that
>> StateSpecs hash Coder into hashCode - e.g.
>>
>>      @Override
>>      public int hashCode() {
>>        return Objects.hash(getClass(), coder);
>>      }
>>
>> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>> implemented (and there are some of those in the codebase itself - e.g.
>> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>> proposal is - either to add abstract hashCode and equals to Coder, or
>> don't hash the Coder into hashCode of StateSpecs (we can generate unique
>> ID for each StateSpec instance for example).
>>
>> Any thoughts about which path to follow? Or maybe both? :)
>>
>> Jan
>>
>>
>>

Re: Unexpected behavior of StateSpecs

Posted by Jan Lukavský <je...@seznam.cz>.
Because of the use of hashCode in StateSpecs, I'd say that it is. But it 
is not obvious. That's why I'd suggest to make it abstract on Coder, so 
that all implementations have to override it. That's a simple solution, 
but the question is - should hashCode of Coder be used that way? I think 
that StateSpec instances should be equal only to itself. Then the 
hashCode can be stored in the instance, e.g.

   private final int hashCode = System.identityHashCode(this)

and returned in hashCode(). There would be no need for Coder to 
implement hashCode anymore (if there aren't any other cases, where it is 
needed, in which case it would still be better to add abstract hashCode 
and equals methods on Coder).

Jan

On 5/9/19 5:04 PM, Reuven Lax wrote:
> Is a valid hashCode on Coder part of our contract or not? If it is, 
> then the lack of hashCode on SchemaCoder is simply a bug.
>
> On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     Hi,
>
>     I have spent several hour digging into strange issue with
>     DirectRunner,
>     that manifested as non-deterministic run of pipeline. The pipeline
>     contains basically only single stateful ParDo, which adds elements
>     into
>     state and after some timeout flushes these elements into output. The
>     issues was, that sometimes (very often) when the timer fired, the
>     state
>     appeared to be empty, although I actually added something into the
>     state. I will skip details, but the problem boils down to the
>     fact, that
>     StateSpecs hash Coder into hashCode - e.g.
>
>          @Override
>          public int hashCode() {
>            return Objects.hash(getClass(), coder);
>          }
>
>     in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
>     implemented (and there are some of those in the codebase itself -
>     e.g.
>     SchemaCoder), it all blows up in a very hard-to-debug manner. So the
>     proposal is - either to add abstract hashCode and equals to Coder, or
>     don't hash the Coder into hashCode of StateSpecs (we can generate
>     unique
>     ID for each StateSpec instance for example).
>
>     Any thoughts about which path to follow? Or maybe both? :)
>
>     Jan
>
>

Re: Unexpected behavior of StateSpecs

Posted by Reuven Lax <re...@google.com>.
Is a valid hashCode on Coder part of our contract or not? If it is, then
the lack of hashCode on SchemaCoder is simply a bug.

On Thu, May 9, 2019 at 7:42 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hi,
>
> I have spent several hour digging into strange issue with DirectRunner,
> that manifested as non-deterministic run of pipeline. The pipeline
> contains basically only single stateful ParDo, which adds elements into
> state and after some timeout flushes these elements into output. The
> issues was, that sometimes (very often) when the timer fired, the state
> appeared to be empty, although I actually added something into the
> state. I will skip details, but the problem boils down to the fact, that
> StateSpecs hash Coder into hashCode - e.g.
>
>      @Override
>      public int hashCode() {
>        return Objects.hash(getClass(), coder);
>      }
>
> in ValueStateSpec. Now, when Coder doesn't have hashCode and equals
> implemented (and there are some of those in the codebase itself - e.g.
> SchemaCoder), it all blows up in a very hard-to-debug manner. So the
> proposal is - either to add abstract hashCode and equals to Coder, or
> don't hash the Coder into hashCode of StateSpecs (we can generate unique
> ID for each StateSpec instance for example).
>
> Any thoughts about which path to follow? Or maybe both? :)
>
> Jan
>
>
>