You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by NicoK <gi...@git.apache.org> on 2017/01/26 14:12:35 UTC

[GitHub] flink pull request #3218: [FLINK-5642][query] fix a race condition with Head...

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/3218

    [FLINK-5642][query] fix a race condition with HeadListState

    The idiom behind `AppendingState#get()` is to return a copy of the value behind or at least not to allow changes to the underlying state storage. However, the heap state backend returns the original `ArrayList` which is not thread-safe. In contrast to the operator/window evictor thread where only one accesses the state at a time, queryable state may access state any time in order not to slow down normal operation. Any structural changes to `ArrayList` are thus unsafe which is why this PR:
    * synchronizes access to structure-changing methods on that list (only for queryable state),
    * forbids `ArrayList#remove()` (only for queryable state) that is available through `Iterator#remove()` which is the only structural change the API offers on the `Iterable` that `HeapListState#get()` returns.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-5642

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3218
    
----
commit 542432d80ae79f773fe08d533222e71aee4c7da8
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-01-25T15:07:50Z

    [FLINK-5642][query] fix a race condition with HeadListState
    
    The idiom behind AppendingState#get() is to return a copy of the value behind
    or at least not to allow changes to the underlying state storage. However,
    the heap state backend returns the original list which is backed by an
    ArrayList which is not thread-safe. Aside from the operator/window evictor
    thread where only one accesses the state at a time, however, queryable state
    may access state anytime in order not to slow down normal operation. Any
    structural changes to ArrayList are thus unsafe and are hereby synchronized
    in case the state is queryable.

commit 641c2c9f46be4f4171a33923b1f1dc961464575f
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-01-25T15:04:15Z

    [FLINK-5642] fail when calling Iterator#remove() on queryable list state returned from HeapListState#get()
    
    The idiom behind AppendingState#get() is to return a copy of the value behind
    or at least not to allow changes to the underlying state storage. However,
    the heap state backend returns the original list and thus is prone to changes.
    The user cannot rely on changes to be reflected by the backing store but, if
    correctly used, e.g. by clearing the list and re-adding all elements afterwards,
    changes may still be ok.
    
    However, in conjunction with queryable state, any structural changes to the
    backing ArrayList lead to races (as may changes to the stored objects but we
    cannot forbid that for now). By forbidding ArrayList#remove(), we can at least
    forbid Iterator#remove() which is the only structural change the API offers
    on the Iterable that HeapListState#get() returns.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3218: [FLINK-5642][query] fix a race condition with HeadListSta...

Posted by NicoK <gi...@git.apache.org>.
Github user NicoK commented on the issue:

    https://github.com/apache/flink/pull/3218
  
    ok, let's close this PR as the issue is actually deeper than originally though and can only be fixed with a new heap state backend or by locking for queryable state queries as well


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3218: [FLINK-5642][query] fix a race condition with Head...

Posted by NicoK <gi...@git.apache.org>.
Github user NicoK closed the pull request at:

    https://github.com/apache/flink/pull/3218


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3218: [FLINK-5642][query] fix a race condition with HeadListSta...

Posted by NicoK <gi...@git.apache.org>.
Github user NicoK commented on the issue:

    https://github.com/apache/flink/pull/3218
  
    1. Actually, RocksDB state's get() method has the idiom of returning a (deserialized) **copy** with which the user can do whatever he likes to, knowing that changes are not reflected in the state back-end. This is also what the window evictors rely on, e.g. `TimeEvictor#evict()`: it uses list state and iterates over the elements removing the ones it wants to remove. Later on, `EvictingWindowOperator#emitWindowContents()` clears the list state and (re-)adds all remaining elements of that iterator back to the list state. Forbidding `remove()` for all cases would require a lot of refactoring there with potential affects in user-defined evictors, too. Since window operators are not queryable though, that was the next-best and least-invasive solution since returning an actual copy in the heap list state implementation was also not desired.
    
    2. Sounds nice, I'll do some changes to create a version without the code branch using a specialised class for queryable list state. The branching then moves to `HeapKeyedStateBackend#createListState` where it is only called once per creation.
    
    2. The specialised `ArrayList` implementation sounds like a nice idea but unfortunately, the window evictors' use mentioned above does not follow the assumption of an ever-growing list and makes things more complicated again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3218: [FLINK-5642][query] fix a race condition with HeadListSta...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3218
  
    Good catch for that problem.
    
    I would suggest two changes:
      1. Let's forbid `remove()` in all cases, also when the state is non-queryable. It seems inconsistent that the heap-state allows modifications via the iterable, while the RocksDB state does not.
    
      2. It would be nice to get rid of the locking (or the code branch) in the `add()` method:
        - A simple approach is to override `add()` in the queryable state version of the list and synchronize there.
        - A more advanced idea: We may get rid of the locking on the ArrayList alltogether by implementing our own specialized list: Since the list ever only grows (a clear call removes the list as a whole from the map in the heap state) creating a serialized copy means simply taking the list and taking all elements from the list up to the size that are not null (to support `null` we can use a null-marker element). Taking the elements from the list needs to be conservative (take the lower, size or nun-null entries) to compensate for visibility issues across the threads.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---