Posted to dev@flink.apache.org by "Kostas Kloudas (JIRA)" <ji...@apache.org> on 2017/10/06 08:31:03 UTC

[jira] [Created] (FLINK-7771) Make the operator state queryable

Kostas Kloudas created FLINK-7771:
-------------------------------------

Summary: Make the operator state queryable
Key: FLINK-7771
URL: https://issues.apache.org/jira/browse/FLINK-7771
Project: Flink
Issue Type: Improvement
Components: Queryable State
Affects Versions: 1.4.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas
Fix For: 1.4.0

There seem to be some requests to make operator (non-keyed) state queryable. This means that the user would specify the *uuid* of the operator and the *taskId*, and would then be able to access the state that corresponds to that operator for that specific task.

This issue will serve to document the discussion on the topic, so that everybody can participate.

Personally, I think that such a feature should wait until some aspects of state handling have stabilized (_e.g._ replication and checkpoint management). My main concerns have to do with the semantics and guarantees that such a feature could offer *for now*.

First, operator state is essentially a list state that can be reshuffled arbitrarily upon restoring or rescaling. This means that in a given execution attempt task1 may hold elements _A,B,C_, while after restoring (even without rescaling) it may hold _D,B,E_, without this implying that something happened to states _A_ and _C_. They were simply assigned to another task. This makes it hard to reason about the results that you get at any point in time, as it provides *no locality/consistency guarantees between executions*.
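
To make the reshuffling concrete, here is a toy Python sketch. It is not Flink's redistribution logic: the function `assign` and its rotate-by-attempt rule are made up for illustration and only stand in for an arbitrary reassignment of list elements on restore.

```python
def assign(elements, parallelism, attempt):
    """Toy model of list-state redistribution (hypothetical, not Flink code).

    The union of all elements is rotated by the attempt number and then
    dealt round-robin to the tasks, so the same task can hold different
    elements in different execution attempts.
    """
    offset = attempt % len(elements)
    rotated = elements[offset:] + elements[:offset]
    return [rotated[task::parallelism] for task in range(parallelism)]


elements = ["A", "B", "C", "D", "E", "F"]

before = assign(elements, parallelism=2, attempt=0)  # first execution attempt
after = assign(elements, parallelism=2, attempt=1)   # after a restore, same parallelism

print("task 0 before restore:", before[0])
print("task 0 after restore: ", after[0])

# No element is lost overall; only the element-to-task assignment changed.
print("same union:", sorted(sum(before, [])) == sorted(sum(after, [])))
```

In this model task 0 holds a different set of elements after the restore although the parallelism is unchanged, which is exactly why a query keyed by (operator, task) has no stable meaning across executions.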

The above, in combination with the fact that (for now) it is not possible to query the state at a specific point in time (_e.g._ the last checkpointed state), means that there is no easy way to get a consistent view of the state of an operator. So in the example above, when querying _(operatorA, task1)_ and _(operatorA, task2)_, the user can get states belonging to different "points in time", which can result in duplicates, lost values and all the other problems encountered in distributed systems when there are no consistency guarantees.
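
As a rough illustration of how duplicates and lost values can arise, the following Python sketch queries two tasks at different "points in time". Everything in it is hypothetical: `assign` is a made-up stand-in for an arbitrary reassignment on restore, and `query_across_attempts` only mimics a client that reaches each task during a different execution attempt. Neither is part of Flink.

```python
from collections import Counter


def assign(elements, parallelism, attempt):
    """Hypothetical redistribution: rotate by attempt, deal round-robin."""
    offset = attempt % len(elements)
    rotated = elements[offset:] + elements[:offset]
    return [rotated[task::parallelism] for task in range(parallelism)]


def query_across_attempts(elements, parallelism, attempts):
    """Query task i during execution attempt attempts[i] and merge the results."""
    merged = []
    for task, attempt in enumerate(attempts):
        merged.extend(assign(elements, parallelism, attempt)[task])
    return merged


elements = ["A", "B", "C", "D", "E", "F"]

# Task 0 is queried before a restore (attempt 0), task 1 after it (attempt 1).
view = query_across_attempts(elements, parallelism=2, attempts=[0, 1])

counts = Counter(view)
duplicates = sorted(e for e, n in counts.items() if n > 1)
lost = sorted(set(elements) - set(view))

print("merged view:", view)
print("duplicates: ", duplicates)
print("lost:       ", lost)
```

If both tasks are queried within the same attempt (`attempts=[0, 0]`), the merged view contains every element exactly once. The inconsistency comes purely from mixing points in time, not from any state actually being lost.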

The above illustrates some of the consistency problems that such a feature could face now.

I am also linking [~till.rohrmann] and [~skonto], as he also mentioned that this feature could be helpful.
