
Posted to issues@flink.apache.org by rehevkor5 <gi...@git.apache.org> on 2016/08/02 18:03:55 UTC

[GitHub] flink issue #2051: [FLINK-3779] Add support for queryable state

Github user rehevkor5 commented on the issue:

https://github.com/apache/flink/pull/2051

Hi, it's great to see that someone is working on this stuff!

I just wanted to put in my two cents, to provide a different perspective that might change how you are thinking about this.

On my project, we are interested in incorporating pre-computed historical time-series data into the values within a time window. Those values would need to be loaded from a distributed database such as Cassandra or DynamoDB. We would also like newly computed time-series data points (produced by a Flink window pane) to be persisted externally, side by side with the historical data in Cassandra/DynamoDB.

In contrast with your approach, which enables querying of state from within Flink, we are more interested in querying that state from the external database. This allows the Flink job to produce time series data which can be queried ad-hoc in the database, while also allowing the Flink job to produce pre-calculated aggregates from that time series.

I believe others in this thread have already mentioned the need to let the State Store choose the serialization approach. While serializing to byte[] works well for the Memory and RocksDB State Stores, inserting into a NoSQL database requires building an INSERT command whose data includes a primary/partition key, a secondary/range key, and arbitrarily structured data (one column of byte[], or perhaps something more complex depending on the type of the value). In particular, we need the timestamp of the time series point to be a top-level value in the INSERT, so that time range queries can be efficient. The interface also matters when Flink loads pre-existing data, because Flink or an integration layer will need to know how to query for the particular keys it is looking for.
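
To make the contrast a bit more concrete, here is a rough sketch of the difference between an opaque byte[] layout and a structured row with the timestamp as a top-level column. Everything in it is hypothetical: `RowMapper`, `OPAQUE`, `TIME_SERIES`, `toInsert`, and the table and column names (`metrics`, `series_id`, `ts`, `value`) are made up for illustration and are not part of Flink's API or of any database driver.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these types exist in Flink. */
public class StructuredStateSketch {

    /** Hypothetical hook: the state store decides how (key, timestamp, value) is laid out. */
    interface RowMapper<K, V> {
        Map<String, Object> toRow(K key, long timestampMillis, V value);
    }

    /** Opaque layout: everything ends up in a single blob column, so the timestamp is not queryable. */
    static final RowMapper<String, Double> OPAQUE = (key, ts, value) -> {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("blob", (key + "|" + ts + "|" + value).getBytes(StandardCharsets.UTF_8));
        return row;
    };

    /** Structured layout: partition key and timestamp (range key) are top-level columns. */
    static final RowMapper<String, Double> TIME_SERIES = (key, ts, value) -> {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("series_id", key); // assumed partition key column
        row.put("ts", ts);         // assumed range key column, enables time range queries
        row.put("value", value);
        return row;
    };

    /** Renders a parameterized INSERT statement for the given row; values would be bound separately. */
    static String toInsert(String table, Map<String, Object> row) {
        String columns = String.join(", ", row.keySet());
        String placeholders = String.join(", ", Collections.nCopies(row.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> row = TIME_SERIES.toRow("sensor-1", 1470160800000L, 42.5);
        System.out.println(toInsert("metrics", row));
    }
}
```

With the structured mapper, the external store sees the timestamp as its own column, so a range query over time does not have to deserialize every blob. The same mapping could in principle be used in reverse to tell Flink (or an integration layer) which keys to query when loading pre-existing data.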

I hope that makes sense & gives some perspective on what some people are thinking about with regard to "queryable state".
