You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/07/09 04:44:00 UTC
[jira] [Created] (SPARK-24763) Remove redundant key data from value
in streaming aggregation
Jungtaek Lim created SPARK-24763:
------------------------------------
Summary: Remove redundant key data from value in streaming aggregation
Key: SPARK-24763
URL: https://issues.apache.org/jira/browse/SPARK-24763
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 2.4.0
Reporter: Jungtaek Lim
Key/Value of state in streaming aggregation is formatted as below:
* key: UnsafeRow containing group-by fields
* value: UnsafeRow containing key fields and another fields for aggregation results
which data for key is stored to both key and value.
This is to avoid doing projection row to value while storing, and joining key and value to restore origin row to boost performance, but while doing a simple benchmark test, I found it not much helpful compared to "project and join". (will paste test result in comment)
So I would propose a new option: remove redundant in stateful aggregation. I'm avoiding to modify default behavior of stateful aggregation, because state value will not be compatible between current and option enabled.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org