Posted to issues@spark.apache.org by "Arnaud Bailly (JIRA)" <ji...@apache.org> on 2016/07/01 13:21:11 UTC
[jira] [Commented] (SPARK-8360) Structured Streaming (aka Streaming DataFrames)
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358945#comment-15358945 ]
Arnaud Bailly commented on SPARK-8360:
--------------------------------------
I have a question about the semantics of the "complete" output mode, but I am not sure this is the right place to ask.
Given an aggregation query, I would expect a "complete" streaming query to return the total aggregation over *all* values in the stream, past and new. However, a simple experiment with the latest code at HEAD shows this is not the case: the streaming query returns the result of running the query on the *new* data only. My query looks something like this:
{code}
select key, sum(value) from table1 t1, stream2 t2 where t1.pk = t2.pk group by key;
{code}
where table1 is a non-streaming DataFrame and stream2 is a streaming DataFrame.
Am I missing/misunderstanding something?
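To make the two behaviours concrete, here is a small pure-Python model (not Spark code) of what I expected versus what I observed. It assumes `key` comes from table1 and `value` from stream2, and the `STATIC` table, the `batches` data and the function names are hypothetical, for illustration only. `per_batch_results` aggregates each trigger's new rows in isolation (what I see), while `complete_mode_results` keeps running state and emits the whole result table after every trigger (what I understand "complete" to mean).

```python
from collections import defaultdict

# Hypothetical static table (stands in for table1): pk -> grouping key.
STATIC = {1: "a", 2: "b", 3: "a"}


def join_and_aggregate(batch, static):
    """Inner-join one batch of (pk, value) rows on pk and sum value per key."""
    sums = defaultdict(int)
    for pk, value in batch:
        if pk in static:  # rows with no matching pk are dropped, as in an inner join
            sums[static[pk]] += value
    return dict(sums)


def per_batch_results(batches, static):
    """Observed behaviour: each trigger only aggregates the rows that just arrived."""
    return [join_and_aggregate(batch, static) for batch in batches]


def complete_mode_results(batches, static):
    """Expected 'complete' behaviour: accumulate state, emit the full table each trigger."""
    state = defaultdict(int)
    outputs = []
    for batch in batches:
        for key, partial_sum in join_and_aggregate(batch, static).items():
            state[key] += partial_sum
        outputs.append(dict(state))  # snapshot of the entire result table so far
    return outputs


# Hypothetical stream2 data arriving in two triggers: lists of (pk, value).
batches = [[(1, 10), (2, 5)], [(3, 7), (1, 1)]]

print(per_batch_results(batches, STATIC))      # [{'a': 10, 'b': 5}, {'a': 8}]
print(complete_mode_results(batches, STATIC))  # [{'a': 10, 'b': 5}, {'a': 18, 'b': 5}]
```

In other words, after the second trigger I would expect to see `a = 18, b = 5`, not just `a = 8`.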
> Structured Streaming (aka Streaming DataFrames)
> -----------------------------------------------
>
> Key: SPARK-8360
> URL: https://issues.apache.org/jira/browse/SPARK-8360
> Project: Spark
> Issue Type: Umbrella
> Components: SQL, Streaming
> Reporter: Reynold Xin
> Attachments: StructuredStreamingProgrammingAbstractionSemanticsandAPIs-ApacheJIRA.pdf
>
>
> Umbrella ticket to track what's needed to make streaming DataFrame a reality.