Posted to issues@spark.apache.org by "Arnaud Bailly (JIRA)" <ji...@apache.org> on 2016/07/01 13:21:11 UTC

[jira] [Commented] (SPARK-8360) Structured Streaming (aka Streaming DataFrames)

    [ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358945#comment-15358945 ] 

Arnaud Bailly commented on SPARK-8360:
--------------------------------------

I have a question about the semantics of the "complete" output mode, though I am not sure this is the right place to ask.
Given an aggregation query, I would expect a "complete" streaming query to return the total aggregation over *all* values in the stream, past and new. However, a simple experiment with the latest code at HEAD shows this is not the case: the streaming query returns the result of running the query on *new* data only. My query looks something like:

{code}
select key, sum(value) from table1 t1, stream2 t2 where t1.pk = t2.pk group by key;
{code} 

where table1 is a non-streaming (static) DataFrame and stream2 is a streaming DataFrame.
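
To make the expectation concrete, here is a minimal pure-Python simulation contrasting what I expect with what I observe. It does not use Spark at all: the contents of TABLE1, the micro-batches, and the function names are hypothetical, and it assumes that "complete" mode means every trigger emits the entire updated result table.

```python
from collections import defaultdict

# Hypothetical static table1: pk -> key
TABLE1 = {1: "a", 2: "b", 3: "a"}


def join_and_sum(rows, table):
    """Inner-join (pk, value) stream rows with the static table on pk, then sum value per key."""
    totals = defaultdict(int)
    for pk, value in rows:
        if pk in table:  # inner join: stream rows without a matching pk are dropped
            totals[table[pk]] += value
    return dict(totals)


def complete_mode(batches, table):
    """Expected behaviour: after each micro-batch, emit the aggregate over ALL rows seen so far."""
    state = defaultdict(int)  # running per-key sums kept across triggers
    for batch in batches:
        for key, subtotal in join_and_sum(batch, table).items():
            state[key] += subtotal
        yield dict(state)  # copy, so later batches do not mutate earlier output


def new_data_only(batches, table):
    """Observed behaviour: each micro-batch is aggregated in isolation."""
    for batch in batches:
        yield join_and_sum(batch, table)


# Hypothetical micro-batches of stream2 rows as (pk, value)
batches = [[(1, 10), (2, 5)], [(3, 7), (4, 100)]]
print(list(complete_mode(batches, TABLE1)))  # [{'a': 10, 'b': 5}, {'a': 17, 'b': 5}]
print(list(new_data_only(batches, TABLE1)))  # [{'a': 10, 'b': 5}, {'a': 7}]
```

Folding each batch's partial sums into a running state works here only because sum is decomposable; the row with pk 4 is dropped by the inner join in both cases.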

Am I missing/misunderstanding something?

> Structured Streaming (aka Streaming DataFrames)
> -----------------------------------------------
>
>                 Key: SPARK-8360
>                 URL: https://issues.apache.org/jira/browse/SPARK-8360
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL, Streaming
>            Reporter: Reynold Xin
>         Attachments: StructuredStreamingProgrammingAbstractionSemanticsandAPIs-ApacheJIRA.pdf
>
>
> Umbrella ticket to track what's needed to make streaming DataFrame a reality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
