Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/04/13 02:34:41 UTC

Does partition by and order by works only in stateful case?

Hi All,

Do partition by and order by work only in the stateful case?

For example:

select row_number() over (partition by id order by timestamp) from table

gives me

*SEVERE: Exception occured while submitting the query:
java.lang.RuntimeException: org.apache.spark.sql.AnalysisException:
Non-time-based windows are not supported on streaming DataFrames/Datasets;;*

I wonder what "time-based window" means. Is it not the window from the over()
clause, or does it mean group by window(timestamp, '10 minutes'), as in the
stateful case?

Thanks

Re: Does partition by and order by works only in stateful case?

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

My sincere apologies for adding my question to this thread. For some reason,
the messages I write to the group never appear back in it. I think my
question is related, in that it shows a few differences between traditional
operations and Spark Streaming operations.

Can I please ask why lines.count() throws the exception:
org.apache.spark.sql.AnalysisException:
Queries with streaming sources must be executed with writeStream.start();;

Whereas if I call lines.createOrReplaceTempView("test") and then run the SQL
"select count(*) ccount from test", it runs absolutely fine.

From the exceptions, I gather that a check is run to determine whether
isStreaming is true for lines, but a bit of explanation would help.

Regards,
Gourav Sengupta
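
One plausible reading of the difference described above, offered as an assumption rather than something confirmed in this thread: count() is an action that tries to execute the query immediately, whereas registering a view and building a SQL query only define a plan, so a streaming check would fire only when the plan is actually executed as a batch. The toy model below illustrates that idea in plain Python. ToyDataFrame, select_count and collect are hypothetical names for illustration and are not Spark's implementation.

```python
class AnalysisException(Exception):
    """Toy stand-in for the exception type named in the message above."""


class ToyDataFrame:
    """Hypothetical model of a DataFrame: a plan description plus a streaming flag."""

    def __init__(self, plan, is_streaming):
        self.plan = plan
        self.is_streaming = is_streaming

    def select_count(self):
        # Transformation: only builds a new plan, nothing is executed yet.
        return ToyDataFrame(f"count({self.plan})", self.is_streaming)

    def collect(self):
        # Action: tries to execute right now, so the streaming check fires here.
        if self.is_streaming:
            raise AnalysisException(
                "Queries with streaming sources must be executed with writeStream.start()"
            )
        return f"executed {self.plan}"


lines = ToyDataFrame("lines", is_streaming=True)
planned = lines.select_count()  # defining the query raises nothing

try:
    planned.collect()  # executing it as a batch is what fails
    raised = False
except AnalysisException as exc:
    raised = True
    print("action failed:", exc)
```

In this model, defining the query never fails. Only the eager action does, which matches the shape of the two observations in the message.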


Re: Does partition by and order by works only in stateful case?

Posted by kant kodali <ka...@gmail.com>.
Got it! Thanks.


Re: Does partition by and order by works only in stateful case?

Posted by Tathagata Das <ta...@gmail.com>.
Traditional SQL windows with `over` are not supported in streaming. Only
time-based windows, that is, `window("timestamp", "10 minutes")`, are
supported in streaming.
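
To make "time-based window" concrete, here is a conceptual sketch in plain Python (not Spark code) of what grouping by a 10-minute window amounts to: each row is assigned to a fixed-length interval [start, start + 10 minutes) based on its timestamp, and the aggregation runs per interval rather than over an ordered partition of rows. The helper names window_start and count_per_window and the sample events are assumptions made up for illustration, and aligning windows to the Unix epoch is an assumption of this sketch.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=10)  # assumed width, mirroring "10 minutes"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed alignment point


def window_start(ts, width=WINDOW):
    """Return the start of the tumbling window [start, start + width) containing ts."""
    offset = (ts - EPOCH) % width  # how far ts is past the window boundary
    return ts - offset


def count_per_window(events):
    """Count events per (id, window start), like a grouped count over time windows."""
    counts = Counter()
    for event_id, ts in events:
        counts[(event_id, window_start(ts))] += 1
    return counts


# Made-up sample events: (id, timestamp)
events = [
    ("a", datetime(2018, 4, 13, 2, 31, 0, tzinfo=timezone.utc)),
    ("a", datetime(2018, 4, 13, 2, 34, 41, tzinfo=timezone.utc)),
    ("a", datetime(2018, 4, 13, 2, 41, 0, tzinfo=timezone.utc)),
    ("b", datetime(2018, 4, 13, 2, 39, 59, tzinfo=timezone.utc)),
]

counts = count_per_window(events)
for (event_id, start), n in sorted(counts.items()):
    print(event_id, start.isoformat(), n)
```

Because each row's window depends only on its own timestamp, the result can be maintained incrementally as new rows arrive. A row_number() over (partition by ... order by ...) would instead need to see and order every row in the partition.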
