Posted to dev@spark.apache.org by JackyLee <qc...@163.com> on 2018/10/22 03:15:48 UTC

Re: Support SqlStreaming in spark

The code for SQLStreaming has been pushed:

https://github.com/apache/spark/pull/22575



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Support SqlStreaming in spark

Posted by JackyLee <qc...@163.com>.
No problem





Re: Support SqlStreaming in spark

Posted by Stavros Kontopoulos <st...@lightbend.com>.
Hi all,
From what I have read, there is an effort to standardize streaming SQL
globally (people from Flink, Google, and others are working with the SQL
standardization body): https://arxiv.org/abs/1905.12133v1
Should the Spark community be part of it?

Best,
Stavros

On Thu, Mar 28, 2019 at 12:03 PM uncleGen <hu...@gmail.com> wrote:

> Hi all,
>
> I have rewritten the design doc based on the previous discussion.
>
> https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0
>
> Would be interested to hear what others think.
>
> Regards,
> Genmao Yu

Re: Support SqlStreaming in spark

Posted by uncleGen <hu...@gmail.com>.
Hi all, 

I have rewritten the design doc based on the previous discussion.
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0

Would be interested to hear what others think.

Regards,
Genmao Yu 






Re: Support SqlStreaming in spark

Posted by sujith chacko <su...@gmail.com>.
Hi All,

I think a few more updates have been added to the design document since the
last version, which a few people reviewed and provided input on. I request
all experts to review the design document and help us baseline the design
for the SPIP 'Support SQL streaming' in Spark Structured Streaming. A few
more sections have been added in order to handle scenarios such as the
following:

1) Passing stream-level configurations to the SQL command instead of
setting them at the session/application level.

2) Supporting multiple streams in a single application, etc.
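
To illustrate why the two scenarios above are related, here is a minimal Python sketch of the lookup order only. It assumes stream-level settings should win over session-level defaults; every configuration key and path below is hypothetical and is not a real Spark setting or part of the design document.

```python
from collections import ChainMap

# Hypothetical session-level defaults shared by every stream in the application.
session_conf = {"trigger.interval": "10 seconds", "output.mode": "append"}


def effective_conf(stream_conf: dict) -> ChainMap:
    """Look up stream-level settings first, then fall back to session defaults."""
    return ChainMap(stream_conf, session_conf)


# Two streams in one application, each carrying its own overrides.
clicks = effective_conf(
    {"checkpoint.location": "/tmp/ckpt/clicks", "trigger.interval": "1 second"}
)
orders = effective_conf({"checkpoint.location": "/tmp/ckpt/orders"})

print(clicks["trigger.interval"])  # stream-level override
print(orders["trigger.interval"])  # falls back to the session default
```

With only session-level settings, both streams would have to share a single value for each key, which is the limitation that per-statement configuration is meant to avoid.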

Link to the design document:

https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#


A few questions have already been clarified by Jacky; please find them at the link below:

https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#heading=h.t96f9l205fk1


Regards,
Sujith

On Thu, Dec 27, 2018 at 6:39 PM JackyLee <qc...@163.com> wrote:

> Hi, Wenchen
>
> Thank you for your recognition of SQL streaming. I have written the
> SQLStreaming design document:
>
> https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#
>
> Your questions are answered here:
>
> https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#heading=h.t96f9l205fk1
>
> There may be some details that I have not considered; we can discuss them
> in more depth.
>
> Thanks

Re: Support SqlStreaming in spark

Posted by JackyLee <qc...@163.com>.
Hi, Wenchen

Thank you for your recognition of SQL streaming. I have written the
SQLStreaming design document:
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#

Your questions are answered here:
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#heading=h.t96f9l205fk1

There may be some details that I have not considered; we can discuss them
in more depth.

Thanks





Re: Support SqlStreaming in spark

Posted by Wenchen Fan <cl...@gmail.com>.
Hi JackyLee,

Can you put the answers to these questions in the design doc?

e.g. if we don't want to support manipulating a streaming query, then is
`SELECT STREAM ...` a blocking action? How can users create a Spark
application with multiple streaming jobs? How can users run Structured
Streaming interactively? etc.
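
For context on the blocking question: in the existing DataFrame API, `DataStreamWriter.start()` returns a `StreamingQuery` handle without blocking, and `awaitTermination()` is the call that blocks. The following Python sketch mimics that split with plain threads so the difference is visible. `BackgroundQuery` and its methods are invented for illustration; this is not Spark code and says nothing about how `SELECT STREAM` would behave.

```python
import threading
import time


class BackgroundQuery:
    """Toy stand-in for a streaming query that runs on its own thread."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.batches = 0
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Pretend to process one micro-batch every 10 ms until asked to stop.
        while not self._stop_requested.is_set():
            self.batches += 1
            self._stop_requested.wait(0.01)

    def start(self) -> "BackgroundQuery":
        self._thread.start()
        return self  # non-blocking: the caller gets a handle right away

    def await_termination(self, timeout: float) -> bool:
        self._thread.join(timeout)  # blocking: waits up to `timeout` seconds
        return not self._thread.is_alive()

    def stop(self) -> None:
        self._stop_requested.set()
        self._thread.join()


# Two queries in one application, started without blocking each other.
queries = [BackgroundQuery("clicks").start(), BackgroundQuery("orders").start()]
time.sleep(0.05)

# Neither query has terminated on its own; each must be stopped explicitly.
terminated = [q.await_termination(timeout=0.01) for q in queries]
for q in queries:
    q.stop()
```

If a SQL statement behaved like `await_termination` with no timeout, a single session could only ever run one stream, which is why the blocking question and the multiple-jobs question are tied together.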

On Sat, Dec 22, 2018 at 3:04 PM JackyLee <qc...@163.com> wrote:

> Hi Wenchen,
>     I have been working on SQLStreaming for a year, and I have promoted it
> within my company.
>     I have looked at the designs of Kafka and Calcite, and I believe my
> design is better than theirs. They support pure SQL but not a Table API
> for streaming. Users can only use the dedicated streaming statements, and
> the same statement cannot run batch queries.
>     In my opinion, the Table API is actually the key to SQLStreaming;
> pure SQL is just another way of expressing the Streaming Table API.

Re: Support SqlStreaming in spark

Posted by JackyLee <qc...@163.com>.
Hi Wenchen,
    I have been working on SQLStreaming for a year, and I have promoted it
within my company.
    I have looked at the designs of Kafka and Calcite, and I believe my
design is better than theirs. They support pure SQL but not a Table API
for streaming. Users can only use the dedicated streaming statements, and
the same statement cannot run batch queries.
    In my opinion, the Table API is actually the key to SQLStreaming;
pure SQL is just another way of expressing the Streaming Table API.





Re: Support SqlStreaming in spark

Posted by JackyLee <qc...@163.com>.
Hi Wenchen and Arun Mahadevan,
    Thanks for your replies.

    SQLStreaming is not just a way to support pure SQL, but also a way to
define a Table API for streaming.
    I have redefined SQLStreaming so that it supports the Table API. Users
can use either SQL or the Table API to run SQLStreaming.

    I will update the SQLStreaming design document. Could you help me
improve it?

    Again, thanks for your attention.





Re: Support SqlStreaming in spark

Posted by Arun Mahadevan <ar...@apache.org>.
There have been efforts to come up with a unified syntax for streaming (see
[1] [2]), but I guess there will be differences based on the streaming
features supported by each system.

I agree that it needs a detailed design, and that it should be as close to
the Spark batch SQL syntax as possible.

Also, I am not sure whether it is possible, or makes sense, to express all
the operations via pure SQL. For example, query start/stop, triggers,
watermarks, etc. might be better expressed via APIs.
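
To make the watermark case concrete: in the DataFrame API a watermark is declared with `withWatermark` and a trigger with `DataStreamWriter.trigger`, and both carry parameters that a SQL syntax would have to find a place for. Below is a rough Python sketch of what a watermark does, assuming it trails the largest event time seen so far by a fixed delay and that anything older is dropped. The `Watermark` class is made up for illustration and checks each event individually, which is a simplification and not how Spark implements it.

```python
from typing import Optional


class Watermark:
    """Tracks the max event time seen and rejects events older than max - delay."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        self.max_event_time: Optional[int] = None  # no events seen yet

    def current(self) -> Optional[int]:
        """Return the current watermark, or None before the first event."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

    def accept(self, event_time: int) -> bool:
        """Return False for events that arrive behind the watermark."""
        watermark = self.current()
        if watermark is not None and event_time < watermark:
            return False  # too late: the watermark has already passed it
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return True


# Hypothetical event times in seconds, arriving out of order, with a 10 s delay.
wm = Watermark(delay=10)
decisions = [(t, wm.accept(t)) for t in [100, 105, 96, 120, 109, 111]]
print(decisions)
```

Here only the event at time 109 is rejected, because by then the event at 120 has moved the watermark to 110.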

[1]
https://docs.google.com/document/d/1wrla8mF_mmq-NW9sdJHYVgMyZsgCmHumJJ5f5WUzTiM/edit#heading=h.vfrf26d6b3ne
[2] https://calcite.apache.org/docs/stream.html


On Fri, 21 Dec 2018 at 18:13, Wenchen Fan <cl...@gmail.com> wrote:

> It would be great to add pure-SQL support to Structured Streaming. It
> goes without saying how important SQL support is, but we should produce
> a complete design first.
>
> Looking at the Kafka streaming syntax
> <https://www.confluent.io/blog/ksql-streaming-sql-for-apache-kafka/>, it
> has CREATE STREAM and WINDOW TUMBLING. Shall we check other streaming
> systems with SQL support, and justify the places where we are going to
> differ?
>
> We should also take into account the full lifecycle:
> 1. How to restart a streaming query from a checkpoint?
> 2. How to stop a streaming query?
> 3. How to check the status/progress of a streaming query?
> 4. ...
>
> Basically, we should check what functions the DataStreamReader/Writer API
> supports, and see if we can support them with SQL as well.
>
>
> Thanks for your proposal!
> Wenchen
>
> On Mon, Oct 22, 2018 at 11:15 AM JackyLee <qc...@163.com> wrote:
>
>> The code for SQLStreaming has been pushed:
>>
>> https://github.com/apache/spark/pull/22575

Re: Support SqlStreaming in spark

Posted by Wenchen Fan <cl...@gmail.com>.
It would be great to add pure-SQL support to Structured Streaming. It
goes without saying how important SQL support is, but we should produce
a complete design first.

Looking at the Kafka streaming syntax
<https://www.confluent.io/blog/ksql-streaming-sql-for-apache-kafka/>, it
has CREATE STREAM and WINDOW TUMBLING. Shall we check other streaming
systems with SQL support, and justify the places where we are going to
differ?
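
For readers unfamiliar with the term, a tumbling window splits event time into fixed-size, non-overlapping intervals, so each event belongs to exactly one window. The following is a minimal Python sketch of that assignment rule only. The function names and sample timestamps are made up for illustration, and this is neither KSQL nor Spark code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Window = Tuple[int, int]


def tumbling_window(event_time: int, size: int) -> Window:
    """Return the [start, end) window of length `size` that contains event_time."""
    start = event_time - (event_time % size)
    return (start, start + size)


def count_per_window(event_times: List[int], size: int) -> Dict[Window, int]:
    """Count how many events fall into each tumbling window."""
    counts: Dict[Window, int] = defaultdict(int)
    for t in event_times:
        counts[tumbling_window(t, size)] += 1
    return dict(counts)


# Hypothetical event timestamps in seconds, grouped into 10-second windows.
counts = count_per_window([1, 4, 9, 10, 15, 27], size=10)
print(counts)
```

An event at time 10 lands in the window starting at 10, not the one ending at 10, because the intervals are half-open.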

We should also take into account the full lifecycle:
1. How to restart a streaming query from a checkpoint?
2. How to stop a streaming query?
3. How to check the status/progress of a streaming query?
4. ...
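
In the current API these lifecycle steps correspond to the `checkpointLocation` option, `StreamingQuery.stop()`, and `StreamingQuery.status` / `lastProgress`. As a toy illustration of what any SQL syntax would need to cover, here is a Python sketch that assumes a checkpoint is simply the next offset to read, stored in a JSON file. `ToyStreamingQuery` and the file layout are invented for this example and do not reflect Spark's checkpoint format.

```python
import json
import os
import tempfile


class ToyStreamingQuery:
    """Processes a list-backed source in batches, checkpointing the next offset."""

    def __init__(self, source, checkpoint_path, batch_size=2):
        self.source = source
        self.checkpoint_path = checkpoint_path
        self.batch_size = batch_size
        self.processed = []
        self.active = True
        # Restart: resume from the committed offset if a checkpoint exists.
        self.offset = self._load_offset()

    def _load_offset(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return json.load(f)["offset"]
        return 0

    def process_next_batch(self):
        """Process one batch and commit the offset; return False if stopped or idle."""
        if not self.active or self.offset >= len(self.source):
            return False
        batch = self.source[self.offset:self.offset + self.batch_size]
        self.processed.extend(batch)
        self.offset += len(batch)
        with open(self.checkpoint_path, "w") as f:
            json.dump({"offset": self.offset}, f)
        return True

    def stop(self):
        self.active = False

    def status(self):
        return {"active": self.active, "offset": self.offset}


source = ["a", "b", "c", "d", "e"]
checkpoint = os.path.join(tempfile.mkdtemp(), "offset.json")

first = ToyStreamingQuery(source, checkpoint)
first.process_next_batch()  # processes "a" and "b", then commits offset 2
first.stop()

# A new query pointed at the same checkpoint picks up where the first stopped.
restarted = ToyStreamingQuery(source, checkpoint)
while restarted.process_next_batch():
    pass
print(first.status(), restarted.processed)
```

The restarted query never sees "a" or "b" again, which is the behaviour a SQL-level restart would have to preserve.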

Basically, we should check what functions the DataStreamReader/Writer API
supports, and see if we can support them with SQL as well.


Thanks for your proposal!
Wenchen

On Mon, Oct 22, 2018 at 11:15 AM JackyLee <qc...@163.com> wrote:

> The code for SQLStreaming has been pushed:
>
> https://github.com/apache/spark/pull/22575