Posted to issues@spark.apache.org by "Jackey Lee (JIRA)" <ji...@apache.org> on 2018/10/08 08:37:00 UTC

[jira] [Comment Edited] (SPARK-24630) SPIP: Support SQLStreaming in Spark

    [ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641516#comment-16641516 ] 

Jackey Lee edited comment on SPARK-24630 at 10/8/18 8:36 AM:
-------------------------------------------------------------

[~kabhwan]
 The DDL in SQLStreaming is unchanged from Batch SQL.

Adding the 'stream' keyword has two purposes:
 # *Mark the entire SQL query as a stream query and generate the SQLStreaming plan tree.*
 # *Mark the table type as UnResolvedStreamRelation.* The table is then parsed as a StreamingRelation or another Relation, which matters especially in stream-join-batch queries, such as Kafka join MySQL.

 

A small example to show the importance of 'stream': read a stream from a Kafka stream table, and join it with MySQL to count user messages.
 # *with 'stream'*
 ## select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
 ## It will be regarded as a Streaming Query: kafka_sql_test will be parsed as a StreamingRelation, and _mysql_test will be parsed as a JDBCRelation, not a StreamingRelation._
 # *without 'stream'*
 ## select kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
 ## It will be regarded as a Batch Query: kafka_sql_test will be parsed as a KafkaRelation, and mysql_test will be parsed as a JDBCRelation.
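The resolution rule in the example can be sketched as a toy Python model. This is purely illustrative, not Spark's actual analyzer code; the names (STREAM_CAPABLE, resolve_relation, the relation-type strings) are assumptions made for the sketch:

```python
# Toy model of how the 'stream' keyword could drive table resolution.
# Illustrative only -- not Spark's actual analyzer implementation.

# Sources assumed to support streaming reads (illustrative).
STREAM_CAPABLE = {"kafka"}

def resolve_relation(table_source, is_stream_query):
    """Map a table to a relation type, mirroring the examples above."""
    if is_stream_query and table_source in STREAM_CAPABLE:
        return "StreamingRelation"   # only under the 'stream' keyword
    if table_source == "kafka":
        return "KafkaRelation"       # batch read of a Kafka table
    if table_source == "jdbc":
        return "JDBCRelation"        # e.g. a MySQL table
    return "UnresolvedRelation"

# With 'stream': the Kafka side becomes streaming, MySQL stays JDBC.
assert resolve_relation("kafka", True) == "StreamingRelation"
assert resolve_relation("jdbc", True) == "JDBCRelation"

# Without 'stream': both sides resolve to batch relations.
assert resolve_relation("kafka", False) == "KafkaRelation"
assert resolve_relation("jdbc", False) == "JDBCRelation"
```

The point the sketch captures is that the keyword changes only the stream-capable side of the join; the JDBC side resolves identically either way.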

 

*As for Flink, it uses the StreamExecutionEnvironment and StreamTableEnvironment to achieve the above goals.* Thus, the 'stream' keyword is not needed in Flink.



> SPIP: Support SQLStreaming in Spark
> -----------------------------------
>
>                 Key: SPARK-24630
>                 URL: https://issues.apache.org/jira/browse/SPARK-24630
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.2.0, 2.2.1
>            Reporter: Jackey Lee
>            Priority: Minor
>              Labels: SQLStreaming
>         Attachments: SQLStreaming SPIP.pdf
>
>
> At present, KafkaSQL, Flink SQL (which is actually based on Calcite), SQLStream, and StormSQL all provide a stream-type SQL interface, with which users with little knowledge about streaming can easily develop a stream processing model. In Spark, we can also support a SQL API based on Structured Streaming.
> To support SQL Streaming, there are two key points:
> 1. The parser should be able to parse streaming-type SQL.
> 2. The analyzer should be able to map metadata information to the corresponding
> Relation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org