Posted to issues@spark.apache.org by "Jackey Lee (JIRA)" <ji...@apache.org> on 2018/10/08 08:37:00 UTC
[jira] [Comment Edited] (SPARK-24630) SPIP: Support SQLStreaming in Spark
[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641516#comment-16641516 ]
Jackey Lee edited comment on SPARK-24630 at 10/8/18 8:36 AM:
-------------------------------------------------------------
[~kabhwan]
The DDL in SQLStreaming is unchanged from batch SQL.
Adding the 'stream' keyword has two purposes:
# *Mark the entire SQL query as a streaming query and generate the SQLStreaming plan tree.*
# *Mark the table type as UnResolvedStreamRelation.* The table is then parsed as a StreamingRelation or another Relation, which matters especially in stream-join-batch queries, such as a Kafka join with MySQL.
A small example showing the importance of 'stream': read a stream from a Kafka stream table and join it with MySQL to count user messages.
# *with 'stream'*
## select stream kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
## It will be treated as a streaming query: kafka_sql_test will be parsed as a StreamingRelation, and _mysql_test will be parsed as a JDBCRelation, not a StreamingRelation._
# *without 'stream'*
## select kafka_sql_test.name, count(door) from kafka_sql_test inner join mysql_test on kafka_sql_test.name == mysql_test.name group by kafka_sql_test.name
## It will be treated as a batch query: kafka_sql_test will be parsed as a KafkaRelation and mysql_test as a JDBCRelation.
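The resolution rule above can be sketched as a small toy function. This is a hypothetical illustration of the described behavior, not Spark's actual parser or analyzer code; the table names and relation-type strings are taken from the example above.

```python
# Toy sketch (not Spark internals): how the presence of the 'stream'
# keyword could route table resolution to different Relation types.
STREAM_CAPABLE = {"kafka_sql_test"}  # tables backed by a streaming source

def resolve_relation(table: str, stream_query: bool) -> str:
    """Map a table name to a Relation type, depending on query mode."""
    if table in STREAM_CAPABLE:
        # The same Kafka table resolves differently per query mode.
        return "StreamingRelation" if stream_query else "KafkaRelation"
    # Batch-only sources (e.g. MySQL via JDBC) resolve the same either way.
    return "JDBCRelation"

# With 'stream': Kafka side is streaming, MySQL side stays JDBC.
print(resolve_relation("kafka_sql_test", stream_query=True))   # StreamingRelation
print(resolve_relation("mysql_test", stream_query=True))       # JDBCRelation
# Without 'stream': the whole query is batch.
print(resolve_relation("kafka_sql_test", stream_query=False))  # KafkaRelation
```

This captures why the keyword matters for mixed stream/batch joins: only the stream-capable side of the join changes its Relation type with the query mode.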
*As for Flink, it uses StreamExecutionEnvironment and StreamTableEnvironment to achieve the same goals.* Thus, the 'stream' keyword is not needed in Flink.
> SPIP: Support SQLStreaming in Spark
> -----------------------------------
>
> Key: SPARK-24630
> URL: https://issues.apache.org/jira/browse/SPARK-24630
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.2.0, 2.2.1
> Reporter: Jackey Lee
> Priority: Minor
> Labels: SQLStreaming
> Attachments: SQLStreaming SPIP.pdf
>
>
> At present, KafkaSQL, Flink SQL (which is actually based on Calcite), SQLStream, and StormSQL all provide a stream-type SQL interface, with which users with little knowledge of streaming can easily develop a stream processing model. In Spark, we can also support a SQL API based on Structured Streaming.
> To support SQL Streaming, there are two key points:
> 1. The parser should be able to parse streaming-type SQL.
> 2. The analyzer should be able to map metadata information to the corresponding Relation.
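The first key point, recognizing a streaming-type query at parse time, can be sketched minimally. The regular expression below is an illustrative assumption, not Spark's grammar; it simply checks for the proposed 'stream' keyword after SELECT.

```python
import re

def is_streaming_query(sql: str) -> bool:
    # Toy check for the proposed syntax: a 'stream' keyword right after
    # SELECT marks the whole query as streaming. Case-insensitive,
    # tolerant of leading whitespace.
    return re.match(r"(?is)\s*select\s+stream\b", sql) is not None

print(is_streaming_query("select stream name, count(*) from kafka_sql_test"))  # True
print(is_streaming_query("select name, count(*) from kafka_sql_test"))         # False
```

The second key point (metadata-to-Relation mapping) would then consume this flag during analysis to decide whether a table resolves to a streaming or a batch Relation.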
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)