Posted to dev@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/11/02 06:53:55 UTC

Re: Plan on Structured Streaming in next major/minor release?

If I can add one thing to this list, I would say stateless aggregations
using Raw SQL.

For example: as I read micro-batches from Kafka, I want to do, say, a count of
that micro-batch and spit it out using Raw SQL. (No count aggregation
across batches.)
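A Spark-free sketch of the distinction being requested (illustrative Python, not Spark code): a "stateless" per-micro-batch count versus Spark's usual stateful running count across batches.

```python
# Illustrative only: the difference between a per-micro-batch ("stateless")
# count and a running count that keeps state across batches.
micro_batches = [["a", "b", "c"], ["d", "e"], ["f"]]

# Stateless: each batch is aggregated on its own, like running
# `SELECT COUNT(*)` over just that batch and emitting the result.
per_batch_counts = [len(batch) for batch in micro_batches]

# Stateful: a streaming aggregation keeps state and emits a running
# total over all batches seen so far.
running, running_counts = 0, []
for batch in micro_batches:
    running += len(batch)
    running_counts.append(running)

print(per_batch_counts)  # [3, 2, 1]
print(running_counts)    # [3, 5, 6]
```

(In Spark 2.4, `foreachBatch` is one way to get the first behavior: it hands you each micro-batch as a DataFrame, which can be registered as a temp view and queried with ordinary batch SQL.)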



On Tue, Oct 30, 2018 at 4:55 PM Jungtaek Lim <ka...@gmail.com> wrote:

> OK, thanks for clarifying. I guess it is one of the major features in the
> streaming area and would be nice to add, but I also agree it would require huge
> investigation.
>
> On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust <mi...@databricks.com> wrote:
>
>> Agree. Just curious, could you explain what you mean by "negation"?
>>> Does it mean applying retractions to aggregated results?
>>>
>>
>> Yeah, exactly. Our current streaming aggregation assumes that the input
>> is in append mode, and multiple aggregations break this.
>>
>
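The append-mode problem above can be sketched in plain Python (illustrative only, not Spark internals): the first aggregation's output is an *update* stream, so a downstream aggregation must first retract (negate) the old result before adding the new one, or its totals drift.

```python
# Illustrative sketch: why chained streaming aggregations need retractions.
events = [("k1", 1), ("k2", 2), ("k1", 3)]  # (key, value) input stream

counts = {}           # state of the first aggregation: SUM(value) per key
downstream_total = 0  # second aggregation: SUM over the first one's output

for key, value in events:
    old = counts.get(key, 0)
    new = old + value
    counts[key] = new
    # Emit a retraction for the old result, then the new result.
    downstream_total += -old + new

# With retractions, the chained total matches a batch computation.
assert downstream_total == sum(v for _, v in events)  # 6
```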

Re: Plan on Structured Streaming in next major/minor release?

Posted by JackyLee <qc...@163.com>.
Can these things be added to this list?
1. [SPARK-24630] Support SQLStreaming in Spark
      This patch defines the Table API for StructStreaming
2. [SPARK-25937] Support user-defined schema in Kafka Source & Sink
      This patch makes it easier for users to work with StructStreaming
3. SS supports dynamic partition scheduling
       SS uses a serial execution engine, which means SS cannot catch up
with the data effectively under back pressure or when computing speed is
reduced. If dynamic partition scheduling for SS can be realized, the
number of partitions will be increased automatically when needed, so SS can
effectively catch up with the incoming data. The main idea is to trade
computing resources for time.
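The scheduling idea above can be sketched as a simple sizing rule (hypothetical Python, not an existing Spark API; the function and its parameters are made up for illustration): pick enough partitions to drain the current backlog in one batch, within fixed bounds.

```python
# Hypothetical sketch: scale partition count with the backlog so the job
# can catch up when it falls behind, trading computing resources for time.
def choose_partitions(backlog_records, records_per_partition_per_batch,
                      min_partitions=1, max_partitions=64):
    """Pick enough partitions to drain the backlog in one batch."""
    needed = -(-backlog_records // records_per_partition_per_batch)  # ceil div
    return max(min_partitions, min(max_partitions, needed))

print(choose_partitions(1_000, 500))   # 2  (keeping up)
print(choose_partitions(50_000, 500))  # 64 (capped while catching up)
```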






Re: Plan on Structured Streaming in next major/minor release?

Posted by Jungtaek Lim <ka...@gmail.com>.
My 2 cents: "micro-batch" is the way Spark handles streams, not a
semantic we are considering. Semantically and ideally, the same SQL query
should provide the same result in batch and streaming (modulo late events),
once the operations in the query are supported.
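The equivalence described above can be sketched in plain Python (illustrative, not Spark code): incrementally aggregating micro-batches should give the same final answer as one batch query over all the data.

```python
# Sketch of the batch/streaming equivalence: same data, same query,
# same final result whether computed at once or incrementally.
all_data = [1, 2, 3, 4, 5, 6]
micro_batches = [[1, 2], [3, 4], [5, 6]]  # same data, arriving in batches

batch_sum = sum(all_data)  # "batch" query over everything at once

stream_sum = 0             # "streaming" query, updated per micro-batch
for batch in micro_batches:
    stream_sum += sum(batch)

assert batch_sum == stream_sum == 21
```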

On Fri, Nov 2, 2018 at 3:54 PM, kant kodali <ka...@gmail.com> wrote:

> If I can add one thing to this list, I would say stateless aggregations
> using Raw SQL.
>
> For example: as I read micro-batches from Kafka, I want to do, say, a count
> of that micro-batch and spit it out using Raw SQL. (No count aggregation
> across batches.)
>
>
>
> On Tue, Oct 30, 2018 at 4:55 PM Jungtaek Lim <ka...@gmail.com> wrote:
>
>> OK, thanks for clarifying. I guess it is one of the major features in the
>> streaming area and would be nice to add, but I also agree it would require
>> huge investigation.
>>
>> On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust <mi...@databricks.com>
>> wrote:
>>
>>> Agree. Just curious, could you explain what you mean by "negation"?
>>>> Does it mean applying retractions to aggregated results?
>>>>
>>>
>>> Yeah, exactly. Our current streaming aggregation assumes that the input
>>> is in append mode, and multiple aggregations break this.
>>>
>>