Posted to dev@beam.apache.org by Etienne Chauchot <ec...@apache.org> on 2019/11/20 16:12:30 UTC

[spark structured streaming runner] available on master

Hi all,

I'm glad to announce that the new Spark runner based on the Spark Structured
Streaming framework has been merged into master!

It is not based on the RDD/DStream API. See
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

It is still experimental, and its coverage of the Beam model is partial:

- The runner passes 95% of the ValidatesRunner tests in batch mode.

- It does not support streaming yet (waiting for multi-aggregation
support in the Spark Structured Streaming framework from the Spark
community).

- The runner can execute Nexmark; perfkit dashboards are yet to come.

- Some things are not wired up yet:

     - Beam Schemas are not wired up.

     - Optional features of the model are not implemented: State API,
Timer API, Splittable DoFn API, …

I will submit a PR to update the capability matrix in the coming days.

Best

Etienne



Re: [spark structured streaming runner] available on master

Posted by Kai Jiang <ji...@gmail.com>.
Cool, cool! I'd love to see Nexmark on the Spark structured streaming
runner's perfkit dashboard.


Re: [spark structured streaming runner] available on master

Posted by Pablo Estrada <pa...@google.com>.
Very cool! :) Thanks to everyone involved in moving this forward.
Best
-P.


Re: [spark structured streaming runner] available on master

Posted by Etienne Chauchot <ec...@apache.org>.
I forgot to say thanks to everyone for their contributions to this,
especially Alexey, Ryan and Ismael.

Etienne
