Posted to dev@beam.apache.org by Etienne Chauchot <ec...@apache.org> on 2019/11/20 16:12:30 UTC
[spark structured streaming runner] available on master
Hi all,
I'm glad to announce that the new Spark runner based on the Spark Structured
Streaming framework has been merged into master!
It is not based on the RDD/DStream API. See
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
It is still experimental; its coverage of the Beam model is partial:
- The runner passes 95% of the ValidatesRunner tests in batch mode.
- It does not support streaming yet (waiting for multi-aggregation
  support in the Spark Structured Streaming framework from the Spark
  community).
- The runner can execute Nexmark; perfkit dashboards are yet to come.
- Some things are not wired up yet:
  - Beam Schemas are not wired up.
  - Optional features of the model are not implemented: State API, Timer
    API, splittable DoFn API, …
I will submit a PR to update the capability matrix in the coming days.
Best
Etienne
Re: [spark structured streaming runner] available on master
Posted by Kai Jiang <ji...@gmail.com>.
Cool, cool! I'd love to see Nexmark on the Spark structured streaming runner
perfkit dashboard.
On Wed, Nov 20, 2019 at 2:12 PM Pablo Estrada <pa...@google.com> wrote:
> Very cool! : ) Thanks to everyone involved in moving this forward.
> Best
> -P.
>
> On Wed, Nov 20, 2019 at 8:50 AM Etienne Chauchot <ec...@apache.org>
> wrote:
>
>> I forgot to say: thanks to everyone for their contributions to this,
>> especially Alexey, Ryan and Ismael.
>>
>> Etienne
Re: [spark structured streaming runner] available on master
Posted by Pablo Estrada <pa...@google.com>.
Very cool! : ) Thanks to everyone involved in moving this forward.
Best
-P.
On Wed, Nov 20, 2019 at 8:50 AM Etienne Chauchot <ec...@apache.org>
wrote:
> I forgot to say: thanks to everyone for their contributions to this,
> especially Alexey, Ryan and Ismael.
>
> Etienne
Re: [spark structured streaming runner] available on master
Posted by Etienne Chauchot <ec...@apache.org>.
I forgot to say: thanks to everyone for their contributions to this,
especially Alexey, Ryan and Ismael.
Etienne