Posted to user@spark.apache.org by Albert Strasheim <al...@cloudflare.com> on 2015/07/25 19:01:02 UTC

spark-dataflow + Spark Streaming + Kafka

Hello all

New Spark user here. We've been looking at the Spark ecosystem to
build some new parts of our log processing pipeline.

The spark-dataflow project looks especially interesting.

The windowing and trigger concepts look like a good fit for what we
need to do: our log data arrives in Kafka only in approximate time
order, and some events are delayed for quite some time.

https://cloud.google.com/dataflow/model/windowing
https://cloud.google.com/dataflow/model/triggers
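
To make the requirement concrete, here is a minimal sketch in plain
Python of the behaviour we are after. It is not the Dataflow SDK or the
Spark API, just an illustration. The window size, the allowed lateness,
the helper names, and the "highest event time seen so far" watermark
heuristic are all assumptions made for this example.

```python
from collections import defaultdict

WINDOW_SECONDS = 60      # fixed window size (hypothetical value)
ALLOWED_LATENESS = 120   # how long a window stays open past its end (hypothetical value)


def window_start(event_time):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def process(events):
    """Count events per event-time window.

    events: iterable of (event_time, payload) tuples in arrival order.
    Returns (fired, dropped): fired is a list of (window_start, count) in
    the order the windows were closed, dropped is a list of events that
    arrived after their window had already been closed.
    """
    open_windows = defaultdict(int)
    fired, dropped = [], []
    # Assumed watermark heuristic: the highest event time seen so far.
    watermark = float("-inf")

    for event_time, payload in events:
        watermark = max(watermark, event_time)
        start = window_start(event_time)

        if start + WINDOW_SECONDS + ALLOWED_LATENESS <= watermark:
            # The window for this event has already expired: too late.
            dropped.append((event_time, payload))
        else:
            open_windows[start] += 1

        # Trigger: fire every window whose end plus the allowed lateness
        # is at or behind the watermark.
        for s in sorted(open_windows):
            if s + WINDOW_SECONDS + ALLOWED_LATENESS <= watermark:
                fired.append((s, open_windows.pop(s)))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        fired.append((s, open_windows.pop(s)))
    return fired, dropped
```

With this sketch, an event that arrives out of order but within the
allowed lateness is still counted in its original window, while an
event that arrives after its window has fired is reported separately
instead of being silently merged into a later window.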

A few questions:

0. Is this a good forum for questions about spark-dataflow?

1. Is anybody using spark-dataflow for serious projects running
outside of Google Cloud? How's it going with 0.2.3? Do windowing and
triggers work?

2. Is anybody looking at adding support for Spark Streaming to
spark-dataflow? It looks like SparkPipelineRunner and other parts
would need to be extended to work with StreamingContext.

3. Are there good alternatives to spark-dataflow that should be considered?

4. Should we be looking at rolling our own windowing + triggers setup
directly on top of Spark/Spark Streaming instead of trying to use
spark-dataflow?

5. If number 4 sounds like an option, is there any code out there that
is doing this already that we can look at for some inspiration?

Any advice appreciated.

Thanks

Albert
