Posted to user@spark.apache.org by polariz <st...@gmail.com> on 2015/07/15 11:16:40 UTC

Spark Stream suitability

Hi,

I am evaluating my options for a project that ingests a rich data feed,
performs some aggregate calculations, and allows users to query the results.

The (protobuf) data feed is rich in the sense that it contains several data
fields which can be used to calculate several different KPI figures. The
KPIs are not related. 

I would like to explore the possibility of doing this work as the data comes
in, using Spark Streaming. The examples I've seen, and my gut, tell me that
Spark Streaming apps should be kept simple: one metric is processed in one
"pipeline" and persisted at the end. In my case I would need to ingest the
rich data, fork into several pipelines, each calculating a different KPI, and
then persist them all at the end as one transaction.
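For what it's worth, here is a minimal sketch of the kind of forking I have in
mind. It is only an illustration: the Record case class, the CSV stub parser
(standing in for the real protobuf decoder), the socket source, and the two
KPIs are all placeholders, and the println stands in for the transactional
write.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KpiStreamSketch {

  // Placeholder for the rich protobuf record; the real feed has more fields.
  case class Record(region: String, latencyMs: Long, bytes: Long)

  // Stub decoder standing in for the protobuf parsing step.
  def parse(line: String): Record = {
    val Array(region, lat, b) = line.split(",")
    Record(region, lat.toLong, b.toLong)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kpi-stream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Single ingest point; the real source (Kafka, etc.) is left abstract.
    val records = ssc.socketTextStream("localhost", 9999).map(parse).cache()

    // Fork the same decoded stream into two unrelated KPIs.
    val totalLatency = records.map(r => (r.region, r.latencyMs)).reduceByKey(_ + _)
    val totalBytes   = records.map(r => (r.region, r.bytes)).reduceByKey(_ + _)

    // Join per micro-batch so both KPIs for a batch land together and can be
    // written by one sink call (the transactional write itself is sink-specific).
    totalLatency.join(totalBytes).foreachRDD { rdd =>
      rdd.collect().foreach { case (region, (lat, bytes)) =>
        println(s"$region latency=$lat bytes=$bytes") // replace with one DB transaction
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

The idea is to decode the feed once, cache it, and let each KPI branch off the
same DStream, joining the per-batch results so they can be persisted in a
single write.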

Am I right in thinking that this complexity and aggregation work would be
better placed in separate offline Spark jobs?

Any feedback would be much appreciated, thanks.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Stream-suitability-tp23852.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
