Posted to user@spark.apache.org by "xichen_tju@126" <xi...@126.com> on 2014/07/09 03:17:16 UTC

Spark Streaming and Storm

Hi all,
I am a newbie to Spark Streaming and have used Storm before. Have you tested the performance of both, and which one is better?




xichen_tju@126

Re: Spark Streaming and Storm

Posted by critikaled <is...@gmail.com>.
http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf





Re: Spark Streaming and Storm

Posted by "Dan H." <dc...@gmail.com>.
Xichen_tju,

I recently evaluated Storm over a period of months (using 3 2U servers, 2.4GHz CPU, 24GB RAM) and was not able to achieve a realistic scale for my business domain needs.  Storm is really only a framework that lets you plug in code to do whatever you need in a distributed system, so it’s completely flexible and distributable, but that comes at a price.  In Storm, one of the biggest performance hits came down to how the “acks” work within the tuple trees.  You can let the framework ack messages between spouts and/or bolts by default, but in the end you will most likely want to manage acks yourself, depending on how much reliability your system needs (to replay messages…).  All this means is that if you don’t have massive amounts of data to process within a few seconds (which I do), then Storm may work well for you, but your performance will diminish as you add more and more business rules (unless, of course, you add more servers for processing).  If you need to ingest 1GBps or more, you may want to reevaluate, since your server scale may not mesh with your overall processing needs.
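
To make the tuple-tree bookkeeping concrete, here is a simplified Python sketch of the idea Storm documents for its acker: each spout tuple's tree is tracked with a single value that is XORed with a tuple's id once when the tuple is anchored and once when it is acked, so the value returns to zero only when every tuple in the tree has been acked. This is not Storm's actual code. The class and method names are hypothetical, and the small fixed ids are an assumption for readability (Storm uses random 64-bit ids).

```python
class AckTracker:
    """Hypothetical, simplified tracker for one spout tuple's tree."""

    def __init__(self):
        self.ack_val = 0  # XOR of every id anchored or acked so far

    def anchor(self, tuple_id):
        # A new tuple joins the tree: XOR its id in.
        self.ack_val ^= tuple_id

    def ack(self, tuple_id):
        # A tuple finished processing: XOR the same id again, cancelling it.
        self.ack_val ^= tuple_id

    def is_complete(self):
        # Zero means every anchored tuple has also been acked.
        return self.ack_val == 0


# Fixed ids chosen for illustration only.
ROOT, CHILD_A, CHILD_B = 0x1, 0x2, 0x4

tracker = AckTracker()
tracker.anchor(ROOT)       # the spout emits the root tuple
tracker.anchor(CHILD_A)    # a bolt emits two children anchored to the root
tracker.anchor(CHILD_B)
tracker.ack(ROOT)          # the bolt acks the root
pending_after_root = not tracker.is_complete()
tracker.ack(CHILD_A)
tracker.ack(CHILD_B)
tree_done = tracker.is_complete()
```

Every emitted tuple adds anchor and ack traffic of this kind, which is one reason the amount of reliability you ask for affects throughput.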

I recently started using Spark Streaming with Kafka and have been quite impressed by the performance being achieved.  I particularly like that Spark isn’t just a framework: it provides simple tools with convenient API methods.  Some of those features are reduceByKey (MapReduce-style aggregation), sliding windows, aggregation over sub-time windows, etc.  Also, in my environment, I believe it’s going to be a great fit, since we already use Hadoop and Spark should fit into that environment well.
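
As a rough illustration of what a reduce-by-key over a sliding window computes, here is a plain-Python sketch. It does not use Spark at all. The function name `windowed_counts` and its parameters are hypothetical, and the input is assumed to be a sequence of micro-batches, each a list of words.

```python
from collections import Counter, deque


def windowed_counts(batches, window_len, slide):
    """Yield per-key counts over a sliding window of micro-batches.

    batches:    iterable of lists of keys, one list per batch interval
    window_len: number of batches covered by each window
    slide:      emit a result every `slide` batches
    """
    window = deque(maxlen=window_len)  # keeps only the most recent batches
    for i, batch in enumerate(batches, start=1):
        window.append(Counter(batch))  # reduce by key within one batch
        if i % slide == 0:
            total = Counter()
            for counts in window:
                total += counts        # merge per-batch results across the window
            yield dict(total)


batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
results = list(windowed_counts(batches, window_len=2, slide=1))
```

With a window of two batches sliding by one, each result covers the current batch plus the previous one. Spark Streaming offers this kind of operation as a built-in on its streams, so you do not write the bookkeeping yourself.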

You should look into both Storm and Spark Streaming, but in the end it depends on your needs.  If you are not looking for streaming, then Spark on Hadoop is a great option, since Spark can cache the dataset in memory across queries, which will be much faster than running Hive/Pig on top of Hadoop.  I’m assuming you need some sort of streaming system for data flow, but if it doesn’t need to be real-time or near real-time, you may want to simply look at Hadoop, on top of which you could always use Spark for real-time queries.

Hope this helps…

Dan

 
On Jul 8, 2014, at 7:25 PM, Shao, Saisai <sa...@intel.com> wrote:

> You can find performance comparison results in the Spark Streaming paper and meetup slides; just search for them.
> Performance comparisons are really case by case and depend on your workload design, hardware, and software configuration. There is no overall winner across all scenarios.
>  
> Thanks
> Jerry


RE: Spark Streaming and Storm

Posted by "Shao, Saisai" <sa...@intel.com>.
You can find performance comparison results in the Spark Streaming paper and meetup slides; just search for them.
Performance comparisons are really case by case and depend on your workload design, hardware, and software configuration. There is no overall winner across all scenarios.

Thanks
Jerry
