
Posted to user@spark.apache.org by "Abhishek R. Singh" <ab...@tetrationanalytics.com> on 2015/07/21 20:57:41 UTC

spark streaming disk hit

Is it fair to say that Storm stream processing is completely in memory, whereas Spark Streaming would take a disk hit because of how shuffle works?

Does Spark Streaming try to avoid disk usage out of the box?

-Abhishek-
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: spark streaming disk hit

Posted by "Abhishek R. Singh" <ab...@tetrationanalytics.com>.

Thanks TD, I appreciate the response!

On Jul 21, 2015, at 1:54 PM, Tathagata Das <td...@databricks.com> wrote:

> Most shuffle files are really kept around in the OS's buffer/disk cache, so it is still pretty much in memory. If you are concerned about performance, you have to do a holistic comparison for end-to-end performance. You could take a look at this. 
> 
> https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/
> 
> On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh <ab...@tetrationanalytics.com> wrote:
> Is it fair to say that Storm stream processing is completely in memory, whereas spark streaming would take a disk hit because of how shuffle works?
> 
> Does spark streaming try to avoid disk usage out of the box?
> 
> -Abhishek-

Re: spark streaming disk hit

Posted by Tathagata Das <td...@databricks.com>.

Most shuffle files stay in the OS's buffer/disk cache, so shuffle data is
still pretty much in memory. If you are concerned about performance, you
have to do a holistic comparison of end-to-end performance. You could take
a look at this Spark Summit 2015 session:

https://spark-summit.org/2015/events/towards-benchmarking-modern-distributed-streaming-systems/
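
As a rough illustration of the buffer-cache point above (this is not Spark code and not a benchmark), the sketch below writes a scratch file and reads it straight back. On a typical OS with buffered I/O, a file that was just written is usually still in the page cache, so the read is normally served from memory rather than from the disk. The function name write_then_read and the 8 MB default size are made up for this example, and the timing will vary with the OS, filesystem and memory pressure.

```python
import os
import tempfile
import time


def write_then_read(num_bytes=8 * 1024 * 1024):
    """Write a scratch file, read it straight back, and time the read.

    Hypothetical helper for illustration only; it says nothing about how
    Spark itself writes or fetches shuffle files.
    """
    payload = os.urandom(num_bytes)

    # Buffered write: the OS typically keeps these pages in its page cache
    # and flushes them to the physical disk later.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        f.write(payload)

    try:
        # Reading the file right away is usually satisfied from the page
        # cache, assuming the pages have not been evicted in the meantime.
        start = time.perf_counter()
        with open(path, "rb") as f:
            data = f.read()
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)

    return data == payload, elapsed


if __name__ == "__main__":
    ok, elapsed = write_then_read()
    print(f"round trip ok={ok}, read took {elapsed * 1000:.2f} ms")
```

A single timing like this only hints at the effect. For a real Storm versus Spark Streaming comparison, the end-to-end measurement mentioned above is still the thing to look at.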

On Tue, Jul 21, 2015 at 11:57 AM, Abhishek R. Singh <
abhishsi@tetrationanalytics.com> wrote:

> Is it fair to say that Storm stream processing is completely in memory,
> whereas spark streaming would take a disk hit because of how shuffle works?
>
> Does spark streaming try to avoid disk usage out of the box?
>
> -Abhishek-