
Posted to user@spark.apache.org by Muler <mu...@gmail.com> on 2015/08/07 18:42:16 UTC

Spark is in-memory processing, how then can Tachyon make Spark faster?

Spark is an in-memory engine and attempts to do its computation in memory.
Tachyon is memory-centric distributed storage, OK, but how would that help
Spark run faster?

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

Posted by andy petrella <an...@gmail.com>.

Exactly!

The sharing part is used in the Spark Notebook (this one
<https://github.com/andypetrella/spark-notebook/blob/master/notebooks/Tachyon%20Test.snb>)
so we can share data between notebooks that run in different SparkContexts
(in different JVMs).
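A rough, hypothetical sketch of the sharing idea in plain Python (no Spark, Spark Notebook, or Tachyon APIs are used): each `python -c` subprocess stands in for a notebook with its own SparkContext and JVM, and a file in a temporary directory stands in for Tachyon's storage. The dataset name `squares` and the helper functions are made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# "Notebook A" (hypothetical): compute a result once, then publish it
# outside its own process memory.
PRODUCER = """
import sys, pathlib
result = ",".join(str(x * x) for x in range(5))
pathlib.Path(sys.argv[1]).write_text(result)
"""

# "Notebook B" (hypothetical): a separate interpreter that reuses the
# published result instead of recomputing it.
CONSUMER = """
import sys, pathlib
print(pathlib.Path(sys.argv[1]).read_text())
"""


def run_in_new_interpreter(code: str, path: Path) -> str:
    """Run `code` in a fresh Python process (stand-in for a separate JVM/SparkContext)."""
    done = subprocess.run(
        [sys.executable, "-c", code, str(path)],
        capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()


def share_via_external_store() -> str:
    # The temp directory plays the role of the external, process-independent store.
    with tempfile.TemporaryDirectory() as store:
        path = Path(store) / "squares"  # hypothetical dataset name
        run_in_new_interpreter(PRODUCER, path)  # producer exits; its memory is gone
        return run_in_new_interpreter(CONSUMER, path)  # the data is still there


print(share_via_external_store())  # → 0,1,4,9,16
```

The producer process has already exited when the consumer runs, so the result survives only because it was written outside the producer's memory. An RDD cached with `rdd.cache()` lives inside a single SparkContext's executors and is not visible to another SparkContext in this way.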

OTOH, we have a project that creates microservices on genomics data; for
several reasons, we used Tachyon to serve genome cubes (ranges across
genomes), see here <https://github.com/med-at-scale/high-health>.

HTH
andy

On Fri, Aug 7, 2015 at 8:36 PM Calvin Jia <ji...@gmail.com> wrote:

> Hi,
>
> Tachyon <http://tachyon-project.org> manages memory off-heap, which can
> help prevent long GC pauses. Also, using Tachyon allows data to be
> shared between Spark jobs that use the same dataset.
>
> Here's <http://www.meetup.com/Tachyon/events/222485713/> a production use
> case where Baidu runs Tachyon to get a 30x performance improvement in their
> SparkSQL workload.
>
> Hope this helps,
> Calvin

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

Posted by Calvin Jia <ji...@gmail.com>.

Hi,

Tachyon <http://tachyon-project.org> manages memory off-heap, which can help
prevent long GC pauses. Also, using Tachyon allows data to be
shared between Spark jobs that use the same dataset.
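The off-heap point can be illustrated with a toy Python sketch, under loudly stated assumptions: CPython's garbage collector is not the JVM's, `OffHeapStore` and the `(id, score)` record layout are hypothetical, and nothing here uses Tachyon. Records are serialized into one anonymous `mmap` region allocated outside the Python object heap and deserialized only when read, which is the general shape of off-heap storage.

```python
import mmap
import struct

# Hypothetical fixed-width record layout: (id: int64, score: float64), little-endian.
RECORD = struct.Struct("<qd")


class OffHeapStore:
    """Toy record store kept as raw bytes in an anonymous mmap region.

    The region is allocated by the OS outside Python's object heap, so the
    garbage collector has at most one object (the mmap) to look at, rather
    than one object per record. This only mimics the idea; it is not Tachyon.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.count = 0
        self.buf = mmap.mmap(-1, capacity * RECORD.size)  # anonymous mapping

    def append(self, rec_id: int, score: float) -> None:
        if self.count == self.capacity:
            raise IndexError("store is full")
        # Serialize the record directly into the mapped bytes.
        RECORD.pack_into(self.buf, self.count * RECORD.size, rec_id, score)
        self.count += 1

    def get(self, i: int) -> tuple:
        if not 0 <= i < self.count:
            raise IndexError(i)
        # Deserialize on read: a short-lived tuple is created only when asked for.
        return RECORD.unpack_from(self.buf, i * RECORD.size)

    def close(self) -> None:
        self.buf.close()


store = OffHeapStore(capacity=1000)
for i in range(1000):
    store.append(i, i * 0.5)
print(store.get(42))  # → (42, 21.0)
store.close()
```

The trade-off is the usual one for off-heap storage: fewer live objects for the collector to trace, at the cost of serializing and deserializing on every access.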

Here's <http://www.meetup.com/Tachyon/events/222485713/> a production use
case where Baidu runs Tachyon to get a 30x performance improvement in their
SparkSQL workload.

Hope this helps,
Calvin
