Posted to user@spark.apache.org by "Xu (Simon) Chen" <xc...@gmail.com> on 2014/06/07 05:13:54 UTC

cache spark sql parquet file in memory?

This might be a stupid question... but it seems that saveAsParquetFile()
writes everything back to HDFS. I am wondering if it is possible to cache
Parquet-format intermediate results in memory, thereby making Spark SQL
queries faster.

Thanks.
-Simon
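
[Editor's note: a minimal sketch of the in-memory alternative being asked
about, using the Spark 1.0 Scala API. Instead of writing the intermediate
SchemaRDD back out with saveAsParquetFile(), register it as a table and
cache it with cacheTable(). The paths, table names, and the "ts" column are
made up for illustration, and an existing SparkContext sc is assumed.]

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)   // sc: an existing SparkContext

  // Load the source data (hypothetical path) and expose it to SQL.
  val events = sqlContext.parquetFile("hdfs:///data/events.parquet")
  events.registerAsTable("events")

  // Keep the intermediate result as an in-memory columnar table
  // instead of calling saveAsParquetFile() and writing back to HDFS.
  val recent = sqlContext.sql("SELECT * FROM events WHERE ts > 1401580800")
  recent.registerAsTable("recent_events")
  sqlContext.cacheTable("recent_events")

  // Later queries read from the cached table, not from HDFS.
  sqlContext.sql("SELECT COUNT(*) FROM recent_events").collect().foreach(println)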

Re: cache spark sql parquet file in memory?

Posted by "Xu (Simon) Chen" <xc...@gmail.com>.
Is there a way to start Tachyon on top of a YARN cluster?
 On Jun 7, 2014 2:11 PM, "Marek Wiewiorka" <ma...@gmail.com>
wrote:

> I was also thinking of using Tachyon to store Parquet files - maybe
> tomorrow I will give it a try as well.
>
>
> 2014-06-07 20:01 GMT+02:00 Michael Armbrust <mi...@databricks.com>:
>
>> Not a stupid question!  I would like to be able to do this.  For now, you
>> might try writing the data to Tachyon <http://tachyon-project.org/>
>> instead of HDFS.  This is untested, though, so please report any issues
>> you run into.
>>
>> Michael
>>
>>
>> On Fri, Jun 6, 2014 at 8:13 PM, Xu (Simon) Chen <xc...@gmail.com>
>> wrote:
>>
>>> This might be a stupid question... but it seems that saveAsParquetFile()
>>> writes everything back to HDFS. I am wondering if it is possible to cache
>>> Parquet-format intermediate results in memory, thereby making Spark SQL
>>> queries faster.
>>>
>>> Thanks.
>>> -Simon
>>>
>>
>>
>

Re: cache spark sql parquet file in memory?

Posted by Marek Wiewiorka <ma...@gmail.com>.
I was also thinking of using Tachyon to store Parquet files - maybe
tomorrow I will give it a try as well.
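
[Editor's note: a rough sketch of what that experiment might look like on
the read side - loading a Parquet file back out of Tachyon and querying it.
The tachyon:// path and master address are placeholders, it assumes the
Tachyon client jar is on Spark's classpath so tachyon:// URIs resolve, and
sqlContext is a SQLContext as in the earlier sketch.]

  // Hypothetical location written earlier with saveAsParquetFile().
  val parquetPath = "tachyon://tachyon-master:19998/tmp/recent_events.parquet"

  val cached = sqlContext.parquetFile(parquetPath)
  cached.registerAsTable("recent_events")
  sqlContext.sql("SELECT COUNT(*) FROM recent_events").collect().foreach(println)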


2014-06-07 20:01 GMT+02:00 Michael Armbrust <mi...@databricks.com>:

> Not a stupid question!  I would like to be able to do this.  For now, you
> might try writing the data to Tachyon <http://tachyon-project.org/>
> instead of HDFS.  This is untested, though, so please report any issues
> you run into.
>
> Michael
>
>
> On Fri, Jun 6, 2014 at 8:13 PM, Xu (Simon) Chen <xc...@gmail.com> wrote:
>
>> This might be a stupid question... but it seems that saveAsParquetFile()
>> writes everything back to HDFS. I am wondering if it is possible to cache
>> Parquet-format intermediate results in memory, thereby making Spark SQL
>> queries faster.
>>
>> Thanks.
>> -Simon
>>
>
>

Re: cache spark sql parquet file in memory?

Posted by Michael Armbrust <mi...@databricks.com>.
Not a stupid question!  I would like to be able to do this.  For now, you
might try writing the data to Tachyon <http://tachyon-project.org/> instead
of HDFS.  This is untested, though, so please report any issues you run into.

Michael
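
[Editor's note: a rough sketch of writing to Tachyon instead of HDFS,
assuming the Tachyon client jar is on the classpath and the tachyon://
filesystem is configured for the cluster. The master address and path are
placeholders, and sqlContext and the "events" table are as in the earlier
sketch.]

  // Write the intermediate result to Tachyon instead of HDFS.
  val recent = sqlContext.sql("SELECT * FROM events WHERE ts > 1401580800")
  recent.saveAsParquetFile("tachyon://tachyon-master:19998/tmp/recent_events.parquet")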


On Fri, Jun 6, 2014 at 8:13 PM, Xu (Simon) Chen <xc...@gmail.com> wrote:

> This might be a stupid question... but it seems that saveAsParquetFile()
> writes everything back to HDFS. I am wondering if it is possible to cache
> Parquet-format intermediate results in memory, thereby making Spark SQL
> queries faster.
>
> Thanks.
> -Simon
>