Posted to user@spark.apache.org by Brandon White <bw...@gmail.com> on 2015/07/21 22:59:28 UTC

Spark SQL Table Caching

A few questions about caching a table in Spark SQL.

1) Is there any difference between caching the DataFrame and caching the table?

df.cache() vs sqlContext.cacheTable("tableName")

2) Do you need to "warm up" the cache before seeing the performance
benefits? Is the cache LRU? Do you need to run some queries on the table
before it is cached in memory?

3) Is caching the table much faster than .saveAsTable? I am only seeing a
10%-20% performance increase.
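The two calls being compared can be sketched as follows. This is a minimal sketch against the Spark 1.4-era API used in this thread; the HDFS path and table name are placeholders, and caching in both cases is lazy until an action scans the data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("cache-example").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Placeholder path: a directory of Parquet files.
val df = sqlContext.read.parquet("hdfs:///data/events")

// Option A: cache the DataFrame handle. Caching is lazy; the in-memory
// columnar data is only materialized by the first action that scans it.
df.cache()
df.count()

// Option B: register the DataFrame under a name, then cache it by name.
df.registerTempTable("events")
sqlContext.cacheTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").collect()
```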

Re: Question on Spark SQL for a directory

Posted by Michael Armbrust <mi...@databricks.com>.
https://spark.apache.org/docs/latest/sql-programming-guide.html#loading-data-programmatically
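For the directory case specifically, the linked guide boils down to something like this (a hedged sketch; the HDFS path is a placeholder): `sqlContext.read.parquet` accepts a directory of Parquet part-files, not just a single file.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-dir").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Point at the directory; Spark reads every Parquet file inside it
// as a single table.
val df = sqlContext.read.parquet("hdfs:///warehouse/mytable")
df.registerTempTable("mytable")
sqlContext.sql("SELECT COUNT(*) FROM mytable").show()
```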

On Tue, Jul 21, 2015 at 4:06 PM, Ron Gonzalez <zl...@yahoo.com.invalid>
wrote:

> Hi,
>   Question on using Spark SQL.
>   Can someone give an example of creating a table from a directory
> containing Parquet files in HDFS, instead of a single Parquet file?
>
> Thanks,
> Ron
>
>

Question on Spark SQL for a directory

Posted by Ron Gonzalez <zl...@yahoo.com.INVALID>.
Hi,
   Question on using Spark SQL.
   Can someone give an example of creating a table from a directory
containing Parquet files in HDFS, instead of a single Parquet file?

Thanks,
Ron



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark SQL Table Caching

Posted by Pedro Rodriguez <sk...@gmail.com>.
I would be interested in the answer to this question, plus the relationship
between those and registerTempTable()
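A rough sketch of how the three relate, under the assumption (consistent with the Spark 1.x docs) that registerTempTable only binds a name in the catalog and caches nothing by itself; the table name and sample data below are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("temp-table").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "letter")

df.registerTempTable("t")                  // binds a name; no data is cached yet
sqlContext.cacheTable("t")                 // marks "t" for in-memory columnar caching
sqlContext.isCached("t")                   // reports the cache flag
sqlContext.sql("SELECT * FROM t").count()  // first scan materializes the cache
sqlContext.uncacheTable("t")               // drops the cached data; the name remains
```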

Pedro



-- 
Pedro Rodriguez
UCBerkeley 2014 | Computer Science
SnowGeek <http://SnowGeek.org>
pedro-rodriguez.com
ski.rodriguez@gmail.com
208-340-1703