Posted to user@spark.apache.org by Anjali Chadha <an...@gmail.com> on 2016/06/16 09:24:42 UTC

Spark cache behaviour when the source table is modified

Hi all,

I am having a hard time understanding the caching concepts in Spark.

I have a Hive table ("person") that is cached in Spark.

sqlContext.sql("create table person (name string, age int)") // Create a new table
// Add some values to the table
...
...
// Cache the table in Spark
sqlContext.cacheTable("person")
sqlContext.isCached("person") // Returns true
sqlContext.sql("insert into table person values ('Foo', 25)") // Insert another row into the table

// Check caching status again
sqlContext.isCached("person") // Returns true

Here, sqlContext is a *HiveContext*.

Will the entries inserted after the *cacheTable("person")* statement be cached?
In other words, is the ("Foo", 25) entry cached in Spark or not?

If not, how can I cache only the entries inserted later? I don't want to
uncache the whole table and then cache it again.

Any relevant web link or information will be appreciated.

- Anjali Chadha

Re: Spark cache behaviour when the source table is modified

Posted by Chanh Le <gi...@gmail.com>.
Hi Anjali,
The cache is immutable; you can't update the data in it.
The way to update the cache is to re-create it.
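Re-creating the cache, as suggested above, can be sketched in Scala against the Spark 1.6 SQLContext methods already used in the question (cacheTable, isCached) plus uncacheTable and table. This is a minimal sketch, not an official recipe. The object RecacheExample and the helper recache are hypothetical names, and the sketch assumes the "person" table from the question already exists in the Hive metastore.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

object RecacheExample {
  // Hypothetical helper: rebuilds the in-memory cache of `tableName`
  // by dropping the old cached copy and caching the table again.
  def recache(sqlContext: SQLContext, tableName: String): Unit = {
    // uncacheTable may reject a table that is not cached, so check first.
    if (sqlContext.isCached(tableName)) {
      sqlContext.uncacheTable(tableName)
    }
    // Register the table for caching again (this is lazy).
    sqlContext.cacheTable(tableName)
    // Run an action so the new cache is actually filled.
    sqlContext.table(tableName).count()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("recache-example").setMaster("local[*]"))
    val sqlContext = new HiveContext(sc)

    // Assumes the "person" table from the question already exists.
    recache(sqlContext, "person")
    println(sqlContext.isCached("person")) // Expected: true
    sc.stop()
  }
}
```

Note that this rebuilds the cache from the whole table, including rows that were already cached, so it does not avoid the full uncache/re-cache cycle that the question hopes to skip.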

