Posted to user@spark.apache.org by Brad Miller <bm...@eecs.berkeley.edu> on 2014/06/30 19:20:12 UTC

odd caching behavior or accounting

Hi All,

I recently noticed some caching behavior that I did not understand
and that may or may not indicate a bug.  In short, the web UI seemed
to show some blocks being added to the cache even though they were
already cached.

As documentation, I have attached two UI screenshots.  The PNG
captures enough of the screen to demonstrate the problem; the PDF is
the printout of the full page.  Notice that:

-block rdd_21_1001 is in the cache twice, both times on
letang.research.intel-research.net; many other blocks also occur twice
on a variety of hosts.  I have not confirmed that the duplicate of a
block is *always* on the same host, but that appears to be the case.

-the stated storage level is "Memory Deserialized 1x Replicated"

-the top left states that "cached partitions" and "total
partitions" are both 4000, but the table where partitions are
enumerated lists 4534 entries.
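
For anyone who wants to check the same thing on their own cluster, here
is a minimal sketch of how the duplicate entries could be tallied.  It
assumes the rows of the partition table have been copied out of the
storage page as (block name, host) pairs; the sample rows, the
placeholder host names, and the helper names are hypothetical and are
not part of Spark.

```python
from collections import Counter


def find_duplicate_blocks(rows):
    """Return {(block_name, host): count} for pairs listed more than once."""
    counts = Counter(rows)
    return {key: n for key, n in counts.items() if n > 1}


def summarize(rows):
    """Return (number of listed entries, number of distinct block names)."""
    distinct_blocks = {block for block, _host in rows}
    return len(rows), len(distinct_blocks)


# Hypothetical rows standing in for the UI's partition table.
sample_rows = [
    ("rdd_21_1000", "host-a.example.com"),
    ("rdd_21_1001", "letang.research.intel-research.net"),
    ("rdd_21_1001", "letang.research.intel-research.net"),
    ("rdd_21_1002", "host-b.example.com"),
]

duplicates = find_duplicate_blocks(sample_rows)
listed, distinct = summarize(sample_rows)
print(duplicates)
print(listed, distinct)
```

If the listed count exceeds the distinct count, as with 4534 versus
4000 above, the difference is the number of extra entries.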

Although not reflected in this screenshot, I believe I have seen this
behavior occur even when double caching of blocks causes eviction of
blocks from other RDDs.  I am running the Spark 1.0.0 release and
using pyspark.

best,
-Brad

Re: odd caching behavior or accounting

Posted by Hbf <Ka...@dreizak.com>.
I'm seeing the same behavior in Spark 2.0.1. Does anybody have an
explanation?

Thanks!
Kaspar


bmiller1 wrote
> pyspark_caching.pdf (2M)
> <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/8546/0/pyspark_caching.pdf>
> Screen Shot 2014-06-30 at 10.03.16 AM.png (292K)
> <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/8546/1/Screen%20Shot%202014-06-30%20at%2010.03.16%20AM.png>




