Posted to reviews@spark.apache.org by sarutak <gi...@git.apache.org> on 2014/09/16 17:04:56 UTC

[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

GitHub user sarutak opened a pull request:

    https://github.com/apache/spark/pull/2411

    [SPARK-3548] [WebUI] Display cache hit ratio on WebUI

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark cache-hit-ratio-feature

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2411.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2411
    
----
commit 1d1b18d80bd0ff2f2546821ad637eb07c3df59f2
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2014-09-16T11:44:04Z

    Added Cache Hit Count and Cache Miss Count metrics

commit 678c676004f180f4087807cac6a472338d796b3e
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2014-09-16T13:09:07Z

    Modified StagePage.scala

commit 05724f84b5927e589a76a4f3cc7e3b8161996512
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2014-09-16T15:03:40Z

    Modified ExecutorTable.scala to display cache hit ratio

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55766313
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20390/consoleFull) for   PR 2411 at commit [`05724f8`](https://github.com/apache/spark/commit/05724f84b5927e589a76a4f3cc7e3b8161996512).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55757731
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20390/consoleFull) for   PR 2411 at commit [`05724f8`](https://github.com/apache/spark/commit/05724f84b5927e589a76a4f3cc7e3b8161996512).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55820856
  
    Hey, so I think there are a few issues with this. Given the semantics of persisting RDDs, I don't think it's really possible to express a "hit ratio" that makes sense. If I cache my RDD with MEMORY_AND_DISK and the data is served from disk, is that considered a cache hit? We don't have a binary system of "cached" vs. "not cached", so reducing the result to a ratio doesn't make much sense.
    
    Another issue with this is that it has somewhat awkward semantics around pipelining. For instance:
    
    ```
    scala> val x = rdd1.cache()
    scala> x.count
    
    // This will be at most 33% cache ratio, even if all partitions of x are served from cache
    scala> x.filter(...).filter(...).count
    
    // This will be at most 25% cache ratio, even if all partitions of x are served from cache
    scala> x.filter(...).filter(...).filter(...).count
    ```
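
The 33% and 25% figures above are consistent with a counting model in which every RDD in a pipelined chain counts as one lookup and only the persisted RDD can register a hit. That model is an assumption inferred from the numbers in this comment, not something confirmed from the patch itself. A minimal sketch in plain Scala (no Spark dependency; `HitRatioSketch`, `hitRatio` and `cachedThenFilters` are hypothetical names used only for illustration):

```scala
// Toy model of a per-task "cache hit ratio" under pipelining.
// Assumption: every RDD in a pipelined chain counts as one lookup,
// and only a persisted, fully cached RDD can register a hit.
object HitRatioSketch {
  // persistedFlags(i) is true when the i-th RDD in the chain is served from cache.
  def hitRatio(persistedFlags: Seq[Boolean]): Double = {
    val hits = persistedFlags.count(identity)
    hits.toDouble / persistedFlags.size
  }

  // One cached RDD followed by `filters` uncached, pipelined transformations.
  def cachedThenFilters(filters: Int): Seq[Boolean] =
    true +: Seq.fill(filters)(false)

  def main(args: Array[String]): Unit = {
    for (n <- 0 to 3) {
      val ratio = hitRatio(cachedThenFilters(n))
      println(f"$n filters: ${ratio * 100}%.1f%% hit ratio")
    }
  }
}
```

Under this model, two filters give one hit in three lookups and three filters give one hit in four, even though every partition of the cached RDD was read from cache. The ratio therefore reflects the length of the pipeline rather than how well the cache is working.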
    
    So instead of this, I'd propose augmenting the existing InputMetrics with a count of the number of partitions coming from each input source. That way we just give the user all the relevant information. I think we almost have this already; we just need to add a partition counter for each input source.
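
As a rough illustration of the per-source counting idea proposed above, the sketch below tallies how many partitions were read from each source instead of collapsing them into one ratio. It is plain Scala with no Spark dependency. `ReadSource`, its three cases and `countBySource` are hypothetical names invented for this example, not Spark's InputMetrics API.

```scala
// Hypothetical sketch: report a partition count per input source
// instead of a single hit ratio. None of these names are Spark APIs.
object PartitionSourceSketch {
  sealed trait ReadSource
  case object Memory extends ReadSource
  case object Disk extends ReadSource
  case object Recomputed extends ReadSource

  // Tally how many partitions were served from each source.
  def countBySource(partitionSources: Seq[ReadSource]): Map[ReadSource, Int] =
    partitionSources.groupBy(identity).map { case (src, ps) => src -> ps.size }

  def main(args: Array[String]): Unit = {
    // Four partitions: two read from memory, one from disk, one recomputed from lineage.
    val observed: Seq[ReadSource] = Seq(Memory, Memory, Disk, Recomputed)
    countBySource(observed).foreach { case (src, n) =>
      println(s"$src: $n partition(s)")
    }
  }
}
```

Keeping the counts separate sidesteps the question of whether a partition served from disk is a "hit": the user sees each source's count and can draw their own conclusion.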




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak closed the pull request at:

    https://github.com/apache/spark/pull/2411




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55773715
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20395/consoleFull) for   PR 2411 at commit [`8a2000a`](https://github.com/apache/spark/commit/8a2000a71c3f9c0a0d58f02b035f7470621d1474).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55794612
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20399/consoleFull) for   PR 2411 at commit [`8a2000a`](https://github.com/apache/spark/commit/8a2000a71c3f9c0a0d58f02b035f7470621d1474).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-62332025
  
    Let's close this issue for now.




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55783030
  
    retest this please.




[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55781856
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20395/consoleFull) for   PR 2411 at commit [`8a2000a`](https://github.com/apache/spark/commit/8a2000a71c3f9c0a0d58f02b035f7470621d1474).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class NonASCIICharacterChecker extends ScalariformChecker `





[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2411#issuecomment-55783640
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20399/consoleFull) for   PR 2411 at commit [`8a2000a`](https://github.com/apache/spark/commit/8a2000a71c3f9c0a0d58f02b035f7470621d1474).
     * This patch merges cleanly.

