Posted to reviews@spark.apache.org by redsanket <gi...@git.apache.org> on 2017/08/14 15:21:39 UTC

[GitHub] spark pull request #18940: YSPARK-734 Change CacheLoader to limit entries ba...

GitHub user redsanket opened a pull request:

    https://github.com/apache/spark/pull/18940

    YSPARK-734 Change CacheLoader to limit entries based on memory footprint

    Right now the Spark shuffle service has a cache for index files. It is limited by the number of files cached (spark.shuffle.service.index.cache.entries). This can cause issues for jobs with a lot of reducers, because the size of each entry fluctuates with the number of reducers.
    We saw an issue with a job that had 170000 reducers; it caused the NM running the Spark shuffle service to use 700-800MB of memory by itself.
    We should change this cache to be memory based and only allow it to use a bounded amount of memory. By memory based I mean the cache should have a limit of, say, 100MB.
    
    https://issues.apache.org/jira/browse/SPARK-21501
    
    Manual testing with 170000 reducers has been performed, with the cache loaded up to the default 100MB limit and each shuffle index file about 1.3MB in size. Eviction takes place as soon as the total cache size reaches the 100MB limit, and the evicted objects become eligible for garbage collection, thereby preventing the NM from crashing. No notable difference in runtime has been observed.
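    
    As a rough sketch of the approach (illustrative only, not the exact patch; the getSize() accessor and variable names below are assumptions), the Guava cache can be bounded by total bytes rather than entry count by combining maximumWeight with a weigher that reports each entry's size:
    
        import java.io.File;
        import java.io.IOException;
        
        import com.google.common.cache.CacheBuilder;
        import com.google.common.cache.CacheLoader;
        import com.google.common.cache.LoadingCache;
        import com.google.common.cache.Weigher;
        
        // Bound the index cache by total bytes (e.g. 100MB) instead of number of entries.
        long maxBytes = 100L * 1024 * 1024;  // would come from spark.shuffle.service.index.cache.size
        
        CacheLoader<File, ShuffleIndexInformation> indexCacheLoader =
            new CacheLoader<File, ShuffleIndexInformation>() {
              public ShuffleIndexInformation load(File file) throws IOException {
                return new ShuffleIndexInformation(file);
              }
            };
        
        LoadingCache<File, ShuffleIndexInformation> shuffleIndexCache =
            CacheBuilder.newBuilder()
                .maximumWeight(maxBytes)
                .weigher(new Weigher<File, ShuffleIndexInformation>() {
                  public int weigh(File file, ShuffleIndexInformation indexInfo) {
                    return indexInfo.getSize();  // assumed accessor: entry size in bytes
                  }
                })
                .build(indexCacheLoader);
    
    With a weigher configured, Guava evicts entries once the summed weights exceed maximumWeight, which matches the eviction behavior described above.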

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/redsanket/spark SPARK-21501

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18940
    
----
commit f23a4c79b69fd1f8a77162da34b8821cb0cc1352
Author: Sanket Chintapalli <sc...@yahoo-inc.com>
Date:   2017-07-27T14:59:40Z

    YSPARK-734 Change CacheLoader to limit entries based on memory footprint

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    +1, I'm going to merge as it appears all comments addressed.




[GitHub] spark pull request #18940: [SPARK-21501] Change CacheLoader to limit entries...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18940




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    @kiszk wouldn't the updated release notes/docs take care of that, i.e. which configs can no longer be used? I don't mind adding a warning message saying please use cache.size instead of cache.entries, or providing two alternate implementations based on entries/size. I would like to see what other PMCs think about this @tgravescs @vanzin




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Thanks @vanzin @kiszk will do, makes sense to me now




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80696/testReport)** for PR 18940 at commit [`8826ca3`](https://github.com/apache/spark/commit/8826ca3e8ea1e9d0b120c53a4fc70af7f6e7d5d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Can you fix the title? Doesn't look like this is related to SPARK-734.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    @kiszk I don't think that would be ideal; it is better to backport the feature itself to the desired version or branch. Having two conflicting configs for the same task is not ideal, if that is what you mean. Thanks.




[GitHub] spark pull request #18940: [SPARK-21501] Change CacheLoader to limit entries...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18940#discussion_r133726526
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java ---
    @@ -104,15 +105,21 @@ public ExternalShuffleBlockResolver(TransportConf conf, File registeredExecutorF
           Executor directoryCleaner) throws IOException {
         this.conf = conf;
         this.registeredExecutorFile = registeredExecutorFile;
    -    int indexCacheEntries = conf.getInt("spark.shuffle.service.index.cache.entries", 1024);
    +    String indexCacheSize = conf.get("spark.shuffle.service.index.cache.size", "100m");
    --- End diff --
    
    Let's create a new config in `internal/config`, then we can use it by `package$.MODULE$.SHUFFLE_INDEX_CACHE_SIZE()`.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    LGTM, also cc @cloud-fan 




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    +1. Any further comments? @vanzin @jiangxb1987




[GitHub] spark issue #18940: SPARK-21501 Change CacheLoader to limit entries based on...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    @dbolshak there were no unit tests for the Guava cache implementation here before. I could add a simple test to check the cache behavior if necessary, but ideally a scale test is needed to understand the shuffleIndexCache behavior.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    If you're removing a public config, you should at least add it to `SparkConf.deprecatedConfigs`. It would be nice, but not required, to have some kind of mapping of the old value to the new (in which case you'd use `SparkConf.configsWithAlternatives` instead).




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80636/testReport)** for PR 18940 at commit [`f23a4c7`](https://github.com/apache/spark/commit/f23a4c79b69fd1f8a77162da34b8821cb0cc1352).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18940: [SPARK-21501] Change CacheLoader to limit entries...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18940#discussion_r134092710
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -597,7 +597,8 @@ private[spark] object SparkConf extends Logging {
           DeprecatedConfig("spark.scheduler.executorTaskBlacklistTime", "2.1.0",
             "Please use the new blacklisting options, spark.blacklist.*"),
           DeprecatedConfig("spark.yarn.am.port", "2.0.0", "Not used any more"),
    -      DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more")
    +      DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more"),
    +      DeprecatedConfig("spark.shuffle.service.index.cache.entries", "2.3.0", "Not used any more")
    --- End diff --
    
    +1 We'd better let the user know what the alternative config is.




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    ok to test




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80931/
    Test PASSed.




[GitHub] spark issue #18940: SPARK-21501 Change CacheLoader to limit entries based on...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80691/testReport)** for PR 18940 at commit [`e9afdf7`](https://github.com/apache/spark/commit/e9afdf7060148f26611223b957df83d15112d324).




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80636/testReport)** for PR 18940 at commit [`f23a4c7`](https://github.com/apache/spark/commit/f23a4c79b69fd1f8a77162da34b8821cb0cc1352).




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80931/testReport)** for PR 18940 at commit [`09a17c6`](https://github.com/apache/spark/commit/09a17c67ed7d79391c7841dffe11fa012aa69f27).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18940: YSPARK-734 Change CacheLoader to limit entries ba...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18940#discussion_r133077743
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java ---
    @@ -104,15 +105,22 @@ public ExternalShuffleBlockResolver(TransportConf conf, File registeredExecutorF
           Executor directoryCleaner) throws IOException {
         this.conf = conf;
         this.registeredExecutorFile = registeredExecutorFile;
    -    int indexCacheEntries = conf.getInt("spark.shuffle.service.index.cache.entries", 1024);
    +    String indexCacheSize = conf.get("spark.shuffle.service.index.cache.size", "100m");
         CacheLoader<File, ShuffleIndexInformation> indexCacheLoader =
             new CacheLoader<File, ShuffleIndexInformation>() {
               public ShuffleIndexInformation load(File file) throws IOException {
                 return new ShuffleIndexInformation(file);
               }
             };
    -    shuffleIndexCache = CacheBuilder.newBuilder()
    -                                    .maximumSize(indexCacheEntries).build(indexCacheLoader);
    +    shuffleIndexCache =
    +        CacheBuilder.newBuilder()
    +                    .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
    --- End diff --
    
    nit: these lines are indented way too far (the previous code was, too).
    
    See block above this one for example.




[GitHub] spark pull request #18940: [SPARK-21501] Change CacheLoader to limit entries...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18940#discussion_r133782968
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java ---
    @@ -104,15 +105,21 @@ public ExternalShuffleBlockResolver(TransportConf conf, File registeredExecutorF
           Executor directoryCleaner) throws IOException {
         this.conf = conf;
         this.registeredExecutorFile = registeredExecutorFile;
    -    int indexCacheEntries = conf.getInt("spark.shuffle.service.index.cache.entries", 1024);
    +    String indexCacheSize = conf.get("spark.shuffle.service.index.cache.size", "100m");
    --- End diff --
    
    internal/config is in Spark core, not in the common/network code. We shouldn't use the core version here, as it would add an extra dependency that isn't needed and would then have to be shipped in, for example, the external shuffle jar.
    I prefer to leave it as is; if we want a config builder in the common network code, we can do that as a separate JIRA.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    @redsanket I am thinking about the case where the same configuration file, which explicitly sets a value (e.g. 4096) for `spark.shuffle.service.index.cache.entries`, is used with Spark 2.3.
    The user may find it strange that performance has degraded. If Spark 2.3 will completely ignore `spark.shuffle.service.index.cache.entries`, it would be good to show a warning message whenever that key is explicitly set.
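    
    A minimal sketch of such a warning (hypothetical addition, not part of this patch), placed where the index cache is built and using the same conf.get(key, default) accessor and the class's existing slf4j logger:
    
        // Sketch only: warn when the removed entry-count key is still explicitly set.
        String oldEntries = conf.get("spark.shuffle.service.index.cache.entries", "");
        if (!oldEntries.isEmpty()) {
          logger.warn("spark.shuffle.service.index.cache.entries is no longer used; "
              + "please set spark.shuffle.service.index.cache.size (e.g. \"100m\") instead.");
        }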




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80696/
    Test PASSed.




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Can one of the admins verify this patch?




[GitHub] spark pull request #18940: SPARK-21501 Change CacheLoader to limit entries b...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18940#discussion_r133220047
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java ---
    @@ -104,15 +105,22 @@ public ExternalShuffleBlockResolver(TransportConf conf, File registeredExecutorF
           Executor directoryCleaner) throws IOException {
         this.conf = conf;
         this.registeredExecutorFile = registeredExecutorFile;
    -    int indexCacheEntries = conf.getInt("spark.shuffle.service.index.cache.entries", 1024);
    +    String indexCacheSize = conf.get("spark.shuffle.service.index.cache.size", "100m");
         CacheLoader<File, ShuffleIndexInformation> indexCacheLoader =
             new CacheLoader<File, ShuffleIndexInformation>() {
               public ShuffleIndexInformation load(File file) throws IOException {
                 return new ShuffleIndexInformation(file);
               }
             };
    -    shuffleIndexCache = CacheBuilder.newBuilder()
    -                                    .maximumSize(indexCacheEntries).build(indexCacheLoader);
    +    shuffleIndexCache =
    +        CacheBuilder.newBuilder()
    +                    .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
    --- End diff --
    
    yeah, the previous code actually made me follow that convention; ok, I will revert to 2-space indentation, thanks




[GitHub] spark issue #18940: SPARK-21501 Change CacheLoader to limit entries based on...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    nit: title should be "`[SPARK-21501] ...`".




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by dbolshak <gi...@git.apache.org>.
Github user dbolshak commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    LGTM, btw, no unit tests for the change?




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80636/
    Test PASSed.




[GitHub] spark issue #18940: SPARK-21501 Change CacheLoader to limit entries based on...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    I like this feature.
    For backward compatibility, how about honoring `spark.shuffle.service.index.cache.entries` only when it is explicitly declared?
    If `spark.shuffle.service.index.cache.size` is specified, it should take precedence.
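    
    A compact sketch of that fallback (illustrative only, not the merged behavior; getSize() is an assumed byte-size accessor, and indexCacheLoader is the existing CacheLoader from the diff):
    
        // Honor the old entry-count key only when explicitly present; otherwise bound by bytes.
        LoadingCache<File, ShuffleIndexInformation> cache;
        String oldEntries = conf.get("spark.shuffle.service.index.cache.entries", "");
        if (!oldEntries.isEmpty()) {
          cache = CacheBuilder.newBuilder()
              .maximumSize(Long.parseLong(oldEntries))
              .build(indexCacheLoader);
        } else {
          String indexCacheSize = conf.get("spark.shuffle.service.index.cache.size", "100m");
          cache = CacheBuilder.newBuilder()
              .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
              .weigher(new Weigher<File, ShuffleIndexInformation>() {
                public int weigh(File file, ShuffleIndexInformation info) {
                  return info.getSize();
                }
              })
              .build(indexCacheLoader);
        }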




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80691/
    Test PASSed.




[GitHub] spark pull request #18940: [SPARK-21501] Change CacheLoader to limit entries...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18940#discussion_r134058845
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -597,7 +597,8 @@ private[spark] object SparkConf extends Logging {
           DeprecatedConfig("spark.scheduler.executorTaskBlacklistTime", "2.1.0",
             "Please use the new blacklisting options, spark.blacklist.*"),
           DeprecatedConfig("spark.yarn.am.port", "2.0.0", "Not used any more"),
    -      DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more")
    +      DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more"),
    +      DeprecatedConfig("spark.shuffle.service.index.cache.entries", "2.3.0", "Not used any more")
    --- End diff --
    
    It would be good to mention the new config that's replacing it in the warning message.




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80931/testReport)** for PR 18940 at commit [`09a17c6`](https://github.com/apache/spark/commit/09a17c67ed7d79391c7841dffe11fa012aa69f27).




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80696/testReport)** for PR 18940 at commit [`8826ca3`](https://github.com/apache/spark/commit/8826ca3e8ea1e9d0b120c53a4fc70af7f6e7d5d2).




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by redsanket <gi...@git.apache.org>.
Github user redsanket commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    @vanzin addressed the config comment thanks




[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18940
  
    **[Test build #80691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80691/testReport)** for PR 18940 at commit [`e9afdf7`](https://github.com/apache/spark/commit/e9afdf7060148f26611223b957df83d15112d324).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

