Posted to reviews@spark.apache.org by liyezhang556520 <gi...@git.apache.org> on 2014/08/26 12:37:00 UTC

[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

GitHub user liyezhang556520 opened a pull request:

    https://github.com/apache/spark/pull/2134

    [SPARK-3000][CORE] drop old blocks to disk in parallel when memory is no...

    ...t large enough for caching new blocks
    
    Currently, when old blocks must be dropped to make room for caching new blocks, the dropping is done by only one thread at a time, which cannot fully utilize the disk throughput. If the blocks to be dropped are large, dropping takes a very long time, so it needs to be done in parallel. In this patch, blocks are dropped by multiple threads; before dropping, each thread selects the blocks that it will drop itself.
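
The select-then-drop idea described above can be illustrated with a minimal sketch. It is not the code from this patch and does not use Spark's `MemoryStore` API: `Block`, `ToyMemoryStore`, `selectVictims`, `drop`, and `ParallelDropDemo` are hypothetical names invented for illustration, and the "disk write" is a caller-supplied function. The assumption is that victim selection happens under a lock, so that no two threads claim the same block, while the slow write happens outside the lock.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for a cached block (not Spark's BlockId/BlockStatus).
final case class Block(id: String, size: Long)

class ToyMemoryStore {
  // Insertion-ordered, so the oldest blocks are visited first.
  private val entries = mutable.LinkedHashMap.empty[String, Block]
  // Ids of blocks already claimed by some thread for dropping.
  private val claimed = mutable.Set.empty[String]
  private var used = 0L

  def put(b: Block): Unit = this.synchronized {
    entries(b.id) = b
    used += b.size
  }

  def memoryUsed: Long = this.synchronized { used }

  // Phase 1 (under the lock): claim old, unclaimed blocks until `needed` bytes are covered.
  def selectVictims(needed: Long): List[Block] = this.synchronized {
    var freed = 0L
    val victims = mutable.ArrayBuffer.empty[Block]
    val it = entries.valuesIterator
    while (freed < needed && it.hasNext) {
      val b = it.next()
      if (claimed.add(b.id)) { // true only if no other thread claimed it
        victims += b
        freed += b.size
      }
    }
    victims.toList
  }

  // Phase 2: do the slow write outside the lock, then update the bookkeeping.
  def drop(b: Block, writeToDisk: Block => Unit): Unit = {
    writeToDisk(b)
    this.synchronized {
      entries.remove(b.id)
      claimed -= b.id
      used -= b.size
    }
  }
}

object ParallelDropDemo {
  // Returns (number of blocks written, bytes still held in memory).
  def run(): (Int, Long) = {
    val store = new ToyMemoryStore
    (1 to 10).foreach(i => store.put(Block(s"rdd_0_$i", 10L)))
    val written = ConcurrentHashMap.newKeySet[String]()
    val pool = Executors.newFixedThreadPool(4)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Four tasks, each needing 20 bytes, each selecting its own victims.
      val tasks = (1 to 4).map { _ =>
        Future {
          store.selectVictims(needed = 20L).foreach { b =>
            store.drop(b, blk => { written.add(blk.id); () })
          }
        }
      }
      Await.result(Future.sequence(tasks), 10.seconds)
      (written.size(), store.memoryUsed)
    } finally pool.shutdown()
  }
}
```

In this sketch, `ParallelDropDemo.run()` has four tasks each claim and write out two of the ten 10-byte blocks, so eight blocks reach the writer and 20 bytes remain accounted for in memory. Keeping the write outside the lock is what lets several drops overlap; the claimed set is the extra bookkeeping that this requires.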

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark spark-3000-v0.4.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2134
    
----
commit 357dae839034490bf83b8fdadb413cdef32f2e8b
Author: Zhang, Liye <li...@intel.com>
Date:   2014-08-26T10:20:30Z

    [SPARK-3000][CORE] drop old blocks to disk in parallel when memory is not large enough for caching new blocks
    
    Currently, when old blocks must be dropped to make room for caching new blocks, the dropping is done by only one thread at a time, which cannot fully utilize the disk throughput. If the blocks to be dropped are large, dropping takes a very long time, so it needs to be done in parallel. In this patch, blocks are dropped by multiple threads; before dropping, each thread selects the blocks that it will drop itself.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54698590
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19858/consoleFull) for   PR 2134 at commit [`71765eb`](https://github.com/apache/spark/commit/71765eb24b615c7d77bd9e080eacd735ec72bb09).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53515848
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19252/consoleFull) for   PR 2134 at commit [`3299414`](https://github.com/apache/spark/commit/329941440ddf9caf88f4dc5a35bf4f5c9f56424e).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ExternalSorter(object):`
      * `protected class AttributeEquals(val a: Attribute) `



[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54118785
  
    @pwendell I think they are duplicates in JIRA (I did not discover that a similar JIRA existed before I opened a new one). But the two PRs are based on different code bases. This PR is based on [SPARK-1777]/[#1165](https://github.com/apache/spark/pull/1165), whose logic differs substantially from the earlier code.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2134#discussion_r16878218
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
    @@ -200,81 +248,118 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)
        * checking whether the memory restrictions for unrolling blocks are still satisfied,
        * stopping immediately if not. This check is a safeguard against the scenario in which
        * there is not enough free memory to accommodate the entirety of a single block.
    +   * 
    +   * When there is not enough memory for unrolling blocks, old blocks will be dropped from
    +   * memory. The dropping operation runs in parallel to fully utilize the disk throughput
    +   * when there are multiple disks. Before dropping, each thread will mark the old blocks
    +   * that can be dropped.
        *
        * This method returns either an array with the contents of the entire block or an iterator
        * containing the values of the block (if the array would have exceeded available memory).
        */
    +
       def unrollSafely(
    -      blockId: BlockId,
    -      values: Iterator[Any],
    -      droppedBlocks: ArrayBuffer[(BlockId, BlockStatus)])
    -    : Either[Array[Any], Iterator[Any]] = {
    +    blockId: BlockId,
    --- End diff --
    
    @ScrapCodes 
    updated, thanks~


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93648457
  
      [Test build #30397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30397/consoleFull) for   PR 2134 at commit [`3192a6d`](https://github.com/apache/spark/commit/3192a6de7254107c038ab2b0b6868295bee11231).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-67598353
  
    Hi @andrewor14 ,
    [PR#3629](https://github.com/apache/spark/pull/3629) solved the problem that I pointed out in your original patch [PR#1165](https://github.com/apache/spark/pull/1165); you can check the comment history from Aug 12th.
    This PR does not mainly focus on that bug, but it resolves the bug along the way.
    This PR mainly focuses on the disk I/O issue, that is, the memory dropping problem: only one thread drops blocks from memory when cached RDD blocks need to be evicted to disk. This problem was also pointed out in [PR#791](https://github.com/apache/spark/pull/791). The main difference between this PR and [PR#791](https://github.com/apache/spark/pull/791) is that this PR also makes the `tryToPut` process run in parallel, so the memory bookkeeping is more complex. This PR also makes some changes to the test suite file.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54244727
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19624/consoleFull) for   PR 2134 at commit [`71765eb`](https://github.com/apache/spark/commit/71765eb24b615c7d77bd9e080eacd735ec72bb09).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53512784
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19252/consoleFull) for   PR 2134 at commit [`3299414`](https://github.com/apache/spark/commit/329941440ddf9caf88f4dc5a35bf4f5c9f56424e).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54116534
  
    @mridulm , @tdas , @andrewor14 , @ScrapCodes Can one of you help review the code?


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93650200
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30398/
    Test FAILed.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54115957
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19578/consoleFull) for   PR 2134 at commit [`f2f2c62`](https://github.com/apache/spark/commit/f2f2c6259cea1c21d7ff5567a2e41f38f6e7656e).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ByteArrayChunkOutputStream(chunkSize: Int) extends OutputStream `



[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93648267
  
      [Test build #30397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30397/consoleFull) for   PR 2134 at commit [`3192a6d`](https://github.com/apache/spark/commit/3192a6de7254107c038ab2b0b6868295bee11231).


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54119463
  
    @pwendell 
    This patch also fixes some existing bugs introduced by [SPARK-1777]. Since [SPARK-1777] needed to resolve the OOM problem, the logic of the original code changed a lot, which makes it more complicated to drop blocks in parallel; that is why this patch needs 5X more code.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54050427
  
    There is something similar in #791.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53407378
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19207/consoleFull) for   PR 2134 at commit [`357dae8`](https://github.com/apache/spark/commit/357dae839034490bf83b8fdadb413cdef32f2e8b).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93666873
  
      [Test build #30400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30400/consoleFull) for   PR 2134 at commit [`c248156`](https://github.com/apache/spark/commit/c24815624f9e51debfe1ca8e4859c1a58b30a42a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-67558309
  
    Hey @liyezhang556520, sorry for the delay. I'll take a look at the design doc you posted in a day or two.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93650039
  
      [Test build #30398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30398/consoleFull) for   PR 2134 at commit [`3192a6d`](https://github.com/apache/spark/commit/3192a6de7254107c038ab2b0b6868295bee11231).


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54247818
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19624/consoleFull) for   PR 2134 at commit [`71765eb`](https://github.com/apache/spark/commit/71765eb24b615c7d77bd9e080eacd735ec72bb09).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54050560
  
    And some of the comments there apply to this patch as well.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53827962
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19443/consoleFull) for   PR 2134 at commit [`9ec7d36`](https://github.com/apache/spark/commit/9ec7d367c58e06bd5cd9ef0fdedabeaf69701f96).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53402614
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19207/consoleFull) for   PR 2134 at commit [`357dae8`](https://github.com/apache/spark/commit/357dae839034490bf83b8fdadb413cdef32f2e8b).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-69635330
  
    @liyezhang556520 I would like to fix the issue you raised in #1165 first (i.e. SPARK-4777) before looking at SPARK-3000, which seems to me more like an optimization. Let's agree on a solution in #3629 before making more progress in this PR, since it seems that there are logical conflicts between the two PRs.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-66560620
  
    Hi @andrewor14 , do you have time to take a look at this patch? [SPARK-4777] is supposed to be fixed here.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-67558505
  
    Quick question though, what does this patch provide that #3629 doesn't? It seems that they're both trying to solve the same problem but this one is much bigger (I haven't looked at the code in detail yet)


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54116132
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19579/consoleFull) for   PR 2134 at commit [`6604e9a`](https://github.com/apache/spark/commit/6604e9a8b0873d4bc894af31a5599a99475d5a51).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93648460
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30397/
    Test FAILed.


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 closed the pull request at:

    https://github.com/apache/spark/pull/2134




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-136918237
  
    @liyezhang556520 this issue has mostly gone stale at this point, and I'm not sure if it's applicable anymore given some of the latest changes in master. Unfortunately I won't have the bandwidth to review this further and push this patch forward. I think we should close this patch for now and reopen it later if there's interest.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54115828
  
    @ScrapCodes Thanks for your comment! 




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53520694
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19268/consoleFull) for   PR 2134 at commit [`3299414`](https://github.com/apache/spark/commit/329941440ddf9caf88f4dc5a35bf4f5c9f56424e).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `"$FWDIR"/bin/spark-submit --class $CLASS "$`
      * `class ExternalSorter(object):`
      * `"$FWDIR"/bin/spark-submit --class $CLASS "$`
      * `protected class AttributeEquals(val a: Attribute) `





[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54113146
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19579/consoleFull) for   PR 2134 at commit [`6604e9a`](https://github.com/apache/spark/commit/6604e9a8b0873d4bc894af31a5599a99475d5a51).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2134#discussion_r16832952
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
    @@ -291,54 +376,71 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)
        * an Array if deserialized is true or a ByteBuffer otherwise. Its (possibly estimated) size
        * must also be passed by the caller.
        *
    -   * Synchronize on `accountingLock` to ensure that all the put requests and its associated block
    -   * dropping is done by only on thread at a time. Otherwise while one thread is dropping
    -   * blocks to free memory for one block, another thread may use up the freed space for
    -   * another block.
    -   *
    +   * In order to drop old blocks in parallel, we will first mark the blocks that can be dropped
    +   * when there is not enough memory. 
    +   * 
        * Return whether put was successful, along with the blocks dropped in the process.
        */
    -  private def tryToPut(
    -      blockId: BlockId,
    -      value: Any,
    -      size: Long,
    -      deserialized: Boolean): ResultWithDroppedBlocks = {
     
    -    /* TODO: Its possible to optimize the locking by locking entries only when selecting blocks
    -     * to be dropped. Once the to-be-dropped blocks have been selected, and lock on entries has
    -     * been released, it must be ensured that those to-be-dropped blocks are not double counted
    -     * for freeing up more space for another block that needs to be put. Only then the actually
    -     * dropping of blocks (and writing to disk if necessary) can proceed in parallel. */
    +  private def tryToPut(
    +    blockId: BlockId,
    --- End diff --
    
    same here.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54112614
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19578/consoleFull) for   PR 2134 at commit [`f2f2c62`](https://github.com/apache/spark/commit/f2f2c6259cea1c21d7ff5567a2e41f38f6e7656e).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54117989
  
    Can you explain how this differs from SPARK-1888/#791? Is this just a duplicate?




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by suyanNone <gi...@git.apache.org>.
Github user suyanNone commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-67126094
  
    Yes, it's a duplicate of your patch.
    I only saw your patch title, "parallel drop to disk"..., so I didn't look at the code in detail. I have already closed my patch.





[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-55543007
  
    @andrewor14 any comment on my explanation?




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93652175
  
      [Test build #30400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30400/consoleFull) for   PR 2134 at commit [`c248156`](https://github.com/apache/spark/commit/c24815624f9e51debfe1ca8e4859c1a58b30a42a).




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53517030
  
    test this please




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54572675
  
    retest this please




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93650198
  
      [Test build #30398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30398/consoleFull) for   PR 2134 at commit [`3192a6d`](https://github.com/apache/spark/commit/3192a6de7254107c038ab2b0b6868295bee11231).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-62029257
  
    Hey @liyezhang556520 sorry I've been swamped with the 1.2 release. I will look at this shortly after that's out of the window




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54118968
  
    But this one is 5X more code, so I'm just wondering if there is a difference in the feature set...




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-69528774
  
    Hi @andrewor14, I don't know whether you have reproduced this issue. As far as I know, most of your cases are tested on Amazon EC2 instances equipped with SSDs, and a single SSD's throughput can exceed that of 3 HDDs, so this problem may not be that obvious on your cluster.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53515001
  
    @andrewor14
    Can you help review the code?





[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54695085
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19858/consoleFull) for   PR 2134 at commit [`71765eb`](https://github.com/apache/spark/commit/71765eb24b615c7d77bd9e080eacd735ec72bb09).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53517358
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19268/consoleFull) for   PR 2134 at commit [`3299414`](https://github.com/apache/spark/commit/329941440ddf9caf88f4dc5a35bf4f5c9f56424e).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-54121834
  
    @pwendell Also, SPARK-1888/[#791](https://github.com/apache/spark/pull/791) has a problem maintaining `freeMemory`: after earlier blocks have finished selecting their to-be-dropped blocks, `freeMemory` is not updated for the next blocks that call `tryToPut`. The earlier blocks have effectively reserved that free memory, so `freeMemory` should be updated before the next blocks see it.
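
The reservation idea in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `MemoryAccounting`, `reserve`, and `commit` are hypothetical names, not part of Spark's `MemoryStore` or of either patch.

```scala
// Hypothetical accounting sketch; not Spark's MemoryStore API.
final class MemoryAccounting(maxMemory: Long) {
  private var used = 0L     // bytes held by blocks already stored
  private var reserved = 0L // bytes promised to puts that are still in flight

  /** Free memory as seen by the next put: reservations are already subtracted. */
  def freeMemory: Long = synchronized { maxMemory - used - reserved }

  /** Reserve `size` bytes for an in-flight put; false if not enough memory is free. */
  def reserve(size: Long): Boolean = synchronized {
    if (size <= maxMemory - used - reserved) { reserved += size; true } else false
  }

  /** Turn an earlier reservation into real usage once the block is stored. */
  def commit(size: Long): Unit = synchronized {
    require(size <= reserved, "commit without a matching reservation")
    reserved -= size
    used += size
  }
}
```

Because `freeMemory` subtracts outstanding reservations, a second put cannot count space that an earlier put has already claimed but not yet filled.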




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-61951704
  
      [Test build #22998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22998/consoleFull) for   PR 2134 at commit [`73b3339`](https://github.com/apache/spark/commit/73b33392ecb600a7bf7d1b39439625c100ff1021).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53830392
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19443/consoleFull) for   PR 2134 at commit [`9ec7d36`](https://github.com/apache/spark/commit/9ec7d367c58e06bd5cd9ef0fdedabeaf69701f96).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-61964351
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22998/
    Test PASSed.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-140318245
  
    ok, I'll close this PR




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93649867
  
    jenkins, retest this please




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-53517663
  
    @liyezhang556520 I'm a little swamped with the 1.1 release at the moment, but I'll try to look at this soon after we put out some fires there. Thanks for your PR.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-93666889
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30400/
    Test PASSed.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-61964345
  
      [Test build #22998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22998/consoleFull) for   PR 2134 at commit [`73b3339`](https://github.com/apache/spark/commit/73b33392ecb600a7bf7d1b39439625c100ff1021).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-55356380
  
    Hi @andrewor14 , you are really right about this, which is also my concern, and I tried to make the risk to the least. Allow me to tell a story here:
    One reason I removed configuration of **spark.storage.unrollFraction** is that the **unrollFraction** is set to be a fixed value, however, in some workload, assume there are many iteration of the application, and each iteration has blocks to be cached, for each iteration, the required cache memory for new blocks may various, and this has some side-effect, for some iteration too many old blocks dropped leading to  time wasting, and for some iteration not enough old blocks dropped which will lead to the new blocks dropped to disk or else while there are still many old blocks can be dropped for new blocks for caching. And also, when there is not so many old blocks that can be dropped (the memory of old blocks can be dropped is less than `maxMemory * unrollFraction`), the `ensureFreeSpace` will always return false. so it's hard for user to decide the value of `unrollFraction`. The other reason is for easy implementation of dropping old blocks in parallel.
    
    For OOM problem, it's really hard to avoid, since there are two places have the risk in this patch and one of the two also exists in the original implementation. 
    1. when we process blocks in `unrollSafely`, we will go through the `iterator` and to see if the new block partitions can be put into memory, and the checkperiod is `memoryCheckPeriod`, default value 16. Since we have no idea what is the memory value is required for each iteration, and this process is in parallel with many threads, the memory has occupied by the new block partitions for the first round check might be already very huge. This might cause OOM when the memory is already around the edge of it's capacity. This situations exists in both this patch and origin implementation.
    2. The second place is where you pointed out. Yes, in this patch, We lazy drop the old blocks when new blocks are to unroll in `unrollSafely`. In my implementation, for each check period, if old blocks need to drop, then only the least number of old blocks will be marked to be dropped for the current thread, just satisfy the required value of the new block partition. And then dropped those marked old blocks to disk, and continue going through the iteration for next checkpoint. Since only the least number of blocks will be dropped, which will make the difference of the tobedropped memory and tobeunroll memory to the least. And only the difference value will have effects the `freeMemoryForUnroll`, which will have effect to other threads unrolling process.
    
    There are two phases in the whole procedure that need to drop blocks: one is `unrollSafely` and the other is `tryToPut`. There is no OOM risk for `tryToPut`, since all the data is already in memory when `tryToPut` is called.
    
    Fortunately there is `spark.storage.safetyFraction` to lower the risk further, but I think the OOM risk will still exist.
    
    Another way is to just drop the new blocks to disk when there is not enough free memory, without dropping old blocks at all. This also gives a large speedup compared with dropping old blocks serially, and in our tests its performance is very close to dropping old blocks in parallel.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-61959007
  
    @andrewor14, I have rebased the code and updated a [spark-3000 design doc](https://issues.apache.org/jira/secure/attachment/12679822/Spark-3000%20Design%20Doc.pdf). Would you please take a look and help review the code? I think the current code has gotten rid of the OOM risk.




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2134#issuecomment-55337314
  
    Hi @liyezhang556520, I did a cursory glance of your changes and I have a high-level question before we dig deeper. While we drop the blocks in parallel, we still need to occupy the chunk of memory that was held by the old blocks that we're dropping. However, the whole point of unrolling new blocks safely is to ensure that we don't use more memory than is available in the JVM. Doesn't this introduce a potential condition where we unroll the new block quicker than we drop the old block, and we can still run out of memory?




[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2134#discussion_r16832936
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
    @@ -200,81 +248,118 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)
        * checking whether the memory restrictions for unrolling blocks are still satisfied,
        * stopping immediately if not. This check is a safeguard against the scenario in which
        * there is not enough free memory to accommodate the entirety of a single block.
    +   * 
    +   * When there is not enough memory for unrolling blocks, old blocks will be dropped from
    +   * memory. The dropping operation is in parallel to fully utilized the disk throughput
    +   * when there are multiple disks. And befor dropping, each thread will mark the old blocks
    +   * that can be dropped.
        *
        * This method returns either an array with the contents of the entire block or an iterator
        * containing the values of the block (if the array would have exceeded available memory).
        */
    +
       def unrollSafely(
    -      blockId: BlockId,
    -      values: Iterator[Any],
    -      droppedBlocks: ArrayBuffer[(BlockId, BlockStatus)])
    -    : Either[Array[Any], Iterator[Any]] = {
    +    blockId: BlockId,
    --- End diff --
    
    incorrect indentation.

