You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by witgo <gi...@git.apache.org> on 2017/03/17 03:29:20 UTC

[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

GitHub user witgo opened a pull request:

    https://github.com/apache/spark/pull/17329

    [SPARK-19991]FileSegmentManagedBuffer performance improvement

    FileSegmentManagedBuffer performance improvement.
    
    
    ## What changes were proposed in this pull request?
    
    When we do not set the value of the configuration items `spark.storage.memoryMapThreshold` and `spark.shuffle.io.lazyFD`, 
    each call to the cFileSegmentManagedBuffer.nioByteBuffer or FileSegmentManagedBuffer.createInputStream method creates a NoSuchElementException instance. This is a more time-consuming operation.
    
    In the use case, this PR can improve the performance of about 3.5%
    
    The test code:
    
    ``` scala
    
    (1 to 10).foreach { i =>
      val numPartition = 10000
      val rdd = sc.parallelize(0 until numPartition).repartition(numPartition).flatMap { t =>
        (0 until numPartition).map(r => r * numPartition + t)
      }.repartition(numPartition)
      val serializeStart = System.currentTimeMillis()
      rdd.sum()
      val serializeFinish = System.currentTimeMillis()
      println(f"Test $i: ${(serializeFinish - serializeStart) / 1000D}%1.2f")
    }
    
    
    ```
    
    and `spark-defaults.conf` file:
    
    ```
    spark.master                                      yarn-client
    spark.executor.instances                          20
    spark.driver.memory                               64g
    spark.executor.memory                             30g
    spark.executor.cores                              5
    spark.default.parallelism                         100 
    spark.sql.shuffle.partitions                      100
    spark.serializer                                  org.apache.spark.serializer.KryoSerializer
    spark.driver.maxResultSize                        0
    spark.ui.enabled                                  false 
    spark.driver.extraJavaOptions                     -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=512M 
    spark.executor.extraJavaOptions                   -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M 
    spark.cleaner.referenceTracking.blocking          true
    spark.cleaner.referenceTracking.blocking.shuffle  true
    
    ```
    
    The test results are as follows
    
    | [SPARK-19991](https://github.com/witgo/spark/tree/SPARK-19991) |https://github.com/apache/spark/commit/68ea290b3aa89b2a539d13ea2c18bdb5a651b2bf|
    |---| --- | 
    |226.09 s| 235.21 s|
    
    ## How was this patch tested?
    
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark SPARK-19991

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17329
    
----
commit abcfc79991ecd1d5cef2cd1e275b872695ba19d9
Author: Guoqiang Li <li...@huawei.com>
Date:   2017-03-17T03:19:37Z

    FileSegmentManagedBuffer performance improvement

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    **[Test build #74714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74714/testReport)** for PR 17329 at commit [`abcfc79`](https://github.com/apache/spark/commit/abcfc79991ecd1d5cef2cd1e275b872695ba19d9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r108881597
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Ping @witgo to update or close


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74727/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r107706851
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Like the following code?
    
    ```java
      public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
        this.lazyFileDescriptor = conf.lazyFileDescriptor();
        this.memoryMapBytes = conf.memoryMapBytes();
        this.file = file;
        this.offset = offset;
        this.length = length;
      }
    ```
    
    the code `conf.lazyFileDescriptor();` or `conf.memoryMapBytes();`  creates a NoSuchElementException instance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r106781598
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Oh, do you have a better idea?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r108067937
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    I'm simply describing what you proposed above at https://github.com/apache/spark/pull/17329#discussion_r107706851


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r108049460
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Sorry,I didn't get your idea. Can you write some code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r107707746
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Yes exactly. The point is you just check them once, right? this also accomplishes that goal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74714/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r108032669
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    This still doesn't address the point. What you say is true even if you make the change I suggest, which is to remove the superfluous constructor. The performance is exactly the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    ```java
    
    public class HadoopConfigProvider extends ConfigProvider {
      private final Configuration conf;
    
      public HadoopConfigProvider(Configuration conf) {
        this.conf = conf;
      }
    
      @Override
      public String get(String name) {
        String value = conf.get(name);
        // When do not set the value of spark.storage.memoryMapThreshold or spark.shuffle.io.lazyFD,
       //  When the value of `value` is null
        if (value == null) {
          throw new NoSuchElementException(name);
        }
        return value;
      }
    
      @Override
      public Iterable<Map.Entry<String, String>> getAll() {
        return conf;
      }
    
    }
    
    ````


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r107841105
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Yes,  But the above code does not optimize performance, `FileSegmentManagedBuffer.convertToNetty` method  is also called only once  in the master branch code . 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r108031525
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    I'm not sure what you're stating, that this is slow? but you also only made half the changes as you did here. I don't see what the argument is. Removing the constructor you added can't have an effect on performance either way. In either case, you access the value once in the constructor. 
    
    I think all you're showing is that the first file changed here doesn't matter. You can revert this entirely too. It's ExternalBlockShufflerManager that matters.
    
    Make one or the other change and this can proceed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r107611158
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    No, why? I mean keep exactly the same constructors, no more no less. No code would change. You just set your two new fields in the current constructor. It actually means you don't need some of the changes you made here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74715/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r106778634
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Yeah that makes sense then but I don't think you need a new public constructor for this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r106848680
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Just don't add a new constructor. The existing one can set the new fields. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    **[Test build #74715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74715/testReport)** for PR 17329 at commit [`dc5fd8d`](https://github.com/apache/spark/commit/dc5fd8dda4717b485d4c4b2dfcdc5d115abf811c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo closed the pull request at:

    https://github.com/apache/spark/pull/17329


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    **[Test build #74715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74715/testReport)** for PR 17329 at commit [`dc5fd8d`](https://github.com/apache/spark/commit/dc5fd8dda4717b485d4c4b2dfcdc5d115abf811c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    OK. I think anyone's welcome to make this change without the new constructor. It sounded fine otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    **[Test build #74727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74727/testReport)** for PR 17329 at commit [`e0a3b62`](https://github.com/apache/spark/commit/e0a3b6286f8d1171d921fd1f90d88ca825b12b56).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r107846996
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    But if that's true, then the proposed changed to this class here already doesn't help anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    **[Test build #74714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74714/testReport)** for PR 17329 at commit [`abcfc79`](https://github.com/apache/spark/commit/abcfc79991ecd1d5cef2cd1e275b872695ba19d9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r108032378
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    Suppose there are E Executor in the cluster, a shuffle process has M Map task, R reduce task, in the master branch will be created:
    
    1. Up to M * R FileSegmentManagedBuffer instances
    2. Up to 2 * M * R NoSuchElementException instances
    
    in this PR will be created:
    
    1. Up to M * R FileSegmentManagedBuffer instances
    2. Up to 2 * NoSuchElementException instances (ExternalShuffleBlockResolver and IndexShuffleBlockResolver are created once executor starts and They call the new constructor to create a FileSegmentManagedBuffer instance)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r107572297
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    That will change a lot of code, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r108027203
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;
     
       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --
    
    This branch [SPARK-19991_try2 ](https://github.com/witgo/spark/commits/SPARK-19991_try2) needs `244.45` s in my test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17329
  
    Out of curiosity why does each call generate NoSuchElementException  ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org