Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2018/01/11 01:01:41 UTC

[GitHub] drill pull request #1058: DRILL-6002: Avoid memory copy from direct buffer t...

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1058#discussion_r160841030
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java ---
    @@ -107,7 +107,7 @@
          * nodes provide insufficient local disk space)
          */
     
    -    private static final int TRANSFER_SIZE = 32 * 1024;
    +    private static final int TRANSFER_SIZE = 1024 * 1024;
    --- End diff --
    
    Is a 1 MB buffer excessive? The point of the buffer is to ensure we write in units of a disk block. For the local file system, experience showed no gain above 32K. On the MapR FS, each write is done in 1 MB units. Does Hadoop have a preferred size?
    
    Given this variation, if we need large buffers, should we choose a buffer size based on the underlying file system? For example, is there a preferred size for S3?
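
    One way to make that concrete (a sketch only: this helper does not exist in Drill, the constants merely mirror the numbers discussed here, and "io.file.buffer.size" is a standard Hadoop setting) would be to derive the transfer size from the spill file system rather than hard-code it:

        import org.apache.hadoop.fs.FileSystem;

        public class SpillTransferSize {

          // Illustrative values taken from this discussion, not existing Drill constants.
          private static final int LOCAL_TRANSFER_SIZE = 32 * 1024;   // local FS shows no gain above 32K
          private static final int MAX_TRANSFER_SIZE   = 1024 * 1024; // MapR FS writes in 1 MB units

          /** Picks a write-buffer size based on the underlying spill file system. */
          public static int transferSizeFor(FileSystem fs) {
            if ("file".equals(fs.getScheme())) {
              return LOCAL_TRANSFER_SIZE;
            }
            // For distributed file systems, honor the cluster's I/O buffer hint,
            // but cap it so heavy spilling cannot hold unbounded heap.
            int hinted = fs.getConf().getInt("io.file.buffer.size", LOCAL_TRANSFER_SIZE);
            return Math.min(Math.max(hinted, LOCAL_TRANSFER_SIZE), MAX_TRANSFER_SIZE);
          }
        }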
    
    32K didn't seem large enough to worry about, even if we had 1000 fragments busily spilling. But 1 MB? 1000 * 1 MB = 1 GB, which starts to become significant, especially in light of our efforts to reduce heap usage. Should we worry?
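
    To put numbers on that worry (illustrative arithmetic only; the 1000-fragment count is the hypothetical above, not a measurement):

        // Aggregate heap held by per-fragment spill buffers of a given size.
        public class SpillBufferFootprint {
          static long aggregateBytes(int spillingFragments, int transferSize) {
            return (long) spillingFragments * transferSize;
          }

          public static void main(String[] args) {
            // 1000 fragments * 32 KB = 32,768,000 bytes (~31 MB)
            System.out.println(aggregateBytes(1000, 32 * 1024));
            // 1000 fragments * 1 MB = 1,048,576,000 bytes (~1 GB)
            System.out.println(aggregateBytes(1000, 1024 * 1024));
          }
        }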


---