Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/03/23 20:22:38 UTC

[GitHub] [hadoop] bilaharith opened a new pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

bilaharith opened a new pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907
 
 
   HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399027766
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
 ##########
 @@ -41,8 +41,11 @@
   public static final String AZURE_WRITE_BUFFER_SIZE = "fs.azure.write.request.size";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_BLOCK_SIZE_PROPERTY_NAME = "fs.azure.block.size";
-  public static final String AZURE_BLOCK_LOCATION_HOST_PROPERTY_NAME = "fs.azure.block.location.impersonatedhost";
+  public static final String AZURE_BLOCK_LOCATION_HOST_PROPERTY_NAME = "fs.azure.block.location.impersonatedhost";;
 
 Review comment:
   Remove this delta; the extra trailing semicolon is an unintended change.



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399027883
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java
 ##########
 @@ -49,7 +49,11 @@
   public static final int DEFAULT_AZURE_LIST_MAX_RESULTS = 500;
 
   public static final int MAX_CONCURRENT_READ_THREADS = 12;
-  public static final int MAX_CONCURRENT_WRITE_THREADS = 8;
+  public static final int DEFAULT_WRITE_CONCURRENCY_FACTOR = 4;
 
 Review comment:
   Why is the concurrency reduced to 4?



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399027614
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
 ##########
 @@ -416,33 +417,39 @@ public OutputStream createFile(final Path path, final boolean overwrite, final F
                                  final FsPermission umask) throws AzureBlobFileSystemException {
     try (AbfsPerfInfo perfInfo = startTracking("createFile", "createPath")) {
       boolean isNamespaceEnabled = getIsNamespaceEnabled();
-      LOG.debug("createFile filesystem: {} path: {} overwrite: {} permission: {} umask: {} isNamespaceEnabled: {}",
-              client.getFileSystem(),
-              path,
-              overwrite,
-              permission.toString(),
-              umask.toString(),
-              isNamespaceEnabled);
+      LOG.debug(
+          "createFile filesystem: {} path: {} overwrite: {} permission: {} umask: {} isNamespaceEnabled: {}",
+          client.getFileSystem(), path, overwrite, permission.toString(), umask.toString(),
+          isNamespaceEnabled);
 
-        boolean appendBlob = false;
-        if (isAppendBlobKey(path.toString())) {
-          appendBlob = true;
-        }
+      boolean appendBlob = false;
+      if (isAppendBlobKey(path.toString())) {
+        appendBlob = true;
+      }
 
-      client.createPath(AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path), true, overwrite,
+      client.createPath(AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path),
+          true, overwrite,
           isNamespaceEnabled ? getOctalNotation(permission) : null,
-          isNamespaceEnabled ? getOctalNotation(umask) : null,
-          appendBlob);
-
-      return new AbfsOutputStream(
-          client,
-          AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path),
-          0,
-          abfsConfiguration.getWriteBufferSize(),
+          isNamespaceEnabled ? getOctalNotation(umask) : null, appendBlob);
+
+      if (abfsConfiguration.shouldUseOlderAbfsOutputStream()) {
+        return new AbfsOutputStreamOld(
+            client,
+            AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path),
+            0,
+            abfsConfiguration.getWriteBufferSize(),
+            abfsConfiguration.isFlushEnabled(),
+            abfsConfiguration.isOutputStreamFlushDisabled(),
+            abfsConfiguration.isAppendWithFlushEnabled(),
+            appendBlob);
+      }
+      return new AbfsOutputStream(client,
 
 Review comment:
   A blank line is needed after the end of a block.



[GitHub] [hadoop] bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r403502506
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java
 ##########
 @@ -49,7 +49,11 @@
   public static final int DEFAULT_AZURE_LIST_MAX_RESULTS = 500;
 
   public static final int MAX_CONCURRENT_READ_THREADS = 12;
-  public static final int MAX_CONCURRENT_WRITE_THREADS = 8;
+  public static final int DEFAULT_WRITE_CONCURRENCY_FACTOR = 4;
 
 Review comment:
   The concurrency is not changed; I only removed an unused constant. The value 4 is currently hardcoded in the AbfsOutputStream class and is now a configuration.
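
   The hardcoded value mentioned here appears in the removed constructor line `4 * Runtime.getRuntime().availableProcessors()`. A minimal sketch of how a configurable factor could replace it follows. The class name, the method name, and the fallback to the default for non-positive values are assumptions made for illustration; they are not taken from the PR.

```java
// Hypothetical helper; names and fallback behaviour are illustrative only.
final class WriteConcurrencySketch {

  /** Default multiplier, mirroring DEFAULT_WRITE_CONCURRENCY_FACTOR in the diff. */
  static final int DEFAULT_WRITE_CONCURRENCY_FACTOR = 4;

  private WriteConcurrencySketch() {
  }

  /**
   * Returns factor * processors. A non-positive configured factor falls
   * back to the default (an assumption of this sketch).
   */
  static int maxConcurrentRequests(int configuredFactor, int processors) {
    int factor = configuredFactor > 0
        ? configuredFactor
        : DEFAULT_WRITE_CONCURRENCY_FACTOR;
    return factor * processors;
  }

  public static void main(String[] args) {
    int processors = Runtime.getRuntime().availableProcessors();
    // With no configured value (0), the result equals the old hardcoded one.
    System.out.println(maxConcurrentRequests(0, processors));
  }
}
```

   With the default factor the result equals the previously hardcoded computation, so behaviour would only change when the configuration is set.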



[GitHub] [hadoop] bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r403502608
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsByteBufferPool.java
 ##########
 @@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+
+import java.util.concurrent.ArrayBlockingQueue;
+
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE;
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE;
+
+/**
+ * Pool for byte[]
+ */
+public class AbfsByteBufferPool {
+
+  /**
+   * Queue holding the free buffers.
+   */
+  private ArrayBlockingQueue<byte[]> freeBuffers;
+  /**
+   * Count to track the buffers issued and yet to be returned.
+   */
+  private int numBuffersInUse;
+  /**
+   * Maximum number of buffers that can be in use.
+   */
+  private int maxBuffersInUse;
+  private int bufferSize;
+
+  /**
+   * @param bufferSize                 Size of the byte[] to be returned.
+   * @param maxFreeBuffers             Maximum number of buffers that can
+   *                                   reside in the pool.
+   * @param maxWriteMemUsagePercentage Maximum percentage of memory that can
+   *                                   be used by the pool from the max
+   *                                   available memory.
+   */
+  public AbfsByteBufferPool(final int bufferSize, final int maxFreeBuffers,
+      final int maxWriteMemUsagePercentage) {
+    Preconditions.checkArgument(maxWriteMemUsagePercentage
+            >= MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE
+            && maxWriteMemUsagePercentage
+            <= MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE,
+        "maxWriteMemUsagePercentage should be in range (%s - %s)",
+        MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE,
+        MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE);
+    Preconditions
+        .checkArgument(maxFreeBuffers > 0, "maxFreeBuffers cannot be < 1");
+    this.bufferSize = bufferSize;
+    this.numBuffersInUse = 0;
+    freeBuffers = new ArrayBlockingQueue<>(maxFreeBuffers);
+
+    double maxMemoryAllowedForPool =
+        Runtime.getRuntime().maxMemory() * maxWriteMemUsagePercentage / 100;
+    double bufferCountByMemory = maxMemoryAllowedForPool / bufferSize;
+    double bufferCountByMaxFreeBuffers =
+        maxFreeBuffers + Runtime.getRuntime().availableProcessors();
+
+    maxBuffersInUse = (int) Math
+        .ceil(Math.min(bufferCountByMemory, bufferCountByMaxFreeBuffers));
+    if (maxBuffersInUse < 2) {
+      maxBuffersInUse = 2;
+    }
+  }
+
+  /**
+   * @return byte[] from the pool if available otherwise new byte[] is returned.
+   * Waits if pool is empty and already maximum number of buffers are in use.
+   */
+  public byte[] get() {
+    byte[] byteArray = null;
+    synchronized (this) {
 
 Review comment:
   There will be synchronisation issues, as the comparison with the value won't be an atomic operation.
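
   The concern is the usual check-then-act race: if the comparison against the limit and the increment of the counter are not done under the same lock, two threads can both pass the check and exceed the limit. A simplified sketch of the guarded pattern follows. It is a hypothetical counter written for illustration, not the AbfsByteBufferPool code from the PR.

```java
// Hypothetical, simplified counter; not the pool implementation from the PR.
final class BoundedPermitsSketch {
  private final int maxInUse;
  private int inUse;

  BoundedPermitsSketch(int maxInUse) {
    if (maxInUse < 1) {
      throw new IllegalArgumentException("maxInUse cannot be < 1");
    }
    this.maxInUse = maxInUse;
  }

  /** Blocks until a permit is free, then takes it. */
  synchronized void acquire() throws InterruptedException {
    // The comparison and the increment run under one lock, so two threads
    // cannot both observe inUse < maxInUse and push the count past the cap.
    while (inUse >= maxInUse) {
      wait();
    }
    inUse++;
  }

  /** Non-blocking variant: returns false instead of waiting. */
  synchronized boolean tryAcquire() {
    if (inUse >= maxInUse) {
      return false;
    }
    inUse++;
    return true;
  }

  /** Returns a permit and wakes any thread waiting in acquire(). */
  synchronized void release() {
    if (inUse == 0) {
      throw new IllegalStateException("nothing to release");
    }
    inUse--;
    notifyAll();
  }

  synchronized int inUse() {
    return inUse;
  }
}
```

   An `AtomicInteger` on its own would make the increment atomic but not the combined compare-and-wait, which is why the sketch keeps both steps inside `synchronized` methods.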



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399030377
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java
 ##########
 @@ -100,19 +99,62 @@ public AbfsOutputStream(
     this.appendBlob = appendBlob;
     this.lastError = null;
     this.lastFlushOffset = 0;
-    this.bufferSize = bufferSize;
-    this.buffer = byteBufferPool.getBuffer(false, bufferSize).array();
     this.bufferIndex = 0;
-    this.writeOperations = new ConcurrentLinkedDeque<>();
-
-    this.maxConcurrentRequestCount = 4 * Runtime.getRuntime().availableProcessors();
-    this.threadExecutor
-        = new ThreadPoolExecutor(maxConcurrentRequestCount,
-        maxConcurrentRequestCount,
-        10L,
-        TimeUnit.SECONDS,
-        new LinkedBlockingQueue<>());
-    this.completionService = new ExecutorCompletionService<>(this.threadExecutor);
+
+    init(abfsConfiguration);
+    buffer = new byte[bufferSize];
+  }
+
+  private void init(final AbfsConfiguration conf) {
+    if (isCommonPoolsInitialised()) {
+      return;
+    }
+
+    initWriteBufferPool(conf);
 
 Review comment:
   As discussed, all implementations not tied to the AbfsOutputStream should reside outside this class.
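
   One way to act on this suggestion is to move the shared executor into its own holder class, so the stream only asks for it. The sketch below assumes a fixed-size pool sized as in the removed constructor code (`4 * availableProcessors()`). The class name, the thread name, and the use of daemon threads are hypothetical choices for illustration, not part of the PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder class; names and sizing are assumptions of this sketch.
final class SharedWritePoolsSketch {

  private SharedWritePoolsSketch() {
  }

  // Initialization-on-demand holder: the JVM initializes Holder exactly
  // once, on first access, so no explicit "already initialised" check is
  // needed in the stream constructor.
  private static final class Holder {
    static final ExecutorService EXECUTOR = Executors.newFixedThreadPool(
        4 * Runtime.getRuntime().availableProcessors(),
        runnable -> {
          Thread thread = new Thread(runnable, "abfs-write-sketch");
          thread.setDaemon(true); // do not keep the JVM alive
          return thread;
        });
  }

  /** Returns the single executor shared by all callers. */
  static ExecutorService executor() {
    return Holder.EXECUTOR;
  }
}
```

   A stream constructor would then call `SharedWritePoolsSketch.executor()` rather than owning the initialisation logic, which also makes the pool easier to test in isolation.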



[GitHub] [hadoop] bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r403502322
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
 ##########
 @@ -416,33 +417,39 @@ public OutputStream createFile(final Path path, final boolean overwrite, final F
                                  final FsPermission umask) throws AzureBlobFileSystemException {
     try (AbfsPerfInfo perfInfo = startTracking("createFile", "createPath")) {
       boolean isNamespaceEnabled = getIsNamespaceEnabled();
-      LOG.debug("createFile filesystem: {} path: {} overwrite: {} permission: {} umask: {} isNamespaceEnabled: {}",
-              client.getFileSystem(),
-              path,
-              overwrite,
-              permission.toString(),
-              umask.toString(),
-              isNamespaceEnabled);
+      LOG.debug(
 
 Review comment:
   Done



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399028369
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsByteBufferPool.java
 ##########
 @@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+
+import java.util.concurrent.ArrayBlockingQueue;
+
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE;
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE;
+
+/**
+ * Pool for byte[]
+ */
+public class AbfsByteBufferPool {
+
+  /**
+   * Queue holding the free buffers.
+   */
+  private ArrayBlockingQueue<byte[]> freeBuffers;
+  /**
+   * Count to track the buffers issued and yet to be returned.
+   */
+  private int numBuffersInUse;
+  /**
+   * Maximum number of buffers that can be in use.
+   */
+  private int maxBuffersInUse;
+  private int bufferSize;
+
+  /**
+   * @param bufferSize                 Size of the byte[] to be returned.
+   * @param maxFreeBuffers             Maximum number of buffers that can
+   *                                   reside in the pool.
+   * @param maxWriteMemUsagePercentage Maximum percentage of memory that can
+   *                                   be used by the pool from the max
+   *                                   available memory.
+   */
+  public AbfsByteBufferPool(final int bufferSize, final int maxFreeBuffers,
+      final int maxWriteMemUsagePercentage) {
+    Preconditions.checkArgument(maxWriteMemUsagePercentage
+            >= MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE
+            && maxWriteMemUsagePercentage
+            <= MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE,
+        "maxWriteMemUsagePercentage should be in range (%s - %s)",
+        MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE,
+        MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE);
+    Preconditions
+        .checkArgument(maxFreeBuffers > 0, "maxFreeBuffers cannot be < 1");
+    this.bufferSize = bufferSize;
+    this.numBuffersInUse = 0;
+    freeBuffers = new ArrayBlockingQueue<>(maxFreeBuffers);
+
+    double maxMemoryAllowedForPool =
+        Runtime.getRuntime().maxMemory() * maxWriteMemUsagePercentage / 100;
+    double bufferCountByMemory = maxMemoryAllowedForPool / bufferSize;
+    double bufferCountByMaxFreeBuffers =
+        maxFreeBuffers + Runtime.getRuntime().availableProcessors();
+
+    maxBuffersInUse = (int) Math
 
 Review comment:
   If the maximum is determined in the constructor, how can this buffer pool be utilized better if the available memory increases after this point?
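
   For reference, the cap computed in the quoted constructor can be isolated as a pure function of its inputs, which makes it easier to see what is fixed at construction time. This is a sketch with hypothetical names; it uses floating-point division for the percentage, a small deviation from the integer arithmetic in the diff.

```java
// Hypothetical extraction of the cap computation from the quoted constructor.
final class BufferCapSketch {

  private BufferCapSketch() {
  }

  /**
   * Returns the smaller of "buffers that fit in the allowed share of
   * memory" and "maxFreeBuffers + processors", rounded up, and never
   * less than 2, as in the quoted diff.
   */
  static int maxBuffersInUse(long maxMemoryBytes, int memUsagePercentage,
      int bufferSize, int maxFreeBuffers, int processors) {
    double maxMemoryAllowedForPool = maxMemoryBytes * (memUsagePercentage / 100.0);
    double bufferCountByMemory = maxMemoryAllowedForPool / bufferSize;
    double bufferCountByMaxFreeBuffers = maxFreeBuffers + processors;
    int cap = (int) Math.ceil(
        Math.min(bufferCountByMemory, bufferCountByMaxFreeBuffers));
    return Math.max(cap, 2);
  }

  public static void main(String[] args) {
    Runtime runtime = Runtime.getRuntime();
    // Same inputs the constructor reads, evaluated at call time.
    System.out.println(maxBuffersInUse(runtime.maxMemory(), 20,
        8 * 1024 * 1024, 8, runtime.availableProcessors()));
  }
}
```

   `Runtime.maxMemory()` reports the maximum amount of memory the JVM will attempt to use, not the memory currently free. If the cap were meant to follow changing conditions, a function like this could be re-evaluated later instead of once in the constructor; whether that is worthwhile is the open question in this comment.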



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399029873
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java
 ##########
 @@ -62,33 +62,32 @@
   private long lastFlushOffset;
   private long lastTotalAppendOffset = 0;
 
-  private final int bufferSize;
+  private static int bufferSize;
 
 Review comment:
   Why should this be static and not final?
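
   The risk behind this question can be shown with a small, hypothetical class: a static non-final field is shared by every instance, so any later assignment changes the value for all of them. The sketch assumes the field is assigned whenever an instance is constructed, which the quoted diff does not show.

```java
// Hypothetical class contrasting a shared mutable field with a final one.
final class StaticFieldSketch {

  // Shared by every instance; a later constructor call overwrites it for all.
  private static int sharedBufferSize;

  // Per instance and fixed after construction.
  private final int ownBufferSize;

  StaticFieldSketch(int bufferSize) {
    sharedBufferSize = bufferSize; // assumption: assigned on construction
    this.ownBufferSize = bufferSize;
  }

  static int sharedBufferSize() {
    return sharedBufferSize;
  }

  int ownBufferSize() {
    return ownBufferSize;
  }

  public static void main(String[] args) {
    StaticFieldSketch first = new StaticFieldSketch(4);
    StaticFieldSketch second = new StaticFieldSketch(8);
    // first keeps its own size, but the shared value now reflects second.
    System.out.println(first.ownBufferSize() + " " + sharedBufferSize()
        + " " + second.ownBufferSize());
  }
}
```

   If the value really must be shared across streams, initialising it exactly once and marking it `static final`, or keeping it inside the shared pool object, would avoid this kind of silent overwrite.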



[GitHub] [hadoop] bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r403502335
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
 ##########
 @@ -416,33 +417,39 @@ public OutputStream createFile(final Path path, final boolean overwrite, final F
                                  final FsPermission umask) throws AzureBlobFileSystemException {
     try (AbfsPerfInfo perfInfo = startTracking("createFile", "createPath")) {
       boolean isNamespaceEnabled = getIsNamespaceEnabled();
-      LOG.debug("createFile filesystem: {} path: {} overwrite: {} permission: {} umask: {} isNamespaceEnabled: {}",
-              client.getFileSystem(),
-              path,
-              overwrite,
-              permission.toString(),
-              umask.toString(),
-              isNamespaceEnabled);
+      LOG.debug(
+          "createFile filesystem: {} path: {} overwrite: {} permission: {} umask: {} isNamespaceEnabled: {}",
+          client.getFileSystem(), path, overwrite, permission.toString(), umask.toString(),
+          isNamespaceEnabled);
 
-        boolean appendBlob = false;
-        if (isAppendBlobKey(path.toString())) {
-          appendBlob = true;
-        }
+      boolean appendBlob = false;
+      if (isAppendBlobKey(path.toString())) {
+        appendBlob = true;
+      }
 
-      client.createPath(AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path), true, overwrite,
+      client.createPath(AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path),
+          true, overwrite,
           isNamespaceEnabled ? getOctalNotation(permission) : null,
-          isNamespaceEnabled ? getOctalNotation(umask) : null,
-          appendBlob);
-
-      return new AbfsOutputStream(
-          client,
-          AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path),
-          0,
-          abfsConfiguration.getWriteBufferSize(),
+          isNamespaceEnabled ? getOctalNotation(umask) : null, appendBlob);
+
+      if (abfsConfiguration.shouldUseOlderAbfsOutputStream()) {
+        return new AbfsOutputStreamOld(
+            client,
+            AbfsHttpConstants.FORWARD_SLASH + getRelativePath(path),
+            0,
+            abfsConfiguration.getWriteBufferSize(),
+            abfsConfiguration.isFlushEnabled(),
+            abfsConfiguration.isOutputStreamFlushDisabled(),
+            abfsConfiguration.isAppendWithFlushEnabled(),
+            appendBlob);
+      }
+      return new AbfsOutputStream(client,
 
 Review comment:
   Done



[GitHub] [hadoop] bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r403502393
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
 ##########
 @@ -41,8 +41,11 @@
   public static final String AZURE_WRITE_BUFFER_SIZE = "fs.azure.write.request.size";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_BLOCK_SIZE_PROPERTY_NAME = "fs.azure.block.size";
-  public static final String AZURE_BLOCK_LOCATION_HOST_PROPERTY_NAME = "fs.azure.block.location.impersonatedhost";
+  public static final String AZURE_BLOCK_LOCATION_HOST_PROPERTY_NAME = "fs.azure.block.location.impersonatedhost";;
 
 Review comment:
   Done



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399029674
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsByteBufferPool.java
 ##########
 @@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+
+import java.util.concurrent.ArrayBlockingQueue;
+
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE;
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE;
+
+/**
+ * Pool for byte[]
+ */
+public class AbfsByteBufferPool {
+
+  /**
+   * Queue holding the free buffers.
+   */
+  private ArrayBlockingQueue<byte[]> freeBuffers;
+  /**
+   * Count to track the buffers issued and yet to be returned.
+   */
+  private int numBuffersInUse;
+  /**
+   * Maximum number of buffers that can be in use.
+   */
+  private int maxBuffersInUse;
+  private int bufferSize;
+
+  /**
+   * @param bufferSize                 Size of the byte[] to be returned.
+   * @param maxFreeBuffers             Maximum number of buffers that can
+   *                                   reside in the pool.
+   * @param maxWriteMemUsagePercentage Maximum percentage of memory that can
+   *                                   be used by the pool from the max
+   *                                   available memory.
+   */
+  public AbfsByteBufferPool(final int bufferSize, final int maxFreeBuffers,
+      final int maxWriteMemUsagePercentage) {
+    Preconditions.checkArgument(maxWriteMemUsagePercentage
+            >= MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE
+            && maxWriteMemUsagePercentage
+            <= MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE,
+        "maxWriteMemUsagePercentage should be in range (%s - %s)",
+        MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE,
+        MAX_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE);
+    Preconditions
+        .checkArgument(maxFreeBuffers > 0, "maxFreeBuffers cannot be < 1");
+    this.bufferSize = bufferSize;
+    this.numBuffersInUse = 0;
+    freeBuffers = new ArrayBlockingQueue<>(maxFreeBuffers);
+
+    double maxMemoryAllowedForPool =
+        Runtime.getRuntime().maxMemory() * maxWriteMemUsagePercentage / 100;
+    double bufferCountByMemory = maxMemoryAllowedForPool / bufferSize;
+    double bufferCountByMaxFreeBuffers =
+        maxFreeBuffers + Runtime.getRuntime().availableProcessors();
+
+    maxBuffersInUse = (int) Math
+        .ceil(Math.min(bufferCountByMemory, bufferCountByMaxFreeBuffers));
+    if (maxBuffersInUse < 2) {
+      maxBuffersInUse = 2;
+    }
+  }
+
+  /**
+   * @return byte[] from the pool if available otherwise new byte[] is returned.
+   * Waits if pool is empty and already maximum number of buffers are in use.
+   */
+  public byte[] get() {
+    byte[] byteArray = null;
+    synchronized (this) {
 
 Review comment:
   Instead of locking for the counter, why not use AtomicInteger?
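
   To make the suggestion above concrete, here is a minimal sketch of a pool whose in-use count is an AtomicInteger updated with compareAndSet rather than under a synchronized block. This is not the PR's AbfsByteBufferPool: the class name AtomicCountedBufferPool and the methods tryGet, release and inUse are hypothetical, and the cap is passed in directly instead of being derived from memory settings. Note that an AtomicInteger alone offers no way to wait, so this sketch returns null when the cap is reached; the blocking behaviour described in the Javadoc of get() would need an additional mechanism such as java.util.concurrent.Semaphore.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; not the class under review. */
final class AtomicCountedBufferPool {
  private final ArrayBlockingQueue<byte[]> freeBuffers;
  private final AtomicInteger numBuffersInUse = new AtomicInteger(0);
  private final int maxBuffersInUse;
  private final int bufferSize;

  AtomicCountedBufferPool(int bufferSize, int maxFreeBuffers,
      int maxBuffersInUse) {
    this.bufferSize = bufferSize;
    this.maxBuffersInUse = maxBuffersInUse;
    this.freeBuffers = new ArrayBlockingQueue<>(maxFreeBuffers);
  }

  /** Returns a buffer, or null if the in-use cap is reached (no waiting). */
  byte[] tryGet() {
    while (true) {
      int current = numBuffersInUse.get();
      if (current >= maxBuffersInUse) {
        return null;
      }
      // Reserve a slot atomically; retry if another thread raced us.
      if (numBuffersInUse.compareAndSet(current, current + 1)) {
        byte[] pooled = freeBuffers.poll();
        return pooled != null ? pooled : new byte[bufferSize];
      }
    }
  }

  /** Returns a buffer to the pool; it is dropped if the free queue is full. */
  void release(byte[] buffer) {
    freeBuffers.offer(buffer);
    numBuffersInUse.decrementAndGet();
  }

  int inUse() {
    return numBuffersInUse.get();
  }
}
```

   The sketch trusts callers to release each buffer exactly once; a double release would drive the counter below the true number of buffers in use.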



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399027244
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
 ##########
 @@ -416,33 +417,39 @@ public OutputStream createFile(final Path path, final boolean overwrite, final F
                                  final FsPermission umask) throws AzureBlobFileSystemException {
     try (AbfsPerfInfo perfInfo = startTracking("createFile", "createPath")) {
       boolean isNamespaceEnabled = getIsNamespaceEnabled();
-      LOG.debug("createFile filesystem: {} path: {} overwrite: {} permission: {} umask: {} isNamespaceEnabled: {}",
-              client.getFileSystem(),
-              path,
-              overwrite,
-              permission.toString(),
-              umask.toString(),
-              isNamespaceEnabled);
+      LOG.debug(
 
 Review comment:
   Do not introduce any formatting changes unless needed by actual change.



[GitHub] [hadoop] bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r403707938
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java
 ##########
 @@ -62,33 +62,32 @@
   private long lastFlushOffset;
   private long lastTotalAppendOffset = 0;
 
-  private final int bufferSize;
+  private static int bufferSize;
 
 Review comment:
   static final variables cannot be initialised in a constructor, and this one cannot be initialised at its declaration or in a static block either.
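
   The language rule behind this reply can be illustrated with a small sketch. The class StaticFinalDemo and its members are hypothetical and unrelated to AbfsOutputStream; the point is only that a static final field must be assigned at its declaration or in a static initializer, so a static value that is first known at construction time has to be left non-final.

```java
/** Hypothetical illustration of where static final fields may be assigned. */
final class StaticFinalDemo {
  // Legal: assigned at the declaration.
  static final int AT_DECLARATION = 4;

  // Legal: a blank static final assigned exactly once in a static initializer.
  static final int IN_STATIC_BLOCK;
  static {
    IN_STATIC_BLOCK = 8;
  }

  // Not final: the value is only known once an instance is constructed.
  private static int sharedBufferSize;

  StaticFinalDemo(int bufferSize) {
    // AT_DECLARATION = bufferSize; // would not compile: the field is final
    sharedBufferSize = bufferSize;
  }

  static int getSharedBufferSize() {
    return sharedBufferSize;
  }
}
```

   One trade-off of the non-final static is that every later constructor call overwrites the shared value, so instances created with different sizes affect each other.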



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399026980
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java
 ##########
 @@ -713,7 +731,7 @@ void setReadBufferSize(int bufferSize) {
   }
 
   @VisibleForTesting
-  void setWriteBufferSize(int bufferSize) {
+  public void setWriteBufferSize(int bufferSize) {
 
 Review comment:
   Why public?



[GitHub] [hadoop] snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
snvijaya commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r399030896
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsReadWriteAndSeek.java
 ##########
 @@ -21,6 +21,8 @@
 import java.util.Arrays;
 import java.util.Random;
 
+import org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream;
 
 Review comment:
   Did you try using profiling tools before and after the change?



[GitHub] [hadoop] bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances

Posted by GitBox <gi...@apache.org>.
bilaharith commented on a change in pull request #1907: HADOOP-16854 Fix to prevent OutOfMemoryException and Make the threadpool and bytebuffer pool common across all AbfsOutputStream instances
URL: https://github.com/apache/hadoop/pull/1907#discussion_r403501864
 
 

 ##########
 File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java
 ##########
 @@ -713,7 +731,7 @@ void setReadBufferSize(int bufferSize) {
   }
 
   @VisibleForTesting
-  void setWriteBufferSize(int bufferSize) {
+  public void setWriteBufferSize(int bufferSize) {
 
 Review comment:
   Changed
