Posted to commits@samza.apache.org by "ehoner (via GitHub)" <gi...@apache.org> on 2023/04/03 19:53:03 UTC

[GitHub] [samza] ehoner opened a new pull request, #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

ehoner opened a new pull request, #1662:
URL: https://github.com/apache/samza/pull/1662

   The existing implementation can cause GC performance issues and even OOM errors when the underlying buffer, `ByteArrayOutputStream`, is initialized to the maximum blob size (default is 10MB). This adds a new configuration parameter to control init size **and** sets the default to 32 (bytes). 32 is the default size for `ByteArrayOutputStream`, which will grow as needed. The [AzureBlobAvroWriter#L176](https://github.com/apache/samza/blob/03b187a6de0e123568f3ce3af94c946e6380fc8d/samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriter.java#L176) instance prevents "maximum blob size" from being exceeded, so the size does not need to be "guarded" by the `AzureBlobOutputStream`, although these responsibilities are not clearly separated in [SEP-26](https://cwiki.apache.org/confluence/display/SAMZA/SEP-26:+Azure+Blob+Storage+Producer). 
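The sketch below illustrates the premise of the change. A `ByteArrayOutputStream` started at 32 bytes (the capacity the no-arg constructor uses) grows on demand, so reserving the full 10485760-byte maximum up front is not needed for correctness. The class and method names are hypothetical and not part of Samza.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: compares a buffer pre-sized to the old 10 MiB initial
// capacity with one that starts at 32 bytes and grows as data is written.
public class BufferGrowthDemo {
  static final int INIT_BUFFER_SIZE = 32;
  static final int MAX_FLUSH_THRESHOLD = 10485760; // 10 MiB, the previous initial size

  /** Writes payloadBytes zero bytes into a buffer whose initial capacity is initSize. */
  static int writeAndMeasure(int initSize, int payloadBytes) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream(initSize);
    byte[] chunk = new byte[1024];
    int remaining = payloadBytes;
    while (remaining > 0) {
      int n = Math.min(chunk.length, remaining);
      buffer.write(chunk, 0, n); // the backing array is reallocated as needed
      remaining -= n;
    }
    return buffer.size();
  }

  public static void main(String[] args) {
    // Old behaviour: 10 MiB reserved even though only 100 bytes are written.
    System.out.println(writeAndMeasure(MAX_FLUSH_THRESHOLD, 100));
    // New default: start at 32 bytes and grow to fit 100,000 bytes.
    System.out.println(writeAndMeasure(INIT_BUFFER_SIZE, 100_000));
  }
}
```

Both calls hold the same data. The only difference is how much memory is reserved before any record arrives.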
   
   #### GC Discussion
   The focus here is on the G1 GC, the default GC in Java 11+, and humongous objects (G1 specific).[^1] The G1 GC introduced a new memory management strategy that divides the heap into regions, sized by `-XX:G1HeapRegionSize=n`. By default, the JVM targets ~2048 regions, with a region size that is a power of 2 between 1MB and 32MB. Any object that is half a region or larger is considered a humongous object. Humongous objects are allocated an entire region (or a run of contiguous regions), and any remaining space in those regions is unusable for the life of the humongous object.[^2] A JVM heap size of 31GB, `-Xmx31G`, will default to 16MB regions, which means each 10MB buffer requires an entire region **and** prevents the use of the remaining 6MB, regardless of how much data is in the buffer. This buffer size can also complicate memory allocation on `new`: G1 allocates humongous objects directly in Old Gen, bypassing Eden, and because G1 keeps a strict minimum space for Young Gen, the JVM can exit with an OOM if there are no empty regions.[^3] 
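The region arithmetic above can be checked with a small model. This is a simplified sketch, not G1's actual allocator: it assumes power-of-two region sizes, treats an object as humongous when it is at least half a region, and ignores the few bytes of array header. The 16MB region size is the figure cited above for `-Xmx31G`.

```java
// Simplified model of G1 humongous allocation, for illustration only.
public class HumongousMath {
  static final long MB = 1024L * 1024L;

  static boolean isHumongous(long objectBytes, long regionBytes) {
    return objectBytes >= regionBytes / 2;
  }

  /** Number of contiguous regions a humongous object occupies (ceiling division). */
  static long regionsNeeded(long objectBytes, long regionBytes) {
    return (objectBytes + regionBytes - 1) / regionBytes;
  }

  /** Tail space in the object's regions that stays unusable while it is live. */
  static long wastedBytes(long objectBytes, long regionBytes) {
    if (!isHumongous(objectBytes, regionBytes)) {
      return 0; // in this model, regular objects share regions and strand no tail
    }
    return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
  }

  public static void main(String[] args) {
    long region = 16 * MB; // region size cited above for -Xmx31G
    long buffer = 10 * MB; // 10485760-byte default maximum
    System.out.println(isHumongous(buffer, region));      // true
    System.out.println(wastedBytes(buffer, region) / MB); // 6
    System.out.println(isHumongous(32, region));          // false
  }
}
```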
   
   The significance of this issue is directly related to the number of buffers allocated. Systems allocating a large number of buffers are susceptible to this issue. Using the default size allows the JVM to allocate memory as needed and avoids designs that interfere with the GC architecture. For any users that encounter issues caused by buffer growth, the configuration parameter allows them to tune their system accordingly.
   
   
   [^1]: "[Garbage-First Garbage Collector: Humongous Objects](https://docs.oracle.com/en/java/javase/11/gctuning/garbage-first-g1-garbage-collector1.html#GUID-D74F3CC7-CC9F-45B5-B03D-510AEEAC2DAC)"
   [^2]: "[What’s the deal with humongous objects in Java?](https://devblogs.microsoft.com/java/whats-the-deal-with-humongous-objects-in-java/)"
   [^3]: "[Part 1: Introduction to the G1 Garbage Collector](https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector)"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@samza.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [samza] vmaheshw commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "vmaheshw (via GitHub)" <gi...@apache.org>.
vmaheshw commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1214663767


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriter.java:
##########
@@ -108,19 +109,32 @@ public class AzureBlobAvroWriter implements AzureBlobWriter {
   private final String blobURLPrefix;
   private final long maxBlobSize;
   private final long maxRecordsPerBlob;
+  private final int initBufferSize;
   private final boolean useRandomStringInBlobName;
   private final Object currentDataFileWriterLock = new Object();
   private volatile long recordsInCurrentBlob = 0;
   private BlobMetadataGeneratorFactory blobMetadataGeneratorFactory;
   private Config blobMetadataGeneratorConfig;
   private String streamName;
 
+  @Deprecated

Review Comment:
   Why are you deprecating this? If the user does not care about the size, it can always start with the default size.



##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriterFactory.java:
##########
@@ -35,13 +36,29 @@ public class AzureBlobAvroWriterFactory implements AzureBlobWriterFactory {
   /**
    * {@inheritDoc}
    */
+  @Deprecated

Review Comment:
   Similar concern with Deprecation here



##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";

Review Comment:
   Does Samza initialize their configs in this format? Please check the other examples and name it in a similar fashion.





[GitHub] [samza] mynameborat commented on pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "mynameborat (via GitHub)" <gi...@apache.org>.
mynameborat commented on PR #1662:
URL: https://github.com/apache/samza/pull/1662#issuecomment-1507275275

   @ehoner can you please follow the PR guidelines outlined in this SEP - https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines 




[GitHub] [samza] ehoner commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "ehoner (via GitHub)" <gi...@apache.org>.
ehoner commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1224450485


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";

Review Comment:
   I believe this is the correct way. I was following the guidelines from their "[Coding Guide](https://samza.apache.org/contribute/coding-guide.html)" _Configuration_. For reference, the rules I was applying:
   
   * All configuration names that define a byte size must end with .bytes (e.g. foo.bar.bytes=1000).
   * Configuration will always be defined as simple key/value pairs (e.g. a=b).
   * When configuration is related, it must be grouped using the same prefix (e.g. job.container.count=1, yarn.container.memory.bytes=1073741824).
   * All getter methods must be a camel case match with their configuration names (e.g. yarn.package.uri and getYarnPackageUri).
   * Reading configuration should only be done in factories and main methods. Don’t pass Config objects around.
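A minimal sketch of a getter that follows the rules above: the key ends in `.bytes`, the getter name is a camel-case match of the key, and an unset key falls back to the 32-byte default. This is a hypothetical stand-in for `AzureBlobConfig`, backed by a plain `Map`. The `"systems.%s.azureblob."` prefix and the getter name `getInitBufferSizeBytes` are assumptions, since neither value is shown in this thread.

```java
import java.util.Map;

// Hypothetical, simplified config holder; not the real Samza AzureBlobConfig.
public class InitBufferSizeConfig {
  // Assumed prefix format; the real SYSTEM_AZUREBLOB_PREFIX is not shown here.
  static final String SYSTEM_AZUREBLOB_PREFIX = "systems.%s.azureblob.";
  static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";
  // Matches the capacity of the no-arg java.io.ByteArrayOutputStream() constructor.
  static final int SYSTEM_INIT_BUFFER_SIZE_DEFAULT = 32;

  private final Map<String, String> props;

  public InitBufferSizeConfig(Map<String, String> props) {
    this.props = props;
  }

  /** Camel-case match of "initBufferSize.bytes"; returns the default when unset. */
  public int getInitBufferSizeBytes(String systemName) {
    String value = props.get(String.format(SYSTEM_INIT_BUFFER_SIZE, systemName));
    return value == null ? SYSTEM_INIT_BUFFER_SIZE_DEFAULT : Integer.parseInt(value);
  }
}
```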
   





[GitHub] [samza] ehoner commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "ehoner (via GitHub)" <gi...@apache.org>.
ehoner commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1227312731


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";
+  // re-use size for parameterless constructor java.io.ByteArrayOutputStream()
+  public static final int SYSTEM_INIT_BUFFER_SIZE_DEFAULT = 32;

Review Comment:
   @mynameborat Can you provide an example? The comment above the field identifies the source. tbc, `ByteArrayOutputStream` has two constructors, and the value here is the default when using the [parameterless constructor](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/ByteArrayOutputStream.html#%3Cinit%3E()). 
   
   Fwiw, I am in-part also avoiding "magic" values, where the implementation, `AzureBlobOutputStream`, would need to "understand" whether the value is provided/set, i.e., `-1` or `null`, and then select the "correct" constructor. This allows the config object to establish the "meaning"/intent of the value and create a "default" that is always usable by the implementation.
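A sketch contrasting the two designs described above, with hypothetical method names. With a sentinel, the stream code must branch on `-1` to choose a constructor. When the config layer always resolves a usable value (32 when unset), a single sized constructor call covers every case, because `new ByteArrayOutputStream(32)` starts with the same capacity as the no-arg constructor.

```java
import java.io.ByteArrayOutputStream;

// Illustrative contrast only; neither method is part of Samza.
public class BufferFactories {
  // Sentinel design: the implementation must know that a negative value means "unset".
  static ByteArrayOutputStream withSentinel(int initBufferSize) {
    if (initBufferSize < 0) {
      return new ByteArrayOutputStream(); // no-arg constructor, 32-byte initial capacity
    }
    return new ByteArrayOutputStream(initBufferSize);
  }

  // Config-default design: the caller always passes a resolved, positive size.
  static ByteArrayOutputStream withConfigDefault(int initBufferSize) {
    return new ByteArrayOutputStream(initBufferSize);
  }
}
```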





[GitHub] [samza] ehoner commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "ehoner (via GitHub)" <gi...@apache.org>.
ehoner commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1227313162


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriter.java:
##########
@@ -108,19 +109,32 @@ public class AzureBlobAvroWriter implements AzureBlobWriter {
   private final String blobURLPrefix;
   private final long maxBlobSize;
   private final long maxRecordsPerBlob;
+  private final int initBufferSize;
   private final boolean useRandomStringInBlobName;
   private final Object currentDataFileWriterLock = new Object();
   private volatile long recordsInCurrentBlob = 0;
   private BlobMetadataGeneratorFactory blobMetadataGeneratorFactory;
   private Config blobMetadataGeneratorConfig;
   private String streamName;
 
+  @Deprecated

Review Comment:
   It isn't used elsewhere in the samza codebase. It has been removed.





[GitHub] [samza] ehoner commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "ehoner (via GitHub)" <gi...@apache.org>.
ehoner commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1227313294


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriterFactory.java:
##########
@@ -35,13 +36,29 @@ public class AzureBlobAvroWriterFactory implements AzureBlobWriterFactory {
   /**
    * {@inheritDoc}
    */
+  @Deprecated

Review Comment:
   Also not used elsewhere in samza and removed.





[GitHub] [samza] ehoner commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "ehoner (via GitHub)" <gi...@apache.org>.
ehoner commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1232865188


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";
+  // re-use size for parameterless constructor java.io.ByteArrayOutputStream()
+  public static final int SYSTEM_INIT_BUFFER_SIZE_DEFAULT = 32;

Review Comment:
   @mynameborat I didn't see any examples of GC warnings or notes. I added instructions for when the value should be changed and what else (outside of samza) might need to be changed too.





[GitHub] [samza] mynameborat commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "mynameborat (via GitHub)" <gi...@apache.org>.
mynameborat commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1232923050


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";
+  // re-use size for parameterless constructor java.io.ByteArrayOutputStream()
+  public static final int SYSTEM_INIT_BUFFER_SIZE_DEFAULT = 32;

Review Comment:
   Sounds good.





[GitHub] [samza] mynameborat commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "mynameborat (via GitHub)" <gi...@apache.org>.
mynameborat commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1232620072


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";
+  // re-use size for parameterless constructor java.io.ByteArrayOutputStream()
+  public static final int SYSTEM_INIT_BUFFER_SIZE_DEFAULT = 32;

Review Comment:
   There is a configuration-table.html [1] in the code base where you can document the config you introduced above and explain the details.
   
   [1] https://github.com/apache/samza/blob/master/docs/learn/documentation/versioned/jobs/configuration-table.html
   



##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriter.java:
##########
@@ -108,19 +109,32 @@ public class AzureBlobAvroWriter implements AzureBlobWriter {
   private final String blobURLPrefix;
   private final long maxBlobSize;
   private final long maxRecordsPerBlob;
+  private final int initBufferSize;
   private final boolean useRandomStringInBlobName;
   private final Object currentDataFileWriterLock = new Object();
   private volatile long recordsInCurrentBlob = 0;
   private BlobMetadataGeneratorFactory blobMetadataGeneratorFactory;
   private Config blobMetadataGeneratorConfig;
   private String streamName;
 
+  @Deprecated

Review Comment:
   I thought you were going to remove this? Why keep this constructor around?





[GitHub] [samza] mynameborat merged pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "mynameborat (via GitHub)" <gi...@apache.org>.
mynameborat merged PR #1662:
URL: https://github.com/apache/samza/pull/1662




[GitHub] [samza] mynameborat commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "mynameborat (via GitHub)" <gi...@apache.org>.
mynameborat commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1227250936


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";
+  // re-use size for parameterless constructor java.io.ByteArrayOutputStream()
+  public static final int SYSTEM_INIT_BUFFER_SIZE_DEFAULT = 32;

Review Comment:
   Can you provide some context through comments here on why and how this value is chosen and what its impact is?
   E.g., what was the previous default provided through config or defaulted somehow (through code) that caused issues so that folks can be mindful about making changes to these defaults in future.



##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriter.java:
##########
@@ -108,19 +109,32 @@ public class AzureBlobAvroWriter implements AzureBlobWriter {
   private final String blobURLPrefix;
   private final long maxBlobSize;
   private final long maxRecordsPerBlob;
+  private final int initBufferSize;
   private final boolean useRandomStringInBlobName;
   private final Object currentDataFileWriterLock = new Object();
   private volatile long recordsInCurrentBlob = 0;
   private BlobMetadataGeneratorFactory blobMetadataGeneratorFactory;
   private Config blobMetadataGeneratorConfig;
   private String streamName;
 
+  @Deprecated

Review Comment:
   Is this constructor used somewhere else other than the factory? If not, I'd go ahead and remove this code since the factory handles the logic on passing in the buffer size.
   
   The contract for this class is clear that going forward buffer size needs to be provided.
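A sketch of the contract suggested above, using heavily simplified, hypothetical shapes rather than the real Samza classes or signatures. The writer has a single constructor that requires `initBufferSize`, and only the factory reads configuration and resolves defaults. The config keys are shortened placeholders, and the range check in the constructor is an assumption added for illustration.

```java
import java.util.Map;

// Hypothetical simplification; not AzureBlobAvroWriter or AzureBlobAvroWriterFactory.
public class WriterContractSketch {
  static final int INIT_BUFFER_SIZE_DEFAULT = 32;
  static final long MAX_BLOB_SIZE_DEFAULT = 10485760L;

  static final class Writer {
    private final long maxBlobSize;
    private final int initBufferSize;

    // No overload without initBufferSize: callers must state it explicitly.
    Writer(long maxBlobSize, int initBufferSize) {
      // Assumed validation rule, for illustration only.
      if (initBufferSize <= 0 || initBufferSize > maxBlobSize) {
        throw new IllegalArgumentException("initBufferSize out of range: " + initBufferSize);
      }
      this.maxBlobSize = maxBlobSize;
      this.initBufferSize = initBufferSize;
    }

    long maxBlobSize() { return maxBlobSize; }
    int initBufferSize() { return initBufferSize; }
  }

  static final class WriterFactory {
    // The factory resolves defaults, so the writer never sees an "unset" value.
    Writer create(Map<String, String> config) {
      long maxBlobSize = Long.parseLong(
          config.getOrDefault("maxBlobSize", String.valueOf(MAX_BLOB_SIZE_DEFAULT)));
      int initBufferSize = Integer.parseInt(
          config.getOrDefault("initBufferSize.bytes", String.valueOf(INIT_BUFFER_SIZE_DEFAULT)));
      return new Writer(maxBlobSize, initBufferSize);
    }
  }
}
```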





[GitHub] [samza] ehoner commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "ehoner (via GitHub)" <gi...@apache.org>.
ehoner commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1224450052


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobAvroWriter.java:
##########
@@ -108,19 +109,32 @@ public class AzureBlobAvroWriter implements AzureBlobWriter {
   private final String blobURLPrefix;
   private final long maxBlobSize;
   private final long maxRecordsPerBlob;
+  private final int initBufferSize;
   private final boolean useRandomStringInBlobName;
   private final Object currentDataFileWriterLock = new Object();
   private volatile long recordsInCurrentBlob = 0;
   private BlobMetadataGeneratorFactory blobMetadataGeneratorFactory;
   private Config blobMetadataGeneratorConfig;
   private String streamName;
 
+  @Deprecated

Review Comment:
   I was deprecating this because it wasn't clear to me that the class _should_ have multiple constructors. Marking it as deprecated also avoids the major version bump required for public API changes. I believe config values are not expected to be referenced throughout the code, but default values may be acceptable. 
   
   I will follow up with Samza committers.
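A minimal sketch of the deprecation approach described above, using a hypothetical simplified class rather than the real `AzureBlobAvroWriter`. The old constructor stays for compatibility and delegates to the new one with the 32-byte default, so existing callers keep compiling while new callers pass the size explicitly.

```java
// Hypothetical simplified writer; not the real AzureBlobAvroWriter.
public class DeprecatedOverloadSketch {
  static final int INIT_BUFFER_SIZE_DEFAULT = 32;

  private final long maxBlobSize;
  private final int initBufferSize;

  /** @deprecated pass initBufferSize explicitly via the two-argument constructor. */
  @Deprecated
  public DeprecatedOverloadSketch(long maxBlobSize) {
    this(maxBlobSize, INIT_BUFFER_SIZE_DEFAULT); // old signature keeps working
  }

  public DeprecatedOverloadSketch(long maxBlobSize, int initBufferSize) {
    this.maxBlobSize = maxBlobSize;
    this.initBufferSize = initBufferSize;
  }

  public long getMaxBlobSize() { return maxBlobSize; }
  public int getInitBufferSize() { return initBufferSize; }
}
```

Elsewhere in this thread the author reports that these deprecated members were removed rather than kept, since nothing else in the Samza codebase used them.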





[GitHub] [samza] ehoner commented on a diff in pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "ehoner (via GitHub)" <gi...@apache.org>.
ehoner commented on code in PR #1662:
URL: https://github.com/apache/samza/pull/1662#discussion_r1232858832


##########
samza-azure/src/main/java/org/apache/samza/system/azureblob/AzureBlobConfig.java:
##########
@@ -80,6 +80,12 @@ public class AzureBlobConfig extends MapConfig {
   public static final String SYSTEM_MAX_FLUSH_THRESHOLD_SIZE = SYSTEM_AZUREBLOB_PREFIX + "maxFlushThresholdSize";
   private static final int SYSTEM_MAX_FLUSH_THRESHOLD_SIZE_DEFAULT = 10485760;
 
+  // initialization size of in-memory OutputStream
+  // This value should be between SYSTEM_INIT_BUFFER_SIZE_DEFAULT and getMaxFlushThresholdSize() exclusive.
+  public static final String SYSTEM_INIT_BUFFER_SIZE = SYSTEM_AZUREBLOB_PREFIX + "initBufferSize.bytes";
+  // re-use size for parameterless constructor java.io.ByteArrayOutputStream()
+  public static final int SYSTEM_INIT_BUFFER_SIZE_DEFAULT = 32;

Review Comment:
   @mynameborat ty for pointing to the file. 





[GitHub] [samza] mynameborat commented on pull request #1662: SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable.

Posted by "mynameborat (via GitHub)" <gi...@apache.org>.
mynameborat commented on PR #1662:
URL: https://github.com/apache/samza/pull/1662#issuecomment-1507271241

   @PawasChhokra can you review this change? 

