You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2020/05/22 19:43:55 UTC

[GitHub] [samza] cameronlee314 commented on a change in pull request #1364: AzureBlobSystemProducer: Enable adding of number of records in blob as metadata of the blob

cameronlee314 commented on a change in pull request #1364:
URL: https://github.com/apache/samza/pull/1364#discussion_r429424131



##########
File path: samza-azure/src/main/java/org/apache/samza/system/azureblob/avro/AzureBlobOutputStream.java
##########
@@ -230,6 +231,10 @@ public synchronized void releaseBuffer() throws IOException {
     }
   }
 
+  public synchronized void incrementNumberOfRecordsInBlob() {

Review comment:
       I suggest adding some javadocs here to clarify how this needs to be used (i.e. only for passing metadata for the blob when closing the blob).
   Since `DataFileWriter` may still be buffering some data, this `totalNumberOfRecordsInBlob` may not reflect the exact number of records in the output stream until flush/close are actually called. Also, it would be good to clarify that this needs to be atomically incremented along with appending the record to the `DataFileWriter` (or else the final record count may not be correct when closing the blob). It looks like you are doing the atomic update already in `AzureBlobAvroWriter`, but the general management of writers and output streams is a bit complex, so it would be good to add some more details.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org