Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/09/25 22:57:06 UTC

[GitHub] [nifi] turcsanyip commented on a change in pull request #4556: NIFI-7830: Support large files in PutAzureDataLakeStorage

turcsanyip commented on a change in pull request #4556:
URL: https://github.com/apache/nifi/pull/4556#discussion_r495299416



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -120,11 +122,29 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
 
                 final long length = flowFile.getSize();
                 if (length > 0) {
-                    try (final InputStream rawIn = session.read(flowFile); final BufferedInputStream in = new BufferedInputStream(rawIn)) {
-                        fileClient.append(in, 0, length);
+                    long chunkStart = 0;
+                    long chunkSize;
+
+                    try (final InputStream rawIn = session.read(flowFile);
+                         final BufferedInputStream in = new BufferedInputStream(rawIn) {
+                             @Override
+                             public int available() {
+                                 // com.azure.storage.common.Utility.convertStreamToByteBuffer() throws an exception
+                                 // if there are more available bytes in the stream after reading the chunk
+                                 return 0;

Review comment:
       @MuazmaZ Do you happen to know why `Utility.convertStreamToByteBuffer()` throws an exception when `available() > 0`?
   https://github.com/Azure/azure-sdk-for-java/blob/0345889402425191b7003e73b7b3d6ea3c0a5175/sdk/storage/azure-storage-common/src/main/java/com/azure/storage/common/Utility.java#L268
   
   Due to this, it is not possible to process a longer input stream in portions/chunks.
   As a workaround, I added a fake `available()` override that pretends there is no more data in the input stream. It is not really nice, but it works (see the sketch below).
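   For reference, a minimal sketch of the chunked append loop that this workaround enables (`appendInChunks` and `MAX_CHUNK_SIZE` are illustrative names, not the exact PR code):
   ```java
   import java.io.InputStream;
   import com.azure.storage.file.datalake.DataLakeFileClient;

   // Hypothetical helper showing the chunked upload pattern. It assumes the
   // stream's available() reports 0 (as in the override above); otherwise
   // Utility.convertStreamToByteBuffer() rejects every chunk but the last.
   static void appendInChunks(final DataLakeFileClient fileClient, final InputStream in, final long length) {
       final long MAX_CHUNK_SIZE = 100L * 1024 * 1024; // assumed chunk size, e.g. 100 MB
       long chunkStart = 0;
       while (chunkStart < length) {
           final long chunkSize = Math.min(MAX_CHUNK_SIZE, length - chunkStart);
           // append() reads exactly chunkSize bytes and stores them at offset chunkStart
           fileClient.append(in, chunkStart, chunkSize);
           chunkStart += chunkSize;
       }
       // commit all appended data at the final file length
       fileClient.flush(length);
   }
   ```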
   Another option would be to read the chunks in a loop into a byte array on our side and pass a stream wrapping that byte array to the Azure client lib (sketched below). But I would rather avoid the extra copy and the extra memory for the buffer.
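   For comparison, a sketch of that copy-based alternative (`appendViaByteArray` and the buffer size are assumptions):
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import com.azure.storage.file.datalake.DataLakeFileClient;

   // Hypothetical sketch of the alternative: copy each chunk into a byte array
   // and hand the client a bounded ByteArrayInputStream, so available() behaves
   // as the library expects. The cost is an extra copy per chunk.
   static void appendViaByteArray(final DataLakeFileClient fileClient, final InputStream in, final long length) throws IOException {
       final byte[] buffer = new byte[8 * 1024 * 1024]; // assumed buffer size
       long position = 0;
       int bytesRead;
       while ((bytesRead = in.read(buffer)) != -1) {
           // read() may return fewer bytes than the buffer size;
           // each append uploads exactly the bytes just read
           fileClient.append(new ByteArrayInputStream(buffer, 0, bytesRead), position, bytesRead);
           position += bytesRead;
       }
       fileClient.flush(length);
   }
   ```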
   



