Posted to notifications@jclouds.apache.org by "Andrew Gaul (Jira)" <ji...@apache.org> on 2019/10/29 05:36:00 UTC
[jira] [Updated] (JCLOUDS-1521) Automatic computation of content length for input streams
[ https://issues.apache.org/jira/browse/JCLOUDS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Gaul updated JCLOUDS-1521:
---------------------------------
Component/s: jclouds-blobstore
> Automatic computation of content length for input streams
> ----------------------------------------------------------
>
> Key: JCLOUDS-1521
> URL: https://issues.apache.org/jira/browse/JCLOUDS-1521
> Project: jclouds
> Issue Type: Improvement
> Components: jclouds-blobstore
> Affects Versions: 2.1.1
> Reporter: Alexander Tsvetkov
> Priority: Major
>
> I have a REST API that allows uploads of potentially large files (up to 4 GB). Because of their size, I cannot load these files into memory, as that could quickly crash my application. I also don't want to store them as temporary files, since that could fill up my disk if many people upload at the same time.
> Instead, I want to process the incoming files as InputStreams and forward them to the S3 object store. I understand that this is not possible directly, since S3 requires the content length to be known before the upload. However, I saw on StackOverflow ([https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header]) that it's possible to work around this problem by reading the InputStream into memory in chunks of 5 MB (or more) and uploading these chunks via the S3 multipart upload API. As a result, I assume that I'll be able to upload a 4 GB file while keeping no more than 5 MB of its content in memory at any given time.
> I tried to do so with JClouds (version 2.1.1), but I've hit a problem. I have the following code:
> {code:java}
> Blob blob = blobStore.blobBuilder(name)
>     .payload(inputStream)
>     ...
>     .build();
> blobStore.putBlob(container, blob, PutOptions.Builder.multipart());
> {code}
> If I run it like this, I get a NullPointerException, because I didn't specify the content's length:
> {code:java}
> java.lang.NullPointerException: while trying to invoke the method java.lang.Long.longValue() of a null object returned from org.jclouds.io.MutableContentMetadata.getContentLength()
> at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356)
> at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:347)
> at org.jclouds.aws.s3.blobstore.AWSS3BlobStore.putBlob(AWSS3BlobStore.java:79)
> {code}
> I think it would be possible for JClouds to compute the size of the InputStream dynamically:
> # Slice the stream into chunks of X MB and hold one chunk in memory at a time (where X has a default value but is also configurable).
> # Upload the chunks sequentially - the content length header of each part can be set to the chunk's size (X MB, or less for the final chunk).
> # Finalize the multipart upload.
> That way, no more than X MB will be stored in memory for any given upload.
> Would you accept a pull request for this?
> PS: I've set the priority to blocker because, given the memory and disk space concerns listed above, we really can't use JClouds for our uploads right now.
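The chunking step proposed in the description can be sketched with the JDK alone. This is only an illustration of the idea, not jclouds code: the class `ChunkedStreamSketch`, the method `forEachChunk`, and the callback `PartSink` are hypothetical names, and the callback merely stands in for whatever call would upload one part. The sketch reuses a single buffer, so at most one chunk is held in memory, and the length of every part is known before it is handed over.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedStreamSketch {

    /** Hypothetical callback standing in for "upload one part of known length". */
    public interface PartSink {
        void accept(int partNumber, byte[] buffer, int length) throws IOException;
    }

    /**
     * Reads the stream in chunks of at most chunkSize bytes and passes each
     * chunk to the sink. The same buffer is reused for every chunk, so the
     * sink must consume it before returning. Returns the number of parts.
     */
    public static int forEachChunk(InputStream in, int chunkSize, PartSink sink)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int partNumber = 0;
        while (true) {
            // Fill the buffer completely unless the stream ends first;
            // a single read() call may return fewer bytes than requested.
            int filled = 0;
            while (filled < chunkSize) {
                int n = in.read(buffer, filled, chunkSize - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
            if (filled == 0) {
                break; // nothing left to send
            }
            partNumber++;
            sink.accept(partNumber, buffer, filled);
            if (filled < chunkSize) {
                break; // a short chunk means the stream is exhausted
            }
        }
        return partNumber;
    }

    public static void main(String[] args) throws IOException {
        // 12 bytes split into chunks of 5 bytes: parts of 5, 5 and 2 bytes.
        InputStream in = new ByteArrayInputStream(new byte[12]);
        int parts = forEachChunk(in, 5, (partNumber, buffer, length) ->
                System.out.println("part " + partNumber + ": " + length + " bytes"));
        System.out.println("total parts: " + parts);
    }
}
```

In a real upload the sink would send each part through the store's multipart API and the caller would complete the upload afterwards. An empty stream yields zero parts here, which a real implementation would need to handle separately.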
--
This message was sent by Atlassian Jira
(v8.3.4#803005)