Posted to dev@jackrabbit.apache.org by "Matt Ryan (JIRA)" <ji...@apache.org> on 2018/07/20 17:41:00 UTC

[jira] [Comment Edited] (JCR-4335) API for direct binary access

    [ https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550887#comment-16550887 ] 

Matt Ryan edited comment on JCR-4335 at 7/20/18 5:40 PM:
---------------------------------------------------------

{quote} - do we really need to parametrize sizes and number of parts? I understand that the implementation doing the upload needs this, but why does it appear in the API?{quote}
I think they are necessary.  There are a few reasons for stating the number of parts, but they mostly center on the potential impact of a very large list of URIs on, for example, a resulting JSON document.

Assume a JavaScript browser client is interacting with a web endpoint that, in turn, is invoking this API.  The JavaScript client wants to upload a binary directly, so it is requesting instructions on how to do that from the web endpoint.  The web endpoint would then call this API and obtain a {{BinaryUpload}} object that it then converts into a JSON document to return to the JavaScript client.  The JavaScript client or the web endpoint may have limitations on the size of the JSON document that it can support.
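
To make that flow concrete, here is a minimal sketch of the web endpoint side.  The {{initiateBinaryUpload(size, maxParts)}} signature, the {{BinaryUpload}} accessors and the package names are assumptions based on the current proposal (see OAK-7569), so treat them as placeholders rather than the final API:

{code:java}
// Sketch only: initiateBinaryUpload(size, maxParts) and the BinaryUpload
// accessors are assumptions based on the proposal under discussion.
import java.net.URI;
import java.util.Iterator;

import javax.jcr.RepositoryException;
import javax.jcr.Session;

import org.apache.jackrabbit.api.JackrabbitValueFactory;
import org.apache.jackrabbit.api.binary.BinaryUpload;

public class UploadInstructionsEndpoint {

    /**
     * Builds the JSON "upload instructions" document for the JavaScript client,
     * or returns null if a direct upload cannot be offered under these constraints.
     */
    public String uploadInstructions(Session session, long expectedSize, int maxParts)
            throws RepositoryException {
        JackrabbitValueFactory vf = (JackrabbitValueFactory) session.getValueFactory();
        BinaryUpload upload = vf.initiateBinaryUpload(expectedSize, maxParts);
        if (upload == null) {
            return null; // fall back to uploading through the repository
        }
        // Serialize the signed upload URIs into the JSON document returned to the client.
        StringBuilder json = new StringBuilder("{\"uploadURIs\":[");
        Iterator<URI> uris = upload.getUploadURIs().iterator();
        while (uris.hasNext()) {
            json.append('"').append(uris.next()).append('"');
            if (uris.hasNext()) {
                json.append(',');
            }
        }
        return json.append("]}").toString();
    }
}
{code}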

IIRC, S3 allows up to 10,000 upload parts in a multi-part upload.  Azure is even higher at 50,000 blocks per blob.  In my testing, I've seen signed URIs over 500 characters long.  If a client were unable to specify the number of parts, a list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a JSON document just for the list of URIs itself.  This may or may not be a problem; only the client would know whether accepting a document that large is problematic.

The expected size of the upload is also needed for similar reasons, based on the service provider's capabilities.  Some service providers require multi-part uploads for binaries above a certain size.  Some do not allow multi-part uploads of binaries smaller than a certain size.  Both Azure and S3 also limit the maximum size of a binary that can be uploaded.

If the implementation knows the expected upload size and the number of parts the client can accept, it can determine whether the upload can be performed directly or whether the client will need to upload it through the repository as has been done traditionally.  For example, if the client wants to upload a 300MB binary but does not support multi-part uploading, and the service provider requires multi-part uploading above 250MB, then the direct upload request will fail and the client cannot upload this binary directly to storage.  However, the Oak backend may be able to handle this upload without problems, so it could still be uploaded the traditional way.
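
As a rough illustration of that decision, here is a sketch with made-up provider limits (multi-part required above 250MB, parts of at most 100MB, no more than 10,000 parts per upload); the real numbers would come from the storage service in use:

{code:java}
// Sketch of the feasibility check inside the implementation; the limits below
// are illustrative assumptions, not the values of any particular provider.
public class DirectUploadFeasibility {

    private static final long MULTIPART_REQUIRED_ABOVE = 250L * 1024 * 1024; // 250MB
    private static final long MAX_PART_SIZE = 100L * 1024 * 1024;            // 100MB
    private static final int PROVIDER_MAX_PARTS = 10_000;

    /**
     * Decides whether a direct upload can be offered, given the expected size
     * announced by the client and the largest number of upload URIs it accepts.
     */
    public static boolean canUploadDirectly(long expectedSize, int clientMaxParts) {
        if (expectedSize <= MULTIPART_REQUIRED_ABOVE) {
            return true; // small enough for a single signed PUT
        }
        if (clientMaxParts <= 1) {
            return false; // provider requires multi-part at this size, client cannot do it
        }
        int parts = Math.min(clientMaxParts, PROVIDER_MAX_PARTS);
        // The binary must fit into the number of parts the client accepts,
        // with each part no larger than the provider's maximum part size.
        return expectedSize <= (long) parts * MAX_PART_SIZE;
    }
}
{code}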


> API for direct binary access
> ----------------------------
>
>                 Key: JCR-4335
>                 URL: https://issues.apache.org/jira/browse/JCR-4335
>             Project: Jackrabbit Content Repository
>          Issue Type: New Feature
>          Components: jackrabbit-api
>            Reporter: Marcel Reutegger
>            Assignee: Marcel Reutegger
>            Priority: Major
>         Attachments: JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the repository. One part of the proposal is to expose this new capability in the Jackrabbit API. For details see OAK-7569 and OAK-7589.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)