Posted to oak-issues@jackrabbit.apache.org by "Matt Ryan (JIRA)" <ji...@apache.org> on 2019/08/17 02:13:00 UTC

[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI

    [ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909545#comment-16909545 ] 

Matt Ryan commented on OAK-8552:
--------------------------------

The entry point for getting a direct download URI is the {{getURI()}} call on a {{Binary}} instance.

Known causes of network requests in this call:
 * Starting at [https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L96] - the call to {{getReference()}} goes through the blob implementation into {{DataStoreBlobStore#getReference()}}, which calls {{AbstractSharedCachingDataStore#getRecordIfStored()}}. If the blob is not cached, this results in a call to the backend's {{getRecord()}}.  {{AzureBlobStoreBackend}}, for example, currently makes two network calls here - one to check whether the blob exists, and another to get the blob metadata needed to construct the {{DataRecord}}.  (See [https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-blob-cloud-azure/src/main/java/org/apache/jackrabbit/oak/blob/cloud/azure/blobstorage/AzureBlobStoreBackend.java#L355].)  But all that is really needed in this case is the reference, which can be obtained from the backend directly using the blob ID - no network calls required.  Furthermore, the only reason we get the reference in the first place is to determine whether this blob is stored inline.  Maybe there is a better way to determine this.
 * Starting at [https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L107] - the call to {{getDownloadURI()}} eventually results in a call to the data store implementation's {{getDownloadURI()}} method.  In the case of {{AzureDataStore}}, this calls into the backend's {{createHttpDownloadURI()}} method, which (now, due to OAK-7998) checks that the binary exists - a network call - before creating the signed download URI.  Note that creating the download URI itself doesn't require a network call; only the check for the existence of the blob ID does.
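
To make the cost of the existence check in the second item concrete, here is a minimal sketch. It does not use the Oak or Azure APIs: {{CountingBackend}}, {{createDownloadUriChecked()}}, {{createDownloadUriUnchecked()}} and the {{example.invalid}} URI are all hypothetical names invented for illustration, and the "signature" is a placeholder rather than a real signed URI. It only shows that the checked path costs one simulated round trip per URI while the unchecked path costs none.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class DownloadUriSketch {

    /** Hypothetical stand-in for a cloud backend; counts simulated network round trips. */
    public static class CountingBackend {
        private final Set<String> storedBlobIds = new HashSet<>();
        private int networkCalls = 0;

        public void store(String blobId) {
            storedBlobIds.add(blobId);
        }

        public int networkCalls() {
            return networkCalls;
        }

        /** Simulates the remote existence check: one round trip per call. */
        public boolean exists(String blobId) {
            networkCalls++;
            return storedBlobIds.contains(blobId);
        }

        /** Purely local placeholder for signing, which needs only the blob ID and a key. */
        private URI sign(String blobId) {
            String fakeSignature = Integer.toHexString(blobId.hashCode());
            return URI.create("https://example.invalid/blobs/" + blobId + "?sig=" + fakeSignature);
        }

        /** Checks existence first, so every URI costs one round trip. */
        public URI createDownloadUriChecked(String blobId) {
            return exists(blobId) ? sign(blobId) : null;
        }

        /** Skips the check, so no round trip; may return a URI for a missing blob. */
        public URI createDownloadUriUnchecked(String blobId) {
            return sign(blobId);
        }
    }

    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend();
        for (int i = 0; i < 1000; i++) {
            backend.store("blob-" + i);
        }

        for (int i = 0; i < 1000; i++) {
            backend.createDownloadUriChecked("blob-" + i);
        }
        int checkedCalls = backend.networkCalls();
        System.out.println("checked: " + checkedCalls + " simulated network calls");

        for (int i = 0; i < 1000; i++) {
            backend.createDownloadUriUnchecked("blob-" + i);
        }
        int uncheckedCalls = backend.networkCalls() - checkedCalls;
        System.out.println("unchecked: " + uncheckedCalls + " simulated network calls");
    }
}
```

The trade-off visible in the sketch is that the unchecked variant will also hand out a URI for a blob ID that was never stored, so dropping the check moves that failure to download time.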

In a benchmark test I showed that creating 1000 download URIs took just over 40000 milliseconds, averaging around 40 milliseconds per request.  This result is actually not that bad - but removing the existence check and running the test again dropped the time to 147 milliseconds for all 1000 URIs (about 0.15 milliseconds each).  So nearly all of the time goes to the existence check, and if the network latency is bad this could become a real problem.
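
As a rough way to reason about how latency scales this, the following sketch models total time as (network calls per URI x round-trip time + local work) per URI. The model and the names {{LatencyModel}} and {{estimateTotalMillis()}} are assumptions made for illustration, not part of the benchmark; the only measured input is the 147 milliseconds of local work for 1000 URIs, and the round-trip values are hypothetical.

```java
public class LatencyModel {

    /**
     * Estimated total milliseconds to create the given number of URIs,
     * assuming each URI costs a fixed number of sequential round trips
     * plus a fixed amount of local work. A simplification, not a measurement.
     */
    public static double estimateTotalMillis(int uris, int networkCallsPerUri,
                                             double roundTripMillis, double localMillisPerUri) {
        return uris * (networkCallsPerUri * roundTripMillis + localMillisPerUri);
    }

    public static void main(String[] args) {
        // Local cost per URI taken from the run without the existence check.
        double localMillisPerUri = 147.0 / 1000;

        // Hypothetical round-trip times, from a nearby region to a poor link.
        double[] roundTrips = {10.0, 40.0, 200.0};
        for (double rtt : roundTrips) {
            double withCheck = estimateTotalMillis(1000, 1, rtt, localMillisPerUri);
            double withoutCheck = estimateTotalMillis(1000, 0, rtt, localMillisPerUri);
            System.out.printf("rtt=%.0f ms: with check %.0f ms, without check %.0f ms%n",
                    rtt, withCheck, withoutCheck);
        }
    }
}
```

Under this model the time without the check stays flat regardless of latency, while the time with the check grows linearly with the round-trip time.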

> Minimize network calls required when creating a direct download URI
> -------------------------------------------------------------------
>
>                 Key: OAK-8552
>                 URL: https://issues.apache.org/jira/browse/OAK-8552
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: blob-cloud, blob-cloud-azure
>            Reporter: Matt Ryan
>            Priority: Major
>
> We need to isolate and try to optimize network calls required to create a direct download URI.


