Posted to oak-dev@jackrabbit.apache.org by Matt Ryan <os...@mvryan.org> on 2018/06/20 23:21:39 UTC

Oak Direct Binary Access pull request

Hi,

A pull request [0] has been submitted containing a proposal for a Direct
Binary Access feature in Oak.  The proposed feature is described at [1].
In a nutshell, it outlines a mechanism by which direct access to binary
data in a cloud-based Oak data store can be made available via signed URLs
with short TTLs.  Such a capability would have a significant positive
impact on Oak scalability.
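
To make the idea concrete, here is a minimal Java sketch of a signed URL
with an expiry.  It is not the signing scheme of any cloud provider or of
the pull request; the class name, the "expires" and "sig" query parameters
and the HMAC-SHA256 construction are assumptions chosen for illustration.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

/** Illustrative only: real cloud providers use their own signing schemes. */
final class SignedUrlSketch {

    /** Returns baseUrl plus an expiry timestamp and an HMAC over "baseUrl|expires". */
    static String sign(String baseUrl, long expiresEpochSeconds, byte[] secret) {
        return baseUrl + "?expires=" + expiresEpochSeconds
                + "&sig=" + hmac(baseUrl + "|" + expiresEpochSeconds, secret);
    }

    /** Accepts the URL only if the TTL has not elapsed and the signature matches. */
    static boolean isValid(String baseUrl, long expires, String sig,
                           byte[] secret, long nowEpochSeconds) {
        if (nowEpochSeconds > expires) {
            return false; // TTL elapsed
        }
        byte[] expected = hmac(baseUrl + "|" + expires, secret)
                .getBytes(StandardCharsets.US_ASCII);
        // Constant-time comparison of the expected and presented signatures.
        return MessageDigest.isEqual(expected, sig.getBytes(StandardCharsets.US_ASCII));
    }

    private static String hmac(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the expiry is part of the signed data, a client cannot extend the
TTL without invalidating the signature.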

I’m emailing to request review and discussion based on the proposal.  As
acknowledged in the wiki, there is some similarity to discussions we’ve had
in the past ([2], [3], [4]) but the approach in this proposal is slightly
different.


[0] - https://github.com/apache/jackrabbit-oak/pull/88
[1] - https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access
[2] - https://issues.apache.org/jira/browse/OAK-6575
[3] - https://markmail.org/thread/7eiwvkuv3ybv2vyz
[4] - https://markmail.org/thread/zh6zxdxytnyonqms


Regards,

-MR

Re: Oak Direct Binary Access pull request

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Matt,

On Wed, Jun 27, 2018 at 6:11 PM Matt Ryan <os...@mvryan.org> wrote:
> ...Of course Sling could take the return value from the call to initiate the
> upload and turn it into a JSON document that the client can then consume.
> As you say the client will need to have some knowledge of the new API to do
> this...

Ok, agreed, it looks like we'll need to make Sling clients aware of
various upload mechanisms if we want to go that route.

Thanks for the clarifications (and thanks Julian for the note on
relative links, makes sense!).

-Bertrand

Re: Oak Direct Binary Access pull request

Posted by Julian Reschke <ju...@gmx.de>.

On 2018-06-27 18:10, Matt Ryan wrote:
> Hi Bertrand,
> 
> On June 27, 2018 at 4:33:05 AM, Bertrand Delacretaz (bdelacretaz@apache.org)
> wrote:
> 
> Hi Matt,
> 
> From the Sling clients perspective I'm interested in making this
> somewhat transparent, maybe something like:
> 
> For downloads, a client requests
> http://my.sling.instance/somebinary.jpg and is redirected to
> https://somecloudprovider/23874623748623746234782634273846237846723864.jpg
> 
> For uploads, it's a bit more complicated - maybe the client POSTing to
> Sling receives a 307 status with a JSON document that describes
> where/how to upload. In this case the client requires some knowledge
> of this new API, unless someone has a better idea.
> 
> Do you see any obstacles in implementing something like this on top of
> your suggested API?
> 
> 
> It seems to me the download case should work as you’ve described.  Sling
> could ask for a download URL, and if it gets one Sling can send a redirect
> to that URL; if not, Sling can then issue the request as is currently done
> today.
> ...

Keep in mind that redirecting to a cloud URI is likely to break relative 
references contained in a document....
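
A small Java illustration of this concern, using only java.net.URI.  The
document path "docs/page.html", the relative reference "style.css" and the
shortened cloud file name are made up for the example.

```java
import java.net.URI;

final class RelativeLinkSketch {

    /** Resolves a relative reference against the URI the document was retrieved from. */
    static URI resolve(String documentUri, String relativeRef) {
        return URI.create(documentUri).resolve(relativeRef);
    }

    public static void main(String[] args) {
        // Served by Sling itself: the sibling resource resolves on the Sling host.
        System.out.println(resolve("http://my.sling.instance/docs/page.html", "style.css"));
        // After a redirect the client resolves against the cloud URI instead.
        System.out.println(resolve("https://somecloudprovider/2387462374.html", "style.css"));
    }
}
```

The first call yields a URI under my.sling.instance, the second one a URI
under somecloudprovider, where the referenced resource most likely does not
exist.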

Best regards, Julian

Re: Oak Direct Binary Access pull request

Posted by Matt Ryan <os...@mvryan.org>.

Hi Bertrand,

On June 27, 2018 at 4:33:05 AM, Bertrand Delacretaz (bdelacretaz@apache.org)
wrote:

> Hi Matt,
>
> From the Sling clients perspective I'm interested in making this
> somewhat transparent, maybe something like:
>
> For downloads, a client requests
> http://my.sling.instance/somebinary.jpg and is redirected to
> https://somecloudprovider/23874623748623746234782634273846237846723864.jpg
>
> For uploads, it's a bit more complicated - maybe the client POSTing to
> Sling receives a 307 status with a JSON document that describes
> where/how to upload. In this case the client requires some knowledge
> of this new API, unless someone has a better idea.
>
> Do you see any obstacles in implementing something like this on top of
> your suggested API?


It seems to me the download case should work as you’ve described.  Sling
could ask for a download URL, and if it gets one Sling can send a redirect
to that URL; if not, Sling can then issue the request as is currently done
today.
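
The download flow described here could look roughly like the following
Java sketch.  Everything in it is hypothetical: the Response type, the
urlLookup function standing in for the call that returns a download URL,
and the choice of status 302 for the redirect are assumptions, not part of
the proposed API.

```java
import java.net.URI;
import java.util.function.Function;

final class DownloadRedirectSketch {

    /** Outcome of handling a GET: a redirect target, or a marker to stream as before. */
    static final class Response {
        final int status;
        final URI location; // null when the server streams the binary itself

        Response(int status, URI location) {
            this.status = status;
            this.location = location;
        }
    }

    /**
     * urlLookup stands in for whatever call returns a direct download URI;
     * it returns null when direct access is not available for the binary.
     */
    static Response handleGet(String path, Function<String, URI> urlLookup) {
        URI direct = urlLookup.apply(path);
        if (direct != null) {
            return new Response(302, direct); // send the client to the signed URI
        }
        return new Response(200, null);       // fall back to streaming through the server
    }
}
```

The fallback branch keeps existing deployments working unchanged when the
data store cannot hand out URLs.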

Upload is more complicated because of multi-part uploads.  For example,
Azure requires that a multi-part upload be performed for any binary larger
than 256MB [0].  Both Azure and AWS require multi-part uploads to be done
using a distinct URL for each part (instead of allowing the reuse of the
same URL with Content-Range like Google does [1]).  Thus the new Oak API
needs to support multi-part uploading via distinct URLs.  I’m not sure how
Sling would manage to hide that away from a client via a redirect when
there are potentially multiple URLs involved, without creating a stateful
session or something like that.

Of course Sling could take the return value from the call to initiate the
upload and turn it into a JSON document that the client can then consume.
As you say the client will need to have some knowledge of the new API to do
this.
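
As a rough sketch of what such a JSON document might carry, the Java below
splits an upload into parts, each with its own URI, and renders the result
as JSON.  The field names, the "?part=N" URI suffix and the class itself
are invented for illustration; in a real system each part URI would be
issued and signed by the cloud provider.

```java
import java.util.ArrayList;
import java.util.List;

final class MultipartPlanSketch {

    /** One part of a multi-part upload: its own URI plus the byte range it covers. */
    static final class Part {
        final String uri;
        final long offset;
        final long length;

        Part(String uri, long offset, long length) {
            this.uri = uri;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits size bytes into parts of at most maxPartSize, one hypothetical URI per part. */
    static List<Part> plan(String baseUri, long size, long maxPartSize) {
        if (size <= 0 || maxPartSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Part> parts = new ArrayList<>();
        int n = 1;
        for (long offset = 0; offset < size; offset += maxPartSize, n++) {
            long length = Math.min(maxPartSize, size - offset);
            parts.add(new Part(baseUri + "?part=" + n, offset, length));
        }
        return parts;
    }

    /** Renders the plan as JSON; assumes the URIs contain no characters needing escapes. */
    static String toJson(List<Part> parts) {
        StringBuilder sb = new StringBuilder("{\"parts\":[");
        for (int i = 0; i < parts.size(); i++) {
            Part p = parts.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"uri\":\"").append(p.uri)
              .append("\",\"offset\":").append(p.offset)
              .append(",\"length\":").append(p.length).append('}');
        }
        return sb.append("]}").toString();
    }
}
```

With a 600 MB binary and a 256 MB part limit this yields three parts of
256, 256 and 88 MB, which the client would upload to three distinct URIs.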


[0] -
https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob#remarks
[1] -
https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload


-MR

Re: Oak Direct Binary Access pull request

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Matt,

On Thu, Jun 21, 2018 at 6:25 AM Julian Reschke <ju...@gmx.de> wrote:
> ...it would be helpful if you could link to example client code taking
> advantage of this extension...

+1, ideally as readable test code that one can use as examples.

From the Sling clients perspective I'm interested in making this
somewhat transparent, maybe something like:

For downloads, a client requests
http://my.sling.instance/somebinary.jpg and is redirected to
https://somecloudprovider/23874623748623746234782634273846237846723864.jpg

For uploads, it's a bit more complicated - maybe the client POSTing to
Sling receives a 307 status with a JSON document that describes
where/how to upload. In this case the client requires some knowledge
of this new API, unless someone has a better idea.

Do you see any obstacles in implementing something like this on top of
your suggested API?

-Bertrand

Re: Oak Direct Binary Access pull request

Posted by Matt Ryan <os...@mvryan.org>.

On June 20, 2018 at 10:25:20 PM, Julian Reschke (julian.reschke@gmx.de)
wrote:

> On 2018-06-21 01:21, Matt Ryan wrote:
>> Hi,
>>
>> A pull request [0] has been submitted containing a proposal for a Direct
>> Binary Access feature in Oak.
>>
>> ...
>>
>> Regards,
>>
>> -MR
>
> Hi Matt,
>
> it would be helpful if you could link to example client code taking
> advantage of this extension.
>
> Best regards, Julian

Sure Julian.  There are some integration tests at [1].  Are you looking for
something more than that or does that address your question?

[1] -
https://github.com/mattvryan/jackrabbit-oak/blob/f46f5802e3dc48e1e3c26e2a5f89cbf3abe0ed8a/oak-jcr/src/test/java/org/apache/jackrabbit/oak/jcr/binary/HttpBinaryIT.java

-MR

Re: Oak Direct Binary Access pull request

Posted by Julian Reschke <ju...@gmx.de>.

On 2018-06-21 01:21, Matt Ryan wrote:
> Hi,
> 
> A pull request [0] has been submitted containing a proposal for a Direct
> Binary Access feature in Oak.  The proposed feature is described at [1].
> In a nutshell, it outlines a mechanism by which direct access to binary
> data in a cloud-based Oak data store can be made available via signed URLs
> with short TTLs.  Such a capability would have a significant positive
> impact on Oak scalability.
> 
> I’m emailing to request review and discussion based on the proposal.  As
> acknowledged in the wiki, there is some similarity to discussions we’ve had
> in the past ([2], [3], [4]) but the approach in this proposal is slightly
> different.
> 
> 
> [0] - https://github.com/apache/jackrabbit-oak/pull/88
> [1] - https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access
> [2] - https://issues.apache.org/jira/browse/OAK-6575
> [3] - https://markmail.org/thread/7eiwvkuv3ybv2vyz
> [4] - https://markmail.org/thread/zh6zxdxytnyonqms
> 
> 
> Regards,
> 
> -MR

Hi Matt,

it would be helpful if you could link to example client code taking 
advantage of this extension.

Best regards, Julian

Re: Oak Direct Binary Access pull request

Posted by Matt Ryan <os...@mvryan.org>.

Hi,

A JIRA issue has been created:
https://issues.apache.org/jira/browse/OAK-7569

At Marcel’s suggestion I have created subtasks for each of the points where
discussions may occur, and will add more as needed.  Feel free to add your
own if you have an item that you think merits further discussion than a
quick resolution on-list.

-MR


On June 20, 2018 at 5:21:39 PM, Matt Ryan (oss@mvryan.org) wrote:
> ...

Re: Oak Direct Binary Access pull request

Posted by Alexander Klimetschek <ak...@adobe.com.INVALID>.

> On 26.06.2018, at 00:28, Marcel Reutegger <mr...@adobe.com.INVALID> wrote:
> On 21.06.18 23:11, Alexander Klimetschek wrote:
>> The design of Oak is explicitly that the NodeStore controls binaries,
>> and use of a BlobStore is only optional. Root and Tree only see the
>> NodeStore, not the BlobStore. The SegmentNodeStore even makes the
>> choice to inline binaries under 16 KB for example. For these, the new
>> HTTP access is not possible and getHttpDownloadURL() must return null
>> in that case. This logic can only happen in the implementation of the
>> NodeStores.
> 
> The blob store can do the same and return null in that case.
> 
>> How would a client access it? How would the permission check be implemented?
> 
> It could work the same way as done in the current PR.

See my comment touching on this in OAK-7570 [1].

While the above point might be moot (now) due to the use of ReferenceBinary in SessionImpl.getHttpDownloadURL() (assuming this is only present for binaries from a BlobStore), there is still the question of how SessionImpl would get hold of a BlobStore without breaking separation of concerns.

As mentioned before, this is simply a new feature on all layers, so a clean API approach requires this on all layers. Note that almost all changes do not interact with existing code.

[1] https://issues.apache.org/jira/browse/OAK-7570?focusedCommentId=16524255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16524255

Cheers,
Alex

Re: Oak Direct Binary Access pull request

Posted by Marcel Reutegger <mr...@adobe.com.INVALID>.

Hi,

On 21.06.18 23:11, Alexander Klimetschek wrote:
> The design of Oak is explicitly that the NodeStore controls binaries,
> and use of a BlobStore is only optional. Root and Tree only see the
> NodeStore, not the BlobStore. The SegmentNodeStore even makes the
> choice to inline binaries under 16 KB for example. For these, the new
> HTTP access is not possible and getHttpDownloadURL() must return null
> in that case. This logic can only happen in the implementation of the
> NodeStores.

The blob store can do the same and return null in that case.
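
Whichever layer makes the decision, the contract seen by the caller is the
same: null means "no direct URL for this binary".  A minimal Java sketch of
that contract, assuming a hypothetical downloadUriFor method and using the
16 KB inlining threshold mentioned in the quoted text:

```java
import java.net.URI;

final class InlineBinarySketch {

    /** Taken from the "under 16 KB" remark in the quoted text. */
    static final long INLINE_THRESHOLD_BYTES = 16 * 1024;

    /** Returns a direct download URI, or null when the binary is stored inline. */
    static URI downloadUriFor(String blobId, long length) {
        if (length < INLINE_THRESHOLD_BYTES) {
            return null; // inlined binary: there is no cloud object to point at
        }
        // Placeholder URI; a real implementation would return a signed one.
        return URI.create("https://blobs.example.com/" + blobId);
    }
}
```

Callers therefore need a null check and a fallback to the regular binary
stream.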

> How would a client access it? How would the permission check be 
> implemented?

It could work the same way as done in the current PR.

Regards
  Marcel

Re: Oak Direct Binary Access pull request

Posted by Alexander Klimetschek <ak...@adobe.com.INVALID>.

FYI, there are now some more diagrams on the wiki [1] that hopefully help to get an overview before digging into the PR.

[1] https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access

Cheers,
Alex

Re: Oak Direct Binary Access pull request

Posted by Alexander Klimetschek <ak...@adobe.com.INVALID>.

On 21.06.2018, at 05:53, Marcel Reutegger <mr...@adobe.com.INVALID> wrote:
> As mentioned in an offline conversation with you already, I'm a bit concerned about the impact this optional feature has on nearly all layers of Oak.

Yes, that is the case, but it's just because Oak has so many layers, so if you want to add a new proper API, it means adding it in a few places :)

> SessionImpl implements HttpBinaryProvider, MutableRoot implements HttpBlobProvider, SegmentNodeStore implements HttpBlobProvider, DocumentNodeStore implements HttpBlobProvider. E.g. the last two just pass through calls they are not concerned with.

The design of Oak is explicitly that the NodeStore controls binaries, and use of a BlobStore is only optional. Root and Tree only see the NodeStore, not the BlobStore. The SegmentNodeStore even makes the choice to inline binaries under 16 KB for example. For these, the new HTTP access is not possible and getHttpDownloadURL() must return null in that case. This logic can only happen in the implementation of the NodeStores.

Note that the API changes are designed to be fully backwards compatible through the use of new extension interfaces. I.e. Session might implement HttpBinaryProvider (and Oak's SessionImpl does in the patch) and clients have to do an instanceof check. Same as with JackrabbitSession and co. In no place did we add a method to an existing API interface.
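
The instanceof pattern can be sketched as follows. The Session and HttpBinaryProvider types below are simplified stand-ins so that the example compiles on its own; the String parameter and URI return type are assumptions, and the real interfaces in the patch may differ.

```java
import java.net.URI;

final class ExtensionInterfaceSketch {

    /** Stand-in for javax.jcr.Session so the sketch compiles without JCR on the classpath. */
    interface Session { }

    /** Stand-in for the proposed extension interface; the real signature may differ. */
    interface HttpBinaryProvider {
        URI getHttpDownloadURL(String binaryPath);
    }

    /** A session without the extension, like any pre-existing implementation. */
    static final class PlainSession implements Session { }

    /** A session that also implements the extension. */
    static final class DirectSession implements Session, HttpBinaryProvider {
        @Override
        public URI getHttpDownloadURL(String binaryPath) {
            return URI.create("https://blobs.example.com" + binaryPath); // placeholder URI
        }
    }

    /** Returns a direct URI if the session supports the extension, otherwise null. */
    static URI tryDirectDownload(Session session, String binaryPath) {
        if (session instanceof HttpBinaryProvider) {
            return ((HttpBinaryProvider) session).getHttpDownloadURL(binaryPath);
        }
        return null; // no extension available: read the binary through the JCR API as before
    }
}
```

Existing implementations and clients keep compiling because nothing is added to the Session stand-in itself.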

The reasons we added it as JCR API extension are:

- existing client code using the JCR API does not need to change to an Oak API or have knowledge of the datastore; it only needs to import the new extension interface (package o.a.j.oak.jcr.api.binary in oak-jcr; alternatively we could move it to jackrabbit-api, see below as well)
- client code can access the feature from the Session object
- permission check for upload (IMO critical) can only happen inside the SessionImpl [1]
- converting Blob to Binary and vice versa is impossible outside of oak-jcr (unless we are missing something…)
- NodeStore & Blob semantics as outlined above
- no need for reflection tricks (as in the CDN feature discussed last year and moved out of Oak in the end)

> Alternatively, could you do the required plumbing at construction time? That is, if the BlobStore implements HttpBlobProvider register it with that interface as well and use it to construct the repository. Something like:
> 
> BlobStore bs = ...
> NodeStore ns = ...
> Jcr jcr = new Jcr(ns)
> if (bs instanceof HttpBlobProvider)
>    jcr.with((HttpBlobProvider) bs)
> Repository r = jcr.createRepository()
> 
> By default, the Jcr factory would have a HttpBlobProvider implementation that doesn't support the feature, which also relieves the repository implementation from checking the type or for null on every call to the new feature (as is the case in SessionImpl, MutableRoot, DocumentNodeStore, SegmentNodeStore).

How would a client access it? How would the permission check be implemented?

> I would also prefer if the API used by the client is moved to a separate module that can be released independently. Yes, we don't do this right now in Oak, but this may be a good opportunity to try this again. Releasing the API independently with a stable version lowers the barrier for consumers to adopt it.

There are 3 new API/SPI additions:

1) oak-jcr: client facing: HttpBinaryProvider
2) oak-api: shared for Oak API, NodeStore and BlobStore SPIs: HttpBlobProvider
3) oak-blob-plugins: for Jackrabbit DataStore SPI: HttpDataRecordProvider

We can move 1) to jackrabbit-api and 3) to jackrabbit-data. We left that open for the review - it was just a lot easier to work on the patch if it was all contained in the jackrabbit-oak repository.

Still the implementation changes in all layers are necessary as per above.

[1] https://github.com/apache/jackrabbit-oak/pull/88/files#diff-190f7c0d7156c8ab24c49208f9eb04f2R793
[2] https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access#Oak_Layers

Cheers,
Alex

Re: Oak Direct Binary Access pull request

Posted by Matt Ryan <os...@mvryan.org>.

Hi oak-dev,

I’ve added a new ticket [0] as a subtask of the main ticket for Oak Direct
Binary Access, pertaining to headers that may need to be set in the
responses to download URIs.  Please take a look and chime in on that
discussion if you have an opinion; I think it is an important one to get
right.


[0] - https://issues.apache.org/jira/browse/OAK-7637


-MR

On June 26, 2018 at 5:40:26 PM, Matt Ryan (oss@mvryan.org) wrote:
> ...

Re: Oak Direct Binary Access pull request

Posted by Matt Ryan <os...@mvryan.org>.

Hi oak-dev,

Here is the latest on this proposed change.

- I’ve made most of the minor fixes requested in the main pull request:
https://github.com/apache/jackrabbit-oak/pull/88
- Marcel has asked that I submit a separate pull request for one change in
PR #88, namely to add a filter to exclude
“org.apache.jackrabbit.oak.plugins.value.jcr” from BND evaluation in
oak-parent/pom.xml.  I’ve done this and made a new pull request:
https://github.com/apache/jackrabbit-oak/pull/89
- Julian has raised a concern in PR #88 in which he expresses a desire to
use the URI class instead of the URL class.  While PR #88 still uses URL,
I’ve made another pull request using URI instead:
https://github.com/apache/jackrabbit-oak/pull/90  If you have an opinion on
this matter please weigh in at OAK-7574.
- Marcel has asked that API changes in PR #88 be moved out of oak-jcr and
into another location.  In OAK-7589 he explains this in more detail and
expresses a preference to move these API changes into jackrabbit-api.  I’ve
submitted a pull request to jackrabbit-api with this change:
https://github.com/apache/jackrabbit/pull/59  Since this change would
require modifications to my original pull request to work, I submitted
another pull request to Oak which relies on the jackrabbit-api changes.
This new pull request is at:
https://github.com/apache/jackrabbit-oak/pull/91

Michael has also asked me to try to simplify the original pull request to
make it easier to follow.  I’ve intended to do so but simply have not had
the time; I apologize.

Can progress be made with things as they are currently?  Maybe there are
still some issues to be resolved, but if some of the supporting pull
requests can be accepted at least that would be a good start.


Thanks


-MR

On June 21, 2018 at 9:24:44 PM, Matt Ryan (oss@mvryan.org) wrote:
> ...

Re: Oak Direct Binary Access pull request

Posted by Matt Ryan <os...@mvryan.org>.

On June 21, 2018 at 6:53:44 AM, Marcel Reutegger (mreutegg@adobe.com.invalid)
wrote:

> Hi Matt,
>
> New files in your pull request have a different format for the Apache
> License header. Can you please change them to match the format of
> existing source files?

Yes - I believe I have fixed this now, let me know if I missed any.

> As mentioned in an offline conversation with you already, I'm a bit
> concerned about the impact this optional feature has on nearly all layers
> of Oak. SessionImpl implements HttpBinaryProvider, MutableRoot
> implements HttpBlobProvider, SegmentNodeStore implements
> HttpBlobProvider, DocumentNodeStore implements HttpBlobProvider. E.g.
> the last two just pass through calls they are not concerned with.
>
> Alternatively, could you do the required plumbing at construction time?
> That is, if the BlobStore implements HttpBlobProvider register it with
> that interface as well and use it to construct the repository. Something
> like:
>
> BlobStore bs = ...
> NodeStore ns = ...
> Jcr jcr = new Jcr(ns)
> if (bs instanceof HttpBlobProvider)
>     jcr.with((HttpBlobProvider) bs)
> Repository r = jcr.createRepository()
>
> By default, the Jcr factory would have a HttpBlobProvider implementation
> that doesn't support the feature, which also relieves the repository
> implementation from checking the type or for null on every call to the
> new feature (as is the case in SessionImpl, MutableRoot,
> DocumentNodeStore, SegmentNodeStore).

I added OAK-7570 to discuss this.

> I would also prefer if the API used by the client is moved to a separate
> module that can be released independently. Yes, we don't do this right
> now in Oak, but this may be a good opportunity to try this again.
> Releasing the API independently with a stable version lowers the barrier
> for consumers to adopt it.

I added OAK-7571 to discuss this.



-MR

Re: Oak Direct Binary Access pull request

Posted by Marcel Reutegger <mr...@adobe.com.INVALID>.

Hi Matt,

New files in your pull request have a different format for the Apache 
License header. Can you please change them to match the format of 
existing source files?

As mentioned in an offline conversation with you already, I'm a bit
concerned about the impact this optional feature has on nearly all layers
of Oak. SessionImpl implements HttpBinaryProvider, MutableRoot 
implements HttpBlobProvider, SegmentNodeStore implements 
HttpBlobProvider, DocumentNodeStore implements HttpBlobProvider. E.g. 
the last two just pass through calls they are not concerned with.

Alternatively, could you do the required plumbing at construction time?
That is, if the BlobStore implements HttpBlobProvider register it with 
that interface as well and use it to construct the repository. Something 
like:

BlobStore bs = ...;
NodeStore ns = ...;
Jcr jcr = new Jcr(ns);
if (bs instanceof HttpBlobProvider) {
    // with(HttpBlobProvider) is the suggested addition to the Jcr factory
    jcr.with((HttpBlobProvider) bs);
}
Repository r = jcr.createRepository();

By default, the Jcr factory would have a HttpBlobProvider implementation 
that doesn't support the feature, which also relieves the repository 
implementation from checking the type or for null on every call to the 
new feature (as is the case in SessionImpl, MutableRoot, 
DocumentNodeStore, SegmentNodeStore).
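
The "default implementation that doesn't support the feature" is
essentially a null-object provider chosen once at construction time.  A
self-contained Java sketch of that idea follows; the HttpBlobProvider
interface shown here is a simplified stand-in with an assumed signature,
and UNSUPPORTED and providerFor are hypothetical names.

```java
import java.net.URI;

final class DefaultProviderSketch {

    /** Stand-in for the proposed HttpBlobProvider; the real interface may differ. */
    interface HttpBlobProvider {
        URI getHttpDownloadURL(String blobId);
    }

    /** Default used when the blob store lacks the feature: never offers a URI. */
    static final HttpBlobProvider UNSUPPORTED = blobId -> null;

    /** Picks the provider once, at construction time, instead of on every call. */
    static HttpBlobProvider providerFor(Object blobStore) {
        return blobStore instanceof HttpBlobProvider
                ? (HttpBlobProvider) blobStore
                : UNSUPPORTED;
    }
}
```

Code holding the selected provider can then call it unconditionally and
only has to handle a null result.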

I would also prefer if the API used by the client is moved to a separate 
module that can be released independently. Yes, we don't do this right
now in Oak, but this may be a good opportunity to try this again. 
Releasing the API independently with a stable version lowers the barrier 
for consumers to adopt it.

Regards
  Marcel

On 21.06.18 01:21, Matt Ryan wrote:
> ...

Re: Oak Direct Binary Access pull request

Posted by Matt Ryan <os...@mvryan.org>.

On June 21, 2018 at 1:35:30 AM, Michael Dürig (mduerig@apache.org) wrote:


> Hi,
>
> Any chance for cleaning up the history? This will make it much easier to
> review and to maintain once applied.

Certainly; I will try.

> I know that this can be a bit of a pain. But in my eyes the revision
> history is part of the "code" as much as the code itself and it should
> be as easy to read as possible.


Agreed.  I’m also trying to find a way to do that and maintain the history
of who made which changes (since there are multiple authors).


-MR