Posted to oak-dev@jackrabbit.apache.org by Bertrand Delacretaz <bd...@apache.org> on 2016/08/10 09:29:45 UTC

Re: Usecases around Binary handling in Oak

Hi,

On Tue, Jul 26, 2016 at 4:36 PM, Bertrand Delacretaz
<bd...@apache.org> wrote:
> ...I've thought about adding an "adopt-a-binary" feature to Sling
> recently, to allow it to serve existing (disk or cloud) binaries along
> with those stored in Oak....

I just noticed that the Git Large File Storage project [1] uses a similar
approach: it "replaces large files such as audio samples, videos,
datasets, and graphics with text pointers inside Git, while storing
the file contents on a remote server". Maybe there are ideas to
steal^H^H^H^H^H borrow from there.

-Bertrand

[1] https://git-lfs.github.com/
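
To make the "text pointer" idea concrete, here is a minimal sketch that builds the text of a Git LFS pointer for some content. The three-line layout (version, oid, size) follows the Git LFS v1 pointer format, not anything proposed in this thread, and the class and method names are purely illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only: builds the text of a Git LFS (v1) pointer for some content.
final class LfsPointerSketch {

    // Returns the small text document that would be stored in place of the content.
    static String pointerFor(byte[] content) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : sha256.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return "version https://git-lfs.github.com/spec/v1\n"
                    + "oid sha256:" + hex + "\n"
                    + "size " + content.length + "\n";
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(pointerFor("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

An "adopt-a-binary" feature could store a similarly small record in the repository while the bytes stay on disk or in the cloud; which fields such a record would need is an open question here.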

Re: Usecases around Binary handling in Oak

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Wed, Aug 10, 2016 at 11:56 AM, Ian Boston <ie...@tfd.co.uk> wrote:
> On 10 August 2016 at 10:29, Bertrand Delacretaz <bd...@apache.org>
> wrote:
>> ...I just noticed that the Git Large File Storage project uses a similar
>> approach, it "replaces large files such as audio samples, videos,
>> datasets, and graphics with text pointers inside Git, while storing
>> the file contents on a remote server"...

> Would that be something to do at the Sling level on upload of a large file?

That's one option. It becomes obsolete if Oak implements the same thing,
but it can always be migrated to an Oak-based solution later if needed:
you'd just need to move the file content pointers.

-Bertrand

Re: Usecases around Binary handling in Oak

Posted by Ian Boston <ie...@tfd.co.uk>.
Hi,

On 10 August 2016 at 11:14, Chetan Mehrotra <ch...@gmail.com>
wrote:

> This can be done at Sling level yes. But then any code which makes use
> of JCR API would not be able to access the binary.


TBH, the JCR API doesn't really give you much help accessing a binary of a
file. You can't just get the nt:file node and get an InputStream from that.
You have to

ntFileNode.getNode(Node.JCR_CONTENT).getProperty(Property.JCR_DATA).getBinary().getStream().

and then (according to the spec) remember to

ntFileNode.getNode(Node.JCR_CONTENT).getProperty(Property.JCR_DATA).getBinary().dispose()

when done.
For some this might be common knowledge, but it's not exactly simple or
straightforward. It also assumes that ntFileNode is an nt:file and not
some other construction holding a Binary property.
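
A detail the one-liners above gloss over: dispose() should be called on the same Binary instance the stream came from, so the handle has to be kept in a variable rather than fetched again through getBinary(). The sketch below shows that pattern. BinaryHandle is a hypothetical stand-in for the two javax.jcr.Binary methods involved (the real getStream() also declares RepositoryException), used only so the example runs without a repository.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the getStream()/dispose() pair of javax.jcr.Binary.
interface BinaryHandle {
    InputStream getStream();
    void dispose();
}

final class BinaryReadSketch {

    // Reads everything from one handle, then disposes that same handle.
    static byte[] readAndDispose(BinaryHandle binary) {
        try (InputStream in = binary.getStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Runs after the stream is closed, whether or not reading succeeded.
            binary.dispose();
        }
    }
}

// In-memory handle so the sketch can be exercised without JCR.
final class InMemoryBinary implements BinaryHandle {
    private final byte[] data;
    boolean disposed;

    InMemoryBinary(byte[] data) { this.data = data; }

    public InputStream getStream() { return new ByteArrayInputStream(data); }

    public void dispose() { disposed = true; }
}
```

With the real API, the handle passed in would be the result of a single getProperty(Property.JCR_DATA).getBinary() call.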

Even implemented in Sling it would require knowledge: you would call
ntFileResource.adaptTo(InputStream.class),
ntFileResource.adaptTo(URI.class) or ntFileResource.adaptTo(File.class), as
the Resource API doesn't have resource.openInputStream(),
resource.getURL() or resource.getFile(), although in theory it could, as
it's not a JSR spec.

I hope this is still the right place to share these observations.



> One way to have it
> implemented at Oak level would be to introduce some sort of
> 'ExternalBinary' and open up an extension in BlobStore implementation
> to delegate binary lookup call to some provider. Just that it needs to
> honor the contract of Binary and Blob API
>

I thought the consensus was that Oak was not going to leak those pointers
to DS Binaries?
Won't all the same arguments apply to any other type of Binary under the
control and responsibility of Oak?


>
> That part is easy.
>
> The problem comes in management side where you need to decide on GC.
> Probably Oak would need to expose an API to provide list (iterator) of
> all such external binaries it refers to and then the external system
> can manage the GC
>

Many external systems, e.g. S3, already have space management capabilities.
Oak should mark binaries as safe to delete via an API and let the external
system decide what to do.
Exporting a list via an iterator won't scale.

Best Regards
Ian
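
As a rough illustration of the "mark as safe to delete" alternative to exporting an iterator, the sketch below has the repository side push a notification per binary to the external system. Everything in it is hypothetical: DeletionListener and ExternalReferenceTracker are not Oak APIs, and the reference counting is only a device to make the example runnable. How Oak would actually detect unreferenced binaries is not addressed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback implemented by the external system; not an existing Oak API.
interface DeletionListener {
    void safeToDelete(String binaryId);
}

// Hypothetical repository-side bookkeeping for references to external binaries.
final class ExternalReferenceTracker {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final DeletionListener listener;

    ExternalReferenceTracker(DeletionListener listener) {
        this.listener = listener;
    }

    void addReference(String binaryId) {
        refCounts.merge(binaryId, 1, Integer::sum);
    }

    // Drops one reference; when none remain, the external system is notified once.
    void removeReference(String binaryId) {
        Integer count = refCounts.get(binaryId);
        if (count == null) {
            return; // unknown id: nothing to do
        }
        if (count > 1) {
            refCounts.put(binaryId, count - 1);
        } else {
            refCounts.remove(binaryId);
            listener.safeToDelete(binaryId);
        }
    }
}
```

The external system stays free to ignore the notification, delay deletion, or apply its own lifecycle policy, and no full listing of binaries is ever exchanged.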



Re: Usecases around Binary handling in Oak

Posted by Chetan Mehrotra <ch...@gmail.com>.
This can be done at the Sling level, yes. But then any code that uses
the JCR API would not be able to access the binary. One way to
implement it at the Oak level would be to introduce some sort of
'ExternalBinary' and open up an extension point in the BlobStore
implementation to delegate the binary lookup call to some provider. It
just needs to honor the contract of the Binary and Blob APIs.

That part is easy.

The problem comes on the management side, where you need to decide on GC.
Oak would probably need to expose an API that provides a list (iterator) of
all the external binaries it refers to, so that the external system
can manage the GC.

Chetan Mehrotra
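
The delegation idea can be sketched roughly as follows. This is not Oak's BlobStore interface: ExternalBinaryProvider, DelegatingLookupSketch and the "external:" id prefix are all assumptions made up for illustration, and the in-memory map merely stands in for the regular blob storage.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical extension point; Oak's real BlobStore API is not reproduced here.
interface ExternalBinaryProvider {
    // Returns a stream for an externally held binary, or empty if the id is unknown.
    Optional<InputStream> open(String externalId);
}

final class DelegatingLookupSketch {
    private static final String EXTERNAL_PREFIX = "external:"; // assumed id convention

    private final Map<String, byte[]> internalBlobs = new HashMap<>();
    private final ExternalBinaryProvider provider;

    DelegatingLookupSketch(ExternalBinaryProvider provider) {
        this.provider = provider;
    }

    void putInternal(String blobId, byte[] data) {
        internalBlobs.put(blobId, data);
    }

    // Looks a blob up locally, or hands the lookup to the provider for external ids.
    InputStream open(String blobId) {
        if (blobId.startsWith(EXTERNAL_PREFIX)) {
            String externalId = blobId.substring(EXTERNAL_PREFIX.length());
            return provider.open(externalId).orElseThrow(
                    () -> new IllegalArgumentException("Unknown external binary: " + blobId));
        }
        byte[] data = internalBlobs.get(blobId);
        if (data == null) {
            throw new IllegalArgumentException("Unknown blob: " + blobId);
        }
        return new ByteArrayInputStream(data);
    }
}
```

Callers see one lookup method either way, which is the sense in which such an extension would have to honor the existing Binary and Blob contracts.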



Re: Usecases around Binary handling in Oak

Posted by Ian Boston <ie...@tfd.co.uk>.
Hi,

On 10 August 2016 at 10:29, Bertrand Delacretaz <bd...@apache.org>
wrote:

> Hi,
>
> On Tue, Jul 26, 2016 at 4:36 PM, Bertrand Delacretaz
> <bd...@apache.org> wrote:
> > ...I've thought about adding an "adopt-a-binary" feature to Sling
> > recently, to allow it to serve existing (disk or cloud) binaries along
> > with those stored in Oak....
>
> I just noticed that the Git Large File Storage project uses a similar
> approach, it "replaces large files such as audio samples, videos,
> datasets, and graphics with text pointers inside Git, while storing
> the file contents on a remote server". Maybe there are ideas to
> steal^H^H^H^H^H borrow from there.
>

Would that be something to do at the Sling level on upload of a large file?

I am working on a patch to use the Commons File Upload streaming API in
Sling servlets/post as an Operation impl.
I know this is oak-dev, so the question might not be appropriate here.

Best Regards
Ian


>
> -Bertrand
>
> [1] https://git-lfs.github.com/
>