Posted to users@jackrabbit.apache.org by Woonsan Ko <wo...@apache.org> on 2016/07/07 21:14:53 UTC

Retrieving Binary Value Identifier

Hi,

I'm trying to retrieve the Binary Value Identifier by following [1].
The example there shows how to retrieve it when creating a binary value.
Is it possible to retrieve the identifier when reading a binary value afterward?
I tried it like this:

            if (value instanceof JackrabbitValue) {
                contentId = ((JackrabbitValue) value).getContentIdentity();
            }

But it always returns null. While debugging, I found that it hits
org.apache.jackrabbit.core.value.BLOBFileValue#getDataIdentifier() in
my example, and that method always returns null.

Is #getContentIdentity() meant to be used only when a binary value is
first created, or is this a current shortcoming?

Regards,

Woonsan

[1] https://wiki.apache.org/jackrabbit/DataStore#Retrieve_the_Identifier

Re: Retrieving Binary Value Identifier

Posted by Woonsan Ko <wo...@apache.org>.
On Thu, Jul 7, 2016 at 6:59 PM, Clay Ferguson <wc...@gmail.com> wrote:
> Also, one more thing. The only purpose of the contentIdentity is probably
> to compare one blob against another to see whether they are byte-for-byte
> identical without having to read them. If you are using the
> contentIdentity for any other purpose, you should probably rethink it.
> It's like a checksum for efficient blob comparison.

My intention is to help build a migration tool by showing the binary
content ID in a JCR explorer, so that the tool can be used to compare and
debug the data between the source and destination DataStores. So it's
really only for internal debugging and tracing purposes in the back end.

Regards,

Woonsan
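
A minimal sketch of the comparison step such a migration tool might perform. It assumes the content identities have already been collected (for example with the getContentIdentity() snippet from this thread) into maps keyed by property path; the class and method names are hypothetical. A null identity is reported as "not comparable" because, as Marcel explains later in the thread, only binaries served from a DataStore have one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: compares content identities collected from a source and a
// destination repository, keyed by property path.
final class DataStoreMigrationCheck {

    static List<String> report(Map<String, String> source, Map<String, String> destination) {
        List<String> problems = new ArrayList<>();
        // A sorted set of all paths keeps the report order stable between runs
        TreeSet<String> paths = new TreeSet<>(source.keySet());
        paths.addAll(destination.keySet());
        for (String path : paths) {
            if (!destination.containsKey(path)) {
                problems.add("missing in destination: " + path);
            } else if (!source.containsKey(path)) {
                problems.add("missing in source: " + path);
            } else {
                String src = source.get(path);
                String dst = destination.get(path);
                if (src == null || dst == null) {
                    // Not served from a DataStore (e.g. inlined), so no identity to compare
                    problems.add("no content identity: " + path);
                } else if (!src.equals(dst)) {
                    problems.add("identity mismatch: " + path);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up identifiers, for illustration only
        Map<String, String> source = new HashMap<>();
        source.put("/content/a/jcr:data", "id-A");
        source.put("/content/b/jcr:data", "id-B");
        Map<String, String> destination = new HashMap<>();
        destination.put("/content/a/jcr:data", "id-A");
        destination.put("/content/b/jcr:data", "id-X");
        report(source, destination).forEach(System.out::println);
    }
}
```

Running main prints "identity mismatch: /content/b/jcr:data", since the made-up identifiers for that path differ.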


Re: Retrieving Binary Value Identifier

Posted by Clay Ferguson <wc...@gmail.com>.
Also, one more thing. The only purpose of the contentIdentity is probably
to compare one blob against another to see whether they are byte-for-byte
identical without having to read them. If you are using the
contentIdentity for any other purpose, you should probably rethink it.
It's like a checksum for efficient blob comparison.

Best regards,
Clay Ferguson
wclayf@gmail.com
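
The "checksum for efficient blob comparison" idea could be sketched as below. This is a stdlib-only illustration, not Jackrabbit API: the class and interface names are hypothetical, and it assumes that two equal, non-null identities mean identical content (true for a digest-based identifier, barring hash collisions). In every other case it falls back to reading and comparing the bytes; it conservatively does not treat differing identities as proof of difference, since identities from different stores may not be comparable.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class BlobComparison {

    /** Hypothetical way to (re)open a blob's content. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    static boolean sameContent(String idA, StreamSource a, String idB, StreamSource b) throws IOException {
        // Shortcut: equal non-null identities are assumed to mean equal bytes
        if (idA != null && idA.equals(idB)) {
            return true;
        }
        // Fallback: compare the two streams byte by byte
        try (InputStream inA = new BufferedInputStream(a.open());
             InputStream inB = new BufferedInputStream(b.open())) {
            while (true) {
                int x = inA.read();
                int y = inB.read();
                if (x != y) {
                    return false; // different byte, or one stream ended early
                }
                if (x == -1) {
                    return true;  // both streams ended at the same time
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        StreamSource source = () -> new ByteArrayInputStream(data);
        System.out.println(sameContent(null, source, null, source));     // true, via byte comparison
        System.out.println(sameContent("id-1", source, "id-1", source)); // true, via identity shortcut
    }
}
```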



Re: Retrieving Binary Value Identifier

Posted by Clay Ferguson <wc...@gmail.com>.
I looked at the code, and you're right: it is coded to return null. I'm not
familiar with this exact class, but that probably means it is normally used
as a base class, and some other class extends it and overrides that method
to return an actual value. I'd search the entire source (you may need to
unzip it on your machine to search it) to see which classes derive from
this base class, and then check whether your code is actually using the
base class or a subclass. If you step into the call and it goes straight
to that "return null", then either you have somehow instantiated the base
class or you have a class that doesn't provide an implementation. It may
be that this method simply isn't implemented for your case. If so, detect
that case and compute your own checksum; that may be the best you can do
if you need an actual checksum.

Best regards,
Clay Ferguson
wclayf@gmail.com
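
The "detect the null and compute your own checksum" fallback might look like the following stdlib-only sketch. It assumes you already have the value returned by getContentIdentity() and a stream over the binary; the class name is hypothetical. SHA-256 is an arbitrary choice here and is not assumed to match any identifier a DataStore generates, which is why the fallback value is prefixed so the two kinds of value are never confused.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ContentFingerprint {

    /** Hex-encoded SHA-256 of the whole stream; reads it to the end but does not close it. */
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Uses the DataStore content identity when present, otherwise a locally computed digest. */
    static String fingerprint(String contentIdentity, InputStream in) throws IOException {
        if (contentIdentity != null) {
            return contentIdentity;
        }
        return "sha256:" + sha256Hex(in);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        // prints sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(fingerprint(null, in));
    }
}
```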



Re: Retrieving Binary Value Identifier

Posted by Woonsan Ko <wo...@apache.org>.
On Sat, Jul 16, 2016 at 10:09 AM, Woonsan Ko <wo...@apache.org> wrote:
> Hi Marcel,
>
> I figured out my mistake! It works like a charm, as expected and as
> described in the wiki page [1].
> So even when retrieving a Binary value from an existing property, the
> following example works fine, too:
>
>     Value value = prop.getValue();
>     if (value instanceof JackrabbitValue) {
>         String contentId = ((JackrabbitValue) value).getContentIdentity();
>     }
>
> My mistake was that I didn't realize the DataStore component is optional:
> if none is configured, no DataStore is used by default. A binary that does
> not come from a DataStore has no contentIdentity, as you insightfully
> mentioned. :-)
>
> For others' information, I simply added the following for a simple
> H2-based demo:
>
>   <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>     <param name="url" value="jdbc:h2:file:${rep.home}/version/db"/>

Minor detail: in my demo I changed it to "jdbc:h2:file:${rep.home}/datastore/db"
so as not to pollute the version db. ;-)

>     <param name="databaseType" value="h2"/>
>     <param name="driver" value="org.h2.Driver"/>
>     <param name="minRecordLength" value="1024"/>
>     <param name="copyWhenReading" value="true"/>
>   </DataStore>
>
> By the way, minRecordLength defaults to 100 bytes, not the 1024 bytes
> configured in the example above.
>
> Cheers,
>
> Woonsan
>
> [1] https://wiki.apache.org/jackrabbit/DataStore
>

Re: Retrieving Binary Value Identifier

Posted by Woonsan Ko <wo...@apache.org>.
On Sat, Jul 16, 2016 at 10:09 AM, Woonsan Ko <wo...@apache.org> wrote:
> Hi Marcel,
>
> I figured out my mistake! It works like a charm, as expected and as
> described in the wiki page [1].
> So even when retrieving a Binary value from an existing property, the
> following example works fine, too:
>
>     Value value = prop.getValue();
>     if (value instanceof JackrabbitValue) {
>         String contentId = ((JackrabbitValue) value).getContentIdentity();
>     }
>
> My mistake was that I didn't realize the DataStore component is optional:
> if none is configured, no DataStore is used by default. A binary that does
> not come from a DataStore has no contentIdentity, as you insightfully
> mentioned. :-)
>
> For others' information, I simply added the following for a simple
> H2-based demo:
>
>   <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
>     <param name="url" value="jdbc:h2:file:${rep.home}/version/db"/>
>     <param name="databaseType" value="h2"/>
>     <param name="driver" value="org.h2.Driver"/>
>     <param name="minRecordLength" value="1024"/>
>     <param name="copyWhenReading" value="true"/>
>   </DataStore>
>
> By the way, minRecordLength defaults to 100 bytes, not the 1024 bytes
> configured in the example above.

FWIW, the default value of minRecordLength differs between DataStore
implementations. FileDataStore and DbDataStore default to 100 bytes, but
CachingDataStore implementations such as S3DataStore default to 16 KB.

Cheers, Woonsan
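
The thresholds discussed in this thread can be summarized as a small model of when a binary would be expected to carry a content identity. This is a stdlib-only illustration, not Jackrabbit code: the enum and method names are hypothetical, the defaults are simply the values reported above (100 bytes for FileDataStore and DbDataStore, 16 KB, taken here as 16 * 1024, for CachingDataStore implementations such as S3DataStore), and treating a binary of exactly minRecordLength bytes as a DataStore record is an assumption about the boundary.

```java
final class DataStoreThreshold {

    /** Hypothetical model of the DataStore setups mentioned in the thread. */
    enum DataStoreKind {
        NONE(-1),                      // no <DataStore> configured; value unused
        FILE_DATA_STORE(100),          // default minRecordLength reported for FileDataStore
        DB_DATA_STORE(100),            // ... and for DbDataStore
        CACHING_DATA_STORE(16 * 1024); // e.g. S3DataStore

        final int defaultMinRecordLength;

        DataStoreKind(int defaultMinRecordLength) {
            this.defaultMinRecordLength = defaultMinRecordLength;
        }
    }

    /**
     * Whether a binary of the given length is expected to be a DataStore record, and so
     * to have a non-null content identity. Assumes shorter binaries are inlined by the
     * PersistenceManager. Pass null to use the implementation's default minRecordLength.
     */
    static boolean expectsContentIdentity(DataStoreKind kind, Integer configuredMinRecordLength, long length) {
        if (kind == DataStoreKind.NONE) {
            return false; // no DataStore, no contentIdentity
        }
        int min = configuredMinRecordLength != null ? configuredMinRecordLength : kind.defaultMinRecordLength;
        return length >= min;
    }

    public static void main(String[] args) {
        // The H2 demo above configures minRecordLength=1024 on a DbDataStore
        System.out.println(expectsContentIdentity(DataStoreKind.DB_DATA_STORE, 1024, 500)); // false
        System.out.println(expectsContentIdentity(DataStoreKind.DB_DATA_STORE, null, 500)); // true
        System.out.println(expectsContentIdentity(DataStoreKind.NONE, null, 1_000_000));    // false
    }
}
```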

>
> Cheers,
>
> Woonsan
>
> [1] https://wiki.apache.org/jackrabbit/DataStore
>

Re: Retrieving Binary Value Identifier

Posted by Woonsan Ko <wo...@apache.org>.
Hi Marcel,

I figured out my mistake! It works like a charm, as expected and as
described in the wiki page [1].
So even when retrieving a Binary value from an existing property, the
following example works fine, too:

    Value value = prop.getValue();
    if (value instanceof JackrabbitValue) {
        String contentId = ((JackrabbitValue) value).getContentIdentity();
    }

My mistake was that I didn't realize the DataStore component is optional:
if none is configured, no DataStore is used by default. A binary that does
not come from a DataStore has no contentIdentity, as you insightfully
mentioned. :-)

For others' information, I simply added the following for a simple
H2-based demo:

  <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="url" value="jdbc:h2:file:${rep.home}/version/db"/>
    <param name="databaseType" value="h2"/>
    <param name="driver" value="org.h2.Driver"/>
    <param name="minRecordLength" value="1024"/>
    <param name="copyWhenReading" value="true"/>
  </DataStore>

By the way, minRecordLength defaults to 100 bytes, not the 1024 bytes
configured in the example above.

Cheers,

Woonsan

[1] https://wiki.apache.org/jackrabbit/DataStore


Re: Retrieving Binary Value Identifier

Posted by Woonsan Ko <wo...@apache.org>.
Hi Marcel,

On Wed, Jul 13, 2016 at 2:40 AM, Marcel Reutegger <mr...@adobe.com> wrote:
> Hi,
>
> On 08/07/16 06:20, "Woonsan Ko" wrote:
>>In my debugging, it hits
>>org.apache.jackrabbit.core.value.BLOBFileValue#getDataIdentifier()
>>when retrieving a binary value. The method is hard-coded to return
>>null. Maybe I missed something...
>
> Only Binary objects served from a DataStore have an identifier.
> Smaller binaries are usually inlined by the PersistenceManager.
> IIRC the threshold is 4k. Maybe your test creates binaries smaller
> than this threshold?

I think I had already checked that possibility and made sure the binary
was coming from the DataStore.
But it reminds me that I could attach a debugger and follow the stack
trace to see how the PersistenceManager passes the DataIdentifier to the
DataStore.
I'll take another look later and let you know if I find anything interesting.

Thanks,

Woonsan

>
> Regards
>  Marcel
>

Re: Retrieving Binary Value Identifier

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

On 08/07/16 06:20, "Woonsan Ko" wrote:
>In my debugging, it hits
>org.apache.jackrabbit.core.value.BLOBFileValue#getDataIdentifier()
>when retrieving a binary value. The method is hard-coded to return
>null. Maybe I missed something...

Only Binary objects served from a DataStore have an identifier.
Smaller binaries are usually inlined by the PersistenceManager.
IIRC the threshold is 4k. Maybe your test creates binaries smaller
than this threshold?

Regards
 Marcel 


Re: Retrieving Binary Value Identifier

Posted by Woonsan Ko <wo...@apache.org>.
Hi Clay,

On Thu, Jul 7, 2016 at 6:55 PM, Clay Ferguson <wc...@gmail.com> wrote:
> Woonsan,
> The contentIdentity is basically a "hash" of the data, so it can't be
> generated the first time without using the actual data; I think the data
> has to be read first. But it seems like you could store the hash
> somewhere if you wanted to. You could set a breakpoint on
> 'getContentIdentity', step into the code, and see that it's generating a
> hash. It seems like they would have stored this hash so it can be used
> again, but I'm not sure whether it's stored. If it is stored, you could
> use the stored value rather than let it be recalculated every time.

If I'm not mistaken, the content ID should be stored somewhere by the
PersistenceManager. DataStore#getRecord(DataIdentifier) is invoked to
retrieve binary data stored in a DataStore. I'm experimenting with
VfsDataStore (JCR-3975) and see hash-like identifiers all the time; it
should be the same with FileDataStore or DbDataStore.
While debugging, I found that it hits
org.apache.jackrabbit.core.value.BLOBFileValue#getDataIdentifier()
when retrieving a binary value. The method is hard-coded to return
null. Maybe I missed something...

Thanks for your remarks,

Woonsan
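
Why DataStore records come with hash-like identifiers can be illustrated with a toy content-addressed store. This is a hypothetical in-memory class, not the Jackrabbit DataStore API (which, as noted above, looks records up with getRecord(DataIdentifier)); it only shows that identical content maps to the same identifier and that an identifier exists only for content that went through the store. SHA-256 is an arbitrary choice for this sketch and is not claimed to be the digest any particular DataStore uses.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy, in-memory content-addressed store (hypothetical; not Jackrabbit's DataStore). */
final class ToyContentStore {

    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores the bytes under the hex SHA-256 of their content and returns that identifier. */
    String add(byte[] content) {
        String id = identifierFor(content);
        // Identical content yields the same identifier, so it is stored only once
        records.putIfAbsent(id, content.clone());
        return id;
    }

    /** Looks a record up by identifier; returns null if there is no such record. */
    byte[] get(String id) {
        byte[] content = records.get(id);
        return content == null ? null : content.clone();
    }

    int size() {
        return records.size();
    }

    static String identifierFor(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            // Zero-padded to 64 hex characters (32 bytes)
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java SE platform
        }
    }

    public static void main(String[] args) {
        ToyContentStore store = new ToyContentStore();
        byte[] data = "same bytes".getBytes(StandardCharsets.UTF_8);
        String first = store.add(data);
        String second = store.add(data.clone());
        System.out.println(first.equals(second));                  // true
        System.out.println(store.size());                          // 1
        System.out.println(Arrays.equals(data, store.get(first))); // true
    }
}
```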


Re: Retrieving Binary Value Identifier

Posted by Clay Ferguson <wc...@gmail.com>.
Woonsan,
The contentIdentity is basically a "hash" of the data, so it can't be
generated the first time without using the actual data; I think the data
has to be read first. But it seems like you could store the hash
somewhere if you wanted to. You could set a breakpoint on
'getContentIdentity', step into the code, and see that it's generating a
hash. It seems like they would have stored this hash so it can be used
again, but I'm not sure whether it's stored. If it is stored, you could
use the stored value rather than let it be recalculated every time.

Best regards,
Clay Ferguson
wclayf@gmail.com
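
Storing the hash instead of recalculating it could be sketched as a small memoizing cache. This assumes the key (here a hypothetical property path) identifies content that does not change while cached; a real tool would need to call invalidate when a binary is modified. The class name and the checksum function passed in are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class ChecksumCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> checksumFunction;

    /** @param checksumFunction computes the checksum for the content behind a key (potentially expensive) */
    ChecksumCache(Function<String, String> checksumFunction) {
        this.checksumFunction = checksumFunction;
    }

    /** Computes the checksum on first access and returns the stored value afterwards. */
    String checksum(String key) {
        return cache.computeIfAbsent(key, checksumFunction);
    }

    /** Drops a cached entry, e.g. after the underlying binary has been modified. */
    void invalidate(String key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger computations = new AtomicInteger();
        ChecksumCache cache = new ChecksumCache(key -> {
            computations.incrementAndGet();
            return "checksum-of-" + key; // stand-in for reading and hashing the binary
        });
        cache.checksum("/content/a/jcr:data");
        cache.checksum("/content/a/jcr:data");
        System.out.println(computations.get()); // 1
    }
}
```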

