Posted to oak-dev@jackrabbit.apache.org by Robert Munteanu <ro...@apache.org> on 2016/05/06 13:00:03 UTC

[DocumentStore] Matching blob ids from a node property

Hi,

Using the DocumentNodeStore with a Mongo backend, I'm trying to match
blobs which are linked to certain nodes.

What I do is:

- look for nodes with _bin == 1
- look at the values from jcr:data property map
- for each value, strip the :blobId: and unquote the value
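
A minimal sketch of the last step, for illustration only. It assumes the stored value is a quoted string that wraps the `:blobId:` marker (for example `":blobId:0100..."`); the class and method names are hypothetical and not part of Oak.

```java
public class BlobIdProperty {
    // Marker taken from the steps above.
    private static final String PREFIX = ":blobId:";

    /**
     * Returns the raw id from a value such as "\":blobId:0100...\"",
     * or null if the value does not look like a blob reference.
     * The quoting layout is an assumption, not confirmed Oak behaviour.
     */
    static String extractBlobId(String propertyValue) {
        String v = propertyValue.trim();
        // Unquote: drop one pair of surrounding double quotes, if present.
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            v = v.substring(1, v.length() - 1);
        }
        // Strip the marker; anything without it is not treated as a blob reference.
        return v.startsWith(PREFIX) ? v.substring(PREFIX.length()) : null;
    }

    public static void main(String[] args) {
        // Shortened, made-up value used only to exercise the helper.
        System.out.println(extractBlobId("\":blobId:0100bd6c20\""));
    }
}
```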

At this point, I expected a blob id which matches the ids of objects
from the blobs collection. However, there seems to be some extra
information attached to the property in the nodes collection.

For example, if a blob has the id
'70cccd6d571350269554f06d0bb928d57463e3e74c8434d00a274a52c57f6cbd',
it is referenced in a property as
'0100bd6c2070cccd6d571350269554f06d0bb928d57463e3e74c8434d00a274a52c57f6cbd'.
The difference is in the extra 10 hex characters at the start
('0100bd6c20'). That prefix seems to vary between blob id references,
so I'm not sure how to interpret it.

How do I figure out which is the blob id that I should be looking for
in the blobs collection?

Thanks,

Robert

Re: [DocumentStore] Matching blob ids from a node property

Posted by Robert Munteanu <ro...@apache.org>.
Hi Amit,

On Mon, 2016-05-09 at 13:29 +0530, Amit Jain wrote:
> Hi Robert,
> 
> You would have to use AbstractBlobStore#resolveChunks [1] to get all
> the blob ids available in the blob collection for the id stored in
> the node.
> For the example id you have, it is likely that this is an in-memory
> blob (default < 4096 bytes) whose data is encoded in the id itself;
> such id-encoded blobs won't be returned by the above method.

That works fine, thank you!

I am going to assume that all production-ready blob stores actually
implement this via the GarbageCollectableBlobStore interface; hopefully
that won't bite me down the road.

Robert
> 
> Thanks
> Amit
> 
> [1]
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-blob/src/main/java/org/apache/jackrabbit/oak/spi/blob/AbstractBlobStore.java#L592
> 


Re: [DocumentStore] Matching blob ids from a node property

Posted by Amit Jain <am...@ieee.org>.
Hi Robert,

You would have to use AbstractBlobStore#resolveChunks [1] to get all
the blob ids available in the blob collection for the id stored in the
node.
For the example id you have, it is likely that this is an in-memory blob
(default < 4096 bytes) whose data is encoded in the id itself; such
id-encoded blobs won't be returned by the above method.
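
To illustrate what resolving chunks involves, here is a rough standalone sketch. It is not the Oak implementation and does not call Oak. It assumes the node-side id is a hex string of segments, each starting with a type byte (0 for inline data, 1 for a hash reference) followed by variable-length integers; that layout is my reading of the linked source and should be verified there. The class name and constants are hypothetical. Under that assumed layout, the example id from the question decodes to a single chunk id equal to its last 64 hex characters.

```java
import java.util.ArrayList;
import java.util.List;

public class BlobIdSketch {
    // Assumed segment markers; verify against AbstractBlobStore before relying on them.
    static final int TYPE_DATA = 0; // data stored inline in the id
    static final int TYPE_HASH = 1; // reference to a stored chunk, by digest

    private final byte[] buf;
    private int pos;

    BlobIdSketch(String hex) {
        buf = new byte[hex.length() / 2];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
    }

    // Variable-length integer: 7 bits per byte, low group first, high bit means "more follows".
    private long readVarLong() {
        long value = 0;
        for (int shift = 0; ; shift += 7) {
            int b = buf[pos++] & 0xff;
            value |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }

    private String readHex(int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append(String.format("%02x", buf[pos++] & 0xff));
        }
        return sb.toString();
    }

    /** Lists the chunk ids referenced by a node-side id; inline data yields no ids. */
    static List<String> chunkIds(String nodeSideId) {
        BlobIdSketch in = new BlobIdSketch(nodeSideId);
        List<String> ids = new ArrayList<>();
        while (in.pos < in.buf.length) {
            int type = in.buf[in.pos++] & 0xff;
            if (type == TYPE_DATA) {
                in.pos += (int) in.readVarLong(); // skip the inline bytes
            } else if (type == TYPE_HASH) {
                if (in.readVarLong() > 0) { // level; indirect ids are out of scope here
                    throw new UnsupportedOperationException("indirect ids not handled");
                }
                in.readVarLong(); // chunk length, unused in this sketch
                ids.add(in.readHex((int) in.readVarLong())); // digest length, then digest
            } else {
                throw new IllegalArgumentException("unknown segment type " + type);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String id = "0100bd6c2070cccd6d571350269554f06d0bb928d57463e3e74c8434d00a274a52c57f6cbd";
        System.out.println(chunkIds(id));
    }
}
```

For real use, prefer resolveChunks on the blob store itself; the sketch only shows why the node-side id carries a prefix in front of the digest.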

Thanks
Amit

[1]
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-blob/src/main/java/org/apache/jackrabbit/oak/spi/blob/AbstractBlobStore.java#L592
