Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2015/02/06 08:07:34 UTC

[jira] [Commented] (OAK-2466) DataStoreBlobStore: chunk ids should not contain the size

    [ https://issues.apache.org/jira/browse/OAK-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308736#comment-14308736 ] 

Thomas Mueller commented on OAK-2466:
-------------------------------------

Here is a patch (work in progress). It contains a print statement because I wanted to find out whether there is a unit test for this. It looks like there is none yet(!). We need to add one. Data store garbage collection is too dangerous to _not_ have a unit test.

{noformat}
### Eclipse Workspace Patch 1.0
#P oak-core
Index: src/main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/DataStoreBlobStore.java
===================================================================
--- src/main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/DataStoreBlobStore.java	(revision 1656712)
+++ src/main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/DataStoreBlobStore.java	(working copy)
@@ -364,9 +364,6 @@
         }), new Function<DataRecord, String>() {
             @Override
             public String apply(DataRecord input) {
-                if (encodeLengthInId) {
-                    return BlobId.of(input).encodedValue();
-                }
                 return input.getIdentifier().toString();
             }
         });
@@ -376,8 +373,7 @@
     public boolean deleteChunks(List<String> chunkIds, long maxLastModifiedTime) throws Exception {
         if (delegate instanceof MultiDataStoreAware) {
             for (String chunkId : chunkIds) {
-                String blobId = extractBlobId(chunkId);
-                DataIdentifier identifier = new DataIdentifier(blobId);
+                DataIdentifier identifier = new DataIdentifier(chunkId);
                 DataRecord dataRecord = delegate.getRecord(identifier);
                 boolean success = (maxLastModifiedTime <= 0)
                         || dataRecord.getLastModified() <= maxLastModifiedTime;
@@ -391,7 +387,9 @@
 
     @Override
     public Iterator<String> resolveChunks(String blobId) throws IOException {
-        return Iterators.singletonIterator(blobId);
+        String chunkId = BlobId.of(blobId).getBlobIdWithoutLength();
+System.out.println("@@@@ resolve " + blobId + " ==== " + chunkId);
+        return Iterators.singletonIterator(chunkId);
     }
 
     //~---------------------------------------------< Object >
@@ -523,6 +521,10 @@
                 return blobId;
             }
         }
+        
+        String getBlobIdWithoutLength() {
+            return blobId;
+        }
 
         boolean hasLengthInfo() {
             return length != -1;

{noformat}
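For reference, the length-stripping that resolveChunks() is supposed to do can be sketched standalone. This is a hypothetical helper, not the actual Oak BlobId class; it only illustrates reducing a blob id of the form <contentHash>#<size> to the plain chunk id:

```java
// Hypothetical sketch mirroring the intent of the patch above:
// a blob id may carry a "#<size>" suffix, the chunk id must not.
public class BlobIdDemo {

    // Returns the id without the trailing "#<size>" part, if present.
    static String stripLength(String blobId) {
        int idx = blobId.lastIndexOf('#');
        return idx < 0 ? blobId : blobId.substring(0, idx);
    }

    public static void main(String[] args) {
        // id with encoded length: the suffix is dropped
        System.out.println(stripLength("d41d8cd98f00b204e9800998ecf8427e#1024"));
        // id without length info: returned unchanged
        System.out.println(stripLength("d41d8cd98f00b204e9800998ecf8427e"));
    }
}
```

A unit test for the garbage collection path would essentially assert this round trip: ids handed out by getAllChunkIds() and ids produced by resolveChunks() must be directly usable by deleteChunks(), with no length suffix in between.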

> DataStoreBlobStore: chunk ids should not contain the size
> ---------------------------------------------------------
>
>                 Key: OAK-2466
>                 URL: https://issues.apache.org/jira/browse/OAK-2466
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.2
>
>
> The blob store garbage collection (data store garbage collection) uses the chunk ids to identify binaries to be deleted. The blob ids contain the size now (<contentHash>#<size>), and the blob id is currently equal to the chunk id.
> It would be more efficient to _not_ use the size and instead use just the content hash for the chunk ids. That way, enumerating the entries in the store is potentially faster. It also allows us to change the blob id format in the future, for example to add more information to it (such as the creation time, or the first few bytes of the content) if we ever want to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)