You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by "rishabhdaim (via GitHub)" <gi...@apache.org> on 2023/02/23 10:49:44 UTC

[GitHub] [jackrabbit-oak] rishabhdaim opened a new pull request, #858: OAK-10114 : created new query API with projections to limit memory us…

rishabhdaim opened a new pull request, #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858

   …age when fetching huge no. of records


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] rishabhdaim commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "rishabhdaim (via GitHub)" <gi...@apache.org>.
rishabhdaim commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1124126499


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +490,45 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projections {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.

Review Comment:
   This contract `to fetch all if projections are empty` is in line with how mongo behaves.
   If we provide an empty projection array, then mongo returns all the results.
   In case we want specific fields, then we need to mention those individually.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] mreutegg commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "mreutegg (via GitHub)" <gi...@apache.org>.
mreutegg commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1129342058


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +489,55 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projection {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.
+     * @return the list (possibly empty)
+     * @throws DocumentStoreException if the operation failed. E.g. because of
+     *          an I/O error.
+     */
+    @NotNull
+    default <T extends Document> List<T> query(final Collection<T> collection,
+                                               final String fromKey,
+                                               final String toKey,
+                                               final String indexedProperty,
+                                               final long startValue,
+                                               final int limit,
+                                               final List<String> projection) throws DocumentStoreException {
+
+        final List<T> list = query(collection, fromKey, toKey, indexedProperty, startValue, limit);
+
+        if (projection == null || projection.isEmpty()) {
+            list.forEach(Document::seal);
+            return list;
+        }
+
+        final Set<String> projectedSet = newHashSet(projection);
+        projectedSet.add(ID);
+
+        list.forEach(t -> {
+            t.unSeal();
+            t.keySet().retainAll(projectedSet);
+            t.seal();
+        });

Review Comment:
   I think this may also be the reason why DocumentNodeStoreBranchesTest.commitHookChangesOnBranchWithInterference is currently failing.



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +489,55 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projection {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.
+     * @return the list (possibly empty)
+     * @throws DocumentStoreException if the operation failed. E.g. because of
+     *          an I/O error.
+     */
+    @NotNull
+    default <T extends Document> List<T> query(final Collection<T> collection,
+                                               final String fromKey,
+                                               final String toKey,
+                                               final String indexedProperty,
+                                               final long startValue,
+                                               final int limit,
+                                               final List<String> projection) throws DocumentStoreException {
+
+        final List<T> list = query(collection, fromKey, toKey, indexedProperty, startValue, limit);
+
+        if (projection == null || projection.isEmpty()) {
+            list.forEach(Document::seal);

Review Comment:
   This shouldn't be necessary, because implementations should already return sealed documents when they are e.g. served from a cache.



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +489,55 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projection {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.
+     * @return the list (possibly empty)
+     * @throws DocumentStoreException if the operation failed. E.g. because of
+     *          an I/O error.
+     */
+    @NotNull
+    default <T extends Document> List<T> query(final Collection<T> collection,
+                                               final String fromKey,
+                                               final String toKey,
+                                               final String indexedProperty,
+                                               final long startValue,
+                                               final int limit,
+                                               final List<String> projection) throws DocumentStoreException {
+
+        final List<T> list = query(collection, fromKey, toKey, indexedProperty, startValue, limit);
+
+        if (projection == null || projection.isEmpty()) {
+            list.forEach(Document::seal);
+            return list;
+        }
+
+        final Set<String> projectedSet = newHashSet(projection);
+        projectedSet.add(ID);
+
+        list.forEach(t -> {
+            t.unSeal();
+            t.keySet().retainAll(projectedSet);
+            t.seal();
+        });

Review Comment:
   Doesn't this modify Documents potentially coming from a cache? See also earlier comment about un-sealing a Document.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] rishabhdaim commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "rishabhdaim (via GitHub)" <gi...@apache.org>.
rishabhdaim commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1124127355


##########
oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/MongoDocumentStoreTest.java:
##########
@@ -241,6 +242,30 @@ public void queryWithLimit() throws Exception {
         store.dispose();
     }
 
+    @Test
+    public void queryWithProjections() {
+        DocumentStore docStore = openDocumentStore();
+        DocumentNodeStore store = new DocumentMK.Builder()
+                .setDocumentStore(docStore).setAsyncDelay(0).getNodeStore();
+        Revision rev = Revision.newRevision(0);
+        List<UpdateOp> inserts = new ArrayList<UpdateOp>();
+        for (int i = 0; i < DocumentMK.MANY_CHILDREN_THRESHOLD * 2; i++) {
+            DocumentNodeState n = new DocumentNodeState(store, Path.fromString("/node-" + i),
+                    new RevisionVector(rev));
+            inserts.add(n.asOperation(rev));
+        }
+        docStore.create(Collection.NODES, inserts);
+        List<NodeDocument> docs = docStore.query(Collection.NODES,
+                Utils.getKeyLowerLimit(Path.ROOT),  Utils.getKeyUpperLimit(Path.ROOT), null, 0,
+                DocumentMK.MANY_CHILDREN_THRESHOLD, Lists.newArrayList(ID));
+        // since both _id & _modCount are mandatory, so data size should be 2

Review Comment:
   This won't be required after we make changes regarding putting documents into cache.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] mreutegg commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "mreutegg (via GitHub)" <gi...@apache.org>.
mreutegg commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1129336932


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/Document.java:
##########
@@ -151,6 +151,22 @@ public void seal() {
         }
     }
 
+    /**
+     * UnSeals this document and turns it into a mutable object.
+     *
+     * @see #seal()
+     */
+    void unSeal() {

Review Comment:
   I don't like adding method. The idea behind sealing a document is to ensure it becomes immutable. A DocumentStore implementation can use this to construct a Document, put it into a cache and then return it to a caller. With this new method a caller can modify a sealed Document, violating a basic principle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] mreutegg commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "mreutegg (via GitHub)" <gi...@apache.org>.
mreutegg commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1132315205


##########
oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/MongoDocumentStoreTest.java:
##########
@@ -241,6 +243,80 @@ public void queryWithLimit() throws Exception {
         store.dispose();
     }
 
+    @Test
+    public void queryWithProjection() {
+        DocumentStore docStore = openDocumentStore();
+        DocumentNodeStore store = new DocumentMK.Builder()
+                .setDocumentStore(docStore).setAsyncDelay(0).getNodeStore();
+        Revision rev = Revision.newRevision(0);
+        List<UpdateOp> inserts = new ArrayList<UpdateOp>();
+        for (int i = 0; i < 10; i++) {
+            DocumentNodeState n = new DocumentNodeState(store, Path.fromString("/node-" + i),
+                    new RevisionVector(rev));
+            inserts.add(n.asOperation(rev));
+        }
+        docStore.create(Collection.NODES, inserts);
+        List<NodeDocument> docs = docStore.query(Collection.NODES,
+                Utils.getKeyLowerLimit(Path.ROOT),  Utils.getKeyUpperLimit(Path.ROOT), null, 0,
+                20, newArrayList(MODIFIED_IN_SECS));
+        // since _id is mandatory, so data size should be 2
+        docs.forEach(d -> assertEquals(2 , d.data.size()));
+        assertEquals(10, docs.size());
+        store.dispose();
+    }
+
+    @Test
+    public void queryWithEmptyProjection() {
+        DocumentStore docStore = openDocumentStore();
+        DocumentNodeStore store = new DocumentMK.Builder()
+                .setDocumentStore(docStore).setAsyncDelay(0).getNodeStore();
+        Revision rev = Revision.newRevision(0);
+        List<UpdateOp> inserts = new ArrayList<UpdateOp>();
+        for (int i = 0; i < 10; i++) {
+            DocumentNodeState n = new DocumentNodeState(store, Path.fromString("/node-" + i),
+                    new RevisionVector(rev));
+            inserts.add(n.asOperation(rev));
+        }
+        docStore.create(Collection.NODES, inserts);
+        List<NodeDocument> docs = docStore.query(Collection.NODES,
+                Utils.getKeyLowerLimit(Path.ROOT),  Utils.getKeyUpperLimit(Path.ROOT), null, 0,
+                20, newArrayList());
+        if (MONGO_DB) {
+            // we have _modCount as additional field in case we use MongoDocumentStore
+            docs.forEach(d -> assertEquals(4 , d.data.size()));
+        } else {
+            docs.forEach(d -> assertEquals(3 , d.data.size()));
+        }
+        assertEquals(10, docs.size());
+        store.dispose();
+    }
+
+    @Test
+    public void queryWithNullProjection() {
+        DocumentStore docStore = openDocumentStore();
+        DocumentNodeStore store = new DocumentMK.Builder()
+                .setDocumentStore(docStore).setAsyncDelay(0).getNodeStore();
+        Revision rev = Revision.newRevision(0);
+        List<UpdateOp> inserts = new ArrayList<UpdateOp>();
+        for (int i = 0; i < 10; i++) {
+            DocumentNodeState n = new DocumentNodeState(store, Path.fromString("/node-" + i),
+                    new RevisionVector(rev));
+            inserts.add(n.asOperation(rev));
+        }
+        docStore.create(Collection.NODES, inserts);
+        List<NodeDocument> docs = docStore.query(Collection.NODES,
+                Utils.getKeyLowerLimit(Path.ROOT),  Utils.getKeyUpperLimit(Path.ROOT), null, 0,
+                20, null);
+        if (MONGO_DB) {
+            // we have _modCount as additional field in case we use MongoDocumentStore
+            docs.forEach(d -> assertEquals(4 , d.data.size()));
+        } else {
+            docs.forEach(d -> assertEquals(3 , d.data.size()));
+        }
+        assertEquals(10, docs.size());
+        store.dispose();
+    }
+

Review Comment:
   Confusingly MongoDocumentStoreTest does not use MongoDB as a DocumentStore by default.
   
   I would rather move these tests to MongoDocumentStoreIT: https://github.com/rishabhdaim/jackrabbit-oak/pull/2



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] mreutegg commented on pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "mreutegg (via GitHub)" <gi...@apache.org>.
mreutegg commented on PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#issuecomment-1463970210

   Other than moving the tests, this PR now looks good to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] rishabhdaim commented on pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "rishabhdaim (via GitHub)" <gi...@apache.org>.
rishabhdaim commented on PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#issuecomment-1441549322

   @mreutegg @stefan-egli could you please review and share the feedback?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] reschke commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "reschke (via GitHub)" <gi...@apache.org>.
reschke commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1124224777


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +490,45 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projections {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.

Review Comment:
   "This is what MongoDB does" IMHO is not sufficient reason for something that would apply to all DocumentStore implementations. Leaking implementation details...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] reschke commented on pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "reschke (via GitHub)" <gi...@apache.org>.
reschke commented on PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#issuecomment-1453239389

   It would be good if the JIRA issue was more clear about the fact that a new DocumentStore is being introduced; and that API change should actually be described over there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] rishabhdaim commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "rishabhdaim (via GitHub)" <gi...@apache.org>.
rishabhdaim commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1129366551


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +489,55 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projection {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.
+     * @return the list (possibly empty)
+     * @throws DocumentStoreException if the operation failed. E.g. because of
+     *          an I/O error.
+     */
+    @NotNull
+    default <T extends Document> List<T> query(final Collection<T> collection,
+                                               final String fromKey,
+                                               final String toKey,
+                                               final String indexedProperty,
+                                               final long startValue,
+                                               final int limit,
+                                               final List<String> projection) throws DocumentStoreException {
+
+        final List<T> list = query(collection, fromKey, toKey, indexedProperty, startValue, limit);
+
+        if (projection == null || projection.isEmpty()) {
+            list.forEach(Document::seal);
+            return list;
+        }
+
+        final Set<String> projectedSet = newHashSet(projection);
+        projectedSet.add(ID);
+
+        list.forEach(t -> {
+            t.unSeal();
+            t.keySet().retainAll(projectedSet);
+            t.seal();
+        });

Review Comment:
   Yes, I agree this would modify the documents coming from the cache.
   
   I will fix this by cloning the document and then returning the cloned document with projected fields only.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] mreutegg merged pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "mreutegg (via GitHub)" <gi...@apache.org>.
mreutegg merged PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] rishabhdaim commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "rishabhdaim (via GitHub)" <gi...@apache.org>.
rishabhdaim commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1124587766


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +490,45 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projections {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.

Review Comment:
   We need a mechanism to fetch all the fields on the document. Since we don't know the field names in advance and they can change at any time (given the schema-less nature of Mongo), I think passing an empty list makes sense to fetch the entire document.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] mreutegg commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "mreutegg (via GitHub)" <gi...@apache.org>.
mreutegg commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1117032596


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoDocumentStore.java:
##########
@@ -722,8 +725,19 @@ public <T extends Document> List<T> query(Collection<T> collection,
                                               String indexedProperty,
                                               long startValue,
                                               int limit) {
-        return queryWithRetry(collection, fromKey, toKey, indexedProperty,
-                startValue, limit, maxQueryTimeMS);
+        return query(collection, fromKey, toKey, indexedProperty, startValue, limit, emptyList());

Review Comment:
   Ah, now I see why an empty projection must return the entire document. I'm still not sure this is a good idea.



##########
oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/MongoDocumentStoreTest.java:
##########
@@ -241,6 +242,30 @@ public void queryWithLimit() throws Exception {
         store.dispose();
     }
 
+    @Test
+    public void queryWithProjections() {
+        DocumentStore docStore = openDocumentStore();
+        DocumentNodeStore store = new DocumentMK.Builder()
+                .setDocumentStore(docStore).setAsyncDelay(0).getNodeStore();
+        Revision rev = Revision.newRevision(0);
+        List<UpdateOp> inserts = new ArrayList<UpdateOp>();
+        for (int i = 0; i < DocumentMK.MANY_CHILDREN_THRESHOLD * 2; i++) {
+            DocumentNodeState n = new DocumentNodeState(store, Path.fromString("/node-" + i),
+                    new RevisionVector(rev));
+            inserts.add(n.asOperation(rev));
+        }
+        docStore.create(Collection.NODES, inserts);
+        List<NodeDocument> docs = docStore.query(Collection.NODES,
+                Utils.getKeyLowerLimit(Path.ROOT),  Utils.getKeyUpperLimit(Path.ROOT), null, 0,
+                DocumentMK.MANY_CHILDREN_THRESHOLD, Lists.newArrayList(ID));
+        // since both _id & _modCount are mandatory, so data size should be 2

Review Comment:
   This is not consistent with the contract in DocumentStore where it says only `_id` is mandatory.



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +490,45 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projections {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.

Review Comment:
   This contract is not intuitive to me. Wouldn't it be better in to return documents with `_id` field only when the `projections` list is empty? One can still use the existing query method if all fields are needed.



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoDocumentStore.java:
##########
@@ -823,6 +839,14 @@ && canUseModifiedTimeIdx(startValue)) {
                 } else {
                     result = dbCollection.find(query);
                 }
+
+                if (!projections.isEmpty()) {
+                    // it is mandatory to fetch MOD_COUNT because of cacheChangesTracker
+                    projections.add(MOD_COUNT);
+                    Bson projection = include(projections);
+                    result.projection(projection);

Review Comment:
   We need to be careful with the projection in `queryInternal()`. Later in this method documents are put into a cache: `nodeCache.putNonConflictingDocs()`. Documents with a projection are incomplete and later reads for complete documents may be served from cache.
   It might be best to simply bypass the cache when there is a projection and not put those documents in the cache.



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +490,45 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projections {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.
+     * @return the list (possibly empty)
+     * @throws DocumentStoreException if the operation failed. E.g. because of
+     *          an I/O error.
+     */
+    @NotNull
+    default <T extends Document> List<T> query(final Collection<T> collection,
+                                               final String fromKey,
+                                               final String toKey,
+                                               final String indexedProperty,
+                                               final long startValue,
+                                               final int limit,
+                                               final List<String> projections) throws DocumentStoreException {
+        final Set<String> projectedSet = newHashSet(projections);
+        projectedSet.add(ID);
+        projectedSet.add(MOD_COUNT);
+
+        final List<T> list = query(collection, fromKey, toKey, indexedProperty, startValue, limit);
+        list.stream().filter(doc -> !doc.isSealed()).forEach(doc -> doc.keySet().retainAll(projectedSet));

Review Comment:
   I would unconditionally create new documents with the projection applied. I'm surprised this works for some DocumentStore implementations. I was expecting all returned documents are sealed. Such documents can be served from cache and it would be dangerous if a caller can modify them.



##########
oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/MongoDocumentStoreTest.java:
##########
@@ -241,6 +242,30 @@ public void queryWithLimit() throws Exception {
         store.dispose();
     }
 
+    @Test
+    public void queryWithProjections() {
+        DocumentStore docStore = openDocumentStore();
+        DocumentNodeStore store = new DocumentMK.Builder()
+                .setDocumentStore(docStore).setAsyncDelay(0).getNodeStore();
+        Revision rev = Revision.newRevision(0);
+        List<UpdateOp> inserts = new ArrayList<UpdateOp>();
+        for (int i = 0; i < DocumentMK.MANY_CHILDREN_THRESHOLD * 2; i++) {

Review Comment:
   Why add so many documents? It seems unnecessary to me for this test.



##########
oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/MongoDocumentStoreTest.java:
##########
@@ -241,6 +242,30 @@ public void queryWithLimit() throws Exception {
         store.dispose();
     }
 
+    @Test
+    public void queryWithProjections() {
+        DocumentStore docStore = openDocumentStore();
+        DocumentNodeStore store = new DocumentMK.Builder()
+                .setDocumentStore(docStore).setAsyncDelay(0).getNodeStore();
+        Revision rev = Revision.newRevision(0);
+        List<UpdateOp> inserts = new ArrayList<UpdateOp>();
+        for (int i = 0; i < DocumentMK.MANY_CHILDREN_THRESHOLD * 2; i++) {
+            DocumentNodeState n = new DocumentNodeState(store, Path.fromString("/node-" + i),
+                    new RevisionVector(rev));
+            inserts.add(n.asOperation(rev));
+        }
+        docStore.create(Collection.NODES, inserts);
+        List<NodeDocument> docs = docStore.query(Collection.NODES,
+                Utils.getKeyLowerLimit(Path.ROOT),  Utils.getKeyUpperLimit(Path.ROOT), null, 0,
+                DocumentMK.MANY_CHILDREN_THRESHOLD, Lists.newArrayList(ID));
+        // since both _id & _modCount are mandatory, so data size should be 2
+        if (MONGO_DB) {
+            docs.forEach(d -> assertEquals(2 , d.data.size()));
+        }

Review Comment:
   It would be better if behaviour for all implementations was the same, so we don't have to special case here.



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +490,45 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projections {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.
+     * @return the list (possibly empty)
+     * @throws DocumentStoreException if the operation failed. E.g. because of
+     *          an I/O error.
+     */
+    @NotNull
+    default <T extends Document> List<T> query(final Collection<T> collection,
+                                               final String fromKey,
+                                               final String toKey,
+                                               final String indexedProperty,
+                                               final long startValue,
+                                               final int limit,
+                                               final List<String> projections) throws DocumentStoreException {

Review Comment:
   I think it should be singular: `projection`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [jackrabbit-oak] rishabhdaim commented on a diff in pull request #858: OAK-10114 : created new query API with projections to limit memory us…

Posted by "rishabhdaim (via GitHub)" <gi...@apache.org>.
rishabhdaim commented on code in PR #858:
URL: https://github.com/apache/jackrabbit-oak/pull/858#discussion_r1129361097


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java:
##########
@@ -486,4 +489,55 @@ default int getNodeNameLimit() {
     default Throttler throttler() {
         return NO_THROTTLING;
     }
+
+    /**
+     * Get a list of documents with only projected fields (as mentioned in projections param)
+     * along with "_id" field and where the key is greater than a start value and
+     * less than an end value <em>and</em> the given "indexed property" is greater
+     * or equals the specified value.
+     * <p>
+     * The indexed property can either be a {@link Long} value, in which case numeric
+     * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
+     * to "0" and "true" is mapped to "1".
+     * <p>
+     * The returned documents are sorted by key and are immutable.
+     *
+     * @param <T> the document type
+     * @param collection the collection
+     * @param fromKey the start value (excluding)
+     * @param toKey the end value (excluding)
+     * @param indexedProperty the name of the indexed property (optional)
+     * @param startValue the minimum value of the indexed property
+     * @param limit the maximum number of entries to return
+     * @param projection {@link List} of projected keys (optional). Keep this empty to fetch all fields on document.
+     * @return the list (possibly empty)
+     * @throws DocumentStoreException if the operation failed. E.g. because of
+     *          an I/O error.
+     */
+    @NotNull
+    default <T extends Document> List<T> query(final Collection<T> collection,
+                                               final String fromKey,
+                                               final String toKey,
+                                               final String indexedProperty,
+                                               final long startValue,
+                                               final int limit,
+                                               final List<String> projection) throws DocumentStoreException {
+
+        final List<T> list = query(collection, fromKey, toKey, indexedProperty, startValue, limit);
+
+        if (projection == null || projection.isEmpty()) {
+            list.forEach(Document::seal);

Review Comment:
   sure, will revert this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org