Posted to users@jackrabbit.apache.org by Georg Henzler <ge...@netcentric.biz> on 2019/02/07 09:46:52 UTC

Problem with read limits & a query using a lucene index with many results (but below setting queryLimitReads)

Hi all,

We have a servlet in place that exports redirects to apache using rewrite
maps [1]. That servlet is running a query [2] against a large repository
that holds ~ 2 Mio nodes for the primary type cq:PageContent (as referenced
in the query). We have a lucene index defined for property redirectTarget
that holds around 1 Mio documents when checked via JMX [3]  (the custom
index also holds the properties sling:alias and sling:vanityPath that are
not strictly needed for this query but for another use case, see [7] for
exact definition). When checking the query with the explain query tool, it
always uses the index www_redirectmanager as desired. The number of nodes
that have the property redirectTarget set is ~150,000. The servlet usually
returns within 1-2 minutes, which is totally fine (it is called once per
hour).

Since upgrading to Oak 1.8.7 (we had 1.4.3 before without problems), we get
the error [6] in around 2% of the cases. Most of the time it works, but
sometimes the servlet fails with that error; it is *not* deterministic. I
suppose this is connected to the change in [5]. We have already increased
queryLimitInMemory and queryLimitReads [4]
(PID org.apache.jackrabbit.oak.query.QueryEngineSettingsService) to 500,000
(from default 200,000), but we still get the error every now and then. We
once had one node that always (deterministically) returned the error [6];
after reindexing [7] we were back to the non-deterministic 2% of queries
(but even while the problem was deterministic on that node, explain query
always reported that index as the one being used).

I have the following understanding:
1. The settings queryLimitInMemory and queryLimitReads are both evaluated
*after* the results from the index are retrieved. That is, the query engine
asks the index for nodes, gets ~150,000, reads those, and then applies
further criteria to filter the result set; the settings are in place to
avoid large result sets for this filtering.
2. Having multiple properties in the index [7] should not really make a
difference for this particular problem, since no matter how many properties
are held in the index, the result set for query [2] is always the same.
3. No matter whether the assumptions from 1. and 2. are true, the problem
should be deterministic.
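
To make point 1 concrete, here is a minimal sketch of a read counter of the kind the log message in [6] suggests. It is a toy model under the assumption that every entry pulled from the index counts as one read; it is not Oak's actual code. The class name ReadLimitSketch and the method countReads are hypothetical, and only the exception type and the wording of the message are taken from [6].

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class ReadLimitSketch {

    /**
     * Pulls every entry from the iterator and counts it as one read.
     * Fails as soon as more than 'limit' entries have been read.
     * (Hypothetical helper, modelled on the message in [6].)
     */
    public static long countReads(Iterator<Integer> entries, long limit) {
        long reads = 0;
        while (entries.hasNext()) {
            entries.next();
            reads++;
            if (reads > limit) {
                throw new UnsupportedOperationException(
                        "The query read or traversed more than " + limit + " nodes.");
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        // ~150,000 index hits against a limit of 500,000: completes normally.
        System.out.println(countReads(IntStream.range(0, 150_000).iterator(), 500_000));

        // The same iteration fails once the number of reads exceeds the limit.
        try {
            countReads(IntStream.range(0, 150_000).iterator(), 100_000);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model a query with ~150,000 hits can never reach a limit of 500,000 on its own, which is why a deterministic outcome would be expected.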

Has anyone else run into a similar problem? Are the assumptions above
correct? Obviously the query [2] could be split up into many queries for
sub paths, or we could even traverse all paths looking for the property, but
conceptually it should really be possible to do this in one query IMHO.

-Georg

[1] https://httpd.apache.org/docs/2.4/rewrite/rewritemap.html
[2] SELECT * FROM [cq:PageContent] AS s WHERE ISDESCENDANTNODE([/content])
and s.[redirectTarget] is not null
[3]
/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3DLucene+Index+statistics%2Ctype%3DLuceneIndex
[4]
https://jackrabbit.apache.org/oak/docs/query/query-engine.html#Slow_Queries_and_Read_Limits
[5] https://issues.apache.org/jira/browse/OAK-6875
[6] 07.12.2018 11:01:22.408 *WARN* [192.168.166.72 [1544176801343] GET
/bin/www/redirectmap/redirecttarget HTTP/1.1]
org.apache.jackrabbit.oak.query.FilterIterators The query read or traversed
more than 500000 nodes.
java.lang.UnsupportedOperationException: The query read or traversed more
than 500000 nodes. To avoid affecting other tasks, processing was stopped.

[7]
    <www_redirectmanager
        jcr:primaryType="oak:QueryIndexDefinition"
        async="async"
        compatVersion="{Long}2"
        evaluatePathRestrictions="{Boolean}true"
        excludedPaths="[/var,/system,/apps,/libs,/content/dam,/etc,/jcr:system]"
        reindex="{Boolean}false"
        reindexCount="{Long}7"
        type="lucene">
        <aggregates jcr:primaryType="nt:unstructured">
            <cq:PageContent jcr:primaryType="nt:unstructured">
                <include0
                    jcr:primaryType="nt:unstructured"
                    path="*"
                    relativeNode="{Boolean}false"/>
            </cq:PageContent>
        </aggregates>
        <facets jcr:primaryType="nt:unstructured"/>
        <indexRules jcr:primaryType="nt:unstructured">
            <cq:PageContent jcr:primaryType="nt:unstructured">
                <properties jcr:primaryType="nt:unstructured">
                    <redirectTarget
                        jcr:primaryType="nt:unstructured"
                        name="redirectTarget"
                        notNullCheckEnabled="{Boolean}true"
                        propertyIndex="{Boolean}true"/>
                    <alias
                        jcr:primaryType="nt:unstructured"
                        name="sling:alias"
                        notNullCheckEnabled="{Boolean}true"
                        propertyIndex="{Boolean}true"/>
                    <vanityPath
                        jcr:primaryType="nt:unstructured"
                        name="sling:vanityPath"
                        notNullCheckEnabled="{Boolean}true"
                        propertyIndex="{Boolean}true"/>
                </properties>
            </cq:PageContent>
        </indexRules>
    </www_redirectmanager>

Re: [Initially posted to users@j.a.o] Problem with read limits & a query using a lucene index with many results (but below setting queryLimitReads)

Posted by Georg Henzler <ja...@ghenzler.de>.

Hi Thomas,

thanks for the quick answer!

> Yes, I have seen cases where an index is re-opened during query
> execution. In that case, already returned entries are read again and
> skipped, so basically counted twice. I think it would be good to fix
> this (only count entries once).

It sounds like this is the root cause of my problem. I created OAK-8046
to track it.

> I think queries should read at most a few thousand entries. That way,
> there are no problems if the limit is set to 100'000. If an
> application needs to read more than that, it is best to run multiple
> queries, using keyset pagination if needed:
> 
> * https://blog.jooq.org/tag/keyset-pagination/
> * https://use-the-index-luke.com/no-offset

The use case with the redirect maps is not really UI related and it does
not use an offset. Splitting up the redirect map generation into multiple
queries is not that straightforward since we have ~2 Mio nodes, and the
amount of content below the first-level root paths changes over time. So
the problem could be fixed for a little while, and then one of the root
paths goes over 100,000 again and the problem is back.
Before splitting it up into multiple queries I would rather use a traversal
without a query (that is at least 100% safe), but doing that feels wasteful:
the lucene index is in place and ready to be used. Also, it worked
perfectly fine with Oak 1.4.3. I still think it would be great to have
a query option to disable the limits for "export use cases" like the one
described.

-Georg

Re: [Initially posted to users@j.a.o] Problem with read limits & a query using a lucene index with many results (but below setting queryLimitReads)

Posted by Thomas Mueller <mu...@adobe.com.INVALID>.

Hi,

> Wouldn't it make sense to introduce a query option ala [1] to disable read/memory limits for one particular query?

It's possible, but my fear is that people would use the option in their queries too often...

> OAK-6875 does not always have the desired effect (for sure there is some un-deterministic behaviour for large content being accessed

Yes, I have seen cases where an index is re-opened during query execution. In that case, already returned entries are read again and skipped, so basically counted twice. I think it would be good to fix this (only count entries once).
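
The effect of such re-opens on the read count can be illustrated with a small calculation. This is only a sketch under the assumption that each re-open re-reads every entry returned so far and that each re-read counts against the limit. The class name ReopenCountSketch, the method totalReads, and the re-open positions are made up for illustration; only the result size (~150,000) and the limit (500,000) come from this thread.

```java
public class ReopenCountSketch {

    /**
     * Total number of counted reads for a result of 'resultSize' entries
     * when the index is re-opened after the given numbers of entries have
     * already been returned. Assumption: each re-open re-reads (and skips)
     * everything returned so far, and those re-reads are counted again.
     */
    public static long totalReads(long resultSize, long... reopenAfter) {
        long reads = resultSize;
        for (long alreadyReturned : reopenAfter) {
            reads += alreadyReturned;
        }
        return reads;
    }

    public static void main(String[] args) {
        long limit = 500_000;

        // No re-open: 150,000 reads, well below the limit.
        long noReopen = totalReads(150_000);
        System.out.println(noReopen + " exceeds limit: " + (noReopen > limit));

        // Three late re-opens (hypothetical positions): 510,000 reads.
        long threeReopens = totalReads(150_000, 100_000, 120_000, 140_000);
        System.out.println(threeReopens + " exceeds limit: " + (threeReopens > limit));
    }
}
```

Whether and when a re-open happens depends on index updates during the 1-2 minutes the query runs, which would explain why the same query only fails occasionally.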

I think queries should read at most a few thousand entries. That way, there are no problems if the limit is set to 100'000. If an application needs to read more than that, it is best to run multiple queries, using keyset pagination if needed:

* https://blog.jooq.org/tag/keyset-pagination/
* https://use-the-index-luke.com/no-offset
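
As a rough illustration of the keyset pagination idea, the following sketch reads a sorted set of keys in bounded pages, remembering only the last key seen instead of an offset. It runs against an in-memory TreeSet rather than a repository; the class KeysetPaginationSketch and its methods are hypothetical. In a real setup each fetchPage call would stand for one query with a "key greater than last key" condition, an ordering on that key, and a limit, and it would need an index on the key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class KeysetPaginationSketch {

    /** Returns up to pageSize keys strictly greater than lastKey (null means start). */
    public static List<String> fetchPage(NavigableSet<String> index, String lastKey, int pageSize) {
        NavigableSet<String> rest = (lastKey == null) ? index : index.tailSet(lastKey, false);
        List<String> page = new ArrayList<>();
        for (String key : rest) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    /** Reads all keys page by page, carrying over only the last key of each page. */
    public static List<String> readAll(NavigableSet<String> index, int pageSize) {
        List<String> all = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<String> page = fetchPage(index, lastKey, pageSize);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            lastKey = page.get(page.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            index.add("/content/page" + i);
        }
        // Ten keys read in pages of at most four entries each.
        System.out.println(readAll(index, 4).size());
    }
}
```

Each page reads at most pageSize entries, so no single query comes near the read limit regardless of the total result size.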

Regards,
Thomas

[Initially posted to users@j.a.o] Problem with read limits & a query using a lucene index with many results (but below setting queryLimitReads)

Posted by Georg Henzler <ja...@ghenzler.de>.

Hi all,

sorry for cross-posting, but I didn't get an answer on the users list.

I think the change made with OAK-6875 does not always have the desired
effect (for sure there is some non-deterministic behaviour for large
content being accessed via a lucene index, which should at least be
deterministic and explainable). See the email below for details (if somebody
could confirm or reject my assumptions, that would already help a lot!)

Also in general: Wouldn't it make sense to introduce a query option à la
[1] to disable read/memory limits for one particular query? The limits would
then just be a safety net for queries that unexpectedly exceed them;
for special use cases as described below, they could be turned off.

-Georg

[1] https://jackrabbit.apache.org/oak/docs/query/query-engine.html#Query_Option_Index_Tag

-------- Original Message --------
Subject: Problem with read limits & a query using a lucene index with many results (but below setting queryLimitReads)
Date: 2019-02-07 01:46
From: Georg Henzler <ge...@netcentric.biz>
To: users@jackrabbit.apache.org
Reply-To: users@jackrabbit.apache.org
