Posted to oak-issues@jackrabbit.apache.org by "Jörg Hoh (JIRA)" <ji...@apache.org> on 2018/10/11 08:28:00 UTC
[jira] [Created] (OAK-7819) Improve logging for indexing progress
Jörg Hoh created OAK-7819:
-----------------------------
Summary: Improve logging for indexing progress
Key: OAK-7819
URL: https://issues.apache.org/jira/browse/OAK-7819
Project: Jackrabbit Oak
Issue Type: Improvement
Components: indexing
Affects Versions: 1.8.2
Reporter: Jörg Hoh
At the moment I am trying to understand how I can improve the indexing performance of my RDB-based Oak setup.
Currently the indexing progress is logged like this:
{noformat}
10.10.2018 13:00:04.077 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing will be performed for following indexes: [/oak:index/nodetype]
10.10.2018 13:00:15.911 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #10000 <path> [666,60 nodes/s, 2399760,00 nodes/hr]
10.10.2018 13:00:21.792 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #20000 <path> [999,95 nodes/s, 3599820,00 nodes/hr]
10.10.2018 13:00:27.211 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #30000 <path> [1153,81 nodes/s, 4153707,69 nodes/hr]
10.10.2018 13:00:31.581 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #40000 <path> [1333,30 nodes/s, 4799880,00 nodes/hr]
...
10.10.2018 13:13:44.585 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #580000 <path> [704,74 nodes/s, 2537055,16 nodes/hr]
10.10.2018 13:14:04.738 *INFO* [Apache Sling Repository Startup Thread] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Reindexing Traversed #590000 <path> [699,88 nodes/s, 2519568,68 nodes/hr]
...
{noformat}
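The nodes/s figures in the excerpt above are consistent with a running average since the start of the traversal (an inference from the numbers, not something the log states). The rate between two consecutive progress lines can be derived from their timestamps instead. A minimal sketch, assuming the timestamp layout shown in the excerpt; `IntervalRate` and `nodesPerSecond` are hypothetical names and not part of Oak:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of Oak: throughput between two progress log lines. */
final class IntervalRate {
    // Timestamp layout used in the log excerpt, e.g. "10.10.2018 13:13:44.585".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss.SSS");

    /** Nodes per second traversed between two log lines (timestamp plus "#count"). */
    static double nodesPerSecond(String ts1, long count1, String ts2, long count2) {
        LocalDateTime t1 = LocalDateTime.parse(ts1, TS);
        LocalDateTime t2 = LocalDateTime.parse(ts2, TS);
        long millis = Duration.between(t1, t2).toMillis();
        return (count2 - count1) * 1000.0 / millis;
    }

    public static void main(String[] args) {
        // Values copied from the last two progress lines of the excerpt.
        double rate = nodesPerSecond(
                "10.10.2018 13:13:44.585", 580_000,
                "10.10.2018 13:14:04.738", 590_000);
        System.out.println(rate + " nodes/s in this interval");
    }
}
```

For the last two lines shown, this gives about 496 nodes/s for that interval, while the log reports 699,88 nodes/s, so the recent slowdown is larger than the logged figure suggests.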
But it isn't clear to me how much of the time is spent on
* fetching the nodes to be indexed from the repo (in our case residing in the RDB)
* the actual indexing computation
* storing the extracted index data
More detailed logging of these individual aspects could shed more light on the bottlenecks of this process.
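To illustrate the kind of breakdown meant here, the following is a minimal sketch of a per-phase timer whose summary could be appended to each progress message. It is not Oak code: `PhaseTimer`, the `Phase` names, and the output format are all hypothetical, and where such timing would hook into the indexing code is left open.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch, not Oak code: accumulates wall-clock time per indexing phase. */
final class PhaseTimer {
    /** The three phases listed in the issue description. */
    enum Phase { FETCH, INDEX, STORE }

    private final Map<Phase, Long> nanos = new EnumMap<>(Phase.class);

    /** Runs the work, adds its duration to the given phase, and returns its result. */
    <T> T time(Phase phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Accumulated time for one phase in milliseconds (0 if never timed). */
    long millis(Phase phase) {
        return nanos.getOrDefault(phase, 0L) / 1_000_000;
    }

    /** One-line summary that could be appended to a progress log message. */
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Phase p : Phase.values()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(p.name().toLowerCase(Locale.ROOT))
              .append('=').append(millis(p)).append(" ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Placeholder work; real code would fetch, index, and store actual nodes.
        String node = timer.time(Phase.FETCH, () -> "node");
        String entry = timer.time(Phase.INDEX, () -> node.toUpperCase(Locale.ROOT));
        timer.time(Phase.STORE, () -> entry.length());
        System.out.println("Reindexing phases [" + timer.summary() + "]");
    }
}
```

Accumulating durations per phase and reporting them at the existing 10000-node interval would keep the log volume unchanged while showing which phase dominates.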
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)