Posted to dev@manifoldcf.apache.org by "Mingchun Zhao (Jira)" <ji...@apache.org> on 2023/05/07 01:07:00 UTC

[jira] [Created] (CONNECTORS-1746) Adding execution conditions of PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

Mingchun Zhao created CONNECTORS-1746:
-----------------------------------------

             Summary: Adding execution conditions of PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
                 Key: CONNECTORS-1746
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
             Project: ManifoldCF
          Issue Type: Improvement
          Components: Web connector
         Environment: I am using ManifoldCF 2.24 with PostgreSQL 12.14 as the database. 
            Reporter: Mingchun Zhao


Sometimes the crawl does not process any documents for a while, yet nothing is logged about long-running queries. Performance can be restored by running the 'ANALYZE' command manually, which suggests that the slowdown is caused by a bad query plan built from outdated table statistics.
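For reference, a manual 'ANALYZE' can be issued over plain JDBC roughly as follows. This is a minimal sketch, not ManifoldCF's own database layer. The class and method names are hypothetical, and the identifier check is an assumption added so that a table name taken from configuration cannot inject SQL.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.regex.Pattern;

/** Hypothetical helper (not ManifoldCF code): refreshes PostgreSQL planner statistics for one table. */
final class ManualAnalyze {
    // Assumption: only simple lower-case identifiers such as "jobqueue" are accepted.
    private static final Pattern SIMPLE_IDENTIFIER = Pattern.compile("[a-z_][a-z0-9_]*");

    private ManualAnalyze() {}

    /** Builds the ANALYZE statement, rejecting anything that is not a plain identifier. */
    static String analyzeSql(String tableName) {
        if (!SIMPLE_IDENTIFIER.matcher(tableName).matches()) {
            throw new IllegalArgumentException("Unsupported table name: " + tableName);
        }
        return "ANALYZE " + tableName;
    }

    /** Executes ANALYZE for the table on an already-open connection. */
    static void analyze(Connection connection, String tableName) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(analyzeSql(tableName));
        }
    }
}
```

For example, `analyzeSql("jobqueue")` produces `ANALYZE jobqueue`; the table name here is only an illustration.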

Therefore, in addition to the existing configuration parameter org.apache.manifoldcf.db.postgres.analyze.<tablename>, 'ANALYZE' should also be executed in the following situations:
1. After the job starts, when the number of records in the table exceeds the number needed to build an appropriate query plan.
2. When crawling performance slows down, for example when the document processing rate drops below a specified threshold.
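The two situations above could be combined into a single per-table check, sketched below. This is a hypothetical illustration, not existing ManifoldCF code. Two choices are assumptions beyond the proposal: the rate check only applies after the first 'ANALYZE' of a job, and a cooldown (`cooldownMillis`) prevents 'ANALYZE' from running on every check while the crawl is stalled.

```java
/** Hypothetical sketch of the two proposed ANALYZE triggers for one table. */
final class AnalyzeTrigger {
    private final long minimumRowCount;     // situation 1 threshold
    private final long minimumProcessRate;  // situation 2 threshold, documents per minute
    private final long cooldownMillis;      // assumption: not part of the proposal
    private boolean analyzedSinceJobStart = false;
    private long lastAnalyzeMillis = 0L;

    AnalyzeTrigger(long minimumRowCount, long minimumProcessRate, long cooldownMillis) {
        this.minimumRowCount = minimumRowCount;
        this.minimumProcessRate = minimumProcessRate;
        this.cooldownMillis = cooldownMillis;
    }

    /** Called when a job starts, so the first-time analysis happens again. */
    void onJobStart() {
        analyzedSinceJobStart = false;
    }

    /** Returns true if ANALYZE should run now for this table. */
    boolean shouldAnalyze(long rowCount, long documentsLastMinute, long nowMillis) {
        if (!analyzedSinceJobStart) {
            // Situation 1: first ANALYZE once enough rows exist for a representative plan.
            return rowCount >= minimumRowCount;
        }
        // Situation 2: re-ANALYZE when throughput falls below the threshold,
        // but no more often than the assumed cooldown allows.
        boolean slow = documentsLastMinute < minimumProcessRate;
        boolean cooledDown = nowMillis - lastAnalyzeMillis >= cooldownMillis;
        return slow && cooledDown;
    }

    /** Records that ANALYZE was executed at the given time. */
    void onAnalyzed(long nowMillis) {
        analyzedSinceJobStart = true;
        lastAnalyzeMillis = nowMillis;
    }
}
```

Without the cooldown, a crawl that has stopped entirely would report a rate of 0 on every check and trigger 'ANALYZE' each time, which is why the sketch adds one.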

How about adding two parameters to control the timing of 'ANALYZE' execution, as below?
1. `org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumrowcount`
Specifies how many records must accumulate in the table before 'ANALYZE' is carried out on it for the first time. Defaults to 100.
2. `org.apache.manifoldcf.db.postgres.analyze.<tablename>.minimumprocessrate`
Specifies the minimum number of documents processed in the last minute. If the actual processing rate falls below this value, 'ANALYZE' is carried out. Defaults to 1.
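Reading the two proposed parameters and measuring "documents processed in the last minute" might look like the sketch below. It is hypothetical: `java.util.Properties` stands in for ManifoldCF's real configuration mechanism, the class names are invented, and the one-minute rate is assumed to be a sliding window over document completion timestamps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Properties;

/** Hypothetical reader for the two proposed per-table parameters. */
final class AnalyzeSettings {
    static final String PREFIX = "org.apache.manifoldcf.db.postgres.analyze.";
    static final long DEFAULT_MINIMUM_ROW_COUNT = 100L;   // proposed default
    static final long DEFAULT_MINIMUM_PROCESS_RATE = 1L;  // proposed default

    static long minimumRowCount(Properties props, String table) {
        return readLong(props, PREFIX + table + ".minimumrowcount", DEFAULT_MINIMUM_ROW_COUNT);
    }

    static long minimumProcessRate(Properties props, String table) {
        return readLong(props, PREFIX + table + ".minimumprocessrate", DEFAULT_MINIMUM_PROCESS_RATE);
    }

    /** Returns the parsed value, or the default when the property is absent. */
    private static long readLong(Properties props, String key, long defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }
}

/** Counts documents processed within a sliding one-minute window (assumed definition of the rate). */
final class ProcessRateWindow {
    private static final long WINDOW_MILLIS = 60_000L;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    void recordDocument(long nowMillis) {
        timestamps.addLast(nowMillis);
        evict(nowMillis);
    }

    long documentsInLastMinute(long nowMillis) {
        evict(nowMillis);
        return timestamps.size();
    }

    /** Drops timestamps that are a full minute or more older than nowMillis. */
    private void evict(long nowMillis) {
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= WINDOW_MILLIS) {
            timestamps.removeFirst();
        }
    }
}
```

With no properties set, both lookups fall back to the proposed defaults of 100 rows and 1 document per minute.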



--
This message was sent by Atlassian Jira
(v8.20.10#820010)