Posted to dev@manifoldcf.apache.org by "Mingchun Zhao (Jira)" <ji...@apache.org> on 2023/05/12 15:21:00 UTC

[jira] [Commented] (CONNECTORS-1746) Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.

    [ https://issues.apache.org/jira/browse/CONNECTORS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722207#comment-17722207 ] 

Mingchun Zhao commented on CONNECTORS-1746:
-------------------------------------------

Hello,

Here is a patch that adds options for running PostgreSQL’s “ANALYZE” command.
It adds two properties that control when 'ANALYZE' is run:
 # "org.apache.manifoldcf.db.postgres.analyzeatstart"
If this property is set to true, each table specified by an "org.apache.manifoldcf.db.postgres.analyze.<tablename>" property is analyzed at the start of the job. Defaults to false ("ANALYZE" is not run at the start).

 # "org.apache.manifoldcf.db.postgres.analyzeratethreshold"
If this property is set to a positive integer, each table specified by an "org.apache.manifoldcf.db.postgres.analyze.<tablename>" property is analyzed only when the number of events processed per second drops below the threshold. Defaults to 1 (one event processed per second).
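
To make the two settings and their defaults concrete, here is a minimal sketch of how they could be read. It is not the code from the attached patch: the class name AnalyzeSettings is hypothetical, and java.util.Properties is used only as a stand-in for ManifoldCF's own configuration lookup.

```java
import java.util.Properties;

// Hypothetical holder for the two settings described above.
// Illustration only; this is not taken from the attached patch.
final class AnalyzeSettings {
  static final String ANALYZE_AT_START =
      "org.apache.manifoldcf.db.postgres.analyzeatstart";
  static final String ANALYZE_RATE_THRESHOLD =
      "org.apache.manifoldcf.db.postgres.analyzeratethreshold";

  final boolean analyzeAtStart; // default false: no ANALYZE at job start
  final int rateThreshold;      // default 1: one event processed per second

  AnalyzeSettings(Properties props) {
    // Fall back to the documented defaults when a property is absent.
    this.analyzeAtStart =
        Boolean.parseBoolean(props.getProperty(ANALYZE_AT_START, "false"));
    this.rateThreshold =
        Integer.parseInt(props.getProperty(ANALYZE_RATE_THRESHOLD, "1").trim());
  }
}
```

With no properties set, this sketch yields analyzeAtStart = false and rateThreshold = 1, matching the defaults stated above.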

I tested with the attached patch and confirmed that the “ANALYZE” command was executed correctly in both of the situations above. In particular, when MCF's throughput (events processed per second) dropped because of a bad PostgreSQL query plan, an “ANALYZE” command was executed and MCF's performance recovered.
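
The throughput-based trigger can be pictured roughly as follows. This is only a sketch under assumptions, not the logic in the attached patch: the class ThroughputMonitor, its method names, and the simple "rate since the last check" window are all hypothetical.

```java
// Hypothetical throughput check. The names and the windowing scheme are
// assumptions made for illustration, not taken from the attached patch.
final class ThroughputMonitor {
  private final int rateThreshold;   // events per second; <= 0 disables the check
  private long windowStartMillis;
  private long eventsInWindow;

  ThroughputMonitor(int rateThreshold, long nowMillis) {
    this.rateThreshold = rateThreshold;
    this.windowStartMillis = nowMillis;
  }

  // Count events (for example, processed documents) in the current window.
  void recordEvents(long count) {
    eventsInWindow += count;
  }

  // Returns true when the rate since the previous call fell below the
  // threshold, then starts a new measurement window.
  boolean shouldAnalyze(long nowMillis) {
    long elapsedMillis = nowMillis - windowStartMillis;
    if (rateThreshold <= 0 || elapsedMillis <= 0) {
      return false;
    }
    double eventsPerSecond = eventsInWindow * 1000.0 / elapsedMillis;
    windowStartMillis = nowMillis;
    eventsInWindow = 0;
    return eventsPerSecond < rateThreshold;
  }
}
```

A caller would invoke shouldAnalyze periodically and, when it returns true, run "ANALYZE" on the configured tables. Passing the current time in explicitly keeps the sketch easy to test.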

[^DBInterfacePostgreSQL.java.patch]

> Adding conditions to execute PostgreSQL's ANALYZE command to avoid crawling become extremely slow.
> --------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1746
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1746
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>         Environment: Using ManifoldCF 2.24 with PostgreSQL 12.14 as the database. 
>            Reporter: Mingchun Zhao
>            Priority: Major
>         Attachments: DBInterfacePostgreSQL.java.patch
>
>
> Sometimes crawling stops processing documents for a while, and nothing is logged about long-running queries. Performance can be restored by running the 'ANALYZE' command manually, which suggests that a bad query plan causes the problem.
> Therefore, in addition to the existing configuration parameter 'org.apache.manifoldcf.db.postgres.analyze.<tablename>', 'ANALYZE' should also be executed in the following situations:
> 1. After the job starts, when the number of records in the table exceeds the number required to create an execution plan.
> 2. When crawling slows down, for example when the document processing rate drops below a specified threshold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)