You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@manifoldcf.apache.org by Julien <ju...@francelabs.com> on 2019/05/28 12:37:11 UTC

Long running queries on jobqueue

Hi, 

I need some help to better understand what I am experiencing with a job using a JDBC connector. The job crawls a table containing approximately 9M of documents. I am currently unable to complete this job as it randomly fails with the Following error :

Error : Unexpected jobqueue status - record id 15588697113928, expecting active status, saw 0

The logs are showing a lot of long running queries, here is an example but there are a lot and on different query kind, but all of them concern the jobqueue table  :

WARN 2019-05-27T01:16:56,048 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|Found a long-running query (92568 ms): [SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 40000]
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 0: '1.000000001E9'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 1: 'P'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 2: 'G'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 3: 'R'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 4: '1558919723463'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 5: 'A'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 6: 'a'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 7: '5'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 8: 'A'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 9: 'F'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 10: 'a'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 11: 'f'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 12: 'D'
 WARN 2019-05-27T01:16:56,053 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|  Parameter 13: 'd'
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan: Limit  (cost=89494.71..89495.11 rows=160 width=91) (actual time=15468.494..15481.412 rows=40000 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:   ->  Sort  (cost=89494.71..89495.11 rows=160 width=91) (actual time=15468.493..15477.328 rows=40000 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:         Sort Key: t0.docpriority
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:         Sort Method: external sort  Disk: 102584kB
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:         ->  Nested Loop Anti Join  (cost=42423.35..89488.86 rows=160 width=91) (actual time=234.622..13792.727 rows=1164882 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:               ->  Hash Anti Join  (cost=42422.79..88190.34 rows=160 width=91) (actual time=234.593..1316.236 rows=1164882 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Hash Cond: (t0.id = t3.owner)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                     ->  Nested Loop  (cost=42395.08..88160.62 rows=160 width=91) (actual time=234.568..1002.111 rows=1164882 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                           Join Filter: (t0.jobid = t1.id)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                           ->  Seq Scan on jobs t1  (cost=0.00..11.40 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 Filter: ((priority = '5'::bigint) AND ((status = 'A'::bpchar) OR (status = 'a'::bpchar)))
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 Rows Removed by Filter: 6
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                           ->  Bitmap Heap Scan on jobqueue t0  (cost=42395.08..87989.60 rows=12769 width=91) (actual time=234.548..656.692 rows=1164882 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 Recheck Cond: (((docpriority < '1000000001'::double precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint)) OR ((status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint)))
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 Filter: (docpriority < '1000000001'::double precision)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 Heap Blocks: exact=20094
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 ->  BitmapOr  (cost=42395.08..42395.08 rows=12769 width=0) (actual time=229.219..229.219 rows=0 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                       ->  Bitmap Index Scan on i1556284697879  (cost=0.00..42384.25 rows=12769 width=0) (actual time=229.176..229.176 rows=1164882 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                             Index Cond: ((docpriority < '1000000001'::double precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint))
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                       ->  Bitmap Index Scan on i1556284697880  (cost=0.00..4.45 rows=1 width=0) (actual time=0.041..0.041 rows=93 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                             Index Cond: ((status = 'G'::bpchar) AND (checkaction = 'R'::bpchar) AND (checktime <= '1558919723463'::bigint))
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                     ->  Hash  (cost=26.08..26.08 rows=130 width=8) (actual time=0.007..0.007 rows=0 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                           ->  Hash Join  (cost=12.93..26.08 rows=130 width=8) (actual time=0.006..0.006 rows=0 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 Hash Cond: ((t3.eventname)::text = (t4.name)::text)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 ->  Seq Scan on prereqevents t3  (cost=0.00..11.40 rows=140 width=524) (actual time=0.005..0.005 rows=0 loops=1)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                 ->  Hash  (cost=11.30..11.30 rows=130 width=516) (never executed)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                                       ->  Seq Scan on events t4  (cost=0.00..11.30 rows=130 width=516) (never executed)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:               ->  Index Scan using i1556284697878 on jobqueue t2  (cost=0.56..8.11 rows=1 width=49) (actual time=0.010..0.010 rows=0 loops=1164882)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Index Cond: ((dochash)::text = (t0.dochash)::text)
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status = 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan:                     Rows Removed by Filter: 1
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan: Planning time: 1.455 ms
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Plan: Execution time: 15518.479 ms
 WARN 2019-05-27T01:17:11,635 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db|
 WARN 2019-05-27T01:17:11,749 (Stuffer thread) - MCF|MCF-agent|apache.manifoldcf.db| Stats: n_distinct=3.0 most_common_vals={C,P,A} most_common_freqs={0.779200017,0.220666662,0.000133333335}

The same happens if I split the job into several jobs of 1.5M of docs at the difference that if I restart the jobs, some may end but with no guarantee. 

I don’t really understand what is causing those long running queries and the fact that the job/jobs can fail on an Error : Unexpected status…

I tried to add the following parameter to the properties.xml to have better logs but it changed nothing : <property name="org.apache.manifoldcf.diagnostics" value="DEBUG"/>

The only thing I did that can explain a lack of performances is that I raised the org.apache.manifoldcf.ui.maxstatuscount from 500k to 2M. But I was only expecting low performances on job status UI. Is it possible that it is responsible of the long running queries ?

Thanks for the help,
Julien 


---
L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
https://www.avast.com/antivirus