Posted to user@manifoldcf.apache.org by jetnet <je...@gmail.com> on 2016/04/27 13:33:48 UTC

Database performance

Hi Karl,

I set up two MCF instances (quick setup) on the same machine, using
the same Postgres 9.3 instance (with different databases
"org.apache.manifoldcf.database.name" of course).
After a couple of days I ran into a performance issue: one MCF instance
has become very slow - it processes only a few docs per hour. I guess
the bottleneck is the database:

"normal" instance:
SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
738,311 rows in the table, took 1.2 sec
"G";50674
"F";68
"P";149179
"C";402367
"A";33
"Z";136676

"slow" instance (currently with a single active job):
SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
2,745,329 rows in the table, took 350 sec
"G";337922  --STATUS_PENDINGPURGATORY
"F";449     --STATUS_ACTIVEPURGATORY
"P";25909   --STATUS_PENDING
"C";562772  --STATUS_COMPLETE
"A";9       --STATUS_ACTIVE
"Z";1644927 --STATUS_PURGATORY

Since "count(*)" is terrible slow in Postgres, I used the following
sql to count jobqueue's rows:
SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
relname = 'jobqueue';
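
The planner's per-column statistics can give a rough per-status
breakdown as well - just a sketch, assuming the default "public"
schema and reasonably fresh statistics:

-- approximate status distribution, as estimated by the planner
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'jobqueue'
  AND attname = 'status';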

Both MCF instances have the same number of working threads, database
handles etc.
Is the database "full"? What could you recommend to improve the performance?

Thank you!
Konstantin

Re: Database performance

Posted by Karl Wright <da...@gmail.com>.
Hi Konstantin,

It may be that stats on some other table involved in the complex stuffer
query were messing us up.  That makes me happy for you but I'm still not
happy that Postgresql does this to us. :-)
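
A quick way to check that is to see when those tables were last
analyzed - just a sketch, using the standard statistics view and the
table names referenced by the stuffer query:

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('jobqueue', 'jobs', 'prereqevents', 'events');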

Thanks again,
Karl


On Wed, Apr 27, 2016 at 9:25 AM, jetnet <je...@gmail.com> wrote:

> Hi Karl,
>
> just want to let you know - after a "full analyse" (for all tables),
> the DB performance seems to be OK,  and I see, the crawler is moving
> now! :)
> I'll add the analyze command to the "vacuum" daily script :)
>
> Thank you very much for you time, hopefully, the lack of analyzing was
> the root cause for that particular problem.
>
> Konstantin
>
>
> 2016-04-27 15:16 GMT+02:00 Karl Wright <da...@gmail.com>:
> > Ok, here's the query:
> >
> > SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
> t0.status=?)
> > AND t0.checkaction=? AND t0.checktime<=? ... ORDER BY t0.docpriority ASC
> > LIMIT 400
> >
> > Here's the index:
> >
> > "public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
> > ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
> >
> > In earlier versions of Postgresql it uses this index for both access and
> > ordering (it reports "ordered index scan" in the plan or something like
> > that).  We rely on the ability to use this index for fast stuffer
> queries.
> > Maybe postgresql 9.3 broke this in some way?
> >
> > Here's the documentation; maybe you can figure it out. :-P  You can play
> > around using "EXPLAIN" to figure out what it's trying to do.  My guess
> is,
> > as I said, that it just thinks it has a better way, which isn't in fact
> > better and is in fact worse...
> >
> > http://www.postgresql.org/docs/9.2/static/indexes-ordering.html
> >
> > Karl
> >
> >
> > On Wed, Apr 27, 2016 at 9:00 AM, jetnet <je...@gmail.com> wrote:
> >>
> >> yes, I use the recommended db settings:
> >> shared_buffers = 1024MB            # min 128kB
> >>
> >> The indexes:
> >> select * from pg_indexes where tablename = 'jobqueue';
> >>
> >> "public";"jobqueue";"jobqueue_pkey";"";"CREATE UNIQUE INDEX
> >> jobqueue_pkey ON jobqueue USING btree (id)"
> >> "public";"jobqueue";"i1459943614205";"";"CREATE INDEX i1459943614205
> >> ON jobqueue USING btree (jobid, status)"
> >> "public";"jobqueue";"i1459943614204";"";"CREATE INDEX i1459943614204
> >> ON jobqueue USING btree (needpriority)"
> >> "public";"jobqueue";"i1459943614203";"";"CREATE INDEX i1459943614203
> >> ON jobqueue USING btree (isseed, jobid)"
> >> "public";"jobqueue";"i1459943614202";"";"CREATE INDEX i1459943614202
> >> ON jobqueue USING btree (failtime, jobid)"
> >> "public";"jobqueue";"i1459943614201";"";"CREATE INDEX i1459943614201
> >> ON jobqueue USING btree (status, checkaction, checktime)"
> >> "public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
> >> ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
> >> "public";"jobqueue";"i1459943614199";"";"CREATE UNIQUE INDEX
> >> i1459943614199 ON jobqueue USING btree (dochash, jobid)"
> >>
> >>
> >> Thanks!
> >> Konstantin
> >>
> >> 2016-04-27 14:57 GMT+02:00 Karl Wright <da...@gmail.com>:
> >> > Also, have you looked at increasing the amount of shared buffers in
> your
> >> > postgresql configuration?  See the "how-to-build-and-deploy" page for
> >> > that;
> >> > it can make a lot of difference for MCF.  So even if the plan is not
> >> > perfect
> >> > the database simply performs better.
> >> >
> >> > Karl
> >> >
> >> > On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <da...@gmail.com>
> wrote:
> >> >>
> >> >> Hi Konstantin,
> >> >>
> >> >> Can you list the indexes defined on the jobqueue table?
> >> >>
> >> >> The query we do for stuffing is precisely aligned with one of the
> >> >> indexes
> >> >> we create on the jobqueue table.  It should be doing an ordered index
> >> >> read
> >> >> with complex filtering conditions; instead it is doing this:
> >> >>
> >> >> WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>            ->  Bitmap Heap Scan on jobqueue t0
> (cost=2223.73..3352.95
> >> >> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
> >> >>
> >> >> ... with a hash join and a hash sort.
> >> >>
> >> >> Postgres is being way too smart for its own good here.  Probably
> that's
> >> >> at
> >> >> least partly because its time estimates for its chosen plan are far
> >> >> less
> >> >> than reality.  There may be a way to configure postgresql not to make
> >> >> that
> >> >> mistake but I don't know offhand how you do it.
> >> >>
> >> >> In order for us to be able to control the plan via hints, which would
> >> >> make
> >> >> this problem go away once and for all, you need the version of
> Postgres
> >> >> that
> >> >> you pay for.  Sigh.
> >> >>
> >> >> Anyhow, let's verify that the right indexes are in place, at least...
> >> >>
> >> >> Karl
> >> >>
> >> >>
> >> >> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <je...@gmail.com> wrote:
> >> >>>
> >> >>> yet another long running query:
> >> >>>
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
> >> >>> query (215465 ms): [UPDATE jobqueue SET
> >> >>> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
> >> >>> docpriority<?]
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
> >> >>> '1.000000001E9'
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
> >> >>> '1.000000001E9'
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
> >> >>> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
> >> >>> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01
> rows=19647
> >> >>> width=352)
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
> >> >>> Cond: (docpriority < 1000000001::double precision)
> >> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
> >> >>>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats:
> n_distinct=4.0
> >> >>> most_common_vals={Z,C,G,P}
> >> >>> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
> >> >>>
> >> >>>
> >> >>>
> >> >>> 2016-04-27 14:32 GMT+02:00 jetnet <je...@gmail.com>:
> >> >>> > Thank you for the fast reply!
> >> >>> >
> >> >>> > I provided the query to show, how many docs have been processed by
> >> >>> > the
> >> >>> > instances so far.
> >> >>> > I do not use UI to check the jobs statues - it takes 10-15 minutes
> >> >>> > to
> >> >>> > display the page :( (ui.maxstatuscount=100.000).
> >> >>> > The full vacuum and re-index database are running as a scheduled
> job
> >> >>> > on a daily basis.
> >> >>> > So, just tried to analyze the "problem" table - jobqueue, but I
> >> >>> > don't
> >> >>> > think it's helped a lot, the same "status" query took 250 second
> >> >>> > now,
> >> >>> > improved, but it's still very long.
> >> >>> >
> >> >>> > The long-running queries reported in the log are all about
> jobqueue
> >> >>> > table, e.g.:
> >> >>> >
> >> >>> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a
> >> >>> > long-running
> >> >>> > query (75328 ms): [SELECT
> >> >>> >
> >> >>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> >> >>> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
> >> >>> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
> >> >>> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?)
> >> >>> > AND
> >> >>> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
> >> >>> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
> >> >>> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
> >> >>> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x'
> FROM
> >> >>> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
> >> >>> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0:
> >> >>> > '1.000000001E9'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1:
> 'P'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2:
> 'G'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3:
> 'R'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4:
> >> >>> > '1461757713112'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5:
> 'A'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6:
> 'a'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7:
> '5'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8:
> 'A'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9:
> 'F'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10:
> 'a'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11:
> 'f'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12:
> 'D'
> >> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13:
> 'd'
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
> >> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> >> >>> > time=738.775..738.842 rows=400 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
> >> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> >> >>> > time=738.774..738.792 rows=400 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>  Sort
> >> >>> > Key: t0.docpriority
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>  Sort
> >> >>> > Method: top-N heapsort  Memory: 139kB
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
> >> >>> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
> >> >>> > (actual time=46.841..723.735 rows=20215 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235)
> >> >>> > (actual
> >> >>> > time=46.770..108.460 rows=20215 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >      Hash Cond: (t0.id = t3.owner)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
> >> >>> > (actual time=46.747..102.890 rows=20215 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >            Join Filter: (t0.jobid = t1.id)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >            Rows Removed by Join Filter: 643
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1
> width=8)
> >> >>> > (actual time=0.015..0.024 rows=1 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                  Filter: ((priority = 5::bigint) AND ((status =
> >> >>> > 'A'::bpchar) OR (status = 'a'::bpchar)))
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                  Rows Removed by Filter: 3
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >            ->  Bitmap Heap Scan on jobqueue t0
> >> >>> > (cost=2223.73..3352.95
> >> >>> > rows=286 width=235) (actual time=46.705..93.152 rows=20858
> loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                  Recheck Cond: (((docpriority < 1000000001::double
> >> >>> > precision) AND (status = 'P'::bpchar) AND (checkaction =
> >> >>> > 'R'::bpchar)
> >> >>> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
> >> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> >> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
> >> >>> > 1461757713112::bigint)))
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
> >> >>> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                        ->  Bitmap Index Scan on i1459943614200
> >> >>> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
> >> >>> > rows=333 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                              Index Cond: ((docpriority <
> >> >>> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
> >> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
> >> >>> > 1461757713112::bigint))
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                        ->  Bitmap Index Scan on i1459943614200
> >> >>> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
> >> >>> > rows=20555 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                              Index Cond: ((docpriority <
> >> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> >> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
> >> >>> > 1461757713112::bigint))
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
> >> >>> > time=0.004..0.004 rows=0 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8)
> >> >>> > (actual
> >> >>> > time=0.003..0.003 rows=0 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                  ->  Seq Scan on prereqevents t3
> (cost=0.00..11.40
> >> >>> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
> >> >>> > (never executed)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >                        ->  Seq Scan on events t4
> (cost=0.00..11.30
> >> >>> > rows=130 width=516) (never executed)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> > ->  Index Scan using i1459943614199 on jobqueue t2
> (cost=0.56..8.59
> >> >>> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
> >> >>> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
> >> >>> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >> >>> >      Rows Removed by Filter: 1
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total
> >> >>> > runtime:
> >> >>> > 747.031 ms
> >> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
> >> >>> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
> >> >>> > n_distinct=5.0 most_common_vals={Z,C,G,P}
> >> >>> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
> >> >>> >
> >> >>> >
> >> >>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
> >> >>> >> Hi Konstantin,
> >> >>> >>
> >> >>> >> The query you are looking at is performed by the UI only, and
> there
> >> >>> >> is
> >> >>> >> a
> >> >>> >> parameter you can set which applies a limit to the number of
> >> >>> >> documents
> >> >>> >> so
> >> >>> >> that the count is reported as "<limit>+" in the UI.  This is the
> >> >>> >> parameter:
> >> >>> >>
> >> >>> >> org.apache.manifoldcf.ui.maxstatuscount
> >> >>> >>
> >> >>> >> As for why the database gets slow for crawling, unless you are
> >> >>> >> seeing
> >> >>> >> reports in the log of long-running queries, then it's a good
> chance
> >> >>> >> you need
> >> >>> >> to vacuum your database instance.  I generally recommend that a
> >> >>> >> vacuum
> >> >>> >> full
> >> >>> >> be done periodically for database instances.  Autovacuuming has
> >> >>> >> gotten
> >> >>> >> a lot
> >> >>> >> better in postgres than it used to be but at least in the past
> the
> >> >>> >> autovacuuming process would get far behind ManifoldCF and so the
> >> >>> >> database
> >> >>> >> would get quite bloated anyway.  So I'd give that a try.
> >> >>> >>
> >> >>> >> If you are seeing logging output mentioning slow queries, you may
> >> >>> >> need
> >> >>> >> to
> >> >>> >> tune how often MCF analyzes certain tables.  There are parameters
> >> >>> >> that
> >> >>> >> control that as well.  In general, if there is a slow query with
> a
> >> >>> >> bad
> >> >>> >> plan,
> >> >>> >> and analyzing the tables involved makes it come up with a much
> >> >>> >> better
> >> >>> >> plan,
> >> >>> >> analysis is not happening often enough.  But first, before you
> get
> >> >>> >> to
> >> >>> >> that
> >> >>> >> point, have a look at the log and see whether this is likely to
> be
> >> >>> >> the
> >> >>> >> problem.  (Usually it is the stuffer query that gets slow when
> >> >>> >> there's
> >> >>> >> an
> >> >>> >> issue with table analysis, FWIW).  Please feel free to post the
> >> >>> >> plan
> >> >>> >> of the
> >> >>> >> queries being reported here.
> >> >>> >>
> >> >>> >> Thanks,
> >> >>> >> Karl
> >> >>> >>
> >> >>> >>
> >> >>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com>
> wrote:
> >> >>> >>>
> >> >>> >>> Hi Karl,
> >> >>> >>>
> >> >>> >>> I set up two MCF instances (quick setup) on the same machine,
> >> >>> >>> using
> >> >>> >>> the same Postgres 9.3 instance (with different databases
> >> >>> >>> "org.apache.manifoldcf.database.name" of course).
> >> >>> >>> After a couple of days I've got a performance issue: one MCF
> >> >>> >>> instance
> >> >>> >>> has become very slow - it processes a few docs per hour only. I
> >> >>> >>> guess,
> >> >>> >>> the bottleneck is the database:
> >> >>> >>>
> >> >>> >>> "normal" instance:
> >> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status
> --
> >> >>> >>> 738.311 rows in the table, took 1,2 sec
> >> >>> >>> "G";50674
> >> >>> >>> "F";68
> >> >>> >>> "P";149179
> >> >>> >>> "C";402367
> >> >>> >>> "A";33
> >> >>> >>> "Z";136676
> >> >>> >>>
> >> >>> >>> "slow" instance (currently with a single active job):
> >> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status
> --
> >> >>> >>> 2.745.329 rows in the table, took 350 sec
> >> >>> >>> "G";337922  --STATUS_PENDINGPURGATORY
> >> >>> >>> "F";449     --STATUS_ACTIVEPURGATORY
> >> >>> >>> "P";25909   --STATUS_PENDING
> >> >>> >>> "C";562772  --STATUS_COMPLETE
> >> >>> >>> "A";9       --STATUS_ACTIVE
> >> >>> >>> "Z";1644927 --STATUS_PURGATORY
> >> >>> >>>
> >> >>> >>> Since "count(*)" is terrible slow in Postgres, I used the
> >> >>> >>> following
> >> >>> >>> sql to count jobqueue's rows:
> >> >>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class
> >> >>> >>> WHERE
> >> >>> >>> relname = 'jobqueue';
> >> >>> >>>
> >> >>> >>> Both MCF instances have the same number of working threads,
> >> >>> >>> database
> >> >>> >>> handles etc.
> >> >>> >>> Is the database "full"? What could you recommend to improve the
> >> >>> >>> performance?
> >> >>> >>>
> >> >>> >>> Thank you!
> >> >>> >>> Konstantin
> >> >>> >>
> >> >>> >>
> >> >>
> >> >>
> >> >
> >
> >
>

Re: Database performance

Posted by jetnet <je...@gmail.com>.
Hi Karl,

just want to let you know - after a full "analyze" (for all tables),
the DB performance seems to be OK, and I can see the crawler is moving
now! :)
I'll add the analyze command to the "vacuum" daily script :)
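
Something like this, as a sketch of that daily script (the database
name below is only a placeholder for whatever
"org.apache.manifoldcf.database.name" is set to; ideally it runs while
the crawler is idle):

VACUUM FULL;
REINDEX DATABASE mcf_db;  -- placeholder for the configured database name
ANALYZE;                  -- analyze all tables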

Thank you very much for your time; hopefully, the lack of analyzing was
the root cause of that particular problem.

Konstantin


2016-04-27 15:16 GMT+02:00 Karl Wright <da...@gmail.com>:
> Ok, here's the query:
>
> SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR t0.status=?)
> AND t0.checkaction=? AND t0.checktime<=? ... ORDER BY t0.docpriority ASC
> LIMIT 400
>
> Here's the index:
>
> "public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
> ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
>
> In earlier versions of Postgresql it uses this index for both access and
> ordering (it reports "ordered index scan" in the plan or something like
> that).  We rely on the ability to use this index for fast stuffer queries.
> Maybe postgresql 9.3 broke this in some way?
>
> Here's the documentation; maybe you can figure it out. :-P  You can play
> around using "EXPLAIN" to figure out what it's trying to do.  My guess is,
> as I said, that it just thinks it has a better way, which isn't in fact
> better and is in fact worse...
>
> http://www.postgresql.org/docs/9.2/static/indexes-ordering.html
>
> Karl
>
>
> On Wed, Apr 27, 2016 at 9:00 AM, jetnet <je...@gmail.com> wrote:
>>
>> yes, I use the recommended db settings:
>> shared_buffers = 1024MB            # min 128kB
>>
>> The indexes:
>> select * from pg_indexes where tablename = 'jobqueue';
>>
>> "public";"jobqueue";"jobqueue_pkey";"";"CREATE UNIQUE INDEX
>> jobqueue_pkey ON jobqueue USING btree (id)"
>> "public";"jobqueue";"i1459943614205";"";"CREATE INDEX i1459943614205
>> ON jobqueue USING btree (jobid, status)"
>> "public";"jobqueue";"i1459943614204";"";"CREATE INDEX i1459943614204
>> ON jobqueue USING btree (needpriority)"
>> "public";"jobqueue";"i1459943614203";"";"CREATE INDEX i1459943614203
>> ON jobqueue USING btree (isseed, jobid)"
>> "public";"jobqueue";"i1459943614202";"";"CREATE INDEX i1459943614202
>> ON jobqueue USING btree (failtime, jobid)"
>> "public";"jobqueue";"i1459943614201";"";"CREATE INDEX i1459943614201
>> ON jobqueue USING btree (status, checkaction, checktime)"
>> "public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
>> ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
>> "public";"jobqueue";"i1459943614199";"";"CREATE UNIQUE INDEX
>> i1459943614199 ON jobqueue USING btree (dochash, jobid)"
>>
>>
>> Thanks!
>> Konstantin
>>
>> 2016-04-27 14:57 GMT+02:00 Karl Wright <da...@gmail.com>:
>> > Also, have you looked at increasing the amount of shared buffers in your
>> > postgresql configuration?  See the "how-to-build-and-deploy" page for
>> > that;
>> > it can make a lot of difference for MCF.  So even if the plan is not
>> > perfect
>> > the database simply performs better.
>> >
>> > Karl
>> >
>> > On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <da...@gmail.com> wrote:
>> >>
>> >> Hi Konstantin,
>> >>
>> >> Can you list the indexes defined on the jobqueue table?
>> >>
>> >> The query we do for stuffing is precisely aligned with one of the
>> >> indexes
>> >> we create on the jobqueue table.  It should be doing an ordered index
>> >> read
>> >> with complex filtering conditions; instead it is doing this:
>> >>
>> >> WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
>> >> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>> >>
>> >> ... with a hash join and a hash sort.
>> >>
>> >> Postgres is being way too smart for its own good here.  Probably that's
>> >> at
>> >> least partly because its time estimates for its chosen plan are far
>> >> less
>> >> than reality.  There may be a way to configure postgresql not to make
>> >> that
>> >> mistake but I don't know offhand how you do it.
>> >>
>> >> In order for us to be able to control the plan via hints, which would
>> >> make
>> >> this problem go away once and for all, you need the version of Postgres
>> >> that
>> >> you pay for.  Sigh.
>> >>
>> >> Anyhow, let's verify that the right indexes are in place, at least...
>> >>
>> >> Karl
>> >>
>> >>
>> >> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <je...@gmail.com> wrote:
>> >>>
>> >>> yet another long running query:
>> >>>
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
>> >>> query (215465 ms): [UPDATE jobqueue SET
>> >>> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
>> >>> docpriority<?]
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
>> >>> '1.000000001E9'
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
>> >>> '1.000000001E9'
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
>> >>> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
>> >>> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
>> >>> width=352)
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
>> >>> Cond: (docpriority < 1000000001::double precision)
>> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
>> >>>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
>> >>> most_common_vals={Z,C,G,P}
>> >>> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>> >>>
>> >>>
>> >>>
>> >>> 2016-04-27 14:32 GMT+02:00 jetnet <je...@gmail.com>:
>> >>> > Thank you for the fast reply!
>> >>> >
>> >>> > I provided the query to show, how many docs have been processed by
>> >>> > the
>> >>> > instances so far.
>> >>> > I do not use UI to check the jobs statues - it takes 10-15 minutes
>> >>> > to
>> >>> > display the page :( (ui.maxstatuscount=100.000).
>> >>> > The full vacuum and re-index database are running as a scheduled job
>> >>> > on a daily basis.
>> >>> > So, just tried to analyze the "problem" table - jobqueue, but I
>> >>> > don't
>> >>> > think it's helped a lot, the same "status" query took 250 second
>> >>> > now,
>> >>> > improved, but it's still very long.
>> >>> >
>> >>> > The long-running queries reported in the log are all about jobqueue
>> >>> > table, e.g.:
>> >>> >
>> >>> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a
>> >>> > long-running
>> >>> > query (75328 ms): [SELECT
>> >>> >
>> >>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
>> >>> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
>> >>> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
>> >>> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?)
>> >>> > AND
>> >>> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
>> >>> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
>> >>> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
>> >>> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
>> >>> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
>> >>> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0:
>> >>> > '1.000000001E9'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4:
>> >>> > '1461757713112'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
>> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
>> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> >>> > time=738.775..738.842 rows=400 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
>> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> >>> > time=738.774..738.792 rows=400 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>> >>> > Key: t0.docpriority
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>> >>> > Method: top-N heapsort  Memory: 139kB
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
>> >>> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
>> >>> > (actual time=46.841..723.735 rows=20215 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235)
>> >>> > (actual
>> >>> > time=46.770..108.460 rows=20215 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >      Hash Cond: (t0.id = t3.owner)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
>> >>> > (actual time=46.747..102.890 rows=20215 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >            Join Filter: (t0.jobid = t1.id)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >            Rows Removed by Join Filter: 643
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
>> >>> > (actual time=0.015..0.024 rows=1 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                  Filter: ((priority = 5::bigint) AND ((status =
>> >>> > 'A'::bpchar) OR (status = 'a'::bpchar)))
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                  Rows Removed by Filter: 3
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >            ->  Bitmap Heap Scan on jobqueue t0
>> >>> > (cost=2223.73..3352.95
>> >>> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                  Recheck Cond: (((docpriority < 1000000001::double
>> >>> > precision) AND (status = 'P'::bpchar) AND (checkaction =
>> >>> > 'R'::bpchar)
>> >>> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
>> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
>> >>> > 1461757713112::bigint)))
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
>> >>> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                        ->  Bitmap Index Scan on i1459943614200
>> >>> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
>> >>> > rows=333 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                              Index Cond: ((docpriority <
>> >>> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
>> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
>> >>> > 1461757713112::bigint))
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                        ->  Bitmap Index Scan on i1459943614200
>> >>> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
>> >>> > rows=20555 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                              Index Cond: ((docpriority <
>> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
>> >>> > 1461757713112::bigint))
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
>> >>> > time=0.004..0.004 rows=0 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8)
>> >>> > (actual
>> >>> > time=0.003..0.003 rows=0 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
>> >>> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
>> >>> > (never executed)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >                        ->  Seq Scan on events t4  (cost=0.00..11.30
>> >>> > rows=130 width=516) (never executed)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> > ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
>> >>> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
>> >>> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
>> >>> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >>> >      Rows Removed by Filter: 1
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total
>> >>> > runtime:
>> >>> > 747.031 ms
>> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>> >>> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
>> >>> > n_distinct=5.0 most_common_vals={Z,C,G,P}
>> >>> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>> >>> >
>> >>> >
>> >>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
>> >>> >> Hi Konstantin,
>> >>> >>
>> >>> >> The query you are looking at is performed by the UI only, and there
>> >>> >> is
>> >>> >> a
>> >>> >> parameter you can set which applies a limit to the number of
>> >>> >> documents
>> >>> >> so
>> >>> >> that the count is reported as "<limit>+" in the UI.  This is the
>> >>> >> parameter:
>> >>> >>
>> >>> >> org.apache.manifoldcf.ui.maxstatuscount
>> >>> >>
>> >>> >> As for why the database gets slow for crawling, unless you are
>> >>> >> seeing
>> >>> >> reports in the log of long-running queries, then it's a good chance
>> >>> >> you need
>> >>> >> to vacuum your database instance.  I generally recommend that a
>> >>> >> vacuum
>> >>> >> full
>> >>> >> be done periodically for database instances.  Autovacuuming has
>> >>> >> gotten
>> >>> >> a lot
>> >>> >> better in postgres than it used to be but at least in the past the
>> >>> >> autovacuuming process would get far behind ManifoldCF and so the
>> >>> >> database
>> >>> >> would get quite bloated anyway.  So I'd give that a try.
>> >>> >>
>> >>> >> If you are seeing logging output mentioning slow queries, you may
>> >>> >> need
>> >>> >> to
>> >>> >> tune how often MCF analyzes certain tables.  There are parameters
>> >>> >> that
>> >>> >> control that as well.  In general, if there is a slow query with a
>> >>> >> bad
>> >>> >> plan,
>> >>> >> and analyzing the tables involved makes it come up with a much
>> >>> >> better
>> >>> >> plan,
>> >>> >> analysis is not happening often enough.  But first, before you get
>> >>> >> to
>> >>> >> that
>> >>> >> point, have a look at the log and see whether this is likely to be
>> >>> >> the
>> >>> >> problem.  (Usually it is the stuffer query that gets slow when
>> >>> >> there's
>> >>> >> an
>> >>> >> issue with table analysis, FWIW).  Please feel free to post the
>> >>> >> plan
>> >>> >> of the
>> >>> >> queries being reported here.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Karl
>> >>> >>
>> >>> >>
>> >>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:
>> >>> >>>
>> >>> >>> Hi Karl,
>> >>> >>>
>> >>> >>> I set up two MCF instances (quick setup) on the same machine,
>> >>> >>> using
>> >>> >>> the same Postgres 9.3 instance (with different databases
>> >>> >>> "org.apache.manifoldcf.database.name" of course).
>> >>> >>> After a couple of days I've got a performance issue: one MCF
>> >>> >>> instance
>> >>> >>> has become very slow - it processes a few docs per hour only. I
>> >>> >>> guess,
>> >>> >>> the bottleneck is the database:
>> >>> >>>
>> >>> >>> "normal" instance:
>> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> >>> 738.311 rows in the table, took 1,2 sec
>> >>> >>> "G";50674
>> >>> >>> "F";68
>> >>> >>> "P";149179
>> >>> >>> "C";402367
>> >>> >>> "A";33
>> >>> >>> "Z";136676
>> >>> >>>
>> >>> >>> "slow" instance (currently with a single active job):
>> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> >>> 2.745.329 rows in the table, took 350 sec
>> >>> >>> "G";337922  --STATUS_PENDINGPURGATORY
>> >>> >>> "F";449     --STATUS_ACTIVEPURGATORY
>> >>> >>> "P";25909   --STATUS_PENDING
>> >>> >>> "C";562772  --STATUS_COMPLETE
>> >>> >>> "A";9       --STATUS_ACTIVE
>> >>> >>> "Z";1644927 --STATUS_PURGATORY
>> >>> >>>
>> >>> >>> Since "count(*)" is terrible slow in Postgres, I used the
>> >>> >>> following
>> >>> >>> sql to count jobqueue's rows:
>> >>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class
>> >>> >>> WHERE
>> >>> >>> relname = 'jobqueue';
>> >>> >>>
>> >>> >>> Both MCF instances have the same number of working threads,
>> >>> >>> database
>> >>> >>> handles etc.
>> >>> >>> Is the database "full"? What could you recommend to improve the
>> >>> >>> performance?
>> >>> >>>
>> >>> >>> Thank you!
>> >>> >>> Konstantin
>> >>> >>
>> >>> >>
>> >>
>> >>
>> >
>
>

Re: Database performance

Posted by Karl Wright <da...@gmail.com>.
Ok, here's the query:

SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,
t0.failcount FROM jobqueue t0  WHERE t0.docpriority<? AND
(t0.status=? OR t0.status=?)
AND t0.checkaction=? AND t0.checktime<=? ... ORDER BY t0.docpriority ASC
LIMIT 400

Here's the index:

"public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
ON jobqueue USING btree (docpriority, status, checkaction, checktime)"

In earlier versions of PostgreSQL it uses this index for both access and
ordering (it reports "ordered index scan" in the plan, or something like
that).  We rely on the ability to use this index for fast stuffer queries.
Maybe PostgreSQL 9.3 broke this in some way?

Here's the documentation; maybe you can figure it out. :-P  You can play
around using "EXPLAIN" to figure out what it's trying to do.  My guess is,
as I said, that it just thinks it has a better way, which isn't in fact
better and is in fact worse...

http://www.postgresql.org/docs/9.2/static/indexes-ordering.html
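
For example, just as a sketch - the literal values below are taken
from the logged parameters quoted further down, and the EXISTS
clauses elided above are left out:

EXPLAIN ANALYZE
SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
FROM jobqueue t0
WHERE t0.docpriority < 1000000001
  AND (t0.status = 'P' OR t0.status = 'G')
  AND t0.checkaction = 'R'
  AND t0.checktime <= 1461757713112
ORDER BY t0.docpriority ASC
LIMIT 400;
-- An "Index Scan using i1459943614200" with no separate Sort node means
-- the index is driving the ordering; a Bitmap Heap Scan plus a top-N
-- Sort (as in the logged plan) means it is not.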

Karl


On Wed, Apr 27, 2016 at 9:00 AM, jetnet <je...@gmail.com> wrote:

> yes, I use the recommended db settings:
> shared_buffers = 1024MB            # min 128kB
>
> The indexes:
> select * from pg_indexes where tablename = 'jobqueue';
>
> "public";"jobqueue";"jobqueue_pkey";"";"CREATE UNIQUE INDEX
> jobqueue_pkey ON jobqueue USING btree (id)"
> "public";"jobqueue";"i1459943614205";"";"CREATE INDEX i1459943614205
> ON jobqueue USING btree (jobid, status)"
> "public";"jobqueue";"i1459943614204";"";"CREATE INDEX i1459943614204
> ON jobqueue USING btree (needpriority)"
> "public";"jobqueue";"i1459943614203";"";"CREATE INDEX i1459943614203
> ON jobqueue USING btree (isseed, jobid)"
> "public";"jobqueue";"i1459943614202";"";"CREATE INDEX i1459943614202
> ON jobqueue USING btree (failtime, jobid)"
> "public";"jobqueue";"i1459943614201";"";"CREATE INDEX i1459943614201
> ON jobqueue USING btree (status, checkaction, checktime)"
> "public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
> ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
> "public";"jobqueue";"i1459943614199";"";"CREATE UNIQUE INDEX
> i1459943614199 ON jobqueue USING btree (dochash, jobid)"
>
>
> Thanks!
> Konstantin
>
> 2016-04-27 14:57 GMT+02:00 Karl Wright <da...@gmail.com>:
> > Also, have you looked at increasing the amount of shared buffers in your
> > postgresql configuration?  See the "how-to-build-and-deploy" page for
> that;
> > it can make a lot of difference for MCF.  So even if the plan is not
> perfect
> > the database simply performs better.
> >
> > Karl
> >
> > On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <da...@gmail.com> wrote:
> >>
> >> Hi Konstantin,
> >>
> >> Can you list the indexes defined on the jobqueue table?
> >>
> >> The query we do for stuffing is precisely aligned with one of the
> indexes
> >> we create on the jobqueue table.  It should be doing an ordered index
> read
> >> with complex filtering conditions; instead it is doing this:
> >>
> >> WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
> >> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
> >>
> >> ... with a hash join and a hash sort.
> >>
> >> Postgres is being way too smart for its own good here.  Probably that's
> at
> >> least partly because its time estimates for its chosen plan are far less
> >> than reality.  There may be a way to configure postgresql not to make
> that
> >> mistake but I don't know offhand how you do it.
> >>
> >> In order for us to be able to control the plan via hints, which would
> make
> >> this problem go away once and for all, you need the version of Postgres
> that
> >> you pay for.  Sigh.
> >>
> >> Anyhow, let's verify that the right indexes are in place, at least...
> >>
> >> Karl
> >>
> >>
> >> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <je...@gmail.com> wrote:
> >>>
> >>> yet another long running query:
> >>>
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
> >>> query (215465 ms): [UPDATE jobqueue SET
> >>> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
> >>> docpriority<?]
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
> >>> '1.000000001E9'
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
> >>> '1.000000001E9'
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
> >>> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
> >>> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
> >>> width=352)
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
> >>> Cond: (docpriority < 1000000001::double precision)
> >>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
> >>>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
> >>> most_common_vals={Z,C,G,P}
> >>> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
> >>>
> >>>
> >>>
> >>> 2016-04-27 14:32 GMT+02:00 jetnet <je...@gmail.com>:
> >>> > Thank you for the fast reply!
> >>> >
> >>> > I provided the query to show, how many docs have been processed by
> the
> >>> > instances so far.
> >>> > I do not use UI to check the jobs statues - it takes 10-15 minutes to
> >>> > display the page :( (ui.maxstatuscount=100.000).
> >>> > The full vacuum and re-index database are running as a scheduled job
> >>> > on a daily basis.
> >>> > So, just tried to analyze the "problem" table - jobqueue, but I don't
> >>> > think it's helped a lot, the same "status" query took 250 second now,
> >>> > improved, but it's still very long.
> >>> >
> >>> > The long-running queries reported in the log are all about jobqueue
> >>> > table, e.g.:
> >>> >
> >>> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
> >>> > query (75328 ms): [SELECT
> >>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> >>> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
> >>> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
> >>> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?)
> AND
> >>> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
> >>> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
> >>> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
> >>> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
> >>> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
> >>> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0:
> >>> > '1.000000001E9'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4:
> >>> > '1461757713112'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
> >>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> >>> > time=738.775..738.842 rows=400 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
> >>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> >>> > time=738.774..738.792 rows=400 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> >>> > Key: t0.docpriority
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> >>> > Method: top-N heapsort  Memory: 139kB
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
> >>> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
> >>> > (actual time=46.841..723.735 rows=20215 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
> >>> > time=46.770..108.460 rows=20215 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >      Hash Cond: (t0.id = t3.owner)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
> >>> > (actual time=46.747..102.890 rows=20215 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >            Join Filter: (t0.jobid = t1.id)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >            Rows Removed by Join Filter: 643
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
> >>> > (actual time=0.015..0.024 rows=1 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                  Filter: ((priority = 5::bigint) AND ((status =
> >>> > 'A'::bpchar) OR (status = 'a'::bpchar)))
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                  Rows Removed by Filter: 3
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >            ->  Bitmap Heap Scan on jobqueue t0
> (cost=2223.73..3352.95
> >>> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                  Recheck Cond: (((docpriority < 1000000001::double
> >>> > precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
> >>> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> >>> > (checkaction = 'R'::bpchar) AND (checktime <=
> 1461757713112::bigint)))
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
> >>> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                        ->  Bitmap Index Scan on i1459943614200
> >>> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
> >>> > rows=333 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                              Index Cond: ((docpriority <
> >>> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
> >>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                        ->  Bitmap Index Scan on i1459943614200
> >>> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
> >>> > rows=20555 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                              Index Cond: ((docpriority <
> >>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> >>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
> >>> > time=0.004..0.004 rows=0 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8)
> (actual
> >>> > time=0.003..0.003 rows=0 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
> >>> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
> >>> > (never executed)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >                        ->  Seq Scan on events t4  (cost=0.00..11.30
> >>> > rows=130 width=516) (never executed)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> > ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
> >>> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
> >>> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
> >>> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >>> >      Rows Removed by Filter: 1
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total
> runtime:
> >>> > 747.031 ms
> >>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
> >>> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
> >>> > n_distinct=5.0 most_common_vals={Z,C,G,P}
> >>> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
> >>> >
> >>> >
> >>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
> >>> >> Hi Konstantin,
> >>> >>
> >>> >> The query you are looking at is performed by the UI only, and there
> is
> >>> >> a
> >>> >> parameter you can set which applies a limit to the number of
> documents
> >>> >> so
> >>> >> that the count is reported as "<limit>+" in the UI.  This is the
> >>> >> parameter:
> >>> >>
> >>> >> org.apache.manifoldcf.ui.maxstatuscount
> >>> >>
> >>> >> As for why the database gets slow for crawling, unless you are
> seeing
> >>> >> reports in the log of long-running queries, then it's a good chance
> >>> >> you need
> >>> >> to vacuum your database instance.  I generally recommend that a
> vacuum
> >>> >> full
> >>> >> be done periodically for database instances.  Autovacuuming has
> gotten
> >>> >> a lot
> >>> >> better in postgres than it used to be but at least in the past the
> >>> >> autovacuuming process would get far behind ManifoldCF and so the
> >>> >> database
> >>> >> would get quite bloated anyway.  So I'd give that a try.
> >>> >>
> >>> >> If you are seeing logging output mentioning slow queries, you may
> need
> >>> >> to
> >>> >> tune how often MCF analyzes certain tables.  There are parameters
> that
> >>> >> control that as well.  In general, if there is a slow query with a
> bad
> >>> >> plan,
> >>> >> and analyzing the tables involved makes it come up with a much
> better
> >>> >> plan,
> >>> >> analysis is not happening often enough.  But first, before you get
> to
> >>> >> that
> >>> >> point, have a look at the log and see whether this is likely to be
> the
> >>> >> problem.  (Usually it is the stuffer query that gets slow when
> there's
> >>> >> an
> >>> >> issue with table analysis, FWIW).  Please feel free to post the plan
> >>> >> of the
> >>> >> queries being reported here.
> >>> >>
> >>> >> Thanks,
> >>> >> Karl
> >>> >>
> >>> >>
> >>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:
> >>> >>>
> >>> >>> Hi Karl,
> >>> >>>
> >>> >>> I set up two MCF instances (quick setup) on the same machine, using
> >>> >>> the same Postgres 9.3 instance (with different databases
> >>> >>> "org.apache.manifoldcf.database.name" of course).
> >>> >>> After a couple of days I've got a performance issue: one MCF
> instance
> >>> >>> has become very slow - it processes a few docs per hour only. I
> >>> >>> guess,
> >>> >>> the bottleneck is the database:
> >>> >>>
> >>> >>> "normal" instance:
> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> >>> >>> 738.311 rows in the table, took 1,2 sec
> >>> >>> "G";50674
> >>> >>> "F";68
> >>> >>> "P";149179
> >>> >>> "C";402367
> >>> >>> "A";33
> >>> >>> "Z";136676
> >>> >>>
> >>> >>> "slow" instance (currently with a single active job):
> >>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> >>> >>> 2.745.329 rows in the table, took 350 sec
> >>> >>> "G";337922  --STATUS_PENDINGPURGATORY
> >>> >>> "F";449     --STATUS_ACTIVEPURGATORY
> >>> >>> "P";25909   --STATUS_PENDING
> >>> >>> "C";562772  --STATUS_COMPLETE
> >>> >>> "A";9       --STATUS_ACTIVE
> >>> >>> "Z";1644927 --STATUS_PURGATORY
> >>> >>>
> >>> >>> Since "count(*)" is terrible slow in Postgres, I used the following
> >>> >>> sql to count jobqueue's rows:
> >>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class
> WHERE
> >>> >>> relname = 'jobqueue';
> >>> >>>
> >>> >>> Both MCF instances have the same number of working threads,
> database
> >>> >>> handles etc.
> >>> >>> Is the database "full"? What could you recommend to improve the
> >>> >>> performance?
> >>> >>>
> >>> >>> Thank you!
> >>> >>> Konstantin
> >>> >>
> >>> >>
> >>
> >>
> >
>

Re: Database performance

Posted by jetnet <je...@gmail.com>.
yes, I use the recommended db settings:
shared_buffers = 1024MB            # min 128kB

The indexes:
select * from pg_indexes where tablename = 'jobqueue';

"public";"jobqueue";"jobqueue_pkey";"";"CREATE UNIQUE INDEX
jobqueue_pkey ON jobqueue USING btree (id)"
"public";"jobqueue";"i1459943614205";"";"CREATE INDEX i1459943614205
ON jobqueue USING btree (jobid, status)"
"public";"jobqueue";"i1459943614204";"";"CREATE INDEX i1459943614204
ON jobqueue USING btree (needpriority)"
"public";"jobqueue";"i1459943614203";"";"CREATE INDEX i1459943614203
ON jobqueue USING btree (isseed, jobid)"
"public";"jobqueue";"i1459943614202";"";"CREATE INDEX i1459943614202
ON jobqueue USING btree (failtime, jobid)"
"public";"jobqueue";"i1459943614201";"";"CREATE INDEX i1459943614201
ON jobqueue USING btree (status, checkaction, checktime)"
"public";"jobqueue";"i1459943614200";"";"CREATE INDEX i1459943614200
ON jobqueue USING btree (docpriority, status, checkaction, checktime)"
"public";"jobqueue";"i1459943614199";"";"CREATE UNIQUE INDEX
i1459943614199 ON jobqueue USING btree (dochash, jobid)"
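
In case it helps, per-index usage can be pulled like this (a generic
pg_stat_user_indexes query, nothing MCF-specific -- just a diagnostic):

-- how often each jobqueue index is actually scanned, and how big it is
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE relname = 'jobqueue'
 ORDER BY idx_scan;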


Thanks!
Konstantin

2016-04-27 14:57 GMT+02:00 Karl Wright <da...@gmail.com>:
> Also, have you looked at increasing the amount of shared buffers in your
> postgresql configuration?  See the "how-to-build-and-deploy" page for that;
> it can make a lot of difference for MCF.  So even if the plan is not perfect
> the database simply performs better.
>
> Karl
>
> On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <da...@gmail.com> wrote:
>>
>> Hi Konstantin,
>>
>> Can you list the indexes defined on the jobqueue table?
>>
>> The query we do for stuffing is precisely aligned with one of the indexes
>> we create on the jobqueue table.  It should be doing an ordered index read
>> with complex filtering conditions; instead it is doing this:
>>
>> WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
>> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>>
>> ... with a hash join and a hash sort.
>>
>> Postgres is being way too smart for its own good here.  Probably that's at
>> least partly because its time estimates for its chosen plan are far less
>> than reality.  There may be a way to configure postgresql not to make that
>> mistake but I don't know offhand how you do it.
>>
>> In order for us to be able to control the plan via hints, which would make
>> this problem go away once and for all, you need the version of Postgres that
>> you pay for.  Sigh.
>>
>> Anyhow, let's verify that the right indexes are in place, at least...
>>
>> Karl
>>
>>
>> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <je...@gmail.com> wrote:
>>>
>>> yet another long running query:
>>>
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
>>> query (215465 ms): [UPDATE jobqueue SET
>>> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
>>> docpriority<?]
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
>>> '1.000000001E9'
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
>>> '1.000000001E9'
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
>>> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
>>> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
>>> width=352)
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
>>> Cond: (docpriority < 1000000001::double precision)
>>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
>>>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
>>> most_common_vals={Z,C,G,P}
>>> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>>>
>>>
>>>
>>> 2016-04-27 14:32 GMT+02:00 jetnet <je...@gmail.com>:
>>> > Thank you for the fast reply!
>>> >
>>> > I provided the query to show, how many docs have been processed by the
>>> > instances so far.
>>> > I do not use UI to check the jobs statues - it takes 10-15 minutes to
>>> > display the page :( (ui.maxstatuscount=100.000).
>>> > The full vacuum and re-index database are running as a scheduled job
>>> > on a daily basis.
>>> > So, just tried to analyze the "problem" table - jobqueue, but I don't
>>> > think it's helped a lot, the same "status" query took 250 second now,
>>> > improved, but it's still very long.
>>> >
>>> > The long-running queries reported in the log are all about jobqueue
>>> > table, e.g.:
>>> >
>>> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
>>> > query (75328 ms): [SELECT
>>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
>>> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
>>> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
>>> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND
>>> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
>>> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
>>> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
>>> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
>>> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
>>> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0:
>>> > '1.000000001E9'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4:
>>> > '1461757713112'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
>>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
>>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>>> > time=738.775..738.842 rows=400 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
>>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>>> > time=738.774..738.792 rows=400 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>>> > Key: t0.docpriority
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>>> > Method: top-N heapsort  Memory: 139kB
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
>>> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
>>> > (actual time=46.841..723.735 rows=20215 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
>>> > time=46.770..108.460 rows=20215 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >      Hash Cond: (t0.id = t3.owner)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
>>> > (actual time=46.747..102.890 rows=20215 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >            Join Filter: (t0.jobid = t1.id)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >            Rows Removed by Join Filter: 643
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
>>> > (actual time=0.015..0.024 rows=1 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                  Filter: ((priority = 5::bigint) AND ((status =
>>> > 'A'::bpchar) OR (status = 'a'::bpchar)))
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                  Rows Removed by Filter: 3
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
>>> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                  Recheck Cond: (((docpriority < 1000000001::double
>>> > precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
>>> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
>>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
>>> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                        ->  Bitmap Index Scan on i1459943614200
>>> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
>>> > rows=333 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                              Index Cond: ((docpriority <
>>> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
>>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                        ->  Bitmap Index Scan on i1459943614200
>>> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
>>> > rows=20555 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                              Index Cond: ((docpriority <
>>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
>>> > time=0.004..0.004 rows=0 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8) (actual
>>> > time=0.003..0.003 rows=0 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
>>> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
>>> > (never executed)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >                        ->  Seq Scan on events t4  (cost=0.00..11.30
>>> > rows=130 width=516) (never executed)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> > ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
>>> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
>>> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
>>> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>>> >      Rows Removed by Filter: 1
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total runtime:
>>> > 747.031 ms
>>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>>> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
>>> > n_distinct=5.0 most_common_vals={Z,C,G,P}
>>> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>>> >
>>> >
>>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
>>> >> Hi Konstantin,
>>> >>
>>> >> The query you are looking at is performed by the UI only, and there is
>>> >> a
>>> >> parameter you can set which applies a limit to the number of documents
>>> >> so
>>> >> that the count is reported as "<limit>+" in the UI.  This is the
>>> >> parameter:
>>> >>
>>> >> org.apache.manifoldcf.ui.maxstatuscount
>>> >>
>>> >> As for why the database gets slow for crawling, unless you are seeing
>>> >> reports in the log of long-running queries, then it's a good chance
>>> >> you need
>>> >> to vacuum your database instance.  I generally recommend that a vacuum
>>> >> full
>>> >> be done periodically for database instances.  Autovacuuming has gotten
>>> >> a lot
>>> >> better in postgres than it used to be but at least in the past the
>>> >> autovacuuming process would get far behind ManifoldCF and so the
>>> >> database
>>> >> would get quite bloated anyway.  So I'd give that a try.
>>> >>
>>> >> If you are seeing logging output mentioning slow queries, you may need
>>> >> to
>>> >> tune how often MCF analyzes certain tables.  There are parameters that
>>> >> control that as well.  In general, if there is a slow query with a bad
>>> >> plan,
>>> >> and analyzing the tables involved makes it come up with a much better
>>> >> plan,
>>> >> analysis is not happening often enough.  But first, before you get to
>>> >> that
>>> >> point, have a look at the log and see whether this is likely to be the
>>> >> problem.  (Usually it is the stuffer query that gets slow when there's
>>> >> an
>>> >> issue with table analysis, FWIW).  Please feel free to post the plan
>>> >> of the
>>> >> queries being reported here.
>>> >>
>>> >> Thanks,
>>> >> Karl
>>> >>
>>> >>
>>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:
>>> >>>
>>> >>> Hi Karl,
>>> >>>
>>> >>> I set up two MCF instances (quick setup) on the same machine, using
>>> >>> the same Postgres 9.3 instance (with different databases
>>> >>> "org.apache.manifoldcf.database.name" of course).
>>> >>> After a couple of days I've got a performance issue: one MCF instance
>>> >>> has become very slow - it processes a few docs per hour only. I
>>> >>> guess,
>>> >>> the bottleneck is the database:
>>> >>>
>>> >>> "normal" instance:
>>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>>> >>> 738.311 rows in the table, took 1,2 sec
>>> >>> "G";50674
>>> >>> "F";68
>>> >>> "P";149179
>>> >>> "C";402367
>>> >>> "A";33
>>> >>> "Z";136676
>>> >>>
>>> >>> "slow" instance (currently with a single active job):
>>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>>> >>> 2.745.329 rows in the table, took 350 sec
>>> >>> "G";337922  --STATUS_PENDINGPURGATORY
>>> >>> "F";449     --STATUS_ACTIVEPURGATORY
>>> >>> "P";25909   --STATUS_PENDING
>>> >>> "C";562772  --STATUS_COMPLETE
>>> >>> "A";9       --STATUS_ACTIVE
>>> >>> "Z";1644927 --STATUS_PURGATORY
>>> >>>
>>> >>> Since "count(*)" is terrible slow in Postgres, I used the following
>>> >>> sql to count jobqueue's rows:
>>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
>>> >>> relname = 'jobqueue';
>>> >>>
>>> >>> Both MCF instances have the same number of working threads, database
>>> >>> handles etc.
>>> >>> Is the database "full"? What could you recommend to improve the
>>> >>> performance?
>>> >>>
>>> >>> Thank you!
>>> >>> Konstantin
>>> >>
>>> >>
>>
>>
>

Re: Database performance

Posted by Karl Wright <da...@gmail.com>.
Also, have you looked at increasing the amount of shared buffers in your
postgresql configuration?  See the "how-to-build-and-deploy" page for that;
it can make a lot of difference for MCF.  So even if the plan is not
perfect, the database simply performs better.
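
As a rough illustration only (the right numbers depend on the machine's
RAM, so please don't read these as official recommendations), the
relevant postgresql.conf lines look something like:

shared_buffers = 1024MB          # cache shared by all backends
work_mem = 32MB                  # memory per individual sort/hash step
maintenance_work_mem = 256MB     # used by VACUUM, REINDEX, CREATE INDEX
effective_cache_size = 4GB       # planner's estimate of OS + PG caching

The how-to-build-and-deploy page is the place to check for the values
we actually recommend.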

Karl

On Wed, Apr 27, 2016 at 8:53 AM, Karl Wright <da...@gmail.com> wrote:

> Hi Konstantin,
>
> Can you list the indexes defined on the jobqueue table?
>
> The query we do for stuffing is precisely aligned with one of the indexes
> we create on the jobqueue table.  It should be doing an ordered index read
> with complex filtering conditions; instead it is doing this:
>
> WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>
> ... with a hash join and a hash sort.
>
> Postgres is being way too smart for its own good here.  Probably that's at
> least partly because its time estimates for its chosen plan are far less
> than reality.  There may be a way to configure postgresql not to make that
> mistake but I don't know offhand how you do it.
>
> In order for us to be able to control the plan via hints, which would make
> this problem go away once and for all, you need the version of Postgres
> that you pay for.  Sigh.
>
> Anyhow, let's verify that the right indexes are in place, at least...
>
> Karl
>
>
> On Wed, Apr 27, 2016 at 8:38 AM, jetnet <je...@gmail.com> wrote:
>
>> yet another long running query:
>>
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
>> query (215465 ms): [UPDATE jobqueue SET
>> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
>> docpriority<?]
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
>> '1.000000001E9'
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
>> '1.000000001E9'
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
>> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
>> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
>> width=352)
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
>> Cond: (docpriority < 1000000001::double precision)
>>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
>>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
>> most_common_vals={Z,C,G,P}
>> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>>
>>
>>
>> 2016-04-27 14:32 GMT+02:00 jetnet <je...@gmail.com>:
>> > Thank you for the fast reply!
>> >
>> > I provided the query to show, how many docs have been processed by the
>> > instances so far.
>> > I do not use UI to check the jobs statues - it takes 10-15 minutes to
>> > display the page :( (ui.maxstatuscount=100.000).
>> > The full vacuum and re-index database are running as a scheduled job
>> > on a daily basis.
>> > So, just tried to analyze the "problem" table - jobqueue, but I don't
>> > think it's helped a lot, the same "status" query took 250 second now,
>> > improved, but it's still very long.
>> >
>> > The long-running queries reported in the log are all about jobqueue
>> table, e.g.:
>> >
>> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
>> > query (75328 ms): [SELECT
>> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
>> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
>> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
>> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND
>> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
>> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
>> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
>> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
>> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
>> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0:
>> '1.000000001E9'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4:
>> '1461757713112'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
>> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> > time=738.775..738.842 rows=400 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
>> > (cost=4016.80..4016.98 rows=72 width=235) (actual
>> > time=738.774..738.792 rows=400 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>> > Key: t0.docpriority
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
>> > Method: top-N heapsort  Memory: 139kB
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
>> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
>> > (actual time=46.841..723.735 rows=20215 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
>> > time=46.770..108.460 rows=20215 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Hash Cond: (t0.id = t3.owner)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
>> > (actual time=46.747..102.890 rows=20215 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            Join Filter: (t0.jobid = t1.id)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            Rows Removed by Join Filter: 643
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
>> > (actual time=0.015..0.024 rows=1 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Filter: ((priority = 5::bigint) AND ((status =
>> > 'A'::bpchar) OR (status = 'a'::bpchar)))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Rows Removed by Filter: 3
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
>> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Recheck Cond: (((docpriority < 1000000001::double
>> > precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
>> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
>> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                        ->  Bitmap Index Scan on i1459943614200
>> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
>> > rows=333 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                              Index Cond: ((docpriority <
>> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                        ->  Bitmap Index Scan on i1459943614200
>> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
>> > rows=20555 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                              Index Cond: ((docpriority <
>> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
>> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
>> > time=0.004..0.004 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8) (actual
>> > time=0.003..0.003 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
>> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
>> > (never executed)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >                        ->  Seq Scan on events t4  (cost=0.00..11.30
>> > rows=130 width=516) (never executed)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> > ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
>> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
>> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
>> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>> >      Rows Removed by Filter: 1
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total runtime:
>> > 747.031 ms
>> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
>> > n_distinct=5.0 most_common_vals={Z,C,G,P}
>> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>> >
>> >
>> > 2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
>> >> Hi Konstantin,
>> >>
>> >> The query you are looking at is performed by the UI only, and there is
>> a
>> >> parameter you can set which applies a limit to the number of documents
>> so
>> >> that the count is reported as "<limit>+" in the UI.  This is the
>> parameter:
>> >>
>> >> org.apache.manifoldcf.ui.maxstatuscount
>> >>
>> >> As for why the database gets slow for crawling, unless you are seeing
>> >> reports in the log of long-running queries, then it's a good chance
>> you need
>> >> to vacuum your database instance.  I generally recommend that a vacuum
>> full
>> >> be done periodically for database instances.  Autovacuuming has gotten
>> a lot
>> >> better in postgres than it used to be but at least in the past the
>> >> autovacuuming process would get far behind ManifoldCF and so the
>> database
>> >> would get quite bloated anyway.  So I'd give that a try.
>> >>
>> >> If you are seeing logging output mentioning slow queries, you may need
>> to
>> >> tune how often MCF analyzes certain tables.  There are parameters that
>> >> control that as well.  In general, if there is a slow query with a bad
>> plan,
>> >> and analyzing the tables involved makes it come up with a much better
>> plan,
>> >> analysis is not happening often enough.  But first, before you get to
>> that
>> >> point, have a look at the log and see whether this is likely to be the
>> >> problem.  (Usually it is the stuffer query that gets slow when there's
>> an
>> >> issue with table analysis, FWIW).  Please feel free to post the plan
>> of the
>> >> queries being reported here.
>> >>
>> >> Thanks,
>> >> Karl
>> >>
>> >>
>> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:
>> >>>
>> >>> Hi Karl,
>> >>>
>> >>> I set up two MCF instances (quick setup) on the same machine, using
>> >>> the same Postgres 9.3 instance (with different databases
>> >>> "org.apache.manifoldcf.database.name" of course).
>> >>> After a couple of days I've got a performance issue: one MCF instance
>> >>> has become very slow - it processes a few docs per hour only. I guess,
>> >>> the bottleneck is the database:
>> >>>
>> >>> "normal" instance:
>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> 738.311 rows in the table, took 1,2 sec
>> >>> "G";50674
>> >>> "F";68
>> >>> "P";149179
>> >>> "C";402367
>> >>> "A";33
>> >>> "Z";136676
>> >>>
>> >>> "slow" instance (currently with a single active job):
>> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> >>> 2.745.329 rows in the table, took 350 sec
>> >>> "G";337922  --STATUS_PENDINGPURGATORY
>> >>> "F";449     --STATUS_ACTIVEPURGATORY
>> >>> "P";25909   --STATUS_PENDING
>> >>> "C";562772  --STATUS_COMPLETE
>> >>> "A";9       --STATUS_ACTIVE
>> >>> "Z";1644927 --STATUS_PURGATORY
>> >>>
>> >>> Since "count(*)" is terrible slow in Postgres, I used the following
>> >>> sql to count jobqueue's rows:
>> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
>> >>> relname = 'jobqueue';
>> >>>
>> >>> Both MCF instances have the same number of working threads, database
>> >>> handles etc.
>> >>> Is the database "full"? What could you recommend to improve the
>> >>> performance?
>> >>>
>> >>> Thank you!
>> >>> Konstantin
>> >>
>> >>
>>
>
>

Re: Database performance

Posted by Karl Wright <da...@gmail.com>.
Hi Konstantin,

Can you list the indexes defined on the jobqueue table?

The query we do for stuffing is precisely aligned with one of the indexes
we create on the jobqueue table.  It should be doing an ordered index read
with complex filtering conditions; instead it is doing this:

WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)

... with a hash join and a hash sort.

Postgres is being way too smart for its own good here.  Probably that's at
least partly because its time estimates for its chosen plan are far less
than reality.  There may be a way to configure postgresql not to make that
mistake but I don't know offhand how you do it.

In order for us to be able to control the plan via hints, which would make
this problem go away once and for all, you need the version of Postgres
that you pay for.  Sigh.

Anyhow, let's verify that the right indexes are in place, at least...
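
If you have a psql session handy, one quick experiment -- a diagnostic
sketch only, with the bind parameters from your log filled in and the
EXISTS clauses dropped for brevity -- is to see what plan you get when
the bitmap and hash strategies are disabled for a single transaction:

BEGIN;
SET LOCAL enable_bitmapscan = off;   -- discourage the bitmap heap scan
SET LOCAL enable_hashjoin = off;     -- discourage the hash anti join
EXPLAIN ANALYZE
SELECT t0.id, t0.jobid, t0.dochash, t0.docid, t0.status,
       t0.failtime, t0.failcount
  FROM jobqueue t0
 WHERE t0.docpriority < 1000000001
   AND (t0.status = 'P' OR t0.status = 'G')
   AND t0.checkaction = 'R'
   AND t0.checktime <= 1461757713112
 ORDER BY t0.docpriority ASC
 LIMIT 400;
ROLLBACK;

If that comes back with an ordered scan on i1459943614200 and runs fast,
then it's purely the planner's estimates that are leading it astray.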

Karl


On Wed, Apr 27, 2016 at 8:38 AM, jetnet <je...@gmail.com> wrote:

> yet another long running query:
>
>  WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
> query (215465 ms): [UPDATE jobqueue SET
> docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
> docpriority<?]
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0:
> '1.000000001E9'
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2:
> '1.000000001E9'
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
> jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
> Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
> width=352)
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
> Cond: (docpriority < 1000000001::double precision)
>  WARN 2016-04-27 14:23:31,562 (Agents thread) -
>  WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
> most_common_vals={Z,C,G,P}
> most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
>
>
>
> 2016-04-27 14:32 GMT+02:00 jetnet <je...@gmail.com>:
> > Thank you for the fast reply!
> >
> > I provided the query to show, how many docs have been processed by the
> > instances so far.
> > I do not use UI to check the jobs statues - it takes 10-15 minutes to
> > display the page :( (ui.maxstatuscount=100.000).
> > The full vacuum and re-index database are running as a scheduled job
> > on a daily basis.
> > So, just tried to analyze the "problem" table - jobqueue, but I don't
> > think it's helped a lot, the same "status" query took 250 second now,
> > improved, but it's still very long.
> >
> > The long-running queries reported in the log are all about jobqueue
> table, e.g.:
> >
> >  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
> > query (75328 ms): [SELECT
> > t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> > FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
> > t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
> > EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND
> > t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
> > jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
> > t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
> > t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
> > prereqevents t3,events t4 WHERE t0.id=t3.owner AND
> > t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0:
> '1.000000001E9'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4:
> '1461757713112'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
> >  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> > time=738.775..738.842 rows=400 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
> > (cost=4016.80..4016.98 rows=72 width=235) (actual
> > time=738.774..738.792 rows=400 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> > Key: t0.docpriority
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> > Method: top-N heapsort  Memory: 139kB
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
> > Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
> > (actual time=46.841..723.735 rows=20215 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> > ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
> > time=46.770..108.460 rows=20215 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Hash Cond: (t0.id = t3.owner)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
> > (actual time=46.747..102.890 rows=20215 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            Join Filter: (t0.jobid = t1.id)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            Rows Removed by Join Filter: 643
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
> > (actual time=0.015..0.024 rows=1 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Filter: ((priority = 5::bigint) AND ((status =
> > 'A'::bpchar) OR (status = 'a'::bpchar)))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Rows Removed by Filter: 3
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
> > rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Recheck Cond: (((docpriority < 1000000001::double
> > precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
> > AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
> > width=0) (actual time=45.349..45.349 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                        ->  Bitmap Index Scan on i1459943614200
> > (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
> > rows=333 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                              Index Cond: ((docpriority <
> > 1000000001::double precision) AND (status = 'P'::bpchar) AND
> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                        ->  Bitmap Index Scan on i1459943614200
> > (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
> > rows=20555 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                              Index Cond: ((docpriority <
> > 1000000001::double precision) AND (status = 'G'::bpchar) AND
> > (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
> > time=0.004..0.004 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            Buckets: 1024  Batches: 1  Memory Usage: 0kB
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8) (actual
> > time=0.003..0.003 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
> > rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
> > (never executed)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >                        ->  Seq Scan on events t4  (cost=0.00..11.30
> > rows=130 width=516) (never executed)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> > ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
> > rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Index Cond: ((dochash)::text = (t0.dochash)::text)
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
> > (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
> > 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> >      Rows Removed by Filter: 1
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total runtime:
> > 747.031 ms
> >  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
> >  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
> > n_distinct=5.0 most_common_vals={Z,C,G,P}
> > most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
> >
> >
> > 2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
> >> Hi Konstantin,
> >>
> >> The query you are looking at is performed by the UI only, and there is a
> >> parameter you can set which applies a limit to the number of documents
> so
> >> that the count is reported as "<limit>+" in the UI.  This is the
> parameter:
> >>
> >> org.apache.manifoldcf.ui.maxstatuscount
> >>
> >> As for why the database gets slow for crawling, unless you are seeing
> >> reports in the log of long-running queries, then it's a good chance you
> need
> >> to vacuum your database instance.  I generally recommend that a vacuum
> full
> >> be done periodically for database instances.  Autovacuuming has gotten
> a lot
> >> better in postgres than it used to be but at least in the past the
> >> autovacuuming process would get far behind ManifoldCF and so the
> database
> >> would get quite bloated anyway.  So I'd give that a try.
> >>
> >> If you are seeing logging output mentioning slow queries, you may need
> to
> >> tune how often MCF analyzes certain tables.  There are parameters that
> >> control that as well.  In general, if there is a slow query with a bad
> plan,
> >> and analyzing the tables involved makes it come up with a much better
> plan,
> >> analysis is not happening often enough.  But first, before you get to
> that
> >> point, have a look at the log and see whether this is likely to be the
> >> problem.  (Usually it is the stuffer query that gets slow when there's
> an
> >> issue with table analysis, FWIW).  Please feel free to post the plan of
> the
> >> queries being reported here.
> >>
> >> Thanks,
> >> Karl
> >>
> >>
> >> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:
> >>>
> >>> Hi Karl,
> >>>
> >>> I set up two MCF instances (quick setup) on the same machine, using
> >>> the same Postgres 9.3 instance (with different databases
> >>> "org.apache.manifoldcf.database.name" of course).
> >>> After a couple of days I've got a performance issue: one MCF instance
> >>> has become very slow - it processes a few docs per hour only. I guess,
> >>> the bottleneck is the database:
> >>>
> >>> "normal" instance:
> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> >>> 738.311 rows in the table, took 1,2 sec
> >>> "G";50674
> >>> "F";68
> >>> "P";149179
> >>> "C";402367
> >>> "A";33
> >>> "Z";136676
> >>>
> >>> "slow" instance (currently with a single active job):
> >>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> >>> 2.745.329 rows in the table, took 350 sec
> >>> "G";337922  --STATUS_PENDINGPURGATORY
> >>> "F";449     --STATUS_ACTIVEPURGATORY
> >>> "P";25909   --STATUS_PENDING
> >>> "C";562772  --STATUS_COMPLETE
> >>> "A";9       --STATUS_ACTIVE
> >>> "Z";1644927 --STATUS_PURGATORY
> >>>
> >>> Since "count(*)" is terrible slow in Postgres, I used the following
> >>> sql to count jobqueue's rows:
> >>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
> >>> relname = 'jobqueue';
> >>>
> >>> Both MCF instances have the same number of working threads, database
> >>> handles etc.
> >>> Is the database "full"? What could you recommend to improve the
> >>> performance?
> >>>
> >>> Thank you!
> >>> Konstantin
> >>
> >>
>

Re: Database performance

Posted by jetnet <je...@gmail.com>.
yet another long-running query:

 WARN 2016-04-27 14:23:31,562 (Agents thread) - Found a long-running
query (215465 ms): [UPDATE jobqueue SET
docpriority=?,needpriorityprocessid=NULL,needpriority=? WHERE
docpriority<?]
 WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 0: '1.000000001E9'
 WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 1: 'T'
 WARN 2016-04-27 14:23:31,562 (Agents thread) -   Parameter 2: '1.000000001E9'
 WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan: Update on
jobqueue  (cost=0.56..37802.01 rows=19647 width=352)
 WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:   ->  Index
Scan using i1459943614200 on jobqueue  (cost=0.56..37802.01 rows=19647
width=352)
 WARN 2016-04-27 14:23:31,562 (Agents thread) -  Plan:         Index
Cond: (docpriority < 1000000001::double precision)
 WARN 2016-04-27 14:23:31,562 (Agents thread) -
 WARN 2016-04-27 14:23:31,578 (Agents thread) -  Stats: n_distinct=4.0
most_common_vals={Z,C,G,P}
most_common_freqs={0.64193332,0.21796666,0.13086666,0.0092333332}
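
Would raising the statistics targets for the skewed columns give the
planner better estimates? I was thinking of trying something along
these lines (just an idea on my side, not applied yet):

-- experiment only: finer-grained stats for the columns the stuffer and
-- update queries filter on, then refresh the statistics
ALTER TABLE jobqueue ALTER COLUMN status SET STATISTICS 1000;
ALTER TABLE jobqueue ALTER COLUMN docpriority SET STATISTICS 1000;
ANALYZE jobqueue;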



2016-04-27 14:32 GMT+02:00 jetnet <je...@gmail.com>:
> Thank you for the fast reply!
>
> I provided the query to show, how many docs have been processed by the
> instances so far.
> I do not use UI to check the jobs statues - it takes 10-15 minutes to
> display the page :( (ui.maxstatuscount=100.000).
> The full vacuum and re-index database are running as a scheduled job
> on a daily basis.
> So, just tried to analyze the "problem" table - jobqueue, but I don't
> think it's helped a lot, the same "status" query took 250 second now,
> improved, but it's still very long.
>
> The long-running queries reported in the log are all about jobqueue table, e.g.:
>
>  WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
> query (75328 ms): [SELECT
> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
> FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
> t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
> EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND
> t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
> jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
> t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
> t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
> prereqevents t3,events t4 WHERE t0.id=t3.owner AND
> t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0: '1.000000001E9'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4: '1461757713112'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
>  WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
> (cost=4016.80..4016.98 rows=72 width=235) (actual
> time=738.775..738.842 rows=400 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
> (cost=4016.80..4016.98 rows=72 width=235) (actual
> time=738.774..738.792 rows=400 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> Key: t0.docpriority
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
> Method: top-N heapsort  Memory: 139kB
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
> Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
> (actual time=46.841..723.735 rows=20215 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> ->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
> time=46.770..108.460 rows=20215 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>      Hash Cond: (t0.id = t3.owner)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>      ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
> (actual time=46.747..102.890 rows=20215 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            Join Filter: (t0.jobid = t1.id)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            Rows Removed by Join Filter: 643
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
> (actual time=0.015..0.024 rows=1 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                  Filter: ((priority = 5::bigint) AND ((status =
> 'A'::bpchar) OR (status = 'a'::bpchar)))
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                  Rows Removed by Filter: 3
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
> rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                  Recheck Cond: (((docpriority < 1000000001::double
> precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
> AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
> 1000000001::double precision) AND (status = 'G'::bpchar) AND
> (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                  ->  BitmapOr  (cost=2223.73..2223.73 rows=288
> width=0) (actual time=45.349..45.349 rows=0 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                        ->  Bitmap Index Scan on i1459943614200
> (cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
> rows=333 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                              Index Cond: ((docpriority <
> 1000000001::double precision) AND (status = 'P'::bpchar) AND
> (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                        ->  Bitmap Index Scan on i1459943614200
> (cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
> rows=20555 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                              Index Cond: ((docpriority <
> 1000000001::double precision) AND (status = 'G'::bpchar) AND
> (checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>      ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
> time=0.004..0.004 rows=0 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            Buckets: 1024  Batches: 1  Memory Usage: 0kB
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>            ->  Hash Join  (cost=12.93..26.15 rows=130 width=8) (actual
> time=0.003..0.003 rows=0 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                  ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
> rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                  ->  Hash  (cost=11.30..11.30 rows=130 width=516)
> (never executed)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>                        ->  Seq Scan on events t4  (cost=0.00..11.30
> rows=130 width=516) (never executed)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
> ->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
> rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>      Index Cond: ((dochash)::text = (t0.dochash)::text)
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>      Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
> (status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
> 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
>      Rows Removed by Filter: 1
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total runtime:
> 747.031 ms
>  WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
>  WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
> n_distinct=5.0 most_common_vals={Z,C,G,P}
> most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
>
>
> 2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
>> Hi Konstantin,
>>
>> The query you are looking at is performed by the UI only, and there is a
>> parameter you can set which applies a limit to the number of documents so
>> that the count is reported as "<limit>+" in the UI.  This is the parameter:
>>
>> org.apache.manifoldcf.ui.maxstatuscount
>>
>> As for why the database gets slow for crawling, unless you are seeing
>> reports in the log of long-running queries, then it's a good chance you need
>> to vacuum your database instance.  I generally recommend that a vacuum full
>> be done periodically for database instances.  Autovacuuming has gotten a lot
>> better in postgres than it used to be but at least in the past the
>> autovacuuming process would get far behind ManifoldCF and so the database
>> would get quite bloated anyway.  So I'd give that a try.
>>
>> If you are seeing logging output mentioning slow queries, you may need to
>> tune how often MCF analyzes certain tables.  There are parameters that
>> control that as well.  In general, if there is a slow query with a bad plan,
>> and analyzing the tables involved makes it come up with a much better plan,
>> analysis is not happening often enough.  But first, before you get to that
>> point, have a look at the log and see whether this is likely to be the
>> problem.  (Usually it is the stuffer query that gets slow when there's an
>> issue with table analysis, FWIW).  Please feel free to post the plan of the
>> queries being reported here.
>>
>> Thanks,
>> Karl
>>
>>
>> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:
>>>
>>> Hi Karl,
>>>
>>> I set up two MCF instances (quick setup) on the same machine, using
>>> the same Postgres 9.3 instance (with different databases
>>> "org.apache.manifoldcf.database.name" of course).
>>> After a couple of days I've got a performance issue: one MCF instance
>>> has become very slow - it processes a few docs per hour only. I guess,
>>> the bottleneck is the database:
>>>
>>> "normal" instance:
>>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>>> 738.311 rows in the table, took 1,2 sec
>>> "G";50674
>>> "F";68
>>> "P";149179
>>> "C";402367
>>> "A";33
>>> "Z";136676
>>>
>>> "slow" instance (currently with a single active job):
>>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>>> 2.745.329 rows in the table, took 350 sec
>>> "G";337922  --STATUS_PENDINGPURGATORY
>>> "F";449     --STATUS_ACTIVEPURGATORY
>>> "P";25909   --STATUS_PENDING
>>> "C";562772  --STATUS_COMPLETE
>>> "A";9       --STATUS_ACTIVE
>>> "Z";1644927 --STATUS_PURGATORY
>>>
>>> Since "count(*)" is terrible slow in Postgres, I used the following
>>> sql to count jobqueue's rows:
>>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
>>> relname = 'jobqueue';
>>>
>>> Both MCF instances have the same number of working threads, database
>>> handles etc.
>>> Is the database "full"? What could you recommend to improve the
>>> performance?
>>>
>>> Thank you!
>>> Konstantin
>>
>>

Re: Database performance

Posted by jetnet <je...@gmail.com>.
Thank you for the fast reply!

I provided the query to show how many docs have been processed by the
instances so far.
I do not use the UI to check the job statuses - it takes 10-15 minutes
to display the page :( (ui.maxstatuscount=100.000).
A full vacuum and re-index of the database are running as a scheduled
job on a daily basis.
So I just tried to analyze the "problem" table - jobqueue - but I don't
think it helped a lot: the same "status" query took 250 seconds now,
improved, but still very long.
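
For reference, the maintenance steps boil down to roughly the following
in psql (just a sketch; "mcfdb" is a placeholder for the actual
org.apache.manifoldcf.database.name value):

  VACUUM FULL;              -- daily scheduled job
  REINDEX DATABASE mcfdb;   -- daily scheduled job ("mcfdb" = placeholder)
  ANALYZE jobqueue;         -- the manual run I tried just now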

The long-running queries reported in the log are all about the jobqueue table, e.g.:

 WARN 2016-04-27 13:49:48,596 (Stuffer thread) - Found a long-running
query (75328 ms): [SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
FROM jobqueue t0  WHERE t0.docpriority<? AND  (t0.status=? OR
t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND
EXISTS(SELECT 'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND
t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' FROM
jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR
t2.status=? OR t2.status=? OR t2.status=? OR t2.status=? OR
t2.status=?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
prereqevents t3,events t4 WHERE t0.id=t3.owner AND
t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 0: '1.000000001E9'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 1: 'P'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 2: 'G'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 3: 'R'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 4: '1461757713112'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 5: 'A'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 6: 'a'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 7: '5'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 8: 'A'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 9: 'F'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 10: 'a'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 11: 'f'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 12: 'D'
 WARN 2016-04-27 13:49:48,628 (Stuffer thread) -   Parameter 13: 'd'
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Limit
(cost=4016.80..4016.98 rows=72 width=235) (actual
time=738.775..738.842 rows=400 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:   ->  Sort
(cost=4016.80..4016.98 rows=72 width=235) (actual
time=738.774..738.792 rows=400 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
Key: t0.docpriority
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         Sort
Method: top-N heapsort  Memory: 139kB
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:         ->
Nested Loop Anti Join  (cost=2252.06..4014.58 rows=72 width=235)
(actual time=46.841..723.735 rows=20215 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
->  Hash Anti Join  (cost=2251.51..3387.34 rows=72 width=235) (actual
time=46.770..108.460 rows=20215 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
     Hash Cond: (t0.id = t3.owner)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
     ->  Nested Loop  (cost=2223.73..3358.66 rows=72 width=235)
(actual time=46.747..102.890 rows=20215 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           Join Filter: (t0.jobid = t1.id)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           Rows Removed by Join Filter: 643
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           ->  Seq Scan on jobs t1  (cost=0.00..2.14 rows=1 width=8)
(actual time=0.015..0.024 rows=1 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                 Filter: ((priority = 5::bigint) AND ((status =
'A'::bpchar) OR (status = 'a'::bpchar)))
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                 Rows Removed by Filter: 3
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           ->  Bitmap Heap Scan on jobqueue t0  (cost=2223.73..3352.95
rows=286 width=235) (actual time=46.705..93.152 rows=20858 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                 Recheck Cond: (((docpriority < 1000000001::double
precision) AND (status = 'P'::bpchar) AND (checkaction = 'R'::bpchar)
AND (checktime <= 1461757713112::bigint)) OR ((docpriority <
1000000001::double precision) AND (status = 'G'::bpchar) AND
(checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint)))
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                 ->  BitmapOr  (cost=2223.73..2223.73 rows=288
width=0) (actual time=45.349..45.349 rows=0 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                       ->  Bitmap Index Scan on i1459943614200
(cost=0.00..1111.80 rows=20 width=0) (actual time=29.325..29.325
rows=333 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                             Index Cond: ((docpriority <
1000000001::double precision) AND (status = 'P'::bpchar) AND
(checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                       ->  Bitmap Index Scan on i1459943614200
(cost=0.00..1111.80 rows=268 width=0) (actual time=16.020..16.020
rows=20555 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                             Index Cond: ((docpriority <
1000000001::double precision) AND (status = 'G'::bpchar) AND
(checkaction = 'R'::bpchar) AND (checktime <= 1461757713112::bigint))
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
     ->  Hash  (cost=26.15..26.15 rows=130 width=8) (actual
time=0.004..0.004 rows=0 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           Buckets: 1024  Batches: 1  Memory Usage: 0kB
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
           ->  Hash Join  (cost=12.93..26.15 rows=130 width=8) (actual
time=0.003..0.003 rows=0 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                 Hash Cond: ((t3.eventname)::text = (t4.name)::text)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                 ->  Seq Scan on prereqevents t3  (cost=0.00..11.40
rows=140 width=524) (actual time=0.001..0.001 rows=0 loops=1)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                 ->  Hash  (cost=11.30..11.30 rows=130 width=516)
(never executed)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
                       ->  Seq Scan on events t4  (cost=0.00..11.30
rows=130 width=516) (never executed)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
->  Index Scan using i1459943614199 on jobqueue t2  (cost=0.56..8.59
rows=1 width=49) (actual time=0.030..0.030 rows=0 loops=20215)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
     Index Cond: ((dochash)::text = (t0.dochash)::text)
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
     Filter: ((jobid <> t0.jobid) AND ((status = 'A'::bpchar) OR
(status = 'F'::bpchar) OR (status = 'a'::bpchar) OR (status =
'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan:
     Rows Removed by Filter: 1
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -  Plan: Total runtime:
747.031 ms
 WARN 2016-04-27 13:49:49,440 (Stuffer thread) -
 WARN 2016-04-27 13:49:50,206 (Stuffer thread) -  Stats:
n_distinct=5.0 most_common_vals={Z,C,G,P}
most_common_freqs={0.64013332,0.21616666,0.13356666,0.0101}
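
For what it's worth, the plan above can be reproduced by hand in psql by
substituting the logged parameter values into the query literally -
roughly like this (a sketch, using the values from the warning above):

  EXPLAIN ANALYZE
  SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount
  FROM jobqueue t0
  WHERE t0.docpriority < 1.000000001E9
    AND (t0.status = 'P' OR t0.status = 'G')
    AND t0.checkaction = 'R'
    AND t0.checktime <= 1461757713112
    AND EXISTS(SELECT 'x' FROM jobs t1
               WHERE (t1.status = 'A' OR t1.status = 'a')
                 AND t1.id = t0.jobid AND t1.priority = 5)
    AND NOT EXISTS(SELECT 'x' FROM jobqueue t2
                   WHERE t2.dochash = t0.dochash
                     AND (t2.status = 'A' OR t2.status = 'F' OR t2.status = 'a'
                          OR t2.status = 'f' OR t2.status = 'D' OR t2.status = 'd')
                     AND t2.jobid != t0.jobid)
    AND NOT EXISTS(SELECT 'x' FROM prereqevents t3, events t4
                   WHERE t0.id = t3.owner AND t3.eventname = t4.name)
  ORDER BY t0.docpriority ASC LIMIT 400;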


2016-04-27 13:43 GMT+02:00 Karl Wright <da...@gmail.com>:
> Hi Konstantin,
>
> The query you are looking at is performed by the UI only, and there is a
> parameter you can set which applies a limit to the number of documents so
> that the count is reported as "<limit>+" in the UI.  This is the parameter:
>
> org.apache.manifoldcf.ui.maxstatuscount
>
> As for why the database gets slow for crawling, unless you are seeing
> reports in the log of long-running queries, then it's a good chance you need
> to vacuum your database instance.  I generally recommend that a vacuum full
> be done periodically for database instances.  Autovacuuming has gotten a lot
> better in postgres than it used to be but at least in the past the
> autovacuuming process would get far behind ManifoldCF and so the database
> would get quite bloated anyway.  So I'd give that a try.
>
> If you are seeing logging output mentioning slow queries, you may need to
> tune how often MCF analyzes certain tables.  There are parameters that
> control that as well.  In general, if there is a slow query with a bad plan,
> and analyzing the tables involved makes it come up with a much better plan,
> analysis is not happening often enough.  But first, before you get to that
> point, have a look at the log and see whether this is likely to be the
> problem.  (Usually it is the stuffer query that gets slow when there's an
> issue with table analysis, FWIW).  Please feel free to post the plan of the
> queries being reported here.
>
> Thanks,
> Karl
>
>
> On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:
>>
>> Hi Karl,
>>
>> I set up two MCF instances (quick setup) on the same machine, using
>> the same Postgres 9.3 instance (with different databases
>> "org.apache.manifoldcf.database.name" of course).
>> After a couple of days I've got a performance issue: one MCF instance
>> has become very slow - it processes a few docs per hour only. I guess,
>> the bottleneck is the database:
>>
>> "normal" instance:
>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> 738.311 rows in the table, took 1,2 sec
>> "G";50674
>> "F";68
>> "P";149179
>> "C";402367
>> "A";33
>> "Z";136676
>>
>> "slow" instance (currently with a single active job):
>> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
>> 2.745.329 rows in the table, took 350 sec
>> "G";337922  --STATUS_PENDINGPURGATORY
>> "F";449     --STATUS_ACTIVEPURGATORY
>> "P";25909   --STATUS_PENDING
>> "C";562772  --STATUS_COMPLETE
>> "A";9       --STATUS_ACTIVE
>> "Z";1644927 --STATUS_PURGATORY
>>
>> Since "count(*)" is terrible slow in Postgres, I used the following
>> sql to count jobqueue's rows:
>> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
>> relname = 'jobqueue';
>>
>> Both MCF instances have the same number of working threads, database
>> handles etc.
>> Is the database "full"? What could you recommend to improve the
>> performance?
>>
>> Thank you!
>> Konstantin
>
>

Re: Database performance

Posted by Karl Wright <da...@gmail.com>.
Hi Konstantin,

The query you are looking at is performed by the UI only, and there is a
parameter you can set which applies a limit to the number of documents so
that the count is reported as "<limit>+" in the UI.  This is the parameter:

org.apache.manifoldcf.ui.maxstatuscount
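
For example, in properties.xml something like this (the 10000 value is
just an illustration - pick whatever limit keeps the UI responsive):

  <property name="org.apache.manifoldcf.ui.maxstatuscount" value="10000"/>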

As for why the database gets slow for crawling: unless you are seeing
reports of long-running queries in the log, there's a good chance you need
to vacuum your database instance.  I generally recommend that a vacuum
full be done periodically for database instances.  Autovacuuming has gotten
a lot better in Postgres than it used to be, but at least in the past the
autovacuuming process would get far behind ManifoldCF and the database
would get quite bloated anyway.  So I'd give that a try.
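
Concretely, that means something along these lines from psql, connected to
the MCF database (a sketch only - VACUUM FULL takes exclusive locks, so run
it while the crawler is quiet):

  -- reclaims dead space across all tables; blocks writers while it runs
  VACUUM FULL;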

If you are seeing logging output mentioning slow queries, you may need to
tune how often MCF analyzes certain tables.  There are parameters that
control that as well.  In general, if there is a slow query with a bad
plan, and analyzing the tables involved makes it come up with a much better
plan, analysis is not happening often enough.  But first, before you get to
that point, have a look at the log and see whether this is likely to be the
problem.  (Usually it is the stuffer query that gets slow when there's an
issue with table analysis, FWIW).  Please feel free to post the plan of the
queries being reported here.

Thanks,
Karl


On Wed, Apr 27, 2016 at 7:33 AM, jetnet <je...@gmail.com> wrote:

> Hi Karl,
>
> I set up two MCF instances (quick setup) on the same machine, using
> the same Postgres 9.3 instance (with different databases
> "org.apache.manifoldcf.database.name" of course).
> After a couple of days I've got a performance issue: one MCF instance
> has become very slow - it processes a few docs per hour only. I guess,
> the bottleneck is the database:
>
> "normal" instance:
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 738.311 rows in the table, took 1,2 sec
> "G";50674
> "F";68
> "P";149179
> "C";402367
> "A";33
> "Z";136676
>
> "slow" instance (currently with a single active job):
> SELECT status, count(*) AS count FROM jobqueue GROUP BY status --
> 2.745.329 rows in the table, took 350 sec
> "G";337922  --STATUS_PENDINGPURGATORY
> "F";449     --STATUS_ACTIVEPURGATORY
> "P";25909   --STATUS_PENDING
> "C";562772  --STATUS_COMPLETE
> "A";9       --STATUS_ACTIVE
> "Z";1644927 --STATUS_PURGATORY
>
> Since "count(*)" is terrible slow in Postgres, I used the following
> sql to count jobqueue's rows:
> SELECT reltuples::bigint AS approximate_row_count FROM pg_class WHERE
> relname = 'jobqueue';
>
> Both MCF instances have the same number of working threads, database
> handles etc.
> Is the database "full"? What could you recommend to improve the
> performance?
>
> Thank you!
> Konstantin
>