You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@manifoldcf.apache.org by Markus Schuch <ma...@deutschebahn.com> on 2017/11/29 21:46:54 UTC

AW: Amazon RDS for PostgreSQL Support

Hi,

since nobody responded, we started to experiment.

Setup:
AWS RDS Postgres 9.6.3
db.m4.xlarge (4 cores, 16 GB RAM)
provisioned iops 4000

Indexing Performance for most jobs is pretty good, except a for a bigger sharepoint crawl (~200.000 docs).

We are seeing a lot of long running queries for the tables carrydown here.

Logfile:
https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt

Settings:
https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings
(due to this discussion we left autovac on: https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html)

CPU Usage of the Database is between 15% and 35%

Are we hitting bad auto optimiziations of the newer postgresql version?

Many thanks in advance,
Markus

Re: Amazon RDS for PostgreSQL Support

Posted by Karl Wright <da...@gmail.com>.

Yes, that looks right.

Karl


On Sat, Dec 16, 2017 at 10:21 AM, Markus Schuch <ma...@web.de>
wrote:

> would this be the right place?
> https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ
>
> Am 16.12.2017 um 15:08 schrieb Karl Wright:
> > We should write up your findings in a FAQ if you find MySQL to perform
> > better than postgresql.
> >
> > My guess is that what you are seeing is less "bad plan", more
> > "contention and backoff", but that would take some effort to tease out.
> >
> > Karl
> >
> > On Fri, Dec 15, 2017 at 3:06 PM, Karl Wright <daddywri@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >     Interesting.  Thanks for the update.
> >     Karl
> >
> >     On Fri, Dec 15, 2017 at 2:31 PM, Markus Schuch
> >     <markus.schuch@deutschebahn.com
> >     <ma...@deutschebahn.com>> wrote:
> >
> >         We were no able to resolve the performance issue with the
> >         carrydown table.____
> >
> >         __ __
> >
> >         We switched to RDS Aurora MySQL, which performs similar to the
> >         MariaDB database in our on premise datacenter. No problems with
> >         the carrydown table queries of the sharepoint connector.____
> >
> >         __ __
> >
> >         Regards____
> >
> >         Markus____
> >
> >         __ __
> >
> >         *Von:* Karl Wright [mailto:daddywri@gmail.com
> >         <ma...@gmail.com>]
> >         *Gesendet:* Donnerstag, 30. November 2017 12:23
> >
> >
> >         *An:* user@manifoldcf.apache.org <mailto:user@manifoldcf.
> apache.org>
> >         *Betreff:* Re: Amazon RDS for PostgreSQL Support____
> >
> >         __ __
> >
> >         Typically Jobs table is short and sequential scans are faster
> >         than index joins.  Postgres optimizes for that.____
> >
> >         __ __
> >
> >         No, the plans look fine.  Another reason for the long-running
> >         queries might well be contention and locking -- many threads
> >         will be trying to do similar things at the same time.  You will
> >         note that multiple records get updated in one query; this is
> >         usually helpful but when each update is expensive you could wind
> >         up with locking causing delays.____
> >
> >         __ __
> >
> >         Karl____
> >
> >         __ __
> >
> >         __ __
> >
> >         On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch
> >         <markus.schuch@deutschebahn.com
> >         <ma...@deutschebahn.com>> wrote:____
> >
> >             Hi Karl,____
> >
> >              ____
> >
> >             we disabled autovacuum____
> >
> >             The stats table show there was no autovac since then.____
> >
> >              ____
> >
> >             The long running queries still occur.____
> >
> >             There are no other apps using the database and no other jobs
> >             running.____
> >
> >              ____
> >
> >             But there is another long running query to the jobs table
> >             between the carrydown queries.____
> >
> >             This query seem to happen at the same time with the long
> >             running carrydown query.____
> >
> >              ____
> >
> >             The plan output says “Sec Scan on jobs…”. ____
> >
> >              ____
> >
> >             2017-11-30 08:16:16,008 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db: Found a long-running query (169057
> >             ms): [SELECT id FROM jobs WHERE  (status=? OR status=? OR
> >             status=?) FOR UPDATE]____
> >
> >             2017-11-30 08:16:16,009 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:   Parameter 0: 'A'____
> >
> >             2017-11-30 08:16:16,009 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:   Parameter 1: 'W'____
> >
> >             2017-11-30 08:16:16,009 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:   Parameter 2: 'R'____
> >
> >             2017-11-30 08:16:16,011 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:  Plan: LockRows  (cost=0.00..4.43
> >             rows=3 width=14) (actual time=0.022..0.024 rows=1
> loops=1)____
> >
> >             2017-11-30 08:16:16,011 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:  Plan:   ->  Seq Scan on jobs
> >             (cost=0.00..4.40 rows=3 width=14) (actual time=0.021..0.022
> >             rows=1 loops=1)____
> >
> >             2017-11-30 08:16:16,011 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:  Plan:         Filter: ((status =
> >             'A'::bpchar) OR (status = 'W'::bpchar) OR (status =
> >             'R'::bpchar))____
> >
> >             2017-11-30 08:16:16,011 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:  Plan:         Rows Removed by
> >             Filter: 22____
> >
> >             2017-11-30 08:16:16,011 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:  Plan: Planning time: 0.093 ms____
> >
> >             2017-11-30 08:16:16,011 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:  Plan: Execution time: 0.041 ms____
> >
> >             2017-11-30 08:16:16,011 WARN  [Finisher thread]
> >             org.apache.manifoldcf.db:____
> >
> >              ____
> >
> >             Jobs table indices:____
> >
> >             public    jobs       jobs_pkey
> >             CREATE UNIQUE INDEX jobs_pkey ON jobs USING btree (id)____
> >
> >             public    jobs
> >             i1511464305264                               CREATE INDEX
> >             i1511464305264 ON jobs USING btree (status, id, priority)____
> >
> >             public    jobs
> >             i1511464305263                               CREATE INDEX
> >             i1511464305263 ON jobs USING btree (status, processid)____
> >
> >             public    jobs
> >             i1511464305262                               CREATE INDEX
> >             i1511464305262 ON jobs USING btree (connectionname)____
> >
> >             public    jobs
> >             i1511464305261                               CREATE INDEX
> >             i1511464305261 ON jobs USING btree (failtime)____
> >
> >              ____
> >
> >             Shouldn’t be i1511464305264 be used?____
> >
> >              ____
> >
> >             Many thanks in advance____
> >
> >             Markus____
> >
> >              ____
> >
> >             *Von:* Karl Wright [mailto:daddywri@gmail.com
> >             <ma...@gmail.com>]
> >             *Gesendet:* Mittwoch, 29. November 2017 23:28
> >             *An:* user@manifoldcf.apache.org
> >             <ma...@manifoldcf.apache.org>
> >             *Betreff:* Re: Amazon RDS for PostgreSQL Support____
> >
> >              ____
> >
> >             The plans look good for the carrydown execution, so I have
> >             to conclude that the long-running queries are due to other
> >             considerations -- perhaps concurrent vacuuming, perhaps
> >             other queries pounding the database.____
> >
> >              ____
> >
> >             Sharepoint is especially hard on the carrydown table -- it
> >             gets huge.____
> >
> >              ____
> >
> >             Karl____
> >
> >              ____
> >
> >              ____
> >
> >             On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch
> >             <markus.schuch@deutschebahn.com
> >             <ma...@deutschebahn.com>> wrote:____
> >
> >                 Hi,____
> >
> >                  ____
> >
> >                 since nobody responded, we started to experiment.____
> >
> >                  ____
> >
> >                 Setup:____
> >
> >                 AWS RDS Postgres 9.6.3____
> >
> >                 db.m4.xlarge (4 cores, 16 GB RAM)____
> >
> >                 provisioned iops 4000____
> >
> >                  ____
> >
> >                 Indexing Performance for most jobs is pretty good,
> >                 except a for a bigger sharepoint crawl (~200.000
> docs).____
> >
> >                  ____
> >
> >                 We are seeing a lot of long running queries for the
> >                 tables carrydown here.____
> >
> >                  ____
> >
> >                 Logfile:____
> >
> >                 https://gist.githubusercontent.com/schuch/
> b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f
> 8417550c17/gistfile1.txt
> >                 <https://gist.githubusercontent.com/schuch/
> b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f
> 8417550c17/gistfile1.txt>____
> >
> >                  ____
> >
> >                 Settings:____
> >
> >                 https://gist.githubusercontent.com/schuch/
> 44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd
> 99cdffb242/postgresql%2520RDS%2520settings
> >                 <https://gist.githubusercontent.com/schuch/
> 44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd
> 99cdffb242/postgresql%2520RDS%2520settings>____
> >
> >                 (due to this discussion we left autovac on:
> >                 https://www.mail-archive.com/user@manifoldcf.apache.org/
> msg03512.html
> >                 <https://www.mail-archive.com/
> user@manifoldcf.apache.org/msg03512.html>)____
> >
> >                  ____
> >
> >                 CPU Usage of the Database is between 15% and 35%____
> >
> >                  ____
> >
> >                 Are we hitting bad auto optimiziations of the newer
> >                 postgresql version?____
> >
> >                  ____
> >
> >                 Many thanks in advance,____
> >
> >                 Markus____
> >
> >              ____
> >
> >         __ __
> >
> >
> >
>

Re: Amazon RDS for PostgreSQL Support

Posted by Markus Schuch <ma...@web.de>.

would this be the right place?
https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ

Am 16.12.2017 um 15:08 schrieb Karl Wright:
> We should write up your findings in a FAQ if you find MySQL to perform
> better than postgresql.
> 
> My guess is that what you are seeing is less "bad plan", more
> "contention and backoff", but that would take some effort to tease out.
> 
> Karl
> 
> On Fri, Dec 15, 2017 at 3:06 PM, Karl Wright <daddywri@gmail.com
> <ma...@gmail.com>> wrote:
> 
>     Interesting.  Thanks for the update.
>     Karl
> 
>     On Fri, Dec 15, 2017 at 2:31 PM, Markus Schuch
>     <markus.schuch@deutschebahn.com
>     <ma...@deutschebahn.com>> wrote:
> 
>         We were no able to resolve the performance issue with the
>         carrydown table.____
> 
>         __ __
> 
>         We switched to RDS Aurora MySQL, which performs similar to the
>         MariaDB database in our on premise datacenter. No problems with
>         the carrydown table queries of the sharepoint connector.____
> 
>         __ __
> 
>         Regards____
> 
>         Markus____
> 
>         __ __
> 
>         *Von:* Karl Wright [mailto:daddywri@gmail.com
>         <ma...@gmail.com>]
>         *Gesendet:* Donnerstag, 30. November 2017 12:23
> 
> 
>         *An:* user@manifoldcf.apache.org <ma...@manifoldcf.apache.org>
>         *Betreff:* Re: Amazon RDS for PostgreSQL Support____
> 
>         __ __
> 
>         Typically Jobs table is short and sequential scans are faster
>         than index joins.  Postgres optimizes for that.____
> 
>         __ __
> 
>         No, the plans look fine.  Another reason for the long-running
>         queries might well be contention and locking -- many threads
>         will be trying to do similar things at the same time.  You will
>         note that multiple records get updated in one query; this is
>         usually helpful but when each update is expensive you could wind
>         up with locking causing delays.____
> 
>         __ __
> 
>         Karl____
> 
>         __ __
> 
>         __ __
> 
>         On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch
>         <markus.schuch@deutschebahn.com
>         <ma...@deutschebahn.com>> wrote:____
> 
>             Hi Karl,____
> 
>              ____
> 
>             we disabled autovacuum____
> 
>             The stats table show there was no autovac since then.____
> 
>              ____
> 
>             The long running queries still occur.____
> 
>             There are no other apps using the database and no other jobs
>             running.____
> 
>              ____
> 
>             But there is another long running query to the jobs table
>             between the carrydown queries.____
> 
>             This query seem to happen at the same time with the long
>             running carrydown query.____
> 
>              ____
> 
>             The plan output says “Sec Scan on jobs…”. ____
> 
>              ____
> 
>             2017-11-30 08:16:16,008 WARN  [Finisher thread]
>             org.apache.manifoldcf.db: Found a long-running query (169057
>             ms): [SELECT id FROM jobs WHERE  (status=? OR status=? OR
>             status=?) FOR UPDATE]____
> 
>             2017-11-30 08:16:16,009 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:   Parameter 0: 'A'____
> 
>             2017-11-30 08:16:16,009 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:   Parameter 1: 'W'____
> 
>             2017-11-30 08:16:16,009 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:   Parameter 2: 'R'____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan: LockRows  (cost=0.00..4.43
>             rows=3 width=14) (actual time=0.022..0.024 rows=1 loops=1)____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan:   ->  Seq Scan on jobs 
>             (cost=0.00..4.40 rows=3 width=14) (actual time=0.021..0.022
>             rows=1 loops=1)____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan:         Filter: ((status =
>             'A'::bpchar) OR (status = 'W'::bpchar) OR (status =
>             'R'::bpchar))____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan:         Rows Removed by
>             Filter: 22____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan: Planning time: 0.093 ms____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan: Execution time: 0.041 ms____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:____
> 
>              ____
> 
>             Jobs table indices:____
> 
>             public    jobs       jobs_pkey                         
>             CREATE UNIQUE INDEX jobs_pkey ON jobs USING btree (id)____
> 
>             public    jobs      
>             i1511464305264                               CREATE INDEX
>             i1511464305264 ON jobs USING btree (status, id, priority)____
> 
>             public    jobs      
>             i1511464305263                               CREATE INDEX
>             i1511464305263 ON jobs USING btree (status, processid)____
> 
>             public    jobs      
>             i1511464305262                               CREATE INDEX
>             i1511464305262 ON jobs USING btree (connectionname)____
> 
>             public    jobs      
>             i1511464305261                               CREATE INDEX
>             i1511464305261 ON jobs USING btree (failtime)____
> 
>              ____
> 
>             Shouldn’t be i1511464305264 be used?____
> 
>              ____
> 
>             Many thanks in advance____
> 
>             Markus____
> 
>              ____
> 
>             *Von:* Karl Wright [mailto:daddywri@gmail.com
>             <ma...@gmail.com>]
>             *Gesendet:* Mittwoch, 29. November 2017 23:28
>             *An:* user@manifoldcf.apache.org
>             <ma...@manifoldcf.apache.org>
>             *Betreff:* Re: Amazon RDS for PostgreSQL Support____
> 
>              ____
> 
>             The plans look good for the carrydown execution, so I have
>             to conclude that the long-running queries are due to other
>             considerations -- perhaps concurrent vacuuming, perhaps
>             other queries pounding the database.____
> 
>              ____
> 
>             Sharepoint is especially hard on the carrydown table -- it
>             gets huge.____
> 
>              ____
> 
>             Karl____
> 
>              ____
> 
>              ____
> 
>             On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch
>             <markus.schuch@deutschebahn.com
>             <ma...@deutschebahn.com>> wrote:____
> 
>                 Hi,____
> 
>                  ____
> 
>                 since nobody responded, we started to experiment.____
> 
>                  ____
> 
>                 Setup:____
> 
>                 AWS RDS Postgres 9.6.3____
> 
>                 db.m4.xlarge (4 cores, 16 GB RAM)____
> 
>                 provisioned iops 4000____
> 
>                  ____
> 
>                 Indexing Performance for most jobs is pretty good,
>                 except a for a bigger sharepoint crawl (~200.000 docs).____
> 
>                  ____
> 
>                 We are seeing a lot of long running queries for the
>                 tables carrydown here.____
> 
>                  ____
> 
>                 Logfile:____
> 
>                 https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>                 <https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt>____
> 
>                  ____
> 
>                 Settings:____
> 
>                 https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings
>                 <https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings>____
> 
>                 (due to this discussion we left autovac on:
>                 https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html
>                 <https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html>)____
> 
>                  ____
> 
>                 CPU Usage of the Database is between 15% and 35%____
> 
>                  ____
> 
>                 Are we hitting bad auto optimiziations of the newer
>                 postgresql version?____
> 
>                  ____
> 
>                 Many thanks in advance,____
> 
>                 Markus____
> 
>              ____
> 
>         __ __
> 
> 
>

Re: Amazon RDS for PostgreSQL Support

Posted by Karl Wright <da...@gmail.com>.

We should write up your findings in a FAQ if you find MySQL to perform
better than postgresql.

My guess is that what you are seeing is less "bad plan", more "contention
and backoff", but that would take some effort to tease out.

Karl

On Fri, Dec 15, 2017 at 3:06 PM, Karl Wright <da...@gmail.com> wrote:

> Interesting.  Thanks for the update.
> Karl
>
> On Fri, Dec 15, 2017 at 2:31 PM, Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
>> We were no able to resolve the performance issue with the carrydown table.
>>
>>
>>
>> We switched to RDS Aurora MySQL, which performs similar to the MariaDB
>> database in our on premise datacenter. No problems with the carrydown table
>> queries of the sharepoint connector.
>>
>>
>>
>> Regards
>>
>> Markus
>>
>>
>>
>> *Von:* Karl Wright [mailto:daddywri@gmail.com]
>> *Gesendet:* Donnerstag, 30. November 2017 12:23
>>
>> *An:* user@manifoldcf.apache.org
>> *Betreff:* Re: Amazon RDS for PostgreSQL Support
>>
>>
>>
>> Typically Jobs table is short and sequential scans are faster than index
>> joins.  Postgres optimizes for that.
>>
>>
>>
>> No, the plans look fine.  Another reason for the long-running queries
>> might well be contention and locking -- many threads will be trying to do
>> similar things at the same time.  You will note that multiple records get
>> updated in one query; this is usually helpful but when each update is
>> expensive you could wind up with locking causing delays.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch <
>> markus.schuch@deutschebahn.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> we disabled autovacuum
>>
>> The stats table show there was no autovac since then.
>>
>>
>>
>> The long running queries still occur.
>>
>> There are no other apps using the database and no other jobs running.
>>
>>
>>
>> But there is another long running query to the jobs table between the
>> carrydown queries.
>>
>> This query seem to happen at the same time with the long running
>> carrydown query.
>>
>>
>>
>> The plan output says “Sec Scan on jobs…”.
>>
>>
>>
>> 2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db:
>> Found a long-running query (169057 ms): [SELECT id FROM jobs WHERE
>> (status=? OR status=? OR status=?) FOR UPDATE]
>>
>> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:   Parameter 0: 'A'
>>
>> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:   Parameter 1: 'W'
>>
>> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:   Parameter 2: 'R'
>>
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:  Plan: LockRows  (cost=0.00..4.43 rows=3
>> width=14) (actual time=0.022..0.024 rows=1 loops=1)
>>
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:  Plan:   ->  Seq Scan on jobs  (cost=0.00..4.40
>> rows=3 width=14) (actual time=0.021..0.022 rows=1 loops=1)
>>
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:  Plan:         Filter: ((status = 'A'::bpchar) OR
>> (status = 'W'::bpchar) OR (status = 'R'::bpchar))
>>
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:  Plan:         Rows Removed by Filter: 22
>>
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:  Plan: Planning time: 0.093 ms
>>
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread]
>> org.apache.manifoldcf.db:  Plan: Execution time: 0.041 ms
>>
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
>>
>>
>>
>> Jobs table indices:
>>
>> public    jobs       jobs_pkey                          CREATE UNIQUE
>> INDEX jobs_pkey ON jobs USING btree (id)
>>
>> public    jobs       i1511464305264                               CREATE
>> INDEX i1511464305264 ON jobs USING btree (status, id, priority)
>>
>> public    jobs       i1511464305263                               CREATE
>> INDEX i1511464305263 ON jobs USING btree (status, processid)
>>
>> public    jobs       i1511464305262                               CREATE
>> INDEX i1511464305262 ON jobs USING btree (connectionname)
>>
>> public    jobs       i1511464305261                               CREATE
>> INDEX i1511464305261 ON jobs USING btree (failtime)
>>
>>
>>
>> Shouldn’t be i1511464305264 be used?
>>
>>
>>
>> Many thanks in advance
>>
>> Markus
>>
>>
>>
>> *Von:* Karl Wright [mailto:daddywri@gmail.com]
>> *Gesendet:* Mittwoch, 29. November 2017 23:28
>> *An:* user@manifoldcf.apache.org
>> *Betreff:* Re: Amazon RDS for PostgreSQL Support
>>
>>
>>
>> The plans look good for the carrydown execution, so I have to conclude
>> that the long-running queries are due to other considerations -- perhaps
>> concurrent vacuuming, perhaps other queries pounding the database.
>>
>>
>>
>> Sharepoint is especially hard on the carrydown table -- it gets huge.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <
>> markus.schuch@deutschebahn.com> wrote:
>>
>> Hi,
>>
>>
>>
>> since nobody responded, we started to experiment.
>>
>>
>>
>> Setup:
>>
>> AWS RDS Postgres 9.6.3
>>
>> db.m4.xlarge (4 cores, 16 GB RAM)
>>
>> provisioned iops 4000
>>
>>
>>
>> Indexing Performance for most jobs is pretty good, except a for a bigger
>> sharepoint crawl (~200.000 docs).
>>
>>
>>
>> We are seeing a lot of long running queries for the tables carrydown here.
>>
>>
>>
>> Logfile:
>>
>> https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a
>> 52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>>
>>
>>
>> Settings:
>>
>> https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a
>> 910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/
>> postgresql%2520RDS%2520settings
>>
>> (due to this discussion we left autovac on:
>> https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html)
>>
>>
>>
>> CPU Usage of the Database is between 15% and 35%
>>
>>
>>
>> Are we hitting bad auto optimiziations of the newer postgresql version?
>>
>>
>>
>> Many thanks in advance,
>>
>> Markus
>>
>>
>>
>>
>>
>
>

Re: Amazon RDS for PostgreSQL Support

Posted by Karl Wright <da...@gmail.com>.

Interesting.  Thanks for the update.
Karl

On Fri, Dec 15, 2017 at 2:31 PM, Markus Schuch <
markus.schuch@deutschebahn.com> wrote:

> We were no able to resolve the performance issue with the carrydown table.
>
>
>
> We switched to RDS Aurora MySQL, which performs similar to the MariaDB
> database in our on premise datacenter. No problems with the carrydown table
> queries of the sharepoint connector.
>
>
>
> Regards
>
> Markus
>
>
>
> *Von:* Karl Wright [mailto:daddywri@gmail.com]
> *Gesendet:* Donnerstag, 30. November 2017 12:23
>
> *An:* user@manifoldcf.apache.org
> *Betreff:* Re: Amazon RDS for PostgreSQL Support
>
>
>
> Typically Jobs table is short and sequential scans are faster than index
> joins.  Postgres optimizes for that.
>
>
>
> No, the plans look fine.  Another reason for the long-running queries
> might well be contention and locking -- many threads will be trying to do
> similar things at the same time.  You will note that multiple records get
> updated in one query; this is usually helpful but when each update is
> expensive you could wind up with locking causing delays.
>
>
>
> Karl
>
>
>
>
>
> On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
> Hi Karl,
>
>
>
> we disabled autovacuum
>
> The stats table show there was no autovac since then.
>
>
>
> The long running queries still occur.
>
> There are no other apps using the database and no other jobs running.
>
>
>
> But there is another long running query to the jobs table between the
> carrydown queries.
>
> This query seem to happen at the same time with the long running carrydown
> query.
>
>
>
> The plan output says “Sec Scan on jobs…”.
>
>
>
> 2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Found a long-running query (169057 ms): [SELECT id FROM jobs WHERE
> (status=? OR status=? OR status=?) FOR UPDATE]
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 0: 'A'
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 1: 'W'
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 2: 'R'
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: LockRows  (cost=0.00..4.43 rows=3 width=14) (actual time=0.022..0.024
> rows=1 loops=1)
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:   ->  Seq Scan on jobs  (cost=0.00..4.40 rows=3 width=14) (actual
> time=0.021..0.022 rows=1 loops=1)
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:         Filter: ((status = 'A'::bpchar) OR (status = 'W'::bpchar) OR
> (status = 'R'::bpchar))
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:         Rows Removed by Filter: 22
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: Planning time: 0.093 ms
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: Execution time: 0.041 ms
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
>
>
>
> Jobs table indices:
>
> public    jobs       jobs_pkey                          CREATE UNIQUE
> INDEX jobs_pkey ON jobs USING btree (id)
>
> public    jobs       i1511464305264                               CREATE
> INDEX i1511464305264 ON jobs USING btree (status, id, priority)
>
> public    jobs       i1511464305263                               CREATE
> INDEX i1511464305263 ON jobs USING btree (status, processid)
>
> public    jobs       i1511464305262                               CREATE
> INDEX i1511464305262 ON jobs USING btree (connectionname)
>
> public    jobs       i1511464305261                               CREATE
> INDEX i1511464305261 ON jobs USING btree (failtime)
>
>
>
> Shouldn’t be i1511464305264 be used?
>
>
>
> Many thanks in advance
>
> Markus
>
>
>
> *Von:* Karl Wright [mailto:daddywri@gmail.com]
> *Gesendet:* Mittwoch, 29. November 2017 23:28
> *An:* user@manifoldcf.apache.org
> *Betreff:* Re: Amazon RDS for PostgreSQL Support
>
>
>
> The plans look good for the carrydown execution, so I have to conclude
> that the long-running queries are due to other considerations -- perhaps
> concurrent vacuuming, perhaps other queries pounding the database.
>
>
>
> Sharepoint is especially hard on the carrydown table -- it gets huge.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
> Hi,
>
>
>
> since nobody responded, we started to experiment.
>
>
>
> Setup:
>
> AWS RDS Postgres 9.6.3
>
> db.m4.xlarge (4 cores, 16 GB RAM)
>
> provisioned iops 4000
>
>
>
> Indexing Performance for most jobs is pretty good, except a for a bigger
> sharepoint crawl (~200.000 docs).
>
>
>
> We are seeing a lot of long running queries for the tables carrydown here.
>
>
>
> Logfile:
>
> https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3
> e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>
>
>
> Settings:
>
> https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff8
> 35/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%
> 2520settings
>
> (due to this discussion we left autovac on: https://www.mail-archive.com/
> user@manifoldcf.apache.org/msg03512.html)
>
>
>
> CPU Usage of the Database is between 15% and 35%
>
>
>
> Are we hitting bad auto optimiziations of the newer postgresql version?
>
>
>
> Many thanks in advance,
>
> Markus
>
>
>
>
>

AW: Amazon RDS for PostgreSQL Support

Posted by Markus Schuch <ma...@deutschebahn.com>.

We were no able to resolve the performance issue with the carrydown table.

We switched to RDS Aurora MySQL, which performs similar to the MariaDB database in our on premise datacenter. No problems with the carrydown table queries of the sharepoint connector.

Regards
Markus

Von: Karl Wright [mailto:daddywri@gmail.com]
Gesendet: Donnerstag, 30. November 2017 12:23
An: user@manifoldcf.apache.org
Betreff: Re: Amazon RDS for PostgreSQL Support

Typically Jobs table is short and sequential scans are faster than index joins.  Postgres optimizes for that.

No, the plans look fine.  Another reason for the long-running queries might well be contention and locking -- many threads will be trying to do similar things at the same time.  You will note that multiple records get updated in one query; this is usually helpful but when each update is expensive you could wind up with locking causing delays.

Karl


On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch <ma...@deutschebahn.com>> wrote:
Hi Karl,

we disabled autovacuum
The stats table show there was no autovac since then.

The long running queries still occur.
There are no other apps using the database and no other jobs running.

But there is another long running query to the jobs table between the carrydown queries.
This query seem to happen at the same time with the long running carrydown query.

The plan output says “Sec Scan on jobs…”.

2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db: Found a long-running query (169057 ms): [SELECT id FROM jobs WHERE  (status=? OR status=? OR status=?) FOR UPDATE]
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 0: 'A'
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 1: 'W'
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 2: 'R'
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: LockRows  (cost=0.00..4.43 rows=3 width=14) (actual time=0.022..0.024 rows=1 loops=1)
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:   ->  Seq Scan on jobs  (cost=0.00..4.40 rows=3 width=14) (actual time=0.021..0.022 rows=1 loops=1)
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Filter: ((status = 'A'::bpchar) OR (status = 'W'::bpchar) OR (status = 'R'::bpchar))
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Rows Removed by Filter: 22
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Planning time: 0.093 ms
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Execution time: 0.041 ms
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:

Jobs table indices:
public    jobs       jobs_pkey                          CREATE UNIQUE INDEX jobs_pkey ON jobs USING btree (id)
public    jobs       i1511464305264                               CREATE INDEX i1511464305264 ON jobs USING btree (status, id, priority)
public    jobs       i1511464305263                               CREATE INDEX i1511464305263 ON jobs USING btree (status, processid)
public    jobs       i1511464305262                               CREATE INDEX i1511464305262 ON jobs USING btree (connectionname)
public    jobs       i1511464305261                               CREATE INDEX i1511464305261 ON jobs USING btree (failtime)

Shouldn’t be i1511464305264 be used?

Many thanks in advance
Markus

Von: Karl Wright [mailto:daddywri@gmail.com<ma...@gmail.com>]
Gesendet: Mittwoch, 29. November 2017 23:28
An: user@manifoldcf.apache.org<ma...@manifoldcf.apache.org>
Betreff: Re: Amazon RDS for PostgreSQL Support

The plans look good for the carrydown execution, so I have to conclude that the long-running queries are due to other considerations -- perhaps concurrent vacuuming, perhaps other queries pounding the database.

Sharepoint is especially hard on the carrydown table -- it gets huge.

Karl


On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <ma...@deutschebahn.com>> wrote:
Hi,

since nobody responded, we started to experiment.

Setup:
AWS RDS Postgres 9.6.3
db.m4.xlarge (4 cores, 16 GB RAM)
provisioned iops 4000

Indexing Performance for most jobs is pretty good, except a for a bigger sharepoint crawl (~200.000 docs).

We are seeing a lot of long running queries for the tables carrydown here.

Logfile:
https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt

Settings:
https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings
(due to this discussion we left autovac on: https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html)

CPU Usage of the Database is between 15% and 35%

Are we hitting bad auto optimiziations of the newer postgresql version?

Many thanks in advance,
Markus

Re: Amazon RDS for PostgreSQL Support

Posted by Karl Wright <da...@gmail.com>.

Typically Jobs table is short and sequential scans are faster than index
joins.  Postgres optimizes for that.

No, the plans look fine.  Another reason for the long-running queries might
well be contention and locking -- many threads will be trying to do similar
things at the same time.  You will note that multiple records get updated
in one query; this is usually helpful but when each update is expensive you
could wind up with locking causing delays.

Karl


On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch <
markus.schuch@deutschebahn.com> wrote:

> Hi Karl,
>
>
>
> we disabled autovacuum
>
> The stats table show there was no autovac since then.
>
>
>
> The long running queries still occur.
>
> There are no other apps using the database and no other jobs running.
>
>
>
> But there is another long running query to the jobs table between the
> carrydown queries.
>
> This query seem to happen at the same time with the long running carrydown
> query.
>
>
>
> The plan output says “Sec Scan on jobs…”.
>
>
>
> 2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Found a long-running query (169057 ms): [SELECT id FROM jobs WHERE
> (status=? OR status=? OR status=?) FOR UPDATE]
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 0: 'A'
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 1: 'W'
>
> 2017-11-30 08:16:16,009 WARN  [Finisher thread]
> org.apache.manifoldcf.db:   Parameter 2: 'R'
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: LockRows  (cost=0.00..4.43 rows=3 width=14) (actual time=0.022..0.024
> rows=1 loops=1)
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:   ->  Seq Scan on jobs  (cost=0.00..4.40 rows=3 width=14) (actual
> time=0.021..0.022 rows=1 loops=1)
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:         Filter: ((status = 'A'::bpchar) OR (status = 'W'::bpchar) OR
> (status = 'R'::bpchar))
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan:         Rows Removed by Filter: 22
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: Planning time: 0.093 ms
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
> Plan: Execution time: 0.041 ms
>
> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
>
>
>
> Jobs table indices:
>
> public    jobs       jobs_pkey                          CREATE UNIQUE
> INDEX jobs_pkey ON jobs USING btree (id)
>
> public    jobs       i1511464305264                               CREATE
> INDEX i1511464305264 ON jobs USING btree (status, id, priority)
>
> public    jobs       i1511464305263                               CREATE
> INDEX i1511464305263 ON jobs USING btree (status, processid)
>
> public    jobs       i1511464305262                               CREATE
> INDEX i1511464305262 ON jobs USING btree (connectionname)
>
> public    jobs       i1511464305261                               CREATE
> INDEX i1511464305261 ON jobs USING btree (failtime)
>
>
>
> Shouldn’t be i1511464305264 be used?
>
>
>
> Many thanks in advance
>
> Markus
>
>
>
> *Von:* Karl Wright [mailto:daddywri@gmail.com]
> *Gesendet:* Mittwoch, 29. November 2017 23:28
> *An:* user@manifoldcf.apache.org
> *Betreff:* Re: Amazon RDS for PostgreSQL Support
>
>
>
> The plans look good for the carrydown execution, so I have to conclude
> that the long-running queries are due to other considerations -- perhaps
> concurrent vacuuming, perhaps other queries pounding the database.
>
>
>
> Sharepoint is especially hard on the carrydown table -- it gets huge.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
> Hi,
>
>
>
> since nobody responded, we started to experiment.
>
>
>
> Setup:
>
> AWS RDS Postgres 9.6.3
>
> db.m4.xlarge (4 cores, 16 GB RAM)
>
> provisioned iops 4000
>
>
>
> Indexing Performance for most jobs is pretty good, except a for a bigger
> sharepoint crawl (~200.000 docs).
>
>
>
> We are seeing a lot of long running queries for the tables carrydown here.
>
>
>
> Logfile:
>
> https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3
> e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>
>
>
> Settings:
>
> https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff8
> 35/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%
> 2520settings
>
> (due to this discussion we left autovac on: https://www.mail-archive.com/
> user@manifoldcf.apache.org/msg03512.html)
>
>
>
> CPU Usage of the Database is between 15% and 35%
>
>
>
> Are we hitting bad auto optimiziations of the newer postgresql version?
>
>
>
> Many thanks in advance,
>
> Markus
>
>
>

AW: Amazon RDS for PostgreSQL Support

Posted by Markus Schuch <ma...@deutschebahn.com>.

Hi Karl,

we disabled autovacuum
The stats table show there was no autovac since then.

The long running queries still occur.
There are no other apps using the database and no other jobs running.

But there is another long running query to the jobs table between the carrydown queries.
This query seem to happen at the same time with the long running carrydown query.

The plan output says “Sec Scan on jobs…”.

2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db: Found a long-running query (169057 ms): [SELECT id FROM jobs WHERE  (status=? OR status=? OR status=?) FOR UPDATE]
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 0: 'A'
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 1: 'W'
2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 2: 'R'
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: LockRows  (cost=0.00..4.43 rows=3 width=14) (actual time=0.022..0.024 rows=1 loops=1)
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:   ->  Seq Scan on jobs  (cost=0.00..4.40 rows=3 width=14) (actual time=0.021..0.022 rows=1 loops=1)
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Filter: ((status = 'A'::bpchar) OR (status = 'W'::bpchar) OR (status = 'R'::bpchar))
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Rows Removed by Filter: 22
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Planning time: 0.093 ms
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Execution time: 0.041 ms
2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:

Jobs table indices:
public    jobs       jobs_pkey                          CREATE UNIQUE INDEX jobs_pkey ON jobs USING btree (id)
public    jobs       i1511464305264                               CREATE INDEX i1511464305264 ON jobs USING btree (status, id, priority)
public    jobs       i1511464305263                               CREATE INDEX i1511464305263 ON jobs USING btree (status, processid)
public    jobs       i1511464305262                               CREATE INDEX i1511464305262 ON jobs USING btree (connectionname)
public    jobs       i1511464305261                               CREATE INDEX i1511464305261 ON jobs USING btree (failtime)

Shouldn’t be i1511464305264 be used?

Many thanks in advance
Markus

Von: Karl Wright [mailto:daddywri@gmail.com]
Gesendet: Mittwoch, 29. November 2017 23:28
An: user@manifoldcf.apache.org
Betreff: Re: Amazon RDS for PostgreSQL Support

The plans look good for the carrydown execution, so I have to conclude that the long-running queries are due to other considerations -- perhaps concurrent vacuuming, perhaps other queries pounding the database.

Sharepoint is especially hard on the carrydown table -- it gets huge.

Karl


On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <ma...@deutschebahn.com>> wrote:
Hi,

since nobody responded, we started to experiment.

Setup:
AWS RDS Postgres 9.6.3
db.m4.xlarge (4 cores, 16 GB RAM)
provisioned iops 4000

Indexing Performance for most jobs is pretty good, except a for a bigger sharepoint crawl (~200.000 docs).

We are seeing a lot of long running queries for the tables carrydown here.

Logfile:
https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt

Settings:
https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings
(due to this discussion we left autovac on: https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html)

CPU Usage of the Database is between 15% and 35%

Are we hitting bad auto optimiziations of the newer postgresql version?

Many thanks in advance,
Markus

Re: Amazon RDS for PostgreSQL Support

Posted by Karl Wright <da...@gmail.com>.

The plans look good for the carrydown execution, so I have to conclude that
the long-running queries are due to other considerations -- perhaps
concurrent vacuuming, perhaps other queries pounding the database.

Sharepoint is especially hard on the carrydown table -- it gets huge.

Karl


On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <
markus.schuch@deutschebahn.com> wrote:

> Hi,
>
>
>
> since nobody responded, we started to experiment.
>
>
>
> Setup:
>
> AWS RDS Postgres 9.6.3
>
> db.m4.xlarge (4 cores, 16 GB RAM)
>
> provisioned iops 4000
>
>
>
> Indexing Performance for most jobs is pretty good, except a for a bigger
> sharepoint crawl (~200.000 docs).
>
>
>
> We are seeing a lot of long running queries for the tables carrydown here.
>
>
>
> Logfile:
>
> https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3
> e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>
>
>
> Settings:
>
> https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff8
> 35/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%
> 2520settings
>
> (due to this discussion we left autovac on: https://www.mail-archive.com/
> user@manifoldcf.apache.org/msg03512.html)
>
>
>
> CPU Usage of the Database is between 15% and 35%
>
>
>
> Are we hitting bad auto optimiziations of the newer postgresql version?
>
>
>
> Many thanks in advance,
>
> Markus
>