Posted to user@manifoldcf.apache.org by Standen Guy <Gu...@uk.fujitsu.com> on 2018/08/03 15:32:02 UTC

PostgreSQL version to support MCF v2.10

Hi Karl/All,
I am upgrading from MCF v2.6, backed by PostgreSQL v9.3.16, to MCF v2.10. Is there any official advice on which PostgreSQL versions support MCF v2.10? The MCF v2.10 build and deployment instructions still list PostgreSQL 9.3 as the latest tested version. Given that PostgreSQL 9.3.x reaches end of life next month (September 2018), is there a preferred newer version that should be used?

As an experiment I have installed MCF 2.10 backed by PostgreSQL 10.4. From the outside everything seems to work, but the PostgreSQL logs show a lot of warnings and errors:

e.g.
"2018-08-03 15:50:00.629 BST [7920] LOG:  database system was shut down at 2018-08-03 15:47:30 BST
2018-08-03 15:50:00.734 BST [6344] LOG:  database system is ready to accept connections
2018-08-03 15:52:11.140 BST [6460] WARNING:  there is already a transaction in progress
2018-08-03 15:52:11.219 BST [6460] WARNING:  there is no transaction in progress
2018-08-03 15:52:13.844 BST [5716] WARNING:  there is already a transaction in progress
2018-08-03 15:52:13.879 BST [5716] WARNING:  there is no transaction in progress
2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:25.218 BST [4140] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
2018-08-03 15:52:25.219 BST [5800] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.219 BST [5800] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.219 BST [5800] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:25.219 BST [5800] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
2018-08-03 15:52:25.222 BST [5692] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.222 BST [5692] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.222 BST [5692] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:25.222 BST [5692] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
2018-08-03 15:52:28.149 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:28.149 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2018-08-03 15:52:28.149 BST [4140] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:28.149 BST [4140] STATEMENT:  UPDATE intrinsiclink SET processid=$1,isnew=$2 WHERE jobid=$3 AND parentidhash=$4 AND linktype=$5 AND childidhash=$6
2018-08-03 15:52:28.261 BST [5156] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:28.261 BST [5156] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2018-08-03 15:52:28.261 BST [5156] HINT:  The transaction might succeed if retried."

And

"2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [5716] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.856 BST [1328] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.856 BST [1328] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.856 BST [5800] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.856 BST [5800] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE"

Do you have any advice on whether it is sensible to use PostgreSQL v10.x and, if so, whether these errors can be overcome?
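
The HINT lines in the log excerpts say the failed transaction might succeed if retried. Both messages shown ("could not serialize access due to read/write dependencies among transactions" and "could not serialize access due to concurrent update") are reported by PostgreSQL with SQLSTATE 40001 (serialization_failure). As a minimal sketch of how a JDBC client could act on that hint, assuming each unit of work begins and rolls back its own transaction, a retry loop might look like the following. The names SerializationRetry, SqlWork and withRetry are hypothetical and are not taken from the ManifoldCF code base; the log alone does not show whether MCF already retries these internally.

```java
import java.sql.SQLException;

final class SerializationRetry {
    /** A unit of database work; assumed to open, commit and roll back its own transaction. */
    @FunctionalInterface
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    // SQLSTATE 40001 is PostgreSQL's serialization_failure error code.
    private static final String SERIALIZATION_FAILURE = "40001";

    /**
     * Runs the work, retrying only when the database reports a serialization
     * failure. Any other SQLException is rethrown immediately.
     */
    static <T> T withRetry(SqlWork<T> work, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!SERIALIZATION_FAILURE.equals(e.getSQLState())) {
                    throw e; // not a serialization failure: do not retry
                }
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt hit a serialization failure
    }
}
```

If an application retries in this way and the retries succeed, the ERROR entries in the server log record aborted attempts rather than lost work.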

Best Regards,

Guy


Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
yes

On Wed, Sep 5, 2018 at 12:10 PM Steph van Schalkwyk <st...@remcam.net>
wrote:

> Thank you. So I'll stop for now?
> Steph
>
>
>
>
> *Steph van Schalkwyk*
> Principal, Remcam Search Engines
> +1.314.452. <+1+314+452+2896>2896    steph@remcam.net   http://remcam.net
> <http://www.remcam.net/> Skype: svanschalkwyk
> <https://mail.google.com/mail/u/0/#>
> <http://linkedin.com/in/vanschalkwyk>
>
> On Wed, Sep 5, 2018 at 11:05 AM, Karl Wright <da...@gmail.com> wrote:
>
>> I'm already working on the Web Connector.  The UI has problems that
>> predate this change and I've alerted Kishore about them -- he'll look into
>> them later today.
>>
>> Karl
>>
>>
>> On Wed, Sep 5, 2018 at 11:55 AM Steph van Schalkwyk <st...@remcam.net>
>> wrote:
>>
>>> Thank you Karl.
>>> You are of course correct in that the incremental crawl is now broken in
>>> that it does a full crawl every time.
>>> I'll jump on the Web Connector and add that functionality.
>>> Thanks for this excellent application and all the help over the years.
>>> Steph
>>>
>>>
>>> On Wed, Sep 5, 2018 at 6:33 AM, Karl Wright <da...@gmail.com> wrote:
>>>
>>>> The patch I uploaded doesn't work because the entire tab is broken;
>>>> looks like the UI refactoring broke it and it was never reported.  Fixing
>>>> now.
>>>> Karl
>>>>
>>>>
>>>> On Wed, Sep 5, 2018 at 3:57 AM Karl Wright <da...@gmail.com> wrote:
>>>>
>>>>> I coded up the web connector feature I think we need.  See
>>>>> CONNECTORS-1528; I've attached a patch.  Please apply and test it out to
>>>>> see if it solves the case problem for your IIS site.
>>>>>
>>>>> For the "//" issue, can you be more specific about the mapping you
>>>>> need to do?
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Tue, Sep 4, 2018 at 4:17 PM Karl Wright <da...@gmail.com> wrote:
>>>>>
>>>>>> Hi Steph,
>>>>>>
>>>>>> Right, you wouldn't want to touch the framework.
>>>>>>
>>>>>> The effect of lower-casing the documentURI parameter in the
>>>>>> addOrReplaceDocumentWithException method in an output connector would be to
>>>>>> map multiple, independently-fetched, documents that differ only by the case
>>>>>> of the URL together into one document in the index.  The ManifoldCF
>>>>>> assumption is that a document with a certain URI can be tracked in the
>>>>>> index using exactly that URI.  Mapping the URI to lower case would break
>>>>>> that assumption so the framework would make the wrong decision in many
>>>>>> cases.
>>>>>>
>>>>>> If you are picking up documents using the web connector, and you are
>>>>>> getting duplicate documents because the document URLs are sloppy, it is
>>>>>> therefore essential that INSTEAD of mapping the document URI to lower
>>>>>> case in the output connector, you map to lower case in the repository
>>>>>> connector.  Otherwise the framework will not work right.
>>>>>>
>>>>>> There is a tab in the web connector that allows you to configure URL
>>>>>> normalization, called "Canonicalization".  This would be a very appropriate
>>>>>> place to add URL mapping to lower case.  It should be as simple as adding
>>>>>> one more checkbox column in the table, and modifying the method that does
>>>>>> the URL processing to include lower-casing.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 2:46 PM Steph van Schalkwyk <st...@remcam.net>
>>>>>> wrote:
>>>>>>
>>>>>>> Unless I have a massive misunderstanding somewhere...
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 1:42 PM, Steph van Schalkwyk <
>>>>>>> steph@remcam.net> wrote:
>>>>>>>
>>>>>>>> Hi Karl
>>>>>>>> I'm addressing it in the ES Output Connector.
>>>>>>>> Not touching the framework :)
>>>>>>>> S
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright <da...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Let's make sure we're talking about the same thing.
>>>>>>>>>
>>>>>>>>> Here is the output connector method that receives the ID (as the
>>>>>>>>> documentURI parameter):
>>>>>>>>>
>>>>>>>>>   public int addOrReplaceDocumentWithException(String documentURI,
>>>>>>>>> VersionContext pipelineDescription, RepositoryDocument document, String
>>>>>>>>> authorityNameString, IOutputAddActivity activities)
>>>>>>>>>     throws ManifoldCFException, ServiceInterruption, IOException;
>>>>>>>>>
>>>>>>>>> ManifoldCF doesn't say anywhere that this ID is case insensitive.
>>>>>>>>> If you make it case insensitive in an output connector, this will
>>>>>>>>> potentially break a lot of things, for example incremental indexing (which
>>>>>>>>> organizes the last indexed version by document ID).
>>>>>>>>>
>>>>>>>>> I therefore highly recommend that any "sloppiness" in this
>>>>>>>>> parameter be addressed in the Repository Connector that constructs it.  If
>>>>>>>>> the connector is crawling a repository that believes that URLs are case
>>>>>>>>> insensitive then it should map these IDs to lower case.  If not, then it
>>>>>>>>> shouldn't.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 1:36 PM Steph van Schalkwyk <
>>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl.
>>>>>>>>>> The issue is that the ES Output Connector uses the uri to create
>>>>>>>>>> the _id. When used with IIS which allows case variation in the URI, it
>>>>>>>>>> creates multiple documents. Clients on Windows IIS are rarely cognizant of
>>>>>>>>>> that issue as IIS is so lax in policing that OTB.
>>>>>>>>>> Currently, every case variation in URI results in a new doc in
>>>>>>>>>> the index. This is only in the ES output connector.
>>>>>>>>>> I can add an optional checkbox to control that particular
>>>>>>>>>> action if that would help?
>>>>>>>>>> Regards,
>>>>>>>>>> Steph
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright <da...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the update.
>>>>>>>>>>> Lower-casing the ID would be fine except there are some
>>>>>>>>>>> connectors that care about case.  The web connector is one such because
>>>>>>>>>>> it's up to the web service to decide if case matters, so the web connector
>>>>>>>>>>> does not view URLs with case differences as being the same.  Other
>>>>>>>>>>> connectors will likely care as well. So I don't think lower-casing the
>>>>>>>>>>> document id is a smart thing to do.
>>>>>>>>>>>
>>>>>>>>>>> You could add this bit of configuration to the web connector, if
>>>>>>>>>>> that's what you are using, or to whatever other connector constructs the ID.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 4, 2018 at 12:04 PM Steph van Schalkwyk <
>>>>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Karl.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll look into that.
>>>>>>>>>>>>
>>>>>>>>>>>> Another note:
>>>>>>>>>>>> Regarding the ES connector - I have made three additions to it
>>>>>>>>>>>> and should probably diff them for inclusion after approval:
>>>>>>>>>>>> 1. Lowercased the _id (the doc URI).
>>>>>>>>>>>> 2. Removed doubled "/" (e.g. "//") in the _id (I have sloppy
>>>>>>>>>>>> sources, particularly IIS...)
>>>>>>>>>>>> 3. Added a "url" metadata field to the ES connector (as ES 6.x
>>>>>>>>>>>> does not allow access to _id in the schema anymore, so no copy_field etc.
>>>>>>>>>>>> from _id). Hence "url".
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Steph
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 10:50 AM, Karl Wright <
>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Steph, I suspect that Jetty is leaking some resource, and
>>>>>>>>>>>>> we may need to upgrade it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk <
>>>>>>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Olivier
>>>>>>>>>>>>>> By all means.
>>>>>>>>>>>>>> The only issue I have seen (totally unrelated) is with Jetty,
>>>>>>>>>>>>>> which has to be restarted about once a week. Still trying to find the issue.
>>>>>>>>>>>>>> I may be overly sensitive, but I suspect MCF 2.10 with
>>>>>>>>>>>>>> Postgres 10 may be a bit slower. I have no empirical evidence at the moment as
>>>>>>>>>>>>>> I'm still delivering the project to UAT. Will keep you posted.
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Steph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 9:59 AM, Olivier Tavard <
>>>>>>>>>>>>>> olivier.tavard@francelabs.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks a lot for sharing your PostgreSQL configuration
>>>>>>>>>>>>>>> (sorry for the late answer). I will test it soon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Olivier TAVARD
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 23 August 2018 at 19:20, Steph van Schalkwyk <steph@remcam.net> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These are the rpm installs:
>>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-libs-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-contrib-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-devel-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-server-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> postgresql_version: 10
>>>>>>>>>>>>>>> postgresql_data_dir: /var/lib/pgsql/10/data
>>>>>>>>>>>>>>> postgresql_bin_path: /usr/pgsql-10/bin
>>>>>>>>>>>>>>> postgresql_config_path: /var/lib/pgsql/10/data
>>>>>>>>>>>>>>> postgresql_daemon: postgresql-10.service
>>>>>>>>>>>>>>> postgresql_packages:
>>>>>>>>>>>>>>> - postgresql10-libs
>>>>>>>>>>>>>>> - postgresql10
>>>>>>>>>>>>>>> - postgresql10-server
>>>>>>>>>>>>>>> - postgresql10-contrib
>>>>>>>>>>>>>>> # - postgresql10-devel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> postgresql_hba_entries:
>>>>>>>>>>>>>>> - { type: local, database: all, user: postgres, auth_method: peer }
>>>>>>>>>>>>>>> - { type: local, database: all, user: all, auth_method: peer }
>>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '127.0.0.1/32', auth_method: md5 }
>>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '::1/128', auth_method: md5 }
>>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '0.0.0.0/0', auth_method: md5 }
>>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '::0/0', auth_method: md5 }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> postgresql_global_config_options:
>>>>>>>>>>>>>>> - option: unix_socket_directories
>>>>>>>>>>>>>>>   value: '{{ postgresql_unix_socket_directories | join(",") }}'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: standard_conforming_strings
>>>>>>>>>>>>>>>   value: 'on'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: shared_buffers
>>>>>>>>>>>>>>>   value: '1024MB'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # max_wal_size = (3 * checkpoint_segments) * 16MB
>>>>>>>>>>>>>>> # checkpoint_segments=300
>>>>>>>>>>>>>>> - option: max_wal_size
>>>>>>>>>>>>>>>   value: '14400MB'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: min_wal_size
>>>>>>>>>>>>>>>   value: '80MB'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: maintenance_work_mem
>>>>>>>>>>>>>>>   value: '2MB'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: listen_addresses
>>>>>>>>>>>>>>>   value: '*'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: max_connections
>>>>>>>>>>>>>>>   value: '400'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: checkpoint_timeout
>>>>>>>>>>>>>>>   value: '900'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: datestyle
>>>>>>>>>>>>>>>   value: "iso, mdy"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - option: autovacuum
>>>>>>>>>>>>>>>   value: 'off'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # vacuum all databases every night (full vacuum on Sunday night, lazy vacuum every night)
>>>>>>>>>>>>>>> - name: add postgresql cron lazy vacuum
>>>>>>>>>>>>>>>   cron:
>>>>>>>>>>>>>>>     name: lazy_vacuum
>>>>>>>>>>>>>>>     hour: 8
>>>>>>>>>>>>>>>     minute: 0
>>>>>>>>>>>>>>>     job: "su - postgres -c 'vacuumdb --all --analyze --quiet'"
>>>>>>>>>>>>>>> - name: add postgresql cron full vacuum
>>>>>>>>>>>>>>>   cron:
>>>>>>>>>>>>>>>     name: full_vacuum
>>>>>>>>>>>>>>>     weekday: 0
>>>>>>>>>>>>>>>     hour: 10
>>>>>>>>>>>>>>>     minute: 0
>>>>>>>>>>>>>>>     job: "su - postgres -c 'vacuumdb --all --full --analyze --quiet'"
>>>>>>>>>>>>>>> # re-index all databases once a week
>>>>>>>>>>>>>>> - name: add postgresql cron reindex
>>>>>>>>>>>>>>>   cron:
>>>>>>>>>>>>>>>     name: reindex
>>>>>>>>>>>>>>>     weekday: 0
>>>>>>>>>>>>>>>     hour: 12
>>>>>>>>>>>>>>>     minute: 0
>>>>>>>>>>>>>>>     job: "su - postgres -c 'psql -t -c \"select datname from pg_database order by datname;\" | xargs -n 1 -I\"{}\" -- psql -U postgres {} -c \"reindex database {};\"' "
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is how I run 2.10.
>>>>>>>>>>>>>>> Been running fine for some weeks without user intervention.
>>>>>>>>>>>>>>> @Karl: Any comments please?
>>>>>>>>>>>>>>> Steph
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>
>
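
The quoted discussion above recommends normalizing URL case where the document identifier is constructed (the repository connector) rather than in the output connector. A minimal sketch of such a normalization step follows, assuming hierarchical http(s) URLs. It is not the CONNECTORS-1528 patch: the class name UrlIdNormalizer, its method and both flags are hypothetical. Collapsing repeated slashes in the path is only one possible reading of the "//" issue, since the exact mapping is left open in the thread. Fragments are dropped for brevity.

```java
import java.net.URI;
import java.util.Locale;

final class UrlIdNormalizer {
    /**
     * Builds a document identifier from a URL.
     * lowerCase: fold the whole URL to lower case (only safe for case-insensitive servers).
     * collapseSlashes: replace runs of "/" in the path with a single "/".
     * Throws IllegalArgumentException if the URL cannot be parsed.
     */
    static String normalize(String url, boolean lowerCase, boolean collapseSlashes) {
        URI parsed = URI.create(url);
        String path = parsed.getRawPath();
        if (collapseSlashes && path != null) {
            // Only the path is touched, so the "//" after the scheme is preserved.
            path = path.replaceAll("/{2,}", "/");
        }
        StringBuilder id = new StringBuilder();
        if (parsed.getScheme() != null) {
            id.append(parsed.getScheme()).append(':');
        }
        if (parsed.getRawAuthority() != null) {
            id.append("//").append(parsed.getRawAuthority());
        }
        if (path != null) {
            id.append(path);
        }
        if (parsed.getRawQuery() != null) {
            id.append('?').append(parsed.getRawQuery());
        }
        String result = id.toString();
        // Locale.ROOT avoids locale-specific case rules (for example the Turkish dotless i).
        return lowerCase ? result.toLowerCase(Locale.ROOT) : result;
    }
}
```

With both flags set, http://Host/A//b and http://host/a/B yield the same identifier, so the framework would track one document instead of two, and every downstream component sees the identifier the crawler actually used.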

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
Thank you. So I'll stop for now?
Steph




*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452. <+1+314+452+2896>2896    steph@remcam.net   http://remcam.net
<http://www.remcam.net/> Skype: svanschalkwyk
<https://mail.google.com/mail/u/0/#>
<http://linkedin.com/in/vanschalkwyk>

On Wed, Sep 5, 2018 at 11:05 AM, Karl Wright <da...@gmail.com> wrote:

> I'm already working on the Web Connector.  The UI has problems that
> predate this change and I've alerted Kishore about them -- he'll look into
> them later today.
>
> Karl
>
>
> On Wed, Sep 5, 2018 at 11:55 AM Steph van Schalkwyk <st...@remcam.net>
> wrote:
>
>> Thank you Karl.
>> You are of course correct in that the incremental crawl is now broken in
>> that it does a full crawl every time.
>> I'll jump on the Web Connector and add that functionality.
>> Thanks for this excellent application and all the help over the years.
>> Steph
>>
>>
>> On Wed, Sep 5, 2018 at 6:33 AM, Karl Wright <da...@gmail.com> wrote:
>>
>>> The patch I uploaded doesn't work because the entire tab is broken;
>>> looks like the UI refactoring broke it and it was never reported.  Fixing
>>> now.
>>> Karl
>>>
>>>
>>> On Wed, Sep 5, 2018 at 3:57 AM Karl Wright <da...@gmail.com> wrote:
>>>
>>>> I coded up the web connector feature I think we need.  See
>>>> CONNECTORS-1528; I've attached a patch.  Please apply and test it out to
>>>> see if it solves the case problem for your IIS site.
>>>>
>>>> For the "//" issue, can you be more specific about the mapping you need
>>>> to do?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Sep 4, 2018 at 4:17 PM Karl Wright <da...@gmail.com> wrote:
>>>>
>>>>> Hi Steph,
>>>>>
>>>>> Right, you wouldn't want to touch the framework.
>>>>>
>>>>> The effect of lower-casing the documentURI parameter in the
>>>>> addOrReplaceDocumentWithException method in an output connector would
>>>>> be to map multiple, independently-fetched, documents that differ only by
>>>>> the case of the URL together into one document in the index.  The
>>>>> ManifoldCF assumption is that a document with a certain URI can be tracked
>>>>> in the index using exactly that URI.  Mapping the URI to lower case would
>>>>> break that assumption so the framework would make the wrong decision in
>>>>> many cases.
>>>>>
>>>>> If you are picking up documents using the web connector, and you are
>>>>> getting duplicate documents because the document URLs are sloppy, it is
>>>>> therefore essential that INSTEAD of mapping the document URI to lower
>>>>> case in the output connector, you map to lower case in the repository
>>>>> connector.  Otherwise the framework will not work right.
>>>>>
>>>>> There is a tab in the web connector that allows you to configure URL
>>>>> normalization, called "Canonicalization".  This would be a very appropriate
>>>>> place to add URL mapping to lower case.  It should be as simple as adding
>>>>> one more checkbox column in the table, and modifying the method that does
>>>>> the URL processing to include lower-casing.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 4, 2018 at 2:46 PM Steph van Schalkwyk <st...@remcam.net>
>>>>> wrote:
>>>>>
>>>>>> Unless I have a massive misunderstanding somewhere...
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 1:42 PM, Steph van Schalkwyk <steph@remcam.net
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi Karl
>>>>>>> I'm addressing it in the ES Output Connector.
>>>>>>> Not touching the framework :)
>>>>>>> S
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright <da...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Let's make sure we're talking about the same thing.
>>>>>>>>
>>>>>>>> Here is the output connector method that receives the ID (as the
>>>>>>>> documentURI parameter):
>>>>>>>>
>>>>>>>>   public int addOrReplaceDocumentWithException(String documentURI,
>>>>>>>> VersionContext pipelineDescription, RepositoryDocument document, String
>>>>>>>> authorityNameString, IOutputAddActivity activities)
>>>>>>>>     throws ManifoldCFException, ServiceInterruption, IOException;
>>>>>>>>
>>>>>>>> ManifoldCF doesn't say anywhere that this ID is case insensitive.
>>>>>>>> If you make it case insensitive in an output connector, this will
>>>>>>>> potentially break a lot of things, for example incremental indexing (which
>>>>>>>> organizes the last indexed version by document ID).
>>>>>>>>
>>>>>>>> I therefore highly recommend that any "sloppiness" in this
>>>>>>>> parameter be addressed in the Repository Connector that constructs it.  If
>>>>>>>> the connector is crawling a repository that believes that URLs are case
>>>>>>>> insensitive then it should map these IDs to lower case.  If not, then it
>>>>>>>> shouldn't.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 1:36 PM Steph van Schalkwyk <
>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>
>>>>>>>>> Hi Karl.
>>>>>>>>> The issue is that the ES Output Connector uses the uri to create
>>>>>>>>> the _id. When used with IIS which allows case variation in the URI, it
>>>>>>>>> creates multiple documents. Clients on Windows IIS are rarely cognizant of
>>>>>>>>> that issue as IIS is so lax in policing that OTB.
>>>>>>>>> Currently, every case variation in URI results in a new doc in the
>>>>>>>>> index. This is only in the ES output connector.
>>>>>>>>> I can add an optional checkbox to control that particular
>>>>>>>>> action if that would help?
>>>>>>>>> Regards,
>>>>>>>>> Steph
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright <da...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the update.
>>>>>>>>>> Lower-casing the ID would be fine except there are some
>>>>>>>>>> connectors that care about case.  The web connector is one such because
>>>>>>>>>> it's up to the web service to decide if case matters, so the web connector
>>>>>>>>>> does not view URLs with case differences as being the same.  Other
>>>>>>>>>> connectors will likely care as well. So I don't think lower-casing the
>>>>>>>>>> document id is a smart thing to do.
>>>>>>>>>>
>>>>>>>>>> You could add this bit of configuration to the web connector, if
>>>>>>>>>> that's what you are using, or to whatever other connector constructs the ID.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 4, 2018 at 12:04 PM Steph van Schalkwyk <
>>>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Karl.
>>>>>>>>>>>
>>>>>>>>>>> I'll look into that.
>>>>>>>>>>>
>>>>>>>>>>> Another note:
>>>>>>>>>>> Regarding the ES connector - I have made three additions to it and
>>>>>>>>>>> should probably diff them for inclusion after approval:
>>>>>>>>>>> 1. Lowercased the _id (the doc URI).
>>>>>>>>>>> 2. Removed doubled "/" (e.g. "//") in the _id (I have sloppy
>>>>>>>>>>> sources, particularly IIS...)
>>>>>>>>>>> 3. Added a "url" metadata field to the ES connector (as ES 6.x
>>>>>>>>>>> does not allow access to _id in the schema anymore, so no copy_field etc.
>>>>>>>>>>> from _id). Hence "url".
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Steph
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 4, 2018 at 10:50 AM, Karl Wright <daddywri@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Steph, I suspect that Jetty is leaking some resource, and we
>>>>>>>>>>>> may need to upgrade it.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk <
>>>>>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Olivier
>>>>>>>>>>>>> By all means.
>>>>>>>>>>>>> The only issue I have seen (totally unrelated) is with Jetty,
>>>>>>>>>>>>> which has to be restarted about once a week. Still trying to find the issue.
>>>>>>>>>>>>> I may be overly sensitive, but I suspect MCF 2.10 with
>>>>>>>>>>>>> Postgres 10 may be a bit slower. I have no empirical evidence at the moment as
>>>>>>>>>>>>> I'm still delivering the project to UAT. Will keep you posted.
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Steph
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 9:59 AM, Olivier Tavard <
>>>>>>>>>>>>> olivier.tavard@francelabs.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks a lot for sharing your PostgreSQL configuration (sorry
>>>>>>>>>>>>>> for the late answer). I will test it soon.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Olivier TAVARD
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 23 August 2018 at 19:20, Steph van Schalkwyk <steph@remcam.net> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These are the rpm installs:
>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-libs-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-contrib-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-devel-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>> - file:///tmp/postgres10/postgresql10-server-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> postgresql_version: 10
>>>>>>>>>>>>>> postgresql_data_dir: /var/lib/pgsql/10/data
>>>>>>>>>>>>>> postgresql_bin_path: /usr/pgsql-10/bin
>>>>>>>>>>>>>> postgresql_config_path: /var/lib/pgsql/10/data
>>>>>>>>>>>>>> postgresql_daemon: postgresql-10.service
>>>>>>>>>>>>>> postgresql_packages:
>>>>>>>>>>>>>> - postgresql10-libs
>>>>>>>>>>>>>> - postgresql10
>>>>>>>>>>>>>> - postgresql10-server
>>>>>>>>>>>>>> - postgresql10-contrib
>>>>>>>>>>>>>> # - postgresql10-devel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> postgresql_hba_entries:
>>>>>>>>>>>>>> - { type: local, database: all, user: postgres, auth_method: peer }
>>>>>>>>>>>>>> - { type: local, database: all, user: all, auth_method: peer }
>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '127.0.0.1/32', auth_method: md5 }
>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '::1/128', auth_method: md5 }
>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '0.0.0.0/0', auth_method: md5 }
>>>>>>>>>>>>>> - { type: host, database: all, user: all, address: '::0/0', auth_method: md5 }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> postgresql_global_config_options:
>>>>>>>>>>>>>> - option: unix_socket_directories
>>>>>>>>>>>>>>   value: '{{ postgresql_unix_socket_directories | join(",") }}'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: standard_conforming_strings
>>>>>>>>>>>>>>   value: 'on'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: shared_buffers
>>>>>>>>>>>>>>   value: '1024MB'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # max_wal_size = (3 * checkpoint_segments) * 16MB
>>>>>>>>>>>>>> # checkpoint_segments=300
>>>>>>>>>>>>>> - option: max_wal_size
>>>>>>>>>>>>>>   value: '14400MB'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: min_wal_size
>>>>>>>>>>>>>>   value: '80MB'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: maintenance_work_mem
>>>>>>>>>>>>>>   value: '2MB'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: listen_addresses
>>>>>>>>>>>>>>   value: '*'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: max_connections
>>>>>>>>>>>>>>   value: '400'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: checkpoint_timeout
>>>>>>>>>>>>>>   value: '900'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: datestyle
>>>>>>>>>>>>>>   value: "iso, mdy"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - option: autovacuum
>>>>>>>>>>>>>>   value: 'off'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # vacuum all databases every night (full vacuum on Sunday night, lazy vacuum every night)
>>>>>>>>>>>>>> - name: add postgresql cron lazy vacuum
>>>>>>>>>>>>>>   cron:
>>>>>>>>>>>>>>     name: lazy_vacuum
>>>>>>>>>>>>>>     hour: 8
>>>>>>>>>>>>>>     minute: 0
>>>>>>>>>>>>>>     job: "su - postgres -c 'vacuumdb --all --analyze --quiet'"
>>>>>>>>>>>>>> - name: add postgresql cron full vacuum
>>>>>>>>>>>>>>   cron:
>>>>>>>>>>>>>>     name: full_vacuum
>>>>>>>>>>>>>>     weekday: 0
>>>>>>>>>>>>>>     hour: 10
>>>>>>>>>>>>>>     minute: 0
>>>>>>>>>>>>>>     job: "su - postgres -c 'vacuumdb --all --full --analyze --quiet'"
>>>>>>>>>>>>>> # re-index all databases once a week
>>>>>>>>>>>>>> - name: add postgresql cron reindex
>>>>>>>>>>>>>>   cron:
>>>>>>>>>>>>>>     name: reindex
>>>>>>>>>>>>>>     weekday: 0
>>>>>>>>>>>>>>     hour: 12
>>>>>>>>>>>>>>     minute: 0
>>>>>>>>>>>>>>     job: "su - postgres -c 'psql -t -c \"select datname from pg_database order by datname;\" | xargs -n 1 -I\"{}\" -- psql -U postgres {} -c \"reindex database {};\"' "
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is how I run 2.10.
>>>>>>>>>>>>>> Been running fine for some weeks without user intervention.
>>>>>>>>>>>>>> @Karl: Any comments please?
>>>>>>>>>>>>>> Steph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>

Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
I'm already working on the Web Connector.  The UI has problems that predate
this change and I've alerted Kishore about them -- he'll look into them
later today.

Karl


On Wed, Sep 5, 2018 at 11:55 AM Steph van Schalkwyk <st...@remcam.net>
wrote:

> Thank you Karl.
> You are of course correct in that the incremental crawl is now broken in
> that it does a full crawl every time.
> I'll jump on the Web Connector and add that functionality.
> Thanks for this excellent application and all the help over the years.
> Steph
>
>
>
>
> *Steph van Schalkwyk*
> Principal, Remcam Search Engines
> +1.314.452.2896    steph@remcam.net   http://remcam.net
> Skype: svanschalkwyk
> http://linkedin.com/in/vanschalkwyk
>
> On Wed, Sep 5, 2018 at 6:33 AM, Karl Wright <da...@gmail.com> wrote:
>
>> The patch I uploaded doesn't work because the entire tab is broken; looks
>> like the UI refactoring broke it and it was never reported.  Fixing now.
>> Karl
>>
>>
>> On Wed, Sep 5, 2018 at 3:57 AM Karl Wright <da...@gmail.com> wrote:
>>
>>> I coded up the web connector feature I think we need.  See
>>> CONNECTORS-1528; I've attached a patch.  Please apply and test it out to
>>> see if it solves the case problem for your IIS site.
>>>
>>> For the "//" issue, can you be more specific about the mapping you need
>>> to do?
>>>
>>> Karl
>>>
>>>
>>> On Tue, Sep 4, 2018 at 4:17 PM Karl Wright <da...@gmail.com> wrote:
>>>
>>>> Hi Steph,
>>>>
>>>> Right, you wouldn't want to touch the framework.
>>>>
>>>> The effect of lower-casing the documentURI parameter in the
>>>> addOrReplaceDocumentWithException method in an output connector would be to
>>>> map multiple, independently-fetched, documents that differ only by the case
>>>> of the URL together into one document in the index.  The ManifoldCF
>>>> assumption is that a document with a certain URI can be tracked in the
>>>> index using exactly that URI.  Mapping the URI to lower case would break
>>>> that assumption so the framework would make the wrong decision in many
>>>> cases.
>>>>
>>>> If you are picking up documents using the web connector and you are
>>>> getting duplicate documents because the document URLs are sloppy, it is
>>>> essential that INSTEAD of mapping the document URI to lower case in the
>>>> output connector, you map to lower case in the repository connector.
>>>> Otherwise the framework will not work right.
>>>>
>>>> There is a tab in the web connector that allows you to configure URL
>>>> normalization, called "Canonicalization".  This would be a very appropriate
>>>> place to add URL mapping to lower case.  It should be as simple as adding
>>>> one more checkbox column in the table, and modifying the method that does
>>>> the URL processing to include lower-casing.
>>>>
>>>> Karl
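
As a rough illustration of the lower-casing step described above, here is a minimal Java sketch of normalizing a URL on the repository-connector side, before it becomes the document identifier. The class name UrlCaseNormalizer, the method normalize, and the boolean flag standing in for the proposed checkbox are all hypothetical; this is not the web connector's actual canonicalization code.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF: shows one way a crawler could
// fold URL case before the URL is used as a document identifier.
final class UrlCaseNormalizer {
    private UrlCaseNormalizer() {}

    // Returns the URL unchanged unless lowerCase is set, in which case the
    // whole URL is mapped to lower case with a locale-independent rule.
    static String normalize(String url, boolean lowerCase) {
        // Parse first so malformed URLs are rejected (IllegalArgumentException)
        // instead of being silently rewritten.
        String text = URI.create(url).toString();
        return lowerCase ? text.toLowerCase(Locale.ROOT) : text;
    }

    public static void main(String[] args) {
        String a = normalize("http://Example.com/Docs/Page.ASPX", true);
        String b = normalize("http://example.com/docs/page.aspx", true);
        // Both spellings now yield the same identifier.
        System.out.println(a.equals(b));
    }
}
```

Note that this sketch lower-cases the query string as well, which may be wrong for sites whose parameters are case sensitive; that is one reason to keep it an opt-in setting.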
>>>>
>>>>
>>>>
>>>> On Tue, Sep 4, 2018 at 2:46 PM Steph van Schalkwyk <st...@remcam.net>
>>>> wrote:
>>>>
>>>>> Unless I have a massive misunderstanding somewhere...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Steph van Schalkwyk*
>>>>> Principal, Remcam Search Engines
>>>>> +1.314.452.2896    steph@remcam.net
>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>
>>>>> On Tue, Sep 4, 2018 at 1:42 PM, Steph van Schalkwyk <st...@remcam.net>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl
>>>>>> I'm addressing it in the ES Output Connector.
>>>>>> Not touching the framework :)
>>>>>> S
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Steph van Schalkwyk*
>>>>>> Principal, Remcam Search Engines
>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Let's make sure we're talking about the same thing.
>>>>>>>
>>>>>>> Here is the output connector method that receives the ID (as the
>>>>>>> documentURI parameter):
>>>>>>>
>>>>>>>   public int addOrReplaceDocumentWithException(String documentURI,
>>>>>>> VersionContext pipelineDescription, RepositoryDocument document, String
>>>>>>> authorityNameString, IOutputAddActivity activities)
>>>>>>>     throws ManifoldCFException, ServiceInterruption, IOException;
>>>>>>>
>>>>>>> ManifoldCF doesn't say anywhere that this ID is case insensitive.
>>>>>>> If you make it case insensitive in an output connector, this will
>>>>>>> potentially break a lot of things, for example incremental indexing (which
>>>>>>> organizes the last indexed version by document ID).
>>>>>>>
>>>>>>> I therefore highly recommend that any "sloppiness" in this parameter
>>>>>>> be addressed in the Repository Connector that constructs it.  If the
>>>>>>> connector is crawling a repository that believes that URLs are case
>>>>>>> insensitive then it should map these IDs to lower case.  If not, then it
>>>>>>> shouldn't.
>>>>>>>
>>>>>>> Karl
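
To make the concern above concrete, here is a toy Java model of what can go wrong when only the output side folds case. The two maps are stand-ins, assumed purely for illustration, for the framework's per-document bookkeeping and for the search index; none of this is ManifoldCF code, and the method names merely echo the connector method quoted above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model only: neither map is a ManifoldCF class.
final class CaseFoldingDemo {
    // Framework side: one entry per exact document URI.
    static final Map<String, String> trackedVersions = new HashMap<>();
    // Index side: an output connector that lower-cases the ID before writing.
    static final Map<String, String> index = new HashMap<>();

    static void addOrReplace(String documentURI, String version, String body) {
        trackedVersions.put(documentURI, version);
        index.put(documentURI.toLowerCase(Locale.ROOT), body);
    }

    static void remove(String documentURI) {
        trackedVersions.remove(documentURI);
        index.remove(documentURI.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        addOrReplace("http://host/Page.aspx", "v1", "first fetch");
        addOrReplace("http://host/page.aspx", "v1", "second fetch");
        System.out.println(trackedVersions.size()); // framework tracks two documents
        System.out.println(index.size());           // the index holds a single entry
        remove("http://host/Page.aspx");
        System.out.println(trackedVersions.size()); // one document is still tracked
        System.out.println(index.size());           // but its index entry is gone
    }
}
```

In this model, removing one spelling silently drops the indexed copy that the other spelling still depends on, which is the kind of mismatch that folding case in the repository connector avoids.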
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 1:36 PM Steph van Schalkwyk <st...@remcam.net>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl.
>>>>>>>> The issue is that the ES Output Connector uses the URI to create
>>>>>>>> the _id. When used with IIS, which allows case variation in the URI, it
>>>>>>>> creates multiple documents. Clients on Windows IIS are rarely cognizant of
>>>>>>>> that issue, as IIS is so lax in policing it out of the box.
>>>>>>>> Currently, every case variation in the URI results in a new doc in the
>>>>>>>> index. This is only in the ES output connector.
>>>>>>>> I can add an optional checkbox to determine that particular
>>>>>>>> action if that would help?
>>>>>>>> Regards,
>>>>>>>> Steph
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Steph van Schalkwyk*
>>>>>>>> Principal, Remcam Search Engines
>>>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright <da...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the update.
>>>>>>>>> Lower-casing the ID would be fine except there are some connectors
>>>>>>>>> that care about case.  The web connector is one such because it's up to the
>>>>>>>>> web service to decide if case matters, so the web connector does not view
>>>>>>>>> urls with case differences as being the same.  Other connectors also will
>>>>>>>>> likely care as well. So I don't think lower-casing the document id is a
>>>>>>>>> smart thing to do.
>>>>>>>>>
>>>>>>>>> You could add this bit of configuration to the web connector, if
>>>>>>>>> that's what you are using, or to whatever other connector constructs the ID.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 12:04 PM Steph van Schalkwyk <
>>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Karl.
>>>>>>>>>>
>>>>>>>>>> I'll look into that.
>>>>>>>>>>
>>>>>>>>>> Another note:
>>>>>>>>>> Regarding the ES connector - I have made three changes to it and
>>>>>>>>>> should probably diff them for inclusion after approval:
>>>>>>>>>> 1. Lowercased the _id (the doc URI).
>>>>>>>>>> 2. Removed doubled "/" (i.e. "//") in the _id (I have sloppy
>>>>>>>>>> sources, particularly IIS...)
>>>>>>>>>> 3. Added a "url" metadata field to the ES connector (as ES 6.x
>>>>>>>>>> does not allow access to _id in the schema anymore, so no copy_field etc.
>>>>>>>>>> from _id). Hence "url".
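
The exact "//" mapping is not spelled out in the list above, so the following Java sketch is only one plausible reading: collapse runs of slashes in the host-and-path portion while leaving the scheme separator and any query string or fragment untouched. The class and method names are hypothetical.

```java
// Hypothetical helper: one possible interpretation of "remove doubled slashes".
final class DoubleSlashCollapser {
    private DoubleSlashCollapser() {}

    static String collapse(String url) {
        // Skip past "scheme://" so the separator itself is preserved.
        int schemeEnd = url.indexOf("://");
        int from = schemeEnd < 0 ? 0 : schemeEnd + 3;
        // Stop at the first '?' or '#' so query and fragment are left alone.
        int to = url.length();
        for (int i = from; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '?' || c == '#') {
                to = i;
                break;
            }
        }
        // Replace every run of two or more slashes with a single slash.
        String middle = url.substring(from, to).replaceAll("/{2,}", "/");
        return url.substring(0, from) + middle + url.substring(to);
    }

    public static void main(String[] args) {
        System.out.println(collapse("http://host//a///b.aspx?x=//y"));
    }
}
```

Whether such a rewrite is safe depends on the server treating "//" and "/" as the same resource, so it would presumably need to be optional as well.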
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Steph
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Steph van Schalkwyk*
>>>>>>>>>> Principal, Remcam Search Engines
>>>>>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>>>>>

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
Thank you Karl.
You are of course correct in that the incremental crawl is now broken in
that it does a full crawl every time.
I'll jump on the Web Connector and add that functionality.
Thanks for this excellent application and all the help over the years.
Steph




*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452.2896    steph@remcam.net   http://remcam.net
Skype: svanschalkwyk
http://linkedin.com/in/vanschalkwyk


Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
The patch I uploaded doesn't work because the entire tab is broken; looks
like the UI refactoring broke it and it was never reported.  Fixing now.
Karl


On Wed, Sep 5, 2018 at 3:57 AM Karl Wright <da...@gmail.com> wrote:

> I coded up the web connector feature I think we need.  See
> CONNECTORS-1528; I've attached a patch.  Please apply and test it out to
> see if it solves the case problem for your IIS site.
>
> For the "//" issue, can you be more specific about the mapping you need to
> do?
>
> Karl
>
>
> On Tue, Sep 4, 2018 at 4:17 PM Karl Wright <da...@gmail.com> wrote:
>
>> Hi Steph,
>>
>> Right, you wouldn't want to touch the framework.
>>
>> The effect of lower-casing the documentURI parameter in the
>> addOrReplaceDocumentWithException method in an output connector would be to
>> map multiple, independently-fetched, documents that differ only by the case
>> of the URL together into one document in the index.  The ManifoldCF
>> assumption is that a document with a certain URI can be tracked in the
>> index using exactly that URI.  Mapping the URI to lower case would break
>> that assumption so the framework would make the wrong decision in many
>> cases.
>>
>> If you are picking up documents using the web connector, and you are
>> getting duplicate documents because the document URLs are sloppy, it is
>> essential that INSTEAD of mapping the document URI to lower case in the
>> output connector, you map to lower case in the repository connector.
>> Otherwise the framework will not work right.
>>
>> There is a tab in the web connector that allows you to configure URL
>> normalization, called "Canonicalization".  This would be a very appropriate
>> place to add URL mapping to lower case.  It should be as simple as adding
>> one more checkbox column in the table, and modifying the method that does
>> the URL processing to include lower-casing.
>>
>> Karl
>>
>>
>>
>> On Tue, Sep 4, 2018 at 2:46 PM Steph van Schalkwyk <st...@remcam.net>
>> wrote:
>>
>>> Unless I have a massive misunderstanding somewhere...
>>>
>>>
>>>
>>>
>>> *Steph van Schalkwyk*
>>> Principal, Remcam Search Engines
>>> +1.314.452.2896    steph@remcam.net
>>> http://remcam.net    Skype: svanschalkwyk
>>> http://linkedin.com/in/vanschalkwyk
>>>
>>> On Tue, Sep 4, 2018 at 1:42 PM, Steph van Schalkwyk <st...@remcam.net>
>>> wrote:
>>>
>>>> Hi Karl
>>>> I'm addressing it in the ES Output Connector.
>>>> Not touching the framework :)
>>>> S
>>>>
>>>>
>>>>
>>>> *Steph van Schalkwyk*
>>>> Principal, Remcam Search Engines
>>>> +1.314.452.2896    steph@remcam.net
>>>> http://remcam.net    Skype: svanschalkwyk
>>>> http://linkedin.com/in/vanschalkwyk
>>>>
>>>> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>>
>>>>> Let's make sure we're talking about the same thing.
>>>>>
>>>>> Here is the output connector method that receives the ID (as the
>>>>> documentURI parameter):
>>>>>
>>>>>   public int addOrReplaceDocumentWithException(String documentURI,
>>>>> VersionContext pipelineDescription, RepositoryDocument document, String
>>>>> authorityNameString, IOutputAddActivity activities)
>>>>>     throws ManifoldCFException, ServiceInterruption, IOException;
>>>>>
>>>>> ManifoldCF doesn't say anywhere that this ID is case insensitive.  If
>>>>> you make it case insensitive in an output connector, this will potentially
>>>>> break a lot of things, for example incremental indexing (which organizes
>>>>> the last indexed version by document ID).
>>>>>
>>>>> I therefore highly recommend that any "sloppiness" in this parameter
>>>>> be addressed in the Repository Connector that constructs it.  If the
>>>>> connector is crawling a repository that believes that URLs are case
>>>>> insensitive then it should map these IDs to lower case.  If not, then it
>>>>> shouldn't.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Tue, Sep 4, 2018 at 1:36 PM Steph van Schalkwyk <st...@remcam.net>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl.
>>>>>> The issue is that the ES Output Connector uses the uri to create the
>>>>>> _id. When used with IIS which allows case variation in the URI, it creates
>>>>>> multiple documents. Clients on Windows IIS are rarely cognizant of that
>>>>>> issue as IIS is so lax in policing that OTB.
>>>>>> Currently, every case variation in URI results in a new doc in the
>>>>>> index. This is only in the ES output connector.
>>>>>> I can add an optional checkbox to determine that particular action,
>>>>>> if that would help?
>>>>>> Regards,
>>>>>> Steph
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Steph van Schalkwyk*
>>>>>> Principal, Remcam Search Engines
>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the update.
>>>>>>> Lower-casing the ID would be fine except there are some connectors
>>>>>>> that care about case.  The web connector is one such because it's up to the
>>>>>>> web service to decide if case matters, so the web connector does not view
>>>>>>> URLs with case differences as being the same.  Other connectors will
>>>>>>> likely care as well. So I don't think lower-casing the document ID is a
>>>>>>> smart thing to do.
>>>>>>>
>>>>>>> You could add this bit of configuration to the web connector, if
>>>>>>> that's what you are using, or to whatever other connector constructs the ID.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 12:04 PM Steph van Schalkwyk <
>>>>>>> steph@remcam.net> wrote:
>>>>>>>
>>>>>>>> Thanks Karl.
>>>>>>>>
>>>>>>>> I'll look into that.
>>>>>>>>
>>>>>>>> Another note:
>>>>>>>> Regarding the ES connector - I have made three changes to it and
>>>>>>>> should probably diff them for inclusion after approval:
>>>>>>>> 1. Lowercased the _id (the doc URI).
>>>>>>>> 2. Removed doubled "/", e.g. "//", in the _id (I have sloppy sources,
>>>>>>>> particularly IIS...)
>>>>>>>> 3. Added a "url" metadata field to the ES connector (as ES 6.x does
>>>>>>>> not allow access to _id in the schema anymore, so no copy_field etc. from
>>>>>>>> _id). Hence "url".
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Steph
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Steph van Schalkwyk*
>>>>>>>> Principal, Remcam Search Engines
>>>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 10:50 AM, Karl Wright <da...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Steph, I suspect that Jetty is leaking some resource, and we
>>>>>>>>> may need to upgrade it.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk <
>>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>>
>>>>>>>>>> Olivier
>>>>>>>>>> By all means.
>>>>>>>>>> The only issue I have seen (totally unrelated) is with Jetty,
>>>>>>>>>> which has to be restarted about once a week. Still trying to find the issue.
>>>>>>>>>> I may be overly sensitive, but I suspect MCF 2.10 with PostgreSQL 10
>>>>>>>>>> may be a bit slower. I have no empirical evidence at the moment, as I'm
>>>>>>>>>> still delivering the project to UAT. Will keep you posted.
>>>>>>>>>> Regards,
>>>>>>>>>> Steph
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Steph van Schalkwyk*
>>>>>>>>>> Principal, Remcam Search Engines
>>>>>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 4, 2018 at 9:59 AM, Olivier Tavard <
>>>>>>>>>> olivier.tavard@francelabs.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot for sharing your PostgreSQL configuration (sorry
>>>>>>>>>>> for the late answer). I will test it soon.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Olivier TAVARD
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Le 23 août 2018 à 19:20, Steph van Schalkwyk <st...@remcam.net>
>>>>>>>>>>> a écrit :
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> These are the rpm installs:
>>>>>>>>>>> -
>>>>>>>>>>> file:///tmp/postgres10/postgresql10-libs-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>> -
>>>>>>>>>>> file:///tmp/postgres10/postgresql10-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>> -
>>>>>>>>>>> file:///tmp/postgres10/postgresql10-contrib-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>> -
>>>>>>>>>>> file:///tmp/postgres10/postgresql10-devel-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>> -
>>>>>>>>>>> file:///tmp/postgres10/postgresql10-server-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>>
>>>>>>>>>>> postgresql_version: 10
>>>>>>>>>>> postgresql_data_dir: /var/lib/pgsql/10/data
>>>>>>>>>>> postgresql_bin_path: /usr/pgsql-10/bin
>>>>>>>>>>> postgresql_config_path: /var/lib/pgsql/10/data
>>>>>>>>>>> postgresql_daemon: postgresql-10.service
>>>>>>>>>>> postgresql_packages:
>>>>>>>>>>> - postgresql10-libs
>>>>>>>>>>> - postgresql10
>>>>>>>>>>> - postgresql10-server
>>>>>>>>>>> - postgresql10-contrib
>>>>>>>>>>> # - postgresql10-devel
>>>>>>>>>>>
>>>>>>>>>>> postgresql_hba_entries:
>>>>>>>>>>> - { type: local, database: all, user: postgres, auth_method: peer }
>>>>>>>>>>> - { type: local, database: all, user: all, auth_method: peer }
>>>>>>>>>>> - { type: host, database: all, user: all, address: '127.0.0.1/32', auth_method: md5 }
>>>>>>>>>>> - { type: host, database: all, user: all, address: '::1/128', auth_method: md5 }
>>>>>>>>>>> - { type: host, database: all, user: all, address: '0.0.0.0/0', auth_method: md5 }
>>>>>>>>>>> - { type: host, database: all, user: all, address: '::0/0', auth_method: md5 }
>>>>>>>>>>>
>>>>>>>>>>> postgresql_global_config_options:
>>>>>>>>>>> - option: unix_socket_directories
>>>>>>>>>>> value: '{{ postgresql_unix_socket_directories | join(",") }}'
>>>>>>>>>>>
>>>>>>>>>>> - option: standard_conforming_strings
>>>>>>>>>>> value: 'on'
>>>>>>>>>>>
>>>>>>>>>>> - option: shared_buffers
>>>>>>>>>>> value: '1024MB'
>>>>>>>>>>>
>>>>>>>>>>> # max_wal_size = (3 * checkpoint_segments) * 16MB
>>>>>>>>>>> # checkpoint_segments=300
>>>>>>>>>>> - option: max_wal_size
>>>>>>>>>>> value: '14400MB'
>>>>>>>>>>>
>>>>>>>>>>> - option: min_wal_size
>>>>>>>>>>> value: '80MB'
>>>>>>>>>>>
>>>>>>>>>>> - option: maintenance_work_mem
>>>>>>>>>>> value: '2MB'
>>>>>>>>>>>
>>>>>>>>>>> - option: listen_addresses
>>>>>>>>>>> value: '*'
>>>>>>>>>>>
>>>>>>>>>>> - option: max_connections
>>>>>>>>>>> value: '400'
>>>>>>>>>>>
>>>>>>>>>>> - option: checkpoint_timeout
>>>>>>>>>>> value: '900'
>>>>>>>>>>>
>>>>>>>>>>> - option: datestyle
>>>>>>>>>>> value: "iso, mdy"
>>>>>>>>>>>
>>>>>>>>>>> - option: autovacuum
>>>>>>>>>>> value: 'off'
>>>>>>>>>>>
>>>>>>>>>>> # vacuum all databases every night (full vacuum on Sunday night, lazy vacuum every night)
>>>>>>>>>>> - name: add postgresql cron lazy vacuum
>>>>>>>>>>> cron:
>>>>>>>>>>> name: lazy_vacuum
>>>>>>>>>>> hour: 8
>>>>>>>>>>> minute: 0
>>>>>>>>>>> job: "su - postgres -c 'vacuumdb --all --analyze --quiet'"
>>>>>>>>>>> - name: add postgresql cron full vacuum
>>>>>>>>>>> cron:
>>>>>>>>>>> name: full_vacuum
>>>>>>>>>>> weekday: 0
>>>>>>>>>>> hour: 10
>>>>>>>>>>> minute: 0
>>>>>>>>>>> job: "su - postgres -c 'vacuumdb --all --full --analyze --quiet'"
>>>>>>>>>>> # re-index all databases once a week
>>>>>>>>>>> - name: add postgresql cron reindex
>>>>>>>>>>> cron:
>>>>>>>>>>> name: reindex
>>>>>>>>>>> weekday: 0
>>>>>>>>>>> hour: 12
>>>>>>>>>>> minute: 0
>>>>>>>>>>> job: "su - postgres -c 'psql -t -c \"select datname from pg_database order by datname;\" | xargs -n 1 -I\"{}\" -- psql -U postgres {} -c \"reindex database {};\"' "
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is how I run 2.10.
>>>>>>>>>>> Been running fine for some weeks without user intervention.
>>>>>>>>>>> @Karl: Any comments please?
>>>>>>>>>>> Steph
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>

Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
I coded up the web connector feature I think we need.  See CONNECTORS-1528;
I've attached a patch.  Please apply and test it out to see if it solves
the case problem for your IIS site.

For the "//" issue, can you be more specific about the mapping you need to
do?
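
Pending those specifics, one plausible reading of the "//" mapping is to collapse repeated slashes in the path only, leaving the "://" after the scheme and anything in the query string or fragment untouched. The sketch below is an assumption about the intent, not code from any ManifoldCF connector; the class and method names are hypothetical.

```java
// Hypothetical helper; not part of ManifoldCF or any of its connectors.
final class SlashCollapseSketch {

  // Collapses runs of '/' in the path only; scheme, authority, query and
  // fragment are returned unchanged.
  static String collapsePathSlashes(String url) {
    int schemeEnd = url.indexOf("://");
    int authorityStart = (schemeEnd >= 0) ? schemeEnd + 3 : 0;

    // The path starts at the first '/' after the authority, unless a '?' or
    // '#' comes first (in which case there is no path to clean).
    int pathStart = authorityStart;
    while (pathStart < url.length() && "/?#".indexOf(url.charAt(pathStart)) < 0) {
      pathStart++;
    }
    if (pathStart == url.length() || url.charAt(pathStart) != '/') {
      return url;
    }

    // The path ends at the first '?' or '#', or at the end of the string.
    int pathEnd = pathStart;
    while (pathEnd < url.length() && "?#".indexOf(url.charAt(pathEnd)) < 0) {
      pathEnd++;
    }

    String path = url.substring(pathStart, pathEnd).replaceAll("/{2,}", "/");
    return url.substring(0, pathStart) + path + url.substring(pathEnd);
  }

  public static void main(String[] args) {
    // Prints http://host/docs/page.aspx?next=//x
    System.out.println(collapsePathSlashes("http://host//docs///page.aspx?next=//x"));
  }
}
```

Whether a mapping like this is safe depends on the site: some servers treat "/a//b" and "/a/b" as different resources, which is why the exact mapping needs to be spelled out first.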

Karl


On Tue, Sep 4, 2018 at 4:17 PM Karl Wright <da...@gmail.com> wrote:

> Hi Steph,
>
> Right, you wouldn't want to touch the framework.
>
> The effect of lower-casing the documentURI parameter in the
> addOrReplaceDocumentWithException method in an output connector would be to
> map multiple, independently-fetched, documents that differ only by the case
> of the URL together into one document in the index.  The ManifoldCF
> assumption is that a document with a certain URI can be tracked in the
> index using exactly that URI.  Mapping the URI to lower case would break
> that assumption so the framework would make the wrong decision in many
> cases.
>
> If you are picking up documents using the web connector, and you are
> getting duplicate documents because the document URLs are sloppy, it is
> essential that INSTEAD of mapping the document URI to lower case in the
> output connector, you map to lower case in the repository connector.
> Otherwise the framework will not work right.
>
> There is a tab in the web connector that allows you to configure URL
> normalization, called "Canonicalization".  This would be a very appropriate
> place to add URL mapping to lower case.  It should be as simple as adding
> one more checkbox column in the table, and modifying the method that does
> the URL processing to include lower-casing.
>
> Karl
>
>
>
> On Tue, Sep 4, 2018 at 2:46 PM Steph van Schalkwyk <st...@remcam.net>
> wrote:
>
>> Unless I have a massive misunderstanding somewhere...
>>
>>
>>
>>
>> *Steph van Schalkwyk*
>> Principal, Remcam Search Engines
>> +1.314.452.2896    steph@remcam.net
>> http://remcam.net    Skype: svanschalkwyk
>> http://linkedin.com/in/vanschalkwyk
>>
>> On Tue, Sep 4, 2018 at 1:42 PM, Steph van Schalkwyk <st...@remcam.net>
>> wrote:
>>
>>> Hi Karl
>>> I'm addressing it in the ES Output Connector.
>>> Not touching the framework :)
>>> S
>>>
>>>
>>>
>>> *Steph van Schalkwyk*
>>> Principal, Remcam Search Engines
>>> +1.314.452.2896    steph@remcam.net
>>> http://remcam.net    Skype: svanschalkwyk
>>> http://linkedin.com/in/vanschalkwyk
>>>
>>> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright <da...@gmail.com> wrote:
>>>
>>>> Let's make sure we're talking about the same thing.
>>>>
>>>> Here is the output connector method that receives the ID (as the
>>>> documentURI parameter):
>>>>
>>>>   public int addOrReplaceDocumentWithException(String documentURI,
>>>> VersionContext pipelineDescription, RepositoryDocument document, String
>>>> authorityNameString, IOutputAddActivity activities)
>>>>     throws ManifoldCFException, ServiceInterruption, IOException;
>>>>
>>>> ManifoldCF doesn't say anywhere that this ID is case insensitive.  If
>>>> you make it case insensitive in an output connector, this will potentially
>>>> break a lot of things, for example incremental indexing (which organizes
>>>> the last indexed version by document ID).
>>>>
>>>> I therefore highly recommend that any "sloppiness" in this parameter be
>>>> addressed in the Repository Connector that constructs it.  If the connector
>>>> is crawling a repository that believes that URLs are case insensitive then
>>>> it should map these IDs to lower case.  If not, then it shouldn't.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Sep 4, 2018 at 1:36 PM Steph van Schalkwyk <st...@remcam.net>
>>>> wrote:
>>>>
>>>>> Hi Karl.
>>>>> The issue is that the ES Output Connector uses the uri to create the
>>>>> _id. When used with IIS which allows case variation in the URI, it creates
>>>>> multiple documents. Clients on Windows IIS are rarely cognizant of that
>>>>> issue as IIS is so lax in policing that OTB.
>>>>> Currently, every case variation in URI results in a new doc in the
>>>>> index. This is only in the ES output connector.
>>>>> I can add an optional checkbox to determine that particular action,
>>>>> if that would help?
>>>>> Regards,
>>>>> Steph
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Steph van Schalkwyk*
>>>>> Principal, Remcam Search Engines
>>>>> +1.314.452.2896    steph@remcam.net
>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>
>>>>> On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright <da...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the update.
>>>>>> Lower-casing the ID would be fine except there are some connectors
>>>>>> that care about case.  The web connector is one such because it's up to the
>>>>>> web service to decide if case matters, so the web connector does not view
>>>>>> URLs with case differences as being the same.  Other connectors will
>>>>>> likely care as well. So I don't think lower-casing the document ID is a
>>>>>> smart thing to do.
>>>>>>
>>>>>> You could add this bit of configuration to the web connector, if
>>>>>> that's what you are using, or to whatever other connector constructs the ID.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 12:04 PM Steph van Schalkwyk <st...@remcam.net>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Karl.
>>>>>>>
>>>>>>> I'll look into that.
>>>>>>>
>>>>>>> Another note:
>>>>>>> Regarding the ES connector - I have made three changes to it and
>>>>>>> should probably diff them for inclusion after approval:
>>>>>>> 1. Lowercased the _id (the doc URI).
>>>>>>> 2. Removed doubled "/", e.g. "//", in the _id (I have sloppy sources,
>>>>>>> particularly IIS...)
>>>>>>> 3. Added a "url" metadata field to the ES connector (as ES 6.x does
>>>>>>> not allow access to _id in the schema anymore, so no copy_field etc. from
>>>>>>> _id). Hence "url".
>>>>>>>
>>>>>>> Regards,
>>>>>>> Steph
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Steph van Schalkwyk*
>>>>>>> Principal, Remcam Search Engines
>>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 10:50 AM, Karl Wright <da...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Steph, I suspect that Jetty is leaking some resource, and we may
>>>>>>>> need to upgrade it.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk <
>>>>>>>> steph@remcam.net> wrote:
>>>>>>>>
>>>>>>>>> Olivier
>>>>>>>>> By all means.
>>>>>>>>> The only issue I have seen (totally unrelated) is with Jetty,
>>>>>>>>> which has to be restarted about once a week. Still trying to find the issue.
>>>>>>>>> I may be overly sensitive, but I suspect MCF 2.10 with PostgreSQL 10
>>>>>>>>> may be a bit slower. I have no empirical evidence at the moment, as I'm
>>>>>>>>> still delivering the project to UAT. Will keep you posted.
>>>>>>>>> Regards,
>>>>>>>>> Steph
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Steph van Schalkwyk*
>>>>>>>>> Principal, Remcam Search Engines
>>>>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 9:59 AM, Olivier Tavard <
>>>>>>>>> olivier.tavard@francelabs.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for sharing your PostgreSQL configuration (sorry for
>>>>>>>>>> the late answer). I will test it soon.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Olivier TAVARD
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Le 23 août 2018 à 19:20, Steph van Schalkwyk <st...@remcam.net>
>>>>>>>>>> a écrit :
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> These are the rpm installs:
>>>>>>>>>> -
>>>>>>>>>> file:///tmp/postgres10/postgresql10-libs-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>> - file:///tmp/postgres10/postgresql10-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>> -
>>>>>>>>>> file:///tmp/postgres10/postgresql10-contrib-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>> -
>>>>>>>>>> file:///tmp/postgres10/postgresql10-devel-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>> -
>>>>>>>>>> file:///tmp/postgres10/postgresql10-server-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>>
>>>>>>>>>> postgresql_version: 10
>>>>>>>>>> postgresql_data_dir: /var/lib/pgsql/10/data
>>>>>>>>>> postgresql_bin_path: /usr/pgsql-10/bin
>>>>>>>>>> postgresql_config_path: /var/lib/pgsql/10/data
>>>>>>>>>> postgresql_daemon: postgresql-10.service
>>>>>>>>>> postgresql_packages:
>>>>>>>>>> - postgresql10-libs
>>>>>>>>>> - postgresql10
>>>>>>>>>> - postgresql10-server
>>>>>>>>>> - postgresql10-contrib
>>>>>>>>>> # - postgresql10-devel
>>>>>>>>>>
>>>>>>>>>> postgresql_hba_entries:
>>>>>>>>>> - { type: local, database: all, user: postgres, auth_method: peer }
>>>>>>>>>> - { type: local, database: all, user: all, auth_method: peer }
>>>>>>>>>> - { type: host, database: all, user: all, address: '127.0.0.1/32', auth_method: md5 }
>>>>>>>>>> - { type: host, database: all, user: all, address: '::1/128', auth_method: md5 }
>>>>>>>>>> - { type: host, database: all, user: all, address: '0.0.0.0/0', auth_method: md5 }
>>>>>>>>>> - { type: host, database: all, user: all, address: '::0/0', auth_method: md5 }
>>>>>>>>>>
>>>>>>>>>> postgresql_global_config_options:
>>>>>>>>>> - option: unix_socket_directories
>>>>>>>>>> value: '{{ postgresql_unix_socket_directories | join(",") }}'
>>>>>>>>>>
>>>>>>>>>> - option: standard_conforming_strings
>>>>>>>>>> value: 'on'
>>>>>>>>>>
>>>>>>>>>> - option: shared_buffers
>>>>>>>>>> value: '1024MB'
>>>>>>>>>>
>>>>>>>>>> # max_wal_size = (3 * checkpoint_segments) * 16MB
>>>>>>>>>> # checkpoint_segments=300
>>>>>>>>>> - option: max_wal_size
>>>>>>>>>> value: '14400MB'
>>>>>>>>>>
>>>>>>>>>> - option: min_wal_size
>>>>>>>>>> value: '80MB'
>>>>>>>>>>
>>>>>>>>>> - option: maintenance_work_mem
>>>>>>>>>> value: '2MB'
>>>>>>>>>>
>>>>>>>>>> - option: listen_addresses
>>>>>>>>>> value: '*'
>>>>>>>>>>
>>>>>>>>>> - option: max_connections
>>>>>>>>>> value: '400'
>>>>>>>>>>
>>>>>>>>>> - option: checkpoint_timeout
>>>>>>>>>> value: '900'
>>>>>>>>>>
>>>>>>>>>> - option: datestyle
>>>>>>>>>> value: "iso, mdy"
>>>>>>>>>>
>>>>>>>>>> - option: autovacuum
>>>>>>>>>> value: 'off'
>>>>>>>>>>
>>>>>>>>>> # vacuum all databases every night (full vacuum on Sunday night, lazy vacuum every night)
>>>>>>>>>> - name: add postgresql cron lazy vacuum
>>>>>>>>>> cron:
>>>>>>>>>> name: lazy_vacuum
>>>>>>>>>> hour: 8
>>>>>>>>>> minute: 0
>>>>>>>>>> job: "su - postgres -c 'vacuumdb --all --analyze --quiet'"
>>>>>>>>>> - name: add postgresql cron full vacuum
>>>>>>>>>> cron:
>>>>>>>>>> name: full_vacuum
>>>>>>>>>> weekday: 0
>>>>>>>>>> hour: 10
>>>>>>>>>> minute: 0
>>>>>>>>>> job: "su - postgres -c 'vacuumdb --all --full --analyze --quiet'"
>>>>>>>>>> # re-index all databases once a week
>>>>>>>>>> - name: add postgresql cron reindex
>>>>>>>>>> cron:
>>>>>>>>>> name: reindex
>>>>>>>>>> weekday: 0
>>>>>>>>>> hour: 12
>>>>>>>>>> minute: 0
>>>>>>>>>> job: "su - postgres -c 'psql -t -c \"select datname from pg_database order by datname;\" | xargs -n 1 -I\"{}\" -- psql -U postgres {} -c \"reindex database {};\"' "
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is how I run 2.10.
>>>>>>>>>> Been running fine for some weeks without user intervention.
>>>>>>>>>> @Karl: Any comments please?
>>>>>>>>>> Steph
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>

Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
Hi Steph,

Right, you wouldn't want to touch the framework.

The effect of lower-casing the documentURI parameter in the
addOrReplaceDocumentWithException method in an output connector would be to
map multiple, independently-fetched, documents that differ only by the case
of the URL together into one document in the index.  The ManifoldCF
assumption is that a document with a certain URI can be tracked in the
index using exactly that URI.  Mapping the URI to lower case would break
that assumption so the framework would make the wrong decision in many
cases.
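
To make the failure mode concrete, here is a minimal sketch of what could go wrong. It is not ManifoldCF code: the class, the in-memory map standing in for the index, and the example URLs are all hypothetical. It only shows that once an output connector folds case, two URIs that the framework tracks separately end up sharing one index entry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; none of these names come from ManifoldCF.
final class LowerCaseCollisionDemo {
  // Stand-in for a search index keyed by document ID.
  private final Map<String, String> index = new HashMap<>();

  // Mimics an output connector that lower-cases the URI before indexing.
  void addOrReplace(String documentURI, String content) {
    index.put(documentURI.toLowerCase(Locale.ROOT), content);
  }

  // Mimics the framework asking for removal of ONE of the URIs it tracks.
  void remove(String documentURI) {
    index.remove(documentURI.toLowerCase(Locale.ROOT));
  }

  int size() {
    return index.size();
  }

  public static void main(String[] args) {
    LowerCaseCollisionDemo demo = new LowerCaseCollisionDemo();
    demo.addOrReplace("http://host/Page.aspx", "version A");
    demo.addOrReplace("http://host/page.aspx", "version B");
    System.out.println(demo.size()); // 1: two tracked URIs share one entry

    demo.remove("http://host/Page.aspx");
    System.out.println(demo.size()); // 0: the other tracked URI lost its entry too
  }
}
```

The second step is the important one: removing one URI silently deletes the document that the framework still believes is indexed under the other.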

If you are picking up documents using the web connector, and you are
getting duplicate documents because the document URLs are sloppy, it is
essential that INSTEAD of mapping the document URI to lower case in the
output connector, you map to lower case in the repository connector.
Otherwise the framework will not work right.

There is a tab in the web connector that allows you to configure URL
normalization, called "Canonicalization".  This would be a very appropriate
place to add URL mapping to lower case.  It should be as simple as adding
one more checkbox column in the table, and modifying the method that does
the URL processing to include lower-casing.
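
As a rough sketch of that idea, the following models each row of the table as a URL pattern plus a "lower-case" checkbox, and applies the first matching row. This is an assumption about how such a feature could look, not the web connector's actual implementation; the class names, the rule structure, and the example host are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; not the ManifoldCF web connector's code.
final class CanonicalizationSketch {

  // One table row: a URL regular expression and the proposed checkbox.
  private static final class Rule {
    final Pattern urlPattern;
    final boolean lowerCase;

    Rule(String regex, boolean lowerCase) {
      // Match case-insensitively so the rule still applies to sloppy URLs.
      this.urlPattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
      this.lowerCase = lowerCase;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  void addRule(String regex, boolean lowerCase) {
    rules.add(new Rule(regex, lowerCase));
  }

  // Applies the first rule whose pattern is found in the URL; URLs that
  // match no rule are returned unchanged.
  String canonicalize(String url) {
    for (Rule rule : rules) {
      if (rule.urlPattern.matcher(url).find()) {
        return rule.lowerCase ? url.toLowerCase(Locale.ROOT) : url;
      }
    }
    return url;
  }

  public static void main(String[] args) {
    CanonicalizationSketch c = new CanonicalizationSketch();
    c.addRule("^http://iis\\.example\\.com/", true);
    // Prints http://iis.example.com/docs/a.htm
    System.out.println(c.canonicalize("http://IIS.example.com/Docs/A.HTM"));
  }
}
```

Note that lower-casing the whole URL also lower-cases query-string values, so a per-pattern switch like this is safer than a global one: it can be enabled only for sites known to be case insensitive.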

Karl



On Tue, Sep 4, 2018 at 2:46 PM Steph van Schalkwyk <st...@remcam.net> wrote:

> Unless I have a massive misunderstanding somewhere...
>
>
>
>
> *Steph van Schalkwyk*
> Principal, Remcam Search Engines
> +1.314.452.2896    steph@remcam.net
> http://remcam.net    Skype: svanschalkwyk
> http://linkedin.com/in/vanschalkwyk
>
> On Tue, Sep 4, 2018 at 1:42 PM, Steph van Schalkwyk <st...@remcam.net>
> wrote:
>
>> Hi Karl
>> I'm addressing it in the ES Output Connector.
>> Not touching the framework :)
>> S
>>
>>
>>
>> *Steph van Schalkwyk*
>> Principal, Remcam Search Engines
>> +1.314.452.2896    steph@remcam.net
>> http://remcam.net    Skype: svanschalkwyk
>> http://linkedin.com/in/vanschalkwyk
>>
>> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright <da...@gmail.com> wrote:
>>
>>> Let's make sure we're talking about the same thing.
>>>
>>> Here is the output connector method that receives the ID (as the
>>> documentURI parameter):
>>>
>>>   public int addOrReplaceDocumentWithException(String documentURI,
>>> VersionContext pipelineDescription, RepositoryDocument document, String
>>> authorityNameString, IOutputAddActivity activities)
>>>     throws ManifoldCFException, ServiceInterruption, IOException;
>>>
>>> ManifoldCF doesn't say anywhere that this ID is case insensitive.  If
>>> you make it case insensitive in an output connector, this will potentially
>>> break a lot of things, for example incremental indexing (which organizes
>>> the last indexed version by document ID).
>>>
>>> I therefore highly recommend that any "sloppiness" in this parameter be
>>> addressed in the Repository Connector that constructs it.  If the connector
>>> is crawling a repository that believes that URLs are case insensitive then
>>> it should map these IDs to lower case.  If not, then it shouldn't.
>>>
>>> Karl
>>>
>>>
>>> On Tue, Sep 4, 2018 at 1:36 PM Steph van Schalkwyk <st...@remcam.net>
>>> wrote:
>>>
>>>> Hi Karl.
>>>> The issue is that the ES Output Connector uses the uri to create the
>>>> _id. When used with IIS which allows case variation in the URI, it creates
>>>> multiple documents. Clients on Windows IIS are rarely cognizant of that
>>>> issue as IIS is so lax in policing that OTB.
>>>> Currently, every case variation in URI results in a new doc in the
>>>> index. This is only in the ES output connector.
>>>> I can add an optional checkbox to determine that particular action,
>>>> if that would help?
>>>> Regards,
>>>> Steph
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Steph van Schalkwyk*
>>>> Principal, Remcam Search Engines
>>>> +1.314.452.2896    steph@remcam.net
>>>> http://remcam.net    Skype: svanschalkwyk
>>>> http://linkedin.com/in/vanschalkwyk
>>>>
>>>> On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright <da...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for the update.
>>>>> Lower-casing the ID would be fine except there are some connectors
>>>>> that care about case.  The web connector is one such because it's up to the
>>>>> web service to decide if case matters, so the web connector does not view
>>>>> URLs with case differences as being the same.  Other connectors will
>>>>> likely care as well. So I don't think lower-casing the document ID is a
>>>>> smart thing to do.
>>>>>
>>>>> You could add this bit of configuration to the web connector, if
>>>>> that's what you are using, or to whatever other connector constructs the ID.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 4, 2018 at 12:04 PM Steph van Schalkwyk <st...@remcam.net>
>>>>> wrote:
>>>>>
>>>>>> Thanks Karl.
>>>>>>
>>>>>> I'll look into that.
>>>>>>
>>>>>> Another note:
>>>>>> Regarding the ES connector - I have made three changes to it and
>>>>>> should probably diff them for inclusion after approval:
>>>>>> 1. Lowercased the _id (the doc URI).
>>>>>> 2. Removed doubled "/", e.g. "//", in the _id (I have sloppy sources,
>>>>>> particularly IIS...)
>>>>>> 3. Added a "url" metadata field to the ES connector (as ES 6.x does
>>>>>> not allow access to _id in the schema anymore, so no copy_field etc. from
>>>>>> _id). Hence "url".
>>>>>>
>>>>>> Regards,
>>>>>> Steph
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Steph van Schalkwyk*
>>>>>> Principal, Remcam Search Engines
>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 10:50 AM, Karl Wright <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Steph, I suspect that Jetty is leaking some resource, and we may
>>>>>>> need to upgrade it.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk <
>>>>>>> steph@remcam.net> wrote:
>>>>>>>
>>>>>>>> Olivier
>>>>>>>> By all means.
>>>>>>>> The only issue I have seen (totally unrelated) is with Jetty, which
>>>>>>>> has to be restarted about once a week. Still trying to find the issue.
>>>>>>>> I may be overly sensitive, but I suspect MCF 2.10 with PostgreSQL 10
>>>>>>>> may be a bit slower. I have no empirical evidence at the moment, as I'm
>>>>>>>> still delivering the project to UAT. Will keep you posted.
>>>>>>>> Regards,
>>>>>>>> Steph
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Steph van Schalkwyk*
>>>>>>>> Principal, Remcam Search Engines
>>>>>>>> +1.314.452.2896    steph@remcam.net
>>>>>>>> http://remcam.net    Skype: svanschalkwyk
>>>>>>>> http://linkedin.com/in/vanschalkwyk
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 9:59 AM, Olivier Tavard <
>>>>>>>> olivier.tavard@francelabs.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Thanks a lot for sharing your PostgreSQL configuration (sorry for
>>>>>>>>> the late answer). I will test it soon.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Olivier TAVARD
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Le 23 août 2018 à 19:20, Steph van Schalkwyk <st...@remcam.net> a
>>>>>>>>> écrit :
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> These are the rpm installs:
>>>>>>>>> - file:///tmp/postgres10/postgresql10-libs-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>> - file:///tmp/postgres10/postgresql10-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>> - file:///tmp/postgres10/postgresql10-contrib-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>> - file:///tmp/postgres10/postgresql10-devel-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>> - file:///tmp/postgres10/postgresql10-server-10.4-1PGDG.rhel7.x86_64.rpm
>>>>>>>>>
>>>>>>>>> postgresql_version: 10
>>>>>>>>> postgresql_data_dir: /var/lib/pgsql/10/data
>>>>>>>>> postgresql_bin_path: /usr/pgsql-10/bin
>>>>>>>>> postgresql_config_path: /var/lib/pgsql/10/data
>>>>>>>>> postgresql_daemon: postgresql-10.service
>>>>>>>>> postgresql_packages:
>>>>>>>>> - postgresql10-libs
>>>>>>>>> - postgresql10
>>>>>>>>> - postgresql10-server
>>>>>>>>> - postgresql10-contrib
>>>>>>>>> # - postgresql10-devel
>>>>>>>>>
>>>>>>>>> postgresql_hba_entries:
>>>>>>>>>   - { type: local, database: all, user: postgres, auth_method: peer }
>>>>>>>>>   - { type: local, database: all, user: all, auth_method: peer }
>>>>>>>>>   - { type: host, database: all, user: all, address: '127.0.0.1/32', auth_method: md5 }
>>>>>>>>>   - { type: host, database: all, user: all, address: '::1/128', auth_method: md5 }
>>>>>>>>>   - { type: host, database: all, user: all, address: '0.0.0.0/0', auth_method: md5 }
>>>>>>>>>   - { type: host, database: all, user: all, address: '::0/0', auth_method: md5 }
>>>>>>>>>
>>>>>>>>> postgresql_global_config_options:
>>>>>>>>>   - option: unix_socket_directories
>>>>>>>>>     value: '{{ postgresql_unix_socket_directories | join(",") }}'
>>>>>>>>>
>>>>>>>>>   - option: standard_conforming_strings
>>>>>>>>>     value: 'on'
>>>>>>>>>
>>>>>>>>>   - option: shared_buffers
>>>>>>>>>     value: '1024MB'
>>>>>>>>>
>>>>>>>>>   # max_wal_size = (3 * checkpoint_segments) * 16MB
>>>>>>>>>   # checkpoint_segments=300
>>>>>>>>>   - option: max_wal_size
>>>>>>>>>     value: '14400MB'
>>>>>>>>>
>>>>>>>>>   - option: min_wal_size
>>>>>>>>>     value: '80MB'
>>>>>>>>>
>>>>>>>>>   - option: maintenance_work_mem
>>>>>>>>>     value: '2MB'
>>>>>>>>>
>>>>>>>>>   - option: listen_addresses
>>>>>>>>>     value: '*'
>>>>>>>>>
>>>>>>>>>   - option: max_connections
>>>>>>>>>     value: '400'
>>>>>>>>>
>>>>>>>>>   - option: checkpoint_timeout
>>>>>>>>>     value: '900'
>>>>>>>>>
>>>>>>>>>   - option: datestyle
>>>>>>>>>     value: "iso, mdy"
>>>>>>>>>
>>>>>>>>>   - option: autovacuum
>>>>>>>>>     value: 'off'
>>>>>>>>>
>>>>>>>>> # vacuum all databases every night (full vacuum on Sunday night,
>>>>>>>>> # lazy vacuum every night)
>>>>>>>>> - name: add postgresql cron lazy vacuum
>>>>>>>>>   cron:
>>>>>>>>>     name: lazy_vacuum
>>>>>>>>>     hour: 8
>>>>>>>>>     minute: 0
>>>>>>>>>     job: "su - postgres -c 'vacuumdb --all --analyze --quiet'"
>>>>>>>>> - name: add postgresql cron full vacuum
>>>>>>>>>   cron:
>>>>>>>>>     name: full_vacuum
>>>>>>>>>     weekday: 0
>>>>>>>>>     hour: 10
>>>>>>>>>     minute: 0
>>>>>>>>>     job: "su - postgres -c 'vacuumdb --all --full --analyze --quiet'"
>>>>>>>>> # re-index all databases once a week
>>>>>>>>> - name: add postgresql cron reindex
>>>>>>>>>   cron:
>>>>>>>>>     name: reindex
>>>>>>>>>     weekday: 0
>>>>>>>>>     hour: 12
>>>>>>>>>     minute: 0
>>>>>>>>>     job: "su - postgres -c 'psql -t -c \"select datname from pg_database order by datname;\" | xargs -n 1 -I\"{}\" -- psql -U postgres {} -c \"reindex database {};\"' "
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is how I run 2.10.
>>>>>>>>> Been running fine for some weeks without user intervention.
>>>>>>>>> @Karl: Any comments please?
>>>>>>>>> Steph
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
Unless I have a massive misunderstanding somewhere...




*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452.2896    steph@remcam.net   http://remcam.net   Skype: svanschalkwyk
<http://linkedin.com/in/vanschalkwyk>

On Tue, Sep 4, 2018 at 1:42 PM, Steph van Schalkwyk <st...@remcam.net>
wrote:

> Hi Karl
> I'm addressing it in the ES Output Connector.
> Not touching the framework :)
> S

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
Hi Karl
I'm addressing it in the ES Output Connector.
Not touching the framework :)
S



*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452.2896    steph@remcam.net   http://remcam.net   Skype: svanschalkwyk
<http://linkedin.com/in/vanschalkwyk>

On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright <da...@gmail.com> wrote:

> Let's make sure we're talking about the same thing.
>
> Here is the output connector method that receives the ID (as the
> documentURI parameter):
>
>   public int addOrReplaceDocumentWithException(String documentURI,
> VersionContext pipelineDescription, RepositoryDocument document, String
> authorityNameString, IOutputAddActivity activities)
>     throws ManifoldCFException, ServiceInterruption, IOException;
>
> ManifoldCF doesn't say anywhere that this ID is case insensitive.  If you
> make it case insensitive in an output connector, this will potentially
> break a lot of things, for example incremental indexing (which organizes
> the last indexed version by document ID).
>
> I therefore highly recommend that any "sloppiness" in this parameter be
> addressed in the Repository Connector that constructs it.  If the connector
> is crawling a repository that believes that URLs are case insensitive then
> it should map these IDs to lower case.  If not, then it shouldn't.
>
> Karl

Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
Let's make sure we're talking about the same thing.

Here is the output connector method that receives the ID (as the
documentURI parameter):

  public int addOrReplaceDocumentWithException(String documentURI,
VersionContext pipelineDescription, RepositoryDocument document, String
authorityNameString, IOutputAddActivity activities)
    throws ManifoldCFException, ServiceInterruption, IOException;

ManifoldCF doesn't say anywhere that this ID is case insensitive.  If you
make it case insensitive in an output connector, this will potentially
break a lot of things, for example incremental indexing (which organizes
the last indexed version by document ID).
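
To illustrate the risk, here is a toy model only, not ManifoldCF code. It assumes the framework tracks each document ID separately while a hypothetical output connector lower-cases the ID before writing to the index. The class and method names are made up for this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for an output connector that folds case on its own.
final class CaseFoldingOutputDemo {
  // Stand-in for the search index, keyed by the ID the connector writes.
  static final Map<String, String> index = new HashMap<>();

  static void addOrReplace(String documentURI, String content) {
    index.put(documentURI.toLowerCase(Locale.ROOT), content);
  }

  static void remove(String documentURI) {
    index.remove(documentURI.toLowerCase(Locale.ROOT));
  }

  // Returns the number of documents left in the index after the scenario.
  static int run() {
    index.clear();
    // The framework sees two distinct IDs and indexes both.
    addOrReplace("http://host/Page.aspx", "v1");
    addOrReplace("http://host/page.aspx", "v2"); // silently overwrites v1
    // The framework later removes only the first ID...
    remove("http://host/Page.aspx");
    // ...but the shared entry is gone, although the second ID is
    // still recorded as indexed on the framework side.
    return index.size();
  }
}
```

In this model the index ends up empty while the framework still believes one document is present, which is the kind of bookkeeping mismatch described above.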

I therefore highly recommend that any "sloppiness" in this parameter be
addressed in the Repository Connector that constructs it.  If the connector
is crawling a repository that treats URLs as case insensitive, then
it should map these IDs to lower case.  If not, then it shouldn't.
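
A minimal sketch of that idea, assuming a hypothetical helper that a repository connector would call at the point where it builds the document identifier. DocumentIdNormalizer and its caseInsensitiveRepository flag are illustrative names, not part of the ManifoldCF API, and lower-casing the whole ID (including any query string) is an assumption that may not suit every repository.

```java
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF.
final class DocumentIdNormalizer {
  private DocumentIdNormalizer() {}

  /**
   * Normalizes a URL-style document ID, but only when the repository
   * is known to treat URLs as case insensitive. Otherwise the ID is
   * returned unchanged.
   */
  static String normalize(String documentId, boolean caseInsensitiveRepository) {
    if (!caseInsensitiveRepository) {
      return documentId;
    }
    // Leave the "//" that follows the scheme alone.
    int schemeEnd = documentId.indexOf("://");
    int pathStart = schemeEnd < 0 ? 0 : schemeEnd + 3;
    String prefix = documentId.substring(0, pathStart);
    // Collapse runs of slashes in the rest of the ID into one slash.
    String rest = documentId.substring(pathStart).replaceAll("/{2,}", "/");
    return (prefix + rest).toLowerCase(Locale.ROOT);
  }
}
```

Because the normalized ID is what the framework itself stores, incremental indexing and the output connector would both see the same key.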

Karl


On Tue, Sep 4, 2018 at 1:36 PM Steph van Schalkwyk <st...@remcam.net> wrote:

> Hi Karl.
> The issue is that the ES Output Connector uses the URI to create the _id.
> When used with IIS, which allows case variation in the URI, it creates
> multiple documents. Clients on Windows IIS are rarely cognizant of that
> issue, as IIS is so lax in policing it out of the box.
> Currently, every case variation in the URI results in a new doc in the index.
> This happens only in the ES output connector.
> I can add an optional checkbox to determine whether that action is applied,
> if that would help?
> Regards,
> Steph

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
Hi Karl.
The issue is that the ES Output Connector uses the URI to create the _id.
When used with IIS, which allows case variation in the URI, it creates
multiple documents. Clients on Windows IIS are rarely cognizant of that
issue, as IIS is so lax in policing it out of the box.
Currently, every case variation in the URI results in a new doc in the index.
This happens only in the ES output connector.
I can add an optional checkbox to determine whether that action is applied,
if that would help?
Regards,
Steph





*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452.2896    steph@remcam.net   http://remcam.net   Skype: svanschalkwyk
<http://linkedin.com/in/vanschalkwyk>

On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright <da...@gmail.com> wrote:

> Thanks for the update.
> Lower-casing the ID would be fine except there are some connectors that
> care about case.  The web connector is one such because it's up to the web
> service to decide if case matters, so the web connector does not view urls
> with case differences as being the same.  Other connectors also will likely
> care as well. So I don't think lower-casing the document id is a smart
> thing to do.
>
> You could add this bit of configuration to the web connector, if that's
> what you are using, or to whatever other connector constructs the ID.
>
> Karl
>
>
>

Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
Thanks for the update.
Lower-casing the ID would be fine except there are some connectors that
care about case.  The web connector is one such because it's up to the web
service to decide if case matters, so the web connector does not view URLs
with case differences as being the same.  Other connectors will likely
care as well. So I don't think lower-casing the document ID is a smart
thing to do.

You could add this bit of configuration to the web connector, if that's
what you are using, or to whatever other connector constructs the ID.

Karl



On Tue, Sep 4, 2018 at 12:04 PM Steph van Schalkwyk <st...@remcam.net>
wrote:

> Thanks Karl.
>
> I'll look into that.
>
> Another note:
> Regarding the ES connector - I have made three additions to it and should
> probably diff them for inclusion after approval:
> 1. Lowercased the _id (the doc URI).
> 2. Removed doubled "/", i.e. "//", in the _id (I have sloppy sources,
> particularly IIS...)
> 3. Added a "url" metadata field to the ES connector (as ES 6.x does not
> allow access to _id in the schema anymore, so no copy_field etc. from _id).
> Hence "url".
>
> Regards,
> Steph
>
>
>
>
> *Steph van Schalkwyk*
> Principal, Remcam Search Engines
> +1.314.452.2896    steph@remcam.net   http://remcam.net
> Skype: svanschalkwyk
> http://linkedin.com/in/vanschalkwyk
>

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
Thanks Karl.

I'll look into that.

Another note:
Regarding the ES connector - I have made three additions to it and should
probably diff them for inclusion after approval:
1. Lowercased the _id (the doc URI).
2. Removed doubled "/", i.e. "//", in the _id (I have sloppy sources,
particularly IIS...)
3. Added a "url" metadata field to the ES connector (as ES 6.x does not
allow access to _id in the schema anymore, so no copy_field etc. from _id).
Hence "url".
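
The three changes listed above could be sketched roughly as follows. This is an illustration in Python, not the actual connector patch (the connector itself is Java). `normalize_id` and `build_document` are hypothetical names, and it is an assumption here that the "url" field carries the same normalized value as the _id.

```python
import re


def normalize_id(uri):
    """Lower-case a URI and collapse runs of '/' after the scheme separator."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # No scheme present: treat the whole string as the path part.
        scheme, rest = "", uri
    rest = re.sub(r"/{2,}", "/", rest)  # "//" or "///" becomes "/"
    return (scheme + sep + rest).lower()


def build_document(uri, fields):
    """Return (_id, body); the body repeats the id in a 'url' field."""
    doc_id = normalize_id(uri)
    body = dict(fields)
    body["url"] = doc_id  # assumption: same value as the _id
    return doc_id, body


doc_id, body = build_document(
    "HTTP://Server//Docs///Page.ASPX", {"title": "Page"}
)
print(doc_id)
print(body)
```

Note that this sketch also collapses any "//" that appears inside a query string, so a source that embeds full URLs in its parameters would need a narrower rule.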

Regards,
Steph




*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452.2896    steph@remcam.net   http://remcam.net
Skype: svanschalkwyk
http://linkedin.com/in/vanschalkwyk

On Tue, Sep 4, 2018 at 10:50 AM, Karl Wright <da...@gmail.com> wrote:

> Hi Steph, I suspect that Jetty is leaking some resource, and we may need
> to upgrade it.
>
> Karl
>
>

Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
Hi Steph, I suspect that Jetty is leaking some resource, and we may need to
upgrade it.

Karl


On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk <st...@remcam.net>
wrote:

> Olivier
> By all means.
> The only issue I have seen (totally unrelated) is with Jetty, which has to
> be restarted about once a week. Still trying to find the issue.
> I may be overly sensitive, but I suspect MCF 2.10 with Postgres 10 may be a
> bit slower. I have no empirical evidence at the moment as I'm still
> delivering the project to UAT. Will keep you posted.
> Regards,
> Steph
>
>
>
> *Steph van Schalkwyk*
> Principal, Remcam Search Engines
> +1.314.452.2896    steph@remcam.net   http://remcam.net
> Skype: svanschalkwyk
> http://linkedin.com/in/vanschalkwyk
>

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
Olivier
By all means.
The only issue I have seen (totally unrelated) is with Jetty, which has to
be restarted about once a week. Still trying to find the issue.
I may be overly sensitive, but I suspect MCF 2.10 with Postgres 10 may be a
bit slower. I have no empirical evidence at the moment as I'm still
delivering the project to UAT. Will keep you posted.
Regards,
Steph



*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452.2896    steph@remcam.net   http://remcam.net
Skype: svanschalkwyk
http://linkedin.com/in/vanschalkwyk

On Tue, Sep 4, 2018 at 9:59 AM, Olivier Tavard <
olivier.tavard@francelabs.com> wrote:

> Hello,
>
> Thanks a lot for sharing your PostgreSQL configuration (sorry for the late
> answer). I will test it soon.
>
> Best regards,
>
>
> Olivier TAVARD
>
>

Re: PostgreSQL version to support MCF v2.10

Posted by Olivier Tavard <ol...@francelabs.com>.
Hello,

Thanks a lot for sharing your PostgreSQL configuration (sorry for the late answer). I will test it soon.

Best regards,


Olivier TAVARD


> On 23 August 2018 at 19:20, Steph van Schalkwyk <st...@remcam.net> wrote:
> 
> 
> 
> These are the rpm installs:
>         - file:///tmp/postgres10/postgresql10-libs-10.4-1PGDG.rhel7.x86_64.rpm
>         - file:///tmp/postgres10/postgresql10-10.4-1PGDG.rhel7.x86_64.rpm
>         - file:///tmp/postgres10/postgresql10-contrib-10.4-1PGDG.rhel7.x86_64.rpm
>         - file:///tmp/postgres10/postgresql10-devel-10.4-1PGDG.rhel7.x86_64.rpm
>         - file:///tmp/postgres10/postgresql10-server-10.4-1PGDG.rhel7.x86_64.rpm
> 
>       postgresql_version: 10
>       postgresql_data_dir: /var/lib/pgsql/10/data
>       postgresql_bin_path: /usr/pgsql-10/bin
>       postgresql_config_path: /var/lib/pgsql/10/data
>       postgresql_daemon: postgresql-10.service
>       postgresql_packages:
>         - postgresql10-libs
>         - postgresql10
>         - postgresql10-server
>         - postgresql10-contrib
> #        - postgresql10-devel    
> 
>       postgresql_hba_entries:
>         - { type: local, database: all, user: postgres, auth_method: peer }
>         - { type: local, database: all, user: all, auth_method: peer }
>         - { type: host, database: all, user: all, address: '127.0.0.1/32', auth_method: md5 }
>         - { type: host, database: all, user: all, address: '::1/128', auth_method: md5 }
>         - { type: host, database: all, user: all, address: '0.0.0.0/0', auth_method: md5 }
>         - { type: host, database: all, user: all, address: '::0/0', auth_method: md5 }
> 
>       postgresql_global_config_options:
>         - option: unix_socket_directories
>           value: '{{ postgresql_unix_socket_directories | join(",") }}'
> 
>         - option: standard_conforming_strings
>           value: 'on'
> 
>         - option: shared_buffers
>           value: '1024MB'
> 
>         # max_wal_size = (3 * checkpoint_segments) * 16MB
>         # checkpoint_segments=300
>         - option: max_wal_size
>           value: '14400MB'
> 
>         - option: min_wal_size
>           value: '80MB'
> 
>         - option: maintenance_work_mem
>           value: '2MB'
> 
>         - option: listen_addresses
>           value: '*'
> 
>         - option: max_connections
>           value: '400'
> 
>         - option: checkpoint_timeout
>           value: '900'
> 
>         - option: datestyle
>           value: "iso, mdy"
> 
>         - option: autovacuum
>           value: 'off'
> 
>     # vacuum all databases every night (full vacuum on Sunday night, lazy vacuum every night)
>     - name: add postgresql cron lazy vacuum
>       cron:
>         name: lazy_vacuum
>         hour: 8
>         minute: 0
>         job: "su - postgres -c 'vacuumdb --all --analyze --quiet'"
>     - name: add postgresql cron full vacuum
>       cron:
>         name: full_vacuum
>         weekday: 0
>         hour: 10
>         minute: 0
>         job: "su - postgres -c 'vacuumdb --all --full --analyze --quiet'"
>     # re-index all databases once a week
>     - name: add postgresql cron reindex
>       cron:
>         name: reindex
>         weekday: 0
>         hour: 12
>         minute: 0
>         job: "su - postgres -c 'psql -t -c \"select datname from pg_database order by datname;\" | xargs -n 1 -I\"{}\" -- psql -U postgres {} -c \"reindex database {};\"' "
> 
> 
> This is how I run 2.10.
> Been running fine for some weeks without user intervention.
> @Karl: Any comments please?
> Steph
> 
> 


Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
These are the rpm installs:
        - file:///tmp/postgres10/postgresql10-libs-10.4-1PGDG.rhel7.x86_64.rpm
        - file:///tmp/postgres10/postgresql10-10.4-1PGDG.rhel7.x86_64.rpm
        - file:///tmp/postgres10/postgresql10-contrib-10.4-1PGDG.rhel7.x86_64.rpm
        - file:///tmp/postgres10/postgresql10-devel-10.4-1PGDG.rhel7.x86_64.rpm
        - file:///tmp/postgres10/postgresql10-server-10.4-1PGDG.rhel7.x86_64.rpm

      postgresql_version: 10
      postgresql_data_dir: /var/lib/pgsql/10/data
      postgresql_bin_path: /usr/pgsql-10/bin
      postgresql_config_path: /var/lib/pgsql/10/data
      postgresql_daemon: postgresql-10.service
      postgresql_packages:
        - postgresql10-libs
        - postgresql10
        - postgresql10-server
        - postgresql10-contrib
#        - postgresql10-devel

      postgresql_hba_entries:
        - { type: local, database: all, user: postgres, auth_method: peer }
        - { type: local, database: all, user: all, auth_method: peer }
        - { type: host, database: all, user: all, address: '127.0.0.1/32', auth_method: md5 }
        - { type: host, database: all, user: all, address: '::1/128', auth_method: md5 }
        - { type: host, database: all, user: all, address: '0.0.0.0/0', auth_method: md5 }
        - { type: host, database: all, user: all, address: '::0/0', auth_method: md5 }

      postgresql_global_config_options:
        - option: unix_socket_directories
          value: '{{ postgresql_unix_socket_directories | join(",") }}'

        - option: standard_conforming_strings
          value: 'on'

        - option: shared_buffers
          value: '1024MB'

        # max_wal_size = (3 * checkpoint_segments) * 16MB
        # checkpoint_segments=300
        - option: max_wal_size
          value: '14400MB'

        - option: min_wal_size
          value: '80MB'

        - option: maintenance_work_mem
          value: '2MB'

        - option: listen_addresses
          value: '*'

        - option: max_connections
          value: '400'

        - option: checkpoint_timeout
          value: '900'

        - option: datestyle
          value: "iso, mdy"

        - option: autovacuum
          value: 'off'

    # vacuum all databases every night (full vacuum on Sunday night, lazy vacuum every night)
    - name: add postgresql cron lazy vacuum
      cron:
        name: lazy_vacuum
        hour: 8
        minute: 0
        job: "su - postgres -c 'vacuumdb --all --analyze --quiet'"
    - name: add postgresql cron full vacuum
      cron:
        name: full_vacuum
        weekday: 0
        hour: 10
        minute: 0
        job: "su - postgres -c 'vacuumdb --all --full --analyze --quiet'"
    # re-index all databases once a week
    - name: add postgresql cron reindex
      cron:
        name: reindex
        weekday: 0
        hour: 12
        minute: 0
        job: "su - postgres -c 'psql -t -c \"select datname from pg_database order by datname;\" | xargs -n 1 -I\"{}\" -- psql -U postgres {} -c \"reindex database {};\"' "
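
To make the schedule of the three cron tasks above easier to read, here is a small Python sketch that renders each task's time fields as a crontab-style line. It assumes that any field a task omits defaults to "*", which is the default of Ansible's cron module. `crontab_line` is a hypothetical helper, and the reindex job is abbreviated to a placeholder string rather than copied in full.

```python
def crontab_line(job, minute="*", hour="*", day="*", month="*", weekday="*"):
    """Render cron time fields plus the job as one crontab-style line."""
    return f"{minute} {hour} {day} {month} {weekday} {job}"


lines = [
    crontab_line("su - postgres -c 'vacuumdb --all --analyze --quiet'",
                 minute=0, hour=8),
    crontab_line("su - postgres -c 'vacuumdb --all --full --analyze --quiet'",
                 minute=0, hour=10, weekday=0),
    # Placeholder for the long reindex pipeline shown in the config.
    crontab_line("<reindex job>", minute=0, hour=12, weekday=0),
]

for line in lines:
    print(line)
```

Read this way, the values as written run the lazy vacuum daily at 08:00, the full vacuum on weekday 0 (Sunday) at 10:00, and the reindex on Sunday at 12:00, all in the server's local time.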


This is how I run 2.10.
Been running fine for some weeks without user intervention.
@Karl: Any comments please?
Steph
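
The max_wal_size value in the config follows from the formula in its own comment. A short worked check in Python, assuming the 16MB WAL segment size the comment uses (`max_wal_size_mb` is a hypothetical helper name):

```python
WAL_SEGMENT_MB = 16  # segment size used in the config comment


def max_wal_size_mb(checkpoint_segments, segment_mb=WAL_SEGMENT_MB):
    """Apply max_wal_size = (3 * checkpoint_segments) * segment size."""
    return 3 * checkpoint_segments * segment_mb


wal_mb = max_wal_size_mb(300)  # checkpoint_segments=300, as in the comment
print(f"max_wal_size = {wal_mb}MB")
```

With checkpoint_segments=300 this gives 3 * 300 * 16 = 14400, which matches the '14400MB' set in the config.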

Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
I'll publish them in a bit.

Re: PostgreSQL version to support MCF v2.10

Posted by Olivier Tavard <ol...@francelabs.com>.
Hi,

I am also interested in the migration to PostgreSQL 10.
@Steph, it would be nice if you could specify the changes you made in the configuration file when you wrote:
"I'm using 10.4 with no issues.

One or two of the recommended settings for MCF have changed between 9.6 and 10.

Simple to resolve though.
"

Thanks!
Best regards,

Olivier


> On 6 August 2018 at 15:52, Karl Wright <da...@gmail.com> wrote:
> 
> It is what is expected with multiple threads active at the same time.
> Karl
> 
> 
> On Mon, Aug 6, 2018 at 7:26 AM Standen Guy <Guy.Standen@uk.fujitsu.com <ma...@uk.fujitsu.com>> wrote:
> Hi Karl,
> 
> I haven’t experienced any job aborts, so all seems OK in that respect.
> 
> Is there anything I can do to reduce these errors in the first place, or it is just to be expected with the nature of the multiple worker threads and the query types issued by ManifoldCF?
> 
> Best Regards,
> 
>  
> 
> Guy
> 
>  
> 
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: 06 August 2018 12:16
> To: user@manifoldcf.apache.org
> Subject: Re: PostgreSQL version to support MCF v2.10
> 
>  
> 
> These are exactly the same kind of issue as the first "error" reported.  They will be retried.  If they did not get retried, they would abort the job immediately.
> 
>  
> 
> Karl
> 
>  
> 
>  
> 
> On Mon, Aug 6, 2018 at 6:57 AM Standen Guy <Guy.Standen@uk.fujitsu.com> wrote:
> 
> Hi Karl,
> 
> Thanks for the prompt response regarding the first error example. Do you have a view as to the second error, i.e.
> 
> “2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due to concurrent update
> 
> 2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
> 
> 2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due to concurrent update
> 
> 2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
> 
> 2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due to concurrent update
> 
> “
> 
>  
> 
> These errors don’t suggest a retry may sort them out  - is this an issue?
> 
>  
> 
> Many Thanks,
> 
>  
> 
> Guy
> 
>  
> 
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: 06 August 2018 10:52
> To: user@manifoldcf.apache.org
> Subject: Re: PostgreSQL version to support MCF v2.10
> 
>  
> 
> Ah, the following errors:
> 
> >>>>>>
> 
> 2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
> 
> 2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
> 
> 2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if retried.
> 
> <<<<<< 
> 
>  
> 
> ... occur because of concurrent transactions.  The transaction is indeed retried when this occurs, so unless your job aborts, you are fine.
> 
>  
> 
> Karl
> 
>  
> 
>  
> 
> On Mon, Aug 6, 2018 at 5:49 AM Karl Wright <daddywri@gmail.com> wrote:
> 
> What errors are these?  Please include them and I can let you know.
> 
>  
> 
> Karl
> 
>  
> 
>  
> 
> On Mon, Aug 6, 2018 at 4:50 AM Standen Guy <Guy.Standen@uk.fujitsu.com> wrote:
> 
> Thank you Karl and Steph,
> 
>  
> 
> Steph, yes, I don’t seem to have any issues with running the MCF jobs, but I am concerned about the PostgreSQL errors. Do you (or anyone else) have a view on the errors I have seen in the PostgreSQL logs? Is this something you have seen with 10.4, and if so, was it corrected by changing some settings?
> 
>  
> 
> Best Regards
> 
>  
> 
> Guy
> 
>  
> 
> From: Steph van Schalkwyk [mailto:steph@remcam.net]
> Sent: 03 August 2018 23:21
> To: user@manifoldcf.apache.org
> Subject: Re: PostgreSQL version to support MCF v2.10
> 
>  
> 
> I'm using 10.4 with no issues. 
> 
> One or two of the recommended settings for MCF have changed between 9.6 and 10. 
> 
> Simple to resolve though.
> 
> Steph
> 
>  
> 
> 
> 
>  
> 
> On Fri, Aug 3, 2018 at 1:29 PM, Karl Wright <daddywri@gmail.com> wrote:
> 
> Hi Guy,
> 
>  
> 
> I use Postgresql 9.6 myself and have found no issues with it.  I don't know about v 10 however.
> 
>  
> 
> Karl
> 
>  
> 
>  
> 
> On Fri, Aug 3, 2018 at 11:32 AM Standen Guy <Guy.Standen@uk.fujitsu.com> wrote:
> 
> Hi Karl/All,
> 
> I am upgrading from MCF v2.6, supported by PostgreSQL v9.3.16, to MCF v2.10. I wonder if there is any official advice as to which version of PostgreSQL will support MCF v2.10? The MCF v2.10 build and deployment instructions still suggest that PostgreSQL 9.3 is the latest tested version of PostgreSQL. Given that PostgreSQL 9.3.x is going end of life next month (Sept 2018), is there a preferred newer version that should be used?
> 
>  
> 
> As an experiment I have installed MCF 2.10  supported by PostgreSQL 10.4.  From the outside all seems to work OK, but investigation of the PostgreSQL  logs shows a lot of errors:
> 
>  
> 
> e.g.
> 
> “2018-08-03 15:50:00.629 BST [7920] LOG:  database system was shut down at 2018-08-03 15:47:30 BST
> 
> 2018-08-03 15:50:00.734 BST [6344] LOG:  database system is ready to accept connections
> 
> 2018-08-03 15:52:11.140 BST [6460] WARNING:  there is already a transaction in progress
> 
> 2018-08-03 15:52:11.219 BST [6460] WARNING:  there is no transaction in progress
> 
> 2018-08-03 15:52:13.844 BST [5716] WARNING:  there is already a transaction in progress
> 
> 2018-08-03 15:52:13.879 BST [5716] WARNING:  there is no transaction in progress
> 
> 2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
> 
> 2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
> 
> 2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if retried.
> 
> 2018-08-03 15:52:25.218 BST [4140] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
> 
> 2018-08-03 15:52:25.219 BST [5800] ERROR:  could not serialize access due to read/write dependencies among transactions
> 
> 2018-08-03 15:52:25.219 BST [5800] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
> 
> 2018-08-03 15:52:25.219 BST [5800] HINT:  The transaction might succeed if retried.
> 
> 2018-08-03 15:52:25.219 BST [5800] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
> 
> 2018-08-03 15:52:25.222 BST [5692] ERROR:  could not serialize access due to read/write dependencies among transactions
> 
> 2018-08-03 15:52:25.222 BST [5692] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
> 
> 2018-08-03 15:52:25.222 BST [5692] HINT:  The transaction might succeed if retried.
> 
> 2018-08-03 15:52:25.222 BST [5692] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
> 
> 2018-08-03 15:52:28.149 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
> 
> 2018-08-03 15:52:28.149 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
> 
> 2018-08-03 15:52:28.149 BST [4140] HINT:  The transaction might succeed if retried.
> 
> 2018-08-03 15:52:28.149 BST [4140] STATEMENT:  UPDATE intrinsiclink SET processid=$1,isnew=$2 WHERE jobid=$3 AND parentidhash=$4 AND linktype=$5 AND childidhash=$6
> 
> 2018-08-03 15:52:28.261 BST [5156] ERROR:  could not serialize access due to read/write dependencies among transactions
> 
> 2018-08-03 15:52:28.261 BST [5156] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
> 
> 2018-08-03 15:52:28.261 BST [5156] HINT:  The transaction might succeed if retried.”
> 
>  
> 
> And
> 
>  
> 
> “2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due to concurrent update
> 
> 2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
> 
> 2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due to concurrent update
> 
> 2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
> 
> 2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due to concurrent update
> 
> 2018-08-03 15:52:42.855 BST [5716] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
> 
> 2018-08-03 15:52:42.856 BST [1328] ERROR:  could not serialize access due to concurrent update
> 
> 2018-08-03 15:52:42.856 BST [1328] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
> 
> 2018-08-03 15:52:42.856 BST [5800] ERROR:  could not serialize access due to concurrent update
> 
> 2018-08-03 15:52:42.856 BST [5800] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE”
> 
>  
> 
> Do you have any advice as to whether it is sensible to use PostgreSQL v10.x, and if so, can these errors be overcome?
> 
>  
> 
> Best Regards,
> 
>  
> 
> Guy
> 
> 
> Unless otherwise stated, this email has been sent from Fujitsu Services Limited (registered in England No 96056); Fujitsu EMEA PLC (registered in England No 2216100) both with registered offices at: 22 Baker Street, London W1U 3BW <https://maps.google.com/?q=22+Baker+Street,+London+W1U+3BW&entry=gmail&source=g>; PFU (EMEA) Limited, (registered in England No 1578652) and Fujitsu Laboratories of Europe Limited (registered in England No. 4153469) both with registered offices at: Hayes Park Central, Hayes End Road, Hayes, Middlesex, UB4 8FE. 
> This email is only for the use of its intended recipient. Its contents are subject to a duty of confidence and may be privileged. Fujitsu does not guarantee that this email has not been intercepted and amended or that it is virus-free.


Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
It is what is expected with multiple threads active at the same time.
Karl
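
For readers unfamiliar with the pattern Karl describes, here is a minimal sketch of retrying a transaction on a serialization failure. It is not ManifoldCF's actual code. PostgreSQL reports both "could not serialize access" variants with SQLSTATE 40001; the `DatabaseError` class, the `run_with_retry` helper and its parameters are hypothetical names made up for illustration.

```python
import random
import time

SERIALIZATION_FAILURE = "40001"  # PostgreSQL SQLSTATE for serialization_failure


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries a SQLSTATE."""

    def __init__(self, message, sqlstate):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run transaction(); on a serialization failure, back off and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Anything other than a serialization failure is a real error,
            # and so is a serialization failure on the final attempt.
            if exc.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            # Randomised exponential backoff so competing threads spread out.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

With a loop like this, the server still logs each failed attempt as an ERROR, which is consistent with seeing the messages in the PostgreSQL log while no job aborts.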



RE: PostgreSQL version to support MCF v2.10

Posted by Standen Guy <Gu...@uk.fujitsu.com>.
Hi Karl,
I haven’t experienced any job aborts, so all seems OK in that respect.
Is there anything I can do to reduce these errors in the first place, or is it just to be expected given the multiple worker threads and the query types issued by ManifoldCF?
Best Regards,

Guy
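
One way to judge whether the volume of these messages changes after a tuning attempt is to tally them from the PostgreSQL log. The following is a rough sketch that assumes log lines in the same layout as the excerpts in this thread (timestamp, time zone, process id in brackets, severity). The `tally` function and the `LOG_LINE` pattern are illustrative names, not part of PostgreSQL or ManifoldCF.

```python
import re
from collections import Counter

# Matches lines such as:
# 2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access ...
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+ \[(?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+):\s+(?P<message>.*)$"
)


def tally(lines, level="ERROR"):
    """Count log messages of the given severity, keyed by message text."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == level:
            counts[match.group("message")] += 1
    return counts
```

Passing an open log file to `tally` and comparing the counts between two crawls would give a simple before/after measure.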

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: 06 August 2018 12:16
To: user@manifoldcf.apache.org
Subject: Re: PostgreSQL version to support MCF v2.10

These are exactly the same kind of issue as the first "error" reported.  They will be retried.  If they did not get retried, they would abort the job immediately.

Karl


On Mon, Aug 6, 2018 at 6:57 AM Standen Guy <Gu...@uk.fujitsu.com>> wrote:
Hi Karl,
               Thanks for the prompt response regarding the first  error example.   Do you have a view as to second error  i.e.
“2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due to concurrent update
“

These errors don’t suggest a retry may sort them out  - is this an issue?

Many Thanks,

Guy

From: Karl Wright [mailto:daddywri@gmail.com<ma...@gmail.com>]
Sent: 06 August 2018 10:52
To: user@manifoldcf.apache.org<ma...@manifoldcf.apache.org>
Subject: Re: PostgreSQL version to support MCF v2.10

Ah, the following errors:

>>>>>>
2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if retried.
<<<<<<

... occur because of concurrent transactions.  The transaction is indeed retried when this occurs, so unless your job aborts, you are fine.

Karl


On Mon, Aug 6, 2018 at 5:49 AM Karl Wright <da...@gmail.com>> wrote:
What errors are these?  Please include them and I can let you know.

Karl


On Mon, Aug 6, 2018 at 4:50 AM Standen Guy <Gu...@uk.fujitsu.com>> wrote:
Thank you Karl and Steph,

Steph, yes I don’t seem to have any issues with running the MCF jobs, but am concerned about the PostgreSQL errors. Do you ( or anyone else)  have a view on the errors I have seen in the PostgreSQL logs  - is this something you have seen with 10.4  and if so was it corrected by changing some settings?

Best Regards

Guy

From: Steph van Schalkwyk [mailto:steph@remcam.net<ma...@remcam.net>]
Sent: 03 August 2018 23:21
To: user@manifoldcf.apache.org<ma...@manifoldcf.apache.org>
Subject: Re: PostgreSQL version to support MCF v2.10

I'm using 10.4 with no issues.
One or two of the recommended settings for MCF have changed between 9.6 and 10.
Simple to resolve though.
Steph



On Fri, Aug 3, 2018 at 1:29 PM, Karl Wright <da...@gmail.com>> wrote:
Hi Guy,

I use Postgresql 9.6 myself and have found no issues with it.  I don't know about v 10 however.

Karl


On Fri, Aug 3, 2018 at 11:32 AM Standen Guy <Gu...@uk.fujitsu.com>> wrote:
Hi Karl/All,
               I am upgrading from MCF v2.6  supported by PostgreSQL v 9.3.16   to  MCF v2.10.  I wonder if there is any official advice as to which version of PostgreSQL  will support  MCF v2.10? The  MCF v2.10 build and deployment instructions still suggest that PostgreSQL 9.3 is the latest tested version of PostgreSQL.  Given that PostgreSQL 9.3.x  is going end of life next month ( Sept 2018), is there a preferred newer version that should be used?

As an experiment I have installed MCF 2.10  supported by PostgreSQL 10.4.  From the outside all seems to work OK, but investigation of the PostgreSQL  logs shows a lot of errors:

e.g.
“2018-08-03 15:50:00.629 BST [7920] LOG:  database system was shut down at 2018-08-03 15:47:30 BST
2018-08-03 15:50:00.734 BST [6344] LOG:  database system is ready to accept connections
2018-08-03 15:52:11.140 BST [6460] WARNING:  there is already a transaction in progress
2018-08-03 15:52:11.219 BST [6460] WARNING:  there is no transaction in progress
2018-08-03 15:52:13.844 BST [5716] WARNING:  there is already a transaction in progress
2018-08-03 15:52:13.879 BST [5716] WARNING:  there is no transaction in progress
2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:25.218 BST [4140] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
2018-08-03 15:52:25.219 BST [5800] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.219 BST [5800] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.219 BST [5800] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:25.219 BST [5800] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
2018-08-03 15:52:25.222 BST [5692] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.222 BST [5692] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.222 BST [5692] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:25.222 BST [5692] STATEMENT:  INSERT INTO jobqueue (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
2018-08-03 15:52:28.149 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:28.149 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2018-08-03 15:52:28.149 BST [4140] HINT:  The transaction might succeed if retried.
2018-08-03 15:52:28.149 BST [4140] STATEMENT:  UPDATE intrinsiclink SET processid=$1,isnew=$2 WHERE jobid=$3 AND parentidhash=$4 AND linktype=$5 AND childidhash=$6
2018-08-03 15:52:28.261 BST [5156] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:28.261 BST [5156] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2018-08-03 15:52:28.261 BST [5156] HINT:  The transaction might succeed if retried.”

And

“2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [5716] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.856 BST [1328] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.856 BST [1328] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.856 BST [5800] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.856 BST [5800] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE”

Do you have any advice as to whether it is sensible to use PostgreSQL v10.x   and if so can these errors be overcome?

Best Regards,

Guy

Unless otherwise stated, this email has been sent from Fujitsu Services Limited (registered in England No 96056); Fujitsu EMEA PLC (registered in England No 2216100) both with registered offices at: 22 Baker Street, London W1U 3BW<https://maps.google.com/?q=22+Baker+Street,+London+W1U+3BW&entry=gmail&source=g>; PFU (EMEA) Limited, (registered in England No 1578652) and Fujitsu Laboratories of Europe Limited (registered in England No. 4153469) both with registered offices at: Hayes Park Central, Hayes End Road, Hayes, Middlesex, UB4 8FE.
This email is only for the use of its intended recipient. Its contents are subject to a duty of confidence and may be privileged. Fujitsu does not guarantee that this email has not been intercepted and amended or that it is virus-free.



Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
These are exactly the same kind of issue as the first "error" reported.
They will be retried.  If they did not get retried, they would abort the
job immediately.

Karl
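
The point that both log messages are the same kind of issue can be checked against a log file. Both "could not serialize access due to read/write dependencies among transactions" and "could not serialize access due to concurrent update" are reported by PostgreSQL as serialization failures (SQLSTATE 40001), even though only the first carries the "might succeed if retried" hint. The following is a minimal sketch, not part of ManifoldCF: the function name and the regular expression are assumptions based only on the log line layout quoted in this thread.

```python
import re
from collections import Counter

# Both messages are serialization failures (SQLSTATE 40001) in PostgreSQL.
# The SQLSTATE itself is not shown in the log excerpts quoted in this thread.
SERIALIZATION_MESSAGES = (
    "could not serialize access due to read/write dependencies among transactions",
    "could not serialize access due to concurrent update",
)

# Assumed layout, taken from the excerpts: "<date> <time> <tz> [<pid>] <LEVEL>:  <message>"
LOG_LINE = re.compile(
    r"^\S+ \S+ \S+ \[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<message>.*)$"
)


def count_serialization_failures(lines):
    """Count ERROR lines whose message is one of the serialization failures."""
    counts = Counter()
    for raw in lines:
        # Drop surrounding whitespace and any quote marks pasted in by a mail client.
        match = LOG_LINE.match(raw.strip().strip("\u201c\u201d\""))
        if match is None or match.group("level") != "ERROR":
            continue
        message = match.group("message").strip()
        if message in SERIALIZATION_MESSAGES:
            counts[message] += 1
    return counts
```

Passing the lines of a PostgreSQL log file to `count_serialization_failures` gives a per-message count. A count that grows while jobs keep running is consistent with the errors being retried rather than fatal.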


On Mon, Aug 6, 2018 at 6:57 AM Standen Guy <Gu...@uk.fujitsu.com>
wrote:

> Hi Karl,
>
> Thanks for the prompt response regarding the first error example. Do you
> have a view as to the second error, i.e.
>
> “2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access
> due to concurrent update
>
> 2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime
> FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
>
> 2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due
> to concurrent update
>
> 2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime
> FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
>
> 2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due
> to concurrent update
>
> ”
>
>
>
> These errors don’t suggest that a retry may sort them out. Is this an issue?
>
>
>
> Many Thanks,
>
>
>
> Guy
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* 06 August 2018 10:52
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: PostgreSQL version to support MCF v2.10
>
>
>
> Ah, the following errors:
>
> >>>>>>
>
> 2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due
> to read/write dependencies among transactions
>
> 2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on
> identification as a pivot, during conflict in checking.
>
> 2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if
> retried.
>
> <<<<<<
>
>
>
> ... occur because of concurrent transactions.  The transaction is indeed
> retried when this occurs, so unless your job aborts, you are fine.
>
>
>
> Karl
>
>
>
>
>
> On Mon, Aug 6, 2018 at 5:49 AM Karl Wright <da...@gmail.com> wrote:
>
> What errors are these?  Please include them and I can let you know.
>
>
>
> Karl
>
>
>
>
>
> On Mon, Aug 6, 2018 at 4:50 AM Standen Guy <Gu...@uk.fujitsu.com>
> wrote:
>
> Thank you Karl and Steph,
>
>
>
> Steph, yes, I don’t seem to have any issues with running the MCF jobs, but I
> am concerned about the PostgreSQL errors. Do you (or anyone else) have a
> view on the errors I have seen in the PostgreSQL logs? Is this something
> you have seen with 10.4, and if so, was it corrected by changing some
> settings?
>
>
>
> Best Regards
>
>
>
> Guy
>
>
>
> *From:* Steph van Schalkwyk [mailto:steph@remcam.net]
> *Sent:* 03 August 2018 23:21
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: PostgreSQL version to support MCF v2.10
>
>
>
> I'm using 10.4 with no issues.
>
> One or two of the recommended settings for MCF have changed between 9.6
> and 10.
>
> Simple to resolve though.
>
> Steph
>
>
>
>
>
>
> On Fri, Aug 3, 2018 at 1:29 PM, Karl Wright <da...@gmail.com> wrote:
>
> Hi Guy,
>
>
>
> I use Postgresql 9.6 myself and have found no issues with it.  I don't
> know about v 10 however.
>
>
>
> Karl
>
>
>
>
>

RE: PostgreSQL version to support MCF v2.10

Posted by Standen Guy <Gu...@uk.fujitsu.com>.
Hi Karl,
Thanks for the prompt response regarding the first error example. Do you have a view as to the second error, i.e.
“2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due to concurrent update
2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due to concurrent update
”

These errors don’t suggest that a retry may sort them out. Is this an issue?

Many Thanks,

Guy

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: 06 August 2018 10:52
To: user@manifoldcf.apache.org
Subject: Re: PostgreSQL version to support MCF v2.10

Ah, the following errors:

>>>>>>
2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions
2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict in checking.
2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if retried.
<<<<<<

... occur because of concurrent transactions.  The transaction is indeed retried when this occurs, so unless your job aborts, you are fine.

Karl


On Mon, Aug 6, 2018 at 5:49 AM Karl Wright <da...@gmail.com> wrote:
What errors are these?  Please include them and I can let you know.

Karl


On Mon, Aug 6, 2018 at 4:50 AM Standen Guy <Gu...@uk.fujitsu.com> wrote:
Thank you Karl and Steph,

Steph, yes, I don’t seem to have any issues with running the MCF jobs, but I am concerned about the PostgreSQL errors. Do you (or anyone else) have a view on the errors I have seen in the PostgreSQL logs? Is this something you have seen with 10.4, and if so, was it corrected by changing some settings?

Best Regards

Guy

From: Steph van Schalkwyk [mailto:steph@remcam.net]
Sent: 03 August 2018 23:21
To: user@manifoldcf.apache.org
Subject: Re: PostgreSQL version to support MCF v2.10

I'm using 10.4 with no issues.
One or two of the recommended settings for MCF have changed between 9.6 and 10.
Simple to resolve though.
Steph



On Fri, Aug 3, 2018 at 1:29 PM, Karl Wright <da...@gmail.com> wrote:
Hi Guy,

I use Postgresql 9.6 myself and have found no issues with it.  I don't know about v 10 however.

Karl



Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
Ah, the following errors:

>>>>>>

2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due
to read/write dependencies among transactions

2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on
identification as a pivot, during conflict in checking.

2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if
retried.

<<<<<<


... occur because of concurrent transactions.  The transaction is indeed
retried when this occurs, so unless your job aborts, you are fine.


Karl
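
The retry behaviour described above can be sketched as a small loop. This is only an illustration of the general pattern, not ManifoldCF's actual code: `SerializationFailure` is a hypothetical stand-in for whatever error a database driver raises for SQLSTATE 40001, and `run_with_retry`, `max_attempts` and `base_delay` are names invented for this example.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run `transaction` (a callable), retrying on serialization failures.

    The callable is expected to begin, execute and commit one whole
    transaction, so that each retry starts from a fresh snapshot.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                # Out of retries: let the caller see the failure (job abort).
                raise
            # Back off a little, with jitter, so competing workers do not
            # collide again in lockstep.
            sleep(base_delay * attempt + random.uniform(0, base_delay))
```

In this pattern the database still logs an ERROR for every failed attempt, which is why the PostgreSQL log can look noisy even when every transaction eventually succeeds.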



On Mon, Aug 6, 2018 at 5:49 AM Karl Wright <da...@gmail.com> wrote:

> What errors are these?  Please include them and I can let you know.
>
> Karl
>
>
> On Mon, Aug 6, 2018 at 4:50 AM Standen Guy <Gu...@uk.fujitsu.com>
> wrote:
>
>> Thank you Karl and Steph,
>>
>>
>>
>> Steph, yes, I don’t seem to have any issues with running the MCF jobs, but I
>> am concerned about the PostgreSQL errors. Do you (or anyone else) have a
>> view on the errors I have seen in the PostgreSQL logs? Is this something
>> you have seen with 10.4, and if so, was it corrected by changing some
>> settings?
>>
>>
>>
>> Best Regards
>>
>>
>>
>> Guy
>>
>>
>>
>> *From:* Steph van Schalkwyk [mailto:steph@remcam.net]
>> *Sent:* 03 August 2018 23:21
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: PostgreSQL version to support MCF v2.10
>>
>>
>>
>> I'm using 10.4 with no issues.
>>
>> One or two of the recommended settings for MCF have changed between 9.6
>> and 10.
>>
>> Simple to resolve though.
>>
>> Steph
>>
>>
>>
>>
>>
>>
>> On Fri, Aug 3, 2018 at 1:29 PM, Karl Wright <da...@gmail.com> wrote:
>>
>> Hi Guy,
>>
>>
>>
>> I use Postgresql 9.6 myself and have found no issues with it.  I don't
>> know about v 10 however.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>

Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
What errors are these?  Please include them and I can let you know.

Karl


On Mon, Aug 6, 2018 at 4:50 AM Standen Guy <Gu...@uk.fujitsu.com>
wrote:

> Thank you Karl and Steph,
>
>
>
> Steph, yes, I don’t seem to have any issues with running the MCF jobs, but I
> am concerned about the PostgreSQL errors. Do you (or anyone else) have a
> view on the errors I have seen in the PostgreSQL logs? Is this something
> you have seen with 10.4, and if so, was it corrected by changing some
> settings?
>
>
>
> Best Regards
>
>
>
> Guy
>
>
>
> *From:* Steph van Schalkwyk [mailto:steph@remcam.net]
> *Sent:* 03 August 2018 23:21
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: PostgreSQL version to support MCF v2.10
>
>
>
> I'm using 10.4 with no issues.
>
> One or two of the recommended settings for MCF have changed between 9.6
> and 10.
>
> Simple to resolve though.
>
> Steph
>
>
>
>
>
>
> On Fri, Aug 3, 2018 at 1:29 PM, Karl Wright <da...@gmail.com> wrote:
>
> Hi Guy,
>
>
>
> I use Postgresql 9.6 myself and have found no issues with it.  I don't
> know about v 10 however.
>
>
>
> Karl
>
>
>
>
>
> On Fri, Aug 3, 2018 at 11:32 AM Standen Guy <Gu...@uk.fujitsu.com>
> wrote:
>
> Hi Karl/All,
>
>                I am upgrading from MCF v2.6  supported by PostgreSQL v
> 9.3.16   to  MCF v2.10.  I wonder if there is any official advice as to
> which version of PostgreSQL  will support  MCF v2.10? The  MCF v2.10 build
> and deployment instructions still suggest that PostgreSQL 9.3 is the latest
> tested version of PostgreSQL.  Given that PostgreSQL 9.3.x  is going end of
> life next month ( Sept 2018), is there a preferred newer version that
> should be used?
>
>
>
> As an experiment I have installed MCF 2.10  supported by PostgreSQL 10.4.
> From the outside all seems to work OK, but investigation of the PostgreSQL
> logs shows a lot of errors:
>
>
>
> e.g.
>
> “2018-08-03 15:50:00.629 BST [7920] LOG:  database system was shut down at
> 2018-08-03 15:47:30 BST
>
> 2018-08-03 15:50:00.734 BST [6344] LOG:  database system is ready to
> accept connections
>
> 2018-08-03 15:52:11.140 BST [6460] WARNING:  there is already a
> transaction in progress
>
> 2018-08-03 15:52:11.219 BST [6460] WARNING:  there is no transaction in
> progress
>
> 2018-08-03 15:52:13.844 BST [5716] WARNING:  there is already a
> transaction in progress
>
> 2018-08-03 15:52:13.879 BST [5716] WARNING:  there is no transaction in
> progress
>
> 2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access due
> to read/write dependencies among transactions
>
> 2018-08-03 15:52:25.218 BST [4140] DETAIL:  Reason code: Canceled on
> identification as a pivot, during conflict in checking.
>
> 2018-08-03 15:52:25.218 BST [4140] HINT:  The transaction might succeed if
> retried.
>
> 2018-08-03 15:52:25.218 BST [4140] STATEMENT:  INSERT INTO jobqueue
> (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status)
> VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
>
> 2018-08-03 15:52:25.219 BST [5800] ERROR:  could not serialize access due
> to read/write dependencies among transactions
>
> 2018-08-03 15:52:25.219 BST [5800] DETAIL:  Reason code: Canceled on
> identification as a pivot, during conflict in checking.
>
> 2018-08-03 15:52:25.219 BST [5800] HINT:  The transaction might succeed if
> retried.
>
> 2018-08-03 15:52:25.219 BST [5800] STATEMENT:  INSERT INTO jobqueue
> (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status)
> VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
>
> 2018-08-03 15:52:25.222 BST [5692] ERROR:  could not serialize access due
> to read/write dependencies among transactions
>
> 2018-08-03 15:52:25.222 BST [5692] DETAIL:  Reason code: Canceled on
> identification as a pivot, during conflict in checking.
>
> 2018-08-03 15:52:25.222 BST [5692] HINT:  The transaction might succeed if
> retried.
>
> 2018-08-03 15:52:25.222 BST [5692] STATEMENT:  INSERT INTO jobqueue
> (jobid,docpriority,checktime,docid,needpriority,dochash,id,checkaction,status)
> VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9)
>
> 2018-08-03 15:52:28.149 BST [4140] ERROR:  could not serialize access due
> to read/write dependencies among transactions
>
> 2018-08-03 15:52:28.149 BST [4140] DETAIL:  Reason code: Canceled on
> identification as a pivot, during write.
>
> 2018-08-03 15:52:28.149 BST [4140] HINT:  The transaction might succeed if
> retried.
>
> 2018-08-03 15:52:28.149 BST [4140] STATEMENT:  UPDATE intrinsiclink SET
> processid=$1,isnew=$2 WHERE jobid=$3 AND parentidhash=$4 AND linktype=$5
> AND childidhash=$6
>
> 2018-08-03 15:52:28.261 BST [5156] ERROR:  could not serialize access due
> to read/write dependencies among transactions
>
> 2018-08-03 15:52:28.261 BST [5156] DETAIL:  Reason code: Canceled on
> identification as a pivot, during write.
>
> 2018-08-03 15:52:28.261 BST [5156] HINT:  The transaction might succeed if
> retried.”
>
>
>
> And
>
>
>
> “2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due
> to concurrent update
>
> 2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime
> FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
>
> 2018-08-03 15:52:42.855 BST [7424] ERROR:  could not serialize access due
> to concurrent update
>
> 2018-08-03 15:52:42.855 BST [7424] STATEMENT:  SELECT id,status,checktime
> FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
>
> 2018-08-03 15:52:42.855 BST [5716] ERROR:  could not serialize access due
> to concurrent update
>
> 2018-08-03 15:52:42.855 BST [5716] STATEMENT:  SELECT id,status,checktime
> FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
>
> 2018-08-03 15:52:42.856 BST [1328] ERROR:  could not serialize access due
> to concurrent update
>
> 2018-08-03 15:52:42.856 BST [1328] STATEMENT:  SELECT id,status,checktime
> FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE
>
> 2018-08-03 15:52:42.856 BST [5800] ERROR:  could not serialize access due
> to concurrent update
>
> 2018-08-03 15:52:42.856 BST [5800] STATEMENT:  SELECT id,status,checktime
> FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE”
>
>
>
> Do you have any advice as to whether it is sensible to use PostgreSQL
> v10.x   and if so can these errors be overcome?
>
>
>
> Best Regards,
>
>
>
> Guy
>
>
> Unless otherwise stated, this email has been sent from Fujitsu Services
> Limited (registered in England No 96056); Fujitsu EMEA PLC (registered in
> England No 2216100) both with registered offices at: 22 Baker Street,
> London W1U 3BW
> <https://maps.google.com/?q=22+Baker+Street,+London+W1U+3BW&entry=gmail&source=g>;
> PFU (EMEA) Limited, (registered in England No 1578652) and Fujitsu
> Laboratories of Europe Limited (registered in England No. 4153469) both
> with registered offices at: Hayes Park Central, Hayes End Road, Hayes,
> Middlesex, UB4 8FE.
> This email is only for the use of its intended recipient. Its contents are
> subject to a duty of confidence and may be privileged. Fujitsu does not
> guarantee that this email has not been intercepted and amended or that it
> is virus-free.
>
>
>
> Unless otherwise stated, this email has been sent from Fujitsu Services
> Limited (registered in England No 96056); Fujitsu EMEA PLC (registered in
> England No 2216100) both with registered offices at: 22 Baker Street,
> London W1U 3BW; PFU (EMEA) Limited, (registered in England No 1578652) and
> Fujitsu Laboratories of Europe Limited (registered in England No. 4153469)
> both with registered offices at: Hayes Park Central, Hayes End Road, Hayes,
> Middlesex, UB4 8FE.
> This email is only for the use of its intended recipient. Its contents are
> subject to a duty of confidence and may be privileged. Fujitsu does not
> guarantee that this email has not been intercepted and amended or that it
> is virus-free.
>

RE: PostgreSQL version to support MCF v2.10

Posted by Standen Guy <Gu...@uk.fujitsu.com>.
Thank you Karl and Steph,

Steph, yes, I don’t seem to have any issues running the MCF jobs, but I am concerned about the PostgreSQL errors. Do you (or anyone else) have a view on the errors I have seen in the PostgreSQL logs? Is this something you have seen with 10.4, and if so, was it corrected by changing some settings?

Best Regards

Guy
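
To put a number on "a lot of errors", the serialization failures in the PostgreSQL log can be tallied per statement. The following is a minimal Python sketch, not part of MCF or PostgreSQL. It assumes the log prefix looks like the lines quoted in this thread (timestamp, time zone, backend PID in square brackets), and the function name tally_serialization_errors is made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpts in this thread:
#   2018-08-03 15:52:25.218 BST [4140] ERROR:  could not serialize access ...
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+ "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def tally_serialization_errors(lines):
    """Count 'could not serialize access' errors, keyed by the failing statement."""
    by_statement = Counter()
    pending = set()  # backend PIDs whose error is still waiting for its STATEMENT line
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # not a log line in the assumed format
        pid, level, msg = m["pid"], m["level"], m["msg"]
        if level == "ERROR":
            if msg.startswith("could not serialize access"):
                pending.add(pid)
            else:
                pending.discard(pid)  # a different error; ignore its STATEMENT line
        elif level == "STATEMENT" and pid in pending:
            by_statement[msg] += 1
            pending.discard(pid)
    return by_statement


if __name__ == "__main__":
    sample = [
        "2018-08-03 15:52:11.140 BST [6460] WARNING:  there is already a transaction in progress",
        "2018-08-03 15:52:28.149 BST [4140] ERROR:  could not serialize access due to read/write dependencies among transactions",
        "2018-08-03 15:52:28.149 BST [4140] DETAIL:  Reason code: Canceled on identification as a pivot, during write.",
        "2018-08-03 15:52:28.149 BST [4140] HINT:  The transaction might succeed if retried.",
        "2018-08-03 15:52:28.149 BST [4140] STATEMENT:  UPDATE intrinsiclink SET processid=$1,isnew=$2 WHERE jobid=$3 AND parentidhash=$4 AND linktype=$5 AND childidhash=$6",
        "2018-08-03 15:52:42.855 BST [5272] ERROR:  could not serialize access due to concurrent update",
        "2018-08-03 15:52:42.855 BST [5272] STATEMENT:  SELECT id,status,checktime FROM jobqueue WHERE dochash=$1 AND jobid=$2 FOR UPDATE",
    ]
    for statement, count in tally_serialization_errors(sample).most_common():
        print(count, statement)
```

Errors are matched to their STATEMENT line by backend PID, because entries from different backends can interleave in the log. If the server uses a different log_line_prefix, the regular expression would need adjusting.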


Re: PostgreSQL version to support MCF v2.10

Posted by Steph van Schalkwyk <st...@remcam.net>.
I'm using 10.4 with no issues.
One or two of the recommended settings for MCF have changed between 9.6 and
10.
Simple to resolve though.
Steph




Re: PostgreSQL version to support MCF v2.10

Posted by Karl Wright <da...@gmail.com>.
Hi Guy,

I use PostgreSQL 9.6 myself and have found no issues with it.  I don't know
about v10, however.

Karl

