You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@manifoldcf.apache.org by Olivier Tavard <ol...@francelabs.com> on 2019/05/21 09:00:29 UTC

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Hi Markus,

We have the same error (with postgresql database). Did the error occur again since your last mail ?
Did you change something on your MCF configuration to fix this ?

Thanks,
Best regards,

Olivier 


> Le 13 févr. 2019 à 13:58, Markus Schuch <ma...@deutschebahn.com> a écrit :
> 
> Hi Karl,
>  
> we set the diagnostigs logger to level debug.
>  
> I will get back when the error occurs again.
>  
> Cheers,
> Markus
> 
> 
> Pflichtangaben anzeigen <http://www.deutschebahn.com/pflichtangaben/20190211>
> 
> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier: http://www.deutschebahn.com/de/konzern/datenschutz <http://www.deutschebahn.com/de/konzern/datenschutz>
> Le 12 févr. 2019 à 17:41, Karl Wright <da...@gmail.com> a écrit :
> 
> Hi Marcus,
> 
> There's a properties.xml debugging logger you can enable that will keep track of what's happening with transactions, so that when an error of this kinds is reported, information about why the situation is unexpected is dumped to the log.  The logger is called "diagnostics" e.g. "org.apache.manifoldcf.diagnostics".
> 
> Karl
> 
> 
> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <markus.schuch@deutschebahn.com <ma...@deutschebahn.com>> wrote:
> Hi,
> 
> we are seeing "Error: Unexpected jobqueue status - record id 1484612513829, expecting active status, saw 4" from time to time.
> I didn’t report it, because we were running on an old MCF version, and there were some bugreports relating to this error that are resolved in newer versions.
> 
> Now we have upgraded to latest and greatest and the error still occurred. So want to start to track this down.
> 
> Out setup is:
> 
> ManifoldCF 2.12, running in a Docker Container based on Redhat Linux, OpenJDK 8
> AWS RDS Database (Aurora MySQL -> 5.6 compatible)
> Single Process Setup
> 
> We run several Jobs (run once mode) over night with schedule windows and max runtime settings. Some of the time windows are overlapping.
> At normal days some Jobs finish during their time window, some go to WAITING and will go on in the next night.
> 
> The error mostly hits at a Sharepoint Repository Job.
> We have the impression, that the error is somehow related to situations, when service interruptions (e.g. connection issues) occur in other jobs.
> 
>  
> 
> Maybe all this is related to fundamental problems with the database or the JDBC driver
> 
> (Karl expressed his concerns about that in the maybe-related https://issues.apache.org/jira/browse/CONNECTORS-1581 <https://issues.apache.org/jira/browse/CONNECTORS-1581> ticket)
> 
> 
> On the other hand, there were similar bugreports in the past concerning unexpected jobqueue status and there were resolved.
> 
> Mayby there is a chance this can also be analyzed and fixed, but I do not really know where to start.
> 
> 
> Has anybody ideas how we can track this down / debug this to get to the bottom of the problem?
> What could be the first step in the analysis?
> 
> Many thanks in advance
> Markus
> 
> May be related bugreports:
> 
>  
> 
> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395 <https://issues.apache.org/jira/browse/CONNECTORS-1395>
> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180 <https://issues.apache.org/jira/browse/CONNECTORS-1180>
> [2] https://issues.apache.org/jira/browse/CONNECTORS-590 <https://issues.apache.org/jira/browse/CONNECTORS-590>
> [3] https://issues.apache.org/jira/browse/CONNECTORS-246 <https://issues.apache.org/jira/browse/CONNECTORS-246>
>  
> 
> --
> 
> Markus Schuch
> 
> Web Business (T.IPB 26)
> 
>  
> 
> DB Systel GmbH
> 
> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
> 
> 
> 
> Pflichtangaben anzeigen <http://www.deutschebahn.com/pflichtangaben/20190211>
> 
> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier: http://www.deutschebahn.com/de/konzern/datenschutz <http://www.deutschebahn.com/de/konzern/datenschutz>


Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Posted by Karl Wright <da...@gmail.com>.
And yes, we'd also need to hand the mysql folks a similar test case.
Karl


On Sat, Jun 8, 2019 at 1:53 AM Karl Wright <da...@gmail.com> wrote:

> Here's an explanation for Postgresql about what is supposed to happen in
> this case.  See slide 7.
>
> https://www.postgresql.org/files/developer/concurrency.pdf
>
> The explanation should apply to all MVCC databases as well.  The sequence
> we're seeing is:
>
> Thread 1:
> <start transaction>
> <do_expensive_stuffer_query>
> <update_jobqueue_status>
> <end transaction>
> <hand records to worker threads>
>
>
> Worker thread:
> <process document>
> <begin transaction>
> <change jobqueue status from active to completed>
> <end transaction
>
> The worker thread, however, is finding that the jobqueue status is not
> what was expected.  It's as if transactions didn't actually serialize, once
> in a great while, and either the original stuffer update never became
> visible by the time the worker thread completed its work, or some other
> thread modified it while the worker thread was processing the document.
> But I've got every jobqueue job status write instrumented so if that were
> what was happening we'd see it in the diagnostic dump, and we do not.
>
> I wish it were reasonably possible to hand the postgresql folks a test
> case that demonstrated this behavior.  But, as you know, the problem is
> quite rare, and you have to pound on the database for an extended period of
> time before you get this to happen. :-(
>
> Karl
>
> On Fri, Jun 7, 2019 at 3:28 AM Karl Wright <da...@gmail.com> wrote:
>
>> I just reviewed the places where JobQueue status is updated to be sure
>> that all places this happens are included in the diagnostics trackiing.
>> They were, save for when a cluster gets restarted, and I added that to the
>> code in trunk as well.
>>
>> So the analysis, above, stands.
>>
>> :-(
>>
>> Karl
>>
>>
>> On Thu, Jun 6, 2019 at 4:28 PM Karl Wright <da...@gmail.com> wrote:
>>
>>> I just looked at this briefly.
>>>
>>> For
>>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204,
>>> look for the following record ID:
>>>
>>> 1522147023170
>>>
>>> This is the one when we look at it after the document is processed that it has status 4:
>>>
>>> >>>>>>
>>>
>>> manifoldcf_1     | 2019-06-06T02:01:05.380401275Z 2019-06-06T04:01:05,380 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: ==== Forensics for record 1522147023170, current status: 4 ====
>>>
>>> manifoldcf_1     | 2019-06-06T02:01:05.380512360Z 2019-06-06T04:01:05,380 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: === Current stack trace ===
>>> manifoldcf_1     | 2019-06-06T02:01:05.380544192Z java.lang.Exception: Forensics stack trace
>>> manifoldcf_1     | 2019-06-06T02:01:05.380548494Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.printForensics(TrackerClass.java:216) [mcf-pull-agent.jar:?]
>>> manifoldcf_1     | 2019-06-06T02:01:05.380552564Z 	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1028) [mcf-pull-agent.jar:?]
>>> manifoldcf_1     | 2019-06-06T02:01:05.380568210Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3414) [mcf-pull-agent.jar:?]
>>>
>>> <<<<<<
>>>
>>> It expects it to have status 5, though, because the only way the document could be being worked on at all is if the document had been put into status 5 by the stuffer thread:
>>>
>>> >>>>>>
>>>
>>> manifoldcf_1     | 2019-06-06T02:01:05.387210733Z 2019-06-06T04:01:05,381 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: == About to commit transaction by 'Stuffer thread' at 1559786464972 ==
>>> manifoldcf_1     | 2019-06-06T02:01:05.387250827Z java.lang.Exception: Precommit stack trace
>>> manifoldcf_1     | 2019-06-06T02:01:05.387256464Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.notePrecommit(TrackerClass.java:108) ~[mcf-pull-agent.jar:?]
>>> manifoldcf_1     | 2019-06-06T02:01:05.387260987Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.fetchAndProcessDocuments(JobManager.java:3295) ~[mcf-pull-agent.jar:?]
>>> manifoldcf_1     | 2019-06-06T02:01:05.387265034Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:3014) ~[mcf-pull-agent.jar:?]
>>> manifoldcf_1     | 2019-06-06T02:01:05.387269073Z 	at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186) ~[mcf-pull-agent.jar:?]
>>> manifoldcf_1     | 2019-06-06T02:01:05.387285663Z 2019-06-06T04:01:05,387 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:     Transaction includes:
>>> manifoldcf_1     | 2019-06-06T02:01:05.387290117Z 2019-06-06T04:01:05,387 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:       Record 1522147023170 status modified to 5: Make active
>>>
>>> ...
>>>
>>> <<<<<<
>>>
>>> The actual commit follows a short while later to actually set the status, and only AFTER that are the documents queued to be handed to the worker threads.
>>>
>>> So what this is saying is that the stuffer thread sets the document status to 5 and then a worker thread gets it and when the worker thread is done it tries to set the status back to 4 -- but it's ALREADY at 4 and that is not possible.
>>>
>>> Now, it may be worth examining the possibility that some other job gets to the document while it is waiting to be queued, and does stuff to the document's jobqueue record.  There are a couple of background threads which (for example) reprioritize documents that might conceivably change the jobqueue record's status.  They generally work by changing the record status in a transaction to some other value that would not make the record eligible for queuing, and then changing it back when done.  But unless completely overlooked, these would, however, be visible in the diagnostics, and furthermore it should not be possible for one thread in a transaction to change the status at the same time as another thread in a transaction changes the status.
>>>
>>> So either way I'm forced to conclude that a record gets changed and committed, but the change is not visible to another thread later trying to reset the status.
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>> On Thu, Jun 6, 2019 at 5:58 AM Karl Wright <da...@gmail.com> wrote:
>>>
>>>> I can look at the log output but not until this weekend.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch <
>>>> markus.schuch@deutschebahn.com> wrote:
>>>>
>>>>> Hi Olivier,
>>>>>
>>>>>
>>>>>
>>>>> we were not able to fix this yet.
>>>>>
>>>>>
>>>>>
>>>>> But now we have new diagnostics log data, after the error occurred
>>>>> again yesterday:
>>>>>
>>>>>
>>>>>
>>>>> Unexpected jobqueue status - record id 1522147023170, expecting active
>>>>> status, saw 4
>>>>>
>>>>>
>>>>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204
>>>>>
>>>>>
>>>>>
>>>>> Unexpected jobqueue status - record id 1541529864711, expecting active
>>>>> status, saw 4
>>>>>
>>>>>
>>>>> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204
>>>>>
>>>>>
>>>>>
>>>>> @Karl, maybe you can get some ideas from that log data?
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Markus
>>>>>
>>>>>
>>>>>
>>>>> *Von:* Olivier Tavard <ol...@francelabs.com>
>>>>> *Gesendet:* Dienstag, 21. Mai 2019 11:00
>>>>> *An:* user@manifoldcf.apache.org
>>>>> *Betreff:* Re: Error: Unexpected jobqueue status - record id X,
>>>>> expecting active status, saw 4 (MySQL compatible Database)
>>>>>
>>>>>
>>>>>
>>>>> Hi Markus,
>>>>>
>>>>>
>>>>>
>>>>> We have the same error (with postgresql database). Did the error occur
>>>>> again since your last mail ?
>>>>>
>>>>> Did you change something on your MCF configuration to fix this ?
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>>
>>>>> Olivier
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Le 13 févr. 2019 à 13:58, Markus Schuch <
>>>>> markus.schuch@deutschebahn.com> a écrit :
>>>>>
>>>>>
>>>>>
>>>>> Hi Karl,
>>>>>
>>>>>
>>>>>
>>>>> we set the diagnostigs logger to level debug.
>>>>>
>>>>>
>>>>>
>>>>> I will get back when the error occurs again.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Markus
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>>
>>>>> Pflichtangaben anzeigen
>>>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>>>
>>>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>>>
>>>>>
>>>>>
>>>>> Le 12 févr. 2019 à 17:41, Karl Wright <da...@gmail.com> a écrit :
>>>>>
>>>>>
>>>>>
>>>>> Hi Marcus,
>>>>>
>>>>>
>>>>>
>>>>> There's a properties.xml debugging logger you can enable that will
>>>>> keep track of what's happening with transactions, so that when an error of
>>>>> this kinds is reported, information about why the situation is unexpected
>>>>> is dumped to the log.  The logger is called "diagnostics" e.g.
>>>>> "org.apache.manifoldcf.diagnostics".
>>>>>
>>>>>
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <
>>>>> markus.schuch@deutschebahn.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> we are seeing "Error: Unexpected jobqueue status - record id
>>>>> 1484612513829, expecting active status, saw 4" from time to time.
>>>>> I didn’t report it, because we were running on an old MCF version, and
>>>>> there were some bugreports relating to this error that are resolved in
>>>>> newer versions.
>>>>>
>>>>> Now we have upgraded to latest and greatest and the error still
>>>>> occurred. So want to start to track this down.
>>>>>
>>>>> Out setup is:
>>>>>
>>>>>    - ManifoldCF 2.12, running in a Docker Container based on Redhat
>>>>>    Linux, OpenJDK 8
>>>>>    - AWS RDS Database (Aurora MySQL -> 5.6 compatible)
>>>>>    - Single Process Setup
>>>>>
>>>>>
>>>>> We run several Jobs (run once mode) over night with schedule windows
>>>>> and max runtime settings. Some of the time windows are overlapping.
>>>>> At normal days some Jobs finish during their time window, some go to
>>>>> WAITING and will go on in the next night.
>>>>>
>>>>> The error mostly hits at a Sharepoint Repository Job.
>>>>> We have the impression, that the error is somehow related to
>>>>> situations, when service interruptions (e.g. connection issues) occur in
>>>>> other jobs.
>>>>>
>>>>>
>>>>>
>>>>> Maybe all this is related to fundamental problems with the database or
>>>>> the JDBC driver
>>>>>
>>>>> (Karl expressed his concerns about that in the maybe-related
>>>>> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
>>>>>
>>>>> On the other hand, there were similar bugreports in the past
>>>>> concerning unexpected jobqueue status and there were resolved.
>>>>>
>>>>> Mayby there is a chance this can also be analyzed and fixed, but I do
>>>>> not really know where to start.
>>>>>
>>>>>
>>>>> Has anybody ideas how we can track this down / debug this to get to
>>>>> the bottom of the problem?
>>>>> What could be the first step in the analysis?
>>>>>
>>>>> Many thanks in advance
>>>>> Markus
>>>>>
>>>>> May be related bugreports:
>>>>>
>>>>>
>>>>>
>>>>> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395
>>>>> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180
>>>>> [2] https://issues.apache.org/jira/browse/CONNECTORS-590
>>>>> [3] https://issues.apache.org/jira/browse/CONNECTORS-246
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Markus Schuch
>>>>>
>>>>> Web Business (T.IPB 26)
>>>>>
>>>>>
>>>>>
>>>>> DB Systel GmbH
>>>>>
>>>>> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>>
>>>>> Pflichtangaben anzeigen
>>>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>>>
>>>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Pflichtangaben anzeigen
>>>>> <http://www.deutschebahn.com/pflichtangaben/20190529>
>>>>>
>>>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>>>
>>>>

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Posted by Karl Wright <da...@gmail.com>.
Here's an explanation for Postgresql about what is supposed to happen in
this case.  See slide 7.

https://www.postgresql.org/files/developer/concurrency.pdf

The explanation should apply to all MVCC databases as well.  The sequence
we're seeing is:

Thread 1:
<start transaction>
<do_expensive_stuffer_query>
<update_jobqueue_status>
<end transaction>
<hand records to worker threads>


Worker thread:
<process document>
<begin transaction>
<change jobqueue status from active to completed>
<end transaction

The worker thread, however, is finding that the jobqueue status is not what
was expected.  It's as if transactions didn't actually serialize, once in a
great while, and either the original stuffer update never became visible by
the time the worker thread completed its work, or some other thread
modified it while the worker thread was processing the document.  But I've
got every jobqueue job status write instrumented so if that were what was
happening we'd see it in the diagnostic dump, and we do not.

I wish it were reasonably possible to hand the postgresql folks a test case
that demonstrated this behavior.  But, as you know, the problem is quite
rare, and you have to pound on the database for an extended period of time
before you get this to happen. :-(

Karl

On Fri, Jun 7, 2019 at 3:28 AM Karl Wright <da...@gmail.com> wrote:

> I just reviewed the places where JobQueue status is updated to be sure
> that all places this happens are included in the diagnostics trackiing.
> They were, save for when a cluster gets restarted, and I added that to the
> code in trunk as well.
>
> So the analysis, above, stands.
>
> :-(
>
> Karl
>
>
> On Thu, Jun 6, 2019 at 4:28 PM Karl Wright <da...@gmail.com> wrote:
>
>> I just looked at this briefly.
>>
>> For
>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204,
>> look for the following record ID:
>>
>> 1522147023170
>>
>> This is the one when we look at it after the document is processed that it has status 4:
>>
>> >>>>>>
>>
>> manifoldcf_1     | 2019-06-06T02:01:05.380401275Z 2019-06-06T04:01:05,380 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: ==== Forensics for record 1522147023170, current status: 4 ====
>>
>> manifoldcf_1     | 2019-06-06T02:01:05.380512360Z 2019-06-06T04:01:05,380 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: === Current stack trace ===
>> manifoldcf_1     | 2019-06-06T02:01:05.380544192Z java.lang.Exception: Forensics stack trace
>> manifoldcf_1     | 2019-06-06T02:01:05.380548494Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.printForensics(TrackerClass.java:216) [mcf-pull-agent.jar:?]
>> manifoldcf_1     | 2019-06-06T02:01:05.380552564Z 	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1028) [mcf-pull-agent.jar:?]
>> manifoldcf_1     | 2019-06-06T02:01:05.380568210Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3414) [mcf-pull-agent.jar:?]
>>
>> <<<<<<
>>
>> It expects it to have status 5, though, because the only way the document could be being worked on at all is if the document had been put into status 5 by the stuffer thread:
>>
>> >>>>>>
>>
>> manifoldcf_1     | 2019-06-06T02:01:05.387210733Z 2019-06-06T04:01:05,381 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: == About to commit transaction by 'Stuffer thread' at 1559786464972 ==
>> manifoldcf_1     | 2019-06-06T02:01:05.387250827Z java.lang.Exception: Precommit stack trace
>> manifoldcf_1     | 2019-06-06T02:01:05.387256464Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.notePrecommit(TrackerClass.java:108) ~[mcf-pull-agent.jar:?]
>> manifoldcf_1     | 2019-06-06T02:01:05.387260987Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.fetchAndProcessDocuments(JobManager.java:3295) ~[mcf-pull-agent.jar:?]
>> manifoldcf_1     | 2019-06-06T02:01:05.387265034Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:3014) ~[mcf-pull-agent.jar:?]
>> manifoldcf_1     | 2019-06-06T02:01:05.387269073Z 	at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186) ~[mcf-pull-agent.jar:?]
>> manifoldcf_1     | 2019-06-06T02:01:05.387285663Z 2019-06-06T04:01:05,387 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:     Transaction includes:
>> manifoldcf_1     | 2019-06-06T02:01:05.387290117Z 2019-06-06T04:01:05,387 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:       Record 1522147023170 status modified to 5: Make active
>>
>> ...
>>
>> <<<<<<
>>
>> The actual commit follows a short while later to actually set the status, and only AFTER that are the documents queued to be handed to the worker threads.
>>
>> So what this is saying is that the stuffer thread sets the document status to 5 and then a worker thread gets it and when the worker thread is done it tries to set the status back to 4 -- but it's ALREADY at 4 and that is not possible.
>>
>> Now, it may be worth examining the possibility that some other job gets to the document while it is waiting to be queued, and does stuff to the document's jobqueue record.  There are a couple of background threads which (for example) reprioritize documents that might conceivably change the jobqueue record's status.  They generally work by changing the record status in a transaction to some other value that would not make the record eligible for queuing, and then changing it back when done.  But unless completely overlooked, these would, however, be visible in the diagnostics, and furthermore it should not be possible for one thread in a transaction to change the status at the same time as another thread in a transaction changes the status.
>>
>> So either way I'm forced to conclude that a record gets changed and committed, but the change is not visible to another thread later trying to reset the status.
>>
>> Karl
>>
>>
>>
>>
>> On Thu, Jun 6, 2019 at 5:58 AM Karl Wright <da...@gmail.com> wrote:
>>
>>> I can look at the log output but not until this weekend.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch <
>>> markus.schuch@deutschebahn.com> wrote:
>>>
>>>> Hi Olivier,
>>>>
>>>>
>>>>
>>>> we were not able to fix this yet.
>>>>
>>>>
>>>>
>>>> But now we have new diagnostics log data, after the error occurred
>>>> again yesterday:
>>>>
>>>>
>>>>
>>>> Unexpected jobqueue status - record id 1522147023170, expecting active
>>>> status, saw 4
>>>>
>>>>
>>>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204
>>>>
>>>>
>>>>
>>>> Unexpected jobqueue status - record id 1541529864711, expecting active
>>>> status, saw 4
>>>>
>>>>
>>>> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204
>>>>
>>>>
>>>>
>>>> @Karl, maybe you can get some ideas from that log data?
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Markus
>>>>
>>>>
>>>>
>>>> *Von:* Olivier Tavard <ol...@francelabs.com>
>>>> *Gesendet:* Dienstag, 21. Mai 2019 11:00
>>>> *An:* user@manifoldcf.apache.org
>>>> *Betreff:* Re: Error: Unexpected jobqueue status - record id X,
>>>> expecting active status, saw 4 (MySQL compatible Database)
>>>>
>>>>
>>>>
>>>> Hi Markus,
>>>>
>>>>
>>>>
>>>> We have the same error (with postgresql database). Did the error occur
>>>> again since your last mail ?
>>>>
>>>> Did you change something on your MCF configuration to fix this ?
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Best regards,
>>>>
>>>>
>>>>
>>>> Olivier
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Le 13 févr. 2019 à 13:58, Markus Schuch <ma...@deutschebahn.com>
>>>> a écrit :
>>>>
>>>>
>>>>
>>>> Hi Karl,
>>>>
>>>>
>>>>
>>>> we set the diagnostigs logger to level debug.
>>>>
>>>>
>>>>
>>>> I will get back when the error occurs again.
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Markus
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>>
>>>> Pflichtangaben anzeigen
>>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>>
>>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>>
>>>>
>>>>
>>>> Le 12 févr. 2019 à 17:41, Karl Wright <da...@gmail.com> a écrit :
>>>>
>>>>
>>>>
>>>> Hi Marcus,
>>>>
>>>>
>>>>
>>>> There's a properties.xml debugging logger you can enable that will keep
>>>> track of what's happening with transactions, so that when an error of this
>>>> kinds is reported, information about why the situation is unexpected is
>>>> dumped to the log.  The logger is called "diagnostics" e.g.
>>>> "org.apache.manifoldcf.diagnostics".
>>>>
>>>>
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <
>>>> markus.schuch@deutschebahn.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> we are seeing "Error: Unexpected jobqueue status - record id
>>>> 1484612513829, expecting active status, saw 4" from time to time.
>>>> I didn’t report it, because we were running on an old MCF version, and
>>>> there were some bugreports relating to this error that are resolved in
>>>> newer versions.
>>>>
>>>> Now we have upgraded to latest and greatest and the error still
>>>> occurred. So want to start to track this down.
>>>>
>>>> Out setup is:
>>>>
>>>>    - ManifoldCF 2.12, running in a Docker Container based on Redhat
>>>>    Linux, OpenJDK 8
>>>>    - AWS RDS Database (Aurora MySQL -> 5.6 compatible)
>>>>    - Single Process Setup
>>>>
>>>>
>>>> We run several Jobs (run once mode) over night with schedule windows
>>>> and max runtime settings. Some of the time windows are overlapping.
>>>> At normal days some Jobs finish during their time window, some go to
>>>> WAITING and will go on in the next night.
>>>>
>>>> The error mostly hits at a Sharepoint Repository Job.
>>>> We have the impression, that the error is somehow related to
>>>> situations, when service interruptions (e.g. connection issues) occur in
>>>> other jobs.
>>>>
>>>>
>>>>
>>>> Maybe all this is related to fundamental problems with the database or
>>>> the JDBC driver
>>>>
>>>> (Karl expressed his concerns about that in the maybe-related
>>>> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
>>>>
>>>> On the other hand, there were similar bugreports in the past concerning
>>>> unexpected jobqueue status and there were resolved.
>>>>
>>>> Mayby there is a chance this can also be analyzed and fixed, but I do
>>>> not really know where to start.
>>>>
>>>>
>>>> Has anybody ideas how we can track this down / debug this to get to the
>>>> bottom of the problem?
>>>> What could be the first step in the analysis?
>>>>
>>>> Many thanks in advance
>>>> Markus
>>>>
>>>> May be related bugreports:
>>>>
>>>>
>>>>
>>>> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395
>>>> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180
>>>> [2] https://issues.apache.org/jira/browse/CONNECTORS-590
>>>> [3] https://issues.apache.org/jira/browse/CONNECTORS-246
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Markus Schuch
>>>>
>>>> Web Business (T.IPB 26)
>>>>
>>>>
>>>>
>>>> DB Systel GmbH
>>>>
>>>> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>>
>>>> Pflichtangaben anzeigen
>>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>>
>>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Pflichtangaben anzeigen
>>>> <http://www.deutschebahn.com/pflichtangaben/20190529>
>>>>
>>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>>
>>>

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Posted by Karl Wright <da...@gmail.com>.
I just reviewed the places where JobQueue status is updated to be sure that
all places this happens are included in the diagnostics trackiing.  They
were, save for when a cluster gets restarted, and I added that to the code
in trunk as well.

So the analysis, above, stands.

:-(

Karl


On Thu, Jun 6, 2019 at 4:28 PM Karl Wright <da...@gmail.com> wrote:

> I just looked at this briefly.
>
> For
> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204,
> look for the following record ID:
>
> 1522147023170
>
> This is the one when we look at it after the document is processed that it has status 4:
>
> >>>>>>
>
> manifoldcf_1     | 2019-06-06T02:01:05.380401275Z 2019-06-06T04:01:05,380 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: ==== Forensics for record 1522147023170, current status: 4 ====
>
> manifoldcf_1     | 2019-06-06T02:01:05.380512360Z 2019-06-06T04:01:05,380 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: === Current stack trace ===
> manifoldcf_1     | 2019-06-06T02:01:05.380544192Z java.lang.Exception: Forensics stack trace
> manifoldcf_1     | 2019-06-06T02:01:05.380548494Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.printForensics(TrackerClass.java:216) [mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.380552564Z 	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1028) [mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.380568210Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3414) [mcf-pull-agent.jar:?]
>
> <<<<<<
>
> It expects it to have status 5, though, because the only way the document could be being worked on at all is if the document had been put into status 5 by the stuffer thread:
>
> >>>>>>
>
> manifoldcf_1     | 2019-06-06T02:01:05.387210733Z 2019-06-06T04:01:05,381 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: == About to commit transaction by 'Stuffer thread' at 1559786464972 ==
> manifoldcf_1     | 2019-06-06T02:01:05.387250827Z java.lang.Exception: Precommit stack trace
> manifoldcf_1     | 2019-06-06T02:01:05.387256464Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.notePrecommit(TrackerClass.java:108) ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387260987Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.fetchAndProcessDocuments(JobManager.java:3295) ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387265034Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:3014) ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387269073Z 	at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186) ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387285663Z 2019-06-06T04:01:05,387 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:     Transaction includes:
> manifoldcf_1     | 2019-06-06T02:01:05.387290117Z 2019-06-06T04:01:05,387 DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:       Record 1522147023170 status modified to 5: Make active
>
> ...
>
> <<<<<<
>
> The actual commit follows a short while later to actually set the status, and only AFTER that are the documents queued to be handed to the worker threads.
>
> So what this is saying is that the stuffer thread sets the document status to 5 and then a worker thread gets it and when the worker thread is done it tries to set the status back to 4 -- but it's ALREADY at 4 and that is not possible.
>
> Now, it may be worth examining the possibility that some other job gets to the document while it is waiting to be queued, and does stuff to the document's jobqueue record.  There are a couple of background threads which (for example) reprioritize documents that might conceivably change the jobqueue record's status.  They generally work by changing the record status in a transaction to some other value that would not make the record eligible for queuing, and then changing it back when done.  But unless completely overlooked, these would, however, be visible in the diagnostics, and furthermore it should not be possible for one thread in a transaction to change the status at the same time as another thread in a transaction changes the status.
>
> So either way I'm forced to conclude that a record gets changed and committed, but the change is not visible to another thread later trying to reset the status.
>
> Karl
>
>
>
>
> On Thu, Jun 6, 2019 at 5:58 AM Karl Wright <da...@gmail.com> wrote:
>
>> I can look at the log output but not until this weekend.
>>
>> Karl
>>
>>
>> On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch <
>> markus.schuch@deutschebahn.com> wrote:
>>
>>> Hi Olivier,
>>>
>>>
>>>
>>> we were not able to fix this yet.
>>>
>>>
>>>
>>> But now we have new diagnostics log data, after the error occurred again
>>> yesterday:
>>>
>>>
>>>
>>> Unexpected jobqueue status - record id 1522147023170, expecting active
>>> status, saw 4
>>>
>>>
>>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204
>>>
>>>
>>>
>>> Unexpected jobqueue status - record id 1541529864711, expecting active
>>> status, saw 4
>>>
>>>
>>> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204
>>>
>>>
>>>
>>> @Karl, maybe you can get some ideas from that log data?
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>>
>>>
>>> *Von:* Olivier Tavard <ol...@francelabs.com>
>>> *Gesendet:* Dienstag, 21. Mai 2019 11:00
>>> *An:* user@manifoldcf.apache.org
>>> *Betreff:* Re: Error: Unexpected jobqueue status - record id X,
>>> expecting active status, saw 4 (MySQL compatible Database)
>>>
>>>
>>>
>>> Hi Markus,
>>>
>>>
>>>
>>> We have the same error (with postgresql database). Did the error occur
>>> again since your last mail ?
>>>
>>> Did you change something on your MCF configuration to fix this ?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Best regards,
>>>
>>>
>>>
>>> Olivier
>>>
>>>
>>>
>>>
>>>
>>> Le 13 févr. 2019 à 13:58, Markus Schuch <ma...@deutschebahn.com>
>>> a écrit :
>>>
>>>
>>>
>>> Hi Karl,
>>>
>>>
>>>
>>> we set the diagnostigs logger to level debug.
>>>
>>>
>>>
>>> I will get back when the error occurs again.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>>
>>> ------------------------------
>>>
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
>>>  http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>>
>>>
>>> Le 12 févr. 2019 à 17:41, Karl Wright <da...@gmail.com> a écrit :
>>>
>>>
>>>
>>> Hi Marcus,
>>>
>>>
>>>
>>> There's a properties.xml debugging logger you can enable that will keep
>>> track of what's happening with transactions, so that when an error of this
>>> kinds is reported, information about why the situation is unexpected is
>>> dumped to the log.  The logger is called "diagnostics" e.g.
>>> "org.apache.manifoldcf.diagnostics".
>>>
>>>
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <
>>> markus.schuch@deutschebahn.com> wrote:
>>>
>>> Hi,
>>>
>>> we are seeing "Error: Unexpected jobqueue status - record id
>>> 1484612513829, expecting active status, saw 4" from time to time.
>>> I didn’t report it, because we were running on an old MCF version, and
>>> there were some bugreports relating to this error that are resolved in
>>> newer versions.
>>>
>>> Now we have upgraded to latest and greatest and the error still
>>> occurred. So want to start to track this down.
>>>
>>> Out setup is:
>>>
>>>    - ManifoldCF 2.12, running in a Docker Container based on Redhat
>>>    Linux, OpenJDK 8
>>>    - AWS RDS Database (Aurora MySQL -> 5.6 compatible)
>>>    - Single Process Setup
>>>
>>>
>>> We run several Jobs (run once mode) over night with schedule windows and
>>> max runtime settings. Some of the time windows are overlapping.
>>> At normal days some Jobs finish during their time window, some go to
>>> WAITING and will go on in the next night.
>>>
>>> The error mostly hits at a Sharepoint Repository Job.
>>> We have the impression, that the error is somehow related to situations,
>>> when service interruptions (e.g. connection issues) occur in other jobs.
>>>
>>>
>>>
>>> Maybe all this is related to fundamental problems with the database or
>>> the JDBC driver
>>>
>>> (Karl expressed his concerns about that in the maybe-related
>>> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
>>>
>>> On the other hand, there were similar bugreports in the past concerning
>>> unexpected jobqueue status and there were resolved.
>>>
>>> Mayby there is a chance this can also be analyzed and fixed, but I do
>>> not really know where to start.
>>>
>>>
>>> Has anybody ideas how we can track this down / debug this to get to the
>>> bottom of the problem?
>>> What could be the first step in the analysis?
>>>
>>> Many thanks in advance
>>> Markus
>>>
>>> May be related bugreports:
>>>
>>>
>>>
>>> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395
>>> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180
>>> [2] https://issues.apache.org/jira/browse/CONNECTORS-590
>>> [3] https://issues.apache.org/jira/browse/CONNECTORS-246
>>>
>>>
>>>
>>> --
>>>
>>> Markus Schuch
>>>
>>> Web Business (T.IPB 26)
>>>
>>>
>>>
>>> DB Systel GmbH
>>>
>>> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
>>>
>>>
>>> ------------------------------
>>>
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190529>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Posted by Karl Wright <da...@gmail.com>.
I just looked at this briefly.

For
https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204,
look for the following record ID:

1522147023170

This is the one when we look at it after the document is processed
that it has status 4:

>>>>>>

manifoldcf_1     | 2019-06-06T02:01:05.380401275Z
2019-06-06T04:01:05,380 DEBUG [Worker thread '37']
org.apache.manifoldcf.diagnostics: ==== Forensics for record
1522147023170, current status: 4 ====

manifoldcf_1     | 2019-06-06T02:01:05.380512360Z
2019-06-06T04:01:05,380 DEBUG [Worker thread '37']
org.apache.manifoldcf.diagnostics: === Current stack trace ===
manifoldcf_1     | 2019-06-06T02:01:05.380544192Z java.lang.Exception:
Forensics stack trace
manifoldcf_1     | 2019-06-06T02:01:05.380548494Z 	at
org.apache.manifoldcf.crawler.jobs.TrackerClass.printForensics(TrackerClass.java:216)
[mcf-pull-agent.jar:?]
manifoldcf_1     | 2019-06-06T02:01:05.380552564Z 	at
org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1028)
[mcf-pull-agent.jar:?]
manifoldcf_1     | 2019-06-06T02:01:05.380568210Z 	at
org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3414)
[mcf-pull-agent.jar:?]

<<<<<<

It expects it to have status 5, though, because the only way the
document could be being worked on at all is if the document had been
put into status 5 by the stuffer thread:

>>>>>>

manifoldcf_1     | 2019-06-06T02:01:05.387210733Z
2019-06-06T04:01:05,381 DEBUG [Worker thread '37']
org.apache.manifoldcf.diagnostics: == About to commit transaction by
'Stuffer thread' at 1559786464972 ==
manifoldcf_1     | 2019-06-06T02:01:05.387250827Z java.lang.Exception:
Precommit stack trace
manifoldcf_1     | 2019-06-06T02:01:05.387256464Z 	at
org.apache.manifoldcf.crawler.jobs.TrackerClass.notePrecommit(TrackerClass.java:108)
~[mcf-pull-agent.jar:?]
manifoldcf_1     | 2019-06-06T02:01:05.387260987Z 	at
org.apache.manifoldcf.crawler.jobs.JobManager.fetchAndProcessDocuments(JobManager.java:3295)
~[mcf-pull-agent.jar:?]
manifoldcf_1     | 2019-06-06T02:01:05.387265034Z 	at
org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:3014)
~[mcf-pull-agent.jar:?]
manifoldcf_1     | 2019-06-06T02:01:05.387269073Z 	at
org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186)
~[mcf-pull-agent.jar:?]
manifoldcf_1     | 2019-06-06T02:01:05.387285663Z
2019-06-06T04:01:05,387 DEBUG [Worker thread '37']
org.apache.manifoldcf.diagnostics:     Transaction includes:
manifoldcf_1     | 2019-06-06T02:01:05.387290117Z
2019-06-06T04:01:05,387 DEBUG [Worker thread '37']
org.apache.manifoldcf.diagnostics:       Record 1522147023170 status
modified to 5: Make active

...

<<<<<<

The actual commit follows a short while later to actually set the
status, and only AFTER that are the documents queued to be handed to
the worker threads.

So what this is saying is that the stuffer thread sets the document
status to 5 and then a worker thread gets it and when the worker
thread is done it tries to set the status back to 4 -- but it's
ALREADY at 4 and that is not possible.

Now, it may be worth examining the possibility that some other job
gets to the document while it is waiting to be queued, and does stuff
to the document's jobqueue record.  There are a couple of background
threads which (for example) reprioritize documents that might
conceivably change the jobqueue record's status.  They generally work
by changing the record status in a transaction to some other value
that would not make the record eligible for queuing, and then changing
it back when done.  But unless completely overlooked, these would,
however, be visible in the diagnostics, and furthermore it should not
be possible for one thread in a transaction to change the status at
the same time as another thread in a transaction changes the status.

So either way I'm forced to conclude that a record gets changed and
committed, but the change is not visible to another thread later
trying to reset the status.

Karl




On Thu, Jun 6, 2019 at 5:58 AM Karl Wright <da...@gmail.com> wrote:

> I can look at the log output but not until this weekend.
>
> Karl
>
>
> On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
>> Hi Olivier,
>>
>>
>>
>> we were not able to fix this yet.
>>
>>
>>
>> But now we have new diagnostics log data, after the error occurred again
>> yesterday:
>>
>>
>>
>> Unexpected jobqueue status - record id 1522147023170, expecting active
>> status, saw 4
>>
>>
>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204
>>
>>
>>
>> Unexpected jobqueue status - record id 1541529864711, expecting active
>> status, saw 4
>>
>>
>> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204
>>
>>
>>
>> @Karl, maybe you can get some ideas from that log data?
>>
>>
>>
>> Cheers,
>>
>> Markus
>>
>>
>>
>> *Von:* Olivier Tavard <ol...@francelabs.com>
>> *Gesendet:* Dienstag, 21. Mai 2019 11:00
>> *An:* user@manifoldcf.apache.org
>> *Betreff:* Re: Error: Unexpected jobqueue status - record id X,
>> expecting active status, saw 4 (MySQL compatible Database)
>>
>>
>>
>> Hi Markus,
>>
>>
>>
>> We have the same error (with postgresql database). Did the error occur
>> again since your last mail ?
>>
>> Did you change something on your MCF configuration to fix this ?
>>
>>
>>
>> Thanks,
>>
>> Best regards,
>>
>>
>>
>> Olivier
>>
>>
>>
>>
>>
>> Le 13 févr. 2019 à 13:58, Markus Schuch <ma...@deutschebahn.com>
>> a écrit :
>>
>>
>>
>> Hi Karl,
>>
>>
>>
>> we set the diagnostigs logger to level debug.
>>
>>
>>
>> I will get back when the error occurs again.
>>
>>
>>
>> Cheers,
>>
>> Markus
>>
>>
>> ------------------------------
>>
>>
>> Pflichtangaben anzeigen
>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>
>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
>>  http://www.deutschebahn.com/de/konzern/datenschutz
>>
>>
>>
>> Le 12 févr. 2019 à 17:41, Karl Wright <da...@gmail.com> a écrit :
>>
>>
>>
>> Hi Marcus,
>>
>>
>>
>> There's a properties.xml debugging logger you can enable that will keep
>> track of what's happening with transactions, so that when an error of this
>> kinds is reported, information about why the situation is unexpected is
>> dumped to the log.  The logger is called "diagnostics" e.g.
>> "org.apache.manifoldcf.diagnostics".
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <
>> markus.schuch@deutschebahn.com> wrote:
>>
>> Hi,
>>
>> we are seeing "Error: Unexpected jobqueue status - record id
>> 1484612513829, expecting active status, saw 4" from time to time.
>> I didn’t report it, because we were running on an old MCF version, and
>> there were some bugreports relating to this error that are resolved in
>> newer versions.
>>
>> Now we have upgraded to latest and greatest and the error still occurred.
>> So want to start to track this down.
>>
>> Out setup is:
>>
>>    - ManifoldCF 2.12, running in a Docker Container based on Redhat
>>    Linux, OpenJDK 8
>>    - AWS RDS Database (Aurora MySQL -> 5.6 compatible)
>>    - Single Process Setup
>>
>>
>> We run several Jobs (run once mode) over night with schedule windows and
>> max runtime settings. Some of the time windows are overlapping.
>> At normal days some Jobs finish during their time window, some go to
>> WAITING and will go on in the next night.
>>
>> The error mostly hits at a Sharepoint Repository Job.
>> We have the impression, that the error is somehow related to situations,
>> when service interruptions (e.g. connection issues) occur in other jobs.
>>
>>
>>
>> Maybe all this is related to fundamental problems with the database or
>> the JDBC driver
>>
>> (Karl expressed his concerns about that in the maybe-related
>> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
>>
>> On the other hand, there were similar bugreports in the past concerning
>> unexpected jobqueue status and there were resolved.
>>
>> Mayby there is a chance this can also be analyzed and fixed, but I do not
>> really know where to start.
>>
>>
>> Has anybody ideas how we can track this down / debug this to get to the
>> bottom of the problem?
>> What could be the first step in the analysis?
>>
>> Many thanks in advance
>> Markus
>>
>> May be related bugreports:
>>
>>
>>
>> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395
>> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180
>> [2] https://issues.apache.org/jira/browse/CONNECTORS-590
>> [3] https://issues.apache.org/jira/browse/CONNECTORS-246
>>
>>
>>
>> --
>>
>> Markus Schuch
>>
>> Web Business (T.IPB 26)
>>
>>
>>
>> DB Systel GmbH
>>
>> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
>>
>>
>> ------------------------------
>>
>>
>> Pflichtangaben anzeigen
>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>
>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
>> http://www.deutschebahn.com/de/konzern/datenschutz
>>
>>
>>
>>
>> ------------------------------
>>
>> Pflichtangaben anzeigen
>> <http://www.deutschebahn.com/pflichtangaben/20190529>
>>
>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
>> http://www.deutschebahn.com/de/konzern/datenschutz
>>
>

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Posted by Karl Wright <da...@gmail.com>.
I can look at the log output but not until this weekend.

Karl


On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch <ma...@deutschebahn.com>
wrote:

> Hi Olivier,
>
>
>
> we were not able to fix this yet.
>
>
>
> But now we have new diagnostics log data, after the error occurred again
> yesterday:
>
>
>
> Unexpected jobqueue status - record id 1522147023170, expecting active
> status, saw 4
>
>
> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204
>
>
>
> Unexpected jobqueue status - record id 1541529864711, expecting active
> status, saw 4
>
>
> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204
>
>
>
> @Karl, maybe you can get some ideas from that log data?
>
>
>
> Cheers,
>
> Markus
>
>
>
> *Von:* Olivier Tavard <ol...@francelabs.com>
> *Gesendet:* Dienstag, 21. Mai 2019 11:00
> *An:* user@manifoldcf.apache.org
> *Betreff:* Re: Error: Unexpected jobqueue status - record id X, expecting
> active status, saw 4 (MySQL compatible Database)
>
>
>
> Hi Markus,
>
>
>
> We have the same error (with postgresql database). Did the error occur
> again since your last mail ?
>
> Did you change something on your MCF configuration to fix this ?
>
>
>
> Thanks,
>
> Best regards,
>
>
>
> Olivier
>
>
>
>
>
> Le 13 févr. 2019 à 13:58, Markus Schuch <ma...@deutschebahn.com>
> a écrit :
>
>
>
> Hi Karl,
>
>
>
> we set the diagnostigs logger to level debug.
>
>
>
> I will get back when the error occurs again.
>
>
>
> Cheers,
>
> Markus
>
>
> ------------------------------
>
>
> Pflichtangaben anzeigen
> <http://www.deutschebahn.com/pflichtangaben/20190211>
>
> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
> http://www.deutschebahn.com/de/konzern/datenschutz
>
>
>
> Le 12 févr. 2019 à 17:41, Karl Wright <da...@gmail.com> a écrit :
>
>
>
> Hi Marcus,
>
>
>
> There's a properties.xml debugging logger you can enable that will keep
> track of what's happening with transactions, so that when an error of this
> kinds is reported, information about why the situation is unexpected is
> dumped to the log.  The logger is called "diagnostics" e.g.
> "org.apache.manifoldcf.diagnostics".
>
>
>
> Karl
>
>
>
>
>
> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
> Hi,
>
> we are seeing "Error: Unexpected jobqueue status - record id
> 1484612513829, expecting active status, saw 4" from time to time.
> I didn’t report it, because we were running on an old MCF version, and
> there were some bugreports relating to this error that are resolved in
> newer versions.
>
> Now we have upgraded to latest and greatest and the error still occurred.
> So want to start to track this down.
>
> Out setup is:
>
>    - ManifoldCF 2.12, running in a Docker Container based on Redhat
>    Linux, OpenJDK 8
>    - AWS RDS Database (Aurora MySQL -> 5.6 compatible)
>    - Single Process Setup
>
>
> We run several Jobs (run once mode) over night with schedule windows and
> max runtime settings. Some of the time windows are overlapping.
> At normal days some Jobs finish during their time window, some go to
> WAITING and will go on in the next night.
>
> The error mostly hits at a Sharepoint Repository Job.
> We have the impression, that the error is somehow related to situations,
> when service interruptions (e.g. connection issues) occur in other jobs.
>
>
>
> Maybe all this is related to fundamental problems with the database or the
> JDBC driver
>
> (Karl expressed his concerns about that in the maybe-related
> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
>
> On the other hand, there were similar bugreports in the past concerning
> unexpected jobqueue status and there were resolved.
>
> Mayby there is a chance this can also be analyzed and fixed, but I do not
> really know where to start.
>
>
> Has anybody ideas how we can track this down / debug this to get to the
> bottom of the problem?
> What could be the first step in the analysis?
>
> Many thanks in advance
> Markus
>
> May be related bugreports:
>
>
>
> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395
> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180
> [2] https://issues.apache.org/jira/browse/CONNECTORS-590
> [3] https://issues.apache.org/jira/browse/CONNECTORS-246
>
>
>
> --
>
> Markus Schuch
>
> Web Business (T.IPB 26)
>
>
>
> DB Systel GmbH
>
> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
>
>
> ------------------------------
>
>
> Pflichtangaben anzeigen
> <http://www.deutschebahn.com/pflichtangaben/20190211>
>
> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
> http://www.deutschebahn.com/de/konzern/datenschutz
>
>
>
>
> ------------------------------
>
> Pflichtangaben anzeigen
> <http://www.deutschebahn.com/pflichtangaben/20190529>
>
> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
> http://www.deutschebahn.com/de/konzern/datenschutz
>

AW: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Posted by Markus Schuch <ma...@deutschebahn.com>.
Hi Olivier,

we were not able to fix this yet.

But now we have new diagnostics log data, after the error occurred again yesterday:

Unexpected jobqueue status - record id 1522147023170, expecting active status, saw 4
https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204

Unexpected jobqueue status - record id 1541529864711, expecting active status, saw 4
https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204

@Karl, maybe you can get some ideas from that log data?

Cheers,
Markus

Von: Olivier Tavard <ol...@francelabs.com>
Gesendet: Dienstag, 21. Mai 2019 11:00
An: user@manifoldcf.apache.org
Betreff: Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

Hi Markus,

We have the same error (with postgresql database). Did the error occur again since your last mail ?
Did you change something on your MCF configuration to fix this ?

Thanks,
Best regards,

Olivier



Le 13 févr. 2019 à 13:58, Markus Schuch <ma...@deutschebahn.com>> a écrit :

Hi Karl,

we set the diagnostigs logger to level debug.

I will get back when the error occurs again.

Cheers,
Markus

________________________________

Pflichtangaben anzeigen<http://www.deutschebahn.com/pflichtangaben/20190211>

Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier: http://www.deutschebahn.com/de/konzern/datenschutz

Le 12 févr. 2019 à 17:41, Karl Wright <da...@gmail.com>> a écrit :

Hi Marcus,

There's a properties.xml debugging logger you can enable that will keep track of what's happening with transactions, so that when an error of this kinds is reported, information about why the situation is unexpected is dumped to the log.  The logger is called "diagnostics" e.g. "org.apache.manifoldcf.diagnostics".

Karl


On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <ma...@deutschebahn.com>> wrote:
Hi,

we are seeing "Error: Unexpected jobqueue status - record id 1484612513829, expecting active status, saw 4" from time to time.
I didn’t report it, because we were running on an old MCF version, and there were some bugreports relating to this error that are resolved in newer versions.
Now we have upgraded to latest and greatest and the error still occurred. So want to start to track this down.

Out setup is:

  *   ManifoldCF 2.12, running in a Docker Container based on Redhat Linux, OpenJDK 8
  *   AWS RDS Database (Aurora MySQL -> 5.6 compatible)
  *   Single Process Setup

We run several Jobs (run once mode) over night with schedule windows and max runtime settings. Some of the time windows are overlapping.
At normal days some Jobs finish during their time window, some go to WAITING and will go on in the next night.

The error mostly hits at a Sharepoint Repository Job.
We have the impression, that the error is somehow related to situations, when service interruptions (e.g. connection issues) occur in other jobs.

Maybe all this is related to fundamental problems with the database or the JDBC driver
(Karl expressed his concerns about that in the maybe-related https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
On the other hand, there were similar bugreports in the past concerning unexpected jobqueue status and there were resolved.
Mayby there is a chance this can also be analyzed and fixed, but I do not really know where to start.

Has anybody ideas how we can track this down / debug this to get to the bottom of the problem?
What could be the first step in the analysis?

Many thanks in advance
Markus

May be related bugreports:

[0] https://issues.apache.org/jira/browse/CONNECTORS-1395
[1] https://issues.apache.org/jira/browse/CONNECTORS-1180
[2] https://issues.apache.org/jira/browse/CONNECTORS-590
[3] https://issues.apache.org/jira/browse/CONNECTORS-246

--
Markus Schuch
Web Business (T.IPB 26)

DB Systel GmbH
Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main

________________________________

Pflichtangaben anzeigen<http://www.deutschebahn.com/pflichtangaben/20190211>

Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier: http://www.deutschebahn.com/de/konzern/datenschutz


________________________________

Pflichtangaben anzeigen<http://www.deutschebahn.com/pflichtangaben/20190529>

Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier: http://www.deutschebahn.com/de/konzern/datenschutz