Posted to issues@solr.apache.org by "Mike Drob (Jira)" <ji...@apache.org> on 2022/03/07 18:09:00 UTC

[jira] [Reopened] (SOLR-14524) Harden MultiThreadedOCPTest

     [ https://issues.apache.org/jira/browse/SOLR-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob reopened SOLR-14524:
------------------------------
      Assignee: Ilan Ginzburg  (was: Mike Drob)

> Harden MultiThreadedOCPTest
> ---------------------------
>
>                 Key: SOLR-14524
>                 URL: https://issues.apache.org/jira/browse/SOLR-14524
>             Project: Solr
>          Issue Type: Test
>          Components: SolrCloud
>    Affects Versions: 9.0
>            Reporter: Ilan Ginzburg
>            Assignee: Ilan Ginzburg
>            Priority: Minor
>              Labels: test
>             Fix For: 9.0
>
>         Attachments: MultiThreadedOCPTest.log
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {{MultiThreadedOCPTest.test()}} fails occasionally in Jenkins because of the timing of task enqueues to the Collection API queue.
> In {{testFillWorkQueue()}}, the test enqueues a large number of tasks (115, more than the 100 Collection API parallel executors) to the Collection API queue for a collection COLL_A, then waits for a short delay and enqueues a task for another collection, COLL_B.
>  It verifies that the COLL_B task (which does not require the same lock as the COLL_A tasks) completes before the third COLL_A task.
> Test failures happen because when enqueues are slowed down enough, the first 3 tasks on COLL_A complete even before the COLL_B task gets enqueued!
> In one sample failed Jenkins test execution, the COLL_B task was enqueued 1275ms after the first COLL_A task, leaving plenty of time for a few (and possibly all) COLL_A tasks to complete.
> The fix will be along the lines of:
>  * Make the “blocking” COLL_A task longer to execute (currently 1 second) to compensate for slow enqueues.
>  * Verify that the COLL_B task (a 1ms task) finishes before the long-running COLL_A task does. This would be a good indication that even though the collection queue was filled with tasks waiting for a busy lock, a non-competing task was picked and executed right away.
>  * Delay the enqueue of the COLL_B task until the end of processing of the first COLL_A task. This would guarantee that COLL_B is enqueued only once at least some COLL_A tasks have started processing at the Overseer. Possibly also verify that the long-running COLL_A task has not yet finished executing when the COLL_B task is enqueued...
>  * It might be possible to set a (very) long duration for the slow task of COLL_A (to be less vulnerable to execution delays) without requiring the test to wait for that task to complete, but only wait for the COLL_B task to complete (so the test doesn't run for too long).
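
The ordering idea in the list above can be sketched outside Solr with plain java.util.concurrent primitives. This is only an illustrative model, not the actual test or the Overseer's scheduling logic: the class name, the latches, the 4-thread pool, and the lock standing in for the COLL_A lock are all assumptions made for the example. It replaces the fixed delay with an explicit handshake: the blocking COLL_A task signals that it has started, only then is the COLL_B task submitted, and the check is that COLL_B completes while the COLL_A task is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-alone model; none of these names come from Solr.
class OcpOrderingSketch {

    /** Returns true if the COLL_B task completed while the blocking COLL_A task was still running. */
    static boolean collBFinishesWhileCollAIsBlocked() {
        ExecutorService executors = Executors.newFixedThreadPool(4);
        ReentrantLock collALock = new ReentrantLock();           // stands in for the COLL_A lock
        CountDownLatch blockingStarted = new CountDownLatch(1);  // opened once the first COLL_A task runs
        CountDownLatch releaseBlocking = new CountDownLatch(1);  // lets the caller end the "very long" task
        try {
            // Blocking COLL_A task: holds the lock until explicitly released, not for a fixed time.
            Future<?> blockingA = executors.submit(() -> {
                collALock.lock();
                try {
                    blockingStarted.countDown();
                    releaseBlocking.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    collALock.unlock();
                }
            });

            // Do not enqueue COLL_B until the blocking COLL_A task is known to be processing.
            if (!blockingStarted.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("blocking COLL_A task never started");
            }

            // A second COLL_A task that has to wait for the busy lock.
            Future<?> followUpA = executors.submit(() -> {
                collALock.lock();
                collALock.unlock();
            });

            // COLL_B task: needs no COLL_A lock, so it should run right away.
            Future<?> collB = executors.submit(() -> { });
            collB.get(10, TimeUnit.SECONDS);

            // The property under test: COLL_B is done, the blocking COLL_A task is not.
            boolean blockingStillRunning = !blockingA.isDone();

            // Release the long task so the caller never waits out a long sleep.
            releaseBlocking.countDown();
            blockingA.get(10, TimeUnit.SECONDS);
            followUpA.get(10, TimeUnit.SECONDS);
            return blockingStillRunning;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            releaseBlocking.countDown(); // harmless if already released
            executors.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("COLL_B finished first: " + collBFinishesWhileCollAIsBlocked());
    }
}
```

Because the blocking task is released by a latch rather than a timer, slow enqueues cannot let it finish early, and the run time is bounded by the COLL_B task rather than by the long COLL_A task. Note that in this model a waiting COLL_A task occupies a pool thread, so the pool is sized larger than the number of submitted tasks; that is a simplification of the example and not a statement about how the Collection API executors behave.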



--
This message was sent by Atlassian Jira
(v8.20.1#820001)