Posted to dev@lucene.apache.org by "Gus Heck (JIRA)" <ji...@apache.org> on 2019/07/05 05:32:00 UTC

[jira] [Commented] (SOLR-13375) Dimensional Routed Aliases

    [ https://issues.apache.org/jira/browse/SOLR-13375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878979#comment-16878979 ] 

Gus Heck commented on SOLR-13375:
---------------------------------

First potentially functional patch. Lots of refactoring to move logic into the routed alias classes from the maintain Cmd classes, which are now consolidated to a single MaintainRoutedAlias class. Also refactored many longer methods into smaller chunks. The basic strategy here is to use a specialized subclass of the primary routed alias classes to provide the context for answering the question of whether or not collections need to be created. I also altered the basic logic such that the routed aliases fully calculate the collection to which the document should be routed and then push that "target" collection to the maintain command repeatedly until the required collection has been created. This simplifies the previous situation, in which MaintainCategoryRoutedAliasCmd worked off of the value encountered in the document while MaintainTimeRoutedAlias just marched the collections forward without knowledge of the end state. The imbalance in those strategies needed to be resolved to keep DRAs tractable. Another notable abstraction added is a notion of "actions" that are requested by each routed alias during the execution of the MaintainRoutedAliasCmd. In the case of DRAs these are generated by each sub-dimension and collated into a final set of actions by the DRA.
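
To illustrate the "actions" idea described above, here is a minimal sketch of how per-dimension actions might be collated into one final set. It is not taken from the patch: Action, Dimension, ActionCollator and the CREATE/DELETE types are hypothetical names, and de-duplicating identical actions is an assumption on my part.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical action requested by a routed alias; the real classes may differ. */
final class Action {
    enum Type { CREATE, DELETE }

    final Type type;
    final String collection;

    Action(Type type, String collection) {
        this.type = Objects.requireNonNull(type);
        this.collection = Objects.requireNonNull(collection);
    }

    // Value equality so that identical requests from two dimensions collapse into one.
    @Override public boolean equals(Object o) {
        if (!(o instanceof Action)) return false;
        Action other = (Action) o;
        return type == other.type && collection.equals(other.collection);
    }

    @Override public int hashCode() { return Objects.hash(type, collection); }

    @Override public String toString() { return type + ":" + collection; }
}

/** Hypothetical view of one dimension (e.g. time or category) of a DRA. */
interface Dimension {
    /** Actions this dimension wants performed so that the target collection can exist. */
    List<Action> actionsFor(String targetCollection);
}

final class ActionCollator {
    /** Gathers the actions of every dimension, keeping first-seen order and dropping duplicates. */
    static Set<Action> collate(List<Dimension> dimensions, String targetCollection) {
        Set<Action> result = new LinkedHashSet<>();
        for (Dimension dimension : dimensions) {
            result.addAll(dimension.actionsFor(targetCollection));
        }
        return result;
    }
}
```

A LinkedHashSet is used in the sketch so that the order in which dimensions request actions is preserved, on the assumption that creation order may matter to the maintain command.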

Additionally, I hit a very time-consuming bug with tests. I eventually realized that the results of an admin command become visible (to the test) before execution of the command has entirely completed, so a test that has waited for a collection to become visible can begin to shut down while an async operation is still in progress. This can lead to the watcher.await(timeout); call in OverseerTaskQueue.offer never being released (and then the shutdown cycle that is waiting for the core async thread to terminate waits until the timeout expires). This only showed up if I saturated my CPU, and then only about 20% of the time. The sneaky thing about this is that if you beasted it and went to bed or went to lunch, it would complete successfully because of the timeout, but the time it took to do so was ridiculous if you were waiting for it.

A 5 second Thread.sleep() as the last line of the test reliably resolved this, but not being happy with that, I added a count of pending overseer tasks and an allowOverseerPendingTasksToComplete() method to OverseerTaskQueue. The first thing that happens on CoreContainer.shutdown is a call to the new method (which of course first prohibits new tasks from being queued... though I'm not sure if the exception thrown to threads that try is ideal...). Once the in-progress tasks finish, shutdown proceeds normally. This completely solved the problems with my async collection creation tests.  [^SOLR-13375.patch] 
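
The pending-task idea can be sketched roughly as follows. This is not the code in the patch: PendingTaskGate and its method names are hypothetical, the timeout parameter is an assumption, and IllegalStateException is just one possible choice for rejecting tasks that arrive after shutdown has begun.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the pending-task bookkeeping described above. */
class PendingTaskGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();
    private int pending = 0;
    private boolean closed = false;

    /** Call before queueing a task; rejects new work once shutdown has begun. */
    void taskStarted() {
        lock.lock();
        try {
            if (closed) {
                throw new IllegalStateException("shutting down, no new tasks accepted");
            }
            pending++;
        } finally {
            lock.unlock();
        }
    }

    /** Call when a previously started task has completed (successfully or not). */
    void taskFinished() {
        lock.lock();
        try {
            pending--;
            if (pending == 0) {
                drained.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Stops accepting new tasks, then waits for in-progress ones to finish.
     * Returns true if everything drained, false on timeout or interruption.
     */
    boolean allowPendingTasksToComplete(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            closed = true;
            while (pending > 0) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = drained.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a shutdown path would call allowPendingTasksToComplete(...) first and only then continue tearing things down. The wait is bounded so that a stuck task cannot block shutdown forever.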

> Dimensional Routed Aliases
> --------------------------
>
>                 Key: SOLR-13375
>                 URL: https://issues.apache.org/jira/browse/SOLR-13375
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Gus Heck
>            Assignee: Gus Heck
>            Priority: Major
>         Attachments: SOLR-13375.patch, SOLR-13375.patch, SOLR-13375.patch
>
>
> Currently available routed aliases are restricted to a single field. This feature will allow Solr to provide data-driven collection access, creation and management based on multiple fields in a document. The collections will be queried and updated in a unified manner via an alias. The particularly useful combination at this time will be Category X Time routing, but Category X Category may also be useful. More importantly, if additional routing schemes are created in the future (either as contributions or as custom code by users), combination among these should be supported. 
> It is expected that not all combinations will be useful, and that determination of usefulness I expect to leave up to the user. Some Routing schemes may need to be limited to be the leaf/last routing scheme for technical reasons, though I'm not entirely convinced of that yet. If so, a flag will be added to the RoutedAlias interface.
> Initial desire is to support two levels, though if arbitrary levels can be supported easily that will be done.
> This could also have been called CompositeRoutedAlias, but that creates a TLA clash with CategoryRoutedAlias.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)