Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2017/12/28 04:21:00 UTC

[jira] [Updated] (SOLR-11653) create next time collection based on a fixed time gap

     [ https://issues.apache.org/jira/browse/SOLR-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-11653:
--------------------------------
    Attachment: SOLR-11653.patch

Here's a first-draft patch.  It is fairly incomplete: it lacks tests, and I haven't actually run the code at all yet.  It does show the approach.  There are two main parts:

(1) New *RoutedAliasCreateCollectionCmd*, an Overseer Cmd registered as "ROUTEDALIAS_CREATECOLL".  It adds the next collection to a time routed alias.  It assumes that alias metadata with a certain prefix is collection creation metadata, and it requires collection.configName to be present (we want all the collections to have the same configset, by default anyway).  Collection creation is invoked in two steps: the Cmd first calls CollectionsHandler.CollectionOperation.CREATE_OP.execute to get the overseer message, and then delivers that message to CreateCollectionCmd indirectly via the OverseerCollectionMessageHandler.  The alias is then updated to have the new collection at the first position (thus reverse chronological order).  Note that this Cmd has a parameter, ifHeadCollName, which is the head (latest) collection name that the caller saw when it called the command.  If the actual head collection is something else, the Cmd returns without error, on the assumption that multiple attempts to create the next collection raced at the same time.
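
To illustrate the ifHeadCollName check and the prepend in (1), here is a minimal standalone sketch.  It is not code from the patch: the class name, method name, and the use of a plain list for the alias are all assumptions made for illustration, and the real Cmd works against cluster state rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for the alias update step; not the patch's actual code. */
final class RoutedAliasSketch {

  /**
   * Prepends newCollName to the alias's collection list (newest first), but only
   * if the current head still matches what the caller saw.
   *
   * @return the updated list, or empty if the head changed (treated as a lost
   *         race rather than an error)
   */
  static Optional<List<String>> addNextCollection(
      List<String> aliasCollections, String ifHeadCollName, String newCollName) {
    boolean headMatches =
        !aliasCollections.isEmpty() && aliasCollections.get(0).equals(ifHeadCollName);
    if (!headMatches) {
      // Another request presumably created the next collection already; do nothing.
      return Optional.empty();
    }
    List<String> updated = new ArrayList<>(aliasCollections.size() + 1);
    updated.add(newCollName);          // new collection goes first
    updated.addAll(aliasCollections);  // existing ones keep reverse chronological order
    return Optional.of(updated);
  }
}
```

Returning an empty result instead of throwing mirrors the "returns without error" behavior: a caller that lost the race can simply re-read the alias and carry on.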

(2) Changes to TimeRoutedAliasUpdateProcessor.  There's now a loop: if we think we need to create the next collection, we do so and then retry from the start, more or less.  The loop is needed mostly because we may have to create a series of collections if the current head collection is very out of date.  I also added a check that throws an exception if the document's timestamp is too far into the future (currently 10 minutes).
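
The loop and the future-timestamp check in (2) can be sketched roughly as follows.  This is an illustration under assumptions, not the URP's code: the class and method names are made up, the 10-minute limit is assumed to be measured against the current time, and a document is assumed to need a new collection once its timestamp reaches the head's start plus the gap.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the "create until caught up" loop; not the patch's code. */
final class TimeRoutingSketch {

  // Assumed limit on how far ahead of "now" a document timestamp may be.
  static final Duration MAX_FUTURE = Duration.ofMinutes(10);

  /**
   * Returns the start times of the collections that would have to be created,
   * oldest first, so that docTime falls inside the head collection's interval.
   */
  static List<Instant> collectionsToCreate(
      Instant headStart, Duration gap, Instant docTime, Instant now) {
    if (gap.isZero() || gap.isNegative()) {
      throw new IllegalArgumentException("gap must be positive");
    }
    if (docTime.isAfter(now.plus(MAX_FUTURE))) {
      throw new IllegalArgumentException("document timestamp is too far in the future: " + docTime);
    }
    List<Instant> toCreate = new ArrayList<>();
    Instant head = headStart;
    // Keep adding collections while the document is at or past the end of the head interval.
    while (!docTime.isBefore(head.plus(gap))) {
      head = head.plus(gap);
      toCreate.add(head);
    }
    return toCreate;
  }
}
```

In the real processor each step would go through the Overseer command and then re-read the alias, which is why the description says "retry from the start" rather than computing everything up front.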

So yeah, I need to actually use it and work on tests.  But there is some code rearrangement that should be done as well, I think.  The Cmd calls into the URP to share some code, but it ought to be the other way around.  Or maybe a new "TimeRoutedAliasInfo" class could exist that is used by both the URP and the Cmd?  There will probably be some code sharing with SOLR-11722, such as formatting the collection name from a timestamp -- CC [~gus_heck]
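
As a rough idea of what such a shared helper might hold, here is a sketch of formatting a collection name from a timestamp.  Everything in it is an assumption for illustration: the class name, the method name, and especially the name pattern (alias name, underscore, UTC timestamp), which is not taken from the patch or from SOLR-11722.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical shared helper; the naming pattern here is an assumption. */
final class TimeRoutedAliasInfoSketch {

  // Assumed pattern: <alias>_yyyy-MM-dd_HH_mm_ss, always rendered in UTC.
  private static final DateTimeFormatter NAME_FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd_HH_mm_ss").withZone(ZoneOffset.UTC);

  /** Builds a collection name from the alias name and the collection's start time. */
  static String formatCollectionName(String aliasName, Instant start) {
    return aliasName + "_" + NAME_FORMAT.format(start);
  }
}
```

Keeping this in one place would let both the URP and the Cmd agree on names without either depending on the other.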

> create next time collection based on a fixed time gap
> -----------------------------------------------------
>
>                 Key: SOLR-11653
>                 URL: https://issues.apache.org/jira/browse/SOLR-11653
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-11653.patch
>
>
> For time series collections (as part of a collection Alias with certain metadata), we want to automatically add new collections. This issue is about creating the next collection based on a configurable fixed time gap.  We will add this collection synchronously once a document flowing through the URP chain exceeds the gap, as opposed to asynchronously in advance.  There will be some Alias metadata to define in this issue.  The preponderance of the implementation will be in TimePartitionedUpdateProcessor or perhaps a helper to this URP.
> note: other issues will implement pre-emptive creation and capping collections by size.


