You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Hrishikesh Gadre (JIRA)" <ji...@apache.org> on 2016/06/15 18:42:10 UTC

[jira] [Comment Edited] (SOLR-7374) Backup/Restore should provide a param for specifying the directory implementation it should use

    [ https://issues.apache.org/jira/browse/SOLR-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332291#comment-15332291 ] 

Hrishikesh Gadre edited comment on SOLR-7374 at 6/15/16 6:42 PM:
-----------------------------------------------------------------

[~varunthacker] [~markrmiller@gmail.com] Thanks for the comments!

bq. I think we need to deal with the "location" param better. Before this patch we used to read location as a query param . If the query param is empty then we read it from the cluster property. 

bq. With this patch we are adding the ability to specify "location" in the solr.xml file but it will never be used? CollectionsHandler will bail out early today .

This is a partial patch handling ONLY core level changes. The collection level changes are being captured in the patch for SOLR-9055. I did this to keep the patch relatively short and easier to review. In the patch for SOLR-9055 - I have changed the CollectionsHandler implementation to read default location from solr.xml (instead of cluster property). Since this core level operation is "internal" - technically we don't have to handle the case for missing "location" param in this patch (i.e. we can keep the original behavior). I think I made this change to simplify unit testing.

bq. One approach would be to deprecate the usage of cluster prop and look at query param followed by solr.xml ? Looking at three places seems messy .
bq. [Mark] It is a bit odd to have some config in solr.xml and then default location as a cluster prop, but much nicer to be able to easily change the default location on the fly. solr.xml is a pain to change and requires a restart.

I agree that cluster property approach is more convenient as compared to solr.xml. But since we allow users to configure multiple repositories in solr.xml, we can not really use the current cluster property as is. This is because user may want to specify different location for different file-systems (or repositories). Hence at minimum we need one cluster property per repository configuration (e.g. name could be <repository-name>-location). But based on my understanding CLUSTERPROP API implementation requires fixed (or well-known) property names,

https://github.com/apache/lucene-solr/blob/651499c82df482b493b0ed166c2ab7196af0a794/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterProperties.java#L90

We may have to relax this restriction for this work. On the other hand, specifying default location in solr.xml is not so bad since user can always specify a location parameter to avoid restarting the Solr cluster. Thoughts?

bq. Can we reuse the "location" string with (BackupRepository.DEFAULT_LOCATION_PROPERTY) on line 871/925 of CoreAdminOperation? Let's fix it in CollectionsHandler and OverseerCollectionMessageHandler as well?

Let me fix the CoreAdminOperation in this patch. I will defer the collection level changes to SOLR-9055.

bq. In RestoreCore do we need to deprecate the old RestoreCore ctor ? Any reason why we can't remove it directly?

The deprecated constructor is used by the ReplicationHandler. The new constructor expects the BackupRepository reference which can be obtained only via CoreContainer. I couldn't find a way to get hold of CoreContainer in ReplicationHandler. Hence I didn't remove this constructor.

bq. repo is the key used to specify the implementation. In Solr xml the tag is called repository . Should we just use repository throughout?

Sure that make sense. Let me fix this.





was (Author: hgadre):
[~varunthacker] [~markrmiller@gmail.com] Thanks for the comments!

bq. I think we need to deal with the "location" param better. Before this patch we used to read location as a query param . If the query param is empty then we read it from the cluster property. bq. With this patch we are adding the ability to specify "location" in the solr.xml file but it will never be used? CollectionsHandler will bail out early today .

This is a partial patch handling ONLY core level changes. The collection level changes are being captured in the patch for SOLR-9055. I did this to keep the patch relatively short and easier to review. In the patch for SOLR-9055 - I have changed the CollectionsHandler implementation to read default location from solr.xml (instead of cluster property). Since this core level operation is "internal" - technically we don't have to handle the case for missing "location" param in this patch (i.e. we can keep the original behavior). I think I made this change to simplify unit testing.

bq. One approach would be to deprecate the usage of cluster prop and look at query param followed by solr.xml ? Looking at three places seems messy .
bq. [Mark] It is a bit odd to have some config in solr.xml and then default location as a cluster prop, but much nicer to be able to easily change the default location on the fly. solr.xml is a pain to change and requires a restart.

I agree that cluster property approach is more convenient as compared to solr.xml. But since we allow users to configure multiple repositories in solr.xml, we can not really use the current cluster property as is. This is because user may want to specify different location for different file-systems (or repositories). Hence at minimum we need one cluster property per repository configuration (e.g. name could be <repository-name>-location). But based on my understanding CLUSTERPROP API implementation requires fixed (or well-known) property names,

https://github.com/apache/lucene-solr/blob/651499c82df482b493b0ed166c2ab7196af0a794/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterProperties.java#L90

We may have to relax this restriction for this work. On the other hand, specifying default location in solr.xml is not so bad since user can always specify a location parameter to avoid restarting the Solr cluster. Thoughts?

bq. Can we reuse the "location" string with (BackupRepository.DEFAULT_LOCATION_PROPERTY) on line 871/925 of CoreAdminOperation? Let's fix it in CollectionsHandler and OverseerCollectionMessageHandler as well?

Let me fix the CoreAdminOperation in this patch. I will defer the collection level changes to SOLR-9055.

bq. In RestoreCore do we need to deprecate the old RestoreCore ctor ? Any reason why we can't remove it directly?

The deprecated constructor is used by the ReplicationHandler. The new constructor expects the BackupRepository reference which can be obtained only via CoreContainer. I couldn't find a way to get hold of CoreContainer in ReplicationHandler. Hence I didn't remove this constructor.

bq. repo is the key used to specify the implementation. In Solr xml the tag is called repository . Should we just use repository throughout?

Sure that make sense. Let me fix this.




> Backup/Restore should provide a param for specifying the directory implementation it should use
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7374
>                 URL: https://issues.apache.org/jira/browse/SOLR-7374
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Varun Thacker
>            Assignee: Mark Miller
>             Fix For: 5.2, 6.0
>
>         Attachments: SOLR-7374.patch, SOLR-7374.patch, SOLR-7374.patch, SOLR-7374.patch, SOLR-7374.patch
>
>
> Currently when we create a backup we use SimpleFSDirectory to write the backup indexes. Similarly during a restore we open the index using FSDirectory.open . 
> We should provide a param called {{directoryImpl}} or {{type}} which will be used to specify the Directory implementation to backup the index. 
> Likewise during a restore you would need to specify the directory impl which was used during backup so that the index can be opened correctly.
> This param will address the problem that currently if a user is running Solr on HDFS there is no way to use the backup/restore functionality as the directory is hardcoded.
> With this one could be running Solr on a local FS but backup the index on HDFS etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org