Posted to solr-dev@lucene.apache.org by "Ezra Epstein (JIRA)" <ji...@apache.org> on 2008/04/01 02:52:24 UTC
[jira] Created: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps
snappuller has limitation w/r/t/ handling multiple web apps
-----------------------------------------------------------
Key: SOLR-524
URL: https://issues.apache.org/jira/browse/SOLR-524
Project: Solr
Issue Type: Improvement
Components: replication
Affects Versions: 1.2
Environment: Linux (CentOS release 5 (Final))
Java JDK 6
Reporter: Ezra Epstein
Priority: Minor
The snappuller has a limitation which makes it hard to use for replicating the indices for multiple webapps. In particular, by changing:
# rsync over files that have changed
rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip
to:
# rsync over files that have changed
rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/${rsync_module_path}/${name}/ ${data_dir}/${name}-wip
and adding an rsync_module_path variable to scripts.conf, plus giving it a default value of "solr" before the 'unset' commands at the top of the snappuller script, I've worked around the issue. Still, it seems better not to hard-code the module name ([solr]), and also to allow some flexibility in the location of the data files under that module. This is required for multiple webapps, since they won't share a data folder.
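The workaround described above can be sketched roughly as follows. This is an illustration only, not the actual snappuller source: build_rsync_source is a hypothetical helper, the host, port and snapshot name are made-up example values, and the ${var:-default} expansion stands in for the "default before the unset commands" approach. Only the rsync_module_path variable and its default of "solr" come from the description.

```shell
# Sketch only: build_rsync_source is a hypothetical helper, not part of snappuller.

# Fall back to the historical module name when scripts.conf does not set one.
rsync_module_path=${rsync_module_path:-solr}

# Print the rsync source URL for a given snapshot name.
build_rsync_source() {
    # $1 = master host, $2 = rsyncd port, $3 = snapshot name
    printf 'rsync://%s:%s/%s/%s/\n' "$1" "$2" "${rsync_module_path}" "$3"
}

# Example values (assumed for illustration):
build_rsync_source master.example.com 18983 snapshot.20080401025224
```

With rsync_module_path left unset, the URL keeps the old /solr/ prefix, so existing single-index setups would behave as before.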
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps
Posted by "Ezra Epstein (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584298#action_12584298 ]
Ezra Epstein commented on SOLR-524:
-----------------------------------
I see that I didn't explain the issue very well. The situation is that we have multiple indices and hence, in 1.2, multiple webapps. We also have replication, so we need to pull snapshots of the index/data files for each webapp/index. The snappuller script has no way to do this. The rsyncd-start script creates a single rsyncd MODULE (not webapp), named "solr". The snappuller script always pulls directly from this module's fixed path; there's no way to extend from that module root in the snappuller script. Thus, if ${data_dir} is /opt/solr/data, snappuller will pull a snapshot from that folder. But with multiple webapps we'll have:
/opt/solr/webapp1/data
/opt/solr/webapp2/data
and the current solr scripts seem to let us start rsyncd pointing at one folder or the other, but not both. So either we can:
+ start a new instance of rsyncd, though we'd need a different module name since [solr] is taken by the first instance. I guess we could instead have it listen on a different port, which is potentially confusing (like running 2 instances of Tomcat just to host 2 webapps)
+ not use rsyncd and just use rsync directly (via ssh)
+ change rsyncd-start to allow multiple module names: [webapp1], [webapp2], etc. This is OK, but then it's hard to add new webapps/indices
+ start rsyncd so that the [solr] module points to a root folder, e.g., /opt/solr in the above example, and then allow a variable in snappuller, set via the scripts.conf on the slaves, that specifies the path within this single module. Thus we have the first path as /solr/webapp1/data and the second path (in the second webapp) as /solr/webapp2/data
More succinctly, I don't see how to use the scripts to support replication of multiple indices/webapps. This approach offers a way that seems to scale and work well, with one caveat: the data dirs for the various indices must all be under a common root folder (though that could be "/", so it's a minor constraint). So, if not the above, what is the recommended way to replicate multiple indices?
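The last option in the list above could look something like this. It is a sketch under assumptions, not what rsyncd-start currently generates: print_rsyncd_conf and module_path_for are hypothetical helpers, and /opt/solr, webapp1 and webapp2 are the example folders from this comment.

```shell
# Sketch only: print_rsyncd_conf and module_path_for are hypothetical helpers;
# the folder names are example values.

# Emit an rsyncd.conf whose single [solr] module is rooted at a common parent.
print_rsyncd_conf() {
    # $1 = root folder shared by all webapps, e.g. /opt/solr
    cat <<EOF
use chroot = no
list = no
[solr]
path = $1
comment = Solr
EOF
}

# Value each slave would put in its own scripts.conf for a given webapp.
module_path_for() {
    # $1 = webapp folder name under the shared root
    printf 'solr/%s/data\n' "$1"
}

print_rsyncd_conf /opt/solr
for webapp in webapp1 webapp2; do
    echo "rsync_module_path=$(module_path_for "$webapp")"
done
```

Adding a third index would then only need a new folder under the shared root and a new rsync_module_path on its slaves, with no change to the master's rsyncd configuration.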
> snappuller has limitation w/r/t/ handling multiple web apps
> -----------------------------------------------------------
>
> Key: SOLR-524
> URL: https://issues.apache.org/jira/browse/SOLR-524
> Project: Solr
> Issue Type: Improvement
> Components: replication
> Affects Versions: 1.2
> Environment: Linux (CentOS release 5 (Final))
> Java JDK 6
> Reporter: Ezra Epstein
> Priority: Minor
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The snappuller has a limitation which makes it hard to use for replicating the indices for multiple webapps. In particular, by changing:
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip
> to:
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/${rsync_module_path}/${name}/ ${data_dir}/${name}-wip
> and adding an rsync_module_path variable to scripts.conf, plus giving it a default value of "solr" before the 'unset' commands at the top of the snappuller script, I've worked around the issue. Still, it seems better to not hard-code the module name ([solr]) and also to allow some flexibility in the location of the data files under that module. This is req'd for multiple webapps since they won't share a data folder.
[jira] Issue Comment Edited: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps
Posted by "Bill Au (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584150#action_12584150 ]
billa edited comment on SOLR-524 at 4/1/08 7:27 AM:
------------------------------------------------------
In the command line in question:
rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip
The string "solr" IS NOT the webapp name. It is the name used by rsyncd to map to a file system path.
Here is the content of rsyncd.conf, which is generated by rsyncd-start dynamically:
uid = $(whoami)
gid = $(whoami)
use chroot = no
list = no
pid file = ${solr_root}/logs/rsyncd.pid
log file = ${solr_root}/logs/rsyncd.log
[solr]
path = ${data_dir}
comment = Solr
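To make the mapping concrete, here is a small sketch of how a module name in an rsync URL resolves to a directory. resolve_module_path is a hypothetical helper that only imitates the lookup rsyncd performs; the /opt/solr/data value and the snapshot name are assumed examples.

```shell
# Sketch only: resolve_module_path is a hypothetical helper that mimics how
# rsyncd maps "<module>/<rest>" onto "<module path>/<rest>".
resolve_module_path() {
    # $1 = path part of the rsync URL (e.g. solr/snapshot.X/)
    # $2 = module name from rsyncd.conf
    # $3 = that module's "path" setting
    case "$1" in
        "$2"/*) printf '%s/%s\n' "$3" "${1#"$2"/}" ;;
        "$2")   printf '%s\n' "$3" ;;
        *)      echo "unknown module: ${1%%/*}" >&2; return 1 ;;
    esac
}

# Example values (assumed for illustration):
resolve_module_path solr/snapshot.20080401025224/ solr /opt/solr/data
```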
[jira] Resolved: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps
Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man resolved SOLR-524.
---------------------------
Resolution: Duplicate
The scripts are currently designed with the assumption that each index (regardless of whether it is from a separate core or a separate webapp) will be replicated using a distinct instance of rsyncd (with a distinct port number).
There are a few advantages to this approach, notably that it's easy to disable replication for a single index while you do maintenance on the master.
That said, there are plenty of compelling reasons to simplify and/or add alternate mechanisms for replicating multiple indexes, but we already have an issue tracking this (SOLR-433), so I'm going to resolve this as a dup.
Ezra: please take a look at SOLR-433 and the approaches being taken in that issue.
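As a rough picture of the one-rsyncd-per-index design described above, the sketch below lists one daemon and port per index. The index names, ports and paths are invented for illustration, and list_sources is a hypothetical helper rather than anything in the Solr scripts. Because each daemon reads its own rsyncd.conf, all of them can keep the same [solr] module name.

```shell
# Sketch only: index names, ports and paths are invented for illustration.
# Each line: <index name> <rsyncd port> <data dir on the master>
indexes='webapp1 18983 /opt/solr/webapp1/data
webapp2 18984 /opt/solr/webapp2/data'

# Print the source URL for each index. Every index has its own daemon and
# port, so stopping one daemon disables replication for that index only.
list_sources() {
    # $1 = master host
    echo "$indexes" | while read -r index port dir; do
        printf '%s: rsync://%s:%s/solr/ (serves %s)\n' "$index" "$1" "$port" "$dir"
    done
}

list_sources master.example.com
```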
[jira] Commented: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps
Posted by "Bill Au (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584150#action_12584150 ]
Bill Au commented on SOLR-524:
------------------------------
In the command line in question:
rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip
The string "solr" IS NOT the webapp name. It is the name used by rsyncd to map to a file system path.
Here is the content of rsyncd.conf, which is generated by rsyncd-start dynamically:
#### rsyncd.conf file ####
uid = $(whoami)
gid = $(whoami)
use chroot = no
list = no
pid file = ${solr_root}/logs/rsyncd.pid
log file = ${solr_root}/logs/rsyncd.log
[solr]
path = ${data_dir}
comment = Solr