Posted to solr-dev@lucene.apache.org by "Ezra Epstein (JIRA)" <ji...@apache.org> on 2008/04/01 02:52:24 UTC

[jira] Created: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps

snappuller has limitation w/r/t/ handling multiple web apps
-----------------------------------------------------------

                 Key: SOLR-524
                 URL: https://issues.apache.org/jira/browse/SOLR-524
             Project: Solr
          Issue Type: Improvement
          Components: replication
    Affects Versions: 1.2
         Environment: Linux (CentOS release 5 (Final))
Java JDK 6
            Reporter: Ezra Epstein
            Priority: Minor


The snappuller has a limitation which makes it hard to use for replicating the indices for multiple webapps.  In particular, by changing:

# rsync over files that have changed
rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip

to: 

# rsync over files that have changed
rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/${rsync_module_path}/${name}/ ${data_dir}/${name}-wip

and adding an rsync_module_path variable to scripts.conf, plus giving it a default value of "solr" before the 'unset' commands at the top of the snappuller script, I've worked around the issue.  Still, it seems better not to hard-code the module name ([solr]) and also to allow some flexibility in the location of the data files under that module.  This is required for multiple webapps, since they won't share a data folder.
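A minimal sketch of the workaround described above, assuming a POSIX shell. The rsync_module_path variable and its "solr" default come from the description. The helper snappuller_source_url and the host, port and snapshot values are hypothetical and only show how the source URL is composed. The real snappuller script differs in its details.

```shell
#!/bin/sh
# Sketch of the snappuller change described above (not the shipped script).
# Default the module path to "solr" so existing setups keep working;
# a value set in scripts.conf (sourced later in the real script) would override it.
rsync_module_path=${rsync_module_path:-solr}

# Hypothetical helper: prints the rsync source URL that snappuller passes
# to rsync. Arguments: master host, rsyncd port, snapshot name.
snappuller_source_url() {
    printf 'rsync://%s:%s/%s/%s/\n' "$1" "$2" "$rsync_module_path" "$3"
}

# With the default, the URL matches the original hard-coded form.
snappuller_source_url master1 18983 snapshot.20080401025224
```

With rsync_module_path=solr/webapp1/data in a slave's scripts.conf, the same call would produce rsync://master1:18983/solr/webapp1/data/snapshot.20080401025224/ instead.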



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps

Posted by "Ezra Epstein (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584298#action_12584298 ] 

Ezra Epstein commented on SOLR-524:
-----------------------------------

I see that I didn't explain the issue very well.  The situation is that we have multiple indices and hence, in 1.2, multiple webapps.  We also use replication, so we need to pull snapshots of the index/data files for each webapp/index, and the snappuller script has no way to do this.  The rsyncd-start script creates a single rsyncd MODULE (not webapp), named "solr".  The snappuller script always pulls directly from this module's fixed path; there's no way to extend from that module root in the snappuller script.  Thus, if ${data_dir} is /opt/solr/data, snappuller will pull a snapshot from that folder.  But with multiple webapps we'll have:

/opt/solr/webapp1/data
/opt/solr/webapp2/data

and the current solr scripts seem to let us start rsyncd pointing at one folder or the other, but not both.  So we can either:
+ start a new instance of rsyncd, though we'd need a different module name since [solr] is taken by the first instance; alternatively, I guess, it could listen on a different port, which is potentially confusing (like running 2 instances of Tomcat just to host 2 webapps)
+ not use rsyncd and just use rsync directly (via ssh)
+ change rsyncd-start to allow multiple module names: [webapp1], [webapp2], etc.  That works, but then it's hard to add new webapps/indices
+ start rsyncd so that the [solr] module points to a root folder, e.g., /opt/solr in the above example, and then allow a variable in snappuller, set via scripts.conf on the slaves, that specifies the path within this single module.  Thus the first path would be /solr/webapp1/data and the second (for the second webapp) /solr/webapp2/data
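A sketch of the fourth option, assuming the /opt/solr layout above: one [solr] module rooted at the common parent directory, with each slave selecting its sub-path. The config lines mirror the rsyncd.conf that rsyncd-start generates (quoted later in this thread), with the pid/log lines omitted. The module_path_for helper, the host name and the port are hypothetical.

```shell
#!/bin/sh
# One [solr] module for the common root of every webapp's data directory.
# Caveat from the thread: all data dirs must live under this root.
solr_root_dir=/opt/solr
conf_file=$(mktemp)

cat > "$conf_file" <<EOF
uid = $(whoami)
gid = $(whoami)
use chroot = no
list = no
[solr]
    path = ${solr_root_dir}
    comment = Solr
EOF

# Hypothetical per-slave settings: each webapp's scripts.conf would set its
# own path inside the single module, e.g.
#   webapp1: rsync_module_path=solr/webapp1/data
#   webapp2: rsync_module_path=solr/webapp2/data
module_path_for() {
    printf 'solr/%s/data\n' "$1"
}

for app in webapp1 webapp2; do
    echo "$app -> rsync://master1:18983/$(module_path_for "$app")/<snapshot>/"
done
```

Adding a new webapp then only means a new directory under the root and one line in that slave's scripts.conf. The rsyncd configuration stays the same.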

More succinctly, I don't see how to use the scripts to support replication of multiple indices/webapps.  This approach seems to scale and work well, with one caveat: the data dirs for the various indices must all be under a common root folder (though that could be "/", so it's a minor constraint).  So, if not the above, what is the recommended way to replicate multiple indices?


> snappuller has limitation w/r/t/ handling multiple web apps
> -----------------------------------------------------------
>
>                 Key: SOLR-524
>                 URL: https://issues.apache.org/jira/browse/SOLR-524
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 1.2
>         Environment: Linux (CentOS release 5 (Final))
> Java JDK 6
>            Reporter: Ezra Epstein
>            Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The snappuller has a limitation which makes it hard to use for replicating the indices for multiple webapps.  In particular, by changing:
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip
> to: 
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/${rsync_module_path}/${name}/ ${data_dir}/${name}-wip
> and adding an rsync_module_path variable to scripts.conf, plus giving it a default value of "solr" before the 'unset' commands at the top of the snappuller script, I've worked around the issue.  Still, it seems better to not hard-code the module name ([solr]) and also to allow some flexibility in the location of the data files under that module.  This is req'd for multiple webapps since they won't share a data folder.



[jira] Issue Comment Edited: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps

Posted by "Bill Au (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584150#action_12584150 ] 

billa edited comment on SOLR-524 at 4/1/08 7:27 AM:
------------------------------------------------------

In the command line in question:

rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip

The string "solr" IS NOT the webapp name.  It is the name used by rsyncd to map to a file system path.

Here is the content of rsyncd.conf, which is generated by rsyncd-start dynamically:

uid = $(whoami)
gid = $(whoami)
use chroot = no
list = no
pid file = ${solr_root}/logs/rsyncd.pid
log file = ${solr_root}/logs/rsyncd.log
[solr]
    path = ${data_dir}
    comment = Solr
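To illustrate the point above, here is a simplified sketch of how an rsync:// path resolves against this file. The first path component ("solr") selects the [solr] module. The remainder is taken relative to that module's path, which is ${data_dir}. The resolve_module_path helper is hypothetical. It is not rsyncd's actual parser and ignores global settings, comments and include files.

```shell
#!/bin/sh
# Simplified illustration (not rsyncd's actual parser): the first component
# of the URL path names a module in rsyncd.conf, and the remainder is
# resolved relative to that module's "path" setting.
resolve_module_path() {
    conf=$1        # rsyncd.conf to read
    url_path=$2    # path after host:port, e.g. solr/snapshot.20080401025224
    module=${url_path%%/*}
    rest=${url_path#"$module"}
    base=$(awk -v m="[$module]" '
        /^\[/ { in_mod = ($0 == m); next }
        in_mod && $1 == "path" { sub(/^[^=]*=[ \t]*/, ""); print; exit }
    ' "$conf")
    [ -n "$base" ] || return 1
    printf '%s%s\n' "$base" "$rest"
}

# Example with data_dir=/opt/solr/data substituted into the generated file.
demo_conf=$(mktemp)
printf '[solr]\n    path = /opt/solr/data\n    comment = Solr\n' > "$demo_conf"
resolve_module_path "$demo_conf" solr/snapshot.20080401025224
```

Because the generated file defines only one module with one path, a second webapp's data directory outside ${data_dir} cannot be reached through it. That is the limitation raised in the original report.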



      was (Author: billa):
    In the command line in question:

rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip

The string "solr" IS NOT the webapp name.  It is the name used by rsyncd to map to a file system path.

Here is the content of rsyncd.conf, which is generated by rsyncd-start dynamically:
#### rsyncd.conf file ####

uid = $(whoami)
gid = $(whoami)
use chroot = no
list = no
pid file = ${solr_root}/logs/rsyncd.pid
log file = ${solr_root}/logs/rsyncd.log
[solr]
    path = ${data_dir}
    comment = Solr


  


[jira] Resolved: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-524.
---------------------------

    Resolution: Duplicate

The scripts are currently designed with the assumption that each index (regardless of whether it is from a separate core or a separate webapp) is replicated using a distinct instance of rsyncd (with a distinct port number).

There are a few advantages to this approach, notably that it's easy to disable replication for a single index while you do maintenance on the master.
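A sketch of the per-index setup described above. It assumes each index's solr home has its own scripts.conf (a shell file the scripts source) and that rsyncd_port and data_dir are the settings that differ between indexes. The index_scripts_conf helper, the base port and the /opt/solr paths are hypothetical example values.

```shell
#!/bin/sh
# Hypothetical layout: each index has its own data directory and its own
# rsyncd instance; only the port differs, so every instance can keep the
# default [solr] module name.
base_port=${base_port:-18983}

# Prints the scripts.conf settings for one index.
# Arguments: index name, offset added to base_port.
index_scripts_conf() {
    printf 'rsyncd_port=%s\ndata_dir=/opt/solr/%s/data\n' \
        "$((base_port + $2))" "$1"
}

offset=0
for index in webapp1 webapp2; do
    echo "# scripts.conf for $index"
    index_scripts_conf "$index" "$offset"
    offset=$((offset + 1))
done
```

Under these assumptions a slave for webapp2 would pull from port 18984. Stopping only that rsyncd instance would pause replication of webapp2 while webapp1 keeps replicating, which is the maintenance advantage mentioned above.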

That said, there are plenty of compelling reasons to simplify and/or add alternate mechanisms for replicating multiple indexes, but we already have an issue tracking this (SOLR-433), so I'm going to resolve this as a dup.

Ezra: please take a look at SOLR-433 and the approaches being taken in that issue.



[jira] Commented: (SOLR-524) snappuller has limitation w/r/t/ handling multiple web apps

Posted by "Bill Au (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584150#action_12584150 ] 

Bill Au commented on SOLR-524:
------------------------------

In the command line in question:

rsync -Wa${verbose}${compress} --delete ${sizeonly} \
${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip

The string "solr" IS NOT the webapp name.  It is the name used by rsyncd to map to a file system path.

Here is the content of rsyncd.conf, which is generated by rsyncd-start dynamically:
#### rsyncd.conf file ####

uid = $(whoami)
gid = $(whoami)
use chroot = no
list = no
pid file = ${solr_root}/logs/rsyncd.pid
log file = ${solr_root}/logs/rsyncd.log
[solr]
    path = ${data_dir}
    comment = Solr


