Posted to solr-dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2010/01/16 00:14:54 UTC

[jira] Created: (SOLR-1724) Real Basic Core Management with Zookeeper

Real Basic Core Management with Zookeeper
-----------------------------------------

                 Key: SOLR-1724
                 URL: https://issues.apache.org/jira/browse/SOLR-1724
             Project: Solr
          Issue Type: New Feature
          Components: multicore
    Affects Versions: 1.4
            Reporter: Jason Rutherglen
             Fix For: 1.5


Though we're implementing cloud, I need something real soon I can
play with and deploy. So this'll be a patch that only deploys
new cores, and that's about it. The arch is real simple:

On Zookeeper there'll be a directory that contains files that
represent the state of the cores of a given set of servers which
will look like the following:

/production/cores-1.txt
/production/cores-2.txt
/production/core-host-1-actual.txt (ephemeral node per host)

Where each cores-N.txt file contains:

hostname,corename,instanceDir,coredownloadpath

coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, etc

and

each core-host-N-actual.txt file contains:

hostname,corename,instanceDir,size

Every time a new cores-N.txt file is added, the listening host
finds its entry in the list and begins the process of trying to
match the entries. Upon completion, it updates its
/core-host-1-actual.txt file to its completed state or logs an error.
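
The matching step above could be sketched roughly as follows. This is a
minimal Python illustration, not the patch's Java code. The names
CoreEntry, parse_entries and cores_to_install are hypothetical, and
matching on (corename, instanceDir) is an assumption about what "match
the entries" means.

```python
from typing import List, NamedTuple


class CoreEntry(NamedTuple):
    hostname: str
    corename: str
    instance_dir: str
    fourth: str  # coredownloadpath in a cores file, size in an actual file


def parse_entries(text: str) -> List[CoreEntry]:
    """Parse lines of the form hostname,corename,instanceDir,<fourth field>."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # maxsplit=3 keeps any commas inside the fourth field (e.g. a URL) intact
        host, core, instance_dir, fourth = line.split(",", 3)
        entries.append(CoreEntry(host, core, instance_dir, fourth))
    return entries


def cores_to_install(desired: List[CoreEntry],
                     actual: List[CoreEntry],
                     host: str) -> List[CoreEntry]:
    """Return this host's desired entries that are missing from its actual list."""
    have = {(e.corename, e.instance_dir) for e in actual if e.hostname == host}
    return [e for e in desired
            if e.hostname == host and (e.corename, e.instance_dir) not in have]


# Example data is made up for illustration.
desired = parse_entries(
    "host1,core-a,/solr/core-a,hdfs://namenode/cores/core-a\n"
    "host1,core-b,/solr/core-b,hdfs://namenode/cores/core-b\n"
    "host2,core-c,/solr/core-c,file:///tmp/core-c\n"
)
actual = parse_entries("host1,core-a,/solr/core-a,1024\n")
todo = cores_to_install(desired, actual, "host1")
```

Here todo holds only the core-b entry for host1, which is the core the
host would still need to download.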

When all host actual files are written (without errors), then a
new core-1-actual.txt file is written which can be picked up by
another process that can create a new core proxy.
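
The completion check described above might look something like this
sketch (Python, for illustration only). The function names and the way
errors are represented are assumptions, not taken from the patch.

```python
from typing import Dict, Iterable, Optional


def all_hosts_done(expected_hosts: Iterable[str],
                   errors_by_host: Dict[str, Optional[str]]) -> bool:
    """True only when every expected host has reported, and none with an error.

    errors_by_host maps a hostname to None (actual file written cleanly)
    or to an error message; a host that has not reported yet is absent.
    """
    return all(h in errors_by_host and errors_by_host[h] is None
               for h in expected_hosts)


def merge_actual_files(actual_by_host: Dict[str, str]) -> str:
    """Concatenate the per-host actual files into one cluster-wide listing."""
    lines = []
    for host in sorted(actual_by_host):
        lines.extend(l for l in actual_by_host[host].splitlines() if l.strip())
    return "\n".join(lines) + "\n"


# Example data is made up for illustration.
hosts = ["host1", "host2"]
merged = merge_actual_files({
    "host2": "host2,core-c,/solr/core-c,2048\n",
    "host1": "host1,core-a,/solr/core-a,1024\n",
})
```

Only once all_hosts_done returns True would the merged listing be
written out for the proxy-creating process to pick up.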

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

* No-commit

* NodeCoresManagerTest.testInstallCores works

* There are HDFS test cases using MiniDFSCluster



>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

No-commit

NodeCoresManager[Test] needs more work

A CoreController matchHosts unit test was added to CoreControllerTest

>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Added a way to keep a given number of host or cores files in ZK; once that number is exceeded, the oldest are deleted.
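
A rough sketch of that retention rule, in Python for illustration. It
assumes the cores-N.txt naming from the issue description and treats
the lowest-numbered files as the oldest; split_by_retention is a
hypothetical name, and the actual ZK delete calls are left out.

```python
import re
from typing import List, Tuple


def split_by_retention(names: List[str], keep: int) -> Tuple[List[str], List[str]]:
    """Split names like 'cores-12.txt' into (kept, to_delete).

    The `keep` highest-numbered files are kept; names that do not
    match the pattern are ignored entirely.
    """
    pattern = re.compile(r"^cores-(\d+)\.txt$")
    numbered = []
    for name in names:
        m = pattern.match(name)
        if m:
            numbered.append((int(m.group(1)), name))
    numbered.sort()  # ascending by number, so the oldest come first
    ordered = [name for _, name in numbered]
    if keep <= 0:
        return [], ordered
    return ordered[-keep:], ordered[:-keep]


# Example names are made up for illustration.
kept, to_delete = split_by_retention(
    ["cores-10.txt", "cores-2.txt", "cores-1.txt", "core-host-1-actual.txt"],
    keep=2,
)
```

Sorting on the parsed number rather than the raw name keeps
cores-10.txt newer than cores-2.txt.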

>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837418#action_12837418 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

We need a test case that covers a partial install and the cleanup of any extraneous files afterwards.

>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804655#action_12804655 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Need to have a command line tool that dumps the state of the
existing cluster from ZK out to a JSON file for a particular
version.

For my setup I'll have a program that'll look at this cluster
state file and generate an input file that'll be written to ZK,
which essentially instructs the Solr nodes to match the new
cluster state. This allows me to easily write my own
functionality that operates on the cluster that's external to
deploying new software into Solr. 
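
As a rough idea of what such a dump could produce, here is a Python
sketch. The JSON layout (a "version" number plus a "hosts" map) and the
name dump_cluster_state are hypothetical; reading the per-host lines
out of ZK is omitted, and the input uses the
hostname,corename,instanceDir,size form from the issue description.

```python
import json
from typing import Dict, List


def dump_cluster_state(version: int, actual_by_host: Dict[str, List[str]]) -> str:
    """Render one version of the cluster state as JSON text."""
    state = {"version": version, "hosts": {}}
    for host, lines in actual_by_host.items():
        cores = []
        for line in lines:
            # The leading hostname field repeats the map key, so it is dropped.
            _hostname, corename, instance_dir, size = line.split(",", 3)
            cores.append({
                "corename": corename,
                "instanceDir": instance_dir,
                "size": int(size),
            })
        state["hosts"][host] = cores
    return json.dumps(state, indent=2, sort_keys=True)


# Example data is made up for illustration.
snapshot = dump_cluster_state(3, {"host1": ["host1,core-a,/solr/core-a,1024"]})
```

An external program could load this text with json.loads, decide on a
new layout, and write its own input file back to ZK.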

>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: commons-lang-2.4.jar

commons-lang-2.4.jar is required

>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835963#action_12835963 ] 

Ted Dunning commented on SOLR-1724:
-----------------------------------


Will this HTTP access also allow a cluster with incrementally updated cores to replicate a core after a node failure?

>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Here's an update: we're on to the actual Solr node portion of the code, and some tests around that.  I'm focusing on downloading cores out of HDFS because that's my use case.

>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801216#action_12801216 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Ted,

Thanks for the Katta link. 

This patch will likely de-emphasize the distributed search part,
which is where the ephemeral node is used (i.e. a given server
lists its current state). I basically want to take care of this
one little deployment aspect of cores, improving on the wacky
hackedy system I'm running today. Then, IF it works, I'll
look at the distributed search part, hopefully in a totally
separate patch.





[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839937#action_12839937 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I'm starting work on the cores file upload.  The cores file is in JSON format, and can be assembled by an entirely different process (i.e. the core assignment creation is decoupled from core deployment).  

I need to figure out how Solr HTML HTTP file uploading works... There's probably an example somewhere.

>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804760#action_12804760 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

The ZK port changed in ZkTestServer

>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835965#action_12835965 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

For the above core moving, utilizing the existing Java replication will probably be suitable.  However, in all cases we need to copy the contents of all files related to the core (meaning everything under conf and data).  How does one accomplish this?

>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801051#action_12801051 ] 

Ted Dunning commented on SOLR-1724:
-----------------------------------


Katta had some interesting issues in the design of this.

These are discussed here: http://oss.101tec.com/jira/browse/KATTA-43

The basic design consideration is that failure of a node needs to automagically update the ZK state accordingly.  This allows all important updates to files to go one direction as well.




[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Removing cores seems to work well; on to modified cores... I'm checkpointing progress so that, in case things break, I can easily roll back.
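
For illustration only, here is one way the added/removed/modified distinction could be computed on a host. This is a sketch under assumptions, not the code in the patch: CoreDiff is a hypothetical name, and each core is reduced to a name plus a version marker (for example its download URL).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares the desired cores with the installed cores.
// Keys are core names; values are any version marker, e.g. the download URL.
class CoreDiff {
    final Set<String> toAdd = new TreeSet<>();
    final Set<String> toRemove = new TreeSet<>();
    final Set<String> toModify = new TreeSet<>();

    CoreDiff(Map<String, String> desired, Map<String, String> actual) {
        for (Map.Entry<String, String> e : desired.entrySet()) {
            String installed = actual.get(e.getKey());
            if (installed == null) {
                toAdd.add(e.getKey());        // wanted, but not present locally
            } else if (!installed.equals(e.getValue())) {
                toModify.add(e.getKey());     // present, but with a different marker
            }
        }
        for (String name : actual.keySet()) {
            if (!desired.containsKey(name)) {
                toRemove.add(name);           // present, but no longer wanted
            }
        }
    }
}
```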

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804750#action_12804750 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I did an svn update, and now I'm seeing the following error:

java.util.concurrent.TimeoutException: Could not connect to ZooKeeper within 5000 ms
	at org.apache.solr.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:131)
	at org.apache.solr.cloud.SolrZkClient.<init>(SolrZkClient.java:106)
	at org.apache.solr.cloud.SolrZkClient.<init>(SolrZkClient.java:72)
	at org.apache.solr.cloud.CoreControllerTest.testCores(CoreControllerTest.java:48)

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch
>
>


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Zipping from a Lucene directory works and has a test case.

A ReplicationHandler is added by default under a unique name. If one exists already, we still create our own, for the express purpose of locking an index commit point, zipping it, then uploading it to, for example, HDFS.  This part will likely be written next.
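
As a rough sketch of the zipping step only, using nothing beyond java.util.zip and java.nio.file. IndexZipper is a hypothetical name and this is not the code in the patch. It assumes the index directory is flat, and it does not lock a commit point or upload anything.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not the code in the patch.
class IndexZipper {
    // Zips the regular files directly inside indexDir (assumed to be flat)
    // and returns the entry names in the order they were written.
    static List<String> zipDirectory(Path indexDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(indexDir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);   // stream the file's bytes into the current entry
                zip.closeEntry();
            }
        }
        return files.stream().map(p -> p.getFileName().toString()).collect(Collectors.toList());
    }
}
```

In a real deployment the file list would come from the locked commit point rather than from a directory listing, so that files written by a concurrent commit are not picked up.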

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836018#action_12836018 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Some further notes... I can reuse the replication code, but am going to place the functionality into the core admin handler because it needs to work across cores without having to be configured in each core's solrconfig.

Also, we need to somehow support merging cores... Is that available yet?  Looks like merge indexes is only for directories?

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836022#action_12836022 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

We need a URL-type parameter to define whether a URL in a core info entry points to a zip file or to a Solr server download point.
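
One possible shape for this, sketched under assumptions: the comma-separated core entry from the issue description gains a hypothetical fifth field naming the URL type. UrlType, CoreInfo and the field order are all made up for illustration; the patch may well carry this in JSON instead.

```java
import java.util.Locale;

// All names here are hypothetical; the patch may model this differently.
enum UrlType { ZIP, SOLR }

class CoreInfo {
    final String hostname;
    final String coreName;
    final String instanceDir;
    final String downloadUrl;
    final UrlType urlType;

    CoreInfo(String hostname, String coreName, String instanceDir,
             String downloadUrl, UrlType urlType) {
        this.hostname = hostname;
        this.coreName = coreName;
        this.instanceDir = instanceDir;
        this.downloadUrl = downloadUrl;
        this.urlType = urlType;
    }

    // Parses "hostname,corename,instanceDir,coredownloadpath,urltype".
    // Limitation of this sketch: a URL containing a comma is not supported.
    static CoreInfo parse(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + line);
        }
        UrlType type = UrlType.valueOf(f[4].trim().toUpperCase(Locale.ROOT));
        return new CoreInfo(f[0].trim(), f[1].trim(), f[2].trim(), f[3].trim(), type);
    }
}
```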

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803371#action_12803371 ] 

Ted Dunning commented on SOLR-1724:
-----------------------------------

{quote}
... I agree, I'm not really into ephemeral
ZK nodes for Solr hosts/nodes. The reason is contact with ZK is
highly superficial and can be intermittent. 
{quote}
I have found that when I was having trouble with ZK connectivity, the problems were simply surfacing issues that I had anyway.  You do have to configure the ZK client to not have long pauses (that is incompatible with SOLR how?) and you may need to adjust the timeouts on the ZK side.  More importantly, any issues with ZK connectivity will have their parallels with any other heartbeat mechanism, and replicating a heartbeat system that tries to match ZK for reliability is going to be a significant source of very nasty bugs.  Better not to rewrite what already works.  Keep in mind that ZK *connection* issues are not the same as session expiration.  Katta has a fairly important set of bugfixes now to make that distinction and ZK will soon handle connection loss on its own.

It isn't a bad idea to keep shards around for a while if a node goes down.  That can seriously decrease the cost of momentary outages such as for a software upgrade.  The idea is that when the node comes back, it can advertise availability of some shards and replication of those shards should cease.
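
A small sketch of the grace-period idea in the paragraph above, assuming the caller supplies timestamps in milliseconds. ShardGracePeriod and its methods are hypothetical names, not Katta or Solr code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker; times are plain millisecond values supplied by the caller.
class ShardGracePeriod {
    private final long graceMillis;
    private final Map<String, Long> downSince = new HashMap<>();

    ShardGracePeriod(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Record the first moment the node was seen as down.
    void nodeDown(String node, long nowMillis) {
        downSince.putIfAbsent(node, nowMillis);
    }

    // The node is back and advertises its shards again: stop planning replication.
    void nodeUp(String node) {
        downSince.remove(node);
    }

    // True only once the node has been gone for longer than the grace period.
    boolean shouldReplicateElsewhere(String node, long nowMillis) {
        Long since = downSince.get(node);
        return since != null && nowMillis - since > graceMillis;
    }
}
```

With a grace period longer than a typical restart, a software upgrade would not trigger any shard copying at all.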



> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: SOLR-1724.patch
>
>


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: gson-1.4.jar
                hadoop-0.20.2-dev-test.jar
                hadoop-0.20.2-dev-core.jar

Hadoop and Gson dependencies

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836013#action_12836013 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I think the check on whether a conf file's been modified, to reload the core, can borrow from the replication handler and check the diff based on the checksum of the files... Though this somewhat complicates the storage of the checksum and the resultant JSON file.
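
A minimal sketch of such a check, assuming the previously stored checksums are available as a map from file name to checksum. ConfChecksums is a hypothetical name. Adler32 is used simply because it ships with java.util.zip; that it matches what the replication handler does is an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.Adler32;

// Hypothetical helper for deciding whether a core needs a reload.
class ConfChecksums {
    static long checksum(byte[] content) {
        Adler32 adler = new Adler32();
        adler.update(content, 0, content.length);
        return adler.getValue();
    }

    // Returns the names of files that are new or whose checksum differs from
    // the stored one. Files that were deleted are not reported by this sketch.
    static Set<String> modified(Map<String, Long> stored, Map<String, byte[]> current) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, byte[]> e : current.entrySet()) {
            Long old = stored.get(e.getKey());
            if (old == null || old != checksum(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```

A non-empty result would be the signal to reload the core and to write the new checksums back alongside the core's entry.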

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838926#action_12838926 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

In thinking about this some more, in order for the functionality
provided in this issue to be more useful, there could be a web
based UI to easily view the master cores table. There can
additionally be an easy way to upload the new cores version into
Zookeeper. I'm not sure if the uploading should be web based or
command line, I'm figuring web based, simply because this is
more in line with the rest of Solr. 

As a core is installed or is in the midst of some other process
(such as backing itself up), the node/NodeCoresManager can
report the ongoing status to Zookeeper. For large cores (e.g. 20
GB) it's important to see how they're doing, and if they're
taking too long, begin some remedial action. The UI can display
the statuses. 
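
To make the status idea concrete, here is a sketch of a status record a node could keep and periodically serialize to Zookeeper. CoreInstallStatus and its fields are hypothetical, and the "taking too long" rule shown (no progress for a configurable idle time) is only one possible definition.

```java
// Hypothetical status record; serialization to Zookeeper is left out.
class CoreInstallStatus {
    final String coreName;
    final long totalBytes;
    long bytesDone;
    long lastProgressMillis;

    CoreInstallStatus(String coreName, long totalBytes, long startMillis) {
        this.coreName = coreName;
        this.totalBytes = totalBytes;
        this.lastProgressMillis = startMillis;
    }

    // Record progress; the timestamp only moves when bytes actually advance.
    void progress(long bytesDone, long nowMillis) {
        if (bytesDone > this.bytesDone) {
            this.bytesDone = bytesDone;
            this.lastProgressMillis = nowMillis;
        }
    }

    int percentDone() {
        return totalBytes == 0 ? 100 : (int) (bytesDone * 100 / totalBytes);
    }

    // "Taking too long" here means: unfinished and idle for more than maxIdleMillis.
    boolean isStalled(long nowMillis, long maxIdleMillis) {
        return bytesDone < totalBytes && nowMillis - lastProgressMillis > maxIdleMillis;
    }
}
```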


> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837898#action_12837898 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I'm not sure how we'll handle (or if we even need to) installing
a new core over an existing core of the same name, in other
words core replacement. I think the instanceDir would need to be
different, which means we'll need to detect and fail on the case
of a new cores version (aka desired state) trying to install
itself into an existing core's instanceDir. Otherwise this
potential error case is costly in production. 

It makes me wonder about the shard id in Solr Cloud and how that
can be used to uniquely identify an installed core, if a core of
a given name is not guaranteed to be the same across Solr
servers.
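
The detect-and-fail case for an occupied instanceDir could look roughly like the following. This is a sketch with hypothetical names (InstanceDirCheck, checkInstall); it compares instanceDir strings exactly and does no path normalization.

```java
import java.util.Map;

// Hypothetical pre-install check, run before any download starts.
class InstanceDirCheck {
    // installed maps instanceDir -> name of the core already occupying it.
    // Throws if the new core would be installed into an occupied instanceDir.
    static void checkInstall(Map<String, String> installed, String coreName, String instanceDir) {
        String occupant = installed.get(instanceDir);
        if (occupant != null) {
            throw new IllegalStateException("instanceDir " + instanceDir
                    + " already holds core " + occupant
                    + "; refusing to install " + coreName);
        }
    }
}
```

Failing before the download begins keeps the costly part (fetching a large core) from happening for a deployment that cannot succeed.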

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800994#action_12800994 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Additionally, upon successful completion of a core-version deployment to a set of nodes, something like a customizable deletion policy will, by default, clean up the old cores on the system.
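
A sketch of what a pluggable policy might look like, assuming core versions are plain integers. CoreDeletionPolicy and KeepLastN are hypothetical names, loosely inspired by the idea of a deletion policy rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; implementations decide which old versions go away.
interface CoreDeletionPolicy {
    // Given installed versions sorted oldest first, return the ones to delete.
    List<Integer> versionsToDelete(List<Integer> installedVersions);
}

// Example policy: keep only the newest `keep` versions.
class KeepLastN implements CoreDeletionPolicy {
    private final int keep;

    KeepLastN(int keep) {
        this.keep = keep;
    }

    @Override
    public List<Integer> versionsToDelete(List<Integer> installedVersions) {
        int cut = Math.max(0, installedVersions.size() - keep);
        return new ArrayList<>(installedVersions.subList(0, cut));
    }
}
```

A default of KeepLastN(1) would match "clean up the old cores", while a larger value would leave room for a quick rollback.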

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834539#action_12834539 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

There's a wiki for this issue where the general specification is defined: 

http://wiki.apache.org/solr/DeploymentofSolrCoreswithZookeeper

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Backing a core up works, at least according to the test case... I will probably begin to test this patch in a staging environment next, where Zookeeper is run in its own process and a real HDFS cluster is used.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839159#action_12839159 ] 

Lance Norskog commented on SOLR-1724:
-------------------------------------

Are you using any of these? 

An Eclipse plug-in: 
[http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper] 

A Django (Python web toolkit) app: 
http://github.com/phunt/zookeeper_dashboard 

A Swing UI: 
[http://issues.apache.org/jira/browse/ZOOKEEPER-418] 

All seem to have recent activity. Maybe one of them could become a custom monitor. 

If you want to monitor a horde of machines & apps via JMX, Hyperic might be the right tool: 
[http://support.hyperic.com/display/DOC/JMX+Plugin] 
[http://support.hyperic.com/display/DOC/JMX+Plugin+Tutorial] 

When I tried Hyperic out a couple of years ago I was really impressed.


> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801215#action_12801215 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Note to self: I need a way to upload an empty core/confdir from the command line, basically into ZK, and then reference that core from ZK (I think this'll work?).  I'd rather not rely on a separate http server or something... The size of a jarred-up Solr conf dir shouldn't be too much for ZK?
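
A rough way to sanity-check the size question is sketched below. It packs a conf dir into an in-memory zip (a jar is a zip archive) and compares the result against ZooKeeper's default per-znode data limit of roughly 1 MB (the jute.maxbuffer setting). The function names and the constant are made up for illustration and are not part of the patch; nothing here talks to a real ZooKeeper.

```python
import io
import os
import zipfile

# Assumption: roughly ZooKeeper's default per-znode data limit (jute.maxbuffer).
ZNODE_LIMIT_BYTES = 1024 * 1024


def pack_conf_dir(conf_dir):
    """Zip every file under conf_dir into memory and return the archive bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(conf_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to conf_dir so the archive is relocatable.
                zf.write(full, os.path.relpath(full, conf_dir))
    return buf.getvalue()


def fits_in_znode(payload, limit=ZNODE_LIMIT_BYTES):
    """True if the packed payload is small enough to store in a single znode."""
    return len(payload) < limit
```

If the packed conf dir does not fit, the fallback would be one of the coredownloadpath URLs from the issue description rather than ZK itself.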

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835981#action_12835981 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

{quote}Will this http access also allow a cluster with
incrementally updated cores to replicate a core after a node
failure? {quote}

You're talking about moving an existing core into HDFS? That's a
great idea... I'll add it to the list!

Maybe for general "actions" to the system, there can be a ZK
directory acting as a queue that contains actions to be
performed by the cluster. When an action is completed, its
corresponding action file is deleted. 
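
The queue idea can be sketched roughly as follows. This is only an illustration: a dict stands in for the ZK directory, the zero-padded names imitate sequential znodes, and ActionQueue, enqueue and process_next are hypothetical names, not anything in the patch.

```python
class ActionQueue:
    """In-memory stand-in for a ZK directory used as a queue of actions."""

    def __init__(self):
        self.entries = {}      # action file name -> action payload
        self._next_seq = 0     # imitates ZK's sequential node counter

    def enqueue(self, action):
        """Add an action file with the next sequence number and return its name."""
        name = f"action-{self._next_seq:010d}"
        self._next_seq += 1
        self.entries[name] = action
        return name

    def process_next(self, perform):
        """Run the oldest pending action; delete its file only once it completes."""
        if not self.entries:
            return None
        name = min(self.entries)       # zero-padded names sort in creation order
        perform(self.entries[name])    # if this raises, the file stays queued
        del self.entries[name]
        return name
```

Deleting the file only after the action succeeds means a failed action stays visible in the directory, which may or may not be the behavior we want for retries.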

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Fixed the unit tests that were failing due to the switch over to using CoreContainer's initZooKeeper method.  ZkNodeCoresManager is instantiated in CoreContainer.  

There's the beginning of a UI in zkcores.jsp.

I think we still need a core move test.  I'm thinking of adding backing up a core as an action that may be performed in a new cores version file.  

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839520#action_12839520 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I've started on having the nodes report their status to separate files that are ephemeral nodes. There's no sense in keeping them around if the node isn't up, and the status is legitimately ephemeral.  In this case, the status will be something like "Core download 45% (7 GB of 15 GB)".  
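
A status string of that shape could be produced along these lines. This is a minimal sketch: format_download_status is a hypothetical helper, and the rounding (whole percent and whole gigabytes, both rounded down) is a guess rather than what the patch does.

```python
GB = 1024 ** 3


def format_download_status(done_bytes, total_bytes):
    """Build a human-readable progress string for a node's ephemeral status file."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    percent = done_bytes * 100 // total_bytes   # whole percent, rounded down
    return (f"Core download {percent}% "
            f"({done_bytes // GB} GB of {total_bytes // GB} GB)")
```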

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801255#action_12801255 ] 

Yonik Seeley commented on SOLR-1724:
------------------------------------

{quote}
These are discussed here: http://oss.101tec.com/jira/browse/KATTA-43

The basic design consideration is that failure of a node needs to automagically update the ZK state accordingly. This allows all important updates to files to go one direction as well.
{quote}

We actually started out that way... (when a node went down there wasn't really any trace it ever existed) but have been moving away from it.
ZK may not just be a reflection of the cluster but may also control certain aspects of the cluster that you want persistent.  For example, marking a node as "disabled" (i.e. don't use it).  One could create APIs on the node to enable and disable and have that reflected in ZK, but it seems like more work than simply saying "change this znode".
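
The difference between the two kinds of znode can be illustrated with a toy model. Everything below is hypothetical (FakeZk, the paths, the session ids) and only imitates the ZooKeeper behavior where ephemeral nodes vanish when their session ends while persistent ones, such as a "disabled" marker, stay put.

```python
class FakeZk:
    """Toy stand-in for ZooKeeper that tracks which session owns each node."""

    def __init__(self):
        self.nodes = {}   # path -> (data, owning session id, or None if persistent)

    def create(self, path, data, session=None):
        """Create a persistent node, or an ephemeral one if a session is given."""
        self.nodes[path] = (data, session)

    def close_session(self, session):
        """Drop every ephemeral node owned by the session; keep persistent nodes."""
        self.nodes = {
            path: (data, owner)
            for path, (data, owner) in self.nodes.items()
            if owner is None or owner != session
        }
```

With this model, a node that goes down loses its liveness entry but an operator's "disabled" flag for it survives, which is the persistence argument made above.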

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836896#action_12836896 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I'm taking the approach of simply reusing SnapPuller and a replication handler for each core... This'll be faster to implement and more reliable for the first release (i.e., I won't run into little wacky bugs because I'll be reusing code that's well tested).  

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804590#action_12804590 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

{quote}If you know you're not going to store file data at nodes
that have children (the only way that downloading to a real file
system makes sense), you could just call getChildren - if there
are children, it's a dir, otherwise it's a file. Doesn't work for
empty dirs, but you could also just do getData, and if it
returns null, treat it as a dir, else treat it as a file.{quote}

Thanks Mark... 
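
The heuristic in the quote can be sketched against an in-memory tree. This is only an illustration: the dict maps znode paths to data (None meaning no data), get_children and is_dir are made-up helpers rather than ZooKeeper client calls, and whether a real client returns null for a given node depends on how that node was created.

```python
def get_children(tree, path):
    """Return the names of the direct children of path in the fake tree."""
    prefix = path.rstrip("/") + "/"
    return sorted(
        p[len(prefix):]
        for p in tree
        if p.startswith(prefix)
        and len(p) > len(prefix)          # skip the node itself (root case)
        and "/" not in p[len(prefix):]    # direct children only
    )


def is_dir(tree, path):
    """Nodes with children are dirs; childless nodes are dirs only if they hold no data."""
    if get_children(tree, path):
        return True
    return tree[path] is None
```

The second check is what covers empty dirs, at the cost of assuming that real files always carry some data.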

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835819#action_12835819 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

We need a test case for deleted and modified cores.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835490#action_12835490 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I need to figure out how to integrate this with the Solr Cloud distributed search stuff... Hmm... Maybe I'll start with the Solr Cloud test cases?

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804762#action_12804762 ] 

Mark Miller commented on SOLR-1724:
-----------------------------------

bq. The ZK port changed in ZkTestServe

Yeah - too easy to bump against a local ZooKeeper server with the default port, so I've switched it up.
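
For what it's worth, a different way to sidestep the clash with a local server on ZooKeeper's default client port (2181) is to let the OS hand out a free port for each test run. This is not what the test server does; it is a generic sketch, and pick_free_port is a made-up name.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

There is a small window between releasing the port here and the test server binding it, so another process could still grab it in between.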

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801447#action_12801447 ] 

Yonik Seeley commented on SOLR-1724:
------------------------------------

bq. A somewhat secondary issue is whether the cluster master has to be involved in every query.

Yeah, that's never been part of the plans AFAIK.  In fact, in this first simple/short iteration, we have no master at all (or if there is one that can direct anything, it will be customer code).

bq. After trying several options in production, what I find is best is that the master lay down a statement of desired state and the nodes publish their status in a different and ephemeral fashion.

Right - this is captured on the solr-cloud wiki with the ideas of "model" and "state".  So far we're only dealing with state (reflecting what the current cluster looks like), and the details of the "model" type stuff (what state the nodes should strive for) haven't been spelled out yet.
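
Since the "model" side isn't spelled out, the following is purely a guess at the shape of the idea: compare the desired set of (hostname, corename) pairs with what the nodes currently report, and derive the work to do. The function name and the action labels are hypothetical.

```python
def plan_actions(model, state):
    """Diff the desired cores (model) against the reported cores (state).

    Both arguments are sets of (hostname, corename) tuples.
    """
    to_deploy = sorted(model - state)   # wanted but not present yet
    to_remove = sorted(state - model)   # present but no longer wanted
    return ([("deploy", core) for core in to_deploy]
            + [("remove", core) for core in to_remove])
```

When model and state agree the plan is empty, which is the condition the nodes would be striving for.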

Of course, this has hijacked Jason's issue... sorry!

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>
> Though we're implementing cloud, I need something real soon I can
> play with and deploy. So this'll be a patch that only deploys
> new cores, and that's about it. The arch is real simple:
> On Zookeeper there'll be a directory that contains files that
> represent the state of the cores of a given set of servers which
> will look like the following:
> /production/cores-1.txt
> /production/cores-2.txt
> /production/core-host-1-actual.txt (ephemeral node per host)
> Where each core-N.txt file contains:
> hostname,corename,instanceDir,coredownloadpath
> coredownloadpath is a URL such as file://, http://, hftp://, hdfs://, ftp://, etc
> and
> core-host-actual.txt contains:
> hostname,corename,instanceDir,size
> Every time a new core-N.txt file is added, the listening host
> finds its entry in the list and begins the process of trying to
> match the entries. Upon completion, it updates its
> /core-host-1-actual.txt file to its completed state or logs an error.
> When all host actual files are written (without errors), then a
> new core-1-actual.txt file is written which can be picked up by
> another process that can create a new core proxy.
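
The core-N.txt line format quoted above (hostname,corename,instanceDir,coredownloadpath) could be read roughly as follows. This is only a sketch, not code from the patch: the class name CoreEntry and both method names are made up for illustration, and it assumes that no field contains a comma (a download URL with a comma in it would need a different separator or escaping).

```java
import java.util.ArrayList;
import java.util.List;

/** One line of a core-N.txt file: hostname,corename,instanceDir,coredownloadpath. */
final class CoreEntry {
    final String hostname;
    final String coreName;
    final String instanceDir;
    final String downloadPath;

    CoreEntry(String hostname, String coreName, String instanceDir, String downloadPath) {
        this.hostname = hostname;
        this.coreName = coreName;
        this.instanceDir = instanceDir;
        this.downloadPath = downloadPath;
    }

    /** Parses one comma-separated line; rejects lines that do not have exactly four fields. */
    static CoreEntry parse(String line) {
        String[] fields = line.trim().split(",", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields but got " + fields.length + ": " + line);
        }
        return new CoreEntry(fields[0].trim(), fields[1].trim(), fields[2].trim(), fields[3].trim());
    }

    /** Returns only the entries addressed to the given host, skipping blank lines. */
    static List<CoreEntry> entriesForHost(String fileContent, String host) {
        List<CoreEntry> result = new ArrayList<>();
        for (String line : fileContent.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            CoreEntry entry = parse(line);
            if (entry.hostname.equals(host)) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

A listening host would call entriesForHost with the new file's content and its own hostname, then try to install each returned core before writing its actual file.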



[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

I added a test case that simulates attempting to install a bad core.

Still need to get backing up a Solr core to HDFS working.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Updated to HEAD

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801296#action_12801296 ] 

Ted Dunning commented on SOLR-1724:
-----------------------------------

{quote}
We actually started out that way... (when a node went down there wasn't really any trace it ever existed) but have been moving away from it.
ZK may not just be a reflection of the cluster but may also control certain aspects of the cluster that you want persistent. For example, marking a node as "disabled" (i.e. don't use it). One could create APIs on the node to enable and disable and have that reflected in ZK, but it seems like more work than simply saying "change this znode".
{quote}

I see this as a conflation of two or three goals that leads to trouble.  All of the goals are worthy and important, but the conflation of them leads to difficult problems.  Taken separately, the goals are easily met.

One goal is the reflection of current cluster state.  That is most reliably done using ephemeral files roughly as I described.

Another goal is the reflection of constraints or desired state of the cluster.  This is best handled as you describe, with permanent files since you don't want this desired state to disappear when a node disappears.

The real issue is making sure that the source of whatever information is most directly connected to the physical manifestation of that information.  Moreover, it is important in some cases (node state, for instance) that the state stay correct even when the source of that state loses control by crashing, hanging or becoming otherwise indisposed.  Inserting an intermediary into this chain of control is a bad idea.  Replicating ZK's rather well implemented ephemeral state mechanism with ad hoc heartbeats is also a bad idea (remember how *many* bugs there have been in Hadoop relative to heartbeats and the name node?).

A somewhat secondary issue is whether the cluster master has to be involved in every query.  That seems like a really bad bottleneck to me and Katta provides a proof of existence that this is not necessary.

After trying several options in production, what I find is best is that the master lay down a statement of desired state and the nodes publish their status in a different and ephemeral fashion.  The master can record a history, or there may be general directions such as your disabled list, however you like, but that shouldn't be mixed into the node status, because otherwise you get into a situation where ephemeral files can no longer be used for what they are good at.
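
To make the split concrete, here is a rough sketch of how a master might compare the two kinds of data. Nothing here is from Katta or from the patch: ClusterReconciler and missingCores are hypothetical names, and plain maps stand in for what would be persistent znodes (desired state) and ephemeral znodes (published status). A host that has crashed simply has no entry in the status map, which is the behavior ephemeral nodes would give for free.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ClusterReconciler {
    /**
     * desired: host -> cores the master wants on that host (the persistent statement).
     * actual:  host -> cores the host currently reports (the ephemeral status);
     *          a host that is down has no entry at all.
     * Returns host -> cores that are wanted but not reported, omitting satisfied hosts.
     */
    static Map<String, Set<String>> missingCores(Map<String, Set<String>> desired,
                                                 Map<String, Set<String>> actual) {
        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : desired.entrySet()) {
            // Copy so that the caller's desired state is never modified.
            Set<String> want = new TreeSet<>(entry.getValue());
            Set<String> have = actual.get(entry.getKey());
            if (have != null) {
                want.removeAll(have);
            }
            if (!want.isEmpty()) {
                missing.put(entry.getKey(), want);
            }
        }
        return missing;
    }
}
```

Because the desired state is never written by the nodes and the status is never written by the master, neither side has to guess whether the other's data is stale.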



> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804773#action_12804773 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

For some reason ZkTestServer doesn't need to be shut down any longer?

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803943#action_12803943 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Do we have some code that recursively downloads a tree of files from ZK?  The challenge is I don't see a way to find out if a given path represents a directory or not.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803965#action_12803965 ] 

Mark Miller commented on SOLR-1724:
-----------------------------------

Well, a path could be both a directory and a file with the zookeeper abstraction, which doesn't really work on a standard filesystem.

If you know you're not going to store file data at nodes that have children (the only way that downloading to a real file system makes sense), you could just call getChildren - if there are children, it's a dir, otherwise it's a file. That doesn't work for empty dirs, but you could also just do getData, and if it returns null, treat it as a dir, else treat it as a file.
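
A rough sketch of that heuristic, combining both checks, might look like the following. To keep it self-contained it does not use the ZooKeeper client directly: ZkReader is a hypothetical two-method interface standing in for the client's getChildren and getData calls (whose real signatures take extra watch/Stat arguments), and ZkTreeDownloader is a made-up name. It assumes, as above, that nodes with children carry no file data, so any data on such a node is ignored.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical stand-in for the two ZooKeeper reads this sketch needs. */
interface ZkReader {
    List<String> getChildren(String path) throws Exception;
    byte[] getData(String path) throws Exception;
}

final class ZkTreeDownloader {
    /** Recursively copies the subtree rooted at zkPath into the local path target. */
    static void download(ZkReader zk, String zkPath, Path target) throws Exception {
        List<String> children = zk.getChildren(zkPath);
        byte[] data = zk.getData(zkPath);
        if (!children.isEmpty() || data == null) {
            // Has children, or holds no data at all: treat the node as a directory.
            Files.createDirectories(target);
            for (String child : children) {
                String childPath = zkPath.endsWith("/") ? zkPath + child : zkPath + "/" + child;
                download(zk, childPath, target.resolve(child));
            }
        } else {
            // Leaf node with data (possibly zero bytes): treat it as a file.
            Path parent = target.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(target, data);
        }
    }
}
```

Note that a childless node holding a zero-length (non-null) byte array comes out as an empty file rather than an empty directory, so the null-versus-empty distinction has to be kept consistent by whoever writes the tree.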

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836898#action_12836898 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Actually, I just realized the whole exercise of moving a core is pointless: it's exactly the same as replication, so this is a non-issue...

I'm going to work on backing up a core to HDFS...

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Issue Comment Edited: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835871#action_12835871 ] 

Jason Rutherglen edited comment on SOLR-1724 at 2/19/10 6:36 PM:
-----------------------------------------------------------------

Removing cores seems to work well, on to modified cores... I'm checkpointing progress so that, in case things break, I can easily roll back.

      was (Author: jasonrutherglen):
    Removing cores seems to work well, on to modified cores... I checkpointing progress in case things break, I can easily roll back.
  
> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Updated: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1724:
-----------------------------------

    Attachment: SOLR-1724.patch

Here's the first cut... I agree, I'm not really into ephemeral
ZK nodes for Solr hosts/nodes. The reason is contact with ZK is
highly superficial and can be intermittent. I'm mostly concerned
with ensuring the core operations succeed on a given server. If
a server goes down, there needs to be more than ZK to prove it,
and if it goes down completely, I'll simply reallocate its
cores to another server using the core management mechanism
provided in this patch.

The issue is still being worked on, specifically the Solr server
portion that downloads the cores from some location, or performs
operations. The file format will move to JSON.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835513#action_12835513 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

I need to add the deletion policy before I can test this in a real environment, otherwise bunches of useless files will pile up in ZK.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835955#action_12835955 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

Also needed is the ability to move an existing core to a
different Solr server. The core will need to be copied via
direct HTTP file access, from a Solr server to another Solr
server. There is no need to zip the core first. 

This feature is useful for core indexes that have been
incrementally built, then need to be archived (i.e. the index was not
constructed using Hadoop).

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>         Attachments: commons-lang-2.4.jar, gson-1.4.jar, hadoop-0.20.2-dev-core.jar, hadoop-0.20.2-dev-test.jar, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch, SOLR-1724.patch
>
>


[jira] Commented: (SOLR-1724) Real Basic Core Management with Zookeeper

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801244#action_12801244 ] 

Jason Rutherglen commented on SOLR-1724:
----------------------------------------

This'll be a patch on the cloud branch to reuse what's started. I don't see any core management code in there yet, so this looks complementary.

> Real Basic Core Management with Zookeeper
> -----------------------------------------
>
>                 Key: SOLR-1724
>                 URL: https://issues.apache.org/jira/browse/SOLR-1724
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>             Fix For: 1.5
>
>