Posted to solr-user@lucene.apache.org by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov> on 2015/12/30 21:09:00 UTC

Testing Solr configuration, schema, and other fields

At my organization, I want to create a tool that allows users to keep a solr configuration as a Git repository.   Then, I want my Continuous Integration environment to take some branch of the git repository and "publish" it into ZooKeeper/SolrCloud.

Working on my own, it is only a very small pain to note foolish errors I've made, fix them, and restart.    However, I want my users to be able to edit their own Solr schema and config *most* of the time, at least on development servers.    They will not have command-line access to these servers, and I want to avoid endless restarts.

I'm not interested in fighting to maintain such a useless thing as a DTD/XSD without community support; what I really want to know is whether Solr will start and can index some sample documents.   I'm wondering whether I might be able to build a tool to fire up an EmbeddedSolrServer and capture error messages/exceptions in a reasonable way.     This tool could then be run by my users before they commit to git, and then again by the CI server before it "publishes" the configuration to ZooKeeper/SolrCloud.
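A cheap first gate, before anything is started, is to check that the configset's XML files are at least well-formed, so a pre-commit hook or CI step fails fast on a typo. This is a minimal Python sketch, assuming the configset is a directory holding solrconfig.xml plus schema.xml or managed-schema; `load_configset`, `check_well_formed` and `main` are hypothetical names. It is only a syntax check and cannot show that Solr will actually start.

```python
import os
import xml.etree.ElementTree as ET

# Assumed XML files of a configset: "schema.xml" is the classic schema file,
# "managed-schema" the one used when the managed schema is enabled.
XML_FILES = ("solrconfig.xml", "schema.xml", "managed-schema")


def load_configset(conf_dir):
    """Read whichever of XML_FILES exist in conf_dir into {name: bytes}."""
    files = {}
    for name in XML_FILES:
        path = os.path.join(conf_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as fh:  # bytes: let the XML declaration pick the encoding
                files[name] = fh.read()
    return files


def check_well_formed(files):
    """Return a list of problems; an empty list means every file parsed.

    Syntax only: unknown classes, bad field types, etc. are not detected."""
    problems = []
    if "solrconfig.xml" not in files:
        problems.append("solrconfig.xml: missing")
    if "schema.xml" not in files and "managed-schema" not in files:
        problems.append("no schema.xml or managed-schema found")
    for name, text in files.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            # str(exc) already includes "line N, column M"
            problems.append(f"{name}: {exc}")
    return problems


def main(argv):
    """Hypothetical entry point: main(["path/to/conf"]) returns an exit code."""
    issues = check_well_formed(load_configset(argv[0]))
    for issue in issues:
        print(issue)
    return 1 if issues else 0  # non-zero fails a pre-commit hook or CI step
```

Called from a Git pre-commit hook, a non-zero return blocks the commit; whether Solr will load the files and index sample documents still needs a running Solr.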

Any suggestions?

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

Re: Testing Solr configuration, schema, and other fields

Posted by Erick Erickson <er...@gmail.com>.

Hmmm, a couple of things:

The bin/solr script could be used as a model in this scenario for
how to automate a lot of this. I'm thinking you can skip all the
argument parsing and just see how it invokes the SolrCLI class
(org.apache.solr.util.SolrCLI) to spin up collections, upload configs
and the like. In fact, assuming a unique collection name per developer,
you could use a common dev SolrCloud setup for this.
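For the shared dev cluster idea, here is a sketch of per-developer naming plus the Collections API CREATE request. It talks to the Collections API over HTTP rather than calling SolrCLI. The `<project>_<developer>` naming convention and the helper names are hypothetical; `action=CREATE`, `numShards`, `replicationFactor` and `collection.configName` are standard Collections API parameters.

```python
import re
from urllib.parse import urlencode


def temp_collection_name(developer, project):
    """Hypothetical convention: one collection per developer and project,
    reduced to lowercase letters, digits and underscores."""
    raw = f"{project}_{developer}".lower()
    return re.sub(r"[^a-z0-9_]", "_", raw)


def create_collection_url(solr_base, name, config_name,
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE request URL using a configset
    that has already been uploaded to ZooKeeper."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"
```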

Or heck, perhaps just use the bin/solr script for all of that...

The other thing I was assuming is that you don't _really_ care
about starting/stopping Solr; it's more the cycle your devs go
through (upload the configs, reload the collection, find out whether
the collection is running, and if not, dig through the log files to
see why) that you'd like to shorten....
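Checking whether a create or reload actually worked can start with the Collections API response itself, before anyone goes digging in logs. This is a minimal sketch, assuming JSON responses (`wt=json`). As far as I know, a CREATE can list per-node errors under `failure` while `responseHeader.status` is still 0, so the sketch checks all three places. `collection_errors` is a hypothetical helper.

```python
import json


def collection_errors(response_text):
    """Collect error messages from a Collections API JSON response.

    Assumed response shape: a responseHeader with a numeric status, plus
    optional "error" and "failure" sections (the latter keyed by node)."""
    body = json.loads(response_text)
    errors = []
    status = body.get("responseHeader", {}).get("status", 0)
    if status != 0:
        errors.append(f"status {status}")
    err = body.get("error")
    if err is not None:
        errors.append(err.get("msg", str(err)) if isinstance(err, dict) else str(err))
    # Per-node failures can appear even when the overall status is 0 (assumption)
    for node, message in body.get("failure", {}).items():
        errors.append(f"{node}: {message}")
    return errors
```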

FWIW,
Erick

On Thu, Dec 31, 2015 at 8:31 AM, Davis, Daniel (NIH/NLM) [C]
<da...@nih.gov> wrote:
> Erik, that suggests an additional approach that seems to have "legs":
>
> * A webapp that acts as a sort of Cloud IDE for Solr configsets.   It supports multiple projects and a single SolrCloud cluster.   For each project, it upconfigs a git repository local to the webapp, and has the ability to define tests that run against a "temporary" collection to verify the configuration.
>
> * A command-line utility that upconfigs the configuration from a local directory, creates a temporary collection, and optionally supports "tests" that apply an update query.
>
> Since the webapp would be based on something like the command-line utility (maybe in library form), I think I'm still going to target the command-line utility as my "minimum viable product".   I'll support SolrCloud first, and then see about EmbeddedSolrServer.

RE: Testing Solr configuration, schema, and other fields

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

Erik, that suggests an additional approach that seems to have "legs":

* A webapp that acts as a sort of Cloud IDE for Solr configsets.   It supports multiple projects and a single SolrCloud cluster.   For each project, it upconfigs a git repository local to the webapp, and has the ability to define tests that run against a "temporary" collection to verify the configuration.

* A command-line utility that upconfigs the configuration from a local directory, creates a temporary collection, and optionally supports "tests" that apply an update query.

Since the webapp would be based on something like the command-line utility (maybe in library form), I think I'm still going to target the command-line utility as my "minimum viable product".   I'll support SolrCloud first, and then see about EmbeddedSolrServer.
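The command-line utility described above could be sketched roughly as follows. It is a minimal sketch, not a finished tool: the `tmp_<configset>_<random>` naming, `smoke_test`, and the injected `upconfig` and `http_call` callables are assumptions, chosen so the flow can be tested without a cluster. The Collections API `CREATE`/`DELETE` actions and posting a JSON array of documents to `/update` are standard Solr features.

```python
import json
import uuid
from urllib.parse import urlencode


def smoke_test(solr_base, config_name, sample_docs, upconfig, http_call):
    """Hypothetical CLI flow: upconfig, create a temporary collection, index
    sample documents, then always delete the temporary collection.

    upconfig(config_name) and http_call(url, body=None) are injected.
    http_call is assumed to POST body (as application/json when given),
    raise on HTTP errors, and return the response text."""
    base = solr_base.rstrip("/")
    admin = f"{base}/admin/collections?"
    collection = f"tmp_{config_name}_{uuid.uuid4().hex[:8]}"

    upconfig(config_name)
    http_call(admin + urlencode({"action": "CREATE", "name": collection,
                                 "numShards": 1,
                                 "collection.configName": config_name,
                                 "wt": "json"}))
    try:
        # A JSON array of documents sent to /update is indexed as a batch.
        reply = http_call(f"{base}/{collection}/update?commit=true&wt=json",
                          json.dumps(sample_docs))
        return json.loads(reply)["responseHeader"]["status"] == 0
    finally:
        # If CREATE itself raised, we never get here; a separate sweep of
        # leftover tmp_* collections would be needed for that case.
        http_call(admin + urlencode({"action": "DELETE", "name": collection,
                                     "wt": "json"}))
```

Keeping the HTTP and upload steps injectable means the same flow could sit behind both the CLI and the webapp, in line with building the webapp on the utility in library form.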

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatcher@gmail.com] 
Sent: Thursday, December 31, 2015 10:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Testing Solr configuration, schema, and other fields

Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing you mention, but since you’re already using SolrCloud, how about simply upconfig’ing the configuration from the Git repo, creating a temporary collection using that configset, and smoke testing it before making it ready for end client/customer/user use?   Maybe the configset and collection created for smoke testing are just temporary, existing only to validate the configuration.

—
Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com <http://www.lucidworks.com/>

Re: Testing Solr configuration, schema, and other fields

Posted by Erik Hatcher <er...@gmail.com>.

Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing you mention, but since you’re already using SolrCloud, how about simply upconfig’ing the configuration from the Git repo, creating a temporary collection using that configset, and smoke testing it before making it ready for end client/customer/user use?   Maybe the configset and collection created for smoke testing are just temporary, existing only to validate the configuration.
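The upconfig step itself can be scripted around the zkcli script that ships with Solr 5.x. This sketch assumes the 5.x tarball layout (`server/scripts/cloud-scripts/zkcli.sh`) and its `-zkhost`, `-cmd upconfig`, `-confdir` and `-confname` options. The Python wrapper names are hypothetical, and it is worth verifying on your version that a failed upload returns a non-zero exit status before relying on `check=True`.

```python
import subprocess


def upconfig_command(solr_home, zk_host, conf_dir, conf_name):
    """Build the argv for uploading conf_dir to ZooKeeper as conf_name
    (assumes the Solr 5.x install layout)."""
    script = f"{solr_home.rstrip('/')}/server/scripts/cloud-scripts/zkcli.sh"
    return [script, "-zkhost", zk_host, "-cmd", "upconfig",
            "-confdir", conf_dir, "-confname", conf_name]


def upconfig(solr_home, zk_host, conf_dir, conf_name):
    """Run the upload; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(upconfig_command(solr_home, zk_host, conf_dir, conf_name),
                   check=True)
```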

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com <http://www.lucidworks.com/>



> On Dec 30, 2015, at 3:09 PM, Davis, Daniel (NIH/NLM) [C] <da...@nih.gov> wrote:
> 
> At my organization, I want to create a tool that allows users to keep a solr configuration as a Git repository.   Then, I want my Continuous Integration environment to take some branch of the git repository and "publish" it into ZooKeeper/SolrCloud.
> 
> Working on my own, it is only a very small pain to note foolish errors I've made, fix them, and restart.    However, I want my users to be able to edit their own Solr schema and config *most* of the time, at least on development servers.    They will not have command-line access to these servers, and I want to avoid endless restarts.
> 
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD without community support; what I really want to know is whether Solr will start and can index some sample documents.   I'm wondering whether I might be able to build a tool to fire up an EmbeddedSolrServer and capture error messages/exceptions in a reasonable way.     This tool could then be run by my users before they commit to git, and then again by the CI server before it "publishes" the configuration to ZooKeeper/SolrCloud.
> 
> Any suggestions?
> 
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>

RE: Testing Solr configuration, schema, and other fields

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

Heh

National Library of Medicine (NLM) is all over the map in terms of "not-invented-here", being a large organization within a large organization.  It's my personal tendency towards "not-invented-here" that concerns me.

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Thursday, December 31, 2015 12:24 PM
To: solr-user <so...@lucene.apache.org>
Subject: RE: Testing Solr configuration, schema, and other fields

Well, I guess NIH stands for Not Invented Here. No idea what NLM is for.

P.s. sorry, could not resist. I worked for orgs like that too :-(

RE: Testing Solr configuration, schema, and other fields

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

Well, I guess NIH stands for Not Invented Here. No idea what NLM is for.

P.s. sorry, could not resist. I worked for orgs like that too :-(
On 1 Jan 2016 12:03 am, "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>
wrote:

> That's incredibly cool.   Much easier than the Chef/Puppet scripts and
> stuff I've seen.    I'm certain to play with this and get under the hood;
> however, we locally don't have permission to use AWS EC2 in this corner
> of NLM.    There's some limited use of S3 and Glacier.   Maybe we'll
> negotiate EC2 for dev later this year, maybe not.

RE: Testing Solr configuration, schema, and other fields

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

That's incredibly cool.   Much easier than the Chef/Puppet scripts and stuff I've seen.    I'm certain to play with this and get under the hood; however, we locally don't have permission to use AWS EC2 in this corner of NLM.    There's some limited use of S3 and Glacier.   Maybe we'll negotiate EC2 for dev later this year, maybe not.
 
-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Thursday, December 31, 2015 11:40 AM
To: solr-user <so...@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

Makes sense.

Replying to your answer in this thread: did you look at the Solr Scale Toolkit?
Maybe it has the base infrastructure you need:
https://github.com/LucidWorks/solr-scale-tk

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


Re: Testing Solr configuration, schema, and other fields

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

Makes sense.

Replying to your answer in this thread: did you look at the Solr Scale Toolkit?
Maybe it has the base infrastructure you need:
https://github.com/LucidWorks/solr-scale-tk

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C]
<da...@nih.gov> wrote:
>> What is the next step you are stuck on?
>>
>> Regards,
>>    Alex
>
> I'm not really stuck.   My question has been about best practices.   I am trying to work against "not-invented-here" syndrome, "only-useful-here" syndrome, and "boil-the-ocean" syndrome.    I have to make the solution work with a Continuous Integration (CI) environment that will not be creating either Docker images or VMs for each project, and so I've been seeking the wisdom of the crowd.

RE: Testing Solr configuration, schema, and other fields

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

> What is the next step you are stuck on?
> 
> Regards,
>    Alex

I'm not really stuck.   My question has been about best practices.   I am trying to work against "not-invented-here" syndrome, "only-useful-here" syndrome, and "boil-the-ocean" syndrome.    I have to make the solution work with a Continuous Integration (CI) environment that will not be creating either Docker images or VMs for each project, and so I've been seeking the wisdom of the crowd.
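Without a VM or container per project, every temporary collection lands on one shared cluster, so leftovers from crashed CI runs will accumulate. Here is a sketch of a cleanup sweep over the Collections API `LIST` response, which returns the collection names under `collections`. The `tmp_<project>_<unix-seconds>` naming convention and `stale_temp_collections` are hypothetical.

```python
import json
import time


def stale_temp_collections(list_response_text, max_age_s, now=None, prefix="tmp_"):
    """Pick temporary collections older than max_age_s from a LIST response.

    Assumes the hypothetical naming convention tmp_<project>_<unix-seconds>;
    collections that do not follow it are never selected."""
    now = time.time() if now is None else now
    stale = []
    for name in json.loads(list_response_text).get("collections", []):
        if not name.startswith(prefix):
            continue
        stamp = name.rsplit("_", 1)[-1]
        if stamp.isdigit() and now - int(stamp) > max_age_s:
            stale.append(name)
    return stale
```

Each name returned would then be removed with a Collections API `DELETE` call.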

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Thursday, December 31, 2015 12:42 AM
To: solr-user <so...@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

I might just be confused here, but I am not sure what your bottleneck actually is. You seem to know your critical path already, so how can we help?

Starting a new Solr core from a given configuration directory is easy. Catching hard errors from that is probably just a matter of grepping logs or using a custom logger.

And you don't seem to be talking about lint-style soft sanity checks, but rather hard checks that stop initialization.

What is the next step you are stuck on?

Regards,
   Alex

Re: Testing Solr configuration, schema, and other fields

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

I might just be confused here, but I am not sure what your bottleneck
actually is. You seem to know your critical path already, so how can we
help?

Starting a new Solr core from a given configuration directory is easy.
Catching hard errors from that is probably just a matter of grepping logs
or using a custom logger.
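Here is a sketch of the "grepping logs" approach. It pairs each ERROR entry with the last `Caused by:` line of the stack trace under it, which usually names the underlying problem. The log layout is an assumption: the sketch expects each entry to begin with its level (e.g. `ERROR - 2015-12-30 ...`), so `ENTRY_RE` would need adjusting to your log4j pattern. `hard_errors` is hypothetical.

```python
import re

# Assumed layout: every log entry starts with its level; stack-trace lines do not.
ENTRY_RE = re.compile(r"^\s*(TRACE|DEBUG|INFO|WARN|WARNING|ERROR|SEVERE)\b")
ERROR_LEVELS = {"ERROR", "SEVERE"}


def hard_errors(log_lines):
    """Return (error_line, root_cause) pairs: each ERROR entry plus the last
    'Caused by:' line of its stack trace, or None if there was none."""
    results, current, cause = [], None, None
    for line in log_lines:
        m = ENTRY_RE.match(line)
        if m:  # a new log entry closes any open ERROR block
            if current is not None:
                results.append((current, cause))
            if m.group(1) in ERROR_LEVELS:
                current, cause = line.strip(), None
            else:
                current, cause = None, None
        elif current is not None and line.lstrip().startswith("Caused by:"):
            cause = line.strip()  # keep the innermost (last) cause
    if current is not None:
        results.append((current, cause))
    return results
```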

And you don't seem to be talking about lint-style soft sanity checks, but
rather hard checks that stop initialization.
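For initialization-stopping errors specifically, the CoreAdmin `STATUS` action may save some log grepping. Its JSON response includes an `initFailures` map from core name to error message for cores that failed to load. This sketch assumes that response shape, and the helper names are hypothetical. In Solr 5.x SolrCloud, core names derive from the collection name (e.g. `<collection>_shard1_replica1`), hence the prefix filter.

```python
import json
from urllib.parse import urlencode


def core_status_url(solr_base):
    """CoreAdmin STATUS request covering all cores on one node."""
    query = urlencode({"action": "STATUS", "wt": "json"})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def init_failures(status_response_text, core_prefix=""):
    """Return {core: message} for cores whose names start with core_prefix
    and that appear in the response's "initFailures" section."""
    failures = json.loads(status_response_text).get("initFailures", {})
    return {core: msg for core, msg in failures.items()
            if core.startswith(core_prefix)}
```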

What is the next step you are stuck on?

Regards,
   Alex
On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>
wrote:

> At my organization, I want to create a tool that allows users to keep a
> solr configuration as a Git repository.   Then, I want my Continuous
> Integration environment to take some branch of the git repository and
> "publish" it into ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors
> I've made, fix them, and restart.    However, I want my users to be able to
> edit their own Solr schema and config *most* of the time, at least on
> development servers.    They will not have command-line access to these
> servers, and I want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a
> DTD/XSD without community support; what I really want to know is whether
> Solr will start and can index some sample documents.   I'm wondering
> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> and capture error messages/exceptions in a reasonable way.     This tool
> could then be run by my users before they commit to git, and then again by
> the CI server before it "publishes" the configuration to
> ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
>

RE: Testing Solr configuration, schema, and other fields

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

I think of enterprise search as very similar to RDBMS:

- It belongs in the backend behind your app.
- Each project ought to control its own schema and data.

So, I want the configset for each team's Solr collections to be stored in our Git server, just as the RDBMS schema is, whether a developer uses a framework or just a couple of SQL files, scripts, and a VERSION table.    It ought to be that easy.
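Carrying the RDBMS analogy further, the configset name could play the role of the VERSION table. Here is a sketch of a hypothetical convention that stamps each uploaded configset with the Git commit it came from, so CI can tell whether a collection already runs that version. How the currently deployed configset name is looked up is left to the caller.

```python
import re


def versioned_config_name(project, git_sha, length=10):
    """Hypothetical convention: project 'mesh' at commit 1a2b3c4d5e...
    becomes the configset name 'mesh_v1a2b3c4d5e'."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"not a git commit id: {git_sha!r}")
    return f"{project}_v{git_sha[:length]}"


def needs_publish(deployed_config_name, project, git_sha):
    """Like consulting a VERSION table: publish only if the collection is
    not already using the configset built from this commit."""
    return deployed_config_name != versioned_config_name(project, git_sha)
```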


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, December 30, 2015 5:37 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

Yeah, the notion of DTDs has gone around several times but always founders on the fact that you can, say, define your own Filter with its own set of parameters, etc. Sure, you can make a generic DTD that accommodates this, but then it becomes so general as to be little more than a syntax checker.

The managed schema stuff allows modifications of the schema via REST calls, and there is some equivalent functionality for solrconfig.xml, but the interesting bit about that is that your VCS is then no longer the "one true source" of the configs; it almost goes backwards: modify the configs in ZooKeeper, then check them in to Git.
And even that doesn't really prevent, say, putting default search fields in solrconfig.xml that do not exist in the schema file.
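That particular cross-file mistake is easy to check statically, though. Parse both files and confirm that every field named in a request handler's `df` or `qf` defaults is either declared as a field or matches a dynamic field pattern. Here is a minimal sketch using the standard library's ElementTree. The helper names are hypothetical, the sketch only looks at `<str name="df">` and `<str name="qf">` parameters, and it makes no attempt to understand anything else in solrconfig.xml.

```python
import fnmatch
import xml.etree.ElementTree as ET


def schema_field_names(schema_xml):
    """Static field names and dynamic field patterns from schema.xml or
    managed-schema; iter() finds them with or without a <fields> wrapper."""
    root = ET.fromstring(schema_xml)
    static = {f.get("name") for f in root.iter("field")}
    dynamic = [f.get("name") for f in root.iter("dynamicField")]
    return static, dynamic


def undefined_default_fields(solrconfig_xml, schema_xml):
    """Report request-handler df/qf parameters that name undeclared fields."""
    static, dynamic = schema_field_names(schema_xml)
    problems = []
    for handler in ET.fromstring(solrconfig_xml).iter("requestHandler"):
        for param in handler.iter("str"):
            if param.get("name") not in ("df", "qf"):
                continue
            # qf looks like "title^2 body"; strip boosts before checking
            for token in (param.text or "").split():
                field = token.split("^", 1)[0]
                if field in static or any(fnmatch.fnmatchcase(field, p) for p in dynamic):
                    continue
                problems.append(f"{handler.get('name')}: {param.get('name')} uses undefined field '{field}'")
    return problems
```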

Frankly what I usually do when heavily editing either one is just do it on my local laptop, either stand alone or SolrCloud, _then_ check it in and/or test it on my cloud setup. So I guess the take-away is that I don't have any very good solution here.

Best,
Erick


On Wed, Dec 30, 2015 at 1:10 PM, Davis, Daniel (NIH/NLM) [C] <da...@nih.gov> wrote:
> Your bottom-line point is that EmbeddedSolrServer is different, and some configurations that would work on SolrCloud will not work on it. This is well taken. Maybe creating a new collection on existing dev nodes could be done.
>
> As for VDI and Puppet: my requirements are different because my organization is different, and I would prefer not to go into how different. I have written Puppet modules for other system configurations and tested them on AWS EC2, yet those modules have not been adopted by my organization.
>
>
> -----Original Message-----
> From: Mark Horninger [mailto:mhorninger@grayhairsoftware.com]
> Sent: Wednesday, December 30, 2015 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Testing Solr configuration, schema, and other fields
>
> Daniel,
>
>
> Sounds almost like you're reinventing the wheel. Could you possibly automate this through Puppet or Chef? In a VDI environment, all you would need to do is build a new VM node based on the original setup, and then roll out that node as one of the ZooKeeper (zk) nodes.
>
> Just a thought on that subject.
>
> v/r,
>
> -Mark H.
>
> -----Original Message-----
> From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.davis@nih.gov]
> Sent: Wednesday, December 30, 2015 3:10 PM
> To: solr-user@lucene.apache.org
> Subject: Testing Solr configuration, schema, and other fields
>
> At my organization, I want to create a tool that allows users to keep a Solr configuration as a Git repository. Then, I want my Continuous Integration environment to take some branch of the Git repository and "publish" it into ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors I've made, fix them, and restart. However, I want my users to be able to edit their own Solr schema and config *most* of the time, at least on development servers. They will not have command-line access to these servers, and I want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD without community support; what I really want to know is whether Solr will start and can index some sample documents. I'm wondering whether I might be able to build a tool to fire up an EmbeddedSolrServer and capture error messages/exceptions in a reasonable way. This tool could then be run by my users before they commit to Git, and then again by the CI server before it "publishes" the configuration to ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and Communications Systems, National Library of Medicine, NIH
>
> [GrayHair]
> GHS Confidentiality Notice
>
> This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution of this information is prohibited, and may be punishable by law. If this was sent to you in error, please notify the sender by reply e-mail and destroy all copies of the original message.
>
> GrayHair Software <http://www.grayhairSoftware.com>
>

Re: Testing Solr configuration, schema, and other fields

Posted by Erick Erickson <er...@gmail.com>.

Yeah, the idea of a DTD has come up several times, but it always founders on the fact that you can, say, define your own Filter with its own set of parameters, etc. Sure, you can write a generic DTD that accommodates this, but then it becomes so general that it is little more than a syntax checker.
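Erick's point is that a generic DTD ends up as little more than a syntax checker. That minimal level of checking is still cheap to run before every commit. The sketch below uses Python's standard library, and its helper names are hypothetical. It checks only that each file exists and is well-formed XML. It tells you nothing about whether Solr will accept the analyzers, parameters, or field types inside.

```python
import sys
import xml.etree.ElementTree as ET


def check_well_formed(paths):
    """Return "path: problem" strings for files that are missing or not well-formed XML."""
    problems = []
    for path in paths:
        try:
            ET.parse(path)
        except ET.ParseError as exc:
            # The ParseError message carries the line/column of the first syntax error.
            problems.append("%s: %s" % (path, exc))
        except OSError as exc:
            # Missing or unreadable file.
            problems.append("%s: %s" % (path, exc))
    return problems


def main(argv):
    """Print problems to stderr; return a shell exit code (0 = every file parsed)."""
    problems = check_well_formed(argv)
    for line in problems:
        print(line, file=sys.stderr)
    return 1 if problems else 0
```

A Git pre-commit hook could call sys.exit(main(sys.argv[1:])) with paths such as conf/solrconfig.xml and conf/managed-schema. By default, the managed schema file has no .xml extension.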

The managed schema feature allows modifications of the schema via REST calls, and there is some equivalent functionality for solrconfig.xml. The interesting bit is that your VCS is then no longer the "one true source" of the configs; it almost goes backwards: modify the configs in ZooKeeper, then check them in to Git. And even that doesn't really solve problems such as putting default search fields in solrconfig.xml that do not exist in the schema file.
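Erick's example is a default search field in solrconfig.xml that the schema does not define. That kind of inconsistency can be caught by a small offline check without starting Solr. The sketch below is a hypothetical helper, not part of Solr. It collects the str parameters named df, qf, and pf anywhere in solrconfig.xml and strips boosts such as title^2. It then reports any name that matches neither an explicit field nor a dynamicField with a leading or trailing "*". It does not resolve ${...} property substitution or parameters supplied at request time.

```python
import xml.etree.ElementTree as ET

QUERY_FIELD_PARAMS = {"df", "qf", "pf"}  # request parameters whose values name schema fields


def schema_fields(schema_root):
    """Return (explicit field names, dynamicField patterns) declared in a schema."""
    names = {f.get("name") for f in schema_root.iter("field")}
    patterns = [d.get("name") for d in schema_root.iter("dynamicField")]
    return names, patterns


def matches_dynamic(name, pattern):
    # Solr dynamic field names carry a single leading or trailing "*".
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def undefined_query_fields(config_root, schema_root):
    """List (param, field) pairs in solrconfig.xml that the schema does not define."""
    names, patterns = schema_fields(schema_root)
    missing = []
    for node in config_root.iter("str"):
        if node.get("name") not in QUERY_FIELD_PARAMS or not node.text:
            continue
        for token in node.text.split():
            field = token.split("^", 1)[0]  # drop boosts such as title^2
            if field in names or any(matches_dynamic(field, p) for p in patterns):
                continue
            missing.append((node.get("name"), field))
    return missing
```

For example: undefined_query_fields(ET.parse("conf/solrconfig.xml").getroot(), ET.parse("conf/managed-schema").getroot()).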

Frankly, what I usually do when heavily editing either one is do it on my local laptop, either standalone or SolrCloud, _then_ check it in and/or test it on my cloud setup. So I guess the take-away is that I don't have a very good solution here.

Best,
Erick


RE: Testing Solr configuration, schema, and other fields

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

Your bottom-line point is that EmbeddedSolrServer is different, and some configurations that would work on SolrCloud will not work on it. This is well taken. Maybe creating a new collection on existing dev nodes could be done.
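Dan suggests creating a new collection on the existing dev nodes. The sketch below builds a Collections API CREATE request for a single-shard, single-replica scratch collection. The per-developer naming scheme, the host, and the configset name are all hypothetical. The sketch also assumes the configset has already been uploaded to ZooKeeper, for example with the zkcli.sh upconfig command. A matching action=DELETE request can remove the collection afterwards.

```python
import re
from urllib.parse import urlencode


def dev_collection_name(developer: str, configset: str) -> str:
    """Hypothetical naming scheme: one scratch collection per developer and configset."""
    raw = "%s_%s" % (configset, developer)
    # Keep to a conservative character set: letters, digits, underscore, hyphen.
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw.lower())


def create_collection_url(base_url: str, name: str, config_name: str) -> str:
    """Build a Collections API CREATE call for a one-shard, one-replica test collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": 1,
        "collection.configName": config_name,
        "wt": "json",
    }
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), urlencode(params))
```

To see whether the configset loaded, fetch the URL (with urllib.request.urlopen, say) and inspect the JSON response. A broken configset should then be reportable without restarting any node.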

As for VDI and Puppet: my requirements are different because my organization is different, and I would prefer not to go into how different. I have written Puppet modules for other system configurations and tested them on AWS EC2, yet those modules have not been adopted by my organization.

RE: Testing Solr configuration, schema, and other fields

Posted by Mark Horninger <mh...@grayhairsoftware.com>.

Daniel,


Sounds almost like you're reinventing the wheel. Could you possibly automate this through Puppet or Chef? In a VDI environment, all you would need to do is build a new VM node based on the original setup, and then roll out that node as one of the ZooKeeper (zk) nodes.

Just a thought on that subject.

v/r,

-Mark H.
