Posted to solr-user@lucene.apache.org by Bob Lawson <bw...@gmail.com> on 2016/01/08 03:36:03 UTC

Manage schema.xml via Solrj?

I want to programmatically make changes to schema.xml using Java.  Should I use SolrJ to do this, or is there a better way?  Can I use SolrJ to make the REST calls that make up the Schema API?  Whatever the answer, can anyone point me to an example showing how to do it?  Thanks!


Re: Manage schema.xml via Solrj?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/8/2016 6:30 AM, Bob Lawson wrote:
> Thanks for the replies.  The problem I'm trying to solve is to automate
> whatever steps I can in configuring Solr for our customer.  Rather than an
> admin have to edit schema.xml, I thought it would be easier and less
> error-prone to do it programmatically.  But I'm a novice, so if there is a
> better, more standard way, please let me know.  Thanks!!!

I personally find editing schema.xml directly to be the best option, but I
have not actually used the Schema API.  When I was making frequent schema
edits in my deployment (mostly on 1.4, with some on 3.x), the API did not
exist.

The information about this API in the reference guide looks pretty nice.

> PS:  What do you mean by "XY problem"?

This is summarized here:

https://home.apache.org/~hossman/#xyproblem

Thanks,
Shawn


RE: Manage schema.xml via Solrj?

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Bob,

XY problem means that you are presenting the imagined solution without presenting the problem to solve.   In other words, you are presenting X (solve for X), without a full statement of the equation to be solved for X.

My guess is that your problem is the same as mine: editing Solr configuration (schema and solrconfig.xml) as files is very flexible and Agile compared to a form-based solution, but it comes with the downside that anyone can "crash" a Solr collection by editing the schema wrong.   This goes beyond just XML syntax checking, obviously.   But only Solr is the authority on what a good schema (and other configuration) should look like.

I'm working on a tool that can provide a bit of "smoke testing" on a Solr configuration directory.   The workflow I envision is like this:

1. DEVELOPER, TEAM LEAD, or SOLR ADMIN MAKE CHANGES TO CONFIGURATION DIRECTORY

     In the beginning, they may need to make lots of changes.   Eventually, they are only making small changes, but we don't want those
     small changes to crash anything.

2. DEVELOPER, TEAM LEAD, or SOLR ADMIN TRIGGER CONTINUOUS INTEGRATION

     When they push or merge to a git branch,  that may trigger a CI workflow.   The workflow works like this:

         2a.  Run the "smoke test" tool to (a) create a temporary configset in ZooKeeper, (b) create a temporary collection in SolrCloud, and (c) do simple indexing.
         2b.  Use zkCli.sh and solr.sh to update the actual configset and collection in SolrCloud.

3. ITERATE

     This can happen again and again with a "staging", "QA", "Production" set of branches.    Other checks can be put into the CI workflow as well.
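A minimal sketch of how step 2a might be scripted in Java against the Collections API. It assumes Java 11+, a candidate configset that has already been uploaded to ZooKeeper (for example with `zkcli.sh -cmd upconfig`), and hypothetical names throughout (`SmokeTestUrls`, the base URL, the collection and configset names). It only builds the request URLs; sending them and failing the build on a bad response is left to the CI job.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper that builds the Collections API calls for a throwaway
// "smoke test" collection. It does not contact Solr itself.
final class SmokeTestUrls {
    private final String solrBase; // e.g. "http://localhost:8983/solr" (assumption)

    SmokeTestUrls(String solrBase) {
        this.solrBase = solrBase;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // (b) Create a temporary one-shard collection bound to the uploaded configset.
    URI createCollection(String collection, String configName) {
        return URI.create(solrBase + "/admin/collections?action=CREATE"
                + "&name=" + enc(collection)
                + "&numShards=1&replicationFactor=1"
                + "&collection.configName=" + enc(configName));
    }

    // (c) Update endpoint for indexing a few sample documents, committing immediately.
    URI update(String collection) {
        return URI.create(solrBase + "/" + enc(collection) + "/update?commit=true");
    }

    // Remove the temporary collection once the checks are done.
    URI deleteCollection(String collection) {
        return URI.create(solrBase + "/admin/collections?action=DELETE&name=" + enc(collection));
    }

    // The calls in the order a CI job would send them.
    List<URI> plan(String collection, String configName) {
        return List.of(createCollection(collection, configName),
                update(collection),
                deleteCollection(collection));
    }
}
```

If the CREATE call or the sample indexing fails, the candidate configuration is rejected before step 2b ever touches the real collection.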

So, along the way to having this vision (of my solution), I also considered the advantage of schemaless systems.   I don't want to throw stones, but I think schemaless is mostly a marketing term for a couple of reasons:

 - I do Linked Data/RDF - it is different from SQL, but not schemaless.   If your "vocabulary" is badly designed, then your users will have problems.
 - ElasticSearch is not really schemaless.   Any ElasticSearch conference is filled with tracks/sessions on how to get your "field mappings" right, and what happens if you don't (too big indexes, need to re-index to fix stuff, etc.)
 - IBM Watson Explorer is not really schemaless - your update document has to specify the type and treatment of each field, or your XSLT must transform your document into a structure that does so.

Many of us have also seen what happens with denormalized SQL or fully normalized SQL.   "Schemafull" ought to be a marketing term as well.


Re: Manage schema.xml via Solrj?

Posted by Bob Lawson <bw...@gmail.com>.
Thanks for the replies.  The problem I'm trying to solve is to automate
whatever steps I can in configuring Solr for our customer.  Rather than
having an admin edit schema.xml, I thought it would be easier and less
error-prone to do it programmatically.  But I'm a novice, so if there is a
better, more standard way, please let me know.  Thanks!!!

PS:  What do you mean by "XY problem"?


Re: Manage schema.xml via Solrj?

Posted by Erick Erickson <er...@gmail.com>.
I'd first ask what high-level problem you're trying to solve; this
could be an XY problem.

That said, there's the Schema API you can use, see:
https://cwiki.apache.org/confluence/display/solr/Schema+API

You can access it from the SolrJ library, see
SchemaRequest.java. For examples of using this, see:
SchemaTest.java
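For reading the schema, a minimal sketch without SolrJ, assuming Java 11+ (`java.net.http`) and a hypothetical collection named `mycollection` on a local Solr. It builds a GET request for the Schema API's `/schema/fields` endpoint, which is roughly what SolrJ's `SchemaRequest.Fields` wraps; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical read-only Schema API helper built on the JDK HTTP client.
final class SchemaReader {
    // GET <solrBase>/<collection>/schema/fields lists the explicitly defined fields.
    static HttpRequest listFields(String solrBase, String collection) {
        return HttpRequest.newBuilder(URI.create(solrBase + "/" + collection + "/schema/fields?wt=json"))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body; non-200 responses become exceptions.
    static String fetch(HttpClient client, HttpRequest req) throws IOException, InterruptedException {
        HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IOException("Schema API returned HTTP " + resp.statusCode());
        }
        return resp.body();
    }
}
```

Usage would be `SchemaReader.fetch(HttpClient.newHttpClient(), SchemaReader.listFields("http://localhost:8983/solr", "mycollection"))` against a running Solr.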

To _get_ the Solr source code so you can look at these, see:
https://wiki.apache.org/solr/HowToContribute

Best,
Erick


Re: Manage schema.xml via Solrj?

Posted by Binoy Dalal <bi...@gmail.com>.
I am not sure about SolrJ, but you can use any XML parsing library to
achieve this.
Take a look here:
http://www.tutorialspoint.com/java_xml/java_xml_parsers.htm
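A minimal sketch of that approach using only the JDK's DOM and Transformer APIs. The class name, the sample schema string, and appending the new `<field>` directly under the root `<schema>` element are assumptions for illustration. Solr does not validate a file edited this way until the core is reloaded, and a `managed-schema` file that Solr rewrites itself should not be edited like this.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: parse schema XML, append a <field>, re-serialize.
final class SchemaXmlEditor {
    static String addField(String schemaXml, String name, String type, boolean stored) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaXml)));

        // Build <field name="..." type="..." indexed="true" stored="..."/>.
        Element field = doc.createElement("field");
        field.setAttribute("name", name);
        field.setAttribute("type", type);
        field.setAttribute("indexed", "true");
        field.setAttribute("stored", Boolean.toString(stored));
        doc.getDocumentElement().appendChild(field); // append under the root <schema>

        // Serialize the modified DOM back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```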

Regards,
Binoy Dalal

Re: Manage schema.xml via Solrj?

Posted by Bob Lawson <bw...@gmail.com>.
Thank you all so much for your responses.  Very helpful indeed!



Re: Manage schema.xml via Solrj?

Posted by Erick Erickson <er...@gmail.com>.
First, Daniel nailed the XY problem, but this isn't that...

You're correct that hand-editing the schema file is error-prone.
The managed schema API is your friend here. There are
several commercial front-ends that already do this.

The managed schema API is all just HTTP, so there's nothing
precluding a Java program from interpreting a form and sending
off the proper HTTP requests to modify the schema.
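A minimal sketch of that idea with the JDK's own HTTP client (Java 11+), no SolrJ required. It POSTs an `add-field` command as JSON to `/solr/<collection>/schema`. The base URL, the collection name `mycollection`, and the class name are hypothetical, and the string-built JSON assumes field and type names that need no escaping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical write-side Schema API helper built on the JDK HTTP client.
final class SchemaApiClient {
    // JSON body for the Schema API "add-field" command.
    // Assumes name and type contain no characters that need JSON escaping.
    static String addFieldBody(String name, String type, boolean stored) {
        return "{\"add-field\":{\"name\":\"" + name + "\",\"type\":\"" + type
                + "\",\"stored\":" + stored + "}}";
    }

    // POST <solrBase>/<collection>/schema with the command as application/json.
    static HttpRequest addField(String solrBase, String collection, String name, String type, boolean stored) {
        return HttpRequest.newBuilder(URI.create(solrBase + "/" + collection + "/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(addFieldBody(name, type, stored)))
                .build();
    }

    // Sends the request to a running Solr and returns the HTTP status code.
    static int send(HttpRequest req) throws Exception {
        HttpResponse<String> resp = HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }
}
```

If you would rather not hand-build JSON, SolrJ's `SchemaRequest.AddField`, which takes a `Map<String, Object>` of field attributes, wraps the same command.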

The SolrJ client library has some sugar around this; there's no
reason you can't use it, as it's just a jar (plus a dependency on
a logging jar).

For SolrCloud it's a little different. You need to make sure your
changes get to ZooKeeper, which the Schema API will handle
for you.

One thing that's a bit confusing is the difference between "managed
schema" and "schemaless". They both use the same underlying mechanism
to modify the schema file. With "managed schema" you do
what you're talking about: have some process where you make
specific modifications with the Schema API to a controlled
schema file.

"schemaless" automatically tries to guess what the schema
_should_ be and uses the managed schema API to implement
those guesses.

GW:
Schema guessing is a great way to get things started, but virtually
every organization I work with takes explicit control of the schema.
They do this for three reasons:
1> the assumptions in the guessed schema create indexes that can be
made much smaller by judicious options on the fields.
2> the search cases require careful analysis chains.
3> the guesses can be wrong. E.g. if the first number encountered in a
field is, say, 3, the guessing says "Oh, this is an int field". If the
next doc has 3.4, you'll get a parsing error and fail to index the doc.
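To make point 3 concrete, a toy Java model of first-value type guessing. This is NOT how Solr implements schemaless mode (that is driven by update processors configured in solrconfig.xml, and its type mappings differ); `NaiveGuesser` is hypothetical and only reproduces the failure described above.

```java
// Hypothetical toy model: pick a field type from the first value seen,
// then check whether later values still fit that type.
final class NaiveGuesser {
    static String guessType(String firstValue) {
        try {
            Integer.parseInt(firstValue);
            return "int";
        } catch (NumberFormatException e) {
            // not an integer; try a floating-point number next
        }
        try {
            Double.parseDouble(firstValue);
            return "double";
        } catch (NumberFormatException e) {
            return "string";
        }
    }

    // True if the value can be parsed as the (already guessed) field type.
    static boolean accepts(String fieldType, String value) {
        try {
            switch (fieldType) {
                case "int": Integer.parseInt(value); return true;
                case "double": Double.parseDouble(value); return true;
                default: return true; // strings accept anything
            }
        } catch (NumberFormatException e) {
            return false; // the "parsing error" that makes the doc fail to index
        }
    }
}
```

Guessing from "3" yields `int`, after which "3.4" is rejected; declaring the field as a floating-point type up front avoids the problem, which is why explicit schemas win here.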


Best,
Erick


Re: Manage schema.xml via Solrj?

Posted by GW <th...@gmail.com>.
Bob,

Not sure why you would want to do this. You can set up Solr to guess the
schema. It creates a file called managed-schema as an override. This is
the case with 5.3; I came across it by accident when setting it up the
first time, and I was a little annoyed, but it made for a quick setup.
Your programming would still need to account for the new doc structure
and use it. The only problem is that the guesswork is a bit generic, and
I did not spend much time testing it out, so I am not really versed in
operating it. I got myself back to schema.xml ASAP. My thought is that
you are looking at a lot of work for little gain.

Best,

GW


