Posted to solr-user@lucene.apache.org by Stephen Lewis <sl...@panopto.com> on 2016/06/21 02:01:38 UTC

Updating solr schema for a collection in place

Hello,

I've recently set up a solr cloud using solr 6.0, and I've been having some
trouble getting our collections to pick up schema updates. Following the docs
on zkcli.sh
<https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files>
and
the collections API
<https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2>,
I have uploaded the new schema by placing it onto a solr node at
/opt/solr/server/configsets/my_collection/conf/schema.xml and running

/opt/solr/cloud-scripts/zkcli.sh \
  -zkhost zkdns.foo.bar \
  -cmd upconfig \
  -confname my_collection \
  -confdir /opt/solr/server/configsets/my_collection/conf

and then triggering a reload of the collection by hitting

solrNodeDns.foo.bar:<PORT>/solr/admin/collections?action=RELOAD&name=my_collection

The action reports success.
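
For readers who script these two steps, here is a minimal Python sketch of the same upload-then-reload sequence. The function names, the `wt=json` parameter, and the `http://` scheme are assumptions added for illustration; the zkcli.sh path, its flags, and the RELOAD action are the ones quoted above.

```python
import subprocess
import urllib.parse
import urllib.request


def upconfig_argv(zk_host, conf_name, conf_dir,
                  zkcli="/opt/solr/cloud-scripts/zkcli.sh"):
    """Build the zkcli.sh argument list that uploads a config directory."""
    return [zkcli, "-zkhost", zk_host, "-cmd", "upconfig",
            "-confname", conf_name, "-confdir", conf_dir]


def reload_url(solr_host, port, collection):
    """Build the Collections API URL that reloads one collection."""
    query = urllib.parse.urlencode(
        {"action": "RELOAD", "name": collection, "wt": "json"})
    return f"http://{solr_host}:{port}/solr/admin/collections?{query}"


def upload_and_reload(zk_host, solr_host, port, collection, conf_dir):
    """Upload the config set, then ask Solr to reload the collection."""
    # check=True raises if zkcli.sh exits non-zero, so a failed upload
    # never reaches the reload step.
    subprocess.run(upconfig_argv(zk_host, collection, conf_dir), check=True)
    with urllib.request.urlopen(reload_url(solr_host, port, collection)) as resp:
        return resp.read().decode("utf-8")
```

A successful RELOAD only means the cores reloaded their configuration; it does not by itself show which schema file they read.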

Afterwards, however, I see something kind of strange. If I go to the admin
page and look at the schema in /~cloud?view=tree, the updated schema is
present. However, when I go to the collection's admin page and click on
schema, I do not see the new fields present. Querying for them directly also
continues to lead to 400 "bad request" responses, suggesting that the new
schema hasn't been picked up anywhere else either.
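
The mismatch described above (file updated in ZooKeeper, fields absent from the live schema) can also be checked programmatically. The following is a sketch only, assuming the Schema API's `/schema/fields` endpoint is reachable on the node; the helper names and the example field name are hypothetical.

```python
import json
import urllib.request


def fields_url(solr_host, port, collection):
    """URL of the Schema API listing of fields for one collection."""
    return f"http://{solr_host}:{port}/solr/{collection}/schema/fields?wt=json"


def missing_fields(response_text, expected_names):
    """Return the expected field names that the live schema does not contain."""
    live = {f["name"] for f in json.loads(response_text).get("fields", [])}
    return sorted(set(expected_names) - live)


def check_live_schema(solr_host, port, collection, expected_names):
    """Fetch the live field list and report which expected fields are absent."""
    with urllib.request.urlopen(fields_url(solr_host, port, collection)) as resp:
        return missing_fields(resp.read().decode("utf-8"), expected_names)
```

An empty list from `check_live_schema` would mean every expected field is live; a non-empty list after a successful reload points at the cores reading a different schema source than the file that was uploaded.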

Is there another step that I am missing to complete the update? I found
this Stack Overflow
<http://stackoverflow.com/questions/36714077/solr-reload-is-not-picking-up-the-latest-changes-from-zookeeper>
post where the poster is advised to recreate each core, though this seems
like the wrong way to go to me. Any advice you have is appreciated.

A few more notes about the cluster: I am running solr 6.0 in solr-cloud
mode with freestanding zookeeper machines running zk 3.4.6.

Thanks!

Stephen

stephen-lewis.net

Re: Updating solr schema for a collection in place

Posted by Erick Erickson <er...@gmail.com>.
Well, if it works... Changing schema factories should be fine, assuming
you've configured things correctly before the reload, i.e. pointed at the
right schema file, etc. My comments were more about changing the schema
rather than the config.

Best,
Erick

Re: Updating solr schema for a collection in place

Posted by Stephen Lewis <sl...@panopto.com>.
Oh, also I see when I first replied, I missed addressing this:

> For instance,
> having a field defined with docValues set to false, indexing some data
> then changing that field to docValues="true" and indexing some more data
> will give you "interesting" results.


The way we update our data model is to run fields in parallel as we migrate
them through a "rebuild" we do in the background. For any update requiring
in-place updates of fields (which we've yet to do), we would have stood up a
parallel cloud and run a data migration. (We weren't actually 100% sure this
would be strictly necessary.) If I understand you right, we could use the
managed schema factory
<https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig>
and perform an atomic field update to the schema like this in place. That
could make testing and migration even quicker for us. I think in prod we
would be able to make good use of this too, though we would probably still
want to run a parallel cloud in isolation while doing this update if there
were a risk of delayed write throughput or a heavy performance hit at peak
time.
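
As an illustration of what such an in-place field update could look like with a managed schema, here is a sketch that builds a Schema API `replace-field` request. The helper name, the `http://` scheme, and the example field definition are assumptions; `replace-field` expects the complete new definition of the field, and it changes only the schema, not data that is already indexed.

```python
import json
import urllib.request


def replace_field_request(solr_host, port, collection, field):
    """Build (but do not send) a Schema API request that replaces one field.

    `field` is the full new definition, e.g. name, type, stored, docValues.
    """
    url = f"http://{solr_host}:{port}/solr/{collection}/schema"
    body = json.dumps({"replace-field": field}).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST")


# Hypothetical field definition, for illustration only.
example = replace_field_request(
    "solrNodeDns.foo.bar", 8983, "my_collection",
    {"name": "title_sort", "type": "string", "stored": True, "docValues": True})
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`. Documents indexed before the change keep their old on-disk form, so the docValues caution quoted above still applies until they are reindexed.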

In my test environment noodling, I noticed that even when using a managed
schema, I could update the solrconfig.xml through a reload. Is it generally
safe to switch between schema factories through schema reloads, or is this
getting on the "cavalier" side of things? :)




-- 
Stephen

(206)753-9320
stephen-lewis.net

Re: Updating solr schema for a collection in place

Posted by Stephen Lewis <sl...@panopto.com>.
Thanks for the advice! I haven't encountered those nuances yet so it's
great to be aware of them now.

I manage our solr clouds through an OO python package which models our
search stack. We use this package to deploy to stacks which are isolated and
configurable, but otherwise identical. We push config updates first to our
test environment for a test pass, next to our production clouds in parallel,
and finally we flight them to users. It's been a pretty good system so far,
and generally I haven't had many issues using solr 6.0. We were using 4.9
until relatively recently, and we did have some troubles with the
collections API. In those cases, we resolved them by recreating the
collection. So far 6.0 seems to hum along gracefully as we use the API.

Thanks again for letting us know to keep a sharp eye on the details and to
be on the lookout for interesting behavior :)

Best,
Stephen




-- 
Stephen

(206)753-9320
stephen-lewis.net

Re: Updating solr schema for a collection in place

Posted by Erick Erickson <er...@gmail.com>.
Glad you found the issue. The switch to managed has tripped up
more people than just you!

Do be a little cautious about changing the schema however. There
are some "benign" changes you can do when you already have data
indexed and a series of others that are not benign. For instance,
having a field defined with docValues set to false, indexing some data
then changing that field to docValues="true" and indexing some more data
will give you "interesting" results.

Other operations, like adding new fieldTypes or new fields, are entirely
benign.

Mostly, this is just a caution that if you are changing your schema
and find results wonky (e.g. facet counts not correct, docs not being found
when you change stemming, etc.), consider deleting/recreating the
collection before tearing your hair out.
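
One way to apply this caution before pushing a schema change is to separate newly added fields from changes to fields that already exist. The sketch below is a hypothetical helper, not part of Solr; it compares two lists of field definitions (dicts with at least a "name" key, as the Schema API returns them) and only flags candidates for a closer look.

```python
def classify_field_changes(old_fields, new_fields):
    """Split a schema change into added fields and modified existing fields."""
    old = {f["name"]: f for f in old_fields}
    added, modified = [], []
    for field in new_fields:
        before = old.get(field["name"])
        if before is None:
            # New field: the benign case described above.
            added.append(field["name"])
        elif before != field:
            # Existing field whose definition changed (e.g. docValues):
            # already-indexed data may no longer match the schema.
            modified.append(field["name"])
    return {"added": sorted(added), "modified": sorted(modified)}
```

Anything listed under "modified" is where reindexing, or deleting and recreating the collection, is worth considering.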

Best,
Erick


Re: Updating solr schema for a collection in place

Posted by Stephen Lewis <sl...@panopto.com>.
I'm happy to say I figured out the issue. Looking through previous
questions in this forum, I was able to find someone hitting the same issue
I was. After upgrading versions, we unintentionally switched to the managed
schema factory instead of the ClassicIndexSchemaFactory. Sorry for the bother!
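
A quick way to see which schema factory a config set will use is to inspect its solrconfig.xml. This is a minimal sketch with a hypothetical helper name; it assumes that a solrconfig.xml with no `<schemaFactory>` element falls back to ManagedIndexSchemaFactory, which is the documented behavior as of Solr 6.0, and in that mode the cores read the managed-schema file rather than an uploaded schema.xml.

```python
import xml.etree.ElementTree as ET


def schema_factory(solrconfig_xml):
    """Return the schemaFactory class a solrconfig.xml document selects."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find("schemaFactory")  # direct child of <config>
    if node is None:
        # Assumption: no explicit element means the managed schema default.
        return "ManagedIndexSchemaFactory"
    return node.get("class", "")


classic = '<config><schemaFactory class="ClassicIndexSchemaFactory"/></config>'
implicit = "<config><luceneMatchVersion>6.0.0</luceneMatchVersion></config>"
print(schema_factory(classic))
print(schema_factory(implicit))
```

If the second case applies after an upgrade, declaring ClassicIndexSchemaFactory explicitly in solrconfig.xml is the setting that makes schema.xml the file Solr reads again.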




-- 
Stephen

(206)753-9320
stephen-lewis.net