Posted to solr-user@lucene.apache.org by xavi jmlucjav <jm...@gmail.com> on 2016/09/25 10:24:48 UTC

issue transplanting standalone core into solrcloud (plus upgrade)

Hi,

I have an existing 3.6 standalone installation. It has to be moved to
Solrcloud 6.1.0. Reindexing is not an option, so I did the following:

- Use IndexUpgrader to upgrade 3.6 -> 4.4 -> 5.5. I did not upgrade to 6.X
as 5.5 should be readable by 6.x
- Install solrcloud 6.1 cluster
- modify schema/solrconfig for cloud support (add _version_, tlog etc)
- follow the method mentioned here
http://lucene.472066.n3.nabble.com/Copy-existing-index-from-standalone-Solr-to-Solr-cloud-td4149920.html
I did not find any other doc on how to transplant a standalone core into
solrcloud
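
The stepwise upgrade in the first bullet can be sketched as a small Python helper that builds one IndexUpgrader command line per Lucene release. This is only a sketch under assumptions: the jar paths and the index directory are hypothetical placeholders, and the 5.5 step is assumed to need the lucene-backward-codecs jar on the classpath to read the 4.x index. Run it against a copy of the index, never the only copy.

```python
import os

UPGRADER_CLASS = "org.apache.lucene.index.IndexUpgrader"


def upgrade_commands(index_dir, classpaths):
    """Return one `java -cp ... IndexUpgrader <dir>` argv list per classpath, in order."""
    return [["java", "-cp", cp, UPGRADER_CLASS, index_dir] for cp in classpaths]


# Hypothetical jar locations, one classpath per upgrade step (3.6 -> 4.4 -> 5.5).
STEPS = [
    "lucene-4.4.0/lucene-core-4.4.0.jar",
    os.pathsep.join([
        "lucene-5.5.0/lucene-core-5.5.0.jar",
        "lucene-5.5.0/lucene-backward-codecs-5.5.0.jar",  # assumed necessary for 4.x segments
    ]),
]

for argv in upgrade_commands("/path/to/copy/of/data/index", STEPS):
    # To execute instead of print: subprocess.run(argv, check=True)
    print(" ".join(argv))
```

Note that this only rewrites segments into a newer file format; it does not change how any field's values are encoded.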

Everything went well: no errors when Solr restarted, and the collection shows
the right number of docs. But when I try to run a query, I get:

null:java.lang.NullPointerException
    at org.apache.lucene.util.LegacyNumericUtils.prefixCodedToLong(LegacyNumericUtils.java:189)
    at org.apache.solr.schema.TrieField.toObject(TrieField.java:155)
    at org.apache.solr.schema.TrieField.write(TrieField.java:324)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:133)
    at org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:345)
    at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:249)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:151)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
    at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:731)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)

I was wondering how the nonexistence of the _version_ field would be
handled, but the thread above suggested it would work.
Can anyone shed some light?

thanks

Re: issue transplanting standalone core into solrcloud (plus upgrade)

Posted by xavi jmlucjav <jm...@gmail.com>.
I guess there is no other way than to reindex:
- of course, not all fields are stored; that would have been too easy
- it might (??) work if, as Jan says, I build a custom Solr version with the
removed IntField classes added back, but going down this rabbit hole sounds too
risky and too much work, and I am not sure it would eventually work, especially
considering the last point:
- I did not get any response to this, but my understanding now is that you
cannot take a standalone Solr core's data directory (without a _version_ field)
and put it into a SolrCloud setup, as _version_ is needed.

xavier


Re: issue transplanting standalone core into solrcloud (plus upgrade)

Posted by Jan Høydahl <jh...@cominvent.com>.
If all the fields in your current schema have stored="true", you can try to export
the full index to an XML file which can then be imported into 6.1.
If some fields are not stored you will only be able to recover the inverted index
representation of that data, which may not be enough to recreate the original
data (or in some cases maybe it is enough).

If you share a copy of your old schema.xml we may be able to help.
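
One way to check the "all fields stored" precondition before attempting an export is to scan the old schema.xml. The sketch below is a rough illustration, not a Solr tool: it assumes a field without its own stored attribute inherits the value from its fieldType and otherwise defaults to true, and it ignores dynamicField and copyField declarations. The sample schema is made up.

```python
import xml.etree.ElementTree as ET


def non_stored_fields(schema_xml):
    """Return names of <field> entries whose effective stored value is not true."""
    root = ET.fromstring(schema_xml)
    # Collect per-type stored settings; old schemas sometimes spell the tag <fieldtype>.
    type_stored = {}
    for tag in ("fieldType", "fieldtype"):
        for ft in root.iter(tag):
            type_stored[ft.get("name")] = ft.get("stored")
    missing = []
    for field in root.iter("field"):
        # Assumed resolution order: field attribute, then fieldType attribute, then true.
        stored = field.get("stored") or type_stored.get(field.get("type")) or "true"
        if stored.lower() != "true":
            missing.append(field.get("name"))
    return missing


# Made-up schema for illustration only.
SAMPLE = """
<schema name="example" version="1.4">
  <types>
    <fieldType name="int" class="solr.IntField"/>
    <fieldType name="text" class="solr.TextField" stored="false"/>
  </types>
  <fields>
    <field name="id" type="int" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true"/>
    <field name="price" type="int" indexed="true" stored="false"/>
  </fields>
</schema>
"""

print(non_stored_fields(SAMPLE))  # ['body', 'price']
```

Any field this reports cannot be recovered verbatim from stored values; only its indexed terms remain.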

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: issue transplanting standalone core into solrcloud (plus upgrade)

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/26/2016 6:28 AM, xavi jmlucjav wrote:
> Yes, I had to change some fields, basically to use TrieIntField etc
> instead
> of the old IntField. I was assuming by using the IndexUpgrader to upgrade
> the data to 6.1, the older IntField would work with the new TrieIntField.
> But I have tried loading the upgraded data into a standalone 6.1 and I am
> hitting the same issue, so this is not related to _version_ field (more on
> that below). Forget about solrcloud for now, having an old 3.6 index,
> should it be possible to use IndexUpgrader and load it on 6.1? How would
> one need to handle IntFields etc?

The only option when you change the class on a field in your schema is
to wipe the index and rebuild it.  TrieIntField uses a completely
different on-disk data format than IntField did.  The two formats simply
aren't compatible.  This is not a bug, it's a fundamental fact of Lucene
indexes.

Lucene doesn't use a schema -- that's a Solr concept.  IndexUpgrader is
a Lucene program that doesn't know what kind of data each field
contains, it just reaches down into the old index format, grabs the
internal data in each field, and copies it to a new index using the new
format.  The internal data must still be consistent with the Lucene
program for the index to work in a new version.  When you're running
Solr, it uses the schema to know how to read the index.
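
A toy analogy in Python may make this concrete. It does not reproduce Lucene's actual encodings (both decoders below are stand-ins chosen for illustration); it only shows that the same bytes yield different results, or an error, depending on which decoder the reader assumes.

```python
import struct


def read_as_text_int(raw: bytes) -> int:
    """Decoder A: the value was written as ASCII digits."""
    return int(raw.decode("ascii"))


def read_as_binary_int(raw: bytes) -> int:
    """Decoder B: the value is expected as a 4-byte big-endian integer."""
    return struct.unpack(">i", raw)[0]


raw = b"1234"                    # bytes written by a "text" style field
print(read_as_text_int(raw))     # 1234, the intended value
print(read_as_binary_int(raw))   # 825373492, same bytes under the wrong decoder

try:
    read_as_binary_int(b"42")    # wrong length for decoder B
except struct.error as exc:
    print("decode failed:", exc)
```

Copying the bytes into a newer container, which is roughly what an upgrader does, changes nothing about which decoder is the right one.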

In 5.x and 6.x, IntField does not exist, and attempting to read that
data using TrieIntField will not work.

The luceneMatchVersion setting in solrconfig.xml can cause certain
components (tokenizers and filters mainly) to revert to old behavior in
the previous major version.  Version 6.x doesn't hold onto behavior from
3.x and 4.x -- it can only revert behavior back to 5.x versions.

The luceneMatchVersion setting cannot bring back removed classes like
IntField, and it does NOT affect the on-disk index format.

Your particular situation will require a full reindex.  It is not
possible to upgrade an index using those old class types.

Thanks,
Shawn


Re: issue transplanting standalone core into solrcloud (plus upgrade)

Posted by Jan Høydahl <ja...@cominvent.com>.
Better to keep your old schema unchanged if you want to use an old index. The upgrader does not change field types for you. If the old IntField does not exist in 6.x, you're out of luck; you could try to build a custom version with the old field types as add-ons.

Sent from my iPhone


Re: issue transplanting standalone core into solrcloud (plus upgrade)

Posted by xavi jmlucjav <jm...@gmail.com>.
Hi Shawn/Jan,

On Sun, Sep 25, 2016 at 6:18 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 9/25/2016 4:24 AM, xavi jmlucjav wrote:
> > Everything went well, no errors when solr restarted, the collections
> shows
> > the right number of docs. But when I try to run a query, I get:
> >
> > null:java.lang.NullPointerException
>
> Did you change any of the fieldType class values as you adjusted the
> schema for the upgrade?  A number of classes that were valid and
> deprecated in 3.6 and 4.x were completely removed by 5.x, and 6.x
> probably removed a few more.
>

Yes, I had to change some fields, basically to use TrieIntField etc instead
of the old IntField. I was assuming by using the IndexUpgrader to upgrade
the data to 6.1, the older IntField would work with the new TrieIntField.
But I have tried loading the upgraded data into a standalone 6.1 and I am
hitting the same issue, so this is not related to _version_ field (more on
that below). Forget about solrcloud for now, having an old 3.6 index,
should it be possible to use IndexUpgrader and load it on 6.1? How would
one need to handle IntFields etc?



>
> If you did make changes like this to your schema, then what's in the
> index will no longer match the schema, and the *only* option is a
> reindex.  Exceptions are likely if you don't reindex after schema
> changes to the class value(s) or the index analyzer(s).
>
> Regarding the _version_ field:  SolrCloud expects this field to be in
> your schema.  It might also expect that every document in the index
> will already contain a value in this field.  Adding _version_ to your
> schema should be treated similarly to the changes mentioned above -- a
> reindex is required for proper operation.
>
> Even if the schema didn't change in a way that *requires* a reindex ...
> the number of changes to the analysis components across three major
> version jumps is quite large.  Solr might not work as expected because
> of those changes unless you reindex, even if you don't see any
> exceptions.  Changes to your schema because of changes in analysis
> component behavior might  be required -- which is another situation that
> usually requires a reindex.
>
> Because of these potential problems, I always start a new Solr version
> with no index data and completely rebuild my indexes after an upgrade.
> That is the best way to ensure success.
>

I am totally aware of all the advantages of reindexing, sure. And that is
what I always do; this time, though, it seems the original data is not
available...


> You referenced a mailing list thread where somebody had success
> converting non-cloud to cloud... but that was on version 4.8.1, two
> major versions back from the version you're running.  They also did not
> upgrade major versions -- from some things they said at the beginning of
> the thread, I know that the source version was at least 4.4.  The thread
> didn't mention any schema changes, either.
>
> If the schema doesn't change at all, moving from non-cloud to cloud is
> very possible, but if the schema changes, the index data might not match
> the schema any more, and that situation will not work.
>
> Since you jumped three major versions, it's almost guaranteed that your
> schema *did* change, and the changes may have been more extensive than
> just adding the _version_ field.
>
> It's possible that there's a problem when converting a non-cloud install
> with no _version_ field to a cloud install where the only schema change
> is adding the _version_ field.  We can treat THAT situation as a bug,
> but if there are other schema changes besides adding _version_, the
> exception you encountered is most likely not a bug.
>


There are two orthogonal issues here:
A. Moving to SolrCloud from standalone without reindexing, and without
having a _version_ field already indexed, of course. Is this even possible?
From the thread above, I understood it was possible, but you say that
SolrCloud expects _version_ to be there, with values, which makes this
move impossible without reindexing. This should be made clear
somewhere in the docs. I understand it is not a frequent scenario, but it will
be a deal breaker when it happens. So far the only thing I found is the
aforementioned thread, which, if I am not misreading it, makes it sound as if
it will work OK.

B. Upgrading from a very old 3.6 version to 6.1 without reindexing: it
seems like I am hitting an issue with this first. Even if this were
resolved, I would not be able to achieve my goal due to A, but it would be good
to know how to get this done too, if possible.

Jan: I tried tweaking luceneMatchVersion too, no luck though.
xavier


>
> Thanks,
> Shawn
>
>

Re: issue transplanting standalone core into solrcloud (plus upgrade)

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/25/2016 4:24 AM, xavi jmlucjav wrote:
> Everything went well, no errors when solr restarted, the collections shows
> the right number of docs. But when I try to run a query, I get:
>
> null:java.lang.NullPointerException

Did you change any of the fieldType class values as you adjusted the
schema for the upgrade?  A number of classes that were valid and
deprecated in 3.6 and 4.x were completely removed by 5.x, and 6.x
probably removed a few more.

If you did make changes like this to your schema, then what's in the
index will no longer match the schema, and the *only* option is a
reindex.  Exceptions are likely if you don't reindex after schema
changes to the class value(s) or the index analyzer(s).
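
A quick way to answer the question above for an old schema is to list every fieldType whose class is one of the removed legacy types. The sketch below is an illustration only; the class list is written from memory of the Solr 5.0 release notes and should be treated as an assumption to verify against the notes for your target version. The sample schema is made up.

```python
import xml.etree.ElementTree as ET

# Assumed list of legacy classes removed by 5.0; verify against the release notes.
LEGACY_CLASSES = {
    "solr.ByteField", "solr.ShortField", "solr.IntField", "solr.LongField",
    "solr.FloatField", "solr.DoubleField", "solr.DateField",
    "solr.SortableIntField", "solr.SortableLongField",
    "solr.SortableFloatField", "solr.SortableDoubleField",
}


def legacy_field_types(schema_xml):
    """Map fieldType name -> class for every type that uses a legacy class."""
    root = ET.fromstring(schema_xml)
    hits = {}
    for tag in ("fieldType", "fieldtype"):  # both spellings occur in old schemas
        for ft in root.iter(tag):
            if ft.get("class") in LEGACY_CLASSES:
                hits[ft.get("name")] = ft.get("class")
    return hits


# Made-up schema for illustration only.
SAMPLE = """
<schema name="old" version="1.4">
  <types>
    <fieldType name="int" class="solr.IntField" omitNorms="true"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"/>
    <fieldtype name="date" class="solr.DateField"/>
  </types>
</schema>
"""

print(legacy_field_types(SAMPLE))  # {'int': 'solr.IntField', 'date': 'solr.DateField'}
```

Any type this reports would need a class change to load in a newer release, and per the reasoning in this message a class change means the existing index data no longer matches the schema.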

Regarding the _version_ field:  SolrCloud expects this field to be in
your schema.  It might also expect that every document in the index
will already contain a value in this field.  Adding _version_ to your
schema should be treated similarly to the changes mentioned above -- a
reindex is required for proper operation.

Even if the schema didn't change in a way that *requires* a reindex ...
the number of changes to the analysis components across three major
version jumps is quite large.  Solr might not work as expected because
of those changes unless you reindex, even if you don't see any
exceptions.  Changes to your schema because of changes in analysis
component behavior might  be required -- which is another situation that
usually requires a reindex.

Because of these potential problems, I always start a new Solr version
with no index data and completely rebuild my indexes after an upgrade. 
That is the best way to ensure success.

You referenced a mailing list thread where somebody had success
converting non-cloud to cloud... but that was on version 4.8.1, two
major versions back from the version you're running.  They also did not
upgrade major versions -- from some things they said at the beginning of
the thread, I know that the source version was at least 4.4.  The thread
didn't mention any schema changes, either.

If the schema doesn't change at all, moving from non-cloud to cloud is
very possible, but if the schema changes, the index data might not match
the schema any more, and that situation will not work.

Since you jumped three major versions, it's almost guaranteed that your
schema *did* change, and the changes may have been more extensive than
just adding the _version_ field.

It's possible that there's a problem when converting a non-cloud install
with no _version_ field to a cloud install where the only schema change
is adding the _version_ field.  We can treat THAT situation as a bug,
but if there are other schema changes besides adding _version_, the
exception you encountered is most likely not a bug.

Thanks,
Shawn


Re: issue transplanting standalone core into solrcloud (plus upgrade)

Posted by Jan Høydahl <ja...@cominvent.com>.
Did you change the <luceneMatchVersion> tag in your solrconfig.xml?
You could try to let it stay at 3.6 and let compatibility mode kick in where applicable.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
