Posted to users@solr.apache.org by jerome moliere <je...@javaxpert.com> on 2021/05/14 21:12:17 UTC

Migrating huge files volume

Hi all,
I would like to know the best practices for migrating a mission-critical Solr
cluster from version 6.x to 8.x without any service shutdown.
Is the double upgrade (6.x to 7.x, then 7.x to 8.x) mandatory?
We can introduce another temporary group of machines to join the cluster,
then stop the existing nodes one by one once the upgrade is done.
In this case we have different indexes and 2 replicas per shard.

Is it possible to run 2 versions of Solr concurrently in the same
cluster?
Of course a full reindexing seems to be required; that is not a problem, as we
have enough time (the upgrade procedure does not have to be finished in a
short time).
I would like a summary of the available options, the pros & cons of each
one, and if possible a few tips & tricks to avoid pitfalls.

If you have good pointers to share, or any feedback from production
upgrades, they will be welcome.
Thanks for your help.
Kind regards

Re: Migrating huge files volume

Posted by matthew sporleder <ms...@gmail.com>.
With the (lack of) constraints mentioned here I think the easiest path
is a new cluster, full re-index, and then swap over your clients.

Re: Migrating huge files volume

Posted by Dwane Hall <dw...@hotmail.com>.
Hey Jerome,

Are you using SolrCloud?  If so, you'll require a reindex, as you hint in your message (Solr committers recommend reindexing across major version jumps anyway).  From memory, you need to be on at least a 7.x version of Solr to be able to upgrade to 8.x.  The newer 8.x releases (from about 8.5, from memory) also use a newer ZooKeeper version, so you may hit incompatibilities that prevent you from adding new nodes to the existing cluster with your 6.x nodes (or from using another znode in the existing environment).

If possible, can you add your new 8.x nodes using another (newer) ZooKeeper instance alongside your existing infrastructure, and then switch your users over to the new environment when the reindexing process is complete?  Your new "temporary" group of machines would then become your primaries and your old 6.x nodes would become redundant.  If your clients access Solr via a proxy/load balancer (which is recommended best practice) and access your collections via a collection alias, then the switchover can appear seamless: a call to the Collections API reroutes your users to the new Solr 8.x environment.

i.e.  DELETEALIAS on the old Solr 6.x instance, then CREATEALIAS with the same name on the new 8.x instance (millisecond response times)
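
The alias switch described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the host names, the alias name `products`, and the collection name `products_v8` are hypothetical. The Collections API action that creates an alias is `CREATEALIAS`, and the one that removes it is `DELETEALIAS`. The helper only builds the request URLs, so they can be reviewed before anything is sent to either cluster.

```python
from urllib.parse import urlencode


def alias_url(base_url, action, name, collections=None):
    """Build a Collections API URL for an alias action (no request is sent)."""
    params = {"action": action, "name": name}
    if collections:
        params["collections"] = ",".join(collections)
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical endpoints and names, for illustration only.
OLD_SOLR = "http://solr6.example.com:8983/solr"
NEW_SOLR = "http://solr8.example.com:8983/solr"

drop_old = alias_url(OLD_SOLR, "DELETEALIAS", "products")
create_new = alias_url(NEW_SOLR, "CREATEALIAS", "products", ["products_v8"])

print(drop_old)
print(create_new)
```

Since the two clusters are separate, creating the alias on the 8.x side first and repointing the load balancer afterwards may be the safer order; that ordering is a suggestion, not something stated in this thread.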

With the upgrade there's been a jump in Java versions from 8 to 11 and a change in the default garbage collection algorithm from CMS to G1.  A separate, dedicated 8.x environment would let you test these changes in isolation and prevent any production surprises; I've occasionally seen comments on this list from users reporting differences in resource consumption on their hosts after making this jump.

If you're using some of the Trie... fields in your schema, they've been deprecated in more recent versions of Solr, and some of the default settings (docValues) have changed, so you'll want to confirm your schema is compatible with Solr 8.x.

Solr publishes a good list of upgrade notes with every release, so it's worth working through that list as well; there's more than I can explicitly call out, and I don't know which features of Solr you're using (facets, nested docs, etc.): https://solr.apache.org/guide/8_8/solr-upgrade-notes.html
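
As a starting point for the schema check mentioned above, the sketch below scans a schema file for Trie-based field types and lists the Point-based class that usually replaces each one. The sample schema is hypothetical, and the scan assumes the `<fieldType>` spelling; very old schemas sometimes use lowercase `<fieldtype>`, which this sketch would miss.

```python
import xml.etree.ElementTree as ET

# Trie classes and the Point-based classes that replace them.
TRIE_TO_POINT = {
    "solr.TrieIntField": "solr.IntPointField",
    "solr.TrieLongField": "solr.LongPointField",
    "solr.TrieFloatField": "solr.FloatPointField",
    "solr.TrieDoubleField": "solr.DoublePointField",
    "solr.TrieDateField": "solr.DatePointField",
}


def find_trie_field_types(schema_xml):
    """Return (fieldType name, Trie class, suggested Point class) tuples."""
    root = ET.fromstring(schema_xml)
    hits = []
    for field_type in root.iter("fieldType"):
        cls = field_type.get("class", "")
        if cls in TRIE_TO_POINT:
            hits.append((field_type.get("name"), cls, TRIE_TO_POINT[cls]))
    return hits


# Hypothetical schema fragment, for illustration only.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"/>
</schema>
"""

for name, old, new in find_trie_field_types(SAMPLE_SCHEMA):
    print(f"{name}: {old} -> consider {new}")
```

Changing a field's class requires a reindex, which is already planned here, so the new cluster's schema can be switched to Point fields before any documents are loaded.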

Every Solr environment is different so it's recommended you run tests under your own environmental conditions to confirm the behaviour is as you expect it to be.

This is how I've typically managed upgrades in the past.  If you're not using SolrCloud, disregard my advice; that's where all of my Solr experience lies.

Cheers,

Dwane

Re: Migrating huge files volume

Posted by SF G <sf...@yahoo.com.INVALID>.

Re: Migrating huge files volume

Posted by matthew sporleder <ms...@gmail.com>.
Ensure your new cluster has a unique ZooKeeper chroot.

Re: Migrating huge files volume

Posted by jerome moliere <je...@javaxpert.com>.
Hi guys (Shawn, Dwane & others),
thanks for your support.
My problem is tricky, so I will try to summarize the constraints:
- existing Solr with a ZooKeeper cluster: I cannot migrate the ZK ensemble
separately because it is used by other applications (Kafka ...), or
I would have to change the current config to switch to the new version
- currently running version 6.x on JDK 8 on RH 6.2 (?); the target config is
OpenJDK 11 on RH 8.x
- 2 indexes and different shards (the number depends on the target
environment/volume)
- we have 4 environments to migrate: dev/staging (no important data here)
and production + another (less important but customer-facing)
- production cluster: because of the SLA the maximum downtime is 4
hours, and I am sure that is not sufficient to reindex more than 1
billion documents
- for the moment I don't know the schema used, but I will surely have access
to the old version soon
- SolrJ clients to be migrated too (customers would prefer not to upgrade
these clients, but it seems to be mandatory judging by the docs & your
feedback)
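
A back-of-the-envelope check of the 4-hour constraint above: the sketch below computes the sustained indexing rate that would be needed to reindex inside the SLA window. The 1 billion figure is the lower bound given in the list; the actual throughput of the cluster is unknown and would have to be measured.

```python
def required_docs_per_second(total_docs, window_hours):
    """Average indexing rate needed to finish within the window."""
    return total_docs / (window_hours * 3600)


TOTAL_DOCS = 1_000_000_000  # "more than 1 billion" from the post, lower bound
WINDOW_HOURS = 4            # maximum downtime allowed by the SLA

rate = required_docs_per_second(TOTAL_DOCS, WINDOW_HOURS)
print(f"{rate:,.0f} docs/s sustained")  # about 69,444 docs/s
```

A rate of roughly 69,000 documents per second with no pauses is the bar an in-place reindex would have to clear, which supports building the new cluster alongside the old one so the reindex happens outside any downtime window.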

On my todo list I have already written some points:
- build the new cluster independently from the existing one
- try to use the same ZK ensemble for both clusters (is there any problem
with a recent ZK on JDK 11 alongside the old Solr 6 running on JDK 8?)
- migrate the config files
- migrate the JVM config: as mentioned by Dwane, of course we'll try to
get the best options offered by the new JVM (ZGC or G1; we'll need some
benchmarks to choose the best option)

then reindex the full document list...

So the switch from one cluster to the other will only be a network trick,
routing traffic from one cluster to the other...

Your expert advice strengthens my idea, but I fear I am forgetting an important
step.
You have already helped me sketch an approximate procedure.

Thanks again for your support.



-- 
J.MOLIERE - Mentor/J

Re: Migrating huge files volume

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/14/2021 3:12 PM, jerome moliere wrote:
> I would like to know the best practices to migrate a mission critical Solr
> cluster from version 6.x to 8.x without any service shutdown.
> Is the double upgrade required (mandatory) ?
> We can introduce another temporary group of machines to join the cluster,
> then we may stop the existing nodes one by one once the upgrade is done...
> In this case we have different indexes and 2 replicas per shard...

If you have an index that has ever been touched by a 6.x version, you
can't use it in 8.x, even if you first upgrade it with a 7.x version.
It will fail.  Reindexing is required.  Jumping more than one major
version was iffy and not recommended before 6.x; now it is explicitly
enforced as not possible.  The enforcement is done by Lucene.

> Is it possible to have 2 versions of Solr running concurrently in the same
> cluster?

It would not be recommended at all to run two different major versions 
in one cluster.  I am assuming SolrCloud here, where the nodes are 
always talking to each other.

> Of course a full reindexing seems to be required , there is no problem we
> have enough time ( upgrade procedure is not required to be finished in a
> short time).
> I would like to have a summary of options available , pros & cons for each
> one and if possible a few tips & tricks to avoid some pitfalls...

Build the new cluster separately from the old one.  They could share the
same ZooKeeper ensemble by setting up the new cluster with a different
chroot.
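
To illustrate the chroot suggestion above: both clusters list the same ZooKeeper hosts, and only the path suffix differs. In this sketch the host names and the chroot names `/solr6` and `/solr8` are hypothetical. The chroot appears once, after the last host, not after each one.

```python
def zk_connect_string(hosts, chroot):
    """Join ZooKeeper host:port pairs and append a single chroot path."""
    if not chroot.startswith("/"):
        chroot = "/" + chroot
    return ",".join(hosts) + chroot.rstrip("/")


# Hypothetical ensemble and chroot names, for illustration only.
ENSEMBLE = ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"]

old_cluster = zk_connect_string(ENSEMBLE, "/solr6")
new_cluster = zk_connect_string(ENSEMBLE, "/solr8")
print(new_cluster)
```

The resulting string is what each cluster would use as its ZooKeeper connection setting (for example `ZK_HOST` in `solr.in.sh`). The chroot znode has to exist before the new nodes start.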

Create new config files for the indexes with the new version examples as 
starting points.

Build the indexes from scratch on the new cluster.

Switch the URLs in your application to point to the new cluster.
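
Before switching the URLs, one simple sanity check is to compare document counts between the two clusters. The sketch below is an assumption about how that check might look, not part of the procedure described in this thread: it parses the JSON body of a `/select?q=*:*&rows=0` response from each side and compares `numFound`. The response bodies shown are made up.

```python
import json


def num_found(select_response_json):
    """Extract numFound from a Solr /select JSON response body."""
    return json.loads(select_response_json)["response"]["numFound"]


def counts_match(old_body, new_body):
    """True when both clusters report the same document count."""
    return num_found(old_body) == num_found(new_body)


# Hypothetical response bodies, as returned for q=*:*&rows=0 in JSON format.
OLD_BODY = '{"responseHeader": {"status": 0}, "response": {"numFound": 1200, "start": 0, "docs": []}}'
NEW_BODY = '{"responseHeader": {"status": 0}, "response": {"numFound": 1200, "start": 0, "docs": []}}'

print("safe to switch" if counts_match(OLD_BODY, NEW_BODY) else "counts differ")
```

Matching counts do not prove the indexes are equivalent, especially if the old cluster keeps receiving updates during the reindex, so this is a minimum check only.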

Thanks,
Shawn