
Posted to users@solr.apache.org by "Tomer Y." <to...@mesuvag.com> on 2021/10/14 07:06:25 UTC

Can Solr 8.10 S3BackupRepository work without a shared NFS drive?

Hello,

This is the first time I'm sending a message to this user list. Any help will be
appreciated; we're also open to (paid) consultancy.

We are looking to deploy SolrCloud 8.10 into an EKS cluster.
Normally, you'd need a shared volume between all Solr nodes, because every
node/pod needs access to the data being restored. This can be solved using
any NFS (EFS or File Gateway), or by replicating an EBS volume once per node
in the cluster and attaching one copy to each node.

My question is whether it's possible, using the S3BackupRepository, to skip
the need for EFS/File Gateway and have each Solr node communicate
directly with S3.

If the answer is yes, a follow-up question: our backup is about 5TB.
Does this mean that each node in the cluster will need to fetch
5TB from S3?

Thank you

Re: Can Solr 8.10 S3BackupRepository work without a shared NFS drive?

Posted by Jason Gerlowski <ge...@gmail.com>.

To your second question: no.

Solr's restore process works by sending a message to each shard leader
to fetch and restore the data in that shard. Shard leaders fetch this
data from the backup repository (S3 in this case) and then send
copies of it to any other replicas in the shard.
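The restore flow just described can be modeled as a list of transfers. This is a hypothetical illustration, not Solr code: the pod names are made up, and the first replica listed for each shard is simply assumed to be that shard's leader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    source: str
    dest: str
    shard: str


def restore_transfers(replicas_by_shard: dict[str, list[str]]) -> list[Transfer]:
    """Simplified model of a restore: the leader pulls from S3, followers copy from the leader.

    Assumption (hypothetical): the first replica in each list is the shard leader.
    """
    transfers = []
    for shard, replicas in replicas_by_shard.items():
        leader, *followers = replicas
        # Only the shard leader reads from the backup repository (S3 here).
        transfers.append(Transfer("S3", leader, shard))
        # Every other replica gets its copy from the leader, not from S3.
        for follower in followers:
            transfers.append(Transfer(leader, follower, shard))
    return transfers


plan = restore_transfers({
    "shard1": ["pod-a", "pod-b"],
    "shard2": ["pod-c", "pod-d"],
    "shard3": ["pod-e", "pod-f"],
})
for t in plan:
    print(f"{t.shard}: {t.source} -> {t.dest}")
```

In this model no pod ever needs the whole backup: each one either reads its own shard from S3 or receives it from its leader.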

To use a concrete example: let's say your 5TB collection has three
shards with 2 replicas per shard, each replica on its own Kube pod.

Each shard leader will pull its share of the 5TB backup from S3 (I
guess that'd be ~1.67TB on each of 3 different Solr replicas). Once each
shard leader has the data, it sends its ~1.67TB to the other replicas
in the shard it's responsible for. So total network traffic for this
layout would be 5TB from S3 and 5TB between Solr nodes within the
cluster. With 3 replicas per shard, each leader would send its ~1.67TB
to two other replicas, so there would be 10TB of traffic between Solr
nodes, and so on.
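The arithmetic above can be written as a small helper. This is a sketch of the upper bound for a full (non-incremental) restore under the same assumptions: data is split evenly across shards, each leader downloads its shard once, and each leader sends a full copy to every other replica in its shard. The function name is hypothetical.

```python
def restore_traffic_tb(total_tb: float, num_shards: int, replicas_per_shard: int) -> tuple[float, float]:
    """Upper-bound traffic for a full restore: (S3 -> Solr TB, Solr -> Solr TB)."""
    per_shard = total_tb / num_shards
    # Each shard leader downloads its own shard from S3 exactly once.
    from_s3 = per_shard * num_shards
    # Each leader then sends its shard to the other (replicas_per_shard - 1) replicas.
    between_nodes = per_shard * (replicas_per_shard - 1) * num_shards
    return from_s3, between_nodes


for rf in (2, 3):
    s3, intra = restore_traffic_tb(5.0, num_shards=3, replicas_per_shard=rf)
    print(f"{rf} replicas/shard: {s3:.2f} TB from S3, {intra:.2f} TB between Solr nodes")
# 2 replicas/shard: 5.00 TB from S3, 5.00 TB between Solr nodes
# 3 replicas/shard: 5.00 TB from S3, 10.00 TB between Solr nodes
```

Note that the S3 figure does not depend on the replica count; only the Solr-to-Solr traffic grows with it.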

To complicate the picture a bit: these are upper bounds on
the amount of network traffic that would occur. Starting in Solr 8.9,
backups are smart enough to fetch data only incrementally. So unless
your restoration target is totally empty, it should see much less
Solr<-->S3 and Solr<-->Solr traffic. (The actual amount depends
on how similar the current index is to the backed-up copy.)
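A rough way to picture the incremental case: only files the target does not already hold need to move. The sketch below is a simplified model of that idea, not Solr's implementation; it assumes a file can be skipped when the target already has a file with the same name, size and checksum (Lucene index files are write-once, which is what makes such a comparison meaningful). File names, sizes and checksums are made up.

```python
def files_to_fetch(backup: dict[str, tuple[int, str]],
                   local: dict[str, tuple[int, str]]) -> dict[str, int]:
    """Return {file name: size} for backup files missing or different on the target.

    Both maps are {file name: (size, checksum)}; a file is treated as already
    present only if name, size and checksum all match (simplified assumption).
    """
    return {
        name: size
        for name, (size, checksum) in backup.items()
        if local.get(name) != (size, checksum)
    }


# Hypothetical index contents; sizes are in GB for readability.
backup = {"_0.cfs": (400, "a1"), "_1.cfs": (300, "b2"), "segments_5": (1, "c3")}
local = {"_0.cfs": (400, "a1"), "segments_4": (1, "d4")}

fetch = files_to_fetch(backup, local)
print(fetch, sum(fetch.values()), "GB of", sum(s for s, _ in backup.values()), "GB")
```

An empty target (`local = {}`) reproduces the full-restore upper bound, since every file has to be fetched.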

Hope that helps!

Best,

Jason

On Mon, Nov 1, 2021 at 1:17 PM Houston Putman <ho...@gmail.com> wrote:
>
> To answer your first question, yes, the S3BackupRepository connects
> directly to S3. There is no need to have any shared storage. The next
> version of the Solr Operator (v0.5.0) will actually make this very easy to
> enable on Kubernetes clusters, such as EKS.
>
> I am not sure about the answer to your second question.
>
> - Houston
>

Re: Can Solr 8.10 S3BackupRepository work without a shared NFS drive?

Posted by Houston Putman <ho...@gmail.com>.

To answer your first question, yes, the S3BackupRepository connects
directly to S3. There is no need to have any shared storage. The next
version of the Solr Operator (v0.5.0) will actually make this very easy to
enable on Kubernetes clusters, such as EKS.
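For illustration, a restore that reads straight from S3 is just a Collections API RESTORE call that names the repository; no shared filesystem path is involved. This sketch only builds the request URL. It assumes a repository named `s3` has been declared in the `<backup>` section of solr.xml with class `org.apache.solr.s3.S3BackupRepository` (plus settings such as `s3.bucket.name` and `s3.region`), and that `location` is the path inside the bucket; the host, backup name and collection name are hypothetical.

```python
from urllib.parse import urlencode


def restore_url(solr_base: str, backup_name: str, collection: str,
                repository: str = "s3", location: str = "/") -> str:
    """Build a Collections API RESTORE request that reads from a named backup repository.

    Assumption: `repository` matches a <repository name="..."> entry in solr.xml.
    """
    params = {
        "action": "RESTORE",
        "name": backup_name,        # name of the backup to restore
        "collection": collection,   # collection to create from the backup
        "repository": repository,   # which configured backup repository to read from
        "location": location,       # assumed: path within the S3 bucket
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


print(restore_url("http://solr-0.solr-headless:8983/solr", "nightly", "products"))
```

Any node can receive this request; as described elsewhere in this thread, each shard leader then reads its own data from the bucket.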

I am not sure about the answer to your second question.

- Houston

