Posted to solr-user@lucene.apache.org by Abhishek Mishra <so...@gmail.com> on 2021/01/14 11:46:28 UTC

Re: solrcloud with EKS kubernetes

Hi Jonathan,
It was really helpful. Some of the metrics, such as network bandwidth, were
crossing their thresholds.

Regards,
Abhishek

On Sat, Dec 26, 2020 at 7:54 PM Jonathan Tan <jt...@gmail.com> wrote:

> Hi Abhishek,
>
> Merry Christmas to you too!
> I think it's really a question of your indexing-speed NFRs (non-functional requirements).
>
> Have you had a chance to take a look at your IOPS & write bytes/second
> graphs for that host & PVC?
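If the provider's graphs aren't handy, a rough cross-check on a Linux host is to sample /proc/diskstats twice and diff the counters. This is a minimal sketch, assuming the standard diskstats column layout (4th column = reads completed, 8th = writes completed, 10th = sectors written, always in 512-byte units); `parse_diskstats` and `disk_rates` are hypothetical helper names, not part of any Solr or Kubernetes tooling.

```python
from typing import Dict, Tuple

# /proc/diskstats counts sectors in 512-byte units regardless of the device's real sector size.
SECTOR_BYTES = 512


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map device name -> (reads completed, writes completed, sectors written)."""
    stats: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        # columns: major, minor, name, reads, reads merged, sectors read, ms reading,
        #          writes, writes merged, sectors written, ...
        stats[fields[2]] = (int(fields[3]), int(fields[7]), int(fields[9]))
    return stats


def disk_rates(before: str, after: str, interval_s: float, device: str) -> Tuple[float, float]:
    """Return (total IOPS, write bytes/second) for `device` between two snapshots."""
    reads0, writes0, sect0 = parse_diskstats(before)[device]
    reads1, writes1, sect1 = parse_diskstats(after)[device]
    iops = ((reads1 - reads0) + (writes1 - writes0)) / interval_s
    write_bps = (sect1 - sect0) * SECTOR_BYTES / interval_s
    return iops, write_bps
```

To use it, read /proc/diskstats, wait a few seconds, read it again, and pass both strings plus the interval and the name of the block device backing the volume. Averaged over a few seconds it will hide short spikes, so the provider's graphs remain the better long-term view.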
>
> I'd suggest that's the first thing to go look at, so that you can find out
> whether you're actually IOPS bound or not.
> If you are, then it becomes a question of *how* you're indexing, and
> whether that can be "slowed down" or not.
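One way to "slow down" indexing, and the kind of rate limiting asked about in the Dec 24 message, is a client-side token bucket in front of the indexer. This is a minimal sketch, assuming a Python indexing client that sends documents in batches; `TokenBucket` and `throttled_batches` are hypothetical names, and the actual call to Solr is left to the caller rather than guessing at a client API.

```python
import time
from typing import Callable, Iterator, List, Sequence


class TokenBucket:
    """Allow roughly `rate` documents per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so the first batch goes out immediately
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, n: float) -> None:
        """Block until n tokens are available, then consume them."""
        if n > self.capacity:
            raise ValueError("batch larger than bucket capacity")
        self._refill()
        while self.tokens < n:
            self.sleep((n - self.tokens) / self.rate)
            self._refill()
        self.tokens -= n


def throttled_batches(docs: Sequence[dict], batch_size: int,
                      bucket: TokenBucket) -> Iterator[List[dict]]:
    """Yield batches of docs no faster than the bucket allows."""
    for start in range(0, len(docs), batch_size):
        batch = list(docs[start:start + batch_size])
        bucket.acquire(len(batch))
        yield batch  # the caller sends each batch to Solr's update handler
```

For example, `TokenBucket(rate=500, capacity=1000)` would hold a steady stream to roughly 500 docs/second while allowing a single burst of 1000. The rate is best tuned against the IOPS graphs rather than guessed.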
>
>
>
> On Thu, Dec 24, 2020 at 5:55 PM Abhishek Mishra <so...@gmail.com>
> wrote:
>
> > Hi Jonathan,
> > Merry Christmas.
> > Thanks for the suggestion. To manage IOPS, can we do something on the
> > rate-limiting side?
> >
> > Regards,
> > Abhishek
> >
> >
> > On Thu, Dec 17, 2020 at 5:07 AM Jonathan Tan <jt...@gmail.com> wrote:
> >
> > > Hi Abhishek,
> > >
> > > We're running SolrCloud 8.6 on GKE.
> > > It's a 3-node cluster, with 4 CPUs (configured) and 8GB min & max JVM heap
> > > configured, all with anti-affinity so they never exist on the same node.
> > > It's got 2 collections of ~13documents each, 6 shards, 3 replicas each;
> > > disk usage on each node is ~54GB (we've got all the shards replicated to
> > > all nodes).
> > >
> > > We're also using a 200GB zonal SSD, which *has* been necessary just so that
> > > we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS for
> > > read & write each, and 96MB/s for read & write each.)
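Those figures line up with GCP's per-GB scaling for zonal SSD persistent disks, assuming the rates published at the time of 30 IOPS per GB and 0.48 MB/s per GB (for reads and writes each). Per-VM caps that depend on vCPU count can lower the effective limit, so the current GCP docs are worth checking. A sketch of the arithmetic:

```python
from typing import Tuple

# Assumed per-GB rates for GCP zonal SSD persistent disk (pd-ssd); verify against current docs.
PD_SSD_IOPS_PER_GB = 30
PD_SSD_MBPS_PER_GB = 0.48


def pd_ssd_limits(size_gb: int) -> Tuple[int, float]:
    """Return (IOPS, MB/s) for reads (and, separately, writes) on a pd-ssd of size_gb."""
    return size_gb * PD_SSD_IOPS_PER_GB, size_gb * PD_SSD_MBPS_PER_GB


iops, mbps = pd_ssd_limits(200)
print(f"200GB pd-ssd: ~{iops} IOPS, ~{mbps:.0f} MB/s")
```

Because the limits scale with size, provisioning a larger disk than the data needs is a common way to buy IOPS on this disk type.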
> > >
> > > Various lessons learnt...
> > > You definitely don't want them ever on the same Kubernetes node. From a
> > > resilience perspective, yes, but also when one Solr node gets busy, they
> > > tend to all get busy, so now you'll have resource contention. Recovery can
> > > also get very busy and resource intensive, and again, sitting on the same
> > > node is problematic. We also saw the need to move to SSDs because of how
> > > IOPS bound we were.
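In Kubernetes terms, "never on the same node" is a required pod anti-affinity rule keyed on the node hostname. A sketch of that `affinity` block for the Solr pod template, built as a Python dict and printed as JSON (the equivalent YAML works the same), assuming the Solr pods carry a hypothetical `app: solr` label:

```python
import json


def solr_pod_anti_affinity(app_label: str = "solr") -> dict:
    """Pod `affinity` block that forbids two Solr pods on the same Kubernetes node."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    # The node hostname is the topology domain: at most one Solr pod per node.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


print(json.dumps({"affinity": solr_pod_anti_affinity()}, indent=2))
```

The "required" form leaves a pod Pending if there are fewer nodes than Solr pods; `preferredDuringSchedulingIgnoredDuringExecution` (entries with a `weight` and a `podAffinityTerm`) is the softer alternative if that trade-off isn't acceptable.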
> > >
> > > Did I mention use SSDs? ;)
> > >
> > > Good luck!
> > >
> > > On Mon, Dec 14, 2020 at 5:34 PM Abhishek Mishra <so...@gmail.com>
> > > wrote:
> > >
> > > > Hi Houston,
> > > > Sorry for the late reply. Each shard is around 9GB.
> > > > Yes, we are providing enough resources to the pods. We are currently
> > > > using c5.4xlarge instances.
> > > > Xms and Xmx are 16GB. The machine has 32GB and 16 cores.
> > > > No, I haven't run it outside Kubernetes, but I do have colleagues who did
> > > > the same on 7.2 and didn't face any issues.
> > > > The storage volume is gp2, 50GB.
> > > > It's not search queries where we are facing inconsistencies or timeouts.
> > > > Some internal admin APIs seem to have issues sometimes, so adding a new
> > > > replica to the cluster sometimes results in inconsistencies. For example,
> > > > recovery sometimes takes more than an hour.
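For comparison with the SSD numbers, gp2 performance also scales with size. Assuming AWS's documented gp2 model (a baseline of 3 IOPS per GiB with a 100 IOPS floor, and bursts up to 3,000 IOPS funded by an initial balance of 5.4 million I/O credits for volumes below 1,000 GiB), a 50GB volume has a baseline of only 150 IOPS and can sustain a burst for roughly half an hour from a full credit balance. That may or may not matter for recoveries running past an hour; the IOPS graphs would confirm it. A sketch of the arithmetic, with hypothetical function names:

```python
from typing import Optional

# Assumed gp2 parameters from AWS's EBS documentation; verify against current docs.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP2_BURST_IOPS = 3_000
GP2_INITIAL_CREDITS = 5_400_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS: 3 per GiB, clamped to the [100, 16000] range."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_burst_seconds(size_gib: int, credits: float = GP2_INITIAL_CREDITS) -> Optional[float]:
    """Seconds of sustained 3,000 IOPS from `credits`; None if the baseline already reaches it."""
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None
    # Credits drain at (burst - baseline) per second, since the baseline refills them.
    return credits / (GP2_BURST_IOPS - baseline)


print(gp2_baseline_iops(50), round(gp2_burst_seconds(50) / 60, 1))
```

Once the credits run out, the volume falls back to its baseline until they refill, which is one reason a long, I/O-heavy recovery can slow down partway through.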
> > > >
> > > > Regards,
> > > > Abhishek
> > > >
> > > > On Thu, Dec 10, 2020 at 10:23 AM Houston Putman <houstonputman@gmail.com> wrote:
> > > >
> > > > > Hello Abhishek,
> > > > >
> > > > > It's really hard to provide any advice without knowing any information
> > > > > about your setup/usage.
> > > > >
> > > > > Are you giving your Solr pods enough resources on EKS?
> > > > > Have you run Solr in the same configuration outside of Kubernetes in the
> > > > > past without timeouts?
> > > > > What type of storage volumes are you using to store your data?
> > > > > Are you using headless services to connect your Solr nodes, or ingresses?
> > > > >
> > > > > If this is the first time that you are using this data + Solr
> > > > > configuration, maybe it's just that your data within Solr isn't optimized
> > > > > for the type of queries that you are doing.
> > > > > If you have run it successfully in the past outside of Kubernetes, then I
> > > > > would look at the resources that you are giving your pods and the storage
> > > > > volumes that you are using.
> > > > > If you are using Ingresses, that might be causing slow connections between
> > > > > nodes, or between your client and Solr.
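For reference, a headless Service is an ordinary Service with `clusterIP: None`; paired with a StatefulSet's `serviceName`, it gives each Solr pod its own stable DNS name, so nodes reach each other directly rather than through an ingress. A sketch of such a manifest as a Python dict, assuming hypothetical names (`solr-headless`, label `app: solr`) and Solr's default port 8983:

```python
import json


def solr_headless_service(name: str = "solr-headless", app_label: str = "solr") -> dict:
    """Service manifest with no cluster IP, so DNS resolves straight to the Solr pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",  # headless: no virtual IP and no kube-proxy load balancing
            "selector": {"app": app_label},
            "ports": [{"name": "solr", "port": 8983, "targetPort": 8983}],
        },
    }


print(json.dumps(solr_headless_service(), indent=2))
```

`kubectl apply -f` accepts the printed JSON as-is; the StatefulSet's `serviceName` would then need to match the Service name.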
> > > > >
> > > > > - Houston
> > > > >
> > > > > On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra <solrmishra@gmail.com> wrote:
> > > > >
> > > > > > Hello guys,
> > > > > > We are facing some intermittent issues, such as timeouts. Could
> > > > > > they be related to EKS? We are using Solr 7.7 and ZooKeeper 3.4.13.
> > > > > > Should we move to ECS?
> > > > > >
> > > > > > Regards,
> > > > > > Abhishek
> > > > > >
> > > > >
> > > >
> > >
> >
>