Posted to solr-user@lucene.apache.org by Rajdeep Sahoo <ra...@gmail.com> on 2020/01/17 18:43:33 UTC

Solr cloud production set up

Hi all,
We are using SolrCloud 7.7.1.
How many SolrCloud servers do we need in a live production environment?
Currently we are running a master/slave setup with 16 slave servers on Solr
4.6.
With SolrCloud, do we need to scale up, or will 16 servers be sufficient?

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Got your point.
If we think about the infrastructure, does SolrCloud need more of it than
master/slave?



On Sat, 18 Jan, 2020, 2:24 PM Jörn Franke, <jo...@gmail.com> wrote:

> I think you should do your own measurements. This is very document and
> processing specific.
> You can run a test with a simple setup for let’s say 1 mio document and
> interpolate from this. It could be also that your ETL is the bottleneck and
> not Solr.
> At the same time you can simulate user queries using Jmeter or similar.
>
> > Am 18.01.2020 um 09:05 schrieb Rajdeep Sahoo <rajdeepsahoo2012@gmail.com
> >:
> >
> > Our Index size is huge and in master slave the full indexing time is
> almost
> > 24 hrs.
> >   In future the no of documents will increase.
> > So,please some one recommend about the no of nodes and configuration like
> > ram and cpu core for solr cloud.
> >
> >> On Sat, 18 Jan, 2020, 8:05 AM Walter Underwood, <wu...@wunderwood.org>
> >> wrote:
> >>
> >> Why do you want to change to Solr Cloud? Master/slave is a great, stable
> >> cluster architecture.
> >>
> >> wunder
> >> Walter Underwood
> >> wunder@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo <rajdeepsahoo2012@gmail.com
> >
> >> wrote:
> >>>
> >>> Please reply anyone
> >>>
> >>> On Sat, 18 Jan, 2020, 12:13 AM Rajdeep Sahoo, <
> >> rajdeepsahoo2012@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>> We are using solr cloud 7.7.1
> >>>> In a live production environment how many solr cloud server do we
> need,
> >>>> Currently ,we are using master slave set up with 16 slave server with
> >>>> solr 4.6.
> >>>> In solr cloud do we need to scale it up or 16 server will suffice the
> >>>> purpose.
> >>>>
> >>>>
> >>
> >>
>

Re: Solr cloud production set up

Posted by Jörn Franke <jo...@gmail.com>.
I think you should do your own measurements. This depends heavily on your documents and your processing.
You can run a test with a simple setup for, let’s say, 1 million documents and interpolate from this. It could also be that your ETL is the bottleneck and not Solr.
At the same time you can simulate user queries using JMeter or similar.
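
As a rough illustration of the "measure a sample and interpolate" idea, here is a minimal Python sketch. It assumes indexing time scales linearly with document count, which may not hold for a real corpus; the function name and the sample numbers (1 million documents in 30 minutes, 2.3 million in total) are hypothetical.

```python
def estimate_full_index_hours(sample_docs, sample_seconds, total_docs):
    """Linearly scale a measured sample run up to the full corpus.

    Assumes throughput stays constant as the index grows, which is
    only a first approximation.
    """
    if sample_docs <= 0 or sample_seconds <= 0:
        raise ValueError("sample measurements must be positive")
    docs_per_second = sample_docs / sample_seconds
    return total_docs / docs_per_second / 3600.0


# Hypothetical measurement: 1,000,000 documents indexed in 30 minutes,
# extrapolated to a hypothetical corpus of 2,300,000 documents.
hours = estimate_full_index_hours(1_000_000, 30 * 60, 2_300_000)
print(f"estimated full index time: {hours:.2f} hours")
```

If the estimate is far below the observed full indexing time, that would point at something other than Solr, such as the ETL step, as the bottleneck.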

> Am 18.01.2020 um 09:05 schrieb Rajdeep Sahoo <ra...@gmail.com>:
> 
> Our Index size is huge and in master slave the full indexing time is almost
> 24 hrs.
>   In future the no of documents will increase.
> So,please some one recommend about the no of nodes and configuration like
> ram and cpu core for solr cloud.
> 
>> On Sat, 18 Jan, 2020, 8:05 AM Walter Underwood, <wu...@wunderwood.org>
>> wrote:
>> 
>> Why do you want to change to Solr Cloud? Master/slave is a great, stable
>> cluster architecture.
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo <ra...@gmail.com>
>> wrote:
>>> 
>>> Please reply anyone
>>> 
>>> On Sat, 18 Jan, 2020, 12:13 AM Rajdeep Sahoo, <
>> rajdeepsahoo2012@gmail.com>
>>> wrote:
>>> 
>>>> Hi all,
>>>> We are using solr cloud 7.7.1
>>>> In a live production environment how many solr cloud server do we need,
>>>> Currently ,we are using master slave set up with 16 slave server with
>>>> solr 4.6.
>>>> In solr cloud do we need to scale it up or 16 server will suffice the
>>>> purpose.
>>>> 
>>>> 
>> 
>> 

Re: Solr cloud production set up

Posted by Walter Underwood <wu...@wunderwood.org>.
How big? We index 35 million documents in about 6 hours.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 18, 2020, at 12:05 AM, Rajdeep Sahoo <ra...@gmail.com> wrote:
> 
> Our Index size is huge and in master slave the full indexing time is almost
> 24 hrs.
>   In future the no of documents will increase.
> So,please some one recommend about the no of nodes and configuration like
> ram and cpu core for solr cloud.
> 
> On Sat, 18 Jan, 2020, 8:05 AM Walter Underwood, <wu...@wunderwood.org>
> wrote:
> 
>> Why do you want to change to Solr Cloud? Master/slave is a great, stable
>> cluster architecture.
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo <ra...@gmail.com>
>> wrote:
>>> 
>>> Please reply anyone
>>> 
>>> On Sat, 18 Jan, 2020, 12:13 AM Rajdeep Sahoo, <
>> rajdeepsahoo2012@gmail.com>
>>> wrote:
>>> 
>>>> Hi all,
>>>> We are using solr cloud 7.7.1
>>>> In a live production environment how many solr cloud server do we need,
>>>> Currently ,we are using master slave set up with 16 slave server with
>>>> solr 4.6.
>>>> In solr cloud do we need to scale it up or 16 server will suffice the
>>>> purpose.
>>>> 
>>>> 
>> 
>> 


Re: Solr cloud production set up

Posted by Paras Lehana <pa...@indiamart.com>.
Hi Rajdeep,


   1. I assume you have enabled docValues for the facet fields, right?
   2. What do your GC logs tell you? Do you see freezes and CPU spikes at
   intervals?
   3. Caching will help with querying. I'll need to see a sample query of
   yours to recommend what you can tweak.


On Tue, 28 Jan 2020 at 19:09, Jason Gerlowski <ge...@gmail.com> wrote:

> Hi Rajdeep,
>
> Unfortunately it's near impossible for anyone here to tell you what
> parameters to tweak.  People might take guesses based on their
> individual past experience, but ultimately those are just guesses.
>
> There are just too many variables affecting Solr performance for
> anyone to have a good guess without access to the cluster itself and
> the time and will to dig into it.
>
> Are there GC params that need tweaking?  Very possible, but you'll
> have to look into your gc logs to see how much time is being spent in
> gc.  Are there query params you could be changing?  Very possible, but
> you'll have to identify the types of queries you're submitting and see
> whether the ref-guide offers any information on how to tweak
> performance for those particular qparsers, facets, etc.  Is the number
> of facets the reason for slow queries?  Very possible, but you'll have
> to turn faceting off or run debug=timing and see how what that tells
> you about the QTime's.
>
> Tuning Solr performance is a tough, time consuming process.  I wish
> there was an easier answer for you, but there's not.
>
> Best,
>
> Jason
>
> On Mon, Jan 20, 2020 at 12:06 PM Rajdeep Sahoo
> <ra...@gmail.com> wrote:
> >
> > Please suggest anyone
> >
> > On Sun, 19 Jan, 2020, 9:43 AM Rajdeep Sahoo, <rajdeepsahoo2012@gmail.com
> >
> > wrote:
> >
> > > Apart from reducing no of facets in the query, is there any other query
> > > params or gc params or heap space or anything else that we need to
> tweak
> > > for improving search response time.
> > >
> > > On Sun, 19 Jan, 2020, 3:15 AM Erick Erickson, <erickerickson@gmail.com
> >
> > > wrote:
> > >
> > >> Add &debug=timing to the query and it’ll show you the time each
> component
> > >> takes.
> > >>
> > >> > On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo <
> rajdeepsahoo2012@gmail.com>
> > >> wrote:
> > >> >
> > >> > Thanks for the suggestion,
> > >> >
> > >> > Is there any way to get the info which operation or which query
> params
> > >> are
> > >> > increasing the response time.
> > >> >
> > >> >
> > >> > On Sat, 18 Jan, 2020, 11:59 PM Dave, <ha...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> If you’re not getting values, don’t ask for the facet. Facets are
> > >> >> expensive as hell, maybe you should think more about your query’s
> than
> > >> your
> > >> >> infrastructure, solr cloud won’t help you at all especially if your
> > >> asking
> > >> >> for things you don’t need
> > >> >>
> > >> >>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <
> > >> rajdeepsahoo2012@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> We have assigned 16 gb out of 24gb for heap .
> > >> >>> No other process is running on that node.
> > >> >>>
> > >> >>> 200 facets fields are there in the query but we will not be
> getting
> > >> the
> > >> >>> values for each facets for every search.
> > >> >>> There can be max of 50-60 facets for which we will be getting
> values.
> > >> >>>
> > >> >>> We are using caching,is it not going to help.
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <
> apache@elyograg.org>
> > >> >> wrote:
> > >> >>>>
> > >> >>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> > >> >>>>> We are having 2.3 million documents and size is 2.5 gb.
> > >> >>>>>  10 core cpu and 24 gb ram . 16 slave nodes.
> > >> >>>>>
> > >> >>>>>  Still some of the queries are taking 50 sec at solr end.
> > >> >>>>> As we are using solr 4.6 .
> > >> >>>>>  Other thing is we are having 200 (avg) facet fields  in a
> query.
> > >> >>>>> And 30 searchable fields.
> > >> >>>>> Is there any way to identify why it is taking 50 sec for a
> query.
> > >> >>>>>    Multiple concurrent requests are there.
> > >> >>>>
> > >> >>>> Searching 30 fields and computing 200 facets is never going to be
> > >> super
> > >> >>>> fast.  Switching to cloud will not help, and might make it
> slower.
> > >> >>>>
> > >> >>>> Your index is pretty small to a lot of us.  There are people
> running
> > >> >>>> indexes with billions of documents that take terabytes of disk
> space.
> > >> >>>>
> > >> >>>> As Walter mentioned, computing 200 facets is going to require a
> fair
> > >> >>>> amount of heap memory.  One *possible* problem here is that the
> Solr
> > >> >>>> heap size is too small, so a lot of GC is required.  How much of
> the
> > >> >>>> 24GB have you assigned to the heap?  Is there any software other
> than
> > >> >>>> Solr running on these nodes?
> > >> >>>>
> > >> >>>> Thanks,
> > >> >>>> Shawn
> > >> >>>>
> > >> >>
> > >>
> > >>
>


-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*


Re: Solr cloud production set up

Posted by Jason Gerlowski <ge...@gmail.com>.
Hi Rajdeep,

Unfortunately it's near impossible for anyone here to tell you what
parameters to tweak.  People might take guesses based on their
individual past experience, but ultimately those are just guesses.

There are just too many variables affecting Solr performance for
anyone to have a good guess without access to the cluster itself and
the time and will to dig into it.

Are there GC params that need tweaking?  Very possible, but you'll
have to look into your gc logs to see how much time is being spent in
gc.  Are there query params you could be changing?  Very possible, but
you'll have to identify the types of queries you're submitting and see
whether the ref-guide offers any information on how to tweak
performance for those particular qparsers, facets, etc.  Is the number
of facets the reason for slow queries?  Very possible, but you'll have
to turn faceting off or run debug=timing and see what that tells
you about the QTimes.

Tuning Solr performance is a tough, time-consuming process.  I wish
there was an easier answer for you, but there isn't.

Best,

Jason

On Mon, Jan 20, 2020 at 12:06 PM Rajdeep Sahoo
<ra...@gmail.com> wrote:
>
> Please suggest anyone
>
> On Sun, 19 Jan, 2020, 9:43 AM Rajdeep Sahoo, <ra...@gmail.com>
> wrote:
>
> > Apart from reducing no of facets in the query, is there any other query
> > params or gc params or heap space or anything else that we need to tweak
> > for improving search response time.
> >
> > On Sun, 19 Jan, 2020, 3:15 AM Erick Erickson, <er...@gmail.com>
> > wrote:
> >
> >> Add &debug=timing to the query and it’ll show you the time each component
> >> takes.
> >>
> >> > On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo <ra...@gmail.com>
> >> wrote:
> >> >
> >> > Thanks for the suggestion,
> >> >
> >> > Is there any way to get the info which operation or which query params
> >> are
> >> > increasing the response time.
> >> >
> >> >
> >> > On Sat, 18 Jan, 2020, 11:59 PM Dave, <ha...@gmail.com>
> >> wrote:
> >> >
> >> >> If you’re not getting values, don’t ask for the facet. Facets are
> >> >> expensive as hell, maybe you should think more about your query’s than
> >> your
> >> >> infrastructure, solr cloud won’t help you at all especially if your
> >> asking
> >> >> for things you don’t need
> >> >>
> >> >>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <
> >> rajdeepsahoo2012@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> We have assigned 16 gb out of 24gb for heap .
> >> >>> No other process is running on that node.
> >> >>>
> >> >>> 200 facets fields are there in the query but we will not be getting
> >> the
> >> >>> values for each facets for every search.
> >> >>> There can be max of 50-60 facets for which we will be getting values.
> >> >>>
> >> >>> We are using caching,is it not going to help.
> >> >>>
> >> >>>
> >> >>>
> >> >>>> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <ap...@elyograg.org>
> >> >> wrote:
> >> >>>>
> >> >>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> >> >>>>> We are having 2.3 million documents and size is 2.5 gb.
> >> >>>>>  10 core cpu and 24 gb ram . 16 slave nodes.
> >> >>>>>
> >> >>>>>  Still some of the queries are taking 50 sec at solr end.
> >> >>>>> As we are using solr 4.6 .
> >> >>>>>  Other thing is we are having 200 (avg) facet fields  in a query.
> >> >>>>> And 30 searchable fields.
> >> >>>>> Is there any way to identify why it is taking 50 sec for a query.
> >> >>>>>    Multiple concurrent requests are there.
> >> >>>>
> >> >>>> Searching 30 fields and computing 200 facets is never going to be
> >> super
> >> >>>> fast.  Switching to cloud will not help, and might make it slower.
> >> >>>>
> >> >>>> Your index is pretty small to a lot of us.  There are people running
> >> >>>> indexes with billions of documents that take terabytes of disk space.
> >> >>>>
> >> >>>> As Walter mentioned, computing 200 facets is going to require a fair
> >> >>>> amount of heap memory.  One *possible* problem here is that the Solr
> >> >>>> heap size is too small, so a lot of GC is required.  How much of the
> >> >>>> 24GB have you assigned to the heap?  Is there any software other than
> >> >>>> Solr running on these nodes?
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Shawn
> >> >>>>
> >> >>
> >>
> >>

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Could anyone please suggest something?

On Sun, 19 Jan, 2020, 9:43 AM Rajdeep Sahoo, <ra...@gmail.com>
wrote:

> Apart from reducing no of facets in the query, is there any other query
> params or gc params or heap space or anything else that we need to tweak
> for improving search response time.
>
> On Sun, 19 Jan, 2020, 3:15 AM Erick Erickson, <er...@gmail.com>
> wrote:
>
>> Add &debug=timing to the query and it’ll show you the time each component
>> takes.
>>
>> > On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo <ra...@gmail.com>
>> wrote:
>> >
>> > Thanks for the suggestion,
>> >
>> > Is there any way to get the info which operation or which query params
>> are
>> > increasing the response time.
>> >
>> >
>> > On Sat, 18 Jan, 2020, 11:59 PM Dave, <ha...@gmail.com>
>> wrote:
>> >
>> >> If you’re not getting values, don’t ask for the facet. Facets are
>> >> expensive as hell, maybe you should think more about your query’s than
>> your
>> >> infrastructure, solr cloud won’t help you at all especially if your
>> asking
>> >> for things you don’t need
>> >>
>> >>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <
>> rajdeepsahoo2012@gmail.com>
>> >> wrote:
>> >>>
>> >>> We have assigned 16 gb out of 24gb for heap .
>> >>> No other process is running on that node.
>> >>>
>> >>> 200 facets fields are there in the query but we will not be getting
>> the
>> >>> values for each facets for every search.
>> >>> There can be max of 50-60 facets for which we will be getting values.
>> >>>
>> >>> We are using caching,is it not going to help.
>> >>>
>> >>>
>> >>>
>> >>>> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <ap...@elyograg.org>
>> >> wrote:
>> >>>>
>> >>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
>> >>>>> We are having 2.3 million documents and size is 2.5 gb.
>> >>>>>  10 core cpu and 24 gb ram . 16 slave nodes.
>> >>>>>
>> >>>>>  Still some of the queries are taking 50 sec at solr end.
>> >>>>> As we are using solr 4.6 .
>> >>>>>  Other thing is we are having 200 (avg) facet fields  in a query.
>> >>>>> And 30 searchable fields.
>> >>>>> Is there any way to identify why it is taking 50 sec for a query.
>> >>>>>    Multiple concurrent requests are there.
>> >>>>
>> >>>> Searching 30 fields and computing 200 facets is never going to be
>> super
>> >>>> fast.  Switching to cloud will not help, and might make it slower.
>> >>>>
>> >>>> Your index is pretty small to a lot of us.  There are people running
>> >>>> indexes with billions of documents that take terabytes of disk space.
>> >>>>
>> >>>> As Walter mentioned, computing 200 facets is going to require a fair
>> >>>> amount of heap memory.  One *possible* problem here is that the Solr
>> >>>> heap size is too small, so a lot of GC is required.  How much of the
>> >>>> 24GB have you assigned to the heap?  Is there any software other than
>> >>>> Solr running on these nodes?
>> >>>>
>> >>>> Thanks,
>> >>>> Shawn
>> >>>>
>> >>
>>
>>

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Apart from reducing the number of facets in the query, are there any other
query params, GC params, heap settings, or anything else that we need to
tweak to improve search response time?

On Sun, 19 Jan, 2020, 3:15 AM Erick Erickson, <er...@gmail.com>
wrote:

> Add &debug=timing to the query and it’ll show you the time each component
> takes.
>
> > On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo <ra...@gmail.com>
> wrote:
> >
> > Thanks for the suggestion,
> >
> > Is there any way to get the info which operation or which query params
> are
> > increasing the response time.
> >
> >
> > On Sat, 18 Jan, 2020, 11:59 PM Dave, <ha...@gmail.com>
> wrote:
> >
> >> If you’re not getting values, don’t ask for the facet. Facets are
> >> expensive as hell, maybe you should think more about your query’s than
> your
> >> infrastructure, solr cloud won’t help you at all especially if your
> asking
> >> for things you don’t need
> >>
> >>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <rajdeepsahoo2012@gmail.com
> >
> >> wrote:
> >>>
> >>> We have assigned 16 gb out of 24gb for heap .
> >>> No other process is running on that node.
> >>>
> >>> 200 facets fields are there in the query but we will not be getting the
> >>> values for each facets for every search.
> >>> There can be max of 50-60 facets for which we will be getting values.
> >>>
> >>> We are using caching,is it not going to help.
> >>>
> >>>
> >>>
> >>>> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <ap...@elyograg.org>
> >> wrote:
> >>>>
> >>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> >>>>> We are having 2.3 million documents and size is 2.5 gb.
> >>>>>  10 core cpu and 24 gb ram . 16 slave nodes.
> >>>>>
> >>>>>  Still some of the queries are taking 50 sec at solr end.
> >>>>> As we are using solr 4.6 .
> >>>>>  Other thing is we are having 200 (avg) facet fields  in a query.
> >>>>> And 30 searchable fields.
> >>>>> Is there any way to identify why it is taking 50 sec for a query.
> >>>>>    Multiple concurrent requests are there.
> >>>>
> >>>> Searching 30 fields and computing 200 facets is never going to be
> super
> >>>> fast.  Switching to cloud will not help, and might make it slower.
> >>>>
> >>>> Your index is pretty small to a lot of us.  There are people running
> >>>> indexes with billions of documents that take terabytes of disk space.
> >>>>
> >>>> As Walter mentioned, computing 200 facets is going to require a fair
> >>>> amount of heap memory.  One *possible* problem here is that the Solr
> >>>> heap size is too small, so a lot of GC is required.  How much of the
> >>>> 24GB have you assigned to the heap?  Is there any software other than
> >>>> Solr running on these nodes?
> >>>>
> >>>> Thanks,
> >>>> Shawn
> >>>>
> >>
>
>

Re: Solr cloud production set up

Posted by Erick Erickson <er...@gmail.com>.
Add &debug=timing to the query and it’ll show you the time each component takes.
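
A minimal Python sketch of how this could be used: one helper appends debug=timing to an existing select URL, another ranks the per-component times from the parsed JSON response. The host, collection name, and timing numbers are made up, and the response layout (debug -> timing -> process -> component -> time) is an assumption about the JSON debug output, so check it against what your Solr version actually returns.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug_timing(url):
    """Return the URL with debug=timing appended, if not already present."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if ("debug", "timing") not in params:
        params.append(("debug", "timing"))
    return urlunsplit(parts._replace(query=urlencode(params)))


def component_times(response, phase="process"):
    """List (component, milliseconds) pairs for one phase, slowest first.

    `response` is the parsed JSON body; the nesting used here is an
    assumed layout of the timing section.
    """
    section = response.get("debug", {}).get("timing", {}).get(phase, {})
    times = {
        name: value["time"]
        for name, value in section.items()
        if isinstance(value, dict) and "time" in value
    }
    return sorted(times.items(), key=lambda item: item[1], reverse=True)


# Hypothetical URL and a made-up response, for illustration only.
print(with_debug_timing("http://localhost:8983/solr/products/select?q=*:*"))
sample = {
    "debug": {
        "timing": {
            "time": 48210.0,
            "process": {
                "time": 48210.0,
                "query": {"time": 310.0},
                "facet": {"time": 47850.0},
                "debug": {"time": 50.0},
            },
        }
    }
}
for name, millis in component_times(sample):
    print(name, millis)
```

If the facet component dominates the way it does in this invented sample, that would support looking at the facet list before anything else.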

> On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo <ra...@gmail.com> wrote:
> 
> Thanks for the suggestion,
> 
> Is there any way to get the info which operation or which query params are
> increasing the response time.
> 
> 
> On Sat, 18 Jan, 2020, 11:59 PM Dave, <ha...@gmail.com> wrote:
> 
>> If you’re not getting values, don’t ask for the facet. Facets are
>> expensive as hell, maybe you should think more about your query’s than your
>> infrastructure, solr cloud won’t help you at all especially if your asking
>> for things you don’t need
>> 
>>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <ra...@gmail.com>
>> wrote:
>>> 
>>> We have assigned 16 gb out of 24gb for heap .
>>> No other process is running on that node.
>>> 
>>> 200 facets fields are there in the query but we will not be getting the
>>> values for each facets for every search.
>>> There can be max of 50-60 facets for which we will be getting values.
>>> 
>>> We are using caching,is it not going to help.
>>> 
>>> 
>>> 
>>>> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <ap...@elyograg.org>
>> wrote:
>>>> 
>>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
>>>>> We are having 2.3 million documents and size is 2.5 gb.
>>>>>  10 core cpu and 24 gb ram . 16 slave nodes.
>>>>> 
>>>>>  Still some of the queries are taking 50 sec at solr end.
>>>>> As we are using solr 4.6 .
>>>>>  Other thing is we are having 200 (avg) facet fields  in a query.
>>>>> And 30 searchable fields.
>>>>> Is there any way to identify why it is taking 50 sec for a query.
>>>>>    Multiple concurrent requests are there.
>>>> 
>>>> Searching 30 fields and computing 200 facets is never going to be super
>>>> fast.  Switching to cloud will not help, and might make it slower.
>>>> 
>>>> Your index is pretty small to a lot of us.  There are people running
>>>> indexes with billions of documents that take terabytes of disk space.
>>>> 
>>>> As Walter mentioned, computing 200 facets is going to require a fair
>>>> amount of heap memory.  One *possible* problem here is that the Solr
>>>> heap size is too small, so a lot of GC is required.  How much of the
>>>> 24GB have you assigned to the heap?  Is there any software other than
>>>> Solr running on these nodes?
>>>> 
>>>> Thanks,
>>>> Shawn
>>>> 
>> 


Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Thanks for the suggestion.

Is there any way to find out which operation or which query params are
increasing the response time?


On Sat, 18 Jan, 2020, 11:59 PM Dave, <ha...@gmail.com> wrote:

> If you’re not getting values, don’t ask for the facet. Facets are
> expensive as hell, maybe you should think more about your query’s than your
> infrastructure, solr cloud won’t help you at all especially if your asking
> for things you don’t need
>
> > On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <ra...@gmail.com>
> wrote:
> >
> > We have assigned 16 gb out of 24gb for heap .
> > No other process is running on that node.
> >
> > 200 facets fields are there in the query but we will not be getting the
> > values for each facets for every search.
> > There can be max of 50-60 facets for which we will be getting values.
> >
> > We are using caching,is it not going to help.
> >
> >
> >
> >> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <ap...@elyograg.org>
> wrote:
> >>
> >>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> >>> We are having 2.3 million documents and size is 2.5 gb.
> >>>   10 core cpu and 24 gb ram . 16 slave nodes.
> >>>
> >>>   Still some of the queries are taking 50 sec at solr end.
> >>> As we are using solr 4.6 .
> >>>   Other thing is we are having 200 (avg) facet fields  in a query.
> >>>  And 30 searchable fields.
> >>>  Is there any way to identify why it is taking 50 sec for a query.
> >>>     Multiple concurrent requests are there.
> >>
> >> Searching 30 fields and computing 200 facets is never going to be super
> >> fast.  Switching to cloud will not help, and might make it slower.
> >>
> >> Your index is pretty small to a lot of us.  There are people running
> >> indexes with billions of documents that take terabytes of disk space.
> >>
> >> As Walter mentioned, computing 200 facets is going to require a fair
> >> amount of heap memory.  One *possible* problem here is that the Solr
> >> heap size is too small, so a lot of GC is required.  How much of the
> >> 24GB have you assigned to the heap?  Is there any software other than
> >> Solr running on these nodes?
> >>
> >> Thanks,
> >> Shawn
> >>
>

Re: Solr cloud production set up

Posted by Dave <ha...@gmail.com>.
If you’re not getting values, don’t ask for the facet. Facets are expensive as hell. Maybe you should think more about your queries than your infrastructure. SolrCloud won’t help you at all, especially if you’re asking for things you don’t need.
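
One way to act on this is to build the facet parameters per request instead of always sending the full list. The Python sketch below is only an illustration: the function name, the field names, and the idea that the application knows in advance which facets are useful for a given search are assumptions. The parameters themselves (facet, facet.field, facet.mincount) are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def build_select_params(query, candidate_facets, useful_facets):
    """Build query params that request only the facets worth computing.

    candidate_facets: every facet field the application knows about.
    useful_facets: the subset expected to return values for this search
                   (how that subset is chosen is application-specific).
    """
    wanted = [field for field in candidate_facets if field in useful_facets]
    params = [("q", query)]
    if wanted:
        params.append(("facet", "true"))
        params.append(("facet.mincount", "1"))  # skip zero-count buckets
        params.extend(("facet.field", field) for field in wanted)
    return params


# Hypothetical field names: three candidates, only two requested.
params = build_select_params(
    "laptop", ["brand", "color", "voltage"], {"brand", "color"}
)
print(urlencode(params))
```

With 200 candidate fields and 50-60 useful ones, this would cut the number of facets computed per request by roughly three quarters.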

> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <ra...@gmail.com> wrote:
> 
> We have assigned 16 gb out of 24gb for heap .
> No other process is running on that node.
> 
> 200 facets fields are there in the query but we will not be getting the
> values for each facets for every search.
> There can be max of 50-60 facets for which we will be getting values.
> 
> We are using caching,is it not going to help.
> 
> 
> 
>> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <ap...@elyograg.org> wrote:
>> 
>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
>>> We are having 2.3 million documents and size is 2.5 gb.
>>>   10 core cpu and 24 gb ram . 16 slave nodes.
>>> 
>>>   Still some of the queries are taking 50 sec at solr end.
>>> As we are using solr 4.6 .
>>>   Other thing is we are having 200 (avg) facet fields  in a query.
>>>  And 30 searchable fields.
>>>  Is there any way to identify why it is taking 50 sec for a query.
>>>     Multiple concurrent requests are there.
>> 
>> Searching 30 fields and computing 200 facets is never going to be super
>> fast.  Switching to cloud will not help, and might make it slower.
>> 
>> Your index is pretty small to a lot of us.  There are people running
>> indexes with billions of documents that take terabytes of disk space.
>> 
>> As Walter mentioned, computing 200 facets is going to require a fair
>> amount of heap memory.  One *possible* problem here is that the Solr
>> heap size is too small, so a lot of GC is required.  How much of the
>> 24GB have you assigned to the heap?  Is there any software other than
>> Solr running on these nodes?
>> 
>> Thanks,
>> Shawn
>> 

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
We have assigned 16 GB of the 24 GB to the heap.
No other process is running on that node.

There are 200 facet fields in the query, but we will not get values for
every facet on every search.
At most 50-60 facets will return values.

We are using caching; will that not help?



On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <ap...@elyograg.org> wrote:

> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> > We are having 2.3 million documents and size is 2.5 gb.
> >    10 core cpu and 24 gb ram . 16 slave nodes.
> >
> >    Still some of the queries are taking 50 sec at solr end.
> > As we are using solr 4.6 .
> >    Other thing is we are having 200 (avg) facet fields  in a query.
> >   And 30 searchable fields.
> >   Is there any way to identify why it is taking 50 sec for a query.
> >      Multiple concurrent requests are there.
>
> Searching 30 fields and computing 200 facets is never going to be super
> fast.  Switching to cloud will not help, and might make it slower.
>
> Your index is pretty small to a lot of us.  There are people running
> indexes with billions of documents that take terabytes of disk space.
>
> As Walter mentioned, computing 200 facets is going to require a fair
> amount of heap memory.  One *possible* problem here is that the Solr
> heap size is too small, so a lot of GC is required.  How much of the
> 24GB have you assigned to the heap?  Is there any software other than
> Solr running on these nodes?
>
> Thanks,
> Shawn
>

Re: Solr cloud production set up

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> We are having 2.3 million documents and size is 2.5 gb.
>    10 core cpu and 24 gb ram . 16 slave nodes.
> 
>    Still some of the queries are taking 50 sec at solr end.
> As we are using solr 4.6 .
>    Other thing is we are having 200 (avg) facet fields  in a query.
>   And 30 searchable fields.
>   Is there any way to identify why it is taking 50 sec for a query.
>      Multiple concurrent requests are there.

Searching 30 fields and computing 200 facets is never going to be super 
fast.  Switching to cloud will not help, and might make it slower.

Your index is pretty small to a lot of us.  There are people running 
indexes with billions of documents that take terabytes of disk space.

As Walter mentioned, computing 200 facets is going to require a fair 
amount of heap memory.  One *possible* problem here is that the Solr 
heap size is too small, so a lot of GC is required.  How much of the 
24GB have you assigned to the heap?  Is there any software other than 
Solr running on these nodes?

Thanks,
Shawn

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
We have 2.3 million documents and the index size is 2.5 GB.
10-core CPU and 24 GB RAM. 16 slave nodes.

Still, some of the queries take 50 seconds on the Solr side.
We are using Solr 4.6.
The other thing is that we have 200 facet fields (on average) in a query,
and 30 searchable fields.
Is there any way to identify why a query takes 50 seconds?
There are multiple concurrent requests.



On Sat, 18 Jan, 2020, 10:32 PM Dave, <ha...@gmail.com> wrote:

> Agreed with the above. what’s your idea of “huge”? I have 600 ish gb in
> one core plus another 250x2 in two more on the same standalone solr
> instance and it runs more than fine
>
> > On Jan 18, 2020, at 11:31 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> > On 1/18/2020 1:05 AM, Rajdeep Sahoo wrote:
> >> Our Index size is huge and in master slave the full indexing time is
> almost
> >> 24 hrs.
> >>    In future the no of documents will increase.
> >> So,please some one recommend about the no of nodes and configuration
> like
> >> ram and cpu core for solr cloud.
> >
> > Indexing is not going to be any faster in SolrCloud.  It would probably
> be a little bit slower.  The best way to speed up indexing, whether running
> SolrCloud or not, is to make your indexing processes run in parallel, so
> that multiple batches of documents are being indexed at the same time.
> >
> > SolrCloud is not a magic bullet that solves all problems.  It's just a
> different way of managing indexes that has more automation, and makes
> initial setup of a distributed index a lot easier.  It doesn't do the job
> any faster than running without SolrCloud.  The legacy master/slave mode is
> likely to be a little bit faster.
> >
> > You haven't provided any of the information required for us to guess
> about the system requirements.  And it will be a guess ... we could be
> completely wrong.
> >
> >
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > Thanks,
> > Shawn
>

Re: Solr cloud production set up

Posted by Dave <ha...@gmail.com>.
Agreed with the above. What’s your idea of “huge”? I have 600-ish GB in one core, plus another 250x2 in two more, on the same standalone Solr instance, and it runs more than fine.

> On Jan 18, 2020, at 11:31 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> On 1/18/2020 1:05 AM, Rajdeep Sahoo wrote:
>> Our Index size is huge and in master slave the full indexing time is almost
>> 24 hrs.
>>    In future the no of documents will increase.
>> So,please some one recommend about the no of nodes and configuration like
>> ram and cpu core for solr cloud.
> 
> Indexing is not going to be any faster in SolrCloud.  It would probably be a little bit slower.  The best way to speed up indexing, whether running SolrCloud or not, is to make your indexing processes run in parallel, so that multiple batches of documents are being indexed at the same time.
> 
> SolrCloud is not a magic bullet that solves all problems.  It's just a different way of managing indexes that has more automation, and makes initial setup of a distributed index a lot easier.  It doesn't do the job any faster than running without SolrCloud.  The legacy master/slave mode is likely to be a little bit faster.
> 
> You haven't provided any of the information required for us to guess about the system requirements.  And it will be a guess ... we could be completely wrong.
> 
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> Thanks,
> Shawn

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Although we have an average of 200 facet fields in the search request, not
all of them will have values in each request.
    At most 50-60 facet fields will have some value.
  We are also using function queries; does that have a performance impact?


On Sat, 18 Jan, 2020, 11:10 PM Walter Underwood, <wu...@wunderwood.org>
wrote:

> For indexing, is the master node CPU around 90%? If not, you aren’t
> sending requests fast enough or your disk is slow.
>
> For querying, 200 facet fields is HUGE. That will take a lot of Java heap
> memory and will be slow. Each facet fields requires large in-memory arrays
> and sorting.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jan 18, 2020, at 9:29 AM, Rajdeep Sahoo <ra...@gmail.com>
> wrote:
> >
> > Hi shawn,
> >  Thanks for this info,
> > Could you Please address my below query,
> >
> >
> > We are having 2.3 million documents and size is 2.5 gb.
> > With this data do we need solr cloud.
> >
> >  10 core cpu and 24 gb ram . 16 slave nodes.
> >
> >  Still some of the queries are taking 50 sec at solr end.
> > As we are using solr 4.6 .
> >  Other thing is we are having 200 (avg) facet fields  in a query.
> > And 30 searchable fields.
> > Is there any way to identify why it is taking 50 sec for a query.
> >    Multiple concurrent requests are there.
> >
> > And how to optimize the search response time as it is almost 1 mins for
> > some request.
> >
> >
> > On Sat, 18 Jan, 2020, 10:52 PM Shawn Heisey, <ap...@elyograg.org>
> wrote:
> >
> >> On 1/18/2020 9:55 AM, Rajdeep Sahoo wrote:
> >>> We do parallel indexing in production,
> >>>
> >>>  What about search performance in solr cloud in comparison with master
> >>> slave.
> >>>    And what about  block join performance in solr cloud.
> >>>    Do we need to increase the infra for solr cloud as we would be
> >>> maintaining multiple shard and replica.
> >>>   Is there any co relation with master slave set up.
> >>
> >> As I said before, SolrCloud is not a magic bullet that solves
> >> performance issues.  If the index characteristics are the same (number
> >> of docs, total size), performance in SolrCloud will be nearly identical
> >> to non-cloud.
> >>
> >> Thanks,
> >> Shawn
> >>
>
>

Re: Solr cloud production set up

Posted by Walter Underwood <wu...@wunderwood.org>.
For indexing, is the master node CPU around 90%? If not, you aren’t sending requests fast enough or your disk is slow.

For querying, 200 facet fields is HUGE. That will take a lot of Java heap memory and will be slow. Each facet field requires large in-memory arrays and sorting.
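
To check whether faceting really is where the 50 seconds go, one option is to add debug=timing to a slow request and look at the per-component times Solr reports. The sketch below is a minimal Python helper, not anything from this thread: the function name is made up, and it assumes the JSON response carries a debug.timing section with "prepare" and "process" phases, each holding one entry per search component with a "time" value in milliseconds.

```python
def component_times(response):
    """Return (component, milliseconds) pairs, slowest first.

    Adds the 'prepare' and 'process' phase times for each search component.
    The response layout is an assumption; adjust the keys if yours differs.
    """
    timing = response.get("debug", {}).get("timing", {})
    totals = {}
    for phase in ("prepare", "process"):
        for name, value in timing.get(phase, {}).items():
            # Each phase also carries its own overall "time" number; skip it.
            if isinstance(value, dict):
                totals[name] = totals.get(name, 0.0) + float(value.get("time", 0.0))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

If the "facet" component comes out on top by a wide margin, cutting the number of facet fields per request is probably the first thing to try.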

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 18, 2020, at 9:29 AM, Rajdeep Sahoo <ra...@gmail.com> wrote:
> 
> Hi shawn,
>  Thanks for this info,
> Could you Please address my below query,
> 
> 
> We are having 2.3 million documents and size is 2.5 gb.
> With this data do we need solr cloud.
> 
>  10 core cpu and 24 gb ram . 16 slave nodes.
> 
>  Still some of the queries are taking 50 sec at solr end.
> As we are using solr 4.6 .
>  Other thing is we are having 200 (avg) facet fields  in a query.
> And 30 searchable fields.
> Is there any way to identify why it is taking 50 sec for a query.
>    Multiple concurrent requests are there.
> 
> And how to optimize the search response time as it is almost 1 mins for
> some request.
> 
> 
> On Sat, 18 Jan, 2020, 10:52 PM Shawn Heisey, <ap...@elyograg.org> wrote:
> 
>> On 1/18/2020 9:55 AM, Rajdeep Sahoo wrote:
>>> We do parallel indexing in production,
>>> 
>>>  What about search performance in solr cloud in comparison with master
>>> slave.
>>>    And what about  block join performance in solr cloud.
>>>    Do we need to increase the infra for solr cloud as we would be
>>> maintaining multiple shard and replica.
>>>   Is there any co relation with master slave set up.
>> 
>> As I said before, SolrCloud is not a magic bullet that solves
>> performance issues.  If the index characteristics are the same (number
>> of docs, total size), performance in SolrCloud will be nearly identical
>> to non-cloud.
>> 
>> Thanks,
>> Shawn
>> 


Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Hi Shawn,
  Thanks for this info.
Could you please address my query below?


We have 2.3 million documents, and the index size is 2.5 GB.
 With this data, do we need SolrCloud?

  10-core CPU and 24 GB RAM; 16 slave nodes.

  Even so, some of the queries take 50 seconds on the Solr side.
We are using Solr 4.6.
  Also, we have about 200 facet fields (on average) in a query,
 and 30 searchable fields.
 Is there any way to identify why a query takes 50 seconds?
    There are multiple concurrent requests.

And how can we optimize the search response time? It is almost 1 minute for
some requests.


On Sat, 18 Jan, 2020, 10:52 PM Shawn Heisey, <ap...@elyograg.org> wrote:

> On 1/18/2020 9:55 AM, Rajdeep Sahoo wrote:
> > We do parallel indexing in production,
> >
> >   What about search performance in solr cloud in comparison with master
> > slave.
> >     And what about  block join performance in solr cloud.
> >     Do we need to increase the infra for solr cloud as we would be
> > maintaining multiple shard and replica.
> >    Is there any co relation with master slave set up.
>
> As I said before, SolrCloud is not a magic bullet that solves
> performance issues.  If the index characteristics are the same (number
> of docs, total size), performance in SolrCloud will be nearly identical
> to non-cloud.
>
> Thanks,
> Shawn
>

Re: Solr cloud production set up

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/18/2020 9:55 AM, Rajdeep Sahoo wrote:
> We do parallel indexing in production,
> 
>   What about search performance in solr cloud in comparison with master
> slave.
>     And what about  block join performance in solr cloud.
>     Do we need to increase the infra for solr cloud as we would be
> maintaining multiple shard and replica.
>    Is there any co relation with master slave set up.

As I said before, SolrCloud is not a magic bullet that solves 
performance issues.  If the index characteristics are the same (number 
of docs, total size), performance in SolrCloud will be nearly identical 
to non-cloud.

Thanks,
Shawn

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Hi Shawn,
 Thanks for your reply.

We do parallel indexing in production.

 What about search performance in SolrCloud compared with master/slave?
   And what about block join performance in SolrCloud?
   Do we need to increase the infrastructure for SolrCloud, as we would be
maintaining multiple shards and replicas?
  Is there any correlation with the master/slave setup?




On Sat, 18 Jan, 2020, 10:01 PM Shawn Heisey, <ap...@elyograg.org> wrote:

> On 1/18/2020 1:05 AM, Rajdeep Sahoo wrote:
> > Our Index size is huge and in master slave the full indexing time is almost
> > 24 hrs.
> >     In future the no of documents will increase.
> > So,please some one recommend about the no of nodes and configuration like
> > ram and cpu core for solr cloud.
>
> Indexing is not going to be any faster in SolrCloud.  It would probably
> be a little bit slower.  The best way to speed up indexing, whether
> running SolrCloud or not, is to make your indexing processes run in
> parallel, so that multiple batches of documents are being indexed at the
> same time.
>
> SolrCloud is not a magic bullet that solves all problems.  It's just a
> different way of managing indexes that has more automation, and makes
> initial setup of a distributed index a lot easier.  It doesn't do the
> job any faster than running without SolrCloud.  The legacy master/slave
> mode is likely to be a little bit faster.
>
> You haven't provided any of the information required for us to guess
> about the system requirements.  And it will be a guess ... we could be
> completely wrong.
>
>
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Thanks,
> Shawn
>

Re: Solr cloud production set up

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/18/2020 1:05 AM, Rajdeep Sahoo wrote:
> Our Index size is huge and in master slave the full indexing time is almost
> 24 hrs.
>     In future the no of documents will increase.
> So,please some one recommend about the no of nodes and configuration like
> ram and cpu core for solr cloud.

Indexing is not going to be any faster in SolrCloud.  It would probably 
be a little bit slower.  The best way to speed up indexing, whether 
running SolrCloud or not, is to make your indexing processes run in 
parallel, so that multiple batches of documents are being indexed at the 
same time.
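
As a rough illustration of the parallel-batch idea, here is a minimal Python sketch. The helper names (make_batches, post_batch, index_in_parallel), the batch size and the worker count are all made up for this example; the only Solr-specific assumption is that the update handler accepts a JSON array of documents sent with Content-Type application/json.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_batches(docs, batch_size):
    """Split a list of documents into consecutive batches."""
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


def post_batch(batch, update_url):
    """POST one batch of documents as JSON to the given update URL."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_in_parallel(docs, send, batch_size=1000, workers=4):
    """Send batches concurrently; `send` is any callable taking one batch."""
    batches = make_batches(docs, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the batches.
        return list(pool.map(send, batches))
```

A call might look like index_in_parallel(docs, lambda b: post_batch(b, "http://localhost:8983/solr/mycore/update")), where the host and core name are placeholders. The right batch size and worker count depend on the documents and hardware, so they are worth measuring rather than guessing.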

SolrCloud is not a magic bullet that solves all problems.  It's just a 
different way of managing indexes that has more automation, and makes 
initial setup of a distributed index a lot easier.  It doesn't do the 
job any faster than running without SolrCloud.  The legacy master/slave 
mode is likely to be a little bit faster.

You haven't provided any of the information required for us to guess 
about the system requirements.  And it will be a guess ... we could be 
completely wrong.

https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn

Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Our index size is huge, and in master/slave the full indexing time is almost
24 hours.
   The number of documents will increase in the future.
So could someone please recommend the number of nodes and the configuration,
such as RAM and CPU cores, for SolrCloud?

On Sat, 18 Jan, 2020, 8:05 AM Walter Underwood, <wu...@wunderwood.org>
wrote:

> Why do you want to change to Solr Cloud? Master/slave is a great, stable
> cluster architecture.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo <ra...@gmail.com>
> wrote:
> >
> > Please reply anyone
> >
> > On Sat, 18 Jan, 2020, 12:13 AM Rajdeep Sahoo, <
> rajdeepsahoo2012@gmail.com>
> > wrote:
> >
> >> Hi all,
> >> We are using solr cloud 7.7.1
> >> In a live production environment how many solr cloud server do we need,
> >> Currently ,we are using master slave set up with 16 slave server with
> >> solr 4.6.
> >> In solr cloud do we need to scale it up or 16 server will suffice the
> >> purpose.
> >>
> >>
>
>

Re: Solr cloud production set up

Posted by Walter Underwood <wu...@wunderwood.org>.
Why do you want to change to Solr Cloud? Master/slave is a great, stable cluster architecture.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 17, 2020, at 6:19 PM, Rajdeep Sahoo <ra...@gmail.com> wrote:
> 
> Please reply anyone
> 
> On Sat, 18 Jan, 2020, 12:13 AM Rajdeep Sahoo, <ra...@gmail.com>
> wrote:
> 
>> Hi all,
>> We are using solr cloud 7.7.1
>> In a live production environment how many solr cloud server do we need,
>> Currently ,we are using master slave set up with 16 slave server with
>> solr 4.6.
>> In solr cloud do we need to scale it up or 16 server will suffice the
>> purpose.
>> 
>> 


Re: Solr cloud production set up

Posted by Rajdeep Sahoo <ra...@gmail.com>.
Could anyone please reply?

On Sat, 18 Jan, 2020, 12:13 AM Rajdeep Sahoo, <ra...@gmail.com>
wrote:

> Hi all,
>  We are using solr cloud 7.7.1
> In a live production environment how many solr cloud server do we need,
>  Currently ,we are using master slave set up with 16 slave server with
> solr 4.6.
> In solr cloud do we need to scale it up or 16 server will suffice the
> purpose.
>
>