Posted to solr-user@lucene.apache.org by Software Dev <st...@gmail.com> on 2014/01/20 22:00:27 UTC

Solr Cloud Bulk Indexing Questions

We are testing our shiny new Solr Cloud architecture but we are
experiencing some issues when doing bulk indexing.

We have 5 Solr Cloud machines running and 3 indexing machines (separate
from the cloud servers). The indexing machines pull ids off a queue, then
index and ship over a document via a CloudSolrServer. It appears that the
indexers are too fast, because the load (particularly disk IO) on the Solr
Cloud machines spikes through the roof, making the entire cluster unusable.
It's kind of odd because the total index size is not even large, i.e.
< 10GB. Are there any optimizations/enhancements I could try to help
alleviate these problems?

I should note that for the above collection we have only 1 shard, which is
replicated across all machines, so all machines have the full index.

Would we benefit from switching to a ConcurrentUpdateSolrServer where all
updates get sent to 1 machine and 1 machine only? We could then remove this
machine from the part of our cluster that handles user requests.

Thanks for any input.

Re: Solr Cloud Bulk Indexing Questions

Posted by Shawn Heisey <so...@elyograg.org>.
On 1/23/2014 11:01 AM, Software Dev wrote:
> Is there any way to configure autoCommit, softCommit values on a per
> request basis? The majority of the time we have a small flow of updates
> coming in and we would like to see them ASAP. However, we occasionally
> need to do some bulk indexing (once a week or less) and the need to see
> those updates right away isn't as critical.
>
> I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode
> and the other 5% is "Index-Heavy Query-Light/Heavy" mode.

One thing missing on that searchhub page is the commitWithin parameter.  
This is a parameter that will ensure that any documents added by that 
update request will be committed within the number of milliseconds 
given.  This is particularly useful for bursty updates, because if all 
your updates are done before the commitWithin time expires, a single 
commit will get all of them, not just the first one.

http://wiki.apache.org/solr/CommitWithin
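
As a rough illustration of the per-request form, here is a minimal Python
sketch that only builds an update URL and JSON body carrying commitWithin; it
does not contact a server. The host, the collection name "products", and the
helper name build_update_request are assumptions made for illustration, not
anything taken from this thread.

```python
import json
from urllib.parse import urlencode


def build_update_request(base_url, collection, docs, commit_within_ms):
    """Build (url, body) for a JSON update that carries commitWithin.

    base_url and collection are hypothetical; nothing is sent over the network.
    """
    # commitWithin is passed as a request parameter, in milliseconds.
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url.rstrip('/')}/{collection}/update?{query}"
    # The body is a JSON array of documents.
    body = json.dumps(docs)
    return url, body


if __name__ == "__main__":
    # Bulk run: seeing the documents within a minute is acceptable.
    url, body = build_update_request(
        "http://localhost:8983/solr", "products",
        [{"id": "1"}, {"id": "2"}], commit_within_ms=60000)
    print(url)
    print(body)
```

The body would be POSTed with a JSON content type. A small value would suit
the steady trickle of updates and a larger one the occasional bulk load, so
the choice can be made per request rather than in the server configuration.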

Since Solr 4.0, commitWithin will result in a soft commit. With 4.2 and 
later, it can optionally be changed to a hard commit.

https://issues.apache.org/jira/browse/SOLR-4370

If you're using SolrCloud with a distributed index, some versions may 
not work as expected when using commitWithin:

https://issues.apache.org/jira/browse/SOLR-5658

Thanks,
Shawn


Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
Also, any suggestions on debugging? What should I look for and how? Thanks


On Thu, Jan 23, 2014 at 10:01 AM, Software Dev <st...@gmail.com> wrote:

> Thanks for the suggestions. After reading that document I feel even more
> confused though, because I always thought that hard commits should be less
> frequent than soft commits.
>
> Is there any way to configure autoCommit, softCommit values on a per
> request basis? The majority of the time we have a small flow of updates
> coming in and we would like to see them ASAP. However, we occasionally
> need to do some bulk indexing (once a week or less) and the need to see
> those updates right away isn't as critical.
>
> I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode
> and the other 5% is "Index-Heavy Query-Light/Heavy" mode.
>
> Thanks

Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
Thanks for the suggestions. After reading that document I feel even more
confused though, because I always thought that hard commits should be less
frequent than soft commits.

Is there any way to configure autoCommit, softCommit values on a per
request basis? The majority of the time we have a small flow of updates
coming in and we would like to see them ASAP. However, we occasionally
need to do some bulk indexing (once a week or less) and the need to see
those updates right away isn't as critical.

I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode
and the other 5% is "Index-Heavy Query-Light/Heavy" mode.

Thanks


On Wed, Jan 22, 2014 at 5:33 PM, Erick Erickson <er...@gmail.com> wrote:

> When you're doing hard commits, is it with openSearcher = true or
> false? It should probably be false...
>
> Here's a rundown of the soft/hard commit consequences:
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> I suspect (but, of course, can't prove) that you're over-committing
> and hitting segment merges without meaning to...
>
> FWIW,
> Erick

Re: Solr Cloud Bulk Indexing Questions

Posted by Erick Erickson <er...@gmail.com>.
When you're doing hard commits, is it with openSearcher = true or
false? It should probably be false...

Here's a rundown of the soft/hard commit consequences:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

I suspect (but, of course, can't prove) that you're over-committing
and hitting segment merges without meaning to...
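
For a rough sense of scale, the figures mentioned in this thread (about 200
docs/sec, a soft commit every 5 seconds, a hard commit every 30 seconds, and
the suggested 10-minute hard commit) can be turned into commits per hour and
documents per commit. This is plain arithmetic in Python, not a measurement
of any cluster, and the function name is made up for illustration.

```python
def commit_stats(docs_per_sec, interval_sec):
    """Return (commits per hour, documents accumulated between commits)."""
    commits_per_hour = 3600 / interval_sec
    docs_per_commit = docs_per_sec * interval_sec
    return commits_per_hour, docs_per_commit


if __name__ == "__main__":
    # Intervals taken from the thread; 200 docs/sec is the poster's own estimate.
    for label, interval in [("soft, 5 s", 5), ("hard, 30 s", 30), ("hard, 10 min", 600)]:
        per_hour, per_commit = commit_stats(200, interval)
        print(f"{label}: {per_hour:.0f} commits/hour, {per_commit} docs per commit")
```

Frequent commits mean many small flushes, which is roughly what feeds
segment merging; how much that matters on a given cluster would still need
to be checked against its own disk IO figures.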

FWIW,
Erick

On Wed, Jan 22, 2014 at 1:46 PM, Software Dev <st...@gmail.com> wrote:
>> A suggestion would be to hard commit much less often, i.e. every 10
>> minutes, and see if there is a change.
>
> - Will try this.
>
>> How much system RAM? JVM heap? Enough space in RAM for system disk cache?
>
> - We have 18G of RAM, 12 dedicated to Solr, but as of right now the total
> index size is only 5GB.
>
>> What is the size of your documents? A few KB, MB, ...?
>
> - Under 1MB.
>
>> Ah, and what about network IO? Could that be a limiting factor?
>
> - Again, total index size is only 5GB so I don't know if this would be a
> problem.

Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
> A suggestion would be to hard commit much less often, i.e. every 10
> minutes, and see if there is a change.

- Will try this.

> How much system RAM? JVM heap? Enough space in RAM for system disk cache?

- We have 18G of RAM, 12 dedicated to Solr, but as of right now the total
index size is only 5GB.

> What is the size of your documents? A few KB, MB, ...?

- Under 1MB.

> Ah, and what about network IO? Could that be a limiting factor?

- Again, total index size is only 5GB so I don't know if this would be a
problem.

On Wed, Jan 22, 2014 at 12:26 AM, Andre Bois-Crettez <an...@kelkoo.com> wrote:

> The 1 node with more load is probably the leader (because of the extra work
> of receiving and distributing updates), but in my experience that means only
> a bit more CPU usage, and no difference in disk IO.
>
> A suggestion would be to hard commit much less often, i.e. every 10
> minutes, and see if there is a change.
> How much system RAM? JVM heap? Enough space in RAM for system disk cache?
> What is the size of your documents? A few KB, MB, ...?
> Ah, and what about network IO? Could that be a limiting factor?
>
>
> André

Re: Solr Cloud Bulk Indexing Questions

Posted by Andre Bois-Crettez <an...@kelkoo.com>.
The 1 node with more load is probably the leader (because of the extra work
of receiving and distributing updates), but in my experience that means only
a bit more CPU usage, and no difference in disk IO.

A suggestion would be to hard commit much less often, i.e. every 10
minutes, and see if there is a change.
How much system RAM? JVM heap? Enough space in RAM for system disk cache?
What is the size of your documents? A few KB, MB, ...?
Ah, and what about network IO? Could that be a limiting factor?
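
To make the 10-minute suggestion concrete, here is a minimal Python sketch
that renders the autoCommit and autoSoftCommit elements as they would appear
inside the updateHandler section of solrconfig.xml. The 5-second soft commit
is the value reported elsewhere in this thread, openSearcher=false is an
assumption on my part, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def commit_settings_xml(hard_commit_ms, soft_commit_ms, open_searcher=False):
    """Render autoCommit/autoSoftCommit elements for solrconfig.xml as strings."""
    auto_commit = ET.Element("autoCommit")
    ET.SubElement(auto_commit, "maxTime").text = str(hard_commit_ms)
    # With openSearcher=false a hard commit flushes to disk without opening a new searcher.
    ET.SubElement(auto_commit, "openSearcher").text = str(open_searcher).lower()

    auto_soft_commit = ET.Element("autoSoftCommit")
    ET.SubElement(auto_soft_commit, "maxTime").text = str(soft_commit_ms)

    return [ET.tostring(e, encoding="unicode") for e in (auto_commit, auto_soft_commit)]


if __name__ == "__main__":
    # 10-minute hard commit (the suggestion above), 5-second soft commit (assumed unchanged).
    for line in commit_settings_xml(10 * 60 * 1000, 5 * 1000):
        print(line)
```

Both maxTime values are in milliseconds. Generating the fragment this way is
only a convenience for trying different intervals; editing solrconfig.xml by
hand would work just as well.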


André

On 2014-01-21 23:40, Software Dev wrote:
> Any other suggestions?
>
--
André Bois-Crettez

Software Architect
Search Developer
http://www.kelkoo.com/


Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
Any other suggestions?


On Mon, Jan 20, 2014 at 2:49 PM, Software Dev <st...@gmail.com> wrote:

> 4.6.0

Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
4.6.0


On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller <ma...@gmail.com> wrote:

> What version are you running?
>
> - Mark
>
> On Jan 20, 2014, at 5:43 PM, Software Dev <st...@gmail.com>
> wrote:
>
> > We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do all
> > updates get sent to one machine or something?
> >
> >
> > On Mon, Jan 20, 2014 at 2:42 PM, Software Dev <static.void.dev@gmail.com
> >wrote:
> >
> >> We have a soft commit every 5 seconds and a hard commit every 30. As
> >> for docs/second, I would guess around 200/sec, which doesn't seem that
> >> high.
> >>
> >>
> >> On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson <
> erickerickson@gmail.com>wrote:
> >>
> >>> Questions: How often do you commit your updates? What is your
> >>> indexing rate in docs/second?
> >>>
> >>> In a SolrCloud setup, you should be using a CloudSolrServer. If the
> >>> server is having trouble keeping up with updates, switching to CUSS
> >>> probably wouldn't help.
> >>>
> >>> So I suspect there's something not optimal about your setup that's
> >>> the culprit.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <
> static.void.dev@gmail.com>
> >>> wrote:
> >>>> We are testing our shiny new Solr Cloud architecture but we are
> >>>> experiencing some issues when doing bulk indexing.
> >>>>
> >>>> We have 5 solr cloud machines running and 3 indexing machines
> (separate
> >>>> from the cloud servers). The indexing machines pull off ids from a
> queue
> >>>> then they index and ship over a document via a CloudSolrServer. It
> >>> appears
> >>>> that the indexers are too fast because the load (particularly disk io)
> >>> on
> >>>> the solr cloud machines spikes through the roof making the entire
> >>> cluster
> >>>> unusable. It's kind of odd because the total index size is not even
> >>>> large..ie, < 10GB. Are there any optimization/enhancements I could try
> >>> to
> >>>> help alleviate these problems?
> >>>>
> >>>> I should note that for the above collection we have only have 1 shard
> >>> thats
> >>>> replicated across all machines so all machines have the full index.
> >>>>
> >>>> Would we benefit from switching to a ConcurrentUpdateSolrServer where
> >>> all
> >>>> updates get sent to 1 machine and 1 machine only? We could then remove
> >>> this
> >>>> machine from our cluster than that handles user requests.
> >>>>
> >>>> Thanks for any input.
> >>>
> >>
> >>
>
>

Re: Solr Cloud Bulk Indexing Questions

Posted by Mark Miller <ma...@gmail.com>.
What version are you running?

- Mark

On Jan 20, 2014, at 5:43 PM, Software Dev <st...@gmail.com> wrote:

> We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do all
> updates get sent to one machine or something?
> 
> 
> On Mon, Jan 20, 2014 at 2:42 PM, Software Dev <st...@gmail.com>wrote:
> 
>> We have a soft commit every 5 seconds and a hard commit every 30. As
>> for docs/second, I would guess around 200/sec, which doesn't seem that
>> high.
>> 
>> 
>> On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson <er...@gmail.com>wrote:
>> 
>>> Questions: How often do you commit your updates? What is your
>>> indexing rate in docs/second?
>>> 
>>> In a SolrCloud setup, you should be using a CloudSolrServer. If the
>>> server is having trouble keeping up with updates, switching to CUSS
>>> probably wouldn't help.
>>> 
>>> So I suspect there's something not optimal about your setup that's
>>> the culprit.
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <st...@gmail.com>
>>> wrote:
>>>> We are testing our shiny new Solr Cloud architecture but we are
>>>> experiencing some issues when doing bulk indexing.
>>>> 
>>>> We have 5 solr cloud machines running and 3 indexing machines (separate
>>>> from the cloud servers). The indexing machines pull off ids from a queue
>>>> then they index and ship over a document via a CloudSolrServer. It
>>> appears
>>>> that the indexers are too fast because the load (particularly disk io)
>>> on
>>>> the solr cloud machines spikes through the roof making the entire
>>> cluster
>>>> unusable. It's kind of odd because the total index size is not even
>>>> large..ie, < 10GB. Are there any optimization/enhancements I could try
>>> to
>>>> help alleviate these problems?
>>>> 
>>>> I should note that for the above collection we have only have 1 shard
>>> thats
>>>> replicated across all machines so all machines have the full index.
>>>> 
>>>> Would we benefit from switching to a ConcurrentUpdateSolrServer where
>>> all
>>>> updates get sent to 1 machine and 1 machine only? We could then remove
>>> this
>>>> machine from our cluster than that handles user requests.
>>>> 
>>>> Thanks for any input.
>>> 
>> 
>> 


Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do all
updates get sent to one machine or something?


On Mon, Jan 20, 2014 at 2:42 PM, Software Dev <st...@gmail.com> wrote:

> We have a soft commit every 5 seconds and a hard commit every 30. As
> for docs/second, I would guess around 200/sec, which doesn't seem that
> high.
>
>
> On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson <er...@gmail.com>wrote:
>
>> Questions: How often do you commit your updates? What is your
>> indexing rate in docs/second?
>>
>> In a SolrCloud setup, you should be using a CloudSolrServer. If the
>> server is having trouble keeping up with updates, switching to CUSS
>> probably wouldn't help.
>>
>> So I suspect there's something not optimal about your setup that's
>> the culprit.
>>
>> Best,
>> Erick
>>
>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <st...@gmail.com>
>> wrote:
>> > We are testing our shiny new Solr Cloud architecture but we are
>> > experiencing some issues when doing bulk indexing.
>> >
>> > We have 5 solr cloud machines running and 3 indexing machines (separate
>> > from the cloud servers). The indexing machines pull off ids from a queue
>> > then they index and ship over a document via a CloudSolrServer. It
>> appears
>> > that the indexers are too fast because the load (particularly disk io)
>> on
>> > the solr cloud machines spikes through the roof making the entire
>> cluster
>> > unusable. It's kind of odd because the total index size is not even
>> > large..ie, < 10GB. Are there any optimization/enhancements I could try
>> to
>> > help alleviate these problems?
>> >
>> > I should note that for the above collection we have only have 1 shard
>> thats
>> > replicated across all machines so all machines have the full index.
>> >
>> > Would we benefit from switching to a ConcurrentUpdateSolrServer where
>> all
>> > updates get sent to 1 machine and 1 machine only? We could then remove
>> this
>> > machine from our cluster than that handles user requests.
>> >
>> > Thanks for any input.
>>
>
>

Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
We have a soft commit every 5 seconds and a hard commit every 30. As
for docs/second, I would guess around 200/sec, which doesn't seem that
high.


On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson <er...@gmail.com> wrote:

> Questions: How often do you commit your updates? What is your
> indexing rate in docs/second?
>
> In a SolrCloud setup, you should be using a CloudSolrServer. If the
> server is having trouble keeping up with updates, switching to CUSS
> probably wouldn't help.
>
> So I suspect there's something not optimal about your setup that's
> the culprit.
>
> Best,
> Erick
>
> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <st...@gmail.com>
> wrote:
> > We are testing our shiny new Solr Cloud architecture but we are
> > experiencing some issues when doing bulk indexing.
> >
> > We have 5 solr cloud machines running and 3 indexing machines (separate
> > from the cloud servers). The indexing machines pull off ids from a queue
> > then they index and ship over a document via a CloudSolrServer. It
> appears
> > that the indexers are too fast because the load (particularly disk io) on
> > the solr cloud machines spikes through the roof making the entire cluster
> > unusable. It's kind of odd because the total index size is not even
> > large..ie, < 10GB. Are there any optimization/enhancements I could try to
> > help alleviate these problems?
> >
> > I should note that for the above collection we have only have 1 shard
> thats
> > replicated across all machines so all machines have the full index.
> >
> > Would we benefit from switching to a ConcurrentUpdateSolrServer where all
> > updates get sent to 1 machine and 1 machine only? We could then remove
> this
> > machine from our cluster than that handles user requests.
> >
> > Thanks for any input.
>
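
The figures quoted in this message (a soft commit every 5 seconds, a hard
commit every 30, roughly 200 docs/second) can be turned into per-commit
document counts with simple arithmetic. The sketch below is only that
arithmetic; the class and method names are hypothetical and it makes no
Solr calls. It assumes the indexing rate is steady, which the thread does
not confirm.

```java
/** Back-of-the-envelope commit arithmetic for the figures quoted in this thread. */
final class CommitMath {
    private CommitMath() {}

    /** Documents accumulated between two commits, assuming a steady indexing rate. */
    static long docsPerCommit(long docsPerSecond, long commitIntervalSeconds) {
        return docsPerSecond * commitIntervalSeconds;
    }

    /** Number of commits triggered during a run of the given length (integer division). */
    static long commitsDuring(long runSeconds, long commitIntervalSeconds) {
        return runSeconds / commitIntervalSeconds;
    }

    public static void main(String[] args) {
        // Figures from the thread: ~200 docs/sec, soft commit 5 s, hard commit 30 s.
        System.out.println("docs per soft commit: " + docsPerCommit(200, 5));
        System.out.println("docs per hard commit: " + docsPerCommit(200, 30));
        // Assumed one-hour bulk run, purely for illustration.
        System.out.println("soft commits per hour: " + commitsDuring(3600, 5));
    }
}
```

With these inputs, each soft commit covers about 1,000 documents and each
hard commit about 6,000. Whether that is a problem depends on document size
and hardware, which are not stated here.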

Re: Solr Cloud Bulk Indexing Questions

Posted by Erick Erickson <er...@gmail.com>.
Questions: How often do you commit your updates? What is your
indexing rate in docs/second?

In a SolrCloud setup, you should be using a CloudSolrServer. If the
server is having trouble keeping up with updates, switching to CUSS
probably wouldn't help.

So I suspect there's something not optimal about your setup that's
the culprit.

Best,
Erick

On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <st...@gmail.com> wrote:
> We are testing our shiny new Solr Cloud architecture but we are
> experiencing some issues when doing bulk indexing.
>
> We have 5 solr cloud machines running and 3 indexing machines (separate
> from the cloud servers). The indexing machines pull off ids from a queue
> then they index and ship over a document via a CloudSolrServer. It appears
> that the indexers are too fast because the load (particularly disk io) on
> the solr cloud machines spikes through the roof making the entire cluster
> unusable. It's kind of odd because the total index size is not even
> large, i.e. < 10GB. Are there any optimizations/enhancements I could try to
> help alleviate these problems?
>
> I should note that for the above collection we have only 1 shard, which is
> replicated across all machines, so all machines have the full index.
>
> Would we benefit from switching to a ConcurrentUpdateSolrServer where all
> updates get sent to 1 machine and 1 machine only? We could then remove this
> machine from the cluster that handles user requests.
>
> Thanks for any input.
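
One way to answer the docs/second question above is to time a run on the
indexing side. The sketch below shows a rate calculation plus a helper that
groups queued ids into fixed-size batches. Batching is an assumption added
for illustration; nobody in this thread says it is the fix, and the original
post describes shipping one document at a time. The class and method names
are hypothetical, and the code deliberately contains no SolrJ calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical indexer-side helpers: measure throughput and group ids into batches. */
final class IndexRateSketch {
    private IndexRateSketch() {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Average indexing rate in documents per second over the elapsed time. */
    static double docsPerSecond(long docCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return docCount * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            ids.add(i);
        }
        System.out.println(partition(ids, 2));
        System.out.println(docsPerSecond(6000, 30_000));
    }
}
```

In a real indexer, each batch would be handed to whatever client sends
documents to the cluster, and the elapsed time would be measured around
those calls.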

Re: Solr Cloud Bulk Indexing Questions

Posted by Software Dev <st...@gmail.com>.
Does maxWriteMBPerSec apply to NRTCachingDirectoryFactory? I only
see maxMergeSizeMB and maxCachedMB as configuration values.


On Thu, Jan 23, 2014 at 11:05 AM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> Have you tried maxWriteMBPerSec?
>
> http://search-lucene.com/?q=maxWriteMBPerSec&fc_project=Solr
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <static.void.dev@gmail.com
> >wrote:
>
> > We are testing our shiny new Solr Cloud architecture but we are
> > experiencing some issues when doing bulk indexing.
> >
> > We have 5 solr cloud machines running and 3 indexing machines (separate
> > from the cloud servers). The indexing machines pull off ids from a queue
> > then they index and ship over a document via a CloudSolrServer. It
> appears
> > that the indexers are too fast because the load (particularly disk io) on
> > the solr cloud machines spikes through the roof making the entire cluster
> > unusable. It's kind of odd because the total index size is not even
> > large..ie, < 10GB. Are there any optimization/enhancements I could try to
> > help alleviate these problems?
> >
> > I should note that for the above collection we have only have 1 shard
> thats
> > replicated across all machines so all machines have the full index.
> >
> > Would we benefit from switching to a ConcurrentUpdateSolrServer where all
> > updates get sent to 1 machine and 1 machine only? We could then remove
> this
> > machine from our cluster than that handles user requests.
> >
> > Thanks for any input.
> >
>

Re: Solr Cloud Bulk Indexing Questions

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Have you tried maxWriteMBPerSec?

http://search-lucene.com/?q=maxWriteMBPerSec&fc_project=Solr

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <st...@gmail.com> wrote:

> We are testing our shiny new Solr Cloud architecture but we are
> experiencing some issues when doing bulk indexing.
>
> We have 5 solr cloud machines running and 3 indexing machines (separate
> from the cloud servers). The indexing machines pull off ids from a queue
> then they index and ship over a document via a CloudSolrServer. It appears
> that the indexers are too fast because the load (particularly disk io) on
> the solr cloud machines spikes through the roof making the entire cluster
> unusable. It's kind of odd because the total index size is not even
> large, i.e. < 10GB. Are there any optimizations/enhancements I could try to
> help alleviate these problems?
>
> I should note that for the above collection we have only 1 shard, which is
> replicated across all machines, so all machines have the full index.
>
> Would we benefit from switching to a ConcurrentUpdateSolrServer where all
> updates get sent to 1 machine and 1 machine only? We could then remove this
> machine from the cluster that handles user requests.
>
> Thanks for any input.
>