Posted to solr-user@lucene.apache.org by Kevin Osborn <ke...@cbsi.com> on 2013/08/13 20:54:14 UTC

Indexing hangs when more than 1 server in a cluster

I am using Solr Cloud 4.4 with pretty much a base configuration. We have
2 servers and 3 collections: Collection1 has 1 shard, and Collection2 and
Collection3 each have 2 shards. Both servers are identical.

So, here is my process: I do a lot of queries on Collection1 and
Collection2, then do a bunch of inserts into Collection3 as CSV uploads. I
am also doing custom shard routing, so all the products in a single upload
have the same shard key. All Solr interaction is through SolrJ with full
Zookeeper awareness. My uploads also use soft commits.
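
Roughly, the client side looks like this. This is a simplified sketch, not
my actual code; the ZooKeeper address, collection name, and file name are
placeholders, and it assumes the stock /update/csv handler:

    import java.io.File;

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class CsvChunkUploader {
        public static void main(String[] args) throws Exception {
            // ZooKeeper-aware SolrJ client (SolrJ 4.4); ensemble address
            // is a placeholder.
            CloudSolrServer server =
                new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("collection3");

            // One CSV chunk per request. With the compositeId router,
            // document ids of the form "shardKey!productId" route every
            // product in an upload to the same shard.
            ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/csv");
            req.addFile(new File("products-chunk-0001.csv"), "text/csv");

            // Soft commit at the end of the request, as described above.
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT,
                    true /* waitFlush */, true /* waitSearcher */,
                    true /* softCommit */);
            req.process(server);
            server.shutdown();
        }
    }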

I tried this on a record set of 936 products, and everything worked fine. I
then sent over a record set of 300k products. The upload into Collection3
is chunked; I tried chunk sizes of both 1,000 and 200,000 with similar
results. The first upload to Solr would just hang, with no response at all
from Solr. A few of the products from this request would make it into the
index, but not many.

In this state, queries continued to work, but deletes did not.

My only solution was to kill each Solr process.

As an experiment, I reset everything and loaded the large catalog first.
With a chunk size of 1,000, about 110,000 of the 300,000 records made it
into Solr before the process hung. Again, queries worked, but deletes did
not, and I had to kill Solr. It hung after about 30 seconds, which is
roughly the second autocommit cycle given the default autocommit interval
of 15 seconds. I am not sure whether this is related.
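
(For reference, the 15-second default comes from the stock example
solrconfig.xml, which ships with something like this:)

    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>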

As an additional experiment, I ran the entire test with just a single node
in the cluster. This time, everything ran fine.

Does anyone have any ideas? Everything is pretty default. These servers are
Azure VMs, although I have seen similar behavior running two Solr instances
on a single internal server as well.

I had also noticed similar behavior with Solr 4.3. It definitely has
something to do with the clustering, but I am not sure what. And I don't
see any error messages (or really anything else) in the Solr logs.

Thanks.

-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677      SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614

Re: Indexing hangs when more than 1 server in a cluster

Posted by Kevin Osborn <ke...@cbsi.com>.
I may have a bit of good news. The open-file ulimit was set to 4096. I
raised it to an arbitrary high value (100000), and things seem to be
working better now. I still have more testing to do, but the initial
results are hopeful.
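
For anyone hitting the same thing, this is roughly what I changed; the
100000 value was an arbitrary pick, not a tuned recommendation:

    # Check the open-file limit in the shell that launches Solr (was 4096):
    ulimit -n
    # Raise it for the current shell before starting Solr:
    ulimit -n 100000
    # To make it persistent on Linux, add lines like these to
    # /etc/security/limits.conf (assuming Solr runs as user "solr"):
    #   solr  soft  nofile  100000
    #   solr  hard  nofile  100000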




Re: Indexing hangs when more than 1 server in a cluster

Posted by Kevin Osborn <ke...@cbsi.com>.
Actually, I thought it worked last night, but that may have just been a
fluke. Today, it is not working.

This is what I have done.

I have turned off autoCommit and softAutoCommit. My updates are not sending
any softCommit messages.

I am sending over data in chunks of 500 records.

At the end of each complete upload, I am doing an explicit commit.

I send over the first upload (536 records), which works fine in 2 chunks.
After the commit, the records are searchable as well.

The second catalog is much larger (300k records) and starts uploading
about 5 minutes later. Usually, it hangs on the very first chunk. If I
kill the server during the hang, indexing does then work.

As a variation, I enabled autoCommit with maxDocs set to 10000 and
openSearcher set to false. During the test, it hung right after the second
autocommit.
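
That variation corresponds to roughly this in solrconfig.xml:

    <autoCommit>
      <maxDocs>10000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>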

On both servers, I see a long-waiting commitScheduler thread in the
thread dump:

    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1079)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:722)


On the second server, I see quite a few other long-waiting threads as
well.

-Kevin



Re: Indexing hangs when more than 1 server in a cluster

Posted by Kevin Osborn <ke...@cbsi.com>.
Thanks so much for your help and for the explanations. Eventually, we will
be doing several batches in parallel. But at least now I know where to look
and can do some testing on various scenarios.

Since we may be doing a lot of heavy uploading (while still doing a lot of
queries), having an autoCommit interval shorter than the softAutoCommit
interval does sound interesting, and I will test it out. And then I'll just
disable softCommit on my batch uploads.
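
On the client side I am picturing something like this (a sketch, assuming
the same CloudSolrServer "server" and a java.io.File "chunkFile" from the
chunking loop):

    // Send each chunk with no commit or softCommit parameters at all...
    ContentStreamUpdateRequest req =
        new ContentStreamUpdateRequest("/update/csv");
    req.addFile(chunkFile, "text/csv");
    req.process(server);

    // ...then, once the whole batch is in, one explicit hard commit.
    // commit(waitFlush, waitSearcher, softCommit): softCommit=false makes
    // this a hard commit that opens a new searcher.
    server.commit(true, true, false);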

Either way, I at least know where to focus my efforts.

-Kevin




Re: Indexing hangs when more than 1 server in a cluster

Posted by Jason Hellman <jh...@innoventsolutions.com>.
Kevin,

I wouldn't have considered using softCommits at all based on what I understand from your use case.  You appear to be loading in large batches, and softCommits are better aligned to NRT search where there is a steady stream of smaller updates that need to be available immediately.  

As Erick pointed out, soft commits are all about avoiding constant reopening of the index searcher…where by constant we mean every few seconds.  Provided you can wait until your batch is completed, and that frequency is roughly a minute or more, you likely will find an old-fashioned hard commit (with openSearcher="true") will work just fine (YMMV).

Jason




Re: Indexing hangs when more than 1 server in a cluster

Posted by Erick Erickson <er...@gmail.com>.
Right, SOLR-5081 is possible but somewhat unlikely, given that you
don't have very many nodes in your cluster.

Soft commits aren't relevant to the tlog, but here's the thing: your
tlogs may get replayed when you restart Solr. If they're large, this may
take a long time. When you said you restarted Solr after killing it, you
might have triggered this.

The way to keep tlogs small is to hard commit more
frequently (you should look at their size before
worrying about it though!). If you set openSearcher=false,
this is pretty inexpensive; all it really does is close
the current segment files, open new ones, and start a new
tlog file. It does _not_ invalidate caches, do autowarming, or
any of that expensive stuff.

Your soft commit does _not_ improve performance! It is
just "less expensive" than a hard commit with
openSearcher=true. It _does_ invalidate caches, fire
off autowarming, etc. So it does "improve performance"
over doing hard commits with openSearcher=true
with the same frequency, but it still isn't free. It's still
good to have the soft commit interval as long as you
can tolerate.

It's perfectly reasonable to have a hard commit interval
that's much shorter than your soft commit interval. As
Yonik explained once, "soft commits are about visibility
but hard commits are about durability".
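
In solrconfig.xml terms, that's something like this (the values are
illustrative, not a recommendation):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>15000</maxTime>            <!-- durability, small tlogs -->
        <openSearcher>false</openSearcher>  <!-- cheap, no cache invalidation -->
      </autoCommit>
      <autoSoftCommit>
        <maxTime>600000</maxTime>           <!-- visibility, every 10 minutes -->
      </autoSoftCommit>
    </updateHandler>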

Best
Erick



Re: Indexing hangs when more than 1 server in a cluster

Posted by Kevin Osborn <ke...@cbsi.com>.
Interesting, that did work. Do you or anyone else have any ideas on what I
should look at? While soft commit is not a requirement in my project, my
understanding is that it should help performance. On the same index, I will
be doing both a large number of queries and a large number of updates.

If I have to disable autoCommit, should I increase the chunk size?

Of course, I will have to run a larger-scale test tomorrow, but I saw
this problem fairly consistently in my smaller test.

In a previous experiment, I applied the SOLR-4816 patch that someone
indicated might help. I also reduced the CSV upload chunk size to 500. It
seemed like things got a little better, but still eventually hung.

I also see SOLR-5081, but I don't know if that is my issue or not. At least
in my test, the index writes are not parallel as in the ticket.

-Kevin



Re: Indexing hangs when more than 1 server in a cluster

Posted by Jason Hellman <jh...@innoventsolutions.com>.
While I don't have past history with this issue to use as a reference, if I were in your shoes I would consider trying your updates with softCommit disabled. My suspicion is that you're experiencing some issue with the transaction logging and how it's managed when your hard commit occurs.

If you can give that a try and let us know how it fares, we might have some further input to share.

