
Posted to solr-user@lucene.apache.org by maephisto <my...@yahoo.com> on 2013/10/14 15:44:14 UTC

Concurrent indexing

Hi,

I have a collection (numShards=3, replicationFactor=2) split across 2 machines.
Since the amount of data I have to index is huge, I would like to start
multiple instances of the same process that would index data to Solr.
Is there any limitation or contraindication in this area?

The indexing client is custom built by me and parses files (each instance
parses a different file), and the uniqueId is auto-generated. 
Would a commit in one process also commit the uncommitted changes created by
another process?



--
View this message in context: http://lucene.472066.n3.nabble.com/Concurent-indexing-tp4095409.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Concurrent indexing

Posted by Chris Geeringh <ge...@gmail.com>.

Erick, yes. Using SolrJ and CloudSolrServer - both 4.6 snapshots from 13 Oct


On 18 October 2013 12:17, Erick Erickson <er...@gmail.com> wrote:

> Chris:
>
> OK, one of those stack traces does have the problem I referenced in the
> other thread. Are you sending updates to the server with SolrJ? And are you
> using CloudSolrServer? If you are, I'm surprised...
>
> These are the important lines:
>
>    - java.util.concurrent.Semaphore.acquire() @bci=5, line=317 (Compiled frame)
>    - org.apache.solr.util.AdjustableSemaphore.acquire() @bci=4, line=61 (Compiled frame)
>    - org.apache.solr.update.SolrCmdDistributor.submit(org.apache.solr.update.SolrCmdDistributor$Request) @bci=22, line=418 (Compiled frame)
>    - org.apache.solr.update.SolrCmdDistributor.submit(org.apache.solr.client.solrj.request.UpdateRequest,

Re: Concurrent indexing

Posted by Erick Erickson <er...@gmail.com>.

Chris:

OK, one of those stack traces does have the problem I referenced in the
other thread. Are you sending updates to the server with SolrJ? And are you
using CloudSolrServer? If you are, I'm surprised...

These are the important lines:

   - java.util.concurrent.Semaphore.acquire() @bci=5, line=317 (Compiled frame)
   - org.apache.solr.util.AdjustableSemaphore.acquire() @bci=4, line=61 (Compiled frame)
   - org.apache.solr.update.SolrCmdDistributor.submit(org.apache.solr.update.SolrCmdDistributor$Request) @bci=22, line=418 (Compiled frame)
   - org.apache.solr.update.SolrCmdDistributor.submit(org.apache.solr.client.solrj.request.UpdateRequest,
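The frames above show update threads parked in `Semaphore.acquire()` beneath `SolrCmdDistributor.submit`. The toy model below is only a sketch of how a bounded semaphore can produce that kind of hang; it is not Solr's code, and the two "nodes", the one-permit limit and the class name `SemaphoreCycleDemo` are assumptions made for illustration. Each request keeps its own node's permit while it waits for the peer's permit, so two crossing requests end up waiting on each other. The demo uses `tryAcquire` with a timeout only so that it terminates; a plain `acquire()` would wait forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of a cyclic wait on semaphores; this is NOT Solr's SolrCmdDistributor.
// Two hypothetical "nodes" each cap outgoing forwards with a one-permit semaphore,
// and a request keeps its own node's permit while waiting for the peer's permit.
class SemaphoreCycleDemo {
    private final Semaphore nodeA = new Semaphore(1);
    private final Semaphore nodeB = new Semaphore(1);
    private final CountDownLatch bothHolding = new CountDownLatch(2);
    private final CountDownLatch bothTried = new CountDownLatch(2);
    private final AtomicBoolean aReachedB = new AtomicBoolean();
    private final AtomicBoolean bReachedA = new AtomicBoolean();

    /** Returns true if neither crossing request obtained its peer's permit in time. */
    static boolean crossingRequestsStall(long timeoutMillis) throws InterruptedException {
        SemaphoreCycleDemo demo = new SemaphoreCycleDemo();
        Thread onA = new Thread(() -> demo.forward(demo.nodeA, demo.nodeB, demo.aReachedB, timeoutMillis));
        Thread onB = new Thread(() -> demo.forward(demo.nodeB, demo.nodeA, demo.bReachedA, timeoutMillis));
        onA.start();
        onB.start();
        onA.join();
        onB.join();
        return !demo.aReachedB.get() && !demo.bReachedA.get();
    }

    private void forward(Semaphore own, Semaphore peer, AtomicBoolean reached, long timeoutMillis) {
        try {
            own.acquire();                 // permit for this node's outgoing request
            try {
                bothHolding.countDown();
                bothHolding.await();       // both nodes now hold their only permit
                // A blocking acquire() would hang here forever; the timeout lets the demo end.
                if (peer.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    reached.set(true);
                    peer.release();
                }
                bothTried.countDown();
                bothTried.await();         // keep our permit until the peer has also given up
            } finally {
                own.release();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stalled: " + crossingRequestsStall(200));
    }
}
```

Running `main` prints `stalled: true`: neither request can proceed because each holds the permit the other needs.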





On Wed, Oct 16, 2013 at 2:04 PM, Chris Geeringh <ge...@gmail.com> wrote:

> Here's another jstack http://pastebin.com/8JiQc3rb

Re: Concurrent indexing

Posted by Chris Geeringh <ge...@gmail.com>.

Here's another jstack http://pastebin.com/8JiQc3rb


On 16 October 2013 11:53, Chris Geeringh <ge...@gmail.com> wrote:

> Hi Erick, here is a paste from the other thread (debugging update request),
> with my input, as I am seeing errors too:
>
> I ran an import last night, and this morning my cloud wouldn't accept
> updates. I'm running the latest 4.6 snapshot. I was importing with the latest
> SolrJ snapshot, using the javabin transport with CloudSolrServer.
>
> The cluster had indexed ~1.3 million docs before it stopped accepting
> updates; querying was still working.
>
> I'll run jstack shortly and provide the results.
>
> Here is my jstack output... Lots of blocked threads.
>
> http://pastebin.com/1ktjBYbf

Re: Concurrent indexing

Posted by Chris Geeringh <ge...@gmail.com>.

Hi Erick, here is a paste from the other thread (debugging update request),
with my input, as I am seeing errors too:

I ran an import last night, and this morning my cloud wouldn't accept
updates. I'm running the latest 4.6 snapshot. I was importing with the latest
SolrJ snapshot, using the javabin transport with CloudSolrServer.

The cluster had indexed ~1.3 million docs before it stopped accepting
updates; querying was still working.

I'll run jstack shortly and provide the results.

Here is my jstack output... Lots of blocked threads.

http://pastebin.com/1ktjBYbf

On 16 October 2013 11:46, Erick Erickson <er...@gmail.com> wrote:

> Run jstack on the Solr process (it ships with the JDK) and
> look for the word "semaphore". You should see your
> servers blocked on this in the Solr code. That'll pretty
> much nail it.
>
> There's an open JIRA to fix the underlying cause,
> SOLR-5232, but it's currently slated for 4.6, which
> won't be cut for a while.
>
> Also, if you're using SolrJ, there's a patch that fixes
> this as a side effect; see SOLR-4816, which is
> available in 4.5.
>
> Best,
> Erick

Re: Concurrent indexing

Posted by Erick Erickson <er...@gmail.com>.

Run jstack on the Solr process (it ships with the JDK) and
look for the word "semaphore". You should see your
servers blocked on this in the Solr code. That'll pretty
much nail it.
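`jstack` is a command-line tool run against the Solr process's PID. The sketch below is only an in-process analogue of that check, written in Java for illustration: it walks the current JVM's threads with `Thread.getAllStackTraces()` and reports any that are inside `java.util.concurrent.Semaphore.acquire`. The class name `BlockedThreadScan` and the method `threadsInside` are hypothetical, and because the scan only sees the JVM it runs in, it is not a replacement for running jstack against Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// In-process analogue of scanning `jstack <pid>` output for semaphores: list the
// live threads in THIS JVM that currently have the given method on their stack.
class BlockedThreadScan {

    static List<String> threadsInside(String className, String methodName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                    hits.add(entry.getKey().getName() + " @ " + frame);
                    break; // one matching frame per thread is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore noPermits = new Semaphore(0);     // acquire() on this parks the caller
        Thread writer = new Thread(() -> {
            try {
                noPermits.acquire();
            } catch (InterruptedException ignored) {
                // interrupted by main() below; just exit
            }
        }, "demo-blocked-writer");
        writer.setDaemon(true);
        writer.start();
        Thread.sleep(200);                          // give the thread time to block
        threadsInside("java.util.concurrent.Semaphore", "acquire").forEach(System.out::println);
        writer.interrupt();
    }
}
```

Matching on the exact class and method name, rather than any class containing "Semaphore", keeps the scanning thread itself out of the results.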

There's an open JIRA to fix the underlying cause,
SOLR-5232, but it's currently slated for 4.6, which
won't be cut for a while.

Also, if you're using SolrJ, there's a patch that fixes
this as a side effect; see SOLR-4816, which is
available in 4.5.

Best,
Erick

On Tue, Oct 15, 2013 at 1:33 PM, michael.boom <my...@yahoo.com> wrote:

> Here are some of Solr's last words (log content from before it stopped
> accepting updates); maybe someone can help me interpret them.
> http://pastebin.com/mv7fH62H

Re: Concurrent indexing

Posted by "michael.boom" <my...@yahoo.com>.

Here are some of Solr's last words (log content from before it stopped accepting
updates); maybe someone can help me interpret them.
http://pastebin.com/mv7fH62H



--
View this message in context: http://lucene.472066.n3.nabble.com/Concurent-indexing-tp4095409p4095642.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Concurrent indexing

Posted by maephisto <my...@yahoo.com>.

Hi Chris!
Could you describe your problem? How similar is it to mine?
Also, which version of Solr are you encountering it on?



--
View this message in context: http://lucene.472066.n3.nabble.com/Concurent-indexing-tp4095409p4095630.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Concurrent indexing

Posted by Chris Geeringh <ge...@gmail.com>.

I can confirm I am seeing the same issue with Tomcat - cluster split over 4
nodes.

Is this fix in a 4.6 snapshot?


On 15 October 2013 08:28, maephisto <my...@yahoo.com> wrote:

> Thanks for the tip!
>
> I must mention that I am using Solr 4.4.0, and this problem only appears
> when I'm doing the indexing in the SolrCloud configuration deployed on
> standalone Jetty 9.0.6.
> When I do the same operations on a modified example in Solr 4.4.0 with
> embedded Jetty, indexing to a simple core, I do not have any problem of
> this sort.

Re: Concurrent indexing

Posted by maephisto <my...@yahoo.com>.

Thanks for the tip!

I must mention that I am using Solr 4.4.0, and this problem only appears when
I'm doing the indexing in the SolrCloud configuration deployed on standalone
Jetty 9.0.6.
When I do the same operations on a modified example in Solr 4.4.0 with
embedded Jetty, indexing to a simple core, I do not have any problem of this
sort.



--
View this message in context: http://lucene.472066.n3.nabble.com/Concurent-indexing-tp4095409p4095610.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Concurrent indexing

Posted by Steve Rowe <sa...@gmail.com>.

Hi maephisto,

This issue can cause an update deadlock and may have caused the problem you were seeing: https://issues.apache.org/jira/browse/SOLR-4327 (a fix will be included in the forthcoming 4.5.1 release).

Steve


On Oct 14, 2013, at 10:20 AM, maephisto <my...@yahoo.com> wrote:

> Thank you!
> 
> I was worried because I was experimenting with this system, and at some
> point I was processing 2 big files and both indexing processes had added
> about 750k docs when suddenly Solr simply refused to accept any more
> docs. Querying was working fine, but trying to add even a single doc would
> get no response.
> (I had no autocommit set up.)
> 
> It only came back to life when I restarted Jetty.
> Any idea what went wrong? Is there a maximum number of docs that can be added?

Re: Concurrent indexing

Posted by maephisto <my...@yahoo.com>.

Thank you!

I was worried because I was experimenting with this system, and at some
point I was processing 2 big files and both indexing processes had added
about 750k docs when suddenly Solr simply refused to accept any more
docs. Querying was working fine, but trying to add even a single doc would
get no response.
(I had no autocommit set up.)

It only came back to life when I restarted Jetty.
Any idea what went wrong? Is there a maximum number of docs that can be added?



--
View this message in context: http://lucene.472066.n3.nabble.com/Concurent-indexing-tp4095409p4095416.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Concurrent indexing

Posted by Jason Hellman <jh...@innoventsolutions.com>.

The limit on how many threads you can use to load data is driven primarily by your hardware: CPU, heap usage, I/O, and the like. In most index load processes, the Solr side can accept incoming data faster than it can be read from the source repository. You'll have to experiment a bit to find the limits, but if your hardware is sufficient you can likely load a great deal.
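A minimal sketch of tuning the thread count to your hardware, assuming each file has already been parsed into lines: a fixed-size pool runs one task per file, so the degree of parallelism is a single knob (`threads`) you can raise until CPU, heap or I/O becomes the bottleneck. `ParallelLoader` and `DocSink` are hypothetical names; `DocSink` stands in for whatever client actually sends documents to Solr, and it must be safe to call from several threads. Keys are generated with `UUID`, echoing the auto-generated uniqueId in the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: index several parsed files in parallel with a bounded thread pool.
class ParallelLoader {

    /** Hypothetical stand-in for the client that sends documents to Solr; must be thread-safe. */
    interface DocSink {
        void add(String id, String content);
    }

    static void loadAll(List<List<String>> files, DocSink sink, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads); // the tuning knob
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (List<String> lines : files) {
                // One task per file; each line becomes one document with a generated key.
                tasks.add(pool.submit(() -> {
                    for (String line : lines) {
                        sink.add(UUID.randomUUID().toString(), line);
                    }
                }));
            }
            for (Future<?> task : tasks) {
                task.get(); // rethrows a worker's failure instead of silently dropping it
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

This sketch issues no commits; when and by whom to commit is a separate decision.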

As for commits, a commit will indeed commit anything added to Solr, regardless of which thread (or process) sent the update. Keep this in mind if you are planning for rollback, or if you're tracking your incremental load so you can restart in case of error/failure. If you want more control and you are multi-threading index updates, it may be useful to have a delegate handle the commit process, or, on a large data load, to consider a single commit at the end.
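To make the commit behaviour concrete, here is a toy model, not Solr's internals: one shared index where `add` only stages a document and `commit` publishes everything staged, whoever added it. `ToyIndex` and `SingleCommitLoad` are hypothetical names; `SingleCommitLoad` sketches the "single commit at the end" option, with worker threads that only add and a coordinator that commits after all of them have finished.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Solr internals: one index shared by all writers. add() only stages
// a document; commit() publishes every staged document, whichever writer added it.
class ToyIndex {
    private final List<String> pending = new ArrayList<>();
    private final List<String> visible = new ArrayList<>();
    private int commits = 0;

    synchronized void add(String doc) { pending.add(doc); }

    synchronized void commit() {
        visible.addAll(pending); // includes other writers' uncommitted docs
        pending.clear();
        commits++;
    }

    synchronized int visibleCount() { return visible.size(); }

    synchronized int commitCount() { return commits; }
}

// Hypothetical "commit once at the end" loader: workers only add, and a single
// coordinator commits after every worker has finished.
class SingleCommitLoad {
    static void run(ToyIndex index, List<List<String>> batches) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (List<String> batch : batches) {
            Thread worker = new Thread(() -> batch.forEach(index::add)); // no commit in workers
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        index.commit(); // the only commit, issued after all adds are done
    }
}
```

For example, if one writer calls `add("doc-1")`, a second calls `add("doc-2")`, and only the first then calls `commit()`, both documents become visible in the toy model, which mirrors the answer to the original question.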

On Oct 14, 2013, at 6:44 AM, maephisto <my...@yahoo.com> wrote:

> Hi,
> 
> I have a collection (numShards=3, replicationFactor=2) split across 2 machines.
> Since the amount of data I have to index is huge, I would like to start
> multiple instances of the same process that would index data to Solr.
> Is there any limitation or contraindication in this area?
> 
> The indexing client is custom built by me and parses files (each instance
> parses a different file), and the uniqueId is auto-generated. 
> Would a commit in one process also commit the uncommitted changes created by
> another process?