Posted to solr-user@lucene.apache.org by Jamie Johnson <je...@gmail.com> on 2012/02/17 02:56:49 UTC

distributed deletes working?

With SOLR-2358 being committed to trunk, do deletes and updates get
distributed/routed like adds do? Also, when a down shard comes back up, are
the deletes/updates forwarded as well? Reading the JIRA, I believe the
answer is yes; I just want to verify before bringing the latest into my
environment.
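
For concreteness, here is the kind of request being asked about - a minimal
sketch assuming a SolrCloud node at localhost:8983 and the stock XML /update
handler; the host, port, and document id are placeholders:

# add a document
curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<add><doc><field name="id">doc-1</field></doc></add>'

# delete it by id - the question is whether this gets routed to the right
# shard/replicas just like the add above
curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<delete><id>doc-1</id></delete>'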

Re: distributed deletes working?

Posted by Sami Siren <ss...@gmail.com>.
On Fri, Feb 17, 2012 at 6:03 PM, Jamie Johnson <je...@gmail.com> wrote:
> Thanks Sami, so long as it's expected ;)
>
> In regards to the replication not working the way I think it should,
> am I missing something or is it simply not working the way I think?

It should work. I also tried to reproduce your issue but was not able
to. Could you try to reproduce your problem with the provided scripts
in solr/cloud-dev/? I think example2.sh might be a good start.
It's not identical to your situation (it has 1 core per instance), but
it would be great if you could verify whether or not you see the issue
with that setup.

--
 Sami Siren
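
A sketch of what Sami is suggesting, assuming a built trunk checkout;
example2.sh is the script he names, the rest is a guess at the usual workflow:

cd solr/cloud-dev
./example2.sh    # brings up a small multi-instance SolrCloud example (1 core per instance)
# then index, kill an instance, delete, restart it, and check whether the delete shows up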

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
Thanks Mark.  I'll pull the latest trunk today and run with that.

On Sun, Feb 26, 2012 at 10:37 AM, Mark Miller <ma...@gmail.com> wrote:
>>
>>
>>
>> Are there any outstanding issues that I should be aware of?
>>
>>
> Not that I know of - we were trying to track down an issue around peer
> sync recovery that our ChaosMonkey* tests were tripping, but it looks like
> Yonik may have tracked that down last night.
>
> * The ChaosMonkey tests  randomly start, stop, and kill servers as we index
> and delete with multiple threads - at the end we make sure everything is
> consistent and that the client(s) had no errors sending requests.
>
> --
> - Mark
>
> http://www.lucidimagination.com

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
>
>
>
> Are there any outstanding issues that I should be aware of?
>
>
Not that I know of - we were trying to track down an issue around peer
sync recovery that our ChaosMonkey* tests were tripping, but it looks like
Yonik may have tracked that down last night.

* The ChaosMonkey tests  randomly start, stop, and kill servers as we index
and delete with multiple threads - at the end we make sure everything is
consistent and that the client(s) had no errors sending requests.

-- 
- Mark

http://www.lucidimagination.com
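
A rough shell-level analogue of that kind of chaos test - not the actual Java
ChaosMonkey tests; the start/stop scripts and the two-instance setup are
hypothetical:

# keep an indexing/deleting client running in another terminal, then:
for i in $(seq 1 20); do
  victim=$(( (RANDOM % 2) + 1 ))   # pick one of two example instances at random
  ./stop-instance-$victim.sh       # hypothetical stop script (or kill -9 its pid)
  sleep $(( (RANDOM % 10) + 1 ))
  ./start-instance-$victim.sh      # hypothetical start script
done
# afterwards: check that every replica reports the same doc counts and that
# the client saw no request errors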

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
I'm always excited to try the new stuff. We've been running off of a
fairly old version from solrbranch, which I'd like to move off of; this
is the first step.

Are there any outstanding issues that I should be aware of?  I've only
run simple tests thus far, but plan to run some more comprehensive
tests this week.

On Fri, Feb 24, 2012 at 11:10 PM, Mark Miller <ma...@gmail.com> wrote:
> No problem! I only wish I had found what the issue was sooner - I suspected a couple of other issues we found could have been related, but since I had not duplicated it, I could not be sure. The early use and feedback is invaluable, though.
>
> On Feb 24, 2012, at 10:53 PM, Jamie Johnson wrote:
>
>> Pulling the latest version seems to have fixed whatever issue
>> previously existed, so everything appears to be working properly.  I'm
>> seeing updates make it to the downed server once it recovers, and
>> deletes (even by query, which I wasn't sure would work) are being
>> forwarded as well.  So everything looks good.  Again, thanks for
>> helping out with this.
>>
>> On Fri, Feb 24, 2012 at 10:23 PM, Jamie Johnson <je...@gmail.com> wrote:
>>> I'm pulling the latest now.  Once I've rebuilt and set up the test, I'll
>>> forward all the logs on to you.  Again, thanks for looking into this.
>>>
>>> On Fri, Feb 24, 2012 at 9:20 PM, Mark Miller <ma...@gmail.com> wrote:
>>>>
>>>> On Feb 22, 2012, at 9:54 PM, Jamie Johnson wrote:
>>>>
>>>>> Perhaps if you could give me the steps you're using to test I can find
>>>>> an error in what I'm doing.
>>>>
>>>>
>>>> I've tested a few ways - one of them as Sami has already explained.
>>>>
>>>> Could you try one more time from the latest? A few issues have been addressed recently.
>>>>
>>>> If it still doesn't work, could you zip up all the logs and fire them off to me? I'm sure I should be able to see what is going on from that.
>>>>
>>>> - Mark
>>>>
>
> - Mark Miller
> lucidimagination.com
>

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
No problem! I only wish I had found what the issue was sooner - I suspected a couple of other issues we found could have been related, but since I had not duplicated it, I could not be sure. The early use and feedback is invaluable, though.

On Feb 24, 2012, at 10:53 PM, Jamie Johnson wrote:

> Pulling the latest version seems to have fixed whatever issue
> previously existed, so everything appears to be working properly.  I'm
> seeing updates make it to the downed server once it recovers, and
> deletes (even by query, which I wasn't sure would work) are being
> forwarded as well.  So everything looks good.  Again, thanks for
> helping out with this.
> 
> On Fri, Feb 24, 2012 at 10:23 PM, Jamie Johnson <je...@gmail.com> wrote:
>> I'm pulling the latest now.  Once I've rebuilt and set up the test, I'll
>> forward all the logs on to you.  Again, thanks for looking into this.
>> 
>> On Fri, Feb 24, 2012 at 9:20 PM, Mark Miller <ma...@gmail.com> wrote:
>>> 
>>> On Feb 22, 2012, at 9:54 PM, Jamie Johnson wrote:
>>> 
>>>> Perhaps if you could give me the steps you're using to test I can find
>>>> an error in what I'm doing.
>>> 
>>> 
>>> I've tested a few ways - one of them as Sami has already explained.
>>> 
>>> Could you try one more time from the latest? A few issues have been addressed recently.
>>> 
>>> If it still doesn't work, could you zip up all the logs and fire them off to me? I'm sure I should be able to see what is going on from that.
>>> 
>>> - Mark
>>> 

- Mark Miller
lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
Pulling the latest version seems to have fixed whatever issue
previously existed, so everything appears to be working properly.  I'm
seeing updates make it to the downed server once it recovers, and
deletes (even by query, which I wasn't sure would work) are being
forwarded as well.  So everything looks good.  Again, thanks for
helping out with this.
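
The delete-by-query case, as a sketch - the host and port are assumed, and the
query uses the UUID from the log excerpt elsewhere in the thread purely as an
example:

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>id:f2c29abe-2e48-4965-adfb-8bd611293ff0</query></delete>'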

On Fri, Feb 24, 2012 at 10:23 PM, Jamie Johnson <je...@gmail.com> wrote:
> I'm pulling the latest now.  Once I've rebuilt and set up the test, I'll
> forward all the logs on to you.  Again, thanks for looking into this.
>
> On Fri, Feb 24, 2012 at 9:20 PM, Mark Miller <ma...@gmail.com> wrote:
>>
>> On Feb 22, 2012, at 9:54 PM, Jamie Johnson wrote:
>>
>>> Perhaps if you could give me the steps you're using to test I can find
>>> an error in what I'm doing.
>>
>>
>> I've tested a few ways - one of them as Sami has already explained.
>>
>> Could you try one more time from the latest? A few issues have been addressed recently.
>>
>> If it still doesn't work, could you zip up all the logs and fire them off to me? I'm sure I should be able to see what is going on from that.
>>
>> - Mark
>>

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
I'm pulling the latest now.  Once I've rebuilt and set up the test, I'll
forward all the logs on to you.  Again, thanks for looking into this.

On Fri, Feb 24, 2012 at 9:20 PM, Mark Miller <ma...@gmail.com> wrote:
>
> On Feb 22, 2012, at 9:54 PM, Jamie Johnson wrote:
>
>> Perhaps if you could give me the steps you're using to test I can find
>> an error in what I'm doing.
>
>
> I've tested a few ways - one of them as Sami has already explained.
>
> Could you try one more time from the latest? A few issues have been addressed recently.
>
> If it still doesn't work, could you zip up all the logs and fire them off to me? I'm sure I should be able to see what is going on from that.
>
> - Mark
>

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
On Feb 22, 2012, at 9:54 PM, Jamie Johnson wrote:

> Perhaps if you could give me the steps you're using to test I can find
> an error in what I'm doing.


I've tested a few ways - one of them as Sami has already explained.

Could you try one more time from the latest? A few issues have been addressed recently.

If it still doesn't work, could you zip up all the logs and fire them off to me? I'm sure I should be able to see what is going on from that.

- Mark

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
Perhaps if you could give me the steps you're using to test I can find
an error in what I'm doing.


On Wed, Feb 22, 2012 at 9:24 PM, Mark Miller <ma...@gmail.com> wrote:
> Yonik did fix an issue around peer sync and deletes a few days ago - long chance that was involved?
>
> Otherwise, neither Sami nor I have replicated these results so far.
>
> On Feb 22, 2012, at 8:56 PM, Jamie Johnson wrote:
>
>> I know everyone is busy, but I was wondering if anyone had found
>> anything with this?  Any suggestions on what I could be doing wrong
>> would be greatly appreciated.
>>
>> On Fri, Feb 17, 2012 at 4:08 PM, Mark Miller <ma...@gmail.com> wrote:
>>>
>>> On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:
>>>
>>>> id field is a UUID.
>>>
>>> Strange - I was using UUIDs myself in the same test this morning...
>>>
>>> I'll try again soon.
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>
> - Mark Miller
> lucidimagination.com
>

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
Yonik did fix an issue around peer sync and deletes a few days ago - long chance that was involved?

Otherwise, neither Sami nor I have replicated these results so far.

On Feb 22, 2012, at 8:56 PM, Jamie Johnson wrote:

> I know everyone is busy, but I was wondering if anyone had found
> anything with this?  Any suggestions on what I could be doing wrong
> would be greatly appreciated.
> 
> On Fri, Feb 17, 2012 at 4:08 PM, Mark Miller <ma...@gmail.com> wrote:
>> 
>> On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:
>> 
>>> id field is a UUID.
>> 
>> Strange - I was using UUIDs myself in the same test this morning...
>> 
>> I'll try again soon.
>> 
>> - Mark Miller
>> lucidimagination.com
>> 

- Mark Miller
lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
I know everyone is busy, but I was wondering if anyone had found
anything with this?  Any suggestions on what I could be doing wrong
would be greatly appreciated.

On Fri, Feb 17, 2012 at 4:08 PM, Mark Miller <ma...@gmail.com> wrote:
>
> On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:
>
>> id field is a UUID.
>
> Strange - I was using UUIDs myself in the same test this morning...
>
> I'll try again soon.
>
> - Mark Miller
> lucidimagination.com
>

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:

> id field is a UUID.

Strange - I was using UUIDs myself in the same test this morning...

I'll try again soon.

- Mark Miller
lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
Yes, committing is in the mix.

id field is a UUID.

On Fri, Feb 17, 2012 at 3:22 PM, Mark Miller <ma...@gmail.com> wrote:
> You are committing in that mix, right?
>
> On Feb 17, 2012, at 2:07 PM, Jamie Johnson wrote:
>
>> This was with the cloud-dev solrcloud-start.sh script (after that I've
>> used solrcloud-start-existing.sh).
>>
>> Essentially I run ./solrcloud-start-existing.sh
>> index docs
>> kill 1 of the solr instances (using kill -9 on the pid)
>> delete a doc from running instances
>> restart killed solr instance
>>
>> on doing this the deleted document is still lingering in the instance
>> that was down.
>>
>> On Fri, Feb 17, 2012 at 2:04 PM, Mark Miller <ma...@gmail.com> wrote:
>>> Hmm...just tried this with only deletes, and the replica sync'd fine for me.
>>>
>>> Is this with your multi core setup or were you trying with instances?
>>>
>>> On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:
>>>
>>>> Yes, still seeing that.  Master has 8 items, replica has 9.  So the
>>>> delete didn't seem to work when the node was down.
>>>>
>>>> On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
>>>> <yo...@lucidimagination.com> wrote:
>>>>> On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson <je...@gmail.com> wrote:
>>>>>> Something that didn't work though
>>>>>> was if a node was down when a delete happened and then comes back up,
>>>>>> that node still listed the id I deleted.  Is this currently supported?
>>>>>
>>>>> Yes, that should work fine.  Are you still seeing that behavior?
>>>>>
>>>>> -Yonik
>>>>> lucidimagination.com
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>
> - Mark Miller
> lucidimagination.com
>

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
You are committing in that mix, right?

On Feb 17, 2012, at 2:07 PM, Jamie Johnson wrote:

> This was with the cloud-dev solrcloud-start.sh script (after that I've
> used solrcloud-start-existing.sh).
> 
> Essentially I run ./solrcloud-start-existing.sh
> index docs
> kill 1 of the solr instances (using kill -9 on the pid)
> delete a doc from running instances
> restart killed solr instance
> 
> on doing this the deleted document is still lingering in the instance
> that was down.
> 
> On Fri, Feb 17, 2012 at 2:04 PM, Mark Miller <ma...@gmail.com> wrote:
>> Hmm...just tried this with only deletes, and the replica sync'd fine for me.
>> 
>> Is this with your multi core setup or were you trying with instances?
>> 
>> On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:
>> 
>>> Yes, still seeing that.  Master has 8 items, replica has 9.  So the
>>> delete didn't seem to work when the node was down.
>>> 
>>> On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
>>> <yo...@lucidimagination.com> wrote:
>>>> On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson <je...@gmail.com> wrote:
>>>>> Something that didn't work though
>>>>> was if a node was down when a delete happened and then comes back up,
>>>>> that node still listed the id I deleted.  Is this currently supported?
>>>> 
>>>> Yes, that should work fine.  Are you still seeing that behavior?
>>>> 
>>>> -Yonik
>>>> lucidimagination.com
>> 
>> - Mark Miller
>> lucidimagination.com
>> 

- Mark Miller
lucidimagination.com

Re: distributed deletes working?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 17, 2012 at 2:07 PM, Jamie Johnson <je...@gmail.com> wrote:
> This was with the cloud-dev solrcloud-start.sh script (after that I've
> used solrcloud-start-existing.sh).
>
> Essentially I run ./solrcloud-start-existing.sh
> index docs
> kill 1 of the solr instances (using kill -9 on the pid)
> delete a doc from running instances
> restart killed solr instance
>
> on doing this the deleted document is still lingering in the instance
> that was down.

Hmmm.  Shot in the dark: is your "id" field type something other than "string"?

-Yonik
lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
This was with the cloud-dev solrcloud-start.sh script (after that I've
used solrcloud-start-existing.sh).

Essentially I run ./solrcloud-start-existing.sh
index docs
kill 1 of the solr instances (using kill -9 on the pid)
delete a doc from running instances
restart killed solr instance

On doing this, the deleted document is still lingering in the instance
that was down.
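
The same sequence as a sketch, with the script named above; the port, pid, and
document id are placeholders:

./solrcloud-start-existing.sh                       # start the existing example cluster
curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<add><doc><field name="id">doc-1</field></doc></add>'
kill -9 <pid-of-one-solr-instance>                  # hard-kill one instance
curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<delete><id>doc-1</id></delete>'   # delete against a live instance
# restart the killed instance, then query it directly and check whether doc-1 is gone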

On Fri, Feb 17, 2012 at 2:04 PM, Mark Miller <ma...@gmail.com> wrote:
> Hmm...just tried this with only deletes, and the replica sync'd fine for me.
>
> Is this with your multi core setup or were you trying with instances?
>
> On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:
>
>> Yes, still seeing that.  Master has 8 items, replica has 9.  So the
>> delete didn't seem to work when the node was down.
>>
>> On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>> On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson <je...@gmail.com> wrote:
>>>> Something that didn't work though
>>>> was if a node was down when a delete happened and then comes back up,
>>>> that node still listed the id I deleted.  Is this currently supported?
>>>
>>> Yes, that should work fine.  Are you still seeing that behavior?
>>>
>>> -Yonik
>>> lucidimagination.com
>
> - Mark Miller
> lucidimagination.com
>

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
Hmm...just tried this with only deletes, and the replica sync'd fine for me.

Is this with your multi core setup or were you trying with instances?

On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:

> Yes, still seeing that.  Master has 8 items, replica has 9.  So the
> delete didn't seem to work when the node was down.
> 
> On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson <je...@gmail.com> wrote:
>>> Something that didn't work though
>>> was if a node was down when a delete happened and then comes back up,
>>> that node still listed the id I deleted.  Is this currently supported?
>> 
>> Yes, that should work fine.  Are you still seeing that behavior?
>> 
>> -Yonik
>> lucidimagination.com

- Mark Miller
lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
Yes, still seeing that.  Master has 8 items, replica has 9.  So the
delete didn't seem to work when the node was down.

On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson <je...@gmail.com> wrote:
>> Something that didn't work though
>> was if a node was down when a delete happened and then comes back up,
>> that node still listed the id I deleted.  Is this currently supported?
>
> Yes, that should work fine.  Are you still seeing that behavior?
>
> -Yonik
> lucidimagination.com

Re: distributed deletes working?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson <je...@gmail.com> wrote:
> Something that didn't work though
> was if a node was down when a delete happened and then comes back up,
> that node still listed the id I deleted.  Is this currently supported?

Yes, that should work fine.  Are you still seeing that behavior?

-Yonik
lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
OK, so I'm making some progress now.  With _version_ in the schema
(I forgot about this because I remember asking about it before), deletes
across the cluster work when I delete by id.  Updates work as well: if
a node was down, it recovered fine.  Something that didn't work, though:
if a node was down when a delete happened and then comes back up,
that node still listed the id I deleted.  Is this currently supported?


On Fri, Feb 17, 2012 at 1:33 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Feb 17, 2012 at 1:27 PM, Jamie Johnson <je...@gmail.com> wrote:
>> I'm seeing the following.  Do I need a _version_ long field in my schema?
>
> Yep... versions are the way we keep things sane (shuffled updates to a
> replica can be correctly reordered, etc).
>
> -Yonik
> lucidimagination.com

Re: distributed deletes working?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 17, 2012 at 1:27 PM, Jamie Johnson <je...@gmail.com> wrote:
> I'm seeing the following.  Do I need a _version_ long field in my schema?

Yep... versions are the way we keep things sane (shuffled updates to a
replica can be correctly reordered, etc).

-Yonik
lucidimagination.com
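
A sketch of the corresponding schema check; the path assumes the stock trunk
example config, and the exact attributes may differ:

grep '_version_' example/solr/conf/schema.xml
# expected to show something like:
#   <field name="_version_" type="long" indexed="true" stored="true"/>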

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
I'm seeing the following.  Do I need a _version_ long field in my schema?

Feb 17, 2012 1:15:50 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {delete=[f2c29abe-2e48-4965-adfb-8bd611293ff0]} 0 0
Feb 17, 2012 1:15:50 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing _version_ on
update from leader
	at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:707)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:478)
	at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:137)
	at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:235)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:166)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1523)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:405)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



On Fri, Feb 17, 2012 at 11:25 AM, Jamie Johnson <je...@gmail.com> wrote:
> I stop the indexing, stop the shard, then start indexing again.  So I
> shouldn't need Yonik's latest fix?  In regards to how far out of sync,
> it's completely out of sync: I index 100 documents to the
> cluster (40 on shard1, 60 on shard2), then stop the instance and index 100
> more; when I bring the instance back up, if I issue queries to just the
> solr instance I brought up, the counts are the old counts.
>
> I'll start up the same test without using multiple cores.  Give me a
> few and I'll provide the details.
>
>
>
> On Fri, Feb 17, 2012 at 11:19 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Fri, Feb 17, 2012 at 11:13 AM, Mark Miller <ma...@gmail.com> wrote:
>>> When exactly is this build from?
>>
>> Yeah... I just checked in a fix yesterday dealing with sync while
>> indexing is going on.
>>
>> -Yonik
>> lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
I stop the indexing, stop the shard, then start indexing again.  So I
shouldn't need Yonik's latest fix?  In regards to how far out of sync,
it's completely out of sync: I index 100 documents to the
cluster (40 on shard1, 60 on shard2), then stop the instance and index 100
more; when I bring the instance back up, if I issue queries to just the
solr instance I brought up, the counts are the old counts.

I'll start up the same test without using multiple cores.  Give me a
few and I'll provide the details.
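
A sketch of the per-core count check being described, using the ports and core
names from the cluster state posted elsewhere in the thread; rows=0 returns
just numFound, and distrib=false keeps the query on that one core rather than
fanning it out:

curl 'http://localhost:8501/solr/slice1_shard1/select?q=*:*&rows=0&distrib=false'
curl 'http://localhost:8502/solr/slice2_shard1/select?q=*:*&rows=0&distrib=false'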



On Fri, Feb 17, 2012 at 11:19 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Feb 17, 2012 at 11:13 AM, Mark Miller <ma...@gmail.com> wrote:
>> When exactly is this build from?
>
> Yeah... I just checked in a fix yesterday dealing with sync while
> indexing is going on.
>
> -Yonik
> lucidimagination.com

Re: distributed deletes working?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 17, 2012 at 11:13 AM, Mark Miller <ma...@gmail.com> wrote:
> When exactly is this build from?

Yeah... I just checked in a fix yesterday dealing with sync while
indexing is going on.

-Yonik
lucidimagination.com

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
On Feb 17, 2012, at 11:03 AM, Jamie Johnson wrote:

> Thanks Sami, so long as it's expected ;)

Yeah, it's expected - we always use both the live nodes info and the state to determine the full state for a shard.
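
A sketch of checking both pieces of state; zkCli.sh ships with ZooKeeper, and
the address assumes the embedded ZooKeeper the example scripts start
(conventionally the Solr port + 1000, e.g. 9983 for an instance on 8983):

# live ephemeral nodes - a stopped instance drops out of here even though
# the stored state may still say "active"
zkCli.sh -server 127.0.0.1:9983 ls /live_nodes
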

> 
> In regards to the replication not working the way I think it should,
> am I missing something or is it simply not working the way I think?

This should work - in fact I just did the same testing this morning.

Are you indexing while you bring the shard down and then up (it should still work fine)?
Or do you stop indexing, bring down the shard, index, bring up the shard?

How far out of sync is it?

When exactly is this build from?

> 
> On Fri, Feb 17, 2012 at 11:01 AM, Sami Siren <ss...@gmail.com> wrote:
>> On Fri, Feb 17, 2012 at 5:10 PM, Jamie Johnson <je...@gmail.com> wrote:
>>> and having looked at this closer, shouldn't the down node not be
>>> marked as active when I stop that solr instance?
>> 
>> Currently the shard state is not updated in the cloudstate when a node
>> goes down. This behavior should probably be changed at some point.
>> 
>> --
>>  Sami Siren

- Mark Miller
lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
Thanks Sami, so long as it's expected ;)

In regards to the replication not working the way I think it should,
am I missing something or is it simply not working the way I think?

On Fri, Feb 17, 2012 at 11:01 AM, Sami Siren <ss...@gmail.com> wrote:
> On Fri, Feb 17, 2012 at 5:10 PM, Jamie Johnson <je...@gmail.com> wrote:
>> and having looked at this closer, shouldn't the down node not be
>> marked as active when I stop that solr instance?
>
> Currently the shard state is not updated in the cloudstate when a node
> goes down. This behavior should probably be changed at some point.
>
> --
>  Sami Siren

Re: distributed deletes working?

Posted by Sami Siren <ss...@gmail.com>.
On Fri, Feb 17, 2012 at 5:10 PM, Jamie Johnson <je...@gmail.com> wrote:
> and having looked at this closer, shouldn't the down node not be
> marked as active when I stop that solr instance?

Currently the shard state is not updated in the cloudstate when a node
goes down. This behavior should probably be changed at some point.

--
 Sami Siren

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
And having looked at this more closely: shouldn't the down node not be
marked as active when I stop that solr instance?

On Fri, Feb 17, 2012 at 10:04 AM, Jamie Johnson <je...@gmail.com> wrote:
> Thanks Mark.  I'm still seeing some issues while indexing though.  I
> have the same setup describe in my previous email.  I do some indexing
> to the cluster with everything up and everything looks good.  I then
> take down one instance which is running 2 cores (shard2 slice 1 and
> shard 1 slice 2) and do some more inserts.  I then bring this second
> instance back up expecting that the system will recover the missing
> documents from the other instance but this isn't happening.  I see the
> following log message
>
> Feb 17, 2012 9:53:11 AM org.apache.solr.cloud.RecoveryStrategy run
> INFO: Sync Recovery was succesful - registering as Active
>
> which leads me to believe things should be in sync, but they are not.
> I've made no changes to the default solrconfig.xml, not sure if I need
> to or not but it looks like everything should work now.  Am I missing
> a configuration somewhere?
>
> Initial state
>
> {"collection1":{
>    "slice1":{
>      "JamiesMac.local:8501_solr_slice1_shard1":{
>        "shard_id":"slice1",
>        "leader":"true",
>        "state":"active",
>        "core":"slice1_shard1",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8501_solr",
>        "base_url":"http://JamiesMac.local:8501/solr"},
>      "JamiesMac.local:8502_solr_slice1_shard2":{
>        "shard_id":"slice1",
>        "state":"active",
>        "core":"slice1_shard2",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8502_solr",
>        "base_url":"http://JamiesMac.local:8502/solr"}},
>    "slice2":{
>      "JamiesMac.local:8501_solr_slice2_shard2":{
>        "shard_id":"slice2",
>        "leader":"true",
>        "state":"active",
>        "core":"slice2_shard2",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8501_solr",
>        "base_url":"http://JamiesMac.local:8501/solr"},
>      "JamiesMac.local:8502_solr_slice2_shard1":{
>        "shard_id":"slice2",
>        "state":"active",
>        "core":"slice2_shard1",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8502_solr",
>        "base_url":"http://JamiesMac.local:8502/solr"}}}}
>
>
> state with 1 solr instance down
>
> {"collection1":{
>    "slice1":{
>      "JamiesMac.local:8501_solr_slice1_shard1":{
>        "shard_id":"slice1",
>        "leader":"true",
>        "state":"active",
>        "core":"slice1_shard1",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8501_solr",
>        "base_url":"http://JamiesMac.local:8501/solr"},
>      "JamiesMac.local:8502_solr_slice1_shard2":{
>        "shard_id":"slice1",
>        "state":"active",
>        "core":"slice1_shard2",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8502_solr",
>        "base_url":"http://JamiesMac.local:8502/solr"}},
>    "slice2":{
>      "JamiesMac.local:8501_solr_slice2_shard2":{
>        "shard_id":"slice2",
>        "leader":"true",
>        "state":"active",
>        "core":"slice2_shard2",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8501_solr",
>        "base_url":"http://JamiesMac.local:8501/solr"},
>      "JamiesMac.local:8502_solr_slice2_shard1":{
>        "shard_id":"slice2",
>        "state":"active",
>        "core":"slice2_shard1",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8502_solr",
>        "base_url":"http://JamiesMac.local:8502/solr"}}}}
>
> state when everything comes back up after adding documents
>
> {"collection1":{
>    "slice1":{
>      "JamiesMac.local:8501_solr_slice1_shard1":{
>        "shard_id":"slice1",
>        "leader":"true",
>        "state":"active",
>        "core":"slice1_shard1",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8501_solr",
>        "base_url":"http://JamiesMac.local:8501/solr"},
>      "JamiesMac.local:8502_solr_slice1_shard2":{
>        "shard_id":"slice1",
>        "state":"active",
>        "core":"slice1_shard2",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8502_solr",
>        "base_url":"http://JamiesMac.local:8502/solr"}},
>    "slice2":{
>      "JamiesMac.local:8501_solr_slice2_shard2":{
>        "shard_id":"slice2",
>        "leader":"true",
>        "state":"active",
>        "core":"slice2_shard2",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8501_solr",
>        "base_url":"http://JamiesMac.local:8501/solr"},
>      "JamiesMac.local:8502_solr_slice2_shard1":{
>        "shard_id":"slice2",
>        "state":"active",
>        "core":"slice2_shard1",
>        "collection":"collection1",
>        "node_name":"JamiesMac.local:8502_solr",
>        "base_url":"http://JamiesMac.local:8502/solr"}}}}
>
>
> On Thu, Feb 16, 2012 at 10:24 PM, Mark Miller <ma...@gmail.com> wrote:
>> Yup - deletes are fine.
>>
>>
>> On Thu, Feb 16, 2012 at 8:56 PM, Jamie Johnson <je...@gmail.com> wrote:
>>
>>> With SOLR-2358 being committed to trunk, do deletes and updates get
>>> distributed/routed like adds do? Also, when a down shard comes back up, are
>>> the deletes/updates forwarded as well? Reading the JIRA, I believe the
>>> answer is yes; I just want to verify before bringing the latest into my
>>> environment.
>>>
>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com

Re: distributed deletes working?

Posted by Jamie Johnson <je...@gmail.com>.
Thanks Mark.  I'm still seeing some issues while indexing though.  I
have the same setup described in my previous email.  I do some indexing
to the cluster with everything up and everything looks good.  I then
take down one instance which is running 2 cores (shard2 slice 1 and
shard 1 slice 2) and do some more inserts.  I then bring this second
instance back up expecting that the system will recover the missing
documents from the other instance, but this isn't happening.  I see the
following log message:

Feb 17, 2012 9:53:11 AM org.apache.solr.cloud.RecoveryStrategy run
INFO: Sync Recovery was succesful - registering as Active

which leads me to believe things should be in sync, but they are not.
I've made no changes to the default solrconfig.xml; I'm not sure if I need
to or not, but it looks like everything should work now.  Am I missing
a configuration somewhere?

Initial state

{"collection1":{
    "slice1":{
      "JamiesMac.local:8501_solr_slice1_shard1":{
        "shard_id":"slice1",
        "leader":"true",
        "state":"active",
        "core":"slice1_shard1",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8501_solr",
        "base_url":"http://JamiesMac.local:8501/solr"},
      "JamiesMac.local:8502_solr_slice1_shard2":{
        "shard_id":"slice1",
        "state":"active",
        "core":"slice1_shard2",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8502_solr",
        "base_url":"http://JamiesMac.local:8502/solr"}},
    "slice2":{
      "JamiesMac.local:8501_solr_slice2_shard2":{
        "shard_id":"slice2",
        "leader":"true",
        "state":"active",
        "core":"slice2_shard2",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8501_solr",
        "base_url":"http://JamiesMac.local:8501/solr"},
      "JamiesMac.local:8502_solr_slice2_shard1":{
        "shard_id":"slice2",
        "state":"active",
        "core":"slice2_shard1",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8502_solr",
        "base_url":"http://JamiesMac.local:8502/solr"}}}}


state with 1 solr instance down

{"collection1":{
    "slice1":{
      "JamiesMac.local:8501_solr_slice1_shard1":{
        "shard_id":"slice1",
        "leader":"true",
        "state":"active",
        "core":"slice1_shard1",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8501_solr",
        "base_url":"http://JamiesMac.local:8501/solr"},
      "JamiesMac.local:8502_solr_slice1_shard2":{
        "shard_id":"slice1",
        "state":"active",
        "core":"slice1_shard2",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8502_solr",
        "base_url":"http://JamiesMac.local:8502/solr"}},
    "slice2":{
      "JamiesMac.local:8501_solr_slice2_shard2":{
        "shard_id":"slice2",
        "leader":"true",
        "state":"active",
        "core":"slice2_shard2",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8501_solr",
        "base_url":"http://JamiesMac.local:8501/solr"},
      "JamiesMac.local:8502_solr_slice2_shard1":{
        "shard_id":"slice2",
        "state":"active",
        "core":"slice2_shard1",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8502_solr",
        "base_url":"http://JamiesMac.local:8502/solr"}}}}

state when everything comes back up after adding documents

{"collection1":{
    "slice1":{
      "JamiesMac.local:8501_solr_slice1_shard1":{
        "shard_id":"slice1",
        "leader":"true",
        "state":"active",
        "core":"slice1_shard1",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8501_solr",
        "base_url":"http://JamiesMac.local:8501/solr"},
      "JamiesMac.local:8502_solr_slice1_shard2":{
        "shard_id":"slice1",
        "state":"active",
        "core":"slice1_shard2",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8502_solr",
        "base_url":"http://JamiesMac.local:8502/solr"}},
    "slice2":{
      "JamiesMac.local:8501_solr_slice2_shard2":{
        "shard_id":"slice2",
        "leader":"true",
        "state":"active",
        "core":"slice2_shard2",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8501_solr",
        "base_url":"http://JamiesMac.local:8501/solr"},
      "JamiesMac.local:8502_solr_slice2_shard1":{
        "shard_id":"slice2",
        "state":"active",
        "core":"slice2_shard1",
        "collection":"collection1",
        "node_name":"JamiesMac.local:8502_solr",
        "base_url":"http://JamiesMac.local:8502/solr"}}}}


On Thu, Feb 16, 2012 at 10:24 PM, Mark Miller <ma...@gmail.com> wrote:
> Yup - deletes are fine.
>
>
> On Thu, Feb 16, 2012 at 8:56 PM, Jamie Johnson <je...@gmail.com> wrote:
>
>> With SOLR-2358 being committed to trunk, do deletes and updates get
>> distributed/routed like adds do? Also, when a down shard comes back up, are
>> the deletes/updates forwarded as well? Reading the JIRA, I believe the
>> answer is yes; I just want to verify before bringing the latest into my
>> environment.
>>
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com

Re: distributed deletes working?

Posted by Mark Miller <ma...@gmail.com>.
Yup - deletes are fine.


On Thu, Feb 16, 2012 at 8:56 PM, Jamie Johnson <je...@gmail.com> wrote:

> With SOLR-2358 being committed to trunk, do deletes and updates get
> distributed/routed like adds do? Also, when a down shard comes back up, are
> the deletes/updates forwarded as well? Reading the JIRA, I believe the
> answer is yes; I just want to verify before bringing the latest into my
> environment.
>



-- 
- Mark

http://www.lucidimagination.com