Posted to dev@lucene.apache.org by Gus Heck <gu...@gmail.com> on 2019/04/22 15:42:47 UTC

Indexing continue while zk server stopped?

BasicZkTest has the following bit of code that I'm tripping on:

    zkServer.shutdown();

    // document indexing shouldn't stop immediately after a ZK disconnect
    assertU(adoc("id", "201"));

    Thread.sleep(300);

    // try a reconnect from disconnect
    zkServer = new ZkTestServer(zkDir, zkPort);
    zkServer.run(false);

It's not entirely clear to me that this should always be true.
ZkStateReader has the means to cache and watch various bits of information,
but if it hasn't done the caching yet it may need to talk to ZK before
completing the request. I am trying to use Collection Properties as an
alternative location for looking up the routed alias for a collection.
The current code uses a core property, but this is inconvenient for testing
as it can't be altered in the test... or at least I didn't find a way to
alter it. Also, future features, such as archiving older collections from a
TRA, might find it useful to be able to disconnect the older collections
from the alias, but right now that would require finding all cores and
editing properties for all of them...

However, BasicZkTest fails on this assert because fetching the properties
fails and throws an exception.
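
To illustrate the cache-miss concern, here is a minimal sketch in plain Java.
It does not use ZkStateReader or any Solr API; FakeStore and PropsCache are
hypothetical names invented for this example, and FakeStore only stands in
for a remote store such as ZooKeeper. Under those assumptions, it shows that
a value cached before the outage can still be read, while a value that was
never cached needs the store and fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadThroughCacheSketch {

    /** Hypothetical stand-in for a remote store (an assumption, not a Solr class). */
    static final class FakeStore {
        final Map<String, String> data = new ConcurrentHashMap<>();
        volatile boolean up = true;

        void put(String key, String value) { data.put(key, value); }

        void shutdown() { up = false; }

        String read(String key) {
            // Every read needs the store to be reachable.
            if (!up) {
                throw new IllegalStateException("store unavailable");
            }
            return data.get(key);
        }
    }

    /** Hypothetical read-through cache: only a cache miss talks to the store. */
    static final class PropsCache {
        final Map<String, String> cache = new ConcurrentHashMap<>();
        final FakeStore store;

        PropsCache(FakeStore store) { this.store = store; }

        String get(String key) {
            // computeIfAbsent invokes store.read only when the key is not cached;
            // if store.read throws, nothing is cached and the exception propagates.
            return cache.computeIfAbsent(key, store::read);
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.put("warm", "alias-a");
        store.put("cold", "alias-b");

        PropsCache cache = new PropsCache(store);
        cache.get("warm");   // populate the cache while the store is up
        store.shutdown();    // simulate the outage

        System.out.println("warm read: " + cache.get("warm"));
        try {
            cache.get("cold");
            System.out.println("cold read unexpectedly succeeded");
        } catch (IllegalStateException e) {
            System.out.println("cold read failed: " + e.getMessage());
        }
    }
}
```

In this sketch the outcome of a request during the outage depends entirely
on whether the needed value happened to be cached beforehand, which is the
reason the assert cannot hold in general.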

So is this assert really reasonable? It kind of feels unreasonable, but I'd
like some background from other folks here...
https://issues.apache.org/jira/browse/SOLR-7819 seems to have discussed
this some, but the more I think about it, the more I'm convinced that
proceeding without ZooKeeper available is dangerous. Any update sent to
an alias (TRA/CRA or regular) will need to check ZooKeeper, for example.
Also, security.json is in ZooKeeper, so anyone running with security on
probably tries to hit ZooKeeper on a cache miss too.

I guess it comes down to the question of whether SolrCloud should work
while ZooKeeper is down or unavailable. This is the first I've run
into the notion that the answer might be yes. I'd always presumed that if
ZK went away all bets were off, because ZK is what makes a cloud out of us.

What I don't know is which existing use cases/installs might find this
assert critical (most of the above bug talked about LIR, and the comment on
the commit mentions leader election).

Thoughts?

-Gus

Re: Indexing continue while zk server stopped?

Posted by Erick Erickson <er...@gmail.com>.
+1 to changing the test if you think this is bogus; we have enough test failures that we don’t need more. Although that _specific_ test doesn’t seem to fail all that regularly…



> On Apr 26, 2019, at 8:53 AM, Gus Heck <gu...@gmail.com> wrote:
> 
> assertU(adoc("id", "201"));


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Indexing continue while zk server stopped?

Posted by Gus Heck <gu...@gmail.com>.
Issue resolved itself without change to the test when I realized it might
be important to cache collection properties, though that's now under
discussion...  (see https://issues.apache.org/jira/browse/SOLR-13418 /
https://issues.apache.org/jira/browse/SOLR-13420 if you're interested)

Might still be good to change the test though since it seems to promise
more than can be delivered?



On Fri, Apr 26, 2019 at 10:46 AM David Smiley <da...@gmail.com>
wrote:

> I agree with Erick's response, and thus the test/assertion seems
> unreasonable.
>
> If ZK is down, all bets are off on indexing proceeding.  In practice,
> people expect searches to continue for some time at least.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Indexing continue while zk server stopped?

Posted by David Smiley <da...@gmail.com>.
I agree with Erick's response, and thus the test/assertion seems
unreasonable.

If ZK is down, all bets are off on indexing proceeding.  In practice,
people expect searches to continue for some time at least.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Indexing continue while zk server stopped?

Posted by Erick Erickson <er...@gmail.com>.
On the surface, I’m automatically suspicious of _anything_ that relies on an arbitrary wait period for a state to settle down. Would this 300ms sleep be adequate on a very fast machine running just one test?
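
A common alternative to a fixed sleep is to poll for the expected state with an upper bound on the wait. The sketch below is plain Java under stated assumptions: waitFor is a hypothetical helper written for this example, not part of the Solr test framework, and the AtomicBoolean only stands in for whatever state the test actually needs to observe.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

    /**
     * Hypothetical helper: polls the condition until it holds or the timeout
     * elapses. Returns true if the condition was observed, false on timeout
     * or if the waiting thread was interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the state never settled in time
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for "the state has settled"; set by another thread after a delay.
        AtomicBoolean settled = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            settled.set(true);
        });
        worker.start();

        // Waits only as long as needed, up to a generous 5 second bound.
        boolean observed = waitFor(settled::get, 5_000, 10);
        System.out.println("state observed: " + observed);
        worker.join();
    }
}
```

The timeout can be generous because the loop returns as soon as the condition holds, so a fast machine does not pay for the margin a slow one needs.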

I don’t see the value of that assert anyway. I can’t come up with a use-case for a running Solr functioning incorrectly because it failed to update a document while ZooKeeper was shutting down.

FWIW
Erick


