Posted to solr-user@lucene.apache.org by Dwane Hall <dw...@hotmail.com> on 2018/09/12 11:47:45 UTC

switch query parser and solr cloud

Good afternoon Solr brains trust. I'm seeking some community advice, if somebody can spare a minute from their busy schedule.

I'm attempting to use the switch query parser to influence client search behaviour based on a client specified request parameter.

Essentially I want the following to occur:

-A user has the option to pass through an optional request parameter "allResults" to solr
-If "allResults" is true then return all matching query records by appending a filter query for all records (fq=*:*)
-If "allResults" is empty then apply a filter using the collapse query parser ({!collapse field=SUMMARY_FIELD})
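For readers unfamiliar with the switch parser, the selection logic described above can be sketched in a few lines of Python. This is only a model of the behaviour documented in the Solr Reference Guide (trim the input, look for a matching case.<value>, fall back to default), not Solr's own code; the function name resolve_switch and the PARAMS dictionary are made up for illustration.

```python
from typing import Mapping, Optional


def resolve_switch(local_params: Mapping[str, str], v: Optional[str]) -> str:
    """Pick the sub-query string the way the switch parser is documented to."""
    fallback = local_params.get("default")
    if v is None or not v.strip():
        # Blank or missing input: a bare "case" param wins, otherwise the default.
        chosen = local_params.get("case", fallback)
    else:
        # Non-blank input: look for "case.<trimmed value>", otherwise the default.
        chosen = local_params.get("case." + v.strip(), fallback)
    if chosen is None:
        raise ValueError(f"no default and no switch case matching {v!r}")
    return chosen


# Local params taken from the filter query discussed in this thread.
PARAMS = {
    "default": "{!collapse field=SUMMARY_FIELD}",
    "case.true": "*:*",
}
```

Under this model, allResults=true selects *:*, while "false", an empty string, or a missing value all fall through to the collapse filter, because no case.false or bare case is defined.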

Environment
Solr 7.3.1 (1 solr node DEV, 4 solr nodes PTST)
4 shard collection

My Implementation
I'm using the switch query parser to choose client behaviour by appending a filter query to the user request, very similar to what is documented in the Solr Reference Guide (https://lucene.apache.org/solr/guide/7_4/other-parsers.html#switch-query-parser).

The request uses the params api (pertinent line below is the _appends_ filter queries)
(useParams=firstParams,secondParams)

  {
    "set":{
      "firstParams":{
        "op":"AND",
        "wt":"json",
        "start":0,
        "allResults":"false",
        "fl":"FIELD_1,FIELD_2,SUMMARY_FIELD",
        "_appends_":{
          "fq":"{!switch default=\"{!collapse field=SUMMARY_FIELD}\" case.true=*:* v=${allResults}}"
        },
        "_invariants_":{
          "deftype":"edismax",
          "timeAllowed":20000,
          "rows":"30",
          "echoParams":"none"
        }
      }
    },
    "set":{
      "secondParams":{
        "df":"FIELD_1",
        "q":"{!edismax v=${searchString} df=FIELD_1 q.op=${op}}",
        "_invariants_":{
          "qf":"FIELD_1,FIELD_2,SUMMARY_FIELD"
        }
      }
    }
  }
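As a rough illustration of how a parameter set like this might be sent to the Request Parameters API (the /config/params endpoint of a collection), here is a minimal Python sketch. The base URL, the collection name, and the function names are assumptions for the example, and the payload is trimmed to the switch-related entries only.

```python
import json
import urllib.request

# The switch filter from the parameter set above, as a plain Python string.
SWITCH_FQ = (
    '{!switch default="{!collapse field=SUMMARY_FIELD}" '
    "case.true=*:* v=${allResults}}"
)


def build_param_set_request(base_url: str, collection: str) -> urllib.request.Request:
    """Build (but do not send) a POST that sets a trimmed-down 'firstParams'."""
    payload = {
        "set": {
            "firstParams": {
                "allResults": "false",
                "_appends_": {"fq": SWITCH_FQ},
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{collection}/config/params",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status (needs a reachable Solr)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling send(build_param_set_request("http://localhost:8983/solr", "my_collection")) would post the set; both arguments are placeholders, not values from the original environment.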

Everything works nicely until I move from a single-node Solr instance (DEV) to a clustered Solr instance (PTST), where I receive a null pointer exception from Solr that I'm having trouble picking apart.  I've co-located the Solr documents using document routing, which appears to be the only requirement for using the collapse query parser.
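The post does not say how the routing was set up. Assuming the default compositeId router, where an id of the form prefix!id sends all documents sharing a prefix to the same shard, the sketch below shows how such ids could be built and how a list of (collapse value, shard) pairs could be checked for groups that ended up on more than one shard. Both function names are hypothetical, and gathering the pairs (for example by querying each shard separately) is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def routed_id(route_key: str, doc_id: str) -> str:
    """Build a compositeId-style document id: '<route_key>!<doc_id>'."""
    if "!" in route_key:
        raise ValueError("route key must not contain '!'")
    return f"{route_key}!{doc_id}"


def split_groups(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return collapse-field values that were seen on more than one shard."""
    shards_by_group: Dict[str, Set[str]] = defaultdict(set)
    for group_value, shard in rows:
        shards_by_group[group_value].add(shard)
    return {g: s for g, s in shards_by_group.items() if len(s) > 1}
```

An empty result from split_groups would be consistent with the documents being co-located as intended; any entry points at a SUMMARY_FIELD value spread across shards.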

Does anyone know if the switch query parser has any limitations in a sharded solr cloud environment or can provide any possible troubleshooting advice?

Any community recommendations would be greatly appreciated.

Solr stack trace
2018-09-12 12:16:12,918 4064160860 ERROR : [c:my_collection s:shard1 r:core_node3 x:my_collection_ptst_shard1_replica_n1] org.apache.solr.common.SolrException : org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2: java.lang.NullPointerException
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
        at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thanks for taking the time to assist,

Dwane

Re: switch query parser and solr cloud

Posted by Dwane Hall <dw...@hotmail.com>.
Afternoon all,

Just to add some closure to this topic in case anybody else stumbles across a similar problem: I've managed to resolve my issue by removing the switch query parser from the _appends_ component of the parameter set.

so the parameter set changes from this

 "set":{
    "firstParams":{
        "op":"AND",
        "wt":"json",
        "start":0,
        "allResults":"false",
        "fl":"FIELD_1,FIELD_2,SUMMARY_FIELD",
      "_appends_":{
        "fq":"{!switch default=\"{!collapse field=SUMMARY_FIELD}\" case.true=*:* v=${allResults}}",
      },

to just a regular old filter query

 "set":{
    "firstParams":{
        "op":"AND",
        "wt":"json",
        "start":0,
        "allResults":"false",
        "fl":"FIELD_1,FIELD_2,SUMMARY_FIELD",
        "fq":"{!switch default=\"{!collapse field=SUMMARY_FIELD}\" case.true=*:* v=${allResults}}",

Somewhat odd.

Thanks again to Erick and Shawn for taking the time to assist and talk this through.

Dwane
________________________________
From: Dwane Hall <dw...@hotmail.com>
Sent: Thursday, 13 September 2018 6:42 AM
To: Erick Erickson; solr-user@lucene.apache.org
Subject: Re: switch query parser and solr cloud

Thanks for the suggestions and responses, Erick and Shawn.  Erick, I only return 30 records irrespective of the query (not the entire payload); I removed some of my configuration settings for readability. The parameter name "allResults" was a little misleading. I apologise for that, but I appreciate your input.

Shawn, thanks for your comments. Regarding the switch query parser, Hossman has a great description of its use and application here (https://lucidworks.com/2013/02/20/custom-solr-request-params/).  PTST is just our performance testing environment and is not important in the context of the question, other than being a multi-node Solr environment.  The server-side error was the null pointer exception, which is why I was having difficulty debugging it: there was not a lot of info to troubleshoot.  I'll keep playing and explore the client filter option for addressing this issue.

Thanks again to you both for your input.

Cheers,

Dwane
________________________________
From: Erick Erickson <er...@gmail.com>
Sent: Thursday, 13 September 2018 12:20 AM
To: solr-user
Subject: Re: switch query parser and solr cloud

You will run into significant problems if, when returning "all
results", you return large result sets. For regular queries I like to
limit the return to 100, although 1,000 is sometimes OK.

Millions will blow you out of the water; use CursorMark or Streaming
for very large result sets. CursorMark gets you a page at a time, but
efficiently, and Streaming doesn't consume huge amounts of memory.

And assuming you could possibly return 1M rows, say, what would the
user do with it? Displaying in a browser is problematic, for instance.

Best,
Erick
On Wed, Sep 12, 2018 at 5:54 AM Shawn Heisey <ap...@elyograg.org> wrote:
>
> On 9/12/2018 5:47 AM, Dwane Hall wrote:
> > Good afternoon Solr brains trust I'm seeking some community advice if somebody can spare a minute from their busy schedules.
> >
> > I'm attempting to use the switch query parser to influence client search behaviour based on a client specified request parameter.
> >
> > Essentially I want the following to occur:
> >
> > -A user has the option to pass through an optional request parameter "allResults" to solr
> > -If "allResults" is true then return all matching query records by appending a filter query for all records (fq=*:*)
> > -If "allResults" is empty then apply a filter using the collapse query parser ({!collapse field=SUMMARY_FIELD})
>
> I'm looking at the documentation for the switch parser and I'm having
> difficulty figuring out what it actually does.
>
> This is the kind of thing that is better to handle in your client
> instead of asking Solr to do it for you.  You'd have to have your code
> construct the complex localparam for the switch parser ... it would be
> much easier to write code to insert your special collapse filter when it
> is required.
>
> > Everything works nicely until I move from a single node solr instance (DEV) to a clustered solr instance (PTST) in which I receive a null pointer exception from Solr which I'm having trouble picking apart.  I've co-located the solr documents using document routing which appear to be the only requirement for the collapse query parser's use.
>
> Some features break down when working with sharded indexes.  This is one
> of the reasons that sharding should only be done when it is absolutely
> required.  A single-shard index tends to perform better anyway, unless
> it's really really huge.
>
> The error is a remote exception from
> https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2, which
> suggests that maybe not all your documents are co-located on the same
> shard the way you think they are.  Is this a remote server/shard?  I am
> completely guessing here.  It's always possible that you've encountered
> a bug.  Does this one (not fixed) look like it might apply?
>
> https://issues.apache.org/jira/browse/SOLR-9104
>
> There should be a server-side error logged by the Solr instance running
> on myserver:1234 as well.  Have you looked at that?
>
> I do not know what PTST means.  Is that important for me to understand?
>
> Thanks,
> Shawn
>

Re: switch query parser and solr cloud

Posted by Dwane Hall <dw...@hotmail.com>.
Thanks for the suggestions and responses, Erick and Shawn.  Erick, I only return 30 records irrespective of the query (not the entire payload); I removed some of my configuration settings for readability. The parameter name "allResults" was a little misleading. I apologise for that, but I appreciate your input.

Shawn, thanks for your comments. Regarding the switch query parser, Hossman has a great description of its use and application here (https://lucidworks.com/2013/02/20/custom-solr-request-params/).  PTST is just our performance testing environment and is not important in the context of the question, other than being a multi-node Solr environment.  The server-side error was the null pointer exception, which is why I was having difficulty debugging it: there was not a lot of info to troubleshoot.  I'll keep playing and explore the client filter option for addressing this issue.

Thanks again to you both for your input.

Cheers,

Dwane

Re: switch query parser and solr cloud

Posted by Erick Erickson <er...@gmail.com>.
You will run into significant problems if, when returning "all
results", you return large result sets. For regular queries I like to
limit the return to 100, although 1,000 is sometimes OK.

Millions will blow you out of the water; use CursorMark or Streaming
for very large result sets. CursorMark gets you a page at a time, but
efficiently, and Streaming doesn't consume huge amounts of memory.
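As a rough sketch of the CursorMark approach: the client starts with cursorMark=*, passes the nextCursorMark from each response into the next request, and stops when the cursor no longer changes. The fetch callable below is a stand-in for whatever HTTP client sends the parameters to Solr and returns the parsed JSON body; it and iter_docs are hypothetical names, and the request's sort is assumed to include the uniqueKey field, which cursors require.

```python
from typing import Callable, Dict, Iterator

# A function that sends query params to Solr and returns the decoded JSON body.
Fetch = Callable[[Dict[str, str]], dict]


def iter_docs(fetch: Fetch, params: Dict[str, str], page_size: int = 100) -> Iterator[dict]:
    """Yield every matching document one page at a time using cursorMark."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        page_params = dict(params, rows=str(page_size), cursorMark=cursor)
        body = fetch(page_params)
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            return
        cursor = next_cursor
```

A call such as iter_docs(fetch, {"q": "*:*", "sort": "id asc"}) keeps only one page in memory at a time on the client side.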

And assuming you could possibly return 1M rows, say, what would the
user do with it? Displaying in a browser is problematic, for instance.

Best,
Erick

Re: switch query parser and solr cloud

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/12/2018 5:47 AM, Dwane Hall wrote:
> Good afternoon Solr brains trust I'm seeking some community advice if somebody can spare a minute from their busy schedules.
>
> I'm attempting to use the switch query parser to influence client search behaviour based on a client specified request parameter.
>
> Essentially I want the following to occur:
>
> -A user has the option to pass through an optional request parameter "allResults" to solr
> -If "allResults" is true then return all matching query records by appending a filter query for all records (fq=*:*)
> -If "allResults" is empty then apply a filter using the collapse query parser ({!collapse field=SUMMARY_FIELD})

I'm looking at the documentation for the switch parser and I'm having 
difficulty figuring out what it actually does.

This is the kind of thing that is better to handle in your client 
instead of asking Solr to do it for you.  You'd have to have your code 
construct the complex localparam for the switch parser ... it would be 
much easier to write code to insert your special collapse filter when it 
is required.
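A minimal sketch of that client-side alternative, in Python: the client decides whether to add the collapse filter and Solr never sees an allResults parameter. The function name and the fixed rows value are assumptions for illustration; only the collapse filter string comes from the original question.

```python
from typing import List, Tuple
from urllib.parse import urlencode

COLLAPSE_FQ = "{!collapse field=SUMMARY_FIELD}"


def build_query_params(search_string: str, all_results: bool = False) -> List[Tuple[str, str]]:
    """Return request parameters, adding the collapse filter only when needed."""
    params = [("q", search_string), ("rows", "30")]
    if not all_results:
        # Default behaviour: collapse on SUMMARY_FIELD.
        params.append(("fq", COLLAPSE_FQ))
    return params


def to_query_string(params: List[Tuple[str, str]]) -> str:
    """URL-encode the parameters for use in a /select request."""
    return urlencode(params)
```

With this shape there is no fq=*:* at all in the "all results" case, since a match-all filter does not restrict anything.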

> Everything works nicely until I move from a single node solr instance (DEV) to a clustered solr instance (PTST) in which I receive a null pointer exception from Solr which I'm having trouble picking apart.  I've co-located the solr documents using document routing which appear to be the only requirement for the collapse query parser's use.

Some features break down when working with sharded indexes.  This is one 
of the reasons that sharding should only be done when it is absolutely 
required.  A single-shard index tends to perform better anyway, unless 
it's really really huge.

The error is a remote exception from
https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2, which
suggests that maybe not all your documents are co-located on the same
shard the way you think they are.  Is this a remote server/shard?  I am
completely guessing here.  It's always possible that you've encountered
a bug.  Does this one (not fixed) look like it might apply?

https://issues.apache.org/jira/browse/SOLR-9104

There should be a server-side error logged by the Solr instance running 
on myserver:1234 as well.  Have you looked at that?

I do not know what PTST means.  Is that important for me to understand?

Thanks,
Shawn