Posted to solr-user@lucene.apache.org by tedsolr <ts...@sciquest.com> on 2016/08/03 16:42:51 UTC

QParsePlugin not working on sharded collection

I'm trying to verify that a very simple custom post filter will work on a
sharded collection. So far it doesn't. Here are the search results on my
single shard test collection:

{
  "responseHeader": {
    "status": 0,
    "QTime": 17
  },
  "thecountis": "946028",
  "myvar": "hello",
  "response": {
    "numFound": 946028,
    "start": 0,
    "docs": [
...]
  }
}

When I run against a two shard collection (same data set) it's as though the
post filter doesn't exist. The results don't include my additions to the
response:

{
  "responseHeader": {
    "status": 0,
    "QTime": 17
  },
  "response": {
    "numFound": 946028,
    "start": 0,
    "docs": [
...]
  }
}

Here's the solrconfig.xml:

...
<queryParser name="TedFilter" class="...TedPlugin" />
<requestHandler name="/ted" class="solr.SearchHandler">
  <lst name="appends">
    <str name="fq">{!TedFilter myvar=hello}</str>
  </lst>
</requestHandler>
...

And here's the simplest plugin I could write:

public class TedPlugin extends QParserPlugin {
	@Override
	public void init(NamedList arg0) {
	}

	@Override
	public QParser createParser(String arg0, final SolrParams arg1, final SolrParams arg2, final SolrQueryRequest arg3) {
		return new QParser(arg0, arg1, arg2, arg3) {

			@Override
			public Query parse() throws SyntaxError {
				return new TedQuery(arg1, arg2, arg3);
			}
		};
	}
}

public class TedQuery extends AnalyticsQuery {
	private final String myvar;

	TedQuery(SolrParams localParams, SolrParams params, SolrQueryRequest req) {
		myvar = localParams.get("myvar");
	}

	@Override
	public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb, IndexSearcher searcher) {
		return new TedCollector(myvar, rb);
	}

	@Override
	public boolean equals(Object o) {
		if (o instanceof TedQuery) {
			TedQuery tq = (TedQuery) o;
			return Objects.equals(this.myvar, tq.myvar);
		}
		return false;
	}

	@Override
	public int hashCode() {
		return myvar == null ? 1 : myvar.hashCode();
	}


	class TedCollector extends DelegatingCollector {
		ResponseBuilder rb;
		int count;
		String myvar;

		public TedCollector(String myvar, ResponseBuilder rb) {
			this.rb = rb;
			this.myvar = myvar;
		}

		@Override
		public void collect(int doc) throws IOException {
			count++;
			super.collect(doc);
		}

		@Override
		public void finish() throws IOException {
			rb.rsp.add("thecountis", String.valueOf(count));
			rb.rsp.add("myvar", myvar);

			if (super.delegate instanceof DelegatingCollector) {
				((DelegatingCollector) super.delegate).finish();
			}
		}
	}
}

What am I doing wrong? Thanks!
Ted
v5.2.1 SolrCloud mode



--
View this message in context: http://lucene.472066.n3.nabble.com/QParsePlugin-not-working-on-sharded-collection-tp4290249.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: QParsePlugin not working on sharded collection

Posted by tedsolr <ts...@sciquest.com>.
So my implementation with a DocTransformer is causing an exception (with a
sharded collection):

ERROR - 2016-08-04 09:41:44.247; [ShardTest1 shard1_0 core_node3 ShardTest1_shard1_0_replica1] org.apache.solr.common.SolrException; null:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/ShardTest1_shard1_0_replica1: parsing error
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:538)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227)
	at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:218)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:183)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: parsing error
	at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:52)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:536)
	... 12 more
Caused by: java.io.EOFException
	at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:208)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
	at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:508)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:202)
	at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:390)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:237)
	at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:135)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:204)
	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:126)
	at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50)
	... 13 more

Here are the changes to TedQuery. I reduced the amount of data being
returned, mapped each docId to its document (like the [docid] transformer),
and put the map in the request context in the finish() method:

public void collect(int doc) throws IOException {
	count++;
	if (doc % 10000 == 0) {
		mydata.put(Integer.valueOf(doc + super.docBase), String.valueOf(doc + super.docBase));
		super.collect(doc);
	}
}

public void finish() throws IOException {
...
rb.req.getContext().put("mystats", mydata);
...
}

Here's the transformer:

public class TedTransform extends TransformerFactory {
	@Override
	public DocTransformer create(String arg0, SolrParams arg1, SolrQueryRequest arg2) {
		return new TedTransformer(arg0, arg2);
	}

	private class TedTransformer extends TransformerWithContext {
		private final String f;
		private HashMap<Integer, String> data;

		public TedTransformer(String f, SolrQueryRequest r) {
			this.f = f;
		}

		@Override
		public String getName() {
			return null;
		}

		@Override
		public void transform(SolrDocument arg0, int arg1) throws IOException {
			if (context.req != null) {
				if (data == null) {
					data = (HashMap<Integer, String>) context.req.getContext().get("mystats");
				}
				arg0.setField(f, data.get(Integer.valueOf(arg1)));
			}
		}
	}
}

And I added the transformer to the solrconfig.xml:

<transformer name="TedT" class="...TedTransform" />
<queryParser name="TedFilter" class="...TedPlugin" />
<requestHandler name="/ted" class="solr.SearchHandler">
  <lst name="appends">
    <str name="fq">{!TedFilter myvar=hello}</str>
    <str name="fl">[TedT]</str>
  </lst>
</requestHandler>

Why does this barf on multi-sharded collections?
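
One thing visible in the transform() code above is that data.get(...) would throw a NullPointerException whenever no "mystats" entry is present in the request context. Whether that actually happens on the sharded path, and whether it is related to the EOFException, is not established in this thread. The following is only a sketch of a defensive lookup; StatsLookup is a hypothetical helper, and a plain Map stands in for the request context so the example runs without Solr.

```java
import java.util.Map;

// Hypothetical helper: a plain Map stands in for the Solr request context.
final class StatsLookup {

    /**
     * Returns the value stored for docId, or null when the "mystats" map
     * was never put in the context or has no entry for this doc.
     */
    @SuppressWarnings("unchecked")
    static String lookup(Map<Object, Object> context, int docId) {
        Object raw = context.get("mystats");
        if (!(raw instanceof Map)) {
            // Nothing was stored for this request, so there is nothing to add.
            return null;
        }
        return ((Map<Integer, String>) raw).get(docId);
    }
}
```

A transformer built this way would skip the extra field instead of failing when the map is missing.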




Re: QParsePlugin not working on sharded collection

Posted by tedsolr <ts...@sciquest.com>.
Thanks Erick, you answered my question by pointing out the aggregator. I
didn't realize a merge strategy was _required_ to return stats info when
there are multiple shards. I'm having trouble with my actual plugin so I've
scaled back to the simplest possible example. I'm adding to it little by
little to see what the last straw is.




Re: QParsePlugin not working on sharded collection

Posted by Erick Erickson <er...@gmail.com>.
OK, I'm going to assume that somewhere you're
keeping more complicated structures around to
track all the docs coming through the collector so
you can know whether they're duplicates or not.

I think there are (at least) two ways to go about it:

1> Use a SearchComponent to add a separate section to
the response, similar to highlighting or faceting.

2> Go ahead and use a DocTransformer to add the data
to each individual doc. But the example you're using adds the
data to the meta-data, not to an individual doc.


Best,
Erick

On Wed, Aug 3, 2016 at 2:03 PM, tedsolr <ts...@sciquest.com> wrote:
> So I notice if I create the simplest MergeStrategy I can get my test values
> from the shard responses and then if I add info to the SolrQueryResponse it
> gets back to the caller. I still must be missing something. I wouldn't
> expect to have different code paths - one for single shard one for multi
> shard. So if the PostFilter is restricting the documents returned, what's
> the correct way to return my analytics info? Should I not be adding data to
> the SolrQueryResponse from within the delegating collector's finish()
> method? Here's what I'm trying to do (still works fine with a single shard
> collection :)
>
> - Use the DelegatingCollector to restrict docs returned (dropping docs that
> are "duplicates" based on my criteria)
> - Calculate 2 stats for each collected doc: a count of "duplicate" docs & a
> sum on a number field from these "duplicate" docs. I am doing the math in
> the collect() method.
> - Return the stats in the response stream. I'm using a TransformerFactory
> now to inject a new field into the results for each doc. Should I be using a
> SearchComponent instead?

Re: QParsePlugin not working on sharded collection

Posted by tedsolr <ts...@sciquest.com>.
So I notice if I create the simplest MergeStrategy I can get my test values
from the shard responses and then if I add info to the SolrQueryResponse it
gets back to the caller. I still must be missing something. I wouldn't
expect to have different code paths - one for single shard one for multi
shard. So if the PostFilter is restricting the documents returned, what's
the correct way to return my analytics info? Should I not be adding data to
the SolrQueryResponse from within the delegating collector's finish()
method? Here's what I'm trying to do (still works fine with a single shard
collection :)

- Use the DelegatingCollector to restrict docs returned (dropping docs that
are "duplicates" based on my criteria)
- Calculate 2 stats for each collected doc: a count of "duplicate" docs & a
sum on a number field from these "duplicate" docs. I am doing the math in
the collect() method.
- Return the stats in the response stream. I'm using a TransformerFactory
now to inject a new field into the results for each doc. Should I be using a
SearchComponent instead?
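
The bookkeeping described in the list above could look roughly like the sketch below. It is plain Java with no Solr classes; DuplicateStats, the dedupKey string, and the amount parameter are all hypothetical stand-ins, and it assumes the first doc seen for a key is the one kept, with the count and sum covering only the later docs that are dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the collect() bookkeeping; no Solr classes are used.
final class DuplicateStats {

    /** Totals for one group: duplicates dropped and the sum of their numeric field. */
    static final class Stats {
        final int firstDocId;
        int duplicateCount;
        double duplicateSum;

        Stats(int firstDocId) {
            this.firstDocId = firstDocId;
        }
    }

    private final Map<String, Stats> byKey = new LinkedHashMap<>();

    /**
     * Returns true if this is the first doc seen for the key (pass it on to
     * the delegate), false if it is a duplicate (drop it and update the stats).
     */
    boolean collect(int docId, String dedupKey, double amount) {
        Stats stats = byKey.get(dedupKey);
        if (stats == null) {
            byKey.put(dedupKey, new Stats(docId));
            return true;
        }
        stats.duplicateCount++;
        stats.duplicateSum += amount;
        return false;
    }

    /** Stats for the surviving doc of each group, keyed by that doc's id. */
    Map<Integer, Stats> statsByDocId() {
        Map<Integer, Stats> result = new LinkedHashMap<>();
        for (Stats s : byKey.values()) {
            result.put(s.firstDocId, s);
        }
        return result;
    }
}
```

Keying the result by the surviving doc's id is what would let a per-document step, such as a transformer, look the stats up later.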



Re: QParsePlugin not working on sharded collection

Posted by Erick Erickson <er...@gmail.com>.
Right, I don't have the code in front of me right now, but I think
your issue is at the "aggregation" point. You also have to put
some code in the aggregation bits that pulls your custom parts
from the sub-request packets and puts them in the final packet,
"doing the right thing" in terms of assembling them into
something meaningful along the way (e.g. averaging "myvar"
or putting it in a list identified by shard, and so on).
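
As a rough illustration of that aggregation step, the sketch below combines the custom top-level entries from several shard responses into one result. It is not Solr code: ShardStatsMerger is a hypothetical name, and each shard response is modeled as a plain Map holding the "thecountis" and "myvar" keys from the original post. In a real plugin this logic would live in whatever merge hook Solr provides (a MergeStrategy is mentioned elsewhere in this thread).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the aggregation step; it does not use Solr's classes.
final class ShardStatsMerger {

    /**
     * Combines the custom entries from each shard response into one map:
     * the counts are summed, and the distinct "myvar" values are gathered
     * into a list.
     */
    static Map<String, Object> merge(List<Map<String, Object>> shardResponses) {
        long total = 0;
        List<String> vars = new ArrayList<>();
        for (Map<String, Object> shard : shardResponses) {
            Object count = shard.get("thecountis");
            if (count != null) {
                // The collector in the post adds the count as a String.
                total += Long.parseLong(count.toString());
            }
            Object myvar = shard.get("myvar");
            if (myvar != null && !vars.contains(myvar.toString())) {
                vars.add(myvar.toString());
            }
        }
        Map<String, Object> merged = new LinkedHashMap<>();
        merged.put("thecountis", String.valueOf(total));
        merged.put("myvar", vars);
        return merged;
    }
}
```

Summing suits a count; other stats would need their own rule (an average, for instance, cannot be rebuilt from per-shard averages alone).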

I think if you fire the query at one of your shards with &distrib=false
you'll see your additions, which would demonstrate that your
filter is being found. I assume your custom jar is on the shards
or you'd get an exception (assuming you've pushed your
solrconfig to ZK).
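
For the distrib=false check, a request URL can be assembled as in the sketch below. ShardQueryUrl is a hypothetical helper; the /ted handler comes from the config in the original post, while the core URL and the query string are whatever you pass in.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that only builds a URL string; it sends no request.
final class ShardQueryUrl {

    /** Builds a query URL for a single core with distributed search turned off. */
    static String nonDistributed(String coreUrl, String handler, String query) {
        try {
            return coreUrl + handler
                    + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&distrib=false";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling it with a replica's core URL, "/ted", and a match-all query gives a URL that can be pasted into a browser or curl to see that shard's raw response.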

Best,
Erick
