
Posted to solr-user@lucene.apache.org by tedsolr <ts...@sciquest.com> on 2016/07/27 21:36:14 UTC

AnalyticsQuery fails on a sharded collection

I'm looking to create a merge strategy for a custom QParserPlugin I have. The
plugin works fine on collections with one shard. I was very surprised to see
it throw an exception when I ran it against a sharded collection. So my
question is a bit of a shot in the dark. I'll first note that the
CollapsingQParserPlugin included with Solr works as expected on my test
collection with two shards.

The NPE occurs in my DelegatingCollector's finish() method as it's setting
the next doc base. It appears I have a null LeafReaderContext. Without
knowing anything about my code, what is it about multiple shards that might
throw off a collector like this?

thanks!
v5.2.1 



--
View this message in context: http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: AnalyticsQuery fails on a sharded collection

Posted by Joel Bernstein <jo...@gmail.com>.

Yes, the AnalyticsQuery is being called twice in the logs, which is not a
good thing. I believe this was not originally the case, but changes to the
QueryComponent in later releases have caused it to happen. The test cases
aren't broken by this, so it didn't get caught.

The actual merge of the results from the AnalyticsQuery, which is done in
the MergeStrategy, will only happen in the first stage. In the second stage
the results from the AnalyticsQuery should be ignored. As a workaround
for the double call to the AnalyticsQuery, you can look for the "ids" param
in your AnalyticsQuery and skip gathering the analytics if it's present.
The ids param is sent in the second phase of a distributed search.
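
A minimal sketch of that check follows. It is plain Java, not plugin code:
the Map stands in for the request parameters a real plugin would read from
the Solr request, and the class and method names (DistributedPhaseCheck,
shouldGatherAnalytics) are made up for illustration. The sample values are
taken from the shard request logs posted in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-in for the "ids" check. In a real plugin the parameters would come
 * from the Solr request; a Map keeps this sketch self-contained.
 */
class DistributedPhaseCheck {

    // Parameter carried by the second phase of a distributed search.
    static final String IDS_PARAM = "ids";

    /** True when the shard request already names the documents to fetch. */
    static boolean isSecondPhase(Map<String, String> params) {
        return params.get(IDS_PARAM) != null;
    }

    /** Analytics are gathered only when the "ids" param is absent. */
    static boolean shouldGatherAnalytics(Map<String, String> params) {
        return !isSecondPhase(params);
    }

    public static void main(String[] args) {
        Map<String, String> firstPhase = new HashMap<>();
        firstPhase.put("fl", "id");
        firstPhase.put("shards.purpose", "4");

        Map<String, String> secondPhase = new HashMap<>();
        secondPhase.put("fl", "VENDOR_NAME");
        secondPhase.put("shards.purpose", "64");
        secondPhase.put("ids", "100713,940122,44812,210965,584851");

        System.out.println("first phase gathers analytics:  " + shouldGatherAnalytics(firstPhase));
        System.out.println("second phase gathers analytics: " + shouldGatherAnalytics(secondPhase));
    }
}
```

In an actual QParserPlugin the same test would presumably be made once, when
the query is built, so that the second-phase request skips the collector
entirely.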

What you're running into here is that the MergeStrategy is not really in
use in combination with the AnalyticsQuery. There are users that use the
MergeStrategy to handle custom merging of documents to produce custom
rankings. But the AnalyticsQuery really hasn't been used much with the
MergeStrategy that I'm aware of. So this has not been reported before.

I have moved away from using the MergeStrategy for merging custom
analytics. I'll give you a little context for how this has evolved.

The MergeStrategy was originally introduced for an e-commerce customer that
wanted to produce custom rankings. As part of that work the AnalyticsQuery
was added to support custom analytics. And the MergeStrategy supported that
as well.

Later, Streaming Expressions were added, which take control of the merge in
a much more elegant way than the MergeStrategy. So now there are features
in Solr that combine an AnalyticsQuery with a merge performed by the
Streaming Expression framework. The FeatureSelectionStream and the
TextLogitStream use this approach. These two streams are in master and
branch_6x if you want to see how they operate.
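
Whichever framework performs it, the merge has to combine partial results
that each shard computed on its own. The sketch below shows only that idea
in plain Java. It is not the Streaming Expression API, and the names
(ShardAggregateMerge, Stats) and sample values are hypothetical; the count
and spend fields simply mirror the statistics discussed in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShardAggregateMerge {

    /** Partial aggregate for one group key, as one shard might report it. */
    static final class Stats {
        long count;
        double spend;

        Stats(long count, double spend) {
            this.count = count;
            this.spend = spend;
        }
    }

    /**
     * Combines per-shard results into one map keyed by group value. The same
     * key can appear on several shards, so counts and sums are added.
     */
    static Map<String, Stats> merge(List<Map<String, Stats>> shardResults) {
        Map<String, Stats> merged = new TreeMap<>(); // keys kept in ascending order
        for (Map<String, Stats> shard : shardResults) {
            for (Map.Entry<String, Stats> e : shard.entrySet()) {
                Stats in = e.getValue();
                Stats total = merged.get(e.getKey());
                if (total == null) {
                    // copy, so the shard's own object is never mutated
                    merged.put(e.getKey(), new Stats(in.count, in.spend));
                } else {
                    total.count += in.count;
                    total.spend += in.spend;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Stats> shard1 = new HashMap<>();
        shard1.put("ACME", new Stats(2, 10.0));
        Map<String, Stats> shard2 = new HashMap<>();
        shard2.put("ACME", new Stats(3, 5.5));
        shard2.put("ZED", new Stats(1, 1.0));

        List<Map<String, Stats>> shards = new ArrayList<>();
        shards.add(shard1);
        shards.add(shard2);

        for (Map.Entry<String, Stats> e : merge(shards).entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue().count
                    + " spend=" + e.getValue().spend);
        }
    }
}
```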

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Aug 11, 2016 at 10:29 AM, tedsolr <ts...@sciquest.com> wrote:

> OK, some more info ... it's not aggregating because the doc values it's
> using
> for grouping are the unique ID field's. There are some big differences in
> the whole flow between searches against a single shard collection, and
> searches against a multi-shard collection. In a single shard collection the
> AnalyticsQuery is called one time, and there's only one pass through the
> delegating collector. If someone could explain what's going on in a
> multi-sharded search that would help a lot I think. My test collection has
> two shards each one has a replica.
>
> For this search
> .../aggr?q=*:*&fl=VENDOR_NAME&sort=VENDOR_NAME+asc
> The user has selected just one field to view, so I make VENDOR_NAME the
> group by field.
>
> This is what I see while debugging:
> 1. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats]
> 2. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
> [AggregationStats]
> 3. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
> [AggregationStats]
> 4. getAnalyticsCollector() is called (fl is id + [AggregationStats])
> 5. getAnalyticsCollector() is called again (fl is id + [AggregationStats])
> 6. custom DelegatingCollector finish() is called
> 7. custom DelegatingCollector finish() is called
> 8. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats] + id +  [AggregationStats]
> 9. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats] + id +  [AggregationStats]
>
> And from the log:
>
> INFO  - 2016-08-11 09:19:47.245; [ShardTest1 shard1_1 core_node4
> ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_1_replica2/&rows=10&version=2&q=*:*&NOW=
> 1470925120206&isShard=true&wt=javabin&_=1470925120222}
> hits=12096 status=0 QTime=64734
>
> INFO  - 2016-08-11 09:19:48.876; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_0_replica2/&rows=10&version=2&q=*:*&NOW=
> 1470925120206&isShard=true&wt=javabin&_=1470925120222}
> hits=12062 status=0 QTime=66365
>
> INFO  - 2016-08-11 09:19:50.952; [ShardTest1 shard1_1 core_node4
> ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&
> shards.purpose=64&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_1_replica2/&version=2&q=*:*&NOW=
> 1470925120206&ids=100713,940122,44812,210965,584851&
> isShard=true&wt=javabin&_=1470925120222}
> status=0 QTime=2070
>
> INFO  - 2016-08-11 09:19:53.176; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&
> shards.purpose=64&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_0_replica2/&version=2&q=*:*&NOW=
> 1470925120206&ids=533737,44864,100672,940123,96752&
> isShard=true&wt=javabin&_=1470925120222}
> status=0 QTime=4293
>
> INFO  - 2016-08-11 09:19:53.178; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={q=*:*&indent=true&fl=VENDOR_NAME&sort=VENDOR_NAME+
> asc&wt=json&_=1470925120222}
> hits=24158 status=0 QTime=72972
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-
> tp4289274p4291301.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: AnalyticsQuery fails on a sharded collection

Posted by tedsolr <ts...@sciquest.com>.

OK, some more info. It's not aggregating because the doc values it's using
for grouping are those of the unique ID field. There are some big
differences in the whole flow between searches against a single-shard
collection and searches against a multi-shard collection. In a single-shard
collection the AnalyticsQuery is called one time, and there's only one pass
through the delegating collector. If someone could explain what's going on
in a multi-shard search, that would help a lot, I think. My test collection
has two shards, each with one replica.

For this search
.../aggr?q=*:*&fl=VENDOR_NAME&sort=VENDOR_NAME+asc 
The user has selected just one field to view, so I make VENDOR_NAME the
group by field.

This is what I see while debugging:
1. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME + [AggregationStats]
2. custom AnalyticsQuery is instantiated (again) and the "fl" param is id + [AggregationStats]
3. custom AnalyticsQuery is instantiated (again) and the "fl" param is id + [AggregationStats]
4. getAnalyticsCollector() is called (fl is id + [AggregationStats])
5. getAnalyticsCollector() is called again (fl is id + [AggregationStats])
6. custom DelegatingCollector finish() is called
7. custom DelegatingCollector finish() is called
8. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME + [AggregationStats] + id + [AggregationStats]
9. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME + [AggregationStats] + id + [AggregationStats]

And from the log:

INFO  - 2016-08-11 09:19:47.245; [ShardTest1 shard1_1 core_node4
ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/&rows=10&version=2&q=*:*&NOW=1470925120206&isShard=true&wt=javabin&_=1470925120222}
hits=12096 status=0 QTime=64734 

INFO  - 2016-08-11 09:19:48.876; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/&rows=10&version=2&q=*:*&NOW=1470925120206&isShard=true&wt=javabin&_=1470925120222}
hits=12062 status=0 QTime=66365 

INFO  - 2016-08-11 09:19:50.952; [ShardTest1 shard1_1 core_node4
ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&shards.purpose=64&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/&version=2&q=*:*&NOW=1470925120206&ids=100713,940122,44812,210965,584851&isShard=true&wt=javabin&_=1470925120222}
status=0 QTime=2070 

INFO  - 2016-08-11 09:19:53.176; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&shards.purpose=64&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/&version=2&q=*:*&NOW=1470925120206&ids=533737,44864,100672,940123,96752&isShard=true&wt=javabin&_=1470925120222}
status=0 QTime=4293 

INFO  - 2016-08-11 09:19:53.178; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={q=*:*&indent=true&fl=VENDOR_NAME&sort=VENDOR_NAME+asc&wt=json&_=1470925120222}
hits=24158 status=0 QTime=72972 
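
As a reading aid for the log above: the shards.purpose value is a bitmask
naming the phase of the distributed request. The sketch below pulls it out
of a params string and names the two bits seen here. The numeric values (4
for fetching the top ids, 64 for fetching fields) are believed to match the
constants in Solr's ShardRequest class, but treat them as an assumption and
check your version; the class ShardPurposeDecoder itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ShardPurposeDecoder {

    // Assumed to mirror ShardRequest.PURPOSE_GET_TOP_IDS and
    // ShardRequest.PURPOSE_GET_FIELDS; verify against your Solr version.
    static final int GET_TOP_IDS = 0x04;
    static final int GET_FIELDS = 0x40;

    /** Returns the shards.purpose value from a params string, or 0 if absent. */
    static int extractPurpose(String params) {
        String key = "shards.purpose=";
        int start = params.indexOf(key);
        if (start < 0) {
            return 0;
        }
        start += key.length();
        int end = start;
        while (end < params.length() && Character.isDigit(params.charAt(end))) {
            end++;
        }
        return end > start ? Integer.parseInt(params.substring(start, end)) : 0;
    }

    /** Names the bits this sketch knows about; other bits are ignored. */
    static List<String> describe(int purpose) {
        List<String> names = new ArrayList<>();
        if ((purpose & GET_TOP_IDS) != 0) {
            names.add("GET_TOP_IDS");
        }
        if ((purpose & GET_FIELDS) != 0) {
            names.add("GET_FIELDS");
        }
        return names;
    }

    public static void main(String[] args) {
        String first = "distrib=false&qt=/aggr&fl=id&shards.purpose=4&start=0&fsv=true";
        String second = "distrib=false&qt=/aggr&fl=VENDOR_NAME&shards.purpose=64&ids=100713,940122";
        System.out.println(describe(extractPurpose(first)));
        System.out.println(describe(extractPurpose(second)));
    }
}
```

Read this way, the first two log entries are the phase that asks each shard
only for ids and sort values (fl=id), and the next two are the phase that
fetches the stored fields for the ids chosen by the aggregator.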





Re: AnalyticsQuery fails on a sharded collection

Posted by tedsolr <ts...@sciquest.com>.

Quick update: the NPE was related to the way in which I passed params into
the Query via solrconfig.xml. It works fine on a single-shard collection,
but something about it was masking the unique ID field in a multi-shard
environment. Anyway, I was able to fix that by cleaning up the request
handler config:

<requestHandler name="/aggr" class="solr.SearchHandler">
	<lst name="appends">
		<str name="fq">{!AggregationPostFilter count=Count spend=INVOICE_AMOUNT}</str>
		<str name="fl">[AggregationStats]</str>
	</lst>
</requestHandler>

Now my post filter completes without errors (!) but it doesn't work - it
returns every single document specified by the query (q) param. It isn't
aggregating. (Broken record) It still works correctly on a single shard
collection. With this query, it should do exactly what the collapsing filter
does (and yes, that works perfectly):

.../aggr?q=*:*&fl=VENDOR_NAME&sort=VENDOR_NAME+asc




Re: AnalyticsQuery fails on a sharded collection

Posted by tedsolr <ts...@sciquest.com>.

I still haven't found the reason for the NPE in my post filter when it runs
against a sharded collection, so I'm posting my code in the hopes that a
seasoned Solr pro might notice something. I thought perhaps not treating the
doc values as multi doc values when indexes are segmented might have been
the issue. But I optimized my test collection to merge the segments and the
search fails in the same spot....

ERROR - 2016-08-10 09:03:20.249; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.common.SolrException;
null:java.lang.NullPointerException
	at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1305)
	at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)
	at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:729)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:388)


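import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.DelegatingCollector;

// AggregationStats, AggregationStatsArray, ImmutableSparseArray and
// SearchPreProcessor are my own classes and are not shown here.
public class DocumentCollapsingCollector extends DelegatingCollector {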
	static final String AGGR_STATS = "AggregationStats";
	static final String SORT_BY_SCORE = "SortByScore";
	private static final String TOTAL_DOCS_STAT = "totalDocCount";
	private final SolrQueryRequest req;
	private final ResponseBuilder rb;
	private final LeafReaderContext[] contexts;
	private final FixedBitSet collapsedSet;
	private final List<SortedDocValues> fieldValues;
	private final NumericDocValues spendValues;
	private final Map<FieldOrdinals, AggregationStats> aggregatedDocs;
	private int docBase;
	private final int maxDoc;
	private final int numberOfFields;
	private int totalDocs;
	private final SearchPreProcessor.SortBy sortBy;

	DocumentCollapsingCollector(int maxDoc, int segments, List<SortedDocValues> docValues,
			NumericDocValues spendValues, SolrQueryRequest req, ResponseBuilder rb) {

		aggregatedDocs = new HashMap<>();
		this.maxDoc = maxDoc;
		contexts = new LeafReaderContext[segments];
		collapsedSet = new FixedBitSet(maxDoc);
		fieldValues = docValues;
		numberOfFields = docValues.size();
		this.spendValues = spendValues;
		this.req = req;
		this.rb = rb;
		sortBy = (SearchPreProcessor.SortBy) req.getContext().get(SORT_BY_SCORE);
	}

	@Override
	public void collect(int doc) throws IOException {
		int globalDoc = doc + docBase;
		int[] ords = new int[numberOfFields];

		int i=0;
		for (SortedDocValues vals : fieldValues) {
			ords[i++] = vals.getOrd(globalDoc);
		}

		FieldOrdinals ordinals = new FieldOrdinals(ords);
		AggregationStats stats = aggregatedDocs.get(ordinals);
		if (stats != null) {
			stats.bumpCount();
			stats.addSpend(Double.longBitsToDouble(spendValues.get(globalDoc)));
		} else {
			aggregatedDocs.put(ordinals,
					new AggregationStats(globalDoc, Double.longBitsToDouble(spendValues.get(globalDoc))));
		}
		totalDocs++;
	}

	@Override
	public boolean needsScores() {
		return sortBy != null;
	}

	@Override
	protected void doSetNextReader(LeafReaderContext context) throws IOException {
		contexts[context.ord] = context;
		docBase = context.docBase;
	}

	@Override
	public void finish() throws IOException {
		if (contexts.length == 0) {
			return;
		}

		for (AggregationStats docStats : aggregatedDocs.values()) {
			collapsedSet.set(docStats.getDocId());
		}

		// saving the stats to the request context so that a doc transformer can pick them up
		AggregationStatsArray stats = new AggregationStatsArray(aggregatedDocs.values());
		ImmutableSparseArray<AggregationStats> statsArray = new ImmutableSparseArray<AggregationStats>(stats);
		req.getContext().put(AGGR_STATS, statsArray);

		int currentContext = 0;
		int currentDocBase = 0;
		int nextDocBase = currentContext+1 < contexts.length ? contexts[currentContext+1].docBase : maxDoc;

		super.leafDelegate = super.delegate.getLeafCollector(contexts[currentContext]);
		DummyScorer dummy = new DummyScorer();
		super.leafDelegate.setScorer(dummy);

		BitSetIterator it = new BitSetIterator(collapsedSet, 0L);
		int docId = -1;

		while ((docId = it.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
			if (SearchPreProcessor.SortBy.COUNT.equals(sortBy)) {
				dummy.score = statsArray.get(docId).getCount();
			} else if (SearchPreProcessor.SortBy.SPEND.equals(sortBy)) {
				dummy.score = (float) statsArray.get(docId).getSpend();
			}

			while (docId >= nextDocBase) {
				currentContext++;
				currentDocBase = contexts[currentContext].docBase;
				nextDocBase = currentContext+1 < contexts.length ? contexts[currentContext+1].docBase : maxDoc;

				super.leafDelegate = super.delegate.getLeafCollector(contexts[currentContext]);
				super.leafDelegate.setScorer(dummy);
			}

			int contextDoc = docId-currentDocBase;
			dummy.docId = contextDoc;
			super.leafDelegate.collect(contextDoc);
		}

		rb.rsp.add(TOTAL_DOCS_STAT, Integer.valueOf(totalDocs));

		if (super.delegate instanceof DelegatingCollector) {
			((DelegatingCollector) super.delegate).finish();
		}
	}

	private class FieldOrdinals {
		private final int[] ords;

		FieldOrdinals(int[] ords) {
			this.ords = ords;
		}

		int[] getOrds() {
			return ords;
		}

		@Override
		public int hashCode() {
			return Arrays.hashCode(ords);
		}

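		@Override
		public boolean equals(Object obj) {
			// guard against null and foreign types instead of casting blindly
			if (!(obj instanceof FieldOrdinals)) {
				return false;
			}
			return Arrays.equals(ords, ((FieldOrdinals) obj).getOrds());
		}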
	}

	private class DummyScorer extends Scorer {
		float score;
		int docId;

		DummyScorer() {
			super(null);
		}

		@Override
		public float score() throws IOException {
			return score;
		}

		@Override
		public int freq() throws IOException {
			return 0;
		}

		@Override
		public int advance(int i) throws IOException {
			return -1;
		}

		@Override
		public long cost() {
			return 0;
		}

		@Override
		public int docID() {
			return docId;
		}

		@Override
		public int nextDoc() throws IOException {
			return 0;
		}
	}
}
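
The doc-base walk in finish() can be looked at in isolation. The sketch
below maps a global doc id to its segment using only an array of doc bases,
so a context slot that was never filled cannot cause an NPE. It is a
plain-Java illustration under the assumption that the doc bases of all
segments are known up front (for example from the reader's list of leaves);
the class DocBaseLookup is hypothetical. Lucene's ReaderUtil.subIndex
appears to offer a comparable lookup.

```java
class DocBaseLookup {

    /**
     * Returns the index of the segment containing globalDoc, given the
     * ascending docBase of every segment. When empty segments share a
     * docBase, the last one wins, which is the segment that holds the doc.
     */
    static int segmentFor(int globalDoc, int[] docBases) {
        if (docBases.length == 0) {
            throw new IllegalArgumentException("at least one segment is required");
        }
        int lo = 0;
        int hi = docBases.length - 1;
        // binary search for the last docBase that is <= globalDoc
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Converts a global doc id into a segment-local doc id. */
    static int localDoc(int globalDoc, int[] docBases) {
        return globalDoc - docBases[segmentFor(globalDoc, docBases)];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250};
        int globalDoc = 300;
        System.out.println("segment=" + segmentFor(globalDoc, docBases)
                + " localDoc=" + localDoc(globalDoc, docBases));
    }
}
```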




Re: AnalyticsQuery fails on a sharded collection

Posted by tedsolr <ts...@sciquest.com>.

Thanks Joel! However, I've come to realize that upgrading to Solr 6 is not a
near-term reality due to the Java 8 requirement.

I don't want anyone to waste their time debugging my code. At least not
until I've made time to really work through it myself. I was just looking
for a pointer on generalities - if the collector works with a single shard
but not two, perhaps look at A and B.


Joel Bernstein wrote
> ...
> 
> As far using a MergeStrategy, I would suggest creating a streaming
> expression that handles the merge. This is a much cleaner approach. An
> example of how this works can be seen in this patch:
> 
> https://issues.apache.org/jira/secure/attachment/12820171/SOLR-9252.patch
> 
> The AnalyticsQuery in this case is:
> 
> TextLogisticRegressionQParserPlugin.java
> 
> The expression is:
> 
> TextLogitStream.java
> 
> The TextLogitStream has sample code for calling the shards and merging
> the results.
> 
> If you want to use this approach the following patch is needed so you
> can add your own streaming expression:
> 
> https://issues.apache.org/jira/browse/SOLR-9103
> 
> This will likely be in 6.2






Re: AnalyticsQuery fails on a sharded collection

Posted by Joel Bernstein <jo...@gmail.com>.

The finish() method operates on the search node, not the aggregator node.
So whether it's distributed shouldn't affect how it runs. If you can post
your code I might be able to see the issue.

As for using a MergeStrategy, I would suggest creating a streaming
expression that handles the merge. This is a much cleaner approach. An
example of how this works can be seen in this patch:

https://issues.apache.org/jira/secure/attachment/12820171/SOLR-9252.patch

The AnalyticsQuery in this case is:

TextLogisticRegressionQParserPlugin.java

The expression is:

TextLogitStream.java

The TextLogitStream has sample code for calling the shards and merging
the results.

If you want to use this approach the following patch is needed so you
can add your own streaming expression:

https://issues.apache.org/jira/browse/SOLR-9103

This will likely be in 6.2

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jul 27, 2016 at 5:36 PM, tedsolr <ts...@sciquest.com> wrote:

> I'm looking to create a merge strategy for a custom QParserPlugin I have.
> The
> plugin works fine on collections with one shard. I was very surprised to
> see
> it throw an exception when I ran it against a sharded collection. So my
> question is a bit of a shot in the dark. I'll first note that the
> CollapsingQParserPlugin included with Solr works as expected on my test
> collection with two shards.
>
> The NPE occurs in my DelegatingCollector's finish() method as it's setting
> the next doc base. It appears I have a null LeafReaderContext. Without
> knowing anything about my code, what is it about multiple shards that might
> throw off a collector like this?
>
> thanks!
> v5.2.1
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>