Posted to dev@lucene.apache.org by Per Steffensen <st...@designware.dk> on 2013/08/30 15:56:47 UTC
Complex group request
Hi
I want to do a fairly complex grouping request against Solr. Let's say
that all my documents have the fields "field1" and "timestamp".
In the request I want to provide a set of time-intervals, and for each
distinct value of "field1" I want a count of the time-intervals that
contain at least one document with that "field1" value. It smells like
grouping, but with more advanced counting.
Example
Documents in Solr
field1 | timestamp
a | 1
a | 2
b | 1
a | 3
c | 5
a | 10
b | 12
b | 11
a | 13
d | 14
Doing a query with the following time-intervals (both ends included)
time-interval#1: 1 to 2
time-interval#2: 3 to 5
time-interval#3: 6 to 12
I would like to get the following result
field1-value | count
a | 3
b | 2
c | 1
Reasons
* field1-value a: Count=3, because there is a document with field1=a and
a timestamp between 1 and 2 (actually there are 2 such documents, but we
only count how many time-intervals a is present in, not how many times
a is present in each interval), AND because there is a
document with field1=a and a timestamp between 3 and 5, AND because
there is a document with field1=a and a timestamp between 6 and 12
* field1-value b: Count=2, because there is at least one document with
field1=b in time-interval#1 AND time-interval#3 (there is no document
with field1=b in time-interval#2)
* field1-value c: Count=1, because there is at least one document with
field1=c in time-interval#2 (there is no document with field1=c in
either time-interval#1 or time-interval#3)
* No field1-value=d in the result-set, because d is not present in any
of the time-intervals.
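To make the counting rule concrete, here is a small Python sketch of the semantics I am after. It runs client-side over the example documents above and is only a reference for the expected result, not something Solr does today. The names DOCS, INTERVALS and interval_presence_counts are made up for illustration.

```python
from collections import defaultdict

# Example documents from above as (field1, timestamp) pairs.
DOCS = [("a", 1), ("a", 2), ("b", 1), ("a", 3), ("c", 5),
        ("a", 10), ("b", 12), ("b", 11), ("a", 13), ("d", 14)]

# Time-intervals from the example, both ends included.
INTERVALS = [(1, 2), (3, 5), (6, 12)]


def interval_presence_counts(docs, intervals):
    """For each field1 value, count the intervals that contain
    at least one document with that value."""
    seen = defaultdict(set)  # field1 value -> indexes of intervals it occurs in
    for value, ts in docs:
        for idx, (lo, hi) in enumerate(intervals):
            if lo <= ts <= hi:
                # A set ignores repeated hits in the same interval.
                seen[value].add(idx)
    return {value: len(idxs) for value, idxs in seen.items()}


if __name__ == "__main__":
    for value, count in sorted(interval_presence_counts(DOCS, INTERVALS).items()):
        print(f"{value} | {count}")
```

Values that fall in none of the intervals (d in the example) never get an entry, so they are left out of the result.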
The query part of the request probably needs to be
* q=timestamp:([1 TO 2]) OR timestamp:([3 TO 5]) OR timestamp:([6 TO 12])
but if I just add the following to the request
* group=true
* group.field=field1
* group.limit=1 (strange that you cannot set this to 0 BTW - I am not
interested in getting any of the documents back)
I will get the following result
field1/group-value | count
a | 4 (because there is a total of 4 documents with field1=a in those time-intervals)
b | 3
c | 1
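For comparison, the following Python sketch assembles the request parameters listed above and simulates, over the example documents, the per-group document count that plain grouping gives. The parameter names (q, group, group.field, group.limit) are the ones from this mail; the helper names and the client-side simulation are assumptions for illustration only, and nothing is sent to Solr.

```python
from collections import Counter
from urllib.parse import urlencode

# Example documents from above as (field1, timestamp) pairs.
DOCS = [("a", 1), ("a", 2), ("b", 1), ("a", 3), ("c", 5),
        ("a", 10), ("b", 12), ("b", 11), ("a", 13), ("d", 14)]

# Time-intervals from the example, both ends included.
INTERVALS = [(1, 2), (3, 5), (6, 12)]


def build_group_params(intervals):
    """Build the grouping request parameters described above."""
    q = " OR ".join(f"timestamp:([{lo} TO {hi}])" for lo, hi in intervals)
    return [("q", q), ("group", "true"),
            ("group.field", "field1"), ("group.limit", "1")]


def docs_per_group(docs, intervals):
    """Simulate the plain group count: every matching document counts,
    no matter which interval it falls in."""
    return dict(Counter(
        value for value, ts in docs
        if any(lo <= ts <= hi for lo, hi in intervals)
    ))


if __name__ == "__main__":
    # URL-encoded query string for the request.
    print(urlencode(build_group_params(INTERVALS)))
    for value, count in sorted(docs_per_group(DOCS, INTERVALS).items()):
        print(f"{value} | {count}")
```

The difference from what I want is that this counts matching documents per group, whereas I need the number of distinct time-intervals per group.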
1) Is it possible for me to create a request that will produce the
result I want?
2) If yes to 1), how? What will the request look like?
3) If yes to 1), will it work in a distributed SolrCloud setup?
4) If yes to 1), will it perform well?
5) If no to 1), is there a fairly simple Solr code change I can make in
order to make it possible? You do not have to hand me the solution, but
a few comments on how easy or hard it would be, and ideas on how to attack
the challenge, would be nice.
Thanks!
Regards, Per Steffensen