Posted to dev@lucene.apache.org by Per Steffensen <st...@designware.dk> on 2013/08/30 15:56:47 UTC

Complex group request

Hi

I want to do a fairly complex grouping request against Solr. Let's say
that all my documents have the fields "field1" and "timestamp".

In the request I want to provide a set of time-intervals, and for each
distinct value of "field1" I want a count of how many of those
time-intervals contain at least one document with that value of
"field1". Smells like grouping, but with more advanced counting.

Example
Documents in Solr
field1 | timestamp
a        | 1
a        | 2
b        | 1
a        | 3
c        | 5
a        | 10
b        | 12
b        | 11
a        | 13
d        | 14

Doing a query with the following time-intervals (both ends included)
time-interval#1: 1 to 2
time-interval#2: 3 to 5
time-interval#3: 6 to 12

I would like to get the following result
field1-value | count
a            | 3
b            | 2
c            | 1

Reasons
* field1-value a: Count=3, because there is a document with field1=a and
a timestamp between 1 and 2 (actually there are 2 such documents, but we
only count how many time-intervals a is present in, not how many times a
is present in each interval), AND because there is a document with
field1=a and a timestamp between 3 and 5, AND because there is a
document with field1=a and a timestamp between 6 and 12
* field1-value b: Count=2, because there is at least one document with 
field1=b in time-interval#1 AND time-interval#3 (there is no document 
with field1=b in time-interval#2)
* field1-value c: Count=1, because there is at least one document with
field1=c in time-interval#2 (there is no document with field1=c in
either time-interval#1 or time-interval#3)
* No field1-value=d in the result-set, because d does not occur in any
of the time-intervals.
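
To make the counting rule above unambiguous, here is a minimal sketch in plain Python. It is not Solr code and says nothing about how Solr would compute this; the names DOCS, INTERVALS and interval_counts are made up for illustration, and the data is the example from above.

```python
from collections import defaultdict

# Example documents from above as (field1, timestamp) pairs.
DOCS = [("a", 1), ("a", 2), ("b", 1), ("a", 3), ("c", 5),
        ("a", 10), ("b", 12), ("b", 11), ("a", 13), ("d", 14)]

# Time-intervals from above, both ends included.
INTERVALS = [(1, 2), (3, 5), (6, 12)]


def interval_counts(docs, intervals):
    """For each field1 value, count the intervals holding at least one matching document."""
    hits = defaultdict(set)  # field1 value -> indexes of the intervals it occurs in
    for value, ts in docs:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= ts <= hi:
                hits[value].add(i)  # a set, so repeated hits in one interval count once
    return {value: len(indexes) for value, indexes in sorted(hits.items())}


print(interval_counts(DOCS, INTERVALS))  # {'a': 3, 'b': 2, 'c': 1}
```

Values that never fall inside an interval (d here) never get an entry, which matches the result-set I am after.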

The query part of the request probably needs to be
* q=timestamp:([1 TO 2]) OR timestamp:([3 TO 5]) OR timestamp:([6 TO 12])
but if I just add the following to the request
* group=true
* group.field=field1
* group.limit=1 (strange that you cannot set this to 0 BTW - I am not
interested in any of the documents themselves)
I will get the following result
field1/group-value | count
a                  | 4
b                  | 3
c                  | 1
(a gets 4 because there is a total of 4 documents with field1=a in those
time-intervals)
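
For contrast, the per-group document count that plain grouping reports can be sketched the same way. Again this is plain Python over the example data, not Solr code, and the names DOCS, INTERVALS and group_doc_counts are made up for illustration.

```python
from collections import Counter

# Example documents from above as (field1, timestamp) pairs.
DOCS = [("a", 1), ("a", 2), ("b", 1), ("a", 3), ("c", 5),
        ("a", 10), ("b", 12), ("b", 11), ("a", 13), ("d", 14)]

# Time-intervals from above, both ends included.
INTERVALS = [(1, 2), (3, 5), (6, 12)]


def group_doc_counts(docs, intervals):
    """Count matching documents per field1 value, regardless of which interval matched."""
    return Counter(
        value
        for value, ts in docs
        if any(lo <= ts <= hi for lo, hi in intervals)
    )


print(dict(sorted(group_doc_counts(DOCS, INTERVALS).items())))  # {'a': 4, 'b': 3, 'c': 1}
```

The difference from what I want is that every matching document is counted, so two hits in the same time-interval count twice.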

1) Is it possible for me to create a request that will produce the 
result I want?
2) If yes to 1), how? What will the request look like?
3) If yes to 1), will it work in a distributed SolrCloud setup?
4) If yes to 1), will it perform?
5) If no to 1), is there a fairly simple change to the Solr code that I
could make in order to make it possible? You do not have to hand me the
solution, but a few comments on how easy/hard it would be, and ideas on
how to attack the challenge, would be nice.

Thanks!

Regards, Per Steffensen
