Posted to solr-user@lucene.apache.org by Peter 4U <pe...@hotmail.com> on 2009/12/10 13:30:18 UTC

Reverse sort facet query

Hello Forum,

 

I've searched the mail archives and the 'net without finding an answer, but I'm sure I'm not the first to have a requirement for this:

 

Does anyone know of a good way to perform a reverse-sorted facet query (i.e. rarest first)?

 

As you know facet.sort toggles between sorting on count or field name, but there's no built-in method for reverse count.

 

One way I've found to do this is to set facet.limit=-1 (and facet.mincount) to get the entire list, then take 'bottom-5' to get a 'rare' list.

This works, but it's not great for very large lists.
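
A minimal sketch of the workaround described above, assuming the full facet list (fetched with facet.limit=-1) has already been read into a map of term to count. The class and method names are hypothetical and not part of Solr or SolrJ; the minCount argument only mirrors on the client what facet.mincount does on the server.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; not part of Solr or SolrJ.
final class RareFacets {

    /**
     * Returns up to n entries with the lowest counts, rarest first.
     * Entries whose count is below minCount are dropped.
     * Ties on count are broken by term so the order is deterministic.
     */
    static List<Map.Entry<String, Long>> rarest(Map<String, Long> counts, int n, long minCount) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                entries.add(e);
            }
        }
        Comparator<Map.Entry<String, Long>> byCountThenTerm =
                Comparator.comparing((Map.Entry<String, Long> e) -> e.getValue())
                        .thenComparing(e -> e.getKey());
        // The whole list is sorted just to keep a few entries.
        entries.sort(byCountThenTerm);
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up counts standing in for a parsed facet response.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("alice", 120L);
        counts.put("bob", 3L);
        counts.put("carol", 0L);
        counts.put("mallory", 1L);
        System.out.println(rarest(counts, 2, 1));
    }
}
```

Because every distinct term has to be transferred and sorted, the cost grows with the size of the field's vocabulary rather than with the handful of rare terms actually wanted.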

 

Does anyone know of a better way?

 

Many thanks,

Peter

 
 		 	   		  

RE: Reverse sort facet query [SOLR-1672]

Posted by Chris Hostetter <ho...@fucit.org>.
: i.e. just extend facet.sort to allow a 'count desc'. By convention, ok 
: to use a space in the name? - or would count.desc (and count.asc as 
: alias for count) be more compliant?

i would use space to remain consistent with the existing "sort" 
param. 

it might even make sense to refactor and (re/ab)use the existing "sort" 
parsing code in QueryParsing.parseSort ... but now that that also knows 
about parsing functions it's a bit hairy ... so that does seem a little 
crazy.







-Hoss


RE: Reverse sort facet query [SOLR-1672]

Posted by Peter S <pe...@hotmail.com>.
> 
> now i'm totally confused: what are you suggesting this new param would 
> do, what does the name mean?
> 
Sorry, I wasn't clear - there isn't a new parameter, except the one added in the patch. What I was suggesting here is to do the work
to remove the new parameter I just put in (facet.sortorder), and do it in exactly the way you mentioned - 
i.e. just extend facet.sort to allow a 'count desc'. By convention, is it ok to use a space in the name? - or would count.desc (and count.asc as alias for count) be more compliant?

 

Peter
 

 		 	   		  

RE: Reverse sort facet query [SOLR-1672]

Posted by Chris Hostetter <ho...@fucit.org>.
: working well. The only caveat to this is that the reverse sort results
: don't include 0-count facets (see notes in SOLR-1672), so reverse sort 
	...
: believe patching to include 0 counts could open a can of worms in terms 
: of b/w compat and performance, as 0 counts look to be skipped (by 
: default). I could be wrong, and you may know better how changes to 

Hmmm... that behavior should all be driven by facet.mincount.  i haven't 
looked at that code in a long time, so an optimization may have been added 
to not bother trying to "sort" all of the 0s ... but the default for 
facet.mincount is 0 (ie: show everything)

: Would you like me to go ahead and amend the patch (w/o 0-counts) to define a new 'sort' parameter? 
: 
: For naming, I would propose an extension of FacetParams.FACET_SORT_COUNT ala:
: 
: public static final String FACET_SORT_COUNT_REVERSE = "count.reverse";

now i'm totally confused: what are you suggesting this new param would 
do, what does the name mean?

my original point was that we probably don't need any new params at all: 
just change facet.sort to accept "count desc" and "count asc" in addition 
to "count" (which would become an alias for "count desc").


-Hoss


RE: Reverse sort facet query [SOLR-1672]

Posted by Peter 4U <pe...@hotmail.com>.

 

> Date: Sun, 3 Jan 2010 22:18:33 -0800
> From: hossman_lucene@fucit.org
> To: solr-user@lucene.apache.org
> Subject: RE: Reverse sort facet query [SOLR-1672]
> 
> 
> : Yes, I thought about adding some 'new syntax', but I opted for a separate 'facet.sortorder' parameter,
> : 
> : mainly because I'm not familiar enough with the codebase to know what effect this might have on
> : 
> : backward compatibility. It would be easy enough to modify the patch I created to do it this way.
> 
> it shouldn't really affect anything -- it wouldn't really be new syntax, 
> just extending the existing "sort" param syntax to apply to the 
> "facet.sort" param. The only back compat concern is making sure we 
> continue to support true/false as aliases, and having the default order 
> match the current behavior if asc/desc aren't specified.
> 
> 
> -Hoss
> 


Yes, agreed. The current patch doesn't touch the b/w true/false aliasing, and any move to adding a new attr can keep all that intact.

I've been using the current patch extensively in our testing, and that's working well. The only caveat is that the reverse sort results don't include 0-count facets (see notes in SOLR-1672), so reverse sort results start with the first count=1. This could be confusing, as there could well be many facets whose count is 0, and it might be expected that these be returned first.

From my admittedly cursory look into the codebase regarding this, I believe patching to include 0 counts could open a can of worms in terms of b/w compat and performance, as 0 counts look to be skipped (by default). I could be wrong, and you may know better how changes to SimpleFacets/UnInvertedField would affect performance and compatibility.

If there is indeed a performance optimization in facet counting iteration, it would, imo, be preferable to have the optimization, rather than the 0-counts.

 

Would you like me to go ahead and amend the patch (w/o 0-counts) to define a new 'sort' parameter? 

For naming, I would propose an extension of FacetParams.FACET_SORT_COUNT ala:

 

public static final String FACET_SORT_COUNT_REVERSE = "count.reverse";

 

I can then easily modify the patch to detect/use this value to invoke the new behaviour.

Comments? Suggestions?

 

Thanks,

Peter

 

 

 

 
 		 	   		  

RE: Reverse sort facet query [SOLR-1672]

Posted by Chris Hostetter <ho...@fucit.org>.
: Yes, I thought about adding some 'new syntax', but I opted for a separate 'facet.sortorder' parameter,
: 
: mainly because I'm not familiar enough with the codebase to know what effect this might have on
: 
: backward compatibility. It would be easy enough to modify the patch I created to do it this way.

it shouldn't really affect anything -- it wouldn't really be new syntax, 
just extending the existing "sort" param syntax to apply to the 
"facet.sort" param.  The only back compat concern is making sure we 
continue to support true/false as aliases, and having the default order 
match the current behavior if asc/desc aren't specified.
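
The back-compat rules above could be sketched roughly as follows. This is an illustration only, not the SOLR-1672 patch and not QueryParsing.parseSort: FacetSortSpec and its members are hypothetical names, and accepting a direction after "index" is an assumption that goes beyond what was proposed in this thread.

```java
import java.util.Locale;

// Hypothetical sketch of parsing a facet.sort value; not Solr's actual code.
final class FacetSortSpec {

    enum Key { COUNT, INDEX }

    final Key key;
    final boolean descending;

    private FacetSortSpec(Key key, boolean descending) {
        this.key = key;
        this.descending = descending;
    }

    static FacetSortSpec parse(String raw) {
        String[] parts = raw.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (parts.length > 2) {
            throw new IllegalArgumentException("Unrecognised facet.sort value: " + raw);
        }

        Key key;
        switch (parts[0]) {
            case "count":
            case "true":   // legacy boolean alias for "count"
                key = Key.COUNT;
                break;
            case "index":
            case "false":  // legacy boolean alias for "index"
                key = Key.INDEX;
                break;
            default:
                throw new IllegalArgumentException("Unrecognised facet.sort value: " + raw);
        }

        // With no direction given, keep today's behaviour:
        // counts highest first, index order ascending.
        boolean descending = (key == Key.COUNT);
        if (parts.length == 2) {
            switch (parts[1]) {
                case "asc":
                    descending = false;
                    break;
                case "desc":
                    descending = true;
                    break;
                default:
                    throw new IllegalArgumentException("Unrecognised sort direction: " + raw);
            }
        }
        return new FacetSortSpec(key, descending);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"count", "count asc", "true", "index"}) {
            FacetSortSpec spec = parse(value);
            System.out.println(value + " -> " + spec.key + ", descending=" + spec.descending);
        }
    }
}
```

Existing requests that send only "count", "index", "true" or "false" resolve to the same key and direction as before, so only requests that add "asc" or "desc" would see new behaviour.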


-Hoss


RE: Reverse sort facet query [SOLR-1672]

Posted by Peter 4U <pe...@hotmail.com>.
> in Solr 1.4 the boolean syntax was deprecated in favor of keywords that 
> are more meaningful...
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
> 
> ... "count" and "index" replaced "true" and "false"


Yes, I thought about adding some 'new syntax', but I opted for a separate 'facet.sortorder' parameter,

mainly because I'm not familiar enough with the codebase to know what effect this might have on

backward compatibility. It would be easy enough to modify the patch I created to do it this way.

[see SOLR-1672]

 

Thanks,

Peter

 


 
> Date: Thu, 24 Dec 2009 22:24:25 -0800
> From: hossman_lucene@fucit.org
> To: solr-user@lucene.apache.org
> Subject: RE: Reverse sort facet query
> 
> 
> : I'll have a look at SimpleFacets.java to look at patching it. I should 
> : think the sorting bit will be relatively straightforward. The tricky bit 
> : is how to submit the request via the query interface - there's only a 
> : boolean
> 
> in Solr 1.4 the boolean syntax was deprecated in favor of keywords that 
> are more meaningful...
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
> 
> ... "count" and "index" replaced "true" and "false"
> 
> we could always start supporting "count desc" and "count asc" (with 
> "count" as an alias for "count desc")
> 
> : The reverse facet query is for when you want to know which event (or 
> : group of event types) has happened the least
> 
> got it, thanks.
> 
> 
> 
> -Hoss
> 
 		 	   		  

RE: Reverse sort facet query

Posted by Chris Hostetter <ho...@fucit.org>.
: I'll have a look at SimpleFacets.java to look at patching it. I should 
: think the sorting bit will be relatively straightforward. The tricky bit 
: is how to submit the request via the query interface - there's only a 
: boolean

in Solr 1.4 the boolean syntax was deprecated in favor of keywords that 
are more meaningful...
  http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort

 ... "count" and "index" replaced "true" and "false"

we could always start supporting "count desc" and "count asc" (with 
"count" as an alias for "count desc")

: The reverse facet query is for when you want to know which event (or 
: group of event types) has happened the least

got it, thanks.



-Hoss


RE: Reverse sort facet query

Posted by Peter 4U <pe...@hotmail.com>.
Hello,

 

Thanks very much for your answer.

 

I'll have a look at SimpleFacets.java to look at patching it. I should think the sorting bit will be relatively straightforward. The tricky bit is how to submit the request via the query interface - there's only a boolean for facet sorting, so it would probably require a new parameter to maintain backward compatibility [e.g. &facet.reversesort=true] (if you have any thoughts on how you would like to see such functionality integrated into a query, let me know). When I have something working, I'll probably have to ask you the best way to submit a patch for this.

 

The use case is pretty straightforward, really:

 

In my case, the index is collecting/storing network events (logs, firewall events, Win event logs etc.).

 

The reverse facet query is for when you want to know which event (or group of event types) has happened the least

over a given period of time.

 

As a simple example:

Let's say you want to look at who has been logging in to a secure server over the past week, and this server is normally

accessed by only a handful of users.

But you don't want to know the 'typical' users that have logged in, you want to know who's only logged-in once, at say

3 o'clock in the morning on Wednesday. Hmmm, why's he/she doing that?

 

Here, a 'rare' query will show you the atypical behaviour.

 

Capacity Planning and Performance Monitoring is another example - where you might want to know which machines have 

produced the least number of errors or the least amount of traffic.

 

Outside of networking, stock control would be another example - 'what items are we about to run out of?'

 

Thanks,

Peter

 

 

 
> Date: Tue, 15 Dec 2009 13:12:44 -0800
> From: hossman_lucene@fucit.org
> To: solr-user@lucene.apache.org
> Subject: Re: Reverse sort facet query
> 
> 
> : Does anyone know of a good way to perform a reverse-sorted facet query (i.e. rarest first)?
> 
> I'm fairly confident that code doesn't exist at the moment. 
> 
> If i remember correctly, it would be fairly simple to implement if you'd 
> like to submit a patch: when sorting by count a simple bounded priority 
> queue is used, so we'd just have to change the comparator. If you're 
> interested in working on a patch it should be in SimpleFacets.java. I 
> think the queue is called "BoundedTreeSet"
> 
> 
> (that's a pretty novel request actually ... i don't remember anyone else 
> ever asking for anything like this before .. can you describe your use 
> case a bit -- i'm curious as to how/when you would use this data)
> 
> 
> 
> -Hoss
> 
 		 	   		  

Re: Reverse sort facet query

Posted by Chris Hostetter <ho...@fucit.org>.
: Does anyone know of a good way to perform a reverse-sorted facet query (i.e. rarest first)?

I'm fairly confident that code doesn't exist at the moment.  

If i remember correctly, it would be fairly simple to implement if you'd 
like to submit a patch:  when sorting by count a simple bounded priority 
queue is used, so we'd just have to change the comparator.  If you're 
interested in working on a patch it should be in SimpleFacets.java.  I 
think the queue is called "BoundedTreeSet"
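
To illustrate the comparator idea, here is a minimal sketch built on java.util.TreeSet. It is a stand-in written for this example, not Solr's BoundedTreeSet or the code in SimpleFacets.java; all class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from Solr.
final class ReverseFacetSortDemo {

    /** A facet term paired with its count. */
    static final class TermCount {
        final String term;
        final long count;

        TermCount(String term, long count) {
            this.term = term;
            this.count = count;
        }
    }

    /** Keeps only the first maxSize elements according to the comparator. */
    static final class BoundedTopN<E> {
        private final int maxSize;
        private final TreeSet<E> set;

        BoundedTopN(int maxSize, Comparator<? super E> order) {
            if (maxSize < 1) {
                throw new IllegalArgumentException("maxSize must be >= 1");
            }
            this.maxSize = maxSize;
            this.set = new TreeSet<>(order);
        }

        void add(E item) {
            set.add(item);
            if (set.size() > maxSize) {
                set.pollLast(); // evict whatever currently sorts last
            }
        }

        List<E> toList() {
            return new ArrayList<>(set);
        }
    }

    // Ties are broken by term; a TreeSet would otherwise treat equal counts as duplicates.
    static final Comparator<TermCount> COMMONEST_FIRST =
            Comparator.comparingLong((TermCount t) -> t.count).reversed()
                    .thenComparing(t -> t.term);

    // The only change needed for "rarest first" is the comparator.
    static final Comparator<TermCount> RAREST_FIRST =
            Comparator.comparingLong((TermCount t) -> t.count)
                    .thenComparing(t -> t.term);

    static List<String> topTerms(Map<String, Long> counts, int n, Comparator<TermCount> order) {
        BoundedTopN<TermCount> queue = new BoundedTopN<>(n, order);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            queue.add(new TermCount(e.getKey(), e.getValue()));
        }
        List<String> terms = new ArrayList<>();
        for (TermCount tc : queue.toList()) {
            terms.add(tc.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up counts for demonstration.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("login", 500L);
        counts.put("logout", 480L);
        counts.put("sudo", 2L);
        counts.put("ssh-fail", 7L);
        System.out.println(topTerms(counts, 2, COMMONEST_FIRST));
        System.out.println(topTerms(counts, 2, RAREST_FIRST));
    }
}
```

The bounded set never holds more than n + 1 elements at a time, so unlike sorting the full list on the client, memory stays proportional to the number of facet values requested.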


(that's a pretty novel request actually ... i don't remember anyone else 
ever asking for anything like this before .. can you describe your use 
case a bit  -- i'm curious as to how/when you would use this data)



-Hoss