Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2011/04/03 01:45:05 UTC

[jira] [Commented] (SOLR-2366) Facet Range Gaps

    [ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015088#comment-13015088 ] 

Hoss Man commented on SOLR-2366:
--------------------------------

In no particular order...

* I like Jan's {{facet.range.spec}} naming suggestion better than my {{facet.range.buckets}} suggestion ... but I think {{facet.range.series}}, {{facet.range.seq}}, or {{facet.range.sequence}} might be better still.

* I think Jan's point about {{N}} vs {{+N}} in the sequence list as a way to mix absolute values vs increments definitely makes sense, and would be consistent with the existing date math expressions.

* The complexity with supporting *both* absolute values and increments would be the question of what Solr should do with input like {{facet.range.seq=10,20,+50,+100,120,150}}. What ranges would we return? (10-20, 20-70, 70-???....) Would it be an error? Would we give back ranges that overlapped? What about {{facet.range.seq=10,50,+50,100,150&facet.range.include=all}} ... would that result in one of the ranges being [100 TO 100], or would we throw that one out? (I think it would be wise to start out only implementing the absolute value approach, since that seems (to me) the more useful option of the two, and then consider adding the incremental values as a separate issue later, after hashing out the semantics of these types of situations.)
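
To make the ambiguity concrete, here is a minimal Python sketch of one *possible* reading of the mixed syntax, in which {{+N}} means "previous boundary plus N". That reading, and the helper names {{resolve_boundaries}}, {{to_ranges}} and {{problems}}, are assumptions for illustration only; nothing like this exists in Solr or in the attached patches.

```python
from typing import List, Tuple


def resolve_boundaries(spec: str) -> List[float]:
    """Resolve 'N' as an absolute boundary and '+N' as previous boundary + N.

    This is only one hypothetical interpretation of the mixed syntax.
    """
    boundaries: List[float] = []
    for token in spec.split(","):
        token = token.strip()
        if token.startswith("+"):
            if not boundaries:
                raise ValueError("an increment needs a preceding boundary")
            boundaries.append(boundaries[-1] + float(token[1:]))
        else:
            boundaries.append(float(token))
    return boundaries


def to_ranges(boundaries: List[float]) -> List[Tuple[float, float]]:
    """Pair consecutive boundaries into (lower, upper) ranges."""
    return list(zip(boundaries, boundaries[1:]))


def problems(ranges: List[Tuple[float, float]]) -> List[Tuple[float, float, str]]:
    """Flag ranges that are empty or that run backwards."""
    issues = []
    for lower, upper in ranges:
        if upper < lower:
            issues.append((lower, upper, "backwards"))
        elif upper == lower:
            issues.append((lower, upper, "zero-width"))
    return issues


# First input from the comment: the boundaries stop being monotonic.
print(problems(to_ranges(resolve_boundaries("10,20,+50,+100,120,150"))))
# Second input from the comment: one range collapses to [100 TO 100].
print(problems(to_ranges(resolve_boundaries("10,50,+50,100,150"))))
```

Under this reading the first input resolves to the boundaries 10, 20, 70, 170, 120, 150, so one range runs backwards (170 to 120), and the second input produces the zero-width range 100 to 100. Whether to reject, drop, or return such ranges is exactly the open semantic question.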

* A few of Jan's sample input suggestions used {{*}} at either the start or end of the sequence to denote "everything before" the second value or "everything after" the second to last value. I don't think we need to support this syntax; I think the existing {{facet.range.other}} would still be the right way to support this with {{facet.range.sequence}}. If you want "everything before" and/or "everything after", use {{facet.range.other=before}} and/or {{facet.range.other=after}} ... otherwise it would be confusing to decide what things like {{facet.range.other=before&facet.range.seq=*,10,20}} and {{facet.range.other=none&facet.range.seq=*,10,20}} mean.

* I *REALLY* don't think we should try to implement something like Jan's {{facet.range.labels}} suggestion. I can't imagine any way of supporting it that wouldn't prevent or radically complicate the "..." type continuation of series I suggested before, and that seems like a much more powerful feature than labels. If a user is going to provide a label for every range, then they must enumerate every range, and they might as well enumerate them (and label them) with {{facet.query}}, where the label and the query can be side by side.

This...

{code}
facet.query={!label="One or more"}bedrooms:[1 TO *]
facet.query={!label="Two or more"}bedrooms:[2 TO *]
facet.query={!label="Three or more"}bedrooms:[3 TO *]
facet.query={!label="Four or more"}bedrooms:[4 TO *]
{code}

...seems way more readable, and less prone to user error in tweaking, than this...

{code}
f.bedrooms.facet.range.spec=1..*,2..*,3..*,4..*
f.bedrooms.facet.range.labels="One or more","Two or more","Three or more","Four or more"
{code}
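
Keeping each label next to its query also means a client can generate both from a single list, so they cannot drift out of step. A minimal Python sketch, assuming the client builds its own request parameters; the helper names {{labeled_facet_queries}} and {{to_query_string}} are hypothetical:

```python
from typing import List, Tuple
from urllib.parse import urlencode


def labeled_facet_queries(field: str,
                          thresholds: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Build one facet.query parameter per (label, lower bound) pair.

    The label travels with its query as a localparam, as in the
    facet.query examples above.
    """
    params = []
    for label, lower in thresholds:
        value = f'{{!label="{label}"}}{field}:[{lower} TO *]'
        params.append(("facet.query", value))
    return params


def to_query_string(params: List[Tuple[str, str]]) -> str:
    """URL-encode the repeated facet.query parameters."""
    return urlencode(params)


bedroom_params = labeled_facet_queries("bedrooms", [
    ("One or more", 1),
    ("Two or more", 2),
    ("Three or more", 3),
    ("Four or more", 4),
])
for name, value in bedroom_params:
    print(f"{name}={value}")
```

Adding or reordering a range is then a one-line change to the list, with no second comma-separated parameter to keep aligned by position.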

* Herman commented...

bq. While using facet.query allows us to construct arbitrary ranges, we must then pick them out of the results separately. This becomes more difficult if we arbitrarily facet on two or more fields/expressions.

I don't see that as being a particularly hard problem that we need to worry about helping users avoid, especially since users can annotate those queries using localparams and set any arbitrary key=val pairs on them that they want, to help organize them and identify them later when parsing the response...

{code}
facet.query={!group=bed label="One or more"}bedrooms:[1 TO *]
facet.query={!group=bed label="Two or more"}bedrooms:[2 TO *]
facet.query={!group=bed label="Three or more"}bedrooms:[3 TO *]
facet.query={!group=bed label="Four or more"}bedrooms:[4 TO *]
facet.query={!group=size label="Small"}sqft:[* TO 1000]
facet.query={!group=size label="Medium"}sqft:[1000 TO 2500]
facet.query={!group=size label="Large"}sqft:[2500 TO *]
{code}
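
On the client side, picking these apart can be fairly mechanical. The Python sketch below assumes the {{facet_queries}} section of the response is keyed by the original parameter string, localparams included, and that it has already been decoded into a dict. The helpers {{parse_local_params}} and {{group_counts}} are hypothetical, and the parsing is a deliberate simplification rather than Solr's own localparams parser (for example, it does not handle a {{}}} inside a quoted value).

```python
import shlex
from collections import defaultdict
from typing import Dict, Tuple


def parse_local_params(key: str) -> Tuple[Dict[str, str], str]:
    """Split '{!k=v k2="v 2"}query' into ({k: v, ...}, 'query').

    Simplified: assumes the first '}' closes the localparams prefix.
    """
    if not key.startswith("{!"):
        return {}, key
    end = key.index("}")
    params = {}
    # shlex keeps a quoted value such as label="One or more" as one token.
    for token in shlex.split(key[2:end]):
        name, _, value = token.partition("=")
        params[name] = value
    return params, key[end + 1:]


def group_counts(facet_queries: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Organize facet.query counts as {group: {label: count}}."""
    grouped = defaultdict(dict)
    for key, count in facet_queries.items():
        params, query = parse_local_params(key)
        group = params.get("group", "")
        label = params.get("label", query)  # fall back to the raw query
        grouped[group][label] = count
    return dict(grouped)
```

Calling {{group_counts}} on the decoded {{facet_queries}} map would then return one dict per {{group}} value ({{bed}}, {{size}}), each mapping a label to its count.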




> Facet Range Gaps
> ----------------
>
>                 Key: SOLR-2366
>                 URL: https://issues.apache.org/jira/browse/SOLR-2366
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance.  We should be able to quantize the results into arbitrarily sized buckets.  I'd propose the syntax to be a comma separated list of sizes for each bucket.  If only one value is specified, then it behaves as it currently does.  Otherwise, it creates the different size buckets.  If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this)
> For instance,
> facet.range.start=0
> facet.range.end=400
> facet.range.gap=5,25,50,100
> would yield buckets of:
> 0-5,5-30,30-80,80-180,180-280,280-380,380-400
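
For reference, the bucket arithmetic in the issue description above can be sketched in a few lines of Python. This follows the description's tentative rule that the last gap is reused to fill the remaining space and that the final bucket is clipped at the end value; the function name {{variable_gap_buckets}} is hypothetical and is not taken from the attached patches.

```python
from typing import List, Tuple


def variable_gap_buckets(start: int, end: int,
                         gaps: List[int]) -> List[Tuple[int, int]]:
    """Walk from start to end, using each gap once and then repeating the last."""
    if not gaps or any(gap <= 0 for gap in gaps):
        raise ValueError("gaps must be a non-empty list of positive sizes")
    buckets = []
    lower = start
    index = 0
    while lower < end:
        gap = gaps[min(index, len(gaps) - 1)]  # reuse the last gap when exhausted
        upper = min(lower + gap, end)          # clip the final bucket at end
        buckets.append((lower, upper))
        lower = upper
        index += 1
    return buckets


print(variable_gap_buckets(0, 400, [5, 25, 50, 100]))
```

With start 0, end 400 and gaps 5,25,50,100 this reproduces the buckets listed in the description: 0-5, 5-30, 30-80, 80-180, 180-280, 280-380, 380-400.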
