Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2018/07/23 22:52:00 UTC

[jira] [Commented] (SOLR-12582) Consider api/documentation synergies/overlap between JSON Faceting relatedness() function and significantTerms streaming expression

    [ https://issues.apache.org/jira/browse/SOLR-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553508#comment-16553508 ] 

Hoss Man commented on SOLR-12582:
---------------------------------

I'm not an expert on streaming expressions, and i have no first-hand familiarity with the significantTerms streaming source, but from what i can tell they are only tangentially related.

the significantTerms streaming source is somewhat comparable to a field facet sorted by the relatedness() function, but it seems to have the normal constraints of any streaming expression in terms of the source data fields, and in how data for the entire collection is stream processed on a single node. the relatedness() aggregation function isn't quite as limited, but it's also probably not as powerful when that "stream the entire collection" use case is what you want.
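
to make that comparison concrete, here is a rough sketch (in Python, purely for building the request payloads) of the two approaches side by side. the collection name {{collection1}}, the fields {{body}} and {{author}}, and the helper function names are all made up for illustration; only the significantTerms expression syntax and the {{relatedness($fore,$back)}} facet syntax are meant to mirror the ref guide, and nothing here talks to a real Solr node.

```python
import json


def significant_terms_expr(collection, query, field, limit=20):
    """Build a significantTerms streaming expression as a plain string."""
    return (
        f'significantTerms({collection}, q="{query}", '
        f'field="{field}", limit="{limit}")'
    )


def relatedness_terms_facet(field, limit=20):
    """Build a JSON terms facet whose buckets are sorted by relatedness()."""
    return {
        "significant": {
            "type": "terms",
            "field": field,
            "limit": limit,
            "sort": {"r": "desc"},
            "facet": {"r": "relatedness($fore,$back)"},
        }
    }


# Streaming expression: the foreground is the q param of the expression.
expr = significant_terms_expr("collection1", "body:Solr", "author")

# JSON facet request: foreground/background are ordinary request params
# that the relatedness() function dereferences via $fore and $back.
params = {
    "q": "*:*",
    "rows": 0,
    "fore": "body:Solr",
    "back": "*:*",
    "json.facet": json.dumps(relatedness_terms_facet("author")),
}
```

the first string would be sent to a streaming handler and the second set of params to a normal query handler, which is a big part of why the two features have such different constraints.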

as noted, relatedness() supports configuring arbitrary foreground/background queries, but it can also be used on any facet type, not just "term" faceting, so you can use it to score the buckets from any arbitrary facet (including range facets or facet queries), and in particular it can deal with sub facets.
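
as a hedged illustration of that flexibility, the sketch below builds a single JSON facet request that scores range buckets, a facet query, and a nested terms sub facet with the same relatedness() call. the field names ({{age_i}}, {{hobby_s}}, {{state_s}}), the facet labels, and the {{build_request}} helper are assumptions invented for the example, not anything from the issue.

```python
import json

RELATEDNESS = "relatedness($fore,$back)"


def build_request(fore, back="*:*"):
    """Return query params whose json.facet scores several facet types."""
    facet = {
        # Range facet: each age bucket gets its own relatedness score.
        "by_age": {
            "type": "range",
            "field": "age_i",
            "start": 0,
            "end": 100,
            "gap": 20,
            "facet": {
                "r1": RELATEDNESS,
                # Sub facet: terms inside each range bucket, sorted by score.
                "hobbies": {
                    "type": "terms",
                    "field": "hobby_s",
                    "limit": 5,
                    "sort": {"r2": "desc"},
                    "facet": {"r2": RELATEDNESS},
                },
            },
        },
        # Facet query: a single arbitrary bucket, also scored.
        "in_ny": {
            "type": "query",
            "q": "state_s:NY",
            "facet": {"r3": RELATEDNESS},
        },
    }
    return {
        "q": "*:*",
        "rows": 0,
        "fore": fore,
        "back": back,
        "json.facet": json.dumps(facet),
    }


request_params = build_request("age_i:[35 TO *]")
```

none of the buckets here are tied to indexed terms except the innermost sub facet, which is the sort of request a terms-oriented streaming source has no direct equivalent for.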

what does that all mean in terms of what we should say about one vs the other in documentation? ...  i dunno.  I agree there should probably be some cross linking between the two sets of documentation, to raise awareness among folks who find one when the other might be more appropriate, but i'm not sure what form that should take (hence filing this issue rather than just making the change myself).

as far as trying to maintain the same option names: i don't know that that is feasible or really makes sense, at least in so much as adding new options to relatedness() using the same names as the existing options on significantTerms.  notably, the existing {{minDocFreq}} option on significantTerms is similar _in concept_ to the {{min_popularity}} option proposed in SOLR-12581 for relatedness(), but it would not really make sense to use {{minDocFreq}} as the option name in SOLR-12581, since the relatedness() function isn't tied to "terms" the way significantTerms is -- so "docFreq" has no real meaning, and the more general "popularity" makes more sense.  (i suppose we could change the option name in significantTerms, but even then significantTerms doesn't produce the same concept of "popularity" that relatedness() does, and even if it did, because that expression focuses exclusively on "terms", the concept of "DocFreq" is very appropriate.)
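
to show where the two options would sit, here is a small sketch. the {{minDocFreq}} parameter is the existing significantTerms option; the {{min_popularity}} form is an assumption, since at the time of this comment it is only proposed in SOLR-12581 (the shape used here is a verbose "func" object with the option alongside it). helper names, the collection, and the fields are hypothetical.

```python
def significant_terms_with_min_doc_freq(collection, query, field, min_doc_freq):
    """Streaming expression using the existing, term-oriented minDocFreq."""
    return (
        f'significantTerms({collection}, q="{query}", '
        f'field="{field}", minDocFreq="{min_doc_freq}")'
    )


def relatedness_with_min_popularity(min_popularity):
    """Facet stat using the proposed, bucket-oriented min_popularity.

    ASSUMPTION: the option is passed in a verbose function object; the
    syntax was not final when SOLR-12581 was under discussion.
    """
    return {
        "type": "func",
        "func": "relatedness($fore,$back)",
        "min_popularity": min_popularity,
    }


expr_with_threshold = significant_terms_with_min_doc_freq(
    "collection1", "body:Solr", "author", 10
)
stat_with_threshold = relatedness_with_min_popularity(0.001)
```

the threshold in the first case is attached to a term's document frequency, while in the second it is attached to whatever bucket the stat happens to be scoring, which is why a shared name would be misleading.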


 

> Consider api/documentation synergies/overlap between JSON Faceting relatedness() function and significantTerms streaming expression
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12582
>                 URL: https://issues.apache.org/jira/browse/SOLR-12582
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>
> In SOLR-12581, Alexandre asked the tangential question below, which i've spun off into its own jira...
> {quote}
> Sort of a side-question, but this work _\[adding new options to the JSON faceting relatedness() aggregation\]_ seems to overlap/complement the significantTerms work done for streaming/QueryParser: http://lucene.apache.org/solr/guide/7_4/stream-source-reference.html#significantterms
> Are we saying SignificantTerms is for simpler use cases (as fore/back queries are corpus-wide) and then go into relatedness() for more complex analysis? 
> Should the options be roughly compatible where it makes sense and/or similarly named?
> Just wondering because I could see this confusing newbies trying to see when to use which option.
> {quote}


