Posted to dev@lucene.apache.org by "Alexandre Rafalovitch (JIRA)" <ji...@apache.org> on 2016/10/05 12:04:20 UTC

[jira] [Commented] (SOLR-9193) Add scoreNodes Streaming Expression

    [ https://issues.apache.org/jira/browse/SOLR-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548509#comment-15548509 ] 

Alexandre Rafalovitch commented on SOLR-9193:
---------------------------------------------

I know this issue is closed, but I wanted to check before I open a new one.

The implicit definition of "/terms" is now:
{noformat}
   "/terms": {
      "class": "solr.SearchHandler",
      "useParams":"_TERMS",
      "components": [
        "terms"
      ]
    },
{noformat}

This conflicts with all the explicit definitions we currently have in our solrconfig.xml files:
{noformat}
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <bool name="distrib">false</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
{noformat}

Specifically, the existing definitions set *terms=true* and *distrib=false*, and the implicit definition sets neither. As it stands, we cannot remove the explicit definitions from solrconfig.xml. Were there specific reasons these parameters were left out when this ticket added the implicit definition (especially *distrib*), or was that just an oversight?

> Add scoreNodes Streaming Expression
> -----------------------------------
>
>                 Key: SOLR-9193
>                 URL: https://issues.apache.org/jira/browse/SOLR-9193
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrJ
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: 6.2
>
>         Attachments: SOLR-9193.patch
>
>
> The scoreNodes Streaming Expression is another *GraphExpression*. It will decorate a gatherNodes expression and use a tf-idf scoring algorithm to score the nodes.
> The gatherNodes expression only gathers nodes and aggregations. This is similar in nature to tf in search ranking, where the number of times a node appears in the traversal represents the tf. But this skews recommendations towards nodes that appear frequently in the index.
> Using the idf for each node, we can score each node as a function of tf-idf. This will boost nodes that appear less frequently in the index.
> The scoreNodes expression will gather the idfs from the shards for each node emitted by the underlying gatherNodes expression. It will then assign a score to each node.
> The computed score will be added to each node in the *nodeScore* field. The docFreq of the node across the entire collection will be added to each node in the *docFreq* field. Other streaming expressions can then rank on nodeScore or compute their own score using docFreq.
> Proposed syntax:
> {code}
> top(n="10",
>       sort="nodeScore desc",
>       scoreNodes(gatherNodes(...))) 
> {code}
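To make the tf-idf idea in the description concrete, here is a minimal, hypothetical sketch (not the actual ScoreNodesStream code). It assumes tf is the number of times the traversal hit a node, and it assumes a classic Lucene-style idf of 1 + ln(numDocs / (docFreq + 1)); the formula scoreNodes really uses may differ. The class and method names (NodeScoreSketch, scoreAndRank, idf) are made up for illustration. The final sort-and-truncate step mirrors what top(n="10", sort="nodeScore desc", ...) would do. Requires Java 16+ for records.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class NodeScoreSketch {

    /** A gathered node: its value, traversal hit count (tf), collection-wide docFreq, and computed score. */
    record Node(String node, long tf, long docFreq, double nodeScore) {}

    /** Assumed classic idf: 1 + ln(numDocs / (docFreq + 1)). Hypothetical, not necessarily Solr's formula. */
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Scores each node as tf * idf, then keeps the top n by nodeScore, descending. */
    static List<Node> scoreAndRank(Map<String, Long> tf, Map<String, Long> docFreq, long numDocs, int n) {
        List<Node> scored = new ArrayList<>();
        for (Map.Entry<String, Long> e : tf.entrySet()) {
            long df = docFreq.getOrDefault(e.getKey(), 0L);
            scored.add(new Node(e.getKey(), e.getValue(), df, e.getValue() * idf(numDocs, df)));
        }
        scored.sort(Comparator.comparingDouble(Node::nodeScore).reversed());
        return scored.subList(0, Math.min(n, scored.size()));
    }

    public static void main(String[] args) {
        // Hypothetical data: "popular" is hit more often in the traversal but is common in the index.
        Map<String, Long> tf = Map.of("popular", 5L, "niche", 3L);
        Map<String, Long> docFreq = Map.of("popular", 900L, "niche", 10L);
        for (Node node : scoreAndRank(tf, docFreq, 1000L, 10)) {
            System.out.printf("%s tf=%d docFreq=%d nodeScore=%.3f%n",
                    node.node(), node.tf(), node.docFreq(), node.nodeScore());
        }
    }
}
{code}

With these made-up numbers, "niche" outranks "popular" even though the traversal hit it less often, which is the skew correction the description is after: raw tf alone would favor the node that is frequent across the whole index.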



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
