Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2007/02/09 08:10:05 UTC

[jira] Created: (SOLR-153) Facet Index

Facet Index
-----------

                 Key: SOLR-153
                 URL: https://issues.apache.org/jira/browse/SOLR-153
             Project: Solr
          Issue Type: New Feature
            Reporter: Yonik Seeley


A facet index, initially for non-hierarchical facets.
Start with all terms and the set of documents for each term. Group lower-level nodes into parents by taking the union of their sets, but keep track of the largest single set below each node, all the way down to the leaves (the max doc-freq for that node).
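
A minimal Python sketch of the construction described above, for illustration only. Assumptions not stated in the issue: doc sets are Python frozensets rather than bitsets, leaves are the terms in sorted order, and each parent is formed by grouping `fanout` adjacent nodes. The names `Node`, `build_tree` and `fanout` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class Node:
    docs: FrozenSet[int]             # union of the doc sets of all leaves below
    max_df: int                      # largest single-term doc freq below this node
    term: Optional[str] = None       # only leaves carry a term
    children: List["Node"] = field(default_factory=list)


def build_tree(postings: Dict[str, Set[int]], fanout: int = 2) -> Node:
    """Build the tree bottom-up, grouping `fanout` adjacent nodes per parent
    (the grouping rule is an assumption of this sketch)."""
    if not postings:
        raise ValueError("need at least one term")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Leaves: one node per term; a leaf's max_df is just its own doc freq.
    level = []
    for term, docs in sorted(postings.items()):
        fs = frozenset(docs)
        level.append(Node(fs, len(fs), term))
    # Repeatedly union groups of nodes until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Node(
                docs=frozenset().union(*(n.docs for n in group)),
                max_df=max(n.max_df for n in group),  # carried up from the leaves
                children=group,
            ))
        level = parents
    return level[0]
```

For example, `build_tree({"red": {1, 2, 3}, "blue": {2, 4}, "green": {5}})` yields a root whose doc set is `{1, 2, 3, 4, 5}` and whose `max_df` is 3 (from "red").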


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-153) Facet Index

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-153:
------------------------------

    Attachment: facettree.patch

Incomplete brainstorming code for building only the lowest level of the tree.
Completely untested / uncompiled, no search-side code yet, and all in a single file.


> Facet Index
> -----------
>
>                 Key: SOLR-153
>                 URL: https://issues.apache.org/jira/browse/SOLR-153
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: facettree.patch


[jira] Updated: (SOLR-153) Facet Index

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-153:
------------------------------

    Attachment: facettree.patch

Attached again, this time correctly granting ASF license.

> Facet Index
> -----------
>
>                 Key: SOLR-153
>                 URL: https://issues.apache.org/jira/browse/SOLR-153
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: facettree.patch, facettree.patch, facettree.patch


[jira] Updated: (SOLR-153) Facet Index

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-153:
------------------------------

    Attachment: facettree.patch

Much more complete code, algorithm-wise.

I added code to build a tree.  It's based on a priority queue, but it only takes unionSize into account when selecting nodes to merge (not maxDf at all), and is thus sub-optimal.  I expect it to be replaced in the future, but it may work well enough for the first working version.
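
One possible reading of the priority-queue build, sketched in Python under the assumption that it works Huffman-style: repeatedly pop the two nodes with the smallest union size and merge them. This interpretation and the names `build_tree_pq` and `Node` are hypothetical; as the comment above says, `max_df` is carried up the tree but plays no part in choosing which nodes to merge.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class Node:
    docs: FrozenSet[int]             # union of the doc sets of all leaves below
    max_df: int                      # largest single-term doc freq below this node
    term: Optional[str] = None       # only leaves carry a term
    children: List["Node"] = field(default_factory=list)


def build_tree_pq(postings: Dict[str, Set[int]]) -> Node:
    """Merge the two nodes with the smallest union size until one root remains."""
    if not postings:
        raise ValueError("need at least one term")
    tie = itertools.count()  # unique tiebreaker so heapq never compares Node objects
    heap = []
    for term, docs in postings.items():
        fs = frozenset(docs)
        heapq.heappush(heap, (len(fs), next(tie), Node(fs, len(fs), term)))
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        merged = Node(
            docs=a.docs | b.docs,
            max_df=max(a.max_df, b.max_df),  # tracked, but not used as a priority
            children=[a, b],
        )
        # Priority is the union size only (unionSize), never max_df.
        heapq.heappush(heap, (len(merged.docs), next(tie), merged))
    return heap[0][2]
```

Ties on union size are broken by insertion order through the counter, which keeps the result deterministic.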

I added searching code that traverses the tree and expands nodes, estimating child intersection counts based on the parent count multiplied by the fraction of bits set in the child union.  
Right now, the next node to evaluate is based on estimatedIntersectionCount * maxDf, but something like estimatedIntersectionCount * sqrt(maxDf) might work better in the future.
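
A hedged Python sketch of the search side. It assumes that "the fraction of bits set in the child union" means the child's union size relative to its parent's union (so the estimate is parentCount * |child| / |parent|), and that the goal is the top `k` terms by count within a query's doc set `base`. The `min(count, max_df)` pruning bound is an addition for this sketch, not something the issue describes; `Node` and `top_facets` are hypothetical names.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class Node:
    docs: FrozenSet[int]             # union of the doc sets of all leaves below
    max_df: int                      # largest single-term doc freq below this node
    term: Optional[str] = None       # only leaves carry a term
    children: List["Node"] = field(default_factory=list)


def top_facets(root: Node, base: FrozenSet[int], k: int) -> List[Tuple[str, int]]:
    """Best-first traversal returning the k terms with the highest counts in `base`."""
    if k <= 0:
        return []
    tie = itertools.count()              # unique tiebreaker for the heap
    frontier = [(0.0, next(tie), root)]  # max-heap via negated priority
    best: List[Tuple[int, str]] = []     # min-heap of (count, term), at most k entries
    while frontier:
        _, _, node = heapq.heappop(frontier)
        count = len(node.docs & base)    # exact intersection count for this node
        if count == 0:
            continue                     # nothing below can match; zero counts not reported
        # No leaf below can exceed min(count, max_df), so once k results are
        # held, a subtree that cannot beat the k-th best count is skipped.
        if len(best) == k and min(count, node.max_df) <= best[0][0]:
            continue
        if node.term is not None:        # leaf: count is exact
            heapq.heappush(best, (count, node.term))
            if len(best) > k:
                heapq.heappop(best)
            continue
        for child in node.children:
            # Estimate: parent count scaled by the child's share of the parent union.
            est = count * len(child.docs) / len(node.docs)
            # Priority from the issue: estimatedIntersectionCount * maxDf
            # (est * math.sqrt(child.max_df) is the suggested alternative).
            heapq.heappush(frontier, (-est * child.max_df, next(tie), child))
    return [(term, c) for c, term in sorted(best, reverse=True)]
```

Because pruning uses a true upper bound, the estimate only affects the order in which nodes are visited, not which terms are returned (ties at the k-th count aside).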

This is still all really brainstorming code, all in one file, completely untested, and it will not work since there is no code to hook it up to Solr (construct a request or get the result).  This update is really just to back up the code somewhere, or in case I get hit by a bus :-)


> Facet Index
> -----------
>
>                 Key: SOLR-153
>                 URL: https://issues.apache.org/jira/browse/SOLR-153
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: facettree.patch, facettree.patch, facettree.patch


[jira] Updated: (SOLR-153) Facet Index

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-153:
------------------------------

    Attachment:     (was: facettree.patch)

> Facet Index
> -----------
>
>                 Key: SOLR-153
>                 URL: https://issues.apache.org/jira/browse/SOLR-153
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: facettree.patch, facettree.patch
