Posted to solr-dev@lucene.apache.org by "Martijn van Groningen (JIRA)" <ji...@apache.org> on 2009/09/02 20:20:33 UTC

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750582#action_12750582 ] 

Martijn van Groningen edited comment on SOLR-236 at 9/2/09 11:18 AM:
---------------------------------------------------------------------

Yes, specifying which collapse fields to return is a good idea, just like the fl parameter for a normal request.
I was thinking about how to fit this new feature into the current patch, and I think it would be a good idea to revise the current field collapse result format so that the results of this feature fit nicely into the response.

Currently the collapse response is like this:
{code:xml}
<lst name="collapse_counts">
        <str name="field">venue</str>
        <lst name="doc">
            <int name="233238">1</int>
        </lst>
        <lst name="count">
            <int name="melkweg">1</int>
        </lst>
</lst>
{code}

I think a response format like the following would be more ....
{code:xml}
<lst name="collapse_counts">
        <str name="field">venue</str>
        <lst name="results">
            <lst name="233238">
                 <str name="fieldValue">melkweg</str>
                 <int name="collapseCount">2</int>
                 <lst name="collapsedValues">
                     <str name="price">10.99, "1.999,99"</str>
                     <str name="name">adapter, laptop</str>
                 </lst>
            </lst>
        </lst>
</lst>
{code}
As you can see, the data for each collapsed document is grouped together under one element and is therefore easier to parse. The _collapsedValues_ element can contain one or more fields, each holding the collapsed field values in a comma-separated format. The _collapsedValues_ element will of course only be added when the client specifies the collapsed fields in the request.
What do you think about this new result format? 
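
To illustrate the "easier to parse" point, here is a minimal client-side sketch in Python that reads the proposed format shown above. It is not part of the patch. The function names (`split_values`, `parse_collapse_counts`) are hypothetical, and the quoting rule (a value containing a comma is wrapped in double quotes, as in the `price` example) is an assumption inferred from the sample response.

```python
import csv
import xml.etree.ElementTree as ET

# Sample taken from the proposed response format above.
PROPOSED_RESPONSE = """\
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="results">
        <lst name="233238">
            <str name="fieldValue">melkweg</str>
            <int name="collapseCount">2</int>
            <lst name="collapsedValues">
                <str name="price">10.99, "1.999,99"</str>
                <str name="name">adapter, laptop</str>
            </lst>
        </lst>
    </lst>
</lst>
"""


def split_values(text):
    """Split a comma-separated string into a list of values.

    Assumption: values that contain a comma are double-quoted,
    as in the price example of the proposed format.
    """
    return next(csv.reader([text], skipinitialspace=True))


def parse_collapse_counts(xml_text):
    """Return the collapse information keyed by document id."""
    root = ET.fromstring(xml_text)
    parsed = {"field": root.findtext("str[@name='field']"), "results": {}}
    for doc in root.findall("lst[@name='results']/lst"):
        # collapsedValues is optional: it is only present when the
        # client asked for collapsed fields in the request.
        collapsed = doc.find("lst[@name='collapsedValues']")
        fields = collapsed if collapsed is not None else []
        parsed["results"][doc.get("name")] = {
            "fieldValue": doc.findtext("str[@name='fieldValue']"),
            "collapseCount": int(doc.findtext("int[@name='collapseCount']")),
            "collapsedValues": {
                f.get("name"): split_values(f.text or "") for f in fields
            },
        }
    return parsed


result = parse_collapse_counts(PROPOSED_RESPONSE)
```

With the proposed format, everything about document 233238 is found under a single element, so the client does not have to correlate separate "doc" and "count" lists.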

      was (Author: martijn):
    Yes, specifying which collapse fields to return is a good idea, just like the fl parameter for a normal request.
I was thinking about how to fit this new feature into the current patch, and I think it would be a good idea to revise the current field collapse result format so that the results of this feature fit nicely into the response.

Currently the collapse response is like this:
{code:xml}
<lst name="collapse_counts">
        <str name="field">venue</str>
        <lst name="doc">
            <int name="233238">1</int>
        </lst>
        <lst name="count">
            <int name="melkweg">1</int>
        </lst>
    </lst>
{code}

I think a response format like the following would be more ....
{code:xml}
<lst name="collapse_counts">
        <str name="field">venue</str>
        <lst name="">
            <lst name="233238">
                 <str name="fieldValue">melkweg</str>
                 <int name="collapseCount">2</int>
                 <lst name="collapsedValues">
                     <str name="price">10.99, "1.999,99"</str>
                     <str name="name">adapter, laptop</str>
                 </lst>
        </lst>
</lst>
{code}
As you can see, the data for each collapsed document is grouped together under one element and is therefore easier to parse. The _collapsedValues_ element can contain one or more fields, each holding the collapsed field values in a comma-separated format. The _collapsedValues_ element will of course only be added when the client specifies the collapsed fields in the request.
What do you think about this new result format? 
  
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
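
As a rough illustration of the three query parameters listed in the issue description, the sketch below builds a request URL in Python. Only the parameter names and the "normal" default for collapse.type come from the description. The helper name `build_collapse_query`, the host, port, and handler path, and the sample query are hypothetical placeholders.

```python
from urllib.parse import urlencode

# The two collapse types named in the issue description.
COLLAPSE_TYPES = ("normal", "adjacent")


def build_collapse_query(base_url, query, field, collapse_max, collapse_type="normal"):
    """Build a search URL that carries the field collapsing parameters.

    base_url is a placeholder for the select handler of a Solr
    instance running the patch; it is not taken from the issue.
    """
    if collapse_type not in COLLAPSE_TYPES:
        raise ValueError(f"collapse.type must be one of {COLLAPSE_TYPES}")
    params = {
        "q": query,
        "collapse.field": field,        # field used to group results
        "collapse.type": collapse_type,  # "normal" (default) or "adjacent"
        "collapse.max": collapse_max,    # results allowed before collapsing
    }
    return f"{base_url}?{urlencode(params)}"


url = build_collapse_query(
    "http://localhost:8983/solr/select", "laptop", "venue", collapse_max=1
)
```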
