Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2010/08/23 02:36:16 UTC

[jira] Created: (SOLR-2068) Search Grouping: collapse by string specialization

Search Grouping: collapse by string specialization
--------------------------------------------------

                 Key: SOLR-2068
                 URL: https://issues.apache.org/jira/browse/SOLR-2068
             Project: Solr
          Issue Type: Sub-task
            Reporter: Yonik Seeley


Create specialized implementations for collapsing by an indexed string field.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926338#action_12926338 ] 

Martijn van Groningen edited comment on SOLR-2068 at 10/29/10 12:50 PM:
------------------------------------------------------------------------

The improvements extracted from SOLR-2205 that are dedicated to phase 2.

      was (Author: martijn):
    The improvements extracted from SOLR-2205
  
> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2068.patch
>
>
> Create specialized implementations for collapsing by an indexed string field.



[jira] Updated: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-2068:
-------------------------------

    Attachment: SOLR-2068.patch

Here's a draft patch (completely untested) that implements the phase2 specialization:

bq.  at the start of each segment, look up the ords for the values and hash the group based on that ord (or leave it out of the hash if it didn't exist in that segment).


> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2068.patch, SOLR-2068.patch, SOLR-2068.patch
>
>
> Create specialized implementations for collapsing by an indexed string field.



[jira] Commented: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927700#action_12927700 ] 

Yonik Seeley commented on SOLR-2068:
------------------------------------

The ord in one segment isn't equivalent to the ord in another segment... so I don't think the currently attached patch will work.
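
To illustrate the problem, here is a minimal sketch in plain Java. It does not use the Lucene API: each segment is modelled as its own sorted array of unique terms, and the ord of a value is simply its index in that array. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical model, not Lucene code: a segment's term dictionary is a sorted
// array of its unique values, and a value's ord is its index in that array.
final class SegmentOrdDemo {

    // Returns the ord of value in the segment, or a negative number if absent.
    static int ordOf(String[] sortedTerms, String value) {
        return Arrays.binarySearch(sortedTerms, value);
    }

    public static void main(String[] args) {
        String[] segmentA = {"apple", "cherry", "pear"};
        String[] segmentB = {"banana", "cherry", "kiwi", "pear"};

        // The same group value gets a different ord in each segment.
        System.out.println(ordOf(segmentA, "pear"));      // 2
        System.out.println(ordOf(segmentB, "pear"));      // 3
        // The same ord refers to a different value in each segment.
        System.out.println(segmentA[0] + " vs " + segmentB[0]);
        // A value may be missing from a segment entirely.
        System.out.println(ordOf(segmentB, "apple") < 0); // true
    }
}
```

A hash keyed on ords built for one segment therefore cannot be reused for the next segment without translating through the string values.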

> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2068.patch, SOLR-2068.patch
>
>
> Create specialized implementations for collapsing by an indexed string field.



[jira] Updated: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2068:
----------------------------------------

    Attachment: SOLR-2068.patch

The improvements extracted from SOLR-2205

> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2068.patch
>
>
> Create specialized implementations for collapsing by an indexed string field.



[jira] Commented: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926290#action_12926290 ] 

Yonik Seeley commented on SOLR-2068:
------------------------------------

Going back over my old notes on how to efficiently do a string field per-segment:

Phase 1:
 - Basically, hash based on ord (or a direct index lookup if the number of ords is small enough).  We don't look up the value of the string at this point.
 - When a segment changes, we need to convert the ords from the old segment to the new segment (i.e. look up each ord's value in the old segment, and find the ord of that value in the new segment).
   - If the group value is not found in the new segment, then remove it from the hash.  Keep it in the ordered map since it can still be pushed out by other insertions.
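
The segment switch described in the list above could look roughly like the following sketch. It is plain Java rather than the Lucene API and is not taken from any attached patch: segments are modelled as sorted String arrays, and the names Phase1RemapSketch and remap are hypothetical. It covers only the hash update; the ordered map is assumed to live with the caller.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Phase 1 segment switch, not the patch itself.
// Segments are modelled as sorted String arrays; a value's ord is its index.
final class Phase1RemapSketch {

    // Rebuilds the ord-keyed hash for the new segment. G stands for whatever
    // per-group state the collector keeps.
    static <G> Map<Integer, G> remap(Map<Integer, G> byOldOrd,
                                     String[] oldTerms, String[] newTerms) {
        Map<Integer, G> byNewOrd = new HashMap<>();
        for (Map.Entry<Integer, G> e : byOldOrd.entrySet()) {
            String value = oldTerms[e.getKey()];               // old ord -> value
            int newOrd = Arrays.binarySearch(newTerms, value); // value -> new ord
            if (newOrd >= 0) {
                byNewOrd.put(newOrd, e.getValue());
            }
            // Otherwise the group is left out of the hash for this segment.
            // The caller's ordered map still holds it.
        }
        return byNewOrd;
    }

    public static void main(String[] args) {
        String[] oldTerms = {"apple", "cherry", "pear"};
        String[] newTerms = {"banana", "cherry", "kiwi", "pear"};
        Map<Integer, String> byOldOrd = new HashMap<>();
        byOldOrd.put(0, "group(apple)");
        byOldOrd.put(2, "group(pear)");
        System.out.println(remap(byOldOrd, oldTerms, newTerms)); // {3=group(pear)}
    }
}
```

The string lookups happen once per group per segment switch, not once per collected document, which is the point of hashing on ords.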

Phase 2:
 - at the start of each segment, look up the ords for the values and hash the group based on that ord (or leave it out of the hash if it didn't exist in that segment).
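
A minimal sketch of this Phase 2 step, again in plain Java with segments modelled as sorted String arrays. It assumes the top group values are already known from the first pass; the class name, method names, and the per-group counting are all hypothetical and only stand in for whatever the real collector does per group.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the Phase 2 idea, not the attached patch.
final class Phase2OrdLookupSketch {

    // Called at the start of each segment: maps the segment-local ord of every
    // top group value to that value. Values absent from the segment are skipped.
    static Map<Integer, String> groupsByOrd(List<String> topGroupValues,
                                            String[] segmentTerms) {
        Map<Integer, String> byOrd = new HashMap<>();
        for (String value : topGroupValues) {
            int ord = Arrays.binarySearch(segmentTerms, value);
            if (ord >= 0) {
                byOrd.put(ord, value);
            }
        }
        return byOrd;
    }

    // Counts the documents of one segment that belong to a top group.
    // docOrds[doc] is the segment-local ord of that document's group field.
    static void collectSegment(List<String> topGroupValues, String[] segmentTerms,
                               int[] docOrds, Map<String, Integer> counts) {
        Map<Integer, String> byOrd = groupsByOrd(topGroupValues, segmentTerms);
        for (int ord : docOrds) {
            String group = byOrd.get(ord); // no string comparison per document
            if (group != null) {
                counts.merge(group, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        List<String> top = Arrays.asList("cherry", "pear");
        Map<String, Integer> counts = new TreeMap<>();
        collectSegment(top, new String[] {"apple", "cherry", "pear"},
                       new int[] {0, 1, 1, 2}, counts);
        collectSegment(top, new String[] {"banana", "cherry", "kiwi"},
                       new int[] {1, 0, 2}, counts);
        System.out.println(counts); // {cherry=3, pear=1}
    }
}
```

Because the ord-keyed map is rebuilt from the string values for every segment, it never carries ords across a segment boundary.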

Martijn's optimization in SOLR-2205 probably made Phase 1 less important (except if there are very few unique groups), so perhaps we should start with Phase 2.


> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>
> Create specialized implementations for collapsing by an indexed string field.



[jira] Commented: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926340#action_12926340 ] 

Martijn van Groningen commented on SOLR-2068:
---------------------------------------------

bq.  we would be better off using an OpenBitSet - or even better, a sparse set since the number of elements
I get the OpenBitSet part, but which actual class do you mean by a sparse set? (Not a primitive array, I assume.)

> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2068.patch
>
>
> Create specialized implementations for collapsing by an indexed string field.



[jira] Commented: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927893#action_12927893 ] 

Martijn van Groningen commented on SOLR-2068:
---------------------------------------------

bq. The ord in one segment isn't equivalent to the ord in another segment... so I don't think the currently attached patch will work.
Yes, you're right. I didn't notice that since the index I used for testing was optimized and there was only one subreader...

> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2068.patch, SOLR-2068.patch, SOLR-2068.patch
>
>
> Create specialized implementations for collapsing by an indexed string field.



[jira] Updated: (SOLR-2068) Search Grouping: collapse by string specialization

Posted by "Martijn van Groningen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2068:
----------------------------------------

    Attachment: SOLR-2068.patch

* Updated the patch to the latest trunk
* Changed the boolean array to an OpenBitSet
* Made the Phase2GroupCollector per-segment friendly
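
As a rough illustration of the boolean-array-to-bit-set change, the sketch below marks the same members in a boolean[] and in a bit set. It uses java.util.BitSet as a stand-in for Lucene's OpenBitSet so that it runs without Lucene on the classpath. What the flags represent in the patch is not stated here, so the member values and all names are hypothetical.

```java
import java.util.BitSet;

// Stand-in sketch: java.util.BitSet is used in place of Lucene's OpenBitSet.
// A boolean[] typically takes one byte per element on HotSpot JVMs, while a
// bit set packs 64 flags into each long.
final class BitSetVsBooleanArray {

    static boolean[] markWithArray(int size, int[] members) {
        boolean[] flags = new boolean[size];
        for (int m : members) {
            flags[m] = true;
        }
        return flags;
    }

    static BitSet markWithBitSet(int size, int[] members) {
        BitSet bits = new BitSet(size);
        for (int m : members) {
            bits.set(m);
        }
        return bits;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        int[] members = {3, 42, 999_999};
        boolean[] flags = markWithArray(size, members);
        BitSet bits = markWithBitSet(size, members);
        System.out.println(flags[42] == bits.get(42)); // true
        System.out.println(flags[43] == bits.get(43)); // true
        System.out.println(bits.cardinality());        // 3
    }
}
```

Both structures answer the same membership question; the bit set does so in roughly one eighth of the memory under the assumption above.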

> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-2068.patch, SOLR-2068.patch
>
>
> Create specialized implementations for collapsing by an indexed string field.
