Posted to dev@lucene.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2010/05/18 17:52:43 UTC

[jira] Created: (SOLR-1918) Bit-wise scoring field type

Bit-wise scoring field type
---------------------------

                 Key: SOLR-1918
                 URL: https://issues.apache.org/jira/browse/SOLR-1918
             Project: Solr
          Issue Type: New Feature
          Components: Schema and Analysis
    Affects Versions: 3.1
            Reporter: Andrzej Bialecki 


This field type supports bit-wise matching with scoring over fields containing 32-bit integer values.

Internally, during indexing, the field value is interpreted as a 32-bit integer and converted to a set of bitmask terms. During search, the query term is likewise converted into a BooleanQuery of the corresponding bitmask terms, added as Occur.SHOULD clauses. Scoring such a query against such a field produces the desired effect: the bit patterns closest to the query get the highest scores.
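
The idea can be sketched without Lucene. The snippet below is only an illustration, not the code from the patch: the term encoding (one term per bit position, recording whether the bit is set) and the names BitmaskSketch, toTerms and overlap are assumptions made for this example. It counts how many query terms a field value shares, which is roughly what a BooleanQuery of SHOULD clauses rewards; real Lucene scoring also applies term weights and norms that are ignored here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BitmaskSketch {

    // Hypothetical encoding: one term per bit position, e.g. "b3=1" or "b3=0".
    // The actual patch may encode its bitmask terms differently.
    public static Set<String> toTerms(int value) {
        Set<String> terms = new LinkedHashSet<>();
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            int b = (value >>> bit) & 1;
            terms.add("b" + bit + "=" + b);
        }
        return terms;
    }

    // Number of query terms that also occur in the field value's terms.
    // This stands in for the number of matching SHOULD clauses.
    public static int overlap(int query, int fieldValue) {
        Set<String> shared = new LinkedHashSet<>(toTerms(query));
        shared.retainAll(toTerms(fieldValue));
        return shared.size();
    }

    public static void main(String[] args) {
        int query = 0b1011;
        int[] fieldValues = {0b1011, 0b1010, 0b0100};
        for (int value : fieldValues) {
            System.out.println(Integer.toBinaryString(value) + " -> " + overlap(query, value));
        }
    }
}
```

Under these assumptions an exact match shares all 32 terms, and every differing bit removes one shared term, so closer bit patterns rank higher.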

NOTE: currently only a single value per field is supported; a future extension will support multiple values.

The difference between this issue and SOLR-1913 is that SOLR-1913 offers filtering only (no scoring), while this implementation performs scoring, which allows finding the best matching inexact bit patterns. This has applications in e.g. near-duplicate detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-1918) Bit-wise scoring field type

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869314#action_12869314 ] 

Andrzej Bialecki  commented on SOLR-1918:
-----------------------------------------

Maybe you expected something more complex ... but yes, it is essentially the Hamming distance. The score reflects the number of differing bits between the query and the field value (actually, a list of bitmask terms created from the value): the fewer bits differ, the higher the score. It's really a simple BooleanQuery in disguise - please see the unit test.
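
For reference, the number of differing bits between two 32-bit values (commonly called the Hamming distance) is the population count of their XOR. The snippet below is a small sketch of that arithmetic only; the class and method names are made up for this example and do not come from the patch, and the link between matching bits and the final score ignores Lucene's term weighting.

```java
public class DifferingBits {

    // Count the bit positions where a and b disagree.
    public static int differingBits(int a, int b) {
        return Integer.bitCount(a ^ b);
    }

    // Count the bit positions where a and b agree (out of 32).
    public static int matchingBits(int a, int b) {
        return Integer.SIZE - differingBits(a, b);
    }

    public static void main(String[] args) {
        int query = 0b1011;
        int fieldValue = 0b1010;
        System.out.println("differing bits: " + differingBits(query, fieldValue));
        System.out.println("matching bits:  " + matchingBits(query, fieldValue));
    }
}
```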




[jira] Commented: (SOLR-1918) Bit-wise scoring field type

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868962#action_12868962 ] 

Lance Norskog commented on SOLR-1918:
-------------------------------------

I can't quite follow the patch. Does this do scoring from the Hamming distance?




[jira] Updated: (SOLR-1918) Bit-wise scoring field type

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated SOLR-1918:
------------------------------------

    Attachment: bitwise.patch

BitwiseField implementation and a unit test.

