Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2007/09/16 00:02:35 UTC

[jira] Created: (SOLR-357) Prefixing Filter Factory -- for 'suggest'

Prefixing Filter Factory -- for 'suggest' 
------------------------------------------

                 Key: SOLR-357
                 URL: https://issues.apache.org/jira/browse/SOLR-357
             Project: Solr
          Issue Type: New Feature
            Reporter: Ryan McKinley


The PrefixingFilter builds a token for each prefix in the original token.  It is appropriate for a type-ahead suggest style function.

Given the token "solr", this will build a token for "s","so","sol","solr".  
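
The prefix expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in the attached patch; the function name and the min_gram/max_gram parameters are made up for the example.

```python
def edge_prefixes(token, min_gram=1, max_gram=20):
    """Return the leading prefixes of a token, shortest first.

    min_gram and max_gram are illustrative parameters, not taken from the patch.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_prefixes("solr"))  # ['s', 'so', 'sol', 'solr']
```

Indexing every prefix as its own term means a type-ahead lookup becomes an ordinary exact term match at query time, at the cost of a larger index.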

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-357) Prefixing Filter Factory -- for 'suggest'

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527793 ] 

Hoss Man commented on SOLR-357:
-------------------------------

isn't this a subset of the functionality in EdgeNGramFilterFactory?

in answer to your followup questions:  

1) the easiest way i can think of to do this is with a boolean query on two fields -- one using KeywordTokenizer and the other using something else. "Canon PIXMA" would match on both, "...video card" would only match on the second.

2) i'm not sure i understand the question, it sounds like you would just want a really sloppy phrase query, but i must be missing something.  should probably discuss outside of Jira.
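
As a rough sketch of the "sloppy phrase query" idea in (2): Lucene query syntax allows a slop after a quoted phrase ("a b"~N), and matches whose terms sit closer together generally score higher. The Python below only builds such a request URL. The field name, the slop of 100 and the helper name are assumptions for illustration; the base URL is the example Solr address used elsewhere in this thread.

```python
from urllib.parse import urlencode


def sloppy_phrase_url(base, field, words, slop):
    """Build a Solr select URL for a phrase query with the given slop.

    'field' and 'slop' are hypothetical values chosen by the caller.
    """
    phrase = " ".join(words)
    query = f'{field}:"{phrase}"~{slop}'
    return base + "?" + urlencode({"q": query, "fl": "name,id"})


url = sloppy_phrase_url(
    "http://localhost:8983/solr/select", "name", ["canon", "pixma"], 100
)
print(url)
```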


[jira] Resolved: (SOLR-357) Prefixing Filter Factory -- for 'suggest'

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley resolved SOLR-357.
--------------------------------

    Resolution: Invalid

Yes, this is a subset of EdgeNGramFilter.

For more discussion, see: http://www.nabble.com/%27suggest%27-query-sorting-tf4450280.html

Hoss points out that KeywordTokenizerFactory may be a more appropriate tokenizer.  If you do need to complete internal tokens, use two fields.

I found this works well:


<fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

...
   <field name="prefix1" type="prefix_full"  indexed="true" stored="false"/>
   <field name="prefix2" type="prefix_token" indexed="true" stored="false"/>
...
   <copyField source="name" dest="prefix1"/>
   <copyField source="name" dest="prefix2"/>

If you query both fields, names that start with the query match in both fields and so rank above names that only contain a matching token:

http://localhost:8983/solr/select?fl=name,id&q=prefix1:ca%20prefix2:ca
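
To see why the two-field query favours names that start with the query, here is a small Python simulation of the two index analyzers above (keyword vs. whitespace tokenizing, then lowercasing and edge n-grams). It is a simplification: it only reports which fields would contain the query term, whereas real Lucene scoring depends on more than the number of matching clauses. The helper names and the sample name "video card" are made up for the example.

```python
def edge_prefixes(token, min_gram=1, max_gram=20):
    # Leading prefixes of a token, mirroring minGramSize/maxGramSize above.
    upper = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, upper + 1)}


def index_prefix_full(value):
    # prefix_full: the whole value is one token, lowercased, then prefixed.
    return edge_prefixes(value.lower())


def index_prefix_token(value):
    # prefix_token: split on whitespace, lowercase, prefix each token.
    terms = set()
    for tok in value.split():
        terms |= edge_prefixes(tok.lower())
    return terms


def matching_fields(name, query):
    # Query side only lowercases, so the query is looked up as a single term.
    q = query.lower()
    hits = []
    if q in index_prefix_full(name):
        hits.append("prefix1")
    if q in index_prefix_token(name):
        hits.append("prefix2")
    return hits


print(matching_fields("Canon PIXMA", "ca"))  # ['prefix1', 'prefix2']
print(matching_fields("video card", "ca"))   # ['prefix2']
```

A name matching both clauses of the boolean query gets a contribution from each, which is what pushes it above a name matching only prefix2.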





[jira] Updated: (SOLR-357) Prefixing Filter Factory -- for 'suggest'

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-357:
-------------------------------

    Attachment: SOLR-357-PrefixingFilter.patch

Adds a filter and adds it to the examples:

<fieldType name="prefixing" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PrefixingFilterFactory" maxLength="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

------------------------

With the example docs, I search for "ca" and get everything with a token that starts with "ca".

Any idea how I can get this to sort by:
1. values that start with the prefix before values that just contain the prefix.  That is:
 "Canon PIXMA ... " before " .... video card"

2. When matching multiple tokens, rank values where the matched tokens are closer together higher.

Ideas?






