Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2007/09/16 00:02:35 UTC
[jira] Created: (SOLR-357) Prefixing Filter Factory -- for 'suggest'
Prefixing Filter Factory -- for 'suggest'
------------------------------------------
Key: SOLR-357
URL: https://issues.apache.org/jira/browse/SOLR-357
Project: Solr
Issue Type: New Feature
Reporter: Ryan McKinley
The PrefixingFilter builds a token for each prefix in the original token. It is appropriate for a type-ahead suggest style function.
Given the token "solr", this will build a token for "s","so","sol","solr".
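As a rough illustration of the expansion described above, here is a minimal Python sketch. It is not the code from the patch; the function name and the min_len/max_len parameters are made up for the example.

```python
def edge_prefixes(token, min_len=1, max_len=20):
    """Return every leading prefix of token whose length is between
    min_len and max_len (both hypothetical parameters for this sketch)."""
    upper = min(len(token), max_len)
    return [token[:n] for n in range(min_len, upper + 1)]


print(edge_prefixes("solr"))  # ['s', 'so', 'sol', 'solr']
```

Indexing each of these prefixes as its own term is what lets a partially typed word such as "so" match the document with an ordinary term query.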
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-357) Prefixing Filter Factory -- for 'suggest'
Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527793 ]
Hoss Man commented on SOLR-357:
-------------------------------
Isn't this a subset of the functionality in EdgeNGramFilterFactory?
In answer to your follow-up questions:
1) The easiest way I can think of to do this is with a boolean query on two fields -- one using KeywordTokenizer and the other using something else. "Canon PIXMA" would match on both; "...video card" would only match on the second.
2) I'm not sure I understand the question. It sounds like you would just want a really sloppy phrase query, but I must be missing something. We should probably discuss this outside of Jira.
[jira] Resolved: (SOLR-357) Prefixing Filter Factory -- for 'suggest'
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McKinley resolved SOLR-357.
--------------------------------
Resolution: Invalid
Yes, this is a subset of EdgeNGramFilter.
For more discussion, see: http://www.nabble.com/%27suggest%27-query-sorting-tf4450280.html
Hoss points out that KeywordTokenizerFactory may be a more appropriate tokenizer. If you do need to complete internal tokens, use two fields.
I found this works well:
<fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="prefix1" type="prefix_full" indexed="true" stored="false"/>
<field name="prefix2" type="prefix_token" indexed="true" stored="false"/>
...
<copyField source="name" dest="prefix1"/>
<copyField source="name" dest="prefix2"/>
If you query both fields, names that start with the query text match in both fields and so rank above names that only contain a token starting with it:
http://localhost:8983/solr/select?fl=name,id&q=prefix1:ca%20prefix2:ca
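To see why the two-field query favors names that begin with the typed text, here is a simplified Python model of the two index-time analyzers. It is only a sketch: it counts which fields would contain the query term and does not reproduce Lucene scoring. The sample names are shortened, made-up values, and all function names are hypothetical.

```python
def edge_prefixes(token, max_len=20):
    """All leading prefixes of token, up to max_len characters."""
    return {token[:n] for n in range(1, min(len(token), max_len) + 1)}


def index_terms(value, whole_value):
    """Model an index analyzer: lowercase, tokenize, expand to edge prefixes.

    whole_value=True models KeywordTokenizer (the entire value is one token);
    whole_value=False models WhitespaceTokenizer (split on whitespace).
    """
    text = value.lower()
    tokens = [text] if whole_value else text.split()
    terms = set()
    for token in tokens:
        terms |= edge_prefixes(token)
    return terms


def matching_fields(name, query):
    """Return which of the two fields would contain the lowercased query term."""
    q = query.lower()
    hits = []
    if q in index_terms(name, whole_value=True):
        hits.append("prefix1")
    if q in index_terms(name, whole_value=False):
        hits.append("prefix2")
    return hits


for name in ["Canon PIXMA", "video card", "hard drive"]:
    print(name, matching_fields(name, "ca"))
```

In this model "Canon PIXMA" matches both clauses, "video card" matches only prefix2, and "hard drive" matches neither. A document that satisfies both clauses of the boolean query would normally score higher than one that satisfies a single clause, which is the ordering described above.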
[jira] Updated: (SOLR-357) Prefixing Filter Factory -- for 'suggest'
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McKinley updated SOLR-357:
-------------------------------
Attachment: SOLR-357-PrefixingFilter.patch
Adds a filter and adds it to the examples:
<fieldType name="prefixing" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PrefixingFilterFactory" maxLength="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
------------------------
With the example docs, I search for "ca" and get everything with a token that starts with "ca".
Any idea how I can get the results to sort so that:
1. Values that start with the prefix come before values that just contain a token with the prefix. That is:
"Canon PIXMA ... " before " .... video card"
2. When matching multiple tokens, values where the matched tokens are closer together rank higher.
Ideas?