Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2017/12/15 16:43:04 UTC

[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

    [ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292780#comment-16292780 ] 

Erick Erickson commented on SOLR-6492:
--------------------------------------

Another application of this that just crossed my mind is the old "exact match when stemming" process: KeywordRepeatFilterFactory >> stemmer >> RemoveDuplicatesTokenFilterFactory at index time, then two analysis chains at query time, one with the stemmer and one without.

It's still not perfect: if I index "running" and then search for "run", I'd get a match on the stemmed version. It would, however, handle the case of indexing "run" and searching (exact match) on "running", along with some of the other more surprising effects of stemming.
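
To make the behaviour above concrete, here is a rough sketch in plain Python. It is a toy simulation, not Lucene or Solr code: `toy_stem`, `index_chain`, `exact_query`, `stemmed_query` and `matches` are hypothetical names made up for illustration, and the stemmer is a crude suffix stripper standing in for a real one. It only mimics the idea that the index-time chain keeps each original token, adds its stem at the same position, and drops the copy when the two are identical.

```python
def toy_stem(token):
    """Crude stand-in for a real stemmer (assumption: suffix stripping only)."""
    for suffix in ("ning", "ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def index_chain(text):
    """Mimic keyword-repeat >> stemmer >> remove-duplicates at index time.

    Returns (position, term) pairs; the original and the stem share a position.
    """
    out = []
    for pos, tok in enumerate(text.lower().split()):
        seen = set()
        for term in (tok, toy_stem(tok)):  # original first, then stemmed copy
            if term not in seen:           # drop same-position duplicates
                seen.add(term)
                out.append((pos, term))
    return out


def exact_query(text):
    """Query-time chain without the stemmer."""
    return [tok for tok in text.lower().split()]


def stemmed_query(text):
    """Query-time chain with the stemmer."""
    return [toy_stem(tok) for tok in text.lower().split()]


def matches(indexed, query_terms):
    """True if every query term appears among the indexed terms."""
    terms = {term for _, term in indexed}
    return all(q in terms for q in query_terms)


if __name__ == "__main__":
    print(index_chain("running"))
    print(matches(index_chain("running"), exact_query("run")))
    print(matches(index_chain("run"), exact_query("running")))
    print(matches(index_chain("run"), stemmed_query("running")))
```

In this sketch, an exact query for "run" still hits a document containing "running" (the imperfect case), while an exact query for "running" no longer hits a document containing only "run", and the stemmed query chain keeps matching both.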

> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
>                 Key: SOLR-6492
>                 URL: https://issues.apache.org/jira/browse/SOLR-6492
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Trey Grainger
>             Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to support one or more dynamically-selected analyzers for a field. For example, someone may have a "content" field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for Greek), a separate document in English (using an English Analyzer), and possibly even a field with mixed-language content in Greek and English. This latter case could pass the content separately through both an analyzer defined for Greek and another Analyzer defined for English, stacking or concatenating the token streams based upon the use-case.
> There are some distinct advantages in terms of index size and query performance which can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating content in separate fields and searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a different analyzer for the same field to remove a feature (e.g. turning on/off query-time synonyms against the same field on a per-query basis).



