Posted to dev@lucene.apache.org by "Trey Grainger (JIRA)" <ji...@apache.org> on 2016/06/22 03:01:02 UTC

[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

    [ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343254#comment-15343254 ] 

Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi [~krantiparisa] and [~dannytei1]. Apologies for the long lapse without a response on this issue. I won't get into the reasons here (a combination of personal and professional commitments), but I wanted to say that I expect to pick this issue back up in the near future and continue work on this patch.

In the meantime, I have added an ASL 2.0 license to the current code (from Solr in Action) so that folks can feel free to use what's there now: https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14

I'll turn what's there now into a patch, update it to Solr trunk, and keep iterating on it until the folks commenting on this issue are satisfied with the design and capabilities. Stay tuned...

> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
>                 Key: SOLR-6492
>                 URL: https://issues.apache.org/jira/browse/SOLR-6492
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Trey Grainger
>             Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to support one or more dynamically-selected analyzers for a field. For example, someone may have a "content" field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for Greek), a separate document in English (using an English Analyzer), and possibly even a document with mixed-language content in Greek and English. This latter case could pass the content separately through both an Analyzer defined for Greek and another Analyzer defined for English, stacking or concatenating the token streams based upon the use case.
> There are some distinct advantages in terms of index size and query performance that can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating content in separate fields and searching across multiple fields.
> Other, non-multilingual use cases may include switching to a different analyzer for the same field to enable or disable a feature (e.g. turning query-time synonyms on or off against the same field on a per-query basis).
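
To make the difference between "stacking" and "concatenating" token streams concrete, here is a minimal, self-contained sketch in plain Java. It deliberately does not use the Lucene or Solr APIs, and it is not the code from the patch or from Solr in Action. The class MultiAnalyzerSketch, the Token type, and the two toy analyzers (plain and stemmed) are hypothetical stand-ins invented for illustration. It also assumes that every analyzer emits exactly one term per input word, so that positions line up by index; real analyzers do not guarantee this.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

public class MultiAnalyzerSketch {

    /** A term and its position in the field (hypothetical type, not Lucene's). */
    public static final class Token {
        public final String term;
        public final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Toy analyzer (assumption): split on whitespace and lowercase. */
    public static List<String> plain(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Toy analyzer (assumption): like plain(), but strips one trailing "s". */
    public static List<String> stemmed(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : plain(text)) {
            boolean strip = t.length() > 1 && t.endsWith("s");
            terms.add(strip ? t.substring(0, t.length() - 1) : t);
        }
        return terms;
    }

    /** Stacking: terms from every analyzer share positions; duplicates at a position are dropped. */
    public static List<Token> stack(String text, List<Function<String, List<String>>> analyzers) {
        Map<Integer, Set<String>> byPosition = new TreeMap<>();
        for (Function<String, List<String>> analyzer : analyzers) {
            List<String> terms = analyzer.apply(text);
            for (int i = 0; i < terms.size(); i++) {
                byPosition.computeIfAbsent(i, k -> new LinkedHashSet<>()).add(terms.get(i));
            }
        }
        List<Token> out = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> entry : byPosition.entrySet()) {
            for (String term : entry.getValue()) {
                out.add(new Token(term, entry.getKey()));
            }
        }
        return out;
    }

    /** Concatenating: each analyzer's output is appended after the previous analyzer's output. */
    public static List<Token> concatenate(String text, List<Function<String, List<String>>> analyzers) {
        List<Token> out = new ArrayList<>();
        int offset = 0;
        for (Function<String, List<String>> analyzer : analyzers) {
            List<String> terms = analyzer.apply(text);
            for (int i = 0; i < terms.size(); i++) {
                out.add(new Token(terms.get(i), offset + i));
            }
            offset += terms.size();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Function<String, List<String>>> analyzers =
            List.of(MultiAnalyzerSketch::plain, MultiAnalyzerSketch::stemmed);
        System.out.println(stack("Running Shoes", analyzers));       // [running@0, shoes@1, shoe@1]
        System.out.println(concatenate("Running Shoes", analyzers)); // [running@0, shoes@1, running@2, shoe@3]
    }
}
```

In the stacked output, "shoes" and "shoe" occupy the same position and the shared term "running" is stored once, which is where the index-size advantage described above would come from. The concatenated output keeps every term at its own position, so the field grows with each additional analyzer.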


