Posted to dev@lucene.apache.org by "Issei Nishigata (JIRA)" <ji...@apache.org> on 2017/01/20 06:04:26 UTC
[jira] [Updated] (SOLR-10010) NGramTokenizer with SynonymFilterFactory doesn't work properly when using Managed-Schema
[ https://issues.apache.org/jira/browse/SOLR-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Issei Nishigata updated SOLR-10010:
-----------------------------------
Description:
NGramTokenizer with SynonymFilterFactory doesn't work properly when using Managed-Schema
When using Managed-Schema, it doesn't work properly with the following settings.
{code:title=managed-schema}
<field name="bigram" type="text_bigram" indexed="true" stored="true"/>
<fieldType name="text_bigram" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            tokenizerFactory="solr.NGramTokenizerFactory"
            tokenizerFactory.minGramSize="2" tokenizerFactory.maxGramSize="2"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}
{code:title=synonyms.txt}
ab,ba
{code}
{code:title=expected}
querystring => "bigram:ab"
parsedquery => "bigram:ab bigram:ba"
{code}
{code:title=actual}
querystring => "bigram:ab"
parsedquery => "bigram:ab"
{code}
When using ClassicIndexSchemaFactory, it works properly.
was:
NGramTokenizer with SynonymFilterFactory doesn't work properly when using Managed-Schema
When using Managed-Schema, it doesn't work properly with the following settings.
{code:title=managed-schema}
<field name="bigram" type="text_bigram" indexed="true" stored="true"/>
<fieldType name="text_bigram" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            tokenizerFactory="solr.NGramTokenizerFactory"
            tokenizerFactory.minGramSize="2" tokenizerFactory.maxGramSize="2"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}
{code:title=synonyms.txt}
ab,ba
{code}
{code:title=expected}
querystring => "bigram:ab"
parsedquery => "bigram:ab bigram:ba"
{code}
{code:title=actual}
querystring => "bigram:ab"
parsedquery => "bigram:ab"
{code}
When using ClassicIndexSchemaFactory, it works properly.
I guess this is caused by org.apache.lucene.analysis.synonym.SynonymFilterFactory#loadSynonyms not setting tokenizerFactory.minGramSize="2" and tokenizerFactory.maxGramSize="2" when loader.inform() is called ( => constructor of "org.apache.solr.schema.IndexSchema")
(tokArgs is empty)
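To make the "tokArgs is empty" guess concrete, here is a minimal sketch of how attributes prefixed with "tokenizerFactory." would need to be split out of the filter's attributes and handed to the tokenizer factory that parses synonyms.txt. This is not the Lucene/Solr source: the class TokArgsSketch and the method extractTokenizerArgs are hypothetical names used only for illustration, and the real handling inside SynonymFilterFactory may differ.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from Lucene or Solr.
final class TokArgsSketch {
    static final String PREFIX = "tokenizerFactory.";

    // Moves every "tokenizerFactory."-prefixed attribute out of filterArgs
    // and returns them with the prefix stripped (the assumed shape of tokArgs).
    static Map<String, String> extractTokenizerArgs(Map<String, String> filterArgs) {
        Map<String, String> tokArgs = new LinkedHashMap<>();
        Iterator<Map.Entry<String, String>> it = filterArgs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if (entry.getKey().startsWith(PREFIX)) {
                tokArgs.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
                it.remove(); // the filter itself no longer sees these attributes
            }
        }
        return tokArgs;
    }

    public static void main(String[] args) {
        // Attributes of the <filter> element from the managed-schema in this issue.
        Map<String, String> filterArgs = new LinkedHashMap<>();
        filterArgs.put("synonyms", "synonyms.txt");
        filterArgs.put("tokenizerFactory", "solr.NGramTokenizerFactory");
        filterArgs.put("tokenizerFactory.minGramSize", "2");
        filterArgs.put("tokenizerFactory.maxGramSize", "2");
        filterArgs.put("ignoreCase", "true");
        filterArgs.put("expand", "true");

        Map<String, String> tokArgs = extractTokenizerArgs(filterArgs);
        System.out.println(tokArgs); // prints {minGramSize=2, maxGramSize=2}
    }
}
```

If tokArgs is empty at the point where the synonyms are loaded, the tokenizer used to parse synonyms.txt would not receive minGramSize/maxGramSize from this configuration, which would be consistent with the "actual" result above.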
> NGramTokenizer with SynonymFilterFactory doesn't work properly when using Managed-Schema
> ----------------------------------------------------------------------------------------
>
> Key: SOLR-10010
> URL: https://issues.apache.org/jira/browse/SOLR-10010
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Schema and Analysis
> Reporter: Issei Nishigata