Posted to issues@lucene.apache.org by "Vlad (Jira)" <ji...@apache.org> on 2020/09/04 21:25:00 UTC
[jira] [Updated] (SOLR-14832) Inversion of English characters and numbers in Arabic documents
[ https://issues.apache.org/jira/browse/SOLR-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vlad updated SOLR-14832:
------------------------
Description:
Hi Support,
Please help to resolve an issue. I upload/index several documents in English and Arabic to Solr, and I use the following field type for Arabic:
<fieldType name="text" class="solr.TextField" positionIncrementGap="50">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
There are two environments:
# Local machine:
- SOLR version: 4.2
- Windows version: 10
# DEV env:
- SOLR version: 4.1, as part of the Cloudera suite
- Linux kernel version: 3.10.0-862
Issue appears when uploading documents:
# Local machine:
- Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]")
- Doc in Arabic with some English words - ok (for example, "[www.apache.org|http://www.apache.org/]")
# DEV env:
- Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]")
- Doc in Arabic with some English words - the English text is inverted (for example, "gro.echapa.www"), which makes keyword search impossible.
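To illustrate the symptom, here is a minimal Python sketch that reverses every run of Latin letters, digits and dots inside a mixed Arabic/English string. The helper name `reverse_latin_runs`, the character class and the sample sentence are assumptions made for illustration only; this is not a description of what Solr or the upload path actually does internally.

```python
import re

# Runs of ASCII letters, digits and dots (URLs, numbers). Which characters are
# actually affected on the DEV environment is an assumption.
_LATIN_RUN = re.compile(r"[A-Za-z0-9.]+")


def reverse_latin_runs(text: str) -> str:
    """Reverse each Latin/digit run in place, leaving all other text untouched."""
    return _LATIN_RUN.sub(lambda m: m.group(0)[::-1], text)


# Hypothetical sample: Arabic words around an English URL.
sample = "زوروا www.apache.org اليوم"
inverted = reverse_latin_runs(sample)
print(inverted)

# Reversing the runs a second time restores the original string.
assert reverse_latin_runs(inverted) == sample
```

Running such a helper over the stored field values and searching again could help to confirm whether the text is already inverted before it reaches the analyzer chain.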
Please advise whether this is fixable and, if so, how.
Thank you in advance!
was:
Hi Support,
Please help to resolve an issue. I upload/index several documents in English and Arabic to Solr, and I use the following field type for Arabic:
<fieldType name="text" class="solr.TextField" positionIncrementGap="50">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
There are two environments:
# Local machine:
- SOLR version: 4.2
- Windows version: 10
# DEV env:
- SOLR version:
- Cloudera suite
- Linux kernel version: 3.10.0-862
Issue appears when uploading documents:
# Local machine:
- Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]")
- Doc in Arabic with some English words - ok (for example, "[www.apache.org|http://www.apache.org/]")
# DEV env:
- Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]")
- Doc in Arabic with some English words - the English text is inverted (for example, "gro.echapa.www"), which makes keyword search impossible.
Please advise whether this is fixable and, if so, how.
> Inversion of English characters and numbers in Arabic documents
> ---------------------------------------------------------------
>
> Key: SOLR-14832
> URL: https://issues.apache.org/jira/browse/SOLR-14832
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 4.1
> Reporter: Vlad
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)