Posted to solr-user@lucene.apache.org by bing <ni...@hotmail.com> on 2012/01/30 05:05:41 UTC
language specific fields of "text"
Hi, all,
In this thread, I would like to ask some technical questions about how the
schema should be defined to get language-specific "text" fields.
Say I currently have the field "text" defined as follows:
<field name="text" type="text_general" indexed="true"
stored="true" multiValued="true"/>
After indexing a document, I can see the "text" field extracted correctly
in the document.
My first attempt was to add a field named "text_en", defined exactly the
same way as "text":
<field name="text_en" type="text_general" indexed="true"
stored="true" multiValued="true"/>
However, after indexing the same document, why can't I see the "text_en"
field extracted? Is it because "text" is a reserved field that cannot be
changed dynamically?
--
View this message in context: http://lucene.472066.n3.nabble.com/language-specific-fields-of-text-tp3698985p3698985.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: language specific fields of "text"
Posted by AlexeyK <le...@gmail.com>.
You should use the language detection update processor factory, configured
like this:
<processor
class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
<str name="langid.fl">content</str>
<str name="langid.langField">language</str>
<str name="langid.fallback">en</str>
<str name="langid.map">true</str>
<str name="langid.map.fl">content,fullname</str>
<str name="langid.map.keepOrig">true</str>
<str name="langid.whitelist">en,fr,de,es,ru,it</str>
<str name="langid.threshold">0.7</str>
</processor>
Once you have defined fields like content_en, content_fr, etc., they will be
filled in automatically according to the recognized language.
See http://wiki.apache.org/solr/LanguageDetection
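For illustration, the matching schema.xml entries might look like the sketch
below. It assumes field types named "string", "text_en" and "text_fr" are
defined in your schema (placeholder names, not taken from this thread), and
that the processor keeps its default mapping of <field>_<language code>:

```xml
<!-- Hypothetical schema.xml fragment: target fields for langid.map -->
<!-- Receives the detected language code (langid.langField) -->
<field name="language" type="string" indexed="true" stored="true"/>
<!-- Original field, still populated because langid.map.keepOrig is true -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<!-- One dynamic field per language; matches content_en, fullname_en, ... -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="true"/>
```

The other languages in langid.whitelist would need similar entries.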
Re: language specific fields of "text"
Posted by Paul Libbrecht <pa...@hoplahup.net>.
Hello bing,
On 31 Jan 2012, at 04:27, bing wrote:
> I understand your point about "text_en" missing from the document. It is
> indeed missing: "text" exists, but "text_en" does not.
Unless you use copyField or upload the field as a separate element, it will not be populated.
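As a sketch of the second option, an XML update message that supplies
text_en explicitly could look like this. The "id" field and the sample
values are made up for illustration:

```xml
<!-- Hypothetical XML update message sent to the update handler -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="text">Some extracted text</field>
    <!-- text_en gets a value only because it is sent explicitly -->
    <field name="text_en">Some extracted text</field>
  </doc>
</add>
```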
> But that raises the question: aren't language-specific suffixes added
> dynamically to an existing field "text"?
Not that I know of.
> I am new here. As far as I know, for a field "title", people can create
> "title_en" and "title_fr" to use different analyzers in the same schema.
> Even this is not happening for me, so I wonder whether I am overlooking
> some obvious point.
You'd use copyField.
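A minimal sketch of that in schema.xml, assuming a field type "text_en"
with an English analyzer is defined (the type name is an assumption, not
something from this thread):

```xml
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
<!-- Same content, analyzed with a language-specific field type -->
<field name="text_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
<!-- Copy whatever is sent to "text" into "text_en" at index time -->
<copyField source="text" dest="text_en"/>
```

Note that copyField copies the raw input value and does not detect the
language, so every document's text would end up in text_en.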
> "Bing" is very common in Chinese names, as several Chinese characters
> share the same pronunciation.
Good, I learn something every day.
paul
Re: language specific fields of "text"
Posted by bing <ni...@hotmail.com>.
Hi, Paul,
I understand your point about "text_en" missing from the document. It is
indeed missing: "text" exists, but "text_en" does not.
But that raises the question: aren't language-specific suffixes added
dynamically to an existing field "text"?
I am new here. As far as I know, for a field "title", people can create
"title_en" and "title_fr" to use different analyzers in the same schema.
Even this is not happening for me, so I wonder whether I am overlooking
some obvious point.
"Bing" is very common in Chinese names, as several Chinese characters
share the same pronunciation.
Thanks for the reply.
Best Regards,
Bing
Re: language specific fields of "text"
Posted by Paul Libbrecht <pa...@hoplahup.net>.
(bing is a surprising name on a mailing list about search engines)
My guess is that your document upload didn't contain the field text_en. Could that be?
Paul