Posted to solr-user@lucene.apache.org by bing <ni...@hotmail.com> on 2012/01/30 05:05:41 UTC

language specific fields of "text"

Hi, all, 

In this thread, I would like to ask how the schema should be defined to get
language-specific variants of the field "text".

Say I currently have the field "text" defined as follows:
<field name="text" type="text_general" indexed="true"
stored="true" multiValued="true"/>
After indexing a document, I can see the field in the document, extracted
correctly.

My first attempt was to add a field named "text_en", defined exactly the same
way as "text":
<field name="text_en" type="text_general" indexed="true"
stored="true" multiValued="true"/>
However, after indexing the same document, I cannot see this field. Why is
it not extracted? Is it because "text" is a reserved field that cannot be
changed dynamically?

--
View this message in context: http://lucene.472066.n3.nabble.com/language-specific-fields-of-text-tp3698985p3698985.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: language specific fields of "text"

Posted by AlexeyK <le...@gmail.com>.
You should use the language detection update processor factory, as below:

<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <str name="langid.fl">content</str>
  <str name="langid.langField">language</str>
  <str name="langid.fallback">en</str>
  <str name="langid.map">true</str>
  <str name="langid.map.fl">content,fullname</str>
  <str name="langid.map.keepOrig">true</str>
  <str name="langid.whitelist">en,fr,de,es,ru,it</str>
  <str name="langid.threshold">0.7</str>
</processor>

Once you have defined fields such as content_en and content_fr, they will be
filled in automatically according to the detected language.

See http://wiki.apache.org/solr/LanguageDetection
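
The mapped target fields have to exist in schema.xml before the processor can fill them. A minimal sketch, assuming field types named text_en, text_fr and text_de are already defined in the schema (those type names are assumptions, not taken from this thread; substitute your own analyzers):

```xml
<!-- Sketch only: dynamic fields that match the names produced by
     langid.map, e.g. content_en or fullname_fr. The field type names
     are assumed to exist in your schema. -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_de" type="text_de" indexed="true" stored="true" multiValued="true"/>
```

The remaining whitelisted languages (es, ru, it) would follow the same pattern. The processor element itself belongs inside an updateRequestProcessorChain in solrconfig.xml, placed before RunUpdateProcessorFactory.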



--
View this message in context: http://lucene.472066.n3.nabble.com/language-specific-fields-of-text-tp3698985p4031180.html

Re: language specific fields of "text"

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Hello bing,

On 31 Jan 2012 at 04:27, bing wrote:
> I understand your point about "text_en" missing from the document. That is
> the case: "text" exists, but "text_en" does not.

Unless you use copyField or upload the value as a separate field, text_en will not be populated.
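
Uploading the value as a separate field could look roughly like this in Solr's XML update format. This is only a sketch: the id field and all values are made up for illustration, and it assumes the schema declares both "text" and "text_en".

```xml
<!-- Sketch only: the same content sent explicitly to both fields.
     "id" and the values are hypothetical. -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="text">Some English content</field>
    <field name="text_en">Some English content</field>
  </doc>
</add>
```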

> But that raises the question: aren't language-specific suffixes added
> dynamically to an existing field "text"?

not that I know of.

> I am new here. As far as I know, for a field "title", people can create
> "title_en" and "title_fr" to use different analyzers in the same schema.
> Even so, I am not seeing this happen. So I am wondering whether I have
> overlooked some obvious point.

You'd use copyField.
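
A minimal sketch of that in schema.xml, assuming a field type named text_en with an English analyzer is defined (the type name is an assumption; the thread's own attempt used text_general):

```xml
<!-- Sketch only: declare the language-specific field, then copy the raw
     value of "text" into it at index time. Each field is analyzed with
     its own field type. -->
<field name="text_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
<copyField source="text" dest="text_en"/>
```

copyField copies the incoming value before analysis, so "text" and "text_en" can apply different analyzers to the same content.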

> "Bing" is very common in Chinese names, as several Chinese characters
> share that pronunciation.

Good, I learn something every day.

paul

Re: language specific fields of "text"

Posted by bing <ni...@hotmail.com>.
Hi, Paul, 

I understand your point about "text_en" missing from the document. That is
the case: "text" exists, but "text_en" does not.
But that raises the question: aren't language-specific suffixes added
dynamically to an existing field "text"?

I am new here. As far as I know, for a field "title", people can create
"title_en" and "title_fr" to use different analyzers in the same schema.
Even so, I am not seeing this happen. So I am wondering whether I have
overlooked some obvious point.

"Bing" is very common in Chinese names, as several Chinese characters
share that pronunciation.

Thanks for the reply.

Best Regards, 
Bing

--
View this message in context: http://lucene.472066.n3.nabble.com/language-specific-fields-of-text-tp3698985p3702053.html

Re: language specific fields of "text"

Posted by Paul Libbrecht <pa...@hoplahup.net>.
(Bing is a surprising name on a mailing list about search engines.)

My guess is that your document upload didn't contain the field text_en. Could that be?
Paul



-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.