Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/04/07 23:53:46 UTC

Re: More than one language in the same document

: I have documents where text from two languages, e.g. (english & korean) or
: (english & german) are mixed up in a fairly intensive way. 20-30% of the

if you search the list archives you'll find a lot of results for 
"languages" ... it's not something i deal with much but i believe using 
separate fields (or dynamic fields) for each language is considered the 
best strategy.




-Hoss
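
As a rough illustration of the separate-field-per-language idea above, here is a minimal sketch of how a client might lay out a document before sending it to Solr. Everything named here is an assumption made for illustration: the helper `build_doc`, the `content_` prefix, and the language codes are hypothetical, and the sketch assumes the schema declares a matching dynamic field pattern (for example `content_*`) or one explicit field per language, each with its own analyzer.

```python
def build_doc(doc_id, text_by_lang, prefix="content_"):
    """Build a document dict with one field per language.

    Hypothetical helper: the field names (prefix + language code) are an
    assumed naming convention, not something prescribed by Solr.
    """
    doc = {"id": doc_id}
    for lang, text in text_by_lang.items():
        # Each language gets its own field so it can be analyzed separately.
        doc[prefix + lang] = text
    return doc


doc = build_doc("doc-1", {"en": "battery life", "ko": "배터리 수명"})
print(sorted(doc))
```

With this layout, each language's text lives in a field that can be analyzed independently, and a query can target whichever field matches the language of the search terms.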

Re: More than one language in the same document

Posted by Chris Hostetter <ho...@fucit.org>.

: > A related question. What does 'copyField' actually do? Does it 'append'
: > content from the source field to the 'target' field? Or does it
: > replace/overwrite it? Thank you.
: > 
: >   
: It appends the content of the source field to the target.

strictly speaking, it adds the content to the target field as if it were 
another value of a multi-valued field.



-Hoss
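
The distinction between "append" and "add as another value" can be shown with a toy model. This is only a sketch of the behaviour described above, not Solr's implementation: the function `apply_copy_fields` and the dict-of-lists document shape are assumptions made for illustration, and the field names are taken from the thread.

```python
def apply_copy_fields(doc, copy_rules):
    """Return a new document with each (source, dest) rule applied.

    Toy model: every field holds a list of values. Values from the source
    are added to the destination as extra values; nothing already in the
    destination is overwritten, and strings are not concatenated.
    """
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_rules:
        if source in doc:
            # Read from the original input so the source stays untouched.
            result.setdefault(dest, []).extend(doc[source])
    return result


doc = {
    "content_korea": ["mixed English/Korean text"],
    "content_english": ["already here"],
}
indexed = apply_copy_fields(doc, [("content_korea", "content_english")])
print(indexed["content_english"])
```

In this model the target ends up with two separate values rather than one merged string, which is why the target of a copy generally needs to accept multiple values when it may already hold one.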

Re: More than one language in the same document

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

ashokc wrote:
> What I am doing right now is to capture all the content under "content_korea",
> for example, and use 'copyField' to duplicate that content to "content_english".
> "content_korea" gets processed with CJK analyzers, and "content_english"
> gets processed with usual detailed index/query analyzers, filters, synonyms.
> Some results do come up, but I have not been able to verify that this
> approach is yielding better results.
>
> A related question. What does 'copyField' actually do? Does it 'append'
> content from the source field to the 'target' field? Or does it
> replace/overwrite it? Thank you.
>
>   
It appends the content of the source field to the target.

Koji

Re: More than one language in the same document

Posted by ashokc <as...@qualcomm.com>.

What I am doing right now is to capture all the content under "content_korea",
for example, and use 'copyField' to duplicate that content to "content_english".
"content_korea" gets processed with CJK analyzers, and "content_english"
gets processed with usual detailed index/query analyzers, filters, synonyms.
Some results do come up, but I have not been able to verify that this
approach is yielding better results.

A related question. What does 'copyField' actually do? Does it 'append'
content from the source field to the 'target' field? Or does it
replace/overwrite it? Thank you.

- ashok

hossman wrote:
> 
> 
> : I have documents where text from two languages, e.g. (english & korean) or
> : (english & german) are mixed up in a fairly intensive way. 20-30% of the
> 
> if you search the list archives you'll find a lot of results for 
> "languages" ... it's not something i deal with much but i believe using 
> separate fields (or dynamic fields) for each language is considered the 
> best strategy.
> 
> 
> 
> 
> -Hoss
> 
> 
> 
