Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/04/07 23:53:46 UTC
Re: More than one language in the same document
: I have documents where text from two languages, e.g. (english & korean) or
: (english & german) are mixed up in a fairly intensive way. 20-30% of the
if you search the list archives you'll find a lot of results for
"languages" ... it's not something i deal with much but i believe using
separate fields (or dynamic fields) for each language is considered the
best strategy.
-Hoss
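
For readers following along, the separate-fields approach could look roughly like the schema.xml fragment below. This is only a sketch under assumptions: the type name "text_cjk" and the "*_en" / "*_ko" suffix patterns are made up for illustration, "text" is assumed to be an English-oriented field type already defined in the schema, and whether CJKAnalyzer can be referenced this way depends on the Solr/Lucene version in use.

```xml
<!-- Hypothetical field type for Korean text, backed by Lucene's CJKAnalyzer -->
<fieldType name="text_cjk" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldType>

<!-- One explicit field per language; "text" is assumed to exist already -->
<field name="content_english" type="text" indexed="true" stored="true"/>
<field name="content_korea" type="text_cjk" indexed="true" stored="true"/>

<!-- Alternative: dynamic fields keyed by a language suffix (patterns are illustrative) -->
<dynamicField name="*_en" type="text" indexed="true" stored="true"/>
<dynamicField name="*_ko" type="text_cjk" indexed="true" stored="true"/>
```

With the dynamic-field variant, a document could carry fields such as "body_en" and "body_ko" without each one being declared individually, and a query would then target whichever language field (or both) is relevant.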
Re: More than one language in the same document
Posted by Chris Hostetter <ho...@fucit.org>.
: > A related question. What does 'copyField' actually do? Does it 'append'
: > content from the source field to the 'target' field? Or does it
: > replace/overwrite it? Thank you.
: It appends the content of the source field to the target.
strictly speaking, it adds the content to the target field as if it were
another multi-valued field value.
-Hoss
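
As a rough illustration of that behaviour, a copyField declaration for the fields discussed in this thread might look like the fragment below. The type names "text" and "text_cjk" are assumptions (they would have to be defined elsewhere in the schema), and marking the target multiValued is an assumption about the setup: it is needed when the target can end up with more than one value, for example when it is also populated directly.

```xml
<!-- Source field, analyzed with a CJK-oriented type (type name is hypothetical) -->
<field name="content_korea" type="text_cjk" indexed="true" stored="true"/>

<!-- Target field; multiValued so that copied content is added as an extra value
     rather than conflicting with a value supplied directly -->
<field name="content_english" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- Copies the raw input of content_korea into content_english at index time -->
<copyField source="content_korea" dest="content_english"/>
```

The copy happens on the original input text, before analysis, so each field runs the same content through its own analyzer chain.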
Re: More than one language in the same document
Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
ashokc wrote:
> What I am doing right now is to capture all the content under "content_korea"
> for example, use 'copyField' to duplicate that content to "content_english".
> "content_korea" gets processed with CJK analyzers, and "content_english"
> gets processed with usual detailed index/query analyzers, filters, synonyms.
> Some results do come up, but I have not been able to verify that this
> approach is yielding better results.
>
> A related question. What does 'copyField' actually do? Does it 'append'
> content from the source field to the 'target' field? Or does it
> replace/overwrite it? Thank you.
It appends the content of the source field to the target.
Koji
Re: More than one language in the same document
Posted by ashokc <as...@qualcomm.com>.
What I am doing right now is to capture all the content under "content_korea",
for example, and use 'copyField' to duplicate that content to "content_english".
"content_korea" gets processed with CJK analyzers, and "content_english"
gets processed with the usual detailed index/query analyzers, filters, synonyms.
Some results do come up, but I have not been able to verify that this
approach is yielding better results.
A related question. What does 'copyField' actually do? Does it 'append'
content from the source field to the 'target' field? Or does it
replace/overwrite it? Thank you.
- ashok
hossman wrote:
> : I have documents where text from two languages, e.g. (english & korean) or
> : (english & german) are mixed up in a fairly intensive way. 20-30% of the
>
> if you search the list archives you'll find a lot of results for
> "languages" ... it's not something i deal with much but i believe using
> separate fields (or dynamic fields) for each language is considered the
> best strategy.
>
> -Hoss