Posted to solr-user@lucene.apache.org by Customer <ma...@gmail.com> on 2016/10/10 15:08:29 UTC

Sharding strategies

Hi,


I've started working on a project which will likely have lots of
documents in many different languages, and because of that I'm a bit worried
about storing everything in one single shard. What would be the best way to
store the data, and do you have any advice on how I should split it? I was
thinking about splitting by alphabet (a shard for every letter of the
alphabet), but since there will be lots of languages, not only English,
this is not an option.

Thank you in advance for your advice.

Re: Sharding strategies

Posted by Reth RM <re...@gmail.com>.
If you will have a large number of documents, splitting them across shards
is one strategy. This split is independent of the language of each document.
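To make this concrete, here is a minimal Python sketch of hash-based routing, the general idea behind SolrCloud's default compositeId router: each document id is hashed and assigned to one of several equal hash ranges, so placement depends only on the id, not on the document's language or first letter. As an assumption for illustration, MD5 from Python's standard library stands in for the MurmurHash3 hash Solr actually uses, and `shard_for` is a hypothetical helper, not a Solr API.

```python
import hashlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index by hashing it into equal ranges.

    Assumption: the first 4 bytes of MD5 stand in for the MurmurHash3
    value Solr's compositeId router computes; the range idea is the same.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")        # unsigned 32-bit: 0 .. 2**32 - 1
    range_size = 2**32 // num_shards             # size of each shard's hash range
    return min(h // range_size, num_shards - 1)  # fold any remainder into the last shard


if __name__ == "__main__":
    # Ids from different languages and scripts are routed purely by hash.
    for doc_id in ["en-apple", "de-Äpfel", "ru-яблоко", "ja-りんご", "zh-苹果"]:
        print(doc_id, "->", shard_for(doc_id, 4))

    # Over many ids the shards come out roughly equal in size.
    counts = Counter(shard_for(f"doc-{i}", 4) for i in range(10_000))
    print(sorted(counts.items()))
```

Because the hash spreads ids roughly uniformly, each of the four shards ends up with about a quarter of the documents, and English, German, Russian, Japanese and Chinese ids are treated identically. An alphabet-based split, by contrast, has no natural bucket for non-Latin scripts and is skewed even within one language, since initial letters are not evenly distributed.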

For documents in different languages, it's necessary to use language-specific
analyzers to obtain good search results. For example, for English documents
the _text_ field should ideally use the text_en fieldType; likewise, for
Chinese/Japanese/Korean documents, the fields' fieldType should be text_cjk.
If you mix documents of different languages in the same shard, you will have
to define a separate fieldType (and field) for each language, and at query
time you need to make sure you query the corresponding language-specific
fields.
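A minimal Python sketch of this field-per-language approach, assuming the language of each document is already known at index time. The field names (`title_txt_en`, `title_txt_cjk`, the `lang_s` language field) and the language-to-suffix mapping are hypothetical and must match whatever fields and fieldTypes your schema actually defines; `defType=edismax` and `qf` are standard Solr query parameters for searching several fields at once. The sketch only builds the document and the request parameters; it does not talk to Solr.

```python
# Hypothetical mapping from language code to a field suffix whose fieldType
# uses a suitable analyzer (e.g. text_en or text_cjk in the schema).
LANG_SUFFIX = {
    "en": "txt_en",
    "de": "txt_de",
    "zh": "txt_cjk",
    "ja": "txt_cjk",
    "ko": "txt_cjk",
}
FALLBACK_SUFFIX = "txt"  # assumption: a generic, language-neutral text field


def field_for(lang: str, base: str = "title") -> str:
    """Return the language-specific field name for a base field."""
    return f"{base}_{LANG_SUFFIX.get(lang, FALLBACK_SUFFIX)}"


def to_solr_doc(doc_id: str, lang: str, title: str) -> dict:
    """Build a document that stores the title only in its language's field."""
    return {"id": doc_id, "lang_s": lang, field_for(lang): title}


def query_params(user_query: str, query_langs: list) -> dict:
    """Build edismax params that search every field the query might match."""
    fields = sorted({field_for(lang) for lang in query_langs})
    return {"q": user_query, "defType": "edismax", "qf": " ".join(fields)}


if __name__ == "__main__":
    print(to_solr_doc("1", "en", "Red apples"))
    print(query_params("apple", ["en", "ja", "zh"]))
```

Each document populates only the field whose analyzer fits its language, and the query lists in `qf` every field that could match. Here a query that might be English, Japanese or Chinese searches both `title_txt_en` and `title_txt_cjk`.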

There are different strategies for implementing multilingual search; slide 19
of this presentation explains them:
http://www.slideshare.net/treygrainger/semantic-multilingual-strategies-in-lucenesolr
There is also another article, based on the assumption that we know the
language of the incoming document and the language(s) the query could be
written in:
https://support.lucidworks.com/hc/en-us/articles/203718886-How-to-implement-Multilingual-Search-using-Solr
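Another strategy is to keep each language (or analyzer family) in its own collection, so that each collection's schema stays simple. The Python sketch below is hypothetical: the collection names and the language mapping are assumptions, and it only builds request parameters. It relies on SolrCloud's `collection` request parameter, which accepts a comma-separated list of collections to search in a single request.

```python
# Hypothetical collection-per-language layout; the names are assumptions.
COLLECTION_FOR_LANG = {
    "en": "docs_en",
    "de": "docs_de",
    "zh": "docs_cjk",
    "ja": "docs_cjk",
    "ko": "docs_cjk",
}
DEFAULT_COLLECTION = "docs_general"  # catch-all for languages without a dedicated collection


def index_target(lang: str) -> str:
    """Pick the collection a document in `lang` should be indexed into."""
    return COLLECTION_FOR_LANG.get(lang, DEFAULT_COLLECTION)


def search_params(user_query: str, query_langs: list) -> dict:
    """Search every collection the query's possible languages map to."""
    collections = sorted({index_target(lang) for lang in query_langs})
    return {"q": user_query, "collection": ",".join(collections)}


if __name__ == "__main__":
    print(index_target("ja"))  # docs_cjk
    print(search_params("apple", ["en", "zh", "ja"]))
```

One trade-off to keep in mind: relevance scores from collections with different analyzers and term statistics are not directly comparable, so a merged result list may need extra tuning.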




