Posted to by uyilmaz <> on 2020/09/28 17:52:40 UTC

Worker node / collection creation, parallelized streams

Hi all,

Today I was experimenting with a streaming expression that takes too long to finish and eventually times out. First of all, is it normal for it to time out, rather than simply run for a long time?

Then I read about parallel streaming expressions, which take the number of workers as a parameter. We have 10 nodes in our cluster.

First question: if I want to run it on 10 worker nodes, should I provide a partition key that takes exactly 10 distinct values, or does Solr itself derive 10 partitions from it? A "mod" function query with modulus 10 came to mind, but I got various errors when using it as a partition key.
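
To make the first question concrete, a parallel expression of the kind described might be built roughly as in the sketch below. The collection names (workerCollection, myCollection), the field name (group_s), and the host are hypothetical placeholders, not values from our cluster; only the parallel/search syntax with partitionKeys, workers, and qt="/export" follows the documented form.

```python
from urllib.parse import urlencode


def build_parallel_expr(worker_collection, data_collection, partition_field, workers):
    """Build a parallel() streaming expression string wrapping a search().

    All argument values are illustrative placeholders supplied by the caller.
    """
    # Inner stream: sorted on the partition field and read via the /export handler.
    inner = (
        f'search({data_collection}, q="*:*", fl="id,{partition_field}", '
        f'sort="{partition_field} asc", partitionKeys="{partition_field}", qt="/export")'
    )
    # Outer decorator: sends the inner stream to the given number of workers.
    return (
        f'parallel({worker_collection}, {inner}, '
        f'workers="{workers}", sort="{partition_field} asc")'
    )


# Hypothetical names; replace with real collections and a real field.
expr = build_parallel_expr("workerCollection", "myCollection", "group_s", 10)

# The expression is sent as the "expr" parameter to the /stream handler
# (host and port here are assumptions).
url = "http://localhost:8983/solr/workerCollection/stream?" + urlencode({"expr": expr})
print(expr)
```

Here partitionKeys names a plain field rather than a function query, which is the case I am unsure how to generalize.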

Second question: how do I correctly create a worker collection? Should it be an empty collection with 10 shards of 1 replica each, or 1 shard with 10 replicas? When I used the latter, I got ArrayIndexOutOfBounds errors whenever the workers parameter was set to a value greater than 1.


uyilmaz <>