Posted to solr-user@lucene.apache.org by Troy Edwards <te...@gmail.com> on 2016/01/19 21:30:49 UTC

Scaling SolrCloud

We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards with
2 replicas each. The number of documents is about 125000.

We now want to scale this to about 10 billion documents.

What are the steps to prototyping, hardware estimation and stress testing?

Thanks

Re: Scaling SolrCloud

Posted by Walter Underwood <wu...@wunderwood.org>.
Alternatively, do you still want to be protected against a single failure during scheduled maintenance?

With a three-node ensemble, when one Zookeeper node is being updated or moved to a new instance, one more failure means it no longer has a quorum. With a five-node ensemble, three nodes would still be up.

If you are OK with that risk, run three nodes. If not, run five.
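
As a quick sketch of the quorum arithmetic (minimal Python; the ensemble sizes are chosen for illustration):

    # A ZooKeeper ensemble keeps quorum while a majority of nodes survive.
    def failures_tolerated(ensemble_size):
        quorum = ensemble_size // 2 + 1  # smallest majority
        return ensemble_size - quorum

    for n in (3, 5, 7):
        print(n, "nodes: tolerates", failures_tolerated(n), "failure(s)")
    # 3 nodes: tolerates 1 failure(s)
    # 5 nodes: tolerates 2 failure(s)
    # 7 nodes: tolerates 3 failure(s)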

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Scaling SolrCloud

Posted by Erick Erickson <er...@gmail.com>.
NP. My usual question though is "how often do you expect to lose a
second ZK node before you can replace the first one that died?"

My tongue-in-cheek statement is often "If you're losing two nodes
regularly, you have problems with your hardware that you're not really
going to address by adding more ZK nodes" ;).

And do note that even if you lose quorum, SolrCloud will continue to
serve _queries_, although the "picture" each individual Solr node has of
the current state of all the Solr nodes will get stale. You won't be
able to index, though. That said, the internal Solr load balancers
auto-distribute queries to live nodes anyway, so things can limp
along.

As always, it's a tradeoff between expense/complexity and robustness
though, and each and every situation is different in how much risk it
can tolerate.

FWIW,
Erick


Re: Scaling SolrCloud

Posted by Yago Riveiro <ya...@gmail.com>.
It's not a typo. I was wrong: for ZooKeeper, 2 nodes out of 3 still count as a majority.
It's not the desirable configuration, but it is tolerable.

Thanks Erick.

--
Yago Riveiro


Re: Scaling SolrCloud

Posted by Erick Erickson <er...@gmail.com>.
bq: 3 is too risky, you lose one you lose quorum

Typo? You need to lose two.....


Re: Scaling SolrCloud

Posted by Yago Riveiro <ya...@gmail.com>.
Our Zookeeper cluster is an ensemble of 5 machines, which is a good starting point; 3 is too risky (you lose one, you lose quorum) and with 7 the sync cost increases.

The ZK cluster is on machines with no other IO load and rotational HDDs (don't use SSDs to gain IO performance; ZooKeeper is optimized for spinning disks).

The ZK cluster behaves without problems. The first deploy of ZK was on the same machines as the Solr cluster (ZK transaction log on its own HDD) and that didn't work very well; the CPU and network IO from the Solr cluster were too much.
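
For reference, a minimal zoo.cfg sketch for a dedicated five-node ensemble along those lines (hostnames and paths are placeholders; the transaction log gets its own disk, per the advice above):

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper/data
    # transaction log on its own spindle
    dataLogDir=/var/lib/zookeeper/txnlog
    clientPort=2181
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888
    server.4=zk4.example.com:2888:3888
    server.5=zk5.example.com:2888:3888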

  

About schema modifications:

Modifying the schema to add new fields is relatively simple with the new API. In the past all the work was manual: uploading the schema to ZK and reloading all collections (indexing must be disabled, or timeouts and funny errors happen).

With the new Schema API this is more user friendly. Anyway, I still stop indexing and reload the collections (I don't know if that's necessary nowadays).
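
As a rough sketch of that flow with the Schema API (Python; the host, collection, and field names are made-up placeholders, and the field type must already exist in your schema):

    import requests

    SOLR = "http://localhost:8983/solr"

    # Add a field through the Schema API instead of hand-editing the schema in ZK.
    requests.post(
        f"{SOLR}/mycollection/schema",
        json={"add-field": {"name": "new_field_s", "type": "string", "stored": True}},
    ).raise_for_status()

    # Then reload the collection so every replica picks up the change.
    requests.get(
        f"{SOLR}/admin/collections",
        params={"action": "RELOAD", "name": "mycollection"},
    ).raise_for_status()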
  
About indexing data:

  

We have a self-made data importer; it's not Java and it doesn't do batch indexing (with 500 collections, buffering data and building batches is expensive and complicates error handling).

  

We use regular HTTP POSTs in JSON. Our throughput is about 1000 docs/s without any type of optimization. Sometimes we have issues with replication: the replica can't keep pace with leader insertion and a full sync is requested. This is bad, because syncing the replica again implies a lot of IO wait and CPU, and with 100G replicas it takes an hour or more (normally when this happens, we disable indexing to release IO and CPU so as not to kill the node with a load of 50 or 60).
  
In this department my advice is "keep it simple": in the end it's an HTTP POST to a node of the cluster.
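
A minimal sketch of that POST (Python; host, collection, and fields are placeholders):

    import requests

    doc = {"id": "doc-1", "title_s": "hello world"}  # placeholder fields

    # One JSON POST to the update handler of any node; SolrCloud routes the
    # document to the correct shard leader internally.
    requests.post(
        "http://localhost:8983/solr/mycollection/update?commitWithin=10000",
        json=[doc],  # the update handler accepts a JSON list of documents
    ).raise_for_status()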

  

--
Yago Riveiro


Re: Scaling SolrCloud

Posted by Troy Edwards <te...@gmail.com>.
Thank you for sharing your experiences/ideas.

Yago, since you have 8 billion documents over 500 collections, can you share
what/how you do index maintenance (e.g. adding a field)? And how are you loading
data into the index? Any experiences with how the Zookeeper ensemble behaves
with so many collections?

Best,



Re: Scaling SolrCloud

Posted by Yago Riveiro <ya...@gmail.com>.
What I can say is:

  * SSDs (crucial for performance if the index doesn't fit in memory, and it will not fit)
  * Divide and conquer: for that volume of docs you will need more than 6 nodes.
  * DocValues, to not stress the Java heap.
  * Will you aggregate data? If yes, what is your max cardinality? This question is the most important one for sizing memory correctly (see the sketch after this list).
  * Latency is important too: what threshold is acceptable before a query is considered slow?
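
As a concrete version of the cardinality question, a hedged sketch using the JSON Facet API (Python; field and collection names are placeholders, and the faceted field should have docValues enabled so the work stays off-heap):

    import requests

    # unique(...) estimates the number of distinct keys in the result set --
    # the number that drives memory sizing for aggregations.
    resp = requests.get(
        "http://localhost:8983/solr/mycollection/select",
        params={
            "q": "*:*",
            "rows": 0,
            "wt": "json",
            "json.facet": '{"uniques": "unique(user_id_s)"}',
        },
    )
    resp.raise_for_status()
    print(resp.json()["facets"]["uniques"])
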
At my company we are running a 12-terabyte (2 replicas) Solr cluster with 8 billion documents spread over 500 collections. For this we have about 12 machines with SSDs and 32G of RAM each (~24G for the heap).

We don't have a strict need for speed; a 30-second query aggregating 100 million documents with 1M unique keys is fast enough for us. Normally aggregation performance decreases as the number of unique keys increases; with a low unique-key factor, queries take less than 2 seconds if the data is in the OS cache.
  
Personal recommendations:

  * Sharding is important and smart sharding is crucial: you don't want to run queries on data that is not interesting (this slows queries down when the dataset is big).
  * If you want to measure speed, do it with about 1 billion documents to simulate something real (real for a 10-billion-document world); see the sketch after this list.
  * Index with re-indexing in mind: with 10 billion docs, re-indexing takes months ... This is important if you rely on non-standard features of Solr. In my case I configured DocValues with the disk format (not a standard feature in 4.x) and at some point this format was deprecated. Upgrading Solr to 5.x was an epic 3-month battle to do it without full downtime.
  * Solr is like your girlfriend: it will demand love and care, and plenty of space to fully recover replicas that at some point are out of sync, which happens a lot when restarting nodes (this is annoying with 100G replicas). Don't underestimate this point; free space can save your life.
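
On the measurement point, a rough sketch of a throughput harness (Python; batch size, fields, and URL are assumptions -- and where Yago's importer posts documents individually, this batches, purely to show the shape of a docs/s measurement):

    import time
    import requests

    URL = "http://localhost:8983/solr/mycollection/update"  # any node works

    def synth_docs(n, start=0):
        # Synthetic documents; a real test should mimic your actual field mix.
        return [{"id": f"doc-{start + i}", "val_i": i % 1000} for i in range(n)]

    BATCH, TOTAL = 1000, 100000
    t0 = time.time()
    for off in range(0, TOTAL, BATCH):
        requests.post(URL, json=synth_docs(BATCH, off)).raise_for_status()
    requests.get(URL, params={"commit": "true"}).raise_for_status()
    print(f"{TOTAL / (time.time() - t0):.0f} docs/s")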

--
Yago Riveiro


Re: Scaling SolrCloud

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/19/2016 1:30 PM, Troy Edwards wrote:
> We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards with
> 2 replicas each. The number of documents is about 125000.
>
> We now want to scale this to about 10 billion documents.
>
> What are the steps to prototyping, hardware estimation and stress testing?

There is no general information available for sizing, because there are 
too many factors that will affect the answers. Some of the important 
information that you need will be impossible to predict until you 
actually build it and subject it to a real query load.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

With an index of 10 billion documents, you may not be able to precisely 
predict performance and hardware requirements from a small-scale 
prototype.  You'll likely need to build a full-scale system on a small 
testbed, look for bottlenecks, ask for advice, and plan on a larger 
system for production.

The hard limit for documents on a single shard is slightly less than 
Java's Integer.MAX_VALUE -- just over two billion. Because deleted 
documents count against this max, about one billion documents per shard 
is the absolute max that should be loaded in practice.

BUT, if you actually try to put one billion documents in a single 
server, performance will likely be awful.  A more reasonable limit per 
machine is 100 million ... but even this is quite large.  You might need 
smaller shards, or you might be able to get good performance with larger 
shards.  It all depends on things that you may not even know yet.

Memory is always a strong driver for Solr performance, and I am speaking 
specifically of OS disk cache -- memory that has not been allocated by 
any program.  With 10 billion documents, your total index size will 
likely be hundreds of gigabytes, and might even reach terabyte scale.  
Good performance with indexes this large will require a lot of total 
memory, which probably means that you will need a lot of servers and 
many shards.  SSD storage is strongly recommended.

For extreme scaling on Solr, especially if the query rate will be high, 
it is recommended to only have one shard replica per server.
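
To ground those numbers, a back-of-the-envelope sketch under the assumptions above (100 million docs per shard, two replicas, one replica per server; the collection name and configset in the CREATE call are placeholders):

    import math

    total_docs = 10 * 10**9           # the target corpus
    docs_per_shard = 100 * 10**6      # the "more reasonable" per-machine limit
    replication_factor = 2

    num_shards = math.ceil(total_docs / docs_per_shard)  # 100
    total_cores = num_shards * replication_factor        # 200
    servers = total_cores                                # one replica per server
    print(num_shards, total_cores, servers)

    # The Collections API call for such a layout would look roughly like:
    # /solr/admin/collections?action=CREATE&name=bigindex&numShards=100
    #   &replicationFactor=2&maxShardsPerNode=1&collection.configName=bigindex_conf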

I have just added an "extreme scaling" section to the following wiki 
page, but it's mostly a placeholder right now.  I would like to have a 
discussion with people who operate very large indexes so I can put real 
usable information in this section.  I'm on IRC quite frequently in the 
#solr channel.

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn