Posted to solr-user@lucene.apache.org by Prashant Jyoti <jt...@gmail.com> on 2020/07/28 11:26:15 UTC

Production sizing and scaling guidelines -- Solr

Hi,
I wanted to check whether anybody has references to tech companies' blog posts
detailing their Solr setup in production. I am mostly interested in storage
and scaling guidelines. I intend to use Solr for one of my projects at
work (as the back-end for a reporting tool) and need to convince higher
management that it is indeed the right solution. I have gone through the
material available in the Solr reference guide, but I am looking for details
from a working production setup.

Thanks!
-- 
Regards,
Prashant.

Re: Production sizing and scaling guidelines -- Solr

Posted by Prashant Jyoti <jt...@gmail.com>.
Thanks for that, Colvin. Even though it's a bit dated, it sure does help
in getting an idea.

> I definitely remember seeing a list of these sorts of blogs somewhere a
> long time ago... don't know where though
>
If by any chance you stumble upon it, please feel free to share it, even at a
later date.

On Tue, Jul 28, 2020 at 8:32 PM Colvin Cowie <co...@gmail.com>
wrote:

> Maybe not the most up to date or relevant example for your usage but
>
> https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/
> is one that sticks in my mind
> I definitely remember seeing a list of these sorts of blogs somewhere a
> long time ago... don't know where though
>


-- 
Regards,
Prashant.

Re: Production sizing and scaling guidelines -- Solr

Posted by Colvin Cowie <co...@gmail.com>.
Maybe not the most up-to-date or relevant example for your usage, but
https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/
is one that sticks in my mind.
I definitely remember seeing a list of these sorts of blogs somewhere a
long time ago... I don't know where, though.

On Tue, 28 Jul 2020 at 13:50, Prashant Jyoti <jt...@gmail.com> wrote:

> Thanks Erick.
>
> > 1> does Solr do what you want? You’re talking about reporting, and Solr is
> > primarily a search engine. That said, it has tons of analytics capabilities
> > built in. Depends on what “reporting” means in your situation.
> >
> There is a reporting UI which has various criteria the user can filter on,
> the data for this UI will be indexed to and fetched from Solr. These are
> basically call logs of the user's interaction with tech support. The
> documents would be at max a few MBs in size.
>
> > 2> how expensive is it?
>
> I am looking at what kind of a setup is considered okay to handle, let's
> say, average loads to start with (I am not considering billions of
> documents/day to be an average load at my place, that would be higher-end),
> with the scope of scaling as and when the load increases.
>
> I went through the linked article in your answer and understand your
> viewpoint, but that said even I am looking for some averages ;)
> Unable to find any authentic blogs which detail their usage of Solr.
>
> --
> Regards,
> Prashant.
>

Re: Production sizing and scaling guidelines -- Solr

Posted by Prashant Jyoti <jt...@gmail.com>.
Thanks Erick.

> 1> does Solr do what you want? You’re talking about reporting, and Solr is
> primarily a search engine. That said, it has tons of analytics capabilities
> built in. Depends on what “reporting” means in your situation.
>
There is a reporting UI with various criteria the user can filter on; the
data for this UI will be indexed into and fetched from Solr. The documents
are basically call logs of users' interactions with tech support, and each
would be a few MB in size at most.
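
For illustration, a filter-driven UI like this would typically map each
selected criterion to a Solr filter query (`fq`) on the `/select` handler.
The sketch below only builds the query string and makes no network call. The
field names (`agent_id`, `status`) and the helper name are hypothetical, and
values are wrapped in quotes without any further escaping.

```python
from urllib.parse import urlencode


def build_report_query(filters, rows=50):
    """Build a /select query string with one fq parameter per UI filter.

    `filters` maps a (hypothetical) field name to the value chosen in the UI.
    """
    # Match every document, then narrow the result set with filter queries.
    params = [("q", "*:*"), ("rows", rows)]
    for field, value in sorted(filters.items()):
        # Quote the value so it is treated as a single phrase.
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


# Hypothetical filters picked by a user in the reporting UI.
qs = build_report_query({"agent_id": "a42", "status": "resolved"})
print(qs)
```

Keeping each criterion in its own `fq` lets Solr cache the filters
independently of one another.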

> 2> how expensive is it?

I am trying to find out what kind of setup is considered adequate to handle,
let's say, average loads to start with, with scope for scaling as and when
the load increases. (I would not consider billions of documents/day an
average load at my place; that would be the higher end.)

I went through the article linked in your answer and understand your
viewpoint, but that said, I am still looking for some averages ;)
I have been unable to find any authentic blogs that detail their usage of Solr.

On Tue, Jul 28, 2020 at 5:19 PM Erick Erickson <er...@gmail.com>
wrote:

> Here’s a list of some sites using Solr:
> https://cwiki.apache.org/confluence/display/solr/PublicServers
>
> It’s not really what you’re looking for though, it doesn’t really have the
> details you’d like.
>
> There are two dimensions here:
>
> 1> does Solr do what you want? You’re talking about reporting, and Solr is
> primarily a search engine. That said, it has tons of analytics capabilities
> built in. Depends on what “reporting” means in your situation.
>
> 2> how expensive is it? Here “expensive” means hardware and support.
> Unfortunately that’s un-answerable. This is really “the sizing question”,
> and there are too many variables to work with. If you want some backup for
> why this is an unfair question to answer in the abstract, see:
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> What I’d recommend is to ask for enough resources to create a PoC on an
> existing bit of hardware, your workstation/laptop would do. For a PoC,
> there’s no reason to even have 3 Zookeepers, I routinely run with just one
> (although I do use an external-to-Solr zookeeper). I’d start with two
> shards, leader-only, just to be sure you take into account how SolrCloud
> works. I wouldn’t get fancy here, just take your first guess at how it will
> all work and index a bunch of documents (say 10,000,000) and see if you can
> get Solr to create the data for your reports. At that point, you have some
> data to work with, i.e. how big your indexes are, whether Solr’s
> capabilities meet your functional requirements etc.
>
> You can infer that I consider 10,000,000 documents a small Solr
> installation, with the caveat that if the docs are each gigabytes in length
> all bets are off. I’ve worked with clients who index billions of
> documents/day (yes billion) admittedly they had a very large hardware
> budget ;).  I’ve seen 300M docs (each reasonably complex and a few K each)
> fit comfortably on a machine with 12G allocated to Solr (64G total physical
> memory IIRC).
>
> So, It Depends (tm)...
>
> Good luck!
> Erick
>

-- 
Regards,
Prashant.

Re: Production sizing and scaling guidelines -- Solr

Posted by Erick Erickson <er...@gmail.com>.
Here’s a list of some sites using Solr: https://cwiki.apache.org/confluence/display/solr/PublicServers

It’s not really what you’re looking for, though; it doesn’t have the details you’d like.

There are two dimensions here:

1> Does Solr do what you want? You’re talking about reporting, and Solr is primarily a search engine. That said, it has tons of analytics capabilities built in. It depends on what “reporting” means in your situation.

2> How expensive is it? Here “expensive” means hardware and support. Unfortunately, that’s unanswerable. This is really “the sizing question”, and there are too many variables to work with. If you want some backup for why this is an unfair question to answer in the abstract, see: https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

What I’d recommend is to ask for enough resources to create a PoC on an existing bit of hardware; your workstation/laptop would do. For a PoC, there’s no reason to even have 3 ZooKeeper nodes. I routinely run with just one (although I do use a ZooKeeper that is external to Solr). I’d start with two shards, leader-only, just to be sure you take into account how SolrCloud works. I wouldn’t get fancy here: just take your first guess at how it will all work, index a bunch of documents (say 10,000,000), and see if you can get Solr to create the data for your reports. At that point, you have some data to work with, i.e. how big your indexes are, whether Solr’s capabilities meet your functional requirements, etc.
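
As a rough sketch of the two-shard, leader-only layout described above, the snippet below builds a Collections API CREATE request with `numShards=2` and `replicationFactor=1` (one replica per shard, which is therefore the leader). It only constructs the URL and makes no network call. The base URL, the collection name `reports_poc`, and the helper name are assumptions for illustration.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards=2, replication_factor=1):
    """Build a Collections API CREATE request URL (no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # One replica per shard: each shard has a leader and nothing else.
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Assumed local SolrCloud node and a hypothetical collection name.
url = create_collection_url("http://localhost:8983/solr", "reports_poc")
print(url)
```

Issuing a GET against the printed URL on a running SolrCloud node should create the collection; shard and replica counts can be raised later once the PoC numbers are in.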

You can infer that I consider 10,000,000 documents a small Solr installation, with the caveat that if the docs are each gigabytes in length, all bets are off. I’ve worked with clients who index billions of documents/day (yes, billions); admittedly, they had a very large hardware budget ;). I’ve seen 300M docs (each reasonably complex and a few K each) fit comfortably on a machine with 12G allocated to Solr (64G total physical memory, IIRC).
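
Once a PoC has produced a measured index size, a first rough estimate for a larger corpus can be had by scaling linearly, as sketched below. The 40 GiB figure is a made-up placeholder rather than a measurement, and linear scaling is itself an assumption: real index growth depends on the schema, the analysis chain, and which fields are stored.

```python
GIB = 1024 ** 3


def extrapolate_index_size(poc_docs, poc_index_bytes, target_docs):
    """Linearly scale a measured PoC index size to a target document count."""
    bytes_per_doc = poc_index_bytes / poc_docs
    return bytes_per_doc * target_docs


# Hypothetical PoC result: 10,000,000 docs occupying 40 GiB on disk,
# extrapolated to 300,000,000 docs.
estimate = extrapolate_index_size(10_000_000, 40 * GIB, 300_000_000)
print(f"{estimate / GIB:.0f} GiB")
```

Treat the result as a starting point for a hardware request, then re-measure as real data arrives.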

So, It Depends (tm)...

Good luck!
Erick

> On Jul 28, 2020, at 7:26 AM, Prashant Jyoti <jt...@gmail.com> wrote:
> 
> Hi,
> I wanted to check if anybody has any references for tech companies' blogs
> detailing their Solr setup in production. I am more interested in storage
> and scaling guidelines. I intend to use Solr for one of my projects at
> work(back-end for a reporting tool) and need to convince higher management
> that it is indeed the right solution. I have gone through the material
> available in the Solr reference guide, I am looking for some details from a
> working production setup.
> 
> Thanks!
> -- 
> Regards,
> Prashant.