Posted to solr-user@lucene.apache.org by search engn dev <sa...@gmail.com> on 2014/02/21 05:24:59 UTC

how many shards required to search data

Data size is 250 GB of small records; each record is around 0.3KB, which
comes to roughly 1 billion records. My index has 20 different fields.
Queries will mostly be very simple or spatial queries, mainly on 2-3
fields.
All 20 fields will be stored. Any suggestions on how many shards I will
need to search this data?
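
To give an idea of the query shape, here is a SolrJ sketch of one of
these spatial queries (untested, and written against the newer 7/8-era
SolrJ API rather than the 4.x CloudSolrServer one; the field names,
collection name, and ZooKeeper address are placeholders):

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SpatialQuerySketch {
    public static void main(String[] args) throws Exception {
        // Connect via ZooKeeper (SolrCloud); host is a placeholder.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
            // A simple query on one field, plus a spatial filter on another.
            SolrQuery q = new SolrQuery("category:books");
            // geofilt: match docs whose "location" is within 5 km of the point.
            q.addFilterQuery("{!geofilt sfield=location pt=45.15,-93.85 d=5}");
            q.setRows(10);
            QueryResponse rsp = client.query("mycollection", q);
            System.out.println("numFound: " + rsp.getResults().getNumFound());
        }
    }
}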




Re: how many shards required to search data

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/21/2014 1:39 AM, search engn dev wrote:
> As you suggested, I have indexed 12 million sample records in Solr on
> hardware with 8GB RAM. The size of the index is 3GB.
> Can I extrapolate this to predict the actual size of the full index?

If those records are about the same size as the records in the system as
a whole, you can probably use that to extrapolate.

Based on that, I would guess that the index is probably going to be
about 85GB.  That's a lot less than I would have guessed, so perhaps
there's a lot of extra stuff in that 250GB that doesn't actually get
sent to Solr.

Even though the documents are small, their sheer number will probably
require a larger Java heap than the relatively small index size alone
would suggest.
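
(As a very rough point of reference only: with hundreds of millions of
documents per node, heaps of 8GB and up are common; only testing with
your own data will tell.)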

Do you have any kind of notion as to what kind of query volume you're
going to have?  If it's low, you can put multiple shards on your
multi-CPU machines and take advantage of parallel processing.  If the
query volume is high, you'll need all those CPUs to handle the load of
one shard, and you might need more than two machines for each shard.

You'll want to shard your index even though it's relatively small in
terms of disk space, because a billion documents is a LOT.
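(Lucene also caps a single index at Integer.MAX_VALUE documents, about
2.1 billion, so at a billion documents you are already within sight of
the hard limit of an unsharded index.)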

If you're just starting out, SolrCloud is probably a good way to go.  It
handles document routing across shards for you.  You didn't say whether
that was your plan or not.
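
If you do go with SolrCloud, creating the collection through the
Collections API is what sets up that routing.  A SolrJ sketch (same
7/8-era API caveat as anything modern; the collection name, configset,
ZooKeeper address, and shard/replica counts below are placeholders you
would have to tune):

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollectionSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
            // 8 shards x 2 replicas = 16 cores spread over the cluster;
            // SolrCloud routes each document to a shard by a hash of its id.
            CollectionAdminRequest
                .createCollection("bigindex", "myconfig", 8, 2)
                .process(client);
        }
    }
}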

Thanks,
Shawn


Re: how many shards required to search data

Posted by search engn dev <sa...@gmail.com>.
As you suggested, I have indexed 12 million sample records in Solr on
hardware with 8GB RAM. The size of the index is 3GB.
Can I extrapolate this to predict the actual size of the full index?
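(By extrapolate I mean scaling linearly, i.e. full index size =~ sample
index size x (total docs / sample docs), on the assumption that index
size grows linearly with document count.)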




Re: how many shards required to search data

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/20/2014 9:24 PM, search engn dev wrote:
> Data size is 250 GB of small records; each record is around 0.3KB, which
> comes to roughly 1 billion records. My index has 20 different fields.
> Queries will mostly be very simple or spatial queries, mainly on 2-3
> fields.
> All 20 fields will be stored. Any suggestions on how many shards I will
> need to search this data?

Your question is impossible to answer.  I will tell you that this is a
very big index, and it's going to take a lot of hardware.  It's not the
biggest I've heard of, but it is quite large.  Any situation that would
result in a performance issue on a small index is going to be far worse
on a large index.

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Two machines with 32GB of RAM each are not going to be anywhere near
enough.  If you can't get more than 32GB of RAM in each server, you're
probably going to need a lot of them.

Since all your fields will be stored, the *minimum* size of your index
will be approximately equal to the original data size after compression
-- assuming you're using 4.1.0 or later, where compression was
introduced.  That will not be the end, though -- it doesn't take into
account the size of the *indexed* data.

Although it is theoretically possible to look at a schema and the
original data to calculate the size of the indexed data, in reality the
only way to be SURE is to actually index a significant percentage of
your real data with the same schema you would use in production.
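
A minimal SolrJ loop for that kind of test might look like the sketch
below (same 7/8-era API caveat as before; all names are placeholders,
and in real code you would batch the adds a few thousand documents at a
time rather than one by one):

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexSampleSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
            for (int i = 0; i < 12_000_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("location", "45.15,-93.85"); // real values go here
                client.add("bigindex", doc); // batch these in real code
            }
            client.commit("bigindex");
            // Then measure the on-disk size of the index directories.
        }
    }
}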

Once you know how big your index is actually going to be, you can begin
to figure out how much total RAM you'll need across all the servers for
a single copy of the index (no redundancy).  If you want redundancy, the
requirements will be at least twice what you calculate.

http://wiki.apache.org/solr/SolrPerformanceProblems
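
As a worked example (sizes invented): if the full index comes to 300GB,
the ideal for good performance is on the order of 300GB of RAM across
the cluster for the OS disk cache, plus Java heap on top of that, for a
single copy of the index -- and double that with redundancy.  The page
above discusses how far below that ideal you can reasonably go.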

The number of shards and replicas that you're going to need is going to
depend on the query volume, the nature of the queries, and the nature of
the data.  Just like with index size, the only way to know is to try it
with all your real data.

If your query volume is large, you'll need multiple copies of the
complete index, which means more servers.
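(A worked example with invented numbers: 8 shards with 3 replicas each,
at one shard replica per server, comes to 24 servers; doubling up the
shards per server halves the server count at the cost of CPU headroom
per query.)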

If you don't care how long each query takes and your query volume will
be low, then your server requirements will be a LOT smaller.

Thanks,
Shawn


Re: how many shards required to search data

Posted by search engn dev <sa...@gmail.com>.
Shard 1 config: 32GB RAM, 4 cores
Shard 2 config: 32GB RAM, 4 cores



