Posted to solr-user@lucene.apache.org by Nimrod Cohen <Ni...@nice.com> on 2015/01/20 15:24:22 UTC

shards per disk

Hi
I ran some performance tests and wanted to know whether anyone has seen the same behavior.

We need to retrieve 1K documents out of 100M each time we query Solr, and then send them to text analysis.
In the first configuration, all 8 shards were on one RAID volume (disk F), and we got the 1K documents in around 15 seconds.
In the second configuration, we removed the RAID and used 8 separate disks, one shard per disk, and we got the 1K documents in 2-3 seconds.

Has anyone seen this kind of performance improvement, or can anyone confirm that it is reasonable?

Thanks,
NIMROD COHEN
Software Engineer
(T) +972 (9) 775-3668
(M) +972 (0) 52-5522901
nimrod.cohen@nice.com
www.nice.com


Re: shards per disk

Posted by Jack Krupansky <ja...@gmail.com>.
It sounds like your app needs a lot more RAM so that it is not doing so
much I/O.

-- Jack Krupansky


Re: shards per disk

Posted by Roman Chyla <ro...@gmail.com>.
I think this (i.e. the setup) makes sense. Since the search retrieves 1K
documents each time (for textual analysis, so they are probably large
docs) and uses Solr as storage (which is totally fine), parallel I/O across
multiple drives speeds things up. The index is probably large, so
it is unrealistic to have enough RAM to cache the most-used parts (if they
are hitting different docs all the time). I'm curious, as Toke points
out, what RAID configuration you ran it on initially.

Best,

roman


Re: shards per disk

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2015-01-21 at 09:46 +0100, Toke Eskildsen wrote:
> Anyway, RAID 0 does really help for random access, [...]

Should have been "...does not really help...".

- Toke Eskildsen



Re: shards per disk

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2015-01-21 at 07:56 +0100, Nimrod Cohen wrote:
> RAID [0] configuration
> 
> Each shard has data on every one of the 8 disks in the RAID. On each
> query for 1K docs, each shard requests data from the single RAID
> volume, so we get 8 requests for data from all of the disks, and they
> queue up.

Your RAID-setup (whether it is hardware or software) should use a
parallel queue, so that requests to different physical drives are issued
in parallel under the hood. But RAID is not that well-defined, so maybe
your controller or your software uses a single sequential queue. In that
case, the pattern will be as you describe.
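
The difference between the two queueing patterns can be sketched with a toy model. This is only an illustration under made-up assumptions: every request is given the same hypothetical service time of 1 unit, and the function names are invented for this sketch. It is not a measurement of any real controller and does not by itself explain the 15 seconds versus 2-3 seconds reported in the thread.

```python
def total_time_sequential(service_times):
    """Single shared queue: requests are served strictly one after another."""
    return sum(service_times)


def total_time_parallel(service_times, drive_of_request):
    """One queue per drive: drives work concurrently, so the busiest drive
    determines when the whole batch of requests is finished."""
    per_drive = {}
    for t, drive in zip(service_times, drive_of_request):
        per_drive[drive] = per_drive.get(drive, 0.0) + t
    return max(per_drive.values())


# Hypothetical numbers: 8 shard requests, 1 time unit each.
service_times = [1.0] * 8
one_drive_each = list(range(8))   # every request hits a different drive
all_on_one = [0] * 8              # every request hits the same drive

print("sequential queue:        ", total_time_sequential(service_times))
print("parallel, drive per req.:", total_time_parallel(service_times, one_drive_each))
print("parallel, same drive:    ", total_time_parallel(service_times, all_on_one))
```

In this model a parallel queue only helps when the requests actually land on different drives; if they all hit the same drive, it degrades to the sequential case.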

Anyway, RAID 0 does [not] really help for random access, when your access
pattern is homogeneous across shards. Even if you fix the problem with
your current RAID 0 setup, it is unlikely that you would get a
noticeable performance advantage over separate drives. It would make it
easier to add shards though, as you would not have to purchase a new
drive or unbalance your setup by running multiple shards on some drives.

> Regarding the response time, 2-3 seconds is good for our usage, but
> faster is always better; if we can get it lower, we might run
> the analysis on more than 1K documents.

Limit the number of fields you request and try experimenting with SolrJ
and the binary protocol: I have found that the time for serializing the
result to XML can be quite high for large responses.
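
As a minimal sketch of the field-limiting part, the request below uses Solr's standard `fl`, `rows` and `wt` query parameters. The host, collection name and field names are assumptions made up for the example. The sketch asks for JSON only because it is easy to inspect from Python; SolrJ itself uses the binary javabin format by default, which is what the suggestion above refers to.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, rows=1000, wt="json"):
    """Build a /select URL that returns only the listed fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # fl = field list: skip stored fields you don't need
        "rows": rows,            # the thread fetches 1K documents per query
        "wt": wt,                # response writer
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, collection and field names.
url = build_select_url(
    "http://localhost:8983/solr/collection1",
    "*:*",
    ["id", "text"],
)
print(url)
```

Requesting only the fields the text analysis actually consumes keeps both the stored-field reads and the response serialization smaller.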

If the number of fields needed is very low and the content of those
fields is not large, you could try using faceting with DocValues to get
the content.
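
A rough sketch of what such a faceting request could look like, using Solr's standard `facet`, `facet.field`, `facet.limit` and `facet.mincount` parameters. The field name `category` is hypothetical, and the sketch assumes that field has DocValues enabled in the schema. Note that faceting returns the distinct values (with counts) among the matching documents rather than one value per document, so it only fits when that is enough for the analysis.

```python
from urllib.parse import urlencode


def build_facet_params(query, field):
    """Parameters for fetching field values via faceting instead of stored fields."""
    return {
        "q": query,
        "rows": 0,            # no stored documents are returned, only facet data
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # -1 means no limit on the number of values returned
        "facet.mincount": 1,  # only values that occur in the matching documents
    }


# Hypothetical query and field name.
params = build_facet_params("*:*", "category")
print(urlencode(params))
```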


- Toke Eskildsen, State and University Library, Denmark




RE: shards per disk

Posted by Nimrod Cohen <Ni...@nice.com>.
Hi Toke,

Thanks for your answer.

We are using RAID 0 across 8 disks. I don't understand why it should give the same performance as one shard per disk.

Below is the explanation as I see it; please correct me if I'm wrong.

RAID configuration

Each shard has data on every one of the 8 disks in the RAID. On each query for 1K docs, each shard requests data from the single RAID volume, so we get 8 requests for data from all of the disks, and they queue up.

Shard per disk configuration

Each shard has data only on its own disk, so each shard requests data from its own disk and they don't block each other.

If I'm wrong, please correct me; I do want to understand this.

Regarding the response time, 2-3 seconds is good for our usage, but faster is always better; if we can get it lower, we might run the analysis on more than 1K documents.



Thanks for the help.

NIMROD COHEN


RE: shards per disk

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Nimrod Cohen [Nimrod.Cohen@nice.com] wrote:
> We need to get 1K documents out of 100M documents each
> time we query Solr and send them to text analysis.
> First configuration had 8 shards on one RAID (Disk F) we
> got the 1K in around 15 seconds.
> Second configuration we removed the RAID and work on 8
> different disks, each shard on one disk, and get the 1K
> documents in 2-3 seconds.

Which RAID level? 0, 1, maybe 5 or 6? If you did a RAID 0, it should be about the same performance as shards on individual disks, due to striping. If you did a RAID 1 with, for example, 2*4 disks, your performance would be markedly worse. If you did a RAID 1 of 8*1 disk, it would be better than individual drives as it would mitigate the "slowest drive dictates overall speed" problem. If your RAID is not really a RAID but instead JBOD or similar (http://en.wikipedia.org/wiki/Non-RAID_drive_architectures#JBOD), then the poor performance is to be expected as chances are all your data would reside on the same physical disk.
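
To illustrate the striping versus JBOD point, here is a small sketch of how a logical offset maps to a physical disk in each layout. The stripe size, disk size and read offsets are hypothetical values chosen for the example, and the functions model only the simplest textbook layouts, not any specific controller.

```python
from collections import Counter


def raid0_disk(offset, stripe_size, n_disks):
    """Disk holding the byte at `offset` under plain RAID 0 striping."""
    return (offset // stripe_size) % n_disks


def jbod_disk(offset, disk_size):
    """Disk holding the byte at `offset` when disks are simply concatenated."""
    return offset // disk_size


N_DISKS = 8
STRIPE_SIZE = 64 * 1024   # hypothetical 64 KiB stripe
DISK_SIZE = 2 ** 40       # hypothetical 1 TiB per disk

# 1,000 reads spread over the first ~1 TB of the volume,
# roughly the index size mentioned in the thread.
offsets = [i * 10 ** 9 for i in range(1000)]

raid0_hits = Counter(raid0_disk(o, STRIPE_SIZE, N_DISKS) for o in offsets)
jbod_hits = Counter(jbod_disk(o, DISK_SIZE) for o in offsets)

print("RAID 0 reads per disk:", dict(sorted(raid0_hits.items())))
print("JBOD reads per disk:  ", dict(sorted(jbod_hits.items())))
```

With these numbers the striped layout spreads the reads over all eight disks, while the concatenated layout sends every read to the first disk because the data never grows past it.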

Please describe your RAID setup in detail.

Also, is 2-3 second response time satisfactory to you? If not, what are you aiming at?

- Toke Eskildsen

Re: shards per disk

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/20/2015 7:45 AM, Nimrod Cohen wrote:
> All shards are on the same system; each one uses a different port.
> BTW
> Data size is about 1T, memory is 192G.

If Solr has to actually go to the disk to satisfy a query, it's going to
be slow.  This will always be true, no matter how many disks you use. 
In terms of performance, disks are like molasses or a glacier compared
to RAM.  Even an SSD is a lot slower.

Solr performance is good when all of the data that a query needs is
sitting in RAM already, cached by the operating system using memory that
is not allocated to programs.  192GB of RAM is nowhere near enough to
assure good performance if the Solr indexes are 1TB in size.  I would
bet that this is true even if you put the indexes on SSD instead of
spinning magnetic drives ... although performance would be better with SSD.
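
A back-of-the-envelope version of that sizing argument, using the 1T and 192G figures from this thread. The amount of RAM reserved for the Solr JVM heaps and the operating system is a hypothetical value; the real number depends on how the instances are configured.

```python
def cacheable_fraction(index_gb, ram_gb, reserved_gb):
    """Fraction of the index that could fit in the OS disk cache at once."""
    page_cache_gb = max(ram_gb - reserved_gb, 0)
    return min(page_cache_gb / index_gb, 1.0)


INDEX_GB = 1000    # "about 1T" of index data, from the thread
RAM_GB = 192       # total RAM, from the thread
RESERVED_GB = 32   # hypothetical: JVM heaps plus OS overhead

fraction = cacheable_fraction(INDEX_GB, RAM_GB, RESERVED_GB)
print(f"At most {fraction:.0%} of the index fits in the disk cache")
```

Even with nothing reserved, 192 GB covers less than a fifth of a 1 TB index, so queries that touch different documents each time will mostly be served from disk.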

http://wiki.apache.org/solr/SolrPerformanceProblems

You should *not* run multiple Solr instances per machine.  All of your
index cores should be handled by one instance.  Running multiple
instances is a waste of resources, especially memory, which as already
discussed is extremely precious when dealing with a large index.

Thanks,
Shawn


RE: shards per disk

Posted by Nimrod Cohen <Ni...@nice.com>.
Hi
All shards are on the same system; each one uses a different port.
BTW
Data size is about 1T, memory is 192G.

NIMROD COHEN 

Re: shards per disk

Posted by Nitin Solanki <ni...@gmail.com>.
Hey Nimrod,
Nice try. I just want to know whether these 8 shards are each on a different
system, or whether you implemented sharding on a single system with each
shard on a different port?
