Posted to solr-user@lucene.apache.org by Stephen Powis <st...@pardot.com> on 2011/11/22 04:45:08 UTC

Painfully slow transfer speed from Solr

I'm running Solr 1.4.1 with Jetty.  When I make requests against solr that
have a large response (~1MB of data), I'm getting super slow transfer times
back to the client.  I'm hoping you guys can help shed some light on this
issue for me.

Some more information about my setup:
- The qTime header in the response generally is very small, under 1 sec (<
1000 ms).
- The client making the request is on a gigabit (1000Mb) LAN with the solr
server, yet the transfer speed is only between 16KB and 30KB per sec.
- If I make the same request against localhost on the solr server, I see
the same slow speeds.  SCP and other transfers between the client and server
are all quick.  I'd like to think these tests eliminate any kind of network
pipe problem between the two servers.
- If I make the same query repeatedly, sometimes it will send the response
very quickly (6MB/sec and faster).
- While testing this, load on the box was basically at idle.

So I guess I'm hoping someone can help me understand what's going on here,
why I'm seeing this behavior, and perhaps suggest a possible solution?

What exactly does qTime measure?  I assume it is the time it takes to
process the request and fetch the resulting rows.  It obviously does not
include the transfer time back to the client, but does it include pulling
the data from the index?  Is solr slow to pull the data from the index and
drop it into the network pipe?

Thanks for any help!
Stephen
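
As a rough way to quantify the question above: QTime can be compared against
the measured wall-clock time of a request by parsing the responseHeader.  A
minimal Python sketch (assumes Solr's JSON response writer, wt=json; the
response body here is canned rather than fetched from a live server, and the
numbers are made up):

```python
import json

def transfer_overhead_ms(response_body: str, elapsed_ms: float) -> float:
    """Time spent outside Solr's reported QTime (stored-field retrieval,
    serialization, and transfer back to the client)."""
    header = json.loads(response_body)["responseHeader"]
    return elapsed_ms - header["QTime"]

# Canned response in Solr's JSON format (wt=json); numbers are made up.
body = '{"responseHeader": {"status": 0, "QTime": 120}, "response": {"numFound": 50000}}'
print(transfer_overhead_ms(body, 4500.0))  # 4380.0 -- most time is after the search
```

If the gap between QTime and wall time is large, the bottleneck is somewhere
after the search itself, which is exactly the symptom described in this thread.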

Re: Painfully slow transfer speed from Solr

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Nov 22, 2011 at 12:19 AM, Stephen Powis
<st...@pardot.com> wrote:
> Just trying to get a better understanding of this.....Wouldn't the indexes
> not being in the disk cache make the queries themselves slow as well (high
> qTime), not just fetching the results?

What happens in situations like this is that the true "index" portions
that are being used more often get cached by the OS, but the stored
fields aren't.
Also, use of the index portion normally consists of sequential IO,
which remains relatively efficient even when not cached (i.e. iterating
over all doc ids that match a term).
But after the top N documents are found, if the stored fields aren't
in cache, it's a disk seek for each document being returned.  This is
normally only tolerable if you're only retrieving the top 10 documents
or so.
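
Yonik's seek-per-document point can be put into back-of-envelope numbers.  A
sketch (Python; the 8 ms average seek time is an assumed figure for a
spinning disk, not something measured in this thread):

```python
def fetch_estimate_ms(num_docs: int, seek_ms: float = 8.0,
                      cached_fraction: float = 0.0) -> float:
    """Rough cost of reading stored fields when every uncached document
    costs one random disk seek."""
    uncached_docs = num_docs * (1.0 - cached_fraction)
    return uncached_docs * seek_ms

print(fetch_estimate_ms(10))    # 80.0 ms -- tolerable for a top-10 page
print(fetch_estimate_ms(5000))  # 40000.0 ms -- why a ~1MB response can crawl
```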

-Yonik
http://www.lucidimagination.com

Re: Painfully slow transfer speed from Solr

Posted by Walter Underwood <wu...@wunderwood.org>.
When you ask for "a large response (~1MB of data)", you are asking Solr to do tons of disk accesses and sorting before it sends the first response. That is going to be slow.

I strongly recommend requesting smaller results.

One of those requests may be using most of the caching resources in Solr, so that the next response is slow, too.

How many rows are you requesting and why?
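
One way to act on that advice is to page through results with Solr's start
and rows parameters instead of asking for everything at once.  A sketch
(Python; the query string and page size are made up):

```python
def page_params(query: str, page: int, rows: int = 100) -> dict:
    """Build Solr start/rows parameters for one page of results."""
    return {"q": query, "start": page * rows, "rows": rows}

print(page_params("account_id:42", 0))  # first page: start=0
print(page_params("account_id:42", 3))  # fourth page: start=300
```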

wunder
Walter Underwood


Re: Painfully slow transfer speed from Solr

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/21/2011 10:19 PM, Stephen Powis wrote:
> Thanks for the reply Shawn.
>
> The solr server currently has 8gb of ram and the total size of the dataDir
> is around 30gb.  I start solr and give the java heap up to 4gb of ram, so
> that leaves 4gb for the OS, there are no other running services on the
> box.  So from what you are saying, we are way under on the amount of ram we
> would ideally have.
>
> Just trying to get a better understanding of this.....Wouldn't the indexes
> not being in the disk cache make the queries themselves slow as well (high
> qTime), not just fetching the results?

Having 4GB for disk cache isn't much compared to 30GB, but it's probably 
enough to get most of the important bits cached.  On the other hand, 
unless your query is really complex, 1 second is a pretty slow response 
time.  (Yonik said the same thing, only better.)

> We currently store all the fields that we index, my reasoning behind that
> is that debugging results we get from solr w/o being able to see what is
> stored in solr would be near impossible (in my head anyhow..).  Generally
> our original source (mysql) and solr are consistent, but we've had cases
> where some updates have been missed for one reason or another.
>
> So my options are: reduce index sizes, increase ram on the server, increase
> disk speed (SSD drives)?

You're right about debugging being harder if you don't see all the data 
in the result.  It's something I have to deal with in my index.  Each of 
my index shards is already 20GB in size, it would be easily triple that 
if I stored everything.

One thing you can do to help with debugging is include faceting on the 
field that you are trying to debug.  You won't see the original field 
values, but you will see what your index analyzer chain has done with 
it, which sometimes is even more useful.
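
That debugging trick uses Solr's standard faceting parameters.  A sketch of
the parameter set (Python; the field and query values are hypothetical):

```python
def debug_facet_params(query: str, field: str, limit: int = 50) -> dict:
    """Facet on the field under investigation to see the terms the index
    analyzer chain actually produced, without fetching any documents."""
    return {
        "q": query,
        "rows": 0,             # skip document retrieval entirely
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
    }

print(debug_facet_params("id:12345", "title"))
```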

SSD is an awesome option, but it becomes absolutely critical that you 
have redundant servers if you go that route.  This is because there are 
few (maybe none) RAID1, RAID5, or RAID10 solutions that support TRIM, 
which is absolutely required for good SSD performance.  With RAID0 or no 
RAID, a single SSD failure takes the server out.  Things can get very 
expensive very quickly with SSD.

With your current index size, if you can bump the server RAM to 32GB, 
your performance will be very good.  If you can go higher, it would be 
stellar.  You'll want to be running a 64-bit OS and 64-bit JVM.

Thanks,
Shawn


Re: Painfully slow transfer speed from Solr

Posted by Stephen Powis <st...@pardot.com>.
Thanks for the reply Shawn.

The solr server currently has 8gb of ram and the total size of the dataDir
is around 30gb.  I start solr and give the java heap up to 4gb of ram, so
that leaves 4gb for the OS, there are no other running services on the
box.  So from what you are saying, we are way under on the amount of ram we
would ideally have.

Just trying to get a better understanding of this... Wouldn't the indexes
not being in the disk cache make the queries themselves slow as well (high
qTime), not just fetching the results?

We currently store all the fields that we index, my reasoning behind that
is that debugging results we get from solr w/o being able to see what is
stored in solr would be near impossible (in my head anyhow..).  Generally
our original source (mysql) and solr are consistent, but we've had cases
where some updates have been missed for one reason or another.

So my options are: reduce index sizes, increase ram on the server, increase
disk speed (SSD drives)?

Thanks
Stephen


Re: Painfully slow transfer speed from Solr

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/21/2011 8:45 PM, Stephen Powis wrote:
> I'm running Solr 1.4.1 with Jetty.  When I make requests against solr that
> have a large response (~1mb of data) I'm getting super slow transfer times
> back to the client, I'm hoping you guys can help shed some light on this
> issue for me.
>
> Some more information about my setup:
> - The qTime header in the response generally is very small, under 1 sec (<
> 1000ms).
> - The client making the request is on a 1000mb LAN with the solr server,
> yet the transfer speed is only between 16k and 30k per sec.
> - If I make the same request against localhost on the solr server, I see
> the same slow speeds.  SCP and other transfer between the client and server
> are all quick.  I'd like to think these tests eliminate any kind of network
> pipe problem between the two servers.
> - If I make the same query repeatedly, sometimes it will send the response
> very quickly (6mb/sec and faster)
> - While testing this, load on the box was basically at idle.
>
> So I guess I'm hoping someone can help me understand whats going on here,
> and why I'm seeing this behavior, and perhaps a possible solution?
>
> What exactly does qTime measure?  I assume it is the time it takes to
> process the request and fetch the resulting rows.  It obviously does not
> include the transfer time back to the client, but does it include pulling
> the data from the index?  Is solr slow to pull the data from the index and
> drop it into the network pipe?

Your bottleneck is probably disk I/O and a lack of OS disk cache.  How 
big is your index, how much RAM do you have, and how much RAM is used by 
processes, especially the Java heap?  QTime measures the amount of time 
that Solr spent finding the document IDs.  It does not include time 
spent retrieving the requested fields or sending it to the client.

Solr is designed to work best when the entire index fits into the OS 
disk cache, which is free memory that is not assigned to the OS or other 
processes.  Limiting the number of fields that Solr indexes (for 
searching) and stores (for data retrieval) keeps index size down, so you 
can fit more of it in the disk cache.  When the index data is in RAM, 
Solr is very very fast.  If it has to go out to the disk to search or 
retrieve, it is very slow.
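
The RAM arithmetic behind this is simple enough to sketch (Python; it ignores
OS overhead and treats all non-heap RAM as available for the disk cache,
which is optimistic):

```python
def cache_coverage(total_ram_gb: float, java_heap_gb: float,
                   index_gb: float) -> float:
    """Fraction of the index the OS disk cache could hold."""
    free_for_cache = total_ram_gb - java_heap_gb
    return free_for_cache / index_gb

print(cache_coverage(8, 4, 30))   # ~0.13 of a 30GB index cached
print(cache_coverage(32, 4, 30))  # ~0.93 -- nearly the whole index
```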

You should only index the fields absolutely required to get good search 
results, and you should only store the fields required to display a grid 
of search results.  When displaying full details for an individual item, 
go to the original data source using the identifier returned in the 
search results.  In typical search applications, you only need full 
details for a small subset of the results returned by a search, so don't 
retrieve megabytes of information that will never be used.
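
Even when some fields must stay stored, the response itself can be kept small
with Solr's fl parameter, which limits which stored fields are returned.  A
sketch (Python; the field names are hypothetical):

```python
def grid_params(query: str, rows: int = 25) -> dict:
    """Request only the fields a results grid needs; full details come
    from the original data source, keyed by id."""
    return {
        "q": query,
        "rows": rows,
        "fl": "id,name,created_at",  # hypothetical grid columns
    }

print(grid_params("account_id:42"))
```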

Thanks,
Shawn