Posted to solr-user@lucene.apache.org by "Dan A. Dickey" <da...@savvis.net> on 2009/09/10 15:13:33 UTC

Solr http post performance seems slow - help?

I'm posting documents to Solr using http (curl) from
C++/C code and am seeing approximately 3.3 - 3.4
documents per second being posted.  Is this to be expected?
Granted - I understand that this depends somewhat on the
machine running Solr.  By the way - I'm running Solr inside JBoss.

I was hoping for maybe 20 or more docs/sec, and 3 or so
is quite a way from that.

Also, I'm posting just a single document at a time.  I once tried
5 processes each posting documents, and that slowed things
down considerably, to multiple (5-10) seconds per document.

Does anyone have suggestions on what I can try?  I'll soon
have better servers installed and will be splitting the indexing
work from the searching - but at this point in time, I wasn't doing
indexing while searching anyway.  Thanks for any and all help!
	-Dan

-- 
Dan A. Dickey | Senior Software Engineer

Savvis
10900 Hampshire Ave. S., Bloomington, MN  55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dickey@savvis.net

Re: Solr http post performance seems slow - help?

Posted by "Dan A. Dickey" <da...@savvis.net>.
On Thursday 10 September 2009 01:47:38 pm Walter Underwood wrote:
> What kind of storage is used for the Solr index files? When I tested it, NFS
> was 100X slower than local disk.

I'm sorry - I misunderstood your question.  The Solr indexes themselves are
stored on local disk.  The documents are retrievable (for DIH) from NFS.

And, I started looking closer into this problem... both the box doing the
posts, and the solr box are around 90% idle while the indexing process is
running.  And there is no I/O wait time.
I'm now looking into possible network slowness...
	-Dan

> 
> wunder 
> 
> -----Original Message-----
> From: Dan A. Dickey [mailto:dan.dickey@savvis.net] 
> Sent: Thursday, September 10, 2009 11:15 AM
> To: solr-user@lucene.apache.org
> Cc: Walter Underwood
> Subject: Re: Solr http post performance seems slow - help?
> 
> On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote:
> > How big are your documents?
> 
> For the most part, I'm just indexing metadata that has been pulled from
> the documents.  I think I have currently about 40 or so fields that I'm
> setting.
> When the document is an actual document - pdf, doc, etc... I use the DIH
> to extract stuff and also set the metadata then.
> 
> > Is your index on local disk or network- 
> > mounted disk?
> 
> I'm basically pulling the metadata info from a database and the documents
> themselves are shared via NFS to the Solr indexer.
> 	-Dan
> 
> > 
> > wunder
> > 
> > On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote:
> > 
> > > On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey  
> > > <da...@savvis.net> wrote:
> > >> I'm posting documents to Solr using http (curl) from
> > >> C++/C code and am seeing approximately 3.3 - 3.4
> > >> documents per second being posted.  Is this to be expected?
> > >
> > > No, that's very slow.
> > > Are you using libcurl, or actually forking a new process for every  
> > > document?
> > > Are you committing on every document?
> > >
> > > If you can, using Java would make your life much easier since you
> > > could use the SolrJ client and its binary protocol for indexing.
> > >
> > > -Yonik
> > > http://www.lucidimagination.com
> > >
> > 
> > 
> 
> 


RE: Solr http post performance seems slow - help?

Posted by Walter Underwood <wu...@wunderwood.org>.
What kind of storage is used for the Solr index files? When I tested it, NFS
was 100X slower than local disk.

wunder 

-----Original Message-----
From: Dan A. Dickey [mailto:dan.dickey@savvis.net] 
Sent: Thursday, September 10, 2009 11:15 AM
To: solr-user@lucene.apache.org
Cc: Walter Underwood
Subject: Re: Solr http post performance seems slow - help?

On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote:
> How big are your documents?

For the most part, I'm just indexing metadata that has been pulled from
the documents.  I think I have currently about 40 or so fields that I'm
setting.
When the document is an actual document - pdf, doc, etc... I use the DIH
to extract stuff and also set the metadata then.

> Is your index on local disk or network- 
> mounted disk?

I'm basically pulling the metadata info from a database and the documents
themselves are shared via NFS to the Solr indexer.
	-Dan

> 
> wunder
> 
> On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote:
> 
> > On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey  
> > <da...@savvis.net> wrote:
> >> I'm posting documents to Solr using http (curl) from
> >> C++/C code and am seeing approximately 3.3 - 3.4
> >> documents per second being posted.  Is this to be expected?
> >
> > No, that's very slow.
> > Are you using libcurl, or actually forking a new process for every  
> > document?
> > Are you committing on every document?
> >
> > If you can, using Java would make your life much easier since you
> > could use the SolrJ client and its binary protocol for indexing.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> 
> 


Re: Solr http post performance seems slow - help?

Posted by "Dan A. Dickey" <da...@savvis.net>.
On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote:
> How big are your documents?

For the most part, I'm just indexing metadata that has been pulled from
the documents.  I think I have currently about 40 or so fields that I'm setting.
When the document is an actual document - pdf, doc, etc... I use the DIH
to extract stuff and also set the metadata then.

> Is your index on local disk or network- 
> mounted disk?

I'm basically pulling the metadata info from a database and the documents
themselves are shared via NFS to the Solr indexer.
	-Dan

> 
> wunder
> 
> On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote:
> 
> > On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey  
> > <da...@savvis.net> wrote:
> >> I'm posting documents to Solr using http (curl) from
> >> C++/C code and am seeing approximately 3.3 - 3.4
> >> documents per second being posted.  Is this to be expected?
> >
> > No, that's very slow.
> > Are you using libcurl, or actually forking a new process for every  
> > document?
> > Are you committing on every document?
> >
> > If you can, using Java would make your life much easier since you
> > could use the SolrJ client and its binary protocol for indexing.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> 
> 


Re: Solr http post performance seems slow - help?

Posted by Walter Underwood <wu...@wunderwood.org>.
How big are your documents? Is your index on local disk or network- 
mounted disk?

wunder

On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote:

> On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey  
> <da...@savvis.net> wrote:
>> I'm posting documents to Solr using http (curl) from
>> C++/C code and am seeing approximately 3.3 - 3.4
>> documents per second being posted.  Is this to be expected?
>
> No, that's very slow.
> Are you using libcurl, or actually forking a new process for every  
> document?
> Are you committing on every document?
>
> If you can, using Java would make your life much easier since you
> could use the SolrJ client and its binary protocol for indexing.
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Solr http post performance seems slow - help?

Posted by "Dan A. Dickey" <da...@savvis.net>.
On Thursday 10 September 2009 08:39:38 am Yonik Seeley wrote:
> On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey <da...@savvis.net> wrote:
> > I'm posting documents to Solr using http (curl) from
> > C++/C code and am seeing approximately 3.3 - 3.4
> > documents per second being posted.  Is this to be expected?
> 
> No, that's very slow.
> Are you using libcurl, or actually forking a new process for every document?

I'm using libcurl and not forking.

> Are you committing on every document?

No.

> If you can, using Java would make your life much easier since you
> could use the SolrJ client and its binary protocol for indexing.

As much as I'd like to, I can't.  At this point in time it would take far
too much code restructuring and rewriting.  There is a database involved,
and some senseless portability library being used - though we only run on
Linux at this point in time.  It's just too much work to switch over to using
Java, for now.
	-Dan


Re: Solr http post performance seems slow - help?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey <da...@savvis.net> wrote:
> I'm posting documents to Solr using http (curl) from
> C++/C code and am seeing approximately 3.3 - 3.4
> documents per second being posted.  Is this to be expected?

No, that's very slow.
Are you using libcurl, or actually forking a new process for every document?
Are you committing on every document?
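
Both questions are about per-document overhead: a new connection (or process)
for every post, and a commit for every post. Here is a minimal sketch of
avoiding both, written in Python for brevity rather than C++/libcurl. The
host, port, "/solr/update" path, and the commit_every value are assumptions
for illustration, not values taken from this thread.

```python
import http.client

# Hypothetical endpoint; adjust to the actual Solr deployment.
SOLR_HOST, SOLR_PORT, UPDATE_PATH = "localhost", 8983, "/solr/update"


def plan_requests(doc_xml_bodies, commit_every=1000):
    """Yield the XML bodies to send: each document, plus an occasional commit.

    A <commit/> is emitted after every `commit_every` documents and once at
    the end if anything is still uncommitted, instead of after every document.
    """
    pending = 0
    for body in doc_xml_bodies:
        yield body
        pending += 1
        if pending == commit_every:
            yield "<commit/>"
            pending = 0
    if pending:
        yield "<commit/>"


def send_all(doc_xml_bodies, commit_every=1000):
    """Send everything over one persistent HTTP connection."""
    conn = http.client.HTTPConnection(SOLR_HOST, SOLR_PORT)
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    try:
        for body in plan_requests(doc_xml_bodies, commit_every):
            conn.request("POST", UPDATE_PATH, body.encode("utf-8"), headers)
            response = conn.getresponse()
            response.read()  # drain the response so the connection can be reused
            if response.status != 200:
                raise RuntimeError(f"Solr returned HTTP {response.status}")
    finally:
        conn.close()
```

With libcurl, reusing one easy handle for all requests gives the same kind of
connection reuse.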

If you can, using Java would make your life much easier since you
could use the SolrJ client and its binary protocol for indexing.

-Yonik
http://www.lucidimagination.com

Re: Solr http post performance seems slow - help?

Posted by Lance Norskog <go...@gmail.com>.
Your indexing project is disk-bound. My modern midrange laptop gets
30MB/s doing "cat > /dev/null" (1 7200rpm disk). The Amazon instances
I'm playing with get 50-60 (I really want to know how it fits
together). Your laptop might be 10-20?

On Thu, Sep 24, 2009 at 11:54 PM, Constantijn Visinescu
<ba...@gmail.com> wrote:
> This may or may not help but here goes :)
>
> When I was running performance tests I took a look at the simple post tool
> that comes with the Solr examples.
>
> First I changed my schema.xml to fit my needs, and then I deleted the old
> index so Solr created a blank one when I started up.
> Then I had a process chew on my data and spit out XML files that are
> formatted similarly to the XML files that the SimplePostTool example uses.
> Next I used the simple post tool to post the XML files to Solr (60k-80k
> records per XML file). Each file only took a couple of minutes to index this
> way.
> Commit and optimize after that (took less than 10 minutes), and after about
> 2.5 hrs I had indexed just under 8 million records.
>
> This was on a 4 year old single core laptop using resin 3 as my servlet
> container.
>
> Hope this helps.
>
>
> On Fri, Sep 25, 2009 at 3:51 AM, Lance Norskog <go...@gmail.com> wrote:
>
>> In "top", press the '1' key. This will give a list of the CPUs and how
>> much load is on each. The display is otherwise a little weird for
>> multi-cpu machines. But don't be surprised when Solr is I/O bound. The
>> biggest, fanciest RAID is often a better investment than CPUs. On one
>> project we bought low-end rack servers that come with 6-8 disk bays
>> and filled them with 10k/15k RPM disks.
>>
>> On Wed, Sep 23, 2009 at 2:47 PM, Dan A. Dickey <da...@savvis.net>
>> wrote:
>> > On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote:
>> > ...
>> >> Our JBoss expert and I will be looking into why this might be occurring.
>> >> Does anyone know of any JBoss related slowness with Solr?
>> >> And does anyone have any other sort of suggestions to speed indexing
>> >> performance?   Thanks for your help all!  I'll keep you up to date with
>> >> further progress.
>> >
>> > Ok, further progress... just to keep any interested parties up to date
>> > and for the record...
>> >
>> > I'm finding that using the "example" jetty setup (will be switching very
>> > very soon to a "real" jetty installation) is about the fastest.  Using
>> > several processes to send posts to Solr helps a lot, and we're seeing
>> > about 80 posts a second this way.
>> >
>> > We also stripped down JBoss to the bare bones and the Solr in it
>> > is running nearly as fast - about 50 posts a second.  It was our previous
>> > JBoss configuration that was making it appear "slow" for some reason.
>> >
>> > We will be running more tests and spreading out the "pre-index" workload
>> > across more machines and more processes. In our case we were seeing
>> > the bottleneck being one machine running 18 processes.
>> > The 2 quad core xeon system is experiencing about a 25% cpu load.
>> > And I'm not certain, but I think this may be actually 25% of one of the 8 cores.
>> > So, there's *lots* of room for Solr to be doing more work there.
>> >        -Dan
>> >
>> > --
>> > Dan A. Dickey | Senior Software Engineer
>> >
>> > Savvis
>> > 10900 Hampshire Ave. S., Bloomington, MN  55438
>> > Office: 952.852.4803 | Fax: 952.852.4951
>> > E-mail: dan.dickey@savvis.net
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Solr http post performance seems slow - help?

Posted by Constantijn Visinescu <ba...@gmail.com>.
This may or may not help but here goes :)

When I was running performance tests I took a look at the simple post tool
that comes with the Solr examples.

First I changed my schema.xml to fit my needs, and then I deleted the old
index so Solr created a blank one when I started up.
Then I had a process chew on my data and spit out XML files that are
formatted similarly to the XML files that the SimplePostTool example uses.
Next I used the simple post tool to post the XML files to Solr (60k-80k
records per XML file). Each file only took a couple of minutes to index this
way.
Commit and optimize after that (took less than 10 minutes), and after about
2.5 hrs I had indexed just under 8 million records.

This was on a 4 year old single core laptop using resin 3 as my servlet
container.
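
The batch files described above put many documents into a single <add>
element, so one HTTP request carries thousands of records. Here is a small
sketch of generating such a message body, in Python. The field names and
values in the usage example are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs):
    """Build one Solr XML <add> message from an iterable of dicts.

    Each dict maps a field name to a value or to a list of values
    (a list produces a repeated, multi-valued field).
    """
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                # escape() handles &, < and > in field text;
                # quoteattr() quotes and escapes the attribute value.
                parts.append(
                    f"<field name={quoteattr(name)}>{escape(str(v))}</field>"
                )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)
```

For example, build_add_xml([{"id": "1", "title": "A & B"}]) yields one <add>
element containing one <doc> with two <field> elements, with the ampersand
escaped as &amp;.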

Hope this helps.


On Fri, Sep 25, 2009 at 3:51 AM, Lance Norskog <go...@gmail.com> wrote:

> In "top", press the '1' key. This will give a list of the CPUs and how
> much load is on each. The display is otherwise a little weird for
> multi-cpu machines. But don't be surprised when Solr is I/O bound. The
> biggest, fanciest RAID is often a better investment than CPUs. On one
> project we bought low-end rack servers that come with 6-8 disk bays
> and filled them with 10k/15k RPM disks.
>
> On Wed, Sep 23, 2009 at 2:47 PM, Dan A. Dickey <da...@savvis.net>
> wrote:
> > On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote:
> > ...
> >> Our JBoss expert and I will be looking into why this might be occurring.
> >> Does anyone know of any JBoss related slowness with Solr?
> >> And does anyone have any other sort of suggestions to speed indexing
> >> performance?   Thanks for your help all!  I'll keep you up to date with
> >> further progress.
> >
> > Ok, further progress... just to keep any interested parties up to date
> > and for the record...
> >
> > I'm finding that using the "example" jetty setup (will be switching very
> > very soon to a "real" jetty installation) is about the fastest.  Using
> > several processes to send posts to Solr helps a lot, and we're seeing
> > about 80 posts a second this way.
> >
> > We also stripped down JBoss to the bare bones and the Solr in it
> > is running nearly as fast - about 50 posts a second.  It was our previous
> > JBoss configuration that was making it appear "slow" for some reason.
> >
> > We will be running more tests and spreading out the "pre-index" workload
> > across more machines and more processes. In our case we were seeing
> > the bottleneck being one machine running 18 processes.
> > The 2 quad core xeon system is experiencing about a 25% cpu load.
> > And I'm not certain, but I think this may be actually 25% of one of the 8 cores.
> > So, there's *lots* of room for Solr to be doing more work there.
> >        -Dan
> >
> > --
> > Dan A. Dickey | Senior Software Engineer
> >
> > Savvis
> > 10900 Hampshire Ave. S., Bloomington, MN  55438
> > Office: 952.852.4803 | Fax: 952.852.4951
> > E-mail: dan.dickey@savvis.net
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Solr http post performance seems slow - help?

Posted by Lance Norskog <go...@gmail.com>.
In "top", press the '1' key. This will give a list of the CPUs and how
much load is on each. The display is otherwise a little weird for
multi-cpu machines. But don't be surprised when Solr is I/O bound. The
biggest, fanciest RAID is often a better investment than CPUs. On one
project we bought low-end rack servers that come with 6-8 disk bays
and filled them with 10k/15k RPM disks.
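
The per-CPU view that "top" shows after pressing '1' can also be computed on
Linux from two samples of /proc/stat. Here is a rough sketch in Python. The
one-second interval is an arbitrary choice, and reporting iowait separately
from busy time is a design choice made here so that an I/O-bound box is easy
to spot.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: list of tick counters} for the per-CPU lines."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 user nice system idle iowait ...";
        # the aggregate "cpu" line (no digit suffix) is skipped.
        if fields and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            counters[fields[0]] = [int(x) for x in fields[1:]]
    return counters


def cpu_usage(before, after):
    """Return {cpu_name: (busy_percent, iowait_percent)} between two samples."""
    usage = {}
    for name, new in after.items():
        old = before[name]
        # Guest time is already included in user time, so only the first
        # eight counters (user .. steal) are summed.
        delta = [n - o for n, o in zip(new[:8], old[:8])]
        total = sum(delta)
        if total == 0:
            usage[name] = (0.0, 0.0)
            continue
        idle = delta[3]
        iowait = delta[4] if len(delta) > 4 else 0
        busy = total - idle - iowait
        usage[name] = (100.0 * busy / total, 100.0 * iowait / total)
    return usage


def sample(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and compare them."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return cpu_usage(before, after)
```

A high iowait percentage with low busy time on every CPU would point at the
disks rather than the processors.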

On Wed, Sep 23, 2009 at 2:47 PM, Dan A. Dickey <da...@savvis.net> wrote:
> On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote:
> ...
>> Our JBoss expert and I will be looking into why this might be occurring.
>> Does anyone know of any JBoss related slowness with Solr?
>> And does anyone have any other sort of suggestions to speed indexing
>> performance?   Thanks for your help all!  I'll keep you up to date with
>> further progress.
>
> Ok, further progress... just to keep any interested parties up to date
> and for the record...
>
> I'm finding that using the "example" jetty setup (will be switching very
> very soon to a "real" jetty installation) is about the fastest.  Using
> several processes to send posts to Solr helps a lot, and we're seeing
> about 80 posts a second this way.
>
> We also stripped down JBoss to the bare bones and the Solr in it
> is running nearly as fast - about 50 posts a second.  It was our previous
> JBoss configuration that was making it appear "slow" for some reason.
>
> We will be running more tests and spreading out the "pre-index" workload
> across more machines and more processes. In our case we were seeing
> the bottleneck being one machine running 18 processes.
> The 2 quad core xeon system is experiencing about a 25% cpu load.
> And I'm not certain, but I think this may be actually 25% of one of the 8 cores.
> So, there's *lots* of room for Solr to be doing more work there.
>        -Dan
>
> --
> Dan A. Dickey | Senior Software Engineer
>
> Savvis
> 10900 Hampshire Ave. S., Bloomington, MN  55438
> Office: 952.852.4803 | Fax: 952.852.4951
> E-mail: dan.dickey@savvis.net
>



-- 
Lance Norskog
goksron@gmail.com

Re: Solr http post performance seems slow - help?

Posted by "Dan A. Dickey" <da...@savvis.net>.
On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote:
...
> Our JBoss expert and I will be looking into why this might be occurring.
> Does anyone know of any JBoss related slowness with Solr?
> And does anyone have any other sort of suggestions to speed indexing
> performance?   Thanks for your help all!  I'll keep you up to date with
> further progress.

Ok, further progress... just to keep any interested parties up to date
and for the record...

I'm finding that using the "example" jetty setup (will be switching very
very soon to a "real" jetty installation) is about the fastest.  Using
several processes to send posts to Solr helps a lot, and we're seeing
about 80 posts a second this way.
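
Here is a sketch of fanning posts out to several workers, in Python. It uses
threads inside one process rather than the separate processes described
above, and post_batch is a hypothetical stand-in for whatever function sends
one batch to Solr. The worker count and batch size are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def index_in_parallel(docs, post_batch, workers=4, batch_size=100):
    """Send batches concurrently and return the number of documents posted.

    `post_batch` is a caller-supplied function (hypothetical here) that
    posts one list of documents and raises on failure.
    """
    batches = chunked(docs, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every task to finish and re-raises any worker error.
        list(pool.map(post_batch, batches))
    return sum(len(b) for b in batches)
```

Whether more workers help depends on where the bottleneck is, so the worker
count is worth measuring rather than guessing.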

We also stripped down JBoss to the bare bones and the Solr in it
is running nearly as fast - about 50 posts a second.  It was our previous
JBoss configuration that was making it appear "slow" for some reason.

We will be running more tests and spreading out the "pre-index" workload
across more machines and more processes. In our case we were seeing
the bottleneck being one machine running 18 processes.
The 2 quad core xeon system is experiencing about a 25% cpu load.
And I'm not certain, but I think this may be actually 25% of one of the 8 cores.
So, there's *lots* of room for Solr to be doing more work there.
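
The two readings of that 25% figure differ by a factor of eight. A quick
calculation in Python, assuming eight equal cores:

```python
CORES = 8  # two quad-core CPUs, as described above


def busy_cores(percent, per_core):
    """Convert a CPU percentage into an equivalent number of busy cores.

    per_core=True: the percentage is relative to a single core.
    per_core=False: the percentage is relative to the whole machine.
    """
    fraction = percent / 100.0
    return fraction if per_core else fraction * CORES


print(busy_cores(25, per_core=False))  # prints 2.0 (two cores' worth of work)
print(busy_cores(25, per_core=True))   # prints 0.25 (a quarter of one core)
```

Under either reading, at least six cores' worth of capacity is unused.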
	-Dan


Re: Solr http post performance seems slow - help?

Posted by "Dan A. Dickey" <da...@savvis.net>.
On Thursday 10 September 2009 08:13:33 am Dan A. Dickey wrote:
> I'm posting documents to Solr using http (curl) from
> C++/C code and am seeing approximately 3.3 - 3.4
> documents per second being posted.  Is this to be expected?
> Granted - I understand that this depends somewhat on the
> machine running Solr.  By the way - I'm running Solr inside JBoss.
> 
> I was hoping for maybe 20 or more docs/sec, and 3 or so
> is quite a way from that.
> 
> Also, I'm posting just a single document at a time.  I once tried
> 5 processes each posting documents, and that slowed things
> down considerably.  Down into the multiple (5-10) seconds per document.
> 
> Does anyone have suggestions on what I can try?  I'll soon
> have better servers installed and will be splitting the indexing
> work from the searching - but at this point in time, I wasn't doing
> indexing while searching anyway.  Thanks for any and all help!

Ok, I spent some time on this problem this morning, and have some
interesting results to share.  I started off by making sure both boxes
were attached to the same switch - they weren't, but now are.
It didn't help.

I added some timing code... and found indeed that I was getting about
3.3 - 3.4 documents per second to index.  Not so good.
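
Timing code of this kind can be as simple as wrapping the posting loop with a
monotonic clock. Here is a sketch in Python, where post_one is a hypothetical
function that posts a single document. The clock parameter exists only so the
measurement can be checked with a fake clock.

```python
import time


def measure_throughput(docs, post_one, clock=time.perf_counter):
    """Post every document and return (count, seconds, docs_per_second)."""
    start = clock()
    count = 0
    for doc in docs:
        post_one(doc)
        count += 1
    elapsed = clock() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate
```

Timing the whole loop rather than each request keeps the clock overhead out
of the result, at the cost of hiding per-request variation.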

I stopped JBoss (and Solr) and built up a version of the example
stuff that would run my current configuration instead of the example.
Reading the documentation - this runs Solr in a Jetty container.

And this resulted in indexing speeds ranging between 20 - 30 documents
per second.  Much more acceptable.  And also, with a quick test of using
two processes to index - I hit a rate of about 37 dps.  Much nicer.
I don't know yet how this actually scales - but I intend to find out.
We've almost got some nice quad-core Xeons ready...

Our JBoss expert and I will be looking into why this might be occurring.
Does anyone know of any JBoss related slowness with Solr?
And does anyone have any other sort of suggestions to speed indexing
performance?   Thanks for your help all!  I'll keep you up to date with
further progress.
	-Dan
