Posted to solr-user@lucene.apache.org by Geert-Jan Brits <gb...@gmail.com> on 2008/01/17 12:10:56 UTC

Re: batch indexing takes more time than shown on SOLR output --> something to do with IO?

No Solr commit is sent until the end.
Since client and server are currently on the same machine, network IO
should be small as well, I think. Also, as you mentioned, the response is
very small, so it can't be that either.

As to what IO activity I was thinking about: I was merely guessing here, but
I thought that maybe the creation of the indices for indexed fields was not
accounted for in the reported number. That is something I can't really
imagine, but with these numbers my head comes up with all sorts of strange
scenarios ;-)

After checking some other machine stats while doing a big update, I think I
know what's going on (please correct me if it doesn't sound plausible):
client and server (on the same machine with 2GB RAM) are causing excessive
page swapping (on the same disk, yeah I know, I must get a different setup),
which makes it hard for the Solr server to write its indices to disk. I
think this is what's happening, since everything goes pretty well (no big
discrepancies) until RAM is more or less exhausted.

Could this be it? I'm going to test with 2 machines, I guess.
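
One way to check this is to log the client-side wall-clock time next to the
time Solr prints for each batch, and watch when the gap starts to grow. Below
is a minimal sketch in plain Java. It assumes the last number on the INFO
line is the server-side time in milliseconds (875 in the sample quoted
below). The class name UpdateTiming and the Runnable that stands in for
req.process(server) are hypothetical, not part of SolrJ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UpdateTiming {
    // Trailing "<status> <millis>" pair, as in "...]} 0 875".
    // Assumption: the second number is the server-side time in ms.
    private static final Pattern TAIL =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s*$");

    /** Extracts the server-reported time in ms from an update log line. */
    public static long serverMillis(String logLine) {
        Matcher m = TAIL.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException(
                    "no trailing status/time pair: " + logLine);
        }
        return Long.parseLong(m.group(2));
    }

    /** Wall-clock time in ms spent running the given action on the client. */
    public static long clientMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Shortened sample of the log line from this thread.
        String logLine = "INFO: {add=[10485, 10488]} 0 875";

        // Hypothetical stand-in for req.process(server); swap in the real call.
        Runnable update = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        long client = clientMillis(update);
        long server = serverMillis(logLine);
        System.out.println("client=" + client + " ms, server=" + server
                + " ms, gap=" + (client - server) + " ms");
    }
}
```

If the gap stays small until RAM runs out and then jumps, that would fit the
swapping theory; if it grows steadily from the start, something else is
probably going on.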

Thanks,
Geert-Jan

2008/1/17, Chris Hostetter <ho...@fucit.org>:
>
> : INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498,
> : ...(42 more)]} 0 875
> :
> : However, when timing this instruction on the client-side (I use SolrJ
> : --> req.process(server)) I get totally different numbers (in the
> : beginning the client-side measured time is about 2 seconds on average
> : but after some time this time goes up to about 30-40 seconds, although
> : the solr-outputted time stays between 0.8-1.3 seconds)?
>
> as Otis mentioned, that time is the raw processing of the request, not
> counting any network IO between the client and the server, or any time
> spent by the "ResponseWriter" formatting the response.  you can get more
> accurate numbers about exactly how long the server spent doing all of these
> things from the access log of your servlet container (which should be
> recording the time only after every last byte is written back to the
> client).
>
> that said: there's really no reason for as big a discrepancy as you are
> describing, particularly on updates where the ResponseWriter has almost
> nothing to do (30-40 seconds per update?!?!?!)
>
> I'm not very familiar with SolrJ, but are you by any chance using it in a
> way that sends a commit after every update command?  (commits can get
> successively longer as your index gets bigger.)
>
> : Does this have anything to do with costly IO-activity that is accounted
> : for in the SOLR output? If this is true, what tool do you recommend
> : using to monitor IO-activity?
>
> Which IO-activity are you talking about?
>
>
>
>
> -Hoss
>
>