Posted to ruby-dev@lucene.apache.org by Matt Mitchell <go...@gmail.com> on 2008/12/03 16:05:18 UTC

Re: quick jruby + solr benchmarks

Thanks Jamie. That's kind of shocking actually. What client library do you
use?

On Sun, Nov 30, 2008 at 1:38 PM, Jamie Orchard-Hays <ja...@dangosaur.us> wrote:

> Here's something to note when using net/http in Ruby (which open-uri
> wraps). Even though it's about as fast as the other options, it uses far
> more CPU than they do (on Ruby 1.8.6):
>
> http://apocryph.org/more_indepth_analysis_ruby_http_client_performance
>
>
>
> On Nov 26, 2008, at 12:06 PM, Matt Mitchell wrote:
>
>> Interesting. My main goal was to get a feel for how jruby and the
>> direct/embedded stuff compared to mri ruby and straight up http. But
>> obviously, the data and these tests are not realistic at all. Thanks for
>> your feedback guys.
>>
>> Matt
>>
>> On Wed, Nov 26, 2008 at 10:34 AM, Erik Hatcher
>> <er...@ehatchersolutions.com> wrote:
>>
>>> I just had a brief conversation with Yonik on this to get his way more
>>> expert opinion, and it really boils down to this in this particular
>>> test... the query itself is incredibly fast (1 millisecond or less QTime
>>> Solr reports) since there are no documents.  So what these differences
>>> are showing is merely the difference between HTTP and a method call -
>>> with nothing else (of note) going on.
>>>
>>> In a realer world scenario, the HTTP overhead makes less difference as
>>> the work being done in the query/faceting overshadows the communication
>>> overhead.
>>>
>>> There's lies, damned lies, and benchmarks :)
>>>
>>>      Erik
>>>
>>>
>>>
>>> On Nov 26, 2008, at 9:54 AM, Matt Mitchell wrote:
>>>
>>>> Yeah I overlooked all of that. Thanks Erik. So could a better query test
>>>> be an incremental one based on id like:
>>>>
>>>> 100.times do |id|
>>>>   q = "id:#{id}"
>>>>   # query request here...
>>>> end
>>>>
>>>> ?
>>>>
>>>> Would you happen to know why the solr home and data dir never really
>>>> change? Anytime I use commons http or embedded, a "solr" directory is
>>>> created in the same directory as my script, even though I'm setting the
>>>> home and data dir in my code.
>>>>
>>>> Matt
>>>>
>>>> On Wed, Nov 26, 2008 at 3:28 AM, Erik Hatcher
>>>> <erik@ehatchersolutions.com> wrote:
>>>>
>>>>> just a couple of quick code comments...
>>>>>
>>>>> On Nov 25, 2008, at 6:04 PM, Matt Mitchell wrote:
>>>>>
>>>>>> # EmbeddedSolrServer
>>>>>> def embedded(solr_home)
>>>>>>   @embedded ||= (
>>>>>>     import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer
>>>>>>     import org.apache.solr.core.CoreContainer
>>>>>>     import org.apache.solr.core.CoreDescriptor
>>>>>>     import org.apache.solr.client.solrj.SolrQuery
>>>>>>     core_name = 'main-core'
>>>>>>     container = CoreContainer.new
>>>>>>     descriptor = CoreDescriptor.new(container, core_name, solr_home)
>>>>>>     core = container.create(descriptor)
>>>>>
>>>>> You'll want to close that core, otherwise the JVM doesn't exit.  I
>>>>> changed this to:
>>>>>
>>>>> @core = ....
>>>>>
>>>>>>     container.register(core_name, core, false)
>>>>>
>>>>> and used @core there.
>>>>>
>>>>>> query = {'qt' => 'standard', 'q'=>'ipod', 'facet.field' => 'cat'}
>>>>>
>>>>> Note that faceting is not enabled unless there is also a &facet=on
>>>>>
>>>>>> params = hash_to_params(query)
>>>>>>
>>>>>> max = 1000
>>>>>>
>>>>>> Benchmark.bm do |x|
>>>>>>   x.report 'http commons' do
>>>>>>     max.times do
>>>>>>       http_commons.query(params)
>>>>>>     end
>>>>>>   end
>>>>>>   x.report 'embedded' do
>>>>>>     max.times do
>>>>>>       embedded(solr_home).query(params)
>>>>>>     end
>>>>>>   end
>>>>>> end
>>>>>
>>>>> And I added an:
>>>>>
>>>>> @core.close
>>>>>
>>>>> at the end.
>>>>>
>>>>>    Erik
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>
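
Erik's note about facet=on in the quoted thread can be illustrated with a small sketch. The hash_to_params helper is never shown in the thread, so the version below is a hypothetical stand-in that simply URL-encodes a flat hash; the real helper may well have built something else (for example SolrJ parameter objects).

```ruby
require 'uri'

# Hypothetical stand-in for the thread's hash_to_params helper (its real
# body is not shown anywhere in the thread). It URL-encodes a flat hash.
def hash_to_params(hash)
  URI.encode_www_form(hash)
end

query = {
  'qt'          => 'standard',
  'q'           => 'ipod',
  'facet'       => 'on',  # enables faceting; facet.field alone is not enough
  'facet.field' => 'cat'
}

params = hash_to_params(query)
puts params
```

The only change relative to the quoted query hash is the added 'facet' => 'on' entry.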

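Matt's idea of issuing a different id query on every iteration could be sketched roughly as follows. StubClient is a made-up placeholder that only records what it is asked, so the harness runs without a Solr server; it is an assumption for illustration, not any real client's API, and the timing it produces says nothing about Solr itself.

```ruby
require 'benchmark'

# Made-up placeholder client: records each query instead of calling Solr.
# Swap in a real client's query call to measure anything meaningful.
class StubClient
  attr_reader :seen

  def initialize
    @seen = []
  end

  def query(params)
    @seen << params
    :ok
  end
end

# Issue `count` queries, each for a different id, so that no two requests
# in the run are identical.
def run_id_queries(client, count)
  count.times do |id|
    client.query('q' => "id:#{id}")
  end
end

client = StubClient.new
result = Benchmark.measure { run_id_queries(client, 100) }
puts format('100 distinct queries: %.4fs wall', result.real)
```

As Erik points out in the thread, numbers from a run like this only become interesting once the index actually contains documents matching those ids.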
Re: quick jruby + solr benchmarks

Posted by Jamie Orchard-Hays <ja...@dangosaur.us>.
The other night I spent a few hours messing with EventMachine, Curb
(the libcurl Ruby lib) and RFuzz. EventMachine's HTTP2 is just missing
some of the POST features I need, and I didn't want to figure out how
to build what I needed from EventMachine's low-level features. RFuzz
works, but it would either crap out completely or go from well under a
second to 20+ seconds to complete a request. I suspect it's not
designed for the large POSTs I need. Curb (which is loaded with "require
'curl'"--why do some gem authors not name the gem and the library the
same dang thing???) works great. It's not any faster than net/http,
but judging from those tests, I should be saving a lot of CPU.

Jamie
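
The "same speed, different CPU load" distinction above can be checked with Ruby's standard Benchmark module, which reports CPU time (user + system) separately from wall-clock time. The sketch below is only an illustration: the cpu_and_wall helper is a hypothetical name, and the two workloads (a sleep and a busy loop) are stand-ins for a real HTTP request through whichever client is being compared.

```ruby
require 'benchmark'

# Hypothetical helper: runs the block once and returns CPU seconds
# (user + system) alongside wall-clock seconds.
def cpu_and_wall
  tms = Benchmark.measure { yield }
  { cpu: tms.utime + tms.stime, wall: tms.real }
end

# Stand-in workloads: one mostly waits, the other mostly computes.
waiting  = cpu_and_wall { sleep 0.2 }
spinning = cpu_and_wall { 200_000.times { |i| Math.sqrt(i) } }

puts format('waiting:  cpu=%.3fs wall=%.3fs', waiting[:cpu], waiting[:wall])
puts format('spinning: cpu=%.3fs wall=%.3fs', spinning[:cpu], spinning[:wall])
```

Wrapping the same request through two different HTTP clients in cpu_and_wall would show whether one burns noticeably more CPU for a similar wall time.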
