Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2005/05/17 04:41:34 UTC

Lucene vs. Ruby/Odeum

Some interesting stuff...

http://www.zedshaw.com/projects/ruby_odeum/performance.html
http://blog.innerewut.de/articles/2005/05/16/ruby-odeum-vs-apache-lucene


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Lucene vs. Ruby/Odeum

Posted by Paul Elschot <pa...@xs4all.nl>.
On Tuesday 17 May 2005 04:41, Otis Gospodnetic wrote:
> Some interesting stuff...
> 
> http://www.zedshaw.com/projects/ruby_odeum/performance.html
> http://blog.innerewut.de/articles/2005/05/16/ruby-odeum-vs-apache-lucene
> 

One explanation of JVM startup time is here:

http://www.suse.de/~bastian/Export/linking.txt

Another performance benchmark with Lucene, GCJ and Wikipedia:

http://www.spindazzle.org/green/index.php?m=20050511

Regards,
Paul Elschot




Re: Lucene vs. Ruby/Odeum

Posted by Wolfgang Hoschek <wh...@lbl.gov>.
Yep, if one set -Xmx32m, memory consumption would of course be
different. So it's really a "discovery" of the (default) Sun JVM GC
policy rather than anything Lucene-specific.
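
To see which heap limit a given run actually had, something like the following could be dropped next to a benchmark. It is only a sketch: the class name HeapReport is made up for illustration, and it relies on nothing beyond java.lang.Runtime.

```java
// Sketch only: HeapReport is a hypothetical name, not part of Lucene or of the benchmark.
class HeapReport {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the ceiling the heap may grow to (-Xmx, or the JVM default).
        System.out.println("max heap   : " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("total heap : " + toMiB(rt.totalMemory()) + " MiB");
        // total - free: heap occupied by objects, whether live or not yet collected.
        System.out.println("used heap  : " + toMiB(rt.totalMemory() - rt.freeMemory()) + " MiB");
    }
}
```

Running it once with "java HeapReport" and once with "java -Xmx32m HeapReport" should show the "max heap" line change to roughly the configured limit, while the other two lines depend on the JVM and its collector.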

It seems that benchmark results sometimes reflect a person's
familiarity (or lack thereof) with a tool more than anything else.

Oh well,
Wolfgang.


On Jun 1, 2005, at 3:07 PM, Daniel Naber wrote:

> On Tuesday 17 May 2005 04:41, Otis Gospodnetic wrote:
>
>> http://www.zedshaw.com/projects/ruby_odeum/performance.html
>
> Here's a follow up:
> http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html
>
> Now the claim is that Lucene is faster than Ruby/Odeum but it takes 36
> times more memory. However, I cannot find any information on how exactly
> Lucene was started. It's no surprise that Java requires much memory and
> doesn't clean up if it never comes close to the limit set with -Xmx.
>
> Regards
>  Daniel
>
> -- 
> http://www.danielnaber.de





Re: Lucene vs. Ruby/Odeum

Posted by Wolfgang Hoschek <wh...@lbl.gov>.
> As an aside, in my performance testing of Lucene using JProfiler, it seems
> to me that the only way to improve Lucene's performance greatly can come
> from 2 areas
>
> 1. optimizing the JVM array/looping/JIT constructs/capabilities to avoid
> bounds checking/improve performance
> 2. improve function call overhead
>
> Other than that, other changes will require a significant change in the code
> structure (manually unrolling loops), at the sacrifice of
> readability/maintainability.

Just curious: are you happier with JProfiler than with the JDK 1.5
profiler?

I haven't used JProfiler in quite a while, but my impression back then
was that its overheads tend to significantly perturb measurement
results. When I switched to the low-level JDK 1.5 profiler, CPU tuning
efforts got a lot more targeted and meaningful.

So, in my experience, the least perturbing and most accurate profiler
is the one built into JDK 1.5. Run java with the
'-server -agentlib:hprof=cpu=samples,depth=10' flags for long enough
to collect enough samples to be statistically meaningful, then study
the trace log and correlate its hotspot trailer with its call stack
headers (grep is your friend, a GUI isn't really needed). For a
background article on hprof see
http://java.sun.com/developer/technicalArticles/Programming/HPROF.html
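
For anyone who wants to try those flags on something small first, a throwaway CPU-bound target might look like the sketch below. The class HprofTarget and its workload are invented for illustration; the flags in the comment are the ones given above and assume a JVM that still ships the hprof agent.

```java
// Sketch only: HprofTarget is a hypothetical workload, not part of Lucene.
// Assumed invocation, on a JVM that ships the hprof agent:
//   java -server -agentlib:hprof=cpu=samples,depth=10 HprofTarget
// hprof writes its report to java.hprof.txt by default.
class HprofTarget {
    // Deliberately hot method so that it dominates the CPU samples.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        // Run long enough for the sampler to collect a meaningful number of samples.
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(100_000);
        }
        // Print the result so the work cannot be optimized away entirely.
        System.out.println(checksum);
    }
}
```

In the resulting trace, sumOfSquares should sit at or near the top of the CPU samples trailer; grepping for its trace number then leads to the matching call stack.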

Wolfgang.



Re: Lucene vs. Ruby/Odeum

Posted by Wolfgang Hoschek <wh...@lbl.gov>.
> poor java startup time

For those really keen on reducing startup time, the Jolt Java VM
daemon may perhaps be of some interest:
http://www.dystance.net/software/jolt/index.html

I played with it a year ago when I was curious to see what could be  
done about startup time in the context of simple unix-scriptable  
command line XML webservice clients (the ones that require tons of  
jars as dependencies and take ages to initialize). Startup time went  
from 3-5 secs to zero. Feels like "ls" - you hit ENTER and the  
program completes *instantly*. Of course there's a catch. It requires  
some more work, and it's not a general solution wrt. isolation,  
security, reliability, etc. but for a simple command line lucene  
query tool it might just do fine, FWIW.

Long-term Sun's MVM might be a more comprehensive solution, with some  
luck.

Wolfgang.



RE: Lucene vs. Ruby/Odeum

Posted by Robert Engels <re...@ix.netcom.com>.
One more thing, I did some simple tests with my caching enhancements. Using
a similar test (performing the search for the same word over and over),
there was a 100% performance improvement, so I would expect Lucene to
blow the doors off Odeum in this case.

This is why 'test cases' are so easy to manipulate. I am sure there are
parameters for Odeum that allow you to increase its index & data block
cache sizes, but the minimum/defaults may be enough to hold all of the data
necessary for the test. As the test coverage gets wider, allocating more
buffer space will usually compensate, and give similar performance numbers.
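
The repeated-query effect is easy to reproduce in isolation. The sketch below is not the caching enhancement mentioned above and not Odeum's block cache; LruCache, CachedLookupDemo and slowLookup are hypothetical names, and the "slow" lookup is a stub standing in for a trip to the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a tiny least-recently-used cache built on LinkedHashMap.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true keeps entries in LRU order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry when full
    }
}

class CachedLookupDemo {
    static int slowLookups = 0;

    // Hypothetical stand-in for an expensive read from the index.
    static String slowLookup(String term) {
        slowLookups++;
        return "postings(" + term + ")";
    }

    // Serve from the cache when possible, otherwise do the slow lookup once.
    static String cachedLookup(Map<String, String> cache, String term) {
        String hit = cache.get(term);
        if (hit == null) {
            hit = slowLookup(term);
            cache.put(term, hit);
        }
        return hit;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new LruCache<>(2);
        for (int i = 0; i < 1000; i++) {
            cachedLookup(cache, "lucene"); // same word over and over
        }
        System.out.println("slow lookups: " + slowLookups); // prints "slow lookups: 1"
    }
}
```

With a single repeated word even a two-entry cache absorbs every query after the first, which is why such a test says little about behavior on a wide query mix.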




RE: Lucene vs. Ruby/Odeum

Posted by Robert Engels <re...@ix.netcom.com>.
There are still very SIGNIFICANT problems with his tests.

1. The environment is not "real", except for possibly desktop searching.
Whether the JVM needs 64m or 128m to perform adequately is immaterial, given
the price of RAM and the ease of expansion. It would be akin to saying
"let's all try to write programs that work with 64k of memory", why
needlessly constrain yourself? I would take performance, readability,
maintainability, and reliability over memory consumption any day. If the
point of the test is to show that a Java based searching system needs more
memory than a script language on top of a C db library, who cares? The
smallest possible Java program I can write shows a VM size of 8.5 MB under
Windows XP - which is larger than the whole of Odeum/Ruby. There is a lot
that the Java system provides that isn't useful in this particular use case,
but I run Lucene in a multithreaded, server environment, with a test index
of 350mb, and I can run it in as little as 19mb - but why would I want to?

2. The search is always for the same word. The Odeum database based version
will almost certainly cache all the required data and index blocks in memory
after the first run, avoiding all calls to the OS. Since Lucene performs no
local caching (without my mods), it will ALWAYS require trips to the OS.
Also, each run of Lucene is going to generate garbage without question. A
properly designed non-Java db will almost certainly generate no increased
memory usage in the constrained case. Running the tests using multiple
threads on random words would be far more interesting.
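
A harness along those lines could be sketched as follows. Everything here is an assumption made for illustration: RandomWordLoad is a made-up name, search() is a stub to be replaced by the real query call, and the vocabulary is a placeholder for words drawn from the indexed corpus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: several threads issuing searches for randomly chosen words.
class RandomWordLoad {
    // Hypothetical stand-in for a real search call; returns a fake hit count.
    static int search(String word) {
        return word.length();
    }

    // Runs `threads` workers, each issuing `queriesPerThread` random-word searches.
    static long run(String[] vocabulary, int threads, int queriesPerThread, long seed) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final Random random = new Random(seed + t); // one generator per thread
                Callable<Long> worker = () -> {
                    long hits = 0;
                    for (int q = 0; q < queriesPerThread; q++) {
                        hits += search(vocabulary[random.nextInt(vocabulary.length)]);
                    }
                    return hits;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                try {
                    total += result.get(); // wait for each worker to finish
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] vocabulary = {"lucene", "odeum", "ruby", "java", "index"};
        long start = System.nanoTime();
        long hits = run(vocabulary, 4, 10_000, 42L);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total hits " + hits + " in " + elapsedMs + " ms");
    }
}
```

Because each query picks a different word, a cache sized for one term no longer hides the cost of going back to the index, and the thread count exercises the concurrent case a server would see.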

Lastly, for what it's worth (and that's probably not much!) - if Odeum were
the "better search engine", you could do a Java -> Odeum mapping and I
GUARANTEE the Java implementation using the latest JIT JVM compilers will
be faster than the Ruby one. Also, where's my cross platform GUI for
displaying the search results?

The best developers attempt to use the right tool for the job. Ruby is a
GREAT scripting language, and is perfect for all the things scripting
languages are good for. Let's leave it at that.



Re: Lucene vs. Ruby/Odeum

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Zed has updated his second part with more experiments with different
JVMs and memory settings:

     http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html

On Jun 2, 2005, at 12:27 AM, Robert Engels wrote:
> I read all of Zed's posts on the subject and I feel he certainly presents a
> strong anti-Java

Most definitely an anti-Java leaning - but at least he's working on  
being objective about it by measuring things :)

> , if not anti-Lucene bias - maybe just pro Ruby.

He's quite pro-Lucene, and most definitely pro-Ruby.  I put myself
in those categories too.

> If you do not even adhere to the principal designer's "guidelines to proper
> usage", your tests are meaningless. It's akin to using a new flat screen
> monitor and claiming "boy, it has a fuzzy picture", because you didn't
> follow the instructions that said "remove protective film before using".

I concur with your sentiment and I've done what I can via e-mail with
him to educate him on my experience with Lucene and JVM garbage
collection.  I'd encourage anyone who has the time and
inclination to take him up on the request to show how to do it better
since he's made his code available.

> Zed is using a very constrained test - which is probably very UNCOMMON in
> the real world of server based systems, to attempt to discern the relative
> performance characteristics of Lucene/Java/Ruby/etc. The tests may be
> applicable in his poorly designed environment, but he presents his limited
> finding as "gospel", and that it should hold true in all cases. I quote...
> "For the people who have no clue (also known as "Executives") here's the
> information you need to tell all your employees they need to adopt the
> latest and greatest thing without ever having to understand anything you
> read. Cheaper than an article in CIO magazine and even has big words like
> "standard deviation"." and then goes on to present his "statistically
> correct" performance numbers.

Don't get me wrong - Zed is using inflammatory language.  We should  
work to not lower ourselves to speaking in that same tone but rather  
objectively and nicely point out the errors of his ways.  He's open  
to that despite his caustic tone - at least from the e-mail exchanges  
I've had with him.

     Erik




RE: Lucene vs. Ruby/Odeum

Posted by Robert Engels <re...@ix.netcom.com>.
Sorry if you thought my comment was destructive or counter-productive.

I read all of Zed's posts on the subject and I feel he certainly presents a
strong anti-Java, if not anti-Lucene bias - maybe just pro Ruby. The funny
thing is that I am not a Java zealot by any means, and I am a firm believer
in the "right tool for the job", but Zed's analysis is similar to testing
screwdrivers, and then determining that one hammers nails way better than
another.

If you do not even adhere to the principal designer's "guidelines to proper
usage", your tests are meaningless. It's akin to using a new flat screen
monitor and claiming "boy, it has a fuzzy picture", because you didn't
follow the instructions that said "remove protective film before using".

I just get frustrated when people use their "advanced methods" to prove
their point (even though the statistics are very basic), but avoid the use
of common sense. The adage "garbage in, garbage out" will always hold true.

Zed is using a very constrained test - which is probably very UNCOMMON in
the real world of server based systems, to attempt to discern the relative
performance characteristics of Lucene/Java/Ruby/etc. The tests may be
applicable in his poorly designed environment, but he presents his limited
finding as "gospel", and that it should hold true in all cases. I quote...
"For the people who have no clue (also known as "Executives") here's the
information you need to tell all your employees they need to adopt the
latest and greatest thing without ever having to understand anything you
read. Cheaper than an article in CIO magazine and even has big words like
"standard deviation"." and then goes on to present his "statistically
correct" performance numbers.

As an aside, in my performance testing of Lucene using JProfiler, it seems
to me that the only ways to greatly improve Lucene's performance lie in
2 areas:

1. optimizing the JVM array/looping/JIT constructs/capabilities to avoid
bounds checking/improve performance
2. reducing function call overhead

Other than that, other changes will require a significant change in the code
structure (manually unrolling loops), at the sacrifice of
readability/maintainability.
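
To make the trade-off concrete, here is what manual unrolling does to even a trivial loop. This is an illustrative sketch with made-up names (UnrollDemo, sumSimple, sumUnrolled), not code taken from Lucene, and whether the unrolled form is actually faster depends on the JVM, since a JIT compiler may already unroll the simple loop and remove its bounds checks on its own.

```java
// Sketch only: the same summation written plainly and manually unrolled by four.
class UnrollDemo {
    // Straightforward loop: one array read per iteration.
    static long sumSimple(int[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Manually unrolled: four reads per iteration, then a tail loop for leftovers.
    static long sumUnrolled(int[] values) {
        long sum = 0;
        int i = 0;
        int limit = values.length - 3; // while i < limit, indices i..i+3 are valid
        for (; i < limit; i += 4) {
            sum += values[i];
            sum += values[i + 1];
            sum += values[i + 2];
            sum += values[i + 3];
        }
        for (; i < values.length; i++) { // remaining 0 to 3 elements
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = new int[10];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sumSimple(values) + " " + sumUnrolled(values)); // prints "55 55"
    }
}
```

The second method is roughly twice as long and has an extra edge case to get right, which is the readability/maintainability cost referred to above.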

R




Re: Lucene vs. Ruby/Odeum

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Robert - Please tone it down.  Zed is aware of this thread and  
perhaps even seeing this message.  There is no need to resort to such  
verbiage - Zed and I have been communicating and he is a fan of  
Lucene and has proven in his last entry that Lucene is faster than  
Ruby/Odeum even with the massive memory issue he notes (and has been  
properly informed of what he's doing incorrectly in that situation).

Speaking for myself - I want the most accurate, flexible, and fastest  
search system possible regardless of platform or language.  Certainly  
I want it to be Lucene, but I welcome competition and those that go  
to the extensive effort of collecting data and making studies such as  
Zed has.  The Lucene community can help keep this type of competition  
healthy and positive by educating folks in proper Lucene usage and  
responding in kind regardless of the mistakes, attitudes, or
flame-bait we may encounter.

     Erik

On Jun 1, 2005, at 7:48 PM, Robert Engels wrote:

> I think I am going to start a new Blog - "Zed's an Idiot".


RE: Lucene vs. Ruby/Odeum

Posted by Robert Engels <re...@ix.netcom.com>.
I think I am going to start a new Blog - "Zed's an Idiot".



Re: Lucene vs. Ruby/Odeum

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 1, 2005, at 6:07 PM, Daniel Naber wrote:

> On Tuesday 17 May 2005 04:41, Otis Gospodnetic wrote:
>
>> http://www.zedshaw.com/projects/ruby_odeum/performance.html
>
> Here's a follow up:
> http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html
>
> Now the claim is that Lucene is faster than Ruby/Odeum but it takes 36
> times more memory. However, I cannot find any information on how exactly
> Lucene was started. It's no surprise that Java requires much memory and
> doesn't clean up if it never comes close to the limit set with -Xmx.

I went around several times in e-mail with Zed, the author of this
comparison, after his follow-up.  His paraphrasing of me in there is
only partially what I said to him.  He's instantiating an
IndexSearcher inside a tight loop, which I told him was a very bad
thing to do with Lucene, and his loops are so tight that garbage
collection isn't getting a chance to kick in.  He doesn't currently
believe some of this from me, and also feels that adjusting the code
to make Lucene happy is being unfair.
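
The difference between the two loop shapes can be shown without Lucene on the classpath. In the sketch below FakeSearcher is a hypothetical stand-in that only counts how often it is opened; it is not Lucene's IndexSearcher, and the method names are invented for illustration.

```java
// Sketch only: opening a costly searcher per query versus opening it once.
class SearcherReuseDemo {
    // Hypothetical stand-in for a searcher that is costly to open.
    // This is NOT Lucene's IndexSearcher; it only counts how often it is opened.
    static final class FakeSearcher implements AutoCloseable {
        static int opened = 0;
        // Simulates the buffers a real searcher would allocate when opened.
        private final byte[] buffers = new byte[64 * 1024];

        FakeSearcher() {
            opened++;
        }

        // Fake "search": returns a hit count derived from the term.
        int search(String term) {
            return buffers.length > 0 ? term.length() : 0;
        }

        @Override
        public void close() {
            // A real searcher would release file handles here.
        }
    }

    // Anti-pattern: a new searcher, and fresh garbage, on every iteration.
    static int searchOpeningEachTime(String term, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            try (FakeSearcher searcher = new FakeSearcher()) {
                hits += searcher.search(term);
            }
        }
        return hits;
    }

    // Preferred: open once, reuse for every query, close at the end.
    static int searchReusing(String term, int iterations) {
        int hits = 0;
        try (FakeSearcher searcher = new FakeSearcher()) {
            for (int i = 0; i < iterations; i++) {
                hits += searcher.search(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        FakeSearcher.opened = 0;
        searchOpeningEachTime("lucene", 1000);
        System.out.println("opened per iteration: " + FakeSearcher.opened); // 1000
        FakeSearcher.opened = 0;
        searchReusing("lucene", 1000);
        System.out.println("opened when reused:   " + FakeSearcher.opened); // 1
    }
}
```

Both methods return the same hits; only the first one allocates a fresh set of buffers a thousand times, which is the kind of garbage that inflates the measured memory footprint.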

I wish the RubyLucene folks would hurry up and get a port over there  
so that we could compare against Ruby/Odeum "fairly" :)

     Erik




RE: Lucene vs. Ruby/Odeum

Posted by Robert Engels <re...@ix.netcom.com>.
One more thing, his question of why returning 20 documents would perform
any better than returning all of them (paraphrased) shows complete
ignorance of proper Lucene usage.
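
One plausible reason a 20-hit limit is cheaper: only a bounded set of best hits has to be tracked, and only those documents need to be loaded afterwards. The sketch below illustrates the bounded-heap idea with made-up scores; TopKDemo and topK are hypothetical names and this is not Lucene's own code.

```java
import java.util.PriorityQueue;

// Sketch only: keep the k highest scores using a min-heap that never exceeds k entries.
class TopKDemo {
    static float[] topK(float[] scores, int k) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // smallest score on top
        for (float score : scores) {
            if (heap.size() < k) {
                heap.add(score);
            } else if (k > 0 && score > heap.peek()) {
                heap.poll();     // drop the weakest of the current top k
                heap.add(score);
            }
        }
        float[] result = new float[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll(); // smallest comes out first, so fill from the end
        }
        return result; // best score first
    }

    public static void main(String[] args) {
        float[] scores = {0.1f, 0.9f, 0.5f, 0.7f};
        float[] best = topK(scores, 2);
        System.out.println(best[0] + " " + best[1]); // prints "0.9 0.7"
    }
}
```

The heap holds at most k entries however many documents match, so memory and per-hit work stay bounded, whereas returning everything means materializing every matching document.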

R



RE: Lucene vs. Ruby/Odeum

Posted by Robert Engels <re...@ix.netcom.com>.
I read his complete article... he still doesn't have a clue.

Opening and closing the IndexReaders just creates garbage, which distorts
the memory consumption and ruins the performance - it is akin to starting
it from the command line to perform a search. As you state, the Java memory
consumption will continue to grow until it hits the -Xmx limit before the
JVM even attempts to purge, and even then it may not release the memory
back to the OS. Even small one-time-use strings will eventually show up as
huge memory consumption until the runtime needs the memory.
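
A minimal sketch of the reader-reuse point above, not taken from the
benchmark or from Lucene itself. FakeReader is a hypothetical stand-in for
an index reader whose constructor allocates a buffer; the class and method
names are assumptions made for illustration only. It counts how many
readers get created when one is opened per query versus opened once.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderReuseSketch {
    // Hypothetical stand-in for an index reader (NOT the Lucene API):
    // opening it allocates a buffer that becomes garbage once the reader is dropped.
    static final class FakeReader {
        static final AtomicInteger OPENED = new AtomicInteger();
        private final byte[] termTable = new byte[64 * 1024];

        FakeReader() {
            OPENED.incrementAndGet();
        }

        int search(int term) {
            return termTable[term % termTable.length] + term;
        }
    }

    // Pattern criticized above: a fresh reader for every query.
    static int searchReopening(int queries) {
        int sum = 0;
        for (int i = 0; i < queries; i++) {
            FakeReader reader = new FakeReader();
            sum += reader.search(i);
        }
        return sum;
    }

    // Long-running pattern: open once, reuse for every query.
    static int searchReusing(int queries) {
        FakeReader reader = new FakeReader();
        int sum = 0;
        for (int i = 0; i < queries; i++) {
            sum += reader.search(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int queries = 1000;

        FakeReader.OPENED.set(0);
        searchReopening(queries);
        int reopened = FakeReader.OPENED.get();

        FakeReader.OPENED.set(0);
        searchReusing(queries);
        int reused = FakeReader.OPENED.get();

        System.out.println("readers opened when reopening: " + reopened);
        System.out.println("readers opened when reusing:   " + reused);
    }
}
```

Both methods return the same results; the only difference is how many
short-lived buffers are left behind for the garbage collector.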

My understanding from reviewing the Lucene code is that Lucene caches
VERY LITTLE information in memory (seemingly just the skip table for terms)
and relies on the OS's disk cache for performance, other than the new
enhancements I posted that move a cache into the Lucene directory. Maybe
somebody has more information here?

Robert Engels

-----Original Message-----
From: Daniel Naber [mailto:lucenelist@danielnaber.de]
Sent: Wednesday, June 01, 2005 5:07 PM
To: java-dev@lucene.apache.org
Subject: Re: Lucene vs. Ruby/Odeum


On Tuesday 17 May 2005 04:41, Otis Gospodnetic wrote:

> http://www.zedshaw.com/projects/ruby_odeum/performance.html

Here's a follow up:
http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html

Now the claim is that Lucene is faster than Ruby/Odeum but it takes 36
times more memory. However, I cannot find any information on how exactly
Lucene was started. It's no surprise that Java requires much memory and
doesn't clean up if it never comes close to the limit set with -Xmx.

Regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Lucene vs. Ruby/Odeum

Posted by Daniel Naber <lu...@danielnaber.de>.
On Tuesday 17 May 2005 04:41, Otis Gospodnetic wrote:

> http://www.zedshaw.com/projects/ruby_odeum/performance.html

Here's a follow up:
http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html

Now the claim is that Lucene is faster than Ruby/Odeum but it takes 36 
times more memory. However, I cannot find any information on how exactly 
Lucene was started. It's no surprise that Java requires much memory and 
doesn't clean up if it never comes close to the limit set with -Xmx.
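
A small sketch of how the numbers behind this point could be inspected,
assuming nothing about how the benchmark was actually run. HeapReport is a
hypothetical class name; it only uses the standard java.lang.Runtime
methods. It reports used, committed, and maximum heap before and after
creating some short-lived strings.

```java
public class HeapReport {
    // Returns {usedBytes, committedBytes, maxBytes} as reported by the JVM.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();        // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();  // includes garbage not yet collected
        long max = rt.maxMemory();                // upper bound, governed by -Xmx
        return new long[] { used, committed, max };
    }

    static void print(String label, long[] s) {
        System.out.printf("%s: used=%d KB, committed=%d KB, max=%d KB%n",
                label, s[0] / 1024, s[1] / 1024, s[2] / 1024);
    }

    public static void main(String[] args) {
        print("before", snapshot());

        // Create short-lived garbage, similar to one-time-use strings.
        int nonEmpty = 0;
        for (int i = 0; i < 200_000; i++) {
            String s = "query-" + i;
            if (!s.isEmpty()) {
                nonEmpty++;
            }
        }

        print("after " + nonEmpty + " strings", snapshot());
    }
}
```

Running it once with the default settings and once with a small limit such
as -Xmx32m should make it easier to see how much of the reported footprint
comes from the heap limit rather than from live data.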

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Lucene vs. Ruby/Odeum

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 16, 2005, at 10:41 PM, Otis Gospodnetic wrote:
> Some interesting stuff...
>
> http://www.zedshaw.com/projects/ruby_odeum/performance.html

That's nice flamebait for sure.  The fact of the matter is that JVM
startup speed is a well-known issue, and to truly compare
indexing/searching speed the startup of the Ruby interpreter or JVM
should not be included.

Practically though, if your searches are occurring from command-line  
tools then certainly the startup of the VM is a factor to consider,  
sure.  But in the Java world, most uses of Lucene would not be from a  
command-line tool in this manner.

All that said - Odeum and the Ruby wrapper to it appear to be well  
done and deserve great credit.

I am working on phasing more of my work into Ruby, so full-text
search in that environment is important to me (even if it means
having a Java Lucene server communicating with Ruby).

The Ruby (and Rails) hype is a bit too antagonistic towards Java, and  
the same is true in reverse with a lot of Java folks defensive  
against Ruby.  Ruby is a great language and I likely will be  
developing more and more there.

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Lucene vs. Ruby/Odeum

Posted by Wolfgang Hoschek <wh...@lbl.gov>.
Right. One doesn't need to run those benchmarks to immediately see
that most time is spent in VM startup, class loading, and HotSpot
compilation rather than in anything Lucene related. Even a simple
System.out.println("hello") typically takes some 0.3 secs on a fast
box and JVM.

Wolfgang.

On May 17, 2005, at 7:33 AM, Scott Ganyo wrote:

> Interesting, but questionable.  I can imagine three problems with  
> the write-up just off-hand:
>
> 1) JVM startup time.  As the author noted, this can be an issue  
> with short-running Java applications.
>
> 2) JVM warm-up time.  The HotSpot VM is designed to optimize itself  
> and become faster over time rather than being the fastest right out  
> of the blocks.
>
> 3) Data access patterns.  It is possible (I don't know) that Odeum  
> is designed for quick one-time search on the data without reading  
> and caching the index like Lucene does for subsequent queries.
>
> In each case, there is a common theme:  Lucene and Java are  
> designed to perform better for longer-running applications... not  
> start, lookup, and terminate utilities.
>
> S
>
> On May 16, 2005, at 9:41 PM, Otis Gospodnetic wrote:
>
>
>> Some interesting stuff...
>>
>> http://www.zedshaw.com/projects/ruby_odeum/performance.html
>> http://blog.innerewut.de/articles/2005/05/16/ruby-odeum-vs-apache-lucene
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
>


Re: Lucene vs. Ruby/Odeum

Posted by Scott Ganyo <sc...@ganyo.com>.
Interesting, but questionable.  I can imagine three problems with the  
write-up just off-hand:

1) JVM startup time.  As the author noted, this can be an issue with  
short-running Java applications.

2) JVM warm-up time.  The HotSpot VM is designed to optimize itself  
and become faster over time rather than being the fastest right out  
of the blocks.

3) Data access patterns.  It is possible (I don't know) that Odeum is  
designed for quick one-time search on the data without reading and  
caching the index like Lucene does for subsequent queries.

In each case, there is a common theme:  Lucene and Java are designed  
to perform better for longer-running applications... not start,  
lookup, and terminate utilities.
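
A minimal sketch of how startup and warm-up could be kept out of a
measurement, under the assumption that the search runs inside an already
started JVM. WarmupTiming and workload() are hypothetical names; the
workload is an arbitrary deterministic loop standing in for a query, not
Lucene code. Each round is timed separately so early (cold) rounds can be
compared with later ones.

```java
public class WarmupTiming {
    // Hypothetical stand-in for a search call; any deterministic workload would do.
    static long workload(int n) {
        long hash = 17;
        for (int i = 0; i < n; i++) {
            hash = hash * 31 + (i ^ (hash >>> 7));
        }
        return hash;
    }

    // Runs the workload `rounds` times and returns elapsed nanoseconds per round.
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += workload(n);
            nanos[r] = System.nanoTime() - start;
        }
        // Use the result so the JIT cannot discard the work entirely.
        if (sink == 42) {
            System.out.println("unlikely sink value");
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(20, 1_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + (nanos[r] / 1000) + " us");
        }
    }
}
```

Because the timer starts inside the process, JVM startup is excluded by
construction, and comparing the first rounds with the last ones gives a
rough idea of the warm-up effect on a given machine.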

S

On May 16, 2005, at 9:41 PM, Otis Gospodnetic wrote:

> Some interesting stuff...
>
> http://www.zedshaw.com/projects/ruby_odeum/performance.html
> http://blog.innerewut.de/articles/2005/05/16/ruby-odeum-vs-apache-lucene
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

