Posted to java-user@lucene.apache.org by Doug Cutting <cu...@apache.org> on 2004/07/01 21:24:11 UTC

Re: Making a case for Lucene

 > The best example that I've been able to find is the Yahoo research
 > lab - as I understand it, this is a Nutch (i.e. Lucene)
 > implementation that's providing impressive performance over a
 > 100 million document repository.

This demo runs on a handful of boxes.  It was originally running on 
three dual-processor boxes, but I think Yahoo! subsequently moved it to 
six or eight single-processor boxes.  Queries are broadcast to all 
servers, and the top-scoring matches overall are presented.
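
The broadcast-and-merge step described above can be sketched roughly as follows. This is only an illustration, not the Nutch or Yahoo! code: the `ShardMerge` class, the `Hit` type and the `mergeTopK` method are hypothetical names, and the sketch assumes that scores returned by different servers are directly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merge per-server result lists into one overall top-k list. */
final class ShardMerge {

    /** One scored match as returned by a single search server (hypothetical type). */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Returns the k highest-scoring hits across all servers, best first.
     * Assumes scores from different servers are comparable with each other.
     */
    static List<Hit> mergeTopK(List<List<Hit>> perServerHits, int k) {
        List<Hit> merged = new ArrayList<>();
        if (k <= 0) {
            return merged;
        }
        // Min-heap ordered by score: the weakest of the current best k sits on top.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perServerHits) {
            for (Hit hit : hits) {
                heap.offer(hit);
                if (heap.size() > k) {
                    heap.poll(); // evict the lowest-scoring hit seen so far
                }
            }
        }
        merged.addAll(heap);
        // Present the survivors in descending score order.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }
}
```

In a real deployment each inner list would come back from a remote server; here the lists are plain in-memory values so the merge logic can be read on its own.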

In Nutch-based benchmarks, we found that a single-processor box with 4GB 
of memory and a 2M page Nutch index (i.e., the entire index fits in RAM) 
could handle over 20 Nutch searches/second.  A box with 1GB of memory 
and a 20M page Nutch index (i.e., the entire index does not fit in 
memory) could only handle around 1 or 2 Nutch searches/second.  These 
benchmarks were run with Lucene 1.3.  Lucene 1.4 should be somewhat faster. 
Performance will obviously vary with processor speed, disk speed, 
average document size, average number of terms per query, etc.
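
As a rough back-of-the-envelope reading of these figures: if every query is broadcast to all boxes and each box holds an index small enough to fit in RAM (about 2M pages on a 4GB box in the benchmark above), then the number of boxes is driven by collection size. The sketch below only does that division. The `CapacitySketch` class and `serversNeeded` method are hypothetical names, and the calculation assumes pages can be partitioned evenly and ignores merge and network overhead.

```java
/** Hypothetical sketch: estimate how many boxes an evenly partitioned index needs. */
final class CapacitySketch {

    /** Ceiling of totalPages / pagesPerServer, assuming pages split evenly across boxes. */
    static long serversNeeded(long totalPages, long pagesPerServer) {
        if (totalPages < 0 || pagesPerServer <= 0) {
            throw new IllegalArgumentException("totalPages must be >= 0 and pagesPerServer > 0");
        }
        return (totalPages + pagesPerServer - 1) / pagesPerServer;
    }

    public static void main(String[] args) {
        // 100 million pages at 2 million pages per in-RAM box (figures from this thread).
        System.out.println(serversNeeded(100_000_000L, 2_000_000L)); // prints 50
    }
}
```

Under the same assumptions, overall query throughput would stay close to the per-box rate rather than multiply, because each query is handled by every box.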

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Visualization of Lucene search results with a treemap

Posted by Stefan Groschupf <sg...@media-style.com>.
>
>> Do you know:
>> http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html ?
>
> Interesting - is there any code avail to draw the maps?

The algorithm is described here:
http://www.cis.hut.fi/research/som-research/book/

A short summary and some sample code is available here:

http://davis.wpi.edu/~matt/courses/soms/

Some more interesting papers about visualization are available at the 
text-mining.org community page:
http://www.text-mining.org/index.jsp?folderPK=793


Happy hacking! :-)

Stefan
---------------------------------------------------------------
enterprise information technology consulting
open technology:   http://www.media-style.com
open source:           http://www.weta-group.net
open discussion:    http://www.text-mining.org




Re: Visualization of Lucene search results with a treemap

Posted by David Spencer <da...@tropo.com>.
Stefan Groschupf wrote:

> Dave,
> cool stuff, think about contributing that to Nutch.. ;-)!

Well the code is very generic - basically 1 method that takes a 
Searcher, a Query, the # of cells to show, and the size of the diagram. 
Technically I think it would be a Lucene sandbox contribution - but - 
for my site I do want to convert the custom spider/cache to use Nutch...

> Do you know:
> http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html ?

Interesting - is there any code avail to draw the maps?

thx,
  Dave

> 
> Cheers,
> Stefan
> 
> Am 01.07.2004 um 23:28 schrieb David Spencer:
> 
>>
>> Inspired by these guys who put results from Google into a treemap...
>> http://google.hivegroup.com/
>>
>> I did up my own version running against my index of OSS/javadoc trees.
>> This query for "thread pool" shows it off nicely:
>>
>> http://www.searchmorph.com/kat/tsearch.jsp?s=thread%20pool&side=300&goal=500
>>
>> This is the empty search form:
>>
>> http://www.searchmorph.com/kat/tsearch.jsp
>>
>> And the weblog entry has a few more links, esp useful if you don't  
>> know what a treemap is:
>>
>> http://searchmorph.com/weblog/index.php?id=18
>>
>> Oh: As a start, a treemap is a visualization technique, not  
>> java.util.TreeMap. Bigger boxes show a higher score, and x,y location  
>> has no significance.
>>
>> Enjoy,
>>   Dave
>>




Re: Visualization of Lucene search results with a treemap

Posted by Stefan Groschupf <sg...@media-style.com>.
Dave,
cool stuff, think about contributing that to Nutch.. ;-)!
Do you know:
http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html ?

Cheers,
Stefan

Am 01.07.2004 um 23:28 schrieb David Spencer:

>
> Inspired by these guys who put results from Google into a treemap...
> http://google.hivegroup.com/
>
> I did up my own version running against my index of OSS/javadoc trees.
> This query for "thread pool" shows it off nicely:
>
> http://www.searchmorph.com/kat/tsearch.jsp?s=thread%20pool&side=300&goal=500
>
> This is the empty search form:
>
> http://www.searchmorph.com/kat/tsearch.jsp
>
> And the weblog entry has a few more links, esp useful if you don't  
> know what a treemap is:
>
> http://searchmorph.com/weblog/index.php?id=18
>
> Oh: As a start, a treemap is a visualization technique, not  
> java.util.TreeMap. Bigger boxes show a higher score, and x,y location  
> has no significance.
>
> Enjoy,
>   Dave
>
---------------------------------------------------------------
enterprise information technology consulting
open technology:   http://www.media-style.com
open source:           http://www.weta-group.net
open discussion:    http://www.text-mining.org




Visualization of Lucene search results with a treemap

Posted by David Spencer <da...@tropo.com>.
Inspired by these guys who put results from Google into a treemap...
http://google.hivegroup.com/

I did up my own version running against my index of OSS/javadoc trees.
This query for "thread pool" shows it off nicely:

http://www.searchmorph.com/kat/tsearch.jsp?s=thread%20pool&side=300&goal=500

This is the empty search form:

http://www.searchmorph.com/kat/tsearch.jsp

And the weblog entry has a few more links, esp useful if you don't know 
what a treemap is:

http://searchmorph.com/weblog/index.php?id=18

Oh: As a start, a treemap is a visualization technique, not 
java.util.TreeMap. Bigger boxes show a higher score, and x,y location 
has no significance.
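
To make the "bigger boxes show a higher score" idea concrete, here is a minimal sketch of a single-level slice layout, in which each cell gets an area proportional to its score. It is not the code behind the demo above, and it is simpler than the squarified layouts that treemap tools often use. The `TreemapSketch` class, the `Cell` type and the `layout` method are hypothetical; the parameters loosely follow the "number of cells" and "size of the diagram" inputs mentioned in this thread, with plain scores standing in for the results of a search.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one-level slice treemap with cell area proportional to score. */
final class TreemapSketch {

    /** Axis-aligned rectangle for one result (hypothetical type). */
    static final class Cell {
        final double x;
        final double y;
        final double width;
        final double height;

        Cell(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        double area() {
            return width * height;
        }
    }

    /**
     * Lays out the first maxCells scores as vertical strips inside a side x side square.
     * Every strip has full height, so its width (and therefore its area) is
     * proportional to its share of the total score.
     */
    static List<Cell> layout(double[] scores, int maxCells, double side) {
        List<Cell> cells = new ArrayList<>();
        int n = Math.min(maxCells, scores.length);
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += scores[i];
        }
        if (n <= 0 || total <= 0) {
            return cells; // nothing to draw
        }
        double x = 0;
        for (int i = 0; i < n; i++) {
            double width = side * scores[i] / total;
            cells.add(new Cell(x, 0, width, side));
            x += width; // next strip starts where this one ends
        }
        return cells;
    }
}
```

The position of each strip carries no meaning here either; only the relative areas reflect the scores.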

Enjoy,
   Dave


