Posted to java-user@lucene.apache.org by Sam Lee <vi...@yahoo.com> on 2005/10/24 07:58:55 UTC

How Fast is MemoryIndex? How Much Resource Does It Use?

Hi,
  Someone suggested that I use MemoryIndex to
match content against a large number of queries, e.g. "nike red
shoes" --match--> "nike shoes -blue" and --match-->
"nike shoes -black"...  What if I have 100000 of these
queries for each piece of content, and there may be 1000000
pieces of content?

But how fast is MemoryIndex?  Is it CPU and memory
intensive?  I read somewhere that it is about three orders
of magnitude faster than normal operation.  If so, why not
use it for normal operation as well?

Many thanks.




		
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How Fast is MemoryIndex? How Much Resource Does It Use?

Posted by Christophe <fo...@blowfish.com>.
Hi, Sam,

Is there a reason you couldn't build a test case and try it, in your  
environment and on your hardware?  That seems to be the only way to  
really answer the question.

On 24 Oct 2005, at 09:54, Sam Lee wrote:

> How much of a performance impact is there if I store
> queries as documents first?
>
> Actually, I just thought of a way to pre-select
> queries of a certain quality before running them against
> the MemoryIndex, which should trim the set to far fewer
> than 100000.
>
> But has anyone actually used MemoryIndex?  I need some
> real-world examples that tell me how fast
> MemoryIndex is before I decide to use it, such as
> queries per second and the CPU and memory used.
> I searched all over Google but can't find any.


Re: How Fast is MemoryIndex? How Much Resource Does It Use?

Posted by Sam Lee <vi...@yahoo.com>.
How much of a performance impact is there if I store
queries as documents first?

Actually, I just thought of a way to pre-select
queries of a certain quality before running them against
the MemoryIndex, which should trim the set to far fewer
than 100000.

But has anyone actually used MemoryIndex?  I need some
real-world examples that tell me how fast
MemoryIndex is before I decide to use it, such as
queries per second and the CPU and memory used.
I searched all over Google but can't find any.



Re: How Fast is MemoryIndex? How Much Resource Does It Use?

Posted by markharw00d <ma...@yahoo.co.uk>.
>>If so, why not use it for the normal operation as well?  

Because MemoryIndex only allows you to store/query one document.
It is fast, but I would not suggest running 10000 queries against it.

Why not try storing the queries as documents in a special index and
querying them using the subject document?
The results will be a rough shortlist of the queries you now need to
run (i.e. fewer than 10,000!).  Put the subject document, e.g. "i sell red
nike shoes", into a memory index, then run the selected queries against it.

These queries may have mandatory clauses (e.g. +/- operators) which may
cause them to fail when run as queries against the MemoryIndexed subject
doc, which is why the first "query the queries" search is insufficient to
find the matches.
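
The two-stage idea above (a rough shortlist first, then an exact check
against the single subject document) can be sketched in plain Java. This
is only an illustration and does not use Lucene: the class
TwoStageMatchSketch, its methods, and the simplified query semantics
(plain and "+" terms are all required, "-" terms are prohibited) are
assumptions made for the example. In a real setup the shortlist would
come from the index of stored queries and the exact check from running
each shortlisted query against a MemoryIndex.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TwoStageMatchSketch {

    /** A stored query reduced to required and prohibited terms (simplified semantics). */
    static final class StoredQuery {
        final String text;
        final Set<String> required = new HashSet<>();
        final Set<String> prohibited = new HashSet<>();

        StoredQuery(String text) {
            this.text = text;
            for (String token : tokenize(text)) {
                if (token.startsWith("-") && token.length() > 1) {
                    prohibited.add(token.substring(1));
                } else if (token.startsWith("+") && token.length() > 1) {
                    required.add(token.substring(1));
                } else {
                    required.add(token); // assumption: plain terms are treated as required
                }
            }
        }

        /** Stage 2: exact check against the terms of one subject document. */
        boolean matches(Set<String> docTerms) {
            if (!docTerms.containsAll(required)) return false;
            for (String term : prohibited) {
                if (docTerms.contains(term)) return false;
            }
            return true;
        }
    }

    private final List<StoredQuery> queries = new ArrayList<>();
    // Stage 1 structure: positive term -> ids of the stored queries that mention it.
    private final Map<String, Set<Integer>> termToQueries = new HashMap<>();

    /** Lower-cases and splits on whitespace; a stand-in for a real analyzer. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void addQuery(String text) {
        StoredQuery q = new StoredQuery(text);
        int id = queries.size();
        queries.add(q);
        for (String term : q.required) {
            termToQueries.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Stage 1: ids of queries sharing at least one positive term with the document. */
    Set<Integer> shortlist(Set<String> docTerms) {
        Set<Integer> candidates = new TreeSet<>();
        for (String term : docTerms) {
            Set<Integer> ids = termToQueries.get(term);
            if (ids != null) candidates.addAll(ids);
        }
        return candidates;
    }

    /** Runs both stages and returns the matching query strings in insertion order. */
    List<String> match(String document) {
        Set<String> docTerms = new HashSet<>(tokenize(document));
        List<String> hits = new ArrayList<>();
        for (int id : shortlist(docTerms)) {
            StoredQuery q = queries.get(id);
            if (q.matches(docTerms)) hits.add(q.text);
        }
        return hits;
    }

    public static void main(String[] args) {
        TwoStageMatchSketch matcher = new TwoStageMatchSketch();
        matcher.addQuery("nike shoes -blue");
        matcher.addQuery("nike shoes -red");
        matcher.addQuery("adidas shoes");
        matcher.addQuery("garden furniture");
        System.out.println(matcher.match("i sell red nike shoes"));
    }
}
```

In this sketch the shortlist for "i sell red nike shoes" contains three
of the four stored queries, and only "nike shoes -blue" survives the
exact check. Note that a query consisting solely of prohibited terms
would never be shortlisted here, since stage 1 only looks at positive
terms.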

Cheers,
Mark




Re: How Fast is MemoryIndex? How Much Resource Does It Use?

Posted by mark harwood <ma...@yahoo.co.uk>.
It is fast.
>> so, why not use it for the normal operation as well?

Because it only stores one document.

Given the number of queries you have, I'm not sure I'd
run them all. How about putting them as docs into a
categorisation index, then using the subject document
as a query to select a subset of the queries you need
to run?
This should give you a rough shortlist of queries; then
you can run them all against the one memory-indexed
subject document to see if they *really* match, i.e. if
the mandatory/AND statements are all satisfied.

Cheers,
Mark


		


Re: How Fast is MemoryIndex? How Much Resource Does It Use?

Posted by Olena Medelyan <me...@coling.uni-freiburg.de>.
Hi Sam,

To do such matching you first of all need something that keeps semantic
information about words, e.g. a thesaurus where "red", "blue" and "black"
are all grouped under the same term "colour". Otherwise, how will
your system know that "nike red shoes" should match "nike shoes -black" and not
"nike shoes -<anything else>"?
You would also need rules that define that only certain terms are to
be replaced with alternatives. Otherwise, your query can be mapped to X
alternatives like:
"-adidas red shoes", "nike red -pants" ...

Cheers,
Olena


