Posted to general@lucene.apache.org by ywlee522 <yw...@gmail.com> on 2009/06/11 21:00:41 UTC

Performance: Field.Store.YES vs. Field.Store.NO + DB


My document store has 750K users who wrote 100M reports.  The size of a
report ranges from 1k to 2M.
I have read in several places that the actual values (text) can be stored in
a DB, while Lucene manages only the index, with Field.Store.NO.

I wonder whether there are any differences in performance (search and match
retrieval) between Field.Store.YES and Field.Store.NO.  For example, if the
actual report contents are stored in a DB (Field.Store.NO), then given a search
that matches 500 reports, one has to send either 500 SELECT queries to the DB,
or one long SELECT with an IN clause in the WHERE condition, or something in
between.  Is this faster than retrieving them from an index created with
Field.Store.YES?
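
To make the "something in between" option concrete, here is a minimal sketch
in plain Java that splits the matched IDs into fixed-size batches and builds
one parameterized SELECT per batch. The table name (reports), the column names
(id, body), and the class and method names are hypothetical, chosen only for
illustration; nothing here is measured, and the best batch size would have to
be found by testing against the actual DB.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BatchedSelect {
    /**
     * Builds a parameterized query with n placeholders, e.g. for n = 3:
     * SELECT id, body FROM reports WHERE id IN (?, ?, ?)
     * The table and column names are assumptions, not from the thread.
     */
    static String inClauseSql(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return "SELECT id, body FROM reports WHERE id IN ("
                + String.join(", ", Collections.nCopies(n, "?")) + ")";
    }

    /** Splits the matched IDs into consecutive batches of at most batchSize. */
    static <T> List<List<T>> batches(List<T> ids, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ids.size());
            out.add(new ArrayList<>(ids.subList(i, end)));
        }
        return out;
    }
}
```

With 500 matches and a batch size of 200, this yields three queries (200, 200,
and 100 IDs); each SQL string would then be bound and executed through a JDBC
PreparedStatement.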

Does not storing the actual values in the index make the search faster?

Any pointers would be appreciated. Thanks.




-- 
View this message in context: http://www.nabble.com/Performance%3A-Field.Store.YES-vs.-Field.Store.NO-%2B-DB-tp23987086p23987086.html
Sent from the Lucene - General mailing list archive at Nabble.com.


Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

Posted by ywlee522 <yw...@gmail.com>.

Thanks for the pointers. I will certainly explore them as options.

 
 



Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

Posted by Ted Dunning <te...@gmail.com>.
A traditional database is not normally used for this.  Look at something
like Voldemort <http://simonwillison.net/2009/Jan/17/voldemort/>,
HBase <http://hadoop.apache.org/hbase/>, or even
memcached <http://www.danga.com/memcached/> instead.

Also, your database is moderately large, but not massively so. With a decent
sharding system like Katta, you should be able to store the text in your
index and still get good retrieval performance.

On Thu, Jun 11, 2009 at 12:00 PM, ywlee522 <yw...@gmail.com> wrote:

> I have read in several places that actual values (text) can be stored in
> DB,
> while lucene only manages index with Field.Store.NO
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

Posted by ywlee522 <yw...@gmail.com>.

I will try several options and post results.
Thanks.

 



Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

Posted by Michael McCandless <lu...@mikemccandless.com>.
You should try it & see & post back.

When using Lucene, you should sort by docID and then retrieve in that order.
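
As a rough illustration of that advice, the sketch below (plain Java, no
Lucene dependency) takes the docIDs of the hits in relevance order and returns
them sorted by docID, keeping each hit's original rank so the loaded documents
can be put back in ranked order afterwards. The class and method names are
hypothetical. The likely benefit is more sequential reads of the stored
fields, but that is an assumption to verify by measurement.

```java
import java.util.Arrays;
import java.util.Comparator;

final class DocIdOrder {
    /** A search hit: its Lucene docID plus its position in the ranked results. */
    static final class Hit {
        final int docId;
        final int rank;

        Hit(int docId, int rank) {
            this.docId = docId;
            this.rank = rank;
        }
    }

    /**
     * Takes docIDs in relevance order and returns the hits sorted by
     * ascending docID, each still carrying its original rank.
     */
    static Hit[] byDocId(int[] rankedDocIds) {
        Hit[] hits = new Hit[rankedDocIds.length];
        for (int rank = 0; rank < rankedDocIds.length; rank++) {
            hits[rank] = new Hit(rankedDocIds[rank], rank);
        }
        Arrays.sort(hits, Comparator.comparingInt((Hit h) -> h.docId));
        return hits;
    }
}
```

One would then walk the returned array in order, load each document's stored
fields by hit.docId, and write the result into slot hit.rank of an output
array so the caller still sees the hits in relevance order.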

There's also another open source project (don't remember the name)
that aims to be a store for cases like this.  There was an
announcement a while back... would be a 3rd option to try.

Please post back results if you get that far!

Mike

On Thu, Jun 11, 2009 at 3:00 PM, ywlee522<yw...@gmail.com> wrote:
>
>
> My document store has 750K users who wrote 100M reports.  The size of a
> report ranges from 1k to 2M.
> I have read in several places that actual values (text) can be stored in DB,
> while lucene only manages index with Field.Store.NO
>
> I wonder any differences in performance (search and match retrieval) between
> Field.Store.YES and NO values.  For example, if actual report contents are
> stored in a DB (Field.Store.NO), given a search that matches 500 reports,
> one has to send either 500 SELECT queries to DB, or one long SELECT with IN
> clause in WHERE condition. Or something in between.  Is this faster than
> retrieving them from index created with Field.Store.YES.
>
> Does NOT storing actual values in index make the search faster?
>
> Any pointer would be appreciated. Thanks