Posted to java-user@lucene.apache.org by BrightMinds Dev <de...@brightminds.org> on 2011/03/04 18:59:40 UTC

Recent Content - Lucene vs. DB SELECT / DB Triggers / Memcached

We are developing a large 4-tier multi-server app that will accept 
Questions and related Comments supplied by users.  There will be 
hundreds of thousands of users, who live in shards.  Ideally there 
would be no delay between adding content and seeing it in recent 
results, but a delay is acceptable if it keeps the system performant.

On the main page we will have 2 panels with the 5 most recent site-wide 
Questions and 5 most recent site-wide Comments.
On the user's profile page we would display similar panels except they 
would only consist of links that pertain to that user.


There are essentially 4 design choices:

1) Do periodic DB SELECT calls and cache the results for site-wide 
content "AND" do live DB SELECT calls for user-specific content.  Not 
wonderful but ok.

2) Use DB triggers to manage tables for site-wide content, though as 
content is sharded we would need to aggregate and re-sort the results 
obtained from all shards "AND" do live DB SELECT calls for 
user-specific content.  Sounds awful.

3) Use a solution like memcached.  However, having multiple servers add 
and prune site-wide content in the cache seems like it would be a 
synchronization nightmare "AND" do live DB SELECT calls for 
user-specific content.  Nix that.

4) Use Lucene to obtain the most recent site-wide content (and better 
yet push it into memcached) "AND" use Lucene to retrieve live 
user-specific content.
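
For concreteness, the "periodic SELECT plus cache" half of option 1 
could be sketched roughly as below.  This is only an illustration under 
assumptions: the class name RecentContentCache, the loader (which stands 
in for whatever DB SELECT fetches the 5 most recent items), and the 
refresh period are all hypothetical and not part of any existing code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical cache for a site-wide "most recent" panel. */
final class RecentContentCache {
    // Assumption: the loader wraps the DB SELECT that returns the newest items.
    private final Supplier<List<String>> loader;
    // Volatile so page-serving threads always see the latest swapped-in snapshot.
    private volatile List<String> snapshot = Collections.emptyList();

    public RecentContentCache(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    /** Runs the query once and swaps in the new immutable snapshot. */
    public void refresh() {
        snapshot = Collections.unmodifiableList(new ArrayList<>(loader.get()));
    }

    /** Page requests read the cached snapshot and never touch the database. */
    public List<String> get() {
        return snapshot;
    }

    /** Schedules refresh() repeatedly; the caller shuts the executor down. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot if one refresh fails;
                // an uncaught exception would cancel all future runs.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

The staleness of the panel is bounded by the refresh period plus the 
query time, which is the "acceptable delay" trade-off described above.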


Is this a natural fit for Lucene?

I understand Lucene is very performant, but is finding the last X 
documents by a long (timestamp) field something it would do well?

Or is Lucene not appropriate and should we be considering something else?

Personally, I think Lucene is the best choice... but would like to hear 
the thoughts of others.

Thanks,

--Nikolaos 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Recent Content - Lucene vs. DB SELECT / DB Triggers / Memcached

Posted by DBSight <db...@gmail.com>.
Can you just maintain the site-wide top 5 by combining the top 5 of each shard?
It can be done in memory with just O(1) operations. No Lucene is needed.
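
A minimal sketch of that idea, under assumptions: RecentTopK and Item are 
hypothetical names, and each item is assumed to carry a creation 
timestamp in epoch milliseconds. Each offer() does at most k 
comparisons, which is constant for k = 5.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory holder for the k newest items seen so far. */
final class RecentTopK {
    /** Hypothetical record: an item id plus its creation time in epoch millis. */
    public static final class Item {
        public final String id;
        public final long timestamp;

        public Item(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    private final int k;
    // Always sorted newest first and never longer than k.
    private final List<Item> newestFirst = new ArrayList<>();

    public RecentTopK(int k) {
        this.k = k;
    }

    /** Offers one item; it is kept only if it ranks among the k newest. */
    public synchronized void offer(Item item) {
        int pos = 0;
        while (pos < newestFirst.size()
                && newestFirst.get(pos).timestamp >= item.timestamp) {
            pos++;
        }
        if (pos >= k) {
            return; // older than everything currently kept
        }
        newestFirst.add(pos, item);
        if (newestFirst.size() > k) {
            newestFirst.remove(newestFirst.size() - 1); // drop the oldest
        }
    }

    /** Returns a copy of the current top k, newest first. */
    public synchronized List<Item> snapshot() {
        return new ArrayList<>(newestFirst);
    }
}
```

To combine shards, offer every item from each shard's own top 5 to one 
site-wide RecentTopK(5); the same offer() call also handles a single 
newly posted item.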

-- 

--

Chris Lu




Re: Recent Content - Lucene vs. DB SELECT / DB Triggers / Memcached

Posted by Itamar Syn-Hershko <it...@code972.com>.
(sorry for picking this up so late...)


This sounds like a perfect fit for document DBs like CouchDB and MongoDB 
- based on your architecture and data structure.


They are designed for multi-server applications and use Map/Reduce, 
which will give you Lucene-style operations directly from your DB, with 
no DB-index sync necessary.



Re: Recent Content - Lucene vs. DB SELECT / DB Triggers / Memcached

Posted by Ian Lea <ia...@gmail.com>.
I'd go for Lucene.  The near-real-time search support should minimise
delays in making content visible, and if you can get the last X since
some timestamp with a NumericRangeQuery, perhaps in conjunction with a
custom Collector, that will be fast too.
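
To illustrate the shape of that approach without depending on the 
Lucene jar, here is a library-free model in plain Java.  It is a sketch 
under assumptions, not Lucene code: NewestSinceCollector is a 
hypothetical class, the "since" bound plays the role of the range 
query, and collect() stands in for the per-hit callback a custom 
Collector would receive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model: keep the newest `limit` hits with timestamp >= since. */
final class NewestSinceCollector {
    private final long since; // inclusive lower bound, like a range query
    private final int limit;
    // Min-heap on timestamp; each entry is {timestamp, docId}.
    // The head is the oldest hit still kept.
    private final PriorityQueue<long[]> heap;

    public NewestSinceCollector(long since, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        this.since = since;
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit,
                Comparator.comparingLong((long[] h) -> h[0]));
    }

    /** Called once per candidate document. */
    public void collect(int docId, long timestamp) {
        if (timestamp < since) {
            return; // outside the range
        }
        if (heap.size() < limit) {
            heap.add(new long[] {timestamp, docId});
        } else if (timestamp > heap.peek()[0]) {
            heap.poll(); // evict the oldest kept hit
            heap.add(new long[] {timestamp, docId});
        }
    }

    /** Doc ids of the kept hits, newest first. */
    public List<Integer> topDocIds() {
        List<long[]> hits = new ArrayList<>(heap);
        hits.sort(Comparator.comparingLong((long[] h) -> h[0]).reversed());
        List<Integer> ids = new ArrayList<>();
        for (long[] h : hits) {
            ids.add((int) h[1]);
        }
        return ids;
    }
}
```

Memory stays bounded by the limit no matter how many documents fall 
inside the range, which is why a tight "since" bound plus a small X 
keeps this cheap.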


--
Ian.

