Posted to solr-user@lucene.apache.org by Matthieu Labour <ma...@yahoo.com> on 2010/01/27 01:27:17 UTC

Multiple Cores Vs. Single Core for the following use case

Hi

Should I set up multiple cores or a single core for the following use case?

I have X users.

When I do a search, I always know which user I am searching for.

Should I set up X cores, one for each user? Or should I set up one core and add a userId field to each document?

If I choose the one-core solution, then I am concerned about performance.
Let's say I search for "New York" ... If Lucene returns all "New York"
matches for all users and then filters based on the userId, then this
is going to be less efficient than if I had sharded per user and sent
the request for "New York" to the user's core.

Thank you for your help

matt

RE: update doc success, but could not find the new value

Posted by Jennifer Luo <Je...@talenttech.com>.

It works. I had made a mistake in my code.

Jennifer Luo

> -----Original Message-----
> From: Jennifer Luo [mailto:Jennifer@talenttech.com]
> Sent: Wednesday, January 27, 2010 1:57 PM
> To: solr-user@lucene.apache.org
> Subject: RE: update doc success, but could not find the new value
> 
> I am using example, only with two fields, id and body. Id is string
> field, body is text field.
> 
> I use another program to do a http post to update the document, url is
> http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 , the data is
> <add>
> 	<doc>
> 		<field name="id">id1</field>
> 		<field name="body">test body</field>
>       </doc>
> </add>
> 
> I get the responseHeader back, the status is 0.
> 
> Then I go to admin page, do search, query is body:test.  The result
> numFound = 0.
> 
> I think the reason should be the index is not updated with the updated
> document.
> 
> What should I do? What is missing?
> Jennifer Luo
> 
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Wednesday, January 27, 2010 1:39 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: update doc success, but could not find the new value
> >
> > Ummm, you have to provide a *lot* more detail before anyone can help.
> >
> > Have you used Luke or the admin page to examine your index and determine
> > that the update did, indeed, work?
> >
> > Have you tried firing your query with debugQuery=on to see if the fields
> > searched are the ones you expect?
> >
> > etc.
> >
> > Erick
> >
> > On Wed, Jan 27, 2010 at 11:54 AM, Jennifer Luo
> > <Je...@talenttech.com>wrote:
> >
> > > I am using
> > > http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
> > > to update a document. The responseHeader's status is 0.
> > >
> > > But when I search the new value, it couldn't be found.
> > >

RE: update doc success, but could not find the new value

Posted by Markus Jelsma <ma...@buyways.nl>.

Check out Jetty's output or Tomcat's logs. The logging is very verbose and
you can get a clearer picture.


Jennifer Luo said:
> I am using example, only with two fields, id and body. Id is string
> field, body is text field.
>
> I use another program to do a http post to update the document, url is
> http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10 , the data is
> <add>
> 	<doc>
> 		<field name="id">id1</field>
> 		<field name="body">test body</field>
>       </doc>
> </add>
>
> I get the responseHeader back, the status is 0.
>
> Then I go to admin page, do search, query is body:test.  The result
> numFound = 0.
>
> I think the reason should be the index is not updated with the updated
> document.
>
> What should I do? What is missing?
> Jennifer Luo
>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>> Sent: Wednesday, January 27, 2010 1:39 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: update doc success, but could not find the new value
>>
>> Ummm, you have to provide a *lot* more detail before anyone can help.
>>
>> Have you used Luke or the admin page to examine your index and determine
>> that the update did, indeed, work?
>>
>> Have you tried firing your query with debugQuery=on to see if the fields
>> searched are the ones you expect?
>>
>> etc.
>>
>> Erick
>>
>> On Wed, Jan 27, 2010 at 11:54 AM, Jennifer Luo
>> <Je...@talenttech.com>wrote:
>>
>> > I am using
>> > http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
>> > to update a document. The responseHeader's status is 0.
>> >
>> > But when I search the new value, it couldn't be found.
>> >

RE: update doc success, but could not find the new value

Posted by Jennifer Luo <Je...@talenttech.com>.

I am using the example setup with only two fields, id and body. id is a
string field and body is a text field.

I use another program to do an HTTP POST to update the document. The URL is
http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
and the data is
<add>
	<doc>
		<field name="id">id1</field>
		<field name="body">test body</field>
	</doc>
</add>

I get the responseHeader back, and the status is 0.

Then I go to the admin page and search with the query body:test. The
result is numFound = 0.

I think the reason is that the index is not updated with the updated
document.

What should I do? What is missing?
Jennifer Luo
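
For anyone reproducing the request described above, here is a minimal Python sketch that only builds the update URL and the XML body; it does not send anything. The host, port, path, parameters and field names are copied from the post. The helper names (build_update_url, build_add_xml) are made up for illustration, and the thread never says what the actual mistake in the original code was.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Host, port and path are taken from the post above.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_url(params, base=SOLR_UPDATE_URL):
    """Append URL-encoded query parameters to the update URL."""
    return base + "?" + urlencode(params)


def build_add_xml(fields):
    """Serialize {field name: value} as an <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


# Same parameters and fields as in the post (hypothetical helper names).
update_url = build_update_url(
    {"commit": "true", "overwrite": "true", "commitWithin": "10"}
)
update_body = build_add_xml({"id": "id1", "body": "test body"})
print(update_url)
print(update_body)
```

To actually send it, one would POST update_body to update_url with a Content-Type: text/xml header, as the curl examples in the Solr tutorial do, and then check the server log as suggested elsewhere in the thread.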

> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, January 27, 2010 1:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: update doc success, but could not find the new value
> 
> Ummm, you have to provide a *lot* more detail before anyone can help.
> 
> Have you used Luke or the admin page to examine your index and determine
> that the update did, indeed, work?
> 
> Have you tried firing your query with debugQuery=on to see if the fields
> searched are the ones you expect?
> 
> etc.
> 
> Erick
> 
> On Wed, Jan 27, 2010 at 11:54 AM, Jennifer Luo
> <Je...@talenttech.com>wrote:
> 
> > I am using
> > http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
> > to update a document. The responseHeader's status is 0.
> >
> > But when I search the new value, it couldn't be found.
> >

Re: update doc success, but could not find the new value

Posted by Erick Erickson <er...@gmail.com>.

Ummm, you have to provide a *lot* more detail before anyone can help.

Have you used Luke or the admin page to examine your index and determine
that the update did, indeed, work?

Have you tried firing your query with debugQuery=on to see if the fields
searched are the ones you expect?

etc.

Erick
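
A small sketch of the debugQuery check, assuming the stock example server on localhost:8983 and its /solr/select handler. The path and the helper name are assumptions for illustration, not something stated in this thread; the query body:test is the one used later in the thread.

```python
from urllib.parse import urlencode

# Assumed location of the search handler on the example server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(query, base=SOLR_SELECT_URL):
    """Build a search URL that asks Solr to include debug output."""
    return base + "?" + urlencode({"q": query, "debugQuery": "on"})


debug_url = build_debug_query_url("body:test")
print(debug_url)
```

Opening that URL in a browser should show, in the debug section of the response, how the query was parsed and which field it was run against.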

On Wed, Jan 27, 2010 at 11:54 AM, Jennifer Luo <Je...@talenttech.com>wrote:

> I am using
> http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
> to update a document. The responseHeader's status is 0.
>
> But when I search the new value, it couldn't be found.
>

Re: update doc success, but could not find the new value

Posted by Chris Hostetter <ho...@fucit.org>.

: Subject: update doc success, but could not find the new value
: In-Reply-To: <44...@web56308.mail.re3.yahoo.com>
: References: <27...@talk.nabble.com>
:     <44...@web56308.mail.re3.yahoo.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

update doc success, but could not find the new value

Posted by Jennifer Luo <Je...@talenttech.com>.

I am using
http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
to update a document. The responseHeader's status is 0.

But when I search for the new value, it cannot be found.

Re: Multiple Cores Vs. Single Core for the following use case

Posted by Matthieu Labour <ma...@yahoo.com>.


Thanks a lot everybody for the responses ... I am going to do some practical/empirical testing and will report back.
matt

--- On Wed, 1/27/10, Tom Hill <so...@worldware.com> wrote:

From: Tom Hill <so...@worldware.com>
Subject: Re: Multiple Cores Vs. Single Core for the following use case
To: solr-user@lucene.apache.org
Date: Wednesday, January 27, 2010, 2:47 PM

Hi -

I'd probably go with a single core on this one, just for ease of operations.

But here are some thoughts:

One advantage I can see to multiple cores, though, would be better idf
calculations. With individual cores, each user only sees the idf for his own
documents. With a single core, the idf will be across all documents. In
theory, better relevance.

Multi-core will use more RAM to start with, though, and I would expect it to
use more disk (term dictionary per core). Filters would add to the memory
footprint of the multiple core setup.

However, if you only end up sorting/faceting on some of the cores, your
memory use with multiple cores may actually be less. With multiple cores,
each field cache only covers one user's docs. With single core, you have one
field cache entry per doc in the whole corpus. Depending on usage patterns,
index sizes, etc, this could be a significant amount of memory.

Tom


On Wed, Jan 27, 2010 at 11:38 AM, Amit Nithian <an...@gmail.com> wrote:

> It sounds to me that multiple cores won't scale.. wouldn't you have to
> create multiple configurations per each core and does the ranking function
> change per user?
>
> I would imagine that the filter method would work better.. the caching is
> there and as mentioned earlier would be fast for multiple searches. If you
> have searches for the same user, then add that to your warming queries list
> so that on server startup, the cache will be warm for certain users that you
> know tend to do a lot of searches. This can be known empirically or by log
> mining.
>
> I haven't used multiple cores but I suspect that having that many
> configuration files parsed and loaded in memory can't be good for memory
> usage over filter caching.
>
> Just my 2 cents
> Amit
>
> On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour
> <ma...@yahoo.com>wrote:
>
> > Thanks Didier for your response
> > And in your opinion, this should be as fast as if I would getCore(userId)
> > -- provided that the core is already open -- and then search for "Paris" ?
> > matt
> >
> > --- On Wed, 1/27/10, didier deshommes <df...@gmail.com> wrote:
> >
> > From: didier deshommes <df...@gmail.com>
> > Subject: Re: Multiple Cores Vs. Single Core for the following use case
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, January 27, 2010, 10:52 AM
> >
> > On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour
> > <ma...@yahoo.com> wrote:
> > > What I am trying to understand is the search/filter algorithm. If I have
> > > 1 core with all documents and I search for "Paris" for userId="123", is
> > > lucene going to first search for all Paris documents and then apply a filter
> > > on the userId ? If this is the case, then I am better off having a specific
> > > index for the user="123" because this will be faster
> >
> > If you want to apply the filter to userid first, use filter queries
> > (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will
> > filter by userid first then search for "Paris".
> >
> > didier
> >
> > >
> > >
> > >
> > >
> > >
> > > --- On Wed, 1/27/10, Marc Sturlese <ma...@gmail.com> wrote:
> > >
> > > From: Marc Sturlese <ma...@gmail.com>
> > > Subject: Re: Multiple Cores Vs. Single Core for the following use case
> > > To: solr-user@lucene.apache.org
> > > Date: Wednesday, January 27, 2010, 2:22 AM
> > >
> > >
> > > In case you are going to use core per user take a look at this patch:
> > > http://wiki.apache.org/solr/LotsOfCores
> > >
> > > Trey-13 wrote:
> > >>
> > >> Hi Matt,
> > >>
> > >> In most cases you are going to be better off going with the userid method
> > >> unless you have a very small number of users and a very large number of
> > >> docs/user. The userid method will likely be much easier to manage, as you
> > >> won't have to spin up a new core every time you add a new user.  I would
> > >> start here and see if the performance is good enough for your requirements
> > >> before you start worrying about it not being efficient.
> > >>
> > >> That being said, I really don't have any idea what your data looks like.
> > >> How many users do you have?  How many documents per user?  Are any documents
> > >> shared by multiple users?
> > >>
> > >> -Trey
> > >>
> > >>
> > >>
> > >> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
> > >> <ma...@yahoo.com>wrote:
> > >>
> > >>> Hi
> > >>>
> > >>>
> > >>>
> > >>> Shall I set up Multiple Core or Single core for the following use
> case:
> > >>>
> > >>>
> > >>>
> > >>> I have X number of users.
> > >>>
> > >>>
> > >>>
> > >>> When I do a search, I always know for which user I am doing a search
> > >>>
> > >>>
> > >>>
> > >>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core
> and
> > >>> add
> > >>> a userId field to each document?
> > >>>
> > >>>
> > >>>
> > >>> If I choose the 1 core solution then I am concerned with performance.
> > >>> Let's say I search for "NewYork" ... If lucene returns all "New York"
> > >>> matches for all users and then filters based on the userId, then this
> > >>> is going to be less efficient than if I have sharded per user and
> send
> > >>> the request for "New York" to the user's core
> > >>>
> > >>>
> > >>>
> > >>> Thank you for your help
> > >>>
> > >>>
> > >>>
> > >>> matt
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > > --
> > > View this message in context:
> > > http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
>

Re: Multiple Cores Vs. Single Core for the following use case

Posted by Tom Hill <so...@worldware.com>.

Hi -

I'd probably go with a single core on this one, just for ease of operations.

But here are some thoughts:

One advantage I can see to multiple cores, though, would be better idf
calculations. With individual cores, each user only sees the idf for his own
documents. With a single core, the idf will be across all documents. In
theory, better relevance.
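
To make the idf point concrete, here is a rough sketch using the classic Lucene formula as I recall it from DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)). The 35,000 docs per user and 1,000 users come from figures quoted elsewhere in this thread; the document frequencies for "paris" are invented purely for illustration.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Per-user core: 35,000 docs (figure from the thread); assume "paris"
# appears in 3,500 of them (invented number).
per_user_idf = classic_idf(doc_freq=3_500, num_docs=35_000)

# Single shared core: 1,000 users x 35,000 docs; assume "paris" appears
# in only 5,000 docs overall (invented number).
shared_idf = classic_idf(doc_freq=5_000, num_docs=1_000 * 35_000)

print(f"per-user core idf: {per_user_idf:.3f}")
print(f"shared core idf:   {shared_idf:.3f}")
```

Under these made-up numbers the term looks rare in the shared index even though it is common for this particular user, so the shared core would weight it more heavily than the per-user core would.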

Multi-core will use more RAM to start with, though, and I would expect it to
use more disk (term dictionary per core). Filters would add to the memory
footprint of the multiple core setup.

However, if you only end up sorting/faceting on some of the cores, your
memory use with multiple cores may actually be less. With multiple cores,
each field cache only covers one user's docs. With single core, you have one
field cache entry per doc in the whole corpus. Depending on usage patterns,
index sizes, etc, this could be a significant amount of memory.
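
A back-of-the-envelope version of that field cache argument, as a sketch only. It assumes one 4-byte entry per document per sorted field, which is a simplification, and uses the 1,000 users and 35,000 docs per user quoted elsewhere in the thread. The count of 50 cores that actually sort is a hypothetical figure.

```python
BYTES_PER_ENTRY = 4  # assumption: one 32-bit value per document


def field_cache_bytes(num_docs, bytes_per_entry=BYTES_PER_ENTRY):
    """Approximate size of one field cache array covering num_docs documents."""
    return num_docs * bytes_per_entry


users = 1_000            # figure from the thread
docs_per_user = 35_000   # figure from the thread
sorting_users = 50       # hypothetical: cores that ever sort or facet

# Single core: the array spans every document in the corpus.
single_core_bytes = field_cache_bytes(users * docs_per_user)

# Multiple cores: only cores that sort pay, each for its own documents.
multi_core_bytes = sorting_users * field_cache_bytes(docs_per_user)

print(f"single core:    {single_core_bytes / 1e6:.0f} MB per sorted field")
print(f"multiple cores: {multi_core_bytes / 1e6:.0f} MB per sorted field")
```

The gap shrinks as more users sort or facet, and it says nothing about the fixed per-core overhead mentioned above, so it is only one input to the decision.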

Tom


On Wed, Jan 27, 2010 at 11:38 AM, Amit Nithian <an...@gmail.com> wrote:

> It sounds to me that multiple cores won't scale.. wouldn't you have to
> create multiple configurations per each core and does the ranking function
> change per user?
>
> I would imagine that the filter method would work better.. the caching is
> there and as mentioned earlier would be fast for multiple searches. If you
> have searches for the same user, then add that to your warming queries list
> so that on server startup, the cache will be warm for certain users that you
> know tend to do a lot of searches. This can be known empirically or by log
> mining.
>
> I haven't used multiple cores but I suspect that having that many
> configuration files parsed and loaded in memory can't be good for memory
> usage over filter caching.
>
> Just my 2 cents
> Amit
>
> On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour
> <ma...@yahoo.com>wrote:
>
> > Thanks Didier for your response
> > And in your opinion, this should be as fast as if I would getCore(userId)
> > -- provided that the core is already open -- and then search for "Paris" ?
> > matt
> >
> > --- On Wed, 1/27/10, didier deshommes <df...@gmail.com> wrote:
> >
> > From: didier deshommes <df...@gmail.com>
> > Subject: Re: Multiple Cores Vs. Single Core for the following use case
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, January 27, 2010, 10:52 AM
> >
> > On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour
> > <ma...@yahoo.com> wrote:
> > > What I am trying to understand is the search/filter algorithm. If I have
> > > 1 core with all documents and I search for "Paris" for userId="123", is
> > > lucene going to first search for all Paris documents and then apply a filter
> > > on the userId ? If this is the case, then I am better off having a specific
> > > index for the user="123" because this will be faster
> >
> > If you want to apply the filter to userid first, use filter queries
> > (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will
> > filter by userid first then search for "Paris".
> >
> > didier
> >
> > >
> > >
> > >
> > >
> > >
> > > --- On Wed, 1/27/10, Marc Sturlese <ma...@gmail.com> wrote:
> > >
> > > From: Marc Sturlese <ma...@gmail.com>
> > > Subject: Re: Multiple Cores Vs. Single Core for the following use case
> > > To: solr-user@lucene.apache.org
> > > Date: Wednesday, January 27, 2010, 2:22 AM
> > >
> > >
> > > In case you are going to use core per user take a look at this patch:
> > > http://wiki.apache.org/solr/LotsOfCores
> > >
> > > Trey-13 wrote:
> > >>
> > >> Hi Matt,
> > >>
> > >> In most cases you are going to be better off going with the userid method
> > >> unless you have a very small number of users and a very large number of
> > >> docs/user. The userid method will likely be much easier to manage, as you
> > >> won't have to spin up a new core every time you add a new user.  I would
> > >> start here and see if the performance is good enough for your requirements
> > >> before you start worrying about it not being efficient.
> > >>
> > >> That being said, I really don't have any idea what your data looks like.
> > >> How many users do you have?  How many documents per user?  Are any documents
> > >> shared by multiple users?
> > >>
> > >> -Trey
> > >>
> > >>
> > >>
> > >> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
> > >> <ma...@yahoo.com>wrote:
> > >>
> > >>> Hi
> > >>>
> > >>>
> > >>>
> > >>> Shall I set up Multiple Core or Single core for the following use
> case:
> > >>>
> > >>>
> > >>>
> > >>> I have X number of users.
> > >>>
> > >>>
> > >>>
> > >>> When I do a search, I always know for which user I am doing a search
> > >>>
> > >>>
> > >>>
> > >>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core
> and
> > >>> add
> > >>> a userId field to each document?
> > >>>
> > >>>
> > >>>
> > >>> If I choose the 1 core solution then I am concerned with performance.
> > >>> Let's say I search for "NewYork" ... If lucene returns all "New York"
> > >>> matches for all users and then filters based on the userId, then this
> > >>> is going to be less efficient than if I have sharded per user and
> send
> > >>> the request for "New York" to the user's core
> > >>>
> > >>>
> > >>>
> > >>> Thank you for your help
> > >>>
> > >>>
> > >>>
> > >>> matt
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > > --
> > > View this message in context:
> > > http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
>

Re: Multiple Cores Vs. Single Core for the following use case

Posted by Amit Nithian <an...@gmail.com>.

It sounds to me that multiple cores won't scale.. wouldn't you have to
create multiple configurations per each core and does the ranking function
change per user?

I would imagine that the filter method would work better.. the caching is
there and as mentioned earlier would be fast for multiple searches. If you
have searches for the same user, then add that to your warming queries list
so that on server startup, the cache will be warm for certain users that you
know tend to do a lot of searches. This can be known empirically or by log
mining.

I haven't used multiple cores but I suspect that having that many
configuration files parsed and loaded in memory can't be good for memory
usage over filter caching.

Just my 2 cents
Amit

On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour
<ma...@yahoo.com>wrote:

> Thanks Didier for your response
> And in your opinion, this should be as fast as if I would getCore(userId)
> -- provided that the core is already open -- and then search for "Paris" ?
> matt
>
> --- On Wed, 1/27/10, didier deshommes <df...@gmail.com> wrote:
>
> From: didier deshommes <df...@gmail.com>
> Subject: Re: Multiple Cores Vs. Single Core for the following use case
> To: solr-user@lucene.apache.org
> Date: Wednesday, January 27, 2010, 10:52 AM
>
> On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour
> <ma...@yahoo.com> wrote:
> > What I am trying to understand is the search/filter algorithm. If I have
> > 1 core with all documents and I search for "Paris" for userId="123", is
> > lucene going to first search for all Paris documents and then apply a filter
> > on the userId ? If this is the case, then I am better off having a specific
> > index for the user="123" because this will be faster
>
> If you want to apply the filter to userid first, use filter queries
> (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will
> filter by userid first then search for "Paris".
>
> didier
>
> >
> >
> >
> >
> >
> > --- On Wed, 1/27/10, Marc Sturlese <ma...@gmail.com> wrote:
> >
> > From: Marc Sturlese <ma...@gmail.com>
> > Subject: Re: Multiple Cores Vs. Single Core for the following use case
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, January 27, 2010, 2:22 AM
> >
> >
> > In case you are going to use core per user take a look at this patch:
> > http://wiki.apache.org/solr/LotsOfCores
> >
> > Trey-13 wrote:
> >>
> >> Hi Matt,
> >>
> >> In most cases you are going to be better off going with the userid method
> >> unless you have a very small number of users and a very large number of
> >> docs/user. The userid method will likely be much easier to manage, as you
> >> won't have to spin up a new core every time you add a new user.  I would
> >> start here and see if the performance is good enough for your requirements
> >> before you start worrying about it not being efficient.
> >>
> >> That being said, I really don't have any idea what your data looks like.
> >> How many users do you have?  How many documents per user?  Are any documents
> >> shared by multiple users?
> >>
> >> -Trey
> >>
> >>
> >>
> >> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
> >> <ma...@yahoo.com>wrote:
> >>
> >>> Hi
> >>>
> >>>
> >>>
> >>> Shall I set up Multiple Core or Single core for the following use case:
> >>>
> >>>
> >>>
> >>> I have X number of users.
> >>>
> >>>
> >>>
> >>> When I do a search, I always know for which user I am doing a search
> >>>
> >>>
> >>>
> >>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and
> >>> add
> >>> a userId field to each document?
> >>>
> >>>
> >>>
> >>> If I choose the 1 core solution then I am concerned with performance.
> >>> Let's say I search for "NewYork" ... If lucene returns all "New York"
> >>> matches for all users and then filters based on the userId, then this
> >>> is going to be less efficient than if I have sharded per user and send
> >>> the request for "New York" to the user's core
> >>>
> >>>
> >>>
> >>> Thank you for your help
> >>>
> >>>
> >>>
> >>> matt
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> > --
> > View this message in context:
> http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> >
> >
> >
>
>
>
>
>

Re: Multiple Cores Vs. Single Core for the following use case

Posted by Matthieu Labour <ma...@yahoo.com>.

Thanks Didier for your response
And in your opinion, this should be as fast as if I would getCore(userId) -- provided that the core is already open -- and then search for "Paris" ?
matt

--- On Wed, 1/27/10, didier deshommes <df...@gmail.com> wrote:

From: didier deshommes <df...@gmail.com>
Subject: Re: Multiple Cores Vs. Single Core for the following use case
To: solr-user@lucene.apache.org
Date: Wednesday, January 27, 2010, 10:52 AM

On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour
<ma...@yahoo.com> wrote:
> What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I  search for "Paris" for userId="123", is lucene going to first search for all Paris documents and then apply a filter on the userId ? If this is the case, then I am better off having a specific index for the user="123" because this will be faster

If you want to apply the filter to userid first, use filter queries
(http://wiki.apache.org/solr/CommonQueryParameters#fq). This will
filter by userid first then search for "Paris".

didier

>
>
>
>
>
> --- On Wed, 1/27/10, Marc Sturlese <ma...@gmail.com> wrote:
>
> From: Marc Sturlese <ma...@gmail.com>
> Subject: Re: Multiple Cores Vs. Single Core for the following use case
> To: solr-user@lucene.apache.org
> Date: Wednesday, January 27, 2010, 2:22 AM
>
>
> In case you are going to use core per user take a look at this patch:
> http://wiki.apache.org/solr/LotsOfCores
>
> Trey-13 wrote:
>>
>> Hi Matt,
>>
>> In most cases you are going to be better off going with the userid method
>> unless you have a very small number of users and a very large number of
>> docs/user. The userid method will likely be much easier to manage, as you
>> won't have to spin up a new core every time you add a new user.  I would
>> start here and see if the performance is good enough for your requirements
>> before you start worrying about it not being efficient.
>>
>> That being said, I really don't have any idea what your data looks like.
>> How many users do you have?  How many documents per user?  Are any
>> documents
>> shared by multiple users?
>>
>> -Trey
>>
>>
>>
>> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
>> <ma...@yahoo.com>wrote:
>>
>>> Hi
>>>
>>>
>>>
>>> Shall I set up Multiple Core or Single core for the following use case:
>>>
>>>
>>>
>>> I have X number of users.
>>>
>>>
>>>
>>> When I do a search, I always know for which user I am doing a search
>>>
>>>
>>>
>>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and
>>> add
>>> a userId field to each document?
>>>
>>>
>>>
>>> If I choose the 1 core solution then I am concerned with performance.
>>> Let's say I search for "NewYork" ... If lucene returns all "New York"
>>> matches for all users and then filters based on the userId, then this
>>> is going to be less efficient than if I have sharded per user and send
>>> the request for "New York" to the user's core
>>>
>>>
>>>
>>> Thank you for your help
>>>
>>>
>>>
>>> matt
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
> --
> View this message in context: http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>
>

Re: Multiple Cores Vs. Single Core for the following use case

Posted by didier deshommes <df...@gmail.com>.

On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour
<ma...@yahoo.com> wrote:
> What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I  search for "Paris" for userId="123", is lucene going to first search for all Paris documents and then apply a filter on the userId ? If this is the case, then I am better off having a specific index for the user="123" because this will be faster

If you want to apply the filter to userid first, use filter queries
(http://wiki.apache.org/solr/CommonQueryParameters#fq). This will
filter by userid first then search for "Paris".

didier
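
As a sketch of what such a request could look like, assuming the example server's /solr/select handler on localhost:8983 (an assumption, not stated in this message) and the userId field name from the original question. The helper name is invented for illustration.

```python
from urllib.parse import urlencode

# Assumed location of the search handler on the example server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_user_search_url(terms, user_id, base=SOLR_SELECT_URL):
    """Search for `terms`, restricted to one user via a filter query (fq)."""
    params = [("q", terms), ("fq", f"userId:{user_id}")]
    return base + "?" + urlencode(params)


paris_url = build_user_search_url("Paris", "123")
print(paris_url)
```

Because the fq clause is kept separate from q, Solr can cache the set of documents matching userId:123 in its filter cache and reuse it for that user's later searches, whatever the search terms are.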

>
>
>
>
>
> --- On Wed, 1/27/10, Marc Sturlese <ma...@gmail.com> wrote:
>
> From: Marc Sturlese <ma...@gmail.com>
> Subject: Re: Multiple Cores Vs. Single Core for the following use case
> To: solr-user@lucene.apache.org
> Date: Wednesday, January 27, 2010, 2:22 AM
>
>
> In case you are going to use core per user take a look at this patch:
> http://wiki.apache.org/solr/LotsOfCores
>
> Trey-13 wrote:
>>
>> Hi Matt,
>>
>> In most cases you are going to be better off going with the userid method
>> unless you have a very small number of users and a very large number of
>> docs/user. The userid method will likely be much easier to manage, as you
>> won't have to spin up a new core every time you add a new user.  I would
>> start here and see if the performance is good enough for your requirements
>> before you start worrying about it not being efficient.
>>
>> That being said, I really don't have any idea what your data looks like.
>> How many users do you have?  How many documents per user?  Are any
>> documents
>> shared by multiple users?
>>
>> -Trey
>>
>>
>>
>> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
>> <ma...@yahoo.com>wrote:
>>
>>> Hi
>>>
>>>
>>>
>>> Shall I set up Multiple Core or Single core for the following use case:
>>>
>>>
>>>
>>> I have X number of users.
>>>
>>>
>>>
>>> When I do a search, I always know for which user I am doing a search
>>>
>>>
>>>
>>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and
>>> add
>>> a userId field to each document?
>>>
>>>
>>>
>>> If I choose the 1 core solution then I am concerned with performance.
>>> Let's say I search for "NewYork" ... If lucene returns all "New York"
>>> matches for all users and then filters based on the userId, then this
>>> is going to be less efficient than if I have sharded per user and send
>>> the request for "New York" to the user's core
>>>
>>>
>>>
>>> Thank you for your help
>>>
>>>
>>>
>>> matt
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
> --
> View this message in context: http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>
>

Re: Multiple Cores Vs. Single Core for the following use case

Posted by Toby Cole <to...@semantico.com>.

I've not looked at the filtering for quite a while, but if you're  
getting lots of similar queries, the filter's caching can play a huge  
part in speeding up queries, so even if the first query for "paris"  
was slow, subsequent queries from different users for the same terms  
will be sped up considerably (especially if you're using the  
FastLRUCache).

If filtering is slow for your queries, why not try simply using a
boolean query (i.e., for the example below: "paris AND userId:123")?
This would remove the cross-user usefulness of the caches, if I
understand them correctly, but may speed up uncached searches.

Toby.
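
The two request shapes being compared can be sketched as follows. The /solr/select path on localhost:8983 is an assumption borrowed from the example server, userId is the field name from the original question, and both helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed location of the search handler on the example server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def boolean_query_url(terms, user_id, base=SOLR_SELECT_URL):
    """Put the user restriction into the main query as an AND clause."""
    return base + "?" + urlencode({"q": f"{terms} AND userId:{user_id}"})


def filter_query_url(terms, user_id, base=SOLR_SELECT_URL):
    """Keep the user restriction in a separate, cacheable filter query."""
    return base + "?" + urlencode([("q", terms), ("fq", f"userId:{user_id}")])


and_url = boolean_query_url("paris", "123")
fq_url = filter_query_url("paris", "123")
print(and_url)
print(fq_url)
```

Timing both forms against a realistic index, with cold and warm caches, is probably the only reliable way to tell which is faster for a given workload.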


On 27 Jan 2010, at 15:48, Matthieu Labour wrote:

> @Marc: Thank you Marc. This is logic we had to implement in the
> client application. I will look into applying the patch to replace our
> home-grown logic.
>
> @Trey: I have 1000 users per machine. 1 core / user. Each core is  
> 35000 documents. Documents are small...each core goes from 100MB to  
> 1.3GB at most. There are 7 types of documents.
> What I am trying to understand is the search/filter algorithm. If I  
> have 1 core with all documents and I  search for "Paris" for  
> userId="123", is lucene going to first search for all Paris  
> documents and then apply a filter on the userId ? If this is the  
> case, then I am better off having a specific index for the  
> user="123" because this will be faster
>
>
>
>
>
> --- On Wed, 1/27/10, Marc Sturlese <ma...@gmail.com> wrote:
>
> From: Marc Sturlese <ma...@gmail.com>
> Subject: Re: Multiple Cores Vs. Single Core for the following use case
> To: solr-user@lucene.apache.org
> Date: Wednesday, January 27, 2010, 2:22 AM
>
>
> In case you are going to use core per user take a look to this patch:
> http://wiki.apache.org/solr/LotsOfCores
>
> Trey-13 wrote:
>>
>> Hi Matt,
>>
>> In most cases you are going to be better off going with the userid  
>> method
>> unless you have a very small number of users and a very large  
>> number of
>> docs/user. The userid method will likely be much easier to manage,  
>> as you
>> won't have to spin up a new core every time you add a new user.  I  
>> would
>> start here and see if the performance is good enough for your  
>> requirements
>> before you start worrying about it not being efficient.
>>
>> That being said, I really don't have any idea what your data looks  
>> like.
>> How many users do you have?  How many documents per user?  Are any
>> documents
>> shared by multiple users?
>>
>> -Trey
>>
>>
>>
>> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
>> <ma...@yahoo.com> wrote:
>>
>>> Hi
>>>
>>>
>>>
>>> Shall I set up Multiple Core or Single core for the following use  
>>> case:
>>>
>>>
>>>
>>> I have X number of users.
>>>
>>>
>>>
>>> When I do a search, I always know for which user I am doing a search
>>>
>>>
>>>
>>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core  
>>> and
>>> add
>>> a userId field to each document?
>>>
>>>
>>>
>>> If I choose the 1 core solution then I am concerned with  
>>> performance.
>>> Let's say I search for "NewYork" ... If lucene returns all "New  
>>> York"
>>> matches for all users and then filters based on the userId, then  
>>> this
>>> is going to be less efficient than if I have sharded per user and  
>>> send
>>> the request for "New York" to the user's core
>>>
>>>
>>>
>>> Thank you for your help
>>>
>>>
>>>
>>> matt
>>>
>>
>>
>


--
Toby Cole
Senior Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.


Re: Multiple Cores Vs. Single Core for the following use case

Posted by Matthieu Labour <ma...@yahoo.com>.

@Marc: Thank you, Marc. This is logic we had to implement in the client application. I will look into applying the patch to replace our home-grown logic.

@Trey: I have 1000 users per machine, with 1 core per user. Each core holds 35000 documents. The documents are small; each core ranges from 100MB to 1.3GB at most. There are 7 types of documents.
What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I search for "Paris" for userId="123", is Lucene going to first search for all Paris documents and then apply a filter on the userId? If this is the case, then I am better off having a specific index for userId="123" because this will be faster.
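
For intuition about this question: an inverted index keeps, for each term, a sorted list of the ids of the documents containing it, so a query with two required clauses can be answered by intersecting the two lists instead of collecting every "Paris" match and discarding most of them afterwards. The sketch below is a deliberately simplified illustration of that idea, not Lucene's actual code; the doc ids and the helper name intersect_sorted are made up, and how Lucene and Solr really combine a query with a filter depends on the version and on cache state.

```python
def intersect_sorted(a, b):
    """Return the doc ids present in both sorted lists, walking each list once."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a is behind, advance it
        else:
            j += 1  # b is behind, advance it
    return out


# Hypothetical posting lists: docs containing "paris", docs with userId 123.
paris_docs = [3, 8, 15, 42, 77, 91]
user_123_docs = [8, 9, 42, 50]

matches = intersect_sorted(paris_docs, user_123_docs)
print(matches)  # [8, 42]
```

The work in this sketch is bounded by the combined length of the two lists, so a short per-user list keeps the intersection cheap even when the term itself is common.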

--- On Wed, 1/27/10, Marc Sturlese <ma...@gmail.com> wrote:

From: Marc Sturlese <ma...@gmail.com>
Subject: Re: Multiple Cores Vs. Single Core for the following use case
To: solr-user@lucene.apache.org
Date: Wednesday, January 27, 2010, 2:22 AM

In case you are going to use core per user take a look to this patch:
http://wiki.apache.org/solr/LotsOfCores

Trey-13 wrote:
> 
> Hi Matt,
> 
> In most cases you are going to be better off going with the userid method
> unless you have a very small number of users and a very large number of
> docs/user. The userid method will likely be much easier to manage, as you
> won't have to spin up a new core every time you add a new user.  I would
> start here and see if the performance is good enough for your requirements
> before you start worrying about it not being efficient.
> 
> That being said, I really don't have any idea what your data looks like.
> How many users do you have?  How many documents per user?  Are any
> documents
> shared by multiple users?
> 
> -Trey
> 
> 
> 
> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
> <ma...@yahoo.com> wrote:
> 
>> Hi
>>
>>
>>
>> Shall I set up Multiple Core or Single core for the following use case:
>>
>>
>>
>> I have X number of users.
>>
>>
>>
>> When I do a search, I always know for which user I am doing a search
>>
>>
>>
>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and
>> add
>> a userId field to each document?
>>
>>
>>
>> If I choose the 1 core solution then I am concerned with performance.
>> Let's say I search for "NewYork" ... If lucene returns all "New York"
>> matches for all users and then filters based on the userId, then this
>> is going to be less efficient than if I have sharded per user and send
>> the request for "New York" to the user's core
>>
>>
>>
>> Thank you for your help
>>
>>
>>
>> matt
> 
> 


Re: Multiple Cores Vs. Single Core for the following use case

Posted by Marc Sturlese <ma...@gmail.com>.

If you are going to use a core per user, take a look at this patch:
http://wiki.apache.org/solr/LotsOfCores

Trey-13 wrote:
> 
> Hi Matt,
> 
> In most cases you are going to be better off going with the userid method
> unless you have a very small number of users and a very large number of
> docs/user. The userid method will likely be much easier to manage, as you
> won't have to spin up a new core every time you add a new user.  I would
> start here and see if the performance is good enough for your requirements
> before you start worrying about it not being efficient.
> 
> That being said, I really don't have any idea what your data looks like.
> How many users do you have?  How many documents per user?  Are any
> documents
> shared by multiple users?
> 
> -Trey
> 
> 
> 
> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
> <ma...@yahoo.com> wrote:
> 
>> Hi
>>
>>
>>
>> Shall I set up Multiple Core or Single core for the following use case:
>>
>>
>>
>> I have X number of users.
>>
>>
>>
>> When I do a search, I always know for which user I am doing a search
>>
>>
>>
>> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and
>> add
>> a userId field to each document?
>>
>>
>>
>> If I choose the 1 core solution then I am concerned with performance.
>> Let's say I search for "NewYork" ... If lucene returns all "New York"
>> matches for all users and then filters based on the userId, then this
>> is going to be less efficient than if I have sharded per user and send
>> the request for "New York" to the user's core
>>
>>
>>
>> Thank you for your help
>>
>>
>>
>> matt
> 
> 

-- 
View this message in context: http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multiple Cores Vs. Single Core for the following use case

Posted by Trey <so...@gmail.com>.

Hi Matt,

In most cases you are going to be better off going with the userId method
(a single core with a userId field on each document) unless you have a
very small number of users and a very large number of docs/user. The
userId method will likely be much easier to manage, as you won't have to
spin up a new core every time you add a new user.  I would start with it
and see if the performance is good enough for your requirements before
you start worrying about it not being efficient.
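
As a rough sketch of what the single-core approach means at index time,
each document simply carries the owning user's id as one more field. The
field names id, userId and body, the sample values, and the helper name
build_add_xml are assumptions for illustration; the schema would need a
matching userId field.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, user_id, body):
    """Build a Solr XML <add> message for one document tagged with its owner."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("userId", user_id), ("body", body)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


xml_payload = build_add_xml("doc1", "123", "A weekend in Paris")
print(xml_payload)
```

The resulting string is what would be posted to the update handler; adding
a new user then needs no new core, only documents with a new userId value.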

That being said, I really don't have any idea what your data looks like.
How many users do you have?  How many documents per user?  Are any documents
shared by multiple users?

-Trey

On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
<ma...@yahoo.com> wrote:

> Hi
>
>
>
> Shall I set up Multiple Core or Single core for the following use case:
>
>
>
> I have X number of users.
>
>
>
> When I do a search, I always know for which user I am doing a search
>
>
>
> Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add
> a userId field to each document?
>
>
>
> If I choose the 1 core solution then I am concerned with performance.
> Let's say I search for "NewYork" ... If lucene returns all "New York"
> matches for all users and then filters based on the userId, then this
> is going to be less efficient than if I have sharded per user and send
> the request for "New York" to the user's core
>
>
>
> Thank you for your help
>
>
>
> matt