Posted to solr-user@lucene.apache.org by mike anderson <sa...@gmail.com> on 2010/10/21 20:53:40 UTC

how well does multicore scale?

I'm exploring the possibility of using cores as a solution to "bookmark
folders" in my solr application. This would mean I'll need tens of thousands
of cores... does this seem reasonable? I have plenty of CPUs available for
scaling, but I wonder about the memory overhead of adding cores (aside from
needing to fit the new index in memory).

Thoughts?

-mike

Re: how well does multicore scale?

Posted by Tharindu Mathew <mc...@gmail.com>.
On Fri, Oct 22, 2010 at 11:18 AM, Lance Norskog <go...@gmail.com> wrote:
> There is an API now for dynamically loading, unloading, creating and
> deleting cores.
> Restarting a Solr with thousands of cores will take, I don't know, hours.
>
Is this in the trunk? Any docs available?
> On Thu, Oct 21, 2010 at 10:44 PM, Tharindu Mathew <mc...@gmail.com> wrote:
>> Hi Mike,
>>
>> I've also considered using a separate cores in a multi tenant
>> application, ie a separate core for each tenant/domain. But the cores
>> do not suit that purpose.
>>
>> If you check out documentation no real API support exists for this so
>> it can be done dynamically through SolrJ. And all use cases I found,
>> only had users configuring it statically and then using it. That was
>> maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
>>
>> So your better off using a single index and with a user id and use a
>> query filter with the user id when fetching data.
>>
>> On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind <ro...@jhu.edu> wrote:
>>> No, it does not seem reasonable.  Why do you think you need a seperate core
>>> for every user?
>>> mike anderson wrote:
>>>>
>>>> I'm exploring the possibility of using cores as a solution to "bookmark
>>>> folders" in my solr application. This would mean I'll need tens of
>>>> thousands
>>>> of cores... does this seem reasonable? I have plenty of CPUs available for
>>>> scaling, but I wonder about the memory overhead of adding cores (aside
>>>> from
>>>> needing to fit the new index in memory).
>>>>
>>>> Thoughts?
>>>>
>>>> -mike
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Tharindu
>>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>



-- 
Regards,

Tharindu

Re: how well does multicore scale?

Posted by Lance Norskog <go...@gmail.com>.
There is an API now for dynamically loading, unloading, creating and
deleting cores.
Restarting a Solr with thousands of cores will take, I don't know, hours.

On Thu, Oct 21, 2010 at 10:44 PM, Tharindu Mathew <mc...@gmail.com> wrote:
> Hi Mike,
>
> I've also considered using a separate cores in a multi tenant
> application, ie a separate core for each tenant/domain. But the cores
> do not suit that purpose.
>
> If you check out documentation no real API support exists for this so
> it can be done dynamically through SolrJ. And all use cases I found,
> only had users configuring it statically and then using it. That was
> maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
>
> So your better off using a single index and with a user id and use a
> query filter with the user id when fetching data.
>
> On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind <ro...@jhu.edu> wrote:
>> No, it does not seem reasonable.  Why do you think you need a seperate core
>> for every user?
>> mike anderson wrote:
>>>
>>> I'm exploring the possibility of using cores as a solution to "bookmark
>>> folders" in my solr application. This would mean I'll need tens of
>>> thousands
>>> of cores... does this seem reasonable? I have plenty of CPUs available for
>>> scaling, but I wonder about the memory overhead of adding cores (aside
>>> from
>>> needing to fit the new index in memory).
>>>
>>> Thoughts?
>>>
>>> -mike
>>>
>>>
>>
>
>
>
> --
> Regards,
>
> Tharindu
>



-- 
Lance Norskog
goksron@gmail.com

Re: how well does multicore scale?

Posted by Lance Norskog <go...@gmail.com>.
Creating a unique id for a schema is one of those design tasks:

http://wiki.apache.org/solr/UniqueKey

A marvelously lucid and well-written page, if I do say so. And I do.

On Tue, Oct 26, 2010 at 10:16 PM, Tharindu Mathew <mc...@gmail.com> wrote:
> Really great to know you were able to fire up about 100 cores. But,
> when it scales up to around 1000 or even more. I wonder how it would
> perform.
>
> I have a question regarding ids i.e. the unique key. Since there is a
> potential use case that two users might add the same document, how
> would we set the id. I was thinking of appending the user id to the an
> id I would use ex: "/system/bar.pdfuserid25". Otherwise, solr would
> replace the document of one user, which is not what we want.
>
> This is also applicable to deleteById. Is there a better way to do this?
>
> On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind <ro...@jhu.edu> wrote:
>> mike anderson wrote:
>>>
>>> I'm really curious if there is a clever solution to the obvious problem
>>> with: "So your better off using a single index and with a user id and use
>>> a query filter with the user id when fetching data.", i.e.. when you have
>>> hundreds of thousands of user IDs tagged on each article. That just
>>> doesn't
>>> sound like it scales very well..
>>>
>>
>> Actually, I think that design would scale pretty fine, I don't think there's
>> an 'obvious' problem. You store your userIDs in a multi-valued field (or as
>> multiple terms in a single value, ends up being similar). You fq on there
>> with the current userID.   There's one way to find out of course, but that
>> doesn't seem a patently ridiculous scenario or anything, that's the kind of
>> thing Solr is generally good at, it's what it's built for.   The problem
>> might actually be in the time it takes to add such a document to the index;
>> but not in query time.
>>
>> Doesn't mean it's the best solution for your problem though, I can't say.
>>
>> My impression is that Solr in general isn't really designed to support the
>> kind of multi-tenancy use case people are talking about lately.  So trying
>> to make it work anyway... if multi-cores work for you, then great, but be
>> aware they weren't really designed for that (having thousands of cores) and
>> may not. If a single index can work for you instead, great, but as you've
>> discovered it's not neccesarily obvious how to set up the schema to do what
>> you need -- really this applies to Solr in general, unlike an rdbms where
>> you just third-form-normalize everything and figure it'll work for almost
>> any use case that comes up,  in Solr you generally need to custom fit the
>> schema for your particular use cases, sometimes being kind of clever to
>> figure out the optimal way to do that.
>>
>> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
>> index takes more intellectual work than setting up an rdbms. The trade off
>> is you get speed, and flexible ways to set up relevancy (that still perform
>> well). Took a couple decades for rdbms to get as brainless to use as they
>> are, maybe in a couple more we'll have figured out ways to make indexing
>> engines like solr equally brainless, but not yet -- but it's still pretty
>> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>>
>
>
>
> --
> Regards,
>
> Tharindu
>



-- 
Lance Norskog
goksron@gmail.com

Re: how well does multicore scale?

Posted by Tharindu Mathew <mc...@gmail.com>.
Really great to know you were able to fire up about 100 cores. But when
it scales up to around 1000 or even more, I wonder how it would perform.

I have a question regarding ids, i.e. the unique key. Since there is a
potential use case where two users might add the same document, how
would we set the id? I was thinking of appending the user id to the id
I would use, e.g. "/system/bar.pdfuserid25". Otherwise, Solr would
replace the document of one user, which is not what we want.

This is also applicable to deleteById. Is there a better way to do this?
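(For illustration, a minimal SolrJ sketch of one way to compose such a key
and reuse it for deleteById. The separator, field names and URL below are
made up for the example, not taken from this thread.)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CompositeIdSketch {
    // Build the uniqueKey from the user/tenant id and the document path,
    // with an explicit separator so the two parts stay distinguishable.
    static String docId(String userId, String path) {
        return userId + "::" + path;   // e.g. "userid25::/system/bar.pdf"
    }

    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", docId("userid25", "/system/bar.pdf"));
        doc.addField("user_id", "userid25");
        doc.addField("path", "/system/bar.pdf");
        solr.add(doc);
        solr.commit();

        // deleteById uses the same composed key, so one user's delete
        // cannot remove another user's copy of the same document.
        solr.deleteById(docId("userid25", "/system/bar.pdf"));
        solr.commit();
    }
}

An explicit separator such as "::" keeps the path and the user id
distinguishable, which a bare concatenation like "/system/bar.pdfuserid25"
does not.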

On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind <ro...@jhu.edu> wrote:
> mike anderson wrote:
>>
>> I'm really curious if there is a clever solution to the obvious problem
>> with: "So your better off using a single index and with a user id and use
>> a query filter with the user id when fetching data.", i.e.. when you have
>> hundreds of thousands of user IDs tagged on each article. That just
>> doesn't
>> sound like it scales very well..
>>
>
> Actually, I think that design would scale pretty fine, I don't think there's
> an 'obvious' problem. You store your userIDs in a multi-valued field (or as
> multiple terms in a single value, ends up being similar). You fq on there
> with the current userID.   There's one way to find out of course, but that
> doesn't seem a patently ridiculous scenario or anything, that's the kind of
> thing Solr is generally good at, it's what it's built for.   The problem
> might actually be in the time it takes to add such a document to the index;
> but not in query time.
>
> Doesn't mean it's the best solution for your problem though, I can't say.
>
> My impression is that Solr in general isn't really designed to support the
> kind of multi-tenancy use case people are talking about lately.  So trying
> to make it work anyway... if multi-cores work for you, then great, but be
> aware they weren't really designed for that (having thousands of cores) and
> may not. If a single index can work for you instead, great, but as you've
> discovered it's not neccesarily obvious how to set up the schema to do what
> you need -- really this applies to Solr in general, unlike an rdbms where
> you just third-form-normalize everything and figure it'll work for almost
> any use case that comes up,  in Solr you generally need to custom fit the
> schema for your particular use cases, sometimes being kind of clever to
> figure out the optimal way to do that.
>
> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
> index takes more intellectual work than setting up an rdbms. The trade off
> is you get speed, and flexible ways to set up relevancy (that still perform
> well). Took a couple decades for rdbms to get as brainless to use as they
> are, maybe in a couple more we'll have figured out ways to make indexing
> engines like solr equally brainless, but not yet -- but it's still pretty
> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>



-- 
Regards,

Tharindu

Re: how well does multicore scale?

Posted by Dennis Gearon <ge...@sbcglobal.net>.
This is why using 'groups' as intermediary permission objects came into existence in databases.

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Wed, 10/27/10, mike anderson <sa...@gmail.com> wrote:

> From: mike anderson <sa...@gmail.com>
> Subject: Re: how well does multicore scale?
> To: solr-user@lucene.apache.org
> Date: Wednesday, October 27, 2010, 5:20 AM
> Tagging every document with a few
> hundred thousand 6 character user-ids
> would  increase the document size by two orders of
> magnitude. I can't
> imagine why this wouldn't mean the index would increase by
> just as much
> (though I really don't know much about that file
> structure). By my simple
> math, this would mean that if we want each shard's index to
> be able to fit
> in memory, then (even with some beefy servers) each query
> would have to go
> out to a few thousand shards (as opposed to 21 if we used
> the MultiCore
> approach). This means the typical response time would be
> much slower.
> 
> 
> -mike
> 
> On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind <ro...@jhu.edu>wrote:
> 
> > mike anderson wrote:
> >
> >> I'm really curious if there is a clever solution
> to the obvious problem
> >> with: "So your better off using a single index and
> with a user id and use
> >> a query filter with the user id when fetching
> data.", i.e.. when you have
> >> hundreds of thousands of user IDs tagged on each
> article. That just
> >> doesn't
> >> sound like it scales very well..
> >>
> >>
> > Actually, I think that design would scale pretty fine,
> I don't think
> > there's an 'obvious' problem. You store your userIDs
> in a multi-valued field
> > (or as multiple terms in a single value, ends up being
> similar). You fq on
> > there with the current
> userID.   There's one way to find out of
> course, but
> > that doesn't seem a patently ridiculous scenario or
> anything, that's the
> > kind of thing Solr is generally good at, it's what
> it's built for.   The
> > problem might actually be in the time it takes to add
> such a document to the
> > index; but not in query time.
> >
> > Doesn't mean it's the best solution for your problem
> though, I can't say.
> >
> > My impression is that Solr in general isn't really
> designed to support the
> > kind of multi-tenancy use case people are talking
> about lately.  So trying
> > to make it work anyway... if multi-cores work for you,
> then great, but be
> > aware they weren't really designed for that (having
> thousands of cores) and
> > may not. If a single index can work for you instead,
> great, but as you've
> > discovered it's not neccesarily obvious how to set up
> the schema to do what
> > you need -- really this applies to Solr in general,
> unlike an rdbms where
> > you just third-form-normalize everything and figure
> it'll work for almost
> > any use case that comes up,  in Solr you
> generally need to custom fit the
> > schema for your particular use cases, sometimes being
> kind of clever to
> > figure out the optimal way to do that.
> >
> > This is, I'd argue/agree, indeed kind of a
> disadvantage, setting up a Solr
> > index takes more intellectual work than setting up an
> rdbms. The trade off
> > is you get speed, and flexible ways to set up
> relevancy (that still perform
> > well). Took a couple decades for rdbms to get as
> brainless to use as they
> > are, maybe in a couple more we'll have figured out
> ways to make indexing
> > engines like solr equally brainless, but not yet --
> but it's still pretty
> > damn easy for what it is, the lucene/Solr folks have
> done a remarkable job.
> >
> 

Re: how well does multicore scale?

Posted by Tharindu Mathew <mc...@gmail.com>.
Hi Mike,

I think I wasn't clear.

Each document will only be tagged with one user_id, or to be specific,
one tenant_id. Users of the same tenant can't upload the same document
to the same path.

So I use this to make the key unique for each tenant, and I can index
and delete without a problem.

On Wed, Oct 27, 2010 at 5:50 PM, mike anderson <sa...@gmail.com> wrote:
> Tagging every document with a few hundred thousand 6 character user-ids
> would  increase the document size by two orders of magnitude. I can't
> imagine why this wouldn't mean the index would increase by just as much
> (though I really don't know much about that file structure). By my simple
> math, this would mean that if we want each shard's index to be able to fit
> in memory, then (even with some beefy servers) each query would have to go
> out to a few thousand shards (as opposed to 21 if we used the MultiCore
> approach). This means the typical response time would be much slower.
>
>
> -mike
>
> On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind <ro...@jhu.edu>wrote:
>
>> mike anderson wrote:
>>
>>> I'm really curious if there is a clever solution to the obvious problem
>>> with: "So your better off using a single index and with a user id and use
>>> a query filter with the user id when fetching data.", i.e.. when you have
>>> hundreds of thousands of user IDs tagged on each article. That just
>>> doesn't
>>> sound like it scales very well..
>>>
>>>
>> Actually, I think that design would scale pretty fine, I don't think
>> there's an 'obvious' problem. You store your userIDs in a multi-valued field
>> (or as multiple terms in a single value, ends up being similar). You fq on
>> there with the current userID.   There's one way to find out of course, but
>> that doesn't seem a patently ridiculous scenario or anything, that's the
>> kind of thing Solr is generally good at, it's what it's built for.   The
>> problem might actually be in the time it takes to add such a document to the
>> index; but not in query time.
>>
>> Doesn't mean it's the best solution for your problem though, I can't say.
>>
>> My impression is that Solr in general isn't really designed to support the
>> kind of multi-tenancy use case people are talking about lately.  So trying
>> to make it work anyway... if multi-cores work for you, then great, but be
>> aware they weren't really designed for that (having thousands of cores) and
>> may not. If a single index can work for you instead, great, but as you've
>> discovered it's not neccesarily obvious how to set up the schema to do what
>> you need -- really this applies to Solr in general, unlike an rdbms where
>> you just third-form-normalize everything and figure it'll work for almost
>> any use case that comes up,  in Solr you generally need to custom fit the
>> schema for your particular use cases, sometimes being kind of clever to
>> figure out the optimal way to do that.
>>
>> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
>> index takes more intellectual work than setting up an rdbms. The trade off
>> is you get speed, and flexible ways to set up relevancy (that still perform
>> well). Took a couple decades for rdbms to get as brainless to use as they
>> are, maybe in a couple more we'll have figured out ways to make indexing
>> engines like solr equally brainless, but not yet -- but it's still pretty
>> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>>
>



-- 
Regards,

Tharindu

RE: how well does multicore scale?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
mike anderson [saidtherobot@gmail.com] wrote:
> That's a great point. If SSDs are sufficient, then what does the "Index size
> vs Response time" curve look like? Since that would dictate the number
> of machines needed. I took a look at 
> http://wiki.apache.org/solr/SolrPerformanceData but only one use case
> seemed comparable.

I generally find it very hard to compare across setups. Looking at SolrPerformanceData, for example, we see that CNET Shopper has a very poor response-time/size ratio, while HathiTrust is a lot better. This is not too surprising, as CNET seems to use quite advanced searching where HathiTrust's is simpler, but it does illustrate that comparisons are not easy.

However, as long as I/O has been identified as the main bottleneck for a given setup, relative gains from different storage back ends should be fairly comparable across setups. We did some work on storage testing with Lucene two years ago (see the I-wish-I-had-the-time-to-update-this page at http://wiki.statsbiblioteket.dk/summa/Hardware), but unfortunately we did very little testing on scaling over index size.

...

I just dug out some old measurements that say a little bit: we tried changing the size of our index (by deleting every Xth document and optimizing) and performing 350K queries with extraction of 2 or 3 fairly small fields for the first 20 hits from each. The machine was capped at 4GB of RAM. I am fairly certain the searcher was single-threaded and there were no web services involved, so this is very raw Lucene speed:
4GB index: 626 queries/second
9GB index: 405 queries/second
17GB index: 205 queries/second
26GB index: 188 queries/second
Not a lot of measurement points and I wish I had data for larger index sizes, as it seems that the curve is flattening quite drastically at the end. Graph at
http://www.mathcracker.com/scatterplotimage.php?datax=4,9,17,26&datay=626,405,205,188&namex=Index%20size%20in%20GB&namey=queries/second&titl=SSD%20scaling%20performance%20with%20Lucene

> We currently have about 25M docs, split into 18 shards, with a
> total index size of about 120GB. If index size has truly little
> impact on performance then perhaps tagging articles with user
> IDs is a better way to approach my use case.

I don't know your budget, but do consider buying a single 160GB Intel X25-M or one of the new 256GB SandForce-based SSDs for testing. If it does not deliver what you hoped for, you'll be happy to put it in your workstation.

It would be nice if there were some sort of corpus generator that generated Zipfian-distributed data and sample queries so that we could do large scale testing on different hardware without having to share sample data.
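(For illustration, a toy sketch of such a generator: terms are drawn from a
fixed vocabulary, with rank k chosen with probability proportional to 1/k^s.
The vocabulary size, skew and term names below are assumptions for the
example, not something from this thread.)

import java.util.Random;

/** Toy Zipfian term generator: rank k is drawn with probability proportional to 1/k^s. */
public class ZipfianTermGenerator {
    private final double[] cumulative;        // cumulative probabilities over ranks 1..N
    private final Random random = new Random(42);

    public ZipfianTermGenerator(int vocabularySize, double skew) {
        cumulative = new double[vocabularySize];
        double sum = 0.0;
        for (int k = 1; k <= vocabularySize; k++) {
            sum += 1.0 / Math.pow(k, skew);
            cumulative[k - 1] = sum;
        }
        for (int i = 0; i < vocabularySize; i++) {
            cumulative[i] /= sum;             // normalize into a proper CDF
        }
    }

    /** Returns a term like "term17"; low ranks come up far more often than high ranks. */
    public String nextTerm() {
        double u = random.nextDouble();
        int lo = 0, hi = cumulative.length - 1;
        while (lo < hi) {                     // binary search for the first rank with CDF >= u
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] < u) lo = mid + 1; else hi = mid;
        }
        return "term" + (lo + 1);
    }

    public static void main(String[] args) {
        ZipfianTermGenerator gen = new ZipfianTermGenerator(50000, 1.0);
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            doc.append(gen.nextTerm()).append(' ');   // one synthetic "document"
        }
        System.out.println(doc);
    }
}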

Regards,
Toke Eskildsen

Re: how well does multicore scale?

Posted by mike anderson <sa...@gmail.com>.
That's a great point. If SSDs are sufficient, then what does the "Index size
vs Response time" curve look like? That would dictate the number of
machines needed. I took a look at
http://wiki.apache.org/solr/SolrPerformanceData but only one use case seemed
comparable. We currently have about 25M docs, split into 18 shards, with a
total index size of about 120GB. If index size has truly little impact on
performance then perhaps tagging articles with user IDs is a better way to
approach my use case.

-Mike



On Wed, Oct 27, 2010 at 9:45 AM, Toke Eskildsen <te...@statsbiblioteket.dk>wrote:

> On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote:
> > [...] By my simple math, this would mean that if we want each shard's
> > index to be able to fit in memory, [...]
>
> Might I ask why you're planning on using memory-based sharding? The
> performance gap between memory and SSDs is not very big so using memory
> to get those last queries/second is quite expensive.
>
>

Re: how well does multicore scale?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote:
> [...] By my simple math, this would mean that if we want each shard's
> index to be able to fit in memory, [...]

Might I ask why you're planning on using memory-based sharding? The
performance gap between memory and SSDs is not very big so using memory
to get those last queries/second is quite expensive.


Re: how well does multicore scale?

Posted by mike anderson <sa...@gmail.com>.
Tagging every document with a few hundred thousand 6 character user-ids
would increase the document size by two orders of magnitude. I can't
imagine why this wouldn't mean the index would increase by just as much
(though I really don't know much about that file structure). By my simple
math, this would mean that if we want each shard's index to be able to fit
in memory, then (even with some beefy servers) each query would have to go
out to a few thousand shards (as opposed to 21 if we used the MultiCore
approach). This means the typical response time would be much slower.


-mike

On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind <ro...@jhu.edu>wrote:

> mike anderson wrote:
>
>> I'm really curious if there is a clever solution to the obvious problem
>> with: "So your better off using a single index and with a user id and use
>> a query filter with the user id when fetching data.", i.e.. when you have
>> hundreds of thousands of user IDs tagged on each article. That just
>> doesn't
>> sound like it scales very well..
>>
>>
> Actually, I think that design would scale pretty fine, I don't think
> there's an 'obvious' problem. You store your userIDs in a multi-valued field
> (or as multiple terms in a single value, ends up being similar). You fq on
> there with the current userID.   There's one way to find out of course, but
> that doesn't seem a patently ridiculous scenario or anything, that's the
> kind of thing Solr is generally good at, it's what it's built for.   The
> problem might actually be in the time it takes to add such a document to the
> index; but not in query time.
>
> Doesn't mean it's the best solution for your problem though, I can't say.
>
> My impression is that Solr in general isn't really designed to support the
> kind of multi-tenancy use case people are talking about lately.  So trying
> to make it work anyway... if multi-cores work for you, then great, but be
> aware they weren't really designed for that (having thousands of cores) and
> may not. If a single index can work for you instead, great, but as you've
> discovered it's not neccesarily obvious how to set up the schema to do what
> you need -- really this applies to Solr in general, unlike an rdbms where
> you just third-form-normalize everything and figure it'll work for almost
> any use case that comes up,  in Solr you generally need to custom fit the
> schema for your particular use cases, sometimes being kind of clever to
> figure out the optimal way to do that.
>
> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
> index takes more intellectual work than setting up an rdbms. The trade off
> is you get speed, and flexible ways to set up relevancy (that still perform
> well). Took a couple decades for rdbms to get as brainless to use as they
> are, maybe in a couple more we'll have figured out ways to make indexing
> engines like solr equally brainless, but not yet -- but it's still pretty
> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>

Re: how well does multicore scale?

Posted by Jonathan Rochkind <ro...@jhu.edu>.
mike anderson wrote:
> I'm really curious if there is a clever solution to the obvious problem
> with: "So your better off using a single index and with a user id and use
> a query filter with the user id when fetching data.", i.e.. when you have
> hundreds of thousands of user IDs tagged on each article. That just doesn't
> sound like it scales very well..
>   
Actually, I think that design would scale pretty fine, I don't think 
there's an 'obvious' problem. You store your userIDs in a multi-valued 
field (or as multiple terms in a single value, ends up being similar). 
You fq on there with the current userID.   There's one way to find out 
of course, but that doesn't seem a patently ridiculous scenario or 
anything, that's the kind of thing Solr is generally good at, it's what 
it's built for.   The problem might actually be in the time it takes to 
add such a document to the index; but not in query time.
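(For illustration, a minimal SolrJ sketch of the multi-valued approach
described here. It assumes a multiValued string field named user_ids in the
schema; the field name, ids and URL are made up for the example.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BookmarkTaggingSketch {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Index an article, tagging it with every user who bookmarked it.
        // Assumes schema.xml declares user_ids as a multiValued string field.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "article-123");
        doc.addField("title", "Some article");
        for (String userId : new String[] {"u1001", "u1002", "u1003"}) {
            doc.addField("user_ids", userId);
        }
        solr.add(doc);
        solr.commit();

        // Query side: restrict results to the current user's bookmarks
        // with a filter query on that field.
        SolrQuery query = new SolrQuery("title:article");
        query.addFilterQuery("user_ids:u1002");
        solr.query(query);
    }
}

As noted above, the cost to watch with this layout is re-adding the whole
document whenever its (large) set of user ids changes, not the query itself.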

Doesn't mean it's the best solution for your problem though, I can't say.

My impression is that Solr in general isn't really designed to support 
the kind of multi-tenancy use case people are talking about lately.  So 
trying to make it work anyway... if multi-cores work for you, then 
great, but be aware they weren't really designed for that (having 
thousands of cores) and may not. If a single index can work for you 
instead, great, but as you've discovered it's not necessarily obvious
how to set up the schema to do what you need -- really this applies to
Solr in general, unlike an rdbms where you just normalize everything to
third normal form and figure it'll work for almost any use case that comes up,
in Solr you generally need to custom fit the schema for your particular 
use cases, sometimes being kind of clever to figure out the optimal way 
to do that.

This is, I'd argue/agree, indeed kind of a disadvantage, setting up a 
Solr index takes more intellectual work than setting up an rdbms. The 
trade off is you get speed, and flexible ways to set up relevancy (that 
still perform well). Took a couple decades for rdbms to get as brainless 
to use as they are, maybe in a couple more we'll have figured out ways 
to make indexing engines like solr equally brainless, but not yet -- but 
it's still pretty damn easy for what it is, the lucene/Solr folks have 
done a remarkable job.

Re: how well does multicore scale?

Posted by mike anderson <sa...@gmail.com>.
So I fired up about 100 cores and used JMeter to fire off a few thousand
queries. It looks like the memory usage isn't much worse than running a
single shard. So that's good.

I'm really curious if there is a clever solution to the obvious problem
with: "So your better off using a single index and with a user id and use
a query filter with the user id when fetching data.", i.e. when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well...


Cheers,
Mike


On Fri, Oct 22, 2010 at 10:43 PM, Lance Norskog <go...@gmail.com> wrote:

> http://wiki.apache.org/solr/CoreAdmin
>
> Since Solr 1.3
>
> On Fri, Oct 22, 2010 at 1:40 PM, mike anderson <sa...@gmail.com>
> wrote:
> > Thanks for the advice, everyone. I'll take a look at the API mentioned
> and
> > do some benchmarking over the weekend.
> >
> > -Mike
> >
> >
> > On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller <ma...@gmail.com>
> wrote:
> >
> >> On 10/22/10 1:44 AM, Tharindu Mathew wrote:
> >> > Hi Mike,
> >> >
> >> > I've also considered using a separate cores in a multi tenant
> >> > application, ie a separate core for each tenant/domain. But the cores
> >> > do not suit that purpose.
> >> >
> >> > If you check out documentation no real API support exists for this so
> >> > it can be done dynamically through SolrJ. And all use cases I found,
> >> > only had users configuring it statically and then using it. That was
> >> > maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
> >>
> >> You can dynamically manage cores with solrj. See
> >> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
> >> for a place to start.
> >>
> >> You probably want to turn solr.xml's persist option on so that your
> >> cores survive restarts.
> >>
> >> >
> >> > So your better off using a single index and with a user id and use a
> >> > query filter with the user id when fetching data.
> >>
> >> Many times this is probably the case - pro's and con's to each depending
> >> on what you are up to.
> >>
> >> - Mark
> >> lucidimagination.com
> >>
> >> >
> >> > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind <ro...@jhu.edu>
> >> wrote:
> >> >> No, it does not seem reasonable.  Why do you think you need a
> seperate
> >> core
> >> >> for every user?
> >> >> mike anderson wrote:
> >> >>>
> >> >>> I'm exploring the possibility of using cores as a solution to
> "bookmark
> >> >>> folders" in my solr application. This would mean I'll need tens of
> >> >>> thousands
> >> >>> of cores... does this seem reasonable? I have plenty of CPUs
> available
> >> for
> >> >>> scaling, but I wonder about the memory overhead of adding cores
> (aside
> >> >>> from
> >> >>> needing to fit the new index in memory).
> >> >>>
> >> >>> Thoughts?
> >> >>>
> >> >>> -mike
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >>
> >>
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: how well does multicore scale?

Posted by Lance Norskog <go...@gmail.com>.
http://wiki.apache.org/solr/CoreAdmin

Since Solr 1.3

On Fri, Oct 22, 2010 at 1:40 PM, mike anderson <sa...@gmail.com> wrote:
> Thanks for the advice, everyone. I'll take a look at the API mentioned and
> do some benchmarking over the weekend.
>
> -Mike
>
>
> On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller <ma...@gmail.com> wrote:
>
>> On 10/22/10 1:44 AM, Tharindu Mathew wrote:
>> > Hi Mike,
>> >
>> > I've also considered using a separate cores in a multi tenant
>> > application, ie a separate core for each tenant/domain. But the cores
>> > do not suit that purpose.
>> >
>> > If you check out documentation no real API support exists for this so
>> > it can be done dynamically through SolrJ. And all use cases I found,
>> > only had users configuring it statically and then using it. That was
>> > maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
>>
>> You can dynamically manage cores with solrj. See
>> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
>> for a place to start.
>>
>> You probably want to turn solr.xml's persist option on so that your
>> cores survive restarts.
>>
>> >
>> > So your better off using a single index and with a user id and use a
>> > query filter with the user id when fetching data.
>>
>> Many times this is probably the case - pro's and con's to each depending
>> on what you are up to.
>>
>> - Mark
>> lucidimagination.com
>>
>> >
>> > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind <ro...@jhu.edu>
>> wrote:
>> >> No, it does not seem reasonable.  Why do you think you need a seperate
>> core
>> >> for every user?
>> >> mike anderson wrote:
>> >>>
>> >>> I'm exploring the possibility of using cores as a solution to "bookmark
>> >>> folders" in my solr application. This would mean I'll need tens of
>> >>> thousands
>> >>> of cores... does this seem reasonable? I have plenty of CPUs available
>> for
>> >>> scaling, but I wonder about the memory overhead of adding cores (aside
>> >>> from
>> >>> needing to fit the new index in memory).
>> >>>
>> >>> Thoughts?
>> >>>
>> >>> -mike
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>>
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: how well does multicore scale?

Posted by mike anderson <sa...@gmail.com>.
Thanks for the advice, everyone. I'll take a look at the API mentioned and
do some benchmarking over the weekend.

-Mike


On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller <ma...@gmail.com> wrote:

> On 10/22/10 1:44 AM, Tharindu Mathew wrote:
> > Hi Mike,
> >
> > I've also considered using a separate cores in a multi tenant
> > application, ie a separate core for each tenant/domain. But the cores
> > do not suit that purpose.
> >
> > If you check out documentation no real API support exists for this so
> > it can be done dynamically through SolrJ. And all use cases I found,
> > only had users configuring it statically and then using it. That was
> > maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
>
> You can dynamically manage cores with solrj. See
> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
> for a place to start.
>
> You probably want to turn solr.xml's persist option on so that your
> cores survive restarts.
>
> >
> > So your better off using a single index and with a user id and use a
> > query filter with the user id when fetching data.
>
> Many times this is probably the case - pro's and con's to each depending
> on what you are up to.
>
> - Mark
> lucidimagination.com
>
> >
> > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind <ro...@jhu.edu>
> wrote:
> >> No, it does not seem reasonable.  Why do you think you need a seperate
> core
> >> for every user?
> >> mike anderson wrote:
> >>>
> >>> I'm exploring the possibility of using cores as a solution to "bookmark
> >>> folders" in my solr application. This would mean I'll need tens of
> >>> thousands
> >>> of cores... does this seem reasonable? I have plenty of CPUs available
> for
> >>> scaling, but I wonder about the memory overhead of adding cores (aside
> >>> from
> >>> needing to fit the new index in memory).
> >>>
> >>> Thoughts?
> >>>
> >>> -mike
> >>>
> >>>
> >>
> >
> >
> >
>
>

Re: how well does multicore scale?

Posted by Mark Miller <ma...@gmail.com>.
On 10/22/10 1:44 AM, Tharindu Mathew wrote:
> Hi Mike,
> 
> I've also considered using a separate cores in a multi tenant
> application, ie a separate core for each tenant/domain. But the cores
> do not suit that purpose.
> 
> If you check out documentation no real API support exists for this so
> it can be done dynamically through SolrJ. And all use cases I found,
> only had users configuring it statically and then using it. That was
> maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.

You can dynamically manage cores with solrj. See
org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
for a place to start.

You probably want to turn solr.xml's persist option on so that your
cores survive restarts.
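(For illustration, a minimal SolrJ sketch along those lines. The URL, core
name and instanceDir below are made up; the instanceDir is assumed to
already contain a conf/ directory with solrconfig.xml and schema.xml.)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CoreAdminSketch {
    public static void main(String[] args) throws Exception {
        // Point at the container-level Solr URL, not at a particular core.
        SolrServer adminServer = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Create a new core whose instanceDir already holds its config and schema.
        CoreAdminRequest.createCore("user42", "/var/solr/cores/user42", adminServer);

        // Check its status.
        CoreAdminRequest.getStatus("user42", adminServer);

        // Unload it again when it is no longer needed.
        CoreAdminRequest.unloadCore("user42", adminServer);
    }
}

With the persist option Mark mentions turned on in solr.xml, cores created
this way are written back to solr.xml and survive restarts.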

> 
> So your better off using a single index and with a user id and use a
> query filter with the user id when fetching data.

Many times this is probably the case - pros and cons to each depending
on what you are up to.

- Mark
lucidimagination.com

> 
> On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind <ro...@jhu.edu> wrote:
>> No, it does not seem reasonable.  Why do you think you need a seperate core
>> for every user?
>> mike anderson wrote:
>>>
>>> I'm exploring the possibility of using cores as a solution to "bookmark
>>> folders" in my solr application. This would mean I'll need tens of
>>> thousands
>>> of cores... does this seem reasonable? I have plenty of CPUs available for
>>> scaling, but I wonder about the memory overhead of adding cores (aside
>>> from
>>> needing to fit the new index in memory).
>>>
>>> Thoughts?
>>>
>>> -mike
>>>
>>>
>>
> 
> 
> 


Re: how well does multicore scale?

Posted by Tharindu Mathew <mc...@gmail.com>.
Hi Mike,

I've also considered using separate cores in a multi-tenant
application, i.e. a separate core for each tenant/domain. But the cores
do not suit that purpose.

If you check the documentation, no real API support exists for managing
this dynamically through SolrJ. And all the use cases I found only had
users configuring cores statically and then using them; that was maybe
2 or 3 cores. Please correct me if I'm wrong, Solr folks.

So you're better off using a single index with a user id field and
applying a query filter on the user id when fetching data.
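(For illustration, a minimal SolrJ sketch of that approach; the field name
user_id, the ids and the URL are made up for the example.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class UserFilterQuerySketch {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Every query is scoped to the current user with a filter query,
        // so one shared index serves all users.
        SolrQuery query = new SolrQuery("solr multicore");
        query.addFilterQuery("user_id:userid25");

        QueryResponse response = solr.query(query);
        System.out.println("hits for this user: " + response.getResults().getNumFound());
    }
}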

On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind <ro...@jhu.edu> wrote:
> No, it does not seem reasonable.  Why do you think you need a seperate core
> for every user?
> mike anderson wrote:
>>
>> I'm exploring the possibility of using cores as a solution to "bookmark
>> folders" in my solr application. This would mean I'll need tens of
>> thousands
>> of cores... does this seem reasonable? I have plenty of CPUs available for
>> scaling, but I wonder about the memory overhead of adding cores (aside
>> from
>> needing to fit the new index in memory).
>>
>> Thoughts?
>>
>> -mike
>>
>>
>



-- 
Regards,

Tharindu

Re: how well does multicore scale?

Posted by Jonathan Rochkind <ro...@jhu.edu>.
No, it does not seem reasonable. Why do you think you need a separate
core for every user?

mike anderson wrote:
> I'm exploring the possibility of using cores as a solution to "bookmark
> folders" in my solr application. This would mean I'll need tens of thousands
> of cores... does this seem reasonable? I have plenty of CPUs available for
> scaling, but I wonder about the memory overhead of adding cores (aside from
> needing to fit the new index in memory).
>
> Thoughts?
>
> -mike
>
>