Posted to solr-user@lucene.apache.org by Gustavo Falco <co...@gmail.com> on 2011/11/03 19:03:42 UTC

Three questions about: Commit, single index vs multiple indexes and implementation advice

Hi guys!

I have a couple of questions that I hope someone could help me with:

1) Recently I've implemented Solr in my app. My use case is not
complicated. Suppose that there will be 50 concurrent users, tops. This is
an app like, let's say, a CRM. I mention this so you have an idea of
how many read and write operations will be needed. What I do need is
for data that is added or updated to be available right after the
change (a second later is OK). I know that the commit operation is
expensive, so committing right after each write operation is probably not
a good idea. I'm trying the autoCommit feature with a maxTime of
1000 ms, but then the question arose: is this the best way to handle this
type of situation? And if not, what should I do?

2) I'm using a single index per entity type because I've read that if the
app is not handling lots of data (let's say, 1 million records) then
it's "safe" to use a single index. Is this true? If not, why?

3) Is it a problem if I use a simple Solr setup with a single core for
this use case? If so, what do you recommend?



Any help in any of these topics would be greatly appreciated.

Thanks in advance!

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

Posted by Gustavo Falco <co...@gmail.com>.

Hi Brian,

I'll take a look at what you mentioned. I hadn't thought about that. I'll
finish the implementation at the app level and then read a little more
about multi-core setups. Maybe I don't yet know all the benefits they have.


Thanks a lot for your advice.

2011/11/4 Brian Gerby <br...@hotmail.com>

>
> Gustavo -
>
> Even with the most basic requirements, I'd recommend setting up a
> multi-core configuration so you can RELOAD the main core you will be using
> when you make simple changes to config files. This is much cleaner than
> bouncing Solr each time. There are other benefits to doing it, but this is
> the main reason I do it.
>
> Brian

RE: Three questions about: Commit, single index vs multiple indexes and implementation advice

Posted by Brian Gerby <br...@hotmail.com>.

Gustavo - 

Even with the most basic requirements, I'd recommend setting up a multi-core configuration so you can RELOAD the main core you will be using whenever you make simple changes to config files. This is much cleaner than bouncing Solr each time. There are other benefits to doing it, but this is the main reason I do it.
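
A minimal sketch of what triggering such a RELOAD could look like from a script. It only builds the CoreAdmin request URL; the host, port, core name ("core0") and the "/admin/cores" path are assumptions for illustration (the admin path is configurable in solr.xml), so adjust them to your own setup.

```python
from urllib.parse import urlencode


def core_reload_url(base_url: str, core: str) -> str:
    """Build the CoreAdmin URL that asks Solr to reload one core.

    Assumes the admin handler is mounted at /admin/cores under base_url.
    """
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


if __name__ == "__main__":
    # Hypothetical local instance and core name.
    print(core_reload_url("http://localhost:8983/solr", "core0"))
```

Requesting that URL (for example with a browser or urllib.request.urlopen) after editing a config file should pick up the change without restarting the whole servlet container.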

Brian 

> Date: Fri, 4 Nov 2011 15:34:27 -0300
> Subject: Re: Three questions about: Commit, single index vs multiple indexes and implementation advice
> From: comfortablynumb84@gmail.com
> To: solr-user@lucene.apache.org
> 
> First of all, thanks a lot for your answer.
> 
> 1) I could use 5 to 15 seconds between each commit and give it a try. Is
> this an acceptable configuration? I'll take a look at NRT.
> 2) Currently I'm using a single core, the simplest setup. I don't expect to
> have an overwhelming quantity of records, but I do have lots of classes to
> persist, and I need to search all of them at the same time, and not per
> class (entity). For now it's working well. By multiple indexes I mean using
> an index for each entity. Let's say, an index for "Articles", another for
> "Users", etc. The thing is that I don't know when I should divide it and
> use one index for each entity (or if it's possible to make a "UNION"-like
> search across all the indexes). I've read that when an entity reaches the
> size of one million records it's best to give it a dedicated index, even
> though I don't expect to reach that size even with all my entities. But I
> wanted to hear it from you just to be sure.
> 3) Great! For now I think I'll stick with one index, but it's good to know
> that in case I need to change later for some reason.
> 
> 
> 
> Again, thanks a lot for your help!

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

Posted by Gustavo Falco <co...@gmail.com>.

First of all, thanks a lot for your answer.

1) I could use 5 to 15 seconds between commits and give it a try. Is
this an acceptable configuration? I'll take a look at NRT.
2) Currently I'm using a single core, the simplest setup. I don't expect to
have an overwhelming quantity of records, but I do have lots of classes to
persist, and I need to search all of them at the same time, and not per
class (entity). For now it's working well. By multiple indexes I mean using
an index for each entity. Let's say, an index for "Articles", another for
"Users", etc. The thing is that I don't know when I should divide it and
use one index for each entity (or if it's possible to make a "UNION"-like
search across all the indexes). I've read that when an entity reaches the
size of one million records it's best to give it a dedicated index, even
though I don't expect to reach that size even with all my entities. But I
wanted to hear it from you just to be sure.
3) Great! For now I think I'll stick with one index, but it's good to know
that in case I need to change later for some reason.
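
To make the single-index option concrete, here is a rough sketch of how one index could serve both "search everything" and "search one entity" by using a filter query. The field name "entity_type" is hypothetical (it would have to exist in the schema and be set on every document); the code only builds the query string for a select request.

```python
from urllib.parse import urlencode


def build_select_query(text, entity_types=None):
    """Build the query string for a search over one shared index.

    With no entity_types, every entity is searched at once. Otherwise an
    fq clause on the (hypothetical) entity_type field restricts the view.
    """
    params = [("q", text)]
    if entity_types:
        clause = " OR ".join(entity_types)
        params.append(("fq", f"entity_type:({clause})"))
    return urlencode(params)


if __name__ == "__main__":
    print(build_select_query("solr"))                # all entities
    print(build_select_query("solr", ["Articles"]))  # one entity only
```

With this layout no "UNION" across indexes is needed: leaving the fq out already searches every entity together.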

Again, thanks a lot for your help!

2011/11/4 Erick Erickson <er...@gmail.com>

> Let's see...
> 1> Committing every second, even with commitWithin, is probably going
>    to be a problem. I think 1-second latency is usually overkill, but
>    that's up to your product manager. Look at the NRT (Near Real Time)
>    stuff if you really need this. I thought that NRT was only on trunk,
>    but it *might* be in the 3.4 code base.
> 2> I don't understand what "a single index per entity" is. How many cores
>    do you have in total? For not very many records, I'd put everything
>    in a single index and use filter queries to restrict views.
> 3> I guess this relates to <2>. I'd use a single core. If, for some
>    reason, you decide that you need multiple indexes, use several cores
>    within ONE Solr rather than starting a new Solr per core; having
>    multiple JVMs around costs more resources.
>
> Best
> Erick

Re: Three questions about: Commit, single index vs multiple indexes and implementation advice

Posted by Erick Erickson <er...@gmail.com>.

Let's see...
1> Committing every second, even with commitWithin, is probably going
   to be a problem. I think 1-second latency is usually overkill, but
   that's up to your product manager. Look at the NRT (Near Real Time)
   stuff if you really need this. I thought that NRT was only on trunk,
   but it *might* be in the 3.4 code base.
2> I don't understand what "a single index per entity" is. How many cores
   do you have in total? For not very many records, I'd put everything
   in a single index and use filter queries to restrict views.
3> I guess this relates to <2>. I'd use a single core. If, for some
   reason, you decide that you need multiple indexes, use several cores
   within ONE Solr rather than starting a new Solr per core; having
   multiple JVMs around costs more resources.
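
For reference, a small sketch of what a commitWithin request could look like, assuming the XML update format where commitWithin is an attribute (in milliseconds) on the <add> element. The 10000 ms value and the field names are purely illustrative, not a recommendation; the code only builds the message body and sends nothing.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc: dict, commit_within_ms: int) -> str:
    """Build an XML <add> message asking Solr to commit within a time window."""
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document; "id" and "title" must match your schema.
    print(build_add_message({"id": "article-1", "title": "Hello"}, 10000))
```

The resulting string would be POSTed to the update handler; Solr then batches the commit instead of the client committing after every write.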

Best
Erick

On Thu, Nov 3, 2011 at 2:03 PM, Gustavo Falco
<co...@gmail.com> wrote:
> Hi guys!
>
> I have a couple of questions that I hope someone could help me with:
>
> 1) Recently I've implemented Solr in my app. My use case is not
> complicated. Suppose that there will be 50 concurrent users tops. This is
> an app like, let's say, a CRM. I tell you this so you have an idea in terms
> of how many read and write operations will be needed. What I do need is
> that the data that is added / updated be available right after it's added /
> updated (maybe a second later it's ok). I know that the commit operation is
> expensive, so maybe doing a commit right after each write operation is not
> a good idea. I'm trying to use the autoCommit feature with a maxTime of
> 1000ms, but then the question arose: Is this the best way to handle this
> type of situation? and if not, what should I do?
>
> 2) I'm using a single index per entity type because I've read that if the
> app is not handling lots of data (let's say, 1 million of records) then
> it's "safe" to use a single index. Is this true? if not, why?
>
> 3) Is it a problem if I use a simple setup of Solr using a single core for
> this use case? if not, what do you recommend?
>
>
>
> Any help in any of these topics would be greatly appreciated.
>
> Thanks in advance!
>