Posted to solr-user@lucene.apache.org by Alireza Salimi <al...@gmail.com> on 2012/03/09 19:24:32 UTC

Lucene vs Solr design decision

Hi everybody,

Let's say we have a system with billions of small documents (2-3 fields
on average). Each document belongs to JUST ONE user, and searches are
user-specific: when we search for something, we only look at that
user's documents.

We also need newly added documents to be searchable as soon as they
are added to the index.

Now I think we have two solutions:
1. Use Lucene directly and create a separate index for each user
2. Use Solr and store all of the users' data together in one HUGE index

The benefit of using Lucene directly is that each commit() should take less
time than it would with Solr and one huge index.

Is there any suggested solution for cases like this?

Thanks

-- 
Alireza Salimi
Java EE Developer

Re: Lucene vs Solr design decision

Posted by Alireza Salimi <al...@gmail.com>.
Probably. Besides that, how could I use the features that SolrCloud
provides (i.e. high availability and distribution)?

The other solution would be to use SolrCloud, keep all of the users'
information in a single collection, and use NRT (near-real-time) search.
On the other hand, the frequency of updates on that big collection would
be high.

Do you think that makes sense?

On Fri, Mar 9, 2012 at 2:02 PM, Glen Newton <gl...@gmail.com> wrote:

> millions of cores will not work...
> ...yet.
>
> -glen

Re: Lucene vs Solr design decision

Posted by Alireza Salimi <al...@gmail.com>.
On the other hand, I'm aware that if I go with the Lucene approach,
failover is something I will have to support manually, which is a
nightmare!

On Fri, Mar 9, 2012 at 2:13 PM, Alireza Salimi <al...@gmail.com> wrote:

> This solution makes sense, but I still don't know if I can use SolrCloud
> with this configuration or not.
>
> On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart <bs...@gmail.com> wrote:
>
>> Split up index into say 100 cores, and then route each search to a
>> specific core by some mod operator on the user id:
>>
>> core_number = userid % num_cores
>>
>> core_name = "core"+core_number
>>
>> That way each index core is relatively small (maybe 100 million docs or
>> less).

Re: Lucene vs Solr design decision

Posted by Alireza Salimi <al...@gmail.com>.
This solution makes sense, but I still don't know whether I can use SolrCloud
with this configuration.

On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart <bs...@gmail.com> wrote:

> Split up index into say 100 cores, and then route each search to a
> specific core by some mod operator on the user id:
>
> core_number = userid % num_cores
>
> core_name = "core"+core_number
>
> That way each index core is relatively small (maybe 100 million docs or
> less).

Re: Lucene vs Solr design decision

Posted by William Bell <bi...@gmail.com>.
Great answer Robert.

On Fri, Mar 9, 2012 at 12:06 PM, Robert Stewart <bs...@gmail.com> wrote:
> Split up index into say 100 cores, and then route each search to a specific core by some mod operator on the user id:
>
> core_number = userid % num_cores
>
> core_name = "core"+core_number
>
> That way each index core is relatively small (maybe 100 million docs or less).


Re: Lucene vs Solr design decision

Posted by Robert Stewart <bs...@gmail.com>.
Split the index into, say, 100 cores, and then route each search to a specific core using a modulo operation on the user id:

core_number = userid % num_cores

core_name = "core"+core_number

That way each index core is relatively small (maybe 100 million docs or less).
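
A minimal Python sketch of the routing rule above, assuming numeric user ids and cores named core0 through core99. The function names and the base URL are hypothetical placeholders for illustration, not part of Solr:

```python
NUM_CORES = 100  # example value taken from the message above


def core_name_for(user_id: int, num_cores: int = NUM_CORES) -> str:
    """Map a numeric user id to the name of the core holding that user's documents."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    core_number = user_id % num_cores
    return "core" + str(core_number)


def select_url(user_id: int, base_url: str = "http://localhost:8983/solr") -> str:
    """Build the search URL for the core that owns this user.

    base_url is an assumed placeholder; substitute your own host and path.
    """
    return base_url + "/" + core_name_for(user_id) + "/select"


if __name__ == "__main__":
    print(core_name_for(1234567))  # core67
    print(select_url(42))
```

Because many users share each core, every query still has to restrict results to the one user, for example with a filter on a user id field. Changing num_cores also changes the mapping, so existing documents would need to be reindexed or moved.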


On Mar 9, 2012, at 2:02 PM, Glen Newton wrote:

> millions of cores will not work...
> ...yet.
> 
> -glen


Re: Lucene vs Solr design decision

Posted by Glen Newton <gl...@gmail.com>.
millions of cores will not work...
...yet.

-glen

On Fri, Mar 9, 2012 at 1:46 PM, Lan <du...@gmail.com> wrote:
> Solr has no limitation on the number of cores. It's limited by your hardware,
> inodes and how many files you could keep open.
>
> I think even if you went the Lucene route you would run into same hardware
> limits.



-- 
-
http://zzzoot.blogspot.com/
-

Re: Lucene vs Solr design decision

Posted by Lan <du...@gmail.com>.
Solr itself places no limit on the number of cores. The practical limit comes
from your hardware, available inodes, and how many files you can keep open.

I think even if you went the Lucene route you would run into the same hardware
limits.
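
To get a feel for the open-file limit, here is a rough Python sketch, assuming a Unix-like system. The files-per-index figure is a made-up illustrative number; the real count depends on how many segments each index has and on the index file format in use:

```python
import resource  # standard library, available on Unix-like systems only


def open_file_limit() -> int:
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def estimated_open_files(num_indexes: int, files_per_index: int) -> int:
    """Rough worst case: every index open at once, each holding files_per_index files."""
    return num_indexes * files_per_index


if __name__ == "__main__":
    # Hypothetical numbers: one million per-user indexes, about 10 files each.
    needed = estimated_open_files(1_000_000, 10)
    print("estimated open files:", needed)
    print("soft limit for this process:", open_file_limit())
```

The same estimate applies whether the indexes are Solr cores or plain Lucene indexes, since both keep their index files open on the same operating system.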

--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Lucene vs Solr design decision

Posted by Alireza Salimi <al...@gmail.com>.
Sorry, I didn't mention that the number of users can be in the millions!
That would mean millions of cores, so I'm not sure it's a good idea.

On Fri, Mar 9, 2012 at 1:35 PM, Lan <du...@gmail.com> wrote:

> Solr has cores which are independent search indexes. You could create a
> separate core per user.

Re: Lucene vs Solr design decision

Posted by Lan <du...@gmail.com>.
Solr has cores, which are independent search indexes. You could create a
separate core per user.

--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813489.html
Sent from the Solr - User mailing list archive at Nabble.com.