Posted to solr-user@lucene.apache.org by Sharath Jagannath <sh...@gmail.com> on 2012/03/08 19:35:08 UTC

indexing bigdata

Is indexing around 30 million documents in a single Solr instance efficient?
Has anybody experimented with this? I am planning to use it for an autosuggest
feature I am implementing, so I expect responses within a few milliseconds.
Should I be looking at sharding?

Thanks,
Sharath

Re: indexing bigdata

Posted by Robert Stewart <bs...@gmail.com>.

It very much depends on your data and on which query features you will use: how many fields, the size of each field, how many unique values per field, how many fields are stored vs. only indexed, etc.  I have a system with 3+ billion docs, where each instance (each index core) holds 120 million docs, and it flies.  But the documents are tiny, only 3 fields each, and the search is a very simple single-keyword match.  On another system we have only 7 million docs per instance and it is slower, because the documents are much larger with many more fields, and we do a lot of faceting and other advanced search features.

The types of search features you use (faceting, field collapsing, wildcard queries, etc.) can all increase search time compared with a simple keyword search.

Unfortunately, it is one of those things you need to try out to really get an answer, IMO.
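
For anyone wanting to "try it out", here is a minimal Python sketch of a latency harness. It is not from the thread: the function names, the warm-up count, and the Solr URL and parameters in solr_query are all assumptions to adapt to your own core and request handler. It times a list of queries and reports median, 95th percentile, and maximum latency in milliseconds.

```python
import statistics
import time
import urllib.parse
import urllib.request


def time_queries(run_query, queries, warmup=5):
    """Return one latency in milliseconds per query, calling run_query(q)."""
    for q in queries[:warmup]:
        run_query(q)  # warm caches; these timings are discarded
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Median, 95th percentile (nearest rank), and max of the latencies."""
    ordered = sorted(latencies)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def solr_query(q, base_url="http://localhost:8983/solr/select"):
    # Hypothetical endpoint and parameters; adjust to your installation.
    params = urllib.parse.urlencode({"q": q, "rows": 10, "wt": "json"})
    with urllib.request.urlopen(base_url + "?" + params) as response:
        return response.read()
```

With a running instance, something like summarize(time_queries(solr_query, prefixes)) over a realistic list of autosuggest prefixes gives numbers to compare between a single instance and a sharded setup. Timing from the client includes network and HTTP overhead, so it is an upper bound on what Solr itself spends.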

On Mar 8, 2012, at 11:39 PM, Sharath Jagannath wrote:

> OK, my bad. I should have put it a better way.
> Is it a good idea to have all 30M docs on a single instance, or should I
> consider a distributed setup?
> I have synthesized the data, configured the schema, and made
> suitable changes to the config. I have tested with a smaller data set on
> my laptop and have a good workflow set up.
> 
> I do not have a big machine to test it on.
> I wanted to make sure I understand both options before I decide
> to spin up an Amazon instance.
> 
> Thanks,
> Sharath

Re: indexing bigdata

Posted by Sharath Jagannath <sh...@gmail.com>.

OK, my bad. I should have put it a better way.
Is it a good idea to have all 30M docs on a single instance, or should I
consider a distributed setup?
I have synthesized the data, configured the schema, and made
suitable changes to the config. I have tested with a smaller data set on
my laptop and have a good workflow set up.

I do not have a big machine to test it on.
I wanted to make sure I understand both options before I decide
to spin up an Amazon instance.

Thanks,
Sharath

On Thu, Mar 8, 2012 at 6:18 PM, Erick Erickson <er...@gmail.com> wrote:

> Your question is really unanswerable; there are about a zillion
> factors that could influence the answer. I can index 5-7K docs/second,
> so for me it's "efficient". Others can index only a fraction of that. It all
> depends...
>
> "Try it and see" is about the only way to answer.
>
> Best
> Erick

Re: indexing bigdata

Posted by Erick Erickson <er...@gmail.com>.

Your question is really unanswerable; there are about a zillion
factors that could influence the answer. I can index 5-7K docs/second,
so for me it's "efficient". Others can index only a fraction of that. It all depends...

"Try it and see" is about the only way to answer.
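
As a rough back-of-envelope check, the 5-7K docs/second figure can be turned into a bulk indexing time for the 30 million documents in the question. This is a sketch only: it assumes the rate holds constant for the whole run and that the asker's documents index as fast as the ones measured here, which may well not be true. The function name is made up for illustration.

```python
def indexing_minutes(num_docs, docs_per_second):
    """Minutes needed to index num_docs at a constant docs_per_second rate."""
    return num_docs / docs_per_second / 60.0


NUM_DOCS = 30_000_000  # corpus size from the original question

for rate in (5_000, 7_000):  # docs/second range quoted above
    minutes = indexing_minutes(NUM_DOCS, rate)
    print(f"{rate} docs/s -> about {minutes:.0f} minutes")
```

At those rates the full index builds in roughly 71 to 100 minutes. That says nothing about query latency, which is the part that matters for autosuggest and still has to be measured.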

Best
Erick

On Thu, Mar 8, 2012 at 1:35 PM, Sharath Jagannath
<sh...@gmail.com> wrote:
> Is indexing around 30 million documents in a single Solr instance efficient?
> Has anybody experimented with this? I am planning to use it for an autosuggest
> feature I am implementing, so I expect responses within a few milliseconds.
> Should I be looking at sharding?
>
> Thanks,
> Sharath