Posted to solr-user@lucene.apache.org by Silent Surfer <si...@yahoo.com> on 2009/08/05 19:44:27 UTC

Limit of Index size per machine..

Hi,

We are planning to use Solr to index server log contents.
The expected processed log file size per day is 100 GB.
We expect to retain these indexes for 30 days (100 * 30 ~ 3 TB).

Can anyone advise what the optimal index size is that a single server can hold without hampering search performance?

We are planning to use OS X servers with 16 GB of RAM (can go to 24 GB).

We need to figure out how many servers are required to handle that amount of data.

Any help would be greatly appreciated.

Thanks
SilentSurfer


      


Re: Limit of Index size per machine..

Posted by Walter Underwood <wu...@wunderwood.org>.
That is why people don't use search engines to manage logs. Look at a  
Hadoop cluster.

wunder

On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote:

>
> Hi,
>
> That means we need approximately 3000 GB (Index Size)/24 GB (RAM) =  
> 125 servers.
>
> It would be very hard to convince my org to go for 125 servers for  
> log management of 3 Terabytes of indexes.
>
> Has any one used, solr for processing and handling of the indexes of  
> the order of 3 TB ? If so how many servers were used for indexing  
> alone.
>
> Thanks,
> sS
>
>
> --- On Wed, 8/5/09, Ian Connor <ia...@gmail.com> wrote:
>
>> From: Ian Connor <ia...@gmail.com>
>> Subject: Re: Limit of Index size per machine..
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, August 5, 2009, 9:38 PM
>> I try to keep the index directory
>> size less than the amount of RAM and rely
>> on the OS to cache as it needs. Linux does a pretty good
>> job here and I am
>> sure OS X will do a good job also.
>>
>> Distributed search here will be your friend so you can
>> chunk it up to a
>> number of servers to keep your cost down (2GB RAM sticks
>> are much cheaper
>> than 4GB RAM sticks $20 < $100).
>>
>> Ian.
>>
>> On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer <silentsurfer77@yahoo.com 
>> >wrote:
>>
>>>
>>> Hi ,
>>>
>>> We are planning to use Solr for indexing the server
>> log contents.
>>> The expected processed log file size per day: 100 GB
>>> We are expecting to retain these indexes for 30 days
>> (100*30 ~ 3 TB).
>>>
>>> Can any one provide what would be the optimal size of
>> the index that I can
>>> store on a single server, without hampering the search
>> performance etc.
>>>
>>> We are planning to use OSX server with a configuration
>> of 16 GB (Can go to
>>> 24 GB).
>>>
>>> We need to figure out how many servers are required to
>> handle such amount
>>> of data..
>>>
>>> Any help would be greatly appreciated.
>>>
>>> Thanks
>>> SilentSurfer
>>>
>>>
>>>
>>>
>>>
>>
>>
>> -- 
>> Regards,
>>
>> Ian Connor
>> 1 Leighton St #723
>> Cambridge, MA 02141
>> Call Center Phone: +1 (714) 239 3875 (24 hrs)
>> Fax: +1(770) 818 5697
>> Skype: ian.connor
>>
>
>
>
>


Re: Limit of Index size per machine..

Posted by Tom Burton-West <tb...@gmail.com>.
Hello,

I think you are confusing the size of the data you want to index with the
size of the index. For our indexes (large full-text documents), the Solr
index is about 1/3 of the size of the documents being indexed. For 3 TB of
data you might have an index of 1 TB or less. This depends on many factors
in your index configuration, including whether you store fields.
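That correction changes the arithmetic considerably. Here is a back-of-envelope sketch under the assumptions from this thread: the 1/3 index-to-data ratio is Tom's figure for full-text documents and may not hold for log data, and the fit-in-RAM rule is the thread's rule of thumb, not a hard limit.

```python
import math

raw_gb = 100 * 30          # 100 GB of logs per day, retained for 30 days
index_ratio = 1 / 3        # index ~ 1/3 of raw data (workload-dependent)
ram_per_server_gb = 24     # per-server RAM ceiling mentioned in the thread

index_gb = raw_gb * index_ratio
servers = math.ceil(index_gb / ram_per_server_gb)
print(f"index ~ {index_gb:.0f} GB, servers needed ~ {servers}")
# -> index ~ 1000 GB, servers needed ~ 42
```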

What kind of performance do you need for indexing time and for search
response time?

We are trying to optimize search response time and have been running tests
on a 225 GB Solr index with 32 GB of RAM. We are seeing 95% of our test
queries return in less than a second; however, the slowest 1% of queries
take between 5 and 10 seconds.

On the other hand, it takes almost a week to index about 670 GB of full-text
documents.

We will be scaling up to 3 million documents, which will be about 2 TB of
text with a 0.75 TB index. We plan to distribute the index across 5
machines.

More information on our setup and results is available at:
http://www.hathitrust.org/blogs/large-scale-search

Tom
> > The expected processed log file size per day: 100 GB
> > We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).
>
> That means we need approximately 3000 GB (index size) / 24 GB (RAM) = 125
> servers.
>
> It would be very hard to convince my org to go for 125 servers for log
> management of 3 terabytes of indexes.
>
> Has anyone used Solr for processing and handling indexes on the order of
> 3 TB? If so, how many servers were used for indexing alone?
>
> Thanks,
> sS

-- 
View this message in context: http://www.nabble.com/Limit-of-Index-size-per-machine..-tp24833163p24853662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Limit of Index size per machine..

Posted by Silent Surfer <si...@yahoo.com>.
Hi,

That means we need approximately 3000 GB (index size) / 24 GB (RAM) = 125 servers.

It would be very hard to convince my org to go for 125 servers for log management of 3 terabytes of indexes.
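The arithmetic above, as a minimal sketch — assuming, naively, that the index ends up as large as the raw log data and that each server's slice must fit entirely in RAM:

```python
# Naive server-count estimate: treat the index as the same size as the
# raw logs, and require each server's index slice to fit in RAM.
index_gb = 100 * 30        # 100 GB/day retained for 30 days
ram_per_server_gb = 24     # per-server RAM ceiling mentioned above
servers = index_gb / ram_per_server_gb
print(servers)  # -> 125.0
```

Both of those assumptions are questionable, which is what the replies in this thread address.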

Has anyone used Solr for processing and handling indexes on the order of 3 TB? If so, how many servers were used for indexing alone?

Thanks,
sS


--- On Wed, 8/5/09, Ian Connor <ia...@gmail.com> wrote:

> From: Ian Connor <ia...@gmail.com>
> Subject: Re: Limit of Index size per machine..
> To: solr-user@lucene.apache.org
> Date: Wednesday, August 5, 2009, 9:38 PM
> I try to keep the index directory
> size less than the amount of RAM and rely
> on the OS to cache as it needs. Linux does a pretty good
> job here and I am
> sure OS X will do a good job also.
> 
> Distributed search here will be your friend so you can
> chunk it up to a
> number of servers to keep your cost down (2GB RAM sticks
> are much cheaper
> than 4GB RAM sticks $20 < $100).
> 
> Ian.
> 
> On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer <si...@yahoo.com>wrote:
> 
> >
> > Hi ,
> >
> > We are planning to use Solr for indexing the server
> log contents.
> > The expected processed log file size per day: 100 GB
> > We are expecting to retain these indexes for 30 days
> (100*30 ~ 3 TB).
> >
> > Can any one provide what would be the optimal size of
> the index that I can
> > store on a single server, without hampering the search
> performance etc.
> >
> > We are planning to use OSX server with a configuration
> of 16 GB (Can go to
> > 24 GB).
> >
> > We need to figure out how many servers are required to
> handle such amount
> > of data..
> >
> > Any help would be greatly appreciated.
> >
> > Thanks
> > SilentSurfer
> >
> >
> >
> >
> >
> 
> 
> -- 
> Regards,
> 
> Ian Connor
> 1 Leighton St #723
> Cambridge, MA 02141
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Fax: +1(770) 818 5697
> Skype: ian.connor
> 


      


Re: Limit of Index size per machine..

Posted by Ian Connor <ia...@gmail.com>.
I try to keep the index directory size less than the amount of RAM and rely
on the OS to cache as it needs. Linux does a pretty good job here and I am
sure OS X will do a good job also.

Distributed search here will be your friend, so you can chunk it up across a
number of servers to keep your cost down (2 GB RAM sticks are much cheaper
than 4 GB sticks: roughly $20 vs. $100).
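What Ian is describing is Solr's built-in distributed search: each server holds one slice of the index, and a query lists every slice in the `shards` parameter so one node fans the query out and merges the results. A minimal sketch of building such a request — the host names and the `level` field are hypothetical placeholders, not anything from this thread:

```python
from urllib.parse import urlencode

# Each shard holds a slice of the 30-day log index; the host names below
# are hypothetical placeholders for the actual log-search servers.
SHARDS = [
    "logs1:8983/solr",   # e.g. days 1-10
    "logs2:8983/solr",   # e.g. days 11-20
    "logs3:8983/solr",   # e.g. days 21-30
]

def distributed_query_url(q: str, rows: int = 10) -> str:
    """Build a /select URL that asks one shard to fan the query out to
    every shard in the list and merge the results."""
    params = urlencode({"q": q, "rows": rows, "shards": ",".join(SHARDS)})
    return f"http://{SHARDS[0]}/select?{params}"

print(distributed_query_url("level:ERROR"))
```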

Ian.

On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer <si...@yahoo.com> wrote:

>
> Hi ,
>
> We are planning to use Solr for indexing the server log contents.
> The expected processed log file size per day: 100 GB
> We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).
>
> Can any one provide what would be the optimal size of the index that I can
> store on a single server, without hampering the search performance etc.
>
> We are planning to use OSX server with a configuration of 16 GB (Can go to
> 24 GB).
>
> We need to figure out how many servers are required to handle such amount
> of data..
>
> Any help would be greatly appreciated.
>
> Thanks
> SilentSurfer
>
>
>
>
>


-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor