Posted to solr-user@lucene.apache.org by abhishek jain <ab...@gmail.com> on 2014/03/09 19:55:53 UTC

Optimizing RAM

Hi friends,
I want to index a fairly large amount of data, and I want to keep both stemmed
and unstemmed versions.
I am not sure whether I should keep two separate indexes, or one index with two
versions of the field, i.e. col1_stemmed and col2_unstemmed.

I have a multicore, multi-shard configuration.
My server has 32 GB of RAM, and I calculated the stemmed index size (without
content) as 60 GB.
I do not want to put too much load or I/O on a decent server; there are some 5
other replicated servers, and I want to use the servers for other purposes as well.


Also, is it advised to serve queries from the master server or only from the slaves?
-- 
Thanks,
Abhishek

Re: Optimizing RAM

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Sun, 2014-03-09 at 19:55 +0100, abhishek jain wrote:
> I am not sure whether I should keep two separate indexes, or one index with two
> versions of the field, i.e. col1_stemmed and col2_unstemmed.

1 index with stemmed & unstemmed will be markedly smaller than 2 indexes
(one with stemmed, one with unstemmed). Furthermore, keeping the stemmed
and unstemmed in the same index allows you to search in both fields and
assign a greater weight to the unstemmed field.
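
As a sketch of what that looks like in practice (the boost factor is arbitrary
and the field names are simply the ones from your mail), an edismax request
could search both fields and weight the unstemmed one higher:

  q=running
  defType=edismax
  qf=col2_unstemmed^2 col1_stemmed

An exact, unstemmed match then scores above a purely stemmed one.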

> I have a multicore, multi-shard configuration.
> My server has 32 GB of RAM, and I calculated the stemmed index size (without
> content) as 60 GB.

What do you mean by "without content"? Is it the col2_unstemmed that you
plan to add? Or stored fields maybe?

> I do not want to put too much load or I/O on a decent server; there are some 5
> other replicated servers, and I want to use the servers for other purposes as well.

If you haven't already done so, use SSDs as your storage. That way you
don't have to worry much about RAM / index size ratio for performance.


- Toke Eskildsen, State and University Library, Denmark



Re: Optimizing RAM

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/11/2014 11:05 AM, abhishek jain wrote:
> Hi Shawn,
> Thanks for the reply,
>
> Is there a way to optimize RAM, or does Solr do it automatically? I have
> multiple shards, and I know I will be querying only 30% of the shards most
> of the time. I have 6 slaves, so I am thinking of dedicating more slaves to
> the 30% most used shards.
>
> Another question:
> Is it advised to serve queries from the master or only from the slaves, or
> does it not matter?

You'll have to explain what you mean by 'optimize RAM' before I can 
answer that question and have any confidence that I've given you the 
information you need.

The OS disk cache is handled by the operating system, not Solr.  It is 
automatic, and it is very efficient.  Some operating systems are better 
at it than others, but even the worst of them is pretty good.

For the Java heap, normal usage will eventually allocate the maximum 
heap value.  Java's garbage collection model is not very well optimized 
for large heaps, but it's highly tunable. With good tuning options, it 
usually works very well.
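
Purely as an illustration (not a recommendation; the heap size and collector
have to come from testing against your own index and query load), a Solr 4.x
start line with an explicitly sized heap and a low-pause collector looks
something like this, assuming the stock Jetty start.jar from the example
directory:

  java -Xms8g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -jar start.jar

Pinning -Xms and -Xmx to the same value avoids heap resizing; the rest of the
tuning is workload-specific.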

Thanks,
Shawn


Re: Optimizing RAM

Posted by abhishek jain <ab...@gmail.com>.
Hi Shawn,
Thanks for the reply,

Is there a way to optimize RAM, or does Solr do it automatically? I have
multiple shards, and I know I will be querying only 30% of the shards most
of the time. I have 6 slaves, so I am thinking of dedicating more slaves to
the 30% most used shards.

Another question:
Is it advised to serve queries from the master or only from the slaves, or
does it not matter?

thanks
Abhishek




On Tue, Mar 11, 2014 at 9:12 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 3/11/2014 6:14 AM, abhishek.netjain@gmail.com wrote:
> > Hi all,
> > What should be the ideal RAM to index size ratio?
> >
> > Please reply; I expect the index to be about 60 GB, and I don't store
> > contents.
>
> Ideally, your total system RAM will be equal to the size of all your
> programs' heap requirements, plus the size of all the data for all the
> programs.
>
> If Solr is the only thing on the box, then the ideal memory size is
> roughly the Solr heap plus the size of all the Solr indexes that live on
> that machine.  So if your heap is 8GB and your index is 60GB, you'll
> want at least 68GB of RAM for an ideal setup.  I don't know how big your
> heap is, so I am guessing here.
>
> You said your index does not store much content.  That means you will
> need a higher percentage of your total index size to be in RAM for good
> performance.  I would estimate that you want a minimum of two thirds of
> your index in RAM, which indicates a minimum RAM size of 48GB if we
> assume your heap is 8GB.  64GB would be better.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#General_information
>
> Thanks,
> Shawn
>
>


-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767

Re: Optimizing RAM

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/11/2014 6:14 AM, abhishek.netjain@gmail.com wrote:
> Hi all,
> What should be the ideal RAM to index size ratio?
>
> Please reply; I expect the index to be about 60 GB, and I don't store contents.

Ideally, your total system RAM will be equal to the size of all your
programs' heap requirements, plus the size of all the data for all the
programs.

If Solr is the only thing on the box, then the ideal memory size is
roughly the Solr heap plus the size of all the Solr indexes that live on
that machine.  So if your heap is 8GB and your index is 60GB, you'll
want at least 68GB of RAM for an ideal setup.  I don't know how big your
heap is, so I am guessing here.

You said your index does not store much content.  That means you will
need a higher percentage of your total index size to be in RAM for good
performance.  I would estimate that you want a minimum of two thirds of
your index in RAM, which indicates a minimum RAM size of 48GB if we
assume your heap is 8GB.  64GB would be better.
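
If it helps, you can sanity-check those numbers on the box itself. On Linux,
something like the following works (the data path below is just a placeholder
for wherever your cores keep their data directories):

  du -sh /path/to/solr/*/data/index   # on-disk size of each core's index
  free -g                             # total RAM and how much the OS is caching

The "cached" column from free is the OS disk cache that Solr relies on for
fast reads.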

http://wiki.apache.org/solr/SolrPerformanceProblems#General_information

Thanks,
Shawn


Re: Optimizing RAM

Posted by ab...@gmail.com.
Hi all,
What should be the ideal RAM to index size ratio?

Please reply; I expect the index to be about 60 GB, and I don't store contents.
Thanks 
Abhishek

  Original Message  
From: abhishek.netjain@gmail.com
Sent: Monday, 10 March 2014 09:25
To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Optimizing RAM

Hi,
If I go with a copyField, will it increase the I/O load, considering I have RAM less than one third of the total index size?

Thanks 
Abhishek

  Original Message  
From: Erick Erickson
Sent: Monday, 10 March 2014 01:37
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Optimizing RAM

I'd go for a copyField, keeping the stemmed and unstemmed
versions in the same index.

An alternative (and I think there's a JIRA for this, if not an
outright patch) is to implement a "special" filter that, say, puts
the original token in with a special character, say $, at the
end; i.e. if indexing "running", you'd index both "running$" and
"run". Then when you want an exact match, you search for "running$".

Best,
Erick

On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain
<ab...@gmail.com> wrote:
> Hi friends,
> I want to index a fairly large amount of data, and I want to keep both stemmed
> and unstemmed versions.
> I am not sure whether I should keep two separate indexes, or one index with two
> versions of the field, i.e. col1_stemmed and col2_unstemmed.
>
> I have a multicore, multi-shard configuration.
> My server has 32 GB of RAM, and I calculated the stemmed index size (without
> content) as 60 GB.
> I do not want to put too much load or I/O on a decent server; there are some 5
> other replicated servers, and I want to use the servers for other purposes as well.
>
>
> Also, is it advised to serve queries from the master server or only from the slaves?
> --
> Thanks,
> Abhishek

Re: Optimizing RAM

Posted by ab...@gmail.com.
Hi,
If I go with a copyField, will it increase the I/O load, considering I have RAM less than one third of the total index size?

Thanks 
Abhishek

  Original Message  
From: Erick Erickson
Sent: Monday, 10 March 2014 01:37
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Optimizing RAM

I'd go for a copyField, keeping the stemmed and unstemmed
versions in the same index.

An alternative (and I think there's a JIRA for this, if not an
outright patch) is to implement a "special" filter that, say, puts
the original token in with a special character, say $, at the
end; i.e. if indexing "running", you'd index both "running$" and
"run". Then when you want an exact match, you search for "running$".

Best,
Erick

On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain
<ab...@gmail.com> wrote:
> Hi friends,
> I want to index a fairly large amount of data, and I want to keep both stemmed
> and unstemmed versions.
> I am not sure whether I should keep two separate indexes, or one index with two
> versions of the field, i.e. col1_stemmed and col2_unstemmed.
>
> I have a multicore, multi-shard configuration.
> My server has 32 GB of RAM, and I calculated the stemmed index size (without
> content) as 60 GB.
> I do not want to put too much load or I/O on a decent server; there are some 5
> other replicated servers, and I want to use the servers for other purposes as well.
>
>
> Also, is it advised to serve queries from the master server or only from the slaves?
> --
> Thanks,
> Abhishek

Re: Optimizing RAM

Posted by Erick Erickson <er...@gmail.com>.
I'd go for a copyField, keeping the stemmed and unstemmed
versions in the same index.

An alternative (and I think there's a JIRA for this, if not an
outright patch) is to implement a "special" filter that, say, puts
the original token in with a special character, say $, at the
end; i.e. if indexing "running", you'd index both "running$" and
"run". Then when you want an exact match, you search for "running$".

Best,
Erick

On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain
<ab...@gmail.com> wrote:
> Hi friends,
> I want to index a fairly large amount of data, and I want to keep both stemmed
> and unstemmed versions.
> I am not sure whether I should keep two separate indexes, or one index with two
> versions of the field, i.e. col1_stemmed and col2_unstemmed.
>
> I have a multicore, multi-shard configuration.
> My server has 32 GB of RAM, and I calculated the stemmed index size (without
> content) as 60 GB.
> I do not want to put too much load or I/O on a decent server; there are some 5
> other replicated servers, and I want to use the servers for other purposes as well.
>
>
> Also, is it advised to serve queries from the master server or only from the slaves?
> --
> Thanks,
> Abhishek