Posted to solr-user@lucene.apache.org by Rahul Ramesh <rr...@gmail.com> on 2016/01/25 12:02:13 UTC

Understanding solr commit

We are facing an issue that we are finding difficult to debug, and we
want to understand how a Solr commit works.
Some background on our setup:
We have a 3-node Solr cluster running version 5.3.1. It is an
index-heavy use case: at peak load we index 400-500 documents/second.
We also want these documents to be visible as quickly as possible, so
we run an external script which commits every 3 minutes.

Consider the three nodes as N1, N2, N3. Commit is a synchronous
operation, so we do not get control back until the commit completes.

Consider the following scenario. It looks like a basic scenario in a
distributed system :-) but we just want to rule out this possibility.

Step 1: At time T1, a commit happens on node N1.
Step 2: At the same time T1, we search for all the documents inserted
on node N2.

My questions are:

1. Is commit an atomic operation? I mean, will the commit happen on
all the nodes at the same time?
2. Can we say that the search result will always contain either the
documents from before the commit or those from after it? Or can it
happen that we get new documents from N1 and N2 but old documents
(i.e., from before the commit) from N3?

Thank you,
Rahul

Re: Understanding solr commit

Posted by Emir Arnautovic <em...@sematext.com>.

Hi Rahul,
If I read your mail correctly, there is a misconception about SolrCloud:
nodes are the infrastructure of the cloud, and the collection is the
logical "unit". So when you commit, you are committing the changes you
made to the collection, and SolrCloud handles the nodes. When you commit
to all 3 nodes, it is actually 3 commits to a single collection.
It is not considered good practice to have a script that issues commits;
Solr has autocommit functionality. You should also read up on soft vs.
hard commits. The following article is a good starting point:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Regards,
Emir


-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

Re: Understanding solr commit

Posted by Rahul Ramesh <rr...@gmail.com>.

Thank you Emir and Alessandro for the inputs. We use Sematext for
monitoring. We understand that Solr needs more memory, but unfortunately
that would mean moving to an altogether new range of servers.
As you say, we will eventually have to upgrade our servers.

Thanks,
Rahul



Re: Understanding solr commit

Posted by Emir Arnautovic <em...@sematext.com>.

Hi Rahul,
It is hard to tell without seeing metrics, but an 8GB heap seems small
for such a setup - e.g. with an indexing buffer of 32MB and 30
collections, the buffers alone will eat almost 1GB of memory
(30 x 32MB = 960MB).
About commits, you can set auto commit to be more frequent (keep
openSearcher=false) and add soft commits every 3 min.
What you need to tune is your heap and heap-related settings - indexing
buffer, caches. I am not sure what you use for monitoring Solr, but
Sematext's SPM (http://sematext.com/spm) is one tool that can show you
how your Solr, JVM and host handle different loads. A tool like that
can give you enough information to tune your Solr.
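
As a rough sketch, the commit settings described above might look like
this in solrconfig.xml. The interval values are assumptions for
illustration only: the 15000 ms hard commit interval is the one already
quoted in this thread (lower it if more frequent hard commits are
wanted), and 180000 ms is simply the 3 minutes mentioned above. Only the
commit-related elements are shown.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes index changes to stable storage.
       With openSearcher=false it does not make new documents visible. -->
  <autoCommit>
    <maxTime>15000</maxTime>            <!-- assumed value, in ms -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher so that recently indexed
       documents become visible to queries. -->
  <autoSoftCommit>
    <maxTime>180000</maxTime>           <!-- assumed: 3 minutes, in ms -->
  </autoSoftCommit>
</updateHandler>
```

With settings along these lines, visibility is handled by Solr itself,
so the external commit script should not be needed.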

Regards,
Emir


Re: Understanding solr commit

Posted by Rahul Ramesh <rr...@gmail.com>.

> Can you give us bit more details about Solr heap parameters.

Each node has 32GB of RAM and we are using 8GB for heap.
The index size on each node is around 80GB.
Number of collections: 30

> Also can you give us info about auto commit (both hard and soft) you
> used when experienced OOM.

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>15000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

Soft commit is not enabled.

-Rahul




Re: Understanding solr commit

Posted by Emir Arnautovic <em...@sematext.com>.

Hi Rahul,
It is good that you commit only once, but I am not sure how external
commits can do anything that auto commit cannot.
Can you give us a bit more detail about your Solr heap parameters?
Running Solr on the edge of OOM always risks starting a snowball effect
and crashing the entire cluster. Also, can you tell us which auto commit
settings (both hard and soft) you were using when you experienced the
OOM?

Thanks,
Emir


Re: Understanding solr commit

Posted by Rahul Ramesh <rr...@gmail.com>.

Thanks for your replies.

A bit more detail about our setup:
The index size is close to 80GB, spread across 30 collections. The main
memory available is around 32GB. We are always short of memory!
Unfortunately, we cannot expand the memory because the server
motherboard does not support it.

We tried Solr's auto commit feature. However, we were sometimes getting
Java OOM exceptions, and when I started digging into it, somebody
suggested that I was not committing the collections often enough. So we
started committing the collections explicitly.

Please let me know if our approach is not correct.

*Emir*,
We are committing to each collection only once. We have nodes N1, N2
and N3, and for a collection Coll1 the commit goes to N1/coll1 every 3
minutes. We are not doing it for every node. We will remove
_shard<>_replica<> and use only the collection name to commit.

*Alessandro*,
We are using SolrCloud with a replication factor of 2 and either 2 or 3
shards.

Thanks,
Rahul


Re: Understanding solr commit

Posted by Alessandro Benedetti <ab...@apache.org>.

Let me answer inline:

On 25 January 2016 at 11:02, Rahul Ramesh <rr...@gmail.com> wrote:

> We are facing an issue that we are finding difficult to debug, and we
> want to understand how a Solr commit works.
> Some background on our setup:
> We have a 3-node Solr cluster running version 5.3.1. It is an
> index-heavy use case: at peak load we index 400-500 documents/second.
> We also want these documents to be visible as quickly as possible, so
> we run an external script which commits every 3 minutes.
>

This is weird; why not use the auto soft commit if you want visibility
every 3 minutes?
Is there any particular reason you trigger the commit from the client?

>
> Consider the three nodes as N1, N2, N3. Commit is a synchronous
> operation, so we do not get control back until the commit completes.
>
> Consider the following scenario. It looks like a basic scenario in a
> distributed system :-) but we just want to rule out this possibility.
>
> Step 1: At time T1, a commit happens on node N1.
> Step 2: At the same time T1, we search for all the documents inserted
> on node N2.
>
> My questions are:
>
> 1. Is commit an atomic operation? I mean, will the commit happen on
> all the nodes at the same time?
>

Which kind of Solr architecture are you using? Are you using SolrCloud?

> 2. Can we say that the search result will always contain either the
> documents from before the commit or those from after it? Or can it
> happen that we get new documents from N1 and N2 but old documents
> (i.e., from before the commit) from N3?
>

With a manual cluster it could conceivably happen.
In SolrCloud it should not, but I should double-check the code!

>
> Thank you,
> Rahul
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England