Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2013/11/24 19:01:01 UTC

Commit behaviour in SolrCloud

Hi everyone,

I am wondering how the commit operation works in SolrCloud.
Say I have 2 parallel indexing processes. What if one process sends a big
update request (an add command with a lot of docs), and the other one just
happens to send a commit command while the update request is being
processed?
Is it possible that only part of the documents will be committed?
What will happen to the other docs? Is Solr transactional, and does it
promise that there will be no partial results?



--
View this message in context: http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Commit behaviour in SolrCloud

Posted by Furkan KAMACI <fu...@gmail.com>.
I suggest reading this post:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks;
Furkan KAMACI


2013/11/24 Mark Miller <ma...@gmail.com>

> SolrCloud does not use commits for update acceptance promises.
>
> The idea is, if you get a success from the update, it’s in the system,
> commit or not.
>
> Soft Commits are used for visibility only.
>
> Standard Hard Commits are used essentially for internal purposes and
> should be done via auto commit generally.
>
> To your question though - it is fine to send a commit while updates are
> coming in from another source - it’s just not generally necessary to do
> that anyway.
>
> - Mark
>

Re: Commit behaviour in SolrCloud

Posted by Mark Miller <ma...@gmail.com>.
On Nov 25, 2013, at 1:40 AM, adfel70 <ad...@gmail.com> wrote:

> Just to clarify how these two phrases come together:
> 1. "you will know when an update is rejected - it just might not be easy to
> know which in the batch / stream"
> 
> 2. "Documents that come in batches are added as they come / are processed -
> not in some atomic unit."
> 
> 
> If I send a batch of documents in one update request, and some of the docs
> fail - will the other docs still remain in the system?

Yes.

> What if a soft commit occurred after some of the docs but before all of the
> docs got processed, and then some of the remaining docs fail during
> processing?

Soft commit is only about visibility; it does not change which docs are in the system.

> I assume that the client will get an error for the whole batch (because of
> the current error reporting strategy), but which docs will remain in the
> system? Only those which got processed before the failure, or none of the
> docs in this batch?

Generally, it will be those processed before the failure if you are using the bulk add methods. It depends somewhat on the implementation, though: for example, CloudSolrServer can use multiple threads to route documents, so a couple of documents after the failure may still make it in.
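
One client-side way to cope with this is sketched below. It is only an illustration, not something proposed in this thread: `send_batch` and `send_one` are hypothetical callables that post documents and raise when Solr rejects them, and the sketch assumes that re-adding a document with the same unique key simply overwrites it, so resending docs that already made it in is harmless.

```python
def index_batch_with_fallback(docs, send_batch, send_one):
    """Try a bulk add; if it is rejected, resend doc by doc to find the culprits.

    Returns the ids of the docs that were rejected when sent individually.
    """
    try:
        send_batch(docs)
        return []
    except Exception:
        # The batch error does not say which doc failed, and docs processed
        # before the failure may already be in the system. Resend one by one.
        rejected = []
        for doc in docs:
            try:
                send_one(doc)
            except Exception:
                rejected.append(doc["id"])
        return rejected
```

The fallback trades speed for precision only when a batch actually fails; successful batches still cost a single request.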


- Mark



Re: Commit behaviour in SolrCloud

Posted by adfel70 <ad...@gmail.com>.
Just to clarify how these two phrases come together:
1. "you will know when an update is rejected - it just might not be easy to
know which in the batch / stream"

2. "Documents that come in batches are added as they come / are processed -
not in some atomic unit."


If I send a batch of documents in one update request, and some of the docs
fail, will the other docs still remain in the system?
What if a soft commit occurred after some of the docs but before all of the
docs got processed, and then some of the remaining docs fail during
processing?
I assume that the client will get an error for the whole batch (because of
the current error reporting strategy), but which docs will remain in the
system? Only those which got processed before the failure, or none of the
docs in this batch?




Mark Miller-3 wrote
> If you want this promise and complete control, you pretty much need to do
> a doc per request and many parallel requests for speed.
> 
> The bulk and streaming methods of adding documents do not have a good
> fine-grained error reporting strategy yet. It’s okay for certain use cases,
> especially batch loading, and you will know when an update is rejected
> - it just might not be easy to know which in the batch / stream.
> 
> Documents that come in batches are added as they come / are processed -
> not in some atomic unit.
> 
> What controls how soon you will see documents or whether you will see them
> as they are still loading is simply when you soft commit and how many docs
> have been indexed when the soft commit happens.
> 
> - Mark

Re: Commit behaviour in SolrCloud

Posted by Mark Miller <ma...@gmail.com>.
If you want this promise and complete control, you pretty much need to do a doc per request and many parallel requests for speed.
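
A minimal sketch of the doc-per-request approach, assuming a hypothetical `send_one(doc)` callable that posts a single document and raises if Solr rejects it. Neither that name nor the thread-pool size comes from this thread. Because every document travels in its own request, each failure maps to exactly one document.

```python
from concurrent.futures import ThreadPoolExecutor


def index_one_per_request(docs, send_one, max_workers=8):
    """Send each doc in its own request, using a pool of worker threads.

    Returns (accepted_ids, failures), where failures maps a doc id to the
    exception raised while sending that doc.
    """
    accepted, failures = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One future per document, keyed by the document id.
        futures = {doc["id"]: pool.submit(send_one, doc) for doc in docs}
        for doc_id, future in futures.items():
            try:
                future.result()  # re-raises whatever send_one raised
                accepted.append(doc_id)
            except Exception as exc:
                failures[doc_id] = exc
    return accepted, failures
```

The caller can then retry or log exactly the ids in `failures`, which the bulk methods cannot offer.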

The bulk and streaming methods of adding documents do not have a good fine-grained error reporting strategy yet. It’s okay for certain use cases, especially batch loading, and you will know when an update is rejected - it just might not be easy to know which document in the batch / stream.

Documents that come in batches are added as they come / are processed - not in some atomic unit.

What controls how soon you will see documents or whether you will see them as they are still loading is simply when you soft commit and how many docs have been indexed when the soft commit happens.
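
The 1,030-doc case from the question can be pictured with a toy model. This is an illustration only, not Solr code: `ToyIndex` and its methods are made-up names, and the model ignores shards, replicas and transaction logs. It only separates "accepted" from "visible", following the description above.

```python
class ToyIndex:
    """Toy model only: docs accepted by the system vs. docs visible to searches."""

    def __init__(self):
        self.accepted = []     # every doc the system has accepted so far
        self.visible_upto = 0  # how many of those the current view exposes

    def add(self, doc):
        self.accepted.append(doc)

    def soft_commit(self):
        # Open a new view over everything accepted up to this moment.
        self.visible_upto = len(self.accepted)

    def num_found(self):
        return self.visible_upto


index = ToyIndex()
batches = [[f"doc-{b}-{i}" for i in range(500)] for b in range(3)]

for doc in batches[0] + batches[1]:  # two full requests of 500 docs
    index.add(doc)
for doc in batches[2][:30]:          # the third request is 30 docs in...
    index.add(doc)
index.soft_commit()                  # ...when a soft commit happens
count_mid_batch = index.num_found()  # 1030: the batch is not an atomic unit
for doc in batches[2][30:]:          # the rest of the third request
    index.add(doc)
```

In this model the remaining 470 docs are already accepted, but stay invisible until the next soft commit.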

- Mark

On Nov 25, 2013, at 1:03 AM, adfel70 <ad...@gmail.com> wrote:

> Hi Mark, Thanks for the answer.
> 
> One more question though: You say that if I get a success from the update,
> it’s in the system, commit or not. But when exactly do I get this feedback -
> Is it one feedback per the whole request, or per one add inside the request?
> I will give an example to clarify my question: Say I have a new, empty
> index, and I repeatedly send indexing requests - every request adds 500 new
> documents to the index. Is it possible, at some point during this process,
> to query the index and get a total of 1,030 docs? (Let's assume no indexing
> errors were returned by Solr.)
> 
> Thanks again.


Re: Commit behaviour in SolrCloud

Posted by adfel70 <ad...@gmail.com>.
Hi Mark, Thanks for the answer.

One more question though: You say that if I get a success from the update,
it’s in the system, commit or not. But when exactly do I get this feedback -
Is it one feedback per the whole request, or per one add inside the request?
I will give an example to clarify my question: Say I have a new, empty index,
and I repeatedly send indexing requests - every request adds 500 new documents
to the index. Is it possible, at some point during this process, to query the
index and get a total of 1,030 docs? (Let's assume no indexing errors were
returned by Solr.)

Thanks again.





Re: Commit behaviour in SolrCloud

Posted by Mark Miller <ma...@gmail.com>.
SolrCloud does not use commits for update acceptance promises.

The idea is, if you get a success from the update, it’s in the system, commit or not.

Soft Commits are used for visibility only.

Standard Hard Commits are used essentially for internal purposes and should be done via auto commit generally.

To your question though - it is fine to send a commit while updates are coming in from another source - it’s just not generally necessary to do that anyway.
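
As a rough sketch of what this looks like from a client, assuming Solr's JSON update handler at /update, a hypothetical host and a hypothetical collection named collection1: the request below carries no commit parameter at all. A successful response is the acceptance signal; visibility and hard commits are left to the server's auto commit settings.

```python
import json
import urllib.request


def build_update_request(base_url, collection, docs):
    """Build (but do not send) a JSON add request with no explicit commit."""
    # wt=json only asks for a JSON response; there is no commit=true here.
    url = f"{base_url.rstrip('/')}/{collection}/update?wt=json"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(req, timeout=30):
    """Send the request; urlopen raises HTTPError if Solr rejects the update."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical host and collection; nothing is sent until send_update(req).
req = build_update_request(
    "http://localhost:8983/solr", "collection1", [{"id": "1"}, {"id": "2"}]
)
```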

- Mark

On Nov 24, 2013, at 1:01 PM, adfel70 <ad...@gmail.com> wrote:

> Hi everyone,
> 
> I am wondering how the commit operation works in SolrCloud.
> Say I have 2 parallel indexing processes. What if one process sends a big
> update request (an add command with a lot of docs), and the other one just
> happens to send a commit command while the update request is being
> processed?
> Is it possible that only part of the documents will be committed?
> What will happen to the other docs? Is Solr transactional, and does it
> promise that there will be no partial results?