Posted to solr-user@lucene.apache.org by mark12345 <ma...@yahoo.com.au> on 2013/05/02 09:06:45 UTC

SolrJ / Solr Two Phase Commit

I am wondering whether it is possible to achieve a two-phase commit with
SolrJ/Solr.  Any examples?  Any best practices?

What I know:
* Lucene offers two-phase commit via its IndexWriter (prepareCommit()
followed by either commit() or rollback()).

http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/IndexWriter.html

* I know Solr Optimistic Concurrency is available.
http://yonik.com/solr/optimistic-concurrency/


I want transactional behavior that guarantees either a full commit or a
full rollback of multiple documents.  I do not want to end up in a situation
where I don't know whether the beans have been written to the Solr
instance.

* Code Snippet

> try {
> 	UpdateResponse updateResponse = server.add(Arrays.asList(docOne, docTwo));
> 	successForAddingDocuments = (updateResponse.getStatus() == 0);
> 	if (successForAddingDocuments) {
> 		UpdateResponse updateResponseForCommit = server.commit();
> 		successForCommit = (updateResponseForCommit.getStatus() == 0);
> 	}
> } catch (Exception e) {
> 	// add() or commit() threw; successForCommit stays false
> } finally {
> 	if (!successForCommit) {
> 		System.err.println("Rolling back transaction.");
> 		try {
> 			UpdateResponse updateResponseForRollback = server.rollback();
> 			if (updateResponseForRollback.getStatus() == 0) {
> 				successForRollback = true;
> 			} else {
> 				successForRollback = false;
> 				System.err.println("Failed to rollback!  Bad as state is now unknown!");
> 			}
> 		} catch (Exception e) {
> 			// rollback() itself threw; successForRollback stays false
> 		}
> 	}
> }

* Full Test class

> @Test
> public void documentTransactionTest() {
> 
> 	// Start from a clean state: remove any leftovers from a previous run.
> 	try {
> 		// HttpSolrServer server = ...
> 		server.deleteById(Arrays.asList("1", "2"));
> 		server.commit();
> 	} catch (Exception e) {
> 		e.printStackTrace();
> 	}
> 
> 	SolrInputDocument docOne = new SolrInputDocument();
> 	{
> 		docOne.addField("id", 1L);
> 		docOne.addField("type_s", "MyTestDoc");
> 		docOne.addField("value_s", "docOne");
> 		docOne.addField("_version_", -1L);
> 	}
> 
> 	SolrInputDocument docTwo = new SolrInputDocument();
> 	{
> 		docTwo.addField("id", 2L);
> 		docTwo.addField("type_s", "MyTestDoc");
> 		docTwo.addField("value_s", "docTwo");
> 		docTwo.addField("_version_", -1L);
> 	}
> 
> 	boolean successForAddingDocuments = false;
> 	boolean successForCommit = false;
> 	boolean successForRollback = false;
> 
> //        throw new SolrServerException("Connection Broken");
> 
> 	try {
> 		UpdateResponse updateResponse = server.add(Arrays.asList(docOne, docTwo));
> 		successForAddingDocuments = (updateResponse.getStatus() == 0);
> 		if (successForAddingDocuments) {
> 			UpdateResponse updateResponseForCommit = server.commit();
> 			successForCommit = (updateResponseForCommit.getStatus() == 0);
> 		}
> 	} catch (Exception e) {
> 		// add() or commit() threw; successForCommit stays false
> 	} finally {
> 		if (!successForCommit) {
> 			System.err.println("Rolling back transaction.");
> 			try {
> 				UpdateResponse updateResponseForRollback = server.rollback();
> 				if (updateResponseForRollback.getStatus() == 0) {
> 					successForRollback = true;
> 				} else {
> 					successForRollback = false;
> 					System.err.println("Failed to rollback!  Bad as state is now unknown!");
> 				}
> 			} catch (Exception e) {
> 				// rollback() itself threw; successForRollback stays false
> 			}
> 		}
> 	}
> 
> 	{
> 		try {
> 			QueryResponse response = server.query(
> 					new SolrQuery("*:*").addFilterQuery("type_s:MyTestDoc"));
> 
> 			if (successForCommit) {
> 				Assert.assertEquals(2, response.getResults().size());
> 				Assert.assertEquals("docOne", response.getResults().get(0).get("value_s"));
> 				Assert.assertEquals("docTwo", response.getResults().get(1).get("value_s"));
> 			} else if (successForRollback) {
> 				Assert.assertEquals(0, response.getResults().size());
> 			} else {
> 				// UNKNOWN STATE
> 
> 				if (response.getResults().size() == 0) {
> 					// rollback must have been successful
> 				} else if (response.getResults().size() == 2) {
> 					// commit was successful
> 				} else {
> 					Assert.fail();
> 				}
> 			}
> 		} catch (Exception e) {
> 			Assert.fail();
> 		}
> 	}
> 
> }

http://lucene.apache.org/solr/4_2_0/solr-solrj/org/apache/solr/client/solrj/impl/HttpSolrServer.html





--
View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrJ / Solr Two Phase Commit

Posted by Walter Underwood <wu...@wunderwood.org>.
Yes, that is correct.  --wunder

On May 2, 2013, at 7:46 PM, mark12345 wrote:

> Question: Just to clarify, are you saying that if I have multiple threads
> using multiple instances of HttpSolrServer, each calling
> "httpSolrServer.add(SolrInputDocument doc)" to add SolrInputDocuments, and
> one of them calls "httpSolrServer.commit()", then all documents added so
> far are committed?

--
Walter Underwood
wunder@wunderwood.org




Re: SolrJ / Solr Two Phase Commit

Posted by mark12345 <ma...@yahoo.com.au>.
Question: Just to clarify, are you saying that if I have multiple threads
using multiple instances of HttpSolrServer, each calling
"httpSolrServer.add(SolrInputDocument doc)" to add SolrInputDocuments, and
one of them calls "httpSolrServer.commit()", then all documents added so
far are committed?


If that is the case, it helps me understand the rollback API description
in a new light.

http://lucene.apache.org/solr/4_2_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback%28%29

> Performs a rollback of all non-committed documents pending.
> 
> Note that this is not a true rollback as in databases. Content you have
> previously added may have been committed due to autoCommit, buffer full,
> other client performing a commit etc.

----


Michael Della Bitta-2 wrote
> Per core or collection, depending on whether we're talking about Cloud or
> not.
> 
> Basically, commits in Solr are about controlling visibility more than
> anything, although now with Cloud, they have resource consumption and
> lifecycle ramifications as well.

Re: SolrJ / Solr Two Phase Commit

Posted by Michael Della Bitta <mi...@appinions.com>.
Per core or collection, depending on whether we're talking about Cloud or
not.

Basically, commits in Solr are about controlling visibility more than
anything, although now with Cloud, they have resource consumption and
lifecycle ramifications as well.
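
To make the "per core" point concrete, here is a toy in-memory model of that behaviour. It is only a sketch and not Solr code: ToyCore, ToyClient and ToyCoreDemo are hypothetical names, and the model assumes nothing beyond what is said in this thread, namely that pending documents belong to the core rather than to the client that added them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy in-memory model of per-core commit visibility. This is not Solr code:
 * ToyCore and ToyClient are hypothetical names used only for illustration.
 */
final class ToyCore {
    private final List<String> pending = new ArrayList<>();
    private final List<String> visible = new ArrayList<>();

    synchronized void add(String doc) {
        pending.add(doc);
    }

    /** Makes every pending document visible, whoever added it. */
    synchronized void commit() {
        visible.addAll(pending);
        pending.clear();
    }

    /** Discards every pending document, whoever added it. */
    synchronized void rollback() {
        pending.clear();
    }

    synchronized List<String> search() {
        return new ArrayList<>(visible);
    }
}

/** Stands in for one client handle, e.g. one HttpSolrServer instance. */
final class ToyClient {
    private final ToyCore core;

    ToyClient(ToyCore core) {
        this.core = core;
    }

    void add(String doc) { core.add(doc); }
    void commit() { core.commit(); }
    void rollback() { core.rollback(); }
}

final class ToyCoreDemo {
    /** Two clients share one core; B commits while A is still "mid-transaction". */
    static List<String> run() {
        ToyCore core = new ToyCore();
        ToyClient a = new ToyClient(core);
        ToyClient b = new ToyClient(core);
        a.add("a1");   // A has not committed yet
        b.add("b1");
        b.commit();    // B's commit also publishes A's pending document
        a.rollback();  // too late: nothing of A's is pending any more
        return core.search();
    }
}
```

In this model ToyCoreDemo.run() ends with both "a1" and "b1" visible, which is why a client-side rollback cannot be treated as a transaction boundary when other clients, or autoCommit, may commit in between.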
On May 2, 2013 10:01 PM, "mark12345" <ma...@yahoo.com.au> wrote:

> By saying commits in Solr are "global", do you mean per Solr deployment,
> per
> HttpSolrServer instance, per thread, or something else?

Re: SolrJ / Solr Two Phase Commit

Posted by mark12345 <ma...@yahoo.com.au>.
By saying commits in Solr are "global", do you mean per Solr deployment, per
HttpSolrServer instance, per thread, or something else?




Re: SolrJ / Solr Two Phase Commit

Posted by Michael Della Bitta <mi...@appinions.com>.
One thing I do know is that commits in Solr are global, so there's no way
to do this with concurrency.

That being said, Solr doesn't tend to accept updates that would generate
errors once committed in my experience.


Michael Della Bitta




Re: The HttpSolrServer "add(Collection docs)" method is not atomic.

Posted by Erick Erickson <er...@gmail.com>.
bq:  Is there a way to commit multiple documents/beans in a
transaction/together in a way that it succeeds completely or fails
completely?

Not that I know of. I've seen various "divide and conquer" strategies
to identify _which_ document failed. The general process is to
re-index the docs in smaller chunks until you isolate the offending
one, trusting that re-indexing a document is OK since it overwrites
the earlier copy.

Best
Erick
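
A minimal sketch of the "divide and conquer" idea described above, assuming a failed batch can simply be re-sent in halves. BatchBisector and the sender callback are hypothetical names, not part of SolrJ. With SolrJ the callback would presumably wrap server.add(docs) and rethrow its checked exceptions as unchecked ones; that wiring is an assumption and is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Re-sends a failed batch in halves until the rejected documents are
 * isolated. BatchBisector is a hypothetical helper, not a SolrJ class.
 */
final class BatchBisector {

    private BatchBisector() {}

    /**
     * Sends docs via sender, which must throw a RuntimeException when a
     * chunk is rejected. Returns the documents that were rejected even when
     * sent on their own. Documents may be sent more than once, so this
     * relies on re-indexing overwriting the earlier copy.
     */
    static <T> List<T> sendIsolatingFailures(List<T> docs, Consumer<List<T>> sender) {
        List<T> rejected = new ArrayList<>();
        bisect(docs, sender, rejected);
        return rejected;
    }

    private static <T> void bisect(List<T> docs, Consumer<List<T>> sender, List<T> rejected) {
        if (docs.isEmpty()) {
            return;
        }
        try {
            sender.accept(docs); // whole chunk accepted, nothing more to do
        } catch (RuntimeException e) {
            if (docs.size() == 1) {
                rejected.add(docs.get(0)); // isolated an offending document
                return;
            }
            // Split the chunk and retry each half separately.
            int mid = docs.size() / 2;
            bisect(docs.subList(0, mid), sender, rejected);
            bisect(docs.subList(mid, docs.size()), sender, rejected);
        }
    }
}
```

This only tells you which documents were rejected; it does not make the batch atomic, since the accepted chunks are still added and will be committed by the next commit.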


The HttpSolrServer "add(Collection docs)" method is not atomic.

Posted by mark12345 <ma...@yahoo.com.au>.
One thing I noticed is that while the HttpSolrServer "add(SolrInputDocument
doc)" method is atomic (either a bean is added or an exception is thrown),
the HttpSolrServer "add(Collection<SolrInputDocument> docs)" method is not
atomic.

Question:  Is there a way to commit multiple documents/beans in a
transaction/together in a way that it succeeds completely or fails
completely?


Quick outline of what I did to show that a call to the HttpSolrServer
"add(Collection<SolrInputDocument> docs)" method is not atomic:
1.  Created 5 documents: 4 valid documents (documents 1, 2, 4 and 5) and
1 document with an issue (document 3).
2.  Called HttpSolrServer "add(Collection<SolrInputDocument> docs)", which
threw a SolrException.
3.  Called HttpSolrServer "commit()".
4.  Discovered that 2 of the 5 documents (documents 1 and 2) were still
committed.
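
The outcome in the outline can be reproduced with a small in-memory model. This is a sketch under an assumption, not Solr code: it supposes the batch is processed in order and stops at the first failing document, which matches what I observed but is not something I have confirmed in the Solr source. ToyIndex is a hypothetical name, and "invalid" here just means a null document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of a non-atomic batch add; not Solr code. ToyIndex is a
 * hypothetical name, and an "invalid" document is modelled as null.
 */
final class ToyIndex {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    /** Adds documents one at a time and stops at the first invalid one. */
    void addAll(List<String> docs) {
        for (String doc : docs) {
            if (doc == null) {
                throw new IllegalArgumentException("invalid document");
            }
            pending.add(doc); // earlier documents stay pending after a later failure
        }
    }

    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }

    /** Replays steps 1 to 4 of the outline with document 3 invalid. */
    static List<String> replayOutline() {
        ToyIndex index = new ToyIndex();
        try {
            index.addAll(Arrays.asList("doc1", "doc2", null, "doc4", "doc5"));
        } catch (IllegalArgumentException e) {
            // step 2: the batch add failed part-way through
        }
        index.commit();           // step 3
        return index.committed(); // step 4: only doc1 and doc2 are committed
    }
}
```

Under that assumption the commit in step 3 publishes whatever was accepted before the failure, so the caller has to either roll back before committing or clean up the partial batch itself.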



