Posted to solr-user@lucene.apache.org by Greg Georges <gr...@biztree.com> on 2011/05/02 19:33:30 UTC

Question concerning the updating of my solr index

Hello all,

I have integrated Solr into my project successfully. I use a DataImportHandler to first import the data, mapping the fields to my schema.xml, and I use SolrJ to query the data, including faceting. Works great.

The question I have now is a general one about updating the index and how it works. Right now I have a thread which runs a couple of times a day to update the index. My index is composed of about 20,000 documents; when the thread runs, it takes the data for those 20,000 documents from the DB, creates a SolrInputDocument for each, and then uses the following code to update the index.

SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/apache-solr-1.4.1/");
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();

for (Iterator<Document> iterator = documents.iterator(); iterator.hasNext();) {
    Document document = iterator.next();
    SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
    docs.add(solrDoc);
}

UpdateRequest req = new UpdateRequest();
req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
req.add(docs);
UpdateResponse rsp = req.process(server);

server.optimize();

This process takes 19 seconds, which is 10 seconds faster than my older solution using Compass (another open-source search project we used). Is this the best way to update the index? If I understand correctly, an update is actually a delete in the index followed by an add. During the 19 seconds, will my index be locked only on the document being updated, or could the whole index be locked? I am not in production yet with this solution, so I want to make sure my update process makes sense. Thanks

Greg

RE: Question concerning the updating of my solr index

Posted by Greg Georges <gr...@biztree.com>.
Yeah, you are right. I have changed that to add documents one by one instead of a batched list. It still works pretty fast; I will continue to test settings to see if I can tweak it further. Thanks

Greg

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: 2 May 2011 14:56
To: solr-user@lucene.apache.org
Subject: Re: Question concerning the updating of my solr index

Greg,

I believe the point of SUSS is that you can just add docs to it one by one, so 
that SUSS can asynchronously send them to the backend Solr instead of you 
batching the docs.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: Question concerning the updating of my solr index

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Greg,

I believe the point of SUSS is that you can just add docs to it one by one, so 
that SUSS can asynchronously send them to the backend Solr instead of you 
batching the docs.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
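The buffering Otis describes can be sketched with plain JDK classes. This is only an illustration of the pattern, not SolrJ's actual implementation: a bounded queue plus a pool of sender threads, playing the same roles as the queue-size and thread-count arguments to the StreamingUpdateSolrServer constructor.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncSenderSketch {
    public static void main(String[] args) throws InterruptedException {
        // Bounded buffer, like the queue size passed to StreamingUpdateSolrServer.
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1000);
        final AtomicInteger sent = new AtomicInteger();
        final String stop = "__STOP__";
        int threadCount = 4; // like the thread-count constructor argument
        Thread[] senders = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++) {
            senders[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            String doc = queue.take();
                            if (doc.equals(stop)) {
                                return; // sentinel: this sender is done
                            }
                            sent.incrementAndGet(); // stand-in for the HTTP request to Solr
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            senders[i].start();
        }

        // The caller just hands over documents one by one; no local batching needed.
        for (int i = 0; i < 20000; i++) {
            queue.put("doc-" + i);
        }
        for (int i = 0; i < threadCount; i++) {
            queue.put(stop); // one sentinel per sender thread
        }
        for (Thread t : senders) {
            t.join();
        }
        System.out.println(sent.get());
    }
}
```

The producer only blocks when the bounded queue is full, so slow senders apply back-pressure instead of exhausting memory; that is why the queue-size and thread-count values Greg asks about are worth tuning.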



RE: Question concerning the updating of my solr index

Posted by Greg Georges <gr...@biztree.com>.
Oops, here is the code

SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/apache-solr-1.4.1/", 1000, 4);
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();

for (Iterator<Document> iterator = documents.iterator(); iterator.hasNext();) {
    Document document = iterator.next();
    SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
    docs.add(solrDoc);
}

server.add(docs);
server.commit();
server.optimize();

Greg

-----Original Message-----
From: Greg Georges [mailto:greg.georges@biztree.com] 
Sent: 2 May 2011 14:44
To: solr-user@lucene.apache.org
Subject: RE: Question concerning the updating of my solr index

Ok I had seen this in the wiki, performance has gone from 19 seconds to 13. I have configured it like this, I wonder what would the best settings be with 20,000 docs to update? Higher or lower queue value? Higher or lower thread value? Thanks

Greg

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: 2 May 2011 13:59
To: solr-user@lucene.apache.org
Subject: Re: Question concerning the updating of my solr index

Greg,

You could use StreamingUpdateSolrServer instead of that UpdateRequest class - 
http://search-lucene.com/?q=StreamingUpdateSolrServer+&fc_project=Solr
Your index won't be locked in the sense that you could have multiple apps or 
threads adding docs to the same index simultaneously and that searches can be 
executed against the index concurrently.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



RE: Question concerning the updating of my solr index

Posted by Greg Georges <gr...@biztree.com>.
Ok, I had seen this in the wiki; performance has gone from 19 seconds to 13. I have configured it like this, but I wonder what the best settings would be with 20,000 docs to update. A higher or lower queue value? A higher or lower thread count? Thanks

Greg

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: 2 May 2011 13:59
To: solr-user@lucene.apache.org
Subject: Re: Question concerning the updating of my solr index

Greg,

You could use StreamingUpdateSolrServer instead of that UpdateRequest class - 
http://search-lucene.com/?q=StreamingUpdateSolrServer+&fc_project=Solr
Your index won't be locked in the sense that you could have multiple apps or 
threads adding docs to the same index simultaneously and that searches can be 
executed against the index concurrently.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: Question concerning the updating of my solr index

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Greg,

You could use StreamingUpdateSolrServer instead of that UpdateRequest class - 
http://search-lucene.com/?q=StreamingUpdateSolrServer+&fc_project=Solr
Your index won't be locked, in the sense that you can have multiple apps or
threads adding docs to the same index simultaneously, and searches can be
executed against the index concurrently.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
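As a rough, plain-JDK illustration of that concurrency point (this is not SolrJ code, just the claim in miniature): several "client" threads can append to one shared, thread-safe structure at the same time, with no client holding a whole-structure lock that shuts the others out.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ConcurrentClientsSketch {
    public static void main(String[] args) throws InterruptedException {
        // Thread-safe buffer standing in for the index's update path.
        final Queue<String> indexed = new ConcurrentLinkedQueue<String>();
        final int docsPerClient = 5000;
        Thread[] clients = new Thread[4];

        for (int c = 0; c < clients.length; c++) {
            final int id = c;
            clients[c] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < docsPerClient; i++) {
                        // Each "client" adds its docs concurrently with the others.
                        indexed.add("client" + id + "-doc" + i);
                    }
                }
            });
            clients[c].start();
        }
        for (Thread t : clients) {
            t.join();
        }
        // Every add from every client landed; none blocked the others out.
        System.out.println(indexed.size());
    }
}
```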


