Posted to solr-user@lucene.apache.org by Greg Georges <gr...@biztree.com> on 2011/05/02 19:33:30 UTC
Question concerning the updating of my solr index
Hello all,
I have integrated Solr into my project successfully. I use a DataImportHandler to first import the data, mapping the fields to my schema.xml, and I use SolrJ to query the data, including faceting. Works great.
The question I have now is a general one about updating the index and how it works. Right now I have a thread which runs a couple of times a day to update the index. My index is composed of about 20,000 documents; when this thread runs it takes the data of those 20,000 documents from the db, I create a SolrInputDocument for each, and I then use the following code to rebuild the index.
SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/apache-solr-1.4.1/");
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
// Build one SolrInputDocument per database document.
for (Object o : documents) {
    Document document = (Document) o;
    SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
    docs.add(solrDoc);
}
// Send the whole batch and commit in the same request
// (waitFlush=false, waitSearcher=false).
UpdateRequest req = new UpdateRequest();
req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
req.add(docs);
UpdateResponse rsp = req.process(server);
server.optimize();
This process takes 19 seconds, which is 10 seconds faster than my older solution using Compass (another open-source search project we used). Is this the best way to update the index? If I understand correctly, an update is actually a delete in the index followed by an add. During the 19 seconds, will my index be locked only on the document being updated, or could the whole index be locked? I am not in production with this solution yet, so I want to make sure my update process makes sense. Thanks
Greg
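Greg's "delete then an add" intuition is right: Lucene segments are write-once, so re-adding a document with the same uniqueKey marks the old copy as deleted and appends a fresh one; logically the effect is a keyed overwrite. A toy stdlib-only sketch of that logical behavior (class name and values are hypothetical, no Lucene involved):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OverwriteSketch {
    public static void main(String[] args) {
        // Logical effect of re-adding a doc with the same uniqueKey (e.g. "id"):
        // the second add replaces the first, it does not duplicate it.
        Map<String, String> index = new LinkedHashMap<String, String>();
        index.put("doc-42", "old body");
        index.put("doc-42", "new body"); // delete-then-add, observed as replace
        System.out.println(index.size());        // 1
        System.out.println(index.get("doc-42")); // new body
    }
}
```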
RE: Question concerning the updating of my solr index
Posted by Greg Georges <gr...@biztree.com>.
Yeah, you are right; I have changed that to add documents one at a time rather than as a list. It is still pretty fast; I will keep testing settings to see if I can tweak it further. Thanks
Greg
Re: Question concerning the updating of my solr index
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Greg,
I believe the point of SUSS (StreamingUpdateSolrServer) is that you can just add docs to it one by one, so that SUSS can asynchronously send them to the backend Solr instead of you batching the docs yourself.
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
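Otis's description of SUSS amounts to a producer-consumer pipeline: callers enqueue documents one at a time while a pool of background threads drains the queue and streams the documents to Solr. A self-contained sketch of that pattern using only java.util.concurrent (names are hypothetical; this is the general pattern, not SUSS's actual implementation):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StreamingSketch {
    public static void main(String[] args) throws Exception {
        // Bounded queue plays the role of SUSS's internal document queue.
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1000);
        final int threads = 4;
        final String POISON = "__STOP__";
        final List<String> sent = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String doc = queue.take();
                        if (doc.equals(POISON)) break; // one poison pill per worker
                        sent.add(doc); // stand-in for an HTTP POST to /update
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Producer: add docs one by one, as with server.add(doc).
        for (int i = 0; i < 20000; i++) queue.put("doc-" + i);
        for (int i = 0; i < threads; i++) queue.put(POISON);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(sent.size()); // all 20000 docs drained by the workers
    }
}
```

The bounded queue is what makes the caller's add() cheap: it blocks only when the workers fall behind by more than the queue capacity.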
RE: Question concerning the updating of my solr index
Posted by Greg Georges <gr...@biztree.com>.
Oops, here is the code
// Constructor args: base URL, queueSize=1000, threadCount=4.
SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/apache-solr-1.4.1/", 1000, 4);
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (Object o : documents) {
    Document document = (Document) o;
    SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
    docs.add(solrDoc);
}
server.add(docs);
server.commit();
server.optimize();
Greg
RE: Question concerning the updating of my solr index
Posted by Greg Georges <gr...@biztree.com>.
OK, I had seen this in the wiki; performance has gone from 19 seconds down to 13. I have configured it like this, and I wonder what the best settings would be for 20,000 docs to update. A higher or lower queue value? A higher or lower thread value? Thanks
Greg
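One way to settle the queue/thread question empirically is a small grid search: run the same 20,000-document batch under each (queueSize, threadCount) pair and keep the fastest. A harness sketch with a simulated workload (indexBatchMillis is a hypothetical stand-in; in a real test its body would build a StreamingUpdateSolrServer with those settings and index the batch):

```java
import java.util.concurrent.TimeUnit;

public class TuneSketch {
    // Hypothetical stand-in: in practice this would construct a
    // StreamingUpdateSolrServer(url, queueSize, threadCount), add the
    // 20,000 docs, and commit, returning the elapsed wall-clock time.
    static long indexBatchMillis(int queueSize, int threadCount) throws InterruptedException {
        long start = System.nanoTime();
        TimeUnit.MILLISECONDS.sleep(2); // simulated indexing work
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        int[] queueSizes = {100, 1000, 10000};
        int[] threadCounts = {2, 4, 8};
        long best = Long.MAX_VALUE;
        int bestQueue = 0, bestThreads = 0;
        for (int q : queueSizes) {
            for (int t : threadCounts) {
                long ms = indexBatchMillis(q, t);
                if (ms < best) { best = ms; bestQueue = q; bestThreads = t; }
            }
        }
        System.out.println("best: queueSize=" + bestQueue + " threadCount=" + bestThreads);
    }
}
```

With the simulated workload the winner is arbitrary; the point is the measurement loop, which you would run against the real server and dataset.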
Re: Question concerning the updating of my solr index
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Greg,
You could use StreamingUpdateSolrServer instead of that UpdateRequest class -
http://search-lucene.com/?q=StreamingUpdateSolrServer+&fc_project=Solr
Your index won't be locked in the sense that you could have multiple apps or
threads adding docs to the same index simultaneously and that searches can be
executed against the index concurrently.
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
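Otis's locking answer reflects Lucene's writer/searcher model: searches read an immutable, last-committed snapshot while the writer appends new segments, and a commit atomically publishes a new snapshot. A toy stdlib-only sketch of that snapshot semantics (hypothetical names, no Lucene):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class SnapshotSketch {
    // Searchers always read the last committed, immutable snapshot.
    static final AtomicReference<List<String>> committed =
        new AtomicReference<List<String>>(Collections.<String>emptyList());

    public static void main(String[] args) {
        List<String> pending = new ArrayList<String>();
        pending.add("doc1");
        pending.add("doc2");
        // Searches issued during indexing see the old snapshot, not pending docs.
        System.out.println(committed.get().size()); // 0
        // "commit": publish an immutable copy atomically.
        committed.set(Collections.unmodifiableList(new ArrayList<String>(pending)));
        System.out.println(committed.get().size()); // 2
    }
}
```

This is why adds never block searches: readers never touch the structure being written, only the published snapshot.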