Posted to solr-user@lucene.apache.org by "Huang, Zijian(Victor)" <zi...@etrade.com> on 2009/03/18 19:51:59 UTC

Question about incremental index update

Hi:
   Is it easy to do daily incremental index updates in Solr, assuming the
index is around 1 GB? In terms of giving a document an ID to facilitate
index updates, is using the URL a good way to do so?

Thanks


Victor


Re: Question about incremental index update

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Mar 19, 2009 at 2:14 AM, Huang, Zijian(Victor) <
zijian.huang@etrade.com> wrote:

>
>    I mean the document ID in the Solr XML doc format. The Solr wiki
> says that I can update a particular doc by its ID if I assigned
> one previously. I am wondering whether using the URL as the doc ID
> would be a good thing to do.
>

That is the uniqueKey field defined in schema.xml. If you send another
document with the same uniqueKey value, it replaces the document that
already exists in the index.

-- 
Regards,
Shalin Shekhar Mangar.
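
To make the uniqueKey behaviour concrete, here is a minimal sketch that builds a Solr XML add message with the page URL as the key. It assumes that schema.xml declares a field called "id" as the uniqueKey and also has a "title" field; both field names and the helper build_add_message are illustrative, not taken from this thread. Sending the message to the update handler and committing are not shown.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of field dicts.

    Hypothetical helper: each dict maps a field name to its value.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = value
    return ET.tostring(add, encoding="unicode")


# Assumption: "id" is the uniqueKey in schema.xml, and the URL is its value.
message = build_add_message([
    {"id": "http://example.com/page.html", "title": "Example page"},
])
print(message)
```

Posting a second message with the same "id" value and a different title would, per the reply above, replace the earlier document rather than add a second one.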

Re: Question about incremental index update

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Victor,

Yes, if you use the same ID (and a URL could serve as a Document ID), Solr will update the Document.
Note that Solr doesn't do crawling/web page fetching, but Nutch and Droids do.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
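
Because a document sent with an existing ID replaces the old one, the decision about which pages to re-send can be made on the client side before anything reaches Solr. The sketch below is one possible approach and is an assumption, not something described in this thread: it keeps a content hash per URL from the previous crawl and compares it with the current crawl. The names content_hash and diff_crawls are hypothetical.

```python
import hashlib


def content_hash(text):
    """Return a stable fingerprint of a page's text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def diff_crawls(previous_hashes, current_pages):
    """Compare the last run's hashes with the current crawl.

    previous_hashes: dict of URL -> hash saved from the last run
    current_pages:   dict of URL -> page text from this run
    Returns (urls_to_send, urls_to_delete, new_hashes).
    """
    new_hashes = {url: content_hash(text) for url, text in current_pages.items()}
    # New or changed pages: re-send them under the same ID (the URL).
    to_send = sorted(u for u, h in new_hashes.items() if previous_hashes.get(u) != h)
    # Pages that disappeared from the crawl: candidates for deletion.
    to_delete = sorted(u for u in previous_hashes if u not in new_hashes)
    return to_send, to_delete, new_hashes


yesterday = {
    "http://example.com/a": "old text",
    "http://example.com/b": "same",
    "http://example.com/c": "gone",
}
_, _, stored = diff_crawls({}, yesterday)

today = {
    "http://example.com/a": "new text",
    "http://example.com/b": "same",
    "http://example.com/d": "brand new",
}
to_send, to_delete, stored = diff_crawls(stored, today)
print(to_send)    # ['http://example.com/a', 'http://example.com/d']
print(to_delete)  # ['http://example.com/c']
```

Re-sending every crawled page each day would also work, since the ID match takes care of replacement; the hash comparison only reduces how much is sent.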





RE: Question about incremental index update

Posted by "Huang, Zijian(Victor)" <zi...@etrade.com>.
Hi, Otis:
   So does Solr already have some kind of built-in library that can
automatically detect the differences between two sets of crawled
documents and update the index to the newer one?
    I mean the document ID in the Solr XML doc format. The Solr wiki
says that I can update a particular doc by its ID if I assigned
one previously. I am wondering whether using the URL as the doc ID
would be a good thing to do.

Thanks

Vic



Re: Question about incremental index update

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Victor,

Daily updates (or hourly or more frequent) are not going to be a problem.  I don't follow your question about document ID and using URL.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


