You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Eyal <ey...@gmail.com> on 2005/08/11 20:44:39 UTC

RE: Updating existing documents in index: Solutions

Run a search on "Lucene ParallelReader" in google - You'll find something
Doug Cutting wrote that I believe is what you're looking for.

Eyal 
 

> -----Original Message-----
> From: John Smith [mailto:john_smith9910@yahoo.com] 
> Sent: Thursday, August 11, 2005 21:12 PM
> To: java-user@lucene.apache.org
> Subject: Updating existing documents in index: Solutions 
> 
> 
> Hi all
> 
>  
> 
> This is a slightly long email. Pardon me.
> 
>  
> 
> As Lucene does not allow for updating an existing document in 
> the index, the only option is to delete and reindex the 
> message.When you have too many updates, this gets a little 
> cumbersome. In our case, as such the actual content of the 
> document being indexed does 
> 
> not change, but the fields around the content, like say 
> "LastReadby" or something like Folder associated with it etc 
> change. These are all fields that have been indexed as a part 
> of the original  document in the index.
> 
>  
> 
> I have been contemplating putting these "commonly changing 
> fields" into one  index and allow for delete and reindex on 
> this  index alone and keep the static data in another index. 
> DocumentID will be a stored field and will be stored in both 
> the static and dynamic index, as a way of identifying the document.
> 
>  
> 
> Static index: Contains content of document indexed and 
> documentID stored.
> 
> Dynamic index: Contains all fields about the document which 
> change frequently indexed  and documentID stored.
> 
>  
> 
>  
> 
> Questions
> 
>  
> 
> 1. First of all, is there a better solution to this 
> frequently changing fields having to be reindexed ?
> 
>  
> 
> 2. Let's say I go  with the 2 index approach, 
> 
>  
> 
> Example query:  Content: "Hello world" AND Folder:Folder1 AND 
> LastReadBy: jane. If we execute these queries on our static 
> and dynamic indexes, they will obviously fail to get hits.
> 
> 
> 
>  
> 
>      Let's say I have a way of splitting my queries such that 
>  all content queries go to static (content) index only and 
> queries on other fields go to the dynamic index, basically 
> allow for queries to come in such a way that it is always a 
> AND between the dynamic index result set and static index 
> result set. So on the results set, I would have to retrieve 
> the document ID and make sure we have the same documentID in 
> both the result sets, in order for it to be a match.
> 
>       In  cases where the result sets are really huge from  
> both the queries, then even to get the number of hits, I will 
> have to retrieve each and every document from the results, in 
> order to get the documentID for comparison. Queries can get 
> really slow.
> 
>  
> 
> Has anyone faced similar problems, If so what was your solution?
> 
> Any comments/thoughts will be appreciated.
> 
>  
> 
> Thank you
> 
> JS
> 
> 
> 
> 		
> ---------------------------------
>  Start your day with Yahoo! - make it your home page 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Updating existing documents in index: Solutions

Posted by Nicolas Maisonneuve <n....@gmail.com>.
maybe this  code can be useful for you 

http://issues.apache.org/bugzilla/show_bug.cgi?id=34629

nicolas

On 8/11/05, John Smith <jo...@yahoo.com> wrote:
> Thank you . That does look like what I want
> 
> JS
> 
> Eyal <ey...@gmail.com> wrote:
> Run a search on "Lucene ParallelReader" in google - You'll find something
> Doug Cutting wrote that I believe is what you're looking for.
> 
> Eyal
> 
> 
> > -----Original Message-----
> > From: John Smith [mailto:john_smith9910@yahoo.com]
> > Sent: Thursday, August 11, 2005 21:12 PM
> > To: java-user@lucene.apache.org
> > Subject: Updating existing documents in index: Solutions
> >
> >
> > Hi all
> >
> >
> >
> > This is a slightly long email. Pardon me.
> >
> >
> >
> > As Lucene does not allow for updating an existing document in
> > the index, the only option is to delete and reindex the
> > message.When you have too many updates, this gets a little
> > cumbersome. In our case, as such the actual content of the
> > document being indexed does
> >
> > not change, but the fields around the content, like say
> > "LastReadby" or something like Folder associated with it etc
> > change. These are all fields that have been indexed as a part
> > of the original document in the index.
> >
> >
> >
> > I have been contemplating putting these "commonly changing
> > fields" into one index and allow for delete and reindex on
> > this index alone and keep the static data in another index.
> > DocumentID will be a stored field and will be stored in both
> > the static and dynamic index, as a way of identifying the document.
> >
> >
> >
> > Static index: Contains content of document indexed and
> > documentID stored.
> >
> > Dynamic index: Contains all fields about the document which
> > change frequently indexed and documentID stored.
> >
> >
> >
> >
> >
> > Questions
> >
> >
> >
> > 1. First of all, is there a better solution to this
> > frequently changing fields having to be reindexed ?
> >
> >
> >
> > 2. Let's say I go with the 2 index approach,
> >
> >
> >
> > Example query: Content: "Hello world" AND Folder:Folder1 AND
> > LastReadBy: jane. If we execute these queries on our static
> > and dynamic indexes, they will obviously fail to get hits.
> >
> >
> >
> >
> >
> > Let's say I have a way of splitting my queries such that
> > all content queries go to static (content) index only and
> > queries on other fields go to the dynamic index, basically
> > allow for queries to come in such a way that it is always a
> > AND between the dynamic index result set and static index
> > result set. So on the results set, I would have to retrieve
> > the document ID and make sure we have the same documentID in
> > both the result sets, in order for it to be a match.
> >
> > In cases where the result sets are really huge from
> > both the queries, then even to get the number of hits, I will
> > have to retrieve each and every document from the results, in
> > order to get the documentID for comparison. Queries can get
> > really slow.
> >
> >
> >
> > Has anyone faced similar problems, If so what was your solution?
> >
> > Any comments/thoughts will be appreciated.
> >
> >
> >
> > Thank you
> >
> > JS
> >
> >
> >
> >
> > ---------------------------------
> > Start your day with Yahoo! - make it your home page
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Updating existing documents in index: Solutions

Posted by John Smith <jo...@yahoo.com>.
Thank you . That does look like what I want
 
JS

Eyal <ey...@gmail.com> wrote:
Run a search on "Lucene ParallelReader" in google - You'll find something
Doug Cutting wrote that I believe is what you're looking for.

Eyal 


> -----Original Message-----
> From: John Smith [mailto:john_smith9910@yahoo.com] 
> Sent: Thursday, August 11, 2005 21:12 PM
> To: java-user@lucene.apache.org
> Subject: Updating existing documents in index: Solutions 
> 
> 
> Hi all
> 
> 
> 
> This is a slightly long email. Pardon me.
> 
> 
> 
> As Lucene does not allow for updating an existing document in 
> the index, the only option is to delete and reindex the 
> message.When you have too many updates, this gets a little 
> cumbersome. In our case, as such the actual content of the 
> document being indexed does 
> 
> not change, but the fields around the content, like say 
> "LastReadby" or something like Folder associated with it etc 
> change. These are all fields that have been indexed as a part 
> of the original document in the index.
> 
> 
> 
> I have been contemplating putting these "commonly changing 
> fields" into one index and allow for delete and reindex on 
> this index alone and keep the static data in another index. 
> DocumentID will be a stored field and will be stored in both 
> the static and dynamic index, as a way of identifying the document.
> 
> 
> 
> Static index: Contains content of document indexed and 
> documentID stored.
> 
> Dynamic index: Contains all fields about the document which 
> change frequently indexed and documentID stored.
> 
> 
> 
> 
> 
> Questions
> 
> 
> 
> 1. First of all, is there a better solution to this 
> frequently changing fields having to be reindexed ?
> 
> 
> 
> 2. Let's say I go with the 2 index approach, 
> 
> 
> 
> Example query: Content: "Hello world" AND Folder:Folder1 AND 
> LastReadBy: jane. If we execute these queries on our static 
> and dynamic indexes, they will obviously fail to get hits.
> 
> 
> 
> 
> 
> Let's say I have a way of splitting my queries such that 
> all content queries go to static (content) index only and 
> queries on other fields go to the dynamic index, basically 
> allow for queries to come in such a way that it is always a 
> AND between the dynamic index result set and static index 
> result set. So on the results set, I would have to retrieve 
> the document ID and make sure we have the same documentID in 
> both the result sets, in order for it to be a match.
> 
> In cases where the result sets are really huge from 
> both the queries, then even to get the number of hits, I will 
> have to retrieve each and every document from the results, in 
> order to get the documentID for comparison. Queries can get 
> really slow.
> 
> 
> 
> Has anyone faced similar problems, If so what was your solution?
> 
> Any comments/thoughts will be appreciated.
> 
> 
> 
> Thank you
> 
> JS
> 
> 
> 
> 
> ---------------------------------
> Start your day with Yahoo! - make it your home page 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com