Posted to dev@lucene.apache.org by Albert Vila <av...@imente.com> on 2004/06/22 16:44:34 UTC

Clustering question: searching two different indexes

Hi all,

I was wondering if I can use the MultiSearcher to search over two
different indexes (with different fields) at the same time.
I've got one big index with the code, title, content, language, etc.
fields (new documents are added incrementally). Now I have to introduce
a clustering field. The problem is that I have to update the whole index
each time the clusters change, and I don't have enough time to do it (I
want to check for new clusters every 10 minutes, and it takes 25 minutes
to reindex the whole index).
An example query would be: language:0 and title:java and cluster:0

Can I leave the big index unchanged, create a new index
with only the fields code and cluster, and perform the
searches using these two indexes? I think I cannot do that without
changing the code. It would need a post-processing step that matches
the codes returned from index 1 against those from index 2.
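
The post-processing step mentioned above could look roughly like the
following. This is only a sketch in plain Java with no Lucene calls: it
assumes the codes matched in each index have already been collected into
lists, and the names CodeJoin and intersect are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CodeJoin {
    // Keeps the codes from the big-index hits that also appear in the
    // cluster-index hits, preserving the order of the big-index hits.
    public static List<String> intersect(List<String> bigIndexCodes,
                                         List<String> clusterIndexCodes) {
        Set<String> inCluster = new HashSet<>(clusterIndexCodes);
        List<String> result = new ArrayList<>();
        for (String code : bigIndexCodes) {
            if (inCluster.contains(code)) {
                result.add(code);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical codes matching "language:0 and title:java" in the big index.
        List<String> bigHits = Arrays.asList("c1", "c2", "c3");
        // Hypothetical codes matching "cluster:0" in the small index.
        List<String> clusterHits = Arrays.asList("c3", "c1", "c9");
        System.out.println(intersect(bigHits, clusterHits));
    }
}
```

The cost of this approach is that both result sets have to be read in
full before they can be matched, which may matter for large result sets.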

Does anyone have a solution for this problem? I would appreciate it.

Thanks.

-- 
Albert Vila
Director de proyectos I+D
http://www.imente.com
902 933 242
[iMente “La información con más beneficios”]


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

Re: Clustering question: searching two different indexes

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Correct, that is what I meant when I said your application will have to
handle your particular merge.  Instead of using the addIndexes method, your
application will have to go through all Documents in the smaller index
(the one with cluster fields), get the PK of each Doc, look up that Doc
by PK in the big index, delete it from there if it exists, and re-add
it to the big index.
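
The loop described above can be sketched as follows. This is a toy model
in plain Java, not Lucene code: an "index" is just an ordered list of
field maps, the field names code and cluster come from this thread, and
the names PkMerge, mergeByCode and doc are invented for illustration.
Skipping cluster entries that have no match in the big index is an
assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PkMerge {
    // Toy stand-in for an index: an ordered list of documents,
    // each document being a field-name -> value map.
    public static void mergeByCode(List<Map<String, String>> bigIndex,
                                   List<Map<String, String>> clusterIndex) {
        for (Map<String, String> clusterDoc : clusterIndex) {
            String code = clusterDoc.get("code");
            if (code == null) {
                continue; // no PK, nothing to match on
            }
            // Look up the document with the same PK in the big index.
            int pos = -1;
            for (int i = 0; i < bigIndex.size(); i++) {
                if (code.equals(bigIndex.get(i).get("code"))) {
                    pos = i;
                    break;
                }
            }
            if (pos < 0) {
                continue; // assumption: cluster docs without a match are skipped
            }
            // Delete the old document, then re-add it with the cluster field.
            Map<String, String> merged = new LinkedHashMap<>(bigIndex.remove(pos));
            merged.put("cluster", clusterDoc.get("cluster"));
            bigIndex.add(merged);
        }
    }

    // Builds a document from alternating field names and values.
    public static Map<String, String> doc(String... kv) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            d.put(kv[i], kv[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> big = new ArrayList<>();
        big.add(doc("code", "x", "title", "java", "language", "0"));
        big.add(doc("code", "y", "title", "lucene", "language", "0"));
        List<Map<String, String>> clusters = new ArrayList<>();
        clusters.add(doc("code", "y", "cluster", "1"));
        clusters.add(doc("code", "x", "cluster", "0"));
        mergeByCode(big, clusters);
        System.out.println(big);
    }
}
```

Note that in this toy model a re-added document lands at the end of the
list, so the physical order of the big index changes even though each
document keeps its own fields.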

Otis



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Clustering question: searching two different indexes

Posted by Albert Vila <av...@imente.com>.

OK, but with this solution I cannot perform queries like:
get all codes that match "title:java and language:english and cluster:0"

Albert



Re: Clustering question: searching two different indexes

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Aha, now I see what you mean.  You didn't mention 'date' before. :)
So, dates will get preserved, and you will be able to keep using them
for sorting.  However, Lucene will not automatically recognize your 'PK
fields' and merge fields from two Documents with the same PK into a
single Document.  You can think of 'merge' as 'add' (well, the method
name is addIndexes, actually :)), so Lucene will simply make a
cumulative index from your two separate indices:

luceneID_0, code_x, title_x, content_x, language_x, date_x
luceneID_1, code_y, title_y, content_y, language_y, date_y
luceneID_0, code_y, cluster_y
luceneID_1, code_x, cluster_x
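
To illustrate why a cumulative index does not help with a combined
query, here is a toy model in plain Java (no Lucene calls). Documents
are field maps, an 'add'-style merge is modelled as list concatenation,
and an AND query as "every required field has the required value". The
names CumulativeIndex, addAll, countMatches and doc are invented for
this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CumulativeIndex {
    // Toy model of an 'add'-style merge: documents are appended, never joined.
    public static List<Map<String, String>> addAll(List<Map<String, String>> a,
                                                   List<Map<String, String>> b) {
        List<Map<String, String>> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Counts documents in which every required field has the required value
    // (a stand-in for an AND query over exact field values).
    public static int countMatches(List<Map<String, String>> index,
                                   Map<String, String> required) {
        int n = 0;
        for (Map<String, String> d : index) {
            if (d.entrySet().containsAll(required.entrySet())) {
                n++;
            }
        }
        return n;
    }

    // Builds a document from alternating field names and values.
    public static Map<String, String> doc(String... kv) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            d.put(kv[i], kv[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> big = new ArrayList<>();
        big.add(doc("code", "x", "title", "java"));
        big.add(doc("code", "y", "title", "lucene"));
        List<Map<String, String>> clusters = new ArrayList<>();
        clusters.add(doc("code", "y", "cluster", "0"));
        clusters.add(doc("code", "x", "cluster", "0"));

        List<Map<String, String>> merged = addAll(big, clusters);
        System.out.println("documents: " + merged.size());
        System.out.println("title:java AND cluster:0 -> "
                + countMatches(merged, doc("title", "java", "cluster", "0")));
        System.out.println("cluster:0 -> "
                + countMatches(merged, doc("cluster", "0")));
    }
}
```

In this model no single document carries both title and cluster, so the
combined query cannot match anything, while each single-field query
still does.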

Otis



Re: Clustering question: searching two different indexes

Posted by Albert Vila <av...@imente.com>.

By 'order', I mean that I'm adding the documents to the big index sorted
by date (in order to speed up sorting). I want to preserve
this ordering after the merging process.

I'm not using the internal Lucene ID in the code field. The code field
contains my own IDs. I was asking whether I can do the merge using my own
IDs (the code field) rather than the internal Lucene IDs. For example:

luceneID_0, code_x, title_x, content_x, language_x, date_x
luceneID_1, code_y, title_y, content_y, language_y, date_y

luceneID_0, code_y, cluster_y
luceneID_1, code_x, cluster_x

Will the previous index structure produce an inconsistent merged index?

I want to achieve the following merged index:
luceneID_0, code_x, title_x, content_x, language_x, date_x, cluster_x
luceneID_1, code_y, title_y, content_y, language_y, date_y, cluster_y

Thanks

Otis Gospodnetic wrote:

>Albert,
>
>--- Albert Vila <av...@imente.com> wrote:
>
>>Thanks Otis, but can I merge two indexes with different fields?
>
>Yes.  Documents with different Fields can be stored in the same index.
>Not every Document has to have all fields, and it can even have a
>completely different set of Fields.
>
>>My big index has these fields: code, title, content, language and
>>date. I add the new documents incrementally.
>>
>>The clustering index only contains the fields code and cluster.
>>Will merging the big index with the clustering one preserve the
>>order of the big one?
>
>I don't fully understand what you mean by 'order'.  If you are asking
>whether internal document Ids will remain the same, the answer is
>negative.  If you have deleted some documents, there will be gaps in
>document Id sequence, which Lucene will fill, thus re-assigning
>internal document Ids.
>
>>For example, if I have the following indexes:
>>Big index
>>code_1, title_1, content_1, language_1, date_1
>>code_2, title_2, content_2, language_2, date_2
>>....
>>
>>Clustering index
>>code_1, cluster_1
>>code_2, cluster_2
>>....
>>
>>then the new merged index will be:
>>
>>Merged index
>>code_1, title_1, content_1, language_1, date_1, cluster_1
>>code_2, title_2, content_2, language_2, date_2, cluster_2
>>....
>>
>>If I can do that then fine, but I think the merging process uses the
>>internal Lucene ID to match the documents. I want to use the code field
>>to do that matching; is that possible? I cannot be sure the internal
>>Lucene IDs are the same for the same codes in both indexes.
>
>Are you storing the internal Lucene Document Id in the 'code' field?
>If you are, I suggest you change your application to use its own set of
>unique Ids to serve as 'primary keys' in your indices.
>
>Otis
>
>>Thanks again,
>>
>>Albert
>>
>>Otis Gospodnetic wrote:
>>
>>>(re-directing to lucene-user list)
>>>
>>>Albert,
>>>
>>>If I understand your question correctly... You could run a query like
>>>the one you gave on both indices, but if one of them contains documents
>>>that have only one of those fields (cluster), then there will never be
>>>any matches in the second index.
>>>
>>>However, why not leave your big index alone, add documents to a new,
>>>smaller index, and then merge them periodically.  I may be off with
>>>this; it sounds like this is what you want to do, but I'm not certain I
>>>understood you fully.
>>>
>>>Otis
>>>
>>>--- Albert Vila <av...@imente.com> wrote:
>>> 
>>>
>>>      
>>>
>>>>Hi all,
>>>>
>>>>I was wondering If I can search using the MultiSearcher over two 
>>>>diferent indexes at the same time (with diferent fields).
>>>>I've got one big index, with the code, title, content, language,
>>>>        
>>>>
>>etc 
>>    
>>
>>>>fields (new documents are added incrementally). Now, I have to
>>>>introduce 
>>>>a clustering field. The problem is that I have to update the whole
>>>>index 
>>>>each time the clusters change, and I have no enought time to do it
>>>>        
>>>>
>>(I
>>    
>>
>>>>wanna check for new clusters every 10 minuts and I spent 25 minutes
>>>>to 
>>>>reindex the whole index).
>>>>A query example could be: language:0 and title:java and cluster:0
>>>>
>>>>Can I leave the big index whitout any changes and create a new
>>>>        
>>>>
>>index 
>>    
>>
>>>>with only the following fields, code and cluster, and perform the 
>>>>searches using this two indexes? I think I cannot do that without 
>>>>changing the code. It would need a postprocess, matching all
>>>>returning 
>>>>codes from index 1 with index 2.
>>>>
>>>>Anyone have a solution for this problem? I would appreciate that.
>>>>   
>>>>
>>>>        
>>>>
>>>
>>> 
>>>
>>>      
>>>
>>-- 
>>Albert Vila
>>Director de proyectos I+D
>>http://www.imente.com
>>902 933 242
>>[iMente “La información con más beneficios”]
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>
>>    
>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>
>  
>

-- 
Albert Vila
Director de proyectos I+D
http://www.imente.com
902 933 242
[iMente “La información con más beneficios”]



Re: Clustering question: searching two diferent indexes

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Albert,

--- Albert Vila <av...@imente.com> wrote:
> Thanks Otis, but can I merge two indexes with different fields?

Yes.  Documents with different Fields can be stored in the same index.
Not every Document has to have all fields, and it can even have a
completely different set of Fields.
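One thing to keep in mind: a merge puts the Documents of both indexes side by side; it does not combine the Fields of two Documents into one. A rough sketch of that behaviour, using plain Java lists and maps as a stand-in for an index (these are not Lucene classes; `MergeAppendSketch`, `doc` and `merge` are made-up names for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of an index: an ordered list of documents, each document
// being a map from field name to value. This is NOT the Lucene API.
class MergeAppendSketch {

    // Build a document from alternating field names and values.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // Models an index merge: the documents of both indexes are appended
    // unchanged. No document gains fields from another document.
    static List<Map<String, String>> merge(List<Map<String, String>> a,
                                           List<Map<String, String>> b) {
        List<Map<String, String>> merged = new ArrayList<>(a);
        merged.addAll(b);
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, String>> big = Arrays.asList(
            doc("code", "1", "title", "java", "language", "0"),
            doc("code", "2", "title", "lucene", "language", "0"));
        List<Map<String, String>> clusters = Arrays.asList(
            doc("code", "1", "cluster", "0"),
            doc("code", "2", "cluster", "1"));

        // Four documents come out; none has both a title and a cluster.
        for (Map<String, String> d : merge(big, clusters)) {
            System.out.println(d);
        }
    }
}
```

In this model the merged index holds four documents with two different field sets, which is legal, but no document carries both title and cluster.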

> My big index has these fields: code, title, content, language and
> date. I add the new documents incrementally.
> 
> The clustering index only contains the fields code and cluster.
> Will merging the big index with the clustering one preserve the
> order of the big one?

I don't fully understand what you mean by 'order'.  If you are asking
whether internal document Ids will remain the same, the answer is no.
If you have deleted some documents, there will be gaps in the document
Id sequence; Lucene closes those gaps when it merges, thus re-assigning
internal document Ids.
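To illustrate why internal Ids cannot be relied on, here is a small sketch that treats the internal Id as nothing more than a document's position in a list (a simplification for illustration, not Lucene code; `DocIdShiftSketch` and its methods are made-up names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model: a document's internal Id is its position in the list.
class DocIdShiftSketch {

    // Position of the document with the given code, or -1 if absent.
    static int internalIdOf(List<String> codes, String code) {
        return codes.indexOf(code);
    }

    // Models what a merge does with deleted documents: they are dropped
    // and the remaining documents move up to close the gaps.
    static List<String> compact(List<String> codes, boolean[] deleted) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < codes.size(); i++) {
            if (!deleted[i]) {
                kept.add(codes.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("A", "B", "C", "D");
        boolean[] deleted = {false, true, false, false}; // "B" was deleted

        System.out.println("before merge: C is at " + internalIdOf(codes, "C"));
        List<String> merged = compact(codes, deleted);
        System.out.println("after merge:  C is at " + internalIdOf(merged, "C"));
    }
}
```

The document with code "C" is the same document before and after, but its position changes once the deleted entry is squeezed out. Only the application's own key (here the code) stays stable.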

> For example, if I have the following indexes:
> Big index
> code_1, title_1, content_1, language_1, date_1
> code_2, title_2, content_2, language_2, date_2
> ....
> 
> Clustering index
> code_1, cluster_1
> code_2, cluster_2
> ....
> 
> then the new merged index will be:
> 
> Merged index
> code_1, title_1, content_1, language_1, date_1, cluster_1
> code_2, title_2, content_2, language_2, date_2, cluster_2
> ....
> 
> If I can do that then fine, but I think the merging process uses the 
> Lucene internal ID to match the documents. I want to use the code
> field to do that matching. Is that possible? I cannot be sure the
> Lucene internal IDs are the same for the same codes in both indexes.

Are you storing the internal Lucene Document Id in the 'code' field? 
If you are, I suggest you change your application to use its own set of
unique Ids to serve as 'primary keys' in your indices.
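With an application-level key such as 'code', combining the two indexes could look roughly like the sketch below. It again uses plain Java maps rather than Lucene classes, and `MergeByCodeSketch` and `applyClusters` are hypothetical names; in a real application the look-up, delete and re-add steps would go through whatever index API you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Model: the big index is a map from code (the application's primary key)
// to the document's fields; the cluster index is a map from code to cluster.
class MergeByCodeSketch {

    // For each cluster entry, look up the document with the same code in the
    // big index, delete it, and re-add it with the cluster field set.
    static void applyClusters(Map<String, Map<String, String>> bigByCode,
                              Map<String, String> clusterByCode) {
        for (Map.Entry<String, String> e : clusterByCode.entrySet()) {
            Map<String, String> old = bigByCode.remove(e.getKey()); // delete if present
            if (old == null) {
                continue; // this code is not in the big index: nothing to update
            }
            Map<String, String> updated = new LinkedHashMap<>(old);
            updated.put("cluster", e.getValue());
            bigByCode.put(e.getKey(), updated); // re-add with the new field
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> big = new LinkedHashMap<>();
        Map<String, String> d1 = new LinkedHashMap<>();
        d1.put("code", "1");
        d1.put("title", "java");
        big.put("1", d1);
        Map<String, String> d2 = new LinkedHashMap<>();
        d2.put("code", "2");
        d2.put("title", "lucene");
        big.put("2", d2);

        Map<String, String> clusters = new LinkedHashMap<>();
        clusters.put("1", "0");
        clusters.put("3", "7"); // no document with code 3 in the big index

        applyClusters(big, clusters);
        System.out.println(big);
    }
}
```

Only the documents named in the cluster index are touched, so the cost depends on how many cluster assignments changed rather than on the size of the big index.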

Otis



Re: Clustering question: searching two diferent indexes

Posted by Albert Vila <av...@imente.com>.

Thanks Otis, but can I merge two indexes with different fields?

My big index has these fields: code, title, content, language and date. I 
add the new documents incrementally.

The clustering index only contains the fields code and cluster. Will 
merging the big index with the clustering one preserve the order of the 
big one? For example, if I have the following indexes:
Big index
code_1, title_1, content_1, language_1, date_1
code_2, title_2, content_2, language_2, date_2
...

Clustering index
code_1, cluster_1
code_2, cluster_2
...

then the new merged index will be:

Merged index
code_1, title_1, content_1, language_1, date_1, cluster_1
code_2, title_2, content_2, language_2, date_2, cluster_2
...

If I can do that then fine, but I think the merging process uses the 
Lucene internal ID to match the documents. I want to use the code field to 
do that matching. Is that possible? I cannot be sure the Lucene 
internal IDs are the same for the same codes in both indexes.

Thanks again,

Albert

-- 
Albert Vila
Director de proyectos I+D
http://www.imente.com
902 933 242
[iMente “La información con más beneficios”]


Re: Clustering question: searching two diferent indexes

Posted by Otis Gospodnetic <ot...@yahoo.com>.

(re-directing to lucene-user list)

Albert,

If I understand your question correctly... You could run a query like
the one you gave on both indices, but if one of them contains documents
that have only one of those fields (cluster), then there will never be
any matches in the second index.
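The reason is that every clause of the query has to be satisfied by one and the same document. A minimal sketch of this, and of the post-processing by code that you describe, using plain Java collections in place of the two indexes (not Lucene API; `TwoIndexQuerySketch` and its methods are made-up names, and term matching is reduced to exact string equality):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TwoIndexQuerySketch {

    // Build a document (or a set of required terms) from name/value pairs.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // True only if this single document has every required field value.
    static boolean matchesAll(Map<String, String> d, Map<String, String> required) {
        for (Map.Entry<String, String> e : required.entrySet()) {
            if (!e.getValue().equals(d.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Codes of all documents in the index that match every required term.
    static Set<String> codesMatching(List<Map<String, String>> index,
                                     Map<String, String> required) {
        Set<String> codes = new TreeSet<>();
        for (Map<String, String> d : index) {
            if (matchesAll(d, required)) {
                codes.add(d.get("code"));
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        List<Map<String, String>> big = Arrays.asList(
            doc("code", "1", "language", "0", "title", "java"),
            doc("code", "2", "language", "0", "title", "java"));
        List<Map<String, String>> clusters = Arrays.asList(
            doc("code", "1", "cluster", "0"),
            doc("code", "2", "cluster", "1"));

        // Full query over both indexes: each document is tested on its own,
        // and no document has language, title and cluster together.
        List<Map<String, String>> both = new ArrayList<>(big);
        both.addAll(clusters);
        Map<String, String> full = doc("language", "0", "title", "java", "cluster", "0");
        System.out.println("searching both: " + codesMatching(both, full));

        // Post-processing: query each index for its own fields, then
        // intersect the two result sets by code.
        Set<String> hits = codesMatching(big, doc("language", "0", "title", "java"));
        hits.retainAll(codesMatching(clusters, doc("cluster", "0")));
        System.out.println("joined by code: " + hits);
    }
}
```

The first search finds nothing, while the intersection by code yields the documents that satisfy both halves of the query. The price is that the join happens in the application, outside the index.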

However, why not leave your big index alone, add documents to a new,
smaller index, and then merge them periodically?  I may be off with
this; it sounds like this is what you want to do, but I'm not certain I
understood you fully.

Otis

--- Albert Vila <av...@imente.com> wrote:
> Hi all,
> 
> I was wondering if I can search using the MultiSearcher over two 
> different indexes at the same time (with different fields).
> I've got one big index, with the code, title, content, language, etc. 
> fields (new documents are added incrementally). Now, I have to
> introduce a clustering field. The problem is that I have to update
> the whole index each time the clusters change, and I do not have
> enough time to do it (I want to check for new clusters every 10
> minutes and it takes 25 minutes to reindex the whole index).
> A query example could be: language:0 and title:java and cluster:0
> 
> Can I leave the big index without any changes and create a new index 
> with only the following fields, code and cluster, and perform the 
> searches using these two indexes? I think I cannot do that without 
> changing the code. It would need a postprocess, matching all
> codes returned from index 1 with index 2.
> 
> Does anyone have a solution for this problem? I would appreciate that.

