Posted to dev@lucene.apache.org by starz10de <fa...@yahoo.com> on 2012/03/29 17:33:31 UTC

conditional High Freq Terms in Lucene index

Hi,

I am using the HighFreqTerms class to compute the most frequent terms in a
Lucene index, and it works well. However, I would like to compute them under
a condition: not over all documents in the index, but only over documents of
type "A". Besides the "contents" field, the index also has a "DocType"
(document type) field.
So I need the most frequent terms only for documents where DocType="A".

Any idea how to do this?

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3868066.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: conditional High Freq Terms in Lucene index

Posted by Michael McCandless <lu...@mikemccandless.com>.

One big problem is your collector (that gathers all "A" doc IDs) is
not mapping the per-segment docID to the top-level global docID space.

You need to save the docBase that was passed to setNextReader, and
then add it back in on each collect call.
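
A minimal sketch of that mapping, in plain Java rather than against the
Lucene API: the class GlobalDocIdCollector and its method names are
hypothetical stand-ins for a Collector's setNextReader and collect, and the
segment sizes in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a per-segment collector; this is NOT the Lucene API.
final class GlobalDocIdCollector {
    private int docBase; // offset of the current segment in the global ID space
    private final List<Integer> globalIds = new ArrayList<>();

    // Stands in for setNextReader(reader, docBase): remember the offset.
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    // Stands in for collect(doc): doc is relative to the current segment.
    void collect(int segmentDoc) {
        globalIds.add(docBase + segmentDoc);
    }

    List<Integer> globalIds() {
        return globalIds;
    }

    public static void main(String[] args) {
        GlobalDocIdCollector c = new GlobalDocIdCollector();
        c.setNextSegment(0);   // assumed: first segment starts at global doc 0
        c.collect(5);
        c.setNextSegment(100); // assumed: second segment starts at global doc 100
        c.collect(5);
        System.out.println(c.globalIds()); // prints [5, 105]
    }
}
```

Without the offset, both hits above would be recorded as doc 5, so the second
one would be confused with a document in the first segment.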

Mike McCandless

http://blog.mikemccandless.com

On Fri, Mar 30, 2012 at 7:23 PM, starz10de <fa...@yahoo.com> wrote:
> Thanks for your hint.
>
> I tried a simple solution, as follows.
> First, I find the documents of type "A" by searching the document type
> field in the index, and store their IDs in a list:
> public static void doStreamingSearch(final Searcher searcher, Query query)
>         throws IOException {
>
>     Collector streamingHitCollector = new Collector() {
>         // record the ID of every matching document
>         @Override
>         public void collect(int doc) throws IOException {
>             c++;
>             doc_id.add(doc + "");
>         }
>
>         @Override
>         public boolean acceptsDocsOutOfOrder() {
>             return true;
>         }
>
>         @Override
>         public void setNextReader(IndexReader arg0, int arg1)
>                 throws IOException {
>             // TODO Auto-generated method stub
>         }
>
>         @Override
>         public void setScorer(Scorer arg0) throws IOException {
>             // TODO Auto-generated method stub
>         }
>     };
>
>     searcher.search(query, streamingHitCollector);
> }
> Then I modified HighFreqTerms in Lucene as follows:
> while (terms.next()) {
>     dok.seek(terms);
>     while (dok.next()) {
>         for (int i = 0; i < doc_id.size(); ++i) {
>             if (doc_id.get(i).equals(dok.doc() + "")) {
>                 if (terms.term().field().equals(field)) {
>                     tiq.insertWithOverflow(
>                             new TermInfo(terms.term(), dok.freq()));
>                 }
>             }
> I verified that only documents of type "A" are matched. However, the
> result is not correct: a few terms appear twice in the ranked list of
> most frequent terms.
>
> Any hints as to where the problem is?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3872298.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

Re: conditional High Freq Terms in Lucene index

Posted by starz10de <fa...@yahoo.com>.

Thanks for your hint.

I tried a simple solution, as follows.
First, I find the documents of type "A" by searching the document type
field in the index, and store their IDs in a list:
public static void doStreamingSearch(final Searcher searcher, Query query)
        throws IOException {

    Collector streamingHitCollector = new Collector() {
        // record the ID of every matching document
        @Override
        public void collect(int doc) throws IOException {
            c++;
            doc_id.add(doc + "");
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;
        }

        @Override
        public void setNextReader(IndexReader arg0, int arg1)
                throws IOException {
            // TODO Auto-generated method stub
        }

        @Override
        public void setScorer(Scorer arg0) throws IOException {
            // TODO Auto-generated method stub
        }
    };

    searcher.search(query, streamingHitCollector);
}
Then I modified HighFreqTerms in Lucene as follows:
while (terms.next()) {
    dok.seek(terms);
    while (dok.next()) {
        for (int i = 0; i < doc_id.size(); ++i) {
            if (doc_id.get(i).equals(dok.doc() + "")) {
                if (terms.term().field().equals(field)) {
                    tiq.insertWithOverflow(
                            new TermInfo(terms.term(), dok.freq()));
                }
            }
I verified that only documents of type "A" are matched. However, the
result is not correct: a few terms appear twice in the ranked list of
most frequent terms.

Any hints as to where the problem is?


--
View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3872298.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

Re: conditional High Freq Terms in Lucene index

Posted by starz10de <fa...@yahoo.com>.

I did as you suggested, but the problem is still the same. I think the
problem is in the HighFreqTerms part, where I see duplicate words in the
resulting list of most frequent terms. The comparison itself is OK, because
only terms belonging to documents of type "A" are added to the
TermInfoQueue. However, the frequency is not counted correctly for each
term, and some words appear more than once in the list. Is something wrong
with TermDocs dok and dok.freq()?

--
View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873567.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

Re: conditional High Freq Terms in Lucene index

Posted by Michael McCandless <lu...@mikemccandless.com>.

Hmm, you are adding two strings.  You should first add the two ints
(docBase + doc), then convert that to a string.
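
The difference can be seen in a small plain-Java sketch. The class DocIdKey
and its two method names are hypothetical, chosen only for this
illustration; the first method reproduces the concatenation from the quoted
collector, the second adds the ints before converting.

```java
// Hypothetical helper contrasting the two ways of building the doc ID key.
final class DocIdKey {
    // What the quoted collector does: concatenates the two decimal strings.
    static String concatenated(int doc, int docBase) {
        String k = doc + "";
        String k1 = docBase + "";
        return k + k1;
    }

    // The suggested fix: add the ints first, then convert once.
    static String summed(int doc, int docBase) {
        return String.valueOf(docBase + doc);
    }

    public static void main(String[] args) {
        System.out.println(concatenated(5, 100)); // prints 5100
        System.out.println(summed(5, 100));       // prints 105
    }
}
```

The concatenated key is also ambiguous: doc 1 with docBase 23 and doc 12
with docBase 3 both produce "123". Only the summed form can match a key
built from a single global doc ID.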

Mike McCandless

http://blog.mikemccandless.com

On Sat, Mar 31, 2012 at 8:56 AM, starz10de <fa...@yahoo.com> wrote:
> I revised it to include your comment:
>
> private Scorer scorer;
> private int docBase;
>
> // record the ID of every matching document
> @Override
> public void collect(int doc) throws IOException {
>     String k = doc + "";
>     String k1 = docBase + "";
>     doc_ids.add(k + k1);
> }
>
> @Override
> public boolean acceptsDocsOutOfOrder() {
>     return true;
> }
>
> @Override
> public void setNextReader(IndexReader reader, int docBase)
>         throws IOException {
>     this.docBase = docBase;
> }
>
> @Override
> public void setScorer(Scorer scorer) throws IOException {
>     this.scorer = scorer;
> }
>
> I can see in HighFreqTerms that the check for document type "A" is
> applied. However, the most frequent terms are not computed correctly: I
> still see duplicate terms in the list, as well as wrong occurrence counts.
>
> Here is how I do it:
>
> TermInfoQueue tiq = new TermInfoQueue(numTerms);
> TermEnum terms = reader.terms();
> TermDocs dok = null;
> int k = 0;
> dok = reader.termDocs();
> if (field != null) {
>     while (terms.next()) {
>         k = 0;
>         dok.seek(terms);
>         while (dok.next()) {
>             for (int i = 0; i < doc_ids.size(); ++i) {
>                 if (categorization_based_on_year.doc_ids.get(i)
>                         .equals(dok.doc() + "")) {
>                     // here I can see that only doc IDs of type "A" are printed
>                     System.out.println(dok.doc());
>                     if (terms.term().field().equals(field)) {
>                         tiq.insertWithOverflow(
>                                 new TermInfo(terms.term(), dok.freq()));
>                     }
>                     i = 10000; // intended to end the inner loop
>                 }
>             }
> .
> .
> .
>
> Any hints?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873362.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

Re: conditional High Freq Terms in Lucene index

Posted by starz10de <fa...@yahoo.com>.

I revised it to include your comment:

private Scorer scorer;
private int docBase;

// record the ID of every matching document
@Override
public void collect(int doc) throws IOException {
    String k = doc + "";
    String k1 = docBase + "";
    doc_ids.add(k + k1);
}

@Override
public boolean acceptsDocsOutOfOrder() {
    return true;
}

@Override
public void setNextReader(IndexReader reader, int docBase)
        throws IOException {
    this.docBase = docBase;
}

@Override
public void setScorer(Scorer scorer) throws IOException {
    this.scorer = scorer;
}

I can see in HighFreqTerms that the check for document type "A" is
applied. However, the most frequent terms are not computed correctly: I
still see duplicate terms in the list, as well as wrong occurrence counts.

Here is how I do it:

TermInfoQueue tiq = new TermInfoQueue(numTerms);
TermEnum terms = reader.terms();
TermDocs dok = null;
int k = 0;
dok = reader.termDocs();
if (field != null) {
    while (terms.next()) {
        k = 0;
        dok.seek(terms);
        while (dok.next()) {
            for (int i = 0; i < doc_ids.size(); ++i) {
                if (categorization_based_on_year.doc_ids.get(i)
                        .equals(dok.doc() + "")) {
                    // here I can see that only doc IDs of type "A" are printed
                    System.out.println(dok.doc());
                    if (terms.term().field().equals(field)) {
                        tiq.insertWithOverflow(
                                new TermInfo(terms.term(), dok.freq()));
                    }
                    i = 10000; // intended to end the inner loop
                }
            }
.
.
.

Any hints?

--
View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873362.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

Re: conditional High Freq Terms in Lucene index

Posted by Michael McCandless <lu...@mikemccandless.com>.

You'd have to modify the HighFreqTerms sources...

Roughly...

First, make a bitset recording which docs are type A (e.g., use
FieldCache). Second, change HighFreqTerms so that for each term, it
walks the postings, counting how many type A docs there were. Then
just use the rest of HighFreqTerms (priority queue, etc.).
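
The outline above can be sketched in plain Java. This is only a model under
stated assumptions, not the HighFreqTerms source or the Lucene API: postings
are represented as a Map from term to an array of global doc IDs, and the
class name FilteredHighFreqTerms and the method topTerms are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java model of the approach; the postings Map is NOT Lucene's API.
final class FilteredHighFreqTerms {

    // For each term, count the postings that fall inside typeA, then keep
    // the topN terms by that count. Each term is queued at most once.
    static List<Map.Entry<String, Integer>> topTerms(
            Map<String, int[]> postings, BitSet typeA, int topN) {
        Comparator<Map.Entry<String, Integer>> byCount =
                (a, b) -> Integer.compare(a.getValue(), b.getValue());
        // Min-heap on count: the weakest of the current top N is evicted first.
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>(byCount);
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            int count = 0;
            for (int docId : e.getValue()) {
                if (typeA.get(docId)) { // constant-time membership test
                    count++;
                }
            }
            if (count == 0) {
                continue; // term never occurs in a type A doc
            }
            queue.add(new AbstractMap.SimpleEntry<>(e.getKey(), count));
            if (queue.size() > topN) {
                queue.poll();
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(queue);
        result.sort(byCount.reversed()); // highest count first
        return result;
    }

    public static void main(String[] args) {
        BitSet typeA = new BitSet();
        typeA.set(0);
        typeA.set(2);
        typeA.set(3);

        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("lucene", new int[] {0, 1, 2, 3});
        postings.put("index", new int[] {1, 2});
        postings.put("query", new int[] {1});

        System.out.println(topTerms(postings, typeA, 2)); // prints [lucene=3, index=1]
    }
}
```

The point of the structure is that the count is accumulated over all of a
term's postings first, and the term is offered to the queue once afterwards.
Inserting inside the postings loop would add the same term once per matching
document.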

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 29, 2012 at 11:33 AM, starz10de <fa...@yahoo.com> wrote:
> Hi,
>
> I am using the HighFreqTerms class to compute the most frequent terms in a
> Lucene index, and it works well. However, I would like to compute them under
> a condition: not over all documents in the index, but only over documents of
> type "A". Besides the "contents" field, the index also has a "DocType"
> (document type) field.
> So I need the most frequent terms only for documents where DocType="A".
>
> Any idea how to do this?
>
> Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3868066.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.