You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Ira Goldstein <ir...@usa.net> on 2005/12/13 23:55:52 UTC

Re: Impact of Term Vectors

We've run into an issue with the term vectors. When indexing a small corpus
(~3k docs, 1.3G) everything works fine, as it does with a small number of
documents from TREC-6 (so we believe that our indexing code is ok).  However,
when we tried to index the full TREC-6 corpus (~300,000 docs, 2G) the term
vectors all seem to come back as null.  When the indexing process is going on,
we can see the .frq file being built so it appears as if the index routine is
doing its thing.

Has anyone experienced anything similar?

Thanks in advance for any insight you can offer
--Ira Goldstein
  

------ Original Message ------
Received: Tue, 13 Dec 2005 12:41:07 PM EST
From: "Dan Climan" <dc...@keepmedia.com>
To: <ja...@lucene.apache.org>
Subject: Impact of Term Vectors (was ApacheCon next week)

> Good question. I was wondering about the impact of adding term vectors with
> the various options. For example, is adding term vectors with both
positions
> and offsets a significant impact? Which current parts of lucene (including
> contributions) take advantage of term vectors being present? I know that
> Highlighter class can make use of them if present.
> 
> Dan
> 
> -----Original Message-----
> From: Jeff Rodenburg [mailto:jeff.rodenburg@gmail.com] 
> Sent: Monday, December 12, 2005 9:08 PM
> To: java-user@lucene.apache.org
> Subject: Re: ApacheCon next week
> 
> Well done, Grant.  Very informative.
> 
> Question on Term Vectors: with their inclusion in an index, have you
noticed
> any degradation in performance, either from a search effiiciency or
> maintenance point-of-view?  Given the power of term vectors, if the perf
> impact is negligible, I'm curious to the reasons why one would NOT include
> term vectors in any and every index...
> 
> cheers,
> j
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Impact of Term Vectors

Posted by Grant Ingersoll <gs...@syr.edu>.
I have successfully used term vectors for both the TREC English and 
Arabic collections.  Take a look at the code I posted at 
http://www.cnlp.org/apachecon2005 or Erik and Otis' book, "Lucene In 
Action", which both have good examples of term vectors.  Perhaps there 
is something with how you are creating your fields.



Ira Goldstein wrote:

>We've run into an issue with the term vectors. When indexing a small corpus
>(~3k docs, 1.3G) everything works fine, as it does with a small number of
>documents from TREC-6 (so we believe that our indexing code is ok).  However,
>when we tried to index the full TREC-6 corpus (~300,000 docs, 2G) the term
>vectors all seem to come back as null.  When the indexing process is going on,
>we can see the .frq file being built so it appears as if the index routine is
>doing its thing.
>
>Has anyone experienced anything similar?
>
>Thanks in advance for any insight you can offer
>--Ira Goldstein
>  
>
>------ Original Message ------
>Received: Tue, 13 Dec 2005 12:41:07 PM EST
>From: "Dan Climan" <dc...@keepmedia.com>
>To: <ja...@lucene.apache.org>
>Subject: Impact of Term Vectors (was ApacheCon next week)
>
>  
>
>>Good question. I was wondering about the impact of adding term vectors with
>>the various options. For example, is adding term vectors with both
>>    
>>
>positions
>  
>
>>and offsets a significant impact? Which current parts of lucene (including
>>contributions) take advantage of term vectors being present? I know that
>>Highlighter class can make use of them if present.
>>
>>Dan
>>
>>-----Original Message-----
>>From: Jeff Rodenburg [mailto:jeff.rodenburg@gmail.com] 
>>Sent: Monday, December 12, 2005 9:08 PM
>>To: java-user@lucene.apache.org
>>Subject: Re: ApacheCon next week
>>
>>Well done, Grant.  Very informative.
>>
>>Question on Term Vectors: with their inclusion in an index, have you
>>    
>>
>noticed
>  
>
>>any degradation in performance, either from a search effiiciency or
>>maintenance point-of-view?  Given the power of term vectors, if the perf
>>impact is negligible, I'm curious to the reasons why one would NOT include
>>term vectors in any and every index...
>>
>>cheers,
>>j
>>
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>    
>>
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>  
>

-- 
------------------------------------------------------------------- 
Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
337 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org