Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@syr.edu> on 2004/02/05 23:27:42 UTC

Dmitry's Term Vector stuff, plus some

Hi All,

I am putting the finishing touches on an implementation of Dmitry's Term Vector code built and running against the HEAD, plus test cases for all files involved.  What is the best way to submit this?  I can do the diff, but how should I submit the new files?

I can also provide notes on my implementation, as it varies slightly from Dmitry's due to changes in 1.3.

I also tested by indexing 12,598 documents (88,362 terms), both with and without term vectors.
Index size w/o term vectors: 42 MB
Index size w/ term vectors: 71.3 MB

Time for the first test was 5 minutes 30 seconds, time for the second test was 6 minutes 2 seconds.
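
For a rough sense of the overhead, the figures above can be turned into percentages. This is only a sketch of the arithmetic; the helper name overhead_pct is made up for illustration and is not part of Lucene or the patch.

```shell
#!/bin/sh
# Relative increase of the second value over the first, as a percentage
# rounded to one decimal place. overhead_pct is a hypothetical helper.
overhead_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# Index size in MB: without term vectors, then with term vectors.
overhead_pct 42 71.3

# Indexing time in seconds: first test (5 min 30 s = 330 s),
# then second test (6 min 2 s = 362 s).
overhead_pct 330 362
```

On these numbers the index grows by roughly 70% with term vectors, and the second test takes roughly 10% longer than the first.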

Let me know, and I will upload it tomorrow or Monday.

Thanks,
Grant


----------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University

http://www.cnlp.org



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Dmitry's Term Vector stuff, plus some

Posted by Doug Cutting <cu...@apache.org>.
The best way to generate a patch is to change to the root directory of
your Lucene CVS checkout and run:

   cvs diff -Nu > my.patch

This will include all newly added files.

Hmm.  Perhaps that requires commit access, since new files only show up
in the diff once they have been registered with cvs add.  If it does,
then simply tar up the new files separately from the patch file.  Attach
everything to a bug report.
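
The fallback above could be scripted roughly as follows. This is a sketch under assumptions, not an official procedure: it relies on "cvs -nq update" listing files unknown to CVS with a leading "? ", and the function name bundle_new_files and the archive name are invented for illustration.

```shell
#!/bin/sh
# Read "cvs -nq update"-style status lines on stdin and tar up the files
# marked "? " (unknown to CVS, i.e. new files that were never added).
# bundle_new_files is a hypothetical helper, not part of CVS or Lucene.
bundle_new_files() {
    archive=$1
    list=$(mktemp) || return 1
    # Keep only the "? path" lines and strip the two-character marker.
    sed -n 's/^? //p' > "$list"
    # -T reads the member names from the list file (GNU tar and bsdtar).
    tar -cf "$archive" -T "$list"
    status=$?
    rm -f "$list"
    return $status
}

# Intended usage from the root of the checkout (not run here):
#   cvs diff -u > my.patch
#   cvs -nq update | bundle_new_files new-files.tar
```

The patch file then covers changes to existing files, and the tarball carries the new ones. It is worth checking the list before attaching it, since build output and editor backups are also unknown to CVS and would be picked up too.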

This will be great to have!

Doug


