You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Jim Hargrave <ha...@ldschurch.org> on 2003/07/02 07:22:07 UTC

Re: Geting exact term positions for each document inside acollect method...

Our application indexes and retreieves sentences from a large database. Our terms are overlapping characters (n-grams). In order to calculate our custom score we need to know the (relative) position of each n-gram in the matched sentences. I'm currently using a boolen query (each n-ngram in a big 'OR' statement). I will investigate customizing the query as you suggest. 

Basically we are using Lucene as a Translation Memeory tool! Pretty cool. Lucene is wonderful and I think we can use it in many of our linguistic projects (Terminlogy, concordance, TM etc.).

Jim

>>> cutting@lucene.com 06/30/03 10:56 AM >>>
Jim Hargrave wrote:
> I've defined my own collector (I want the raw score before it is normalized between 1.0 and 0.0). For each document I need to know the the matching term positions in the document.  I've seen the methods in IndexReader, but how can I access them inside my collect method? Are there other methods I am missing? 

No, this information is not available to the hit collector.

Why do you need this?  If it is only for summaries, then you're probably 
better off re-tokenizing those few documents that you wish to summarize. 
  If it is for query evaluation, then you're probably better off writing 
a new class of query (which is non-trivial).

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




------------------------------------------------------------------------------
This message may contain confidential information, and is intended only for the use of the individual(s) to whom it is addressed.


==============================================================================


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org