You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Lev Bronshtein <le...@hotmail.com> on 2010/09/07 04:36:45 UTC

hit position

Now that I can index my data, I want to be able to search it and report some sort of position information with every hit, such as a line number or a byte ofset within the stream.  Any idea how I can acoomplish this?
 		 	   		  
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: hit position

Posted by Erick Erickson <er...@gmail.com>.
Line number is a completely unknown concept to Lucene, you have to somehow
figure it out. I've seen at least two ways to make that work:
1> use payloads. A payload is just a bit of data you attach to each token,
what you
     put in there is up to you, so you can encode this kind of information
however you
     want. See:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

2> You can do a similar sort of thing by recording the relevant information
when
     you analyze a document and the include that data in a very special
(probably
     stored-only field) in your document. Say the offsets of each beginning
of line.
     This field would never be searched, just used to find out what line a
hit
     was on. Then, when you can find the lines numbers once you know the
term
     positions. Stealing from Grant:
See
http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours(although
I think you can do better than the code in the third reply by using a
TermVectorMapper such that you can process the TermVector as it comes from
disk.)

Essentially, you need to use a combination of SpanQuery, TermVector and
TermVectorMapper.

HTH
Erick


On Mon, Sep 6, 2010 at 10:36 PM, Lev Bronshtein
<le...@hotmail.com>wrote:

>
> Now that I can index my data, I want to be able to search it and report
> some sort of position information with every hit, such as a line number or a
> byte ofset within the stream.  Any idea how I can acoomplish this?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>