Posted to java-user@lucene.apache.org by Ziqi Zhang <zi...@sheffield.ac.uk> on 2015/09/21 17:08:32 UTC
offsets of a term in a document
Hi
Given a document in a Lucene index, I would like to get a list of the terms
in that document and their offsets. I suppose IndexReader.getTermVector is
the place to start. I have some code below (Lucene 5.3) and a couple of
questions about it:
----------------------------------------
IndexReader reader = ....
Terms termVector = reader.getTermVector(docId, "content");
// now iterate through the terms
TermsEnum ti = termVector.iterator();
BytesRef luceneTerm = ti.next();
while (luceneTerm != null) {
    String tString = luceneTerm.utf8ToString();
    // each term can have >1 occurrence, so I need to get each occurrence:
    PostingsEnum postingsEnum = ti.postings(???, PostingsEnum.OFFSETS);
    int totalOccurrence = postingsEnum.freq();
    // api says calling "nextPosition" more than "freq()" times is undefined, so...
    for (int i = 0; i < totalOccurrence; i++) {
        postingsEnum.nextPosition(); // move cursor to next position/occurrence
        int start = postingsEnum.startOffset(); // get the start offset
        int end = postingsEnum.endOffset(); // get the end offset
    }
    luceneTerm = ti.next();
}
------------------------------------------
The first question is whether the code makes sense.
The second question is what I should put in place of "???". The API
says "pass a prior PostingsEnum for possible reuse", but I don't see how
to create an instance of it.
Many thanks!
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: offsets of a term in a document
Posted by Alan Woodward <al...@flax.co.uk>.
>
> The second question is what I should put in place of "???". The API says "pass a prior PostingsEnum for possible reuse", but I don't see how to create an instance of it.
You can just pass null.
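For illustration, here is a minimal sketch of the loop with null passed on the first call and the returned enum handed back in on later calls, so the implementation may reuse it. The class and method names (TermOffsets, collect) are made up for this example, and it assumes Lucene 5.3 and a field indexed with term vectors and offsets. It also calls nextDoc() before freq(), on the understanding that a freshly obtained PostingsEnum is not yet positioned on a document.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper, not part of Lucene.
public class TermOffsets {

    // Maps each term of one document field to its {startOffset, endOffset} pairs.
    public static Map<String, List<int[]>> collect(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, List<int[]>> result = new LinkedHashMap<>();

        // getTermVector returns null if no term vector was stored for this field.
        Terms termVector = reader.getTermVector(docId, field);
        if (termVector == null) {
            return result;
        }

        TermsEnum termsEnum = termVector.iterator();
        PostingsEnum postings = null; // null on the first call: nothing to reuse yet
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // Pass the previous enum back in; the codec may or may not reuse it.
            postings = termsEnum.postings(postings, PostingsEnum.OFFSETS);

            // Position the enum on the (single) document of the term vector.
            if (postings.nextDoc() == DocIdSetIterator.NO_MORE_DOCS) {
                continue;
            }

            List<int[]> offsets = new ArrayList<>();
            int freq = postings.freq();
            for (int i = 0; i < freq; i++) {
                postings.nextPosition();
                // Both offsets are -1 if offsets were not stored in the term vector.
                offsets.add(new int[] {postings.startOffset(), postings.endOffset()});
            }
            result.put(term.utf8ToString(), offsets);
        }
        return result;
    }
}
```

Calling TermOffsets.collect(reader, docId, "content") returns an empty map when the field has no term vector, which avoids a NullPointerException on termVector.iterator().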
Alan Woodward
www.flax.co.uk