Posted to java-user@lucene.apache.org by Piotr Pęzik <pi...@gmail.com> on 2012/12/17 13:15:43 UTC
TermVectors and Attributes in Lucene 4.0
Hi,

I've been trying to enumerate all terms in all documents of a Lucene 4.0
index in order to retrieve their attributes (payloads, positions, etc.).

I have an index with documents containing stored, tokenized fields with
term vectors, offsets and payloads. Below is what I have tried so far
(I have to admit I don't fully understand this part of the 4.0 API yet).

My question is: can I use TermsEnum, DocsEnum or DocsAndPositionsEnum to
access each term in each document and get its attributes? They all have
an attributes() method, but so far I haven't managed to make it return
the actual attributes of individual terms (not even the
CharTermAttribute).

Thanks,
Piotr Pezik

//Checking field type:
Document doc = dReader.document(1);
System.out.println(doc.getField("myField").fieldType());
// => stored,indexed,tokenized,termVector,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
//Getting Terms and TermsEnum:
Terms terms = SlowCompositeReaderWrapper
    .wrap(dReader).terms("myField");
TermsEnum tenum = terms.iterator(TermsEnum.EMPTY);
//Moving to the next term (?)
BytesRef br = tenum.next();
System.out.println(tenum.attributes().hasAttributes());
//=>FALSE
System.out.println(tenum.attributes().getAttribute(PositionIncrementAttribute.class));
// => java.lang.IllegalArgumentException: This AttributeSource does not
//    have the attribute
//    'org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute'.
Bits liveDocs = SlowCompositeReaderWrapper.wrap(dReader).getLiveDocs();
DocsEnum denum = tenum.docs(liveDocs, null);
denum.nextDoc();
System.out.println(denum.attributes().hasAttributes());
//=>FALSE
DocsAndPositionsEnum denum2 = tenum.docsAndPositions(liveDocs, null);
denum2.nextDoc();
System.out.println(denum2.attributes().hasAttributes());
//=>FALSE