
Posted to java-user@lucene.apache.org by Vi...@McAfee.com on 2012/12/06 11:28:40 UTC

Lucene 4.0.0 - find term position.

Hi all,
I am new to Lucene.
I am trying to understand how to find the position of a term.

I use the following code to index documents:
...
IndexWriter writer = new IndexWriter(mIndexDir, mIwc);
FileInputStream fis = new FileInputStream(file);
Document doc = new Document();
Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
doc.add(pathField);
doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-16LE"))));
writer.updateDocument(new Term("path", file.getPath()), doc);
fis.close();
writer.close();
...

To search, I use the following code:
...
IndexReader reader = DirectoryReader.open(mIndexDir);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(Version.LUCENE_40, "contents", mAnalyzer);
Query query = parser.parse(aQuery);

TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
}
...
How can I find the positions of my query string in the indexed documents?

Thanks, Vitaly

Re: Lucene 4.0.0 - find term position.

Posted by Adrien Grand <jp...@gmail.com>.

Hi Vitaly,

On Fri, Dec 7, 2012 at 3:24 PM, <Vi...@mcafee.com> wrote:

> I tried to use either Terms tfvector = reader.getTermVector(docId, "contents");
> or Fields fields = reader.getTermVectors(docId);
> but both calls return null.
> What is wrong?

These methods will always return null unless you turn term vectors on at
indexing time (see FieldType.setStoreTermVectors[1]
and FieldType.setStoreTermVectorPositions[2]).

 [1]
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/document/FieldType.html#setStoreTermVectors(boolean)
 [2]
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/document/FieldType.html#setStoreTermVectorPositions(boolean)
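A minimal sketch of that fix, written against the Lucene 4.0 API and assuming lucene-core and lucene-analyzers-common 4.0 are on the classpath. The "contents" field gets a FieldType copied from TextField.TYPE_NOT_STORED with term vectors and term-vector positions switched on. After that, getTermVector(docId, "contents") is no longer null and the positions of a term can be read from it. The in-memory RAMDirectory, the StandardAnalyzer, the StringReader standing in for the UTF-16LE file reader, and the names TermVectorPositions / positionsOf are illustrative assumptions, not part of the original code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class TermVectorPositions {

    // Indexed and tokenized like TextField (not stored), plus a per-document
    // term vector that records each term's positions.
    static final FieldType CONTENTS_TYPE = new FieldType(TextField.TYPE_NOT_STORED);
    static {
        CONTENTS_TYPE.setStoreTermVectors(true);
        CONTENTS_TYPE.setStoreTermVectorPositions(true);
        CONTENTS_TYPE.freeze();
    }

    /** Hypothetical helper: indexes `text` as doc 0, then reads `term`'s positions from its term vector. */
    static List<Integer> positionsOf(String text, String term) throws IOException {
        Directory dir = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_40, analyzer));
        Document doc = new Document();
        // A Reader value is accepted here, as in the original TextField call.
        doc.add(new Field("contents", new StringReader(text), CONTENTS_TYPE));
        writer.addDocument(doc);
        writer.close();

        List<Integer> positions = new ArrayList<Integer>();
        IndexReader reader = DirectoryReader.open(dir);
        try {
            // Non-null now, because the field type stores term vectors.
            Terms vector = reader.getTermVector(0, "contents");
            TermsEnum termsEnum = vector.iterator(null);
            // `term` must be in analyzed form (StandardAnalyzer lowercases).
            if (termsEnum.seekExact(new BytesRef(term), true)) {
                DocsAndPositionsEnum dpe = termsEnum.docsAndPositions(null, null);
                // A term vector covers a single document, so one nextDoc() suffices.
                if (dpe != null && dpe.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    int freq = dpe.freq();
                    for (int i = 0; i < freq; i++) {
                        positions.add(dpe.nextPosition()); // 0-based token position
                    }
                }
            }
        } finally {
            reader.close();
        }
        return positions;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(positionsOf("quick fox jumps high fox", "fox"));
    }
}
```

Positions count analyzed tokens from 0, not characters; calling setStoreTermVectorOffsets(true) on the same FieldType would also record character offsets. Documents indexed before the field type was changed carry no term vectors and would need to be reindexed.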

-- 
Adrien

RE: Lucene 4.0.0 - find term position.

Posted by Vi...@McAfee.com.

I tried to use either Terms tfvector = reader.getTermVector(docId, "contents"); or Fields fields = reader.getTermVectors(docId);
but both calls return null.
What is wrong?

-----Original Message-----
From: lukai [mailto:lukai1984@gmail.com] 
Sent: Friday, December 07, 2012 2:50 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.0.0 - find term position.

Terms terms = fields.terms(...);
TermsEnum termsEnum = terms.iterator(null);
termsEnum.seekExact(...);
DocsAndPositionsEnum docsPosEnum = termsEnum.docsAndPositions(...);

You can get the information in "docsPosEnum".




Re: Lucene 4.0.0 - find term position.

Posted by lukai <lu...@gmail.com>.

Terms terms = fields.terms(...);
TermsEnum termsEnum = terms.iterator(null);
termsEnum.seekExact(...);
DocsAndPositionsEnum docsPosEnum = termsEnum.docsAndPositions(...);

You can get the information in "docsPosEnum".
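The steps above can be fleshed out roughly as follows. This is a sketch against the Lucene 4.0 API (where seekExact still takes a useCache flag). It reads positions from the postings in the inverted index, which a plain TextField records by default, so it does not need term vectors. MultiFields.getFields gives a merged view over all segments, so the map keys should be top-level docIds, the same numbering as hits[i].doc from an IndexSearcher over the same DirectoryReader. The class PostingsPositions, its demo index, and the sample texts are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class PostingsPositions {

    /** Maps each docId containing `term` in `field` to that term's positions, read from the postings. */
    static Map<Integer, List<Integer>> positions(IndexReader reader, String field, String term)
            throws IOException {
        Map<Integer, List<Integer>> result = new LinkedHashMap<Integer, List<Integer>>();
        Fields fields = MultiFields.getFields(reader);    // merged view over all segments
        Terms terms = (fields == null) ? null : fields.terms(field);
        if (terms == null) {
            return result;                                 // no such field (or empty index)
        }
        TermsEnum termsEnum = terms.iterator(null);
        if (!termsEnum.seekExact(new BytesRef(term), true)) {
            return result;                                 // term does not occur in the field
        }
        DocsAndPositionsEnum docsPosEnum =
                termsEnum.docsAndPositions(MultiFields.getLiveDocs(reader), null);
        if (docsPosEnum == null) {
            return result;                                 // positions were not indexed
        }
        int doc;
        while ((doc = docsPosEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            List<Integer> list = new ArrayList<Integer>();
            int freq = docsPosEnum.freq();
            for (int i = 0; i < freq; i++) {
                list.add(docsPosEnum.nextPosition());      // 0-based token position
            }
            result.put(doc, list);
        }
        return result;
    }

    /** Hypothetical demo: three small documents indexed with a plain TextField (no term vectors). */
    static Map<Integer, List<Integer>> demo() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
        String[] texts = { "quick fox jumps high fox", "lazy dog", "fox" };
        for (String text : texts) {
            Document doc = new Document();
            doc.add(new TextField("contents", new StringReader(text)));
            writer.addDocument(doc);
        }
        writer.close();
        IndexReader reader = DirectoryReader.open(dir);
        try {
            return positions(reader, "contents", "fox");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The term has to be passed in its analyzed form (for StandardAnalyzer, lowercased), and a multi-term query string would have to be split into its terms and looked up one term at a time.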

