You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Sirish Vadala <si...@gmail.com> on 2010/06/01 22:53:37 UTC

Problem fetching number of occurrences

Hello All:

Can any one suggest me the best way to get the no. of occurrences of each
word per document in Lucene?

Eg: Let the indexed text be:

If you are posting a question, please try search first. Your question may
have already been answered.

Now if I search for the word 'question', then I would like to get this
document along with the number of occurrences of question in the document,
in the above case it would be 2.

Any hint would be appreciated.
Thanks.


-- 
View this message in context: http://lucene.472066.n3.nabble.com/Problem-fetching-number-of-occurrences-tp862859p862859.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Problem fetching number of occurrences

Posted by Rebecca Watson <be...@gmail.com>.
Hi,

i was looking at another post which had this presentation in - it has
a nice section
on termfreqvectors:

http://www.cnlp.org/presentations/slides/advancedluceneeu.pdf

bec :)

On 2 June 2010 13:56, Rebecca Watson <be...@gmail.com> wrote:
> hi
>
> when you are indexing, use termvectors
> org.apache.lucene.document.Field.TermVector
> set this in the Field object constructor when you create your Field objects
> at index time.
>
> i've never done it but i'm pretty sure these can be retrieved at
> search time using one of the
> IndexReader.getTermFreqVector methods.
>
> lucene in action has a really good section on using termfreqvectors:
> http://www.manning.com/hatcher3/
>
> if you want the positional info too e.g. the two positions of the
> "question" word in your
> example then have a look at the org.apache.lucene.search.spans.SpanTermQuery
> class -- in the getSpans method -- it grabs the terms + positions
> using the IndexReader
> as well: reader.termPositions(term)
>
> hope that helps,
>
> bec :)
>
>
> On 2 June 2010 04:53, Sirish Vadala <si...@gmail.com> wrote:
>>
>> Hello All:
>>
>> Can any one suggest me the best way to get the no. of occurrences of each
>> word per document in Lucene?
>>
>> Eg: Let the indexed text be:
>>
>> If you are posting a question, please try search first. Your question may
>> have already been answered.
>>
>> Now if I search for the word 'question', then I would like to get this
>> document along with the number of occurrences of question in the document,
>> in the above case it would be 2.
>>
>> Any hint would be appreciated.
>> Thanks.
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/Problem-fetching-number-of-occurrences-tp862859p862859.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Problem fetching number of occurrences

Posted by Rebecca Watson <be...@gmail.com>.
hi

when you are indexing, use termvectors
org.apache.lucene.document.Field.TermVector
set this in the Field object constructor when you create your Field objects
at index time.

i've never done it but i'm pretty sure these can be retrieved at
search time using one of the
IndexReader.getTermFreqVector methods.

lucene in action has a really good section on using termfreqvectors:
http://www.manning.com/hatcher3/

if you want the positional info too e.g. the two positions of the
"question" word in your
example then have a look at the org.apache.lucene.search.spans.SpanTermQuery
class -- in the getSpans method -- it grabs the terms + positions
using the IndexReader
as well: reader.termPositions(term)

hope that helps,

bec :)


On 2 June 2010 04:53, Sirish Vadala <si...@gmail.com> wrote:
>
> Hello All:
>
> Can any one suggest me the best way to get the no. of occurrences of each
> word per document in Lucene?
>
> Eg: Let the indexed text be:
>
> If you are posting a question, please try search first. Your question may
> have already been answered.
>
> Now if I search for the word 'question', then I would like to get this
> document along with the number of occurrences of question in the document,
> in the above case it would be 2.
>
> Any hint would be appreciated.
> Thanks.
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Problem-fetching-number-of-occurrences-tp862859p862859.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org