Posted to java-user@lucene.apache.org by 장용석 <ne...@gmail.com> on 2013/01/04 08:39:11 UTC

Terms.getSumTotalTermFreq() in Lucene 4.0

Hello.
I have some questions.

Document 1 : "learning perl learning java learning ruby"
Document 2 : "perl test"

I have indexed these documents with StoreTermVectors(true) and
IndexOptions.DOCS_AND_FREQS.
The field name is "f".

And I executed this code.

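// dir is the Directory that holds the index
IndexReader ir = IndexReader.open(dir);
// term vector of field "f" for the document with docID 0 (Document 1)
Terms terms = ir.getTermVector(0, "f");

System.out.println(terms.getDocCount());         // -> 1
System.out.println(terms.getSumDocFreq());       // -> 4
System.out.println(terms.getSumTotalTermFreq()); // -> -1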

I think this terms instance acts like a single-document inverted index.

So getDocCount is 1 (a single document), and getSumDocFreq is 4 (because
there are four distinct terms and each term's docFreq is 1).
Is this right?

But I can't understand why the getSumTotalTermFreq method returns -1.
According to the javadoc, getSumTotalTermFreq is the sum of
TermsEnum.totalTermFreq.

I think that in Document 1 the totalTermFreq values are [learning, 3], [java,
1], [perl, 1], [ruby, 1].
So the result of getSumTotalTermFreq should be 6, not -1.
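
As a check on the arithmetic above, here is a minimal plain-Java sketch that
recomputes the expected statistics for Document 1 by hand. It does not use
Lucene at all: the class and method names are made up for illustration, and
it assumes simple whitespace tokenization with no other analysis.

```java
import java.util.Map;
import java.util.TreeMap;

final class TermVectorStatsSketch {

    // Counts how often each whitespace-separated term occurs in one document.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // In a single document every distinct term has docFreq 1,
    // so the sum of docFreq equals the number of distinct terms.
    static long sumDocFreq(Map<String, Integer> freqs) {
        return freqs.size();
    }

    // Sum of the per-term frequencies, i.e. the number of tokens in the document.
    static long sumTotalTermFreq(Map<String, Integer> freqs) {
        long sum = 0;
        for (int f : freqs.values()) {
            sum += f;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
            termFrequencies("learning perl learning java learning ruby");
        System.out.println(freqs);                   // {java=1, learning=3, perl=1, ruby=1}
        System.out.println(sumDocFreq(freqs));       // 4
        System.out.println(sumTotalTermFreq(freqs)); // 6
    }
}
```

This only confirms the expected values (4 and 6); it says nothing about what
a particular term vectors format actually stores.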

Why does terms.getSumTotalTermFreq() return -1?


Thanks in advance.
-- 
DEV용식
http://devyongsik.tistory.com

Re: Terms.getSumTotalTermFreq() in Lucene 4.0

Posted by 장용석 <ne...@gmail.com>.
Hi,
Ah.. I understand.
If I need this, I'll open an issue.

Thank you very much.

2013/1/5 Michael McCandless <lu...@mikemccandless.com>

> Hi,
>
> The next version won't have a fix for this unless someone opens an
> issue / posts a patch.

Re: Terms.getSumTotalTermFreq() in Lucene 4.0

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi,

The next version won't have a fix for this unless someone opens an
issue / posts a patch.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jan 4, 2013 at 7:59 PM, 장용석 <ne...@gmail.com> wrote:
> Hello Mike.
> Thanks for your reply.
>
> It's not an important issue.
> I'll wait for the next release that includes this patch.
>
> Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Terms.getSumTotalTermFreq() in Lucene 4.0

Posted by 장용석 <ne...@gmail.com>.
Hello Mike.
Thanks for your reply.

It's not an important issue.
I'll wait for the next release that includes this patch.

Thanks.

2013/1/4 Michael McCandless <lu...@mikemccandless.com>

> The problem is that the TermVectorsFormat for the default codec
> (Lucene40TermVectorsFormat) does not store this statistic
> per-document, currently.  We could in theory fix this ... maybe open
> an issue / make a patch if it's important?
>
> -1 return value is actually "valid": it means this statistic is not
> available.

Re: Terms.getSumTotalTermFreq() in Lucene 4.0

Posted by Michael McCandless <lu...@mikemccandless.com>.
The problem is that the TermVectorsFormat for the default codec
(Lucene40TermVectorsFormat) does not store this statistic
per-document, currently.  We could in theory fix this ... maybe open
an issue / make a patch if it's important?

-1 return value is actually "valid": it means this statistic is not available.
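
Since -1 is a sentinel meaning "not available" rather than a real count,
calling code may want to check for it and fall back to summing the per-term
frequencies itself. The sketch below shows that pattern in plain Java.
DocTerms and its fields are hypothetical stand-ins, not Lucene classes; with
a real term vector the analogous fallback would presumably walk the
TermsEnum and add up totalTermFreq() for each term, assuming the format
exposes the per-term value.

```java
import java.util.Arrays;
import java.util.List;

final class MissingStatSketch {

    // Hypothetical stand-in for one document's term vector; NOT a Lucene class.
    static final class DocTerms {
        final long storedSumTotalTermFreq; // -1 when the statistic was not stored
        final List<Long> perTermFreqs;     // frequency of each distinct term

        DocTerms(long storedSumTotalTermFreq, List<Long> perTermFreqs) {
            this.storedSumTotalTermFreq = storedSumTotalTermFreq;
            this.perTermFreqs = perTermFreqs;
        }
    }

    // Returns the stored statistic when present; otherwise recomputes it
    // by summing the per-term frequencies.
    static long sumTotalTermFreq(DocTerms terms) {
        if (terms.storedSumTotalTermFreq != -1) {
            return terms.storedSumTotalTermFreq;
        }
        long sum = 0;
        for (long f : terms.perTermFreqs) {
            sum += f;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Document 1 from the question: learning=3, java=1, perl=1, ruby=1
        DocTerms doc1 = new DocTerms(-1, Arrays.asList(3L, 1L, 1L, 1L));
        System.out.println(sumTotalTermFreq(doc1)); // 6
    }
}
```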

Mike McCandless

http://blog.mikemccandless.com
