Posted to java-user@lucene.apache.org by Tricia Williams <pg...@student.cs.uwaterloo.ca> on 2005/10/03 17:24:43 UTC

Re: TermDocs.freq()

Following up on my post from Thursday: I have written a very basic test
for TermPositions. It shows that only the first 10001 tokens are
considered when determining term frequency (i.e., when the search term
appears at a position greater than 10001, my test fails).

Is this by design? Is there an obvious workaround so that the frequency
I receive is correct for my document?

Thank you for your consideration,
Tricia

On Thu, 29 Sep 2005, Tricia Williams wrote:

> I am finding that the TermDocs.freq() method returns an incorrect value.
> I was wondering whether anyone else has experienced this problem.
>
> I am using tp = IndexReader.termPositions( queryTerm ) to return an object
> which implements TermPositions.  I then use tp.skipTo( docid ) to go
> directly to the document from which I wish to retrieve term positions. The
> following for loop adds the positions to my ArrayList which I use later:
>
> for( 	int k = 0, freq = tp.freq();
> 	k < freq;
> 	k++ )
> {
> 	positionMatches.add( new Integer( tp.nextPosition() ) );
> }
>
> In a document that I know has 48 occurrences of the term, a frequency of
> 23 is returned. There doesn't seem to be a pattern to this; other
> documents give (reported frequency, actual count): (25, 48), (36, 43), (30, 149).
>
> These frequencies come from my own code and are confirmed in Luke,
> so I'm pretty certain that this isn't an error on my part.
>
> I've been trying to track down the origin of this issue, without
> luck so far. Any help or advice would be appreciated.
>
> Thanks,
> Tricia
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
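A minimal sketch of collecting one document's positions, assuming the TermDocs/TermPositions contract: skipTo(target) moves to the first document whose number is greater than or equal to target, so checking that tp.doc() equals docid avoids reading another document's positions when docid does not contain the term, and nextPosition() is called exactly freq() times. Postings and ArrayPostings are hypothetical stand-ins that mirror only the four TermPositions methods used, so the example runs without Lucene; with Lucene the same calls would go to the object returned by IndexReader.termPositions(). This guard does not change the value freq() reports.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionCollector {

    /** HYPOTHETICAL stand-in for the subset of Lucene's TermPositions used here. */
    interface Postings {
        boolean skipTo(int target); // move to first doc >= target; false if none
        int doc();
        int freq();
        int nextPosition();
    }

    /** Returns the term's positions in docid, or an empty list if docid lacks the term. */
    static List<Integer> positionsIn(Postings tp, int docid) {
        List<Integer> positionMatches = new ArrayList<>();
        // skipTo can stop on a later document when docid does not contain the term.
        if (!tp.skipTo(docid) || tp.doc() != docid) {
            return positionMatches;
        }
        // Read exactly freq() positions; calls beyond that are not valid.
        int freq = tp.freq();
        for (int k = 0; k < freq; k++) {
            positionMatches.add(tp.nextPosition());
        }
        return positionMatches;
    }

    /** Hypothetical in-memory Postings over parallel arrays, for trying the method out. */
    static final class ArrayPostings implements Postings {
        private final int[] docs;
        private final int[][] positions;
        private int i = -1; // index of the current document
        private int p = 0;  // index of the next position within the current document

        ArrayPostings(int[] docs, int[][] positions) {
            this.docs = docs;
            this.positions = positions;
        }

        public boolean skipTo(int target) {
            // Always advance at least once, then keep going while doc < target.
            do {
                i++;
            } while (i < docs.length && docs[i] < target);
            p = 0;
            return i < docs.length;
        }

        public int doc() { return docs[i]; }
        public int freq() { return positions[i].length; }
        public int nextPosition() { return positions[i][p++]; }
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9};
        int[][] pos = {{0, 4}, {1, 7, 12}, {3}};
        System.out.println(positionsIn(new ArrayPostings(docs, pos), 5)); // [1, 7, 12]
        System.out.println(positionsIn(new ArrayPostings(docs, pos), 6)); // []
    }
}
```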


Re: TermDocs.freq()

Posted by Yonik Seeley <ys...@gmail.com>.
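See IndexWriter.setMaxFieldLength(). By default IndexWriter indexes only
roughly the first 10,000 terms of each field and silently drops the rest,
so occurrences past that point never show up in freq() or termPositions().
Raise the limit (e.g. to Integer.MAX_VALUE) before adding documents, and
reindex the affected documents.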
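The effect can be illustrated with a toy model, which is not Lucene's indexing code: treat the field length limit as a cap on how many of a field's tokens get indexed, and count a term only among those. The model uses a clean cutoff at maxFieldLength; the precise boundary in a given Lucene release may differ by one, as the 10001 observation above suggests. The document, term and positions are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a per-field token cap such as IndexWriter's max field length. */
public class MaxFieldLengthModel {

    /** Counts occurrences of term among the first maxFieldLength tokens only. */
    static int indexedFreq(List<String> tokens, String term, int maxFieldLength) {
        int limit = Math.min(tokens.size(), maxFieldLength);
        int freq = 0;
        for (int i = 0; i < limit; i++) {
            if (tokens.get(i).equals(term)) {
                freq++;
            }
        }
        return freq;
    }

    /** Builds a synthetic field: `length` filler tokens with term at the given positions. */
    static List<String> syntheticDoc(int length, String term, int... termPositions) {
        String[] tokens = new String[length];
        Arrays.fill(tokens, "filler");
        for (int p : termPositions) {
            tokens[p] = term;
        }
        return Arrays.asList(tokens);
    }

    public static void main(String[] args) {
        // Hypothetical 30,000-token field with the term at four positions,
        // two of which lie beyond the 10,000-token cap.
        List<String> doc = syntheticDoc(30_000, "lucene", 5, 9_000, 15_000, 29_999);
        System.out.println(indexedFreq(doc, "lucene", 10_000));            // 2
        System.out.println(indexedFreq(doc, "lucene", Integer.MAX_VALUE)); // 4
    }
}
```

Raising the cap in the model corresponds to passing a larger value to setMaxFieldLength().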

-Yonik
