Posted to java-user@lucene.apache.org by Tricia Williams <pg...@student.cs.uwaterloo.ca> on 2005/09/29 17:53:27 UTC

TermDocs.freq()

I am finding that the TermDocs.freq() method is returning an incorrect
value.  I was wondering if anyone else has experienced this problem.

I am using tp = IndexReader.termPositions( queryTerm ) to get an object
which implements TermPositions.  I then use tp.skipTo( docid ) to go
directly to the document from which I wish to retrieve term positions. The
following for loop adds the positions to my ArrayList, which I use later:

// Call nextPosition() exactly freq() times; one call more is an error.
int freq = tp.freq();
for( int k = 0; k < freq; k++ )
{
	positionMatches.add( new Integer( tp.nextPosition() ) );
}

In a document which I know has 48 references to the term, a frequency of
23 is returned.  There doesn't seem to be a pattern to this as some other
documents have (frequency, actual): (25, 48), (36, 43), (30, 149).

These frequencies are from results within my code and confirmed in Luke,
so I'm pretty certain that this isn't an error on my part.

I've been trying to find the origin of this issue, without luck thus
far.  Any help or advice would be appreciated.

Thanks,
Tricia

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: TermDocs.freq()

Posted by Yonik Seeley <ys...@gmail.com>.
See IndexWriter.setMaxFieldLength()

-Yonik
Now hiring -- http://tinyurl.com/7m67g
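
For context on the pointer above: IndexWriter's maxFieldLength caps how
many tokens of each field are indexed, and its documented default is
10,000, so occurrences past that point never reach the index and freq()
cannot count them. Raising the cap before indexing (for example
writer.setMaxFieldLength( Integer.MAX_VALUE ), if your Lucene version has
that setter) and then re-indexing should give the full counts. The snippet
below is only a plain-Java simulation of that truncation, not Lucene code:
FieldLengthCapDemo, positions() and the whitespace tokenizer are made up
for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FieldLengthCapDemo {
    // Assumption: mirrors Lucene's documented default of 10,000 tokens per field.
    static final int DEFAULT_CAP = 10000;

    /** Positions of term among the first maxTokens whitespace-separated tokens. */
    static List<Integer> positions(String text, String term, int maxTokens) {
        List<Integer> result = new ArrayList<>();
        String[] tokens = text.trim().split("\\s+");
        // Tokens beyond the cap are ignored, as if they had never been indexed.
        int limit = Math.min(tokens.length, maxTokens);
        for (int i = 0; i < limit; i++) {
            if (tokens[i].equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 30,000 tokens with "needle" at every 1000th position (30 occurrences).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 30000; i++) {
            sb.append(i % 1000 == 0 ? "needle " : "hay ");
        }
        String doc = sb.toString();
        System.out.println("capped:   " + positions(doc, "needle", DEFAULT_CAP).size());
        System.out.println("uncapped: " + positions(doc, "needle", Integer.MAX_VALUE).size());
    }
}
```

With the cap in place only the occurrences inside the first 10,000 tokens
are counted, which is the same kind of gap as the (frequency, actual)
pairs reported in the original post.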

On 10/3/05, Tricia Williams <pg...@student.cs.uwaterloo.ca> wrote:
>
> To follow up on my post from Thursday: I have written a very basic test
> for TermPositions. The test shows that only the first 10001 tokens are
> considered when determining term frequency (i.e., with the search term
> at a position greater than 10001, my test fails).
>
> Is this by design? Is there an obvious work-around so that the frequency
> that I receive is correct for my document?
>
> Thank you for your consideration,
> Tricia

Re: TermDocs.freq()

Posted by Tricia Williams <pg...@student.cs.uwaterloo.ca>.
To follow up on my post from Thursday: I have written a very basic test
for TermPositions.  The test shows that only the first 10001 tokens are
considered when determining term frequency (i.e., with the search term
at a position greater than 10001, my test fails).

Is this by design?  Is there an obvious work-around so that the frequency
that I receive is correct for my document?

Thank you for your consideration,
Tricia



Re: TermDocs.freq()

Posted by Greg Gershman <gr...@yahoo.com>.
Save user queries in a database along with the number of
results from the last time each was queried, and use that
as the suggestion base.

Notice that Google's result count in Suggest differs
from the actual result count.  They are not computing
results on the fly.

Greg
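
A minimal sketch of the idea above, assuming an in-memory sorted map
stands in for the database: each query is stored with the result count
seen the last time it ran, and suggestions are the stored queries that
share the typed prefix. SuggestionStore, record() and suggest() are
hypothetical names for illustration, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SuggestionStore {
    // Query text -> result count observed the last time the query was run.
    private final TreeMap<String, Integer> lastCounts = new TreeMap<>();

    /** Record a user query and the number of hits it produced. */
    void record(String query, int resultCount) {
        lastCounts.put(query.toLowerCase(), resultCount);
    }

    /** Up to max stored queries starting with prefix, with their cached counts. */
    List<String> suggest(String prefix, int max) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        // tailMap starts at the first key >= prefix; stop once keys no longer match.
        for (Map.Entry<String, Integer> e : lastCounts.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p) || out.size() >= max) {
                break;
            }
            out.add(e.getKey() + " (" + e.getValue() + " results)");
        }
        return out;
    }

    public static void main(String[] args) {
        SuggestionStore store = new SuggestionStore();
        store.record("lucene index", 1200);
        store.record("lucene query syntax", 340);
        store.record("luke", 15);
        System.out.println(store.suggest("luc", 5));
    }
}
```

The counts shown are whatever was cached at record() time, so they can
drift from the live result count, as with Google's. Suggestions come back
in alphabetical order here; ranking by popularity would need an extra
counter per query.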

--- Jérôme BENOIS <be...@argia-engineering.fr> wrote:

> Hello everybody,
> 
> 	I would like to implement something like "Google Suggest"
> (http://www.google.com/webhp?complete=1&hl=en), but how do I get
> similar criteria and the number of results?
> 
> 	Does anyone have an idea?
> 
> Thanks,
> Jérôme.
> 



		
__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: TermDocs.freq()

Posted by Jérôme BENOIS <be...@argia-engineering.fr>.
Hello everybody,

	I would like to implement something like "Google Suggest"
(http://www.google.com/webhp?complete=1&hl=en), but how do I get
similar criteria and the number of results?

	Does anyone have an idea?

Thanks,
Jérôme.