Posted to java-user@lucene.apache.org by Karl Øie <ka...@gan.no> on 2002/01/28 18:23:22 UTC
RE: strange search problems(SOLVED!)
....er, after i sent the previous mail i grep'ed through the source for
"10000" and found this:
jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java
/** The maximum number of terms that will be indexed for a single field in a
document. This limits the amount of memory required for indexing, so that
collections with very large files will not crash the indexing process by
running out of memory.
<p>By default, no more than 10,000 terms will be indexed for a field. */
public int maxFieldLength = 10000;
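To illustrate why this limit produces exactly the symptoms in the original message quoted below, here is a minimal, self-contained sketch. It is a simplified model and not Lucene's code: the class FieldCapSketch and the helpers makeWords, indexField and countMisses are hypothetical names invented for this example, and the "index" is just a set of the first maxFieldLength whitespace-separated tokens.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of a per-field term cap. NOT Lucene's implementation;
// every name here is hypothetical and exists only for illustration.
public class FieldCapSketch {

    // Generates n distinct placeholder words (stand-ins for voc.txt).
    static List<String> makeWords(int n) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            words.add("w" + i);
        }
        return words;
    }

    // Splits the field text on single spaces and keeps only the first
    // maxFieldLength tokens; everything after that is silently dropped.
    static Set<String> indexField(String text, int maxFieldLength) {
        Set<String> indexed = new HashSet<>();
        int position = 0;
        for (String token : text.split(" ")) {
            if (position >= maxFieldLength) {
                break;
            }
            indexed.add(token);
            position++;
        }
        return indexed;
    }

    // Joins the words into one field, "indexes" it, then looks every
    // word up again and counts the ones that cannot be found.
    static int countMisses(List<String> words, int maxFieldLength) {
        Set<String> indexed = indexField(String.join(" ", words), maxFieldLength);
        int misses = 0;
        for (String word : words) {
            if (!indexed.contains(word)) {
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println(countMisses(makeWords(20628), 10000)); // 10628
        System.out.println(countMisses(makeWords(10005), 10000)); // 5
        System.out.println(countMisses(makeWords(10000), 10000)); // 0
    }
}
```

The misses are always the tail of the field in whatever order the words were joined, which matches the reversed-order observation below. Since maxFieldLength is declared as a public field in the source quoted above, raising it on the IndexWriter before adding documents looks like the obvious workaround, at the cost of more memory during indexing.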
gentlemen, you may throw your tomatoes now!... sorry to bother you!
regards, karl øie
-----Original Message-----
From: Karl Øie [mailto:karl@gan.no]
Sent: 28 January 2002 18:16
To: lucene-user@jakarta.apache.org
Subject: strange search problems(cannot query for more than the first
10000 words!?!)
I have created a test class for working with Analyzers and ran into a strange
problem: I cannot search for text in fields with more than 10000 words!?!?
I have tested for various bugs in my test class, but I cannot find anything
there (please have a look, files are attached).
The class "AnalyzerTest" can be used like this:
"java -cp lucene-1.2-rc3-dev.jar org.apache.lucene.analysis.AnalyzerTest
voc.txt voc_out.txt"
where the "voc.txt" and "voc_out.txt" also are included in the zip file.
The approach is simple: voc.txt contains 20628 Norwegian words. To test the
Analyzer I do this:
- create a string containing all the 20628 words separated with " ".
- create a lucene document and index this string as a text field.
- add this one document to an index
- loop through the words again and query the index for each of the same words
in the list.
- if everything works, every word should yield a hit in the single document
that exists in the index.
To be sure nothing is filtered I have used the WhitespaceAnalyzer analyzer
(or NullAnalyzer...).
But here come the problems:
----------------------------
If I try to run all the 20628 words, the last 10628 words cannot be found
by the IndexSearcher. If I flip the words around (reverse alphabetical order),
I cannot find the first 10628 words!!
If I limit the wordlist to 10000, I get a perfect match for either the first
or last 10000 words. If I set the limit to 10005 I will get 5 words not
found at the beginning or end of the list according to order.
Does anyone know what's going on here?? I would be very happy if someone
could point to a place in my code where I have done something really stupid,
because I have tried to track this down for a whole day.
regards, karl øie/gan media