Posted to dev@lucene.apache.org by bu...@apache.org on 2004/05/03 21:48:03 UTC
DO NOT REPLY [Bug 28748] New: -
Inconsistent behaviour sorting against field with no related documents
http://issues.apache.org/bugzilla/show_bug.cgi?id=28748
Summary: Inconsistent behaviour sorting against field with no
related documents
Product: Lucene
Version: CVS Nightly - Specify date in submission
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: Other
Component: Search
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: sam@redspr.com
In StringSortedHitQueue, generateSortIndex seems to mistake
the TermEnum having values for an indication that the sort field
has entries in the index.
In the case where the search has matching results, an
ArrayIndexOutOfBoundsException is thrown in sortValue (line 177 of
StringSortedHitQueue), because generateSortIndex creates a terms array
of zero length while fieldOrder contains 0 for all documents.
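The failure can be reproduced without Lucene. The following is only a minimal sketch of the situation as I understand it: the class EmptySortFieldDemo and its static sortValue are hypothetical stand-ins, not the real StringSortedHitQueue code, and the arrays are built by hand rather than by generateSortIndex.

```java
public class EmptySortFieldDemo {
    // Hypothetical stand-in for the lookup done in sortValue: maps a
    // document's term number back to the term text.
    static String sortValue(String[] terms, int[] fieldOrder, int doc) {
        return terms[fieldOrder[doc]];
    }

    public static void main(String[] args) {
        String[] terms = new String[0];  // no terms were found for the sort field
        int[] fieldOrder = new int[3];   // Java zero-fills, so every document points at terms[0]
        try {
            sortValue(terms, fieldOrder, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            // terms has no element 0, so the lookup cannot succeed
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

Any document that matches the query triggers the lookup, which is why the exception only shows up when the search has results.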
It would seem more helpful if either:
a) generateSortIndex catches the lack of any documents with the sort field,
or
b) terms[0] is reserved as a special value for documents that do not have
matching sort field values, i.e. change the current implementation to add 1
to the index and set terms[0] so that "untagged" documents sort
first or last.
For my application I'd much prefer solution (b), as it allows much smaller
indexes and makes searching using sort values less brittle.
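To show what (b) buys, here is a self-contained sketch of the reserved-slot idea outside Lucene. Everything in it is an assumption made for illustration: the class ReservedSlotSketch is hypothetical, and the per-document values are passed in as a plain array (null meaning "no sort field") instead of being read from a TermEnum.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class ReservedSlotSketch {
    final String[] terms;    // terms[0] is the reserved "no value" slot
    final int[] fieldOrder;  // per-document index into terms

    // values[doc] is the document's sort value, or null if it has none.
    ReservedSlotSketch(String[] values) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String v : values) {
            if (v != null) sorted.add(v);
        }
        terms = new String[sorted.size() + 1];
        terms[0] = "";  // placeholder returned for untagged documents
        int t = 1;      // real terms start at 1
        for (String s : sorted) {
            terms[t++] = s;
        }
        fieldOrder = new int[values.length];  // zero-filled: defaults to the reserved slot
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] != null) {
                // search only the real terms; the returned index is relative to the whole array
                fieldOrder[doc] = Arrays.binarySearch(terms, 1, terms.length, values[doc]);
            }
        }
    }

    String sortValue(int doc) {
        return terms[fieldOrder[doc]];
    }
}
```

With this layout, `new ReservedSlotSketch(new String[] {null, null}).sortValue(0)` returns the placeholder instead of throwing, and documents without the field simply share index 0.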
That's the best my communication skills can muster just now. The
current code could be changed to something like:
  private final int[] generateSortIndex()
  throws IOException {

    final int[] retArray = new int[reader.maxDoc()];
    final String[] mterms = new String[reader.maxDoc() + 1];  // guess length

    if (retArray.length > 0) {
      TermDocs termDocs = reader.termDocs();

      // terms[0] is reserved for documents without the sort field;
      // change this value to control if they come first or last
      mterms[0] = "";  // XXX change
      int t = 1;       // current term number, XXX change: starts at 1

      try {
        do {
          Term term = enumerator.term();
          // stop at the end of the enumeration or on leaving the sort field
          // (!= relies on field names being interned, as in the existing code)
          if (term == null || term.field() != field) break;

          // store term text
          // we expect that there is at most one term per document
          if (t >= mterms.length) {
            throw new RuntimeException("there are more terms than documents in field \"" + field + "\"");
          }
          mterms[t] = term.text();

          // store which documents use this term
          termDocs.seek(enumerator);
          while (termDocs.next()) {
            retArray[termDocs.doc()] = t;
          }

          t++;
        } while (enumerator.next());
      } finally {
        termDocs.close();
      }

      // if there are fewer terms than documents,
      // trim off the dead array space
      if (t < mterms.length) {
        terms = new String[t];
        System.arraycopy(mterms, 0, terms, 0, t);
      } else {
        terms = mterms;
      }
    }
    return retArray;
  }
Having had a very quick look at IntegerSortedHitQueue, it would seem possible
to do the same thing there, maybe creating the Integer wrapper objects only once.
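A rough sketch of how the Integer variant might look, again outside Lucene and under my own assumptions: IntegerSlotSketch, its constructor arguments and the missingValue parameter are all hypothetical and not taken from IntegerSortedHitQueue. The point is only that each distinct term gets one Integer wrapper, created while the index is built, and that slot 0 holds the value used for documents without the field.

```java
import java.util.Map;
import java.util.TreeMap;

public class IntegerSlotSketch {
    final Integer[] terms;   // one wrapper per distinct term; terms[0] is the reserved slot
    final int[] fieldOrder;  // per-document index into terms

    // texts[doc] is the indexed term text for the document, or null if absent.
    // missingValue decides where untagged documents sort (e.g. Integer.MIN_VALUE for first).
    IntegerSlotSketch(String[] texts, int missingValue) {
        // maps each distinct parsed value to its term number, in ascending order
        TreeMap<Integer, Integer> numbers = new TreeMap<>();
        for (String s : texts) {
            if (s != null) numbers.put(Integer.parseInt(s), 0);
        }
        terms = new Integer[numbers.size() + 1];
        terms[0] = Integer.valueOf(missingValue);
        int t = 1;
        for (Map.Entry<Integer, Integer> e : numbers.entrySet()) {
            terms[t] = e.getKey();  // reuse the wrapper already created above
            e.setValue(t++);
        }
        fieldOrder = new int[texts.length];  // zero-filled: defaults to the reserved slot
        for (int doc = 0; doc < texts.length; doc++) {
            if (texts[doc] != null) {
                fieldOrder[doc] = numbers.get(Integer.parseInt(texts[doc]));
            }
        }
    }

    // Returns the shared wrapper; no allocation per call.
    Integer sortValue(int doc) {
        return terms[fieldOrder[doc]];
    }
}
```

Documents that share a term then also share the same Integer instance, so sortValue never has to allocate.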
I hope that made some sort of sense. I'm not very familiar with the code
or Lucene terminology.
If the above seems like a useful approach, I'd be glad to generate patches
for a cleaned-up version.
Thanks
Sam