You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "comparis.ch - Roman Baeriswyl" <ro...@comparis.ch> on 2010/05/18 18:18:34 UTC

Sorting and Empty (non-existing) Fields

Hi All

I've got a problem I'm trying to solve the whole day:

Let's say I have an index with two fields, the first one is always filled and the second one only sometimes.
Now I want to search something on the first field and want the results sorted by relevance, then by the first field, then by the second field.
My problem now is that, if I have a lot of Entries with the same value in the first field and no value in the second field, these entries with no value on the 2nd field are coming first.

Is there any way to increase the score on those documents which have a value on the second field? Or is there any way to skip those Documents which don't have the second field? I don't want to use a Filter, it should all be done with the Queries Objects if possible.

I tried a lot of things with WildcardQuery or TermRangeQuery (with null values or empty strings) in Luke and directly in IndexSearcher, but I always get either no results or all results, even those which have no value in the second field.

I found a lot of information where "-field2:[* TO *]" or similar stuff should work but it doesn't.

Can anyone give me some hints?

Thanks
Roman

Holen Sie die besten Elektronik-Aktionen direkt auf Ihr Facebook-Profil: http://www.facebook.com/pages/Preissturz/218831069608

Die besten Elektronik-Aktionen auf Twitter: http://twitter.com/preissturz1

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Sorting and Empty (non-existing) Fields

Posted by Rob Bygrave <ro...@gmail.com>.
BTW:  Saw this in the SOLR docs...

       - If sortMissingLast="false" and sortMissingFirst="false" (the
default),
*         then default lucene sorting will be used which places docs without
the
         field first in an ascending sort and last in a descending sort.*



On Wed, May 19, 2010 at 4:18 AM, comparis.ch - Roman Baeriswyl <
roman.baeriswyl@comparis.ch> wrote:

> Hi All
>
> I've got a problem I'm trying to solve the whole day:
>
> Let's say I have an index with two fields, the first one is always filled
> and the second one only sometimes.
> Now I want to search something on the first field and want the results
> sorted by relevance, then by the first field, then by the second field.
> My problem now is that, if I have a lot of Entries with the same value in
> the first field and no value in the second field, these entries with no
> value on the 2nd field are coming first.
>
> Is there any way to increase the score on those documents which have a
> value on the second field? Or is there any way to skip those Documents which
> don't have the second field? I don't want to use a Filter, it should all be
> done with the Queries Objects if possible.
>
> I tried a lot of things with WildcardQuery or TermRangeQuery (with null
> values or empty strings) in Luke and directly in IndexSearcher, but I always
> get either no results or all results, even those which have no value in the
> second field.
>
> I found a lot of information where "-field2:[* TO *]" or similar stuff
> should work but it doesn't.
>
> Can anyone give me some hints?
>
> Thanks
> Roman
>
> Holen Sie die besten Elektronik-Aktionen direkt auf Ihr Facebook-Profil:
> http://www.facebook.com/pages/Preissturz/218831069608
>
> Die besten Elektronik-Aktionen auf Twitter: http://twitter.com/preissturz1
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Sorting and Empty (non-existing) Fields

Posted by Chris Hostetter <ho...@fucit.org>.
: Now I want to search something on the first field and want the results 
: sorted by relevance, then by the first field, then by the second field.

first off: if your primary sort is on relevancy, there are going to be 
very few cases where your secondary sort comes into play -- the scoring 
formula generates a float, and those floats aren't likely to be identical 
unless the documents themselves are extremely similar.

: My problem now is that, if I have a lot of Entries with the same value 
: in the first field and no value in the second field, these entries with 
: no value on the 2nd field are coming first.
: 
: Is there any way to increase the score on those documents which have a 
: value on the second field? Or is there any way to skip those Documents 
: which don't have the second field? I don't want to use a Filter, it 

yes, add a prohibited clause to your query, which is a nested BooleanQuery 
containing a mandatory MatchAllDocsQuery and a prohibited full range query 
on field2 (ie:  "-(+*:* -field2:[* TO *]"  ) ... that will ensure those 
docs are exlcuded from your query.  alternately you could make them score 
lower by just adding an optional clause that boosts the score of any 
docs that do have a value in that field (ie:  "field2[* TO *]^1000"  )

Lastly: if you don't want to affect the scores in any way, and you don't 
wnat to exclude the docs from the result set, just make them sort after 
other docs that do have a value in that field, you can consider using 
something like this...

http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/java/org/apache/solr/search/MissingStringLastComparatorSource.java

...but note that it only works for sorting "Strings" because that's the 
only type of FieldCache that understands the idea of Documents that are 
"missing" a field.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Sorting and Empty (non-existing) Fields

Posted by Rob Bygrave <ro...@gmail.com>.
I'm not a Lucene Guru so hopefully you get a more definitive response.

I believe this means you want a way to specify ... "Nulls High" / "Nulls
Low" for your field (in this case you want Nulls High I believe).

I haven't seen support for that (but it might exist). Looking at
StringValComparator I'd say support for it doesn't exist - it looks like
null is always treated as low. My thinking is that you'll have to create a
FieldComparatorSource/FieldComparator :

SortField(String field, FieldComparatorSource comparator);

There are a bunch of implementations (static classes) in FieldComparator
(DoubleComparator, FloatComparator etc).

I believe you might need to create one of those. For example, I'm looking at
StringValComparator and I don't see a Null's High/ Nulls Low option in the
compare method.

Maybe that helps. Hopefully someone else can be more helpful. Would be nice
to have a Nulls High/Nulls Low support built in.

... source code of StringValComparator from lucene version 3


  /** Sorts by field's natural String sort order.  All
   *  comparisons are done using String.compareTo, which is
   *  slow for medium to large result sets but possibly
   *  very fast for very small results sets. */
  public static final class StringValComparator extends FieldComparator {

    private String[] values;
    private String[] currentReaderValues;
    private final String field;
    private String bottom;

    StringValComparator(int numHits, String field) {
      values = new String[numHits];
      this.field = field;
    }

    @Override
    public int compare(int slot1, int slot2) {
      final String val1 = values[slot1];
      final String val2 = values[slot2];
      if (val1 == null) {
        if (val2 == null) {
          return 0;
        }
        return -1;
      } else if (val2 == null) {
        return 1;
      }

      return val1.compareTo(val2);
    }

    @Override
    public int compareBottom(int doc) {
      final String val2 = currentReaderValues[doc];
      if (bottom == null) {
        if (val2 == null) {
          return 0;
        }
        return -1;
      } else if (val2 == null) {
        return 1;
      }
      return bottom.compareTo(val2);
    }

    @Override
    public void copy(int slot, int doc) {
      values[slot] = currentReaderValues[doc];
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws
IOException {
      currentReaderValues = FieldCache.DEFAULT.getStrings(reader, field);
    }

    @Override
    public void setBottom(final int bottom) {
      this.bottom = values[bottom];
    }

    @Override
    public Comparable<?> value(int slot) {
      return values[slot];
    }
  }





On Wed, May 19, 2010 at 4:18 AM, comparis.ch - Roman Baeriswyl <
roman.baeriswyl@comparis.ch> wrote:

> Hi All
>
> I've got a problem I'm trying to solve the whole day:
>
> Let's say I have an index with two fields, the first one is always filled
> and the second one only sometimes.
> Now I want to search something on the first field and want the results
> sorted by relevance, then by the first field, then by the second field.
> My problem now is that, if I have a lot of Entries with the same value in
> the first field and no value in the second field, these entries with no
> value on the 2nd field are coming first.
>
> Is there any way to increase the score on those documents which have a
> value on the second field? Or is there any way to skip those Documents which
> don't have the second field? I don't want to use a Filter, it should all be
> done with the Queries Objects if possible.
>
> I tried a lot of things with WildcardQuery or TermRangeQuery (with null
> values or empty strings) in Luke and directly in IndexSearcher, but I always
> get either no results or all results, even those which have no value in the
> second field.
>
> I found a lot of information where "-field2:[* TO *]" or similar stuff
> should work but it doesn't.
>
> Can anyone give me some hints?
>
> Thanks
> Roman
>
> Holen Sie die besten Elektronik-Aktionen direkt auf Ihr Facebook-Profil:
> http://www.facebook.com/pages/Preissturz/218831069608
>
> Die besten Elektronik-Aktionen auf Twitter: http://twitter.com/preissturz1
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>