Posted to solr-user@lucene.apache.org by Todd Long <lo...@gmail.com> on 2015/10/04 17:58:07 UTC

Numeric Sorting with 0 and NULL Values

I'm trying to sort on numeric (e.g. TrieDoubleField) fields and running into
an issue where 0 and NULL values are being compared as equal. This appears
to be the "common case" in the FieldComparator class, where a missing value
(i.e. NULL) is read as 0, which is also a perfectly valid value. Is there
any way around this short of indexing another field to signify there is a
value? I need the sort such that ascending will have the NULL values first
and descending will have the NULL values last (i.e. sortMissingFirst="false"
and sortMissingLast="false").

expected:
NULL
NULL
0
0.7
5
32

actual:
NULL
0
NULL
0.7
5
32
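To see why the "actual" order interleaves NULL and 0, here is a minimal Java sketch that models the behavior described above: a missing value is read as 0.0 before comparing, so a missing document and a real 0 tie, and a stable sort leaves them in document order. The class and method names are hypothetical, and this only models the comparison, not Lucene's FieldComparator itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: each document has an optional numeric value (null = missing).
final class MissingAsZeroDemo {

    // Sort key where a missing value is silently read as 0.0.
    static double keyMissingAsZero(Double v) {
        return v == null ? 0.0 : v;
    }

    // Ascending sort on that key; List.sort is stable, so ties keep input order.
    static List<Double> sortAscMissingAsZero(List<Double> docs) {
        List<Double> out = new ArrayList<>(docs);
        out.sort(Comparator.comparingDouble(MissingAsZeroDemo::keyMissingAsZero));
        return out;
    }

    public static void main(String[] args) {
        // Document order: NULL, 0, 5, NULL, 32, 0.7
        List<Double> docs = Arrays.asList(null, 0.0, 5.0, null, 32.0, 0.7);
        System.out.println(sortAscMissingAsZero(docs)); // [null, 0.0, null, 0.7, 5.0, 32.0]
    }
}
```

Where each NULL lands among the 0s depends only on document order, which is why the result looks arbitrary rather than consistently NULL-first.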

Please let me know if I can provide any additional information. Thank you.



--
View this message in context: http://lucene.472066.n3.nabble.com/Numeric-Sorting-with-0-and-NULL-Values-tp4232654.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Numeric Sorting with 0 and NULL Values

Posted by Todd Long <lo...@gmail.com>.
Todd Long wrote
> I'm curious as to where the loss of precision would be when using
> "-(Double.MAX_VALUE)" as you mentioned? Also, any specific reason why you
> chose that over Double.MIN_VALUE (sorry, just making sure I'm not missing
> something)?

So, to answer my own question, it looks like Double.MIN_VALUE is somewhat
misleading (or perhaps poorly named): the javadoc describes it as "A
constant holding the smallest positive nonzero value of type double". In
this case, the cast to int/long would result in 0 through the loss of
precision, which is definitely not what I want (and back to the original
issue). It would certainly seem that -Double.MAX_VALUE is the way to go!
This is something I was not aware of with Double... thank you.
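A quick Java check of the point above: Double.MIN_VALUE is the smallest positive double, so as a def() default it would place missing values just after 0 rather than before it, while -Double.MAX_VALUE is below every finite double. The class and method names are hypothetical.

```java
// Hypothetical demo contrasting the two candidate "missing" sentinels.
final class SentinelCheck {

    // True when the sentinel sorts strictly before a real 0.0 (needed for NULL-first ascending).
    static boolean sortsBeforeZero(double sentinel) {
        return Double.compare(sentinel, 0.0) < 0;
    }

    public static void main(String[] args) {
        System.out.println(Double.MIN_VALUE);                   // 4.9E-324, a POSITIVE value
        System.out.println(sortsBeforeZero(Double.MIN_VALUE));  // false
        System.out.println(sortsBeforeZero(-Double.MAX_VALUE)); // true
        // Narrowing the tiny positive MIN_VALUE to long rounds toward zero.
        System.out.println((long) Double.MIN_VALUE);            // 0
    }
}
```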


Chris Hostetter-3 wrote
> ...i mention this as being a workaround for floats/doubles because the 
> functions are evaluated as doubles (no "casting" or "forced integer 
> context" type support at the moment), so with integer/float fields there 
> would be some loss of precision.

I'm still curious whether there would be any casting issue going from
double to int/long within the "def()" function. Any additional details would
be greatly appreciated.



--
View this message in context: http://lucene.472066.n3.nabble.com/Numeric-Sorting-with-0-and-NULL-Values-tp4232654p4233361.html

Re: Numeric Sorting with 0 and NULL Values

Posted by Todd Long <lo...@gmail.com>.
Chris Hostetter-3 wrote
> ...i mention this as being a workaround for floats/doubles because the 
> functions are evaluated as doubles (no "casting" or "forced integer 
> context" type support at the moment), so with integer/float fields there 
> would be some loss of precision.

Excellent, thank you for the reply.

My initial thought was to go with the extra un-indexed/un-stored field... I
wasn't aware that the "docValues" attribute could be used in that case for
sorting (I assume this is more for performance). Thank you for the default
value explanation.

I definitely like the workaround as a reindex-free option. I'm curious as to
where the loss of precision would be when using "-(Double.MAX_VALUE)" as you
mentioned? Also, any specific reason why you chose that over
Double.MIN_VALUE (sorry, just making sure I'm not missing something)? I
would think an int or long field would simply cast down from the double
min/max value... at least that is what I gathered from poking around the
"def()" function code. Of course, the fractional part would be lost with int
and long, but I would still come away with the minimum values of -2147483648
and -9223372036854775808, respectively.
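Todd's expectation matches plain Java narrowing rules: a double too negative for the target type converts to that type's minimum value, and any fractional part is truncated toward zero. A small sketch with hypothetical names; whether def() performs such a cast inside Solr is not confirmed in this thread (Hoss's reply says functions are evaluated as doubles), so this only shows Java semantics.

```java
// Hypothetical demo of Java's double-to-int/long narrowing conversions.
final class NarrowingCheck {

    static int toInt(double d) {
        return (int) d;   // too-negative values clamp to Integer.MIN_VALUE
    }

    static long toLong(double d) {
        return (long) d;  // too-negative values clamp to Long.MIN_VALUE
    }

    public static void main(String[] args) {
        double sentinel = -Double.MAX_VALUE;   // -1.7976931348623157E308
        System.out.println(toInt(sentinel));   // -2147483648
        System.out.println(toLong(sentinel));  // -9223372036854775808
        System.out.println(toInt(-2.5));       // -2 (fraction dropped, rounds toward zero)
    }
}
```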



--
View this message in context: http://lucene.472066.n3.nabble.com/Numeric-Sorting-with-0-and-NULL-Values-tp4232654p4233117.html

Re: Numeric Sorting with 0 and NULL Values

Posted by Chris Hostetter <ho...@fucit.org>.
: value? I need the sort such that ascending will have the NULL values first
: and descending will have the NULL values last (i.e. sortMissingFirst="false"
: and sortMissingLast="false").

You can configure a default="X" attribute on your field such that X is the 
minimum legal value for your field type (i.e. -2147483648 for TrieIntField, 
-9223372036854775808 for TrieLongField, etc.). But then those values 
will be stored in the docs as well, in which case your best bet would 
be to use 2 fields: one to index/store the 'real' value and one (with the 
default specified) that is un-indexed/un-stored but uses docValues for 
sorting.
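The two-field idea can be modeled in plain Java to show why it orders missing values correctly: the "real" value stays null when absent, while a separate sort-only key gets the type's minimum as its default. This is a hypothetical sketch (the Doc class and sortedReal method are illustrative, not Solr API), using an int field with Integer.MIN_VALUE standing in for default="-2147483648".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the two-field approach for an int field.
final class TwoFieldDemo {

    static final class Doc {
        final Integer real;   // stored/indexed value; null when the field is missing
        final int sortKey;    // sort-only value; default applied when missing

        Doc(Integer real) {
            this.real = real;
            this.sortKey = (real == null) ? Integer.MIN_VALUE : real;
        }
    }

    // Sorts on the sort-only key but reports the real values.
    static List<Integer> sortedReal(List<Integer> values, boolean ascending) {
        List<Doc> docs = new ArrayList<>();
        for (Integer v : values) {
            docs.add(new Doc(v));
        }
        Comparator<Doc> cmp = Comparator.comparingInt(d -> d.sortKey);
        docs.sort(ascending ? cmp : cmp.reversed());
        List<Integer> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.real);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, null, 0, 32, null);
        System.out.println(sortedReal(values, true));   // [null, null, 0, 5, 32]
        System.out.println(sortedReal(values, false));  // [32, 5, 0, null, null]
    }
}
```

A real value of 0 keeps 0 as its sort key, so it no longer ties with missing documents. The one remaining tie is a document whose real value is exactly the minimum itself.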


A workaround that would probably be good enough for floats/doubles without 
reindexing would be to sort on the "def()" function using 
"-(Double.MAX_VALUE)" as the default...

  sort=def(your_field,-1.7976931348623157E308) asc
  sort=def(your_field,-1.7976931348623157E308) desc
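The effect of those two sorts can be modeled in plain Java: substitute -Double.MAX_VALUE for a missing value, then sort on the result. This is a hypothetical sketch of the sort semantics only, not Solr's def() implementation; the def helper here just mirrors "field value if present, otherwise the fallback".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of sorting on def(field, -Double.MAX_VALUE).
final class DefSortDemo {

    // Field value if present, otherwise the fallback.
    static double def(Double value, double fallback) {
        return value != null ? value : fallback;
    }

    static List<Double> sortByDef(List<Double> docs, boolean ascending) {
        Comparator<Double> cmp = Comparator.comparingDouble(v -> def(v, -Double.MAX_VALUE));
        List<Double> out = new ArrayList<>(docs);
        out.sort(ascending ? cmp : cmp.reversed());
        return out;
    }

    public static void main(String[] args) {
        List<Double> docs = Arrays.asList(null, 0.0, 5.0, null, 32.0, 0.7);
        System.out.println(sortByDef(docs, true));   // [null, null, 0.0, 0.7, 5.0, 32.0]
        System.out.println(sortByDef(docs, false));  // [32.0, 5.0, 0.7, 0.0, null, null]
    }
}
```

The ascending result matches the "expected" order from the original question, and descending puts the NULLs last.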


...i mention this as being a workaround for floats/doubles because the 
functions are evaluated as doubles (no "casting" or "forced integer 
context" type support at the moment), so with integer/float fields there 
would be some loss of precision.
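One concrete form of that precision caveat, sketched in Java for a long field: a double has a 53-bit significand, so not every long beyond 2^53 has an exact double representation, and neighboring values can collapse to the same double. The class name is hypothetical, and this only illustrates the arithmetic, not Solr's function evaluation code.

```java
// Hypothetical demo: long values that differ by 1 can map to the same double.
final class LongAsDoubleDemo {

    // What a function evaluated "as double" would see for a long field value.
    static double asDouble(long v) {
        return (double) v;
    }

    public static void main(String[] args) {
        long a = 9007199254740992L;   // 2^53
        long b = a + 1;               // 2^53 + 1, not representable as a double
        System.out.println(asDouble(a) == asDouble(b)); // true
        System.out.println((long) asDouble(b));         // 9007199254740992 (the +1 is lost)
    }
}
```

In a sort on def(long_field, ...), two such documents would tie even though their stored values differ.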


-Hoss
http://www.lucidworks.com/