Posted to dev@lucene.apache.org by "Ishan Chattopadhyaya (JIRA)" <ji...@apache.org> on 2016/03/15 16:12:33 UTC

[jira] [Updated] (SOLR-8082) can't query against negative float or double values when indexed="false" docValues="true" multiValued="false"

     [ https://issues.apache.org/jira/browse/SOLR-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya updated SOLR-8082:
---------------------------------------
    Attachment: SOLR-8082.patch

Here's a summary of my understanding / observations:
# Floats and doubles need to be converted to longs before writing them to NumericDocValues. 
# There are two options: Double.doubleToLongBits() and NumericUtils.doubleToSortableLong(). For positive doubles both methods return the same long value, but for negative doubles they differ.
# Currently, values are written with Double.doubleToLongBits(). A term query against such docValues should therefore encode the query value with the same method, but the current code uses NumericUtils.doubleToSortableLong(), so term queries against negative values fail. Similarly, range queries fail when min is negative.
# I tried changing the write path to use NumericUtils.doubleToSortableLong() instead. With this change, both term queries and range queries work, but sorting fails when there are negative values. That is counterintuitive, since the encoded long values themselves are in sorted order. Because this change is intrusive and breaks back-compat, I didn't investigate further to understand why this happens.
# To arrive at the least intrusive fix, I changed the range query logic to split the query into two distinct ranges (negatives and positives) combined in a boolean query. I had to do this because Double.doubleToLongBits() values are not monotonic in the double value: they decrease from -Double.MAX_VALUE to 0, but increase from 0 to Double.MAX_VALUE.
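To make the encoding differences described above concrete, here is a minimal plain-Java sketch comparing the two encodings. It does not depend on Lucene: sortableLong() is a hypothetical stand-in that assumes NumericUtils.doubleToSortableLong() flips the lower 63 bits of negative values (which is how Lucene implements it), so treat it as an illustration rather than the library call itself.

```java
public class DoubleEncodingDemo {

    // Hypothetical stand-in for Lucene's NumericUtils.doubleToSortableLong():
    // non-negative bit patterns are unchanged; for negative values (sign bit
    // set) the lower 63 bits are flipped so the longs sort like the doubles.
    static long sortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    public static void main(String[] args) {
        // Ascending doubles, spanning negatives and positives.
        double[] values = {-1e10, -4.3, -1.0, -Double.MIN_VALUE, 0.0, 1.0, 4.3, 1e10};

        boolean rawIncreasing = true;
        boolean sortableIncreasing = true;
        for (int i = 1; i < values.length; i++) {
            rawIncreasing &= Double.doubleToLongBits(values[i - 1]) < Double.doubleToLongBits(values[i]);
            sortableIncreasing &= sortableLong(values[i - 1]) < sortableLong(values[i]);
        }
        System.out.println("raw bits increasing:      " + rawIncreasing);      // false
        System.out.println("sortable bits increasing: " + sortableIncreasing); // true

        // The encodings agree for positive values but not for negative ones.
        System.out.println(Double.doubleToLongBits(4.3) == sortableLong(4.3));   // true
        System.out.println(Double.doubleToLongBits(-4.3) == sortableLong(-4.3)); // false
    }
}
```

The raw bits of negative doubles grow with magnitude, which is why they run "backwards" for negatives; the sortable transform reverses exactly that half of the range.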

I've attached a patch implementing the last approach, which I think is the least intrusive way to make things work. When a range query crosses zero, two docValues range queries are needed, which is less efficient but better than not working at all (as is the case today). The patch passes the tests, but it could benefit from some cleaner refactoring.
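The range-splitting idea can be sketched in plain Java. rawBitsRanges() and matches() are hypothetical helpers, not code from the attached patch: the first maps an inclusive double range onto one or two inclusive ranges over Double.doubleToLongBits() values (two only when the range crosses zero), and the second checks a value against them. In Solr, each long range would presumably correspond to one docValues range clause in the boolean query. This sketch orders values as Double.compare does (so -0.0 sorts before 0.0) and does not handle NaN.

```java
import java.util.ArrayList;
import java.util.List;

public class RawBitsRangeSplit {

    // Hypothetical helper: converts an inclusive double range [min, max] into
    // inclusive ranges over Double.doubleToLongBits() values.
    static List<long[]> rawBitsRanges(double min, double max) {
        List<long[]> ranges = new ArrayList<>();
        if (Double.compare(min, max) > 0) {
            return ranges; // empty range
        }
        long lo = Double.doubleToLongBits(min);
        long hi = Double.doubleToLongBits(max);
        if (lo >= 0) {
            // Both bounds non-negative: raw bits increase with the value.
            ranges.add(new long[] {lo, hi});
        } else if (hi < 0) {
            // Both bounds negative: raw bits decrease as the value increases,
            // so the bounds swap places.
            ranges.add(new long[] {hi, lo});
        } else {
            // Range crosses zero. Negative part [min, -0.0] maps to
            // [Long.MIN_VALUE, lo], since the bits of -0.0 are Long.MIN_VALUE.
            ranges.add(new long[] {Long.MIN_VALUE, lo});
            // Positive part [0.0, max] maps to [0, hi].
            ranges.add(new long[] {0L, hi});
        }
        return ranges;
    }

    // Hypothetical helper: does the value's raw encoding fall in any range?
    static boolean matches(double value, List<long[]> ranges) {
        long bits = Double.doubleToLongBits(value);
        for (long[] r : ranges) {
            if (r[0] <= bits && bits <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<long[]> ranges = rawBitsRanges(-4.3, 2.0);
        System.out.println(ranges.size());          // 2
        System.out.println(matches(-4.3, ranges));  // true
        System.out.println(matches(1.5, ranges));   // true
        System.out.println(matches(-5.0, ranges));  // false
    }
}
```

The same reasoning applies to floats, using Float.floatToIntBits() and 32-bit int ranges.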

[~hossman] Can you please review? Do you think there's a cleaner way to do this?

> can't query against negative float or double values when indexed="false" docValues="true" multiValued="false"
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8082
>                 URL: https://issues.apache.org/jira/browse/SOLR-8082
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: SOLR-8082.patch, SOLR-8082.patch
>
>
> Haven't dug into this yet, but something is evidently wrong in how the DocValues-based queries get built for single-valued float or double fields when negative numbers are involved.
> Steps to reproduce...
> {noformat}
> $ bin/solr -e schemaless -noprompt
> ...
> $ curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field":{ "name":"f_dv_multi", "type":"tfloat", "stored":"true", "indexed":"false", "docValues":"true", "multiValued":"true" }, "add-field":{ "name":"f_dv_single", "type":"tfloat", "stored":"true", "indexed":"false", "docValues":"true", "multiValued":"false" } }' http://localhost:8983/solr/gettingstarted/schema
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":84}}
> $ curl -X POST -H 'Content-type:application/json' --data-binary '[{"id":"test", "f_dv_multi":-4.3, "f_dv_single":-4.3}]' 'http://localhost:8983/solr/gettingstarted/update/json/docs?commit=true'
> {"responseHeader":{"status":0,"QTime":57}}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_multi:"-4.3"'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":5,
>     "params":{
>       "q":"f_dv_multi:\"-4.3\""}},
>   "response":{"numFound":1,"start":0,"docs":[
>       {
>         "id":"test",
>         "f_dv_multi":[-4.3],
>         "f_dv_single":-4.3,
>         "_version_":1512962117004689408}]
>   }}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_single:"-4.3"'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":5,
>     "params":{
>       "q":"f_dv_single:\"-4.3\""}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   }}
> {noformat}
> Explicit range queries (which is how numeric "field" queries are implemented under the covers) are equally problematic...
> {noformat}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_multi:%5B-4.3+TO+-4.3%5D'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0,
>     "params":{
>       "q":"f_dv_multi:[-4.3 TO -4.3]"}},
>   "response":{"numFound":1,"start":0,"docs":[
>       {
>         "id":"test",
>         "f_dv_multi":[-4.3],
>         "f_dv_single":-4.3,
>         "_version_":1512962117004689408}]
>   }}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_single:%5B-4.3+TO+-4.3%5D'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0,
>     "params":{
>       "q":"f_dv_single:[-4.3 TO -4.3]"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   }}
> {noformat}
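The failing term query in the reproduction can be illustrated in isolation. This sketch assumes, per the analysis in the comment above, that single-valued float docValues store Float.floatToIntBits() while the query side encodes with the sortable transform; sortableInt() and storedEncoding() are hypothetical stand-ins, the former mirroring NumericUtils.floatToSortableInt() by flipping the lower 31 bits of negative values.

```java
public class FloatTermMismatch {

    // Hypothetical stand-in for Lucene's NumericUtils.floatToSortableInt():
    // leaves non-negative bit patterns alone, flips the lower 31 bits otherwise.
    static int sortableInt(float value) {
        int bits = Float.floatToIntBits(value);
        return bits ^ ((bits >> 31) & 0x7fffffff);
    }

    // Assumed write-side encoding for single-valued float docValues.
    static int storedEncoding(float value) {
        return Float.floatToIntBits(value);
    }

    public static void main(String[] args) {
        // Negative value: the query encoding differs from the stored one, so no match.
        System.out.println(storedEncoding(-4.3f) == sortableInt(-4.3f)); // false
        // Positive value: both encodings coincide, so the mismatch stays hidden.
        System.out.println(storedEncoding(4.3f) == sortableInt(4.3f));   // true
    }
}
```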



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
