Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2016/11/10 04:25:58 UTC

[jira] [Updated] (SOLR-9166) Export handler returns zero for numeric fields that are not in the original doc

     [ https://issues.apache.org/jira/browse/SOLR-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-9166:
---------------------------------
    Attachment: SOLR-9166.patch

The latest patch implements what we've discussed. The code changes are absolutely minimal, but they are made in ExportWriter since SortingResponseWriter has been retired, and tests have been added.

No default values are returned for fields that are not in the docs. I'm arguing that returning defaults is incorrect behavior and that any code that depends on it needs to be rewritten. We can discuss that, of course.

The test case I added ran afoul of LUCENE-7548. When that's committed the test case should be updated. See the comments in StreamingTest.checkSort.

The /export handler seems to sort missing fields first/last as it should; it's just that using the /select handler to get the proper ordering seemed like a better idea than hard-coding the results as in the current patch. This test case should continue to run fine even after LUCENE-7548 is committed; it'll just be inelegant.

Still to do: Run the entire test suite to see what, if anything, breaks. Will do that tonight.

> Export handler returns zero for numeric fields that are not in the original doc
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-9166
>                 URL: https://issues.apache.org/jira/browse/SOLR-9166
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Rohit
>         Attachments: SOLR-9166.patch, SOLR-9166.patch, SOLR-9166.patch
>
>
> From the dev list discussion:
> My original post.
> Zero is different from not
> existing. And let's claim that I want to process a stream and, say,
> facet on an integer field over the result set. There's no way on the
> client side to distinguish between a document that has a zero in the
> field and one that didn't have the field in the first place, so I'll
> over-count the zero bucket.
> From Dennis Gove:
> Is this true for non-numeric fields as well? I agree that this seems like a very bad thing.
> I can't imagine that a fix would cause a problem with Streaming Expressions, ParallelSQL, or others, given that the /select handler is not returning 0 for these missing fields (the /select handler is the default handler for the Streaming API, so if nulls were a problem I imagine we'd have already seen it).
> That said, within Streaming Expressions there is a select(...) function which supports a replace(...) operation which allows you to replace one value (or null) with some other value. If a 0 were necessary one could use a select(...) to replace null with 0 using an expression like this 
>    select(<stream>, replace(fieldA, null, withValue=0)). 
> The end result of that would be that the field fieldA would never have a null value and for all tuples where a null value existed it would be replaced with 0.
> Details on the select function can be found at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61330338#StreamingExpressions-select.
> And to answer Dennis's question, null gets returned for string DocValues fields.
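
To make the over-counting concern in the description concrete, here is a minimal client-side sketch in plain Python. It does not call Solr; the documents, the field name "popularity", and the helpers facet_counts and replace_null are all hypothetical and exist only for illustration. replace_null merely imitates the effect of replace(fieldA, null, withValue=0) described above and is not the Streaming Expressions implementation.

```python
from collections import Counter


def facet_counts(tuples, field):
    """Count values of `field`; tuples lacking the field land in a None bucket."""
    return Counter(t.get(field) for t in tuples)


def replace_null(tuples, field, with_value):
    """Imitate replace(field, null, withValue=...): fill in missing/None values."""
    return [
        {**t, field: with_value} if t.get(field) is None else dict(t)
        for t in tuples
    ]


# Hypothetical documents: ids 2 and 4 have no value for "popularity".
docs = [
    {"id": "1", "popularity": 0},
    {"id": "2"},
    {"id": "3", "popularity": 5},
    {"id": "4"},
]

# Missing fields left out of the exported tuples: the client can tell
# "zero" apart from "no value".
missing_aware_counts = facet_counts(docs, "popularity")

# Missing numeric fields exported as 0 (the behavior reported in this issue),
# which is also what an explicit, opt-in null-to-0 replacement would produce.
zero_default_counts = facet_counts(replace_null(docs, "popularity", 0), "popularity")

print(missing_aware_counts)
print(zero_default_counts)
```

With the missing fields preserved, the zero bucket holds only the one document that really contains 0. With zeros filled in, the same bucket also absorbs the two documents that never had the field, and the client has no way to undo that.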


