Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2016/10/31 03:52:58 UTC

[jira] [Updated] (SOLR-9166) Export handler returns zero for numeric fields that are not in the original doc

     [ https://issues.apache.org/jira/browse/SOLR-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-9166:
---------------------------------
    Attachment: SOLR-9166.patch

[~rohitcse] I had some time today so started this patch.

Here's what I have so far. I got it this far and ran into a few things I thought I'd run by folks. There are lots of nocommits and the like currently, as well as new failing tests. But it's progress....

[~yonik@apache.org] [~joel.bernstein] [~dpgove] I'd be particularly interested in your takes.

1> My base assumption is that sorting during export should return docs in the same order as using the /select handler. Currently this doesn't happen; the new test I wrote fails all over the place. Not quite sure why, but I just got all this to semi-work so I'm checkpointing.

2> I want to fold the two parameters into a single on/off returnDefaultsForMissing parameter, which defaults to "false". This would mean there's really no way to get the old behavior where numerics return zero and strings return null. Is that OK? I think it's easier to explain something like 'defaults for numerics are zero, default for string is "", default for boolean is "false" and default for date is in 1970'. But see 4>.

3> Does it make any sense to support sortMissingFirst/Last? My initial take is "no" since what matters is consistent sorting. That said, I started down that road before wondering whether it was desirable, so this patch has sortMissingFirstLast in the test; it'll be removed unless there are objections.

4> [~yonik@apache.org]: Your comment about using functions is interesting. I'll take a look at that now that I have a clue what the problem is. It's certainly more elegant than some new flag I think and allows the user to put anything at all in rather than us deciding what a "proper" default is. Do you have any advice on how to access the defined default for the fields in SortingResponseWriter since that's where I need to trap this? (being lazy here).

5> I @Ignored all the rest of the tests except the new one to be able to beast the new stuff; they'll be un-ignored before committing.

6> Despite my comment on the dev list, after looking into this I don't think we want to force it into 6.3; I think there'll be some ramifications we'll need to bake out.
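
To make point 2> concrete, here is a minimal sketch of what a single returnDefaultsForMissing switch might mean per field type. It is written in Python purely for illustration and is not Solr code: `TYPE_DEFAULTS` and `export_value` are hypothetical names, and the date default is assumed to be the epoch (1970-01-01T00:00:00Z), which is only one reading of "in 1970".

```python
from datetime import datetime, timezone

# Hypothetical per-type defaults, following the list in point 2>.
TYPE_DEFAULTS = {
    "numeric": 0,
    "string": "",
    "boolean": False,
    "date": datetime(1970, 1, 1, tzinfo=timezone.utc),  # assumption: the epoch
}


def export_value(doc, field, field_type, return_defaults_for_missing=False):
    """Return the stored value, or None / the type default when the field is missing."""
    if field in doc:
        return doc[field]
    if return_defaults_for_missing:
        return TYPE_DEFAULTS[field_type]
    # Flag off: a missing field is reported as missing, whatever its type.
    return None
```

With the flag off, every missing field comes back as null; with it on, every missing field comes back as its type default. Under this sketch there is no setting that reproduces the old mix of zero for numerics and null for strings, which is the trade-off raised in point 2>.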

No doubt more later when we get some advice on how to continue.

> Export handler returns zero for numeric fields that are not in the original doc
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-9166
>                 URL: https://issues.apache.org/jira/browse/SOLR-9166
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Rohit
>         Attachments: SOLR-9166.patch, SOLR-9166.patch
>
>
> From the dev list discussion:
> My original post.
> Zero is different from not
> existing. And let's claim that I want to process a stream and, say,
> facet on an integer field over the result set. There's no way on the
> client side to distinguish between a document that has a zero in the
> field and one that didn't have the field in the first place so I'll
> over-count the zero bucket.
> From Dennis Gove:
> Is this true for non-numeric fields as well? I agree that this seems like a very bad thing.
> I can't imagine that a fix would cause a problem with Streaming Expressions, ParallelSQL, or other given that the /select handler is not returning 0 for these missing fields (the /select handler is the default handler for the Streaming API so if nulls were a problem I imagine we'd have already seen it). 
> That said, within Streaming Expressions there is a select(...) function which supports a replace(...) operation which allows you to replace one value (or null) with some other value. If a 0 were necessary one could use a select(...) to replace null with 0 using an expression like this 
>    select(<stream>, replace(fieldA, null, withValue=0)). 
> The end result of that would be that the field fieldA would never have a null value and for all tuples where a null value existed it would be replaced with 0.
> Details on the select function can be found at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61330338#StreamingExpressions-select.
> And to answer Dennis' question, null gets returned for string DocValues fields.
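
The over-counting described in the issue can be illustrated with a small client-side sketch. This is Python for illustration only, not Solr or SolrJ code: `facet_counts`, the sample documents, and the field name `n` are all hypothetical, and the "exported" lists merely imitate the two behaviors discussed (missing numerics returned as 0 versus returned as null).

```python
from collections import Counter


def facet_counts(tuples, field):
    """Count values of `field`; tuples lacking the field fall into a None bucket."""
    return Counter(t.get(field) for t in tuples)


# Three hypothetical docs: one with a real zero, one without the field, one with 7.
docs = [{"id": "1", "n": 0}, {"id": "2"}, {"id": "3", "n": 7}]

# Imitates the behavior reported in the issue: a missing numeric is exported as 0.
exported_with_zero = [{"id": d["id"], "n": d.get("n", 0)} for d in docs]

# Imitates the alternative: a missing numeric is simply absent (null) in the tuple.
exported_with_null = [dict(d) for d in docs]

zero_counts = facet_counts(exported_with_zero, "n")
null_counts = facet_counts(exported_with_null, "n")
```

In the first case the zero bucket holds two documents even though only one of them actually contains a zero; in the second the missing document lands in its own bucket, so the client can tell the two apart.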



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)