Posted to dev@lucene.apache.org by "James Dyer (JIRA)" <ji...@apache.org> on 2017/05/01 16:52:04 UTC

[jira] [Commented] (SOLR-10522) Duplicate keys in "collations" object with JSON response format

    [ https://issues.apache.org/jira/browse/SOLR-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991076#comment-15991076 ] 

James Dyer commented on SOLR-10522:
-----------------------------------

We might need to rethink our work on SOLR-9972.  My apologies, [~cpoerschke]: when I reviewed SOLR-9972, I hadn't realized we had more than one JSON format and that fixing one might break the others.

Prior to SOLR-9972, our "flat" (default) JSON looked like this for the collation section:
{noformat}
"collations":[
    "collation",{
        "collationQuery":"lowerfilt:(+faith +hope +loaves)",
        "hits":1,
        "misspellingsAndCorrections":[
        "fauth","faith",
        "home","hope",
        "loane","loaves"]},
    "collation",{
        "collationQuery":"lowerfilt:(+faith +hope +love)",
        "hits":1,
        "misspellingsAndCorrections":[
        "fauth","faith",
        "home","hope",
        "loane","love"]}]
{noformat}
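
As an illustration of how a client might consume the flat layout above, here is a minimal Python sketch. The helper name pairs_from_flat is hypothetical, and the sample is an abbreviated copy of the output wrapped in braces so that it parses on its own:

```python
import json

# Abbreviated copy of the pre-SOLR-9972 "flat" output, wrapped in braces
# so that it parses as a standalone JSON document.
flat = json.loads("""
{"collations":[
    "collation",{"collationQuery":"lowerfilt:(+faith +hope +loaves)","hits":1},
    "collation",{"collationQuery":"lowerfilt:(+faith +hope +love)","hits":1}]}
""")

def pairs_from_flat(items):
    """Turn [name1, value1, name2, value2, ...] into [(name1, value1), ...]."""
    if len(items) % 2 != 0:
        raise ValueError("flat name/value array must have an even length")
    return list(zip(items[0::2], items[1::2]))

collations = pairs_from_flat(flat["collations"])

# Both collations survive, because the repeated name "collation" is an
# array element here, not an object key.
queries = [value["collationQuery"] for name, value in collations
           if name == "collation"]
```

Because names and values alternate inside one array, repeated names cost nothing: the client pairs them up by position.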

...by having "collations" as a NamedList, we avoid duplicate "collation" keys.  But the "arrntv" format produces invalid JSON around the "collationQuery":

{noformat}
"collations":
[
    {"name":"collation",{
    "type":"str","value":"collationQuery":"lowerfilt:(+faith +hope +loaves)",
    "hits":1,
    "misspellingsAndCorrections":
    [
        {"name":"fauth","type":"str","value":"faith"},
        {"name":"home","type":"str","value":"hope"},
        {"name":"loane","type":"str","value":"loaves"}]}},
    {"name":"collation",{
    "type":"str","value":"collationQuery":"lowerfilt:(+faith +hope +love)",
    "hits":1,
    "misspellingsAndCorrections":
    [
        {"name":"fauth","type":"str","value":"faith"},
        {"name":"home","type":"str","value":"hope"},
        {"name":"loane","type":"str","value":"love"}]}}]
{noformat}
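
To make the failure concrete, here is a small Python sketch that feeds a shortened fragment in the shape of the output above to a strict parser. The fragment and the helper is_valid_json are illustrative only; the well-formed entry is copied from the "misspellingsAndCorrections" list:

```python
import json

# Shortened fragment in the shape of the broken "arrntv" output: an object
# follows "name":"collation" without a key of its own.
broken = '{"collations":[{"name":"collation",{"type":"str","value":"x"}}]}'

# One entry copied from the "misspellingsAndCorrections" list, which is fine.
well_formed = '{"name":"fauth","type":"str","value":"faith"}'

def is_valid_json(text):
    """Return True if Python's json module can parse the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

broken_ok = is_valid_json(broken)            # False
well_formed_ok = is_valid_json(well_formed)  # True
```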

...So SOLR-9972 changed "collations" to be a SimpleOrderedMap.  Now we get this for "arrntv":
{noformat}
"collations":{
    "collation":{
    "collationQuery":"lowerfilt:(+faith +hope +loaves)",
    "hits":1,
    "misspellingsAndCorrections":
    [
        {"name":"fauth","type":"str","value":"faith"},
        {"name":"home","type":"str","value":"hope"},
        {"name":"loane","type":"str","value":"loaves"}]},
    "collation":{
    "collationQuery":"lowerfilt:(+faith +hope +love)",
    "hits":1,
    "misspellingsAndCorrections":
    [
        {"name":"fauth","type":"str","value":"faith"},
        {"name":"home","type":"str","value":"hope"},
        {"name":"loane","type":"str","value":"love"}]}}
{noformat}

...so now it renders syntactically valid JSON.  But under "collations", we have duplicate keys, right?  If there is more than one collation, a client that parses this object into a map keeps overwriting the "collation" entry.
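
A quick way to see the overwrite is to parse an abbreviated copy of the output above with Python's json module, which keeps only the last value for a repeated key. Other parsers may behave differently (RFC 8259 only says names within an object SHOULD be unique), so treat this as one example rather than the general rule:

```python
import json

# Abbreviated copy of the post-SOLR-9972 output, wrapped in braces.
dup = """
{"collations":{
    "collation":{"collationQuery":"lowerfilt:(+faith +hope +loaves)","hits":1},
    "collation":{"collationQuery":"lowerfilt:(+faith +hope +love)","hits":1}}}
"""

parsed = json.loads(dup)

# Only one "collation" entry is left after parsing.
surviving_keys = list(parsed["collations"].keys())

# Python keeps the last value, so the "loaves" collation is silently lost.
remaining = parsed["collations"]["collation"]["collationQuery"]
```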

So it seems that SOLR-9972 is only a partial fix for "arrntv": we now have valid JSON, but with duplicate keys.  Worse, SOLR-9972 broke the default JSON format, both from a backwards-compatibility standpoint and from a correctness standpoint, since it too is now subject to duplicate keys.

I'd think reverting SOLR-9972 would leave us in a better situation than the current one.  But can someone suggest a solution that would result in:
- valid JSON for all the various JSON formats we support
- no duplicate keys when there are multiple collations
- no break in backwards compatibility until 7.0, except for the completely broken "arrntv" case? (The 6.5 changes notwithstanding, breaking backwards compatibility here was a bug, in my opinion.)

??
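
The first two requirements can be checked mechanically. Below is a rough Python sketch of such a check, not anything that exists in Solr: the names reject_duplicates and check_response are hypothetical, and the three samples are shortened stand-ins for the outputs shown earlier.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that fails if one JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

def check_response(text):
    """Return 'ok', 'invalid json', or 'duplicate keys' for a response body."""
    try:
        json.loads(text, object_pairs_hook=reject_duplicates)
    except json.JSONDecodeError:
        # Must be caught first: JSONDecodeError is a subclass of ValueError.
        return "invalid json"
    except ValueError:
        return "duplicate keys"
    return "ok"

# Shortened stand-ins for the three outputs discussed in this comment.
flat = '{"collations":["collation",{"hits":1},"collation",{"hits":1}]}'
duplicated = '{"collations":{"collation":{"hits":1},"collation":{"hits":1}}}'
broken = '{"collations":[{"name":"collation",{"type":"str","value":"x"}}]}'

results = [check_response(s) for s in (flat, duplicated, broken)]
# results == ["ok", "duplicate keys", "invalid json"]
```

A check like this, run once per supported JSON format, would have caught both the original "arrntv" problem and the duplicate keys introduced afterwards.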




> Duplicate keys in "collations" object with JSON response format
> ---------------------------------------------------------------
>
>                 Key: SOLR-10522
>                 URL: https://issues.apache.org/jira/browse/SOLR-10522
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>    Affects Versions: 6.5
>            Reporter: Nikita Pchelintsev
>            Assignee: James Dyer
>            Priority: Minor
>
> After upgrading Solr 6.3 -> 6.5 I've noticed a change in how json response writer outputs "collations" response key when spellchecking is enabled (wt=json&json.nl=arrarr)
> Solr 6.3:
> "collations":
>     [
>       ["collation",{
>           "collationQuery":"the",
>           "hits":48,
>           "maxScore":"30.282",
>           "misspellingsAndCorrections":
>           [
>             ["thea","the"]]}],
>       ["collation",{
>           "collationQuery":"tea",
>           "hits":3,
>           "maxScore":"2.936",
>           "misspellingsAndCorrections":
>           [
>             ["thea","tea"]]}],
>       ...
> Solr 6.5:
> "collations":{
>       "collation":{
>         "collationQuery":"the",
>         "hits":43,
>         "misspellingsAndCorrections":
>         [
>           ["thea","the"]]},
>       "collation":{
>         "collationQuery":"tea",
>         "hits":3,
>         "misspellingsAndCorrections":
>         [
>           ["thea","tea"]]},
>         ...
> Solr 6.5 outputs object instead of an array, and it has duplicate keys which is not valid for JSON format.
> Any help is appreciated.


