Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2008/12/30 07:26:44 UTC

[jira] Created: (SOLR-945) JSON update handler

JSON update handler
-------------------

                 Key: SOLR-945
                 URL: https://issues.apache.org/jira/browse/SOLR-945
             Project: Solr
          Issue Type: New Feature
            Reporter: Ryan McKinley


In addition to supporting xml and csv updating, it would be good to support json.

This patch uses [noggit|http://svn.apache.org/repos/asf/labs/noggit/], a streaming json parser, to build the commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-945) JSON update handler

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659948#action_12659948 ] 

Yonik Seeley commented on SOLR-945:
-----------------------------------

Yes, parsing is much faster in noggit than XML parsing (or than other JSON parsers, for that matter). I'm not sure what the split between parsing and indexing time is... I imagine/hope that more time is spent in indexing.

Two more benefits:
 - smaller footprint... less network IO
 - able to represent the entire Unicode range
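To make those two benefits concrete, here is a small Python sketch (standard library only). It serializes one made-up document both as Solr-style XML (`<add><doc><field name="...">`) and in the JSON shape proposed in this issue, compares the byte counts, and then round-trips a U+0001 control character, which XML 1.0 does not allow even as a character reference. It uses Python's `json` and `ElementTree`, not noggit or Solr's own parsers, and the field values are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Made-up document used only for this comparison.
fields = {"id": "doc1", "title": "JSON update handler", "cat": "search"}

def to_xml(doc):
    """Serialize as <add><doc><field name="...">value</field></doc></add>."""
    add = ET.Element("add")
    d = ET.SubElement(add, "doc")
    for name, value in doc.items():
        ET.SubElement(d, "field", name=name).text = value
    return ET.tostring(add)  # bytes, no XML declaration for us-ascii

def to_json(doc):
    """Serialize as {"add":{"doc":{...}}} without optional whitespace."""
    return json.dumps({"add": {"doc": doc}}, separators=(",", ":")).encode("utf-8")

xml_body = to_xml(fields)
json_body = to_json(fields)
print(len(xml_body), len(json_body))

# JSON escapes U+0001 as \u0001 and reads it back unchanged.
tricky = {"id": "doc2", "text": "before\x01after"}
json_roundtrip = json.loads(to_json(tricky))["add"]["doc"]["text"]

# ElementTree writes the raw control character; expat then rejects it.
try:
    ET.fromstring(to_xml(tricky))
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print(xml_ok)  # False
```

The size gap here comes mostly from repeating `<field name="...">` and `</field>` per value; real savings depend on field names and value lengths.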


> JSON update handler
> -------------------
>
>                 Key: SOLR-945
>                 URL: https://issues.apache.org/jira/browse/SOLR-945
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ryan McKinley
>         Attachments: SOLR-945-json-update.patch
>
>
> In addition to supporting xml and csv updating, it would be good to support json.
> This patch uses [noggit|http://svn.apache.org/repos/asf/labs/noggit/], a streaming json parser, to build the commands.



[jira] Commented: (SOLR-945) JSON update handler

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659966#action_12659966 ] 

Ryan McKinley commented on SOLR-945:
------------------------------------

Re: API revisit...

For better or worse, this patch matches the JSON format to the XXXUpdateCommand classes. Unlike XML, each added document requires its own "add" command. I did this because adding the document boost otherwise gets really clumsy.

{code}
"add": {
  "commitWithin": 1234,
  "overwrite": false,
  "boost": 3.45,
  "doc": {
    "f1": "v1",
    "f1": "v2"
  }
}
{code}
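The example repeats the key "f1" to give one field two values. A plain (non-streaming) JSON decoder keeps only the last duplicate, so whatever builds the command has to see every key/value pair. A minimal Python sketch of that difference, assuming Python's `json` module as a stand-in for noggit (a Java parser not shown here); `as_list` and `collect_multivalued` are hypothetical helper names, not part of the patch.

```python
import json

payload = '''
{"add": {
  "commitWithin": 1234,
  "overwrite": false,
  "boost": 3.45,
  "doc": {"f1": "v1", "f1": "v2"}
}}
'''

# A plain decode silently keeps only the last "f1".
naive = json.loads(payload)
print(naive["add"]["doc"])  # {'f1': 'v2'}

def as_list(v):
    return v if isinstance(v, list) else [v]

def collect_multivalued(pairs):
    """object_pairs_hook: merge repeated keys into one list instead of overwriting."""
    out = {}
    for key, value in pairs:
        if key in out:
            out[key] = as_list(out[key]) + as_list(value)
        else:
            out[key] = value
    return out

parsed = json.loads(payload, object_pairs_hook=collect_multivalued)
print(parsed["add"]["doc"])  # {'f1': ['v1', 'v2']}
```

Note that RFC 8259 only says object member names SHOULD be unique, so generic JSON tools may not preserve repeated keys the way this format expects.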

Adding the document boost to your example gets a bit ugly:
{code}
"docs": [
  { "boost": 2,
    "fields": { "f0": "v0", "f1": 2.4 }
  },
  { "boost": 3,
    "fields": { "f0": "v0", "f1": 2.4 }
  }
]
{code}

Personally, I like having the entire command encompassed in JSON rather than spread between the query args and the POST body. That way all commands can be represented sequentially and clearly. It also allows for easier streaming.

For the 'add' command, I don't think we make things much easier/clearer by adding args.  

I agree a more RESTful API is a good thing, but I think that is a separate task. For that, we should look at supporting HTTP GET/PUT/DELETE as the main control structures rather than passing params.

For the XmlUpdateRequestHandler we added some arguments to the query string so that we could call "commit" in the same request that we send documents.  In retrospect I'm not sure that was a good idea.  We could achieve the same thing with:
{code:xml}
<commands>
  <add>
     ...
  </add>
  <commit />
</commands>
{code}
(This is currently supported by the XmlUpdateRequestHandler, since it only starts parsing once it hits a known command: add, commit, etc.)
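For illustration, a Python sketch that builds a batch of the shape in the XML example: one `<add>` holding several `<doc>` elements, followed by `<commit/>`, inside a `<commands>` wrapper. It only produces the request body and sends nothing; whether a given Solr version accepts the wrapper is as described in the comment, not something the sketch checks. `build_batch` is a hypothetical helper and the documents are made up.

```python
import xml.etree.ElementTree as ET

def build_batch(docs, commit=True):
    """Return <commands><add><doc>...</doc>...</add><commit/></commands> as a string."""
    root = ET.Element("commands")  # wrapper element from the example above
    add = ET.SubElement(root, "add")
    for doc in docs:  # XML allows several <doc> elements inside one <add>
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            ET.SubElement(d, "field", name=name).text = str(value)
    if commit:
        ET.SubElement(root, "commit")
    return ET.tostring(root, encoding="unicode")

body = build_batch([{"id": "1", "f0": "v0"}, {"id": "2", "f0": "v1"}])
print(body)
```

Because the commit travels in the body, no `commit=true` query parameter is needed for this request.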





[jira] Updated: (SOLR-945) JSON update handler

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-945:
-------------------------------

    Attachment: SOLR-945-json-update.patch

Here is a patch that lets you send updates that look like this:
{code}
{ 

"add": {
  "doc": {
    "f0": "v0",
    "f2": {
      "boost": 2.3,
      "value": "test"
    },
    "array": [ "aaa", "bbb" ],
    "boosted": {
      "boost": 6.7,
      "value": [ "aaa", "bbb" ]
    }
  }
},
"add": {
  "commitWithin": 1234,
  "overwrite": false,
  "boost": 3.45,
  "doc": {
    "f1": "v1",
    "f1": "v2"
  }
},

"commit": {},
"optimize": { "waitFlush":false, "waitSearcher":false },

"delete": { "id":"ID" },
"delete": { "query":"QUERY" },
"rollback": {}

}

{code}
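Since this body repeats top-level keys ("add" twice, "delete" twice), a consumer must handle the commands in document order rather than as a map. Python's `json` module is not a streaming parser, but its `object_pairs_hook` can serve as a rough, whole-document stand-in for illustrating that ordering and the `{"boost": ..., "value": ...}` field form. A sketch under those assumptions: `Pairs`, `to_plain`, `parse_commands`, and `field_value` are hypothetical names, not the patch's Java classes.

```python
import json

class Pairs(list):
    """Marker type: one JSON object's (key, value) pairs, in document order."""

def to_plain(value):
    """Turn Pairs into dicts; a repeated key becomes a multi-valued list."""
    if isinstance(value, Pairs):  # check before list: Pairs subclasses list
        out = {}
        for key, item in value:
            item = to_plain(item)
            if key in out:
                prev = out[key]
                out[key] = (prev if isinstance(prev, list) else [prev]) + (
                    item if isinstance(item, list) else [item])
            else:
                out[key] = item
        return out
    if isinstance(value, list):  # a real JSON array
        return [to_plain(v) for v in value]
    return value

def parse_commands(text):
    """Return the top-level commands as an ordered list of (name, body)."""
    top = json.loads(text, object_pairs_hook=Pairs)
    return [(name, to_plain(body)) for name, body in top]

def field_value(v):
    """Split {"boost": b, "value": x} into (x, b); other values get boost None."""
    if isinstance(v, dict) and set(v) == {"boost", "value"}:
        return v["value"], v["boost"]
    return v, None

stream = """{
  "add": {"doc": {"f0": "v0", "f2": {"boost": 2.3, "value": "test"}}},
  "add": {"overwrite": false, "doc": {"f1": "v1", "f1": "v2"}},
  "commit": {},
  "delete": {"id": "ID"},
  "delete": {"query": "QUERY"}
}"""

for name, body in parse_commands(stream):
    if name == "add":
        # A real handler would build a document here; this just prints it.
        print(name, {k: field_value(v) for k, v in body["doc"].items()})
    else:
        print(name, body)
```

A true streaming parser would dispatch each command as soon as its closing brace is read, without holding the whole request in memory the way this sketch does.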



[jira] Commented: (SOLR-945) JSON update handler

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659957#action_12659957 ] 

Ryan McKinley commented on SOLR-945:
------------------------------------

Is this something we should consider for 1.4?  Since Grant refactored all the XmlLoader stuff, this is a pretty simple extension.

The only real issue I see is how to include the noggit library. Since noggit is in Apache Labs, it cannot have a release there. We could:
 1. build a jar and release it as "apache-solr-noggit.jar" the same way we do with commons-csv
 2. move noggit to sourceforge and release from there.

#1 seems easier to me.



[jira] Commented: (SOLR-945) JSON update handler

Posted by "Matt Weber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715198#action_12715198 ] 

Matt Weber commented on SOLR-945:
---------------------------------

Any update on this for 1.4? 

+1 here.



[jira] Commented: (SOLR-945) JSON update handler

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659955#action_12659955 ] 

Yonik Seeley commented on SOLR-945:
-----------------------------------

The API looks like a direct translation of the XML API.... that's a reasonable approach, but we should also take this chance to revisit and see what we might want to change.

If we were to do it over again (now that we can grab params from the URL in a POST), would we prefer removing the command names like "add", and some of the other parameters, from the XML?

{code}
http://localhost:8983/solr/update/add?commitWithin=1234

{
  "docs":[
     { "f0": "v0",
       "f2": {
         "boost": 2.3,
         "value": "test"
       }
     },
     { "f0": "zzz",
       "f1": "ggg"
     }
  ]
}
  

for deletes,
http://localhost:8983/solr/update/delete?q=foo:1234   (or /update?delete=foo:1234)
{code}
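A Python sketch of how a client might assemble requests in this proposed style, with options in the query string and only documents in the body. It is hypothetical throughout: `/update/add` and `/update/delete` are the paths floated in the comment, not endpoints the attached patch implements, and `add_request`/`delete_request` are made-up helpers that build strings without sending anything.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoints taken from the proposal above.
BASE = "http://localhost:8983/solr/update"

def add_request(docs, **params):
    """Build (url, body): options go in the query string, documents in the body."""
    query = urlencode(params)
    url = BASE + "/add" + ("?" + query if query else "")
    return url, json.dumps({"docs": docs})

def delete_request(query):
    """Build the delete-by-query URL; this style needs no request body."""
    return BASE + "/delete?" + urlencode({"q": query})

url, body = add_request(
    [{"f0": "v0", "f2": {"boost": 2.3, "value": "test"}},
     {"f0": "zzz", "f1": "ggg"}],
    commitWithin=1234,
)
print(url)  # http://localhost:8983/solr/update/add?commitWithin=1234
print(delete_request("foo:1234"))
```

`urlencode` percent-encodes the colon in `foo:1234` as `%3A`, which the server decodes back to the same query.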





[jira] Commented: (SOLR-945) JSON update handler

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659880#action_12659880 ] 

Ryan McKinley commented on SOLR-945:
------------------------------------

I have not benchmarked anything yet... the current motivation is the interface rather than speed (though it is potentially faster).



[jira] Commented: (SOLR-945) JSON update handler

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659878#action_12659878 ] 

Otis Gospodnetic commented on SOLR-945:
---------------------------------------

Out of curiosity, is there a benefit to using JSON over XML for indexing/updating? Perhaps noggit is (much?) faster than the XML parser Solr uses and this makes a noticeable difference? (Though I'd guess that indexing itself is what takes most of the time.)




[jira] Commented: (SOLR-945) JSON update handler

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659969#action_12659969 ] 

Yonik Seeley commented on SOLR-945:
-----------------------------------

bq. Is this something we should consider for 1.4?

I think so... it's easy to understand both the API and the impact at a glance.

bq. 1. build a jar and release it as "apache-solr-noggit.jar" the same way we do with commons-csv

+1

