Posted to dev@lucene.apache.org by "Kelly Kagen (JIRA)" <ji...@apache.org> on 2015/11/03 23:20:28 UTC
[jira] [Commented] (SOLR-6304) Transforming and Indexing custom JSON data

    [ https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988279#comment-14988279 ] 

Kelly Kagen commented on SOLR-6304:
-----------------------------------

I'm having difficulty indexing custom JSON data with Solr 5.3.1. I used the same example as in the documentation, but it does not behave as expected. Can someone confirm whether this is a bug or a problem with the procedure I followed? The scenarios are below.

Source: [Indexing custom JSON data|http://lucidworks.com/blog/2014/08/12/indexing-custom-json-data], [Transforming and Indexing Custom JSON|https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON]

*Note:* Compared with the documented example, I added the echo parameter.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
'&echo=true'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
      {
        "subject": "Maths",
        "test"   : "term1",
        "marks":90},
        {
         "subject": "Biology",
         "test"   : "term1",
         "marks":86}
      ]
}'
{code}
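Because this request spreads many repeated 'f' parameters across separately quoted strings, it is easy to break with shell quoting. As a hedged alternative, the same query string can be assembled with Python's standard urllib.parse.urlencode. The helper build_docs_url is hypothetical, and the base URL is simply the one from the curl command above.

{code:python}
from urllib.parse import urlencode

# Base URL of the JSON docs handler, taken from the curl command above.
BASE_URL = "http://localhost:8983/solr/collection1/update/json/docs"


def build_docs_url(split, mappings, echo=True):
    """Hypothetical helper: build the update URL with one 'f' param per mapping.

    mappings is a list of (field_name, json_path) pairs. A list of tuples is
    used instead of a dict so that the repeated 'f' key is kept, in order.
    """
    params = [("split", split)]
    params += [("f", f"{field}:{path}") for field, path in mappings]
    if echo:
        params.append(("echo", "true"))
    # safe="/:" leaves '/' and ':' unescaped so the paths stay readable.
    return BASE_URL + "?" + urlencode(params, safe="/:")


url = build_docs_url(
    "/exams",
    [
        ("first", "/first"),
        ("last", "/last"),
        ("grade", "/grade"),
        ("subject", "/exams/subject"),
        ("test", "/exams/test"),
        ("marks", "/exams/marks"),
    ],
)
print(url)
{code}

The printed URL can then be passed to curl as a single quoted argument. Omitting safe="/:" would percent-encode the slashes and colons instead, which is also valid in a query string.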

*Output:*
{code}
{
  "error":{
    "msg":"Raw data can be stored only if split=/",
    "code":400
  }
}
{code}

If I pass only '/' to the split parameter, as the error message requires, and use different field mappings, the data is not indexed into the mapped fields. Note the 'Name' suffix added to the keys in the input JSON and in the field mappings.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/'\
'&f=first:/firstName'\
'&f=last:/lastName'\
'&f=grade:/grade'\
'&f=subject:/exams/subjectName'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
'&echo=true'\
 -H 'Content-type:application/json' -d '
{
  "firstName": "John",
  "lastName": "Doe",
  "grade": 8,
  "exams": [
      {
        "subjectName": "Maths",
        "test"   : "term1",
        "marks":90},
        {
         "subject": "Biology",
         "test"   : "term1",
         "marks":86}
      ]
}'
{code}

*Output:*
{code}
{"responseHeader":{"status":0,"QTime":0},"docs":[{"id":"3c5fa5a0-ff71-4fef-b3e9-8e279cc0d724","_src_":"{  \"firstName\": \"John\",  \"lastName\": \"Doe\",  \"grade\": 8,  \"exams\": [      {        \"subjectName\": \"Maths\",        \"test\"   : \"term1\",        \"marks\":90},        {         \"subject\": \",         \"test\"   : \"term1\",         \"marks\":86}      ]}","text":["John","Doe",8,"Maths",["term1","term1"],[90,86]]}]}
{code}

If one of the mappings targets a field named "id", its value is reflected in the response, but all other mapped fields are still ignored.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/'\
'&f=first:/firstName'\
'&f=id:/lastName'\
'&f=grade:/grade'\
'&f=subject:/exams/subjectName'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
'&echo=true'\
 -H 'Content-type:application/json' -d '
{
  "firstName": "John",
  "lastName": "Doe",
  "grade": 8,
  "exams": [
      {
        "subjectName": "Maths",
        "test"   : "term1",
        "marks":90},
        {
         "subject": "Biology",
         "test"   : "term1",
         "marks":86}
      ]
}'
{code}

*Output:*
{code}
{"responseHeader":{"status":0,"QTime":1},"docs":[{"id":"Doe","_src_":"{  \"firstName\": \"John\",  \"lastName\": \"Doe\",  \"grade\": 8,  \"exams\": [      {        \"subjectName\": \"Maths\",        \"test\"   : \"term1\",        \"marks\":90},        {         \"subject\": \",         \"test\"   : \"term1\",         \"marks\":86}      ]}","text":["John","Doe",8,"Maths",["term1","term1"],[90,86]]}]}
{code}
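To make the discrepancy in this last case explicit, the sketch below compares the target field names requested through the 'f' parameters with the field names actually present in each echoed document. The response is copied from the output above with the long _src_ value abbreviated; the helper missing_fields is hypothetical.

{code:python}
import json

# Echoed response from the third request; the _src_ value is abbreviated here.
response_text = '''{"responseHeader":{"status":0,"QTime":1},
 "docs":[{"id":"Doe","_src_":"{ ... }",
 "text":["John","Doe",8,"Maths",["term1","term1"],[90,86]]}]}'''

# The f=<field>:<path> values sent with the third request.
f_params = [
    "first:/firstName",
    "id:/lastName",
    "grade:/grade",
    "subject:/exams/subjectName",
    "test:/exams/test",
    "marks:/exams/marks",
]


def missing_fields(response, f_params):
    """Hypothetical helper: list requested target fields absent from each echoed doc."""
    requested = [p.split(":", 1)[0] for p in f_params]
    return [
        [name for name in requested if name not in doc]
        for doc in response["docs"]
    ]


response = json.loads(response_text)
print(missing_fields(response, f_params))
{code}

Only 'id' appears as a named field; the values for every other mapping show up in the 'text' field instead.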

> Transforming and Indexing custom JSON data
> ------------------------------------------
>
>                 Key: SOLR-6304
>                 URL: https://issues.apache.org/jira/browse/SOLR-6304
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: 4.10, Trunk
>
>         Attachments: SOLR-6304.patch, SOLR-6304.patch
>
>
> example
> {noformat}
> curl 'http://localhost:8983/solr/collection1/update/json/docs?split=/batters/batter&f=recipeId:/id&f=recipeType:/type&f=id:/batters/batter/id&f=type:/batters/batter/type' -d '
> {
> 		"id": "0001",
> 		"type": "donut",
> 		"name": "Cake",
> 		"ppu": 0.55,
> 		"batters": {
> 				"batter":
> 					[
> 						{ "id": "1001", "type": "Regular" },
> 						{ "id": "1002", "type": "Chocolate" },
> 						{ "id": "1003", "type": "Blueberry" },
> 						{ "id": "1004", "type": "Devil's Food" }
> 					]
> 			}
> }'
> {noformat}
> should produce the following output docs
> {noformat}
> { "recipeId":"0001", "recipeType":"donut", "id":"1001", "type":"Regular" }
> { "recipeId":"0001", "recipeType":"donut", "id":"1002", "type":"Chocolate" }
> { "recipeId":"0001", "recipeType":"donut", "id":"1003", "type":"Blueberry" }
> { "recipeId":"0001", "recipeType":"donut", "id":"1004", "type":"Devil's Food" }
> {noformat}
> The split param is the path in the tree at which the input is split into multiple docs. The 'f' params are field name mappings.
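The split/mapping semantics described above can be illustrated with a small Python model. This is a simplified sketch, not Solr's actual implementation (which parses the input as a stream); it assumes every 'f' path is an absolute slash-separated path, and that values outside the split path are copied into every generated document, as in the donut example. The function names split_and_map and _values_at are hypothetical.

{code:python}
import json


def _values_at(node, parts):
    """Collect every value reached by following 'parts', fanning out over arrays."""
    if isinstance(node, list):
        found = []
        for item in node:
            found.extend(_values_at(item, parts))
        return found
    if not parts:
        return [node]
    if isinstance(node, dict) and parts[0] in node:
        return _values_at(node[parts[0]], parts[1:])
    return []


def split_and_map(doc, split, mappings):
    """Hypothetical model of split/f: one output doc per element at 'split'.

    mappings is a list of (field_name, json_path) pairs. Paths under the split
    point are resolved relative to each split element; all other paths are
    resolved from the root, so their values repeat in every output doc.
    """
    split_parts = [p for p in split.split("/") if p]
    records = _values_at(doc, split_parts)  # split="/" yields the whole doc
    results = []
    for record in records:
        out = {}
        for field, path in mappings:
            parts = [p for p in path.split("/") if p]
            if parts[: len(split_parts)] == split_parts:
                values = _values_at(record, parts[len(split_parts):])
            else:
                values = _values_at(doc, parts)
            if values:
                # One match becomes a scalar; several become a multivalued field.
                out[field] = values[0] if len(values) == 1 else values
        results.append(out)
    return results


recipe = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters": {
        "batter": [
            {"id": "1001", "type": "Regular"},
            {"id": "1002", "type": "Chocolate"},
            {"id": "1003", "type": "Blueberry"},
            {"id": "1004", "type": "Devil's Food"},
        ]
    },
}

docs = split_and_map(
    recipe,
    "/batters/batter",
    [
        ("recipeId", "/id"),
        ("recipeType", "/type"),
        ("id", "/batters/batter/id"),
        ("type", "/batters/batter/type"),
    ],
)
for d in docs:
    print(json.dumps(d))
{code}

Run on the donut input, the sketch prints one document per batter, each carrying the top-level id and type as recipeId and recipeType. With split="/" the model keeps the whole input as a single document, so paths that pass through an array (such as /exams/test) become multivalued fields.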



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
