
Posted to solr-user@lucene.apache.org by David Lee <ni...@comcast.net> on 2017/12/02 19:55:48 UTC

Having trouble indexing nested docs using "split" feature.

Hi all,

I've been trying for some time now to find a suitable way to deal with 
json documents that have nested data. By suitable, I mean being able to 
index them and retrieve them so that they are in the same structure as 
when indexed.

I'm using Solr 7.1 under Linux Mint 18.3 with Oracle Java 1.8.0_151. 
After untarring the distribution, I ran through the "getting started" 
tutorial from the reference manual where it had me create the 
techproducts index. I then created another collection called 
my_collection so I could run the examples more easily. It used the 
_default schema.

Here is a sample:

{
     "book_id": "1234",
     "book_title": "The Martian Chronicles",
     "author": "Ray Bradbury",
     "reviews": [
         {
             "reviewer": "John Smith",
             "reviewer_background": {
                 "highest_rank": "Excellent",
                 "latest_review": "10/15/2017 10:15:00.000 CST"
             }
         }, {
             "reviewer": "Adam Smith",
             "reviewer_background": {
                 "highest_rank": "Good",
                 "latest_review": "10/10/2017 16:18:00.000 CST"
             }
         }
     ],
     "checkouts": [
         {
             "member_id": "aaabbbccc",
             "member_name": "Sam Jackson"
         },{
             "member_id": "bbbcccddd",
             "member_name": "Buddy Jones"
         }
     ]
}

Obviously, I'll need to search at the parent level and child level. I 
started experimenting and tried to use one of the examples from 
"Transforming and Indexing Solr JSON". However, when I tried the first 
example as follows:

curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'
{
   "responseHeader":{
     "status":0,
     "QTime":798}}

Though the status indicates there was no error, when I try to query 
the data using *:*, I get this:

curl 'http://localhost:8983/solr/my_collection/select?q=*:*'
{
   "responseHeader":{
     "zkConnected":true,
     "status":0,
     "QTime":6,
     "params":{
       "q":"*:*"}},
   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
   }}

So it looks like no documents were actually indexed from above. I'm 
trying to determine if this is due to an error in the reference manual, 
or if I haven't set up Solr correctly.

I've tried other techniques (not using the split option), such as the 
ones on Yonik's site, but those are slightly dated and I was hoping there 
was a more practical approach with the release of Solr 7.
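For the goal stated at the top (index nested data, then search at the parent and child level and get the structure back), a sketch of one alternative to split follows: sending the book as a parent document with children under the _childDocuments_ key of Solr's JSON update format, then querying it with a block join. Everything specific here is an assumption, not something from this thread: Solr on localhost:8983, the my_collection collection, the hypothetical ids, the hypothetical type_s field used to tell books, reviews, and checkouts apart, and the *_s dynamic-field names for child fields (used so the sketch does not rely on schemaless field guessing for children). As far as I know, Solr 7.1 attaches children as one flat, unlabeled list, so reviewer_background is flattened into each review and the original nesting is not round-tripped exactly.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (hypothetical): Solr on localhost:8983, collection
# my_collection, the ids below, the type_s discriminator, *_s child fields.
SOLR="${SOLR:-http://localhost:8983/solr/my_collection}"

# Parent book with reviews and checkouts as anonymous child documents.
# reviewer_background is flattened into each review document.
BOOK='[
  {
    "id": "book-1234",
    "type_s": "book",
    "book_id": "1234",
    "book_title": "The Martian Chronicles",
    "author": "Ray Bradbury",
    "_childDocuments_": [
      {"id": "book-1234-review-1", "type_s": "review",
       "reviewer_s": "John Smith", "highest_rank_s": "Excellent",
       "latest_review_s": "10/15/2017 10:15:00.000 CST"},
      {"id": "book-1234-review-2", "type_s": "review",
       "reviewer_s": "Adam Smith", "highest_rank_s": "Good",
       "latest_review_s": "10/10/2017 16:18:00.000 CST"},
      {"id": "book-1234-checkout-1", "type_s": "checkout",
       "member_id_s": "aaabbbccc", "member_name_s": "Sam Jackson"},
      {"id": "book-1234-checkout-2", "type_s": "checkout",
       "member_id_s": "bbbcccddd", "member_name_s": "Buddy Jones"}
    ]
  }
]'

# Only talk to Solr if it answers, so the sketch is harmless without a server.
if curl -sf "$SOLR/select?q=*:*&rows=0" > /dev/null; then
  # Plain /update (not /update/json/docs) accepts _childDocuments_;
  # commit=true makes the new block searchable right away.
  curl -s "$SOLR/update?commit=true" -H 'Content-type:application/json' -d "$BOOK"

  # Block join: return books that have a review by Adam Smith, with each
  # book's children attached by the [child] document transformer.
  curl -s -G "$SOLR/select" \
    --data-urlencode 'q={!parent which="type_s:book"}reviewer_s:"Adam Smith"' \
    --data-urlencode 'fl=*,[child parentFilter=type_s:book]'
fi
```

The which= filter must match every parent and no child, which is why a discriminator field like type_s is useful here.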

Any assistance would be appreciated.

Thank you.

Re: Having trouble indexing nested docs using "split" feature.

Posted by David Lee <ni...@comcast.net>.

Sorry about the formatting for the first part, hope this is clearer:

{
     "book_id": "1234",
     "book_title": "The Martian Chronicles",
     "author": "Ray Bradbury",
     "reviews": [
         {
             "reviewer": "John Smith",
             "reviewer_background": {
                 "highest_rank": "Excellent",
                 "latest_review": "10/15/2017 10:15:00.000 CST"
             }
         }, {
             "reviewer": "Adam Smith",
             "reviewer_background": {
                 "highest_rank": "Good",
                 "latest_review": "10/10/2017 16:18:00.000 CST"
             }
         }
     ],
     "checkouts": [
         {
             "member_id": "aaabbbccc",
             "member_name": "Sam Jackson"
         },{
             "member_id": "bbbcccddd",
             "member_name": "Buddy Jones"
         }
     ]
}



Re: Having trouble indexing nested docs using "split" feature.

Posted by Shawn Heisey <ap...@elyograg.org>.

On 12/2/2017 12:55 PM, David Lee wrote:
> {
>    "responseHeader":{
>      "status":0,
>      "QTime":798}}
> 
> Though the status indicates there was no error, when I try to query on 
> the data using *:*, I get this:
> 
> curl 'http://localhost:8983/solr/my_collection/select?q=*:*'
> {
>    "responseHeader":{
>      "zkConnected":true,
>      "status":0,
>      "QTime":6,
>      "params":{
>        "q":"*:*"}},
>    "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>    }}
> 
> So it looks like no documents were actually indexed from above. I'm 
> trying to determine if this is due to an error in the reference manual, 
> or if I haven't set up Solr correctly.

I don't know anything at all about the split feature or the parent/child 
document feature.  I'm going to concentrate on the fact that numFound is 
zero.  With the indexing returning a success response, there should have 
been SOMETHING indexed.

Did you ever do a commit?  It can be an explicit request, or you can 
have it happen automatically in a couple of ways.  If you include a 
commitWithin parameter on the indexing request, then there will be an 
automatic commit within that many milliseconds from when indexing 
started.  You can also configure autoSoftCommit in solrconfig.xml and 
then reload the core/collection or restart Solr.
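A minimal sketch of these commit options, assuming Solr on localhost:8983 and the my_collection collection from the original message. If I recall the stock _default solrconfig.xml correctly, its autoCommit runs with openSearcher=false and autoSoftCommit is disabled (maxTime -1), which would be consistent with a successful update that never shows up in searches. The 1000 ms values and the enable_soft_commit helper name are illustrative assumptions. The autoSoftCommit variant uses the Config API's set-property command instead of hand-editing solrconfig.xml; as I understand it, that change is stored in the collection's config overlay and the collection is reloaded for you.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: Solr on localhost:8983, collection my_collection.
SOLR="${SOLR:-http://localhost:8983/solr/my_collection}"

# Option 1: the reference-guide split example with commitWithin=1000 added,
# asking Solr to commit within 1000 ms.
INDEX_URL="$SOLR/update/json/docs?split=/exams"
INDEX_URL+="&f=first:/first&f=last:/last&f=grade:/grade"
INDEX_URL+="&f=subject:/exams/subject&f=test:/exams/test&f=marks:/exams/marks"
INDEX_URL+="&commitWithin=1000"

DOC='{
  "first": "John", "last": "Doe", "grade": 8,
  "exams": [
    {"subject": "Maths",   "test": "term1", "marks": 90},
    {"subject": "Biology", "test": "term1", "marks": 86}
  ]
}'

# Option 2: an explicit commit (openSearcher defaults to true).
COMMIT_URL="$SOLR/update?commit=true"

# Option 3 (hypothetical helper): turn on autoSoftCommit every 1000 ms via
# the Config API. Not called below because it changes the collection config.
enable_soft_commit() {
  curl -s "$SOLR/config" -H 'Content-type:application/json' \
    -d '{"set-property": {"updateHandler.autoSoftCommit.maxTime": 1000}}'
}

# Only talk to Solr if it answers, so the sketch is harmless without a server.
if curl -sf "$SOLR/select?q=*:*&rows=0" > /dev/null; then
  curl -s "$INDEX_URL" -H 'Content-type:application/json' -d "$DOC"
  curl -s "$COMMIT_URL"
fi
```

Set SOLR=http://host:port/solr/other_collection to target a different collection, and call enable_soft_commit only if you want the setting to persist.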

Unless there is a commit that opens a new searcher, changes made to the 
index will never be visible to clients.
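To check whether a missing commit is the problem, the sketch below counts documents with q=*:*, issues an explicit commit, and counts again; if uncommitted updates were pending, the second number should be higher. It assumes Solr on localhost:8983 and my_collection, and num_found is a hypothetical helper that pulls numFound out of the default JSON response with sed rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: Solr on localhost:8983, collection my_collection.
SOLR="${SOLR:-http://localhost:8983/solr/my_collection}"

# Hypothetical helper: print the numFound value from a /select JSON
# response read on stdin (line-based sed match, not a JSON parser).
num_found() {
  sed -n 's/.*"numFound":\([0-9]*\).*/\1/p'
}

# Only talk to Solr if it answers, so the sketch is harmless without a server.
if curl -sf "$SOLR/select?q=*:*&rows=0" > /dev/null; then
  echo "before commit: $(curl -s "$SOLR/select?q=*:*&rows=0" | num_found)"
  curl -s "$SOLR/update?commit=true" > /dev/null
  echo "after commit:  $(curl -s "$SOLR/select?q=*:*&rows=0" | num_found)"
fi
```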

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The article title says "SolrCloud" but all the information is just as 
applicable to standalone mode.

If you *have* done a commit with openSearcher set to true (which is the 
default setting for openSearcher), then we'll need to examine solr.log, 
and you'll need to be sure that the indexing request happened during the 
period that log covers.
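A sketch for pulling the relevant lines out of solr.log. The location is an assumption: server/logs/solr.log under a hypothetical SOLR_INSTALL directory is only a default guess, and nodes started from the bundled examples (or with SOLR_LOGS_DIR set) may log elsewhere, so set LOG to the file for the node that received the request. The grep patterns assume update requests are logged with their handler path (path=/update...) and commits with a "start commit" line.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: LOG is the solr.log of the node that handled the
# indexing request; the default below is a guess for a plain bin/solr start.
SOLR_INSTALL="${SOLR_INSTALL:-$HOME/solr-7.1.0}"
LOG="${LOG:-$SOLR_INSTALL/server/logs/solr.log}"

if [ -r "$LOG" ]; then
  # Show the last update requests (e.g. /update/json/docs) and commits.
  grep -E 'path=/update|start commit' "$LOG" | tail -n 20
else
  echo "cannot read $LOG; set LOG to the correct solr.log" >&2
fi
```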

Thanks,
Shawn