Posted to solr-user@lucene.apache.org by ru...@comcast.net on 2017/04/20 19:15:20 UTC

Advice on how to work with pure JSON data.

I have looked at many examples on how to do what I want, but they tend to only show fragments or they 
are based on older versions of Solr. I'm hoping there are new features that make what I'm doing easier. 

I am running version 6.5 and am testing by running in cloud mode but only on a single machine. 

Basically, I have a large number of documents stored as JSON in individual files. I want to take that JSON 
document and index it without having to do any pre-processing, etc. I also need to be able to write newly indexed 
JSON data back to individual files in the same format. 

For example, let's say I have a json document that looks like the following: 

{
  "id" : "bb903493-55b0-421f-a83e-2199ea11e136",
  "productName_s" : "UsefulWidget",
  "productCategory_s" : "tool",
  "suppliers" : [
    {
      "id" : " bb903493-55b0-421f-a83e-2199ea11e221",
      "name_s" : "Acme Tools",
      "productNumber_s" : "10342UW"
    }, {
      "id" : " bb903493-55b0-421a-a83e-2199ea11e445",
      "name_s" : "Snappy Tools",
      "productNumber_s" : "ST-X100023"
    }
  ],
  "resellers" : [
    {
      "id" : "cc 903493-55b0-421f-a83e-2199ea11e221",
      "name_s" : "Target",
      "productSKU_s" : "TA092310342UW"
    }, {
      "id" : "bc903493-55b0-421a-a83e-2199ea11e445",
      "name_s" : "Wal-Mart",
      "productSKU_s" : "029342ABLSWM"
    }
  ]
}

I know I can use the /update/json/docs handler to insert the above but from what I understand, I'd have to set up parameters 
telling it how to split the children, etc. Though that is a bit of a pain, I can make that happen. 
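
For readers following along, here is a minimal sketch of what "parameters telling it how to split the children" could look like. It assumes the `split` parameter of /update/json/docs accepts several paths separated by "|", with "/" keeping the root object as the parent, as the reference guide's "Transforming and Indexing Custom JSON" page describes; please verify that against your Solr version. The host, the collection name `products`, and the class name `JsonDocsUrl` are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Hypothetical helper: builds the request URL for posting one JSON file
// to the /update/json/docs handler with a nested-document split.
final class JsonDocsUrl {

    static String build(String baseUrl, String collection, String... childPaths) {
        // "/" is the root (parent) document; each further path names an
        // array whose elements should become child documents (assumption).
        StringJoiner split = new StringJoiner("|");
        split.add("/");
        for (String path : childPaths) {
            split.add(path);
        }
        // The split value contains "/" and "|", so it is URL-encoded.
        return baseUrl + "/" + collection + "/update/json/docs?split="
                + URLEncoder.encode(split.toString(), StandardCharsets.UTF_8);
    }
}
```

Calling `JsonDocsUrl.build("http://localhost:8983/solr", "products", "/suppliers", "/resellers")` yields a URL to which the JSON file can be POSTed with Content-Type application/json.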

The problem is that, when I then try to query for the data, it comes back with _childDocuments_ instead of the names of the 
child document lists. So, how can I have Solr return the document as it was originally indexed (I know it would be embedded 
in the results structure, but I can deal with that)? 
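
One client-side way to get the original shape back is to regroup the children after the query returns. The sketch below is not a Solr feature. It assumes each child document was indexed with an extra discriminator field (here the hypothetical `childType_s`) holding the name of the list it came from, and it works on plain Java maps such as those produced by a JSON parser. The class name `ChildRegrouper` is also made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: moves entries of "_childDocuments_" back under
// named lists, using a discriminator field stored on each child.
final class ChildRegrouper {

    static final String CHILDREN_KEY = "_childDocuments_";

    @SuppressWarnings("unchecked")
    static Map<String, Object> regroup(Map<String, Object> doc, String typeField) {
        Map<String, Object> out = new LinkedHashMap<>();
        // Copy every parent field except the flat child list.
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (!CHILDREN_KEY.equals(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        Object children = doc.get(CHILDREN_KEY);
        if (children instanceof List) {
            for (Object c : (List<?>) children) {
                Map<String, Object> child = new LinkedHashMap<>((Map<String, Object>) c);
                // Strip the discriminator so the child matches the source file.
                Object type = child.remove(typeField);
                // Children without a discriminator stay under the generic key.
                String key = (type == null) ? CHILDREN_KEY : type.toString();
                ((List<Object>) out.computeIfAbsent(key, k -> new ArrayList<Object>())).add(child);
            }
        }
        return out;
    }
}
```

The input map is left untouched; the returned map can be serialized back to a file with whatever JSON library is already in use.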

I am running version 6.5 and I am hoping there is a method I haven't seen documented that can do this. If not, can someone 
point me to some examples of how to do this another way? 

If there is no easy way to do this with the current version, can someone point me to a good resource for writing my own 
handlers? 

Thank you.

Re: Advice on how to work with pure JSON data.

Posted by Mikhail Khludnev <mk...@apache.org>.

Hello,
See below.

On Fri, Apr 21, 2017 at 8:21 AM, <ru...@comcast.net> wrote:

> One thing I forgot to mention in my original post is that I wish to do
> this using the SolrJ client.
> I have my own rest server that presents a common API to our users, but the
> back-end can be
> anything I wish. I have been using "that other Lucene based product" :),
> but I wish to stick to
> a product that is more open and that perhaps I can contribute to.
>
> I've searched for SolrJ examples for child documents and unfortunately
> there are far too
> many references to implementations based off of older versions of Solr.
> Specifically, I would
> like to insert beans with multiple child collections in them, but the
> latest I've read says this
> is not currently possible. Is that still true?
>
Right. That is how it was done in SOLR-1945.
It now throws "cannot have more than one Field with child=true".
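
A possible workaround, offered only as a sketch and not as anything SolrJ provides: merge the child collections into a single list before indexing and tag each child with the name of the collection it came from. Plain maps stand in for whatever document type the client uses, and the field name passed as `typeField` (for example `childType_s`) and the class name `ChildFlattener` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns several named child collections into one
// flat list, recording each child's origin in a discriminator field.
final class ChildFlattener {

    static List<Map<String, Object>> flatten(
            Map<String, List<Map<String, Object>>> collections, String typeField) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, List<Map<String, Object>>> e : collections.entrySet()) {
            for (Map<String, Object> child : e.getValue()) {
                // Copy so the caller's objects are not modified.
                Map<String, Object> tagged = new LinkedHashMap<>(child);
                tagged.put(typeField, e.getKey());
                out.add(tagged);
            }
        }
        return out;
    }
}
```

The flat list can then be attached to the parent as its single set of child documents, and the tag makes it possible to tell suppliers from resellers at query time.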

>
> In short, It isn't so important that REST based requests / responses from
> Solr are pure JSON
> so long as I can do what I want from the java client.
>
> Do you know if there have been recent additions / enhancements up through
> 6.5 that make
> this more straight-forward?
>
Nothing new there.


>
> Thanks
>
>


-- 
Sincerely yours
Mikhail Khludnev

Re: Advice on how to work with pure JSON data.

Posted by ru...@comcast.net.

One thing I forgot to mention in my original post is that I wish to do this using the SolrJ client. 
I have my own REST server that presents a common API to our users, but the back-end can be 
anything I wish. I have been using "that other Lucene based product" :), but I wish to stick to 
a product that is more open and that perhaps I can contribute to. 

I've searched for SolrJ examples for child documents and unfortunately there are far too 
many references to implementations based on older versions of Solr. Specifically, I would 
like to insert beans with multiple child collections in them, but the latest I've read says this 
is not currently possible. Is that still true? 

In short, it isn't so important that REST based requests / responses from Solr are pure JSON 
so long as I can do what I want from the Java client. 

Do you know if there have been recent additions / enhancements up through 6.5 that make 
this more straightforward? 

Thanks 


Re: Advice on how to work with pure JSON data.

Posted by Mikhail Khludnev <mk...@apache.org>.

This is one of the features of the epic
https://issues.apache.org/jira/browse/SOLR-10144.
Until it is done, the only way to achieve this is to set the many params
of the [subquery] transformer correctly:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[subquery]

Note that I assume here that the children mapping is static, i.e. there is a limited
list of optional scopes.
Indexing and searching arbitrary JSON is an esoteric (XML-DB-like) problem.
Also, beware of https://issues.apache.org/jira/browse/SOLR-10500. I hope to
fix it soon.
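
As an illustration of the "many params" involved, here is a rough sketch that assembles one [subquery] per child list. It assumes a static list of child names, a field on every child that stores the parent's id (hypothetical `parentId_s`), and a discriminator field on every child (hypothetical `childType_s`); neither field exists in the sample document above. The parameter shapes (`name:[subquery]` in `fl`, `name.q` with `{!terms f=... v=$row.id}`, `name.fq`, `name.rows`) follow the transformer's documentation, but the class itself is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds request parameters that return each
// child list under its own name via the [subquery] transformer.
final class SubqueryParams {

    static Map<String, String> build(String parentQuery, String parentRefField,
                                     String typeField, String... childNames) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", parentQuery);
        StringBuilder fl = new StringBuilder("*");
        for (String name : childNames) {
            // Each named pseudo-field gets its own prefixed subquery params.
            fl.append(',').append(name).append(":[subquery]");
            // Select children whose parent reference equals this row's id.
            params.put(name + ".q", "{!terms f=" + parentRefField + " v=$row.id}");
            // Keep only children of this kind (assumed discriminator field).
            params.put(name + ".fq", typeField + ":" + name);
            params.put(name + ".rows", "100");
        }
        params.put("fl", fl.toString());
        return params;
    }
}
```

The main query should match parent documents only, and the row limit of 100 per child list is an arbitrary choice for the sketch.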



-- 
Sincerely yours
Mikhail Khludnev