Posted to solr-user@lucene.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2018/09/11 16:23:11 UTC

Update partial document

Hi Solr gurus :)

I have a delicious question (one that I'm struggling with), and I really
hope that someone can help me.

There is a document with many fields, but I have to modify only a few of them.

I thought I could use an atomic update, but it seems that I cannot replace
an entire list of dynamic fields.

Let me explain the problem with an example. Using the schemaless
configuration, I have a dynamic field:

<dynamicField name="attr_*" type="text_general" indexed="true"
stored="true" multiValued="true"/>

And then I have a document:

     {
        "id":"aaa",
        "value_i":10,
        "attr_1":["a"]
     }

I expected to be able to remove attr_1 and add attr_3 with one atomic update,
like this:

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/gettingstarted/update?versions=true&commit=true' \
  --data-binary '
 [
    {
      "id" : "aaa",
      "attr_" : { "set" : null },
      "attr_3" : { "set" : "x" }
    }
]'

But as a result I only get a new attr_3 field (the attr_1 field is still
there):

     {
        "id":"aaa",
        "value_i":10,
        "attr_1":["a"],
        "attr_3":["x"]
     }

So it seems that, for this particular case, I first have to read the
document and only then can I update it.

Do you think there are other options?
Can I use the StatelessScriptUpdateProcessorFactory?
Should I write my own UpdateProcessor?

Thanks in advance for your time.
Vincenzo

-- 
Vincenzo D'Amore

Re: Update partial document

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Mikhail, Shawn,

Thanks for your prompt answers.
The problem is that the indexed documents have dozens of fields, and usually
the fields are different for each document.

For example, document id 1 has a few generic fields like title and
description, plus all its attributes: attr_1224, attr_4343, attr_4454,
attr_5345, and so on (dozens of them).
Document id 2, like the former, has its generic fields plus attr_435,
attr_165, attr_986, attr_12, and so on (again dozens).

In other words, for each document I have to update, I cannot know in advance
which attr_# fields I have to remove.

The update request contains only the list of new fields/values that have to
replace the old ones in the document, and yes, this list can be different
from the fields in the original document.
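
The read-then-update approach mentioned earlier in the thread could be sketched roughly as follows: fetch the stored document first, then build a single atomic update that sets every existing attr_* field to null and sets the new ones. This is only a sketch under stated assumptions. The function name build_replace_update and its parameters are hypothetical, and the stored document is hard-coded here rather than fetched from Solr.

```python
import json


def build_replace_update(stored_doc, new_attrs, prefix="attr_", id_field="id"):
    """Build one atomic-update document that replaces all prefixed fields.

    stored_doc: the document as currently stored (assumed already fetched).
    new_attrs:  mapping of field name -> new value(s) for the prefixed fields.
    Hypothetical helper for illustration; not part of Solr or any client.
    """
    update = {id_field: stored_doc[id_field]}
    # Every stored field with the prefix that is absent from the new list
    # must be named explicitly and set to null to be removed.
    for name in stored_doc:
        if name.startswith(prefix) and name not in new_attrs:
            update[name] = {"set": None}
    # The new fields simply overwrite whatever was there before.
    for name, values in new_attrs.items():
        update[name] = {"set": values}
    return update


# The example document from the original message (hard-coded, not fetched).
stored = {"id": "aaa", "value_i": 10, "attr_1": ["a"]}
update = build_replace_update(stored, {"attr_3": ["x"]})
print(json.dumps([update]))
```

The printed JSON list could then be POSTed to the /update handler, as in the curl command from the original message. Fields without the prefix, such as value_i, are left out of the update and therefore stay untouched. Note that another writer could change the document between the read and the update.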





-- 
Vincenzo D'Amore

Re: Update partial document

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/11/2018 10:23 AM, Vincenzo D'Amore wrote:
> I expected to be able to remove attr_1 and add attr_3 with one atomic update,
> like this:
>
> curl -X POST -H 'Content-Type: application/json' \
>   'http://localhost:8983/solr/gettingstarted/update?versions=true&commit=true' \
>   --data-binary '
>   [
>      {
>        "id" : "aaa",
>        "attr_" : { "set" : null },
>        "attr_3" : { "set" : "x" }
>      }
> ]'

This would probably have worked if you had used "attr_1" instead of 
"attr_".  There is no field named "attr_" in your document, so that line 
does nothing.  Fields in atomic updates must be fully specified. I am 
not aware of any kind of wildcard support.
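
As a minimal sketch of what "fully specified" means in practice, the following Python snippet builds the JSON body for such an update with every field named explicitly. The helper build_atomic_update is a hypothetical name, not part of Solr or any client library. Only the {"set": ...} modifier syntax comes from Solr's atomic-update format.

```python
import json


def build_atomic_update(doc_id, fields_to_remove, fields_to_set):
    """Return one atomic-update document using only exact field names.

    Hypothetical helper for illustration only.
    """
    update = {"id": doc_id}
    for name in fields_to_remove:
        # {"set": null} removes all values of the named field.
        update[name] = {"set": None}
    for name, value in fields_to_set.items():
        update[name] = {"set": value}
    return update


# Remove attr_1 (exact name, no wildcard) and set attr_3 in one update.
payload = json.dumps([build_atomic_update("aaa", ["attr_1"], {"attr_3": "x"})])
print(payload)
```

The printed string can be used as the --data-binary body of the curl command quoted above.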

> But as a result I only get a new attr_3 field (the attr_1 field is still
> there):
>
>       {
>          "id":"aaa",
>          "value_i":10,
>          "attr_1":["a"],
>          "attr_3":["x"]
>       }
>
> So it seems that, for this particular case, I first have to read the
> document and only then can I update it.
>
> Do you think there are other options?
> Can I use the StatelessScriptUpdateProcessorFactory?
> Should I write my own UpdateProcessor?

Thanks,
Shawn


Re: Update partial document

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Vincenzo.

What about adding a 1 to the field name in "attr_" : { "set" : null }, so
that it reads "attr_1"?


-- 
Sincerely yours
Mikhail Khludnev