Posted to solr-user@lucene.apache.org by John Smith <so...@remailme.net> on 2015/10/07 15:32:02 UTC

Unexpected delayed document deletion with atomic updates

Hi,

I'm running into the following problem with XML update messages. The idea
is to record the number of clicks for a document: on each click, a message
such as this one is sent to .../update:

<add>
<doc>
<field name="Id">abc</field>
<field name="Clicks" update="set">1</field>
<field name="Boost" update="set">1.05</field>
</doc>
</add>

(Clicks is an int field; Boost is a float field that is updated to reflect
the change in popularity, using a formula based on the number of clicks.)

At the moment in the dev environment, changes are committed immediately.


When a document is updated, the changes are indeed reflected in the
search results. If I click on the same document again, all goes well.
But when I click on another document, the latter gets updated as
expected while the former is simply deleted. It can no longer be found,
and the admin core Overview page counts one document fewer. If I click on a
third document, the second one is deleted in turn.


The schema is the default one amended to remove unneeded fields and add
new ones, nothing fancy. All fields are stored="true" and there's no
<copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
the same outcome. It looks like a bug to me but I might have overlooked
something? This is my first attempt at atomic updates.

Thanks,
John.


Re: Unexpected delayed document deletion with atomic updates

Posted by John Smith <so...@remailme.net>.
Hi Alessandro,

In the example I set the value to 1, but it's actually incremented in
the code, so with time it should go up. You're right though, I could use
an inc update instead.

John


On 08/10/15 16:45, Alessandro Benedetti wrote:
> Not related to the deletion problem, just a curiosity about your use case:
>
> <field name="Clicks" update="set">1</field>
>
> Have I misunderstood your use case, or should you be using:
>
> inc
>
> Increments a numeric value by a specific amount.
>
> Must be specified as a single numeric value.
>
> Basically, every time you click, you always set the value of that field to
> "1".
> So a document with 1 click will be considered equal to one with 1000 clicks.
> My 2 cents
>
> Cheers


Re: Unexpected delayed document deletion with atomic updates

Posted by Alessandro Benedetti <be...@gmail.com>.
Not related to the deletion problem, just a curiosity about your use case:

<field name="Clicks" update="set">1</field>

Have I misunderstood your use case, or should you be using:

inc

Increments a numeric value by a specific amount.

Must be specified as a single numeric value.

Basically, every time you click, you always set the value of that field to
"1".
So a document with 1 click will be considered equal to one with 1000 clicks.
My 2 cents
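
For illustration, an atomic update using inc might look like the message
below. This is only a sketch: it reuses the Id, Clicks and Boost fields from
the original message, assumes Clicks should go up by 1 per click, and keeps
Boost as a "set" with a placeholder value, since it is computed by a formula
not shown in this thread.

```xml
<add>
  <doc>
    <!-- unique key of the document to update -->
    <field name="Id">abc</field>
    <!-- add 1 to the stored value instead of overwriting it -->
    <field name="Clicks" update="inc">1</field>
    <!-- placeholder value; the real one comes from the client-side formula -->
    <field name="Boost" update="set">1.05</field>
  </doc>
</add>
```

With inc, the client does not need to know the current click count before
sending the update.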

Cheers

On 8 October 2015 at 14:10, John Smith <so...@remailme.net> wrote:

> Well, every day we update a lot of documents (usually several million)
> so the DIH is a good fit.
>
> Calling the update chain would make sense there: after all a data import
> is just a batch update. Otherwise, the same operations would have to be
> made upfront, possibly in another environment and/or language. That's
> probably what I'm gonna do anyway.
>
> Thanks for your help!
> John


-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Unexpected delayed document deletion with atomic updates

Posted by John Smith <so...@remailme.net>.
Well, every day we update a lot of documents (usually several million)
so the DIH is a good fit.

Calling the update chain would make sense there: after all a data import
is just a batch update. Otherwise, the same operations would have to be
made upfront, possibly in another environment and/or language. That's
probably what I'm gonna do anyway.

Thanks for your help!
John


On 08/10/15 13:39, Upayavira wrote:
> You can either specify the update chain via an update.chain request
> parameter, or you can configure a new request handler with its own URL
> and separate update.chain value.
>
> I have no idea how you would then reference that in the DIH - I've never
> really used it.
>
> Upayavira


Re: Unexpected delayed document deletion with atomic updates

Posted by Upayavira <uv...@odoko.co.uk>.
You can either specify the update chain via an update.chain request
parameter, or you can configure a new request handler with its own URL
and separate update.chain value.

I have no idea how you would then reference that in the DIH - I've never
really used it.
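
As a rough sketch of the second option, a dedicated handler in solrconfig.xml
could carry the chain as a default. The handler path /update/custom and the
chain name mychain are made up for the example; the chain itself would have
to be defined elsewhere in the same file.

```xml
<requestHandler name="/update/custom" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <!-- requests to this URL go through the named chain unless they override it -->
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>
```

The first option needs no configuration change: the parameter is simply
appended to the request, e.g. .../update?update.chain=mychain.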

Upayavira

On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
> After some further investigation, for those interested: the
> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
> guess copied over from another collection). The initial import had been
> made using a data import handler: I suppose the update chain isn't
> called in this process and no signature field is created - am I right?
> 
> The first time a document was updated, a signature field with value
> "0000000000000000" was added. The next time, the same signature was
> generated for the new update, which triggered the deletion of all
> documents with the same signature (i.e. the first one) as overwriteDupes
> was set to true. Correct behavior but quite tricky...
> 
> So my conclusion here (please correct me if I'm wrong) is of course to
> fix the signature configuration problem, but also to manage calling the
> update chain (or maybe a simplified one, e.g. by skipping logging) in
> the data import handler. Is there an easy way to do this? Conceptually,
> shouldn't the update chain be callable from the data import process -
> maybe it is?
> 
> John

Re: Unexpected delayed document deletion with atomic updates

Posted by John Smith <so...@remailme.net>.
After some further investigation, for those interested: the
SignatureUpdateProcessorFactory fields were somehow mis-configured (I
guess copied over from another collection). The initial import had been
made using a data import handler: I suppose the update chain isn't
called in this process and no signature field is created - am I right?

The first time a document was updated, a signature field with value
"0000000000000000" was added. The next time, the same signature was
generated for the new update, which triggered the deletion of all
documents with the same signature (i.e. the first one) as overwriteDupes
was set to true. Correct behavior but quite tricky...
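
For readers who have not seen one, a dedupe chain in solrconfig.xml typically
looks something like the sketch below. The parameter names are the ones
documented for SignatureUpdateProcessorFactory, but the values (signature
field name, source fields, signature class) are assumptions for illustration,
not the configuration from this thread.

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that receives the computed signature -->
    <str name="signatureField">signature</str>
    <!-- when true, existing documents with the same signature are deleted -->
    <bool name="overwriteDupes">true</bool>
    <!-- fields hashed into the signature; these names are hypothetical -->
    <str name="fields">Title,Description</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

If the listed fields are absent from the incoming update, as they would be in
an atomic update that only carries Id, Clicks and Boost, every update would
plausibly hash to the same value, which matches the constant signature
described above.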

So my conclusion here (please correct me if I'm wrong) is of course to
fix the signature configuration problem, but also to manage calling the
update chain (or maybe a simplified one, e.g. by skipping logging) in
the data import handler. Is there an easy way to do this? Conceptually,
shouldn't the update chain be callable from the data import process -
maybe it is?

John


On 08/10/15 09:43, Upayavira wrote:
> Yay!
>
> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>> Yes indeed, the update chain had been activated... I commented it out
>> again and the problem vanished.
>>
>> Good job, thanks Erick and Upayavira!
>> John
>>
>>
>> On 08/10/15 08:58, Upayavira wrote:
>>> Look for the DedupUpdateProcessor in an update chain.
>>>
>>> It is there, but commented out IIRC, in the techproducts sample
>>> configs.
>>>
>>> Perhaps you uncommented it to use your own update processors, but didn't
>>> remove that component?
>>>
>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>>>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
>>>> INFO level, the update request just gets mentioned. No exception. I
>>>> reran it with the DEBUG level, but most of the log was related to jetty.
>>>> Here's a line I noticed though:
>>>>
>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>>>> {wt=json&commit=true&update.chain=dedupe}
>>>>
>>>> The update.chain parameter wasn't part of the original request, and
>>>> "dedupe" looks suspicious to me. Perhaps should I investigate further
>>>> there?
>>>>
>>>> Thanks,
>>>> John.
>>>>
>>>>
>>>> On 08/10/15 08:25, John Smith wrote:
>>>>> The ids are all different: they're unique numbers followed by a couple
>>>>> of keywords. I've made a test with a small collection of 10 documents to
>>>>> make sure I can manage them manually: all ids are confirmed as different.
>>>>>
>>>>> I also dumped the exact command, here's one example:
>>>>>
>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
>>>>> name="Clicks" update="set">1</field><field name="Boost"
>>>>> update="set">1.8701925463775</field></doc></add>
>>>>>
>>>>> It's sent as the body of a POST request to
>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>>>> Content-Type: text/xml header. I still noted the consistent loss of
>>>>> another document with the update above.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 08/10/15 00:38, Upayavira wrote:
>>>>>> What ID are you using? Are you possibly using the same ID field for
>>>>>> both, so the second document you visit causes the first to be
>>>>>> overwritten?
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>>>>>> This certainly should not be happening. I'd
>>>>>>> take a careful look at what you actually send.
>>>>>>> My _guess_ is that you're not sending the update
>>>>>>> command you think you are....
>>>>>>>
>>>>>>> As a test you could just curl (or use post.jar) to
>>>>>>> send these types of commands up individually.
>>>>>>>
>>>>>>> Perhaps looking at the solr log would help too...
>>>>>>>
>>>>>>> Best,
>>>>>>> Erick
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <so...@remailme.net>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm bumping on the following problem with update XML messages. The idea
>>>>>>>> is to record the number of clicks for a document: each time, a message
>>>>>>>> is sent to .../update such as this one:
>>>>>>>>
>>>>>>>> <add>
>>>>>>>> <doc>
>>>>>>>> <field name="Id">abc</field>
>>>>>>>> <field name="Clicks" update="set">1</field>
>>>>>>>> <field name="Boost" update="set">1.05</field>
>>>>>>>> </doc>
>>>>>>>> </add>
>>>>>>>>
>>>>>>>> (Clicks is an int field; Boost is a float field, it's updated to reflect
>>>>>>>> the change in popularity using a formula based on the number of clicks).
>>>>>>>>
>>>>>>>> At the moment in the dev environment, changes are committed immediately.
>>>>>>>>
>>>>>>>>
>>>>>>>> When a document is updated, the changes are indeed reflected in the
>>>>>>>> search results. If I click on the same document again, all goes well.
>>>>>>>> But  when I click on an other document, the latter gets updated as
>>>>>>>> expected but the former is plainly deleted. It can no longer be found
>>>>>>>> and the admin core Overview page counts 1 document less. If I click on a
>>>>>>>> 3rd document, so goes the 2nd one.
>>>>>>>>
>>>>>>>>
>>>>>>>> The schema is the default one amended to remove unneeded fields and add
>>>>>>>> new ones, nothing fancy. All fields are stored="true" and there's no
>>>>>>>> <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
>>>>>>>> the same outcome. It looks like a bug to me but I might have overlooked
>>>>>>>> something? This is my first attempt at atomic updates.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John.
>>>>>>>>

Re: Unexpected delayed document deletion with atomic updates

Posted by Upayavira <uv...@odoko.co.uk>.
Yay!

On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
> Yes indeed, the update chain had been activated... I commented it out
> again and the problem vanished.
> 
> Good job, thanks Erick and Upayavira!
> John

Re: Unexpected delayed document deletion with atomic updates

Posted by John Smith <so...@remailme.net>.
Yes indeed, the update chain had been activated... I commented it out
again and the problem vanished.

Good job, thanks Erick and Upayavira!
John


On 08/10/15 08:58, Upayavira wrote:
> Look for the DedupUpdateProcessor in an update chain.
>
> that is there, but commented out IIRC in the techproducts sample
> configs.
>
> Perhaps you uncommented it to use your own update processors, but didn't
> remove that component?


Re: Unexpected delayed document deletion with atomic updates

Posted by Upayavira <uv...@odoko.co.uk>.
Look for the dedupe processor (SignatureUpdateProcessorFactory) in an
update chain.

That is there, but commented out IIRC, in the techproducts sample
configs.

Perhaps you uncommented it to use your own update processors, but didn't
remove that component?
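
One way to check this without reading the whole file by eye is a small
script that lists the active (not commented out) update chains containing
the signature processor, plus any request handler or initParams block
that sets update.chain by default. This is a rough sketch: the function
name is made up, and it assumes a solrconfig.xml that parses as plain XML
without XInclude or external entities.

```python
# Sketch: report active dedupe chains and default update.chain settings.
# find_dedupe_config is a hypothetical helper, not part of Solr.
import xml.etree.ElementTree as ET


def find_dedupe_config(solrconfig_xml):
    """Return (chains, defaults) found in the given solrconfig.xml text."""
    # XML comments are skipped by the parser, so commented-out chains
    # are not reported.
    root = ET.fromstring(solrconfig_xml)

    chains = []
    for chain in root.iter("updateRequestProcessorChain"):
        for proc in chain.iter("processor"):
            if "SignatureUpdateProcessorFactory" in proc.get("class", ""):
                settings = {child.get("name"): (child.text or "").strip()
                            for child in proc}
                chains.append((chain.get("name"), settings))

    defaults = []
    for tag in ("requestHandler", "initParams"):
        for elem in root.iter(tag):
            # Covers the "defaults", "appends" and "invariants" lists alike.
            for param in elem.findall("./lst/str[@name='update.chain']"):
                where = elem.get("name") or elem.get("path")
                defaults.append((tag, where, (param.text or "").strip()))

    return chains, defaults
```

Calling find_dedupe_config(open("solrconfig.xml").read()) returns both
lists. A chain with overwriteDupes set to true together with an /update
handler that defaults to it would explain an update.chain parameter
showing up in the log although the request never sent it.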

On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
> INFO level, the update request just gets mentioned. No exception. I
> reran it with the DEBUG level, but most of the log was related to jetty.
> Here's a line I noticed though:
> 
> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
> {wt=json&commit=true&update.chain=dedupe}
> 
> The update.chain parameter wasn't part of the original request, and
> "dedupe" looks suspicious to me. Perhaps should I investigate further
> there?
> 
> Thanks,
> John.

Re: Unexpected delayed document deletion with atomic updates

Posted by John Smith <so...@remailme.net>.
Oh, I forgot Erick's mention of the logs: there's nothing unusual at
INFO level, the update request just gets mentioned. No exception. I
reran it at DEBUG level, but most of the log was related to Jetty.
Here's a line I noticed though:

org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
{wt=json&commit=true&update.chain=dedupe}

The update.chain parameter wasn't part of the original request, and
"dedupe" looks suspicious to me. Perhaps I should investigate further there?

Thanks,
John.


On 08/10/15 08:25, John Smith wrote:
> The ids are all different: they're unique numbers followed by a couple
> of keywords. I've made a test with a small collection of 10 documents to
> make sure I can manage them manually: all ids are confirmed as different.
>
> I also dumped the exact command, here's one example:
>
> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
> name="Clicks" update="set">1</field><field name="Boost"
> update="set">1.8701925463775</field></doc></add>
>
> It's sent as the body of a POST request to
> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> Content-Type: text/xml header. I still noted the consistent loss of
> another document with the update above.
>
> John


Re: Unexpected delayed document deletion with atomic updates

Posted by John Smith <so...@remailme.net>.
The ids are all different: they're unique numbers followed by a couple
of keywords. I've made a test with a small collection of 10 documents to
make sure I can manage them manually: all ids are confirmed as different.

I also dumped the exact command, here's one example:

<add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
name="Clicks" update="set">1</field><field name="Boost"
update="set">1.8701925463775</field></doc></add>

It's sent as the body of a POST request to
http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
Content-Type: text/xml header. I still noted the consistent loss of
another document with the update above.

John


On 08/10/15 00:38, Upayavira wrote:
> What ID are you using? Are you possibly using the same ID field for
> both, so the second document you visit causes the first to be
> overwritten?
>
> Upayavira


Re: Unexpected delayed document deletion with atomic updates

Posted by Upayavira <uv...@odoko.co.uk>.
What ID are you using? Are you possibly using the same ID value for
both, so the second document you visit causes the first to be
overwritten?

Upayavira

On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> This certainly should not be happening. I'd
> take a careful look at what you actually send.
> My _guess_ is that you're not sending the update
> command you think you are....
> 
> As a test you could just curl (or use post.jar) to
> send these types of commands up individually.
> 
> Perhaps looking at the solr log would help too...
> 
> Best,
> Erick

Re: Unexpected delayed document deletion with atomic updates

Posted by Erick Erickson <er...@gmail.com>.
This certainly should not be happening. I'd
take a careful look at what you actually send.
My _guess_ is that you're not sending the update
command you think you are....

As a test you could just curl (or use post.jar) to
send these types of commands up individually.
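
If curl isn't at hand, the same isolated test can be sketched in a few
lines of Python. The helper names below are made up for illustration; the
field names come from the original message, and "Id" is assumed to be the
unique key.

```python
# Sketch: build one atomic "set" update and POST it on its own.
# build_atomic_update and post_update are hypothetical helper names.
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_atomic_update(doc_id, sets, id_field="Id"):
    """Return an <add> message that sets each field in `sets` on one document."""
    parts = ["<add><doc>",
             "<field name=%s>%s</field>" % (quoteattr(id_field),
                                            escape(str(doc_id)))]
    for name, value in sets.items():
        parts.append('<field name=%s update="set">%s</field>'
                     % (quoteattr(name), escape(str(value))))
    parts.append("</doc></add>")
    return "".join(parts)


def post_update(url, xml_body):
    """POST the XML body to a Solr update URL; return (status, response text)."""
    req = urllib.request.Request(url,
                                 data=xml_body.encode("utf-8"),
                                 headers={"Content-Type": "text/xml"},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Printing the result of build_atomic_update("abc", {"Clicks": 1, "Boost":
1.05}) before passing it to post_update, together with the update URL of
the core, shows exactly what goes over the wire for each request.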

Perhaps looking at the solr log would help too...

Best,
Erick

On Wed, Oct 7, 2015 at 6:32 AM, John Smith <so...@remailme.net> wrote:
> Hi,
>
> I'm bumping on the following problem with update XML messages. The idea
> is to record the number of clicks for a document: each time, a message
> is sent to .../update such as this one:
>
> <add>
> <doc>
> <field name="Id">abc</field>
> <field name="Clicks" update="set">1</field>
> <field name="Boost" update="set">1.05</field>
> </doc>
> </add>
>
> (Clicks is an int field; Boost is a float field, it's updated to reflect
> the change in popularity using a formula based on the number of clicks).
>
> At the moment in the dev environment, changes are committed immediately.
>
>
> When a document is updated, the changes are indeed reflected in the
> search results. If I click on the same document again, all goes well.
> But  when I click on an other document, the latter gets updated as
> expected but the former is plainly deleted. It can no longer be found
> and the admin core Overview page counts 1 document less. If I click on a
> 3rd document, so goes the 2nd one.
>
>
> The schema is the default one amended to remove unneeded fields and add
> new ones, nothing fancy. All fields are stored="true" and there's no
> <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
> the same outcome. It looks like a bug to me but I might have overlooked
> something? This is my first attempt at atomic updates.
>
> Thanks,
> John.
>