Posted to solr-user@lucene.apache.org by "Dan ." <ro...@gmail.com> on 2017/05/04 13:40:37 UTC

in-place atomic updates for numeric docValue field

Hi,

I have a field like this:

<fieldType name="integer" class="solr.TrieIntField" omitNorms="true"/>
<field name="popularity" type="integer" indexed="false" stored="false"
docValues="true" multiValued="false"/>

so that I can do fast in-place atomic updates.

However, if I do e.g.

curl -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/collection/update?commit=true' \
  --data-binary '
[{
 "id":"my_id",
 "popularity":{"set":null}
}]'

then I'd expect the popularity field to be removed; however, it is not.

Is this a bug, or is there a known workaround for in-place atomic
updates?

Cheers,
Dan

Re: in-place atomic updates for numeric docValue field

Posted by Emir Arnautovic <em...@sematext.com>.
Hi Dan,

In-place updates work because the index size does not change. Atomic
updates (like any other update) flag the existing doc as deleted and write
it again, so even an update that removes some fields makes the index
larger until the segment holding the deleted doc is merged.

In-place updates change the existing doc values file: think of it as
4 bytes being overwritten, so some value has to be set. Removing the value
would require all following bytes to be shifted, i.e. creating a new file.
That is why in-place updates work only for numeric (fixed-width) fields
and support only set and inc.
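
As a rough illustration of that mental picture, here is a toy model in
Python. It is only a sketch: the 4-byte big-endian slot layout and the helper
names (make_column, set_in_place, read) are assumptions made for the example,
not Lucene's actual docValues file format.

```python
import struct

# Toy model only: one fixed-width 4-byte slot per document, in doc order.
# This is NOT Lucene's real docValues encoding, just the fixed-width idea.
SLOT = struct.Struct(">i")  # big-endian signed 32-bit int


def make_column(values):
    """Pack one int per document into a flat byte buffer."""
    buf = bytearray(SLOT.size * len(values))
    for doc_id, value in enumerate(values):
        SLOT.pack_into(buf, doc_id * SLOT.size, value)
    return buf


def set_in_place(buf, doc_id, value):
    """Overwrite one slot; buffer length and all other slots stay untouched."""
    SLOT.pack_into(buf, doc_id * SLOT.size, value)


def read(buf, doc_id):
    """Read the value stored in the slot of the given document."""
    return SLOT.unpack_from(buf, doc_id * SLOT.size)[0]


column = make_column([10, 20, 30])
size_before = len(column)
set_in_place(column, 1, 99)  # a "set" keeps the size unchanged

# "Removing" doc 1's slot shifts every later slot to a new offset,
# so doc 2's value is now found where doc 1's value used to be.
shifted = bytearray(column)
del shifted[SLOT.size:2 * SLOT.size]
```

In this model a set (or an inc, which is a read followed by a set) never
changes the buffer size, while dropping a slot moves every value after it.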

Emir


On 04.05.2017 16:57, Dan . wrote:
> Hi Emir,
>
> Yes, I thought of using -1 to represent null, but this makes the index
> unnecessarily larger, particularly if we have to default all docs to this
> value.
>
> Cheers,
> Dan

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: in-place atomic updates for numeric docValue field

Posted by "Dan ." <ro...@gmail.com>.
Hi Emir,

Yes, I thought of using -1 to represent null, but this makes the index
unnecessarily larger, particularly if we have to default all docs to this
value.

Cheers,
Dan

On 4 May 2017 at 15:16, Emir Arnautovic <em...@sematext.com>
wrote:

> Hi Dan,
>
> Remove does not make sense when it comes to in-place updates of docValues:
> the field has to have some value, so the only thing you can do is designate
> some int value to mean null.
>
> HTH,
> Emir

Re: in-place atomic updates for numeric docValue field

Posted by Emir Arnautovic <em...@sematext.com>.
Hi Dan,

Remove does not make sense when it comes to in-place updates of
docValues: the field has to have some value, so the only thing you can do
is designate some int value to mean null.
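
A minimal sketch of that sentinel approach in Python, assuming that -1 never
occurs as a real popularity value. The names NULL_SENTINEL, in_place_set and
decode are made up for the example; only the JSON shape of the "set"
operation comes from the original question.

```python
import json

# Assumption for this sketch: -1 is never a legitimate popularity value.
NULL_SENTINEL = -1


def in_place_set(doc_id, value):
    """Build the JSON body for a 'set' on popularity; None becomes the sentinel."""
    stored = NULL_SENTINEL if value is None else value
    return json.dumps([{"id": doc_id, "popularity": {"set": stored}}])


def decode(stored):
    """Map the sentinel read back from the index to None on the client side."""
    return None if stored == NULL_SENTINEL else stored


body = in_place_set("my_id", None)
```

The resulting body can be sent with the same curl command as in the original
question; client code then treats the sentinel as "no value" when reading.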

HTH,
Emir





Re: in-place atomic updates for numeric docValue field

Posted by Rick Leir <rl...@leirtech.com>.
Dan, if you don't mind using a float docValues field, you could store
NaN (Not a Number). But any sorting operations or size comparisons would
be slower. Floats might be entirely inappropriate, but I thought it was
worth a mention. Cheers -- Rick
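
A small Python sketch of why NaN fits the fixed-width picture, and of the
comparison caveat. It only demonstrates IEEE 754 float behaviour; whether Solr
accepts NaN in an in-place update, and how it sorts NaN, is not verified here.

```python
import math
import struct

# A 32-bit float NaN occupies the same 4 bytes as any other float value.
nan_bytes = struct.pack(">f", float("nan"))
value = struct.unpack(">f", nan_bytes)[0]

# NaN must be detected with isnan(); ordinary comparisons are all False.
is_missing = math.isnan(value)
equals_itself = value == value
less_than_zero = value < 0.0
greater_than_zero = value > 0.0
```

Any client code reading such a field would need an explicit isnan() check,
since NaN never compares equal to, less than, or greater than anything.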


On 2017-05-04 10:55 AM, Dan . wrote:
> Hi Shawn,
>
> Thanks for the suggestion.
>
> I gave that a try but unfortunately it didn't work.
>
> A delete operation would be really useful; it seems wasteful to have
> e.g. -1 representing null.
>
> Cheers,
> Dan


Re: in-place atomic updates for numeric docValue field

Posted by "Dan ." <ro...@gmail.com>.
Hi Shawn,

Thanks for the suggestion.

I gave that a try but unfortunately it didn't work.

A delete operation would be really useful; it seems wasteful to have
e.g. -1 representing null.

Cheers,
Dan

On 4 May 2017 at 15:30, Shawn Heisey <ap...@elyograg.org> wrote:

> I'm not really sure how that "null" value will be interpreted.  It's
> entirely possible that this won't actually delete the field.
>
> I think we need a "delete" action for Atomic Updates, to entirely remove
> the field regardless of what it currently contains.  There is "remove"
> and "removeRegex", which MIGHT be enough, but I think delete would be
> useful syntactic sugar.
>
> Dan, can you give the following update JSON a try instead?  I am not
> guaranteeing that this will do the job, but given the current
> functionality, I think this is the option most likely to work:
>
> {
>  "id":"my_id",
>  "popularity":{"removeRegex":".*"}
> }
>
> Thanks,
> Shawn
>
>

Re: in-place atomic updates for numeric docValue field

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/4/2017 7:40 AM, Dan . wrote:
> I have a field like this:
>
> <fieldType name="integer" class="solr.TrieIntField" omitNorms="true"/>
> <field name="popularity" type="integer" indexed="false" stored="false"
> docValues="true" multiValued="false"/>
>
> so that I can do fast in-place atomic updates.
>
> However, if I do e.g.
>
> curl -H 'Content-Type: application/json' \
>   'http://localhost:8983/solr/collection/update?commit=true' \
>   --data-binary '
> [{
>  "id":"my_id",
>  "popularity":{"set":null}
> }]'
>
> then I'd expect the popularity field to be removed, however it's not.

I'm not really sure how that "null" value will be interpreted.  It's
entirely possible that this won't actually delete the field.

I think we need a "delete" action for Atomic Updates, to entirely remove
the field regardless of what it currently contains.  There is "remove"
and "removeRegex", which MIGHT be enough, but I think delete would be
useful syntactic sugar.

Dan, can you give the following update JSON a try instead?  I am not
guaranteeing that this will do the job, but given the current
functionality, I think this is the option most likely to work:

{
 "id":"my_id",
 "popularity":{"removeRegex":".*"}
}

Thanks,
Shawn