Posted to solr-user@lucene.apache.org by "Dan ." <ro...@gmail.com> on 2017/05/04 13:40:37 UTC
in-place atomic updates for numeric docValue field
Hi,
I have a field like this:
<fieldType name="integer" class="solr.TrieIntField" omitNorms="true"/>
<field name="popularity" type="integer" indexed="false" stored="false"
docValues="true" multiValued="false"/>
so that I can do fast in-place atomic updates.
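The schema side of this can be sketched in Python. This is a minimal sketch that assumes the in-place conditions described in the Solr Reference Guide: a single-valued numeric docValues field that is neither indexed nor stored. The helper `is_in_place_candidate`, the `NUMERIC_CLASSES` set and the attribute defaults are illustrative assumptions, not a Solr API. Solr has further conditions (for example on the `_version_` field) that this does not check.

```python
import xml.etree.ElementTree as ET

# Field classes treated as numeric here; an illustrative assumption, not a complete list.
NUMERIC_CLASSES = {
    "solr.TrieIntField", "solr.TrieLongField",
    "solr.TrieFloatField", "solr.TrieDoubleField",
    "solr.IntPointField", "solr.LongPointField",
    "solr.FloatPointField", "solr.DoublePointField",
}


def is_in_place_candidate(field_type_xml: str, field_xml: str) -> bool:
    """Hypothetical helper: does this <fieldType>/<field> pair look eligible
    for in-place updates (numeric, single-valued, docValues, not indexed,
    not stored)?"""
    field_type = ET.fromstring(field_type_xml)
    field = ET.fromstring(field_xml)
    if field.get("type") != field_type.get("name"):
        raise ValueError("the field does not use this fieldType")

    def flag(name: str, default: bool) -> bool:
        # A <field> attribute overrides the <fieldType> one; the defaults
        # passed in below are simplifying assumptions, not Solr's exact rules.
        raw = field.get(name, field_type.get(name))
        return default if raw is None else raw.lower() == "true"

    return (
        field_type.get("class") in NUMERIC_CLASSES
        and flag("docValues", False)
        and not flag("indexed", True)
        and not flag("stored", True)
        and not flag("multiValued", False)
    )
```

Fed the two definitions above verbatim, the helper reports the field as a candidate; flipping `stored` or `multiValued` to `true` makes it fail.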
However, if I do, e.g.:
curl -H 'Content-Type: application/json' \
'http://localhost:8983/solr/collection/update?commit=true' \
--data-binary '
[{
"id":"my_id",
"popularity":{"set":null}
}]'
then I'd expect the popularity field to be removed, but it isn't.
Is this a bug? Or is there a known workaround for in-place atomic
updates?
Cheers,
Dan
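For anyone scripting the same request, here is a minimal Python sketch using only the standard library. It assumes the local URL and collection name from the curl command, and `build_set_update` / `send_update` are hypothetical helper names. Python's `None` is serialized as JSON `null`, so this reproduces the request above rather than working around the problem.

```python
import json
import urllib.request

# Same endpoint as the curl example (local Solr, collection named "collection").
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection/update?commit=true"


def build_set_update(doc_id: str, field: str, value) -> bytes:
    """Serialize an atomic 'set' update for one field, wrapped in a list."""
    return json.dumps([{"id": doc_id, field: {"set": value}}]).encode("utf-8")


def send_update(body: bytes, url: str = SOLR_UPDATE_URL) -> int:
    """POST a JSON update body to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Against a running instance, `send_update(build_set_update("my_id", "popularity", None))` sends the same body as the curl command.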
Re: in-place atomic updates for numeric docValue field
Posted by Emir Arnautovic <em...@sematext.com>.
Hi Dan,
In-place updates are possible because the index size does not change.
Atomic updates (like any other regular update) flag the existing doc as
deleted and write it again, so even if they remove some fields, such
updates make the index larger until the segment holding the deleted doc
is merged.
In-place updates change the existing doc values file instead: think of it
as 4 bytes being overwritten, so some value has to be set. Removing it
would require shifting all the bytes after it, i.e. creating a new file.
That's why in-place updates work only for (fixed-width) numeric fields and
support only set and inc.
Emir
On 04.05.2017 16:57, Dan . wrote:
> Hi Emir,
>
> Yes, I thought of representing null as -1, but this makes the index
> unnecessarily larger, particularly if we have to default all docs to this
> value.
>
> Cheers,
> Dan
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
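Emir's fixed-width argument can be illustrated with a toy model. This is a minimal Python sketch that assumes a dense column with one 4-byte int slot per docID. `ToyNumericColumn` is a hypothetical class, and this is not Lucene's actual doc-values format. `set` and `inc` overwrite a slot and leave the size unchanged, while dropping a value shifts every later slot, so a new column has to be built.

```python
from array import array


class ToyNumericColumn:
    """Toy dense numeric column: slot i holds the value for docID i.

    Illustration only; Lucene's real doc-values encoding is different.
    """

    def __init__(self, values):
        # Typecode "i" is a C int, 4 bytes on common platforms.
        self.slots = array("i", values)

    def nbytes(self) -> int:
        return len(self.slots) * self.slots.itemsize

    def set(self, doc_id: int, value: int) -> None:
        self.slots[doc_id] = value  # overwrite in place; size unchanged

    def inc(self, doc_id: int, delta: int) -> None:
        self.slots[doc_id] += delta  # also an in-place overwrite

    def without(self, doc_id: int) -> "ToyNumericColumn":
        # "Removing" a value means dropping its slot: every later value moves
        # down one position, so a whole new column is written.
        kept = list(self.slots[:doc_id]) + list(self.slots[doc_id + 1:])
        return ToyNumericColumn(kept)
```

After `without(1)` on a three-document column, the value that belonged to docID 2 sits in slot 1, which is exactly the shift Emir describes.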
Re: in-place atomic updates for numeric docValue field
Posted by "Dan ." <ro...@gmail.com>.
Hi Emir,
Yes, I thought of representing null as -1, but this makes the index
unnecessarily larger, particularly if we have to default all docs to this
value.
Cheers,
Dan
On 4 May 2017 at 15:16, Emir Arnautovic <em...@sematext.com>
wrote:
> Hi Dan,
>
> Removing a value does not make sense for in-place updates of docValues:
> the field always has to hold some value, so the only thing you can do is
> reserve some int value to represent null.
>
> HTH,
> Emir
Re: in-place atomic updates for numeric docValue field
Posted by Emir Arnautovic <em...@sematext.com>.
Hi Dan,
Removing a value does not make sense for in-place updates of
docValues: the field always has to hold some value, so the only thing you
can do is reserve some int value to represent null.
HTH,
Emir
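Emir's suggestion of reserving an int value for null can be sketched as follows. This is a minimal Python example that assumes -1 is never a real popularity value. `POPULARITY_NULL`, `clear_popularity` and `read_popularity` are hypothetical names. The update still uses only `set`, and the client maps the sentinel back to "no value". As Dan points out in the thread, the catch is that every document then carries the sentinel instead of simply lacking the field.

```python
import json

# Hypothetical sentinel: an int the application never uses as a real value.
POPULARITY_NULL = -1


def clear_popularity(doc_id: str) -> str:
    """Build an in-place 'set' update that stores the sentinel instead of null."""
    return json.dumps([{"id": doc_id, "popularity": {"set": POPULARITY_NULL}}])


def read_popularity(raw_value):
    """Map the stored sentinel back to None on the client side."""
    return None if raw_value == POPULARITY_NULL else raw_value
```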
Re: in-place atomic updates for numeric docValue field
Posted by Rick Leir <rl...@leirtech.com>.
Dan, if you don't mind using a float DocValues field, you could
store NaN (Not a Number). But any sorting operations or size comparisons
would be slower. Floats might be entirely inappropriate, but I thought
it was worth a mention. Cheers -- Rick
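Rick's caveat about sorting and comparisons also shows up on the client side. This is a minimal Python sketch that assumes the application reads float values back and uses NaN as its "missing" marker. The names `MISSING`, `is_missing` and `sort_key` are hypothetical. It illustrates Python's NaN semantics only; how Solr itself orders NaN when sorting is not covered here.

```python
import math

# Hypothetical "no value" marker for a float docValues field.
MISSING = float("nan")


def is_missing(value: float) -> bool:
    # NaN never compares equal to anything, itself included, so
    # `value == MISSING` is always False; math.isnan is the reliable test.
    return math.isnan(value)


def sort_key(value: float):
    # Plain comparisons involving NaN are always False and give no usable
    # ordering, so sort "missing" values last and the rest numerically.
    return (is_missing(value), 0.0 if is_missing(value) else value)
```

With this key, `sorted([3.0, MISSING, 1.0], key=sort_key)` puts 1.0 and 3.0 first and the NaN last.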
On 2017-05-04 10:55 AM, Dan . wrote:
> Hi Shawn,
>
> Thanks for the suggestion.
>
> I gave that a try, but unfortunately it didn't work.
>
> Being able to delete the value somehow would be really useful; it seems
> wasteful to have, e.g., -1 representing null.
>
> Cheers,
> Dan
Re: in-place atomic updates for numeric docValue field
Posted by "Dan ." <ro...@gmail.com>.
Hi Shawn,
Thanks for the suggestion.
I gave that a try, but unfortunately it didn't work.
Being able to delete the value somehow would be really useful; it seems
wasteful to have, e.g., -1 representing null.
Cheers,
Dan
On 4 May 2017 at 15:30, Shawn Heisey <ap...@elyograg.org> wrote:
> I'm not really sure how that "null" value will be interpreted. It's
> entirely possible that this won't actually delete the field.
>
> I think we need a "delete" action for Atomic Updates, to entirely remove
> the field regardless of what it currently contains. There are "remove"
> and "removeRegex", which MIGHT be enough, but I think delete would be
> useful syntactic sugar.
>
> Dan, can you give the following update JSON a try instead? I am not
> guaranteeing that this will do the job, but given the current
> functionality, I think this is the option most likely to work:
>
> {
> "id":"my_id",
> "popularity":{"removeRegex":".*"}
> }
>
> Thanks,
> Shawn
Re: in-place atomic updates for numeric docValue field
Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/4/2017 7:40 AM, Dan . wrote:
> I have a field like this:
>
> <fieldType name="integer" class="solr.TrieIntField" omitNorms="true"/>
> <field name="popularity" type="integer" indexed="false" stored="false"
> docValues="true" multiValued="false"/>
>
> so that I can do fast in-place atomic updates.
>
> However, if I do, e.g.:
>
> curl -H 'Content-Type: application/json' \
> 'http://localhost:8983/solr/collection/update?commit=true' \
> --data-binary '
> [{
> "id":"my_id",
> "popularity":{"set":null}
> }]'
>
> then I'd expect the popularity field to be removed, but it isn't.
I'm not really sure how that "null" value will be interpreted. It's
entirely possible that this won't actually delete the field.
I think we need a "delete" action for Atomic Updates, to entirely remove
the field regardless of what it currently contains. There are "remove"
and "removeRegex", which MIGHT be enough, but I think delete would be
useful syntactic sugar.
Dan, can you give the following update JSON a try instead? I am not
guaranteeing that this will do the job, but given the current
functionality, I think this is the option most likely to work:
{
"id":"my_id",
"popularity":{"removeRegex":".*"}
}
Thanks,
Shawn
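Emir's point that in-place updates support only set and inc can be turned into a quick check on an update document. This is a minimal Python sketch with hypothetical helper names. It inspects operation names only, so `{"set": null}` passes even though it did not remove the field in Dan's test. By this check, Shawn's `removeRegex` request falls outside the in-place operations, which may be related to it not working for Dan. Wrapping the document in a list mirrors the curl example and is an assumption about how the update is sent.

```python
# Operations that, per Emir's description in this thread, can be applied in place.
IN_PLACE_OPS = {"set", "inc"}


def remove_regex_update(doc_id: str, field: str, pattern: str = ".*") -> list:
    """Shawn's suggested update, wrapped in a list like the curl example."""
    return [{"id": doc_id, field: {"removeRegex": pattern}}]


def uses_only_in_place_ops(doc: dict) -> bool:
    """True if every field modifier in an update document is 'set' or 'inc'.

    Only operation names are inspected, not values such as null.
    """
    ops = set()
    for field, value in doc.items():
        if field == "id":
            continue
        if not isinstance(value, dict):
            return False  # a plain value is not an atomic-update modifier
        ops.update(value)  # collect modifier names such as "set" or "inc"
    return bool(ops) and ops <= IN_PLACE_OPS
```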