Posted to users@solr.apache.org by gnandre <ar...@gmail.com> on 2022/03/30 16:27:19 UTC

Atomic indexing without whole document getting indexed again

IIRC, under the hood, atomic indexing indexes the whole document again even
if you are only updating one field of that document. This is very costly
for indexing performance because the other fields might require
significant, heavy tokenization. Is there any way around this?

Re: Atomic indexing without whole document getting indexed again

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/5/2022 12:38 PM, gnandre wrote:
> I conducted the test that you mentioned.
>
> Here is the diff - https://www.diffchecker.com/sdsMiGW5
>
> Left hand side is the state before the in-place update. Right hand side is
> the state after the in-place update.

That looks very strange to me.  If it were doing a full add/delete type 
of atomic update, I would expect there to be a new segment, but that 
shows all the same segments and new timestamps on the files for the oyyo 
segment, with the fdt file (which I believe has stored field data) 
changing size.  Not what I thought would happen for EITHER scenario.  
But I have to admit that I am not completely sure how things happen at 
the Lucene level for in-place updates.
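The "new segment or not" check described above can be made less error-prone by comparing the two directory listings by segment name. This is a minimal Python sketch, assuming Lucene's usual "_<segment>..." file naming. The helper names and most sample file names are illustrative; only "oyyo" comes from the thread. A full add/delete update would normally show up as an added segment after the commit, while an in-place docValues update should only change or add files for existing segments. Background merges can also add or remove segments, so compare listings taken while there is no other indexing activity.

```python
import re

# Assumption: Lucene per-segment files start with "_<segment>", where the
# segment name is base-36 (e.g. "_oyyo.fdt", "_oyyo_1.fnm"). Files such as
# "segments_5" or "write.lock" do not match and are ignored.
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)")


def segment_names(file_names):
    """Return the set of segment names referenced by a directory listing."""
    names = set()
    for name in file_names:
        match = SEGMENT_FILE.match(name)
        if match:
            names.add(match.group(1))
    return names


def compare_segments(before, after):
    """Report segments that appeared or disappeared between two listings."""
    old, new = segment_names(before), segment_names(after)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```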

These listings show the user and group for those files as 8983 ... so 
the OS where you gathered this info is NOT the system where Solr is 
actually running.  It could be either a container situation like Docker, 
or a network filesystem.  I believe that uid/gid 8983 is used for the 
solr user in the Docker images available for Solr.

Thanks,
Shawn


Re: Atomic indexing without whole document getting indexed again

Posted by Matthew Lapointe <ml...@alpha-sense.com>.
Are there any update request processors defined that could be adding
default values?
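This question matters because of how the Solr Reference Guide describes in-place updates. Every field other than the uniqueKey and _version_ must be a "set" or "inc" operation on an in-place-eligible field. A processor that injects a plain value would therefore push the request back to a regular atomic update. The Python sketch below checks a JSON update document against that rule. It is an illustration under that assumption, not Solr's actual code. The helper name is hypothetical, and the set of eligible fields is passed in by hand rather than read from the schema.

```python
# Operations that can be applied in place, per the Solr Reference Guide.
IN_PLACE_OPS = {"set", "inc"}


def could_be_in_place(doc, eligible_fields, unique_key="id"):
    """Return (ok, reasons) for a JSON update document.

    Hypothetical helper: every field except the uniqueKey and _version_
    must be a "set"/"inc" operation on a field listed in eligible_fields.
    A plain value (e.g. a default added by an update processor) fails.
    """
    reasons = []
    for name, value in doc.items():
        if name in (unique_key, "_version_"):
            continue
        if not isinstance(value, dict):
            reasons.append(f"{name}: plain value, not an update operation")
        elif not set(value) <= IN_PLACE_OPS:
            unsupported = sorted(set(value) - IN_PLACE_OPS)
            reasons.append(f"{name}: operation(s) {unsupported} not in-place")
        elif name not in eligible_fields:
            reasons.append(f"{name}: field is not in-place eligible")
    return (not reasons, reasons)
```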


On Tue, Apr 5, 2022 at 4:53 PM gnandre <ar...@gmail.com> wrote:

> It is configured as a unique field.
>
> <uniqueKey>id</uniqueKey>
> <field name="id" type="string" stored="true" indexed="true"/>

Re: Atomic indexing without whole document getting indexed again

Posted by gnandre <ar...@gmail.com>.
It is configured as a unique field.

<uniqueKey>id</uniqueKey>
<field name="id" type="string" stored="true" indexed="true"/>


On Tue, Apr 5, 2022 at 4:10 PM Matthew Lapointe <ml...@alpha-sense.com>
wrote:

> That's odd! The only other thing I can think to check would be to verify
> that the "id" field is configured as the unique key field for the
> collection.
>
> Matthew

Re: Atomic indexing without whole document getting indexed again

Posted by Matthew Lapointe <ml...@alpha-sense.com>.
That's odd! The only other thing I can think to check would be to verify
that the "id" field is configured as the unique key field for the
collection.
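This check can also be scripted against a copy of the schema file. The following is a minimal Python sketch using the standard library's XML parser. It assumes a classic XML schema (managed-schema or schema.xml) in which <uniqueKey> is a direct child of <schema>. The function name is illustrative.

```python
import xml.etree.ElementTree as ET


def unique_key_field(schema_xml):
    """Return (uniqueKey name, attributes of its <field>), or (None, None)."""
    root = ET.fromstring(schema_xml)
    key = root.findtext("uniqueKey")
    if key is None:
        return None, None
    key = key.strip()
    # root.iter() also finds fields nested inside an older <fields> wrapper.
    for field in root.iter("field"):
        if field.get("name") == key:
            return key, dict(field.attrib)
    # uniqueKey names a field that is not defined (e.g. a dynamic field).
    return key, None
```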

Matthew

On Tue, Apr 5, 2022 at 3:43 PM gnandre <ar...@gmail.com> wrote:

> Thanks, Matthew.
>
> I tried debugging as you suggested. It seems that it is still doing atomic
> update instead of in-place update.
> I am not using SolrCloud, so I don't think that SOLR-13081 is applicable in
> my situation. I am using Solr 8.5.2 in standalone mode.
> I am not sure why in-place updates are still not getting triggered :(
>
> solr_1                               | 2022-04-05 19:37:22.453 DEBUG
> (qtp825658265-16) [   x:answers] o.a.s.u.DirectUpdateHandler2
> updateDocument(add{_version_=1729298371656548352,id=answers:question:8029})

Re: Atomic indexing without whole document getting indexed again

Posted by gnandre <ar...@gmail.com>.
Thanks, Matthew.

I tried debugging as you suggested. It seems that it is still doing an atomic
update instead of an in-place update.
I am not using SolrCloud, so I don't think that SOLR-13081 is applicable in
my situation. I am using Solr 8.5.2 in standalone mode.
I am not sure why in-place updates are still not getting triggered :(

solr_1                               | 2022-04-05 19:37:22.453 DEBUG
(qtp825658265-16) [   x:answers] o.a.s.u.DirectUpdateHandler2
updateDocument(add{_version_=1729298371656548352,id=answers:question:8029})

On Tue, Apr 5, 2022 at 3:10 PM Matthew Lapointe <ml...@alpha-sense.com>
wrote:

> Hi,
>
> I encountered a similar issue recently trying to differentiate between
> atomic and in-place updates. I ended up enabling debug logging for
> the DirectUpdateHandler2 class via Solr UI → Logging → Level options. Then
> the logs should print something like "DirectUpdateHandler2 updateDocValues"
> for an in-place update, or "DirectUpdateHandler2 updateDocument" for an
> atomic update.
>
> Not sure if this applies to your setup, but in our case atomic updates were
> initially being used because we have a route.field defined and our Solr
> version did not yet have the fix for SOLR-13081
> <https://issues.apache.org/jira/browse/SOLR-13081>.
>
> Matthew

Re: Atomic indexing without whole document getting indexed again

Posted by Matthew Lapointe <ml...@alpha-sense.com>.
Hi,

I encountered a similar issue recently trying to differentiate between
atomic and in-place updates. I ended up enabling debug logging for
the DirectUpdateHandler2 class via Solr UI → Logging → Level options. Then
the logs should print something like "DirectUpdateHandler2 updateDocValues"
for an in-place update, or "DirectUpdateHandler2 updateDocument" for an
atomic update.

Not sure if this applies to your setup, but in our case atomic updates were
initially being used because we have a route.field defined and our Solr
version did not yet have the fix for SOLR-13081
<https://issues.apache.org/jira/browse/SOLR-13081>.
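With DEBUG logging enabled for DirectUpdateHandler2, the log can be scanned programmatically instead of read by eye. This small Python sketch assumes the messages contain "DirectUpdateHandler2" followed by "updateDocValues" or "updateDocument", as described above. The exact line layout depends on your logging pattern, so treat the regular expression as an assumption.

```python
import re

# Assumed message shape: "...DirectUpdateHandler2 updateDocument(add{...})"
# or "...DirectUpdateHandler2 updateDocValues(...)".
LOG_PATTERN = re.compile(r"DirectUpdateHandler2\s+(updateDocValues|updateDocument)\b")


def classify_update_lines(lines):
    """Count in-place vs. full-document updates in DEBUG log lines."""
    counts = {"in-place": 0, "full document": 0}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not an update message from DirectUpdateHandler2
        if match.group(1) == "updateDocValues":
            counts["in-place"] += 1
        else:
            counts["full document"] += 1
    return counts
```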

Matthew

On Tue, Apr 5, 2022 at 2:39 PM gnandre <ar...@gmail.com> wrote:

> Thanks, Shawn.
>
> I conducted the test that you mentioned.
>
> Here is the diff - https://www.diffchecker.com/sdsMiGW5
>
> Left hand side is the state before the in-place update. Right hand side is
> the state after the in-place update.

Re: Atomic indexing without whole document getting indexed again

Posted by gnandre <ar...@gmail.com>.
Thanks, Shawn.

I conducted the test that you mentioned.

Here is the diff - https://www.diffchecker.com/sdsMiGW5

Left-hand side is the state before the in-place update. Right-hand side is
the state after the in-place update.

On Tue, Apr 5, 2022 at 1:05 PM Shawn Heisey <el...@elyograg.org> wrote:

> On 4/5/22 10:53, gnandre wrote:
> > Hi, here are the relevant fields from the schema.
> >
> > <fieldType name="long" class="solr.LongPointField" docValues="true"/>
> > <field name="_version_" type="long" indexed="false" stored="false" docValues="true" multiValued="false" />
> > <field name="views_count" type="long" stored="false" indexed="false"
> > docValues="true" multiValued="false"/>
> >
> > There are no copyfields for views_count.
> >
> > Here are the corresponding atomic indexing and commit requests:
> >
> > curl http://solr:8983/solr/answers/update -d '[{"id" :
> > "answers:question:8029","views_count" : {"set":111}}]'
> > curl "http://solr:8983/solr/answers/update?commit=true"
>
> Can you do some testing when there is no other indexing activity? What
> I'd like to see is a long directory listing of the index directory
> before an update like that, and then a long directory listing after an
> update like that.  To get the kind of listing I'm after, you would use
> "ls -al" on a POSIX system like Linux, and "dir" in a command prompt on
> windows.
>
> > It DOES change the value successfully. To verify if it is doing atomic
> > indexing or in-place update, I changed the name of one other field
> > from
> > <field name="asset_type" type="string" stored="true" indexed="true"
> > multiValued="true" default="1775"/>
> > to
> > <field name="asset_typ" type="string" stored="true" indexed="true"
> > multiValued="true" default="1775"/>
> > and reloaded the schema.
> >
> > Now, when I send above mentioned atomic indexing request, I get following
> > error message:
> >
> > {
> >    "responseHeader":{
> >      "status":400,
> >      "QTime":7},
> >    "error":{
> >      "metadata":[
> >        "error-class","org.apache.solr.common.SolrException",
> >        "root-error-class","org.apache.solr.common.SolrException"],
> >      "msg":"ERROR: [doc=answers:question:8029] unknown field 'asset_type'",
> >      "code":400}}
> >
> > So, I believe that it is still trying to index other fields as well from
> > their stored values and it is not an in-place update. What am I missing?
>
> It is entirely possible that the code that does atomic or in place
> updates checks the existing document against the current schema, and
> throws that error even for in-place updates.  I think it would have to
> do that to figure out whether it CAN do an in-place update.  I am not
> sure which part of the source code I would even need to check to figure
> that out.  But if you can do the test above, I should be able to tell
> you whether the update was fully atomic or in-place.
>
> Thanks,
> Shawn
>
>

Re: Atomic indexing without whole document getting indexed again

Posted by Shawn Heisey <el...@elyograg.org>.
On 4/5/22 10:53, gnandre wrote:
> Hi, here are the relevant fields from the schema.
>
> <fieldType name="long" class="solr.LongPointField" docValues="true"/>
> <field name="_version_" type="long" indexed="false" stored="false" docValues="true" multiValued="false" />
> <field name="views_count" type="long" stored="false" indexed="false"
> docValues="true" multiValued="false"/>
>
> There are no copyfields for views_count.
>
> Here are the corresponding atomic indexing and commit requests:
>
> curl http://solr:8983/solr/answers/update -d '[{"id" :
> "answers:question:8029","views_count" : {"set":111}}]'
> curl "http://solr:8983/solr/answers/update?commit=true"

Can you do some testing when there is no other indexing activity? What 
I'd like to see is a long directory listing of the index directory 
before an update like that, and then a long directory listing after an 
update like that.  To get the kind of listing I'm after, you would use 
"ls -al" on a POSIX system like Linux, and "dir" in a command prompt on 
Windows.
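The before/after comparison requested here can also be done programmatically, which avoids comparing two long listings by eye. This is a minimal Python sketch. It records each file's size and modification time, then reports added, removed and changed files. The function names are illustrative. Point it at your core's data/index directory, whose path depends on the install.

```python
import os


def snapshot(index_dir):
    """Map each regular file in index_dir to (size in bytes, mtime in ns)."""
    result = {}
    with os.scandir(index_dir) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def diff_snapshots(before, after):
    """Return added, removed, and changed file names between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        name for name in set(before) & set(after) if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

To use it, call snapshot() on the index directory, send the update and the commit, call snapshot() again, and pass both results to diff_snapshots().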

> It DOES change the value successfully. To verify if it is doing atomic
> indexing or in-place update, I changed the name of one other field
> from
> <field name="asset_type" type="string" stored="true" indexed="true"
> multiValued="true" default="1775"/>
> to
> <field name="asset_typ" type="string" stored="true" indexed="true"
> multiValued="true" default="1775"/>
> and reloaded the schema.
>
> Now, when I send the above-mentioned atomic indexing request, I get the
> following error message:
>
> {
>    "responseHeader":{
>      "status":400,
>      "QTime":7},
>    "error":{
>      "metadata":[
>        "error-class","org.apache.solr.common.SolrException",
>        "root-error-class","org.apache.solr.common.SolrException"],
>      "msg":"ERROR: [doc=answers:question:8029] unknown field 'asset_type'",
>      "code":400}}
>
> So, I believe that it is still trying to index other fields as well from
> their stored values and it is not an in-place update. What am I missing?

It is entirely possible that the code that does atomic or in-place
updates checks the existing document against the current schema, and 
throws that error even for in-place updates.  I think it would have to 
do that to figure out whether it CAN do an in-place update.  I am not 
sure which part of the source code I would even need to check to figure 
that out.  But if you can do the test above, I should be able to tell 
you whether the update was fully atomic or in-place.

Thanks,
Shawn


Re: Atomic indexing without whole document getting indexed again

Posted by gnandre <ar...@gmail.com>.
Hi, here are the relevant fields from the schema.

<fieldType name="long" class="solr.LongPointField" docValues="true"/>
<field name="_version_" type="long" indexed="false" stored="false" docValues="true" multiValued="false" />
<field name="views_count" type="long" stored="false" indexed="false"
docValues="true" multiValued="false"/>

There are no copyfields for views_count.
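A field's docValues setting can come from its fieldType rather than from the field itself, so it can help to resolve the effective attributes before comparing them with the in-place requirements. This is a hypothetical ElementTree-based sketch using the definitions quoted above; any attribute set on neither element is reported as None, because Solr's actual defaults depend on the field type class and schema version.

```python
import xml.etree.ElementTree as ET

# The fieldType and field definitions quoted above, wrapped in a <schema>
# root element so that ElementTree can parse them as one document.
SCHEMA_SNIPPET = """<schema>
  <fieldType name="long" class="solr.LongPointField" docValues="true"/>
  <field name="_version_" type="long" indexed="false" stored="false"
         docValues="true" multiValued="false"/>
  <field name="views_count" type="long" stored="false" indexed="false"
         docValues="true" multiValued="false"/>
</schema>"""

PROPERTIES = ("indexed", "stored", "docValues", "multiValued")


def effective_properties(schema_xml, field_name):
    """Resolve a field's boolean properties, letting attributes on <field>
    override those on its <fieldType>. Unset properties come back as None
    (unknown) rather than guessing Solr's defaults.
    """
    root = ET.fromstring(schema_xml)
    field_types = {ft.get("name"): ft for ft in root.iter("fieldType")}
    for field in root.iter("field"):
        if field.get("name") != field_name:
            continue
        field_type = field_types.get(field.get("type"))
        resolved = {}
        for prop in PROPERTIES:
            raw = field.get(prop)
            if raw is None and field_type is not None:
                raw = field_type.get(prop)
            resolved[prop] = None if raw is None else raw == "true"
        return resolved
    raise KeyError(f"no <field> named {field_name!r}")
```

For both views_count and _version_ this yields indexed=False, stored=False, docValues=True, multiValued=False, which matches the conditions Shawn listed earlier in the thread.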

Here are the corresponding atomic indexing and commit requests:

curl http://solr:8983/solr/answers/update -d '[{"id" : "answers:question:8029","views_count" : {"set":111}}]'
curl "http://solr:8983/solr/answers/update?commit=true"
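The same atomic "set" request can be built with Python's standard library. This sketch reuses the host, core name, document id, field, and value from the curl commands above; it only prepares the request. It sets Content-Type: application/json explicitly, whereas curl's -d defaults to application/x-www-form-urlencoded.

```python
import json
import urllib.request

# Assumption: the same core URL as in the curl commands above.
SOLR_UPDATE_URL = "http://solr:8983/solr/answers/update"


def set_field_payload(doc_id, field, value):
    """JSON body for an atomic "set" on one field of one document."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])


def build_update_request(body, commit=False):
    """Prepare (but do not send) a POST to the JSON update handler."""
    url = SOLR_UPDATE_URL + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be urllib.request.urlopen(request). Passing commit=True folds the separate commit call into the update request itself.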

It DOES change the value successfully. To verify if it is doing atomic
indexing or in-place update, I changed the name of one other field
from
<field name="asset_type" type="string" stored="true" indexed="true"
multiValued="true" default="1775"/>
to
<field name="asset_typ" type="string" stored="true" indexed="true"
multiValued="true" default="1775"/>
and reloaded the schema.

Now, when I send the above-mentioned atomic indexing request, I get the
following error message:

{
  "responseHeader":{
    "status":400,
    "QTime":7},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"ERROR: [doc=answers:question:8029] unknown field 'asset_type'",
    "code":400}}

So, I believe that it is still trying to index other fields as well from
their stored values and it is not an in-place update. What am I missing?
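When scripting tests like this, it can help to fail loudly on a non-zero responseHeader.status and to pull the offending field name out of the message. This is a hypothetical helper; its regular expression assumes the "unknown field '...'" wording shown in the error above and may not match messages from other Solr versions.

```python
import json
import re


class SolrUpdateError(Exception):
    """Raised when a Solr response reports a non-zero status."""

    def __init__(self, status, msg):
        super().__init__(f"Solr returned status {status}: {msg}")
        self.status = status
        self.msg = msg
        # Assumes the "unknown field '<name>'" wording seen in this thread.
        match = re.search(r"unknown field '([^']+)'", msg)
        self.unknown_field = match.group(1) if match else None


def check_solr_response(body):
    """Parse a Solr JSON response body; raise SolrUpdateError on failure."""
    response = json.loads(body)
    status = response.get("responseHeader", {}).get("status", 0)
    if status != 0:
        msg = response.get("error", {}).get("msg", "no error message")
        raise SolrUpdateError(status, msg)
    return response
```

Run against the response above, this raises with unknown_field set to "asset_type", the old name of the renamed field, not views_count.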

On Fri, Apr 1, 2022 at 9:50 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 3/31/22 13:36, gnandre wrote:
> > Here is what I tried to confirm if it is still doing atomic indexing and
> > not in-place indexing. I changed one other unrelated field's name and
> > reloaded the schema.
> > Now, when I performed the indexing just for the field that I wanted to
> > update in-place, it should not have complained about this other unrelated
> > field as it wouldn't bother indexing it.
> > But it did complain with 'unknown field' for the unrelated field. So that
> > tells me it is still doing atomic indexing and trying to index the whole
> > document with all fields.
> >
> > Is my understanding correct? If so, then why are in-place updates still not
> > working?
>
> Can you share your schema, the atomic update request you are sending,
> and an idea of what the contents of all the fields in the existing
> document are?
>
> Thanks,
> Shawn
>
>

Re: Atomic indexing without whole document getting indexed again

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/31/22 13:36, gnandre wrote:
> Here is what I tried to confirm if it is still doing atomic indexing and
> not in-place indexing. I changed one other unrelated field's name and
> reloaded the schema.
> Now, when I performed the indexing just for the field that I wanted to
> update in-place, it should not have complained about this other unrelated
> field as it wouldn't bother indexing it.
> But it did complain with 'unknown field' for the unrelated field. So that
> tells me it is still doing atomic indexing and trying to index the whole
> document with all fields.
>
> Is my understanding correct? If so, then why are in-place updates still not
> working?

Can you share your schema, the atomic update request you are sending, 
and an idea of what the contents of all the fields in the existing 
document are?

Thanks,
Shawn


Re: Atomic indexing without whole document getting indexed again

Posted by gnandre <ar...@gmail.com>.
Here is what I tried to confirm if it is still doing atomic indexing and
not in-place indexing. I changed one other unrelated field's name and
reloaded the schema.
Now, when I performed the indexing just for the field that I wanted to
update in-place, it should not have complained about this other unrelated
field as it wouldn't bother indexing it.
But it did complain with 'unknown field' for the unrelated field. So that
tells me it is still doing atomic indexing and trying to index the whole
document with all fields.

Is my understanding correct? If so, then why are in-place updates still not
working?

On Thu, Mar 31, 2022 at 12:33 PM gnandre <ar...@gmail.com> wrote:

> Thanks, this is what I was looking for. However, now that I am experimenting
> with them, I see no performance improvement. I suspect that it is still
> doing atomic updates and not in-place updates.
> How do I confirm whether in-place updates are happening vs atomic updates?
> I can't tell simply by looking at the updated values for the document,
> because the result will be the same in both cases.
>
> On Wed, Mar 30, 2022 at 12:37 PM Walter Underwood <wu...@wunderwood.org> wrote:
>
>> Integer view counts probably do meet those requirements, but you might
>> need to update all 25 million documents every day, which is not going to be
>> fast.
>>
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> > On Mar 30, 2022, at 9:34 AM, Shawn Heisey <ap...@elyograg.org> wrote:
>> >
>> > On 3/30/22 10:27, gnandre wrote:
>> >> IIRC, under the hood, atomic indexing indexes the whole document again even
>> >> if you might be updating just one field of that document. This costs hugely
>> >> in terms of indexing performance because the other fields might be
>> >> requiring some significant heavy tokenization. Is there any way around this?
>> >
>> >
>> > If you need to be able to query on any of the fields you're modifying in
>> > the atomic update, then there is no way to do it without reindexing the
>> > whole document.
>> >
>> > There is a feature that can do an in-place update, but the field has to
>> > be not indexed, not stored, single valued, and have docValues enabled.  A
>> > field using the TextField class cannot have docValues.  It is probably
>> > unlikely that the fields you want to update meet these requirements.
>> >
>> >
>> > https://solr.apache.org/guide/8_11/updating-parts-of-documents.html#in-place-updates
>> >
>> > Thanks,
>> > Shawn
>> >
>>
>>

Re: Atomic indexing without whole document getting indexed again

Posted by gnandre <ar...@gmail.com>.
Thanks, this is what I was looking for. However, now that I am experimenting
with them, I see no performance improvement. I suspect that it is still
doing atomic updates and not in-place updates.
How do I confirm whether in-place updates are happening vs atomic updates?
I can't tell simply by looking at the updated values for the document,
because the result will be the same in both cases.

On Wed, Mar 30, 2022 at 12:37 PM Walter Underwood <wu...@wunderwood.org> wrote:

> Integer view counts probably do meet those requirements, but you might
> need to update all 25 million documents every day, which is not going to be
> fast.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Mar 30, 2022, at 9:34 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> > On 3/30/22 10:27, gnandre wrote:
> >> IIRC, under the hood, atomic indexing indexes the whole document again even
> >> if you might be updating just one field of that document. This costs hugely
> >> in terms of indexing performance because the other fields might be
> >> requiring some significant heavy tokenization. Is there any way around this?
> >
> >
> > If you need to be able to query on any of the fields you're modifying in
> > the atomic update, then there is no way to do it without reindexing the
> > whole document.
> >
> > There is a feature that can do an in-place update, but the field has to
> > be not indexed, not stored, single valued, and have docValues enabled.  A
> > field using the TextField class cannot have docValues.  It is probably
> > unlikely that the fields you want to update meet these requirements.
> >
> >
> > https://solr.apache.org/guide/8_11/updating-parts-of-documents.html#in-place-updates
> >
> > Thanks,
> > Shawn
> >
>
>

Re: Atomic indexing without whole document getting indexed again

Posted by Walter Underwood <wu...@wunderwood.org>.
Integer view counts probably do meet those requirements, but you might need to update all 25 million documents every day, which is not going to be fast.
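To put that estimate in numbers: spreading 25 million updates evenly over a day means roughly 289 documents per second, around the clock. The sketch below does the arithmetic; the four-hour window and the 1,000-document batch size mentioned after it are hypothetical values for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate(total_docs, window_seconds=SECONDS_PER_DAY):
    """Sustained documents per second needed to update total_docs in the window."""
    return total_docs / window_seconds


def batch_requests(total_docs, batch_size):
    """Number of update requests when sending batch_size documents per request."""
    return -(-total_docs // batch_size)  # ceiling division without floats
```

For example, required_rate(25_000_000, 4 * 3600) is about 1,736 documents per second, and batch_requests(25_000_000, 1000) is 25,000 update requests.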

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 30, 2022, at 9:34 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> On 3/30/22 10:27, gnandre wrote:
>> IIRC, under the hood, atomic indexing indexes the whole document again even
>> if you might be updating just one field of that document. This costs hugely
>> in terms of indexing performance because the other fields might be
>> requiring some significant heavy tokenization. Is there any way around this?
> 
> 
> If you need to be able to query on any of the fields you're modifying in the atomic update, then there is no way to do it without reindexing the whole document.
> 
> There is a feature that can do an in-place update, but the field has to be not indexed, not stored, single valued, and have docValues enabled.  A field using the TextField class cannot have docValues.  It is probably unlikely that the fields you want to update meet these requirements.
> 
> https://solr.apache.org/guide/8_11/updating-parts-of-documents.html#in-place-updates
> 
> Thanks,
> Shawn
> 


Re: Atomic indexing without whole document getting indexed again

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/30/22 10:27, gnandre wrote:
> IIRC, under the hood, atomic indexing indexes the whole document again even
> if you might be updating just one field of that document. This costs hugely
> in terms of indexing performance because the other fields might be
> requiring some significant heavy tokenization. Is there any way around this?


If you need to be able to query on any of the fields you're modifying in 
the atomic update, then there is no way to do it without reindexing the 
whole document.

There is a feature that can do an in-place update, but the field has to 
be not indexed, not stored, single valued, and have docValues enabled.  
A field using the TextField class cannot have docValues.  It is probably 
unlikely that the fields you want to update meet these requirements.

https://solr.apache.org/guide/8_11/updating-parts-of-documents.html#in-place-updates
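These conditions apply to every field in the update. As I understand Solr's in-place logic, only the "set" and "inc" operations qualify, and if any field or operation fails the check, Solr falls back to a regular atomic update of the whole document. The sketch below models that decision with a hypothetical field-properties dictionary. It ignores further checks that Solr performs, such as requiring a numeric field type.

```python
# Operations that, as I understand Solr's in-place update logic, can be
# applied in place; anything else forces a full atomic update.
IN_PLACE_OPS = {"set", "inc"}


def eligible_field(props):
    """Not indexed, not stored, docValues enabled, single valued."""
    return (
        props.get("indexed") is False
        and props.get("stored") is False
        and props.get("docValues") is True
        and props.get("multiValued") is False
    )


def can_update_in_place(update_doc, schema_props, unique_key="id"):
    """Return True only if every updated field could be changed in place.

    update_doc:   one document from an atomic update request, e.g.
                  {"id": "doc1", "views_count": {"inc": 1}}
    schema_props: hypothetical mapping of field name -> dict of booleans
                  ("indexed", "stored", "docValues", "multiValued").
    """
    fields = [name for name in update_doc if name != unique_key]
    if not fields:
        return False
    for name in fields:
        ops = update_doc[name]
        # A plain value (not an {"op": value} map) is not eligible here.
        if not isinstance(ops, dict) or not ops or not set(ops) <= IN_PLACE_OPS:
            return False
        props = schema_props.get(name)
        if props is None or not eligible_field(props):
            return False
    return True
```

A view counter like views_count passes this check, while an update that also touches a stored or indexed field does not.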

Thanks,
Shawn