Posted to issues@solr.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2021/07/29 16:45:00 UTC

[jira] [Commented] (SOLR-15540) Duplicated adding document for update when Solr7 upgrade to Solr8

    [ https://issues.apache.org/jira/browse/SOLR-15540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390024#comment-17390024 ] 

Chris M. Hostetter commented on SOLR-15540:
-------------------------------------------

In Solr7 the treatment of the {{_root_}} field (if it existed in the schema) was very inconsistent depending on whether any given document being added/updated included nested children or not. This was the cause of various bugs and inconsistencies with update and deleteById which were fixed in 8.x, but it appears this “fix” didn’t account for the possibility of:
 * upgrading from 7.x …

 * … w/schema that includes {{_root_}} field …

 * … but the {{_root_}} field doesn’t exist in some/all documents

I can reproduce the described problem using the 7.7.2 techproducts configs (AFAICT the {{_default}} configs from 7.7.2 should be equally affected) … details below.
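
To make the failure mode concrete, here is a toy model of it (plain Python, NOT Solr code). It assumes what the issue description reports: 7.x replaces a document by matching on the uniqueKey term, while 8.x matches on the {{_root_}} term when that field is in the schema. The {{update}} helper and the idea that an 8.x doc carries its own id in {{_root_}} are assumptions of this sketch.

```python
# Toy model (NOT Solr code): an "index" is a list of dicts, and an update is
# "drop every doc whose update field matches the new doc's id, then add it".

def update(index, doc, update_field):
    """Return a new index where docs matching doc["id"] on update_field are replaced."""
    key = doc["id"]
    kept = [d for d in index if d.get(update_field) != key]
    kept.append(doc)
    return kept

# A childless doc as indexed by 7.x: it has no value in _root_.
old_index = [{"id": "SOLR1000", "name": "Solr, the Enterprise Search Server"}]

# The same doc re-sent under 8.x. Assumption of this model: _root_ holds the doc's own id.
new_doc = {"id": "SOLR1000", "_root_": "SOLR1000", "name": "Solr name changed"}

by_id = update(old_index, new_doc, "id")        # 7.x-style match: old doc is replaced
by_root = update(old_index, new_doc, "_root_")  # 8.x-style match: old doc is never found

print(len(by_id), len(by_root))
```

In the second case the old doc has nothing in {{_root_}} to match against, so it survives alongside the new one, which is the duplicate seen in the reproduction.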

IIUC there are 4 possible scenarios for people upgrading from 7.x to 8.x…
 # Your existing schema doesn’t include a {{_root_}} field
 ** you should be unaffected by this problem
 # Your existing schema includes a {{_root_}} field, and all documents in your collection are nested documents (ie: every document has a value in the {{_root_}} field)
 ** you should be unaffected by this problem
 # Your existing schema includes a {{_root_}} field, but you have no nested documents in your collection (ie: no document has any value in the {{_root_}} field)
 ** you should be able to work around this problem by removing the {{_root_}} field just before or just after upgrading to Solr 8
 ** if you’ve already updated some documents before removing the {{_root_}} field you may need to re-update them or delete the duplicates manually
 # Your existing schema includes a {{_root_}} field, and some of your documents are nested, but some documents have no children (ie: some docs have values in the {{_root_}} field, while other docs do not)
 ** I don’t think there is any work-around for this situation except to deleteByQuery to remove all the docs w/o children (either before or after upgrading to 8.x) and then re-add them after upgrading
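
A rough way to work out which of the four scenarios applies is to compare the total doc count with the number of docs that have a value in {{_root_}}. The sketch below only builds the two count queries and classifies the numbers you get back. The helper names, base URL and collection are placeholders, and it assumes {{_root_}} is indexed and that the counts are taken before any docs have been updated under 8.x.

```python
from urllib.parse import urlencode

def count_url(base_url, collection, query):
    """Build a /select URL that returns only numFound (rows=0) for the query."""
    params = urlencode({"q": query, "rows": 0})
    return f"{base_url}/{collection}/select?{params}"

def classify_scenario(schema_has_root, total_docs, docs_with_root):
    """Map the schema check and the two counts onto scenarios 1-4 from the list above."""
    if not schema_has_root:
        return 1  # no _root_ field in the schema: unaffected
    if docs_with_root == total_docs:
        return 2  # every doc has a _root_ value (also true for an empty collection)
    if docs_with_root == 0:
        return 3  # no doc has a _root_ value: removing the field is the workaround
    return 4      # mixed: some docs have a _root_ value and some do not

# Hypothetical host and collection, for illustration only.
base = "http://localhost:8983/solr"
print(count_url(base, "techproducts", "*:*"))       # total docs
print(count_url(base, "techproducts", "_root_:*"))  # docs with a value in _root_
```

Feed the two {{numFound}} values into {{classify_scenario}} along with whether your schema declares {{_root_}}.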

 

Example of reproducing this problem, and demonstrating the workaround of removing the {{_root_}} field…

 
{code:java}
### 7.7.2 create techproducts example w/docs...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2] $ bin/solr -e techproducts
...
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2] $ bin/solr stop -all
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 15596 to stop gracefully.



### save our solr home for later re-use in "upgrade"...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/7.7.2] $ cp -r example/techproducts/solr/ /tmp/solr-home



### Now start solr 8.8.2 using the solr-home from 7.7.2 ....

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ bin/solr -s /tmp/solr-home
Waiting up to 180 seconds to see Solr running on port 8983 [|] 
Started Solr server on port 8983 (pid=17201). Happy searching!


### confirm we have one doc named "solr" and get its uniqueKey...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
 "responseHeader":{
 "status":0,
 "QTime":1,
 "params":{
 "q":"name:solr",
 "fl":"id,name,_root_"}},
 "response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
 {
 "id":"SOLR1000",
 "name":"Solr, the Enterprise Search Server"}]
 }}



### attempt to update this document and see the bug manifest...

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 'http://localhost:8983/solr/techproducts/update/json?commit=true' --data-binary '[{"id":"SOLR1000", "name":"Solr name changed"}]'
{
 "responseHeader":{
 "status":0,
 "QTime":152}}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
 "responseHeader":{
 "status":0,
 "QTime":3,
 "params":{
 "q":"name:solr",
 "fl":"id,name,_root_"}},
 "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
 {
 "id":"SOLR1000",
 "name":"Solr name changed"},
 {
 "id":"SOLR1000",
 "name":"Solr, the Enterprise Search Server"}]
 }}

### now we have 2 docs with same uniqueKey

### Workaround by removing the (unneeded) _root_ field from schema,
### and update the doc again (will "overwrite" both existing docs)

hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl -X POST -H 'Content-type:application/json' --data-binary '{ "delete-field":{"name":"_root_"} }' http://localhost:8983/solr/techproducts/schema
{
 "responseHeader":{
 "status":0,
 "QTime":257}}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
 "responseHeader":{
 "status":0,
 "QTime":1,
 "params":{
 "q":"name:solr",
 "fl":"id,name"}},
 "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
 {
 "id":"SOLR1000",
 "name":"Solr name changed"},
 {
 "id":"SOLR1000",
 "name":"Solr, the Enterprise Search Server"}]
 }}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 'http://localhost:8983/solr/techproducts/update/json?commit=true' --data-binary '[{"id":"SOLR1000", "name":"Solr name changed after root field removed"}]'
{
 "responseHeader":{
 "status":0,
 "QTime":63}}
hossman@slate:~/lucene/8x/solr [j8] [tags/releases/lucene-solr/8.8.2] $ curl 'http://localhost:8983/solr/techproducts/select?q=name:solr&fl=id,name'
{
 "responseHeader":{
 "status":0,
 "QTime":1,
 "params":{
 "q":"name:solr",
 "fl":"id,name"}},
 "response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
 {
 "id":"SOLR1000",
 "name":"Solr name changed after root field removed"}]
 }}

{code}
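
If some docs were already updated before the {{_root_}} field was removed, one possible way to find the resulting duplicates is to facet on the uniqueKey field with {{facet.mincount=2}}. This is only a sketch: the helper names are made up, the sample response is hand-written rather than real Solr output, and it assumes the default flat JSON layout for {{facet_fields}} (value, count, value, count, ...).

```python
from urllib.parse import urlencode

def duplicate_facet_url(base_url, collection, key_field="id"):
    """Build a facet query listing uniqueKey values that occur in 2 or more docs."""
    params = urlencode({
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": key_field,
        "facet.mincount": 2,
        "facet.limit": -1,
    })
    return f"{base_url}/{collection}/select?{params}"

def duplicated_keys(response, key_field="id"):
    """Turn the flat [value, count, ...] facet list into {value: count} for counts > 1."""
    flat = response["facet_counts"]["facet_fields"][key_field]
    return {flat[i]: flat[i + 1] for i in range(0, len(flat), 2) if flat[i + 1] > 1}

# Hand-written sample shaped like a Solr facet response (not captured from a server).
sample = {"facet_counts": {"facet_fields": {"id": ["SOLR1000", 2]}}}
print(duplicated_keys(sample))
```

Each id reported this way can then be re-sent as an update (once {{_root_}} is gone from the schema) to collapse the duplicates, as the last step of the transcript above does for SOLR1000.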

> Duplicated adding document for update when Solr7 upgrade to Solr8
> -----------------------------------------------------------------
>
>                 Key: SOLR-15540
>                 URL: https://issues.apache.org/jira/browse/SOLR-15540
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.8.2
>         Environment: SolrCloud Solr 8.8.2
>            Reporter: samuel ma
>            Priority: Major
>
> We upgraded from Solr 7.7.2 to Solr 8.8.2 and kept using the Solr 7 index data. Queries work fine, but when we try to add a doc with an existing doc id (which is actually an update operation) to Solr8, we see 2 docs with the same id, which means the update did not remove the Solr7 doc.
> Below is the schema.xml configuration for Solr7.
> {code:java}
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
> <uniqueKey>id</uniqueKey>
> <field name="_root_" type="string" indexed="true" stored="false"/>
> {code}
>  
> We added some fields in Solr8:
> <field name="_nest_path_" type="_nest_path_" stored="true"/>
> <fieldType name="_nest_path_" class="solr.NestPathField" />
>  
> I can see in the Solr7 code that
> {code:java}
> DirectUpdateHandler2.updateDocOrDocValues{code}
> uses the idTerm as the updateTerm, but in this case Solr8 uses the rootTerm as the updateTerm. Is this expected behavior? How do we handle this incompatibility?
>  
> Add comment:
> This also impacts deleteById
>  


