Posted to solr-user@lucene.apache.org by Oleksandr Drapushko <dr...@gmail.com> on 2019/10/15 17:16:48 UTC

Atomic Updates with PreAnalyzedField

Hello Community,

I've discovered a data loss bug and couldn't find any mention of it. Please
confirm that this bug hasn't been reported yet.


Description:

If you use an atomic update to change fields that are not pre-analyzed,
the data in the document's pre-analyzed fields (if there is any) is lost.
I observed the bug in Solr 8.2 and 7.7.2.


Steps to reproduce:

1. Index this document into the techproducts collection
{
  "id": "a",
  "n_s": "s1",
  "pre":
"{\"v\":\"1\",\"str\":\"Alaska\",\"tokens\":[{\"t\":\"alaska\",\"s\":0,\"e\":6,\"i\":1}]}"
}

2. Query the document
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"a",
        "n_s":"s1",
        "pre":"Alaska",
        "_version_":1647475215142223872}]
  }}

3. Update the document using atomic update syntax
{
  "add": {
    "doc": {
      "id": "a",
      "n_s": {"set": "s2"}
    }
  }
}

4. Observe the warning in the Solr log
UI:
WARN  x:techproducts_shard2_replica_n6  PreAnalyzedField  Error parsing
pre-analyzed field 'pre'

solr.log:
WARN  (qtp1384454980-23) [c:techproducts s:shard2 r:core_node8
x:techproducts_shard2_replica_n6] o.a.s.s.PreAnalyzedField Error parsing
pre-analyzed field 'pre' => java.io.IOException: Invalid JSON type
java.lang.String, expected Map
at
org.apache.solr.schema.JsonPreAnalyzedParser.parse(JsonPreAnalyzedParser.java:86)

5. Query the document again
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"a",
        "n_s":"s2",
        "_version_":1647475461695995904}]
  }}

Result: There is no 'pre' field in the document anymore.
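
For anyone who wants to script the steps above, here is a minimal sketch
using only the Python standard library. The base URL, the use of
/update?commit=true and /select, and all function names are my assumptions
for illustration; adjust them to your setup.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of a local Solr with the techproducts collection.
SOLR_URL = "http://localhost:8983/solr/techproducts"


def pre_analyzed_value(text, tokens):
    """Serialize a value in the JSON pre-analyzed format shown in step 1."""
    return json.dumps({"v": "1", "str": text, "tokens": tokens},
                      separators=(",", ":"))


def initial_document():
    """Document from step 1."""
    tokens = [{"t": "alaska", "s": 0, "e": 6, "i": 1}]
    return {"id": "a", "n_s": "s1", "pre": pre_analyzed_value("Alaska", tokens)}


def atomic_update():
    """Update command from step 3: only n_s is changed."""
    return {"add": {"doc": {"id": "a", "n_s": {"set": "s2"}}}}


def post_json(path, payload):
    """POST a JSON payload to Solr and return the decoded JSON response."""
    request = urllib.request.Request(
        SOLR_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_document(doc_id):
    """Return the stored document with the given id, or None if not found."""
    query = urllib.parse.urlencode({"q": "id:" + doc_id})
    with urllib.request.urlopen(SOLR_URL + "/select?" + query) as response:
        docs = json.load(response)["response"]["docs"]
    return docs[0] if docs else None


def reproduce():
    """Run steps 1, 2, 3 and 5; needs a running Solr instance."""
    post_json("/update?commit=true", {"add": {"doc": initial_document()}})
    before = fetch_document("a")
    post_json("/update?commit=true", atomic_update())
    after = fetch_document("a")
    return before, after
```

If the bug is present, the first document returned by reproduce() should
contain 'pre' and the second should not.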


My thoughts on it:

1. Data loss could be prevented if the warning were replaced with an error
(by re-throwing the exception). Atomic updates for such documents still
wouldn't work, but they would be rejected explicitly.

2. To apply an atomic update, Solr reads the document from the index,
merges it with the input document, and re-indexes the result. The value it
reads back for a pre-analyzed field is in a different format (the plain
string "Alaska" in step 2, not the JSON that was submitted), so Solr cannot
parse it and the field is not re-indexed.
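
To make the two points above concrete, here is a toy model of the
read-merge-re-index cycle. It is not Solr's code: the function names, the
'strict' flag, and the assumption that only the "str" part of a
pre-analyzed value is stored are mine, inferred from the query output in
step 2 and the exception in step 4.

```python
import json
import logging

log = logging.getLogger("preanalyzed-model")

# Assumed schema: 'pre' is the only pre-analyzed field.
PRE_ANALYZED_FIELDS = {"pre"}


def parse_pre_analyzed(value):
    """Parse a JSON pre-analyzed value; raise ValueError if it is not a JSON object."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        raise ValueError("not JSON: %r" % (value,)) from exc
    if not isinstance(parsed, dict):
        raise ValueError("invalid JSON type %s, expected object"
                         % type(parsed).__name__)
    return parsed


def index(doc, strict=False):
    """Return the stored form of doc; pre-analyzed fields keep only their 'str' part."""
    stored = {}
    for name, value in doc.items():
        if name not in PRE_ANALYZED_FIELDS:
            stored[name] = value
            continue
        try:
            stored[name] = parse_pre_analyzed(value)["str"]
        except ValueError:
            if strict:
                raise  # behaviour proposed in point 1: reject the update
            # behaviour observed in step 4: warn and silently drop the field
            log.warning("Error parsing pre-analyzed field %r", name)
    return stored


def atomic_update(stored_doc, update, strict=False):
    """Merge 'set' operations into the stored document and re-index the result."""
    merged = dict(stored_doc)
    for name, value in update.items():
        merged[name] = value["set"] if isinstance(value, dict) else value
    return index(merged, strict=strict)
```

In this model, re-indexing the stored document feeds the plain string
"Alaska" back into the parser, which fails. With strict=False the field
disappears as in step 5; with strict=True the update raises instead.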


Thank you,
Oleksandr

Re: Atomic Updates with PreAnalyzedField

Posted by Oleksandr Drapushko <dr...@gmail.com>.
https://issues.apache.org/jira/browse/SOLR-13850

On Wed, Oct 16, 2019 at 11:25 AM Mikhail Khludnev <mk...@apache.org> wrote:

> Hello, Oleksandr.
> It deserves a JIRA issue; please raise one.

Re: Atomic Updates with PreAnalyzedField

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Oleksandr.
It deserves a JIRA issue; please raise one.

-- 
Sincerely yours
Mikhail Khludnev