Posted to solr-user@lucene.apache.org by Hup Chen <ch...@hotmail.com> on 2020/05/29 11:29:13 UTC

TolerantUpdateProcessorFactory not functioning

Hi,

My Solr indexing did not tolerate a bad record; it simply aborted, even though I have configured TolerantUpdateProcessorFactory in solrconfig.xml.
Please advise: how can I get TolerantUpdateProcessorFactory to work?

solrconfig.xml:

 <updateRequestProcessorChain name="tolerant-chain">
   <processor class="solr.TolerantUpdateProcessorFactory">
     <int name="maxErrors">100</int>
   </processor>
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>

Restarted Solr before indexing:
service solr stop
service solr start

curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @test.json

The first record in test.json is a bad record, and the rest were not indexed.

{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"0007264097",
        "message":"ERROR: [doc=0007264097] Error adding field 'usedshipping'='' msg=empty String"}],
    "maxErrors":100,
    "status":400,
    "QTime":0},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Cannot parse provided JSON: Expected key,value separator ':': char=\",position=1240 AFTER='isbn\":\"4032171203\", \"sku\":\"\", \"title\":\"ãã³ãã¡ã¡ããã³ã \"author\"' BEFORE=':\"Sachiko OÃtomo\", ãã, \"ima'",
    "code":400}}

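A minimal sketch, not taken from the thread: in the response above, the first document's error appears under "errors" (so the tolerant processor seems to have handled it), but the request then failed with "Cannot parse provided JSON", a parse error that ends the whole request. Checking that the file parses before sending it may catch that class of problem locally. The helper names validate_json and post_json are hypothetical, python3 is assumed to be installed (its standard json.tool module serves as the validator), and the URL is the one from the command above. The file is sent with curl's --data-binary and an explicit JSON Content-Type, because -d @file strips carriage returns and newlines from the file.

```shell
#!/bin/sh
# Sketch: validate a JSON file locally, then POST it to Solr unchanged.
# Assumption: python3 is available; "python3 -m json.tool" exits non-zero
# when its input is not valid JSON.

# Hypothetical helper: succeeds only if the file parses as JSON.
validate_json() {
    python3 -m json.tool "$1" > /dev/null
}

# Hypothetical helper: refuse to send an unparseable file (return code 2),
# otherwise POST it. --data-binary sends the bytes exactly as they are
# (curl's -d @file strips CR/LF), and the Content-Type header declares JSON.
post_json() {
    file=$1
    url=$2
    if ! validate_json "$file"; then
        echo "not sending $file: it is not valid JSON" >&2
        return 2
    fi
    curl -sS "$url" -H 'Content-Type: application/json' --data-binary @"$file"
}

# Usage (URL taken from the original command):
# post_json test.json "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100"
```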

Re: Fw: TolerantUpdateProcessorFactory not functioning

Posted by Hup Chen <ch...@hotmail.com>.
There was another error, which I think should count as an indexing error.
The listprice field below is a pdouble field, but the update process didn't ignore the error when it was sent bad data.

Response: {
  "responseHeader":{
    "status":400,
    "QTime":133551},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=978194537913] Error adding field 'listprice'='106Chapter' msg=For input string: \"106Chapter\"",
    "code":400}}


________________________________
From: Shawn Heisey <ap...@elyograg.org>
Sent: Tuesday, June 9, 2020 3:19 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning


Re: Fw: TolerantUpdateProcessorFactory not functioning

Posted by Hup Chen <ch...@hotmail.com>.
Oh, I got it: that's not an indexing error!
It seems I need to first remove all characters in the range [\x00-\x1F] (except \x09 TAB, \x0A LF and \x0D CR).

Thanks a lot!
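A minimal sketch of that clean-up with tr, assuming the data is UTF-8 or plain ASCII: it deletes the control characters that XML 1.0 forbids and keeps TAB, LF and CR. Working byte by byte is safe for UTF-8 because bytes below 0x20 never occur inside a multi-byte UTF-8 sequence; it is not safe for UTF-16 data. The function name strip_xml_ctrl is hypothetical. Deleting a character joins the text around it, so the example title would come out as "AssociationZachary's".

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout without the control characters
# XML 1.0 does not allow: 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F (octal below).
# TAB (0x09), LF (0x0A) and CR (0x0D) are kept. LC_ALL=C makes tr work on bytes.
strip_xml_ctrl() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037'
}

# Usage: strip_xml_ctrl < data.xml > data.clean.xml
```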





Re: Fw: TolerantUpdateProcessorFactory not functioning

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/9/2020 12:44 AM, Hup Chen wrote:

I tried your example XML as it is shown in your original message, saved 
to a file named "foo.xml", and didn't have any trouble.  I wasn't even 
using the tolerant update processor.   I just fired up the techproducts 
example on a solr-8.3.0 download I already had, added a field named 
"isbn13" (string type) so the schema was compatible, and tried the 
following command:

curl "http://localhost:8983/solr/techproducts/update" -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by 
an actual Ctrl-Z character.  When I did that, I got exactly the same 
error you did.

A Ctrl-Z character (ASCII code 26) is *NOT* a valid character for XML,
which is why you're getting the error.

The tolerant update processor can't ignore errors in the actual format 
of the input ... it only ignores errors during *indexing*.  This error 
occurred during the input parsing, not during indexing, so the update 
processor could not ignore it.
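Because the parser stops at the first illegal character, it can help to locate such characters before posting. A minimal sketch, not from the thread: grep, run in the C locale so that it matches raw bytes, prints the numbers of lines containing control characters that XML 1.0 forbids. The function name is hypothetical. NUL is not checked because a shell string cannot hold it, and LF is left out of the pattern on purpose because grep treats a newline in a pattern as a separator between patterns.

```shell
#!/bin/sh
# Hypothetical helper: print the numbers of the lines in file $1 that contain
# a control character XML 1.0 forbids (0x01-0x08, 0x0B, 0x0C, 0x0E-0x1F).
# TAB, LF and CR are allowed in XML and are not matched; NUL is not checked.
find_xml_ctrl_lines() {
    # Build a bracket expression holding the raw bytes (printf expands \ooo octal).
    pattern=$(printf '[\001-\010\013\014\016-\037]')
    # -a: treat the file as text even if grep would otherwise call it binary.
    LC_ALL=C grep -a -n -- "$pattern" "$1" | cut -d: -f1
}

# Usage: find_xml_ctrl_lines data.xml
```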

Thanks,
Shawn

Re: Fw: TolerantUpdateProcessorFactory not functioning

Posted by Hup Chen <ch...@hotmail.com>.
Thanks for your reply. This is one of the examples where it fails. POSTing with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error found in the title field. I hope Solr can simply skip this record and go ahead and index the rest of the data.

<add>
<doc>
 <field name="id">9780373773244</field>
 <field name="isbn13">9780373773244</field>
<field name="title">Missing: Innocent By Association^Zachary's Law (Hqn Romance) </field>
 <field name="author">Lisa_Jackson </field>
</doc>
</add>



curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" -H 'Content-Type: text/xml; charset=utf-8' -d @data


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <arr name="errors"/>
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">0</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
  </lst>
  <str name="msg">Illegal character ((CTRL-CHAR, code 26))
 at [row,col {unknown-source}]: [1,225]</str>
  <int name="code">400</int>
</lst>
</response>

________________________________
From: Thomas Corthals <th...@klascement.net>
Sent: Tuesday, June 9, 2020 2:12 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning


Re: Fw: TolerantUpdateProcessorFactory not functioning

Posted by Thomas Corthals <th...@klascement.net>.
If your XML or JSON can't be parsed, your content never makes it to the
update chain.

It looks like you're trying to index non-UTF-8 data. You can set the
encoding of your XML in the Content-Type header of your POST request.

-H 'Content-Type: text/xml; charset=GB18030'

JSON only allows UTF-8, UTF-16 or UTF-32.
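Because JSON cannot declare a different charset, one option is to re-encode the file to UTF-8 before posting it. A minimal sketch with iconv(1), not from the thread; the helper names are hypothetical, and GB18030 is only the charset from the example header above, since the thread does not establish which encoding the data actually uses.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if file $1 decodes cleanly as UTF-8
# (iconv exits non-zero on an invalid byte sequence).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: re-encode file $1 from charset $2 to UTF-8 on stdout.
to_utf8() {
    iconv -f "$2" -t UTF-8 "$1"
}

# Usage, assuming the source really is GB18030:
# is_utf8 data.json || to_utf8 data.json GB18030 > data.utf8.json
```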

Best,

Thomas

On Tue, 9 Jun 2020 at 07:11, Hup Chen <ch...@hotmail.com> wrote:


Fw: TolerantUpdateProcessorFactory not functioning

Posted by Hup Chen <ch...@hotmail.com>.
Any ideas?
I still can't get TolerantUpdateProcessorFactory working: Solr aborts on any error without any tolerance. Any suggestions would be appreciated.
curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @data.xml

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <arr name="errors"/>
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">1</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxEOFException</str>
  </lst>
  <str name="msg">Unexpected EOF; was expecting a close tag for element &lt;field&gt;
 at [row,col {unknown-source}]: [1,8191]</str>
  <int name="code">400</int>
</lst>
</response>


________________________________
From: Hup Chen
Sent: Friday, May 29, 2020 7:29 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: TolerantUpdateProcessorFactory not functioning
