Posted to solr-user@lucene.apache.org by Ludger Steens <lu...@qaware.de> on 2020/05/11 12:15:30 UTC

Problems when Upgrading from Solr 7.7.1 to 8.5.0

Hi all,

we recently upgraded our SolrCloud cluster from version 7.7.1 to version
8.5.0 and ran into multiple problems.
In the end we had to revert the upgrade and went back to Solr 7.7.1.

In our company we have been using Solr since version 4, and so far upgrading
Solr to a newer version has always been possible without problems.
We are curious whether others are experiencing the same kind of problems,
whether these are known issues, or whether we did something wrong and missed
something when upgrading.


1. Network issues when indexing documents
=======================================

Our collection contains roughly 150 million documents. When we re-created
the collection and re-indexed all documents, we regularly experienced
network problems that caused our loader application to fail (a simplified
sketch of the loader's indexing loop is included at the end of this section).
The Solr log always contained an IOException:

ERROR (updateExecutor-5-thread-1338-processing-x:PSMG_CI_2020_04_15_10_07_04_shard6_replica_n22 r:core_node25 null n:solr2:8983_solr c:PSMG_CI_2020_04_15_10_07_04 s:shard6) [c:PSMG_CI_2020_04_15_10_07_04 s:shard6 r:core_node25 x:PSMG_CI_2020_04_15_10_07_04_shard6_replica_n22] o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=add{,id=(null)}; node=StdNode: http://solr1:8983/solr/PSMG_CI_2020_04_15_10_07_04_shard6_replica_n20/ to http://solr1:8983/solr/PSMG_CI_2020_04_15_10_07_04_shard6_replica_n20/ => java.io.IOException: java.io.IOException: cancel_stream_error
        at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197)
java.io.IOException: java.io.IOException: cancel_stream_error
        at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]
        at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:151) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]
        at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:145) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]
        at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:216) ~[solr-solrj-8.5.0.jar:8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:26]
        at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:209) ~[solr-solrj-8.5.0.jar:8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:26]
        at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:172) ~[solr-solrj-8.5.0.jar:8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:26]

After the exception, the collection was usually in a degraded state for
some time while shards tried to recover and sync with the leader.

In the Solr changelog we saw that one major change from 7.x to 8.x is that
Solr now uses HTTP/2 instead of HTTP/1.1, so we tried to disable HTTP/2 by
setting the system property solr.http1=true.
That made the indexing process a LOT more stable, but we still saw
IOExceptions from time to time. Disabling HTTP/2 did not completely fix
the problem.

ERROR (updateExecutor-5-thread-9310-processing-x:PSMG_BOM_2020_04_28_05_00_11_shard7_replica_n24 r:core_node27 null n:solr3:8983_solr c:PSMG_BOM_2020_04_28_05_00_11 s:shard7) [c:PSMG_BOM_2020_04_28_05_00_11 s:shard7 r:core_node27 x:PSMG_BOM_2020_04_28_05_00_11_shard7_replica_n24] o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=add{,id=5141653a-e33a-4b60-856d-7aa2ce73dee7}; node=ForwardNode: http://solr2:8983/solr/PSMG_BOM_2020_04_28_05_00_11_shard6_replica_n22/ to http://solr2:8983/solr/PSMG_BOM_2020_04_28_05_00_11_shard6_replica_n22/ => java.io.IOException: java.io.EOFException: HttpConnectionOverHTTP@9dc7ad1::SocketChannelEndPoint@2d20213b{solr2/10.0.0.216:8983<->/10.0.0.193:38728,ISHUT,fill=-,flush=-,to=5/600000}{io=0/0,kio=0,kro=1}->HttpConnectionOverHTTP@9dc7ad1(l:/10.0.0.193:38728 <-> r:solr2/10.0.0.216:8983,closed=false)=>HttpChannelOverHTTP@47a242c3(exchange=HttpExchange@6ffd260f req=PENDING/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP@17e056f9(req=CONTENT,snd=IDLE,failure=null)[HttpGenerator@3b6594c7{s=COMMITTED}],recv=HttpReceiverOverHTTP@6e847d32(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0 of -1}]]
        at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197)
java.io.IOException: java.io.EOFException: HttpConnectionOverHTTP@9dc7ad1::SocketChannelEndPoint@2d20213b{solr2/10.0.0.216:8983<->/10.0.0.193:38728,ISHUT,fill=-,flush=-,to=5/600000}{io=0/0,kio=0,kro=1}->HttpConnectionOverHTTP@9dc7ad1(l:/10.0.0.193:38728 <-> r:solr2/10.0.0.216:8983,closed=false)=>HttpChannelOverHTTP@47a242c3(exchange=HttpExchange@6ffd260f req=PENDING/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP@17e056f9(req=CONTENT,snd=IDLE,failure=null)[HttpGenerator@3b6594c7{s=COMMITTED}],recv=HttpReceiverOverHTTP@6e847d32(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0 of -1}]]
        at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]
        at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:151) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]

Our Solr nodes run inside Docker containers in a Docker Swarm cluster, and
we use a software-defined overlay network
(https://docs.docker.com/network/network-tutorial-overlay/#use-a-user-defined-overlay-network).
Could the reason for the network problems be the combination of the new
HTTP/2 implementation and the overlay network? We never had any network
issues with Solr 7 in an otherwise identical setup.
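
For context, the sketch below shows roughly what our loader does per batch.
It is a simplified, illustrative example rather than our real loader code:
the ZooKeeper host names, batch size, field names and document values are
made up, only the collection name is taken from the logs above.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class LoaderSketch {

    public static void main(String[] args) throws Exception {
        List<String> zkHosts = Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181");

        // The client is built against ZooKeeper, as in our real loader.
        try (CloudSolrClient client = new CloudSolrClient.Builder(zkHosts, Optional.empty()).build()) {
            client.setDefaultCollection("PSMG_CI_2020_04_15_10_07_04");

            // Documents are sent in batches; the IOExceptions above show up on the
            // Solr side while such adds are distributed to the shard replicas.
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("title_s", "example document " + i);
                batch.add(doc);
            }
            client.add(batch);
            client.commit();
        }
    }
}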

2. Incorrect Load Balancing
=======================================

Our SolrCloud cluster contains three nodes, and we use a cluster of three
ZooKeeper nodes.
We initialize our CloudSolrClient with the addresses of our ZooKeeper
nodes, and the CloudSolrClient should then load balance queries across the
three Solr nodes.
This works as expected in Solr 7. However, in Solr 8 we often see that the
first Solr node receives twice as many queries as the second node, and the
third node receives no queries at all.
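
To illustrate the setup: queries go through the same ZooKeeper-aware
CloudSolrClient as the indexing, and we would expect them to be spread
evenly across all three nodes. A minimal sketch; the ZooKeeper hosts, the
query and the collection name are placeholders, not our production values.

import java.util.Arrays;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QuerySketch {

    public static void main(String[] args) throws Exception {
        List<String> zkHosts = Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181");

        // The client discovers the three Solr nodes via ZooKeeper; with Solr 7
        // the requests were spread evenly over them.
        try (CloudSolrClient client = new CloudSolrClient.Builder(zkHosts, Optional.empty()).build()) {
            SolrQuery query = new SolrQuery("*:*");
            QueryResponse response = client.query("PSMG_CI_2020_04_15_10_07_04", query);
            System.out.println("hits: " + response.getResults().getNumFound());
        }
    }
}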

3. Problems with indexing Child Documents
=======================================

When we index documents that contain child documents, our application
regularly runs into a SocketTimeoutException:
{"@timestamp":"2020-04-29T06:56:31.587Z","level":"SEVERE","logger_name":"o
rg.apache.solr.client.solrj.impl.BaseCloudSolrClient","thread_name":"concu
rrent/batchJobExecutorService-managedThreadFactory-Thread-17","log_message
":
 "Request to collection [PSMG_BOM_2020_04_29_06_52_36] failed due to (0)
java.net.SocketTimeoutException: Read timed out, retry=0 commError=false
errorCode=0 "}

{"@timestamp":"2020-04-29T06:56:31.588Z","level":"INFO","logger_name":"org
.apache.solr.client.solrj.impl.BaseCloudSolrClient","thread_name":"concurr
ent/batchJobExecutorService-managedThreadFactory-Thread-17","log_message":
 "request was not communication error it seems"}

Indexing child documents seems to be significantly slower in Solr 8
compared to Solr 7. We set a timeout value of 2 minutes with
CloudSolrClient.setSoTimeout().
In Solr 7, documents could be added within a few seconds, so a timeout of 2
minutes was more than enough.
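
For reference, here is a stripped-down sketch of how we index a parent with
its child documents. The field names, the number of children and the
ZooKeeper hosts are illustrative; the timeout is set via the SolrJ builder
here, which should be equivalent to the setSoTimeout() call we actually use
(the exact builder methods may differ between SolrJ versions).

import java.util.Arrays;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ChildDocSketch {

    public static void main(String[] args) throws Exception {
        List<String> zkHosts = Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181");

        // Socket timeout of 2 minutes, set on the builder; in our loader we call
        // setSoTimeout() on the client, which should have the same effect.
        try (CloudSolrClient client = new CloudSolrClient.Builder(zkHosts, Optional.empty())
                .withSocketTimeout(120_000)
                .withConnectionTimeout(15_000)
                .build()) {
            client.setDefaultCollection("PSMG_BOM_2020_04_29_06_52_36");

            // One parent document with a handful of nested child documents.
            SolrInputDocument parent = new SolrInputDocument();
            parent.addField("id", "5141653a-e33a-4b60-856d-7aa2ce73dee7");
            parent.addField("type_s", "bom");

            for (int i = 0; i < 10; i++) {
                SolrInputDocument child = new SolrInputDocument();
                child.addField("id", "child-" + i);
                child.addField("type_s", "bom_item");
                parent.addChildDocument(child);
            }

            // In Solr 7 this returned within seconds; in Solr 8 it regularly hits
            // the SocketTimeoutException shown above.
            client.add(parent);
            client.commit();
        }
    }
}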

Cheers,
Ludger

Re: Problems when Upgrading from Solr 7.7.1 to 8.5.0

Posted by Houston Putman <ho...@gmail.com>.
Hello Ludger,

I don't have answers to all of your questions, but for #2 (Incorrect Load
Balancing) it is a bug that will be fixed in 8.6. You can find more info at
SOLR-14471 <https://issues.apache.org/jira/browse/SOLR-14471>.

- Houston
