Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/12/11 17:35:29 UTC
[GitHub] [solr-operator] gthvidsten opened a new issue #385: Solr can't connect to Zookeeper
gthvidsten opened a new issue #385:
URL: https://github.com/apache/solr-operator/issues/385
I've installed SolrCloud with nearly all default values, except for storage, where I've added a `storageClassName` to both Zookeeper and Solr.
When I access `Cloud > ZK Status` in the Solr web interface (after port-forwarding to the "common" service), I get a mostly empty screen containing only this:
```
Status:
ZK connection string:
Ensemble size:
Ensemble mode:
```
With the following error message at the top: `For input string: "null"`
If I go to `Logging` in the Solr web interface, the log very quickly fills with these two errors:
```
RequestHandlerBase | java.lang.NumberFormatException: For input string: "null"
HttpSolrCall | null:java.lang.NumberFormatException: For input string: "null"
```
I guess these other errors appear on startup, and they seem to be closely related to ZK:
```
ERROR | StaticHostProvider | Unable to resolve address: example-solrcloud-zookeeper-1.example-solrcloud-zookeeper-headless.default.svc.cluster.local:2181
WARN | ClientCnxn | Session 0x0 for server example-solrcloud-zookeeper-1.example-solrcloud-zookeeper-headless.default.svc.cluster.local:2181, unexpected error, closing socket connection and attempting reconnect
ERROR | StaticHostProvider | Unable to resolve address: example-solrcloud-zookeeper-2.example-solrcloud-zookeeper-headless.default.svc.cluster.local:2181
WARN | ClientCnxn | Session 0x0 for server example-solrcloud-zookeeper-2.example-solrcloud-zookeeper-headless.default.svc.cluster.local:2181, unexpected error, closing socket connection and attempting reconnect
ERROR | StaticHostProvider | Unable to resolve address: example-solrcloud-zookeeper-2.example-solrcloud-zookeeper-headless.default.svc.cluster.local:2181
WARN | ClientCnxn | Session 0x0 for server example-solrcloud-zookeeper-2.example-solrcloud-zookeeper-headless.default.svc.cluster.local:2181, unexpected error, closing socket connection and attempting reconnect
WARN | ScheduledTrigger | ScheduledTrigger was not able to run event at scheduled time: 2021-12-11T09:09:16.986Z. Now: 2021-12-11T17:14:46.438Z
```
Looking at the pod logs from e.g. `example-solrcloud-zookeeper-0`, I can see the following warnings and errors (some of them repeating multiple times):
```
WARN [QuorumConnectionThread-[myid=1]-1:QuorumCnxManager@400] - Cannot open channel to 2 at election address example-solrcloud-zookeeper-1.example-solrcloud-zookeeper-headless.default.svc.cluster.local:3888
java.net.UnknownHostException: example-solrcloud-zookeeper-1.example-solrcloud-zookeeper-headless.default.svc.cluster.local
at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
at java.base/java.net.Socket.connect(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
WARN [SyncThread:1:QuorumPeer@1775] - Restarting Leader Election
WARN [NIOWorkerThread-2:NIOServerCnxn@364] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /10.240.0.54:54618, session = 0x100000911bc0000
at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
ERROR [LearnerHandler-/10.240.0.54:54148:LearnerHandler@707] - Unexpected exception causing shutdown while sock still open
java.net.SocketException: Connection reset
at java.base/java.net.SocketInputStream.read(Unknown Source)
at java.base/java.net.SocketInputStream.read(Unknown Source)
at java.base/java.io.BufferedInputStream.fill(Unknown Source)
at java.base/java.io.BufferedInputStream.read(Unknown Source)
at java.base/java.io.DataInputStream.readInt(Unknown Source)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:643)
WARN [LearnerHandler-/10.240.0.54:54148:LearnerHandler@730] - ******* GOODBYE /10.240.0.54:54148 ********
WARN [NIOWorkerThread-2:NIOServerCnxn@364] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /10.240.0.54:54832, session = 0x100000911bc0002
at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
```
What is going on here? How can it be fixed?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org
[GitHub] [solr-operator] HoustonPutman commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-998201837
Where are you running this? The fact that it can't find the host `example-solrcloud-zookeeper-1.example-solrcloud-zookeeper-headless.default.svc.cluster.local` is a bad sign. Do you have custom networking settings enabled in your Kubernetes cluster?
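One way to narrow this down is to check name resolution directly from inside a pod. The following is a minimal sketch, assuming the per-pod DNS names follow the `<pod>.<zk-name>-headless.<namespace>.svc.cluster.local` pattern seen in the logs above; the helper names `zk_pod_hostnames` and `check_resolution` are hypothetical and not part of Solr or either operator.

```python
import socket


def zk_pod_hostnames(zk_name, namespace="default", replicas=3, domain="cluster.local"):
    """Build the per-pod DNS names that the logs in this thread try to resolve.

    The naming pattern is inferred from those log lines; it is an assumption
    that it holds for other installations.
    """
    return [
        f"{zk_name}-{i}.{zk_name}-headless.{namespace}.svc.{domain}"
        for i in range(replicas)
    ]


def check_resolution(hostnames, port=2181):
    """Try to resolve each hostname; map it to its addresses or an error string."""
    results = {}
    for host in hostnames:
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            # Each entry's sockaddr tuple starts with the resolved IP address.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[host] = f"unresolved: {exc}"
    return results
```

Calling `check_resolution(zk_pod_hostnames("example-solrcloud-zookeeper"))` from a pod in the same cluster would show which of the three names resolve and which do not.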
[GitHub] [solr-operator] gthvidsten commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
gthvidsten commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-998601234
This is on an Azure Kubernetes Service with most settings at their default.
Here's the Bicep used to provision the cluster. All values (except the Windows node) should match the defaults used when creating a cluster directly in the Azure Portal. (We created a cluster in the portal, exported the template, converted it to Bicep, and minified it as best we could.)
```
param nodePoolAvailabilityZones array = [
  '1'
  '2'
  '3'
]
param nodeResourceGroupName string
param aksName string
resource aks_resource 'Microsoft.ContainerService/managedClusters@2021-08-01' = {
  name: aksName
  location: resourceGroup().location
  properties: {
    kubernetesVersion: '1.20.9'
    dnsPrefix: aksName
    nodeResourceGroup: nodeResourceGroupName
    enableRBAC: true
    agentPoolProfiles: [
      {
        name: 'agentpool'
        osDiskSizeGB: 128
        count: 1
        vmSize: 'Standard_B2s'
        osType: 'Linux'
        maxPods: 110
        enableAutoScaling: false
        type: 'VirtualMachineScaleSets'
        mode: 'System'
        availabilityZones: nodePoolAvailabilityZones
        orchestratorVersion: '1.20.9'
      }
      {
        name: 'winpool'
        osDiskSizeGB: 128
        count: 1
        vmSize: 'Standard_D8s_v3'
        osType: 'Windows'
        maxPods: 110
        enableAutoScaling: false
        type: 'VirtualMachineScaleSets'
        mode: 'User'
        availabilityZones: nodePoolAvailabilityZones
        orchestratorVersion: '1.20.9'
      }
    ]
    addonProfiles: {
      httpApplicationRouting: {
        enabled: false
      }
      azurepolicy: {
        enabled: false
      }
    }
    networkProfile: {
      networkPlugin: 'azure'
      serviceCidr: '10.0.0.0/16'
      dnsServiceIP: '10.0.0.10'
      dockerBridgeCidr: '172.17.0.1/16'
      loadBalancerSku: 'standard'
    }
  }
  identity: {
    type: 'SystemAssigned'
  }
}
```
[GitHub] [solr-operator] HoustonPutman commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-1036499308
No update. I can't really think of any reason this would be failing. The only thing I can think of is that something is wrong with the Kube cluster, and that restarting everything might provide a fix. Are you also running into this issue, @sbbagal13?
[GitHub] [solr-operator] sbbagal13 commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
sbbagal13 commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-1028500766
Any update on this?
[GitHub] [solr-operator] HoustonPutman commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-1042226593
Ok after a lot of digging, I think I know what is happening, and it is unrelated to Solr or the Solr Operator.
Basically, the Zookeeper Operator uses/builds its own Docker images of Zookeeper, using Alpine as the base image. Unfortunately, Alpine has a storied history of DNS issues with Kubernetes:
- https://stackoverflow.com/questions/65181012/does-alpine-have-known-dns-issue-within-kubernetes
- https://support.cloudbees.com/hc/en-us/articles/360040999471-UnknownHostException-caused-by-DNS-Resolution-issue-with-Alpine-Images
- https://github.com/kubernetes/kubernetes/issues/64924
I would recommend making an issue in the [Zookeeper Operator repo](https://github.com/pravega/zookeeper-operator/issues), telling them about the problem and asking them to solve the issue in Alpine or switch to using a different base image for Zookeeper.
[GitHub] [solr-operator] HoustonPutman commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-1042258723
Actually I see that Solr also cannot resolve the ZK address.
This is probably because the Zookeeper Pods can never become healthy, and the ZK Headless service does not have `publishNotReadyAddresses: true` enabled. I would also recommend adding that to your issue in the ZK Operator repo, as I would imagine that setting would be useful in the ZK Headless service.
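For illustration, here is a minimal sketch of what a headless Service with that field set could look like, built as a plain dict and printed as JSON (which `kubectl apply -f -` accepts). The Service name, namespace, `app` selector, and port names are taken from the logs and the `kubectl describe` output in this thread; the helper `headless_service_manifest` is hypothetical, and whether the Zookeeper Operator would keep or overwrite a manually applied Service is an assumption that has not been verified here.

```python
import json


def headless_service_manifest(name, namespace, app_label, publish_not_ready=True):
    """Return a headless Service manifest that also publishes not-ready pod addresses."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # "None" makes the Service headless, so DNS returns per-pod records.
            "clusterIP": "None",
            # Publish DNS records for pods even before they pass readiness checks.
            "publishNotReadyAddresses": publish_not_ready,
            "selector": {"app": app_label},
            "ports": [
                {"name": "client", "port": 2181},
                {"name": "quorum", "port": 2888},
                {"name": "leader-election", "port": 3888},
            ],
        },
    }


if __name__ == "__main__":
    manifest = headless_service_manifest(
        name="example-solrcloud-zookeeper-headless",
        namespace="default",
        app_label="example-solrcloud-zookeeper",
    )
    print(json.dumps(manifest, indent=2))
```

The point of the setting is that the ZK pods need to resolve each other to form a quorum before any of them can become ready, so DNS records that only appear after readiness can leave the ensemble stuck.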
[GitHub] [solr-operator] HoustonPutman closed issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
HoustonPutman closed issue #385:
URL: https://github.com/apache/solr-operator/issues/385
[GitHub] [solr-operator] HoustonPutman commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-996871806
Could you run `kubectl describe zookeepercluster <zk-name>`? I'd need to see what its expected ZK connection string is.
[GitHub] [solr-operator] gthvidsten commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
gthvidsten commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-997376083
I can't find any connection strings in the description, so here's the entire output from the command:
```
Name: example-solrcloud-zookeeper
Namespace: default
Labels: app=example-solrcloud-zookeeper
app.kubernetes.io/instance=example-solr
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=solr
app.kubernetes.io/version=8.9.0
helm.sh/chart=solr-0.5.0
release=example-solrcloud-zookeeper
solr-cloud=example
technology=zookeeper
Annotations: <none>
API Version: zookeeper.pravega.io/v1beta1
Kind: ZookeeperCluster
Metadata:
Creation Timestamp: 2021-12-19T11:25:58Z
Generation: 1
Managed Fields:
API Version: zookeeper.pravega.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:labels:
.:
f:app:
f:app.kubernetes.io/instance:
f:app.kubernetes.io/managed-by:
f:app.kubernetes.io/name:
f:app.kubernetes.io/version:
f:helm.sh/chart:
f:release:
f:solr-cloud:
f:technology:
f:ownerReferences:
f:spec:
.:
f:adminServerService:
f:clientService:
f:config:
.:
f:autoPurgePurgeInterval:
f:autoPurgeSnapRetainCount:
f:commitLogCount:
f:globalOutstandingLimit:
f:initLimit:
f:maxClientCnxns:
f:maxSessionTimeout:
f:minSessionTimeout:
f:preAllocSize:
f:snapCount:
f:snapSizeLimitInKb:
f:syncLimit:
f:tickTime:
f:headlessService:
f:image:
.:
f:pullPolicy:
f:repository:
f:tag:
f:labels:
.:
f:app:
f:app.kubernetes.io/instance:
f:app.kubernetes.io/managed-by:
f:app.kubernetes.io/name:
f:app.kubernetes.io/version:
f:helm.sh/chart:
f:release:
f:solr-cloud:
f:technology:
f:persistence:
.:
f:reclaimPolicy:
f:spec:
.:
f:accessModes:
f:resources:
.:
f:requests:
.:
f:storage:
f:storageClassName:
f:pod:
.:
f:affinity:
.:
f:podAntiAffinity:
.:
f:preferredDuringSchedulingIgnoredDuringExecution:
f:labels:
.:
f:app:
f:release:
f:nodeSelector:
.:
f:kubernetes.io/os:
f:resources:
f:serviceAccountName:
f:terminationGracePeriodSeconds:
f:ports:
f:probes:
.:
f:livenessProbe:
.:
f:failureThreshold:
f:initialDelaySeconds:
f:periodSeconds:
f:successThreshold:
f:timeoutSeconds:
f:readinessProbe:
.:
f:failureThreshold:
f:initialDelaySeconds:
f:periodSeconds:
f:successThreshold:
f:timeoutSeconds:
f:replicas:
f:storageType:
Manager: solr-operator
Operation: Update
Time: 2021-12-19T11:25:58Z
API Version: zookeeper.pravega.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:labels:
f:owner-rv:
f:status:
.:
f:conditions:
f:externalClientEndpoint:
f:internalClientEndpoint:
f:members:
.:
f:ready:
f:unready:
f:readyReplicas:
f:replicas:
Manager: zookeeper-operator
Operation: Update
Time: 2021-12-19T11:26:45Z
Owner References:
API Version: solr.apache.org/v1beta1
Block Owner Deletion: true
Controller: true
Kind: SolrCloud
Name: example
UID: b32657b6-fba0-422d-b472-16e92ccc82a4
Resource Version: 3165711
UID: 54244129-4e82-4fcb-8fab-cc1609278ea6
Spec:
Admin Server Service:
Client Service:
Config:
Auto Purge Purge Interval: 1
Auto Purge Snap Retain Count: 3
Commit Log Count: 500
Global Outstanding Limit: 1000
Init Limit: 10
Max Client Cnxns: 60
Max Session Timeout: 40000
Min Session Timeout: 4000
Pre Alloc Size: 65536
Snap Count: 10000
Snap Size Limit In Kb: 4194304
Sync Limit: 2
Tick Time: 2000
Headless Service:
Image:
Pull Policy: IfNotPresent
Repository: docker.io/pravega/zookeeper
Tag: 0.2.12
Labels:
App: example-solrcloud-zookeeper
app.kubernetes.io/instance: example-solr
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: solr
app.kubernetes.io/version: 8.9.0
helm.sh/chart: solr-0.5.0
Release: example-solrcloud-zookeeper
Solr - Cloud: example
Technology: zookeeper
Persistence:
Reclaim Policy: Retain
Spec:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 5Gi
Storage Class Name: solr-storage
Pod:
Affinity:
Pod Anti Affinity:
Preferred During Scheduling Ignored During Execution:
Pod Affinity Term:
Label Selector:
Match Expressions:
Key: app
Operator: In
Values:
example-solrcloud-zookeeper
Topology Key: kubernetes.io/hostname
Weight: 20
Labels:
App: example-solrcloud-zookeeper
Release: example-solrcloud-zookeeper
Node Selector:
kubernetes.io/os: linux
Resources:
Service Account Name: default
Termination Grace Period Seconds: 30
Ports:
Container Port: 2181
Name: client
Protocol: TCP
Container Port: 2888
Name: quorum
Protocol: TCP
Container Port: 3888
Name: leader-election
Protocol: TCP
Container Port: 7000
Name: metrics
Protocol: TCP
Container Port: 8080
Name: admin-server
Protocol: TCP
Probes:
Liveness Probe:
Failure Threshold: 3
Initial Delay Seconds: 10
Period Seconds: 10
Success Threshold: 0
Timeout Seconds: 10
Readiness Probe:
Failure Threshold: 3
Initial Delay Seconds: 10
Period Seconds: 10
Success Threshold: 1
Timeout Seconds: 10
Replicas: 3
Storage Type: persistence
Status:
Conditions:
Status: False
Type: PodsReady
Status: False
Type: Upgrading
Status: False
Type: Error
External Client Endpoint: N/A
Internal Client Endpoint: 10.0.115.187:2181
Members:
Ready:
example-solrcloud-zookeeper-1
example-solrcloud-zookeeper-0
Unready:
example-solrcloud-zookeeper-2
Ready Replicas: 3
Replicas: 3
Events: <none>
```
[GitHub] [solr-operator] gthvidsten commented on issue #385: Solr can't connect to Zookeeper
Posted by GitBox <gi...@apache.org>.
gthvidsten commented on issue #385:
URL: https://github.com/apache/solr-operator/issues/385#issuecomment-1037102033
The cluster was restarted several times between my attempts, so that doesn't solve it.