Posted to commits@cassandra.apache.org by "Brandon Williams (Jira)" <ji...@apache.org> on 2021/03/25 14:33:00 UTC
[jira] [Comment Edited] (CASSANDRA-16538) Cannot run restore for a list of Cassandra nodes

    [ https://issues.apache.org/jira/browse/CASSANDRA-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308717#comment-17308717 ] 

Brandon Williams edited comment on CASSANDRA-16538 at 3/25/21, 2:32 PM:
------------------------------------------------------------------------

Hi Yolanda,

This jira is for tracking bugs in the Apache Cassandra software, and doesn't make for a good vehicle for support.  I recommend contacting the community through Slack or the mailing list: https://cassandra.apache.org/community/


> Cannot run restore for a list of Cassandra nodes
> ------------------------------------------------
>
>                 Key: CASSANDRA-16538
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16538
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Yolanda Tang
>            Priority: Normal
>
> Hi,
>  
> When switching to Cassandra Medusa for node data restores, we encountered some issues.
> Running the restore remotely through pssh times out. When we try the command against one Cassandra node, we get:
>  
> {code:java}
> pssh -H XXXX medusa -vvv restore-node --in-place --no-verify --backup-name 2021031803 --temp-dir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
> [1] 06:52:08 [FAILURE] sha8392 Timed out, Killed by signal 9
> {code}
> Looking further into the timeout, we get the following logs:
> {code:java}
> [2021-03-25 02:23:50,113] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "GET /XX/XX/10.44.XX.XX/2021031803/meta/schema.cql?Version=2006-03-01 HTTP/1.1" 200 24005
> [2021-03-25 02:23:50,114] DEBUG: [Storage] Getting object sre_dev_cass_sha/10.44.79.15/2021031803/meta/tokenmap.json
> [2021-03-25 02:23:50,151] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "HEAD /XX HTTP/1.1" 200 0
> [2021-03-25 02:23:50,201] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "HEAD /XX/XX/10.44.79.15/2021031803/meta/tokenmap.json HTTP/1.1" 200 0
> [2021-03-25 02:23:50,202] DEBUG: Downloading /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a/medusa-restore-197b6c82-4cd5-4c5b-b3c2-9d98863c1b3f as single part
> [2021-03-25 02:23:50,254] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "GET /XX/XX/10.44.XX.XX/2021031803/meta/tokenmap.json?Version=2006-03-01 HTTP/1.1" 200 1535
> [2021-03-25 02:23:50,255] INFO: Stopping Cassandra
> + /usr/bin/nodetool u cassandra -pw if9te8ohKei9xaep drain
> + /usr/bin/nodetool -u cassandra -pw if9te8ohKei9xaep drain
> error: null
> -- StackTrace --
> java.io.EOFException
>     at java.io.DataInputStream.readByte(DataInputStream.java:267)
>     at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
>     at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
>     at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
>     at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
>     at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
>     at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
>     at com.sun.proxy.$Proxy8.drain(Unknown Source)
>     at org.apache.cassandra.tools.NodeProbe.drain(NodeProbe.java:371)
>     at org.apache.cassandra.tools.nodetool.Drain.execute(Drain.java:36)
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
>     at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)
> + ls -l /var/run/cassandra/cassandra.pid
> ls: cannot access /var/run/cassandra/cassandra.pid: No such file or directory
> + sleep 10
> + echo -n 'Shutdown Cassandra: '
> Shutdown Cassandra: ++ cat /var/run/cassandra/cassandra.pid
> cat: /var/run/cassandra/cassandra.pid: No such file or directory
> + su cassandra -c 'kill '
> kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
> ++ seq 40
> + for t in '`seq 40`'
> + /etc/init.d/cassandra status
> + break
> + sleep 5
> + echo OK
> OK
> {code}
> However, the command runs successfully on a single node when invoked as follows:
> {code:java}
> export LC_ALL=en_US.UTF-8
> export LANG=en_US.UTF-8
> export https_proxy=http://proxy.XX:3128
> export PATH=$PATH:/usr/share/cassandra-medusa/bin
> sudo su
> mkdir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
> cd /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
> medusa-wrapper sudo medusa -vvv restore-node --in-place --no-verify --backup-name 2021031803 --temp-dir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a{code}
> We are running the command on the following system:
> {code:java}
> uname -a
> Linux XXXX 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux{code}
> Could you please have a look at the issue?
> Thanks
