Posted to commits@cassandra.apache.org by "Yolanda Tang (Jira)" <ji...@apache.org> on 2021/03/25 04:22:00 UTC

[jira] [Updated] (CASSANDRA-16538) Cannot run restore for a list of Cassandra nodes

     [ https://issues.apache.org/jira/browse/CASSANDRA-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yolanda Tang updated CASSANDRA-16538:
-------------------------------------
    Description: 
Hi,

When we switched to Cassandra Medusa to restore node data, we ran into some issues.

When we run the restore remotely with pssh, it times out. Running it against a single Cassandra node, we get:
{code:bash}
pssh -H XXXX medusa -vvv restore-node --in-place --no-verify --backup-name 2021031803 --temp-dir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
[1] 06:52:08 [FAILURE] sha8392 Timed out, Killed by signal 9
{code}
Looking further into the timeout, the Medusa debug log shows:
{noformat}
[2021-03-25 02:23:50,113] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "GET /XX/XX/10.44.XX.XX/2021031803/meta/schema.cql?Version=2006-03-01 HTTP/1.1" 200 24005
[2021-03-25 02:23:50,114] DEBUG: [Storage] Getting object sre_dev_cass_sha/10.44.79.15/2021031803/meta/tokenmap.json
[2021-03-25 02:23:50,151] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "HEAD /XX HTTP/1.1" 200 0
[2021-03-25 02:23:50,201] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "HEAD /XX/XX/10.44.79.15/2021031803/meta/tokenmap.json HTTP/1.1" 200 0
[2021-03-25 02:23:50,202] DEBUG: Downloading /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a/medusa-restore-197b6c82-4cd5-4c5b-b3c2-9d98863c1b3f as single part
[2021-03-25 02:23:50,254] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "GET /XX/XX/10.44.XX.XX/2021031803/meta/tokenmap.json?Version=2006-03-01 HTTP/1.1" 200 1535
[2021-03-25 02:23:50,255] INFO: Stopping Cassandra
+ /usr/bin/nodetool -u cassandra -pw if9te8ohKei9xaep drain
+ /usr/bin/nodetool -u cassandra -pw if9te8ohKei9xaep drain
error: null
-- StackTrace --
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:267)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
	at com.sun.proxy.$Proxy8.drain(Unknown Source)
	at org.apache.cassandra.tools.NodeProbe.drain(NodeProbe.java:371)
	at org.apache.cassandra.tools.nodetool.Drain.execute(Drain.java:36)
	at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
	at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)
+ ls -l /var/run/cassandra/cassandra.pid
ls: cannot access /var/run/cassandra/cassandra.pid: No such file or directory
+ sleep 10
+ echo -n 'Shutdown Cassandra: '
Shutdown Cassandra:
++ cat /var/run/cassandra/cassandra.pid
cat: /var/run/cassandra/cassandra.pid: No such file or directory
+ su cassandra -c 'kill '
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
++ seq 40
+ for t in '`seq 40`'
+ /etc/init.d/cassandra status
+ break
+ sleep 5
+ echo OK
OK
{noformat}
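The shell trace above suggests that the stop step found no /var/run/cassandra/cassandra.pid, ran kill with an empty PID (hence the kill usage error), and still finished with OK. It does not explain why nodetool drain failed with EOFException. The following is a minimal sketch, not Medusa's or Cassandra's own code, assuming the stop step is a custom shell script whose source is not shown here: a hypothetical helper, stop_by_pidfile, that refuses to report success when the PID file is missing, empty or not a plain PID, and waits until the process has actually exited.

{code:bash}
# Hypothetical helper (not part of Medusa or Cassandra): stop a process via the
# PID recorded in a pidfile, failing instead of running "kill" with no PID.
# Usage: stop_by_pidfile PIDFILE [TIMEOUT_SECONDS]
stop_by_pidfile() {
  local pidfile="$1" timeout="${2:-40}" pid i=0
  # A missing or empty pidfile is an error, not "already stopped".
  if [ ! -s "$pidfile" ]; then
    echo "pidfile $pidfile is missing or empty" >&2
    return 1
  fi
  pid="$(cat "$pidfile")"
  # Accept only a positive decimal number; "0" would target the process group.
  case "$pid" in
    ''|0*|*[!0-9]*)
      echo "pidfile $pidfile does not contain a valid PID" >&2
      return 1
      ;;
  esac
  if ! kill "$pid" 2>/dev/null; then
    echo "could not signal PID $pid" >&2
    return 1
  fi
  # kill -0 sends no signal; it succeeds only while the process still exists.
  while [ "$i" -lt "$timeout" ]; do
    if ! kill -0 "$pid" 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "PID $pid still running after ${timeout}s" >&2
  return 1
}
{code}

Returning non-zero in these cases lets any caller that checks the exit status see that Cassandra was not stopped, instead of receiving OK.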

However, the restore succeeds when we run the following steps directly on one node:
{code:bash}
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export https_proxy=http://proxy.XX:3128
export PATH=$PATH:/usr/share/cassandra-medusa/bin
sudo su
mkdir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
cd /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
medusa-wrapper sudo medusa -vvv restore-node --in-place --no-verify --backup-name 2021031803 --temp-dir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
{code}
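The manual sequence above becomes root through an interactive sudo su, which does not carry over to a non-interactive pssh session in the same way. A minimal sketch, assuming the same proxy, Medusa install path and temp directory as in this report, of collapsing the working steps into a single command string that pssh can run; build_restore_cmd is a hypothetical helper, and privilege escalation is left to the sudo already passed to medusa-wrapper. Passing -t 0 to pssh disables its per-host timeout (the first attempt was killed by signal 9 when that timeout expired); this only avoids the kill and does not address the drain failure in the trace.

{code:bash}
# Hypothetical helper: print the manual restore steps from this report as one
# non-interactive command line that pssh can run on each host.
# Arguments: BACKUP_NAME WORK_DIR HTTPS_PROXY (WORK_DIR must not contain spaces).
build_restore_cmd() {
  local backup="$1" workdir="$2" proxy="$3"
  # printf reuses '%s' for every argument, concatenating them without newlines.
  # The single-quoted PATH export keeps $PATH unexpanded here, so the remote
  # shell expands it instead.
  printf '%s' \
    "export LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 https_proxy=${proxy}; " \
    'export PATH="$PATH:/usr/share/cassandra-medusa/bin"; ' \
    "mkdir -p ${workdir} && cd ${workdir} && " \
    "medusa-wrapper sudo medusa -vvv restore-node --in-place --no-verify" \
    " --backup-name ${backup} --temp-dir ${workdir}"
}

# Usage sketch (XXXX is the host placeholder used in this report); -t 0 turns
# off pssh's per-host timeout and -i prints each host's output inline:
#   pssh -t 0 -i -H XXXX "$(build_restore_cmd 2021031803 \
#     /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a http://proxy.XX:3128)"
{code}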
We are running the command on the following host:
{code:bash}
uname -a
Linux sha8392 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
{code}
Could you please have a look at the issue?

Thanks

  was:
Hi,

When we switched to Cassandra Medusa to restore node data, we ran into some issues.

When we run the restore remotely with pssh, it times out. Running it against a single Cassandra node, we get:
{code:bash}
pssh -H XXXX medusa -vvv restore-node --in-place --no-verify --backup-name 2021031803 --temp-dir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
[1] 06:52:08 [FAILURE] sha8392 Timed out, Killed by signal 9
{code}
Looking further into the timeout, the Medusa debug log shows:
{noformat}
[2021-03-25 02:23:50,113] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "GET /XX/XX/10.44.XX.XX/2021031803/meta/schema.cql?Version=2006-03-01 HTTP/1.1" 200 24005
[2021-03-25 02:23:50,114] DEBUG: [Storage] Getting object sre_dev_cass_sha/10.44.79.15/2021031803/meta/tokenmap.json
[2021-03-25 02:23:50,151] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "HEAD /XX HTTP/1.1" 200 0
[2021-03-25 02:23:50,201] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "HEAD /XX/XX/10.44.79.15/2021031803/meta/tokenmap.json HTTP/1.1" 200 0
[2021-03-25 02:23:50,202] DEBUG: Downloading /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a/medusa-restore-197b6c82-4cd5-4c5b-b3c2-9d98863c1b3f as single part
[2021-03-25 02:23:50,254] DEBUG: https://s3.cn-north-1.amazonaws.com.cn:443 "GET /XX/XX/10.44.XX.XX/2021031803/meta/tokenmap.json?Version=2006-03-01 HTTP/1.1" 200 1535
[2021-03-25 02:23:50,255] INFO: Stopping Cassandra
+ /usr/bin/nodetool -u cassandra -pw if9te8ohKei9xaep drain
+ /usr/bin/nodetool -u cassandra -pw if9te8ohKei9xaep drain
error: null
-- StackTrace --
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:267)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
	at com.sun.proxy.$Proxy8.drain(Unknown Source)
	at org.apache.cassandra.tools.NodeProbe.drain(NodeProbe.java:371)
	at org.apache.cassandra.tools.nodetool.Drain.execute(Drain.java:36)
	at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
	at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)
+ ls -l /var/run/cassandra/cassandra.pid
ls: cannot access /var/run/cassandra/cassandra.pid: No such file or directory
+ sleep 10
+ echo -n 'Shutdown Cassandra: '
Shutdown Cassandra:
++ cat /var/run/cassandra/cassandra.pid
cat: /var/run/cassandra/cassandra.pid: No such file or directory
+ su cassandra -c 'kill '
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
++ seq 40
+ for t in '`seq 40`'
+ /etc/init.d/cassandra status
+ break
+ sleep 5
+ echo OK
OK
{noformat}
However, the restore succeeds when we run the following steps directly on one node:
{code:bash}
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export https_proxy=http://proxy.XX:3128
export PATH=$PATH:/usr/share/cassandra-medusa/bin
sudo su
mkdir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
cd /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
medusa-wrapper sudo medusa -vvv restore-node --in-place --no-verify --backup-name 2021031803 --temp-dir /tmp/medusa-job-bd8a39ca-a5ea-4a3a-820f-0fa6ddc5130a
{code}
We are running the command on the following host:
{code:bash}
uname -a
Linux sha8392 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
{code}
Could you please have a look at the issue?

Thanks


> Cannot run restore for a list of Cassandra nodes
> ------------------------------------------------
>
>                 Key: CASSANDRA-16538
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16538
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Yolanda Tang
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
