Posted to user@cassandra.apache.org by Mitch Gitman <mg...@gmail.com> on 2015/06/19 05:44:49 UTC

sstableloader "Could not retrieve endpoint ranges"

I'm using sstableloader to bulk-load a table from one cluster to another. I
can't just copy sstables because the clusters have different topologies.
While we're looking to upgrade soon to Cassandra 2.0.x, we're on Cassandra
1.2.19. The source data comes from a "nodetool snapshot."

Here's the command I ran:
sstableloader -d IP_ADDRESSES_OF_SEED_NODES /SNAPSHOT_DIRECTORY/

Here's the result I got:
Could not retrieve endpoint ranges:
 -pr,--principal                       kerberos principal
 -k,--keytab                           keytab location
 --ssl-keystore                        ssl keystore location
 --ssl-keystore-password               ssl keystore password
 --ssl-keystore-type                   ssl keystore type
 --ssl-truststore                      ssl truststore location
 --ssl-truststore-password             ssl truststore password
 --ssl-truststore-type                 ssl truststore type

Not sure what to make of this, especially the security-related arguments
that show up in the usage hints. The source and destination clusters have no
security enabled.

Hoping this might ring a bell with someone out there.

Re: sstableloader "Could not retrieve endpoint ranges"

Posted by Mitch Gitman <mg...@gmail.com>.
I want to follow up on this thread to describe what I was able to get
working. My goal was to switch a cluster to vnodes, in the process
preserving the data for a single table, endpoints.endpoint_messages.
Otherwise, I could afford to start from a clean slate. As should be
apparent, I could also afford to do this within a maintenance window where
the cluster was down. In other words, I had the luxury of not having to add
a new data center to a live cluster per DataStax's documented procedure to
enable vnodes:
http://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configVnodesProduction_t.html
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html

What I got working relies on the nodetool snapshot command to create
SSTable snapshots under endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME.
The snapshots are what the data is backed up to and restored from; the
backup and restore never operate directly on the live SSTables in the
endpoints/endpoint_messages/ directories. To spell out the distinction
(a command sketch follows the list):

   - endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME/: These SSTables
   are being copied off and restored from.
   - endpoints/endpoint_messages/: These SSTables are obviously the source
   of the snapshots but are not being copied off and restored from.
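
Concretely, the backup-side commands looked something like the following
sketch. The snapshot name, staging path, and default data directory
(/var/lib/cassandra/data) are assumptions, not the exact values used:

# Take a snapshot of just the endpoints.endpoint_messages table
nodetool snapshot endpoints -cf endpoint_messages -t pre_vnodes

# Copy the snapshot SSTables to a staging path whose last two components
# follow the keyspace/table pattern the bulk-loader expects
mkdir -p /srv/staging/endpoints/endpoint_messages
cp /var/lib/cassandra/data/endpoints/endpoint_messages/snapshots/pre_vnodes/* \
   /srv/staging/endpoints/endpoint_messages/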

Instead of using sstableloader to load the snapshots into the
re-initialized Cassandra cluster, I used the JMX StorageService.bulkLoad
operation after establishing a JConsole session to each node. I copied the
snapshots to a staging path ending in endpoints/endpoint_messages/ to give
the bulk-loader the keyspace/table layout it expects. That staging path,
the source for StorageService.bulkLoad, is on the same host as the
Cassandra node but outside the node's own data directories.
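
For anyone who would rather script the load than click through JConsole,
the same MBean operation can be driven from a shell. This is a sketch using
the third-party jmxterm utility; the jar name, JMX port, and staging path
are assumptions, while the MBean and operation names are the ones used
above:

# Invoke StorageService.bulkLoad over JMX non-interactively
echo "run -b org.apache.cassandra.db:type=StorageService bulkLoad /srv/staging/endpoints/endpoint_messages" | \
  java -jar jmxterm-1.0-uber.jar -l localhost:7199 -n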

This procedure can be summarized as follows (a command sketch of steps 2
through 5 appears after the list):
1. For each node, create a snapshot of the endpoint_messages table as a
backup.
2. Stop the cluster.
3. On each node, wipe all the data, i.e. the contents of
data_files_directories, commitlog, and saved_caches.
4. Deploy the cassandra.yaml configuration that makes the switch to vnodes
and restart the cluster to apply the vnodes change.
5. Re-create the endpoints keyspace.
6. On each node, bulk-load the snapshots for that particular node.
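
At the command level, steps 2 through 5 looked roughly like this sketch,
assuming a packaged install with default directory locations; the
num_tokens value and the schema file name are assumptions:

# Steps 2 and 3: stop the node and wipe its state
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*

# Step 4: deploy a cassandra.yaml with num_tokens set (e.g. 256) and
# initial_token left unset, then restart
sudo service cassandra start

# Step 5: re-create the keyspace and table from a saved schema
cqlsh -f recreate_endpoints.cql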

This summary can be reduced even further:
1. On each node, export the data to preserve.
2. On each node, wipe the data.
3. On all nodes, switch to vnodes.
4. On each node, import back in the exported data.

I'm sure this process could have been streamlined.

One caveat for anyone looking to emulate this: Our situation might have
been a little easier to reason about because our original endpoint_messages
table had a replication factor of 1. We used the vnodes switch as an
opportunity to up the RF to 3.

I can only speculate as to why what I was originally attempting wasn't
working. But what I was originally attempting wasn't precisely the use case
I cared about; what I've described here is.


Re: sstableloader "Could not retrieve endpoint ranges"

Posted by Mitch Gitman <mg...@gmail.com>.
I checked the system.log on the Cassandra node that I ran the JConsole JMX
session against, the one that had the data to load. There's a lot of log
output indicating that it's busy loading the files, and a lot of stack
traces indicating a broken pipe. I have no reason to believe there are
connectivity issues between the nodes, but verifying that is beyond my
expertise. What's telling is this last bit of log output:
 INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,441 StreamReplyVerbHandler.java (line 44) Successfully sent /srv/cas-snapshot-06-17-2015/endpoints/endpoint_messages/endpoints-endpoint_messages-ic-34-Data.db to /10.205.55.101
 INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,457 OutputHandler.java (line 42) Streaming session to /10.205.55.101 failed
ERROR [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,458 CassandraDaemon.java (line 253) Exception in thread Thread[Streaming to /10.205.55.101:5,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Broken pipe
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 3 more

And then right after that I see what appears to be the output from the
nodetool refresh:
 INFO [RMI TCP Connection(2480)-10.2.101.114] 2015-06-19 21:22:56,877 ColumnFamilyStore.java (line 478) Loading new SSTables for endpoints/endpoint_messages...
 INFO [RMI TCP Connection(2480)-10.2.101.114] 2015-06-19 21:22:56,878 ColumnFamilyStore.java (line 524) No new SSTables were found for endpoints/endpoint_messages

Notice that Cassandra hasn't found any new SSTables, even though it was
just so busy loading them.

What's also noteworthy is that the output from the originating node shows
it successfully sent endpoints-endpoint_messages-ic-34-Data.db to another
node. But then in the system.log for that destination node, I see no
mention of that file. What I do see on the destination node are a few INFO
messages about streaming one of the .db files, and every time that's
immediately followed by an error message:
 INFO [Thread-108] 2015-06-19 21:20:45,453 StreamInSession.java (line 142) Streaming of file /srv/cas-snapshot-06-17-2015/endpoints/endpoint_messages/endpoints-endpoint_messages-ic-26-Data.db sections=1 progress=0/105137329 - 0% for org.apache.cassandra.streaming.StreamInSession@46c039ef failed: requesting a retry.
ERROR [Thread-109] 2015-06-19 21:20:45,456 CassandraDaemon.java (line 253) Exception in thread Thread[Thread-109,5,main]
java.lang.RuntimeException: java.nio.channels.AsynchronousCloseException
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:412)
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:203)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 1 more

I don't know; I'm seeing enough flakiness here to consider Cassandra
bulk-loading a lost cause, even if there is something wrong and fixable
about my particular cluster. On to exporting and re-importing data at the
proprietary application level. Life is too short.
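
One sanity check for broken pipes like these would be to verify that the
nodes can reach each other on the inter-node port used for streaming; a
sketch, assuming the default storage_port of 7000 from cassandra.yaml:

# From the sending node, probe the destination node's storage port
nc -vz 10.205.55.101 7000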


Re: sstableloader "Could not retrieve endpoint ranges"

Posted by Mitch Gitman <mg...@gmail.com>.
Fabien, thanks for the reply. We do have Thrift enabled. From what I can
tell, the "Could not retrieve endpoint ranges:" error crops up under various
circumstances.

From further reading on sstableloader, it occurred to me that it might be a
safer bet to use the JMX StorageService bulkLoad command, considering that
the data to import was already on one of the Cassandra nodes, just in an
arbitrary directory outside the Cassandra data directories.

I was able to get this bulkLoad command to fail with a message that the
directory structure did not follow the expected keyspace/table/ pattern. So
I created a keyspace directory and then a table directory within that and
moved all the files under the table directory. Executed bulkLoad, passing
in that directory. It succeeded.
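
Schematically, the layout that satisfied bulkLoad was as follows; the
staging root is hypothetical, and the keyspace and table names are the ones
in this thread:

/srv/staging/endpoints/endpoint_messages/     <- path passed to bulkLoad
  endpoints-endpoint_messages-ic-34-Data.db   <- SSTable components go here
  endpoints-endpoint_messages-ic-34-Index.db
  (remaining SSTable component files)

The keyspace (endpoints) and table (endpoint_messages) are inferred from
the last two directory names.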

Then I went and ran a nodetool refresh on the table in question.
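
That is, with the keyspace and table as above:

nodetool refresh endpoints endpoint_messages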

Only one problem. If I then went to query the table for, well, anything,
nothing came back. And this was after successfully querying the table
before and truncating the table just prior to the bulkLoad, so that I knew
that only the data coming from the bulkLoad could show up there.

Oh, and for good measure, I stopped and started all the nodes too. No luck
still.

What's puzzling about this is that the bulkLoad silently succeeds, even
though it doesn't appear to be doing anything. I haven't bothered yet to
check the Cassandra logs.


Re: sstableloader "Could not retrieve endpoint ranges"

Posted by Fabien Rousseau <fa...@gmail.com>.
Hi,

I already got this error on a 2.1 cluster because Thrift was disabled. So
you should check that Thrift is enabled and accessible from the
sstableloader process.

Hope this helps

Fabien
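
A quick way to make that check, as a sketch (assuming Thrift's default
rpc_port of 9160; adjust to your cassandra.yaml):

# On each Cassandra node: confirm the Thrift server is running
nodetool info | grep -i thrift    # expect "Thrift active : true"
nodetool enablethrift             # turn it on if it isn't

# From the machine running sstableloader: confirm the port is reachable
nc -vz NODE_IP 9160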