Posted to user@cassandra.apache.org by Henrik Schröder <sk...@gmail.com> on 2012/05/24 19:41:44 UTC

Migrating from a windows cluster to a linux cluster.

Hey everyone,

We're trying to migrate a Cassandra cluster from a bunch of Windows
machines to a bunch of (newer and more powerful) Linux machines.

Our initial plan was to simply bootstrap the Linux servers into the cluster
one by one, and then decommission the old servers one by one. However, when
we try to join a Linux server to the cluster, we get the following error:

ERROR 11:52:22,959 Fatal exception in thread Thread[Thread-21,5,main]
java.lang.AssertionError: Filename must include parent directory.
        at
org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:148)
        at
org.apache.cassandra.streaming.PendingFile$PendingFileSerializer.deserialize(PendingFile.java:138)
        at
org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:88)
        at
org.apache.cassandra.streaming.StreamHeader$StreamHeaderSerializer.deserialize(StreamHeader.java:70)
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:80)

A quick search suggests the cause is that Cassandra transmits the full
path of the data files using the sender's native directory separator, "\",
while the Linux servers expect "/" and so fail to parse the path.
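
A minimal sketch of that failure mode, in Python rather than Cassandra's
Java. It assumes the receiving node looks for a parent directory in the
path it was sent; the sstable path below is made up for illustration.
Parsed with POSIX rules, a Windows path contains no "/" at all, so it
appears to have no parent directory.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical sstable path as a Windows node might send it.
SENT_BY_WINDOWS = r"C:\cassandra\data\Keyspace1\Standard1-hc-1-Data.db"


def parent_dir_name(path, flavour):
    """Return the name of the parent directory, or None if there is none."""
    parent = flavour(path).parent.name
    return parent or None


# Parsed with Windows rules, the keyspace directory is found.
as_windows = parent_dir_name(SENT_BY_WINDOWS, PureWindowsPath)
# Parsed with POSIX rules, "\" is an ordinary character: no parent at all.
as_posix = parent_dir_name(SENT_BY_WINDOWS, PurePosixPath)

print(as_windows, as_posix)
```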

We're running version 1.0.8. Is this fixed in a later release? Will this be
fixed in a later release?

Are there any other ways of doing the migration? What happens if we join
the new servers without bootstrapping and run repair? Are there any other
ugly hacks or workarounds we can use? We're not looking to run a mixed
cluster, we just want to migrate all the data as painlessly as possible.


/Henrik

Re: Migrating from a windows cluster to a linux cluster.

Posted by Henrik Schröder <sk...@gmail.com>.
Hey, we thought a bit about it and came up with another solution:

We shut down Cassandra on one of the Windows servers, copy its data
directory over to one of the Linux servers, delete the LocationInfo files
from the system keyspace, and start it up.

It should read the saved token from the datafiles, it should have all the
data associated with that token, and on joining the cluster it should just
pop in at the right place, but with a new ip address. And then we repeat
that for each server.
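
The cleanup step could be sketched roughly as follows. The layout
<data_dir>/system/LocationInfo-* is an assumption based on default 1.0.x
data directories, and the function names are hypothetical. The node must
be stopped first, and the sketch defaults to a dry run. One caveat, as
far as I know: in this version the node's saved token is itself kept in
LocationInfo, so after deleting those files the token would probably have
to be supplied through initial_token in cassandra.yaml rather than read
back from disk.

```python
from pathlib import Path


def locationinfo_files(data_dir):
    """List the LocationInfo sstable files under <data_dir>/system."""
    system_dir = Path(data_dir) / "system"
    return sorted(system_dir.glob("LocationInfo-*"))


def remove_locationinfo(data_dir, dry_run=True):
    """Delete the LocationInfo files; with dry_run=True only report them."""
    files = locationinfo_files(data_dir)
    if not dry_run:
        for f in files:
            f.unlink()
    return files
```

Calling remove_locationinfo(path) first and inspecting the returned list
before repeating the call with dry_run=False keeps the step reversible up
to the last moment.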

Will this work? Or is there a better way?


/Henrik


Re: Migrating from a windows cluster to a linux cluster.

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, May 24, 2012 at 3:36 PM, Henrik Schröder <sk...@gmail.com> wrote:
>> That sounds fine, with the caveat that you can't run sstableloader
>> from a machine running Cassandra before 1.1, so copying the sstables
>> manually (assuming both clusters are the same size and have the same
>> tokens) might be better.
>
>
> Why is version 1.1 required for sstableloader? We're running 1.0.x on both
> clusters, but we can of course upgrade if that's required.

Before 1.1, sstableloader is a fat client, and thus can't coexist with
an existing Cassandra instance on the same machine.
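
One way to picture the conflict, stated as an assumption rather than
something spelled out in this thread: a fat client joins the ring much
like a node does, so it wants the same listen address and storage port
that the local Cassandra process already holds. The sketch below only
shows the general operating-system behaviour (a second bind to an address
and port already in use is refused); it does not touch Cassandra.

```python
import socket


def second_bind_fails(host="127.0.0.1"):
    """Bind a listening socket, then try to bind another to the same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as `first`
        except OSError:
            return True                # refused: address already in use
        return False
    finally:
        first.close()
        second.close()
```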

-Brandon

Re: Migrating from a windows cluster to a linux cluster.

Posted by Henrik Schröder <sk...@gmail.com>.
On Thu, May 24, 2012 at 9:28 PM, Brandon Williams <dr...@gmail.com> wrote:

>
> That sounds fine, with the caveat that you can't run sstableloader
> from a machine running Cassandra before 1.1, so copying the sstables
> manually (assuming both clusters are the same size and have the same
> tokens) might be better.
>

Why is version 1.1 required for sstableloader? We're running 1.0.x on both
clusters, but we can of course upgrade if that's required.


> > The only issue with this is the timestamps of the data and tombstones in
> > each sstable, will they be preserved by sstableloader? What about
> > deletes of non-existing keys? Will they be stored in the Linux cluster so
> > that when sstableloader inserts the key later, it's resolved as being
> > deleted?
>
> None of that should be a problem.
>

Excellent, thanks!


/Henrik

Re: Migrating from a windows cluster to a linux cluster.

Posted by Conan Cook <co...@amee.com>.
Hi,

We were trying to do a similar kind of migration (to a new cluster, no
downtime) in order to remove a legacy OrderedPartitioner limitation.  In
the end we were allowed enough downtime to migrate, but originally we were
proposing a similar solution based around deploying an update to the
application to write to two clusters simultaneously, and a background copy
of older data in some way.

I'd love to hear how the migration went, and whether there were any
(un)expected hurdles along the way!

Thanks,


Conan

On 24 May 2012 23:56, Rob Coli <rc...@palominodb.com> wrote:

> On Thu, May 24, 2012 at 12:44 PM, Steve Neely <sn...@rallydev.com> wrote:
> > It also seems like a dark deployment of your new cluster is a great
> > method for testing the Linux-based systems before switching your
> > mission-critical traffic over. Monitor them for a while with real traffic
> > and you can have confidence that they'll function correctly when you
> > perform the switchover.
>
> FWIW, I would love to see graphs which show their compared performance
> under identical write load and then show the cut-over point for reads
> between the two clusters. My hypothesis is that your Linux cluster
> will magically be much more performant/less loaded due to many
> Linux-specific optimizations in Cassandra, but I'd dig seeing this
> illustrated in an apples to apples sense with real app traffic.
>
> =Rob
>
> --
> =Robert Coli
> AIM&GTALK - rcoli@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb
>

Re: Migrating from a windows cluster to a linux cluster.

Posted by Rob Coli <rc...@palominodb.com>.
On Thu, May 24, 2012 at 12:44 PM, Steve Neely <sn...@rallydev.com> wrote:
> It also seems like a dark deployment of your new cluster is a great method
> for testing the Linux-based systems before switching your mission-critical
> traffic over. Monitor them for a while with real traffic and you can have
> confidence that they'll function correctly when you perform the switchover.

FWIW, I would love to see graphs which show their compared performance
under identical write load and then show the cut-over point for reads
between the two clusters. My hypothesis is that your Linux cluster
will magically be much more performant/less loaded due to many
Linux-specific optimizations in Cassandra, but I'd dig seeing this
illustrated in an apples to apples sense with real app traffic.

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb

Re: Migrating from a windows cluster to a linux cluster.

Posted by Steve Neely <sn...@rallydev.com>.
It also seems like a dark deployment of your new cluster is a great method
for testing the Linux-based systems *before* switching your mission-critical
traffic over. Monitor them for a while with real traffic and you can have
confidence that they'll function correctly when you perform the switchover.

-- Steve


On Thu, May 24, 2012 at 1:28 PM, Brandon Williams <dr...@gmail.com> wrote:

> On Thu, May 24, 2012 at 1:50 PM, Henrik Schröder <sk...@gmail.com>
> wrote:
> > Ok. It's important for us to not have any downtime, so how about this
> > solution:
> >
> > We start up the Linux cluster independently.
> > We configure our application to send all Cassandra writes to both
> > clusters, but only read from the Windows cluster.
> > We run sstableloader on each Windows server (is it possible to do this
> > in parallel?), sending whatever it has to the Linux cluster.
> > When it's done on all Windows servers, we configure our application to
> > only talk to the Linux cluster.
>
> That sounds fine, with the caveat that you can't run sstableloader
> from a machine running Cassandra before 1.1, so copying the sstables
> manually (assuming both clusters are the same size and have the same
> tokens) might be better.
>
> > The only issue with this is the timestamps of the data and tombstones in
> > each sstable, will they be preserved by sstableloader? What about
> > deletes of non-existing keys? Will they be stored in the Linux cluster so
> > that when sstableloader inserts the key later, it's resolved as being
> > deleted?
>
> None of that should be a problem.
>
> -Brandon
>

Re: Migrating from a windows cluster to a linux cluster.

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, May 24, 2012 at 1:50 PM, Henrik Schröder <sk...@gmail.com> wrote:
> Ok. It's important for us to not have any downtime, so how about this
> solution:
>
> We start up the Linux cluster independently.
> We configure our application to send all Cassandra writes to both clusters,
> but only read from the Windows cluster.
> We run sstableloader on each Windows server (is it possible to do this in
> parallel?), sending whatever it has to the Linux cluster.
> When it's done on all Windows servers, we configure our application to only
> talk to the Linux cluster.

That sounds fine, with the caveat that you can't run sstableloader
from a machine running Cassandra before 1.1, so copying the sstables
manually (assuming both clusters are the same size and have the same
tokens) might be better.
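
Before copying sstables by hand, the "same size and same tokens"
assumption can be checked mechanically. A minimal sketch: both rings are
given as token -> host mappings (for example, transcribed from the ring
listing of each cluster), and the function name, host names and tokens
used with it are hypothetical.

```python
def pair_nodes_by_token(old_ring, new_ring):
    """Map each old host to the new host that owns the same token.

    Both arguments map token -> host. Raises ValueError if the rings differ.
    """
    if set(old_ring) != set(new_ring):
        only_old = sorted(set(old_ring) - set(new_ring))
        only_new = sorted(set(new_ring) - set(old_ring))
        raise ValueError(
            f"token mismatch: old-only={only_old}, new-only={only_new}"
        )
    return {old_ring[t]: new_ring[t] for t in sorted(old_ring)}
```

The resulting mapping says which Linux node should receive the sstables
of which Windows node; a ValueError means the manual copy is not safe as
described.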

> The only issue with this is the timestamps of the data and tombstones in
> each sstable, will they be preserved by sstableloader? What about deletes of
> non-existing keys? Will they be stored in the Linux cluster so that when
> sstableloader inserts the key later, it's resolved as being deleted?

None of that should be a problem.
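
A rough illustration of why load order should not matter, assuming the
usual timestamp-based reconciliation: the version with the higher
timestamp wins, and in this sketch a tombstone wins a tie. The Cell type
and reconcile function are illustrative, not Cassandra's actual classes,
and the sketch ignores tombstone garbage collection (gc_grace_seconds),
which would matter if the load ran longer than that window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int            # write timestamp carried with the data
    value: Optional[str]      # None marks a tombstone (a delete)


def reconcile(a, b):
    """Pick the winning version: higher timestamp wins, a tombstone wins ties."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: prefer the tombstone; two live values tie arbitrarily.
    return a if a.value is None else b


# A delete written live to the new cluster, then an older insert arriving
# later via bulk load: the delete still wins, in either order.
live_delete = Cell(timestamp=200, value=None)
loaded_insert = Cell(timestamp=100, value="old data")
winner = reconcile(loaded_insert, live_delete)
```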

-Brandon

Re: Migrating from a windows cluster to a linux cluster.

Posted by Henrik Schröder <sk...@gmail.com>.
On Thu, May 24, 2012 at 8:07 PM, Brandon Williams <dr...@gmail.com> wrote:

> > Are there any other ways of doing the migration? What happens if we join
> > the new servers without bootstrapping and run repair? Are there any other
> > ugly hacks or workarounds we can use? We're not looking to run a mixed
> > cluster, we just want to migrate all the data as painlessly as possible.
>
> Start the Linux cluster independently and use sstableloader from the
> Windows cluster to populate it.
>
>
Ok. It's important for us to not have any downtime, so how about this
solution:

We start up the Linux cluster independently.
We configure our application to send all Cassandra writes to both clusters,
but only read from the Windows cluster.
We run sstableloader on each Windows server (is it possible to do this in
parallel?), sending whatever it has to the Linux cluster.
When it's done on all Windows servers, we configure our application to only
talk to the Linux cluster.
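
The application-side part of this plan might look something like the
sketch below. All names are hypothetical, and InMemoryCluster is only a
stand-in for a real client connection so that the example is
self-contained. The key assumption is that the application supplies the
same write timestamp to both clusters.

```python
class InMemoryCluster:
    """Stand-in for a real client connection; keeps the newest write per key."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value, timestamp):
        current = self.rows.get(key)
        if current is None or timestamp >= current[0]:
            self.rows[key] = (timestamp, value)

    def read(self, key):
        row = self.rows.get(key)
        return None if row is None else row[1]


class DualWriteStore:
    """Write to both clusters and read from the old one until cut-over."""

    def __init__(self, old_cluster, new_cluster):
        self.old = old_cluster
        self.new = new_cluster
        self.cut_over_done = False

    def write(self, key, value, timestamp):
        # The same client-side timestamp goes to both clusters.
        if not self.cut_over_done:
            self.old.write(key, value, timestamp)
        self.new.write(key, value, timestamp)

    def read(self, key):
        source = self.new if self.cut_over_done else self.old
        return source.read(key)

    def cut_over(self):
        # Call only after the bulk load of historical data has finished.
        self.cut_over_done = True
```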

The only issue with this is the timestamps of the data and tombstones in
each sstable, will they be preserved by sstableloader? What about deletes
of non-existing keys? Will they be stored in the Linux cluster so that when
sstableloader inserts the key later, it's resolved as being deleted?


/Henrik

Re: Migrating from a windows cluster to a linux cluster.

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, May 24, 2012 at 12:41 PM, Henrik Schröder <sk...@gmail.com> wrote:
> We're running version 1.0.8. Is this fixed in a later release? Will this be
> fixed in a later release?

No, mixed-OS clusters are unsupported.

> Are there any other ways of doing the migration? What happens if we join the
> new servers without bootstrapping and run repair? Are there any other ugly
> hacks or workarounds we can use? We're not looking to run a mixed cluster, we
> just want to migrate all the data as painlessly as possible.

Start the Linux cluster independently and use sstableloader from the
Windows cluster to populate it.

-Brandon