Posted to user@cassandra.apache.org by Shashilpi Krishan <Sh...@wizecommerce.com> on 2013/09/23 04:02:21 UTC

Row Mutation Errors while upgrading to Cassandra2.0

Hi Everyone.

We had a Cassandra cluster (running v1.0.7) spread across 3 data centers, with 16 nodes in each data center. We started upgrading it to 2.0 but realized that we can't go directly to 2.0 due to read failures; hence, to avoid downtime, we have to go from 1.0 --> 1.1 --> 1.2 --> 2.0.

Now the problem is that while upgrading from 1.2 --> 2.0 we saw the errors below flooding the system.log files in one data center only, until we had upgraded all the nodes in every DC to 2.0. We think that the moment the last node was upgraded, this error went away. Does anyone have an idea what could have been causing this? Is it because of some version incompatibility?

ERROR [MutationStage:208] 2013-09-21 17:51:08,163 RowMutationVerbHandler.java (line 63) Error in row mutation
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at org.apache.cassandra.net.CompactEndpointSerializationHelper.deserialize(CompactEndpointSerializationHelper.java:37)
        at org.apache.cassandra.db.RowMutationVerbHandler.forwardToLocalNodes(RowMutationVerbHandler.java:81)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
________________________________
Thanks & Regards

Shashilpi Krishan


CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Row Mutation Errors while upgrading to Cassandra2.0

Posted by sankalp kohli <ko...@gmail.com>.
I have not used 2.0, but from looking at the code for this exception, this
is what I am seeing.
In multi-DC deployments, the coordinator node sends a write to only one
node in the other data center. That node then forwards it to the other
local nodes if the replication factor in that DC is more than one. For this
to happen, the coordinator serializes the endpoints that need the mutation
in the other data center and puts them in the message.
The exception you are seeing is caused by an incompatibility in the way the
two versions serialize the endpoints that need to get the mutation in the
remote DC.
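
To illustrate how a serialization mismatch of this kind can surface as a
java.io.EOFException, here is a minimal, self-contained sketch. It is not
Cassandra's code: the class name, the method names, and both byte layouts
are hypothetical, invented only to show the general failure mode. The
"old" writer emits a count followed by the addresses; the "new" reader
assumes an extra integer after each address, so it consumes more bytes
than the sender wrote and runs off the end of the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; these layouts are NOT Cassandra's wire format.
class ForwardHeaderMismatch {

    // Assumed "old" layout: [count:int], then per endpoint [len:byte][address bytes].
    static byte[] writeOldLayout(byte[][] endpoints) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(endpoints.length);
            for (byte[] address : endpoints) {
                out.writeByte(address.length);
                out.write(address);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reader that matches the old layout; returns the number of endpoints decoded.
    static int readOldLayout(byte[] payload) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] address = new byte[in.readUnsignedByte()];
                in.readFully(address);
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed "new" layout reader: expects an extra [id:int] after every address.
    static int readNewLayout(byte[] payload) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            byte[] address = new byte[in.readUnsignedByte()];
            in.readFully(address);
            in.readInt(); // field the old writer never sent
        }
        return count;
    }

    // True if the new-layout reader runs out of bytes on the given payload.
    static boolean newReaderHitsEof(byte[] payload) {
        try {
            readNewLayout(payload);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = writeOldLayout(new byte[][] {{10, 0, 0, 1}, {10, 0, 0, 2}});
        System.out.println("old reader decoded endpoints: " + readOldLayout(payload));
        System.out.println("new reader hit EOFException: " + newReaderHitsEof(payload));
    }
}
```

In this sketch the matching reader decodes both endpoints, while the
mismatched reader misaligns after the first address and fails partway
through the second. Which fields actually differ between 1.2 and 2.0 would
have to be checked against the Cassandra source.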

Which minor versions did you use while upgrading?



On Mon, Sep 23, 2013 at 1:47 AM, Shashilpi Krishan <
Shashilpi.Krishan@wizecommerce.com> wrote:

>  Thanks for replying Sankalp
>
>
>
> One has to start and test, and we have many such clusters and this cluster
> hold trivial data and a candidate for the trial. But if others also are
> seeing problems with 2.0 then we won’t proceed to do that on our main
> clusters and let them upgrade to 1.2.9 only.
>
>
>
> Before doing this in production we tested 2.0 successfully in test
> environment and at that time, no such issues were noticed (perhaps testing
> happened in single DC is the reason for that).
>
>
>
> BTW, we are using GossipingPropertyFileSnitch ….so could that be the
> reason.
>  ------------------------------
>
> *Thanks & Regards*
>
> *
> Shashilpi Krishan*
>
>
>
> *From:* sankalp kohli [mailto:kohlisankalp@gmail.com]
> *Sent:* Monday, September 23, 2013 8:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Row Mutation Errors while upgrading to Cassandra2.0
>
>
>
> You are upgrading to 2.0 in Prod? What is the urgency?
>
>
>
> On Sun, Sep 22, 2013 at 7:02 PM, Shashilpi Krishan <
> Shashilpi.Krishan@wizecommerce.com> wrote:
>
> Hi Everyone.
>
>
>
> We had a Cassandra cluster (running with v1.0.7) spread across 3 data
> centers with each data center having 16 nodes. We started upgrading that to
> 2.0 but realized that we can’t go directly to 2.0 due to read failures,
> hence to avoid down time we have to go from 1.0 --> 1.1 --> 1.2 --> 2.0.
>
>
>
> Now problem is while upgrading from 1.2 --> 2.0 we saw below errors
> flooding the system.log files in one data center only until we upgraded all
> the nodes in every DC to 2.0. We think that moment last node was upgraded
> then this error was gone. Does anyone has idea what could have been causing
> this? Is it because of some version incompatibility?
>
>
>
> ERROR [MutationStage:208] 2013-09-21 17:51:08,163
> RowMutationVerbHandler.java (line 63) Error in row mutation
>
> java.io.EOFException
>
>         at java.io.DataInputStream.readFully(DataInputStream.java:197)
>
>         at
> org.apache.cassandra.net.CompactEndpointSerializationHelper.deserialize(CompactEndpointSerializationHelper.java:37)
>
>         at
> org.apache.cassandra.db.RowMutationVerbHandler.forwardToLocalNodes(RowMutationVerbHandler.java:81)
>
>         at
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:49)
>
>         at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:722)
>  ------------------------------
>
> *Thanks & Regards*
>
> *
> Shashilpi Krishan*
>
>

RE: Row Mutation Errors while upgrading to Cassandra2.0

Posted by Shashilpi Krishan <Sh...@wizecommerce.com>.
Thanks for replying Sankalp

One has to start and test somewhere; we have many such clusters, and this cluster holds trivial data and was a candidate for the trial. But if others are also seeing problems with 2.0, then we won't proceed with it on our main clusters and will upgrade them to 1.2.9 only.

Before doing this in production we tested 2.0 successfully in a test environment, and at that time no such issues were noticed (perhaps because the testing happened in a single DC).

BTW, we are using GossipingPropertyFileSnitch, so could that be the reason?
________________________________
Thanks & Regards

Shashilpi Krishan

From: sankalp kohli [mailto:kohlisankalp@gmail.com]
Sent: Monday, September 23, 2013 8:01 AM
To: user@cassandra.apache.org
Subject: Re: Row Mutation Errors while upgrading to Cassandra2.0

You are upgrading to 2.0 in Prod? What is the urgency?

On Sun, Sep 22, 2013 at 7:02 PM, Shashilpi Krishan <Sh...@wizecommerce.com> wrote:
Hi Everyone.

We had a Cassandra cluster (running with v1.0.7) spread across 3 data centers with each data center having 16 nodes. We started upgrading that to 2.0 but realized that we can't go directly to 2.0 due to read failures, hence to avoid down time we have to go from 1.0 --> 1.1 --> 1.2 --> 2.0.

Now problem is while upgrading from 1.2 --> 2.0 we saw below errors flooding the system.log files in one data center only until we upgraded all the nodes in every DC to 2.0. We think that moment last node was upgraded then this error was gone. Does anyone has idea what could have been causing this? Is it because of some version incompatibility?

ERROR [MutationStage:208] 2013-09-21 17:51:08,163 RowMutationVerbHandler.java (line 63) Error in row mutation
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at org.apache.cassandra.net.CompactEndpointSerializationHelper.deserialize(CompactEndpointSerializationHelper.java:37)
        at org.apache.cassandra.db.RowMutationVerbHandler.forwardToLocalNodes(RowMutationVerbHandler.java:81)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
________________________________
Thanks & Regards

Shashilpi Krishan



Re: Row Mutation Errors while upgrading to Cassandra2.0

Posted by sankalp kohli <ko...@gmail.com>.
You are upgrading to 2.0 in Prod? What is the urgency?


On Sun, Sep 22, 2013 at 7:02 PM, Shashilpi Krishan <
Shashilpi.Krishan@wizecommerce.com> wrote:

>  Hi Everyone.
>
>
>
> We had a Cassandra cluster (running with v1.0.7) spread across 3 data
> centers with each data center having 16 nodes. We started upgrading that to
> 2.0 but realized that we can’t go directly to 2.0 due to read failures,
> hence to avoid down time we have to go from 1.0 --> 1.1 --> 1.2 --> 2.0.
>
>
>
> Now problem is while upgrading from 1.2 --> 2.0 we saw below errors
> flooding the system.log files in one data center only until we upgraded all
> the nodes in every DC to 2.0. We think that moment last node was upgraded
> then this error was gone. Does anyone has idea what could have been causing
> this? Is it because of some version incompatibility?
>
>
>
> ERROR [MutationStage:208] 2013-09-21 17:51:08,163
> RowMutationVerbHandler.java (line 63) Error in row mutation
>
> java.io.EOFException
>
>         at java.io.DataInputStream.readFully(DataInputStream.java:197)
>
>         at
> org.apache.cassandra.net.CompactEndpointSerializationHelper.deserialize(CompactEndpointSerializationHelper.java:37)
>
>         at
> org.apache.cassandra.db.RowMutationVerbHandler.forwardToLocalNodes(RowMutationVerbHandler.java:81)
>
>         at
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:49)
>
>         at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>         at java.lang.Thread.run(Thread.java:722)
>  ------------------------------
>
> *Thanks & Regards*
>
> *
> Shashilpi Krishan*
>
>
>
>

Re: Row Mutation Errors while upgrading to Cassandra2.0

Posted by sankalp kohli <ko...@gmail.com>.
"It is quite possible that this is expected, major version upgrades
semi-frequently spam logs with non-pathological error messages."
The exception is thrown while trying to deserialize the endpoints in the
remote DC. Because of this error, the mutation will not be applied to any
node in the remote DC.



On Mon, Sep 23, 2013 at 10:29 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Sun, Sep 22, 2013 at 7:02 PM, Shashilpi Krishan <
> Shashilpi.Krishan@wizecommerce.com> wrote:
>
>>  We had a Cassandra cluster (running with v1.0.7) spread across 3 data
>> centers with each data center having 16 nodes. We started upgrading that to
>> 2.0 but realized that we can’t go directly to 2.0 due to read failures,
>> hence to avoid down time we have to go from 1.0 --> 1.1 --> 1.2 --> 2.0.
>>
>
>  In general you should not run a Cassandra version X.Y.Z in production
> where Z < 5. Although I notice down thread that this cluster is not serving
> a critical business function... :)
>
> As you have discovered, you also should not generally try to upgrade
> across more than one major version.
>
>
>> Now problem is while upgrading from 1.2 --> 2.0 we saw below errors
>> flooding the system.log files in one data center only until we upgraded all
>> the nodes in every DC to 2.0. We think that moment last node was upgraded
>> then this error was gone. Does anyone has idea what could have been causing
>> this? Is it because of some version incompatibility?
>>
>
> I would probably file a JIRA with relevant details/log snippets. It is
> quite possible that this is expected, major version upgrades
> semi-frequently spam logs with non-pathological error messages.
>
> =Rob
>

Re: Row Mutation Errors while upgrading to Cassandra2.0

Posted by Robert Coli <rc...@eventbrite.com>.
On Sun, Sep 22, 2013 at 7:02 PM, Shashilpi Krishan <
Shashilpi.Krishan@wizecommerce.com> wrote:

>  We had a Cassandra cluster (running with v1.0.7) spread across 3 data
> centers with each data center having 16 nodes. We started upgrading that to
> 2.0 but realized that we can’t go directly to 2.0 due to read failures,
> hence to avoid down time we have to go from 1.0 --> 1.1 --> 1.2 --> 2.0.
>

In general you should not run a Cassandra version X.Y.Z in production where
Z < 5. Although I notice down thread that this cluster is not serving a
critical business function... :)

As you have discovered, you also should not generally try to upgrade across
more than one major version.


> Now problem is while upgrading from 1.2 --> 2.0 we saw below errors
> flooding the system.log files in one data center only until we upgraded all
> the nodes in every DC to 2.0. We think that moment last node was upgraded
> then this error was gone. Does anyone has idea what could have been causing
> this? Is it because of some version incompatibility?
>

I would probably file a JIRA with relevant details/log snippets. It is
quite possible that this is expected; major version upgrades
semi-frequently spam logs with non-pathological error messages.

=Rob