Posted to user@cassandra.apache.org by Prakrati Agrawal <Pr...@mu-sigma.com> on 2012/06/08 08:20:38 UTC

Problem in getting data from a 2 node cluster of Cassandra

Dear all

I originally had a 1-node cluster. I then added one more node to it, with its initial token configured appropriately. Now when I run my queries I do not get all of my data, i.e. not all of the columns are returned.
 Output on 2 nodes
Time taken to retrieve columns 43707 of key range is 1276
Time taken to retrieve columns 2084199 of all tickers is 54334
Time taken to count is 230776
Total number of rows in the database are 183
Total number of columns in the database are 7903753
Output on 1 node
Time taken to retrieve columns 43707 of key range is 767
Time taken to retrieve columns 3855552 of all tickers is 52793
Time taken to count is 268135
Total number of rows in the database are 396
Total number of columns in the database are 16316426
Please help me. Where is my data going, and how should I retrieve it? I have specified a consistency level of ONE, and I did not specify any replication factor.



Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com


________________________________
This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system.

Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.
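
To make the size of the gap in the post above concrete, the reported counts can be compared directly. The following is only a minimal sketch in Python that restates the posted numbers as fractions; the dictionary keys are illustrative names, not anything from Cassandra or Hector, and the result does not by itself identify the cause.

```python
# Counts copied from the "Output on 1 node" and "Output on 2 nodes"
# sections of the post. The key names are illustrative only.
one_node = {
    "key_range_columns": 43707,
    "all_tickers_columns": 3855552,
    "rows": 396,
    "columns": 16316426,
}
two_nodes = {
    "key_range_columns": 43707,
    "all_tickers_columns": 2084199,
    "rows": 183,
    "columns": 7903753,
}


def visible_fraction(after, before):
    """Return, per metric, the share of the 'before' count still seen 'after'."""
    return {name: after[name] / before[name] for name in before}


fractions = visible_fraction(two_nodes, one_node)
for name, value in fractions.items():
    print(f"{name}: {value:.1%} of the 1-node figure")
```

The key-range query returns the same number of columns in both runs, while the row and column totals fall to a little under half of the 1-node figures.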

Re: Problem in getting data from a 2 node cluster of Cassandra

Posted by rohit bhatia <ro...@gmail.com>.
Run nodetool -h localhost cfstats on the nodes. This gives node-specific
statistics for each column family.
Just run it on both nodes.
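
As a rough illustration of the suggestion above, the same nodetool command can be run against each node in turn and the output kept per host for comparison. This is a sketch only: it assumes nodetool is on the PATH of the machine running the script and that each node is reachable by the given host name; the function names and the host names in the usage comment are hypothetical.

```python
import subprocess


def cfstats_command(host):
    """Build the nodetool invocation from the reply for one host."""
    return ["nodetool", "-h", host, "cfstats"]


def collect_cfstats(hosts, run=subprocess.run):
    """Run cfstats against each host and return {host: raw text output}.

    `run` is injectable so the function can be exercised without a cluster.
    """
    results = {}
    for host in hosts:
        completed = run(
            cfstats_command(host),
            capture_output=True,  # keep stdout instead of printing it
            text=True,            # decode output as str
            check=True,           # raise if nodetool exits non-zero
        )
        results[host] = completed.stdout
    return results


# Example usage (host names are placeholders, not from the thread):
#   for host, text in collect_cfstats(["node1", "node2"]).items():
#       print("=====", host, "=====")
#       print(text)
```

The raw text is left unparsed on purpose, since the layout of the cfstats output is not shown anywhere in this thread.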

On Fri, Jun 8, 2012 at 12:46 PM, Prakrati Agrawal
<Pr...@mu-sigma.com> wrote:
> Yes, the code is the same for both the 1-node and the 2-node cluster. It is Hector code. How do I get the number of rows and columns from the Cassandra CLI, given that the data is very large?
>
> Thanks and Regards
> Prakrati

RE: Problem in getting data from a 2 node cluster of Cassandra

Posted by Prakrati Agrawal <Pr...@mu-sigma.com>.
Yes, the code is the same for both the 1-node and the 2-node cluster. It is Hector code. How do I get the number of rows and columns from the Cassandra CLI, given that the data is very large?

Thanks and Regards
Prakrati



Re: Problem in getting data from a 2 node cluster of Cassandra

Posted by Roshni Rajagopal <Ro...@wal-mart.com>.
Hi Prakrati,

 In an ideal situation, no data should be lost when a node is added. How are you getting the statistics you posted?
The output looks like it comes from some code using Hector or Thrift. Is the code that gathers the statistics exactly the same for the 1-node and the 2-node cluster, with the only change being that a node was added or removed?
Could you verify the number of rows and columns in the column family using the CLI or CQL?

Regards,
Roshni




This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential ***