Posted to user@cassandra.apache.org by joseph gao <ga...@gmail.com> on 2015/06/15 09:52:14 UTC

PrepareStatement problem

Hi all,
      I'm using PreparedStatement. If I prepare the same CQL statement every time I use it,
Cassandra gives me a warning telling me not to re-prepare it every time, so I cache
the PreparedStatement locally. But when another client changes the table's
schema, for example by adding a new column, and I keep using the cached
PreparedStatement, the metadata no longer matches the data: the metadata describes n
columns while the data contains n+1 columns. What should I do to avoid this
problem?

-- 
------
Joseph Gao
PhoneNum:15210513582
QQ: 409343351

RE: PrepareStatement problem

Posted by "Peer, Oded" <Od...@rsa.com>.
When you alter a table, the Cassandra server invalidates the prepared statements it is holding, so when a client (like yours) executes the prepared statement, the server tells the client that it needs to be re-prepared and the client re-prepares it automatically.
If this isn’t working for you, you should comment with your use case on the JIRA issue.
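To illustrate the caching pattern discussed above, here is a minimal sketch of a local prepared-statement cache for the 2.1.x DataStax Java driver. The class and method names are assumptions for illustration, not something from this thread; the key point is that session.prepare() is called once per CQL string and the driver transparently re-prepares a statement the server no longer recognizes, so the cached object keeps working after an ALTER TABLE.

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: prepare each CQL string once per Session and reuse it,
// instead of calling session.prepare() on every request (which triggers the
// server-side warning about re-preparing an already prepared query).
public class StatementCache {
    private final Session session;
    private final ConcurrentMap<String, PreparedStatement> cache = new ConcurrentHashMap<>();

    public StatementCache(Session session) {
        this.session = session;
    }

    public PreparedStatement get(String cql) {
        PreparedStatement ps = cache.get(cql);
        if (ps == null) {
            ps = session.prepare(cql);                    // prepared once per query string
            PreparedStatement previous = cache.putIfAbsent(cql, ps);
            if (previous != null) {
                ps = previous;                            // another thread prepared it first
            }
        }
        return ps;
    }
}

Listing the column names explicitly in the cached queries (rather than using “select *”) also sidesteps the metadata mismatch tracked in CASSANDRA-7910.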

From: joseph gao [mailto:gaojf.bokecc@gmail.com]
Sent: Tuesday, June 16, 2015 10:31 AM
To: user@cassandra.apache.org
Subject: Re: PrepareStatement problem

But I'm using 2.1.6 and I still hit this bug. So, should I discard that PreparedStatement when I get the alter-table message? How can I receive and handle that message?

2015-06-15 16:45 GMT+08:00 Peer, Oded <Od...@rsa.com>:
This only applies to “select *” queries where you don’t specify the column names explicitly.
There is a reported bug that was fixed in 2.1.3. See https://issues.apache.org/jira/browse/CASSANDRA-7910

From: joseph gao [mailto:gaojf.bokecc@gmail.com]
Sent: Monday, June 15, 2015 10:52 AM
To: user@cassandra.apache.org
Subject: PrepareStatement problem

Hi all,
      I'm using PreparedStatement. If I prepare the same CQL statement every time I use it, Cassandra gives me a warning telling me not to re-prepare it every time, so I cache the PreparedStatement locally. But when another client changes the table's schema, for example by adding a new column, and I keep using the cached PreparedStatement, the metadata no longer matches the data: the metadata describes n columns while the data contains n+1 columns. What should I do to avoid this problem?

--
------
Joseph Gao
PhoneNum:15210513582
QQ: 409343351



--
------
Joseph Gao
PhoneNum:15210513582
QQ: 409343351

Re: PrepareStatement problem

Posted by joseph gao <ga...@gmail.com>.
But I'm using 2.1.6 and I still hit this bug. So, should I discard that
PreparedStatement when I get the alter-table message? How can I receive and
handle that message?

2015-06-15 16:45 GMT+08:00 Peer, Oded <Od...@rsa.com>:

>  This only applies to “select *” queries where you don’t specify the
> column names.
>
> There is a reported bug and fixed in 2.1.3. See
> https://issues.apache.org/jira/browse/CASSANDRA-7910
>
>
>
> *From:* joseph gao [mailto:gaojf.bokecc@gmail.com]
> *Sent:* Monday, June 15, 2015 10:52 AM
> *To:* user@cassandra.apache.org
> *Subject:* PrepareStatement problem
>
>
>
> hi, all
>
>       I'm using PrepareStatement. If I prepare a sql everytime I use,
> cassandra will give me a warning tell me NOT PREPARE EVERYTIME. So I Cache
> the PrepareStatement locally . But when other client change the table's
> schema, like, add a new Column, If I still use the former Cached
> PrepareStatement, the metadata will dismatch the data. The metadata tells n
> column, and the data tells n+1 column. So what should I do to avoid this
> problem?
>
>
>
> --
>
> ------
>
> Joseph Gao
>
> PhoneNum:15210513582
>
> QQ: 409343351
>



-- 
------
Joseph Gao
PhoneNum:15210513582
QQ: 409343351

RE: PrepareStatement problem

Posted by "Peer, Oded" <Od...@rsa.com>.
This only applies to “select *” queries where you don’t specify the column names explicitly.
There is a reported bug that was fixed in 2.1.3. See https://issues.apache.org/jira/browse/CASSANDRA-7910
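As a concrete sketch of the safer pattern, the statement below is prepared with the column names listed explicitly instead of “select *”, so the cached result-set metadata keeps matching even after a column is added. The keyspace, table, and column names (ks.users with id and name) are placeholders invented for this example.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ExplicitColumnsExample {
    public static void main(String[] args) {
        // Assumed schema for illustration: ks.users(id int PRIMARY KEY, name text).
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Listing the columns pins the shape of the result set; "SELECT *" would not.
        PreparedStatement ps = session.prepare("SELECT id, name FROM ks.users WHERE id = ?");
        Row row = session.execute(ps.bind(42)).one();
        if (row != null) {
            System.out.println(row.getInt("id") + " " + row.getString("name"));
        }
        cluster.close();
    }
}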

From: joseph gao [mailto:gaojf.bokecc@gmail.com]
Sent: Monday, June 15, 2015 10:52 AM
To: user@cassandra.apache.org
Subject: PrepareStatement problem

Hi all,
      I'm using PreparedStatement. If I prepare the same CQL statement every time I use it, Cassandra gives me a warning telling me not to re-prepare it every time, so I cache the PreparedStatement locally. But when another client changes the table's schema, for example by adding a new column, and I keep using the cached PreparedStatement, the metadata no longer matches the data: the metadata describes n columns while the data contains n+1 columns. What should I do to avoid this problem?

--
------
Joseph Gao
PhoneNum:15210513582
QQ: 409343351

Re: Missing data

Posted by Jean Tremblay <je...@zen-innovations.com>.
Thanks Bryan.
I believe I have a different problem with the Datastax 2.1.6 driver.
My problem is not that I make huge selects.
My problem seems to occur on some inserts. I insert MANY rows, and with version 2.1.6 of the driver I seem to be losing some records.

But thanks anyway, I will remember your mail when I bump into the select problem.

Cheers

Jean


On 15 Jun 2015, at 19:13, Bryan Holladay <ho...@longsight.com> wrote:

There's your problem: you're using the DataStax Java driver :) I just ran into this issue in the last week and it was incredibly frustrating. If you are doing a simple loop over a "select *" query, the DataStax Java driver will only process 2^31 rows (i.e. the Java Integer max, 2,147,483,647) before it stops without any error or output in the logs. The fact that you said you only had about 2 billion rows but you are seeing missing data is a red flag.

I found the only way around this is to do your "select *" in chunks based on the token range (see this gist for an example: https://gist.github.com/baholladay/21eb4c61ea8905302195 )
Just loop every 100 million rows and make a new query "select * from TABLE where token(key) > lastToken"

Thanks,
Bryan




On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay <je...@zen-innovations.com> wrote:
Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using driver 2.1.4 with C* 2.1.4.
Interestingly enough, I don’t have the problem with driver 2.1.4 against C* 2.1.6!

So as far as I can locate the problem, I would say that version 2.1.6 of the driver is not working properly and is losing some of my records!

——————

As far as my tombstones are concerned, I don’t understand their origin.
I removed all locations in my code where I delete items, and I do not use TTL anywhere (I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstones besides TTL and deleting items? Could the compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17, Carlos Rolo <ro...@pythian.com> wrote:

Hi Jean,

The problem behind that warning is that you are reading too many tombstones per request.

If you do have tombstones without doing DELETEs, it is because you probably TTL'ed the data when inserting (by mistake? Or did you set default_time_to_live on your table?). You can use nodetool cfstats to see how many tombstones per read slice you have. This is probably also the cause of your missing data: data was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <je...@zen-innovations.com> wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml; only the IP address for seeds and the throughput have been changed.

I loaded my data with simple insert statements. This took a bit more than one day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically had absolutely no problems.

Now, when I read the log files on the client side, I see no warnings and no errors.
On the node side I see many WARNINGs, all related to tombstones, but there are no ERRORs.

My problem is that I see *many missing records* in the DB, and I have never observed this with previous versions.

1) Is this a known problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR or WARN I could find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]


4) Is it possible to have tombstones when we make no DELETE statements?

I’m lost…

Thanks for your help.



--







Re: Missing data

Posted by Bryan Holladay <ho...@longsight.com>.
There's your problem: you're using the DataStax Java driver :) I just ran
into this issue in the last week and it was incredibly frustrating. If you
are doing a simple loop over a "select *" query, the DataStax Java driver
will only process 2^31 rows (i.e. the Java Integer max, 2,147,483,647)
before it stops without any error or output in the logs. The fact that you
said you only had about 2 billion rows but you are seeing missing data is
a red flag.

I found the only way around this is to do your "select *" in chunks based
on the token range (see this gist for an example:
https://gist.github.com/baholladay/21eb4c61ea8905302195 )
Just loop every 100 million rows and make a new query "select * from
TABLE where token(key) > lastToken"

Thanks,
Bryan
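To make the token-range workaround above concrete, here is a minimal sketch for the 2.1 Java driver, in the spirit of the linked gist rather than a copy of it. The keyspace, table, and partition-key column names (my_keyspace.my_table with key) are assumptions for illustration.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TokenRangeScan {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        long lastToken = Long.MIN_VALUE;   // Murmur3 tokens span the full signed long range
        long total = 0;
        boolean more = true;
        while (more) {
            // Fetch the next chunk strictly above the highest token seen so far.
            ResultSet rs = session.execute(
                "SELECT key, token(key) FROM my_table WHERE token(key) > "
                + lastToken + " LIMIT 100000000");
            more = false;
            for (Row row : rs) {
                lastToken = row.getLong(1);   // token(key) is a bigint with Murmur3Partitioner
                total++;
                more = true;                  // at least one row came back, keep looping
                // ... process the row here ...
            }
        }
        System.out.println("Scanned " + total + " rows");
        cluster.close();
    }
}

Note that the strictly-greater comparison skips the rest of a partition if the LIMIT cuts one in half, so the chunk size should be comfortably larger than the biggest partition (with one row per partition, as in a plain key/value table, this is not an issue).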




On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:

>  Dear all,
>
>  I identified a bit more closely the root cause of my missing data.
>
>  The problem is occurring when I use
>
>   <dependency>
> <groupId>com.datastax.cassandra</groupId>
> <artifactId>cassandra-driver-core</artifactId>
>  <version>2.1.6</version>
>  </dependency>
>
>  on my client against Cassandra 2.1.6.
>
>  I did not have the problem when I was using the driver 2.1.4 with C*
> 2.1.4.
> Interestingly enough I don’t have the problem with the driver 2.1.4 with
> C* 2.1.6.  !!!!!!
>
>  So as far as I can locate the problem, I would say that the version
> 2.1.6 of the driver is not working properly and is loosing some of my
> records.!!!
>
>  ——————
>
>  As far as my tombstones are concerned I don’t understand their origin.
> I removed all location in my code where I delete items, and I do not use
> TTL anywhere ( I don’t need this feature in my project).
>
>  And yet I have many tombstones building up.
>
>  Is there another origin for tombstone beside TTL, and deleting items?
> Could the compaction of LeveledCompactionStrategy be the origin of them?
>
>  @Carlos thanks for your guidance.
>
>  Kind regards
>
>  Jean
>
>
>
>  On 15 Jun 2015, at 11:17 , Carlos Rolo <ro...@pythian.com> wrote:
>
>  Hi Jean,
>
>  The problem of that Warning is that you are reading too many tombstones
> per request.
>
>  If you do have Tombstones without doing DELETE it because you probably
> TTL'ed the data when inserting (By mistake? Or did you set
> default_time_to_live in your table?). You can use nodetool cfstats to see
> how many tombstones per read slice you have. This is, probably, also the
> cause of your missing data. Data was tombstoned, so it is not available.
>
>
>
>    Regards,
>
>  Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
>  rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <
> jean.tremblay@zen-innovations.com> wrote:
>
>> Hi,
>>
>>  I have reloaded the data in my cluster of 3 nodes RF: 2.
>> I have loaded about 2 billion rows in one table.
>> I use LeveledCompactionStrategy on my table.
>> I use version 2.1.6.
>> I use the default cassandra.yaml, only the ip address for seeds and
>> throughput has been change.
>>
>>  I loaded my data with simple insert statements. This took a bit more
>> than one day to load the data… and one more day to compact the data on all
>> nodes.
>> For me this is quite acceptable since I should not be doing this again.
>> I have done this with previous versions like 2.1.3 and others and I
>> basically had absolutely no problems.
>>
>>  Now I read the log files on the client side, there I see no warning and
>> no errors.
>> On the nodes side there I see many WARNING, all related with tombstones,
>> but there are no ERRORS.
>>
>>  My problem is that I see some *many missing records* in the DB, and I
>> have never observed this with previous versions.
>>
>>  1) Is this a know problem?
>> 2) Do you have any idea how I could track down this problem?
>> 3) What is the meaning of this WARNING (the only type of ERROR | WARN  I
>> could find)?
>>
>>  WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866
>> SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in
>> gttdata.alltrades_co_rep_pcode for key: D:07 (see
>> tombstone_warn_threshold). 5000 columns were requested,
>> slices=[388:201001-388:201412:!]
>>
>>
>>  4) Is it possible to have Tombstone when we make no DELETE statements?
>>
>>  I’m lost…
>>
>>  Thanks for your help.
>>
>
>
> --
>
>
>
>
>
>

Re: Missing data

Posted by Jean Tremblay <je...@zen-innovations.com>.
Thanks Robert. I don’t insert NULL values, but thanks anyway.

On 15 Jun 2015, at 19:16 , Robert Wille <rw...@fold3.com>> wrote:

You can get tombstones from inserting null values. Not sure if that’s the problem, but it is another way of getting tombstones in your data.

On Jun 15, 2015, at 10:50 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:

Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
Interestingly enough I don’t have the problem with the driver 2.1.4 with C* 2.1.6.  !!!!!!

So as far as I can locate the problem, I would say that the version 2.1.6 of the driver is not working properly and is loosing some of my records.!!!

——————

As far as my tombstones are concerned I don’t understand their origin.
I removed all location in my code where I delete items, and I do not use TTL anywhere ( I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstone beside TTL, and deleting items? Could the compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo <ro...@pythian.com>> wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per request.

If you do have Tombstones without doing DELETE it because you probably TTL'ed the data when inserting (By mistake? Or did you set default_time_to_live in your table?). You can use nodetool cfstats to see how many tombstones per read slice you have. This is, probably, also the cause of your missing data. Data was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo<http://linkedin.com/in/carlosjuzarterolo>
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com<http://www.pythian.com/>

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput has been change.

I loaded my data with simple insert statements. This took a bit more than one day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no errors.
On the nodes side there I see many WARNING, all related with tombstones, but there are no ERRORS.

My problem is that I see some *many missing records* in the DB, and I have never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--







Re: Missing data

Posted by Robert Wille <rw...@fold3.com>.
You can get tombstones from inserting null values. Not sure if that’s the problem, but it is another way of getting tombstones in your data.
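For illustration, here is a small sketch of the behaviour described above with the 2.1-era Java driver, against an assumed ks.users(id int PRIMARY KEY, name text, email text) table: binding null writes a tombstone cell, while preparing a statement that omits the column writes nothing for it.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class NullTombstoneExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        PreparedStatement withEmail = session.prepare(
            "INSERT INTO ks.users (id, name, email) VALUES (?, ?, ?)");
        PreparedStatement withoutEmail = session.prepare(
            "INSERT INTO ks.users (id, name) VALUES (?, ?)");

        // Binding null creates a tombstone for the email column.
        session.execute(withEmail.bind(1, "alice", null));

        // Omitting the column from the statement writes nothing for it: no tombstone.
        session.execute(withoutEmail.bind(2, "bob"));

        cluster.close();
    }
}

(Later drivers and protocol v4 added “unset” bind values; with Cassandra 2.1 the practical choice is between binding null and preparing a statement without the column.)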

On Jun 15, 2015, at 10:50 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:

Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
Interestingly enough I don’t have the problem with the driver 2.1.4 with C* 2.1.6.  !!!!!!

So as far as I can locate the problem, I would say that the version 2.1.6 of the driver is not working properly and is loosing some of my records.!!!

——————

As far as my tombstones are concerned I don’t understand their origin.
I removed all location in my code where I delete items, and I do not use TTL anywhere ( I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstone beside TTL, and deleting items? Could the compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo <ro...@pythian.com>> wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per request.

If you do have Tombstones without doing DELETE it because you probably TTL'ed the data when inserting (By mistake? Or did you set default_time_to_live in your table?). You can use nodetool cfstats to see how many tombstones per read slice you have. This is, probably, also the cause of your missing data. Data was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo<http://linkedin.com/in/carlosjuzarterolo>
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com<http://www.pythian.com/>

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput has been change.

I loaded my data with simple insert statements. This took a bit more than one day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no errors.
On the nodes side there I see many WARNING, all related with tombstones, but there are no ERRORS.

My problem is that I see some *many missing records* in the DB, and I have never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--






Re: Missing data

Posted by Jean Tremblay <je...@zen-innovations.com>.
Dear all,

I identified a bit more closely the root cause of my missing data.

The problem is occurring when I use

<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.6</version>
</dependency>

on my client against Cassandra 2.1.6.

I did not have the problem when I was using driver 2.1.4 with C* 2.1.4.
Interestingly enough, I don’t have the problem with driver 2.1.4 against C* 2.1.6!

So as far as I can locate the problem, I would say that version 2.1.6 of the
driver is not working properly and is losing some of my records!

——————

As far as my tombstones are concerned, I don’t understand their origin.
I removed all locations in my code where I delete items, and I do not use
TTL anywhere (I don’t need this feature in my project).

And yet I have many tombstones building up.

Is there another origin for tombstones besides TTL and deleting items?
Could the compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean



On 15 Jun 2015, at 11:17 , Carlos Rolo <ro...@pythian.com>> wrote:

Hi Jean,

The problem of that Warning is that you are reading too many tombstones per request.

If you do have Tombstones without doing DELETE it because you probably TTL'ed the data when inserting (By mistake? Or did you set default_time_to_live in your table?). You can use nodetool cfstats to see how many tombstones per read slice you have. This is, probably, also the cause of your missing data. Data was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo<http://linkedin.com/in/carlosjuzarterolo>
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com<http://www.pythian.com/>

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput has been change.

I loaded my data with simple insert statements. This took a bit more than one day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically had absolutely no problems.

Now I read the log files on the client side, there I see no warning and no errors.
On the nodes side there I see many WARNING, all related with tombstones, but there are no ERRORS.

My problem is that I see some *many missing records* in the DB, and I have never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]


4) Is it possible to have Tombstone when we make no DELETE statements?

I’m lost…

Thanks for your help.



--





Re: Missing data

Posted by Carlos Rolo <ro...@pythian.com>.
Hi Jean,

The problem behind that warning is that you are reading too many tombstones
per request.

If you do have tombstones without doing DELETEs, it is because you probably
TTL'ed the data when inserting (by mistake? Or did you set
default_time_to_live on your table?). You can use nodetool cfstats to see
how many tombstones per read slice you have. This is probably also the
cause of your missing data: data was tombstoned, so it is not available.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:

>  Hi,
>
>  I have reloaded the data in my cluster of 3 nodes RF: 2.
> I have loaded about 2 billion rows in one table.
> I use LeveledCompactionStrategy on my table.
> I use version 2.1.6.
> I use the default cassandra.yaml, only the ip address for seeds and
> throughput has been change.
>
>  I loaded my data with simple insert statements. This took a bit more
> than one day to load the data… and one more day to compact the data on all
> nodes.
> For me this is quite acceptable since I should not be doing this again.
> I have done this with previous versions like 2.1.3 and others and I
> basically had absolutely no problems.
>
>  Now I read the log files on the client side, there I see no warning and
> no errors.
> On the nodes side there I see many WARNING, all related with tombstones,
> but there are no ERRORS.
>
>  My problem is that I see some *many missing records* in the DB, and I
> have never observed this with previous versions.
>
>  1) Is this a know problem?
> 2) Do you have any idea how I could track down this problem?
> 3) What is the meaning of this WARNING (the only type of ERROR | WARN  I
> could find)?
>
>  WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866
> SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in
> gttdata.alltrades_co_rep_pcode for key: D:07 (see
> tombstone_warn_threshold). 5000 columns were requested,
> slices=[388:201001-388:201412:!]
>
>
>  4) Is it possible to have Tombstone when we make no DELETE statements?
>
>  I’m lost…
>
>  Thanks for your help.
>

-- 


--




RE: nodetool repair

Posted by Jens Rantil <je...@tink.se>.
Hi,


For the record, I've successfully used https://github.com/BrianGallew/cassandra_range_repair for smooth repairs. It might also be of interest, I don't know...




Cheers,

Jens





–
Sent from Mailbox

On Fri, Jun 19, 2015 at 8:36 PM, null <SE...@homedepot.com> wrote:

> It seems to me that running repair on any given node may also induce repairs to related replica nodes. For example, if I run repair on node A and node B has some replicas, data might stream from A to B (assuming A has newer/more data). Now, that does NOT mean that node B will be fully repaired. You still need to run repair -pr on all nodes before gc_grace_seconds.
> You can run repairs on multiple nodes at the same time. However, you might end up with a large amount of streaming, if many repairs are needed. So, you should be aware of a performance impact.
> I run weekly repairs on one node at a time, if possible. On, larger rings, though, I run repairs on multiple nodes staggered by a few hours. Once your routine maintenance is established, repairs will not run for very long. But, if you have a large ring that hasn’t been repaired, those first repairs may take days (but should get faster as you get further through the ring).
> Sean Durity
> From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
> Sent: Friday, June 19, 2015 3:56 AM
> To: user@cassandra.apache.org
> Subject: Re: nodetool repair
> Hi,
> This is not necessarily true. Repair will induce compactions only if you have entropy in your cluster. If not it will just read your data to compare all the replica of each piece of data (using indeed cpu and disk IO).
> If there is some data missing it will "repair" it. Though, due to merkle tree size, you will generally stream more data than just the data needed. To limit this downside and the compactions amount, use range repairs --> http://www.datastax.com/dev/blog/advanced-repair-techniques.
> About tombstones, they will be evicted only after gc_grace_period and only if all the parts of the row are part of the compaction.
> C*heers,
> Alain
> 2015-06-19 9:08 GMT+02:00 arun sirimalla <ar...@gmail.com>>:
> Yes compactions will remove tombstones
> On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
> Perfect thank you.
> So making a weekly "nodetool repair -pr”  on all nodes one after the other will repair my cluster. That is great.
> If it does a compaction, does it mean that it would also clean up my tombstone from my LeveledCompactionStrategy tables at the same time?
> Thanks for your help.
> On 19 Jun 2015, at 07:56 , arun sirimalla <ar...@gmail.com>> wrote:
> Hi Jean,
> Running nodetool repair on a node will repair only that node in the cluster. It is recommended to run nodetool repair on one node at a time.
> Few things to keep in mind while running repair
>    1. Running repair will trigger compactions
>    2. Increase in CPU utilization.
> Run node tool repair with -pr option, so that it will repair only the range that node is responsible for.
> On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
> Thanks Jonathan.
> But I need to know the following:
> If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?
> If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?
> Kind regards
> On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com>> wrote:
> If you're using DSE, you can schedule it automatically using the repair service.  If you're open source, check out Spotify cassandra reaper, it'll manage it for you.
> https://github.com/spotify/cassandra-reaper
> On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <je...@zen-innovations.com>> wrote:
> Hi,
> I want to make on a regular base repairs on my cluster as suggested by the documentation.
> I want to do this in a way that the cluster is still responding to read requests.
> So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources.
> If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?
> If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?
> If we had down time periods I would issue a nodetool -par, but we don’t have down time periods.
> Sorry for the stupid questions.
> Thanks for your help.
> --
> Arun
> Senior Hadoop/Cassandra Engineer
> Cloudwick
> 2014 Data Impact Award Winner (Cloudera)
> http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html
> --
> Arun
> Senior Hadoop/Cassandra Engineer
> Cloudwick
> 2014 Data Impact Award Winner (Cloudera)
> http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html

Re: nodetool repair

Posted by Jean Tremblay <je...@zen-innovations.com>.
Thank you for your reply.



On 19 Jun 2015, at 20:36, "SEAN_R_DURITY@homedepot.com<ma...@homedepot.com>" <SE...@homedepot.com>> wrote:

It seems to me that running repair on any given node may also induce repairs to related replica nodes. For example, if I run repair on node A and node B has some replicas, data might stream from A to B (assuming A has newer/more data). Now, that does NOT mean that node B will be fully repaired. You still need to run repair -pr on all nodes before gc_grace_seconds.

You can run repairs on multiple nodes at the same time. However, you might end up with a large amount of streaming, if many repairs are needed. So, you should be aware of a performance impact.

I run weekly repairs on one node at a time, if possible. On, larger rings, though, I run repairs on multiple nodes staggered by a few hours. Once your routine maintenance is established, repairs will not run for very long. But, if you have a large ring that hasn’t been repaired, those first repairs may take days (but should get faster as you get further through the ring).


Sean Durity

From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
Sent: Friday, June 19, 2015 3:56 AM
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: nodetool repair

Hi,

This is not necessarily true. Repair will induce compactions only if you have entropy in your cluster. If not it will just read your data to compare all the replica of each piece of data (using indeed cpu and disk IO).

If there is some data missing it will "repair" it. Though, due to merkle tree size, you will generally stream more data than just the data needed. To limit this downside and the compactions amount, use range repairs --> http://www.datastax.com/dev/blog/advanced-repair-techniques.

About tombstones, they will be evicted only after gc_grace_period and only if all the parts of the row are part of the compaction.

C*heers,

Alain

2015-06-19 9:08 GMT+02:00 arun sirimalla <ar...@gmail.com>>:
Yes compactions will remove tombstones

On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Perfect thank you.
So making a weekly "nodetool repair -pr”  on all nodes one after the other will repair my cluster. That is great.

If it does a compaction, does it mean that it would also clean up my tombstone from my LeveledCompactionStrategy tables at the same time?

Thanks for your help.

On 19 Jun 2015, at 07:56 , arun sirimalla <ar...@gmail.com>> wrote:

Hi Jean,

Running nodetool repair on a node will repair only that node in the cluster. It is recommended to run nodetool repair on one node at a time.

Few things to keep in mind while running repair
   1. Running repair will trigger compactions
   2. Increase in CPU utilization.


Run node tool repair with -pr option, so that it will repair only the range that node is responsible for.

On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Thanks Jonathan.

But I need to know the following:

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?
If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

Kind regards

On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com>> wrote:

If you're using DSE, you can schedule it automatically using the repair service.  If you're open source, check out Spotify cassandra reaper, it'll manage it for you.

https://github.com/spotify/cassandra-reaper



On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I want to make on a regular base repairs on my cluster as suggested by the documentation.
I want to do this in a way that the cluster is still responding to read requests.
So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources.

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

If we had down time periods I would issue a nodetool -par, but we don’t have down time periods.

Sorry for the stupid questions.
Thanks for your help.




--
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html





--
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html




RE: nodetool repair

Posted by SE...@homedepot.com.
It seems to me that running repair on any given node may also induce repairs to related replica nodes. For example, if I run repair on node A and node B has some replicas, data might stream from A to B (assuming A has newer/more data). Now, that does NOT mean that node B will be fully repaired. You still need to run repair -pr on all nodes before gc_grace_seconds.

You can run repairs on multiple nodes at the same time. However, you might end up with a large amount of streaming, if many repairs are needed. So, you should be aware of a performance impact.

I run weekly repairs on one node at a time, if possible. On larger rings, though, I run repairs on multiple nodes staggered by a few hours. Once your routine maintenance is established, repairs will not run for very long. But, if you have a large ring that hasn’t been repaired, those first repairs may take days (but should get faster as you get further through the ring).


Sean Durity

From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
Sent: Friday, June 19, 2015 3:56 AM
To: user@cassandra.apache.org
Subject: Re: nodetool repair

Hi,

This is not necessarily true. Repair will induce compactions only if you have entropy in your cluster. If not it will just read your data to compare all the replica of each piece of data (using indeed cpu and disk IO).

If there is some data missing it will "repair" it. Though, due to merkle tree size, you will generally stream more data than just the data needed. To limit this downside and the compactions amount, use range repairs --> http://www.datastax.com/dev/blog/advanced-repair-techniques.

About tombstones, they will be evicted only after gc_grace_period and only if all the parts of the row are part of the compaction.

C*heers,

Alain

2015-06-19 9:08 GMT+02:00 arun sirimalla <ar...@gmail.com>>:
Yes compactions will remove tombstones

On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Perfect thank you.
So making a weekly "nodetool repair -pr”  on all nodes one after the other will repair my cluster. That is great.

If it does a compaction, does it mean that it would also clean up my tombstone from my LeveledCompactionStrategy tables at the same time?

Thanks for your help.

On 19 Jun 2015, at 07:56 , arun sirimalla <ar...@gmail.com>> wrote:

Hi Jean,

Running nodetool repair on a node will repair only that node in the cluster. It is recommended to run nodetool repair on one node at a time.

Few things to keep in mind while running repair
   1. Running repair will trigger compactions
   2. Increase in CPU utilization.


Run node tool repair with -pr option, so that it will repair only the range that node is responsible for.

On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Thanks Jonathan.

But I need to know the following:

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?
If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

Kind regards

On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com>> wrote:

If you're using DSE, you can schedule it automatically using the repair service.  If you're open source, check out Spotify cassandra reaper, it'll manage it for you.

https://github.com/spotify/cassandra-reaper



On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I want to make on a regular base repairs on my cluster as suggested by the documentation.
I want to do this in a way that the cluster is still responding to read requests.
So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources.

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

If we had down time periods I would issue a nodetool -par, but we don’t have down time periods.

Sorry for the stupid questions.
Thanks for your help.




--
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html





--
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html



________________________________

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.

Re: nodetool repair

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi,

This is not necessarily true. Repair will induce compactions only if you
have entropy in your cluster. If not, it will just read your data to compare
all the replicas of each piece of data (which does use CPU and disk IO).

If some data is missing it will "repair" it. Though, due to the Merkle
tree size, you will generally stream more data than just the data needed.
To limit this downside and the amount of compactions, use range repairs -->
http://www.datastax.com/dev/blog/advanced-repair-techniques.

About tombstones: they will be evicted only after gc_grace_seconds, and only
if all the parts of the row are part of the compaction.

C*heers,

Alain
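As a small illustration of the gc_grace point above, here is a hedged sketch (keyspace and table names are placeholders) of checking and tuning gc_grace_seconds from the 2.1 Java driver; tombstones only become eligible for eviction during compaction once that window has passed, so it should stay longer than the interval between full repairs.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class GcGraceExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // On Cassandra 2.1 the table options live in system.schema_columnfamilies.
        Row row = session.execute(
            "SELECT gc_grace_seconds FROM system.schema_columnfamilies "
            + "WHERE keyspace_name = 'my_keyspace' AND columnfamily_name = 'my_table'").one();
        System.out.println("gc_grace_seconds = " + row.getInt("gc_grace_seconds"));

        // Example: shorten the window to 5 days; only safe if repairs run more often than that.
        session.execute("ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 432000");

        cluster.close();
    }
}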

2015-06-19 9:08 GMT+02:00 arun sirimalla <ar...@gmail.com>:

> Yes compactions will remove tombstones
>
> On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay <
> jean.tremblay@zen-innovations.com> wrote:
>
>>  Perfect thank you.
>> So making a weekly "nodetool repair -pr”  on all nodes one after the
>> other will repair my cluster. That is great.
>>
>>  If it does a compaction, does it mean that it would also clean up my
>> tombstone from my LeveledCompactionStrategy tables at the same time?
>>
>>  Thanks for your help.
>>
>>  On 19 Jun 2015, at 07:56 , arun sirimalla <ar...@gmail.com> wrote:
>>
>>  Hi Jean,
>>
>>  Running nodetool repair on a node will repair only that node in the
>> cluster. It is recommended to run nodetool repair on one node at a time.
>>
>>  Few things to keep in mind while running repair
>>    1. Running repair will trigger compactions
>>    2. Increase in CPU utilization.
>>
>>
>>  Run node tool repair with -pr option, so that it will repair only the
>> range that node is responsible for.
>>
>> On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay <
>> jean.tremblay@zen-innovations.com> wrote:
>>
>>> Thanks Jonathan.
>>>
>>>  But I need to know the following:
>>>
>>>  If you issue a “nodetool repair” on one node will it repair all the
>>> nodes in the cluster or only the one on which we issue the command?
>>>
>>>    If it repairs only one node, do I have to wait that the nodetool
>>> repair ends, and only then issue another “nodetool repair” on the next node?
>>>
>>>  Kind regards
>>>
>>>  On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com> wrote:
>>>
>>>  If you're using DSE, you can schedule it automatically using the
>>> repair service.  If you're open source, check out Spotify cassandra reaper,
>>> it'll manage it for you.
>>>
>>>  https://github.com/spotify/cassandra-reaper
>>>
>>>
>>>
>>>  On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <
>>> jean.tremblay@zen-innovations.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to make on a regular base repairs on my cluster as suggested by
>>>> the documentation.
>>>> I want to do this in a way that the cluster is still responding to read
>>>> requests.
>>>> So I understand that I should not use the -par switch for that as it
>>>> will do the repair in parallel and consume all available resources.
>>>>
>>>> If you issue a “nodetool repair” on one node will it repair all the
>>>> nodes in the cluster or only the one on which we issue the command?
>>>>
>>>> If it repairs only one node, do I have to wait that the nodetool repair
>>>> ends, and only then issue another “nodetool repair” on the next node?
>>>>
>>>> If we had down time periods I would issue a nodetool -par, but we don’t
>>>> have down time periods.
>>>>
>>>> Sorry for the stupid questions.
>>>> Thanks for your help.
>>>
>>>
>>>
>>
>>
>>  --
>>     Arun
>> Senior Hadoop/Cassandra Engineer
>> Cloudwick
>>
>>
>>  2014 Data Impact Award Winner (Cloudera)
>>
>> http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html
>>
>>
>>
>
>
> --
> Arun
> Senior Hadoop/Cassandra Engineer
> Cloudwick
>
>
> 2014 Data Impact Award Winner (Cloudera)
>
> http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html
>
>

Re: nodetool repair

Posted by arun sirimalla <ar...@gmail.com>.
Yes compactions will remove tombstones

On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:

>  Perfect thank you.
> So making a weekly "nodetool repair -pr”  on all nodes one after the other
> will repair my cluster. That is great.
>
>  If it does a compaction, does it mean that it would also clean up my
> tombstone from my LeveledCompactionStrategy tables at the same time?
>
>  Thanks for your help.
>
>  On 19 Jun 2015, at 07:56 , arun sirimalla <ar...@gmail.com> wrote:
>
>  Hi Jean,
>
>  Running nodetool repair on a node will repair only that node in the
> cluster. It is recommended to run nodetool repair on one node at a time.
>
>  Few things to keep in mind while running repair
>    1. Running repair will trigger compactions
>    2. Increase in CPU utilization.
>
>
>  Run node tool repair with -pr option, so that it will repair only the
> range that node is responsible for.
>
> On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay <
> jean.tremblay@zen-innovations.com> wrote:
>
>> Thanks Jonathan.
>>
>>  But I need to know the following:
>>
>>  If you issue a “nodetool repair” on one node will it repair all the
>> nodes in the cluster or only the one on which we issue the command?
>>
>>    If it repairs only one node, do I have to wait that the nodetool
>> repair ends, and only then issue another “nodetool repair” on the next node?
>>
>>  Kind regards
>>
>>  On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com> wrote:
>>
>>  If you're using DSE, you can schedule it automatically using the repair
>> service.  If you're open source, check out Spotify cassandra reaper, it'll
>> manage it for you.
>>
>>  https://github.com/spotify/cassandra-reaper
>>
>>
>>
>>  On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <
>> jean.tremblay@zen-innovations.com> wrote:
>>
>>> Hi,
>>>
>>> I want to make on a regular base repairs on my cluster as suggested by
>>> the documentation.
>>> I want to do this in a way that the cluster is still responding to read
>>> requests.
>>> So I understand that I should not use the -par switch for that as it
>>> will do the repair in parallel and consume all available resources.
>>>
>>> If you issue a “nodetool repair” on one node will it repair all the
>>> nodes in the cluster or only the one on which we issue the command?
>>>
>>> If it repairs only one node, do I have to wait that the nodetool repair
>>> ends, and only then issue another “nodetool repair” on the next node?
>>>
>>> If we had down time periods I would issue a nodetool -par, but we don’t
>>> have down time periods.
>>>
>>> Sorry for the stupid questions.
>>> Thanks for your help.
>>
>>
>>
>
>
>  --
>     Arun
> Senior Hadoop/Cassandra Engineer
> Cloudwick
>
>
>  2014 Data Impact Award Winner (Cloudera)
>
> http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html
>
>
>


-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html

Re: nodetool repair

Posted by Jean Tremblay <je...@zen-innovations.com>.
Perfect, thank you.
So making a weekly “nodetool repair -pr” on all nodes one after the other will repair my cluster. That is great.

If it does a compaction, does it mean that it would also clean up my tombstones from my LeveledCompactionStrategy tables at the same time?

Thanks for your help.

On 19 Jun 2015, at 07:56 , arun sirimalla <ar...@gmail.com>> wrote:

Hi Jean,

Running nodetool repair on a node will repair only that node in the cluster. It is recommended to run nodetool repair on one node at a time.

Few things to keep in mind while running repair
   1. Running repair will trigger compactions
   2. Increase in CPU utilization.


Run node tool repair with -pr option, so that it will repair only the range that node is responsible for.

On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Thanks Jonathan.

But I need to know the following:

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

Kind regards

On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com>> wrote:

If you're using DSE, you can schedule it automatically using the repair service.  If you're open source, check out Spotify cassandra reaper, it'll manage it for you.

https://github.com/spotify/cassandra-reaper



On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I want to make on a regular base repairs on my cluster as suggested by the documentation.
I want to do this in a way that the cluster is still responding to read requests.
So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources.

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

If we had down time periods I would issue a nodetool -par, but we don’t have down time periods.

Sorry for the stupid questions.
Thanks for your help.




--
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html



Re: nodetool repair

Posted by arun sirimalla <ar...@gmail.com>.
Hi Jean,

Running nodetool repair on a node will repair only that node in the
cluster. It is recommended to run nodetool repair on one node at a time.

A few things to keep in mind while running repair:
   1. Running repair will trigger compactions.
   2. CPU utilization will increase.


Run nodetool repair with the -pr option, so that it will repair only the range
that node is responsible for.

On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:

>  Thanks Jonathan.
>
>  But I need to know the following:
>
>  If you issue a “nodetool repair” on one node will it repair all the
> nodes in the cluster or only the one on which we issue the command?
>
>    If it repairs only one node, do I have to wait that the nodetool
> repair ends, and only then issue another “nodetool repair” on the next node?
>
>  Kind regards
>
>  On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com> wrote:
>
>  If you're using DSE, you can schedule it automatically using the repair
> service.  If you're open source, check out Spotify cassandra reaper, it'll
> manage it for you.
>
>  https://github.com/spotify/cassandra-reaper
>
>
>
>  On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <
> jean.tremblay@zen-innovations.com> wrote:
>
>> Hi,
>>
>> I want to make on a regular base repairs on my cluster as suggested by
>> the documentation.
>> I want to do this in a way that the cluster is still responding to read
>> requests.
>> So I understand that I should not use the -par switch for that as it will
>> do the repair in parallel and consume all available resources.
>>
>> If you issue a “nodetool repair” on one node will it repair all the nodes
>> in the cluster or only the one on which we issue the command?
>>
>> If it repairs only one node, do I have to wait that the nodetool repair
>> ends, and only then issue another “nodetool repair” on the next node?
>>
>> If we had down time periods I would issue a nodetool -par, but we don’t
>> have down time periods.
>>
>> Sorry for the stupid questions.
>> Thanks for your help.
>
>
>


-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html

Re: nodetool repair

Posted by Jean Tremblay <je...@zen-innovations.com>.
Thanks Jonathan.

But I need to know the following:

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

Kind regards

On 18 Jun 2015, at 19:19 , Jonathan Haddad <jo...@jonhaddad.com>> wrote:

If you're using DSE, you can schedule it automatically using the repair service.  If you're open source, check out Spotify cassandra reaper, it'll manage it for you.

https://github.com/spotify/cassandra-reaper



On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I want to run repairs on my cluster on a regular basis, as suggested by the documentation.
I want to do this in a way that the cluster is still responding to read requests.
So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources.

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

If we had down time periods I would issue a nodetool -par, but we don’t have down time periods.

Sorry for the stupid questions.
Thanks for your help.


Re: nodetool repair

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
If you're using DSE, you can schedule it automatically using the repair
service.  If you're open source, check out Spotify cassandra reaper, it'll
manage it for you.

https://github.com/spotify/cassandra-reaper



On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:

> Hi,
>
> I want to run repairs on my cluster on a regular basis, as suggested by the
> documentation.
> I want to do this in a way that the cluster is still responding to read
> requests.
> So I understand that I should not use the -par switch for that as it will
> do the repair in parallel and consume all available resources.
>
> If you issue a “nodetool repair” on one node will it repair all the nodes
> in the cluster or only the one on which we issue the command?
>
> If it repairs only one node, do I have to wait that the nodetool repair
> ends, and only then issue another “nodetool repair” on the next node?
>
> If we had down time periods I would issue a nodetool -par, but we don’t
> have down time periods.
>
> Sorry for the stupid questions.
> Thanks for your help.

nodetool repair

Posted by Jean Tremblay <je...@zen-innovations.com>.
Hi,

I want to run repairs on my cluster on a regular basis, as suggested by the documentation.
I want to do this in a way that the cluster is still responding to read requests.
So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources.

If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node?

If we had down time periods I would issue a nodetool -par, but we don’t have down time periods.

Sorry for the stupid questions.
Thanks for your help.

Re: Catastrophe Recovery.

Posted by Saladi Naidu <na...@yahoo.com>.
Alain, great write-up on the recovery procedure. You covered both the replication factor and consistency levels. As mentioned, the two anti-entropy mechanisms, hinted handoffs and read repair, work for temporary node outages and incremental recovery. In case of disaster/catastrophic recovery, nodetool repair is the best way to recover.
Would the procedure below have ensured the node was added properly to the cluster?
Adding nodes to an existing cluster | DataStax Cassandra 2.0 Documentation (steps to add nodes when using virtual nodes) - docs.datastax.com

   Naidu Saladi 

      From: Jean Tremblay <je...@zen-innovations.com>
 To: "user@cassandra.apache.org" <us...@cassandra.apache.org> 
 Sent: Monday, June 15, 2015 10:58 AM
 Subject: Re: Catastrophe Recovery.
   
That is really wonderful. Thank you very much, Alain. You gave me a lot of trails to investigate. Thanks again for your help.



On 15 Jun 2015, at 17:49 , Alain RODRIGUEZ <ar...@gmail.com> wrote:
Hi, it looks like you're starting to use Cassandra.
Welcome.
I invite you to read from here as much as you can http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html.
When a node loses some data you have various anti-entropy mechanisms:
Hinted Handoff --> For writes that occurred while node was down and known as such by other nodes (exclusively)
Read repair --> On each read, you can set a chance to check other nodes for auto correction.
Repair ( called either manual / anti entropy / full / ...) : Which takes care to give back a node its missing data only for the range this node handles (-pr) or for all its data (its range plus its replica). This is something you generally want to perform on all nodes on a regular basis (lower than the lowest gc_grace_period set on any of your tables).
Also, you are getting wrong values because you probably have a Consistency Level (CL) that is too low. If you want this to never happen you have to set Read (R) / Write (W) consistency levels such that R + W > RF (Replication Factor); if not, you can see what you are currently seeing. I advise you to set your consistency to "local_quorum" or "quorum" in a single-DC environment. Also, with 3 nodes, you should set RF to 3, otherwise you won't be able to reach strong consistency due to the formula I just gave you.
There is a lot more to know, you should read about this all. Using Cassandra without knowing about its internals would lead you to very poor and unexpected results.
To answer your questions:
"For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbour while doing the joining phase. Is this correct?"

--> Not at all, unless it joins the ring for the first time, which is not your case. Though it will (by default) slowly recover while you read.
"After such catastrophe, and after the joining phase is done should the cluster not be ready to deliver always consistent data if there was no inserts or delete during the catastrophe?"
No, we can't ensure that, except by dropping the node and bootstrapping a new one. What we can make sure of is that there are enough replicas remaining to serve consistent data (search for RF and CL).
"After the bootstrap of a broken node is finish, i.e. after the joining phase, is there not simply a repair to be done on that node using “node repair"?"
This sentence is wrong: the bootstrap / joining phase ≠ a broken node coming back. You are right about repair: if a broken node (or one down for too long - default 3 hours) comes back, you have to repair it. But repair is slow, make sure you can afford it; see my previous answer.
Testing is a really good idea but you also have to read a lot imho.
Good luck,
C*heers,
Alain

2015-06-15 11:13 GMT+02:00 Jean Tremblay <je...@zen-innovations.com>:




Hi,

I have a cluster of 3 nodes RF: 2.
There are about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput have been changed.
I have tested a scenario where one node crashes and loses all its data. I have deleted all data on this node after having stopped Cassandra. At this point I noticed that the cluster was giving proper results. What I was expecting from a cluster DB.
I then restarted that node and I observed that the node was joining the cluster. After an hour or so the old “defect” node was up and normal. I noticed that its hard disk was loaded with much less data than its neighbours.
When I was querying the DB, the cluster was giving me different results for successive identical queries. I guess the old “defect” node was giving me less rows than it should have.
1) For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbour while doing the joining phase. Is this correct?
2) After such catastrophe, and after the joining phase is done should the cluster not be ready to deliver always consistent data if there was no inserts or delete during the catastrophe?
3) After the bootstrap of a broken node is finish, i.e. after the joining phase, is there not simply a repair to be done on that node using “node repair"?

Thanks for your comments.
Kind regards
Jean






  

Re: Catastrophe Recovery.

Posted by Jean Tremblay <je...@zen-innovations.com>.
That is really wonderful. Thank you very much, Alain. You gave me a lot of trails to investigate. Thanks again for your help.

On 15 Jun 2015, at 17:49 , Alain RODRIGUEZ <ar...@gmail.com>> wrote:

Hi, it looks like you're starting to use Cassandra.

Welcome.

I invite you to read from here as much as you can http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html.

When a node loses some data you have various anti-entropy mechanisms:

Hinted Handoff --> For writes that occurred while node was down and known as such by other nodes (exclusively)
Read repair --> On each read, you can set a chance to check other nodes for auto correction.
Repair ( called either manual / anti entropy / full / ...) : Which takes care to give back a node its missing data only for the range this node handles (-pr) or for all its data (its range plus its replica). This is something you generally want to perform on all nodes on a regular basis (lower than the lowest gc_grace_period set on any of your tables).

Also, you are getting wrong values because you probably have a Consistency Level (CL) that is too low. If you want this to never happen you have to set Read (R) / Write (W) consistency levels such that R + W > RF (Replication Factor); if not, you can see what you are currently seeing. I advise you to set your consistency to "local_quorum" or "quorum" in a single-DC environment. Also, with 3 nodes, you should set RF to 3, otherwise you won't be able to reach strong consistency due to the formula I just gave you.

There is a lot more to know, you should read about this all. Using Cassandra without knowing about its internals would lead you to very poor and unexpected results.

To answer your questions:

"For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbour while doing the joining phase. Is this correct?"

--> Not at all, unless it joins the ring for the first time, which is not your case. Though it will (by default) slowly recover while you read.

"After such catastrophe, and after the joining phase is done should the cluster not be ready to deliver always consistent data if there was no inserts or delete during the catastrophe?"

No, we can't ensure that, except by dropping the node and bootstrapping a new one. What we can make sure of is that there are enough replicas remaining to serve consistent data (search for RF and CL).

"After the bootstrap of a broken node is finish, i.e. after the joining phase, is there not simply a repair to be done on that node using “node repair"?"

This sentence is wrong: the bootstrap / joining phase ≠ a broken node coming back. You are right about repair: if a broken node (or one down for too long - default 3 hours) comes back, you have to repair it. But repair is slow, make sure you can afford it; see my previous answer.

Testing is a really good idea but you also have to read a lot imho.

Good luck,

C*heers,

Alain


2015-06-15 11:13 GMT+02:00 Jean Tremblay <je...@zen-innovations.com>>:

Hi,

I have a cluster of 3 nodes RF: 2.
There are about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput have been changed.

I have tested a scenario where one node crashes and loses all its data.
I have deleted all data on this node after having stopped Cassandra.
At this point I noticed that the cluster was giving proper results. What I was expecting from a cluster DB.

I then restarted that node and I observed that the node was joining the cluster.
After an hour or so the old “defect” node was up and normal.
I noticed that its hard disk was loaded with much less data than its neighbours.

When I was querying the DB, the cluster was giving me different results for successive identical queries.
I guess the old “defect” node was giving me less rows than it should have.

1) For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbour while doing the joining phase. Is this correct?
2) After such catastrophe, and after the joining phase is done should the cluster not be ready to deliver always consistent data if there was no inserts or delete during the catastrophe?
3) After the bootstrap of a broken node is finish, i.e. after the joining phase, is there not simply a repair to be done on that node using “node repair"?


Thanks for your comments.

Kind regards

Jean




Re: Catastrophe Recovery.

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi, it looks like you're starting to use Cassandra.

Welcome.

I invite you to read from here as much as you can
http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html
.

When a node loses some data you have various anti-entropy mechanisms:

Hinted Handoff --> For writes that occurred while node was down and known
as such by other nodes (exclusively)
Read repair --> On each read, you can set a chance to check other nodes for
auto correction (see the sketch just after this list).
Repair ( called either manual / anti entropy / full / ...) : Which takes
care to give back a node its missing data only for the range this node
handles (-pr) or for all its data (its range plus its replica). This is
something you generally want to perform on all nodes on a regular basis
(lower than the lowest gc_grace_period set on any of your tables).
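
As an illustration of the read repair item above, a minimal sketch with the
DataStax Python driver; the contact point, keyspace and table names are made
up, and the read_repair_chance value is just an example:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Ask Cassandra to check (and auto-correct) extra replicas on 10% of the
# reads against this table. "my_keyspace.trades" is a placeholder.
session.execute(
    "ALTER TABLE my_keyspace.trades WITH read_repair_chance = 0.10")

cluster.shutdown()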

Also, you are getting wrong values because you probably have a Consistency
Level (CL) that is too low. If you want this to never happen you have to set
Read (R) / Write (W) consistency levels such that R + W > RF (Replication
Factor); if not, you can see what you are currently seeing. I advise you to
set your consistency to "local_quorum" or "quorum" in a single-DC
environment. Also, with 3 nodes, you should set RF to 3, otherwise you won't
be able to reach strong consistency due to the formula I just gave you.
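
To make the R + W > RF point concrete, here is a minimal sketch with the
DataStax Python driver reading and writing at QUORUM (with RF = 3, 2 + 2 > 3);
the contact point, keyspace, table and values are placeholders, not anything
from this thread:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Write at QUORUM: 2 of the 3 replicas must acknowledge the write.
write = SimpleStatement(
    "INSERT INTO trades (id, value) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)
session.execute(write, (42, "hello"))

# Read at QUORUM: 2 of the 3 replicas are consulted, so at least one of them
# has already seen the write above and the read cannot miss it.
read = SimpleStatement(
    "SELECT value FROM trades WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)
for row in session.execute(read, (42,)):
    print(row.value)

cluster.shutdown()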

There is a lot more to know, you should read about this all. Using
Cassandra without knowing about its internals would lead you to very poor
and unexpected results.

To answer your questions:

"For what I understand, if you have a fixed node with no data it will
automatically bootstrap and recover all its old data from its neighbour
while doing the joining phase. Is this correct?"

--> Not at all, unless it joins the ring for the first time, which is not
your case. Though it will (by default) slowly recover while you read.

"After such catastrophe, and after the joining phase is done should the
cluster not be ready to deliver always consistent data if there was no
inserts or delete during the catastrophe?"

No, we can't ensure that, except by dropping the node and bootstrapping a
new one. What we can make sure of is that there are enough replicas remaining
to serve consistent data (search for RF and CL).

"After the bootstrap of a broken node is finish, i.e. after the joining
phase, is there not simply a repair to be done on that node using “node
repair"?"

This sentence is wrong: the bootstrap / joining phase ≠ a broken node coming
back. You are right about repair: if a broken node (or one down for too long -
default 3 hours) comes back, you have to repair it. But repair is slow, make
sure you can afford it; see my previous answer.

Testing is a really good idea but you also have to read a lot imho.

Good luck,

C*heers,

Alain


2015-06-15 11:13 GMT+02:00 Jean Tremblay <je...@zen-innovations.com>
:

>
> Hi,
>
> I have a cluster of 3 nodes RF: 2.
> There are about 2 billion rows in one table.
> I use LeveledCompactionStrategy on my table.
> I use version 2.1.6.
> I use the default cassandra.yaml, only the ip address for seeds and
> throughput have been changed.
>
>  I have tested a scenario where one node crashes and loses all its
> data.
> I have deleted all data on this node after having stopped Cassandra.
> At this point I noticed that the cluster was giving proper results. What I
> was expecting from a cluster DB.
>
>  I then restarted that node and I observed that the node was joining the
> cluster.
> After an hour or so the old “defect” node was up and normal.
> I noticed that its hard disk was loaded with much less data than its
> neighbours.
>
>  When I was querying the DB, the cluster was giving me different results
> for successive identical queries.
> I guess the old “defect” node was giving me less rows than it should have.
>
>  1) For what I understand, if you have a fixed node with no data it will
> automatically bootstrap and recover all its old data from its neighbour
> while doing the joining phase. Is this correct?
> 2) After such catastrophe, and after the joining phase is done should the
> cluster not be ready to deliver always consistent data if there was no
> inserts or delete during the catastrophe?
> 3) After the bootstrap of a broken node is finish, i.e. after the joining
> phase, is there not simply a repair to be done on that node using “node
> repair"?
>
>
>  Thanks for your comments.
>
>  Kind regards
>
>  Jean
>
>

Catastrophe Recovery.

Posted by Jean Tremblay <je...@zen-innovations.com>.
Hi,

I have a cluster of 3 nodes RF: 2.
There are about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput have been changed.

I have tested a scenario where one node crashes and loses all its data.
I have deleted all data on this node after having stopped Cassandra.
At this point I noticed that the cluster was giving proper results. What I was expecting from a cluster DB.

I then restarted that node and I observed that the node was joining the cluster.
After an hour or so the old “defect” node was up and normal.
I noticed that its hard disk was loaded with much less data than its neighbours.

When I was querying the DB, the cluster was giving me different results for successive identical queries.
I guess the old “defect” node was giving me less rows than it should have.

1) For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbour while doing the joining phase. Is this correct?
2) After such catastrophe, and after the joining phase is done should the cluster not be ready to deliver always consistent data if there was no inserts or delete during the catastrophe?
3) After the bootstrap of a broken node is finish, i.e. after the joining phase, is there not simply a repair to be done on that node using “node repair"?


Thanks for your comments.

Kind regards

Jean


Missing data

Posted by Jean Tremblay <je...@zen-innovations.com>.
Hi,

I have reloaded the data in my cluster of 3 nodes RF: 2.
I have loaded about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml, only the ip address for seeds and throughput have been changed.

I loaded my data with simple insert statements. This took a bit more than one day to load the data… and one more day to compact the data on all nodes.
For me this is quite acceptable since I should not be doing this again.
I have done this with previous versions like 2.1.3 and others and I basically had absolutely no problems.

Now, when I read the log files on the client side, I see no warnings and no errors.
On the node side I see many WARNINGs, all related to tombstones, but there are no ERRORs.

My problem is that I see *many missing records* in the DB, and I have never observed this with previous versions.

1) Is this a know problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN  I could find)?

WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]


4) Is it possible to have tombstones when we make no DELETE statements?

I’m lost…

Thanks for your help.
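
On question 4: yes, tombstones can appear without a single DELETE statement, for example when columns written with a TTL expire, or when null values are bound in INSERTs. A minimal sketch of the latter with the DataStax Python driver; the table and column names below are made up for illustration, not the real schema:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("gttdata")  # keyspace name taken from the log line

prepared = session.prepare(
    "INSERT INTO alltrades_example (pcode, period, qty) VALUES (?, ?, ?)")

# Binding None writes a tombstone for the qty column instead of skipping it.
# Done over millions of rows, this is enough to trip the
# tombstone_warn_threshold warnings quoted above.
session.execute(prepared, ("D:07", "201001", None))

cluster.shutdown()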