Posted to dev@cassandra.apache.org by Alex Circus <ci...@gmail.com> on 2017/11/15 22:04:12 UTC

Possibly cassandra 3.0.9 bug?

Hi,

*In short:*
I use Cassandra 3.0.9 in a cluster of 6 nodes.
1. I create a keyspace called test:
    CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
    'replication_factor': '3'} AND durable_writes = true;
2. I create a table called test:

CREATE TABLE test.test (
    test_id bigint,
    test_value text,
    PRIMARY KEY (test_id)
);

3. I insert a row with test_id=23 and test_value set to a very large
string/HTML (about 406088 UTF-8 characters).

4. I query for test_id=35 and I get a timeout (even with cqlsh
--request-timeout=3600).

5. If I run the above on an existing Cassandra 2.0 cluster, the select
returns instantly. The Java heap size is 8 GB, and in JMX I see at most
4 GB of those 8 GB used in the new cluster.
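
The reproduction steps above can be scripted. The following is a minimal Python sketch, not part of the original report: it only builds the CQL text for steps 1 to 4 so that it can be fed to cqlsh. The helper names (cql_quote, build_repro_script) and the filler payload are assumptions for illustration; the default payload length matches the 406088 characters mentioned in step 3, and the insert and query ids are kept as written in the report.

```python
def cql_quote(text: str) -> str:
    """Return text as a CQL string literal (single quotes are doubled)."""
    return "'" + text.replace("'", "''") + "'"


def build_repro_script(payload_chars: int = 406088,
                       insert_id: int = 23,
                       query_id: int = 35) -> str:
    """Build the CQL for steps 1-4 of the report as one script."""
    # Filler text standing in for the large HTML string (an assumption;
    # the real payload is not shown in the report).
    payload = "x" * payload_chars
    statements = [
        "CREATE KEYSPACE IF NOT EXISTS test WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': '3'} "
        "AND durable_writes = true;",
        "CREATE TABLE IF NOT EXISTS test.test ("
        "test_id bigint, test_value text, PRIMARY KEY (test_id));",
        f"INSERT INTO test.test (test_id, test_value) "
        f"VALUES ({insert_id}, {cql_quote(payload)});",
        f"SELECT * FROM test.test WHERE test_id = {query_id};",
    ]
    # One statement per line, so the script is easy to inspect or trim.
    return "\n".join(statements) + "\n"
```

Writing the returned string to a file and running it with cqlsh -f <file> should exercise the same steps on either cluster, which makes the 2.0 versus 3.0.9 comparison repeatable.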


*Detailed:*

The above was just a test. The real scenario is:

I migrated some tables from an old Cassandra 2.0 cluster with 9 nodes to a
new Cassandra 3.0.9 cluster with 6 nodes, and ran into a lot of problems.

I have a table like this:

CREATE TABLE table (
  id text,
  ts text,
  score decimal,
  type text,
  values text,
  PRIMARY KEY (id, ts)
) WITH CLUSTERING ORDER BY (ts DESC)

and the following query (which returns instantly):

SELECT * FROM keyspace.table WHERE id='someId' AND ts IN
('2017-10-15','2017-10-16','2017-10-17','2017-10-18','2017-10-19','2017-10-20','2017-10-21','2017-10-22','2017-10-23','2017-10-24','2017-10-25','2017-10-26','2017-10-27','2017-10-28','2017-10-29','2017-10-30','2017-10-31','2017-11-01','2017-11-02','2017-11-03','2017-11-04','2017-11-05','2017-11-06');

*If I add another day in the IN clause, the response never comes (even
after 10 minutes!!!):*

SELECT * FROM keyspace.table WHERE id='someId' AND ts IN
('2017-10-15','2017-10-16','2017-10-17','2017-10-18','2017-10-19','2017-10-20','2017-10-21','2017-10-22','2017-10-23','2017-10-24','2017-10-25','2017-10-26','2017-10-27','2017-10-28','2017-10-29','2017-10-30','2017-10-31','2017-11-01','2017-11-02','2017-11-03','2017-11-04','2017-11-05','2017-11-06',
*'2017-11-07'*);

*The 'values' column may contain large JSON data.*

I managed to trace one of the timeouts by looking into the system_traces
keyspace. Please look at the attached image: the last step took
10 minutes!!!

I think there is some size limit somewhere, because if I have 23 params in
*the IN clause* it works (under 1 second), but with 24 or more it fails. The
rows are all the same size (same JSON size in all of them). On node2 of
those 6 it works with 24 params; on node1 and node3 it does not. I haven't
checked the other nodes yet.
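
One way to test the size-limit hypothesis above, and a possible workaround, is to split the IN list into smaller batches and issue one query per batch. The Python sketch below is an illustration under assumptions, not something proposed in the thread: day_range, chunked, build_queries and the batch size of 10 are hypothetical, and the keyspace, table and column names are the placeholders used in the queries above.

```python
from datetime import date, timedelta
from typing import Iterator, List


def day_range(first: date, last: date) -> List[str]:
    """Return ISO dates from first to last inclusive, as stored in ts."""
    days = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(days + 1)]


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_queries(row_id: str, days: List[str], size: int = 10) -> List[str]:
    """Build one SELECT per batch of clustering keys.

    Values are interpolated only to keep the sketch short; real client
    code should bind them as parameters of a prepared statement.
    """
    queries = []
    for batch in chunked(days, size):
        in_list = ",".join(f"'{d}'" for d in batch)
        queries.append(
            f"SELECT * FROM keyspace.table WHERE id='{row_id}' "
            f"AND ts IN ({in_list});"
        )
    return queries


# The 24 days of the failing query: 2017-10-15 through 2017-11-07.
days = day_range(date(2017, 10, 15), date(2017, 11, 7))
queries = build_queries("someId", days, size=10)
print(len(days), len(queries))  # prints "24 3"
```

If every batch returns quickly while the single 24-value query times out, that would point at the total response or repair size rather than at any one row.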

I saw no conclusive logs except this one from Cassandra's debug.log (at the
moment of the timeout, or very close to it):

*DEBUG [Thrift:2608] 2017-11-15 13:48:05,611 ReadCallback.java:126 - Timed
out; received 0 of 1 responses*
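
To correlate such timeouts with query attempts, lines like the one above can be pulled out of debug.log programmatically. This is a small Python sketch; the regular expression is inferred from that single quoted line, so treating it as the general format is an assumption, and parse_read_timeout is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Matches the ReadCallback message quoted above; the pattern is inferred
# from that single line and is an assumption about the general format.
TIMEOUT_RE = re.compile(
    r"ReadCallback\.java:\d+ - Timed out; received (\d+) of (\d+) responses"
)


def parse_read_timeout(line: str) -> Optional[Tuple[int, int]]:
    """Return (received, expected) for a read-timeout line, else None."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    match = TIMEOUT_RE.search(" ".join(line.split()))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "DEBUG [Thrift:2608] 2017-11-15 13:48:05,611 ReadCallback.java:126 - Timed\n"
    "out; received 0 of 1 responses"
)
print(parse_read_timeout(sample))  # prints "(0, 1)"
```

In the quoted line the coordinator was waiting for one response and received none before the timeout.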

I think this problem has the same root cause as the one in the test
(large HTML text), and that it is related to some memory limit somewhere in
the code.


Thank you,

Alex.
[image: screenshot.png]

Re: Possibly cassandra 3.0.9 bug?

Posted by Alex Circus <ci...@gmail.com>.

Thanks,

Here is a link with the image:
https://imgur.com/a/KpJMh

On Thu, Nov 16, 2017 at 11:22 AM Pavel Drankov <ti...@gmail.com> wrote:

> You also can use any image uploading service like https://imgur.com/
>
> Best wishes,
> Pavel

Re: Possibly cassandra 3.0.9 bug?

Posted by Pavel Drankov <ti...@gmail.com>.

You also can use any image uploading service like https://imgur.com/

Best wishes,
Pavel

On 16 November 2017 at 11:21, Murukesh Mohanan <mu...@gmail.com>
wrote:

> Hi Alex,
>
> It's still not visible... I don't think the mailing list supports image
> attachments. Maybe you can create an issue on JIRA with the attachments?
>
> Thanks,
> Muru

Re: Possibly cassandra 3.0.9 bug?

Posted by Murukesh Mohanan <mu...@gmail.com>.

Hi Alex,

It's still not visible... I don't think the mailing list supports image
attachments. Maybe you can create an issue on JIRA with the attachments?

Thanks,
Muru
On Thu, 16 Nov 2017 at 17:00 Alex Circus <ci...@gmail.com> wrote:

> Hi Pavel,
>
> I'm attaching it again. I use gmail app from browser. Please check now.
>
> Thanks,
> Alex.

-- 

Murukesh Mohanan,
Yahoo! Japan

Re: Possibly cassandra 3.0.9 bug?

Posted by Alex Circus <ci...@gmail.com>.

Hi Pavel,

I'm attaching it again. I use the Gmail web app in a browser. Please check now.

Thanks,
Alex.

On Thu, Nov 16, 2017 at 9:17 AM, Pavel Drankov <ti...@gmail.com> wrote:

> Hi Alex,
>
> I don't see any attached image. Can you please send it one more time?
>
> Best wishes,
> Pavel

Re: Possibly cassandra 3.0.9 bug?

Posted by Pavel Drankov <ti...@gmail.com>.

Hi Alex,

I don't see any attached image. Can you please send it one more time?

Best wishes,
Pavel

Re: Possibly cassandra 3.0.9 bug?

Posted by Jeff Jirsa <jj...@gmail.com>.

1) There are a LOT of bugs in cassandra-3.0.9. Some of them are really bad;
you should definitely consider upgrading to 3.0.15.

2) The GC profile changed between 2.0 and 3.0. It may be that you're
generating a bit more garbage and causing GC pauses (especially likely
since you're using Thrift), or it could be that you're hitting some other
bug.

3) It's also possible there are bad rows involved, or some row that's very
out of sync. You'll get read timeouts if (for example) you generate a read
repair mutation larger than your max mutation size.
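
To make point 3 concrete: with default settings, the maximum mutation size is half of commitlog_segment_size_in_mb, which is 16 MiB for the default 32 MB segment. The Python sketch below is only back-of-the-envelope arithmetic under those assumed defaults. The per-row size is a made-up figure (the thread does not state it), chosen to show how a 24th row could tip a single read repair mutation over the limit while 23 rows stay under it, assuming all requested rows need repair on a replica.

```python
# Assumed defaults from cassandra.yaml; check the values on your cluster.
COMMITLOG_SEGMENT_SIZE_MB = 32
MAX_MUTATION_SIZE_BYTES = COMMITLOG_SEGMENT_SIZE_MB * 1024 * 1024 // 2


def exceeds_max_mutation(row_count: int, row_bytes: int,
                         limit: int = MAX_MUTATION_SIZE_BYTES) -> bool:
    """True if row_count rows of row_bytes each are larger than limit.

    Per-row overhead (clustering key, timestamps) is ignored.
    """
    return row_count * row_bytes > limit


# Hypothetical size of one 'values' JSON payload; not from the thread.
ROW_BYTES = 720_000

for rows in (23, 24):
    total = rows * ROW_BYTES
    print(rows, total, exceeds_max_mutation(rows, ROW_BYTES))
# prints:
# 23 16560000 False
# 24 17280000 True
```

Measuring the real row sizes and comparing them against the configured limit would show whether this explanation is even plausible for the affected partition.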

