Posted to user@cassandra.apache.org by Deepak Sharma <sh...@salesforce.com.INVALID> on 2020/10/27 19:07:12 UTC

Re: Cassandra timeout during read query

Hi Attila,

We did have large partitions, which are now below the 100 MB threshold
after we ran nodetool repair. Most query runs now succeed, but a small
percentage of them still fail.

Regarding your comment ```considered with your fetchSize together (driver
setting on the query level)```, can you elaborate more on it? Are you
suggesting to reduce the fetchSize (right now fetchSize is 5000) for this
query?

We are also trying to use the prefetch feature, but it is not helping
either. The code is as follows:

Iterator<Row> iter = resultSet.iterator();
while (iter.hasNext()) {
  // Request the next page in the background while rows are still buffered.
  if (resultSet.getAvailableWithoutFetching() <= fetchSize
      && !resultSet.isFullyFetched()) {
    resultSet.fetchMoreResults();
  }
  Row row = iter.next();
  .....
}

Thanks,
Deepak

On Sat, Sep 19, 2020 at 6:56 PM Deepak Sharma <sh...@salesforce.com>
wrote:

> Thanks Attila and Aaron for the response. These are great insights. I will
> check and get back to you in case I have any questions.
>
> Best,
> Deepak
>
> On Tue, Sep 15, 2020 at 4:33 AM Attila Wind <at...@swf.technology>
> wrote:
>
>> Hi Deepak,
>>
>> Aaron is right: to be able to help (better), you need to share those
>> details.
>>
>> I think that 5-second timeout comes from the coordinator node; see the
>> "read_request_timeout_in_ms" setting in cassandra.yaml, which influences
>> this.
>>
>> But it does not matter too much... The point is that none of the replicas
>> could complete your query within those 5 seconds, and this is a clear
>> indication that something is slow with your query.
>> Maybe 4) is a bit less important here, or rather I would make it a bit
>> more precise: considered with your fetchSize together (driver setting on
>> the query level)
>>
>> From experience, one reason a query that used to work stops working is a
>> growing amount of data, and possibly a "wide cluster" problem.
>> Do you have monitoring on the Cassandra machines? What does iowait show?
>> (For us it is a clear indicator when things like this start happening.)
>>
>> cheers
>> Attila Wind
>>
>> http://www.linkedin.com/in/attilaw
>> Mobile: +49 176 43556932
>>
>>
>> On 14.09.2020 at 18:36, Aaron Ploetz wrote:
>>
>> Deepak,
>>
>> Can you reply with:
>>
>> 1) The query you are trying to run.
>> 2) The table definition (PRIMARY KEY, specifically).
>> 3) Maybe a little description of what the table is designed to do.
>> 4) How much data you're expecting returned (both # of rows and data size).
>>
>> Thanks,
>>
>> Aaron
>>
>>
>> On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma
>> <sh...@salesforce.com.invalid> wrote:
>>
>>> Hi There,
>>>
>>> We are running into a strange issue in our Cassandra cluster where one
>>> specific query is failing with the following error:
>>>
>>> Cassandra timeout during read query at consistency QUORUM (3 responses
>>> were required but only 0 replica responded)
>>>
>>> We know for sure that this is not a typical query read timeout: the
>>> error is returned within 5 seconds, while the query timeout we have set
>>> is around 30 seconds.
>>>
>>> Can you tell us what is happening here and how we can reproduce this in
>>> our local environment?
>>>
>>> Thanks,
>>> Deepak
>>>
>>>
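
One way to read the 5-second failure against the 30-second client timeout discussed above: whichever limit is shorter fires first. The sketch below is plain Java with no driver dependency and only models that comparison. The 5000 ms figure is an assumption (the stock cassandra.yaml default for read_request_timeout_in_ms; the actual cluster value was not shared in the thread), and the class and method names are hypothetical.

```java
// Illustrative model only: it does not query Cassandra or read cassandra.yaml.
final class TimeoutSketch {

    /** Returns the limit that fires first, in milliseconds. */
    static long effectiveTimeoutMs(long clientTimeoutMs, long readRequestTimeoutMs) {
        if (clientTimeoutMs <= 0 || readRequestTimeoutMs <= 0) {
            throw new IllegalArgumentException("timeouts must be positive");
        }
        return Math.min(clientTimeoutMs, readRequestTimeoutMs);
    }

    /** Names the side that gives up first: the coordinator or the client. */
    static String firstToGiveUp(long clientTimeoutMs, long readRequestTimeoutMs) {
        effectiveTimeoutMs(clientTimeoutMs, readRequestTimeoutMs); // validates inputs
        return readRequestTimeoutMs <= clientTimeoutMs ? "coordinator" : "client";
    }

    public static void main(String[] args) {
        long clientTimeoutMs = 30_000;     // client-side timeout mentioned in the thread
        long readRequestTimeoutMs = 5_000; // assumed default read_request_timeout_in_ms
        System.out.println(firstToGiveUp(clientTimeoutMs, readRequestTimeoutMs)
                + " gives up after "
                + effectiveTimeoutMs(clientTimeoutMs, readRequestTimeoutMs) + " ms");
    }
}
```

With these inputs the coordinator limit is the shorter one, which would match an error that appears well before the client timeout expires.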

Re: Cassandra timeout during read query

Posted by Attila Wind <at...@swf.technology>.

Hey Deepak,

"Are you suggesting to reduce the fetchSize (right now fetchSize is 
5000) for this query?"

Definitely yes! If you went with only 1000, each page request would have 
to gather only a fifth as many rows, which gives the Cassandra node or 
nodes executing your query a much better chance to pull the records 
(the page) together in time, and so helps you avoid the timeout issue.
Based on our measurements, smaller page sizes add very little to the 
overall query time, but they help Cassandra a lot in eventually 
fulfilling the full request, because it can also balance the load much 
better as you iterate over your result set.
I would give it a try; the same tactic helped a lot on our side.
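
To make the page-size arithmetic concrete, here is a minimal sketch in plain Java. It does not talk to Cassandra: the total row count is a hypothetical input and the class and method names are made up for illustration. It only shows how the number of page requests and the rows each request must assemble change between a fetchSize of 5000 and 1000.

```java
// Illustrative only: models paging arithmetic, not real driver behavior.
final class PageSizeSketch {

    /** Number of page requests needed to read totalRows with the given fetch size. */
    static long pagesNeeded(long totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and fetchSize > 0");
        }
        return (totalRows + fetchSize - 1) / fetchSize; // ceiling division
    }

    /** Largest number of rows any single page request has to gather. */
    static long maxRowsPerPage(long totalRows, int fetchSize) {
        pagesNeeded(totalRows, fetchSize); // validates inputs
        return Math.min(totalRows, fetchSize);
    }

    public static void main(String[] args) {
        long totalRows = 100_000; // hypothetical result size
        for (int fetchSize : new int[] {5000, 1000}) {
            System.out.printf("fetchSize=%d -> %d pages, up to %d rows per page%n",
                    fetchSize,
                    pagesNeeded(totalRows, fetchSize),
                    maxRowsPerPage(totalRows, fetchSize));
        }
    }
}
```

For 100,000 rows this prints 20 pages of up to 5000 rows versus 100 pages of up to 1000 rows: more round trips, but each one has far less to assemble inside the read timeout. Assuming the 3.x Java driver that the ResultSet calls quoted in this thread suggest, the per-query setting would be Statement.setFetchSize(1000). Prefetching changes when the next page is requested, not how much work each page costs the replicas.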

I also recommend optimizing your data in parallel with the above, if 
that is possible and there is room for improvement.
Everything I wrote earlier matters a lot. You also need data cleanup 
strategies for your tables to keep the amount of data manageable. 
A TTL-based approach, for example, is the best if you ask me, especially 
if you have a huge data set.
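
As a rough sketch of the TTL-based cleanup idea, the plain Java below turns a retention period into the whole-second value CQL expects and formats an ALTER TABLE statement that sets the table option default_time_to_live. The table name my_keyspace.events, the 30-day retention, and the class and method names are all hypothetical; nothing here is taken from the cluster discussed in this thread.

```java
import java.time.Duration;

// Illustrative only: builds a CQL string, it does not execute anything.
final class TtlSketch {

    // Cassandra caps TTL values at 20 years, expressed in seconds.
    private static final long MAX_TTL_SECONDS = 20L * 365 * 24 * 60 * 60;

    /** Converts a retention period to a TTL in whole seconds. */
    static int ttlSeconds(Duration retention) {
        long seconds = retention.getSeconds();
        if (seconds <= 0 || seconds > MAX_TTL_SECONDS) {
            throw new IllegalArgumentException("retention must be between 1 second and 20 years");
        }
        return (int) seconds;
    }

    /** Formats a statement that sets a table-wide default TTL. */
    static String defaultTtlStatement(String table, Duration retention) {
        return "ALTER TABLE " + table + " WITH default_time_to_live = " + ttlSeconds(retention);
    }

    public static void main(String[] args) {
        // Hypothetical table and retention period.
        System.out.println(defaultTtlStatement("my_keyspace.events", Duration.ofDays(30)));
    }
}
```

Run as-is, this prints ALTER TABLE my_keyspace.events WITH default_time_to_live = 2592000. A table default only affects data written after the change; rows already stored keep whatever TTL (or lack of one) they were written with.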

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

