Posted to user@cassandra.apache.org by 18624049226 <18...@163.com> on 2022/04/26 12:48:10 UTC

about the performance of select * from tbl

We have a business scenario. We must execute the following statement:

select * from tbl;

This CQL has no WHERE condition.

What I want to ask is: if this table holds one million rows or more,
what methods or parameters can improve the performance of this CQL?

Re: about the performance of select * from tbl

Posted by Jeff Jirsa <jj...@gmail.com>.

Yes, you CAN change the fetch size to adjust how many rows are returned
per page. But if you have a million rows, you may still do hundreds or
thousands of queries, one after the next. Even if each is 1ms, it's going
to take a long time.
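
To put rough numbers on the paging point above, here is a small sketch of
the arithmetic. It is not from the thread: the function names are
illustrative, the fetch sizes are example values, and the per-page latency
is an input you would have to measure yourself (larger pages generally take
longer per request).

```python
import math


def pages_needed(total_rows: int, fetch_size: int) -> int:
    """Number of sequential page requests needed to read total_rows rows."""
    return math.ceil(total_rows / fetch_size)


def sequential_seconds(total_rows: int, fetch_size: int,
                       seconds_per_page: float) -> float:
    """Total wall-clock time if pages are fetched one after the next.

    seconds_per_page is an assumed, measured value, not a Cassandra constant.
    """
    return pages_needed(total_rows, fetch_size) * seconds_per_page


if __name__ == "__main__":
    for fetch_size in (100, 1000, 5000):
        # Prints 10000, 1000 and 200 page requests respectively.
        print(fetch_size, pages_needed(1_000_000, fetch_size))
```

Whatever the fetch size, the pages of a single SELECT are still requested
one after the next, so the total time grows with the page count.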

What Dor suggested is generating a number of SELECT statements, each of
which would return part of the table (using TOKEN()), that you can execute
in parallel. This will end up being much faster than trying to tune the
single SELECT.
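
A minimal sketch of that idea, under stated assumptions: the cluster uses
the default Murmur3Partitioner (tokens from -2**63 to 2**63 - 1), the
partition key column is called pk (a placeholder, the thread never names
it), and execute is a hypothetical callable that runs one CQL string and
returns its rows. With the DataStax Python driver it could be something
like lambda q: list(session.execute(q)), but that wiring is not shown or
tested here.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds of Murmur3Partitioner (assumed to be the partitioner in use).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ranges(n):
    """Split the full token space into n contiguous inclusive (lo, hi) ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    bounds = [MIN_TOKEN + (total * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n)]


def build_queries(table, partition_key, n):
    """Build one SELECT per token range; together they cover the whole table."""
    template = ("SELECT * FROM {t} WHERE token({pk}) >= {lo} "
                "AND token({pk}) <= {hi}")
    return [template.format(t=table, pk=partition_key, lo=lo, hi=hi)
            for lo, hi in split_token_ranges(n)]


def run_all(queries, execute, workers=8):
    """Run the queries in parallel threads and concatenate their rows.

    execute is a caller-supplied function (hypothetical here) that takes a
    CQL string and returns an iterable of rows.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(execute, queries))
    return [row for rows in results for row in rows]


if __name__ == "__main__":
    for query in build_queries("tbl", "pk", 4):
        print(query)
```

For a composite partition key, every partition key column would go inside
token(...). The number of ranges and workers is a tuning choice; more,
smaller ranges spread the work more evenly across the cluster.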




Re: about the performance of select * from tbl

Posted by 18624049226 <18...@163.com>.

Thank you for your reply!

What I want to know is this: the data volume of this table is not
massive. If the CQL itself cannot be modified, are there any parameters
inside Cassandra that can affect the behavior of this query? For
example, something like the fetchSize parameter of other databases?

Re: about the performance of select * from tbl

Posted by Dor Laor <do...@scylladb.com>.

select * reads all of the data in the cluster, so a single query cannot be
expected to return 'fast'. The best way is to divide the data set into
chunks selected by the token ranges each node owns, so that you can query
the entire cluster in parallel and maximize parallelism.

If needed, I can provide an example for this.
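
One way to picture "range ownership per node" is the sketch below. It is
an illustration under assumptions, not the example offered above: it
assumes Murmur3Partitioner token bounds, considers primary ownership only
(replicas are ignored), and takes the ring as a plain list of (token, node)
pairs that you would have to obtain yourself, for instance from the output
of nodetool ring. The node names and tokens in the usage part are made up.

```python
from collections import defaultdict

# Token bounds of Murmur3Partitioner (assumed to be the partitioner in use).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def ranges_by_owner(ring):
    """Map each node to the inclusive token ranges it primarily owns.

    ring is a list of (token, node) pairs. A node holding token T owns
    (previous_token, T]; the node with the smallest token also owns the
    wrap-around range that follows the largest token.
    """
    ordered = sorted(ring)
    owned = defaultdict(list)
    first_token, first_node = ordered[0]
    last_token = ordered[-1][0]
    # The wrap-around range is split in two so each part is a plain interval.
    owned[first_node].append((MIN_TOKEN, first_token))
    if last_token < MAX_TOKEN:
        owned[first_node].append((last_token + 1, MAX_TOKEN))
    for (prev_token, _), (token, node) in zip(ordered, ordered[1:]):
        owned[node].append((prev_token + 1, token))
    return dict(owned)


if __name__ == "__main__":
    # Hypothetical three-node ring with one token per node.
    ring = [(-3000, "node-a"), (0, "node-b"), (3000, "node-c")]
    for node, ranges in ranges_by_owner(ring).items():
        print(node, ranges)
```

Each (lo, hi) pair can then be turned into a query of the form
SELECT * FROM tbl WHERE token(pk) >= lo AND token(pk) <= hi, where pk is a
placeholder for the table's partition key, and the queries for different
nodes can run at the same time.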


RE: about the performance of select * from tbl

Posted by "Durity, Sean R" <SE...@homedepot.com>.

If the number of rows is known and bounded and would be under 100 MB in size, I would suggest adding an artificial partition key so that all rows are in one partition. I recommend this technique for something like an application settings table that is retrieved infrequently (like on app start-up) but needs all rows at once. If it is often accessed, this strategy could create hot spots or potential availability concerns.

If this is more about analytics and the row count is unbounded, I would pursue something like Spark OR re-design the model so that you do have some kind of partition (and maybe clustering) keys. I’m always telling app teams that more in-parallel queries are a very good option for Cassandra.

My bottom line is this: the BEST way to scale Cassandra is NOT tuning queries, but designing the tables to easily answer what you need with proper partitioning.


Sean R. Durity
From: Joe Obernberger <jo...@gmail.com>
Sent: Tuesday, April 26, 2022 1:10 PM
To: user@cassandra.apache.org; 18624049226 <18...@163.com>
Subject: [EXTERNAL] Re: about the performance of select * from tbl


This would be a good use case for Spark + Cassandra.

-Joe

Re: about the performance of select * from tbl

Posted by Joe Obernberger <jo...@gmail.com>.

This would be a good use case for Spark + Cassandra.

-Joe

