
Posted to user@cassandra.apache.org by Xiangfei Ni <xi...@cm-dt.com> on 2018/04/16 01:52:56 UTC

Time serial column family design

Hi Experts,
  We have the following design requirement. For example:
  CREATE TABLE test (
      vin text,
      create_date int,
      create_time timestamp,
      a text,
      b text,
      PRIMARY KEY ((vin, create_date), create_time)
  ) WITH CLUSTERING ORDER BY (create_time DESC);
  We store data in this table like this:
                  vin | create_date |                     create_time |     a |    b
----------------------+-------------+---------------------------------+-------+-----
ZD41578123DSAFWE12313 |    20180316 | 2018-03-16 20:51:33.000000+0800 |  P023 | P001
ZD41578123DSAFWE12313 |    20180315 | 2017-03-15 20:51:33.000000+0800 |  P000 | P001
ZD41578123DSAFWE12313 |    20180314 | 2017-03-14 20:51:33.000000+0800 |  P456 | P001
        3431241241234 |    20180317 | 2017-03-17 20:51:33.000000+0800 |  P000 | P001
        3431241241234 |    20180316 | 2017-03-16 20:51:33.000000+0800 |  P123 | P001
        3431241241234 |    20180315 | 2017-03-15 20:51:33.000000+0800 |  P456 | P001
        3431241241234 |    20180314 | 2017-03-14 20:51:33.000000+0800 |  P789 | P001
ZD41578123DSAFWE13333 |    20180314 | 2017-03-14 20:51:33.000000+0800 |  P023 | P001
          41034800994 |    20180313 | 2017-03-13 08:26:55.000000+0800 | P0133 | P001
          41034800994 |    20180312 | 2017-03-12 08:26:55.000000+0800 | P0420 | P001
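Cassandra does not fill in create_date for you, so the application has to derive the bucket from create_time at write time. The following is a minimal Java sketch of that step. It assumes that days are cut at midnight UTC+08:00, which is the offset shown in the sample rows. The class and method names are illustrative and are not part of any driver API.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class DayBucket {
    // Assumption: days are cut at midnight UTC+08:00, the offset shown in the sample rows.
    static final ZoneId BUCKET_ZONE = ZoneId.of("+08:00");

    // Encodes a calendar day as the yyyyMMdd integer stored in create_date.
    static int dayKey(LocalDate d) {
        return d.getYear() * 10_000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    // Derives the create_date partition bucket from a row's create_time.
    static int createDate(Instant createTime) {
        return dayKey(createTime.atZone(BUCKET_ZONE).toLocalDate());
    }

    public static void main(String[] args) {
        // 2018-03-16 20:51:33 at +08:00 is 12:51:33 UTC.
        System.out.println(createDate(Instant.parse("2018-03-16T12:51:33Z"))); // 20180316
    }
}
```

Writers and readers must agree on the zone. Otherwise, rows written near midnight can land in a bucket that the reader never queries.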
We know that only "=" and "IN" can be used on the partition key. My question is whether there is a convenient way to query a range, for example the last 3 or 6 months up to today, or another table design that meets this requirement. Currently we can only use:
SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date IN (20180416, 20180415, 20180414, 20180413, 20180412, ...);
But this makes the CQL query very long, and I don't know whether there is a limit on the length of a CQL statement.
Please give me some advice. Thanks in advance.
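Because create_date is part of the partition key, a time window has to be expanded into the explicit list of day buckets it covers. The following Java sketch does that. It assumes the yyyyMMdd integer encoding seen in the create_date values. `keysBetween` is an illustrative helper, not a Cassandra API.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class DayBuckets {
    // Encodes a calendar day as the yyyyMMdd integer stored in create_date.
    static int dayKey(LocalDate d) {
        return d.getYear() * 10_000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    // All create_date keys from `to` back to `from` (both inclusive), newest first,
    // so results can be consumed in the same order as the create_time DESC clustering.
    static List<Integer> keysBetween(LocalDate from, LocalDate to) {
        List<Integer> keys = new ArrayList<>();
        for (LocalDate d = to; !d.isBefore(from); d = d.minusDays(1)) {
            keys.add(dayKey(d));
        }
        return keys;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2018, 4, 16);
        List<Integer> keys = keysBetween(today.minusMonths(3), today);
        // Prints: 91 buckets, 20180416 .. 20180116
        System.out.println(keys.size() + " buckets, " + keys.get(0) + " .. " + keys.get(keys.size() - 1));
    }
}
```

A three-month window is about 91 partitions and a six-month window is roughly 180. This is why a single IN list becomes unwieldy.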

Best Regards,

倪项菲/ David Ni
中移德电网络科技有限公司
Virtue Intelligent Network Ltd, co.
Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
Mob: +86 13797007811|Tel: + 86 27 5024 2516

Re: Re: Time serial column family design

Posted by Xiangfei Ni <xi...@cm-dt.com>.

Hi Javier,
VIN is the Vehicle Identification Number. Each vehicle uploads the information from its CAN bus every 10 seconds, and this table contains about 20 columns. If we used only the VIN as the partition key, each vehicle would have exactly one partition, which would become very large and never stop growing. This is why we put create_date in the partition key, and this seems to work well.
But we also need to query the history data for a vehicle, for example the vehicle's data from 2018-01-01 until now. If we use create_month in the partition key, we can only get a whole month of data, not data for exact days.
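The growth of each partition can be estimated directly from the upload rate. The following Java sketch assumes exactly one row per vehicle every 10 seconds with no gaps, which real devices will not match exactly. It counts rows only, because the byte size of a row depends on the roughly 20 columns.

```java
class PartitionGrowth {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Rows a single vehicle writes into one partition that spans `days` days,
    // assuming exactly one row every `intervalSeconds`.
    static long rowsPerPartition(long intervalSeconds, long days) {
        return SECONDS_PER_DAY / intervalSeconds * days;
    }

    public static void main(String[] args) {
        System.out.println("day bucket:   " + rowsPerPartition(10, 1));   // 8640
        System.out.println("month bucket: " + rowsPerPartition(10, 31));  // 267840
        System.out.println("VIN only, after one year: " + rowsPerPartition(10, 365)); // 3153600
    }
}
```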
I found an article:
https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
So your suggestion is to fetch the data with code like this:
[Code screenshot (image001.png) not preserved in this archive]
    We need to test it.
    Is there another design pattern that meets this requirement with better performance?
Best Regards,

倪项菲/ David Ni

From: Javier Pareja <pa...@gmail.com>
Sent: 2018-04-18 6:00
To: user@cassandra.apache.org
Subject: Re: Re: Time serial column family design

Hi David,

Could you describe why you chose to include the create date in the partition key? If the vin is enough "partitioning", meaning that the size (number of rows x size of row) of each partition is less than 100MB, then remove the date and just use the create_time, because the date is already included in that column anyway.

For example, if columns "a" and "b" (from your table) hold at most 256 UTF-8 characters each, then at 2 bytes per character you can have roughly 100MB / (2*256*2 bytes) ≈ 100,000 rows per partition. You can actually have many more, but you don't want to go much higher for performance reasons.

If this is not enough you could use create_month instead of create_date, for example, to reduce the partition size while not being too granular.

Re: Re: Time serial column family design

Posted by Eric Plowe <er...@gmail.com>.

Jon,

Great article. Thank you. (I have nothing to do with this issue, but I
appreciate the nuggets of information I glean from the list.)

Regards,

Eric
On Tue, Apr 17, 2018 at 10:57 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> To add to what Nate suggested, we have an entire blog post on scaling time
> series data models:
>
>
> http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
>
> Jon

Re: Re: Time serial column family design

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

To add to what Nate suggested, we have an entire blog post on scaling time
series data models:

http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html

Jon


On Tue, Apr 17, 2018 at 7:39 PM Nate McCall <na...@thelastpickle.com> wrote:

> I disagree. Create date as a raw integer is an excellent surrogate for
> controlling time series "buckets" as it gives you complete control over the
> granularity. You can even have multiple granularities in the same table -
> remember that partition key "misses" in Cassandra are pretty lightweight as
> they won't make it past the bloom filter on the read path.

Re: Re: Time serial column family design

Posted by Nate McCall <na...@thelastpickle.com>.

I disagree. Create date as a raw integer is an excellent surrogate for
controlling time series "buckets" as it gives you complete control over the
granularity. You can even have multiple granularities in the same table -
remember that partition key "misses" in Cassandra are pretty lightweight as
they won't make it past the bloom filter on the read path.
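Here is one possible reading of the remark about multiple granularities. Day keys (yyyyMMdd, eight digits) and month keys (yyyyMM, six digits) can never be equal, so both can live in the same int column. A reader that does not know which granularity a vehicle was written with can then query both sets of candidate keys. The partitions that do not exist are the cheap misses described above. The following Java sketch works under those assumptions; the helper names are hypothetical.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

class MixedBuckets {
    static int dayKey(LocalDate d) {
        return d.getYear() * 10_000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    static int monthKey(YearMonth m) {
        return m.getYear() * 100 + m.getMonthValue();
    }

    // Candidate create_date keys covering [from, to] at both granularities, newest first:
    // month keys (yyyyMM) followed by day keys (yyyyMMdd). Keys whose partition was never
    // written simply return no rows.
    static List<Integer> candidateKeys(LocalDate from, LocalDate to) {
        List<Integer> keys = new ArrayList<>();
        YearMonth first = YearMonth.from(from);
        for (YearMonth m = YearMonth.from(to); !m.isBefore(first); m = m.minusMonths(1)) {
            keys.add(monthKey(m));
        }
        for (LocalDate d = to; !d.isBefore(from); d = d.minusDays(1)) {
            keys.add(dayKey(d));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Prints: [201804, 201803, 20180401, 20180331]
        System.out.println(candidateKeys(LocalDate.of(2018, 3, 31), LocalDate.of(2018, 4, 1)));
    }
}
```

A month partition covers the whole month. The client, or a range restriction on create_time, still has to trim rows that fall outside the requested window.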

On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja <pa...@gmail.com>
wrote:

> Hi David,
>
> Could you describe why you chose to include the create date in the
> partition key? If the vin is enough "partitioning", meaning that the size
> (number of rows x size of row) of each partition is less than 100MB, then
> remove the date and just use the create_time, because the date is already
> included in that column anyway.
>
> For example, if columns "a" and "b" (from your table) hold at most 256 UTF-8
> characters each, then at 2 bytes per character you can have roughly
> 100MB / (2*256*2 bytes) ≈ 100,000 rows per partition. You can actually have
> many more, but you don't want to go much higher for performance reasons.
>
> If this is not enough you could use create_month instead of create_date,
> for example, to reduce the partition size while not being too granular.


-- 
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Re: Time serial column family design

Posted by Javier Pareja <pa...@gmail.com>.

Hi David,

Could you describe why you chose to include the create date in the
partition key? If the vin is enough "partitioning", meaning that the size
(number of rows x size of row) of each partition is less than 100MB, then
remove the date and just use the create_time, because the date is already
included in that column anyway.

For example, if columns "a" and "b" (from your table) hold at most 256 UTF-8
characters each, then at 2 bytes per character you can have roughly
100MB / (2*256*2 bytes) ≈ 100,000 rows per partition. You can actually have
many more, but you don't want to go much higher for performance reasons.
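The estimate can be reproduced with the following Java sketch. It uses the same assumptions as the text: two text columns, 256 characters each, and 2 bytes per character. Like the text, it ignores the clustering key, cell timestamps, and other per-row overhead. Note that UTF-8 actually uses 1 to 4 bytes per character.

```java
class PartitionBudget {
    // Worst-case payload of one row: `columns` text columns of `maxChars` characters each.
    static long bytesPerRow(int columns, int maxChars, int bytesPerChar) {
        return (long) columns * maxChars * bytesPerChar;
    }

    // How many such rows fit in a partition size budget (integer division rounds down).
    static long maxRows(long budgetBytes, long rowBytes) {
        return budgetBytes / rowBytes;
    }

    public static void main(String[] args) {
        long row = bytesPerRow(2, 256, 2);                    // 1024 bytes
        System.out.println(maxRows(100_000_000L, row));       // 97656  (100 MB, decimal)
        System.out.println(maxRows(100L * 1024 * 1024, row)); // 102400 (100 MiB, binary)
    }
}
```

Either reading of "100MB" lands close to the 100,000 rows quoted above.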

If this is not enough you could use create_month instead of create_date,
for example, to reduce the partition size while not being too granular.
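With a month bucket, an exact day range can still be read. Once the partition key is fully specified, CQL allows a range restriction on the first clustering column, which here is create_time. The following Java sketch splits a day range into one slice per month partition. The table name test_by_month, its create_month int (yyyyMM) partition key column, and the UTC+08:00 day boundary are all assumptions for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

class MonthSlices {
    // Hypothetical table: like `test`, but partitioned by (vin, create_month).
    static final String CQL = "SELECT * FROM test_by_month WHERE vin = ? AND create_month = ?"
            + " AND create_time >= ? AND create_time < ?";
    // Assumption: day boundaries are midnight at UTC+08:00.
    static final ZoneOffset ZONE = ZoneOffset.ofHours(8);

    // Bind values (besides vin) for one execution of CQL.
    static final class Slice {
        final int month;
        final Instant from;
        final Instant until;

        Slice(int month, Instant from, Instant until) {
            this.month = month;
            this.from = from;
            this.until = until;
        }

        @Override
        public String toString() {
            return month + " [" + from + ", " + until + ")";
        }
    }

    // Splits the days [fromDay, untilDay) into one slice per month partition, newest first.
    static List<Slice> slices(LocalDate fromDay, LocalDate untilDay) {
        List<Slice> out = new ArrayList<>();
        if (!fromDay.isBefore(untilDay)) {
            return out; // empty range
        }
        YearMonth first = YearMonth.from(fromDay);
        for (YearMonth m = YearMonth.from(untilDay.minusDays(1)); !m.isBefore(first); m = m.minusMonths(1)) {
            LocalDate monthStart = m.atDay(1);
            LocalDate nextMonth = m.plusMonths(1).atDay(1);
            LocalDate start = monthStart.isBefore(fromDay) ? fromDay : monthStart;
            LocalDate end = nextMonth.isAfter(untilDay) ? untilDay : nextMonth;
            out.add(new Slice(m.getYear() * 100 + m.getMonthValue(),
                    start.atStartOfDay(ZONE).toInstant(),
                    end.atStartOfDay(ZONE).toInstant()));
        }
        return out;
    }

    public static void main(String[] args) {
        // 2018-01-10 up to and including 2018-04-16: four month partitions.
        for (Slice s : slices(LocalDate.of(2018, 1, 10), LocalDate.of(2018, 4, 17))) {
            System.out.println(s);
        }
    }
}
```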


On Tue, 17 Apr 2018, 22:17 Nate McCall, <na...@thelastpickle.com> wrote:

> Your table design will work fine as you have appropriately bucketed by an
> integer-based 'create_date' field.
>
> Your goal for this refactor should be to remove the "IN" clause from your
> code. This moves the rollup of the multiple partitions being retrieved into
> the client, instead of relying on the coordinator to assemble the results.
> You have to do more work and add some complexity, but the trade-off is much
> higher performance, because you remove the single coordinator as the
> bottleneck.

Re: Re: Time serial column family design

Posted by Nate McCall <na...@thelastpickle.com>.

Your table design will work fine as you have appropriately bucketed by an
integer-based 'create_date' field.

Your goal for this refactor should be to remove the "IN" clause from your
code. This moves the rollup of the multiple partitions being retrieved into
the client, instead of relying on the coordinator to assemble the results.
You have to do more work and add some complexity, but the trade-off is much
higher performance, because you remove the single coordinator as the
bottleneck.
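Once each partition is fetched separately, the client has to reassemble the results. The following Java sketch shows that rollup. It assumes that each per-partition result already arrives in create_time DESC order, which is the table's clustering order. It also assumes that buckets cover disjoint time ranges. Under those assumptions, concatenating the buckets newest-first yields a globally ordered result without re-sorting. `Reading` is a hypothetical stand-in for a driver row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ClientRollup {
    // Hypothetical stand-in for a driver row: the clustering column plus one payload column.
    static final class Reading {
        final long createTimeMillis;
        final String a;

        Reading(long createTimeMillis, String a) {
            this.createTimeMillis = createTimeMillis;
            this.a = a;
        }
    }

    // Merges per-bucket results (which may have completed in any order) into one list,
    // newest bucket first, stopping after `limit` rows.
    static List<Reading> rollup(Map<Integer, List<Reading>> rowsByBucket, int limit) {
        TreeMap<Integer, List<Reading>> newestFirst = new TreeMap<>(Comparator.<Integer>reverseOrder());
        newestFirst.putAll(rowsByBucket);
        List<Reading> out = new ArrayList<>();
        for (List<Reading> rows : newestFirst.values()) {
            for (Reading r : rows) {
                if (out.size() >= limit) {
                    return out;
                }
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<Reading>> results = new HashMap<>();
        results.put(20180415, Arrays.asList(new Reading(2, "P000"), new Reading(1, "P456")));
        results.put(20180416, Arrays.asList(new Reading(4, "P023"), new Reading(3, "P000")));
        for (Reading r : rollup(results, 3)) {
            System.out.println(r.createTimeMillis + " " + r.a); // 4 P023, 3 P000, 2 P000
        }
    }
}
```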

On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni <xi...@cm-dt.com> wrote:

> Hi Nate,
>
>     Thanks for your reply!
>
>     Is there another way to design this table to meet this requirement?

Re: Time serial column family design

Posted by Xiangfei Ni <xi...@cm-dt.com>.

Hi Nate,
    Thanks for your reply!
    Is there another way to design this table to meet this requirement?

Best Regards,

倪项菲/ David Ni

From: Nate McCall <na...@thelastpickle.com>
Sent: 2018-04-17 7:12
To: Cassandra Users <us...@cassandra.apache.org>
Subject: Re: Time serial column family design


SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date IN (20180416, 20180415, 20180414, 20180413, 20180412, ...);
But this makes the CQL query very long, and I don't know whether there is a limit on the length of a CQL statement.
Please give me some advice. Thanks in advance.

Using the SELECT ... IN syntax means that:
- the driver will not be able to route the queries to the nodes which have the partition
- a single coordinator must scatter-gather the query and results

Break this up into a series of single statements using the executeAsync method and gather the results via something like Futures in Guava or similar.

Re: Time serial column family design

Posted by Nate McCall <na...@thelastpickle.com>.

>
>
> SELECT * FROM test WHERE vin = 'ZD41578123DSAFWE12313' AND create_date IN
> (20180416, 20180415, 20180414, 20180413, 20180412, ...);
>
> But this makes the CQL query very long, and I don't know whether there
> is a limit on the length of a CQL statement.
>
> Please give me some advice. Thanks in advance.
>

Using the SELECT ... IN syntax means that:
- the driver will not be able to route the queries to the nodes which have
the partition
- a single coordinator must scatter-gather the query and results

Break this up into a series of single statements using the executeAsync
method and gather the results via something like Futures in Guava or
similar.
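The following is a minimal sketch of that pattern. It uses only the JDK's CompletableFuture rather than Guava. `fetch` is a hypothetical stand-in for binding a prepared single-partition SELECT and executing it asynchronously. In the 3.x DataStax Java driver, `Session.executeAsync` returns a `ResultSetFuture`, which is a Guava `ListenableFuture`. Guava's `Futures.allAsList` should therefore be able to play the role that `allOf` plays here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class FanOut {
    // Starts one single-partition query per bucket key, waits for all of them, and
    // returns the rows in the order of `bucketKeys` (not completion order).
    static <T> List<T> gather(List<Integer> bucketKeys,
                              Function<Integer, CompletableFuture<List<T>>> fetch) {
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (Integer key : bucketKeys) {
            futures.add(fetch.apply(key)); // all requests are in flight before we wait
        }
        // join() throws a CompletionException if any of the queries failed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
        List<T> rows = new ArrayList<>();
        for (CompletableFuture<List<T>> f : futures) {
            rows.addAll(f.join());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Simulated fetch: each "partition" returns a single row naming its bucket.
        List<String> rows = gather(Arrays.asList(20180416, 20180415, 20180414),
                key -> CompletableFuture.supplyAsync(() -> Collections.singletonList("row from " + key)));
        System.out.println(rows); // [row from 20180416, row from 20180415, row from 20180414]
    }
}
```

For windows of 180 or so buckets, consider capping the number of in-flight requests, for example with a java.util.concurrent.Semaphore, so that a single read does not flood the cluster.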