Posted to user@cassandra.apache.org by Rahul Reddy <ra...@gmail.com> on 2019/08/18 19:57:32 UTC

New column

Hello,

We have a table and want to add a column, then select on the entire existing
primary key plus the new column using ALLOW FILTERING. Since my WHERE clause
includes the whole primary key plus the new column, does ALLOW FILTERING scan
only the partitions that are listed, or does it have to scan the whole table?
What is the best approach for adding a new column and querying on the existing
primary key plus the new column?

Re: New column

Posted by Stefan Miklosovic <st...@instaclustr.com>.
You basically have to create a new table and include that column in the
primary key, either as part of the partition key or as a clustering column.
Avoid ALLOW FILTERING; it should not be used in production or in any serious
app.
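
A minimal CQL sketch of that advice, for illustration only. The keyspace,
table, and column names (demo, orders_by_status, customer_id, order_id,
status) are hypothetical, since the thread does not show the real schema. It
assumes the existing primary key is ((customer_id), order_id) and places the
new column as a clustering column so it can be restricted without ALLOW
FILTERING.

```cql
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Assumed existing table (hypothetical): PRIMARY KEY ((customer_id), order_id)

-- New table: the new column `status` becomes a clustering column,
-- placed before order_id so it can be restricted by equality.
CREATE TABLE IF NOT EXISTS demo.orders_by_status (
  customer_id uuid,
  status      text,
  order_id    timeuuid,
  amount      decimal,
  PRIMARY KEY ((customer_id), status, order_id)
);

-- Partition key plus the new column, no ALLOW FILTERING required.
SELECT * FROM demo.orders_by_status
 WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000
   AND status = 'shipped';
```

The trade-off is that existing rows have to be written into the new table, and
because status is now part of the primary key, changing it for a row means
deleting the old row and inserting a new one.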

On Sun, 18 Aug 2019 at 21:57, Rahul Reddy <ra...@gmail.com> wrote:
>
> Hello,
>
> We have a table and want to add a column, then select on the entire existing primary key plus the new column using ALLOW FILTERING. Since my WHERE clause includes the whole primary key plus the new column, does ALLOW FILTERING scan only the partitions that are listed, or does it have to scan the whole table? What is the best approach for adding a new column and querying on the existing primary key plus the new column?


Re: New column

Posted by Rahul Reddy <ra...@gmail.com>.
Thank you, Jon.

On Thu, Aug 22, 2019, 1:27 PM Jon Haddad <jo...@jonhaddad.com> wrote:

> Just to close the loop on this, I did a release of tlp-stress last night,
> which now has this workload (AllowFiltering).  You can grab a deb, rpm,
> tarball or docker image.
>
> Docs are here: http://thelastpickle.com/tlp-stress/
>
> Jon

Re: New column

Posted by Jon Haddad <jo...@jonhaddad.com>.
Just to close the loop on this, I did a release of tlp-stress last night,
which now has this workload (AllowFiltering).  You can grab a deb, rpm,
tarball or docker image.

Docs are here: http://thelastpickle.com/tlp-stress/

Jon

On Mon, Aug 19, 2019 at 2:21 PM Jon Haddad <jo...@jonhaddad.com> wrote:

> It'll be about the same overhead as selecting the entire partition, since
> that's essentially what you're doing.

Re: New column

Posted by Jon Haddad <jo...@jonhaddad.com>.
It'll be about the same overhead as selecting the entire partition, since
that's essentially what you're doing.

I created a tlp-stress workload this morning but haven't merged it into
master yet.  I need to do a little cleanup and I might tweak it a little,
but if you're feeling adventurous you can build the branch yourself:
https://github.com/thelastpickle/tlp-stress/tree/jon/106-allow-filtering-workload

Once you do an in-place build (./gradlew shadowJar), you'll probably want
to do something like the following:

bin/tlp-stress run AllowFiltering -p 1k -d 1h -r .5 --populate 1m \
    --field.allow_filtering.payload='random(100,200)' --compaction lcs

That's running against C* on my laptop.  Here's what all those arguments do:

-p 1k             # 1,000 partitions
-d 1h             # run for 1 hour (-d = duration)
-r .5             # 50% reads
--populate 1m     # pre-populate with 1 million rows
--field.allow_filtering.payload='random(100,200)'
                  # use 100-200 bytes for the payload; I assume there will be
                  # data other than just the record, so this lets you size
                  # each row accordingly
--compaction lcs  # use leveled compaction

You can tweak the params as needed. If you've got a cluster up, use the
--host option to point to it. If you don't have a cluster up, you can spin one
up in AWS in about 5-10 minutes using our tools:
https://thelastpickle.com/tlp-cluster/

Happy testing!
Jon


On Mon, Aug 19, 2019 at 1:23 PM Rahul Reddy <ra...@gmail.com>
wrote:

> Jon,
>
> If we expect none of our partitions to have more than 100 records, and we
> pass the partition key in the WHERE clause, we wouldn't see issues using the
> new column with ALLOW FILTERING? Can you please point me to any doc on how
> ALLOW FILTERING works? I was under the assumption that it goes through all
> the partitions.

Re: New column

Posted by Rahul Reddy <ra...@gmail.com>.
Jon,

If we expect none of our partitions to have more than 100 records, and we
pass the partition key in the WHERE clause, we wouldn't see issues using the
new column with ALLOW FILTERING? Can you please point me to any doc on how
ALLOW FILTERING works? I was under the assumption that it goes through all
the partitions.


On Sun, Aug 18, 2019, 4:33 PM Jon Haddad <jo...@jonhaddad.com> wrote:

> If you're giving the partition key you won't scan the whole table. The
> overhead will depend on the size of the partition.
>
> Would be an interesting workload for our tlp-stress tool, I'll code
> something up for the next release.

Re: New column

Posted by Jon Haddad <jo...@jonhaddad.com>.
If you're giving the partition key you won't scan the whole table. The
overhead will depend on the size of the partition.

Would be an interesting workload for our tlp-stress tool, I'll code
something up for the next release.
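
To make the difference concrete, here is a minimal CQL sketch. The keyspace,
table, and column names (demo, orders, customer_id, order_id, status) are
hypothetical, since the thread does not show the real schema, and it assumes a
Cassandra version that permits filtering on a non-indexed regular column.

```cql
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Hypothetical existing table.
CREATE TABLE IF NOT EXISTS demo.orders (
  customer_id uuid,
  order_id    timeuuid,
  amount      decimal,
  PRIMARY KEY ((customer_id), order_id)
);

-- Add the new (regular, non-key) column.
ALTER TABLE demo.orders ADD status text;

-- Partition key supplied: only this partition is read, and its rows are
-- then filtered on status.
SELECT * FROM demo.orders
 WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000
   AND status = 'shipped'
 ALLOW FILTERING;

-- No partition key: the query is not limited to one partition and may
-- have to scan the whole table.
SELECT * FROM demo.orders
 WHERE status = 'shipped'
 ALLOW FILTERING;
```

The first query reads a single partition and discards the rows that do not
match, so its cost grows with the size of that partition. The second has no
partition restriction, which is the case to avoid.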

On Sun, Aug 18, 2019, 12:58 PM Rahul Reddy <ra...@gmail.com> wrote:

> Hello,
>
> We have a table and want to add a column, then select on the entire existing
> primary key plus the new column using ALLOW FILTERING. Since my WHERE clause
> includes the whole primary key plus the new column, does ALLOW FILTERING scan
> only the partitions that are listed, or does it have to scan the whole table?
> What is the best approach for adding a new column and querying on the existing
> primary key plus the new column?
>