Posted to user@cassandra.apache.org by "Hanauer, Arnulf, Vodacom South Africa (External)" <Ar...@vcontractor.co.za> on 2020/03/06 10:14:55 UTC

Performance of Data Types used for Primary keys

Hi Cassandra folks,

Is there any difference in the performance of general operations when using a TEXT based Primary key versus a BIGINT Primary key?

Our use case requires low-latency reads. The Primary key is currently TEXT based, but the data could work as BIGINT. We are trying to optimise where possible.
Any experiences that could point to a winner?


Kind regards
Arnulf Hanauer



"This e-mail is sent on the Terms and Conditions that can be accessed by Clicking on this linkhttps://www.vodacom.co.za/vodacom/terms/email-acceptable-user-policy" 

RE: [EXTERNAL] Re: Performance of Data Types used for Primary keys

Posted by "Durity, Sean R" <SE...@homedepot.com>.
I agree. Cassandra already hashes the partition key to a numeric token.
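
To illustrate the point, here is a minimal Python sketch of what "hashes the partition key to a numeric token" means in practice. It assumes the usual serialization (bigint as 8 big-endian bytes, text as UTF-8) and a 64-bit signed token range like the one the default Murmur3Partitioner produces. The hash itself is a stand-in (BLAKE2b from Python's standard library), not Cassandra's Murmur3 implementation, and the function names are made up for this example.

```python
import hashlib
import struct


def serialize_bigint(value: int) -> bytes:
    # A CQL bigint is a 64-bit signed integer; assumed here to be 8 big-endian bytes.
    return struct.pack(">q", value)


def serialize_text(value: str) -> bytes:
    # CQL text is assumed here to be serialized as UTF-8 bytes.
    return value.encode("utf-8")


def stand_in_token(key_bytes: bytes) -> int:
    # Stand-in for the partitioner hash. This is NOT Murmur3; it only shows that
    # any byte string, whatever its declared type, maps to a signed 64-bit token.
    digest = hashlib.blake2b(key_bytes, digest_size=8).digest()
    return struct.unpack(">q", digest)[0]


text_token = stand_in_token(serialize_text("1234567890"))
bigint_token = stand_in_token(serialize_bigint(1234567890))

print("token for TEXT key:  ", text_token)
print("token for BIGINT key:", bigint_token)
```

Either way, the lookup is driven by a fixed-width token derived from the key bytes, which is why the declared type of the key is unlikely to be where the latency goes.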

Sean Durity

From: Jon Haddad <jo...@jonhaddad.com>
Sent: Friday, March 6, 2020 9:29 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Performance of Data Types used for Primary keys

It's not going to matter at all.

Re: Performance of Data Types used for Primary keys

Posted by Jon Haddad <jo...@jonhaddad.com>.
It's not going to matter at all.

Re: Performance of Data Types used for Primary keys

Posted by Reid Pinchback <rp...@tripadvisor.com>.
If you care about low-latency reads, I’d worry less about column data types and more about the general quality of the data modeling and usage patterns, and about tuning the things that you see causing latency spikes. There isn’t just a single cause of latency spikes, so expect to spend a couple of months playing whack-a-mole as you identify root causes.

What you’re likely to see impacting latency variance the most are GC and I/O artifacts. That’s quick to say, but isolating what specifically to do is where the hard work comes in. I haven’t seen overly simplistic guesses about what to do pan out very well. A lot of the tuning knobs in C* can start to feel like a kid’s teeter-totter, because making one dynamic better sometimes comes at the expense of making something else worse. Quality metric gathering and heap examinations will be your friends, and expect to do bursts of per-second and sometimes sub-second metric examinations. With I/O in particular, you often won’t realize what is going on without a high enough metric frequency to see when and how I/O ops are suddenly getting queued up.
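
A small sketch of why the metric frequency matters. The numbers below are entirely synthetic (a made-up per-second I/O queue depth with one three-second burst), and the threshold of 10 is an arbitrary choice for the example. The point is only that a one-minute average can look harmless while per-second samples show the stall.

```python
# Synthetic per-second I/O queue depth for one minute: mostly idle.
queue_depth = [1] * 60
# A three-second burst of queued I/O (made-up numbers for illustration).
queue_depth[30:33] = [40, 55, 35]

# What a once-a-minute metric would report.
minute_average = sum(queue_depth) / len(queue_depth)

# What per-second sampling reveals.
per_second_peak = max(queue_depth)
burst_seconds = [t for t, depth in enumerate(queue_depth) if depth > 10]

print(f"one-minute average queue depth: {minute_average:.1f}")
print(f"per-second peak queue depth:    {per_second_peak}")
print(f"seconds with depth above 10:    {burst_seconds}")
```

Reads that arrive during those three seconds are the ones that end up in the latency tail, even though the averaged metric barely moves.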

Throughput in C* is easier to tune for than latency, and writes are easier to make fast than reads because of how C* is designed. With latency on reads, you’re in your worst-case tuning scenario, particularly if you’re looking for tight latency at three 9’s.
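
As a rough illustration of why three 9’s is the hard target, here is a sketch that reads "three 9’s" as the 99.9th percentile (that reading is an assumption on my part). The latencies are synthetic, and `percentile` is a simple nearest-rank helper written for this example rather than anything from Cassandra’s tooling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Synthetic read latencies in milliseconds: 9,980 fast reads and 20 slow ones,
# standing in for reads that hit a GC pause or an I/O stall.
latencies_ms = [2.0] * 9980 + [250.0] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)

print(f"mean:  {mean_ms:.3f} ms")
print(f"p50:   {percentile(latencies_ms, 50)} ms")
print(f"p99:   {percentile(latencies_ms, 99)} ms")
print(f"p99.9: {percentile(latencies_ms, 99.9)} ms")
```

In this made-up data set, 0.2% of reads being slow leaves the mean, p50 and p99 looking healthy, while p99.9 lands squarely on the slow reads. That is the part of the distribution that GC and I/O artifacts tend to dominate.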

Don’t forget to see how your numbers stack up during repairs. That includes both nodetool and Reaper-managed repairs, but per my comment on usage patterns, if you have antipatterns like write-then-read-back going on, under the hood you’ll be triggering the equivalent of localized repairs. All of that adds to GC pressure, and hence to latency variance.