Posted to user@cassandra.apache.org by Vikas Jaiman <er...@gmail.com> on 2016/11/06 20:42:27 UTC

Are Cassandra writes are faster than reads?

Hi all,

Are Cassandra writes faster than reads? If yes, why is this so? I am
using a consistency level of 1 and the data is in memory.

Vikas

RE: Are Cassandra writes are faster than reads?

Posted by Rajesh Radhakrishnan <Ra...@phe.gov.uk>.
Hi,

In my case writing is slower, using the Python driver with batch execution and prepared statements.
I am looking at different ways to speed it up, as I am trying to write 100 * 200 million records.

Cheers
Rajesh R
________________________________
From: Vikas Jaiman [er.vikasjaiman@gmail.com]
Sent: 07 November 2016 10:43
To: user@cassandra.apache.org
Subject: Re: Are Cassandra writes are faster than reads?

Thanks Jeff and Ben for the info.

On Mon, Nov 7, 2016 at 6:44 AM, Ben Bromhead <ben@instaclustr.com> wrote:
They can be and it depends on your compaction strategy :)

On Sun, 6 Nov 2016 at 21:24 Ali Akhtar <ali.rac200@gmail.com> wrote:
tl;dr? I just want to know if updates are bad for performance, and if so, for how long.

On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <ben@instaclustr.com> wrote:
Check out https://wiki.apache.org/cassandra/WritePathForUsers for the full gory details.

On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <ali.rac200@gmail.com> wrote:
How long does it take for updates to get merged / compacted into the main data file?

On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <ben@instaclustr.com> wrote:
To add some flavor as to how the commitlog implementation is so quick.

It only flushes to disk every 10s by default. So writes are effectively done to memory and then to disk asynchronously later on. This is generally accepted to be OK, as the write is also going to other nodes.

You can of course change this behavior to flush on each write or to skip the commitlog altogether (danger!). This however will change how "safe" things are from a durability perspective.

On Sun, Nov 6, 2016, 12:51 Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:

Cassandra writes are particularly fast, for a few reasons:

1) Most writes go to a commitlog (an append-only file, written linearly, so particularly fast in terms of disk operations) and are then pushed to the memtable. The memtable is flushed in batches to the permanent data files, so it buffers many mutations and then does a sequential write to persist that data to disk.

2) Reads may have to merge data from many data files (SSTables) on disk. Because the writes (described very briefly in step 1) go to immutable files, updates/deletes have to be merged on read, which is extra effort for the read path.

If you don’t do much in terms of overwrites/deletes, and your partitions are particularly small, and your data fits in RAM (probably mmap/page cache of data files, unless you’re using the row cache), reads may be very fast for you. Certainly individual reads on low-merge workloads can be < 0.1ms.



- Jeff
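
To make the two points above concrete, here is a toy sketch in Python of the write and read paths as described. It is not Cassandra's code: the class name ToyTable, the flush threshold, and the integer clock are all made up for illustration, and the real storage engine adds much more (bloom filters, indexes, tombstone expiry). It only shows why a write touches one in-memory structure while a read may have to merge several immutable files.

```python
class ToyTable:
    """Toy log-structured table: one mutable memtable plus immutable flushed files.

    Hypothetical illustration only; not how Cassandra is implemented.
    """

    TOMBSTONE = object()  # marker written in place of a value on delete

    def __init__(self, flush_threshold=2):
        self.memtable = {}   # key -> (timestamp, value)
        self.sstables = []   # immutable dicts, oldest first
        self.flush_threshold = flush_threshold
        self._clock = 0      # stand-in for a write timestamp

    def write(self, key, value):
        # A write never reads existing data: it only updates the memtable.
        self._clock += 1
        self.memtable[key] = (self._clock, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # A delete is just another write, carrying a tombstone.
        self.write(key, self.TOMBSTONE)

    def flush(self):
        # Persist the whole memtable in one sorted pass, then start a new one.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # A read may consult the memtable and every flushed file,
        # then keep the version with the newest timestamp.
        sources = [self.memtable, *self.sstables]
        candidates = [src[key] for src in sources if key in src]
        if not candidates:
            return None
        _, value = max(candidates, key=lambda tv: tv[0])
        return None if value is self.TOMBSTONE else value

    def compact(self):
        # Merge all flushed files into one, keeping the newest version per key.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [dict(sorted(merged.items()))] if merged else []
```

In this sketch, overwriting a key leaves an older copy in an earlier file until compact() runs, which is the extra merge work on the read path that the message refers to.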



--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer

**************************************************************************
The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of Public Health England, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses by Symantec.Cloud, but please re-sweep any attachments before opening or saving. http://www.gov.uk/PHE
**************************************************************************

Re: Are Cassandra writes are faster than reads?

Posted by Vikas Jaiman <er...@gmail.com>.
Thanks Jeff and Ben for the info.

Re: Are Cassandra writes are faster than reads?

Posted by Ben Bromhead <be...@instaclustr.com>.
Awesome! For a full explanation of what you are seeing (we call it micro
batching), check out Adam Zegelin's talk on it:
https://www.youtube.com/watch?v=wF3Ec1rdWgc

On Tue, 8 Nov 2016 at 02:21 Rajesh Radhakrishnan <
Rajesh.Radhakrishnan@phe.gov.uk> wrote:

>
> Hi,
>
> Just found that reducing the batch size below 20 also increases the
> writing speed and reduces memory usage (especially for the Python driver).
>
> Kind regards,
> Rajesh R
>

RE: Are Cassandra writes are faster than reads?

Posted by Rajesh Radhakrishnan <Ra...@phe.gov.uk>.
Hi,

Just found that reducing the batch size below 20 also increases the writing speed and reduces memory usage (especially for the Python driver).

Kind regards,
Rajesh R
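
For anyone wanting to try the same thing, a minimal sketch of splitting a large load into small batches is below. The helper names (chunked, write_in_small_batches) and the default of 16 rows per batch are assumptions for illustration; the thread only says "below 20". The send_batch callable is a placeholder: with the DataStax Python driver it could, for example, build a cassandra.query.BatchStatement from the prepared statement and pass it to session.execute, but that wiring is not shown here and was not described in the thread.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_in_small_batches(rows, send_batch, batch_size=16):
    """Hand the rows to `send_batch` in small chunks; return the batch count.

    `send_batch` is a caller-supplied (hypothetical) callable that performs
    the actual write for one chunk of rows.
    """
    count = 0
    for chunk in chunked(rows, batch_size):
        send_batch(chunk)
        count += 1
    return count
```

Because the rows are consumed lazily, only one small chunk is held in memory at a time, which may be part of why memory usage drops with smaller batches.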

Re: Are Cassandra writes are faster than reads?

Posted by Ben Bromhead <be...@instaclustr.com>.
They can be and it depends on your compaction strategy :)

On Sun, 6 Nov 2016 at 21:24 Ali Akhtar <al...@gmail.com> wrote:

> tl;dr? I just want to know if updates are bad for performance, and if so,
> for how long.
>

Re: Are Cassandra writes are faster than reads?

Posted by Ali Akhtar <al...@gmail.com>.
tl;dr? I just want to know if updates are bad for performance, and if so,
for how long.

On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead <be...@instaclustr.com> wrote:

> Check out https://wiki.apache.org/cassandra/WritePathForUsers for the
> full gory details.
>

Re: Are Cassandra writes are faster than reads?

Posted by Ben Bromhead <be...@instaclustr.com>.
Check out https://wiki.apache.org/cassandra/WritePathForUsers for the full
gory details.

On Sun, 6 Nov 2016 at 21:09 Ali Akhtar <al...@gmail.com> wrote:

> How long does it take for updates to get merged / compacted into the main
> data file?
>

Re: Are Cassandra writes are faster than reads?

Posted by Ali Akhtar <al...@gmail.com>.
How long does it take for updates to get merged / compacted into the main
data file?

On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead <be...@instaclustr.com> wrote:

> To add some flavor as to how the commitlog implementation is so quick.
>
> It only flushes to disk every 10s by default. So writes are effectively
> done to memory and then to disk asynchronously later on. This is generally
> accepted to be OK, as the write is also going to other nodes.
>
> You can of course change this behavior to flush on each write or to skip
> the commitlog altogether (danger!). This however will change how "safe"
> things are from a durability perspective.
>

Re: Are Cassandra writes are faster than reads?

Posted by Ben Bromhead <be...@instaclustr.com>.
To add some flavor as to why the commitlog implementation is so quick:

By default it only syncs to disk every 10s (commitlog_sync: periodic, with
commitlog_sync_period_in_ms: 10000). So writes are effectively done to memory
and then to disk asynchronously later on. This is generally accepted to be OK,
as the write is also going to other nodes.

You can of course change this behavior to sync before acknowledging each write
(commitlog_sync: batch) or to skip the commitlog altogether (durable_writes =
false on the keyspace; danger!). This, however, changes how "safe" things are
from a durability perspective.
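
The durability trade-off above can be illustrated with a small Python toy. This is only a sketch, not Cassandra's code: the class ToyCommitLog and its methods are hypothetical names, time is passed in by the caller rather than read from a clock, and the periodic sync is triggered on append here, whereas a real node would sync from a background thread.

```python
class ToyCommitLog:
    """Toy model of periodic vs. batch commitlog sync (hypothetical, not Cassandra code)."""

    def __init__(self, mode="periodic", sync_period_s=10.0):
        if mode not in ("periodic", "batch"):
            raise ValueError("mode must be 'periodic' or 'batch'")
        self.mode = mode
        self.sync_period_s = sync_period_s
        self.unsynced = []    # appended, but only in memory so far
        self.synced = []      # safely on "disk"
        self.last_sync = 0.0  # simulated time of the last sync

    def append(self, mutation, now):
        """Record a write at simulated time `now` and acknowledge it."""
        self.unsynced.append(mutation)
        # batch: sync before acknowledging; periodic: sync only once the period has elapsed
        if self.mode == "batch" or now - self.last_sync >= self.sync_period_s:
            self.sync(now)
        return "ack"

    def sync(self, now):
        """Move everything buffered in memory to the synced list."""
        self.synced.extend(self.unsynced)
        self.unsynced.clear()
        self.last_sync = now

    def crash(self):
        """Simulate a node crash: return the acknowledged writes that were never synced."""
        lost = list(self.unsynced)
        self.unsynced.clear()
        return lost


if __name__ == "__main__":
    periodic = ToyCommitLog(mode="periodic")
    periodic.append("w1", now=1.0)
    periodic.append("w2", now=2.0)
    print("periodic, lost on crash:", periodic.crash())

    batch = ToyCommitLog(mode="batch")
    batch.append("w1", now=1.0)
    print("batch, lost on crash:", batch.crash())
```

In the periodic case the writes acknowledged since the last sync are the ones at risk on a single node, which is why replication to other nodes matters for the "generally OK" argument.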

-- 
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer

Re: Are Cassandra writes are faster than reads?

Posted by Jeff Jirsa <je...@crowdstrike.com>.
Cassandra writes are particularly fast, for a few reasons:

1) Most writes go to a commitlog (an append-only file, written linearly, so particularly fast in terms of disk operations) and are then pushed to the memtable. The memtable is flushed in batches to the permanent data files (SSTables), so it buffers many mutations and then does a sequential write to persist that data to disk.

2) Reads may have to merge data from many data files on disk. Because the writes (described very briefly in step 1) go to immutable files, updates/deletes have to be merged on read; this is extra effort for the read path.

If you don’t do much in terms of overwrites/deletes, and your partitions are particularly small, and your data fits in RAM (probably mmap/page cache of data files, unless you’re using the row cache), reads may be very fast for you. Certainly individual reads on low-merge workloads can be < 0.1ms.
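
The two points above can be sketched as a toy in Python. This is a simplified illustration under stated assumptions, not how Cassandra is implemented: ToyTable, memtable_limit and the integer clock are hypothetical, there is no compaction, no tombstones and no bloom filters, and "files" are just in-memory dicts.

```python
class ToyTable:
    """Toy model of the write and read paths described above (not Cassandra's real code)."""

    def __init__(self, memtable_limit=2):
        self.commitlog = []   # append-only record of every mutation
        self.memtable = {}    # in-memory buffer: key -> (timestamp, value)
        self.sstables = []    # immutable flushed "files", oldest first
        self.memtable_limit = memtable_limit
        self.clock = 0        # stand-in for write timestamps

    def write(self, key, value):
        """Write path: append to the commitlog, then update the memtable."""
        self.clock += 1
        self.commitlog.append((self.clock, key, value))
        self.memtable[key] = (self.clock, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Persist the whole memtable in one pass as a new immutable file."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        """Read path: gather every version of the key and keep the newest.

        Returns (value, number_of_sources_merged).
        """
        candidates = []
        if key in self.memtable:
            candidates.append(self.memtable[key])
        for sstable in self.sstables:
            if key in sstable:
                candidates.append(sstable[key])
        if not candidates:
            return None, 0
        _, value = max(candidates, key=lambda c: c[0])
        return value, len(candidates)


if __name__ == "__main__":
    t = ToyTable(memtable_limit=2)
    for key, value in [("a", 1), ("b", 2), ("a", 10), ("c", 3), ("a", 100)]:
        t.write(key, value)
    print(t.read("a"))  # overwritten key: several sources to merge
    print(t.read("b"))  # written once: a single source
```

Each write is one append plus one dict update regardless of history, while a read of an overwritten key has to consult every place a version might live. That is the extra read-path effort, and it shrinks when there are few overwrites.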

 

- Jeff

From: Vikas Jaiman <er...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Sunday, November 6, 2016 at 12:42 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Are Cassandra writes are faster than reads?

 

Hi all,

Are Cassandra writes faster than reads? If yes, why is this so? I am using consistency level ONE and data is in memory.

Vikas