Posted to dev@cassandra.apache.org by Jeff Jirsa <jj...@gmail.com> on 2017/10/16 15:37:13 UTC

Weekly Cassandra Wrap-Up: Oct 16 Edition

I got some feedback last week that I should try this on Monday morning, so
let's see if we can nudge a few people into action this week.

3.0.15 and 3.11.1 are released. This is a dev list, so that shouldn't be a
surprise to anyone here - you should have seen the votes and release
notifications. The people working directly ON Cassandra every day are
probably very aware of the number and nature of fixes in those versions -
if you're not aware, the CHANGES lists are HUGE, and some of the fixes are
VERY IMPORTANT. So this week's wrap-up is really a reflection on the size
of those two release changelogs.

One of the advantages of the Cassandra project is the size of the user base
- I don't know if we have accurate counts (and some of the "surveys" are
laughable), but we know it's on the order of thousands (probably tens of
thousands) of companies, and some huge number of instances (not willing to
speculate here, we know it's at least in the hundreds of thousands, may be
well into the millions). Historically, the best stabilizer of a release was
people upgrading their unusual use cases, finding bugs that the developers
hadn't anticipated (and therefore tests didn't exist for those edge cases),
reporting them, and the next release would be slightly better than the one
before it. The chicken/egg problem here is pretty obvious, and while a lot
of us are spending a lot of time making things better, I want to use this
email to ask a favor (in 3 parts):

1) If you haven't tried 3.0 or 3.11 yet, please spin one up on a test
cluster. 3.11 would be better, but 3.0 is ok too. It doesn't need to be a
thousand-node cluster - most of the weird stuff we've seen in the post-3.0
world deals with data, not cluster size. Grab some of your prod data if you
can, throw it into a test cluster, add a node/remove a node, and tell us if
it doesn't work (there's a rough sketch of one way to do this after item 3).
2) Please run a stress workload against that test cluster, even if it's
only for 5-10 minutes. The purpose here is two-fold: like #1, it'll help us
find some edge cases we haven't seen before, but it'll also help us identify
holes in stress coverage. We have some tickets to add UDTs to stress (
https://issues.apache.org/jira/browse/CASSANDRA-13260 ) and LWT (
https://issues.apache.org/jira/browse/CASSANDRA-7960 ). Ideally your stress
profile should be more than "80% reads 20% writes" - try to actually model
your schema and query behavior. Do you use static columns? Do you use
collections? If you're unable to model your use case because of a
deficiency in stress, open a JIRA. If things break, open a JIRA. If it
works perfectly, I'm interested in seeing your stress yaml and results
(please send it to me privately, don't spam the list). There's an example
profile after item 3 to get you started.
3) If you're somehow not able to run stress because you don't have hardware
for a spare cluster, profiling your live cluster is also incredibly useful.
TLP has some notes on how to generate flame graphs -
https://github.com/thelastpickle/lightweight-java-profiler - I saw one
example from a cluster that really surprised me. There are versions and use
cases that we know have been heavily profiled, but there are probably
versions and use cases where nobody's ever run much in the way of
profiling. If you're running openjdk in prod, and you're able to SAFELY
attach a profiler to generate some flame graphs, please send those to me
(again, privately please, I don't think the whole list needs a copy).
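
To make the first two asks concrete, here are two rough sketches. For #1,
ccm ( https://github.com/riptano/ccm ) is an easy way to stand up a local
multi-node cluster - this assumes you have ccm installed, and the version,
node count, and addresses are just examples:

    # create and start a local 3-node 3.11.1 cluster
    ccm create stress-test -v 3.11.1 -n 3 -s
    # bootstrap a 4th node, then decommission it again
    ccm add node4 -i 127.0.0.4 -j 7400 -b && ccm node4 start
    ccm node4 nodetool decommission

For #2, a minimal stress user profile might look like the sketch below.
Everything in it (keyspace, table, column names, distributions) is a
made-up placeholder - the whole point is to swap in your real schema and
queries:

    # stress-profile.yaml - minimal sketch, substitute your own schema
    keyspace: stress_ks
    keyspace_definition: |
      CREATE KEYSPACE stress_ks WITH replication =
        {'class': 'SimpleStrategy', 'replication_factor': 3};
    table: sensor_data
    table_definition: |
      CREATE TABLE sensor_data (
        sensor_id uuid,
        ts timestamp,
        owner text static,
        reading double,
        PRIMARY KEY (sensor_id, ts)
      ) WITH CLUSTERING ORDER BY (ts DESC);
    columnspec:
      - name: sensor_id
        population: uniform(1..100000)   # ~100k distinct partitions
      - name: ts
        cluster: uniform(1..1000)        # up to 1000 rows per partition
    insert:
      partitions: fixed(1)
      batchtype: UNLOGGED
    queries:
      latest:
        cql: SELECT * FROM sensor_data WHERE sensor_id = ? LIMIT 10
        fields: samerow

Then run it with something like the below - the 1:4 write/read mix is
arbitrary, match your real workload:

    cassandra-stress user profile=stress-profile.yaml \
      "ops(insert=1,latest=4)" duration=10m -rate threads=50 -node 127.0.0.1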

My hope in all of this is to build up a corpus of real world use cases (and
real current state via profiling) that we can leverage to make testing and
performance better going forward. If I get much in the way of responses to
any of these, I'll try to send out a summary in next week's email.

- Jeff

Re: Weekly Cassandra Wrap-Up: Oct 16 Edition

Posted by Jeff Jirsa <jj...@gmail.com>.
Also learned of https://github.com/aragozin/jvm-tools , which can generate
flame graphs easily without requiring a restart to load an agent, and works
on both openjdk and oracle.
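
For anyone who wants to try it, the flow is roughly the two commands below.
Caveats: the exact flags may differ between sjk versions (check --help),
and the pid and file names are placeholders:

    # sample thread dumps from the running JVM (no restart; Ctrl-C to stop)
    java -jar sjk.jar stcap -p <cassandra-pid> -o dump.std
    # render the captured samples as an interactive HTML flame graph
    java -jar sjk.jar ssa -f dump.std --flame > flame.html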




Re: Weekly Cassandra Wrap-Up: Oct 16 Edition

Posted by Lucas Benevides <lu...@maurobenevides.com.br>.
Hello Pedro,

You can find two of mine here: https://github.com/lucasbenevides/dtcs_vs_twcs .
They were based on Ben Slater's post on the Cassandra Stress Tool:
https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/

Lucas Benevides


Re: Weekly Cassandra Wrap-Up: Oct 16 Edition

Posted by Pedro Gordo <pe...@gmail.com>.
Hi Jon

I'm looking to create some more stress profiles, but I'd like to see a
couple from someone with more experience first. Can you please send a link
to this repo?

Pedro Gordo


Re: Weekly Cassandra Wrap-Up: Oct 16 Edition

Posted by Jon Haddad <jo...@jonhaddad.com>.
Regarding the stress tests, if you're willing to share, I'm starting a repo
where we can keep a bunch of different stress profiles. I'd like to start
running them on releases before we agree to push them out. If anyone has a
stress test they are willing to share, please get in touch with me!


