Posted to user@cassandra.apache.org by Rahul Singh <ra...@gmail.com> on 2019/04/01 19:44:54 UTC

Re: Five Questions for Cassandra Users

Answers inline.


1.       Do the same people where you work operate the cluster and write
the code to develop the application?


No, but the operators need to know development, data modeling, and
generally how to "code" the application. (Coding is a low-level task of
assigning a code to a concept, so I don't think that's the proper verb in
these scenarios; engineering, software development, or even programming
is a better term.) That's because developers are hired a dime a dozen at
the B / C level and then replaced by D / E / F level developers as things
go on, so the data team eventually ends up being the expert on the
application and the data platform, and a "Center of Excellence" for the
developers / architects to work with on a collaborative basis.



2.       Do you have a metrics stack that allows you to see graphs of
various metrics with all the nodes displayed together?



Yes. OpsCenter, ELK, Grafana, and custom node data visualizers in Excel
(because lines and charts don't tell you everything).


3.       Do you have a log stack that allows you to see the logs for all
the nodes together?

ELK. CloudWatch


4.       Do you regularly repair your clusters - such as by using Reaper?

Depends. Cron, Reaper, OpsCenter Repair, and now NodeSync.


5.       Do you use artificial intelligence to help manage your clusters?


Yes, I actually have made an artificial general intelligence called
Gravitron. It learns by ingesting all the news articles I aggregate about
Cassandra and the links I curate on cassandra.link into a Solr/Lucene index
and then uses clustering to find the most popular and most connected
content. Once it does that, the content is summarized into
human-readable text as well as interpreted bash code that gets pushed
into a "Recipe Book." As the master operator identifies scenarios in
English and then runs the bash commands, the machine slowly but
surely "wakes up" and starts to manage itself. It can also play Go, the
game, and beat DeepMind's AlphaGo at Go, and Donald Trump at golf while he
was cheating!



rahul.xavier.singh@gmail.com

http://cassandra.link

I'm speaking at #DataStaxAccelerate, the world’s premier #ApacheCassandra
conference, and I want to see you there! Use my code Singh50 for 50% off
your registration. www.datastax.com/accelerate

































Happy April Fools' Day.





On Thu, Mar 28, 2019 at 5:03 AM Kenneth Brotman
<ke...@yahoo.com.invalid> wrote:

> I’m looking to get a better feel for how people use Cassandra in
> practice.  I thought others would benefit as well so may I ask you the
> following five questions:
>
>
>
> 1.       Do the same people where you work operate the cluster and write
> the code to develop the application?
>
>
>
> 2.       Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?
>
>
>
> 3.       Do you have a log stack that allows you to see the logs for all
> the nodes together?
>
>
>
> 4.       Do you regularly repair your clusters - such as by using Reaper?
>
>
>
> 5.       Do you use artificial intelligence to help manage your clusters?
>
>
>
>
>
> Thank you for taking your time to share this information!
>
>
>
> Kenneth Brotman
>

Re: Five Questions for Cassandra Users

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello,

I'm no longer operating "my" own cluster, but am now doing consulting for
TLP. Here is what I would say from my own experience:

> 1.       Do the same people where you work operate the cluster and write
> the code to develop the application?
>

Operating a cluster does not require the same set of skills as using the
driver, writing queries and developing the needed code.
Some people have all the required skills, yet the amount of work one can
do in a day is limited anyway.

Some thoughts:
- The design/model should always be done with all the people involved in
the feature/project.
--- Operators know the best practices and ultimately bear the
responsibility of keeping it working. They are the fire
extinguishers and should help build the house, because they know what
will burn and what is reliable.
--- Devs are the most qualified to build the code, interact with the
drivers and write the code (and tests) needed to query
Cassandra in the 'right way', as defined together with the operators.
--- Lawyers/legal teams can help answer questions about which TTL (Time To
Live) to use. Not much data is required to live forever, and setting a
TTL is a good way to keep the data size under control.
--- If the same people take care of both dev and ops (start-ups, small
teams generally), it's good for this team to have 2+ people. One person
alone cannot exchange ideas, nor be up 24/7...

- A team of operators who just know the basics can do level 1 support if
procedures are well documented and the proper tooling is in place. There
is a fair amount of repetitive work, where the 'protocol' for reacting to
X or Y is often the same. Ultimately, they can escalate to the people
who are responsible for the cluster.
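
To make the TTL point above concrete, here is a minimal Python sketch that
turns a retention period agreed with the legal team into a table-wide
default TTL. The keyspace and table names ("shop", "click_events") and the
90-day retention are made up for illustration; the sketch only builds the
CQL string and leaves executing it (cqlsh, a driver, a migration tool) to
you.

```python
from datetime import timedelta


def default_ttl_statement(keyspace, table, retention):
    """Build a CQL statement that sets a table-wide default TTL."""
    seconds = int(retention.total_seconds())
    if seconds <= 0:
        raise ValueError("retention must be positive")
    # default_time_to_live is expressed in seconds; 0 would disable expiry.
    return f"ALTER TABLE {keyspace}.{table} WITH default_time_to_live = {seconds};"


# Hypothetical example: legal asks for 90 days of retention.
statement = default_ttl_statement("shop", "click_events", timedelta(days=90))
print(statement)
```

A default TTL only applies to data written after the change, so existing
rows keep whatever TTL (or lack of one) they were written with.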


> 2.       Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?
>

I definitely recommend and advocate for this. It is the best way to get a
feeling for the health of your cluster at first sight, to understand the
patterns and the bottlenecks, to see the impact of optimisations, and to
diagnose issues efficiently.

We built default Datadog dashboards to help people using Datadog monitor
their Cassandra clusters. The release post is here:
http://thelastpickle.com/blog/2017/12/05/datadog-tlp-dashboards.html
Also, if you prefer videos, here is what I think about why, what and how to
monitor: https://www.youtube.com/watch?v=Q9AAR4UQzMk

If you're not using Datadog, there are Grafana dashboards available, and
Prometheus metric exporters as well.
- "grafana cassandra dashboards
<https://www.google.com/search?client=safari&rls=en&q=grafana+cassandra+dashboards&ie=UTF-8&oe=UTF-8>"
on any search engine should give you a few options
- https://github.com/instaclustr/cassandra-exporter
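
As a rough illustration of why seeing all nodes side by side helps, the
Python sketch below flags nodes whose metric value stands far above the
cluster median. The node names, the latency values and the factor of 2 are
all invented for the example; in practice the numbers would come from
whatever metrics stack you use.

```python
from statistics import median

# Hypothetical p99 read latencies in milliseconds, one value per node.
p99_read_latency_ms = {"node1": 12.0, "node2": 11.5, "node3": 48.0, "node4": 13.2}


def outliers(values, factor=2.0):
    """Return the nodes whose value exceeds `factor` times the cluster median."""
    mid = median(values.values())
    return sorted(node for node, value in values.items() if value > factor * mid)


print(outliers(p99_read_latency_ms))
```

A single node's graph would not tell you that 48 ms is unusual; the
comparison with its peers does.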

> 3.       Do you have a log stack that allows you to see the logs for all
> the nodes together?
>

I would say it's a 'should-have', as opposed to a good monitoring system,
for example, which is a 'must-have'. I never had one or really used one,
even though I have worked on multiple clusters as a consultant.

If you have it in place for other services, then maybe just plug in the C*
nodes as well. It will help you if a machine becomes completely unreachable,
or to easily aggregate logs and compute statistics for the whole cluster. It
can be really nice. Just be aware of the amount of logs that Cassandra
generates, decide on the debug level you want, and think about an
appropriate retention policy.

But it's definitely not the first thing I would care about, as tools allow
you to query all nodes through ssh to get information about each node.
Or you can always jump on a faulty node.
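
For what it's worth, the "query all nodes through ssh" approach can be as
small as the Python sketch below. The node addresses are placeholders and
the helper names are made up for this example; it assumes key-based ssh
access and that nodetool is on the remote PATH. The script itself only
prints the commands it would run, and run_on_all is defined but not called.

```python
import subprocess

# Hypothetical node addresses; replace with your own inventory.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def ssh_command(host, remote_command):
    """Build the argv for running one command on one node over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_on_all(nodes, remote_command, timeout=30):
    """Run the command on every node; return {host: (returncode, output)}."""
    results = {}
    for host in nodes:
        try:
            proc = subprocess.run(
                ssh_command(host, remote_command),
                capture_output=True, text=True, timeout=timeout,
            )
            results[host] = (proc.returncode, proc.stdout)
        except (subprocess.TimeoutExpired, OSError) as exc:
            # An unreachable node is recorded rather than aborting the loop.
            results[host] = (None, str(exc))
    return results


# Dry run: show what would be executed on each node.
commands = [ssh_command(host, "nodetool info") for host in NODES]
for argv in commands:
    print(" ".join(argv))
```

Calling run_on_all(NODES, "nodetool info") would then collect the output of
every node in one place, which covers a good part of what a log stack would
give you for ad hoc checks.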


> 4.       Do you regularly repair your clusters - such as by using Reaper?
>

Most people do, I believe, one way or another: with cron, in-house
tools, Reaper, or OSS scripts that handle "range repairs".
It is not mandatory as long as you do not delete data. It may not be
needed if you use strong consistency. I always like to do it regularly.
I like to think that my nodes hold the same data and that entropy is as
low as possible. It has always worked well for me, making me more confident
when operating the cluster (moving token ranges, forcefully removing a
node, etc.), and I did not lose data in 6 years (apart from counters, but
they were already known to be inaccurate, not to say broken, back then),
despite the fact that I started with C* 0.8 (and the fresh first counters
implementation, yay!).
I would keep routine repairs as a good practice when they are useful
(deletes, reads that are not strongly consistent) but also when in theory
they are not needed, to help keep the data where it belongs, even though
Cassandra is now pretty resilient.

Yet some people are doing perfectly fine... until they run a first repair!
Be sure to read about it beforehand. With the default number of vnodes and
the default repair options in older versions of Cassandra, you could really
harm your cluster.
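
To give an idea of what the "range repair" scripts mentioned above do, here
is a minimal Python sketch that cuts a token range into smaller subranges
and prints one nodetool command per subrange, using the -st / -et
(start/end token) options of nodetool repair. The keyspace name "shop" and
the choice of 4 subranges are made up, the token bounds assume the
Murmur3Partitioner, and the other repair options you need depend on your
Cassandra version. Real tools such as Reaper work on the ranges each node
actually owns rather than on the whole ring at once.

```python
MIN_TOKEN = -(2**63)   # Murmur3Partitioner lower bound
MAX_TOKEN = 2**63 - 1  # Murmur3Partitioner upper bound


def split_range(start, end, parts):
    """Split the token range (start, end] into up to `parts` contiguous subranges."""
    if parts < 1 or start >= end:
        raise ValueError("need parts >= 1 and start < end")
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    # Drop empty subranges, which appear when parts is larger than the width.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def repair_commands(keyspace, start, end, parts):
    """Build one subrange repair command per subrange (nothing is executed)."""
    return [
        f"nodetool repair -st {lo} -et {hi} {keyspace}"
        for lo, hi in split_range(start, end, parts)
    ]


for command in repair_commands("shop", MIN_TOKEN, MAX_TOKEN, 4):
    print(command)
```

Repairing small subranges one at a time keeps each repair session short,
which is one way to avoid the cluster-harming repairs described above.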


> 5.       Do you use artificial intelligence to help manage your clusters?



So far I have only used "human intelligence" (mine, the collective one from
this mailing list and, really often, my colleagues') to manage my own and
other people's clusters ;-).

But there's a small part of what I do that I could trust a machine to do for
me, and better than I would. There are a lot of tools out there that do
"things" for us (Reaper, OpsCenter, in-house or shared OSS tools, the many
tools Netflix has open-sourced over the years, but also dashboards that you
just have to plug in, etc.) and that bring us some intelligence from other
people who faced similar issues. It is not real AI per se, as it won't learn
by itself, maybe. Also, there is ongoing work to make operating Cassandra
better; search for "management tool" (off the top of my head)...

I have never used any AI to help manage my clusters. The closest thing I had
installed at some point was the OpsCenter adviser, once, multiple years ago.
But I knew more than 'it' (the AI) about Cassandra by then ;-). I
never had the chance to see a really great AI that would actually help me
with cluster management. That said, I see the value of some tools in helping
people manage their clusters.

Other alternatives are fully managed Cassandra cluster services, if that's
of interest to you, and using the mailing list (as you did). Working with
consultants is another option (but I could be a bit biased in recommending
that you work with consultants ;-)).

Hope some of it helps (I always write too much ¯\_(ツ)_/¯),

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



