Posted to dev@cassandra.apache.org by Jaydeep Chovatia <ch...@gmail.com> on 2018/06/20 04:33:17 UTC

Real time bad query logging framework in C*

Hi,

We have worked on developing a common framework to detect and log
anti-patterns and bad queries in Cassandra. The goal of this effort is
to reduce the burden on ops of running Cassandra at large scale, as well as
to help beginners quickly identify performance problems with Cassandra.
Initially we wanted to try it out to make sure it really works and provides
value. We've opened a JIRA with all the details. Would you please review it and
provide your feedback on this effort?
https://issues.apache.org/jira/browse/CASSANDRA-14527


Thank You!!!


Jaydeep

Re: Real time bad query logging framework in C*

Posted by Jaydeep Chovatia <ch...@gmail.com>.
Thanks, Stefan, for reviewing this. Please find my comments inline:


> We already provide tons of metrics and provide some useful logging (e.g.
> when reading too many tombstones), but I think we should still be able to
> implement further checks in-code that highlight potential issues. Maybe
> we could really use a framework for that, I don't know.


I agree, Cassandra already exposes details through metrics,
logging (like tombstones), etc.

The current log messages (tombstone messages, large partition messages, slow
query messages, etc.) are very useful, but one important aspect is missing:
all of them try to solve the same problem, yet they were
implemented independently (at different times). As a result there is
duplicate code, and important things are lacking, such as changing a threshold
without a restart, commonality among log messages, and an interface that lets
users consume the messages in different ways. This new effort
simply makes them common, so that we have one way of doing these things in
Cassandra, with more features such as changing thresholds at runtime, commonality
in log messages, letting users consume the messages differently, etc.


> If you followed the discussions a while ago, we also talked about moving some
> of the code out of Cassandra into side-car processes. Although this will
> likely not manifest for 4.0, most of the devs seem to be fond of the idea
> and so am I.


I agree that the side-car is a very useful project, but in my opinion it will be
difficult to get internal details out in real time without modifying
Cassandra.


> Not wanting to derail this discussion (about your proposed solution), but
> let me just briefly mention that I've been working on a related approach
> (diagnostic events, CASSANDRA-12944), which would allow internal events
> to be exposed to external processes that would be able to analyze these
> events, alert users, or even act on them. It's a different approach from what
> you're suggesting, but just wanted to mention this and maybe you'd agree
> that having external processes for monitoring Cassandra has some
> advantages.


Thanks for sharing this. It is a really useful feature and will make
the operational side even easier.

My proposal just picks the low-hanging fruit. In
other words, it re-architects the existing log messages
(tombstone messages, large partition messages, slow query messages, etc.)
and adds a few more in a generic way, with more features (changing
thresholds at runtime, commonality in log messages, letting users consume them
differently, etc.). The idea is to make it a framework for reporting these
types of messages, so that all the messages (existing and new ones) are
similar to one another.
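
As a rough illustration of the idea described above (this is not the code
attached to CASSANDRA-14527), a minimal Java sketch might look like the
following. The class name BadQueryReporter, the Category enum, the Sink
interface and the message layout are all hypothetical and chosen only for
this example. It shows the three properties mentioned: thresholds that can
be changed at runtime, one shared message format, and a pluggable consumer.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch only; none of these names come from Cassandra itself.
final class BadQueryReporter {
    /** Kinds of problems to report; the set here is illustrative. */
    public enum Category { TOMBSTONES, LARGE_PARTITION, SLOW_QUERY }

    /** Pluggable consumer so that users can handle reports differently. */
    public interface Sink { void accept(String message); }

    // Concurrent collections so thresholds and sinks can change without a restart.
    private final Map<Category, Long> thresholds = new ConcurrentHashMap<>();
    private final List<Sink> sinks = new CopyOnWriteArrayList<>();

    /** Sets or updates the threshold for a category at runtime. */
    public void setThreshold(Category category, long value) {
        thresholds.put(category, value);
    }

    public void addSink(Sink sink) {
        sinks.add(sink);
    }

    /**
     * Reports the query to every sink when the observed value is strictly
     * greater than the category's threshold. Returns true if a report was sent.
     */
    public boolean observe(Category category, String query, long observed) {
        Long limit = thresholds.get(category);
        if (limit == null || observed <= limit) {
            return false; // no threshold configured, or within limits
        }
        // One common layout shared by every category.
        String message = String.format("[%s] observed=%d threshold=%d query=%s",
                                       category, observed, limit, query);
        for (Sink sink : sinks) {
            sink.accept(message);
        }
        return true;
    }
}
```

A caller would register a sink once (for example one that writes to a log
file), set a threshold per category, and call observe() from each code path
that currently emits its own message. Raising or lowering a threshold later
takes effect on the next call.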



On Wed, Jun 20, 2018 at 1:35 AM Stefan Podkowinski <sp...@apache.org> wrote:

> Jaydeep, thanks for taking this discussion to the dev list. I think it's
> the best place to introduce new ideas, discuss them in general and how
> they potentially fit in. As already mentioned in the ticket, I do share
> your assessment that we should try to make operational issues
> more visible to users. We already provide tons of metrics and provide
> some useful logging (e.g. when reading too many tombstones), but I think
> we should still be able to implement further checks in-code that
> highlight potential issues. Maybe we could really use a framework for
> that, I don't know.
>
> If you followed the discussions a while ago, we also talked about moving
> some of the code out of Cassandra into side-car processes. Although this
> will likely not manifest for 4.0, most of the devs seem to be fond of
> the idea and so am I. Not wanting to derail this discussion (about your
> proposed solution), but let me just briefly mention that I've been
> working on a related approach (diagnostic events, CASSANDRA-12944),
> which would allow internal events to be exposed to external processes that
> would be able to analyze these events, alert users, or even act on
> them. It's a different approach from what you're suggesting, but just
> wanted to mention this and maybe you'd agree that having external
> processes for monitoring Cassandra has some advantages.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Real time bad query logging framework in C*

Posted by Stefan Podkowinski <sp...@apache.org>.
Jaydeep, thanks for taking this discussion to the dev list. I think it's
the best place to introduce new ideas, discuss them in general and how
they potentially fit in. As already mentioned in the ticket, I do share
your assessment that we should try to make operational issues
more visible to users. We already provide tons of metrics and provide
some useful logging (e.g. when reading too many tombstones), but I think
we should still be able to implement further checks in-code that
highlight potential issues. Maybe we could really use a framework for
that, I don't know.

If you followed the discussions a while ago, we also talked about moving
some of the code out of Cassandra into side-car processes. Although this
will likely not manifest for 4.0, most of the devs seem to be fond of
the idea and so am I. Not wanting to derail this discussion (about your
proposed solution), but let me just briefly mention that I've been
working on a related approach (diagnostic events, CASSANDRA-12944),
which would allow internal events to be exposed to external processes that
would be able to analyze these events, alert users, or even act on
them. It's a different approach from what you're suggesting, but just
wanted to mention this and maybe you'd agree that having external
processes for monitoring Cassandra has some advantages.


