Posted to dev@cassandra.apache.org by Peter Schuller <pe...@infidyne.com> on 2011/03/31 18:58:04 UTC

Cassandra documentation (and in this case the datastax anti-entropy docs)

In response to the apparent mass confusion about nodetool repair that
became evident in the thread:

   http://www.mail-archive.com/user@cassandra.apache.org/msg11755.html

I started looking around to see what is actually claimed about repair.
I found that the DataStax docs:

   http://www.datastax.com/docs/0.7/operations/scheduled_tasks#repair

... use phrasing which seems very, very wrong. They strongly seem to
imply that you should not normally run nodetool repair on a cluster.

First of all, have I flown off the handle and utterly confused
myself? Is what I say in the e-mail thread wrong?

On the assumption that I'm not crazy, I think this is a good time to
talk about documentation. I've been itching for a while about the
state of documentation. There is the ad-hoc wiki, and there is the
DataStax stuff, but neither is really complete.

What I ask myself is how we can achieve the goal that people who are
trying to adopt Cassandra can do so, and use it reliably, without
spending extensive time following mailing lists and JIRA, and trying to
keep track of what is and isn't still true on the wiki, etc.

This includes challenges like:

* How to actually phrase and structure documentation in an accessible
fashion for people who just want to *use* Cassandra, and not be
knee-deep in the community.

* Minimizing the amount of "subtle detail" that you have to get right
in order to avoid problems; the very up-to-you-to-fix and
not-very-well-advertised state of 'nodetool repair' is a good example.
Ideally there would not even have to *be* documentation for it, except
for people who want to know extra details or who are doing unusual
things, like never issuing deletes and wanting to avoid repair.

* Keeping the documentation up-to-date.

Do people agree with the goals and the claim that we're not there?
What are good ways to achieve the goals?

I keep feeling that there should really be a handbook. The DataStax
docs seem to be the right "format" (similar to the FreeBSD handbook,
which is a good example). But it seems we need something more agile
that people can easily contribute to, while still being kept
up-to-date. So what can be done?

Is having a handbook a good idea? The key property of what I call a
handbook is that there is some section on e.g. "Cassandra operations"
that is reasonably long, and that someone can read through from
beginning to end and get a coherent overall view of how things work,
and know the important aspects that must be taken care of in
production clusters.

It's fine if every single little detail and potential kink isn't there
(like lengthy details about how to calculate memtable thresholds).
But stuff like 'yeah, you need to run nodetool repair at least as
often as X' is important. So are operational best practices for
performing operations on a cluster in a safe manner (e.g., moving
nodes, seeds being sensitive, gossip delays, bootstrapping multiple
nodes at once, etc.).

I'm not sure how to get there. It's not like I'm *so* motivated and
have *so* much time that if people agree I'll sit down and write 500
pages of Cassandra handbook. So the question is how to achieve
something incrementally that is still more organized than the wiki.

Thoughts?

-- 
/ Peter Schuller

Re: Cassandra documentation (and in this case the datastax anti-entropy docs)

Posted by Diallo Mamadou Bobo <ex...@gmail.com>.
Hi.
I also believe that a wiki is not so good for community-managed documentation, even though it is used as the "de facto" tool for open-source project docs.

I suggest that we use Drupal's book module; it is really full-featured and easy to use.

I would be happy to do some wireframing and set up a demo if necessary.


Sent from my iPhone

On 1 Apr 2011 at 01:27, Nick Telford <ni...@gmail.com> wrote:

> I agree that wikis are great for contribution; what I meant was that they're
> rather poor at organising information for ease of discovery, especially by
> new users.
> 
> I still like the idea of some more structured docs being managed by the
> community though.

Re: Cassandra documentation (and in this case the datastax anti-entropy docs)

Posted by Nick Telford <ni...@gmail.com>.
I agree that wikis are great for contribution; what I meant was that they're
rather poor at organising information for ease of discovery, especially by
new users.

I still like the idea of some more structured docs being managed by the
community though.

On 1 April 2011 02:16, Eric Evans <ee...@rackspace.com> wrote:

> On Thu, 2011-03-31 at 18:57 +0100, Nick Telford wrote:
> > I don't think the Wiki is the right place for community maintained
> > user docs; it doesn't have the necessary structure.
>
> The wiki is great at what wikis are great at, lowering the barrier to
> contribution.  There is a lot of good stuff (some of it is even
> translated to other languages!); I'm guessing there would be much less
> if people had to jump through more hoops.
>
> > Perhaps some generated docs maintained in-tree and hosted somewhere on
> > cassandra.apache.org might be an idea? This would also enforce some
> > order over changes made to them as changes would be controlled by
> > committers and managed through JIRA.
>
> I had this exact idea, I even checked the CQL language documentation
> into the tree as doc/cql/CQL.textile.  I had expected that to either set
> a precedent, or to be told to get it out of there, but neither
> happened. :)
>
> I don't think we need to choose one or the other.  If someone would
> rather add documentation to the wiki, we should let them (thank them
> even).  People interested in something maintained with more rigor can
> invite the wiki peeps to submit patches, and steal their content if they
> won't!
>
> --
> Eric Evans
> eevans@rackspace.com
>
>

Re: Cassandra documentation (and in this case the datastax anti-entropy docs)

Posted by Eric Evans <ee...@rackspace.com>.
On Thu, 2011-03-31 at 18:57 +0100, Nick Telford wrote:
> I don't think the Wiki is the right place for community maintained
> user docs; it doesn't have the necessary structure. 

The wiki is great at what wikis are great at: lowering the barrier to
contribution.  There is a lot of good stuff (some of it is even
translated to other languages!); I'm guessing there would be much less
if people had to jump through more hoops.

> Perhaps some generated docs maintained in-tree and hosted somewhere on
> cassandra.apache.org might be an idea? This would also enforce some
> order over changes made to them as changes would be controlled by
> committers and managed through JIRA. 

I had this exact idea; I even checked the CQL language documentation
into the tree as doc/cql/CQL.textile.  I had expected that to either set
a precedent, or to be told to get it out of there, but neither
happened. :)

I don't think we need to choose one or the other.  If someone would
rather add documentation to the wiki, we should let them (thank them
even).  People interested in something maintained with more rigor can
invite the wiki peeps to submit patches, and steal their content if they
won't!  

-- 
Eric Evans
eevans@rackspace.com


Re: Cassandra documentation (and in this case the datastax anti-entropy docs)

Posted by Nick Telford <ni...@gmail.com>.
I couldn't agree more; the DataStax docs (try saying that 3 times fast) are
definitely the most complete and user-friendly source for end-users, while
the wiki contains a lot more detailed information on the architecture and
internals.

Ideally, I'd like to see the user docs be in a place that the community can
maintain, although I imagine DataStax would likely keep their own, be it a
mirror or independently maintained.

Another issue I've seen becoming increasingly problematic on the wiki is
versioning. With so many major differences between each major release, it's
important that we properly version the docs by release. Some effort here has
been made on the API page, but there are other pages where this has become a
problem, especially ones pertaining to operational details.

I don't think the Wiki is the right place for community maintained user
docs; it doesn't have the necessary structure. Perhaps some generated docs
maintained in-tree and hosted somewhere on cassandra.apache.org might be an
idea? This would also enforce some order over changes made to them as
changes would be controlled by committers and managed through JIRA.

These are just some ideas I had while reading Peter's post, feel free to
tear them apart if you disagree. Just do it nicely.

Regards,

Nick Telford
