Posted to user@cassandra.apache.org by Anshu Vajpayee <an...@gmail.com> on 2016/09/09 01:53:29 UTC

Partition size

Is there any way to get the partition size for a given partition key?

Re: Partition size

Posted by Jeff Jirsa <jj...@apache.org>.

On 2016-09-12 10:17 (-0700), Anshu Vajpayee <an...@gmail.com> wrote: 
> Thanks Jeff.  I got the answer now.
> Is there any way to put a guardrail in place to avoid large partitions from
> the Cassandra side?  I know it is a modeling problem and Cassandra writes a
> warning to system.log for large partitions.  But I think there should be a
> way to enforce a restriction from the Cassandra side.

Perhaps not surprisingly, folks active in the other ticket (for determining partition size) also have a ticket to blacklist large partitions:

https://issues.apache.org/jira/browse/CASSANDRA-12106

Again, not complete, but it's an active topic of discussion and may appear in future versions. In the meantime, having your application maintain a list of 'blacklisted' partitions may be a suitable workaround.
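
A minimal sketch of that application-side workaround, in Python. This is only an illustration: the PartitionBlocklist class and its method names are hypothetical and are not part of Cassandra or of any driver. The idea is simply to check a key against a locally maintained set before issuing a read or write.

```python
class PartitionBlocklist:
    """Hypothetical application-side guard for known-oversized partitions."""

    def __init__(self, blocked_keys=()):
        # Keys the application has decided it will no longer touch.
        self._blocked = set(blocked_keys)

    def block(self, partition_key):
        """Add a partition key to the blocklist."""
        self._blocked.add(partition_key)

    def is_blocked(self, partition_key):
        """Return True if the key has been blocklisted."""
        return partition_key in self._blocked

    def guard(self, partition_key):
        """Return the key unchanged, or raise if it is blocklisted."""
        if partition_key in self._blocked:
            raise PermissionError(f"partition {partition_key!r} is blocklisted")
        return partition_key
```

In practice the blocked set would probably be loaded from configuration or from a small table of its own, and guard() would be called just before building the query for a given key.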




Re: Partition size

Posted by Jeremy Hanna <je...@gmail.com>.
Generally if you foresee the partitions getting out of control in terms of size, a method often employed is to bucket according to some criteria.  For example, if I have a time series use case, I might bucket by month or week.  That presumes you can foresee it though.  As far as limiting that capability, I can see that being in the ballpark of https://issues.apache.org/jira/browse/CASSANDRA-8303 but a bit trickier than the limits mentioned in that ticket.
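
To make the bucketing idea concrete, here is a small Python sketch of deriving a month or week bucket from a timestamp and folding it into the partition key. The function names and the (sensor_id, bucket) key shape are assumptions made for illustration; the real key layout depends on your table definition.

```python
from datetime import datetime


def month_bucket(ts):
    """Bucket label such as '2016-09' for a monthly partition."""
    return ts.strftime("%Y-%m")


def week_bucket(ts):
    """Bucket label such as '2016-W37', using ISO week numbering."""
    iso_year, iso_week, _ = ts.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def partition_key(sensor_id, ts, bucket_fn=month_bucket):
    """Composite key: all rows for one sensor in one bucket share a partition."""
    return (sensor_id, bucket_fn(ts))
```

With this scheme a reader that wants a time range has to query every bucket the range touches, so the bucket width is a trade-off between partition size and the number of queries per read.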

> On Sep 12, 2016, at 12:17 PM, Anshu Vajpayee <an...@gmail.com> wrote:
> 
> Thanks Jeff.  I got the answer now.
> Is there any way to put a guardrail in place to avoid large partitions from the Cassandra side?  I know it is a modeling problem and Cassandra writes a warning to system.log for large partitions.  But I think there should be a way to enforce a restriction from the Cassandra side.
> 
> On 12 Sep 2016 9:50 p.m., "Jeff Jirsa" <jjirsa@apache.org> wrote:
> On 2016-09-08 18:53 (-0700), Anshu Vajpayee <anshu.vajpayee@gmail.com> wrote:
> > Is there any way to get partition size for a  partition key ?
> >
> 
> Anshu,
> 
> The simple answer to your question is that it is not currently possible to get a partition size for an arbitrary key without quite a lot of work (basically you'd have to write a tool that iterated over the data on disk, which is nontrivial).
> 
> There exists a ticket to expose this: https://issues.apache.org/jira/browse/CASSANDRA-12367
> 
> It's not clear when that ticket will land, but I expect you'll see an API for getting the size of a partition key in the near future.
> 
> 


Re: Partition size

Posted by Anshu Vajpayee <an...@gmail.com>.
Thanks Jeff.  I got the answer now.
Is there any way to put a guardrail in place to avoid large partitions from
the Cassandra side?  I know it is a modeling problem and Cassandra writes a
warning to system.log for large partitions.  But I think there should be a
way to enforce a restriction from the Cassandra side.
On 12 Sep 2016 9:50 p.m., "Jeff Jirsa" <jj...@apache.org> wrote:

> On 2016-09-08 18:53 (-0700), Anshu Vajpayee <an...@gmail.com>
> wrote:
> > Is there any way to get partition size for a  partition key ?
> >
>
> Anshu,
>
> The simple answer to your question is that it is not currently possible to
> get a partition size for an arbitrary key without quite a lot of work
> (basically you'd have to write a tool that iterated over the data on disk,
> which is nontrivial).
>
> There exists a ticket to expose this:
> https://issues.apache.org/jira/browse/CASSANDRA-12367
>
> It's not clear when that ticket will land, but I expect you'll see an API
> for getting the size of a partition key in the near future.
>
>
>

Re: Partition size

Posted by Jeff Jirsa <jj...@apache.org>.
On 2016-09-08 18:53 (-0700), Anshu Vajpayee <an...@gmail.com> wrote: 
> Is there any way to get partition size for a  partition key ?
> 

Anshu,

The simple answer to your question is that it is not currently possible to get a partition size for an arbitrary key without quite a lot of work (basically you'd have to write a tool that iterated over the data on disk, which is nontrivial).

There exists a ticket to expose this: https://issues.apache.org/jira/browse/CASSANDRA-12367

It's not clear when that ticket will land, but I expect you'll see an API for getting the size of a partition key in the near future.



Re: Partition size

Posted by Mark Curtis <ma...@datastax.com>.
On 9 September 2016 at 16:47, Rakesh Kumar <ra...@gmail.com>
wrote:

> On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis <ma...@datastax.com>
> wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at
> > least in Cassandra 2.0 and 2.1 as I recall.
>
> Has it improved in C* 3.x? What is considered a good partition size in
> C* 3.x?
>

The 100MB is just a default setting; you can adjust it up or down as you
need:

https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__compaction_large_partition_warning_threshold_mb

There isn't really a "good" or "bad" value, it all depends on the data
model, your query patterns and required response times as to what's
acceptable for your application. The 100MB default is just a guide.

If you're seeing partitions of 1GB and above then you may very well start
to see problems. Again cfstats is your friend here!

-Mark
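
As a rough illustration of using cfstats for this, the Python sketch below scans saved `nodetool cfstats` output and reports tables whose largest compacted partition exceeds a threshold. The "Table:" and "Compacted partition maximum bytes:" labels are assumptions about the output format (older releases use different labels), the function name is hypothetical, and the 100MB default is converted assuming 1MB = 1024 * 1024 bytes.

```python
DEFAULT_WARN_MB = 100  # the default warning threshold mentioned above


def large_partition_tables(cfstats_text, threshold_mb=DEFAULT_WARN_MB):
    """Map table name -> max partition bytes for tables over the threshold."""
    threshold_bytes = threshold_mb * 1024 * 1024
    current_table = None
    flagged = {}
    for raw_line in cfstats_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Table:"):
            # Remember which table the following statistics belong to.
            current_table = line.split(":", 1)[1].strip()
        elif line.startswith("Compacted partition maximum bytes:") and current_table:
            size = int(line.split(":", 1)[1])
            if size > threshold_bytes:
                flagged[current_table] = size
    return flagged


# Hand-written sample in the assumed format, not real cluster output.
SAMPLE = """
Keyspace: demo
    Table: events
    Compacted partition minimum bytes: 43
    Compacted partition maximum bytes: 268435456
    Compacted partition mean bytes: 2730
    Table: users
    Compacted partition maximum bytes: 1024
"""

print(large_partition_tables(SAMPLE))
```

Note that cfstats reports per-node, per-table maxima only; it tells you that a large partition exists, not which partition key it belongs to.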

Re: Partition size

Posted by Jeff Jirsa <je...@crowdstrike.com>.

On 9/9/16, 8:47 AM, "Rakesh Kumar" <ra...@gmail.com> wrote:

>> If your partition sizes are over 100MB iirc then you'll normally see
>> warnings in your system.log, this will outline the partition key, at least
>> in Cassandra 2.0 and 2.1 as I recall.
>
>Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?

In modern versions (2.1 and newer), the “real” risk of large partitions is that they generate a lot of garbage on read – it’s not a 1:1 equivalence, but it’s linear, and a partition that’s 10x as large generates 10x as much garbage.

You can tune around it (very large new gen, for example), but it’s best fixed at the data model most of the time.

The long term fix will be CASSANDRA-9754, which is a work in progress. The short term fix for 3.x was http://issues.apache.org/jira/browse/CASSANDRA-11206, which went into 3.6 and higher.

In the notes on 11206, you’ll see that Robert Stupp tested up to an 8GB partition. While nobody’s going to recommend you create a data model with 8GB partitions, I imagine you may find partitions in that rough order of magnitude acceptable.

Re: Partition size

Posted by Rakesh Kumar <ra...@gmail.com>.
On Fri, Sep 9, 2016 at 11:46 AM, Mark Curtis <ma...@datastax.com> wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at least
> in Cassandra 2.0 and 2.1 as I recall.

Has it improved in C* 3.x? What is considered a good partition size in C* 3.x?

Re: Partition size

Posted by Jeff Jirsa <je...@crowdstrike.com>.

On 9/9/16, 12:14 PM, "Mark Thomas" <ma...@apache.org> wrote:

> If you are going to point to docs, please
>point to the official Apache docs unless there is a very good reason not to.
>

(And if the good reason is that there’s a deficiency in the Apache Cassandra docs, please make it known on the list or in a JIRA so someone can write what’s missing.)



Re: Partition size

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
I fully agree with Benedict here.  I would much prefer to keep this sort of
toxic behavior off the ML.  People can link to whatever helpful docs /
blogs they choose.

On Fri, Sep 9, 2016 at 1:12 PM Benedict Elliott Smith <be...@apache.org>
wrote:

> Come on. This kind of inconsistent 'policing' is not helpful.
>
> By all means, push the *committers* to improve the project docs as is
> happening, and to promote the internal resources over external ones.
>
> But Mark has absolutely no formal connection with the project, and his
> contributions have only been to file a couple of JIRA (all of which have so
> far been ignored by those of his colleagues who *are* active community
> members, I'll note!).  Shaming him for not linking docs that describe
> something *other* than what he was even talking about is crossing the
> line IMO.
>
> Linking to third-party resources is commonplace, the only difference I can
> see here is that these have been called "docs"  by the authors, instead of
> a blog post, and Mark has a DataStax email address.
>
> Would you have reacted this way if Aaron Morton linked a blog post by
> thelastpickle?  Or a random user posted their own resources?  Obviously not.
>
> I was initially all for the ASF endeavour to counteract DataStax' outsized
> influence on the project, and was hopeful you might achieve some positive
> change.  Perhaps you may well still do.  But it seems to me that the ASF
> behaviour is beginning to cross from constructive criticism of the project
> participants to prejudicially hostile behaviour against certain community
> members - and that is unlikely to result in a better project.
>
> You should be treating everyone consistently, in a manner that promotes
> project health.
>
>
>
> On Friday, 9 September 2016, Mark Thomas <ma...@apache.org> wrote:
>
>> On 09/09/2016 16:46, Mark Curtis wrote:
>> > If your partition sizes are over 100MB iirc then you'll normally see
>> > warnings in your system.log, this will outline the partition key, at
>> > least in Cassandra 2.0 and 2.1 as I recall.
>> >
>> > Your best friend here is nodetool cfstats which shows you the
>> > min/mean/max partition sizes for your table. It's quite often used to
>> > pinpoint large partitions on nodes in a cluster.
>> >
>> > More info
>> > here:
>> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>>
>> Folks,
>>
>> It is *Apache* Cassandra. If you are going to point to docs, please
>> point to the official Apache docs unless there is a very good reason not
>> to.
>>
>> In this case:
>>
>>
>> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>>
>> looks to be the place.
>>
>> Mark
>>
>>
>> >
>> > Thanks
>> >
>> > Mark
>> >
>> >
>> > On 9 September 2016 at 02:53, Anshu Vajpayee <anshu.vajpayee@gmail.com>
>> > wrote:
>> >
>> >     Is there any way to get partition size for a  partition key ?
>> >
>> >
>>
>>

Re: Partition size

Posted by Edward Capriolo <ed...@gmail.com>.
In US English it is also debatable which words are profane.

https://simple.wikipedia.org/wiki/Profanity
Different words can be profanity to different people, and what words are
thought of as profanity in English can change over time.

Suggestion:
https://www.youtube.com/watch?v=L0MK7qz13bU

On Mon, Sep 12, 2016 at 9:36 AM, Benedict Elliott Smith <benedict@apache.org> wrote:

> The guidelines stipulate no "excessive or unnecessary" profanity.  Perhaps
> you also decide what qualifies as necessary or non-excessive?
>
> To summarise my view of this entire discussion: policing users is just...
> mind boggling. Well worthy of profanity.

Re: Partition size

Posted by Benedict Elliott Smith <be...@apache.org>.
The guidelines stipulate no "excessive or unnecessary" profanity.  Perhaps
you also decide what qualifies as necessary or non-excessive?

To summarise my view of this entire discussion: policing users is just...
mind boggling. Well worthy of profanity.





On 12 September 2016 at 14:16, Mark Thomas <ma...@apache.org> wrote:

> On 12/09/2016 12:51, Benedict Elliott Smith wrote:
>
> Please tone down your language. There is no need for profanity.
>
> Now is probably a good time to remind everyone of the Apache Code of
> Conduct:
> http://www.apache.org/foundation/policies/conduct.html
>
>
> >     (a link to 3rd party docs in response to a question when an
> >     equivalent link to project hosted docs was available)
> >
> >
> > No, it wasn't.  Or at least the link you sent was not remotely the same
> > as the link in the email you responded to, which was about how to
> > understand your partition sizes - not the configuration parameter.
> > Possibly you responded to the wrong email.
>
> I did respond to the wrong e-mail. I apologise for any confusion caused.
> I intended to respond to this message:
>
> https://lists.apache.org/thread.html/6a68da3467b1fe8fe96c1bede135d329419b78bf3cc3912e727304db@%3Cuser.cassandra.apache.org%3E
>
> rather than this one:
>
> https://lists.apache.org/thread.html/39a47ddf3cdecf6a196967ba679c30d65279a2afc05a2588e8c69bac@%3Cuser.cassandra.apache.org%3E
>
> I must have clicked on the wrong message in the thread as I moved
> between windows.
>
> >     Any member of a project community (contributor, committer or PMC
> >     member)
> >
> >
> > Right.  But policing /users/ (which Mark most certainly is) is just
> > douchebaggery.  Users should feel free to participate with the resources
> > /they know best /without fear of reprisal.  All of your statement
> > suggests this shit belongs on the dev list.
>
> Users are as much part of the community as anyone else.
>
> > Or are we really suggesting that anyone discussing things on the user
> > list must be 100% conversant with the "official" docs before they can
> > make any kind of posting to the list?  Or otherwise they can expect to
> > be attacked by other community members?
>
> I am not saying that at all. I am saying that, unless there is a good
> reason, links to documentation - particularly reference documentation -
> should be to the official Apache hosted docs in preference to links to a
> third party.
>
> > Talk about chilling.  I do not see this promoting engagement - who wants
> > to help other users out if this is what they can expect in return?  A
> > public shaming?
>
> My response was not to Mark, but to the community as a whole. It was not
> intended as either a reprimand or a shaming. If Mark feels differently,
> then I apologise. My intention was to make a simple request to the
> community as a whole to reference the official documentation in
> preference to 3rd party docs unless there was a good reason.
>
> >     Linking to third party docs, blogs, etc is fairly common but they
> >     tend to be linked by the OP in the form of "I've followed the
> >     instructions I found here and it doesn't work".
> >
> >
> > Bullshit. Try a simple google
> > search: site:https://mail-archives.apache.org/mod_mbox/cassandra-user/
> > thelastpickle.com/blog
> >
> > There are 500 results.  For just one external resource.  I don't recall
> > a single one of these resulting in a reprimand.  Try the first three
> > links from the search - they do not fit /any/ of your characterisations
> > of "normal" - but they do fit mine.
>
> None of which, according to Google, have been made since I joined the
> list in August. The past is the past and I don't see how a review of any
> of those posts helps the project.
>
> There are also ~1500 references to docs.datastax.com. I don't think
> reviewing those posts would help either.
>
> I'll note that the search didn't turn up this post (probably because of
> the combined delay in mail-archives.a.o updating and Google indexing the
> site):
>
> https://lists.apache.org/thread.html/7f60b641c40e5e7ba9c7c5c90eee47a94e5ce8690450c7617adc4a41@%3Cuser.cassandra.apache.org%3E
>
> That is a good example of the "more involved" question I referred to
> previously. Hopefully, some of that information will find its way into
> the architecture section of the official docs.
>
> > Perhaps you can link the history of projects attacking users for their
> > email content?
>
> I did say that linking to 3rd party reference docs rather than the
> official reference docs as part of an answer to a question was unusual.
> In the Apache community I know best, Tomcat, I do recall it happening a
> few times but less than once a year. I don't recall any of the specifics
> so finding a reference in the ~150k user@ list messages over the last 10
> years is a tall order. I did try, but finding a reference is going to
> take more time than I have.
>
> Mark
>
>
> > On 12 September 2016 at 12:10, Mark Thomas <markt@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     On 09/09/2016 21:11, Benedict Elliott Smith wrote:
> >     > Come on. This kind of inconsistent 'policing' is not helpful.
> >
> >     How is it inconsistent? Since I subscribed to the mailing list on 22
> >     August, this is the first instance I have seen of anyone providing a
> >     link to third party docs rather than the equivalent project hosted
> docs
> >     in response to a user question. If I missed any, please point them
> out.
> >     The lists are pretty busy and that, combined with my minimal
> technical
> >     knowledge of Cassandra, means it is perfectly possible I missed some.
> >
> >     I've done a quick double check of the user@ archives and while I do
> see
> >     a number of messages referencing 3rd party docs, those references
> were
> >     made by the OP rather than someone from the community providing an
> >     answer.
> >
> >     > By all means, push the /*committers*/ to improve the project
> >     > docs as is happening, and to promote the internal resources over
> >     > external ones.
> >     >
> >     > But Mark has absolutely no formal connection with the project,
> >     > and his contributions have only been to file a couple of JIRA
> >     > (all of which have so far been ignored by those of his colleagues
> >     > who /are/ active community members, I'll note!).  Shaming him for
> >     > not linking docs that describe something /other/ than what he was
> >     > even talking about is crossing the line IMO.
> >
> >     Any member of a project community (contributor, committer or PMC
> >     member) directing users to 3rd party docs in preference to project
> >     docs without a good reason is missing an opportunity to strengthen
> >     that project community.
> >
> >     > Linking to third-party resources is commonplace, the only
> >     > difference I can see here is that these have been called "docs"
> >     > by the authors, instead of a blog post, and Mark has a DataStax
> >     > email address.
> >
> >     Linking to third party reference docs for an Apache project in
> >     response to a configuration question about that Apache project on
> >     one of the project's mailing lists is pretty unusual.
> >
> >     Linking to third party docs, blogs, etc is fairly common but they
> >     tend to be linked by the OP in the form of "I've followed the
> >     instructions I found here and it doesn't work". The responses to
> >     such questions typically include links to the relevant parts of the
> >     Apache hosted docs.
> >
> >     If the question is more involved then I have seen links to blogs,
> >     presentations, YouTube etc provided as an answer. If this happens
> >     multiple times for the same topic then it is usually added to an FAQ,
> >     wiki or similar along with an e-mail to the author to see if they'd
> >     be willing to contribute something to the docs.
> >
> >     > Would you have reacted this way if Aaron Morton linked a blog post
> >     > by thelastpickle?  Or a random user posted their own resources?
> >     > Obviously not.
> >
> >     Wrong. My reaction was based on the content of the message (a link to
> >     3rd party docs in response to a question when an equivalent link to
> >     project hosted docs was available) not on who sent it or their
> >     employer.
> >
> >     > I was initially all for the ASF endeavour to counteract DataStax'
> >     > outsized influence on the project, and was hopeful you might
> >     > achieve some positive change.  Perhaps you may well still do.  But
> >     > it seems to me that the ASF behaviour is beginning to cross from
> >     > constructive criticism of the project participants to prejudicially
> >     > hostile behaviour against certain community members - and that is
> >     > unlikely to result in a better project.
> >     >
> >     > You should be treating everyone consistently, in a manner that
> >     > promotes project health.
> >
> >     It is not healthy if community members are directing users to 3rd
> >     party documentation in preference to the project's own
> >     documentation. If it is happening because the project's
> >     documentation is non-existent / wrong / poorly written / etc. then
> >     that is understandable (and would be an issue the project needed to
> >     address) but that was not the case in this instance.
> >
> >     There are many aspects to community health. In the grand scheme of
> >     things the single e-mail that started this particular discussion is
> >     in the noise. However, a consistent pattern of such e-mails would be
> >     much more troubling. My intent was to ensure that such a pattern did
> >     not form.
> >
> >     Whether people agree with my response or not, the community is
> >     hopefully more aware of the issue than it was previously.
> >
> >     Mark
> >
> >
> >     > On Friday, 9 September 2016, Mark Thomas <markt@apache.org
> <ma...@apache.org>
> >     > <mailto:markt@apache.org <ma...@apache.org>>> wrote:
> >     >
> >     >     On 09/09/2016 16:46, Mark Curtis wrote:
> >     >     > If your partition sizes are over 100MB iirc then you'll
> >     >     > normally see warnings in your system.log, this will outline
> >     >     > the partition key, at least in Cassandra 2.0 and 2.1 as I
> >     >     > recall.
> >     >     >
> >     >     > Your best friend here is nodetool cfstats which shows you the
> >     >     > min/mean/max partition sizes for your table. It's quite often
> >     >     > used to pinpoint large partitions on nodes in a cluster.
> >     >     >
> >     >     > More info
> >     >     > here:
> >     >     > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
> >     >
> >     >     Folks,
> >     >
> >     >     It is *Apache* Cassandra. If you are going to point to docs,
> >     >     please point to the official Apache docs unless there is a very
> >     >     good reason not to.
> >     >
> >     >     In this case:
> >     >
> >     >     http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
> >     >
> >     >     looks to be the place.
> >     >
> >     >     Mark
> >     >
> >     >
> >     >     >
> >     >     > Thanks
> >     >     >
> >     >     > Mark
> >     >     >
> >     >     >
> >     >     > On 9 September 2016 at 02:53, Anshu Vajpayee
> >     <anshu.vajpayee@gmail.com <ma...@gmail.com>
> >     >     > <mailto:anshu.vajpayee@gmail.com
> >     <ma...@gmail.com>>> wrote:
> >     >     >
> >     >     >     Is there any way to get partition size for a  partition key ?
> >     >     >
> >     >     >
> >     >
> >
> >
>
>

Re: Partition size

Posted by Mark Thomas <ma...@apache.org>.
On 12/09/2016 12:51, Benedict Elliott Smith wrote:

Please tone down your language. There is no need for profanity.

Now is probably a good time to remind everyone of the Apache Code of
Conduct:
http://www.apache.org/foundation/policies/conduct.html


>     (a link to 3rd party docs in response to a question when an
>     equivalent link to project hosted docs was available) 
> 
> 
> No, it wasn't.  Or at least the link you sent was not remotely the same
> as the link in the email you responded to, which was about how to
> understand your partition sizes - not the configuration parameter. 
> Possibly you responded to the wrong email.

I did respond to the wrong e-mail. I apologise for any confusion caused.
I intended to respond to this message:

https://lists.apache.org/thread.html/6a68da3467b1fe8fe96c1bede135d329419b78bf3cc3912e727304db@%3Cuser.cassandra.apache.org%3E

rather than this one:

https://lists.apache.org/thread.html/39a47ddf3cdecf6a196967ba679c30d65279a2afc05a2588e8c69bac@%3Cuser.cassandra.apache.org%3E

I must have clicked on the wrong message in the thread as I moved
between windows.

>     Any member of a project community (contributor, committer or PMC
>     member) 
> 
> 
> Right.  But policing /users/ (which Mark most certainly is) is just
> douchebaggery.  Users should feel free to participate with the resources
> /they know best /without fear of reprisal.  All of your statement
> suggests this shit belongs on the dev list.

Users are as much part of the community as anyone else.

> Or are we really suggesting that anyone discussing things on the user
> list must be 100% conversant with the "official" docs before they can
> make any kind of posting to the list?  Or otherwise they can expect to
> be attacked by other community members?

I am not saying that at all. I am saying that, unless there is a good
reason, links to documentation - particularly reference documentation -
should be to the official Apache hosted docs in preference to links to a
third party.

> Talk about chilling.  I do not see this promoting engagement - who wants
> to help other users out if this is what they can expect in return?  A
> public shaming?

My response was not to Mark, but to the community as a whole. It was not
intended as either a reprimand or a shaming. If Mark feels differently,
then I apologise. My intention was to make a simple request to the
community as a whole to reference the official documentation in
preference to 3rd party docs unless there was a good reason.

>     Linking to third party docs, blogs, etc is fairly common but they
>     tend to be linked by the OP in the form of "I've followed the
>     instructions I found here and it doesn't work". 
> 
> 
> Bullshit. Try a simple google
> search: site:https://mail-archives.apache.org/mod_mbox/cassandra-user/
> thelastpickle.com/blog <http://thelastpickle.com/blog>
> 
> There are 500 results.  For just one external resource.  I don't recall
> a single one of these resulting in a reprimand.  Try the first three
> links from the search - they do not fit /any/ of your characterisations
> of "normal" - but they do fit mine.

None of which, according to Google, have been made since I joined the
list in August. The past is the past and I don't see how a review of any
of those posts helps the project.

There are also ~1500 references to docs.datastax.com. I don't think
reviewing those posts would help either.

I'll note that the search didn't turn up this post (probably because of
the combined delay in mail-archives.a.o updating and Google indexing the
site):

https://lists.apache.org/thread.html/7f60b641c40e5e7ba9c7c5c90eee47a94e5ce8690450c7617adc4a41@%3Cuser.cassandra.apache.org%3E

That is a good example of the "more involved" question I referred to
previously. Hopefully, some of that information will find its way into
the architecture section of the official docs.

> Perhaps you can link the history of projects attacking users for their
> email content?

I did say that linking to 3rd party reference docs rather than the
official reference docs as part of an answer to a question was unusual.
In the Apache community I know best, Tomcat, I do recall it happening a
few times but less than once a year. I don't recall any of the specifics
so finding a reference in the ~150k user@ list messages over the last 10
years is a tall order. I did try, but finding a reference is going to
take more time than I have.

Mark


> On 12 September 2016 at 12:10, Mark Thomas <markt@apache.org
> <ma...@apache.org>> wrote:
> 
>     On 09/09/2016 21:11, Benedict Elliott Smith wrote:
>     > Come on. This kind of inconsistent 'policing' is not helpful.
> 
>     How is it inconsistent? Since I subscribed to the mailing list on 22
>     August, this is the first instance I have seen of anyone providing a
>     link to third party docs rather than the equivalent project hosted docs
>     in response to a user question. If I missed any, please point them out.
>     The lists are pretty busy and that, combined with my minimal technical
>     knowledge of Cassandra, means it is perfectly possible I missed some.
> 
>     I've done a quick double check of the user@ archives and while I do see
>     a number of messages referencing 3rd party docs, those references were
>     made by the OP rather than someone from the community providing an
>     answer.
> 
>     > By all means, push the /*committers*/ to improve the project docs
>     > as is happening, and to promote the internal resources over
>     > external ones.
>     >
>     > But Mark has absolutely no formal connection with the project, and his
>     > contributions have only been to file a couple of JIRA (all of which have
>     > so far been ignored by those of his colleagues who /are/ active
>     > community members, I'll note!).  Shaming him for not linking docs that
>     > describe something /other/ than what he was even talking about is
>     > crossing the line IMO.
> 
>     Any member of a project community (contributor, committer or PMC member)
>     directing users to 3rd party docs in preference to project docs without
>     a good reason is missing an opportunity to strengthen that project
>     community.
> 
>     > Linking to third-party resources is commonplace, the only difference I
>     > can see here is that these have been called "docs"  by the authors,
>     > instead of a blog post, and Mark has a DataStax email address.
> 
>     Linking to third party reference docs for an Apache project in response
>     to a configuration question about that Apache project on one of the
>     project's mailing lists is pretty unusual.
> 
>     Linking to third party docs, blogs, etc is fairly common but they tend
>     to be linked by the OP in the form of "I've followed the instructions I
>     found here and it doesn't work". The responses to such questions
>     typically include links to the relevant parts of the Apache hosted docs.
> 
>     If the question is more involved then I have seen links to blogs,
>     presentations, YouTube etc provided as an answer. If this happens
>     multiple times for the same topic then it is usually added to an FAQ,
>     wiki or similar along with an e-mail to the author to see if they'd be
>     willing to contribute something to the docs.
> 
>     > Would you have reacted this way if Aaron Morton linked a blog post by
>     > thelastpickle?  Or a random user posted their own resources?  Obviously not.
> 
>     Wrong. My reaction was based on the content of the message (a link to
>     3rd party docs in response to a question when an equivalent link to
>     project hosted docs was available) not on who sent it or their employer.
> 
>     > I was initially all for the ASF endeavour to counteract DataStax'
>     > outsized influence on the project, and was hopeful you might achieve
>     > some positive change.  Perhaps you may well still do.  But it seems to
>     > me that the ASF behaviour is beginning to cross from constructive
>     > criticism of the project participants to prejudicially hostile behaviour
>     > against certain community members - and that is unlikely to result in a
>     > better project.
>     >
>     > You should be treating everyone consistently, in a manner that promotes
>     > project health.
> 
>     It is not healthy if community members are directing users to 3rd party
>     documentation in preference to the project's own documentation. If it is
>     happening because the project's documentation is non-existent / wrong /
>     poorly written / etc. then that is understandable (and would be an issue
>     the project needed to address) but that was not the case in this
>     instance.
> 
>     There are many aspects to community health. In the grand scheme of
>     things the single e-mail that started this particular discussion is in
>     the noise. However, a consistent pattern of such e-mails would be much
>     more troubling. My intent was to ensure that such a pattern did not
>     form.
> 
>     Whether people agree with my response or not, the community is hopefully
>     more aware of the issue than it was previously.
> 
>     Mark
> 
> 
>     > On Friday, 9 September 2016, Mark Thomas <markt@apache.org <ma...@apache.org>
>     > <mailto:markt@apache.org <ma...@apache.org>>> wrote:
>     >
>     >     On 09/09/2016 16:46, Mark Curtis wrote:
>     >     > If your partition sizes are over 100MB iirc then you'll
>     >     > normally see warnings in your system.log, this will outline
>     >     > the partition key, at least in Cassandra 2.0 and 2.1 as I
>     >     > recall.
>     >     >
>     >     > Your best friend here is nodetool cfstats which shows you the
>     >     > min/mean/max partition sizes for your table. It's quite often
>     >     > used to pinpoint large partitions on nodes in a cluster.
>     >     >
>     >     > More info
>     >     > here:
>     >     > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>     >
>     >     Folks,
>     >
>     >     It is *Apache* Cassandra. If you are going to point to docs,
>     >     please point to the official Apache docs unless there is a very
>     >     good reason not to.
>     >
>     >     In this case:
>     >
>     >     http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>     >
>     >     looks to be the place.
>     >
>     >     Mark
>     >
>     >
>     >     >
>     >     > Thanks
>     >     >
>     >     > Mark
>     >     >
>     >     >
>     >     > On 9 September 2016 at 02:53, Anshu Vajpayee
>     <anshu.vajpayee@gmail.com <ma...@gmail.com>
>     >     > <mailto:anshu.vajpayee@gmail.com
>     <ma...@gmail.com>>> wrote:
>     >     >
>     >     >     Is there any way to get partition size for a  partition key ?
>     >     >
>     >     >
>     >
> 
> 


Re: Partition size

Posted by Benedict Elliott Smith <be...@apache.org>.
> (a link to 3rd party docs in response to a question when an equivalent link
> to project hosted docs was available)
>

No, it wasn't.  Or at least the link you sent was not remotely the same as
the link in the email you responded to, which was about how to understand
your partition sizes - not the configuration parameter.  Possibly you
responded to the wrong email.

> Any member of a project community (contributor, committer or PMC member)


Right.  But policing *users* (which Mark most certainly is) is just
douchebaggery.  Users should feel free to participate with the resources *they
know best *without fear of reprisal.  All of your statement suggests this
shit belongs on the dev list.

Or are we really suggesting that anyone discussing things on the user list
must be 100% conversant with the "official" docs before they can make any
kind of posting to the list?  Or otherwise they can expect to be attacked
by other community members?

Talk about chilling.  I do not see this promoting engagement - who wants to
help other users out if this is what they can expect in return?  A public
shaming?

> Linking to third party docs, blogs, etc is fairly common but they tend to
> be linked by the OP in the form of "I've followed the instructions I found
> here and it doesn't work".


Bullshit. Try a simple google search:
site:https://mail-archives.apache.org/mod_mbox/cassandra-user/ thelastpickle.com/blog

There are 500 results.  For just one external resource.  I don't recall a
single one of these resulting in a reprimand.  Try the first three links
from the search - they do not fit *any* of your characterisations of
"normal" - but they do fit mine.

Perhaps you can link the history of projects attacking users for their
email content?









On 12 September 2016 at 12:10, Mark Thomas <ma...@apache.org> wrote:

> On 09/09/2016 21:11, Benedict Elliott Smith wrote:
> > Come on. This kind of inconsistent 'policing' is not helpful.
>
> How is it inconsistent? Since I subscribed to the mailing list on 22
> August, this is the first instance I have seen of anyone providing a
> link to third party docs rather than the equivalent project hosted docs
> in response to a user question. If I missed any, please point them out.
> The lists are pretty busy and that, combined with my minimal technical
> knowledge of Cassandra, means it is perfectly possible I missed some.
>
> I've done a quick double check of the user@ archives and while I do see
> a number of messages referencing 3rd party docs, those references were
> made by the OP rather than someone from the community providing an answer.
>
> > By all means, push the /*committers*/ to improve the project docs as is
> > happening, and to promote the internal resources over external ones.
> >
> > But Mark has absolutely no formal connection with the project, and his
> > contributions have only been to file a couple of JIRA (all of which have
> > so far been ignored by those of his colleagues who /are/ active
> > community members, I'll note!).  Shaming him for not linking docs that
> > describe something /other/ than what he was even talking about is
> > crossing the line IMO.
>
> Any member of a project community (contributor, committer or PMC member)
> directing users to 3rd party docs in preference to project docs without
> a good reason is missing an opportunity to strengthen that project
> community.
>
> > Linking to third-party resources is commonplace, the only difference I
> > can see here is that these have been called "docs"  by the authors,
> > instead of a blog post, and Mark has a DataStax email address.
>
> Linking to third party reference docs for an Apache project in response
> to a configuration question about that Apache project on one of the
> project's mailing lists is pretty unusual.
>
> Linking to third party docs, blogs, etc is fairly common but they tend
> to be linked by the OP in the form of "I've followed the instructions I
> found here and it doesn't work". The responses to such questions
> typically include links to the relevant parts of the Apache hosted docs.
>
> If the question is more involved then I have seen links to blogs,
> presentations, YouTube etc provided as an answer. If this happens
> multiple times for the same topic then it is usually added to an FAQ,
> wiki or similar along with an e-mail to the author to see if they'd be
> willing to contribute something to the docs.
>
> > Would you have reacted this way if Aaron Morton linked a blog post by
> > thelastpickle?  Or a random user posted their own resources?
> > Obviously not.
>
> Wrong. My reaction was based on the content of the message (a link to
> 3rd party docs in response to a question when an equivalent link to
> project hosted docs was available) not on who sent it or their employer.
>
> > I was initially all for the ASF endeavour to counteract DataStax'
> > outsized influence on the project, and was hopeful you might achieve
> > some positive change.  Perhaps you may well still do.  But it seems to
> > me that the ASF behaviour is beginning to cross from constructive
> > criticism of the project participants to prejudicially hostile behaviour
> > against certain community members - and that is unlikely to result in a
> > better project.
> >
> > You should be treating everyone consistently, in a manner that promotes
> > project health.
>
> It is not healthy if community members are directing users to 3rd party
> documentation in preference to the project's own documentation. If it is
> happening because the project's documentation is non-existent / wrong /
> poorly written / etc. then that is understandable (and would be an issue
> the project needed to address) but that was not the case in this instance.
>
> There are many aspects to community health. In the grand scheme of
> things the single e-mail that started this particular discussion is in
> the noise. However, a consistent pattern of such e-mails would be much
> more troubling. My intent was to ensure that such a pattern did not form.
>
> Whether people agree with my response or not, the community is hopefully
> more aware of the issue than it was previously.
>
> Mark
>
>
> > On Friday, 9 September 2016, Mark Thomas <markt@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     On 09/09/2016 16:46, Mark Curtis wrote:
> >     > If your partition sizes are over 100MB iirc then you'll normally
> >     > see warnings in your system.log, this will outline the partition
> >     > key, at least in Cassandra 2.0 and 2.1 as I recall.
> >     >
> >     > Your best friend here is nodetool cfstats which shows you the
> >     > min/mean/max partition sizes for your table. It's quite often used
> >     > to pinpoint large partitions on nodes in a cluster.
> >     >
> >     > More info
> >     > here:
> >     > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
> >
> >     Folks,
> >
> >     It is *Apache* Cassandra. If you are going to point to docs, please
> >     point to the official Apache docs unless there is a very good reason
> >     not to.
> >
> >     In this case:
> >
> >     http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
> >
> >     looks to be the place.
> >
> >     Mark
> >
> >
> >     >
> >     > Thanks
> >     >
> >     > Mark
> >     >
> >     >
> >     > On 9 September 2016 at 02:53, Anshu Vajpayee <
> anshu.vajpayee@gmail.com
> >     > <ma...@gmail.com>> wrote:
> >     >
> >     >     Is there any way to get partition size for a  partition key ?
> >     >
> >     >
> >
>
>

Re: Partition size

Posted by San Luoji <sa...@gmail.com>.
Mark,

As you admitted: "I subscribed to the mailing list on 22 August" and "my
minimal technical knowledge of Cassandra", then why are you even posting
something that's not providing real help to the Cassandra user community?

As stated in RFC 1855 (https://www.ietf.org/rfc/rfc1855.txt), section
3.1.1, "Read both mailing lists and newsgroups for one to two months before
you post anything" and "Don't wander off-topic", aren't those some basic
netiquette you should keep in mind in order to respect all the other
readers of the mailing list?

San

On Mon, Sep 12, 2016 at 5:10 AM, Mark Thomas <ma...@apache.org> wrote:

> On 09/09/2016 21:11, Benedict Elliott Smith wrote:
> > Come on. This kind of inconsistent 'policing' is not helpful.
>
> How is it inconsistent? Since I subscribed to the mailing list on 22
> August, this is the first instance I have seen of anyone providing a
> link to third party docs rather than the equivalent project hosted docs
> in response to a user question. If I missed any, please point them out.
> The lists are pretty busy and that, combined with my minimal technical
> knowledge of Cassandra, means it is perfectly possible I missed some.
>
> I've done a quick double check of the user@ archives and while I do see
> a number of messages referencing 3rd party docs, those references were
> made by the OP rather than someone from the community providing an answer.
>
> > By all means, push the /*committers*/ to improve the project docs as is
> > happening, and to promote the internal resources over external ones.
> >
> > But Mark has absolutely no formal connection with the project, and his
> > contributions have only been to file a couple of JIRA (all of which have
> > so far been ignored by those of his colleagues who /are/ active
> > community members, I'll note!).  Shaming him for not linking docs that
> > describe something /other/ than what he was even talking about is
> > crossing the line IMO.
>
> Any member of a project community (contributor, committer or PMC member)
> directing users to 3rd party docs in preference to project docs without
> a good reason is missing an opportunity to strengthen that project
> community.
>
> > Linking to third-party resources is commonplace, the only difference I
> > can see here is that these have been called "docs"  by the authors,
> > instead of a blog post, and Mark has a DataStax email address.
>
> Linking to third party reference docs for an Apache project in response
> to a configuration question about that Apache project on one of the
> project's mailing lists is pretty unusual.
>
> Linking to third party docs, blogs, etc is fairly common but they tend
> to be linked by the OP in the form of "I've followed the instructions I
> found here and it doesn't work". The responses to such questions
> typically include links to the relevant parts of the Apache hosted docs.
>
> If the question is more involved then I have seen links to blogs,
> presentations, YouTube etc provided as an answer. If this happens
> multiple times for the same topic then it is usually added to an FAQ,
> wiki or similar along with an e-mail to the author to see if they'd be
> willing to contribute something to the docs.
>
> > Would you have reacted this way if Aaron Morton linked a blog post by
> > thelastpickle?  Or a random user posted their own resources?
> > Obviously not.
>
> Wrong. My reaction was based on the content of the message (a link to
> 3rd party docs in response to a question when an equivalent link to
> project hosted docs was available) not on who sent it or their employer.
>
> > I was initially all for the ASF endeavour to counteract DataStax'
> > outsized influence on the project, and was hopeful you might achieve
> > some positive change.  Perhaps you may well still do.  But it seems to
> > me that the ASF behaviour is beginning to cross from constructive
> > criticism of the project participants to prejudicially hostile behaviour
> > against certain community members - and that is unlikely to result in a
> > better project.
> >
> > You should be treating everyone consistently, in a manner that promotes
> > project health.
>
> It is not healthy if community members are directing users to 3rd party
> documentation in preference to the project's own documentation. If it is
> happening because the project's documentation is non-existent / wrong /
> poorly written / etc. then that is understandable (and would be an issue
> the project needed to address) but that was not the case in this instance.
>
> There are many aspects to community health. In the grand scheme of
> things the single e-mail that started this particular discussion is in
> the noise. However, a consistent pattern of such e-mails would be much
> more troubling. My intent was to ensure that such a pattern did not form.
>
> Whether people agree with my response or not, the community is hopefully
> more aware of the issue than it was previously.
>
> Mark
>
>
> > On Friday, 9 September 2016, Mark Thomas <markt@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     On 09/09/2016 16:46, Mark Curtis wrote:
> >     > If your partition sizes are over 100MB iirc then you'll normally
> >     > see warnings in your system.log, this will outline the partition
> >     > key, at least in Cassandra 2.0 and 2.1 as I recall.
> >     >
> >     > Your best friend here is nodetool cfstats which shows you the
> >     > min/mean/max partition sizes for your table. It's quite often used
> >     > to pinpoint large partitions on nodes in a cluster.
> >     >
> >     > More info
> >     > here:
> >     > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
> >
> >     Folks,
> >
> >     It is *Apache* Cassandra. If you are going to point to docs, please
> >     point to the official Apache docs unless there is a very good reason
> >     not to.
> >
> >     In this case:
> >
> >     http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
> >
> >     looks to be the place.
> >
> >     Mark
> >
> >
> >     >
> >     > Thanks
> >     >
> >     > Mark
> >     >
> >     >
> >     > On 9 September 2016 at 02:53, Anshu Vajpayee <
> anshu.vajpayee@gmail.com
> >     > <ma...@gmail.com>> wrote:
> >     >
> >     >     Is there any way to get partition size for a  partition key ?
> >     >
> >     >
> >
>
>

Re: Partition size

Posted by Mark Thomas <ma...@apache.org>.
On 09/09/2016 21:11, Benedict Elliott Smith wrote:
> Come on. This kind of inconsistent 'policing' is not helpful.

How is it inconsistent? Since I subscribed to the mailing list on 22
August, this is the first instance I have seen of anyone providing a
link to third party docs rather than the equivalent project hosted docs
in response to a user question. If I missed any, please point them out.
The lists are pretty busy and that, combined with my minimal technical
knowledge of Cassandra, means it is perfectly possible I missed some.

I've done a quick double check of the user@ archives and while I do see
a number of messages referencing 3rd party docs, those references were
made by the OP rather than someone from the community providing an answer.

> By all means, push the /*committers*/ to improve the project docs as is
> happening, and to promote the internal resources over external ones.
> 
> But Mark has absolutely no formal connection with the project, and his
> contributions have only been to file a couple of JIRA (all of which have
> so far been ignored by those of his colleagues who /are/ active
> community members, I'll note!).  Shaming him for not linking docs that
> describe something /other/ than what he was even talking about is
> crossing the line IMO.  

Any member of a project community (contributor, committer or PMC member)
directing users to 3rd party docs in preference to project docs without
a good reason is missing an opportunity to strengthen that project
community.

> Linking to third-party resources is commonplace, the only difference I
> can see here is that these have been called "docs"  by the authors,
> instead of a blog post, and Mark has a DataStax email address.

Linking to third party reference docs for an Apache project in response
to a configuration question about that Apache project on one of the
project's mailing lists is pretty unusual.

Linking to third party docs, blogs, etc is fairly common but they tend
to be linked by the OP in the form of "I've followed the instructions I
found here and it doesn't work". The responses to such questions
typically include links to the relevant parts of the Apache hosted docs.

If the question is more involved then I have seen links to blogs,
presentations, YouTube etc provided as an answer. If this happens
multiple times for the same topic then it is usually added to an FAQ,
wiki or similar along with an e-mail to the author to see if they'd be
willing to contribute something to the docs.

> Would you have reacted this way if Aaron Morton linked a blog post by
> thelastpickle?  Or a random user posted their own resources?  Obviously not.

Wrong. My reaction was based on the content of the message (a link to
3rd party docs in response to a question when an equivalent link to
project hosted docs was available) not on who sent it or their employer.

> I was initially all for the ASF endeavour to counteract DataStax'
> outsized influence on the project, and was hopeful you might achieve
> some positive change.  Perhaps you may well still do.  But it seems to
> me that the ASF behaviour is beginning to cross from constructive
> criticism of the project participants to prejudicially hostile behaviour
> against certain community members - and that is unlikely to result in a
> better project.
> 
> You should be treating everyone consistently, in a manner that promotes
> project health.

It is not healthy if community members are directing users to 3rd party
documentation in preference to the project's own documentation. If it is
happening because the project's documentation is non-existent / wrong /
poorly written / etc. then that is understandable (and would be an issue
the project needed to address) but that was not the case in this instance.

There are many aspects to community health. In the grand scheme of
things the single e-mail that started this particular discussion is in
the noise. However, a consistent pattern of such e-mails would be much
more troubling. My intent was to ensure that such a pattern did not form.

Whether people agree with my response or not, the community is hopefully
more aware of the issue than it was previously.

Mark


> On Friday, 9 September 2016, Mark Thomas <markt@apache.org
> <ma...@apache.org>> wrote:
> 
>     On 09/09/2016 16:46, Mark Curtis wrote:
>     > If your partition sizes are over 100MB iirc then you'll normally see
>     > warnings in your system.log, this will outline the partition key, at
>     > least in Cassandra 2.0 and 2.1 as I recall.
>     >
>     > Your best friend here is nodetool cfstats which shows you the
>     > min/mean/max partition sizes for your table. It's quite often used to
>     > pinpoint large partitions on nodes in a cluster.
>     >
>     > More info
>     > here:
>     https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>     <https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html>
> 
>     Folks,
> 
>     It is *Apache* Cassandra. If you are going to point to docs, please
>     point to the official Apache docs unless there is a very good reason
>     not to.
> 
>     In this case:
> 
>     http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>     <http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb>
> 
>     looks to be the place.
> 
>     Mark
> 
> 
>     >
>     > Thanks
>     >
>     > Mark
>     >
>     >
>     > On 9 September 2016 at 02:53, Anshu Vajpayee <anshu.vajpayee@gmail.com
>     > <ma...@gmail.com>> wrote:
>     >
>     >     Is there any way to get partition size for a  partition key ?
>     >
>     >
> 


Re: Partition size

Posted by Benedict Elliott Smith <be...@apache.org>.
Come on. This kind of inconsistent 'policing' is not helpful.

By all means, push the *committers* to improve the project docs as is
happening, and to promote the internal resources over external ones.

But Mark has absolutely no formal connection with the project, and his
contributions have only been to file a couple of JIRA (all of which have so
far been ignored by those of his colleagues who *are* active community
members, I'll note!).  Shaming him for not linking docs that describe
something *other* than what he was even talking about is crossing the line
IMO.

Linking to third-party resources is commonplace, the only difference I can
see here is that these have been called "docs"  by the authors, instead of
a blog post, and Mark has a DataStax email address.

Would you have reacted this way if Aaron Morton linked a blog post by
thelastpickle?  Or a random user posted their own resources?  Obviously not.

I was initially all for the ASF endeavour to counteract DataStax' outsized
influence on the project, and was hopeful you might achieve some positive
change.  Perhaps you may well still do.  But it seems to me that the ASF
behaviour is beginning to cross from constructive criticism of the project
participants to prejudicially hostile behaviour against certain community
members - and that is unlikely to result in a better project.

You should be treating everyone consistently, in a manner that promotes
project health.



On Friday, 9 September 2016, Mark Thomas <ma...@apache.org> wrote:

> On 09/09/2016 16:46, Mark Curtis wrote:
> > If your partition sizes are over 100MB iirc then you'll normally see
> > warnings in your system.log, this will outline the partition key, at
> > least in Cassandra 2.0 and 2.1 as I recall.
> >
> > Your best friend here is nodetool cfstats which shows you the
> > min/mean/max partition sizes for your table. It's quite often used to
> > pinpoint large partitions on nodes in a cluster.
> >
> > More info
> > here: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
>
> Folks,
>
> It is *Apache* Cassandra. If you are going to point to docs, please
> point to the official Apache docs unless there is a very good reason not
> to.
>
> In this case:
>
> http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
>
> looks to be the place.
>
> Mark
>
>
> >
> > Thanks
> >
> > Mark
> >
> >
> > On 9 September 2016 at 02:53, Anshu Vajpayee <anshu.vajpayee@gmail.com
> > <ma...@gmail.com>> wrote:
> >
> >     Is there any way to get partition size for a  partition key ?
> >
> >
>
>

Re: Partition size

Posted by Mark Thomas <ma...@apache.org>.
On 09/09/2016 16:46, Mark Curtis wrote:
> If your partition sizes are over 100MB iirc then you'll normally see
> warnings in your system.log, this will outline the partition key, at
> least in Cassandra 2.0 and 2.1 as I recall.
> 
> Your best friend here is nodetool cfstats which shows you the
> min/mean/max partition sizes for your table. It's quite often used to
> pinpoint large partitions on nodes in a cluster.
> 
> More info
> here: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html

Folks,

It is *Apache* Cassandra. If you are going to point to docs, please
point to the official Apache docs unless there is a very good reason not to.

In this case:

http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb

looks to be the place.

Mark


> 
> Thanks
> 
> Mark
> 
> 
> On 9 September 2016 at 02:53, Anshu Vajpayee <anshu.vajpayee@gmail.com
> <ma...@gmail.com>> wrote:
> 
>     Is there any way to get partition size for a  partition key ?
> 
> 


Re: Partition size

Posted by Mark Curtis <ma...@datastax.com>.
If your partition sizes are over 100MB iirc then you'll normally see
warnings in your system.log, this will outline the partition key, at least
in Cassandra 2.0 and 2.1 as I recall.

Your best friend here is nodetool cfstats which shows you the min/mean/max
partition sizes for your table. It's quite often used to pinpoint large
partitions on nodes in a cluster.

More info here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
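
As a rough illustration of the cfstats approach, the Python sketch below scans saved `nodetool cfstats` output and lists tables whose largest compacted partition exceeds a threshold. It rests on some assumptions: the field labels ("Keyspace:", "Table:", "Compacted partition maximum bytes:") are taken to match your Cassandra version's output and may differ in other releases, the sample text is a made-up excerpt rather than real output, and the function names are hypothetical. Note that cfstats reports per-table figures for the node it runs on, so this narrows down which table holds a large partition but does not give the size for a specific partition key.

```python
# Made-up excerpt standing in for saved `nodetool cfstats` output.
# Real output has many more fields; labels are assumed, not guaranteed.
SAMPLE = """\
Keyspace: demo
\t\tTable: events
\t\tCompacted partition minimum bytes: 104
\t\tCompacted partition maximum bytes: 268650950
\t\tCompacted partition mean bytes: 1916
\t\tTable: users
\t\tCompacted partition minimum bytes: 73
\t\tCompacted partition maximum bytes: 310
\t\tCompacted partition mean bytes: 124
"""


def max_partition_bytes(cfstats_text):
    """Map 'keyspace.table' to its reported maximum compacted partition size."""
    result = {}
    keyspace = table = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        if line.startswith("Keyspace:"):
            keyspace = line.split(":", 1)[1].strip()
        elif line.startswith("Table:"):
            table = line.split(":", 1)[1].strip()
        elif line.startswith("Compacted partition maximum bytes:") and table:
            result[f"{keyspace}.{table}"] = int(line.split(":", 1)[1])
    return result


def oversized(cfstats_text, threshold_mb=100):
    """Return only the tables whose largest partition exceeds threshold_mb."""
    limit = threshold_mb * 1024 * 1024
    sizes = max_partition_bytes(cfstats_text)
    return {name: size for name, size in sizes.items() if size > limit}


if __name__ == "__main__":
    for name, size in sorted(oversized(SAMPLE).items()):
        print(f"{name}: max partition {size} bytes")
```

The default of 100 mirrors the 100MB figure mentioned above; in practice you would feed in the captured output of `nodetool cfstats` from each node instead of SAMPLE.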

Thanks

Mark


On 9 September 2016 at 02:53, Anshu Vajpayee <an...@gmail.com>
wrote:

> Is there any way to get partition size for a  partition key ?
>