Posted to user@cassandra.apache.org by Tim Dunphy <bl...@gmail.com> on 2015/02/19 02:09:51 UTC

run cassandra on a small instance

Hey all,

I'm attempting to run Cassandra 2.1.2 on a smallish 2 GB RAM instance over
at Digital Ocean. It's a CentOS 7 host.

But I'm having some difficulty there. Cassandra will start with no problems
and run for a while. But then choke on the lack of memory and crash. This
is what the system looks like after a reboot:

[root@web2:~] #free -m
             total       used       free     shared    buffers     cached
Mem:          2002        568       1433          8         20        207
-/+ buffers/cache:        340       1661
Swap:            0          0          0


After I start cassandra and leave it running for a few minutes, this is
what the memory situation looks like:

[root@web2:~] #free -m
             total       used       free     shared    buffers     cached
Mem:          2002       1669        332          8         21        359
-/+ buffers/cache:       1288        713
Swap:            0          0          0

I've been able to find this article on how to tune memory for Cassandra on
the datastax site:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html

So I tried setting MAX_HEAP_SIZE and HEAP_NEWSIZE like this in
cassandra-env.sh:

MAX_HEAP_SIZE="800M"
HEAP_NEWSIZE="200M"


And I gave the latter 200MB based on the fact that this VM has 2 cores:

[root@web2:/etc/alternatives/cassandrahome] #grep name /proc/cpuinfo
model name      : QEMU Virtual CPU version 1.0
model name      : QEMU Virtual CPU version 1.0

So, 100MB per core basically.
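
For what it's worth, that matches the rule cassandra-env.sh applies on its
own when both variables are left unset. Here's a rough bash sketch of that
heuristic (worth double-checking against your own copy of the script before
relying on the exact numbers):

ram_mb=$(free -m | awk '/^Mem:/ {print $2}')
cores=$(grep -c ^processor /proc/cpuinfo)

half=$(( ram_mb / 2 ));    [ "$half"    -gt 1024 ] && half=1024
quarter=$(( ram_mb / 4 )); [ "$quarter" -gt 8192 ] && quarter=8192
max_heap=$(( half > quarter ? half : quarter ))      # heap size in MB

newgen=$(( max_heap / 4 ))                           # 1/4 of the heap...
cap=$(( 100 * cores ))                               # ...capped at ~100 MB per core
[ "$newgen" -gt "$cap" ] && newgen=$cap

echo "MAX_HEAP_SIZE=\"${max_heap}M\" HEAP_NEWSIZE=\"${newgen}M\""
# On a 2 GB / 2-core droplet this prints roughly 1001M and 200M.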

And I've found that this will run for a while... like maybe 5 or 6 hours!
So it does stay up a while. But then it finally will crash due to the lack
of memory.

Are there any tricks I can try or things I can do to get Cassandra to stay
running happily in this amount of space? I'm just using test data and this
is an exercise to learn more about cassandra. I realize a 'real' data set
will require more resources in terms of memory.

Thanks!
Tim
-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
>
> 2.1.2 is IMO broken and should not be used for any purpose.
> Use 2.1.1 or 2.1.3.
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
> =Rob


Cool man. Thanks for the info. I just upgraded to 2.1.3. We'll see how that
goes. I can let you know more once it's been running for a while.

Thanks
Tim

On Wed, Feb 18, 2015 at 8:16 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
>> over at Digital Ocean. It's a CentOS 7 host.
>>
>
> 2.1.2 is IMO broken and should not be used for any purpose.
>
> Use 2.1.1 or 2.1.3.
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
> =Rob
>
>



-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
>
> What does your schema look like, your total data size and your read/write
> patterns? Maybe you are simply doing a heavier workload than a small
> instance can handle.


Hi Mark,

OK, well, as mentioned, this is all test data with almost literally no
workload. So I doubt it's the data and/or workload that's causing it to
crash on the 2GB instance after 5 hours.

But when I describe the schema with my test data this is what I see:


cqlsh> use joke_fire1
   ... ;
cqlsh:joke_fire1> describe schema;

CREATE KEYSPACE joke_fire1 WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '3'}  AND durable_writes = true;

'module' object has no attribute 'UserTypesMeta'

If I take a look at the size of the total amount of data this is what I see:

[root@beta-new:/etc/alternatives/cassandrahome/data] #du -hs data
17M     data


Which includes the system keyspace. But the test data that I created for my
use is only 15MB:

[root@beta-new:/etc/alternatives/cassandrahome/data/data] #du -hs
joke_fire1/
15M     joke_fire1/

But just to see if it's my data that could be causing the problem, I tried
removing it all, and setting the IP of the 2GB instance itself as the seed
node. I'll try running that for a while and seeing if it crashes.
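
For reference, a minimal sketch of what that single-node change can look
like in cassandra.yaml; the config path and the 192.0.2.10 address below are
placeholders, not the actual values from this host:

CONF=/etc/alternatives/cassandrahome/conf/cassandra.yaml   # adjust to your install
IP=192.0.2.10                                              # this node's address

# point the node at itself as the only seed, and bind to its own IP
sed -i "s/- seeds: \".*\"/- seeds: \"$IP\"/"        "$CONF"
sed -i "s/^listen_address:.*/listen_address: $IP/"  "$CONF"
sed -i "s/^rpc_address:.*/rpc_address: $IP/"        "$CONF"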


Also I tried just installing a plain cassandra 2.1.3 onto a plain CentOS
6.6 instance on the AWS free tier. It's a t2.micro instance. So far it's
running. I'll keep an eye on both. At this point, I'm thinking that there
might be something about my data that could be causing it to fail after 5
or so hours.

However I might need some help diagnosing the data, as I'm not familiar with
how to do that in Cassandra.
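
(As a starting point for that kind of diagnosis, a couple of stock nodetool
commands are sketched below; 'mytable' is a placeholder table name, not one
from this keyspace.)

# per-table on-disk size, estimated key counts and memtable figures
nodetool cfstats joke_fire1

# latency and partition-size histograms for one table
nodetool cfhistograms joke_fire1 mytable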

Thanks!
Tim

On Thu, Feb 19, 2015 at 3:51 AM, Mark Reddy <ma...@gmail.com> wrote:

> What does your schema look like, your total data size and your read/write
> patterns? Maybe you are simply doing a heavier workload than a small
> instance can handle.
>
>
> Regards,
> Mark
>
> On 19 February 2015 at 08:40, Carlos Rolo <ro...@pythian.com> wrote:
>
>> I have Cassandra instances running on VMs with smaller RAM (1GB even) and
>> I don't go OOM when testing them. Although I use them in AWS and other
>> providers, never tried Digital Ocean.
>>
>> Does Cassandra just fails after some time running or it is failing on
>> some specific read/write?
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>> Tel: 1649
>> www.pythian.com
>>
>> On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy <bl...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> After the upgrade to 2.1.3, and after almost exactly 5 hours running
>>> cassandra did indeed crash again on the 2GB ram VM.
>>>
>>> This is how the memory on the VM looked after the crash:
>>>
>>> [root@web2:~] #free -m
>>>              total       used       free     shared    buffers     cached
>>> Mem:          2002       1227        774          8         45        386
>>> -/+ buffers/cache:        794       1207
>>> Swap:            0          0          0
>>>
>>>
>>> And that's with this set in the cassandra-env.sh file:
>>>
>>> MAX_HEAP_SIZE="800M"
>>> HEAP_NEWSIZE="200M"
>>>
>>> So I'm thinking now, do I just have to abandon this idea I have of
>>> running Cassandra on a 2GB instance? Or is this something we can all agree
>>> can be done? And if so, how can we do that? :)
>>>
>>> Thanks
>>> Tim
>>>
>>> On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA <
>>> jason.kushmaul@wda.com> wrote:
>>>
>>>> I asked this previously when a similar message came through, with a
>>>> similar response.
>>>>
>>>>
>>>>
>>>> planetcassandra seems to have it “right”, in that stable=2.0,
>>>> development=2.1, whereas the apache site says stable is 2.1.
>>>>
>>>> “Right” in they assume latest minor version is development.  Why not
>>>> have the apache site do the same?  That’s just my lowly non-contributing
>>>> opinion though.
>>>>
>>>>
>>>>
>>>> *Jason  *
>>>>
>>>>
>>>>
>>>> *From:* Andrew [mailto:redmumba@gmail.com]
>>>> *Sent:* Wednesday, February 18, 2015 8:26 PM
>>>> *To:* Robert Coli; user@cassandra.apache.org
>>>> *Subject:* Re: run cassandra on a small instance
>>>>
>>>>
>>>>
>>>> Robert,
>>>>
>>>>
>>>>
>>>> Let me know if I’m off base about this—but I feel like I see a lot of
>>>> posts that are like this (i.e., use this arbitrary version, not this other
>>>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>>>> seems like a very confusing way for new (and existing) users to approach
>>>> versions...
>>>>
>>>>
>>>>
>>>> Andrew
>>>>
>>>>
>>>>
>>>> On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com)
>>>> wrote:
>>>>
>>>> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com>
>>>> wrote:
>>>>
>>>> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
>>>> over at Digital Ocean. It's a CentOS 7 host.
>>>>
>>>>
>>>>
>>>> 2.1.2 is IMO broken and should not be used for any purpose.
>>>>
>>>>
>>>>
>>>> Use 2.1.1 or 2.1.3.
>>>>
>>>>
>>>>
>>>>
>>>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>>>
>>>>
>>>>
>>>> =Rob
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> GPG me!!
>>>
>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>
>>>
>>
>> --
>>
>>
>>
>>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Mark Reddy <ma...@gmail.com>.
What does your schema look like, your total data size and your read/write
patterns? Maybe you are simply doing a heavier workload than a small
instance can handle.


Regards,
Mark

On 19 February 2015 at 08:40, Carlos Rolo <ro...@pythian.com> wrote:

> I have Cassandra instances running on VMs with smaller RAM (1GB even) and
> I don't go OOM when testing them. Although I use them in AWS and other
> providers, never tried Digital Ocean.
>
> Does Cassandra just fails after some time running or it is failing on some
> specific read/write?
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Tel: 1649
> www.pythian.com
>
> On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> Hey guys,
>>
>> After the upgrade to 2.1.3, and after almost exactly 5 hours running
>> cassandra did indeed crash again on the 2GB ram VM.
>>
>> This is how the memory on the VM looked after the crash:
>>
>> [root@web2:~] #free -m
>>              total       used       free     shared    buffers     cached
>> Mem:          2002       1227        774          8         45        386
>> -/+ buffers/cache:        794       1207
>> Swap:            0          0          0
>>
>>
>> And that's with this set in the cassandra-env.sh file:
>>
>> MAX_HEAP_SIZE="800M"
>> HEAP_NEWSIZE="200M"
>>
>> So I'm thinking now, do I just have to abandon this idea I have of
>> running Cassandra on a 2GB instance? Or is this something we can all agree
>> can be done? And if so, how can we do that? :)
>>
>> Thanks
>> Tim
>>
>> On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA <
>> jason.kushmaul@wda.com> wrote:
>>
>>> I asked this previously when a similar message came through, with a
>>> similar response.
>>>
>>>
>>>
>>> planetcassandra seems to have it “right”, in that stable=2.0,
>>> development=2.1, whereas the apache site says stable is 2.1.
>>>
>>> “Right” in they assume latest minor version is development.  Why not
>>> have the apache site do the same?  That’s just my lowly non-contributing
>>> opinion though.
>>>
>>>
>>>
>>> *Jason  *
>>>
>>>
>>>
>>> *From:* Andrew [mailto:redmumba@gmail.com]
>>> *Sent:* Wednesday, February 18, 2015 8:26 PM
>>> *To:* Robert Coli; user@cassandra.apache.org
>>> *Subject:* Re: run cassandra on a small instance
>>>
>>>
>>>
>>> Robert,
>>>
>>>
>>>
>>> Let me know if I’m off base about this—but I feel like I see a lot of
>>> posts that are like this (i.e., use this arbitrary version, not this other
>>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>>> seems like a very confusing way for new (and existing) users to approach
>>> versions...
>>>
>>>
>>>
>>> Andrew
>>>
>>>
>>>
>>> On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com)
>>> wrote:
>>>
>>> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com>
>>> wrote:
>>>
>>> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
>>> over at Digital Ocean. It's a CentOS 7 host.
>>>
>>>
>>>
>>> 2.1.2 is IMO broken and should not be used for any purpose.
>>>
>>>
>>>
>>> Use 2.1.1 or 2.1.3.
>>>
>>>
>>>
>>>
>>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>>
>>>
>>>
>>> =Rob
>>>
>>>
>>>
>>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
> --
>
>
>
>

Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
>
> What I normally do is install plain CentOS (Not any AMI build for
> Cassandra) and I don't use them for production! I run them for testing,
> fire drills and some cassandra-stress benchmarks. I will look if I had more
> than 5h Cassandra uptime. I can even put one up now and do the test and get
> the results back to you.


Hey thanks for letting me know that. And yep! Same here. It's just a plain
CentOS 7 VM I've been using. None of this is for production. I also have an
AWS account that I use only for testing. I can try setting it up there too
and get back to you with my results.

Thank you!
Tim

On Thu, Feb 19, 2015 at 12:52 PM, Carlos Rolo <ro...@pythian.com> wrote:

> What I normally do is install plain CentOS (Not any AMI build for
> Cassandra) and I don't use them for production! I run them for testing,
> fire drills and some cassandra-stress benchmarks. I will look if I had more
> than 5h Cassandra uptime. I can even put one up now and do the test and get
> the results back to you.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Tel: 1649
> www.pythian.com
>
> On Thu, Feb 19, 2015 at 6:41 PM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> I have Cassandra instances running on VMs with smaller RAM (1GB even) and
>>> I don't go OOM when testing them. Although I use them in AWS and other
>>> providers, never tried Digital Ocean.
>>> Does Cassandra just fails after some time running or it is failing on
>>> some specific read/write?
>>
>>
>> Hi  Carlos,
>>
>> Ok, that's really interesting. So I have to ask, did you have to do
>> anything special to get Cassandra to run on those 1GB AWS instances? I'd
>> love to do the same. I even tried there as well and failed due to lack of
>> memory to run it.
>>
>> And there is no specific reason other than lack of memory that I can tell
>> for it to fail. And it doesn's seem to matter what data I use either.
>> Because even if I remove the data directory with rm -rf, the phenomenon is
>> the same. It'll run for a while, usually about 5 hours and then just crash
>> with the word 'killed' as the last line of output.
>>
>> Thanks
>> Tim
>>
>>
>> On Thu, Feb 19, 2015 at 3:40 AM, Carlos Rolo <ro...@pythian.com> wrote:
>>
>>> I have Cassandra instances running on VMs with smaller RAM (1GB even)
>>> and I don't go OOM when testing them. Although I use them in AWS and other
>>> providers, never tried Digital Ocean.
>>>
>>> Does Cassandra just fails after some time running or it is failing on
>>> some specific read/write?
>>>
>>> Regards,
>>>
>>> Carlos Juzarte Rolo
>>> Cassandra Consultant
>>>
>>> Pythian - Love your data
>>>
>>> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>> Tel: 1649
>>> www.pythian.com
>>>
>>> On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy <bl...@gmail.com>
>>> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> After the upgrade to 2.1.3, and after almost exactly 5 hours running
>>>> cassandra did indeed crash again on the 2GB ram VM.
>>>>
>>>> This is how the memory on the VM looked after the crash:
>>>>
>>>> [root@web2:~] #free -m
>>>>              total       used       free     shared    buffers
>>>> cached
>>>> Mem:          2002       1227        774          8         45
>>>>  386
>>>> -/+ buffers/cache:        794       1207
>>>> Swap:            0          0          0
>>>>
>>>>
>>>> And that's with this set in the cassandra-env.sh file:
>>>>
>>>> MAX_HEAP_SIZE="800M"
>>>> HEAP_NEWSIZE="200M"
>>>>
>>>> So I'm thinking now, do I just have to abandon this idea I have of
>>>> running Cassandra on a 2GB instance? Or is this something we can all agree
>>>> can be done? And if so, how can we do that? :)
>>>>
>>>> Thanks
>>>> Tim
>>>>
>>>> On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA <
>>>> jason.kushmaul@wda.com> wrote:
>>>>
>>>>> I asked this previously when a similar message came through, with a
>>>>> similar response.
>>>>>
>>>>>
>>>>>
>>>>> planetcassandra seems to have it “right”, in that stable=2.0,
>>>>> development=2.1, whereas the apache site says stable is 2.1.
>>>>>
>>>>> “Right” in they assume latest minor version is development.  Why not
>>>>> have the apache site do the same?  That’s just my lowly non-contributing
>>>>> opinion though.
>>>>>
>>>>>
>>>>>
>>>>> *Jason  *
>>>>>
>>>>>
>>>>>
>>>>> *From:* Andrew [mailto:redmumba@gmail.com]
>>>>> *Sent:* Wednesday, February 18, 2015 8:26 PM
>>>>> *To:* Robert Coli; user@cassandra.apache.org
>>>>> *Subject:* Re: run cassandra on a small instance
>>>>>
>>>>>
>>>>>
>>>>> Robert,
>>>>>
>>>>>
>>>>>
>>>>> Let me know if I’m off base about this—but I feel like I see a lot of
>>>>> posts that are like this (i.e., use this arbitrary version, not this other
>>>>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>>>>> seems like a very confusing way for new (and existing) users to approach
>>>>> versions...
>>>>>
>>>>>
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>>
>>>>> On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com)
>>>>> wrote:
>>>>>
>>>>> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
>>>>> over at Digital Ocean. It's a CentOS 7 host.
>>>>>
>>>>>
>>>>>
>>>>> 2.1.2 is IMO broken and should not be used for any purpose.
>>>>>
>>>>>
>>>>>
>>>>> Use 2.1.1 or 2.1.3.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>>>>
>>>>>
>>>>>
>>>>> =Rob
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> GPG me!!
>>>>
>>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>>
>>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
> --
>
>
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Carlos Rolo <ro...@pythian.com>.
What I normally do is install plain CentOS (Not any AMI build for
Cassandra) and I don't use them for production! I run them for testing,
fire drills and some cassandra-stress benchmarks. I will look if I had more
than 5h Cassandra uptime. I can even put one up now and do the test and get
the results back to you.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
<http://linkedin.com/in/carlosjuzarterolo>*
Tel: 1649
www.pythian.com

On Thu, Feb 19, 2015 at 6:41 PM, Tim Dunphy <bl...@gmail.com> wrote:

> I have Cassandra instances running on VMs with smaller RAM (1GB even) and
>> I don't go OOM when testing them. Although I use them in AWS and other
>> providers, never tried Digital Ocean.
>> Does Cassandra just fails after some time running or it is failing on
>> some specific read/write?
>
>
> Hi  Carlos,
>
> Ok, that's really interesting. So I have to ask, did you have to do
> anything special to get Cassandra to run on those 1GB AWS instances? I'd
> love to do the same. I even tried there as well and failed due to lack of
> memory to run it.
>
> And there is no specific reason other than lack of memory that I can tell
> for it to fail. And it doesn's seem to matter what data I use either.
> Because even if I remove the data directory with rm -rf, the phenomenon is
> the same. It'll run for a while, usually about 5 hours and then just crash
> with the word 'killed' as the last line of output.
>
> Thanks
> Tim
>
>
> On Thu, Feb 19, 2015 at 3:40 AM, Carlos Rolo <ro...@pythian.com> wrote:
>
>> I have Cassandra instances running on VMs with smaller RAM (1GB even) and
>> I don't go OOM when testing them. Although I use them in AWS and other
>> providers, never tried Digital Ocean.
>>
>> Does Cassandra just fails after some time running or it is failing on
>> some specific read/write?
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>> Tel: 1649
>> www.pythian.com
>>
>> On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy <bl...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> After the upgrade to 2.1.3, and after almost exactly 5 hours running
>>> cassandra did indeed crash again on the 2GB ram VM.
>>>
>>> This is how the memory on the VM looked after the crash:
>>>
>>> [root@web2:~] #free -m
>>>              total       used       free     shared    buffers     cached
>>> Mem:          2002       1227        774          8         45        386
>>> -/+ buffers/cache:        794       1207
>>> Swap:            0          0          0
>>>
>>>
>>> And that's with this set in the cassandra-env.sh file:
>>>
>>> MAX_HEAP_SIZE="800M"
>>> HEAP_NEWSIZE="200M"
>>>
>>> So I'm thinking now, do I just have to abandon this idea I have of
>>> running Cassandra on a 2GB instance? Or is this something we can all agree
>>> can be done? And if so, how can we do that? :)
>>>
>>> Thanks
>>> Tim
>>>
>>> On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA <
>>> jason.kushmaul@wda.com> wrote:
>>>
>>>> I asked this previously when a similar message came through, with a
>>>> similar response.
>>>>
>>>>
>>>>
>>>> planetcassandra seems to have it “right”, in that stable=2.0,
>>>> development=2.1, whereas the apache site says stable is 2.1.
>>>>
>>>> “Right” in they assume latest minor version is development.  Why not
>>>> have the apache site do the same?  That’s just my lowly non-contributing
>>>> opinion though.
>>>>
>>>>
>>>>
>>>> *Jason  *
>>>>
>>>>
>>>>
>>>> *From:* Andrew [mailto:redmumba@gmail.com]
>>>> *Sent:* Wednesday, February 18, 2015 8:26 PM
>>>> *To:* Robert Coli; user@cassandra.apache.org
>>>> *Subject:* Re: run cassandra on a small instance
>>>>
>>>>
>>>>
>>>> Robert,
>>>>
>>>>
>>>>
>>>> Let me know if I’m off base about this—but I feel like I see a lot of
>>>> posts that are like this (i.e., use this arbitrary version, not this other
>>>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>>>> seems like a very confusing way for new (and existing) users to approach
>>>> versions...
>>>>
>>>>
>>>>
>>>> Andrew
>>>>
>>>>
>>>>
>>>> On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com)
>>>> wrote:
>>>>
>>>> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com>
>>>> wrote:
>>>>
>>>> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
>>>> over at Digital Ocean. It's a CentOS 7 host.
>>>>
>>>>
>>>>
>>>> 2.1.2 is IMO broken and should not be used for any purpose.
>>>>
>>>>
>>>>
>>>> Use 2.1.1 or 2.1.3.
>>>>
>>>>
>>>>
>>>>
>>>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>>>
>>>>
>>>>
>>>> =Rob
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> GPG me!!
>>>
>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>
>>>
>>
>> --
>>
>>
>>
>>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>

-- 


--




Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
>
> I have Cassandra instances running on VMs with smaller RAM (1GB even) and
> I don't go OOM when testing them. Although I use them in AWS and other
> providers, never tried Digital Ocean.
> Does Cassandra just fails after some time running or it is failing on some
> specific read/write?


Hi  Carlos,

Ok, that's really interesting. So I have to ask, did you have to do
anything special to get Cassandra to run on those 1GB AWS instances? I'd
love to do the same. I even tried there as well and failed due to lack of
memory to run it.

And there is no specific reason other than lack of memory that I can tell
for it to fail. And it doesn't seem to matter what data I use either.
Because even if I remove the data directory with rm -rf, the phenomenon is
the same. It'll run for a while, usually about 5 hours and then just crash
with the word 'killed' as the last line of output.
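
(That bare 'killed' message usually means the process received a SIGKILL
from the kernel OOM killer rather than dying on a Java OutOfMemoryError. A
quick sketch of how to confirm that on a systemd host like the CentOS 7
droplet; on CentOS 6, dmesg or /var/log/messages alone:)

dmesg | grep -i -E 'killed process|out of memory'
journalctl -k --since "6 hours ago" | grep -i -E 'oom|killed process'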

Thanks
Tim


On Thu, Feb 19, 2015 at 3:40 AM, Carlos Rolo <ro...@pythian.com> wrote:

> I have Cassandra instances running on VMs with smaller RAM (1GB even) and
> I don't go OOM when testing them. Although I use them in AWS and other
> providers, never tried Digital Ocean.
>
> Does Cassandra just fails after some time running or it is failing on some
> specific read/write?
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Tel: 1649
> www.pythian.com
>
> On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> Hey guys,
>>
>> After the upgrade to 2.1.3, and after almost exactly 5 hours running
>> cassandra did indeed crash again on the 2GB ram VM.
>>
>> This is how the memory on the VM looked after the crash:
>>
>> [root@web2:~] #free -m
>>              total       used       free     shared    buffers     cached
>> Mem:          2002       1227        774          8         45        386
>> -/+ buffers/cache:        794       1207
>> Swap:            0          0          0
>>
>>
>> And that's with this set in the cassandra-env.sh file:
>>
>> MAX_HEAP_SIZE="800M"
>> HEAP_NEWSIZE="200M"
>>
>> So I'm thinking now, do I just have to abandon this idea I have of
>> running Cassandra on a 2GB instance? Or is this something we can all agree
>> can be done? And if so, how can we do that? :)
>>
>> Thanks
>> Tim
>>
>> On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA <
>> jason.kushmaul@wda.com> wrote:
>>
>>> I asked this previously when a similar message came through, with a
>>> similar response.
>>>
>>>
>>>
>>> planetcassandra seems to have it “right”, in that stable=2.0,
>>> development=2.1, whereas the apache site says stable is 2.1.
>>>
>>> “Right” in they assume latest minor version is development.  Why not
>>> have the apache site do the same?  That’s just my lowly non-contributing
>>> opinion though.
>>>
>>>
>>>
>>> *Jason  *
>>>
>>>
>>>
>>> *From:* Andrew [mailto:redmumba@gmail.com]
>>> *Sent:* Wednesday, February 18, 2015 8:26 PM
>>> *To:* Robert Coli; user@cassandra.apache.org
>>> *Subject:* Re: run cassandra on a small instance
>>>
>>>
>>>
>>> Robert,
>>>
>>>
>>>
>>> Let me know if I’m off base about this—but I feel like I see a lot of
>>> posts that are like this (i.e., use this arbitrary version, not this other
>>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>>> seems like a very confusing way for new (and existing) users to approach
>>> versions...
>>>
>>>
>>>
>>> Andrew
>>>
>>>
>>>
>>> On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com)
>>> wrote:
>>>
>>> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com>
>>> wrote:
>>>
>>> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
>>> over at Digital Ocean. It's a CentOS 7 host.
>>>
>>>
>>>
>>> 2.1.2 is IMO broken and should not be used for any purpose.
>>>
>>>
>>>
>>> Use 2.1.1 or 2.1.3.
>>>
>>>
>>>
>>>
>>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>>
>>>
>>>
>>> =Rob
>>>
>>>
>>>
>>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
> --
>
>
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Carlos Rolo <ro...@pythian.com>.
I have Cassandra instances running on VMs with smaller RAM (1GB even) and I
don't go OOM when testing them. I use them in AWS and other providers,
though I've never tried Digital Ocean.

Does Cassandra just fail after some time running, or is it failing on some
specific read/write?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
<http://linkedin.com/in/carlosjuzarterolo>*
Tel: 1649
www.pythian.com

On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy <bl...@gmail.com> wrote:

> Hey guys,
>
> After the upgrade to 2.1.3, and after almost exactly 5 hours running
> cassandra did indeed crash again on the 2GB ram VM.
>
> This is how the memory on the VM looked after the crash:
>
> [root@web2:~] #free -m
>              total       used       free     shared    buffers     cached
> Mem:          2002       1227        774          8         45        386
> -/+ buffers/cache:        794       1207
> Swap:            0          0          0
>
>
> And that's with this set in the cassandra-env.sh file:
>
> MAX_HEAP_SIZE="800M"
> HEAP_NEWSIZE="200M"
>
> So I'm thinking now, do I just have to abandon this idea I have of running
> Cassandra on a 2GB instance? Or is this something we can all agree can be
> done? And if so, how can we do that? :)
>
> Thanks
> Tim
>
> On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA <
> jason.kushmaul@wda.com> wrote:
>
>> I asked this previously when a similar message came through, with a
>> similar response.
>>
>>
>>
>> planetcassandra seems to have it “right”, in that stable=2.0,
>> development=2.1, whereas the apache site says stable is 2.1.
>>
>> “Right” in they assume latest minor version is development.  Why not have
>> the apache site do the same?  That’s just my lowly non-contributing opinion
>> though.
>>
>>
>>
>> *Jason  *
>>
>>
>>
>> *From:* Andrew [mailto:redmumba@gmail.com]
>> *Sent:* Wednesday, February 18, 2015 8:26 PM
>> *To:* Robert Coli; user@cassandra.apache.org
>> *Subject:* Re: run cassandra on a small instance
>>
>>
>>
>> Robert,
>>
>>
>>
>> Let me know if I’m off base about this—but I feel like I see a lot of
>> posts that are like this (i.e., use this arbitrary version, not this other
>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>> seems like a very confusing way for new (and existing) users to approach
>> versions...
>>
>>
>>
>> Andrew
>>
>>
>>
>> On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com)
>> wrote:
>>
>> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com> wrote:
>>
>> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
>> over at Digital Ocean. It's a CentOS 7 host.
>>
>>
>>
>> 2.1.2 is IMO broken and should not be used for any purpose.
>>
>>
>>
>> Use 2.1.1 or 2.1.3.
>>
>>
>>
>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>
>>
>>
>> =Rob
>>
>>
>>
>>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>

-- 


--




Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
Hey guys,

After the upgrade to 2.1.3, and after almost exactly 5 hours running
cassandra did indeed crash again on the 2GB ram VM.

This is how the memory on the VM looked after the crash:

[root@web2:~] #free -m
             total       used       free     shared    buffers     cached
Mem:          2002       1227        774          8         45        386
-/+ buffers/cache:        794       1207
Swap:            0          0          0


And that's with this set in the cassandra-env.sh file:

MAX_HEAP_SIZE="800M"
HEAP_NEWSIZE="200M"
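
(One thing worth confirming before waiting for the next crash, as a sketch
assuming the stock 2.1 startup flags, is that the running JVM actually
picked those values up:)

# the -Xms/-Xmx/-Xmn flags should mirror MAX_HEAP_SIZE and HEAP_NEWSIZE
ps -ef | grep '[C]assandraDaemon' | tr ' ' '\n' | grep -E '^-Xm[sxn]'
# expected with the settings above: -Xms800M -Xmx800M -Xmn200M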

So I'm thinking now, do I just have to abandon this idea I have of running
Cassandra on a 2GB instance? Or is this something we can all agree can be
done? And if so, how can we do that? :)

Thanks
Tim

On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA <
jason.kushmaul@wda.com> wrote:

> I asked this previously when a similar message came through, with a
> similar response.
>
>
>
> planetcassandra seems to have it “right”, in that stable=2.0,
> development=2.1, whereas the apache site says stable is 2.1.
>
> “Right” in they assume latest minor version is development.  Why not have
> the apache site do the same?  That’s just my lowly non-contributing opinion
> though.
>
>
>
> *Jason  *
>
>
>
> *From:* Andrew [mailto:redmumba@gmail.com]
> *Sent:* Wednesday, February 18, 2015 8:26 PM
> *To:* Robert Coli; user@cassandra.apache.org
> *Subject:* Re: run cassandra on a small instance
>
>
>
> Robert,
>
>
>
> Let me know if I’m off base about this—but I feel like I see a lot of
> posts that are like this (i.e., use this arbitrary version, not this other
> arbitrary version).  Why are releases going out if they’re “broken”?  This
> seems like a very confusing way for new (and existing) users to approach
> versions...
>
>
>
> Andrew
>
>
>
> On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com)
> wrote:
>
> On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com> wrote:
>
> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
> over at Digital Ocean. It's a CentOS 7 host.
>
>
>
> 2.1.2 is IMO broken and should not be used for any purpose.
>
>
>
> Use 2.1.1 or 2.1.3.
>
>
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
>
>
> =Rob
>
>
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

RE: run cassandra on a small instance

Posted by Jason Kushmaul | WDA <ja...@wda.com>.
I asked this previously when a similar message came through, with a similar response.

planetcassandra seems to have it “right”, in that stable=2.0, development=2.1, whereas the apache site says stable is 2.1.
“Right” in that they assume the latest minor version is development.  Why not have the apache site do the same?  That’s just my lowly non-contributing opinion though.

Jason

From: Andrew [mailto:redmumba@gmail.com]
Sent: Wednesday, February 18, 2015 8:26 PM
To: Robert Coli; user@cassandra.apache.org
Subject: Re: run cassandra on a small instance

Robert,

Let me know if I’m off base about this—but I feel like I see a lot of posts that are like this (i.e., use this arbitrary version, not this other arbitrary version).  Why are releases going out if they’re “broken”?  This seems like a very confusing way for new (and existing) users to approach versions...

Andrew


On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com) wrote:
On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com> wrote:
I'm attempting to run Cassandra 2.1.2 on a smallish 2 GB ram instance over at Digital Ocean. It's a CentOS 7 host.

2.1.2 is IMO broken and should not be used for any purpose.

Use 2.1.1 or 2.1.3.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

=Rob


Re: run cassandra on a small instance

Posted by Nate McCall <na...@thelastpickle.com>.
Agreed and good point. Just added it to mine - thanks Ben.

On Sun, Feb 22, 2015 at 9:43 PM, Ben Bromhead <be...@instaclustr.com> wrote:

> You might also have some gains setting in_memory_compaction_limit_in_mb to
> something very low to force Cassandra to use on disk compaction rather than
> doing it in memory.
>
> On 23 February 2015 at 14:12, Tim Dunphy <bl...@gmail.com> wrote:
>
>> Nate,
>>
>>  Definitely thank you for this advice. After leaving the new Cassandra
>> node running on the 2GB instance for the past couple of days, I think I've
>> had ample reason to report complete success in getting it stabilized on
>> that instance! Here are the changes I've been able to make:
>>
>>  I think manipulating the key cache and other stuff like concurrent
>> writes and some of the other stuff I worked on based on that thread from
>> the cassandra list definitely was key in getting Cassandra to work on the
>> new instance.
>>
>> Check out the before and after (before working/ after working):
>>
>> Before in cassandra-env.sh:
>>    MAX_HEAP_SIZE="800M"
>>    HEAP_NEWSIZE="200M"
>>
>> After:
>>     MAX_HEAP_SIZE="512M"
>>     HEAP_NEWSIZE="100M"
>>
>> And before in the cassandra.yaml file:
>>
>>    concurrent_writes: 32
>>    compaction_throughput_mb_per_sec: 16
>>    key_cache_size_in_mb:
>>    key_cache_save_period: 14400
>>    # native_transport_max_threads: 128
>>
>> And after:
>>
>>     concurrent_writes: 2
>>     compaction_throughput_mb_per_sec: 8
>>     key_cache_size_in_mb: 4
>>     key_cache_save_period: 0
>>     native_transport_max_threads: 4
>>
>>
>> That really made the difference. I'm a puppet user, so these changes are
>> in puppet. So any new 2GB instances I should bring up on Digital Ocean
>> should absolutely work the way the first 2GB node does, there.  But I was
>> able to make enough sense of your chef recipe to adapt what you were
>> showing me.
>>
>> Thanks again!
>> Tim
>>
>> On Fri, Feb 20, 2015 at 10:31 PM, Tim Dunphy <bl...@gmail.com>
>> wrote:
>>
>>> The most important things to note:
>>>> - don't include JNA (it needs to lock pages larger than what will be
>>>> available)
>>>> - turn down threadpools for transports
>>>> - turn compaction throughput way down
>>>> - make concurrent reads and writes very small
>>>> I have used the above run a healthy 5 node clusters locally in it's own
>>>> private network with a 6th monitoring server for light to moderate local
>>>> testing in 16g of laptop ram. YMMV but it is possible.
>>>
>>>
>>> Thanks!! That was very helpful. I just tried applying your suggestions
>>> to my cassandra.yaml file. I used the info from your chef recipe. Well like
>>> I've been saying typically it takes about 5 hours or so for this situation
>>> to shake itself out. I'll provide an update to the list once I have a
>>> better idea of how this is working.
>>>
>>> Thanks again!
>>> Tim
>>>
>>> On Fri, Feb 20, 2015 at 9:37 PM, Nate McCall <na...@thelastpickle.com>
>>> wrote:
>>>
>>>> I frequently test with multi-node vagrant-based clusters locally. The
>>>> following chef attributes should give you an idea of what to turn down in
>>>> cassandra.yaml and cassandra-env.sh to build a decent testing cluster:
>>>>
>>>>           :cassandra => {'cluster_name' => 'VerifyCluster',
>>>>                          'package_name' => 'dsc20',
>>>>                          'version' => '2.0.11',
>>>>                          'release' => '1',
>>>>                          'setup_jna' => false,
>>>>                          'max_heap_size' => '512M',
>>>>                          'heap_new_size' => '100M',
>>>>                          'initial_token' => server['initial_token'],
>>>>                          'seeds' => "192.168.33.10",
>>>>                          'listen_address' => server['ip'],
>>>>                          'broadcast_address' => server['ip'],
>>>>                          'rpc_address' => server['ip'],
>>>>                          'concurrent_reads' => "2",
>>>>                          'concurrent_writes' => "2",
>>>>                          'memtable_flush_queue_size' => "2",
>>>>                          'compaction_throughput_mb_per_sec' => "8",
>>>>                          'key_cache_size_in_mb' => "4",
>>>>                          'key_cache_save_period' => "0",
>>>>                          'native_transport_min_threads' => "2",
>>>>                          'native_transport_max_threads' => "4",
>>>>                          'notify_restart' => true,
>>>>                          'reporter' => {
>>>>                            'riemann' => {
>>>>                              'enable' => true,
>>>>                              'host' => '192.168.33.51'
>>>>                            },
>>>>                            'graphite' => {
>>>>                              'enable' => true,
>>>>                              'host' => '192.168.33.51'
>>>>                            }
>>>>                          }
>>>>                        },
>>>>
>>>> The most important things to note:
>>>> - don't include JNA (it needs to lock pages larger than what will be
>>>> available)
>>>> - turn down threadpools for transports
>>>> - turn compaction throughput way down
>>>> - make concurrent reads and writes very small
>>>>
>>>> I have used the above run a healthy 5 node clusters locally in it's own
>>>> private network with a 6th monitoring server for light to moderate local
>>>> testing in 16g of laptop ram. YMMV but it is possible.
>>>>
>>>
>>>
>>>
>>> --
>>> GPG me!!
>>>
>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>
>>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
>
> --
>
> Ben Bromhead
>
> Instaclustr | www.instaclustr.com | @instaclustr
> <http://twitter.com/instaclustr> | (650) 284 9692
>



-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: run cassandra on a small instance

Posted by Ben Bromhead <be...@instaclustr.com>.
You might also have some gains setting in_memory_compaction_limit_in_mb to
something very low to force Cassandra to use on disk compaction rather than
doing it in memory.
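
(A minimal sketch of what that change might look like; note that
in_memory_compaction_limit_in_mb is a 2.0.x-era option, so check whether
your version's cassandra.yaml still lists it before relying on it. The
config path below is a placeholder.)

CONF=/etc/cassandra/conf/cassandra.yaml    # placeholder; use your config path
sed -i -E 's/^#? *in_memory_compaction_limit_in_mb:.*/in_memory_compaction_limit_in_mb: 1/' "$CONF"
grep in_memory_compaction_limit_in_mb "$CONF"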

On 23 February 2015 at 14:12, Tim Dunphy <bl...@gmail.com> wrote:

> Nate,
>
>  Definitely thank you for this advice. After leaving the new Cassandra
> node running on the 2GB instance for the past couple of days, I think I've
> had ample reason to report complete success in getting it stabilized on
> that instance! Here are the changes I've been able to make:
>
>  I think manipulating the key cache and other stuff like concurrent writes
> and some of the other stuff I worked on based on that thread from the
> cassandra list definitely was key in getting Cassandra to work on the new
> instance.
>
> Check out the before and after (before working/ after working):
>
> Before in cassandra-env.sh:
>    MAX_HEAP_SIZE="800M"
>    HEAP_NEWSIZE="200M"
>
> After:
>     MAX_HEAP_SIZE="512M"
>     HEAP_NEWSIZE="100M"
>
> And before in the cassandra.yaml file:
>
>    concurrent_writes: 32
>    compaction_throughput_mb_per_sec: 16
>    key_cache_size_in_mb:
>    key_cache_save_period: 14400
>    # native_transport_max_threads: 128
>
> And after:
>
>     concurrent_writes: 2
>     compaction_throughput_mb_per_sec: 8
>     key_cache_size_in_mb: 4
>     key_cache_save_period: 0
>     native_transport_max_threads: 4
>
>
> That really made the difference. I'm a puppet user, so these changes are
> in puppet. So any new 2GB instances I should bring up on Digital Ocean
> should absolutely work the way the first 2GB node does, there.  But I was
> able to make enough sense of your chef recipe to adapt what you were
> showing me.
>
> Thanks again!
> Tim
>
> On Fri, Feb 20, 2015 at 10:31 PM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> The most important things to note:
>>> - don't include JNA (it needs to lock pages larger than what will be
>>> available)
>>> - turn down threadpools for transports
>>> - turn compaction throughput way down
>>> - make concurrent reads and writes very small
>>> I have used the above run a healthy 5 node clusters locally in it's own
>>> private network with a 6th monitoring server for light to moderate local
>>> testing in 16g of laptop ram. YMMV but it is possible.
>>
>>
>> Thanks!! That was very helpful. I just tried applying your suggestions to
>> my cassandra.yaml file. I used the info from your chef recipe. Well like
>> I've been saying typically it takes about 5 hours or so for this situation
>> to shake itself out. I'll provide an update to the list once I have a
>> better idea of how this is working.
>>
>> Thanks again!
>> Tim
>>
>> On Fri, Feb 20, 2015 at 9:37 PM, Nate McCall <na...@thelastpickle.com>
>> wrote:
>>
>>> I frequently test with multi-node vagrant-based clusters locally. The
>>> following chef attributes should give you an idea of what to turn down in
>>> cassandra.yaml and cassandra-env.sh to build a decent testing cluster:
>>>
>>>           :cassandra => {'cluster_name' => 'VerifyCluster',
>>>                          'package_name' => 'dsc20',
>>>                          'version' => '2.0.11',
>>>                          'release' => '1',
>>>                          'setup_jna' => false,
>>>                          'max_heap_size' => '512M',
>>>                          'heap_new_size' => '100M',
>>>                          'initial_token' => server['initial_token'],
>>>                          'seeds' => "192.168.33.10",
>>>                          'listen_address' => server['ip'],
>>>                          'broadcast_address' => server['ip'],
>>>                          'rpc_address' => server['ip'],
>>>                          'concurrent_reads' => "2",
>>>                          'concurrent_writes' => "2",
>>>                          'memtable_flush_queue_size' => "2",
>>>                          'compaction_throughput_mb_per_sec' => "8",
>>>                          'key_cache_size_in_mb' => "4",
>>>                          'key_cache_save_period' => "0",
>>>                          'native_transport_min_threads' => "2",
>>>                          'native_transport_max_threads' => "4",
>>>                          'notify_restart' => true,
>>>                          'reporter' => {
>>>                            'riemann' => {
>>>                              'enable' => true,
>>>                              'host' => '192.168.33.51'
>>>                            },
>>>                            'graphite' => {
>>>                              'enable' => true,
>>>                              'host' => '192.168.33.51'
>>>                            }
>>>                          }
>>>                        },
>>>
>>> The most important things to note:
>>> - don't include JNA (it needs to lock pages larger than what will be
>>> available)
>>> - turn down threadpools for transports
>>> - turn compaction throughput way down
>>> - make concurrent reads and writes very small
>>>
>>> I have used the above run a healthy 5 node clusters locally in it's own
>>> private network with a 6th monitoring server for light to moderate local
>>> testing in 16g of laptop ram. YMMV but it is possible.
>>>
>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>


-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
<http://twitter.com/instaclustr> | (650) 284 9692

Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
>
> You might also have some gains setting in_memory_compaction_limit_in_mb
> to something very low to force Cassandra to use on disk compaction rather
> than doing it in memory.


Cool Ben.. thanks I'll add that to my config as well.

Glad that helped. Thanks for reporting back!


No problem, Nate! That's the least I can do. All I can hope is that this
thread adds to the overall fund of knowledge for the list.

Cheers,
Tim



On Mon, Feb 23, 2015 at 11:46 AM, Nate McCall <na...@thelastpickle.com>
wrote:

> Glad that helped. Thanks for reporting back!
>
> On Sun, Feb 22, 2015 at 9:12 PM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> Nate,
>>
>>  Definitely thank you for this advice. After leaving the new Cassandra
>> node running on the 2GB instance for the past couple of days, I think I've
>> had ample reason to report complete success in getting it stabilized on
>> that instance! Here are the changes I've been able to make:
>>
>>  I think manipulating the key cache and other stuff like concurrent
>> writes and some of the other stuff I worked on based on that thread from
>> the cassandra list definitely was key in getting Cassandra to work on the
>> new instance.
>>
>> Check out the before and after (before working/ after working):
>>
>> Before in cassandra-env.sh:
>>    MAX_HEAP_SIZE="800M"
>>    HEAP_NEWSIZE="200M"
>>
>> After:
>>     MAX_HEAP_SIZE="512M"
>>     HEAP_NEWSIZE="100M"
>>
>> And before in the cassandra.yaml file:
>>
>>    concurrent_writes: 32
>>    compaction_throughput_mb_per_sec: 16
>>    key_cache_size_in_mb:
>>    key_cache_save_period: 14400
>>    # native_transport_max_threads: 128
>>
>> And after:
>>
>>     concurrent_writes: 2
>>     compaction_throughput_mb_per_sec: 8
>>     key_cache_size_in_mb: 4
>>     key_cache_save_period: 0
>>     native_transport_max_threads: 4
>>
>>
>> That really made the difference. I'm a puppet user, so these changes are
>> in puppet. So any new 2GB instances I should bring up on Digital Ocean
>> should absolutely work the way the first 2GB node does, there.  But I was
>> able to make enough sense of your chef recipe to adapt what you were
>> showing me.
>>
>> Thanks again!
>> Tim
>>
>> On Fri, Feb 20, 2015 at 10:31 PM, Tim Dunphy <bl...@gmail.com>
>> wrote:
>>
>>> The most important things to note:
>>>> - don't include JNA (it needs to lock pages larger than what will be
>>>> available)
>>>> - turn down threadpools for transports
>>>> - turn compaction throughput way down
>>>> - make concurrent reads and writes very small
>>>> I have used the above run a healthy 5 node clusters locally in it's own
>>>> private network with a 6th monitoring server for light to moderate local
>>>> testing in 16g of laptop ram. YMMV but it is possible.
>>>
>>>
>>> Thanks!! That was very helpful. I just tried applying your suggestions
>>> to my cassandra.yaml file. I used the info from your chef recipe. Well like
>>> I've been saying typically it takes about 5 hours or so for this situation
>>> to shake itself out. I'll provide an update to the list once I have a
>>> better idea of how this is working.
>>>
>>> Thanks again!
>>> Tim
>>>
>>> On Fri, Feb 20, 2015 at 9:37 PM, Nate McCall <na...@thelastpickle.com>
>>> wrote:
>>>
>>>> I frequently test with multi-node vagrant-based clusters locally. The
>>>> following chef attributes should give you an idea of what to turn down in
>>>> cassandra.yaml and cassandra-env.sh to build a decent testing cluster:
>>>>
>>>>           :cassandra => {'cluster_name' => 'VerifyCluster',
>>>>                          'package_name' => 'dsc20',
>>>>                          'version' => '2.0.11',
>>>>                          'release' => '1',
>>>>                          'setup_jna' => false,
>>>>                          'max_heap_size' => '512M',
>>>>                          'heap_new_size' => '100M',
>>>>                          'initial_token' => server['initial_token'],
>>>>                          'seeds' => "192.168.33.10",
>>>>                          'listen_address' => server['ip'],
>>>>                          'broadcast_address' => server['ip'],
>>>>                          'rpc_address' => server['ip'],
>>>>                          'concurrent_reads' => "2",
>>>>                          'concurrent_writes' => "2",
>>>>                          'memtable_flush_queue_size' => "2",
>>>>                          'compaction_throughput_mb_per_sec' => "8",
>>>>                          'key_cache_size_in_mb' => "4",
>>>>                          'key_cache_save_period' => "0",
>>>>                          'native_transport_min_threads' => "2",
>>>>                          'native_transport_max_threads' => "4",
>>>>                          'notify_restart' => true,
>>>>                          'reporter' => {
>>>>                            'riemann' => {
>>>>                              'enable' => true,
>>>>                              'host' => '192.168.33.51'
>>>>                            },
>>>>                            'graphite' => {
>>>>                              'enable' => true,
>>>>                              'host' => '192.168.33.51'
>>>>                            }
>>>>                          }
>>>>                        },
>>>>
>>>> The most important things to note:
>>>> - don't include JNA (it needs to lock pages larger than what will be
>>>> available)
>>>> - turn down threadpools for transports
>>>> - turn compaction throughput way down
>>>> - make concurrent reads and writes very small
>>>>
>>>> I have used the above run a healthy 5 node clusters locally in it's own
>>>> private network with a 6th monitoring server for light to moderate local
>>>> testing in 16g of laptop ram. YMMV but it is possible.
>>>>
>>>
>>>
>>>
>>> --
>>> GPG me!!
>>>
>>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>>
>>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Nate McCall <na...@thelastpickle.com>.
Glad that helped. Thanks for reporting back!

On Sun, Feb 22, 2015 at 9:12 PM, Tim Dunphy <bl...@gmail.com> wrote:

> Nate,
>
>  Definitely thank you for this advice. After leaving the new Cassandra
> node running on the 2GB instance for the past couple of days, I think I've
> had ample reason to report complete success in getting it stabilized on
> that instance! Here are the changes I've been able to make:
>
>  I think manipulating the key cache and other stuff like concurrent writes
> and some of the other stuff I worked on based on that thread from the
> cassandra list definitely was key in getting Cassandra to work on the new
> instance.
>
> Check out the before and after (before working/ after working):
>
> Before in cassandra-env.sh:
>    MAX_HEAP_SIZE="800M"
>    HEAP_NEWSIZE="200M"
>
> After:
>     MAX_HEAP_SIZE="512M"
>     HEAP_NEWSIZE="100M"
>
> And before in the cassandra.yaml file:
>
>    concurrent_writes: 32
>    compaction_throughput_mb_per_sec: 16
>    key_cache_size_in_mb:
>    key_cache_save_period: 14400
>    # native_transport_max_threads: 128
>
> And after:
>
>     concurrent_writes: 2
>     compaction_throughput_mb_per_sec: 8
>     key_cache_size_in_mb: 4
>     key_cache_save_period: 0
>     native_transport_max_threads: 4
>
>
> That really made the difference. I'm a puppet user, so these changes are
> in puppet. So any new 2GB instances I should bring up on Digital Ocean
> should absolutely work the way the first 2GB node does, there.  But I was
> able to make enough sense of your chef recipe to adapt what you were
> showing me.
>
> Thanks again!
> Tim
>
> On Fri, Feb 20, 2015 at 10:31 PM, Tim Dunphy <bl...@gmail.com> wrote:
>
>> The most important things to note:
>>> - don't include JNA (it needs to lock pages larger than what will be
>>> available)
>>> - turn down threadpools for transports
>>> - turn compaction throughput way down
>>> - make concurrent reads and writes very small
>>> I have used the above run a healthy 5 node clusters locally in it's own
>>> private network with a 6th monitoring server for light to moderate local
>>> testing in 16g of laptop ram. YMMV but it is possible.
>>
>>
>> Thanks!! That was very helpful. I just tried applying your suggestions to
>> my cassandra.yaml file. I used the info from your chef recipe. Well like
>> I've been saying typically it takes about 5 hours or so for this situation
>> to shake itself out. I'll provide an update to the list once I have a
>> better idea of how this is working.
>>
>> Thanks again!
>> Tim
>>
>> On Fri, Feb 20, 2015 at 9:37 PM, Nate McCall <na...@thelastpickle.com>
>> wrote:
>>
>>> I frequently test with multi-node vagrant-based clusters locally. The
>>> following chef attributes should give you an idea of what to turn down in
>>> cassandra.yaml and cassandra-env.sh to build a decent testing cluster:
>>>
>>>           :cassandra => {'cluster_name' => 'VerifyCluster',
>>>                          'package_name' => 'dsc20',
>>>                          'version' => '2.0.11',
>>>                          'release' => '1',
>>>                          'setup_jna' => false,
>>>                          'max_heap_size' => '512M',
>>>                          'heap_new_size' => '100M',
>>>                          'initial_token' => server['initial_token'],
>>>                          'seeds' => "192.168.33.10",
>>>                          'listen_address' => server['ip'],
>>>                          'broadcast_address' => server['ip'],
>>>                          'rpc_address' => server['ip'],
>>>                          'concurrent_reads' => "2",
>>>                          'concurrent_writes' => "2",
>>>                          'memtable_flush_queue_size' => "2",
>>>                          'compaction_throughput_mb_per_sec' => "8",
>>>                          'key_cache_size_in_mb' => "4",
>>>                          'key_cache_save_period' => "0",
>>>                          'native_transport_min_threads' => "2",
>>>                          'native_transport_max_threads' => "4",
>>>                          'notify_restart' => true,
>>>                          'reporter' => {
>>>                            'riemann' => {
>>>                              'enable' => true,
>>>                              'host' => '192.168.33.51'
>>>                            },
>>>                            'graphite' => {
>>>                              'enable' => true,
>>>                              'host' => '192.168.33.51'
>>>                            }
>>>                          }
>>>                        },
>>>
>>> The most important things to note:
>>> - don't include JNA (it needs to lock pages larger than what will be
>>> available)
>>> - turn down threadpools for transports
>>> - turn compaction throughput way down
>>> - make concurrent reads and writes very small
>>>
>>> I have used the above to run healthy 5-node clusters locally in their own
>>> private network with a 6th monitoring server for light to moderate local
>>> testing in 16 GB of laptop RAM. YMMV, but it is possible.
>>>
>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>


-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
Nate,

 Thank you for this advice. After leaving the new Cassandra node running on
the 2GB instance for the past couple of days, I think I have ample reason to
report complete success in getting it stabilized on that instance! Here are
the changes I've been able to make:

 I think tuning the key cache, concurrent writes, and the other settings I
worked on based on that thread from the Cassandra list is what got Cassandra
working on the new instance.

Check out the before and after (before working/ after working):

Before in cassandra-env.sh:
   MAX_HEAP_SIZE="800M"
   HEAP_NEWSIZE="200M"

After:
    MAX_HEAP_SIZE="512M"
    HEAP_NEWSIZE="100M"

And before in the cassandra.yaml file:

   concurrent_writes: 32
   compaction_throughput_mb_per_sec: 16
   key_cache_size_in_mb:
   key_cache_save_period: 14400
   # native_transport_max_threads: 128

And after:

    concurrent_writes: 2
    compaction_throughput_mb_per_sec: 8
    key_cache_size_in_mb: 4
    key_cache_save_period: 0
    native_transport_max_threads: 4


That really made the difference. I'm a Puppet user, so these changes live in
Puppet, and any new 2GB instances I bring up on Digital Ocean should work the
same way the first 2GB node does there. I was able to make enough sense of
your Chef recipe to adapt what you were showing me.
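
In case it helps anyone else doing this with Puppet, here's a rough sketch of
how those overrides can be expressed. The class name and config paths are just
examples for my layout, and it assumes the file_line resource from the
puppetlabs-stdlib module:

class cassandra_small_instance {
  # Example paths; adjust to wherever your packages put the config.
  $yaml = '/etc/cassandra/conf/cassandra.yaml'
  $env  = '/etc/cassandra/conf/cassandra-env.sh'

  file_line { 'max_heap_size':
    path  => $env,
    line  => 'MAX_HEAP_SIZE="512M"',
    match => '^#?MAX_HEAP_SIZE=',
  }

  file_line { 'heap_newsize':
    path  => $env,
    line  => 'HEAP_NEWSIZE="100M"',
    match => '^#?HEAP_NEWSIZE=',
  }

  file_line { 'concurrent_writes':
    path  => $yaml,
    line  => 'concurrent_writes: 2',
    match => '^#?concurrent_writes:',
  }

  file_line { 'key_cache_size_in_mb':
    path  => $yaml,
    line  => 'key_cache_size_in_mb: 4',
    match => '^#?key_cache_size_in_mb:',
  }

  file_line { 'native_transport_max_threads':
    path  => $yaml,
    line  => 'native_transport_max_threads: 4',
    match => '^#? ?native_transport_max_threads:',
  }
}

A real manifest would also notify the Cassandra service so the node restarts
whenever one of these lines changes.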

Thanks again!
Tim

On Fri, Feb 20, 2015 at 10:31 PM, Tim Dunphy <bl...@gmail.com> wrote:

> The most important things to note:
>> - don't include JNA (it needs to lock pages larger than what will be
>> available)
>> - turn down threadpools for transports
>> - turn compaction throughput way down
>> - make concurrent reads and writes very small
>> I have used the above to run healthy 5-node clusters locally in their own
>> private network with a 6th monitoring server for light to moderate local
>> testing in 16 GB of laptop RAM. YMMV, but it is possible.
>
>
> Thanks!! That was very helpful. I just tried applying your suggestions to
> my cassandra.yaml file, using the info from your Chef recipe. Like I've been
> saying, it typically takes about 5 hours or so for this situation to shake
> itself out, so I'll provide an update to the list once I have a better idea
> of how this is working.
>
> Thanks again!
> Tim
>
> On Fri, Feb 20, 2015 at 9:37 PM, Nate McCall <na...@thelastpickle.com>
> wrote:
>
>> I frequently test with multi-node vagrant-based clusters locally. The
>> following chef attributes should give you an idea of what to turn down in
>> cassandra.yaml and cassandra-env.sh to build a decent testing cluster:
>>
>>           :cassandra => {'cluster_name' => 'VerifyCluster',
>>                          'package_name' => 'dsc20',
>>                          'version' => '2.0.11',
>>                          'release' => '1',
>>                          'setup_jna' => false,
>>                          'max_heap_size' => '512M',
>>                          'heap_new_size' => '100M',
>>                          'initial_token' => server['initial_token'],
>>                          'seeds' => "192.168.33.10",
>>                          'listen_address' => server['ip'],
>>                          'broadcast_address' => server['ip'],
>>                          'rpc_address' => server['ip'],
>>                          'concurrent_reads' => "2",
>>                          'concurrent_writes' => "2",
>>                          'memtable_flush_queue_size' => "2",
>>                          'compaction_throughput_mb_per_sec' => "8",
>>                          'key_cache_size_in_mb' => "4",
>>                          'key_cache_save_period' => "0",
>>                          'native_transport_min_threads' => "2",
>>                          'native_transport_max_threads' => "4",
>>                          'notify_restart' => true,
>>                          'reporter' => {
>>                            'riemann' => {
>>                              'enable' => true,
>>                              'host' => '192.168.33.51'
>>                            },
>>                            'graphite' => {
>>                              'enable' => true,
>>                              'host' => '192.168.33.51'
>>                            }
>>                          }
>>                        },
>>
>> The most important things to note:
>> - don't include JNA (it needs to lock pages larger than what will be
>> available)
>> - turn down threadpools for transports
>> - turn compaction throughput way down
>> - make concurrent reads and writes very small
>>
>> I have used the above to run healthy 5-node clusters locally in their own
>> private network with a 6th monitoring server for light to moderate local
>> testing in 16 GB of laptop RAM. YMMV, but it is possible.
>>
>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
>
> The most important things to note:
> - don't include JNA (it needs to lock pages larger than what will be
> available)
> - turn down threadpools for transports
> - turn compaction throughput way down
> - make concurrent reads and writes very small
> I have used the above to run healthy 5-node clusters locally in their own
> private network with a 6th monitoring server for light to moderate local
> testing in 16 GB of laptop RAM. YMMV, but it is possible.


Thanks!! That was very helpful. I just tried applying your suggestions to my
cassandra.yaml file, using the info from your Chef recipe. Like I've been
saying, it typically takes about 5 hours or so for this situation to shake
itself out, so I'll provide an update to the list once I have a better idea
of how this is working.
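
As a quick sanity check once the node is back up (just a sketch; it assumes
nodetool is on the PATH), I'm verifying that the smaller heap actually took
effect:

# heap usage and maximum as the running JVM reports them
nodetool info | grep -i heap
# the -Xms/-Xmx flags the process was actually started with
ps -ef | grep '[c]assandra' | tr ' ' '\n' | grep -E '^-Xm[sx]'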

Thanks again!
Tim

On Fri, Feb 20, 2015 at 9:37 PM, Nate McCall <na...@thelastpickle.com> wrote:

> I frequently test with multi-node vagrant-based clusters locally. The
> following chef attributes should give you an idea of what to turn down in
> cassandra.yaml and cassandra-env.sh to build a decent testing cluster:
>
>           :cassandra => {'cluster_name' => 'VerifyCluster',
>                          'package_name' => 'dsc20',
>                          'version' => '2.0.11',
>                          'release' => '1',
>                          'setup_jna' => false,
>                          'max_heap_size' => '512M',
>                          'heap_new_size' => '100M',
>                          'initial_token' => server['initial_token'],
>                          'seeds' => "192.168.33.10",
>                          'listen_address' => server['ip'],
>                          'broadcast_address' => server['ip'],
>                          'rpc_address' => server['ip'],
>                          'concurrent_reads' => "2",
>                          'concurrent_writes' => "2",
>                          'memtable_flush_queue_size' => "2",
>                          'compaction_throughput_mb_per_sec' => "8",
>                          'key_cache_size_in_mb' => "4",
>                          'key_cache_save_period' => "0",
>                          'native_transport_min_threads' => "2",
>                          'native_transport_max_threads' => "4",
>                          'notify_restart' => true,
>                          'reporter' => {
>                            'riemann' => {
>                              'enable' => true,
>                              'host' => '192.168.33.51'
>                            },
>                            'graphite' => {
>                              'enable' => true,
>                              'host' => '192.168.33.51'
>                            }
>                          }
>                        },
>
> The most important things to note:
> - don't include JNA (it needs to lock pages larger than what will be
> available)
> - turn down threadpools for transports
> - turn compaction throughput way down
> - make concurrent reads and writes very small
>
> I have used the above to run healthy 5-node clusters locally in their own
> private network with a 6th monitoring server for light to moderate local
> testing in 16 GB of laptop RAM. YMMV, but it is possible.
>



-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Nate McCall <na...@thelastpickle.com>.
I frequently test with multi-node vagrant-based clusters locally. The
following chef attributes should give you an idea of what to turn down in
cassandra.yaml and cassandra-env.sh to build a decent testing cluster:

          :cassandra => {'cluster_name' => 'VerifyCluster',
                         'package_name' => 'dsc20',
                         'version' => '2.0.11',
                         'release' => '1',
                         'setup_jna' => false,
                         'max_heap_size' => '512M',
                         'heap_new_size' => '100M',
                         'initial_token' => server['initial_token'],
                         'seeds' => "192.168.33.10",
                         'listen_address' => server['ip'],
                         'broadcast_address' => server['ip'],
                         'rpc_address' => server['ip'],
                         'concurrent_reads' => "2",
                         'concurrent_writes' => "2",
                         'memtable_flush_queue_size' => "2",
                         'compaction_throughput_mb_per_sec' => "8",
                         'key_cache_size_in_mb' => "4",
                         'key_cache_save_period' => "0",
                         'native_transport_min_threads' => "2",
                         'native_transport_max_threads' => "4",
                         'notify_restart' => true,
                         'reporter' => {
                           'riemann' => {
                             'enable' => true,
                             'host' => '192.168.33.51'
                           },
                           'graphite' => {
                             'enable' => true,
                             'host' => '192.168.33.51'
                           }
                         }
                       },

The most important things to note:
- don't include JNA (it needs to lock pages larger than what will be
available)
- turn down threadpools for transports
- turn compaction throughput way down
- make concurrent reads and writes very small

I have used the above to run healthy 5-node clusters locally in their own
private network with a 6th monitoring server for light to moderate local
testing in 16 GB of laptop RAM. YMMV, but it is possible.
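
For anyone not using Chef, those attributes map to roughly the following (a
sketch; double-check the option names against the default config files that
ship with your version):

In cassandra-env.sh:
    MAX_HEAP_SIZE="512M"
    HEAP_NEWSIZE="100M"

In cassandra.yaml:
    concurrent_reads: 2
    concurrent_writes: 2
    memtable_flush_queue_size: 2
    compaction_throughput_mb_per_sec: 8
    key_cache_size_in_mb: 4
    key_cache_save_period: 0
    native_transport_min_threads: 2
    native_transport_max_threads: 4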

Re: run cassandra on a small instance

Posted by Tim Dunphy <bl...@gmail.com>.
Hey guys,

 OK, well, I've experimented with this a bit, and at this point I think the
problem with Cassandra crashing on the smaller instances is probably an issue
with my data. What I did was blow away my data directory to start fresh and
then start Cassandra back up on the 2GB instance.

In addition to that, I've also fired up another Cassandra instance, this time
on a t2.micro at Amazon Web Services, again without my data, just to see
whether it would run with less memory.

On both the 2GB RAM instance at Digital Ocean and the new t2.micro instance
at AWS, Cassandra has been running for a while now without interruption, well
past the 5-hour window I had been seeing on the 2GB instance.

And with my test data loaded on the 4GB instance at Digital Ocean, it just
runs and there's no issue.

All told, I have 15 MB of test data in my keyspace. So I'm hoping that if I
show you my schema, there might be some way of understanding why the smaller
instances refuse to run for any great length of time once I introduce my
data.
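
(For reference, here's roughly how I'm measuring that size; just a sketch
that assumes nodetool is on the PATH and the default data directory:)

# per-table on-disk size as Cassandra reports it
nodetool cfstats joke_fire1 | grep -i 'space used'
# raw size of the keyspace's data directory (default location)
du -sh /var/lib/cassandra/data/joke_fire1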

CREATE KEYSPACE IF NOT EXISTS joke_fire1 WITH replication = {'class':
'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;

use joke_fire1;

CREATE TABLE IF NOT EXISTS joke_fire1.jokes (
    joke_id int PRIMARY KEY,
    fire_it int,
    joke_title text,
    joke_type int,
    long_descr text,
    long_descrlink text,
    long_descrtype int,
    posted_on timestamp,
    short_descr text,
    status int,
    user_id int,
    view_by int
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
    AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE TABLE IF NOT EXISTS joke_fire1.joke_details (
    cmt_id int PRIMARY KEY,
    cmts text,
    fireit_tag boolean,
    joke_id int,
    posted_by int,
    posted_on timestamp,
    rated_value int
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
    AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX IF NOT EXISTS joke_details_joke_id ON joke_fire1.joke_details
(joke_id);

CREATE TABLE IF NOT EXISTS joke_fire1.users (
    user_id int,
    user_name text PRIMARY KEY,
    email text,
    first_name text,
    last_name text,
    password varchar,
    city text,
    state text,
    b_date timestamp,
    gender text,
    about_u text,
    profile_image varchar,
    status text,
    added_date timestamp,
    fb_uid varchar,
    twt_uid varchar,
    open_uid varchar,
    lastactivity timestamp
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
    AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Also, because it's a test environment, the data is not seeing any kind of
heavy use. I just say that to rule out any factors that reads/writes may
introduce to the equation.

Thanks in advance for any insights you may have to offer!


Tim

On Thu, Feb 19, 2015 at 2:25 PM, Kai Wang <de...@gmail.com> wrote:

> One welcome change is http://cassandra.apache.org/ actually starts
> displaying:
>
> "Latest release *2.1.3* (Changes
> <http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3>),
> Stable release *2.0.12* (Changes
> <http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.0.12>)
> "
>
> It's better than before, when "Latest release" was the only link available
> on the home page, and naturally that's what most people downloaded.
>
> On Thu, Feb 19, 2015 at 1:57 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Wed, Feb 18, 2015 at 5:26 PM, Andrew <re...@gmail.com> wrote:
>>
>>> Let me know if I’m off base about this—but I feel like I see a lot of
>>> posts that are like this (i.e., use this arbitrary version, not this other
>>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>>> seems like a very confusing way for new (and existing) users to approach
>>> versions...
>>>
>>
>> In my opinion and in no way speaking for or representing Apache
>> Cassandra, Datastax, or anyone else :
>>
>> I think it's a problem of messaging, and a mismatch of expectations
>> between the development team and operators.
>>
>> I think the "stable" versions are stable by the dev team's standards, and
>> not by operators' standards. While testing has historically been IMO
>> insufficient for a data-store (where correctness really matters) there are
>> also various issues which probably can not realistically be detected in
>> testing. Of course, operators need to be willing to operate (ideally in
>> non-production) near the cutting edge in order to assist in the detection
>> and resolution of these bugs, but I think the project does itself a
>> disservice by encouraging noobs to run these versions. You only get one
>> chance to make a first impression, as the saying goes.
>>
>> My ideal messaging would probably say something like "versions near the
>> cutting edge should be treated cautiously, conservative operators should
>> run mature point releases in production and only upgrade to near the
>> cutting edge after extended burn-in in dev/QA/stage environments."
>>
>> A fair response to this critique is that operators should know better
>> than to trust that x.y.0-5 release versions of any open source software are
>> likely to be production ready, even if the website says "stable" next to
>> the download. Trust, but verify?
>>
>> =Rob
>>
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Re: run cassandra on a small instance

Posted by Kai Wang <de...@gmail.com>.
One welcome change is http://cassandra.apache.org/ actually starts
displaying:

"Latest release *2.1.3* (Changes
<http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3>),
Stable release *2.0.12* (Changes
<http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.0.12>)
"

It's better than before, when "Latest release" was the only link available
on the home page, and naturally that's what most people downloaded.

On Thu, Feb 19, 2015 at 1:57 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Feb 18, 2015 at 5:26 PM, Andrew <re...@gmail.com> wrote:
>
>> Let me know if I’m off base about this—but I feel like I see a lot of
>> posts that are like this (i.e., use this arbitrary version, not this other
>> arbitrary version).  Why are releases going out if they’re “broken”?  This
>> seems like a very confusing way for new (and existing) users to approach
>> versions...
>>
>
> In my opinion and in no way speaking for or representing Apache Cassandra,
> Datastax, or anyone else :
>
> I think it's a problem of messaging, and a mismatch of expectations
> between the development team and operators.
>
> I think the "stable" versions are stable by the dev team's standards, and
> not by operators' standards. While testing has historically been IMO
> insufficient for a data-store (where correctness really matters) there are
> also various issues which probably can not realistically be detected in
> testing. Of course, operators need to be willing to operate (ideally in
> non-production) near the cutting edge in order to assist in the detection
> and resolution of these bugs, but I think the project does itself a
> disservice by encouraging noobs to run these versions. You only get one
> chance to make a first impression, as the saying goes.
>
> My ideal messaging would probably say something like "versions near the
> cutting edge should be treated cautiously, conservative operators should
> run mature point releases in production and only upgrade to near the
> cutting edge after extended burn-in in dev/QA/stage environments."
>
> A fair response to this critique is that operators should know better than
> to trust that x.y.0-5 release versions of any open source software are
> likely to be production ready, even if the website says "stable" next to
> the download. Trust, but verify?
>
> =Rob
>

Re: run cassandra on a small instance

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Feb 18, 2015 at 5:26 PM, Andrew <re...@gmail.com> wrote:

> Let me know if I’m off base about this—but I feel like I see a lot of
> posts that are like this (i.e., use this arbitrary version, not this other
> arbitrary version).  Why are releases going out if they’re “broken”?  This
> seems like a very confusing way for new (and existing) users to approach
> versions...
>

In my opinion and in no way speaking for or representing Apache Cassandra,
Datastax, or anyone else :

I think it's a problem of messaging, and a mismatch of expectations between
the development team and operators.

I think the "stable" versions are stable by the dev team's standards, and
not by operators' standards. While testing has historically been IMO
insufficient for a data-store (where correctness really matters) there are
also various issues which probably can not realistically be detected in
testing. Of course, operators need to be willing to operate (ideally in
non-production) near the cutting edge in order to assist in the detection
and resolution of these bugs, but I think the project does itself a
disservice by encouraging noobs to run these versions. You only get one
chance to make a first impression, as the saying goes.

My ideal messaging would probably say something like "versions near the
cutting edge should be treated cautiously, conservative operators should
run mature point releases in production and only upgrade to near the
cutting edge after extended burn-in in dev/QA/stage environments."

A fair response to this critique is that operators should know better than
to trust that x.y.0-5 release versions of any open source software are
likely to be production ready, even if the website says "stable" next to
the download. Trust, but verify?

=Rob

Re: run cassandra on a small instance

Posted by Andrew <re...@gmail.com>.
Robert,

Let me know if I’m off base about this—but I feel like I see a lot of posts that are like this (i.e., use this arbitrary version, not this other arbitrary version).  Why are releases going out if they’re “broken”?  This seems like a very confusing way for new (and existing) users to approach versions...

Andrew

On February 18, 2015 at 5:16:27 PM, Robert Coli (rcoli@eventbrite.com) wrote:

On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com> wrote:
I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance over at Digital Ocean. It's a CentOS 7 host.

2.1.2 is IMO broken and should not be used for any purpose.

Use 2.1.1 or 2.1.3.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

=Rob
 

Re: run cassandra on a small instance

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy <bl...@gmail.com> wrote:

> I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
> over at Digital Ocean. It's a CentOS 7 host.
>

2.1.2 is IMO broken and should not be used for any purpose.

Use 2.1.1 or 2.1.3.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

=Rob