Posted to user@cassandra.apache.org by Roshan Dawrani <ro...@gmail.com> on 2011/01/20 12:46:06 UTC

Embedded Cassandra server startup question

Hi,

I am using Cassandra for a Grails application and in that I start the
embedded server when the Spring application context gets built.

When I run my Grails app test suite, it first runs the integration and then
the functional test suite, and it builds the application context individually
for each phase.

When it brings up the embedded Cassandra server in the 2nd phase (for
functional tests), it fails saying "*Attempt to assign id to existing column
family.*"

Anyone familiar with this error? Is it because both the test phases are
executed in the same JVM instance and there is some Cassandra meta-data from
phase 1 server start that is affecting the server startup in 2nd phase?

Any way I can cleanly start the server 2 times in my case? Any other
suggestion? Thanks.

-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani <http://twitter.com/roshandawrani>
Skype: roshandawrani

Re: Embedded Cassandra server startup question

Posted by Anand Somani <me...@gmail.com>.
It is a little slow, but not to the point where it concerns me (I only have
a few tests for now), and it keeps things very clean, so there are no
surprise effects.



On Thu, Jan 20, 2011 at 6:33 PM, Roshan Dawrani <ro...@gmail.com>wrote:

> On Fri, Jan 21, 2011 at 5:14 AM, Anand Somani <me...@gmail.com>wrote:
>
>> Here is what worked for me, I use testNg, and initialize and createschema
>> in the @BeforeClass for each test
>>
>>    - In the @AfterClass, I had to drop schema, otherwise I was getting
>>    the same exception.
>>    - After this I started getting port conflict with the second test, so
>>    I added my own version of EmbeddedCass.. class, added a stop which calls a
>>    stop on the cassandradaemon (which from code comments seems to close the
>>    thrift port)
>>
> How was this clean-up experience, Anand? Shutting down the cassandra
> daemon and dropping and creating schema between tests? Sounds like something
> that could be time consuming.
>
> I am currently firing all-deletes on all my CFs and am looking for more
> efficient ways to have data cleaned-up between tests.
>
> Thanks.
>

Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
On Fri, Jan 21, 2011 at 5:14 AM, Anand Somani <me...@gmail.com> wrote:

> Here is what worked for me, I use testNg, and initialize and createschema
> in the @BeforeClass for each test
>
>    - In the @AfterClass, I had to drop schema, otherwise I was getting the
>    same exception.
>    - After this I started getting port conflict with the second test, so I
>    added my own version of EmbeddedCass.. class, added a stop which calls a
>    stop on the cassandradaemon (which from code comments seems to close the
>    thrift port)
>
How was this clean-up experience, Anand? Shutting down the cassandra daemon
and dropping and creating schema between tests? Sounds like something that
could be time consuming.

I am currently firing all-deletes on all my CFs and am looking for more
efficient ways to have data cleaned-up between tests.

Thanks.

Re: Embedded Cassandra server startup question

Posted by Anand Somani <me...@gmail.com>.
Here is what worked for me: I use TestNG, and I initialize the server and
create the schema in the @BeforeClass for each test.

   - In the @AfterClass, I had to drop the schema, otherwise I was getting
   the same exception.
   - After this I started getting a port conflict with the second test, so I
   added my own version of the EmbeddedCass.. class and added a stop which
   calls stop on the cassandradaemon (which, from the code comments, seems to
   close the thrift port)
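
The two steps above can be sketched without any Cassandra dependency. This is only a model, assuming a hypothetical `FakeEmbeddedServer` that tracks bound ports and schema in static state; none of these class or method names come from Cassandra, Hector, or TestNG. It shows why both the schema drop and the daemon stop are needed before the next test class starts in the same JVM.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for an embedded server wrapper.
// None of these names come from Cassandra or Hector.
final class FakeEmbeddedServer {
    private static final Set<Integer> BOUND_PORTS = new HashSet<>(); // models ports held by the JVM
    private static final Set<String> SCHEMA = new HashSet<>();       // models static schema state
    private final int port;

    FakeEmbeddedServer(int port) { this.port = port; }

    void start() {
        if (!BOUND_PORTS.add(port)) {
            throw new IllegalStateException("port " + port + " already in use");
        }
    }

    void stop() { BOUND_PORTS.remove(port); }

    void createSchema(String columnFamily) {
        if (!SCHEMA.add(columnFamily)) {
            throw new IllegalStateException("column family " + columnFamily + " already exists");
        }
    }

    void dropSchema(String columnFamily) { SCHEMA.remove(columnFamily); }
}

// One "test class". With TestNG, beforeClass/afterClass would carry
// @BeforeClass and @AfterClass; here they are called by hand.
final class LifecycleSketch {
    private final FakeEmbeddedServer server = new FakeEmbeddedServer(9160);

    void beforeClass() { server.start(); server.createSchema("Users"); }

    void afterClass() { server.dropSchema("Users"); server.stop(); }

    static void runTestClass() {
        LifecycleSketch test = new LifecycleSketch();
        test.beforeClass();
        try {
            // ... test methods would run here ...
        } finally {
            test.afterClass(); // skip this and the next class fails in beforeClass
        }
    }
}
```

Calling `LifecycleSketch.runTestClass()` twice in a row succeeds only because `afterClass` releases both the schema entry and the port.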


On Thu, Jan 20, 2011 at 1:32 PM, Aaron Morton <aa...@thelastpickle.com>wrote:

> Do you have a full error stack?
>
> That error is raised when the schema is added to an internal static map.
> There is a lot of static state so it's probably going to make your life
> easier if you can avoid reusing the JVM.
>
> I'm guessing your error comes from AbstractCassandraDaemon.setup() calling
> DatabaseDescriptor.loadSchemas(). It may be possible to work around this
> issue, but I don't have time today. Let me know how you get on.
>
> Aaron
>
>
> On 21/01/2011, at 12:46 AM, Roshan Dawrani <ro...@gmail.com>
> wrote:
>
> Hi,
>
> I am using Cassandra for a Grails application and in that I start the
> embedded server when the Spring application context gets built.
>
> When I run my Grails app test suite, it first runs the integration and
> then the functional test suite, and it builds the application context
> individually for each phase.
>
> When it brings up the embedded Cassandra server in the 2nd phase (for
> functional tests), it fails saying "*Attempt to assign id to existing
> column family.*"
>
> Anyone familiar with this error? Is it because both the test phases are
> executed in the same JVM instance and there is some Cassandra meta-data from
> phase 1 server start that is affecting the server startup in 2nd phase?
>
> Any way I can cleanly start the server 2 times in my case? Any other
> suggestion? Thanks.
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani <http://twitter.com/roshandawrani>
> Skype: roshandawrani
>
>

Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
Ok, got a Cassandra client from Hector and changed my clean-up to be
truncate() based.

Here is how I did it, if it could be any use to anyone:

=============================================
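// Package locations as of Hector 0.7.0-x
import me.prettyprint.cassandra.connection.*
import org.apache.cassandra.thrift.Cassandra

// Reach the raw Thrift client through Hector's connection manager
HConnectionManager connectionManager = cassandraCluster.connectionManager
Collection<ConcurrentHClientPool> activePools = connectionManager.activePools

// Any active pool will do; borrow one client from it
ConcurrentHClientPool pool = activePools.iterator().next()
HThriftClient client = pool.borrowClient()
try {
    Cassandra.Client c = client.getCassandra()
    c.set_keyspace(keyspaceName)

    // cfsToTruncate: names of the column families to empty between tests
    cfsToTruncate.each { cf ->
        c.truncate(cf)
    }
} finally {
    pool.releaseClient(client)   // return the borrowed client to the pool
}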
=============================================

Thanks to everyone who shared their inputs.
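
For anyone wanting to reuse the truncate-based clean-up above across many tests, here is a minimal sketch of a helper that keeps going when one column family fails and reports which ones did. `Truncator` and `TestCleanup` are hypothetical names introduced for illustration; in real use the `Truncator` would delegate to the Thrift `Cassandra.Client.truncate(String)` call shown in the snippet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam: in real use this would wrap the Thrift
// Cassandra.Client.truncate(String) call from the snippet above.
interface Truncator {
    void truncate(String columnFamily) throws Exception;
}

final class TestCleanup {
    // Truncates every column family in order, continuing past failures,
    // and returns the names of those that could not be truncated.
    static List<String> truncateAll(Truncator truncator, List<String> columnFamilies) {
        List<String> failed = new ArrayList<>();
        for (String cf : columnFamilies) {
            try {
                truncator.truncate(cf);
            } catch (Exception e) {
                failed.add(cf);
            }
        }
        return failed;
    }
}
```

With a Thrift client `c` already bound to the keyspace, a call could look like `TestCleanup.truncateAll(c::truncate, cfNames)`; a non-empty result is a reason to fail the test run rather than continue with dirty data.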


On Fri, Jan 21, 2011 at 10:35 AM, Roshan Dawrani <ro...@gmail.com>wrote:

> Back to square one on using CliMain/CliClient vs Cassandra/Hector API for
> cleanup.
>
> It seems CliClient uses Antlr 3.1+ for parsing the statements passed to it,
> but I am using Grails that uses Antlr 2.7.7 (used by groovy code parsing),
> so I can't mix the two for programmatic use.
>
> Someone please tell how I can truncate my column families in my Hector
> based environment? Does it expose a thrift Cassandra.Client somewhere so I
> can make calls that its API does not cover yet?
>
> Thanks.
>
> On Fri, Jan 21, 2011 at 9:12 AM, Roshan Dawrani <ro...@gmail.com>wrote:
>
>> On Fri, Jan 21, 2011 at 8:56 AM, Roshan Dawrani <ro...@gmail.com>wrote:
>>
>>> On Fri, Jan 21, 2011 at 8:52 AM, Maxim Potekhin <po...@bnl.gov>wrote:
>>>
>>>>  You can script the actions you need and pipe the file into
>>>> Cassandra-CLI.
>>>> Works for me.
>>>>
>>>
>>>
>> Probably CliMain / CliClient will help me there doing it as per your
>> suggestion.
>>
>> Still would like to confirm if I cannot do it through Hector API at this
>> point of time, when there  is no direct Hector API call for truncate().
>> Anyway I can still reach Cassandra's truncate() call?
>>
>> Thanks.
>>
>
>
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani <http://twitter.com/roshandawrani>
> Skype: roshandawrani
>
>

Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
Back to square one on using CliMain/CliClient vs the Cassandra/Hector API for
cleanup.

It seems CliClient uses Antlr 3.1+ for parsing the statements passed to it,
but I am using Grails, which uses Antlr 2.7.7 (for parsing Groovy code),
so I can't mix the two for programmatic use.

Could someone please tell me how I can truncate my column families in my
Hector-based environment? Does it expose a Thrift Cassandra.Client somewhere
so I can make calls that its API does not cover yet?

Thanks.

On Fri, Jan 21, 2011 at 9:12 AM, Roshan Dawrani <ro...@gmail.com>wrote:

> On Fri, Jan 21, 2011 at 8:56 AM, Roshan Dawrani <ro...@gmail.com>wrote:
>
>> On Fri, Jan 21, 2011 at 8:52 AM, Maxim Potekhin <po...@bnl.gov> wrote:
>>
>>>  You can script the actions you need and pipe the file into
>>> Cassandra-CLI.
>>> Works for me.
>>>
>>
>>
> Probably CliMain / CliClient will help me there doing it as per your
> suggestion.
>
> Still would like to confirm if I cannot do it through Hector API at this
> point of time, when there  is no direct Hector API call for truncate().
> Anyway I can still reach Cassandra's truncate() call?
>
> Thanks.
>




Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
On Fri, Jan 21, 2011 at 8:56 AM, Roshan Dawrani <ro...@gmail.com>wrote:

> On Fri, Jan 21, 2011 at 8:52 AM, Maxim Potekhin <po...@bnl.gov> wrote:
>
>>  You can script the actions you need and pipe the file into Cassandra-CLI.
>> Works for me.
>>
>
>
Probably CliMain / CliClient will help me do it as per your suggestion.

I would still like to confirm whether I can do it through the Hector API at
this point in time, when there is no direct Hector API call for truncate().
Is there any way I can still reach Cassandra's truncate() call?

Thanks.

Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
On Fri, Jan 21, 2011 at 8:52 AM, Maxim Potekhin <po...@bnl.gov> wrote:

>  You can script the actions you need and pipe the file into Cassandra-CLI.
> Works for me.
>

Thanks Maxim, but my first preference is to do it through the API rather than
launching the Cassandra-CLI process with a scripted set of actions (I assume
that is what your suggestion meant).

truncate() may work best for me, if I can get it working through Hector API
that I already use.

Re: Embedded Cassandra server startup question

Posted by Maxim Potekhin <po...@bnl.gov>.
You can script the actions you need and pipe the file into Cassandra-CLI.
Works for me.
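
A rough sketch of what that could look like when driven from a JVM test harness: build the statements as text, start the CLI as a child process, and write the script to its standard input. Everything here is an assumption rather than something stated in this thread: the class and method names are hypothetical, the `use` and `truncate` statement forms and the `-host`/`-port` options are taken from the 0.7-era CLI and should be checked against your version, and `cliPath` is wherever `cassandra-cli` lives on your machine.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class CliCleanupScript {
    // Builds the statements to feed the CLI. The "use" and "truncate" forms
    // are assumed from the 0.7-era CLI; confirm them with "help;" first.
    static String build(String keyspace, List<String> columnFamilies) {
        StringBuilder sb = new StringBuilder("use ").append(keyspace).append(";\n");
        for (String cf : columnFamilies) {
            sb.append("truncate ").append(cf).append(";\n");
        }
        return sb.toString();
    }

    // Starts the CLI and pipes the script to its stdin; returns the exit code.
    static int pipeToCli(String cliPath, String host, int port, String script)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                cliPath, "-host", host, "-port", Integer.toString(port));
        pb.redirectErrorStream(true);                        // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);  // show CLI output in our console
        Process process = pb.start();
        try (Writer stdin = new OutputStreamWriter(
                process.getOutputStream(), StandardCharsets.UTF_8)) {
            stdin.write(script);
        } // closing stdin signals end of input to the CLI
        return process.waitFor();
    }
}
```

Launching a process per clean-up is heavier than an in-process API call, so this suits a once-per-suite reset better than a per-test one.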

On 1/20/2011 10:18 PM, Roshan Dawrani wrote:
> On Fri, Jan 21, 2011 at 8:07 AM, Aaron Morton <aaron@thelastpickle.com> wrote:
>
>     There is a truncate() function that will clear a CF. It may leave
>     a snapshot around, cannot remember exactly.
>
>
> Not sure if Hector (0.7.0-22) has added truncate() to its API yet. I 
> can't find it.
>
> In Hector, I see a _dropColumnFamily()_ that goes to Cassandra's 
> _system_drop_column_family()_ call.
>
> I am not sure how this system_drop_column_family() fares in
> comparison to truncate() in terms of time the clean-up would take.
>
> I am new to Hector/Cass and all my exposure to Cass API has been 
> through Hector. So a basic question.
>
> If Hector has not provided truncate() to its API, can I bypass it and 
> make the call to Cassandra API directly? Does Hector leave any opening 
> for such bypassed calls?
>
> Thanks.


Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
On Fri, Jan 21, 2011 at 8:07 AM, Aaron Morton <aa...@thelastpickle.com>wrote:

> There is a truncate() function that will clear a CF. It may leave a
> snapshot around, cannot remember exactly.
>

Not sure if Hector (0.7.0-22) has added truncate() to its API yet. I can't
find it.

In Hector, I see a *dropColumnFamily()* that goes to Cassandra's
*system_drop_column_family()* call.

I am not sure how this system_drop_column_family() fares in comparison to
truncate() in terms of the time the clean-up would take.

I am new to Hector/Cass and all my exposure to Cass API has been through
Hector. So a basic question.

If Hector has not provided truncate() to its API, can I bypass it and make
the call to Cassandra API directly? Does Hector leave any opening for such
bypassed calls?

Thanks.

Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
On Fri, Jan 21, 2011 at 8:07 AM, Aaron Morton <aa...@thelastpickle.com>wrote:

> There is a truncate() function that will clear a CF. It may leave a
> snapshot around, cannot remember exactly.
>
> Or you could drop and recreate the keyspace between tests using
> system_add_keyspace() and system_drop_keyspace(). The system tests in the
> test/system/__init__.py sort of do this.
>

Thanks Aaron. I will check out both options. If the existing system tests
there are adding / dropping the keyspace between tests, maybe it is not a
very expensive operation after all.

At the minimum, I can replace my clean-up with truncate() calls.

Thanks a lot.

Re: Embedded Cassandra server startup question

Posted by Aaron Morton <aa...@thelastpickle.com>.
There is a truncate() function that will clear a CF. It may leave a snapshot around; I cannot remember exactly.

Or you could drop and recreate the keyspace between tests using system_add_keyspace() and system_drop_keyspace(). The system tests in test/system/__init__.py do roughly this.

Aaron

On 21 Jan, 2011, at 03:16 PM, Roshan Dawrani <ro...@gmail.com> wrote:

> On Fri, Jan 21, 2011 at 3:02 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>> Do you have a full error stack?
>>
>> That error is raised when the schema is added to an internal static map. There is a lot of static state so it's probably going to make your life easier if you can avoid reusing the JVM.
>
> Hi Aaron,
>
> Actually it is not my primary requirement to start the Embedded server twice in the same JVM. The requirement is to have the empty column families before each test so that changes made in tests do not affect each other.
>
> Keeping a single instance of the embedded server up across test phases, what would be the most efficient way to clean-up the CFs between tests?
>
> I have around 10 CFs and not too much data is generated in each test, so right now, I collect all keys from CFs and then fire a batch query to delete them.
>
> Can I improve on that clean-up process between tests?
>
>> I'm guessing your error comes from AbstractCassandraDaemon.setup() calling DatabaseDescriptor.loadSchemas().
>
> I start the embedded server using EmbeddedServerHelper.setup(). I am not directly dealing with AbstractCassandraDaemon.setup(). I guess that all happens inside EmbeddedServerHelper.

Re: Embedded Cassandra server startup question

Posted by Roshan Dawrani <ro...@gmail.com>.
On Fri, Jan 21, 2011 at 3:02 AM, Aaron Morton <aa...@thelastpickle.com>wrote:

> Do you have a full error stack?
>
> That error is raised when the schema is added to an internal static map.
> There is a lot of static state so it's probably going to make your life
> easier if you can avoid reusing the JVM.
>
>
Hi Aaron,

Actually it is not my primary requirement to start the Embedded server twice
in the same JVM. The requirement is to have the empty column families before
each test so that changes made in tests do not affect each other.

Keeping a single instance of the embedded server up across test phases, what
would be the most efficient way to clean-up the CFs between tests?

I have around 10 CFs and not too much data is generated in each test, so
right now, I collect all keys from CFs and then fire a batch query to delete
them.

Can I improve on that clean-up process between tests?

> I'm guessing your error comes from AbstractCassandraDaemon.setup() calling
> DatabaseDescriptor.loadSchemas().
>

I start the embedded server using EmbeddedServerHelper.setup(). I am not
directly dealing with AbstractCassandraDaemon.setup(). I guess that all
happens inside EmbeddedServerHelper.

Re: Embedded Cassandra server startup question

Posted by Aaron Morton <aa...@thelastpickle.com>.
Do you have a full error stack?

That error is raised when the schema is added to an internal static map. There is a lot of static state so it's probably going to make your life easier if you can avoid reusing the JVM.

I'm guessing your error comes from AbstractCassandraDaemon.setup() calling DatabaseDescriptor.loadSchemas(). It may be possible to work around this issue, but I don't have time today. Let me know how you get on.

Aaron
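
To make the static-state point concrete, here is a small model of the failure, not Cassandra's actual code. `StaticSchemaRegistry` and `EmbeddedStartupDemo` are hypothetical names; the only thing borrowed from the thread is the error message. A registry held in a static field outlives any one "server" instance, so a second startup in the same JVM collides with entries left by the first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a process-wide schema registry; not Cassandra's class.
final class StaticSchemaRegistry {
    // Static state lives as long as the class is loaded, i.e. across "restarts".
    private static final Map<String, Integer> IDS = new HashMap<>();
    private static int nextId = 1000;

    static int register(String columnFamily) {
        if (IDS.containsKey(columnFamily)) {
            throw new IllegalStateException("Attempt to assign id to existing column family.");
        }
        IDS.put(columnFamily, nextId);
        return nextId++;
    }

    // The kind of reset hook an in-process restart would need.
    static void clear() { IDS.clear(); }
}

final class EmbeddedStartupDemo {
    // Simulates server startup loading the schema into the static registry.
    static void startServer(String... columnFamilies) {
        for (String cf : columnFamilies) {
            StaticSchemaRegistry.register(cf);
        }
    }

    public static void main(String[] args) {
        startServer("Users", "Events");      // phase 1: integration tests
        try {
            startServer("Users", "Events");  // phase 2: same JVM, same static map
        } catch (IllegalStateException e) {
            System.out.println("Second startup failed: " + e.getMessage());
        }
    }
}
```

In this model the second startup only succeeds after `StaticSchemaRegistry.clear()`, or in a fresh JVM where the static map starts out empty, which is why avoiding JVM reuse sidesteps the problem.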


On 21/01/2011, at 12:46 AM, Roshan Dawrani <ro...@gmail.com> wrote:

> Hi,
> 
> I am using Cassandra for a Grails application and in that I start the embedded server when the Spring application context gets built.
> 
> When I run my Grails app test suite, it first runs the integration and then the functional test suite, and it builds the application context individually for each phase.
>
> When it brings up the embedded Cassandra server in the 2nd phase (for functional tests), it fails saying "Attempt to assign id to existing column family."
> 
> Anyone familiar with this error? Is it because both the test phases are executed in the same JVM instance and there is some Cassandra meta-data from phase 1 server start that is affecting the server startup in 2nd phase?
> 
> Any way I can cleanly start the server 2 times in my case? Any other suggestion? Thanks.
> 
> -- 
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani
> Skype: roshandawrani
>