Posted to dev@cassandra.apache.org by Ryan Daum <ry...@thimbleware.com> on 2010/08/01 17:41:07 UTC

Re: Creating two instances in code

This is very discouraging; I've looked several times at this code and could
not believe my eyes in regard to the wanton use of global statics. In
addition to smelling bad, it makes it difficult to embed Cassandra. Is there
no will at all to fix this?
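
To illustrate why this hurts embedding, here is a minimal sketch of the
pattern. StaticDescriptorSketch and the "sketch.config" property are made-up
names, not the real DatabaseDescriptor code. The only point is that a value
read in a static initializer is fixed once per class load, so a second
instance in the same JVM cannot get its own configuration.

```java
// Hypothetical illustration; NOT the actual DatabaseDescriptor code.
final class StaticDescriptorSketch {
    // Read exactly once, when the JVM initializes this class.
    private static final String CONFIG_PATH;

    static {
        // "sketch.config" is an invented property name for this example.
        CONFIG_PATH = System.getProperty("sketch.config", "conf/default.yaml");
    }

    private StaticDescriptorSketch() {}

    static String configPath() {
        return CONFIG_PATH;
    }
}
```

Changing the system property after the first call has no effect, which is
roughly the situation a second embedded node would be in.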

Ryan

On Fri, Jul 30, 2010 at 4:50 PM, <al...@ceid.upatras.gr> wrote:

> From what I found, I'd say it lies somewhere between pointless and
> impossible. :-\
>
> Alexander
>
> > Several more.  Without much thinking there are Gossiper,
> > MessagingService and StorageService.  All are singletons that
> > encapsulate much of the functionality of Cassandra.  We've been very
> > consistent about naming the singleton instances "instance," so if you
> > search for "public static final [\w]+ instance" and ignore what you
> > find in db.marshal, you'll have a pretty good idea of what you're in
> > for.
> >
> > Gary.
> >
> >
> > On Fri, Jul 30, 2010 at 12:01,  <al...@ceid.upatras.gr> wrote:
> > > Thank you for your answer. I know I should change the code. My question
> > > was mainly on how to change the code to do this.
> > >
> > > For example, will making DatabaseDescriptor a non-static class be
> > > enough? I could have Cassandra build a DatabaseDescriptor instance at
> > > startup, which could be a class variable of some basic class, and then
> > > replace all instances of DatabaseDescriptor.someFunction() with method
> > > calls on the object. Will that be enough or are there many more
> > > singletons in the code?
> > >
> > > Any different suggestions?
> > >
> > > Alexander Altanis
> > >
> > >> The resource file (cassandra.yaml) is statically defined and is
> > >> brought in from the classpath. To do what you desire will require
> > >> changing the code.
> > >>
> > >> You could modify DatabaseDescriptor.getStorageConfigPath() to accept
> > >> some kind of variable to indicate the instance and then load a
> > >> different file.
> > >>
> > >> But that's the least of your problems, as you're probably noticing.
> > >> The heavy use of singletons will prevent you from isolating some of
> > >> the services, and the fact that DatabaseDescriptor loads its
> > >> configuration in a static block into static members will be very
> > >> problematic.
> > >>
> > >> Gary.
> > >>
> > >> On Thu, Jul 29, 2010 at 11:36, <al...@ceid.upatras.gr> wrote:
> > >> > Hello,
> > >> >
> > >> > I'd like to make some changes to cassandra so that when starting up
> > >> > a node in a cluster, another node starts in another cluster. That
> > >> > requires that the two nodes have different configurations, but
> > >> > DatabaseDescriptor (where I think all the config reading is done)
> > >> > seems to load everything statically when the class is loaded. The
> > >> > configuration path seems to reside in System.getProperties(). Can you
> > >> > suggest a way for me to build a second node with a different
> > >> > configuration path in the same code?
> > >> >
> > >> > Unfortunately, I cannot simply launch two different cassandra
> > >> > instances on the same computer, as I want the second node to have
> > >> > access to information from the first node, such as node load for the
> > >> > first cluster and such (plus even running two separate cassandra
> > >> > instances on the same node seems to require workarounds and hacks).
> > >> >
> > >> > Alexander Altanis
> > >> >
> > >
>

Re: Creating two instances in code

Posted by Ran Tavory <ra...@gmail.com>.

+1 to all suggestions from Bjorn, but I'm sorry I can't devote time to it.
FWIW there's an old issue I once reported which tells part of the story. At
the time it was resolved as Won't Fix, but as Gary mentioned, times change:
https://issues.apache.org/jira/browse/CASSANDRA-741 (Refactor for
testability: Make class DatabaseDescriptor a real class with member methods,
and non-static methods)

On Tue, Aug 17, 2010 at 4:31 PM, Bjorn Borud <bb...@gmail.com> wrote:

> Gary Dusbabek <gd...@gmail.com> writes:
>
> >
> > I looked into doing this when I was first learning the code and had an
> > experience similar to yours.  At the time there wasn't much interest
> > in seeing it through to fruition, but maybe times have changed.
>
> any lack of interest in solving these problems just means that people
> haven't stumbled on these problems yet :-)
> ...but eventually they will (and people like Ran Tavory and the Hector
> team have already stumbled across these hurdles and had to devote time
> to creating some workarounds).
>
> > If I were to attempt it again I would do it in this order:
> > 1.  Make the config customizable.
>
> Would it be good enough if you had a CassandraConfig object and some
> ways to create it?  Either directly or through:
>
>  CassandraConfig config = CassandraConfig.parseFile(...);
>
> and then some:
>
>  Cassandra cassandra = Cassandra.createInstance(config);
>
> or even
>
>  Cassandra cassandra = new Cassandra(config);
>
> > 2.  Make the services re-entrant (You should be able to start, stop,
> > then start again without problems).
>
> you mean restart an instance or be able to throw away your instance and
> create a new one?  for me, being able to restart a stopped instance
> isn't really that important because it would work fine for me to create
> a new instance (possibly with the same config, using the same files/dirs
> and ports).
>
> you may have good reasons to be able to restart a stopped Cassandra
> instance though.  (But I suspect we more or less want the same thing).
>
> > 3.  Get rid of the singletons.  This will involve coming up with a
> > smart way to couple instances of the services with each other.
>
> indeed.  but I hope nobody falls for the temptation of introducing
> Spring or something similar to do the wiring in the Cassandra
> code. (what people do in their own projects is their problem, but
> Cassandra should not require you to adopt additional mammoth frameworks).
>
> > 4.  Integrate the storage port into how we canonically identify a node
> > (it's just hostname now).
>
> hmm, I see your point, but I am not sure I understand the consequences
> fully.
>
> > 5.  While you're at it, figure out how to get JMX to bind to something
> > other than 0.0.0.0.  (I hear it is possible, see
> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6425769)
>
> I have limited experience with JMX so I'll pass on commenting on this.
>
> >> there are other valid reasons for wanting to embed Cassandra besides
> >> unit testing.  for instance, if you are writing an application that
> >> depends on Cassandra and you want the option of packaging it as a single
> >> binary for single node experimentation, development and demo purposes.
> >
> > I'd kind of like to see this too, although I admit that from the
> > pragmatic standpoint of running a Cassandra server, it represents a
> > whole lot of change for what amounts to very little tangible benefit.
>
> while the benefit may be hard to articulate, I think it is significant.
> any time you can embed a "server" in your binary you can make life a lot
> easier for casual users and for testing.
>
> almost all server projects I have done in the past 7-8 years have been
> like this:  I make it possible to embed the server so that people can
> build and distribute prototypes or they can use the exact same binary to
> either use an external (distributed) instance or just create an internal
> instance for simpler use-cases (by config).
>
> compare to Hudson.  it is distributed as a WAR so you can load it into
> your web server.  but for most people, they just want it up and running
> with as little hassle as possible on a single node, so being able to
> fire it up from the command line, and rely on the embedded web server is
> very attractive compared to fooling around with Jetty, Tomcat or worse.
> if Hudson had required me to manage a number of services that I need to
> manually set up and manage, I would probably not have bothered using it.
>
> (not sure if that example is very clear, but hey... :-)
>
> > From a development standpoint, the biggest benefit I see would be that
> > we could write unit tests for small clusters that run on a single
> > node.
>
> yeah, it is critical for unit testing.  right now we are forced to do
> testing in a rather clumsy fashion.  it is a big step backward from, for
> instance, the way I do testing with Apache Derby (which has hairy
> lifecycle management, but it is embeddable).
>
> > One interesting thing that this would make possible is the ability to
> > have a node with >1 tokens in a single JVM.  Useful, who knows?  But
> > it is interesting because I think it would make Cassandra more elastic
> > (and could theoretically help with the hot-node problem when using
> > OPP).
>
> (there are some usage scenarios using OSGi to run multiple Cassandra
> instances in the same JVM that come to mind, but I haven't really given
> this a lot of (any) detailed thought)
>
> -Bjørn
>
>

Re: Creating two instances in code

Posted by Bjorn Borud <bb...@gmail.com>.

Gary Dusbabek <gd...@gmail.com> writes:

>
> I looked into doing this when I was first learning the code and had an
> experience similar to yours.  At the time there wasn't much interest
> in seeing it through to fruition, but maybe times have changed.

any lack of interest in solving these problems just means that people
haven't stumbled on these problems yet :-)
...but eventually they will (and people like Ran Tavory and the Hector
team have already stumbled across these hurdles and had to devote time
to creating some workarounds).

> If I were to attempt it again I would do it in this order:
> 1.  Make the config customizable.

Would it be good enough if you had a CassandraConfig object and some
ways to create it?  Either directly or through:

  CassandraConfig config = CassandraConfig.parseFile(...);

and then some:

  Cassandra cassandra = Cassandra.createInstance(config);

or even

  Cassandra cassandra = new Cassandra(config);
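
To make that concrete, here is a minimal sketch of what such a config object
might look like. Everything in it is hypothetical: CassandraConfig, its three
fields, the property key names and fromProperties() (standing in for the
parseFile() idea above) are invented for illustration and are not existing
Cassandra APIs.

```java
import java.util.Properties;

// Hypothetical immutable config object; not an existing Cassandra class.
final class CassandraConfig {
    private final String clusterName;
    private final String dataDir;
    private final int storagePort;

    CassandraConfig(String clusterName, String dataDir, int storagePort) {
        this.clusterName = clusterName;
        this.dataDir = dataDir;
        this.storagePort = storagePort;
    }

    // Parsing lives in a small factory, outside the server itself.
    // The key names and defaults are made up for this sketch.
    static CassandraConfig fromProperties(Properties p) {
        return new CassandraConfig(
                p.getProperty("cluster_name", "Test Cluster"),
                p.getProperty("data_dir", "data"),
                Integer.parseInt(p.getProperty("storage_port", "7000")));
    }

    String clusterName() { return clusterName; }
    String dataDir() { return dataDir; }
    int storagePort() { return storagePort; }
}

// Hypothetical server handle that owns its config instead of reading statics.
final class Cassandra {
    private final CassandraConfig config;

    Cassandra(CassandraConfig config) {
        this.config = config;
    }

    static Cassandra createInstance(CassandraConfig config) {
        return new Cassandra(config);
    }

    CassandraConfig config() { return config; }
}
```

With something like this, two instances in one JVM would simply be two
Cassandra objects built from two different CassandraConfig objects.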

> 2.  Make the services re-entrant (You should be able to start, stop,
> then start again without problems).

you mean restart an instance or be able to throw away your instance and
create a new one?  for me, being able to restart a stopped instance
isn't really that important because it would work fine for me to create
a new instance (possibly with the same config, using the same files/dirs
and ports).  

you may have good reasons to be able to restart a stopped Cassandra
instance though.  (But I suspect we more or less want the same thing).
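
As a rough sketch of what "start, stop, then start again" implies for a
single service: resources that cannot be reused after shutdown (an
ExecutorService is a typical example) have to be created in start() rather
than held in static final fields. RestartableService is a hypothetical name,
not a Cassandra class.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical service: per-start resources instead of static final ones.
final class RestartableService {
    private ExecutorService executor; // null while stopped

    synchronized void start() {
        if (executor != null) {
            throw new IllegalStateException("already started");
        }
        // A terminated executor cannot be restarted, so build a fresh one.
        executor = Executors.newSingleThreadExecutor();
    }

    synchronized void stop() {
        if (executor == null) {
            return; // stopping a stopped service is a no-op
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt
        }
        executor = null;
    }

    synchronized boolean isRunning() {
        return executor != null;
    }
}
```

Once services are written this way, restarting an instance and creating a
fresh one cost about the same, which is why I suspect the two wishes overlap.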

> 3.  Get rid of the singletons.  This will involve coming up with a
> smart way to couple instances of the services with each other.

indeed.  but I hope nobody falls for the temptation of introducing
Spring or something similar to do the wiring in the Cassandra
code. (what people do in their own projects is their problem, but
Cassandra should not require you to adopt additional mammoth frameworks).
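
For what it's worth, the wiring can be done by hand with plain constructors.
The sketch below uses invented stand-in classes (MessagingSketch,
GossipSketch, StorageSketch, NodeSketch); the real services have far more
dependencies than this, so treat it as an outline of the idea only.

```java
// All classes here are hypothetical stand-ins for the real services.
final class MessagingSketch {
    private final int port;

    MessagingSketch(int port) { this.port = port; }

    int port() { return port; }
}

final class GossipSketch {
    private final MessagingSketch messaging;

    GossipSketch(MessagingSketch messaging) { this.messaging = messaging; }

    MessagingSketch messaging() { return messaging; }
}

final class StorageSketch {
    private final GossipSketch gossip;
    private final MessagingSketch messaging;

    StorageSketch(GossipSketch gossip, MessagingSketch messaging) {
        this.gossip = gossip;
        this.messaging = messaging;
    }

    GossipSketch gossip() { return gossip; }
    MessagingSketch messaging() { return messaging; }
}

// One node is one object graph, wired in a constructor with no framework.
final class NodeSketch {
    final MessagingSketch messaging;
    final GossipSketch gossip;
    final StorageSketch storage;

    NodeSketch(int storagePort) {
        messaging = new MessagingSketch(storagePort);
        gossip = new GossipSketch(messaging);
        storage = new StorageSketch(gossip, messaging);
    }
}
```

Two NodeSketch objects share nothing, which is the property the singletons
currently rule out.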

> 4.  Integrate the storage port into how we canonically identify a node
> (it's just hostname now).

hmm, I see your point, but I am not sure I understand the consequences
fully.
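
As far as I understand it, the core of the change is that a node is keyed on
host plus storage port rather than host alone. A small sketch using the
JDK's InetSocketAddress; EndpointIdentitySketch and its methods are
hypothetical helpers, and how Cassandra would actually represent this is an
open question.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EndpointIdentitySketch {
    // Identify a node by (host, storage port) instead of host alone.
    static InetSocketAddress endpoint(String host, int storagePort) {
        // createUnresolved skips the DNS lookup; equality then uses the
        // host name and the port.
        return InetSocketAddress.createUnresolved(host, storagePort);
    }

    // Number of distinct nodes among the given endpoints.
    static int distinctNodes(InetSocketAddress... endpoints) {
        Set<InetSocketAddress> unique = new HashSet<>(Arrays.asList(endpoints));
        return unique.size();
    }
}
```

With hostname-only identity, two nodes on localhost would collapse into one;
with the pair they stay distinct.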

> 5.  While you're at it, figure out how to get JMX to bind to something
> other than 0.0.0.0.  (I hear it is possible, see
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6425769)

I have limited experience with JMX so I'll pass on commenting on this.

>> there are other valid reasons for wanting to embed Cassandra besides
>> unit testing.  for instance, if you are writing an application that
>> depends on Cassandra and you want the option of packaging it as a single
>> binary for single node experimentation, development and demo purposes.
>
> I'd kind of like to see this too, although I admit that from the
> pragmatic standpoint of running a Cassandra server, it represents a
> whole lot of change for what amounts to very little tangible benefit.

while the benefit may be hard to articulate, I think it is significant.
any time you can embed a "server" in your binary you can make life a lot
easier for casual users and for testing.

almost all server projects I have done in the past 7-8 years have been
like this:  I make it possible to embed the server so that people can
build and distribute prototypes or they can use the exact same binary to
either use an external (distributed) instance or just create an internal
instance for simpler use-cases (by config).

compare to Hudson.  it is distributed as a WAR so you can load it into
your web server.  but for most people, they just want it up and running
with as little hassle as possible on a single node, so being able to
fire it up from the command line, and rely on the embedded web server is
very attractive compared to fooling around with Jetty, Tomcat or worse.
if Hudson had required me to manage a number of services that I need to
manually set up and manage, I would probably not have bothered using it.

(not sure if that example is very clear, but hey... :-)

> From a development standpoint, the biggest benefit I see would be that
> we could write unit tests for small clusters that run on a single
> node.

yeah, it is critical for unit testing.  right now we are forced to do
testing in a rather clumsy fashion.  it is a big step backward from, for
instance, the way I do testing with Apache Derby (which has hairy
lifecycle management, but it is embeddable).

> One interesting thing that this would make possible is the ability to
> have a node with >1 tokens in a single JVM.  Useful, who knows?  But
> it is interesting because I think it would make Cassandra more elastic
> (and could theoretically help with the hot-node problem when using
> OPP).

(there are some usage scenarios using OSGi to run multiple Cassandra
instances in the same JVM that come to mind, but I haven't really given
this a lot of (any) detailed thought)

-Bjørn

Re: Creating two instances in code

Posted by Gary Dusbabek <gd...@gmail.com>.

On Fri, Aug 13, 2010 at 11:28, Bjorn Borud <bb...@gmail.com> wrote:
> Ryan Daum <ry...@thimbleware.com> writes:
>
>> This is very discouraging; I've looked several times at this code and could
>> not believe my eyes in regard to the wanton use of global statics. In
>> addition to smelling bad, it makes it difficult to embed Cassandra. Is there
>> no will at all to fix this?
>
> I experienced all manner of problems when trying to embed Cassandra
> myself. the primary reason I wanted to embed Cassandra was for unit
> testing.
>
> of course, reality came crashing in when I had more than one test and
> thus more than one embedded Cassandra instance.  I tried to look for
> quick solutions to this, but eventually flushed an entire week's work
> down the toilet and left for vacation.
>
> okay, so what I would have wanted to do if I had the time:
>
>  - go through the Cassandra code and remove singletons.
>
>  - make Cassandra easier to embed by making starting and stopping work
>    properly (for some reason that I have forgotten I had shutdown
>    and/or timing issues).  for servers to be embeddable the start() and
>    stop()/shutdown() methods need to block until some known state is
>    reached.  (if shutdown() has to be slow because of work that needs
>    to be done before safe shutdown it may be an idea to implement
>    kill() for unsafe shutdown -- for instance when you know you will
>    nuke the data anyway)
>
>  - Remove dependence on config files.  It should be possible to
>    just instantiate an embedded Cassandra server, pass it a config
>    object and then start it without having to touch the filesystem or
>    access any resource files for config. Depending on files or
>    resources for config is bad. (However, there is nothing wrong with
>    having a trivial API for reading files to produce a config object
>    you can then pass into Cassandra).
>    The detour I made into rendering an Apache Velocity template to
>    produce a storage-conf.xml only to have my embedded Cassandra
>    instance read it again was just silly.

I looked into doing this when I was first learning the code and had an
experience similar to yours.  At the time there wasn't much interest
in seeing it through to fruition, but maybe times have changed.

If I were to attempt it again I would do it in this order:
1.  Make the config customizable.
2.  Make the services re-entrant (You should be able to start, stop,
then start again without problems).
3.  Get rid of the singletons.  This will involve coming up with a
smart way to couple instances of the services with each other.
4.  Integrate the storage port into how we canonically identify a node
(it's just hostname now).
5.  While you're at it, figure out how to get JMX to bind to something
other than 0.0.0.0.  (I hear it is possible, see
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6425769)
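
On item 5, one building block I believe is involved is an RMI server socket
factory that binds to a single address instead of the wildcard. The sketch
below shows only that piece, using the standard RMIServerSocketFactory
interface. BoundRmiServerSocketFactory and probe() are names made up here,
and handing the factory to the JMX connector server is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

// Sketch: an RMI server socket factory that binds to one address only.
final class BoundRmiServerSocketFactory implements RMIServerSocketFactory {
    private final InetAddress bindAddress;

    BoundRmiServerSocketFactory(InetAddress bindAddress) {
        this.bindAddress = bindAddress;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        // A backlog of 0 asks for the platform default.
        return new ServerSocket(port, 0, bindAddress);
    }

    // Factories for the same bind address are treated as equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof BoundRmiServerSocketFactory
                && bindAddress.equals(((BoundRmiServerSocketFactory) o).bindAddress);
    }

    @Override
    public int hashCode() {
        return bindAddress.hashCode();
    }

    // Helper for this sketch: open an ephemeral port, report where it bound.
    static InetAddress probe(InetAddress address) {
        try (ServerSocket s =
                     new BoundRmiServerSocketFactory(address).createServerSocket(0)) {
            return s.getInetAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```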

> there are other valid reasons for wanting to embed Cassandra besides
> unit testing.  for instance, if you are writing an application that
> depends on Cassandra and you want the option of packaging it as a single
> binary for single node experimentation, development and demo purposes.

I'd kind of like to see this too, although I admit that from the
pragmatic standpoint of running a Cassandra server, it represents a
whole lot of change for what amounts to very little tangible benefit.

From a development standpoint, the biggest benefit I see would be that
we could write unit tests for small clusters that run on a single
node.

One interesting thing that this would make possible is the ability to
have a node with >1 tokens in a single JVM.  Useful, who knows?  But
it is interesting because I think it would make Cassandra more elastic
(and could theoretically help with the hot-node problem when using
OPP).

Gary.

> so in short:  yes, I am very, very interested in Cassandra being
> embeddable,
> -Bjørn
>
>

Re: Creating two instances in code

Posted by Bjorn Borud <bb...@gmail.com>.

Ryan Daum <ry...@thimbleware.com> writes:

> This is very discouraging; I've looked several times at this code and could
> not believe my eyes in regard to the wanton use of global statics. In
> addition to smelling bad, it makes it difficult to embed Cassandra. Is there
> no will at all to fix this?

I experienced all manner of problems when trying to embed Cassandra
myself. the primary reason I wanted to embed Cassandra was for unit
testing.

I was using the @Rule annotation in JUnit to let junit create a unique
temporary directory for the Cassandra instance. Once I had a temp dir I
then created the needed directories and used the Apache Velocity
templating engine to produce a storage-conf.xml with absolute paths to
the various directories for commit logs, data etc. once the tests are
done the framework takes care of cleaning up the files. this also
ensures that if I run several tests in parallel I get separate unique
temp directories for each instance. (I saw Ran Tavory had contributed a
DataCleaner class (or whatever it was named) to do something similar, but I
didn't want to use that since JUnit already has the needed mechanisms
for doing this. besides, I didn't like relying on a single testing
directory.)
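
Roughly, the directory part of that setup looks like the sketch below. To
keep it dependency-free it uses the JDK's Files.createTempDirectory as a
stand-in for the JUnit rule, and it leaves out the Velocity step entirely.
InstanceDirsSketch and the "commitlog"/"data" subdirectory names are
illustrative, not what my tests literally used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch: one unique directory tree per embedded instance.
final class InstanceDirsSketch {
    final Path root;
    final Path commitLog;
    final Path data;

    private InstanceDirsSketch(Path root, Path commitLog, Path data) {
        this.root = root;
        this.commitLog = commitLog;
        this.data = data;
    }

    static InstanceDirsSketch create() {
        try {
            // Each call yields a fresh, uniquely named directory.
            Path root = Files.createTempDirectory("cassandra-test-").toAbsolutePath();
            Path commitLog = Files.createDirectories(root.resolve("commitlog"));
            Path data = Files.createDirectories(root.resolve("data"));
            return new InstanceDirsSketch(root, commitLog, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Delete the tree, children before parents.
    void delete() {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The absolute paths from root, commitLog and data are what would go into the
generated config for that instance.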

of course, reality came crashing in when I had more than one test and
thus more than one embedded Cassandra instance.  I tried to look for
quick solutions to this, but eventually flushed an entire week's work
down the toilet and left for vacation.

now we plan to take an inferior approach to the testing simply because
we've run out of time to get this done properly.  (In an ideal world I
would be able to sit down with the Cassandra code, rewrite the parts
that are "misbehaving" and work with someone to get the code reviewed).

okay, so what I would have wanted to do if I had the time:

  - go through the Cassandra code and remove singletons.

  - make Cassandra easier to embed by making starting and stopping work
    properly (for some reason that I have forgotten I had shutdown
    and/or timing issues).  for servers to be embeddable the start() and
    stop()/shutdown() methods need to block until some known state is
    reached.  (if shutdown() has to be slow because of work that needs
    to be done before safe shutdown it may be an idea to implement
    kill() for unsafe shutdown -- for instance when you know you will
    nuke the data anyway)

  - Remove dependence on config files.  It should be possible to
    just instantiate an embedded Cassandra server, pass it a config
    object and then start it without having to touch the filesystem or
    access any resource files for config. Depending on files or
    resources for config is bad. (However, there is nothing wrong with
    having a trivial API for reading files to produce a config object
    you can then pass into Cassandra).
    The detour I made into rendering an Apache Velocity template to
    produce a storage-conf.xml only to have my embedded Cassandra
    instance read it again was just silly.
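
To show what I mean by blocking lifecycle methods, here is a minimal
skeleton. EmbeddedServerSketch is hypothetical, the worker thread just idles,
and the "flushed" flag is a stand-in for whatever slow work a safe shutdown
would really have to do.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical embeddable server skeleton with blocking lifecycle methods.
final class EmbeddedServerSketch {
    private Thread worker;            // null while stopped
    private volatile boolean running;
    private volatile boolean flushed;

    // Returns only after the worker thread has reported that it is up.
    synchronized void start() {
        if (worker != null) {
            throw new IllegalStateException("already started");
        }
        final CountDownLatch ready = new CountDownLatch(1);
        running = true;
        flushed = false;
        worker = new Thread(() -> {
            ready.countDown(); // the "known state": worker is live
            while (running) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return; // interrupted by stop()
                }
            }
        });
        worker.start();
        try {
            ready.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while starting", e);
        }
    }

    // Safe shutdown: do the slow work, then block until the worker exits.
    synchronized void shutdown() { stop(true); }

    // Unsafe shutdown: skip the slow work, still block until the worker exits.
    synchronized void kill() { stop(false); }

    private void stop(boolean flush) {
        if (worker == null) {
            return;
        }
        if (flush) {
            flushed = true; // stand-in for flushing data to disk
        }
        running = false;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        worker = null;
    }

    synchronized boolean isRunning() { return worker != null && worker.isAlive(); }

    boolean wasFlushed() { return flushed; }
}
```

A test can then call start(), do its work, and call kill() without sleeping
or polling for the server to come up or go away.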

there are other valid reasons for wanting to embed Cassandra besides
unit testing.  for instance, if you are writing an application that
depends on Cassandra and you want the option of packaging it as a single
binary for single node experimentation, development and demo purposes.  

as an example, I am currently working on a project where I have a server
that will be talking to a Cassandra cluster of half a dozen nodes. but
other development projects depend on this server, so they need some
quick way of getting it up and running on their own workstations and
laptops-- so they can start the server with a command line option that
says "use an embedded Cassandra server". of course, in unit tests they
also want to be able to embed my server and, of course, Cassandra.

I've done this a few times with Apache Derby -- to give users the option
of running with an embedded SQL server if they don't want the hassle of
setting up a MySQL instance, or fire up the application and have it talk
to a MySQL instance.
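
The selection itself is trivial once the backend sits behind an interface.
A sketch with invented names throughout (KeyValueBackend, EmbeddedBackend,
RemoteBackend, and the "--embedded" flag are all assumptions for
illustration):

```java
// All names here are hypothetical; this only shows the selection logic.
interface KeyValueBackend {
    String describe();
}

final class EmbeddedBackend implements KeyValueBackend {
    @Override
    public String describe() { return "embedded, in-process"; }
}

final class RemoteBackend implements KeyValueBackend {
    private final String hosts;

    RemoteBackend(String hosts) { this.hosts = hosts; }

    @Override
    public String describe() { return "external cluster at " + hosts; }
}

final class BackendSelector {
    // "--embedded" is an invented flag; any other first argument is taken
    // as the host list of an external cluster.
    static KeyValueBackend fromArgs(String... args) {
        if (args.length > 0 && "--embedded".equals(args[0])) {
            return new EmbeddedBackend();
        }
        String hosts = args.length > 0 ? args[0] : "localhost";
        return new RemoteBackend(hosts);
    }
}
```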

so in short:  yes, I am very, very interested in Cassandra being
embeddable, I am very interested in being able to have more than one
Cassandra instance in the same JVM and I am very interested in being
able to programmatically configure Cassandra rather than messing with
config files.  :-)

sorry for not having more time to actually go and do these things rather
than whine about them.

-Bjørn