Posted to user@cassandra.apache.org by Andy Ballingall TF <ba...@thefoundry.co.uk> on 2012/08/17 12:09:45 UTC

What is the ideal server-side technology stack to use with Cassandra?

Hi,

I've been running a number of tests with Cassandra using a couple of
PHP drivers, namely PHPCassa (https://github.com/thobbs/phpcassa/) and
PDO-cassandra (http://code.google.com/a/apache-extras.org/p/cassandra-pdo/),
and the experience hasn't been great, mainly because I can't try out
CQL3.

Aaron Morton (aaron@thelastpickle.com) advised:

"If possible i would avoid using PHP. The PHP story with cassandra has
not been great in the past. There is little love for it, so it takes a
while for work changes to get in the client drivers.

AFAIK it lacks server side states which makes connection pooling
impossible. You should not pool cassandra connections in something
like HAProxy."

So my question is - if you were to build a new scalable project from
scratch tomorrow sitting on top of Cassandra, which technologies would
you select to serve HTTP requests to ensure you get:

a) The best support from the cassandra community (e.g. timely updates
of drivers, better stability)
b) Optimal efficiency between webservers and cassandra cluster, in
terms of the performance of individual requests and in the volumes of
connections handled per second
c) Ease of development and deployment.

What worked for you, and why? What didn't work for you?


Thanks,
Andy


-- 
Andy Ballingall
Senior Software Engineer

The Foundry
6th Floor, The Communications Building,
48, Leicester Square,
London, WC2H 7LT, UK
Tel: +44 (0)20 7968 6828 - Fax: +44 (0)20 7930 8906
Web: http://www.thefoundry.co.uk/

The Foundry Visionmongers Ltd.
Registered in England and Wales No: 4642027

Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by Maciej Miklas <ma...@gmail.com>.
I'm using Java + Tomcat + Spring + Hector on Linux - it works great, as
always.

It is also not a bad idea to mix databases - Cassandra is not always the
solution for every problem; Cassandra + Mongo could be ;)

On Fri, Aug 17, 2012 at 7:54 PM, Aaron Turner <sy...@gmail.com> wrote:

> My stack:
>
> Java + JRuby + Rails + Torquebox
>
> I'm using the Hector client (arguably the most mature out there) and
> JRuby+RoR+Torquebox gives me a great development platform which really
> scales (full native thread support for example) and is extremely
> powerful.  Honestly, I expect all my future RoR apps will be built on
> JRuby/Torquebox because I've been so happy with it even if I don't
> have a specific need to utilize Java libraries from inside the app.
>
> And the best part is that I've yet to have to write a single line of Java!
> :)

Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by "Hiller, Dean" <De...@nrel.gov>.
As far as opinions go, the stack we are using is

Playframework 1.2.5 (the stateless nature rocks compared to other
platforms like tomcat or servlet container stuff).
playOrm
Astyanax

Later,
Dean

On 8/17/12 11:54 AM, "Aaron Turner" <sy...@gmail.com> wrote:

>My stack:
>
>Java + JRuby + Rails + Torquebox
>
>I'm using the Hector client (arguably the most mature out there) and
>JRuby+RoR+Torquebox gives me a great development platform which really
>scales (full native thread support for example) and is extremely
>powerful.  Honestly, I expect all my future RoR apps will be built on
>JRuby/Torquebox because I've been so happy with it even if I don't
>have a specific need to utilize Java libraries from inside the app.
>
>And the best part is that I've yet to have to write a single line of
>Java! :)


Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by Aaron Turner <sy...@gmail.com>.
My stack:

Java + JRuby + Rails + Torquebox

I'm using the Hector client (arguably the most mature out there) and
JRuby+RoR+Torquebox gives me a great development platform which really
scales (full native thread support for example) and is extremely
powerful.  Honestly, I expect all my future RoR apps will be built on
JRuby/Torquebox because I've been so happy with it even if I don't
have a specific need to utilize Java libraries from inside the app.

And the best part is that I've yet to have to write a single line of Java! :)



On Fri, Aug 17, 2012 at 6:53 AM, Edward Capriolo <ed...@gmail.com> wrote:
> The best stack is the THC stack. :)
>
> Tomcat Hadoop Cassandra :)

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"

Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by Edward Capriolo <ed...@gmail.com>.
The best stack is the THC stack. :)

Tomcat Hadoop Cassandra :)


Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by Tim Wintle <ti...@gmail.com>.
On Fri, 2012-08-17 at 11:09 +0100, Andy Ballingall TF wrote:
> So my question is - if you were to build a new scalable project from
> scratch tomorrow sitting on top of Cassandra, which technologies would
> you select to serve HTTP requests to ensure you get:
> 
> a) The best support from the cassandra community (e.g. timely updates
> of drivers, better stability)
> b) Optimal efficiency between webservers and cassandra cluster, in
> terms of the performance of individual requests and in the volumes of
> connections handled per second
> c) Ease of development and deployment.
> 
> What worked for you, and why? What didn't work for you?

We do almost everything in python, so our stack is basically
python-everywhere (with a bit of C and a bit of PHP).

If you're most comfortable in PHP, I'd suggest writing a data layer in
another language (Java or python) which handles the cassandra requests,
and then making requests back to that from PHP.

That's general advice for any scalable system though - the frontends are
stateless and can be scaled out horizontally (with caching if it fits
your requirements).

If you split your Data layer into parts that are stateless and parts
which aren't then you can load balance the horizontally scalable parts
of that layer using something like haproxy too if you need to.

Tim Wintle
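
To make the data-layer suggestion above concrete, here is a minimal sketch
in Python, using only the standard library. It is an illustration under
stated assumptions, not code from anyone in this thread: the in-memory
FAKE_STORE dict stands in for a real Cassandra client, and the /rows/<key>
URL scheme, the lookup() and make_server() helpers, and port 8080 are all
hypothetical choices made for the example. The point is only that the HTTP
handler keeps no per-client state, so several copies could sit behind a load
balancer, while the long-lived process is where a real client and its
connections would live.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the Cassandra client. A real data layer would
# hold a long-lived client object here instead of a dict.
FAKE_STORE = {"user:1": {"name": "andy"}}


def lookup(key):
    """Return (status_code, payload) for a key; a pure function, easy to test."""
    row = FAKE_STORE.get(key)
    if row is None:
        return 404, {"error": "not found", "key": key}
    return 200, {"key": key, "row": row}


class DataLayerHandler(BaseHTTPRequestHandler):
    """Serves GET /rows/<key> as JSON and keeps no per-client state."""

    def do_GET(self):
        prefix = "/rows/"
        if not self.path.startswith(prefix):
            status, payload = 400, {"error": "expected /rows/<key>"}
        else:
            status, payload = lookup(self.path[len(prefix):])
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8080):
    """Build (but do not start) the server; the port is an arbitrary choice."""
    return HTTPServer((host, port), DataLayerHandler)
```

Calling make_server().serve_forever() would start the service; a PHP front
end could then fetch http://127.0.0.1:8080/rows/user:1 with any HTTP client
and never talk to Cassandra directly.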


Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by Alex Major <al...@gmail.com>.
On Sun, Aug 19, 2012 at 11:04 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> On Sun, Aug 19, 2012 at 3:55 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>>
>>
>> It is not a judgement on the quality of PHPCassa or PDO-cassandra,
>> neither of which I have used.
>>
>> My comments were mostly informed by past issues with Thrift and PHP.
>>
>
> Eh, you don't need to disclaim your opinion that much :)
>
> The PHP clients have, overall, been a bit rough and slow moving compared
> to the Java and Python clients.  My hope is that the transition to cql3
> will make it easier to maintain the drivers and clients; it just tends to
> be a lot of work with PHP.
>
> Thrift does have some issues of its own, so perhaps the custom protocol
> that's replacing it will smooth out some of the issues.  Regardless, some
> work on enabling persistent connections is definitely needed.  If anybody
> is familiar enough with that to lend a hand, I would be glad to get some
> kind of support in.
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
>
The company I work for currently uses PHP with Cassandra in production and
we're certainly interested in helping out with this. However for persistent
connections and some of the more advanced features, I think it would
require a move away from PHPCassa to the PDO extension. I was around when
the mysql extension introduced persistent connections and it wasn't as
painful as first thought.

We're using PHPCassa at the moment, but currently doing a data-model
re-write towards CQL3 with compound columns/sets. For our part, we were
looking at first moving the PDO driver (which needs some TLC) to CQL3, but
not until the native driver is out in Cassandra 1.2.

The only thing that the PHP driver won't natively be able to handle
properly is connection pooling (as it's stateless); however, that can fairly
painlessly be handled in the application via APC (we currently use this
option).

Given a little time, I am confident that the PHP drivers will catch up
to other language drivers. I know we're not the only ones interested in
helping out with that effort.
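
For readers unfamiliar with why pooling depends on process state, the
following is a rough sketch in Python of what a long-lived application
process can do. It is an assumption-laden illustration, not the PHPCassa,
PDO, or APC approach described above: FakeConnection is a hypothetical
stand-in for a real driver connection, and the host addresses and pool size
are made up. The pool opens its connections once and hands the same objects
to successive requests, which only works if something survives between
requests to hold them.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real driver connection."""

    opened = 0  # counts how many connections were ever created

    def __init__(self, host):
        FakeConnection.opened += 1
        self.host = host

    def execute(self, statement):
        return "%s ran: %s" % (self.host, statement)


class ConnectionPool:
    """Fixed-size pool; connections are opened once and then reused."""

    def __init__(self, hosts, size=4):
        self._idle = queue.Queue()
        for i in range(size):
            # Spread the connections across the given hosts round-robin.
            self._idle.put(FakeConnection(hosts[i % len(hosts)]))

    @contextmanager
    def connection(self, timeout=1.0):
        # Raises queue.Empty if no connection frees up within the timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection so later requests can reuse it.
            self._idle.put(conn)
```

A request handler would wrap each query in "with pool.connection() as conn:",
so ten requests against a pool of two still open only two connections.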

Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by Tyler Hobbs <ty...@datastax.com>.
On Sun, Aug 19, 2012 at 3:55 AM, aaron morton <aa...@thelastpickle.com> wrote:

>
>
> It is not a judgement on the quality of PHPCassa or PDO-cassandra, neither
> of which I have used.
>
> My comments were mostly informed by past issues with Thrift and PHP.
>

Eh, you don't need to disclaim your opinion that much :)

The PHP clients have, overall, been a bit rough and slow moving compared to
the Java and Python clients.  My hope is that the transition to cql3 will
make it easier to maintain the drivers and clients; it just tends to be a
lot of work with PHP.

Thrift does have some issues of its own, so perhaps the custom protocol
that's replacing it will smooth out some of the issues.  Regardless, some
work on enabling persistent connections is definitely needed.  If anybody
is familiar enough with that to lend a hand, I would be glad to get some
kind of support in.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by Andy Ballingall TF <ba...@thefoundry.co.uk>.
On Aug 19, 2012 9:55 AM, "aaron morton" <aa...@thelastpickle.com> wrote:
>
> > Aaron Morton (aaron@thelastpickle.com) advised:
> >
> > "If possible i would avoid using PHP. The PHP story with cassandra has
> > not been great in the past. There is little love for it, so it takes a
> > while for work changes to get in the client drivers.
> >
> > AFAIK it lacks server side states which makes connection pooling
> > impossible. You should not pool cassandra connections in something
> > like HAProxy."
>
> Please note, this was a personal opinion expressed off list.
>
> It is not a judgement on the quality of PHPCassa or PDO-cassandra,
> neither of which I have used.

I'd like to apologise to Aaron for taking part of a private message and
sharing it in public without permission, and for any potential
embarrassment caused. I'm certainly embarrassed by the thoughtlessness I've
displayed.

I've used PHP successfully in many projects, and though I didn't take his
comment as a criticism of the efforts of others in the PHP community, I now
appreciate that some might do. In any case, nothing excuses my lack of
respect for Aaron's privacy, which was the victim purely of a particularly
clumsy attempt to open up a public debate, and no more.

I suspect I'm not the only person trying to identify the suitability or
otherwise of an application stack with Cassandra. In general, if there's a
path of least resistance, or a path supported by a larger community, then
I'd consider that path rather than impose choices that worked in previous
projects. As Cassandra will probably be the core of what I'm working on, if
there are good reasons why PHP isn't an optimal choice, then I'd consider
adopting the alternative, and that's all I'm trying to get to the bottom
of. Believe me, I'd prefer not to learn yet-another-software stack if I can
help it!

Finally, I do hope that despite my stupidity, Aaron will forgive me and
contribute to this discussion.


Andy









Re: What is the ideal server-side technology stack to use with Cassandra?

Posted by aaron morton <aa...@thelastpickle.com>.
> Aaron Morton (aaron@thelastpickle.com) advised:
> 
> "If possible i would avoid using PHP. The PHP story with cassandra has
> not been great in the past. There is little love for it, so it takes a
> while for work changes to get in the client drivers.
> 
> AFAIK it lacks server side states which makes connection pooling
> impossible. You should not pool cassandra connections in something
> like HAProxy."

Please note, this was a personal opinion expressed off list. 

It is not a judgement on the quality of PHPCassa or PDO-cassandra, neither of which I have used. 

My comments were mostly informed by past issues with Thrift and PHP. 
 
Aaron

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
