Posted to dev@cassandra.apache.org by Eric Evans <ee...@acunu.com> on 2012/01/11 22:14:53 UTC

Re: Prepared Statement support (CASSANDRA-2475)

On Wed, Dec 14, 2011 at 5:49 PM, Eric Evans <ee...@acunu.com> wrote:
> Thanks to the hard work of Rick Shaw, prepared statements
> (https://issues.apache.org/jira/browse/CASSANDRA-2475) has been
> committed to trunk.  However, before you use it, be advised that the
> API might be changing in the next few days.
>
> If it does change, it should be limited to moving the bind parameters
> from string to bytes, (pending a comparison of the performance).  I'll
> send another email with the changes, if any, after the API is expected
> to be stable.

To follow up on this, and to draw the attention of the folks on
client-dev@ who have some stake in this:

There was some discussion in #2475[1], and later in #3634, about whether
clients would supply string or binary bind arguments for a prepared
statement.

I encourage anyone that is interesting to read through the tickets,
but the short version is that since Cassandra uses binary values
internally, having the clients serialize types to binary would be more
performant than having Cassandra do it, while string arguments result
in simpler, easier-to-code drivers.  Since it boils down to a question
of trading one thing for another, we agreed to do some performance
testing so that we could at least put some real numbers to it.
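
To make the trade-off concrete, here is a minimal Python sketch (not
taken from the tickets) of the two ways a driver could encode a bind
argument.  The function names are hypothetical, and the signed 64-bit
big-endian layout for an integer is an assumption made purely for
illustration.

```python
import struct


def bind_as_string(value):
    """String-style bind argument (hypothetical helper).

    The driver only stringifies the value; converting it to the binary
    form used internally is left to the server.
    """
    return str(value)


def bind_as_binary(value):
    """Binary-style bind argument (hypothetical helper).

    The driver serializes the value itself. A signed 64-bit big-endian
    encoding is assumed here for integers.
    """
    return struct.pack(">q", value)


string_arg = bind_as_string(1326316493)   # text the server must parse
binary_arg = bind_as_binary(1326316493)   # 8 bytes, usable as-is
```

The string version is a one-liner in practically any language, which is
the "simpler drivers" side of the argument; the binary version needs a
per-type serializer in every client.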

That performance testing is complete.  Again, I encourage you to check
out the results[2] yourself, but they could be summarized by saying
that most operations (reads, counter increments, inserts with indexed
columns) are equivalent.  It mostly boils down to standard inserts,
which are 10% faster when using binary arguments than when using
string arguments.  It's worth noting (because either way it's awesome
:)), that even with string arguments, writes using prepared statements
are 5% faster than RPC (16% with binary arguments).
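
As a quick sanity check on how these figures fit together, the sketch
below assumes that "N% faster" means N% higher throughput relative to
the RPC baseline (the message does not spell this out, so treat that
reading as an assumption).

```python
# Throughput relative to plain RPC, normalised to 1.0 (assumed reading
# of "N% faster"; the numbers themselves come from the summary above).
rpc = 1.00
prepared_string = rpc * 1.05   # prepared statements, string arguments
prepared_binary = rpc * 1.16   # prepared statements, binary arguments

# Gain of binary arguments over string arguments.
relative_gain = prepared_binary / prepared_string - 1.0
print(f"{relative_gain:.1%}")
```

Under that reading the binary-over-string gain comes out at roughly
10%, which is consistent with the figure quoted for standard inserts.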

We need to drive a stake in the ground Real Soon Now, but since this
issue directly affects client maintainers, I'd be interested in
hearing what they have to say about this (either here, or in the
ticket).

Cheers,


[1]: https://issues.apache.org/jira/browse/CASSANDRA-2475
[2]: https://issues.apache.org/jira/browse/CASSANDRA-3634


-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Jonathan Ellis <jb...@gmail.com>.
You can always use hex in classic string-based, non-prepared-statements CQL.

On Fri, Jan 13, 2012 at 10:53 AM, Jake Luciani <ja...@gmail.com> wrote:
> Not to mention in the case of CFS we deal only in binary blobs.
>
> I'd rather see us add a hex hack for JS and PHP rather then cater to them.
>
> -Jake
>
> On Fri, Jan 13, 2012 at 11:42 AM, Sylvain Lebresne <sy...@datastax.com>wrote:
>
>> I think CQL has a problem, it doesn't deal with binary correctly.
>> I have very successfully used C* to store big amounts of small
>> pictures and CQL just cannot handle that efficiently.
>>
>> I've looked quickly but I didn't see anywhere in the tests on
>> CASSANDRA-3634 biggish column values tested (say in the order of 2-3
>> digits of KB). I'm willing to bet that with that, the difference is
>> not 10%, because there is the time to serialize/deserialize to string,
>> but also the exploded size of binary represented as hex strings.
>>
>> I think prepared statement are a very good candidate to solve that
>> 'handling binary correctly' problem, but only if we use binary for it.
>> To me that is *very* big argument against Strings. Sure we could add
>> yet another feature (or some hack) to handle binary, but why bother
>> when prepared statement with binary gives us that *and* we get 10%
>> faster writes even for small values.
>>
>> --
>> Sylvain
>>
>>
>> On Fri, Jan 13, 2012 at 5:01 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> > On Fri, Jan 13, 2012 at 9:45 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>> >> On Fri, Jan 13, 2012 at 8:11 AM, Gary Dusbabek <gd...@gmail.com>
>> wrote:
>> >>
>> >>>
>> >>> Not all languages <cough>Javascript</cough> make it easy to do binary.
>> >>
>> >>
>> >> PHP also goes in this boat, which leads me to agree with Gary.
>> >
>> > I don't get it.  Don't you already have the binary encoding done for
>> > the PHPCassa?
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of DataStax, the source for professional Cassandra support
>> > http://www.datastax.com
>>
>
>
>
> --
> http://twitter.com/tjake



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Jake Luciani <ja...@gmail.com>.
Not to mention in the case of CFS we deal only in binary blobs.

I'd rather see us add a hex hack for JS and PHP than cater to them.

-Jake

On Fri, Jan 13, 2012 at 11:42 AM, Sylvain Lebresne <sy...@datastax.com> wrote:

> I think CQL has a problem, it doesn't deal with binary correctly.
> I have very successfully used C* to store big amounts of small
> pictures and CQL just cannot handle that efficiently.
>
> I've looked quickly but I didn't see anywhere in the tests on
> CASSANDRA-3634 biggish column values tested (say in the order of 2-3
> digits of KB). I'm willing to bet that with that, the difference is
> not 10%, because there is the time to serialize/deserialize to string,
> but also the exploded size of binary represented as hex strings.
>
> I think prepared statement are a very good candidate to solve that
> 'handling binary correctly' problem, but only if we use binary for it.
> To me that is *very* big argument against Strings. Sure we could add
> yet another feature (or some hack) to handle binary, but why bother
> when prepared statement with binary gives us that *and* we get 10%
> faster writes even for small values.
>
> --
> Sylvain
>
>
> On Fri, Jan 13, 2012 at 5:01 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> > On Fri, Jan 13, 2012 at 9:45 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> >> On Fri, Jan 13, 2012 at 8:11 AM, Gary Dusbabek <gd...@gmail.com>
> wrote:
> >>
> >>>
> >>> Not all languages <cough>Javascript</cough> make it easy to do binary.
> >>
> >>
> >> PHP also goes in this boat, which leads me to agree with Gary.
> >
> > I don't get it.  Don't you already have the binary encoding done for
> > the PHPCassa?
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
>



-- 
http://twitter.com/tjake

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Sylvain Lebresne <sy...@datastax.com>.
I think CQL has a problem: it doesn't deal with binary correctly.
I have very successfully used C* to store large numbers of small
pictures, and CQL just cannot handle that efficiently.

I've looked quickly, but I didn't see biggish column values (say, in
the order of 2-3 digits of KB) tested anywhere in CASSANDRA-3634. I'm
willing to bet that with those, the difference is not just 10%,
because there is not only the time to serialize/deserialize to string,
but also the exploded size of binary represented as hex strings.
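
The size blow-up mentioned above is easy to see in a short Python
sketch.  The 100 KB payload and the helper name are illustrative
assumptions, not values from the CASSANDRA-3634 tests.

```python
import os


def hex_encode(blob: bytes) -> str:
    """Represent a binary value as a hex string (two characters per byte)."""
    return blob.hex()


# Stand-in for a small picture: 100 KB of random bytes (assumed size).
blob = os.urandom(100 * 1024)
as_hex = hex_encode(blob)

# The hex form is exactly twice as long as the raw bytes...
size_ratio = len(as_hex) / len(blob)

# ...and it has to be decoded again before the original bytes are usable.
round_tripped = bytes.fromhex(as_hex)
```

So a string-based bind argument carrying binary data doubles the payload
on the wire, on top of the encode and decode work at each end.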

I think prepared statements are a very good candidate to solve that
'handling binary correctly' problem, but only if we use binary for them.
To me that is a *very* big argument against Strings. Sure, we could add
yet another feature (or some hack) to handle binary, but why bother
when prepared statements with binary give us that *and* we get 10%
faster writes even for small values?

--
Sylvain


On Fri, Jan 13, 2012 at 5:01 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 9:45 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>> On Fri, Jan 13, 2012 at 8:11 AM, Gary Dusbabek <gd...@gmail.com> wrote:
>>
>>>
>>> Not all languages <cough>Javascript</cough> make it easy to do binary.
>>
>>
>> PHP also goes in this boat, which leads me to agree with Gary.
>
> I don't get it.  Don't you already have the binary encoding done for
> the PHPCassa?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, Jan 13, 2012 at 9:45 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> On Fri, Jan 13, 2012 at 8:11 AM, Gary Dusbabek <gd...@gmail.com> wrote:
>
>>
>> Not all languages <cough>Javascript</cough> make it easy to do binary.
>
>
> PHP also goes in this boat, which leads me to agree with Gary.

I don't get it.  Don't you already have the binary encoding done for
PHPCassa?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Tyler Hobbs <ty...@datastax.com>.
On Fri, Jan 13, 2012 at 8:11 AM, Gary Dusbabek <gd...@gmail.com> wrote:

>
> Not all languages <cough>Javascript</cough> make it easy to do binary.


PHP is in the same boat, which leads me to agree with Gary.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Gary Dusbabek <gd...@gmail.com>.
I'd like to see the clients stay as simple as possible, keeping the
bind parameters as strings.

Not all languages <cough>Javascript</cough> make it easy to do binary.

Gary.


On Wed, Jan 11, 2012 at 15:14, Eric Evans <ee...@acunu.com> wrote:
> On Wed, Dec 14, 2011 at 5:49 PM, Eric Evans <ee...@acunu.com> wrote:
>> Thanks to the hard work of Rick Shaw, prepared statements
>> (https://issues.apache.org/jira/browse/CASSANDRA-2475) has been
>> committed to trunk.  However, before you use it, be advised that the
>> API might be changing in the next few days.
>>
>> If it does change, it should be limited to moving the bind parameters
>> from string to bytes, (pending a comparison of the performance).  I'll
>> send another email with the changes, if any, after the API is expected
>> to be stable.
>
> To follow up on this, and to draw the attention of the folks on
> client-dev@ who have some stake in this:
>
> There was some discussion in #2475, and later in #3634, about whether
> clients would supply string, or binary bind arguments for a prepared
> statement.
>
> I encourage anyone that is interesting to read through the tickets,
> but the short version is that since Cassandra uses binary values
> internally, having the clients serialize types to binary would be more
> performant than having Cassandra do it, while string arguments result
> in simpler, easier to code drivers.  Since it boils down to a question
> of trading one thing for another, we agreed to do some performance
> testing so that we could at least put some real numbers to it.
>
> That performance testing is complete.  Again, I encourage you to check
> out the results[2] yourself, but they could be summarized by saying
> that most operations (reads, counter increments, inserts with an
> indexed columns) are equivalent.  It mostly boils down to standard
> inserts which are 10% faster when using binary arguments than for
> string arguments.  It's worth noting (because either way it's awesome
> :)), that even with string arguments, writes using prepared statements
> are 5% faster than RPC (16% with binary arguments).
>
> We need to drive a stake in the ground Real Soon Now, but since this
> issue directly affects client maintainers, I'd be interested in
> hearing what they had to say about this (either here, or in the
> ticket).
>
> Cheers,
>
>
> [1]: https://issues.apache.org/jira/browse/CASSANDRA-2475
> [2]: https://issues.apache.org/jira/browse/CASSANDRA-3634
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Eric Evans <ee...@acunu.com>.
On Wed, Jan 11, 2012 at 3:14 PM, Eric Evans <ee...@acunu.com> wrote:
> I encourage anyone that is interesting to read through the tickets...

Of course that should have read "interested", not "interesting".
Uninteresting people who are interested are also encouraged to
read the issues. :)

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Eric Evans <ee...@acunu.com>.
On Wed, Jan 11, 2012 at 3:14 PM, Eric Evans <ee...@acunu.com> wrote:
> We need to drive a stake in the ground Real Soon Now, but since this
> issue directly affects client maintainers, I'd be interested in
> hearing what they had to say about this (either here, or in the
> ticket).

FYI, trunk has been updated so that prepared statements use binary
arguments: http://goo.gl/j54o4

Thanks,

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Prepared Statement support (CASSANDRA-2475)

Posted by Nate McCall <na...@datastax.com>.
Hi Eric,
Thanks for the follow-up. I see the point of "increased complexity on
the clients" keeps coming up in the references, but the truth is that
we've pretty much all had to abstract serialization to some degree or
another just to keep up with changes. At least in the case of Hector,
dealing directly with the byte-encoded form would be easier.

If it's a couple of ticks faster on the server side, that's fine with me.

Thanks for thinking of us though :)

On Wed, Jan 11, 2012 at 3:14 PM, Eric Evans <ee...@acunu.com> wrote:
> On Wed, Dec 14, 2011 at 5:49 PM, Eric Evans <ee...@acunu.com> wrote:
>> Thanks to the hard work of Rick Shaw, prepared statements
>> (https://issues.apache.org/jira/browse/CASSANDRA-2475) has been
>> committed to trunk.  However, before you use it, be advised that the
>> API might be changing in the next few days.
>>
>> If it does change, it should be limited to moving the bind parameters
>> from string to bytes, (pending a comparison of the performance).  I'll
>> send another email with the changes, if any, after the API is expected
>> to be stable.
>
> To follow up on this, and to draw the attention of the folks on
> client-dev@ who have some stake in this:
>
> There was some discussion in #2475, and later in #3634, about whether
> clients would supply string, or binary bind arguments for a prepared
> statement.
>
> I encourage anyone that is interesting to read through the tickets,
> but the short version is that since Cassandra uses binary values
> internally, having the clients serialize types to binary would be more
> performant than having Cassandra do it, while string arguments result
> in simpler, easier to code drivers.  Since it boils down to a question
> of trading one thing for another, we agreed to do some performance
> testing so that we could at least put some real numbers to it.
>
> That performance testing is complete.  Again, I encourage you to check
> out the results[2] yourself, but they could be summarized by saying
> that most operations (reads, counter increments, inserts with an
> indexed columns) are equivalent.  It mostly boils down to standard
> inserts which are 10% faster when using binary arguments than for
> string arguments.  It's worth noting (because either way it's awesome
> :)), that even with string arguments, writes using prepared statements
> are 5% faster than RPC (16% with binary arguments).
>
> We need to drive a stake in the ground Real Soon Now, but since this
> issue directly affects client maintainers, I'd be interested in
> hearing what they had to say about this (either here, or in the
> ticket).
>
> Cheers,
>
>
> [1]: https://issues.apache.org/jira/browse/CASSANDRA-2475
> [2]: https://issues.apache.org/jira/browse/CASSANDRA-3634
>
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
