
Posted to user@cassandra.apache.org by buddhasystem <po...@bnl.gov> on 2011/03/19 00:54:18 UTC

Reading whole row vs a range of columns (pycassa)

Is there a noticeable difference in speed between reading the whole row
through Pycassa vs. a range of columns? Both rows and columns are pretty
slim.


--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reading-whole-row-vs-a-range-of-columns-pycassa-tp6186518p6186518.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Reading whole row vs a range of columns (pycassa)

Posted by aaron morton <aa...@thelastpickle.com>.

Internally, a multiget is just turned into a series of single-row gets. There is no seek and partial scan such as you may see when reading from the clustered index in an RDBMS.
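
A minimal sketch of the calls being compared, for readers following along. The keyword names (column_start, column_finish, column_count) follow pycassa's ColumnFamily.get; the InMemoryColumnFamily class is a hypothetical stand-in so the snippet runs without a cluster, and its multiget only models the "series of single-row gets" behaviour described above. It is not how Cassandra or pycassa is implemented.

```python
class InMemoryColumnFamily:
    """Hypothetical stand-in for a pycassa ColumnFamily (assumption, not pycassa)."""

    def __init__(self, rows):
        # rows: {row_key: {column_name: value}}
        self._rows = rows

    def get(self, key, column_start="", column_finish="", column_count=100):
        # Columns come back sorted by name; an empty bound means "unbounded".
        cols = sorted(self._rows[key].items())
        selected = [
            (name, value)
            for name, value in cols
            if name >= column_start
            and (column_finish == "" or name <= column_finish)
        ]
        return dict(selected[:column_count])

    def multiget(self, keys, **kwargs):
        # Modelled as one single-row get per key, nothing smarter.
        return {k: self.get(k, **kwargs) for k in keys if k in self._rows}


cf = InMemoryColumnFamily({
    "row1": {"a": 1, "b": 2, "c": 3},
    "row2": {"a": 10},
})

whole_row = cf.get("row1")                                    # every column
col_range = cf.get("row1", column_start="b", column_finish="c")  # a slice
several = cf.multiget(["row1", "row2"], column_finish="a")
```

With slim rows like these, both get calls touch the same row, which is why the difference is unlikely to matter until measurements say otherwise.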

Unless you have a performance problem and you've tried other things, I'd put this idea on the back burner. There are many other factors that impact read performance, and the OrderPreservingPartitioner (OPP) requires a lot more care than the RandomPartitioner (RP).

Aaron
 
On 21 Mar 2011, at 11:36, buddhasystem wrote:

> Aaron, thanks for chiming in.
> 
> I'm doing what you said, i.e. all data for a single object (which is quite
> lean with about 100 attributes 10 bytes each) just goes into a single
> column, as opposed to the previous version of my application, which had all
> attributes of each small object mapped to individual columns.
> 
> So yes, I considered having 100 objects in a single column, but that
> is suboptimal for many reasons (e.g. it is hard to add an object later).
> 
> My reference to OPP was this -- if I were sticking with the original design,
> it could have been advantageous to have OPP, since statistically it's likely
> that requests for objects are often serial, e.g. often people don't query
> for just one object with id=123, but for a series like id=[123..145]. If I
> bunch these into rows containing 100 objects each, that promises some
> efficiency right there, as I read one row as opposed to, say, 50.

Re: Reading whole row vs a range of columns (pycassa)

Posted by buddhasystem <po...@bnl.gov>.

Aaron, thanks for chiming in.

I'm doing what you said, i.e. all data for a single object (which is quite
lean with about 100 attributes 10 bytes each) just goes into a single
column, as opposed to the previous version of my application, which had all
attributes of each small object mapped to individual columns.

So yes, I considered having 100 objects in a single column, but that
is suboptimal for many reasons (e.g. it is hard to add an object later).

My reference to OPP was this -- if I were sticking with the original design,
it could have been advantageous to have OPP, since statistically it's likely
that requests for objects are often serial, e.g. often people don't query
for just one object with id=123, but for a series like id=[123..145]. If I
bunch these into rows containing 100 objects each, that promises some
efficiency right there, as I read one row as opposed to, say, 50.
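
A rough sketch of the bucketing idea above, assuming integer object ids and one column per object. The "bucket:" row-key format, the zero-padding widths and the function names are illustrative assumptions, not something from my application or from pycassa.

```python
OBJECTS_PER_ROW = 100  # bucket size taken from the message above


def locate(object_id, objects_per_row=OBJECTS_PER_ROW):
    """Return the (row_key, column_name) pair for one object id."""
    bucket = object_id // objects_per_row
    # Zero-padding keeps string-sorted column names in numeric order.
    return "bucket:%08d" % bucket, "%010d" % object_id


def plan_range_read(first_id, last_id, objects_per_row=OBJECTS_PER_ROW):
    """Turn an inclusive id range into one column slice per row.

    Each entry is (row_key, column_start, column_finish).
    """
    plan = []
    first_bucket = first_id // objects_per_row
    last_bucket = last_id // objects_per_row
    for bucket in range(first_bucket, last_bucket + 1):
        lo = max(first_id, bucket * objects_per_row)
        hi = min(last_id, (bucket + 1) * objects_per_row - 1)
        plan.append(("bucket:%08d" % bucket, "%010d" % lo, "%010d" % hi))
    return plan


single_row_plan = plan_range_read(123, 145)   # stays inside one row
two_row_plan = plan_range_read(195, 205)      # crosses a bucket boundary
```

A request for id=[123..145] becomes a single column slice on one row under the default partitioner, which is the efficiency I was hoping to get from OPP.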

aaron morton wrote:
> 
> I'd collapse all the data for a single object into a single column, not
> sure about storing 100 objects in a single column though. 
> 
> Have you considered any concurrency issues, e.g. multiple threads /
> processes wanting to update different objects in the same group of 100?
> 
> I don't understand your reference to OPP in the context of reading 100
> columns from a row.
> 
> Aaron


Re: Reading whole row vs a range of columns (pycassa)

Posted by aaron morton <aa...@thelastpickle.com>.

I'd collapse all the data for a single object into a single column, not sure about storing 100 objects in a single column though. 
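
As an illustration of collapsing one object into one column, here is a minimal sketch. JSON is an assumed encoding chosen only because it is in the standard library, and the attribute names and values are made up to match the "about 100 attributes, 10 bytes each" shape mentioned elsewhere in this thread.

```python
import json


def pack_object(attributes):
    """Serialize one object's attributes into a single column value."""
    # Compact separators and sorted keys keep the value small and stable.
    return json.dumps(attributes, sort_keys=True, separators=(",", ":"))


def unpack_object(column_value):
    """Inverse of pack_object."""
    return json.loads(column_value)


# Hypothetical object: 100 attributes, each value 10 characters long.
attrs = {"attr%03d" % i: "v%09d" % i for i in range(100)}
column_value = pack_object(attrs)
restored = unpack_object(column_value)
```

The whole object is then written and read as one column, at the cost of rewriting the full value whenever any attribute changes.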

Have you considered any concurrency issues, e.g. multiple threads / processes wanting to update different objects in the same group of 100?
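
To make the concern concrete, a small sketch of the lost update that can happen when several objects share one serialized column. The dicts below are plain in-memory stand-ins for a row (an assumption for illustration; no Cassandra calls are involved), and JSON is again an assumed encoding.

```python
import json


def update_object_in_group(stored_blob, object_id, new_attributes):
    """Read-modify-write of one object inside a serialized group."""
    group = json.loads(stored_blob)
    group[str(object_id)] = new_attributes
    return json.dumps(group, sort_keys=True)


# Layout 1: many objects packed into one column value.
grouped = {
    "group:1": json.dumps(
        {"123": {"state": "old"}, "124": {"state": "old"}}, sort_keys=True
    )
}

# Two writers read the same version before either writes back.
seen_by_a = grouped["group:1"]
seen_by_b = grouped["group:1"]
grouped["group:1"] = update_object_in_group(seen_by_a, 123, {"state": "new-a"})
grouped["group:1"] = update_object_in_group(seen_by_b, 124, {"state": "new-b"})
grouped_result = json.loads(grouped["group:1"])  # writer A's change is gone

# Layout 2: one column per object, so the two writes never overlap.
per_column = {
    "123": json.dumps({"state": "old"}),
    "124": json.dumps({"state": "old"}),
}
per_column["123"] = json.dumps({"state": "new-a"})
per_column["124"] = json.dumps({"state": "new-b"})
per_column_result = {k: json.loads(v) for k, v in per_column.items()}
```

In the grouped layout the second writer silently overwrites the first writer's object; with one column per object both updates survive.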

I don't understand your reference to OPP in the context of reading 100 columns from a row.

Aaron

On 19 Mar 2011, at 16:22, buddhasystem wrote:

> As I'm working on this further, I want to understand this:
> 
> Is it advantageous to flatten data in blocks (strings) each containing a
> series of objects, if I know that a serial object read is often likely, but
> don't want to resort to OPP? I worked out the optimal granularity, it seems.
> Is it better to read a serialized single column with 100 objects than a row
> consisting of a hundred columns each modeling an object?

Re: Reading whole row vs a range of columns (pycassa)

Posted by buddhasystem <po...@bnl.gov>.

As I'm working on this further, I want to understand this:

Is it advantageous to flatten data in blocks (strings) each containing a
series of objects, if I know that a serial object read is often likely, but
don't want to resort to OPP? I worked out the optimal granularity, it seems.
Is it better to read a serialized single column with 100 objects than a row
consisting of a hundred columns each modeling an object?
