
Posted to user@cassandra.apache.org by Priyanka <pr...@gmail.com> on 2011/07/26 17:39:23 UTC

Slow Reads

Hello All,
        
I am doing some read tests on Cassandra on a single node, but they
are turning out to be very slow.
Here is the data model in detail.
I am using a super column family. Cassandra has 970 rows in total; each row
has 620901 super columns and each super column has 2 columns. Total data in
the database is around 45GB.
I am trying to retrieve the data of a particular super column (that is, pull
the row key associated with the super column and the column values within
the super column).
It takes 2.5 secs with the Java code and 4.7 secs with the Python code.

Here is the python code.
 result = col_fam.get_range(start="", finish="", columns=None,
                            column_start="", column_finish="",
                            column_reversed=False, column_count=2,
                            row_count=None, include_timestamp=False,
                            super_column='200003',
                            read_consistency_level=None, buffer_size=None)

This is very slow compared to MySQL.
I am not sure what's going wrong here. Could someone let me know if there is
any problem with my model?


Any help in this regard is highly appreciated.

Thank you.

Regards,
Priyanka

         



--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6622680.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Slow Reads

Posted by Priyanka Ganuthula <pr...@gmail.com>.

Each super column has two columns, and each column holds only one byte.
Re-reading the data is a bit faster, but not significantly.

On Tue, Jul 26, 2011 at 12:49 PM, Jake Luciani <ja...@gmail.com> wrote:

> It doesn't read the entire row, but it does read a section of the row from
> disk...
>
> How big is each supercolumn?  If you re-read the data does the query time
> get faster?

Re: Slow Reads

Posted by Jake Luciani <ja...@gmail.com>.

It doesn't read the entire row, but it does read a section of the row from
disk...

How big is each supercolumn?  If you re-read the data does the query time
get faster?
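A simplified model of what "reads a section of the row" can mean: columns in a row are kept sorted by name, and a per-row index of block boundaries (tuned in this era by `column_index_size_in_kb` in cassandra.yaml) lets a lookup seek to one block instead of scanning the whole row. This is an illustrative sketch, not Cassandra's actual implementation; the block size here is counted in columns rather than kilobytes, and `build_index`/`lookup` are hypothetical names.

```python
import bisect
from typing import List, Optional, Tuple

Column = Tuple[str, str]  # (column name, value)

# Hypothetical block size, counted in columns here rather than kilobytes.
BLOCK_SIZE = 4


def build_index(row: List[Column]) -> Tuple[List[str], List[List[Column]]]:
    """Split a name-sorted row into blocks and index each block's first name."""
    blocks = [row[i:i + BLOCK_SIZE] for i in range(0, len(row), BLOCK_SIZE)]
    first_names = [block[0][0] for block in blocks]
    return first_names, blocks


def lookup(name: str, first_names: List[str],
           blocks: List[List[Column]]) -> Tuple[Optional[str], int]:
    """Return (value or None, number of columns read from 'disk')."""
    i = bisect.bisect_right(first_names, name) - 1
    if i < 0:
        return None, 0  # name sorts before the first column
    block = blocks[i]  # the whole block is read, so count all of it
    for col_name, value in block:
        if col_name == name:
            return value, len(block)
    return None, len(block)


row = [(f"{n:06d}", f"v{n}") for n in range(100)]  # already sorted by name
first_names, blocks = build_index(row)
value, read = lookup("000042", first_names, blocks)
print(value, read)  # v42 4  (4 of the row's 100 columns)
```

Note that the query in this thread performs such a seek once per row, so all 970 rows are still visited.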



On Tue, Jul 26, 2011 at 11:59 AM, Philippe <wa...@gmail.com> wrote:

> I believe it's because it needs to read the whole row to get to your super
> column.
>
> You might have to reconsider your model.



-- 
http://twitter.com/tjake

Re: Slow Reads

Posted by Priyanka Ganuthula <pr...@gmail.com>.

Yes, I am using Hector for Java.

On Wed, Jul 27, 2011 at 3:35 AM, CASSANDRA learner <cassandralearner@gmail.com> wrote:

> Are you using the Hector client for Java?

Re: Slow Reads

Posted by CASSANDRA learner <ca...@gmail.com>.

Are you using the Hector client for Java?

On Tue, Jul 26, 2011 at 11:17 PM, Priyanka <pr...@gmail.com> wrote:

> this is how my data looks

Re: Slow Reads

Posted by Jake Luciani <ja...@gmail.com>.

The philosophy in NoSQL is to store the data the way you plan to access it. That
means possibly duplicating the data many times. Disk is cheap, and writes are
fast.
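A minimal sketch of that idea applied to this thread: every write goes to two structures, one per access pattern, so both "all of one row" and "one super column across all rows" become single lookups. Plain dicts stand in for two column families; `by_row`, `by_super_column`, `write`, and the read helpers are hypothetical names, not a client API.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (col1 value, col2 value)

# Two in-memory stand-ins for two column families (hypothetical names).
# by_row:          row key -> {super column name -> (col1, col2)}
# by_super_column: super column name -> {row key -> (col1, col2)}
by_row: Dict[str, Dict[str, Pair]] = defaultdict(dict)
by_super_column: Dict[str, Dict[str, Pair]] = defaultdict(dict)


def write(row_key: str, super_col: str, col1: str, col2: str) -> None:
    """Write the same value twice, once per access pattern."""
    by_row[row_key][super_col] = (col1, col2)
    by_super_column[super_col][row_key] = (col1, col2)


def read_row(row_key: str) -> Dict[str, Pair]:
    """Query 1: everything stored under one row key."""
    return dict(by_row.get(row_key, {}))


def read_super_column(super_col: str) -> Dict[str, Pair]:
    """Query 2: one super column across all row keys, as a single lookup."""
    return dict(by_super_column.get(super_col, {}))


write("rowkey1", "supercol1", "T", "C")
write("rowkey2", "supercol1", "A", "A")
write("rowkey1", "supercol2", "C", "T")
print(read_super_column("supercol1"))  # {'rowkey1': ('T', 'C'), 'rowkey2': ('A', 'A')}
print(read_row("rowkey1"))             # {'supercol1': ('T', 'C'), 'supercol2': ('C', 'T')}
```

The cost is two writes and twice the storage per value, which is the trade-off described above.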


On Wed, Jul 27, 2011 at 2:22 PM, Priyanka <pr...@gmail.com> wrote:

> Thank you, Indra, for your suggestion.
> But apart from pulling data based on the super column, as in the example below,
> I also need to query for the data of a particular row key. If I change the
> model as you suggested, that query becomes slow.
> I need both retrievals to be efficient.



-- 
http://twitter.com/tjake

Re: Slow Reads

Posted by Priyanka <pr...@gmail.com>.

Thank you, Indra, for your suggestion.
But apart from pulling data based on the super column, as in the example below,
I also need to query for the data of a particular row key. If I change the
model as you suggested, that query becomes slow.
I need both retrievals to be efficient.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6627231.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Slow Reads

Posted by Indranath Ghosh <in...@gmail.com>.

You might want to avoid super columns and denormalize your schema.
Since you are querying by the super columns, you can make them the row keys,
turn the current row keys into (part of) your column names, and use composite
column names to get to the columns faster.
Something like this (using your representation):

"supercol1":{
    "rowkey1_col1":T
    "rowkey1_col2":C
    "rowkey2_col1":A
    "rowkey2_col2":A
       }

"supercol2":{
    "rowkey1_col1":C
    "rowkey1_col2":T
    "rowkey2_col1":A
    "rowkey2_col2":T
       }

"supercol3":{
    "rowkey1_col1":C
    "rowkey1_col2":T
    "rowkey2_col1":C
    "rowkey2_col2":T
       }
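A sketch of the inversion above in plain Python, assuming "<rowkey>_<col>" string names as a stand-in for real composite column names (it relies on column names like `col1` containing no underscore). The data is the sample from the thread; `invert` and `query` are hypothetical helpers.

```python
from typing import Dict, List

Layout = Dict[str, Dict[str, Dict[str, str]]]

# Sample data from the thread: row key -> super column -> column -> value.
original: Layout = {
    "rowkey1": {"supercol1": {"col1": "T", "col2": "C"},
                "supercol2": {"col1": "C", "col2": "T"},
                "supercol3": {"col1": "C", "col2": "T"}},
    "rowkey2": {"supercol1": {"col1": "A", "col2": "A"},
                "supercol2": {"col1": "A", "col2": "T"},
                "supercol3": {"col1": "C", "col2": "T"}},
}


def invert(data: Layout) -> Dict[str, Dict[str, str]]:
    """Make each super column a row; name its columns '<rowkey>_<col>'."""
    inverted: Dict[str, Dict[str, str]] = {}
    for row_key, super_cols in data.items():
        for sc_name, cols in super_cols.items():
            row = inverted.setdefault(sc_name, {})
            for col_name, value in cols.items():
                row[f"{row_key}_{col_name}"] = value
    # Keep each row's columns sorted by name, mirroring a lexical comparator.
    return {k: dict(sorted(v.items())) for k, v in inverted.items()}


def query(inverted: Dict[str, Dict[str, str]], sc_name: str) -> List[str]:
    """One row read answers the cross-row question as 'rowkey,col1,col2' lines."""
    grouped: Dict[str, Dict[str, str]] = {}
    for name, value in inverted.get(sc_name, {}).items():
        row_key, col_name = name.rsplit("_", 1)
        grouped.setdefault(row_key, {})[col_name] = value
    return [f"{rk},{cols['col1']},{cols['col2']}" for rk, cols in grouped.items()]


print(query(invert(original), "supercol1"))  # ['rowkey1,T,C', 'rowkey2,A,A']
```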

-indra

On Tue, Jul 26, 2011 at 10:47 AM, Priyanka <pr...@gmail.com> wrote:

> this is how my data looks



-- 
*Indranath Ghosh
Phone: 408-813-9207*

Re: Slow Reads

Posted by Priyanka <pr...@gmail.com>.

This is how my data looks:
"rowkey1":{
            "supercol1":{"col1":T, "col2":C}
            "supercol2":{"col1":C, "col2":T}
            "supercol3":{"col1":C, "col2":T}
          }
"rowkey2":{
            "supercol1":{"col1":A, "col2":A}
            "supercol2":{"col1":A, "col2":T}
            "supercol3":{"col1":C, "col2":T}
          }

Each row has 620901 super columns, and each super column has 2 columns.
The names of the super columns are the same for all rows, but the data in each
super column is different.
I am trying to get the data of a particular super column, which is spread across
all the rows with different data in each.

So yes, it's getting data from all rows.
Please suggest a better way to do this.
Thank you.

The output of my query would be (for example, for supercol1):
rowkey1,T,C
rowkey2,A,A
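To make the access pattern concrete, a small model of this layout and query: pulling one super column out of every row necessarily visits every row. Nested dicts stand in for the column family, and `scan_super_column` is a hypothetical helper, not a client call.

```python
from typing import Dict, List, Tuple

Row = Dict[str, Dict[str, str]]  # super column name -> {column name -> value}

data: Dict[str, Row] = {
    "rowkey1": {"supercol1": {"col1": "T", "col2": "C"},
                "supercol2": {"col1": "C", "col2": "T"},
                "supercol3": {"col1": "C", "col2": "T"}},
    "rowkey2": {"supercol1": {"col1": "A", "col2": "A"},
                "supercol2": {"col1": "A", "col2": "T"},
                "supercol3": {"col1": "C", "col2": "T"}},
}


def scan_super_column(rows: Dict[str, Row], sc_name: str) -> Tuple[List[str], int]:
    """Visit every row and pull one super column out of each."""
    lines: List[str] = []
    visited = 0
    for row_key, super_cols in rows.items():
        visited += 1  # every row is touched, whether or not it has sc_name
        cols = super_cols.get(sc_name)
        if cols is not None:
            lines.append(f"{row_key},{cols['col1']},{cols['col2']}")
    return lines, visited


lines, visited = scan_super_column(data, "supercol1")
print("\n".join(lines))  # prints rowkey1,T,C and rowkey2,A,A on separate lines
print(visited)           # 2 here; it would be 970 for the real data set
```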



--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6623091.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Slow Reads

Posted by Priyanka Ganuthula <pr...@gmail.com>.

Thanks, Philippe. I have a question here: I am specifying the required
super column. Does it still need to read the entire row?
Or is it because I am listing all the slices and then going to each slice and
picking the data for the required super column?
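import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.*; // SlicePredicate, SliceRange, KeyRange, ...

// Slice with empty start and finish: begins at the first super column of each
// row. SliceRange.count defaults to 100 in the Thrift IDL, so at most the first
// 100 super columns per row are returned.
SlicePredicate slicePredicate = new SlicePredicate();
SliceRange sliceRange = new SliceRange();
sliceRange.setStart(new byte[] {});
sliceRange.setFinish(new byte[] {});
slicePredicate.setSlice_range(sliceRange);


ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
// Rows from lastkey onwards (empty end_key); KeyRange.count also defaults to 100.
KeyRange keyRange = new KeyRange();
keyRange.start_key = ByteBuffer.wrap(lastkey.getBytes());
keyRange.end_key = ByteBuffer.wrap("".getBytes());


List<KeySlice> slices = client.get_range_slices(columnParent,
        slicePredicate, keyRange, ConsistencyLevel.ONE);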

Then I loop over the slices, list the super columns of each, and look for the
one with the required super column name.

Am I missing something here?
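A small simulation of the two kinds of predicate, assuming a lexically sorted row of 1,000 super columns named "200000" to "200999" (made-up data). In the Thrift IDL of this era, `SliceRange.count` defaults to 100 and `SlicePredicate` can carry either a `slice_range` or an explicit `column_names` list; `slice_range()` and `named()` below only mimic those semantics and are hypothetical helpers.

```python
from typing import Dict, List

# Made-up row: 1,000 super columns whose names sort lexically.
row: Dict[str, Dict[str, str]] = {
    str(200000 + i): {"col1": "A", "col2": "T"} for i in range(1000)
}


def slice_range(row: Dict[str, Dict[str, str]], start: str = "",
                finish: str = "", count: int = 100) -> List[str]:
    """Mimic a SliceRange: sorted names in [start, finish], at most `count`.

    Empty start/finish mean "from the first" / "to the last" column.
    """
    names = sorted(row)
    picked = [n for n in names
              if (not start or n >= start) and (not finish or n <= finish)]
    return picked[:count]


def named(row: Dict[str, Dict[str, str]], names: List[str]) -> List[str]:
    """Mimic an explicit column_names predicate: only the requested names."""
    return [n for n in names if n in row]


print(len(slice_range(row)))         # 100: the default count caps the slice
print("200003" in slice_range(row))  # True: it is among the first 100 names
print("200500" in slice_range(row))  # False: beyond the first 100
print(named(row, ["200500"]))        # ['200500']
```

With a named predicate only the requested super column comes back for each row, instead of up to 100 that the client then filters; each row still has to be visited, though.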

On Tue, Jul 26, 2011 at 11:59 AM, Philippe <wa...@gmail.com> wrote:

> I believe it's because it needs to read the whole row to get to your super
> column.
>
> You might have to reconsider your model.

Re: Slow Reads

Posted by Philippe <wa...@gmail.com>.

I believe it's because it needs to read the whole row to get to your super
column.

You might have to reconsider your model.
On 26 Jul 2011 17:39, "Priyanka" <pr...@gmail.com> wrote:

Re: Slow Reads

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Tue, Jul 26, 2011 at 5:39 PM, Priyanka <pr...@gmail.com> wrote:
> Here is the python code.
>  result = col_fam.get_range(start="", finish="", columns=None,
>                             column_start="", column_finish="",
>                             column_reversed=False, column_count=2,
>                             row_count=None, include_timestamp=False,
>                             super_column='200003',
>                             read_consistency_level=None, buffer_size=None)

What are you trying to query exactly? All the rows, or only one?
I'm no expert in pycassa, but if I read this code and the pycassa code
correctly, this request will query 1024 rows up front and return an iterator
that will eventually read all the rows in the database if you iterate over it.
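A rough model of the paging behaviour described above: an iterator that requests `buffer_size` rows per call and keeps going until the data runs out. `fetch_page` is a hypothetical stand-in for one `get_range_slices` round trip; pycassa's real implementation differs in details (for example, how it resumes from the last key it has seen).

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical column family with 970 rows, as in the thread.
ROWS: Dict[str, str] = {f"row{i:04d}": f"data{i}" for i in range(970)}
fetch_calls: List[int] = []  # size of each page actually fetched


def fetch_page(after: Optional[str], count: int) -> List[Tuple[str, str]]:
    """Stand-in for one round trip: up to `count` rows with key > `after`."""
    keys = [k for k in sorted(ROWS) if after is None or k > after][:count]
    fetch_calls.append(len(keys))
    return [(k, ROWS[k]) for k in keys]


def get_range(buffer_size: int = 1024) -> Iterator[Tuple[str, str]]:
    """Lazily yield every row, fetching `buffer_size` rows per request."""
    last: Optional[str] = None
    while True:
        page = fetch_page(last, buffer_size)
        yield from page
        if len(page) < buffer_size:
            return  # a short page means there are no more rows
        last = page[-1][0]


result = get_range()
print(len(fetch_calls))        # 0: nothing is fetched until iteration starts
print(sum(1 for _ in result))  # 970: iterating reads every row
print(fetch_calls)             # [970]: one 1024-row request covered them all
```

With 970 rows and pages of 1024, a single request already covers the whole column family, so iterating over `result` touches every row.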

