Posted to user@cassandra.apache.org by hajjat <ha...@purdue.edu> on 2013/07/18 23:03:19 UTC

Recommended data size for Reads/Writes in Cassandra

Hi,

Is there a recommended data size for Reads/Writes in Cassandra? I tried
inserting 10 MB objects and the latency I got was pretty high. Also, I was
never able to insert larger objects (say 50 MB) since Cassandra kept
crashing when I tried that.

Here is my experiment setup: 
I used two Large VMs in EC2 within the same data center. Inserts use
consistency level ALL (strong consistency). The latencies were as follows:
Data size:   10 MB     1 MB     100 Bytes
Latency:     250 ms    50 ms    8 ms

I've also done the same for two Large VMs across two data-centers. The
latencies were around:
Data size:   10 MB      1 MB      100 Bytes
Latency:     1200 ms    800 ms    80 ms

1) Isn't the 10 MB latency extremely high?
2) Is there a recommended data size to use with Cassandra (e.g., a few bytes
up to 1 MB)?
3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
anybody know why? I thought the max data size should be up to 2 GB?

Thanks,
Mohammad

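PS. Here is the Python code I use to insert into Cassandra. I put my
stopwatch timers around the insert statement:
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.cassandra.ttypes import ConsistencyLevel

    # TEST_FILE and keyspace are defined elsewhere.
    # Read the test payload as raw bytes.
    with open(TEST_FILE, 'rb') as fh:
        data = fh.read()

    POOL = ConnectionPool(keyspace, server_list=['localhost:9160'],
                          timeout=None)
    USER = ColumnFamily(POOL, 'User')
    # Write with consistency level ALL.
    USER.insert('Ali', {'data': data},
                write_consistency_level=ConsistencyLevel.ALL)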
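
A minimal sketch of how the stopwatch timing around the insert might be done, using only the Python standard library. The helper names time_call and summarize are hypothetical and not part of pycassa; repeating the call and reporting the median is an assumption about how to reduce noise, not something the measurements above are known to have done.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times and return the per-call latencies in ms."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return (min, median, max) of a list of latencies."""
    return (min(latencies_ms),
            statistics.median(latencies_ms),
            max(latencies_ms))
```

For the insert above, the timed callable would be something like `lambda: USER.insert('Ali', {'data': data}, ...)`, so that only the insert itself falls inside the timed region and reading the file does not.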




--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Recommended data size for Reads/Writes in Cassandra

Posted by aaron morton <aa...@thelastpickle.com>.
> Do you guys have any idea why the 10 MB writes took a lot of time in my case although I'm using Large VMs which have plenty of resources?
If you are talking about m1.large, IMHO they are underpowered; at a minimum you should be using m1.xlarge.

Cheers
 
-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/07/2013, at 11:26 AM, Tyler Hobbs <ty...@datastax.com> wrote:

> Large writes can sometimes put a lot of heap/GC pressure on the node, which can be an additional source of latency.  Use the query tracing in Cassandra 1.2+ to get a better picture of where the latency is.
> 
> 


Re: Recommended data size for Reads/Writes in Cassandra

Posted by Tyler Hobbs <ty...@datastax.com>.
Large writes can sometimes put a lot of heap/GC pressure on the node, which
can be an additional source of latency.  Use the query tracing in Cassandra
1.2+ to get a better picture of where the latency is.


On Thu, Jul 18, 2013 at 6:18 PM, Mohammad Hajjat <ha...@purdue.edu> wrote:

> Thanks Andrey and Tyler! That was useful :)
>
> Do you guys have any idea why the 10 MB writes took a lot of time in my
> case although I'm using Large VMs which have plenty of resources? Or do you
> think this latency is expected?
> I'm trying to see how much time is spent in the network versus processing
> CPU cycles of the nodes; any suggestion for a good profiling tool?
>
>
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Recommended data size for Reads/Writes in Cassandra

Posted by Mohammad Hajjat <ha...@purdue.edu>.
Thanks Andrey and Tyler! That was useful :)

Do you guys have any idea why the 10 MB writes took a lot of time in my
case although I'm using Large VMs which have plenty of resources? Or do you
think this latency is expected?
I'm trying to see how much time is spent in the network versus processing
CPU cycles of the nodes; any suggestion for a good profiling tool?
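
One rough way to frame the network-versus-processing question before reaching for a profiler is to convert the reported latencies into an effective throughput and compare that with the bandwidth expected between the nodes. The sketch below does only that arithmetic; the function name throughput_mb_per_s is hypothetical, and the input figures are the ones from the original post in this thread.

```python
def throughput_mb_per_s(size_mb, latency_ms):
    """Effective throughput implied by one write of size_mb taking latency_ms."""
    return size_mb * 1000.0 / latency_ms


# (size in MB, latency in ms) pairs reported in the original post.
same_dc = [(10, 250), (1, 50)]
cross_dc = [(10, 1200), (1, 800)]

for label, rows in (("same DC", same_dc), ("cross DC", cross_dc)):
    for size_mb, latency_ms in rows:
        rate = throughput_mb_per_s(size_mb, latency_ms)
        print("%s: %2d MB in %4d ms -> %.2f MB/s"
              % (label, size_mb, latency_ms, rate))
```

For example, 10 MB in 250 ms works out to 40 MB/s, or about 320 Mbit/s. If a figure like that is close to the link bandwidth between the two VMs, the network is a plausible bottleneck; if it is far below, the time is more likely going to processing on the nodes.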



On Thu, Jul 18, 2013 at 5:50 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> The default limit is 16mb, but realistically you should try to keep writes
> under 10mb, breaking up large values into multiple columns/rows if
> necessary.
>
>



-- 
*Mohammad Hajjat*
*Ph.D. Student*
*Electrical and Computer Engineering*
*Purdue University*

Re: Recommended data size for Reads/Writes in Cassandra

Posted by Tyler Hobbs <ty...@datastax.com>.
The default limit is 16 MB, but realistically you should try to keep writes
under 10 MB, breaking up large values into multiple columns/rows if
necessary.
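
A minimal sketch of breaking a large value into multiple columns, under some assumptions: the helper names split_into_columns and join_columns, the "data_" column-name prefix, and the 1 MB default chunk size are all hypothetical choices for illustration, not anything prescribed by Cassandra or pycassa.

```python
from collections import OrderedDict

MB = 1024 * 1024


def split_into_columns(data, chunk_size=1 * MB, prefix="data_"):
    """Split a bytes value into ordered, fixed-size chunks keyed by column name.

    Column names are zero-padded so that they sort in chunk order.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    columns = OrderedDict()
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        columns["%s%06d" % (prefix, index)] = data[offset:offset + chunk_size]
    return columns


def join_columns(columns):
    """Reassemble the original value from chunk columns, in name order."""
    return b"".join(columns[name] for name in sorted(columns))
```

Each chunk would presumably need to go out in its own insert call (or its own row); sending all the chunk columns in a single call would still produce one large message, which is what the limit applies to.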


On Thu, Jul 18, 2013 at 4:31 PM, Andrey Ilinykh <ai...@gmail.com> wrote:

> there is a limit of thrift message ( thrift_max_message_length_in_mb), by
> default it is 64m if I'm not mistaken. This is your limit.
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Recommended data size for Reads/Writes in Cassandra

Posted by Andrey Ilinykh <ai...@gmail.com>.
There is a limit on the Thrift message size (thrift_max_message_length_in_mb);
by default it is 64 MB, if I'm not mistaken. This is your limit.
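
As a quick sanity check, the value sizes from the original post can be compared against a message-size limit. The sketch below is only arithmetic: fits_in_message is a hypothetical helper, and 16 and 64 are the two default values mentioned in this thread, so the number that actually applies should be read from the cluster's own configuration.

```python
MB = 1024 * 1024


def fits_in_message(payload_bytes, limit_mb):
    """True if a payload is strictly below a limit given in MB.

    Ignores protocol framing overhead, so a payload close to the
    limit may still be rejected in practice.
    """
    return payload_bytes < limit_mb * MB


for limit_mb in (16, 64):
    for size_mb in (1, 10, 50):
        ok = fits_in_message(size_mb * MB, limit_mb)
        print("limit %2d MB, value %2d MB: fits=%s" % (limit_mb, size_mb, ok))
```

A 50 MB value is below a 64 MB limit but well above a 16 MB one, so which default is in effect matters for explaining the failed inserts.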

