Posted to user@cassandra.apache.org by Jonathan Guberman <jg...@tineye.com> on 2017/05/05 19:19:44 UTC

Cassandra as a key/object store for many small (10-60k) files

Hello,

We’re currently testing Cassandra for use as a pure key-object store for data blobs around 10kB - 60kB each. Our use case is storing on the order of 10 billion objects with about 5-20 million new writes per day. A written object will never be updated or deleted. Objects will be read at least once, some time within 10 days of being written. This will generally happen as a batch; that is, all of the images written on a particular day will be read together at the same time. This batch read will only happen one time; future reads will happen on individual objects, with no grouping, and they will follow a long-tail distribution, with popular objects read thousands of times per year but most read never or virtually never.

I’ve set up a small four node test cluster and have written test scripts to benchmark writing and reading our data. The table I’ve set up is very simple: an ascii primary key column with the object ID and a blob column for the data. All other settings were left at their defaults.
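
For reference, the table looks roughly like this (the names are placeholders, not the real ones, and the keyspace/replication settings are omitted):

    CREATE TABLE objects (
        object_id ascii PRIMARY KEY,  -- object ID used for lookups
        data      blob                -- the 10-60 kB payload
    );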
 
I’ve found write speeds to be very fast most of the time. However, periodically, writes will slow to a crawl for anywhere from half an hour to two hours, after which speeds recover to their previous levels. I assume this is some sort of compaction or flushing to disk, but I haven’t been able to figure out the exact cause.

Read speeds have been more disappointing. Cached reads are very fast, but random read speed averages about 2 MB/sec, which is too slow when we need to read out a batch of several million objects. I don’t think it’s reasonable to assume that these rows will all still be cached by the time we need to read them for that first large batch read.

My general question is whether anyone has any suggestions for how to improve performance for our use case. More specifically:

- Is there a way to mitigate or eliminate the huge slowdowns I see when writing millions of rows?
- Are there settings I should be using in order to maximize read speeds for random reads?
- Is there a way to design our tables to improve the read speeds for the initial large batched reads? I was thinking of using a batch ID column that could be used to retrieve the data for the initial batch read. However, future reads would need to be done by the object ID, not the batch ID, so it seems to me I’d need to duplicate the data: one copy in an “objects by batch” table and another in a simple “objects” table. Is there a better approach than this?
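
To make that concrete, here is a rough sketch of what I mean; the names are placeholders, and the second table is the same simple “objects” table described above:

    -- Copy of the data partitioned by batch, for the one-time bulk read
    CREATE TABLE objects_by_batch (
        batch_id  ascii,
        object_id ascii,
        data      blob,
        PRIMARY KEY (batch_id, object_id)
    );

    -- Initial batch read: one partition per batch
    SELECT object_id, data FROM objects_by_batch WHERE batch_id = ?;

    -- All later reads go against the plain objects table, keyed by object ID
    SELECT data FROM objects WHERE object_id = ?;

(I realize a whole day’s worth of objects in a single partition is probably far too large, so the batch key would likely need to be split into smaller buckets, but it’s the same idea.)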

Thank you!





Re: Cassandra as a key/object store for many small (10-60k) files

Posted by daemeon reiydelle <da...@gmail.com>.
I would guess you have network overload issues; I have seen pretty much
exactly what you describe many times, and (so far ;{) this has always been
the issue, especially with 1 Gbit networks, no jumbo frames, etc. Get your
network team to monitor error and retry packets across ALL of the
interfaces (all the nodes, top-of-rack switch, network switches, etc.). If
you see ANY retries, timeouts, or errors, you have found your problem.

Or it could be something like JVM garbage collection, CPU overload, etc.


.......
Making a billion dollar startup is easy: "take a human desire, preferably
one that has been around for a really long time … Identify that desire and
use modern technology to take out steps."
.......
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, May 5, 2017 at 12:26 PM, Jonathan Guberman <jg...@tineye.com> wrote:

> Yes, local storage volumes on each machine.
>
> On May 5, 2017, at 3:25 PM, daemeon reiydelle <da...@gmail.com> wrote:
>
> These numbers do not match e.g. AWS, so guessing you are using local
> storage?
>

Re: Cassandra as a key/object store for many small (10-60k) files

Posted by Jonathan Guberman <jg...@tineye.com>.
Yes, local storage volumes on each machine.

> On May 5, 2017, at 3:25 PM, daemeon reiydelle <da...@gmail.com> wrote:
> 
> These numbers do not match e.g. AWS, so guessing you are using local storage?


Re: Cassandra as a key/object store for many small (10-60k) files

Posted by daemeon reiydelle <da...@gmail.com>.
These numbers do not match e.g. AWS, so guessing you are using local
storage?


