Posted to user@cassandra.apache.org by "Durity, Sean R" <SE...@homedepot.com> on 2019/04/19 13:15:16 UTC

RE: [EXTERNAL] Re: Using Cassandra as an object store

Object stores are some of our largest and oldest use cases. Cassandra has been a good choice for us. We do chunk the objects into 64k chunks (I think), so that partitions are not too large and it scales predictably. For us, the choice was more about high availability and scalability, which Cassandra provides well.
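
As a rough illustration of the chunking mentioned above, here is a minimal Python sketch. The 64 KiB size follows the figure recalled (with some uncertainty) in this message; the function names are hypothetical, and no Cassandra driver is involved. It only shows the split and the reassembly an application or middleware layer would need.

```python
from typing import Iterator, List, Tuple

CHUNK_SIZE = 64 * 1024  # 64 KiB, the size recalled in the message (an assumption)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk_index, chunk_bytes) pairs that cover data in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        yield index, data[offset:offset + chunk_size]


def reassemble(chunks: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the object, sorting by index in case chunks arrive out of order."""
    ordered = sorted(chunks, key=lambda pair: pair[0])
    return b"".join(part for _, part in ordered)
```

In a real table, the chunk index would most likely serve as a clustering column under the object's partition key, but that schema is not specified in this thread.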

Sean Durity




From: Paul Chandler <pa...@redshots.com>
Sent: Friday, April 19, 2019 5:24 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Using Cassandra as an object store

Gene,

I have found that clusters used as object stores have caused me more problems than normal in the past, so I recommend using a separate object store if possible.

However, it certainly can be done; there are just a few things to consider:

1) Deletion policy: How are these objects going to be deleted? We have had problems in the past where deleted objects didn't get removed from disk. This was because, by the time they were deleted, they had been compacted into very large SSTables that were rarely compacted again. So think about compaction strategy and any tombstone issues you may come across.

2) Compression: Are the objects already compressed before they are stored, e.g. JPEGs? If so, turn compression off on the table; this reduces the amount of data read into memory when reading the data, reducing pressure on the heap. We did some trials with one system and found much better performance when the compression was performed on the client side, so try some tests with that.

3) Read frequency: How often is the data read? There will be completely different hardware requirements depending on whether this is an image store for an e-commerce site or a PDF store holding client invoices. With a small number of reads per object, you can specify machines with smaller CPUs and less memory but a large amount of storage. If there are a large number of reads, then you need to think much more carefully about memory and CPU, as per the Walmart article you referenced.
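
For point 2, a minimal Python sketch of client-side compression might look like the following. It uses the standard-library zlib module purely as an example codec (the thread does not say which one was trialled), and the function names and the stored flag are hypothetical. Payloads that do not shrink, such as JPEGs, are kept as-is.

```python
import zlib
from typing import Tuple


def encode_for_storage(payload: bytes, level: int = 6) -> Tuple[bool, bytes]:
    """Return (is_compressed, stored_bytes).

    The original bytes are kept whenever compression does not make them
    smaller, which is the usual outcome for already-compressed formats.
    """
    compressed = zlib.compress(payload, level)
    if len(compressed) < len(payload):
        return True, compressed
    return False, payload


def decode_from_storage(is_compressed: bool, stored: bytes) -> bytes:
    """Reverse encode_for_storage using the flag saved alongside the data."""
    return zlib.decompress(stored) if is_compressed else stored
```

The flag would need to be stored next to the data (for example in its own column) so that readers know whether to decompress.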

Thanks

Paul Chandler
www.redshots.com




On 19 Apr 2019, at 09:04, DuyHai Doan <do...@gmail.com> wrote:

Idea:

To verify data integrity, you can store an MD5 of all the chunks' data as a static column in the partition that contains the chunks.
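
A minimal Python sketch of that idea, assuming the chunks are hashed in order into a single digest: the hex string would be written to the static column, and a reader would recompute it after fetching the chunks. The function names are hypothetical.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed the chunks, in order, into one MD5 digest and return its hex form."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_chunks(chunks: Iterable[bytes], expected_md5: str) -> bool:
    """Return True when the recomputed digest matches the stored one."""
    return md5_of_chunks(chunks) == expected_md5
```

Because the digest is incremental, the whole object never has to sit in memory at once; a missing or reordered chunk changes the result.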

On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 <cc...@gmail.com> wrote:
We have used Cassandra as an object store for some years. You can just split the object into small pieces. The object gets a pk, and each small piece gets its own pk. The object's pk and the pieces' pks can be stored in a meta table in Cassandra, and each piece's pk and the piece itself are stored in a data table. We store videos, pictures and other unstructured data.
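
The two-table layout described above can be sketched in Python with plain dictionaries standing in for the Cassandra tables. The names meta_table, data_table, put_object and get_object are hypothetical, and using random UUIDs as piece pks is an assumption; the message does not say how the pks are generated.

```python
import uuid
from typing import Dict, List

# In-memory stand-ins for the two tables (hypothetical names).
meta_table: Dict[str, List[str]] = {}  # object pk -> ordered list of piece pks
data_table: Dict[str, bytes] = {}      # piece pk -> piece bytes


def put_object(object_pk: str, pieces: List[bytes]) -> None:
    """Store each piece under its own pk, then record the order in the meta table."""
    piece_pks = []
    for piece in pieces:
        piece_pk = uuid.uuid4().hex  # assumption: random piece identifiers
        data_table[piece_pk] = piece
        piece_pks.append(piece_pk)
    # Design choice in this sketch: the meta row is written after the pieces.
    meta_table[object_pk] = piece_pks


def get_object(object_pk: str) -> bytes:
    """Look up the piece pks for the object and concatenate the pieces in order."""
    return b"".join(data_table[pk] for pk in meta_table[object_pk])
```

Keeping the pieces in their own table spreads an object over many small partitions, at the cost of one extra lookup in the meta table per read.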

On Fri, Apr 19, 2019 at 1:25 PM Gene <gh...@gmail.com> wrote:
Howdy

I'm looking at the possibility of using cassandra as an object store to offload image/blob data from an Oracle database.  I've seen mentions of it being used as an object store in a large scale fashion, like with Walmart:

https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593

However I have found little on small scale setups and if it's even worth using Cassandra in place of something else that's meant to be used for object storage, like Ceph.

Additionally, I've read that cassandra struggles with storing objects 10MB or larger and it's recommended to break objects up into smaller chunks, which either requires some kind of middleware between our application and cassandra, or it would require our application to split objects into smaller chunks and recombine them as needed.

I've looked into pithos and astyanax, but those are both no longer developed and I'm not seeing anything that might replace them in the long term.

https://github.com/exoscale/pithos
https://github.com/Netflix/astyanax

Any helpful information or advice would be greatly appreciated.

Thanks in advance.

-Gene


--
you are the apple of my eye !

