Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2010/07/15 10:09:49 UTC

[Cassandra Wiki] Trivial Update of "FAQ" by MichaelSchade

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by MichaelSchade.
The comment on this change is: Grammar..
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=76&rev2=77

--------------------------------------------------

  <<Anchor(large_file_and_blob_storage)>>
  
  == Can I Store BLOBs in Cassandra? ==
- Currently Cassandra isn't optimized specifically for large file or BLOB storage.   However, files of  around  64Mb and smaller can be easily stored in the database without splitting them into smaller chunks.  This is primarily due to the fact that Cassandra's public API is based on Thrift, which offers no streaming abilities;  any value written or fetched has to fit in to memory.  Other non Thrift  interfaces may solve this problem in the future, but there are currently no plans to change Thrifts behavior.  When planning  applications that require storing BLOBS, you should also consider these attributes of Cassandra as well:
+ Currently Cassandra isn't optimized specifically for large file or BLOB storage. However, files of around 64 MB or smaller can easily be stored in the database without splitting them into smaller chunks. This is primarily because Cassandra's public API is based on Thrift, which offers no streaming abilities; any value written or fetched has to fit into memory. Other non-Thrift interfaces may solve this problem in the future, but there are currently no plans to change Thrift's behavior. When planning applications that require storing BLOBs, you should also consider these attributes of Cassandra:
  
   * The main limitation on a column and super column size is that all the data for a single key and column must fit (on disk) on a single machine (node) in the cluster. Because keys alone are used to determine the nodes responsible for replicating their data, the amount of data associated with a single key has this upper bound. This is an inherent limitation of the distribution model.
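A minimal sketch of the chunking approach the FAQ alludes to: split a blob into pieces small enough for a single Thrift round-trip and write each piece as its own column under one row key. The 64 MB figure follows the text above; the `client.insert(key, column, value)` call is a hypothetical stand-in for a real Thrift-based client, not an actual Cassandra API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # keep each value within what a Thrift call can buffer

def split_blob(data, chunk_size=CHUNK_SIZE):
    """Yield (column_name, chunk) pairs for one row key.

    Zero-padded column names keep chunks ordered when read back.
    """
    for offset in range(0, len(data), chunk_size):
        name = "chunk_%06d" % (offset // chunk_size)
        yield name, data[offset:offset + chunk_size]

def store_blob(client, key, data):
    # All chunks live under one row key; note the caveat above that
    # everything under a single key must fit on one node's disk.
    for name, chunk in split_blob(data):
        client.insert(key, name, chunk)  # hypothetical insert(key, column, value)
```

Reassembling the blob is the reverse: fetch the columns for the key in name order and concatenate their values.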