Posted to user@cassandra.apache.org by Brendan Poole <br...@new-law.co.uk> on 2011/02/03 13:33:58 UTC

Using Cassandra to store files

Hi
 
Would anyone recommend using Cassandra for storing hundreds of thousands
of documents in Word/PDF format? The manual says it can store documents
under 64MB with no issue but was wondering if anyone is using it for
this specific purpose. Would it be efficient/reliable, and is there
anything I need to bear in mind?
 
Thanks in advance

     Brendan Poole
     Systems Developer
     NewLaw Solicitors
     Helmont House 
     Churchill Way
     Cardiff
     brendan.poole@new-law.co.uk
     029 2078 4283
     www.new-law.co.uk

Please consider the environment before printing this e-mail

Important - The information contained in this email (and any attached files) is confidential and may be legally privileged and protected by law.  

The intended recipient is authorised to access it.  If you are not the intended recipient, please notify the sender immediately and delete or destroy all copies. You must not disclose the 
contents of this email to anyone. Unauthorised use, dissemination, distribution, publication or copying of this communication is prohibited. 

NewLaw Solicitors does not accept any liability for any inaccuracies or omissions in the contents of this email that may have arisen as a result of transmission.  This message and any 
attachments are believed to be free of any virus or defect that might affect any computer system into which it is received and opened.  However, it is the responsibility of the recipient to 
ensure that it is virus free; therefore, no responsibility is accepted for any loss or damage in any way arising from its use. 

NewLaw Solicitors is the trading name of NewLaw Legal Ltd, a limited company registered in England and Wales with registered number 07200038.  
NewLaw Legal Ltd is regulated by the Solicitors Regulation Authority whose website is http://www.sra.org.uk 

The registered office of NewLaw Legal Ltd is at Helmont House, Churchill Way, Cardiff, CF10 2HE. Tel: 0845 756 6870, Fax: 0845 756 6871, Email: info@new-law.co.uk. www.new-law.co.uk.  

We use the word ‘partner’ to refer to a shareowner or director of the company, or an employee or consultant of the company who is a lawyer with equivalent standing and qualifications. A list 
of the directors is displayed at the above address, together with a list of those persons who are designated as partners. 

Re: Using Cassandra to store files

Posted by Daniel Doubleday <da...@gmx.net>.
Hundreds of thousands of documents doesn't sound too bad. Good old NFS would do with a reasonable directory structure.
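A minimal sketch of such a directory structure, assuming each document has a string ID that is also safe to use as a filename (the `doc_path` helper and the `/srv/docs` root are hypothetical). Hashing the ID and using the leading hex digits as nested directory names spreads files evenly: two levels of 256 directories give 65,536 leaf directories, so 500,000 documents average fewer than 8 files per directory.

```python
import hashlib
from pathlib import PurePosixPath


def doc_path(root: str, doc_id: str, levels: int = 2) -> PurePosixPath:
    """Map a document ID to a nested path such as root/ab/cd/<doc_id>.

    Assumption: doc_id contains no path separators. Hashing the ID keeps
    any single directory from holding hundreds of thousands of entries.
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    # Each level uses the next two hex characters (256 possible names).
    parts = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *parts, doc_id)
```

The mapping is deterministic, so the path never needs to be stored: any client can recompute it from the document ID.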

We are doing this. Our documents are pretty small, though (a few KB). We have around 40M documents right now, around 300GB in total.
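Those figures work out to roughly 7.5 KB per document (300 GB / 40M), whereas Word/PDF files are often much larger. One common approach for larger files is to split each document into fixed-size chunks, for example one row per document and one column per chunk. The sketch below assumes that layout and a hypothetical 1 MiB chunk size; it only shows the splitting and reassembly, not any Cassandra client calls.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical chunk size, kept far below the 64MB figure from the manual.
CHUNK_SIZE = 1024 * 1024


def split_into_chunks(
    doc_id: str, data: bytes, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (row_key, column_index, chunk) triples for one document."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, start in enumerate(range(0, len(data), chunk_size)):
        yield doc_id, index, data[start : start + chunk_size]


def reassemble(chunks: Dict[int, bytes]) -> bytes:
    """Rebuild a document from its chunks, ordered by column index."""
    return b"".join(chunks[i] for i in sorted(chunks))
```

Small chunks keep individual columns cheap to read and write, at the cost of more columns per document.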

Generally the problem is that a lot of data means Cassandra becomes I/O-bound during repairs and compactions, even if your hot dataset would fit in the page cache. There are efforts to overcome this, and 0.7 will help with the repair problems, but for the time being you need quite a bit of headroom in I/O performance to handle these situations.

Here is a related post:

http://comments.gmane.org/gmane.comp.db.cassandra.user/11190



Re: Using Cassandra to store files

Posted by Victor Kabdebon <vi...@gmail.com>.
Dear Brendan,

I would really be interested in your findings too. I need a system to store
various documents, and I am considering Cassandra (which I am already using),
a second type of database, or some other system. Maybe, as Dan suggested,
MogileFS.

Thank you,
Victor Kabdebon
http://www.voxnucleus.fr


Re: Using Cassandra to store files

Posted by Dan Kuebrich <da...@gmail.com>.
> CouchDB
That's not what document-oriented means! (har har)

I don't know all the details of your case, but for serving static files I
suspect you could do OK with something that has a much smaller memory/CPU
footprint, since you won't have the same write-throughput and read-latency
concerns. I've used MogileFS <http://www.danga.com/mogilefs/> for this
before.


Re: Using Cassandra to store files

Posted by buddhasystem <po...@bnl.gov>.
CouchDB


Re: Using Cassandra to store files

Posted by sridhar basam <sr...@basam.org>.
For the number of files the OP has, why not just use a traditional filesystem
and Solr to index the PDF data? You also get to search inside the files for
relevant information.

 Sri
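A minimal sketch of the Solr side, assuming Solr Cell (the ExtractingRequestHandler, which uses Apache Tika to pull text out of PDF and Word files) is enabled at `/update/extract`. The Solr URL and the use of `literal.id` as the document key are assumptions about the schema; the helper only builds the HTTP request and does not send it.

```python
import urllib.parse
import urllib.request


def build_extract_request(
    solr_url: str, doc_id: str, pdf_bytes: bytes
) -> urllib.request.Request:
    """Build (but do not send) a POST of a PDF to Solr's extract handler."""
    # literal.id sets the indexed document's id field; commit makes it searchable.
    query = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        f"{solr_url.rstrip('/')}/update/extract?{query}",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
```

With a running Solr instance, `urllib.request.urlopen(req)` would send it; the file itself stays on the filesystem and Solr only holds the extracted text.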


Re: Using Cassandra to store files

Posted by Aditya Narayan <ad...@gmail.com>.
Yes, definitely a database for the mapping, of course!


Re: Using Cassandra to store files

Posted by buddhasystem <po...@bnl.gov>.
Even when storage is in NFS, Cassandra can still be quite useful as a file
catalog. Your physical storage can change, move, etc. Therefore, it's a good
idea to provide mapping of logical names to physical store points (which in
fact can be many). This is a standard technique used in mass storage.
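A minimal sketch of such a catalog. In Cassandra this might be one row per logical name with one column per physical replica; here a plain dictionary stands in for that table, and the `FileCatalog` class, store IDs, and paths are hypothetical. The point is that moving storage only rewrites catalog entries, while clients keep using the same logical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# A physical location: (storage system ID, path on that system).
Location = Tuple[str, str]


@dataclass
class FileCatalog:
    """Map logical file names to one or more physical locations (replicas)."""

    entries: Dict[str, List[Location]] = field(default_factory=dict)

    def register(self, logical_name: str, location: Location) -> None:
        """Add a replica location, ignoring duplicates."""
        locations = self.entries.setdefault(logical_name, [])
        if location not in locations:
            locations.append(location)

    def move(self, logical_name: str, old: Location, new: Location) -> None:
        """Repoint one replica after its data has been moved."""
        locations = self.entries[logical_name]
        locations[locations.index(old)] = new

    def resolve(self, logical_name: str, available: Set[str]) -> Optional[Location]:
        """Return the first replica whose storage system is available."""
        for store, path in self.entries.get(logical_name, []):
            if store in available:
                return store, path
        return None
```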
