Posted to user@cassandra.apache.org by Bingbing Liu <ru...@gmail.com> on 2010/04/26 02:50:26 UTC

how to store file in the cassandra?

any suggestion?

2010-04-26 



Bingbing Liu 

Re: how to store file in the cassandra?

Posted by Tatu Saloranta <ts...@gmail.com>.
On Tue, Apr 27, 2010 at 10:49 PM, Jeff Zhang <zj...@gmail.com> wrote:
> Mark,
>
> Thanks for your suggestion. It's really not a good idea to store one
> file in multiple columns in one row; the heap space problem will still
> exist. I took your advice and stored it in multiple rows, and it works:
> I can even store a single 2 GB file.

True. Unfortunately, splitting a file across multiple rows complicates
things a lot, because the chunks are written separately and updates to
the file are no longer atomic. For write-and-forget cases it works,
though: start reading at the assumed first chunk and continue until you
hit a chunk that does not exist.
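
A minimal sketch of that read loop, assuming a plain dict stands in for
the column family. The chunk_key naming scheme and the {"data": ...}
column layout are made up for illustration and are not a Cassandra API:

```python
def chunk_key(file_id, index):
    """Row key for one chunk (hypothetical naming scheme)."""
    return f"{file_id}:{index:08d}"


def write_file(store, file_id, data, chunk_size):
    """Write each chunk to its own row; the writes are independent, not atomic."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        store[chunk_key(file_id, index)] = {"data": data[offset:offset + chunk_size]}


def read_file(store, file_id):
    """Start at chunk 0 and stop at the first chunk that does not exist."""
    parts = []
    index = 0
    while True:
        row = store.get(chunk_key(file_id, index))
        if row is None:
            break
        parts.append(row["data"])
        index += 1
    return b"".join(parts)
```

Note that a reader running while a writer is only half done, or after one
chunk has gone missing, silently gets a truncated file. That is the
atomicity cost mentioned above.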

-+ Tatu +-

Re: how to store file in the cassandra?

Posted by Jeff Zhang <zj...@gmail.com>.
Mark,

Thanks for your suggestion. It's really not a good idea to store one
file in multiple columns in one row; the heap space problem will still
exist. I took your advice and stored it in multiple rows, and it works:
I can even store a single 2 GB file.



On Mon, Apr 26, 2010 at 6:12 PM, Mark Robson <ma...@gmail.com> wrote:
> On 26 April 2010 00:57, Shuge Lee <sh...@gmail.com> wrote:
>>
>> In Python:
>>
>> keyspace.columnfamily[key][column] = value
>>
>> files.video[uuid.uuid4()]['name'] = 'foo.flv'
>> files.video[uuid.uuid4()]['path'] = '/var/files/foo.flv'
>
> Hi.
> Storing the filename in the database will not solve the file storage
> problem. Cassandra is a distributed database, and a file stored locally will
> not be available on other client nodes.
> If you're using Cassandra at all, that probably implies that you have lots
> of client nodes. A non-redundant NFS server (for example) would not offer
> high availability, so would be inadequate for the OP's situation.
> Storing files *IN* Cassandra is very useful because you can then retrieve
> them from anywhere with high availability.
> However, as others have discussed, they should be split across multiple
> columns, or if very big, multiple rows.
> I prefer to split by row because this scales better to very large files.
> During compaction, as is well noted, Cassandra needs the entire row in
> memory, which will cause a FAIL  once you have files more than a few gigs.
> Mark



-- 
Best Regards

Jeff Zhang

Re: how to store file in the cassandra?

Posted by Mark Robson <ma...@gmail.com>.
On 26 April 2010 00:57, Shuge Lee <sh...@gmail.com> wrote:

> In Python:
>
> keyspace.columnfamily[key][column] = value
>
> files.video[uuid.uuid4()]['name'] = 'foo.flv'
> files.video[uuid.uuid4()]['path'] = '/var/files/foo.flv'
>

Hi.

Storing the filename in the database will not solve the file storage
problem. Cassandra is a distributed database, and a file stored locally will
not be available on other client nodes.

If you're using Cassandra at all, that probably implies that you have lots
of client nodes. A non-redundant NFS server (for example) would not offer
high availability, so would be inadequate for the OP's situation.

Storing files *IN* Cassandra is very useful because you can then retrieve
them from anywhere with high availability.

However, as others have discussed, they should be split across multiple
columns, or if very big, multiple rows.

I prefer to split by row because this scales better to very large files.
During compaction, as is well noted, Cassandra needs the entire row in
memory, which will cause a failure once files grow beyond a few gigabytes.
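
A rough sketch of splitting by row, assuming a dict stands in for the
column family and that 1 MB per row is an acceptable chunk size (both are
assumptions for illustration; the row-key format is hypothetical too).
The file is read as a stream so only one chunk is held in memory at a
time:

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB per row; an arbitrary choice for this sketch


def iter_chunk_rows(stream, file_id, chunk_size=CHUNK_SIZE):
    """Yield (row_key, columns) pairs, holding one chunk in memory at a time."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield f"{file_id}:{index:08d}", {"data": chunk}
        index += 1


def store_file(rows, stream, file_id, chunk_size=CHUNK_SIZE):
    """Insert every chunk as its own row and return the number of rows written."""
    count = 0
    for row_key, columns in iter_chunk_rows(stream, file_id, chunk_size):
        rows[row_key] = columns  # a real client would issue one insert per row here
        count += 1
    return count


if __name__ == "__main__":
    rows = {}
    written = store_file(rows, io.BytesIO(b"x" * 10), "video-1", chunk_size=4)
    print(written, sorted(rows))
```

Because no single row ever holds more than one chunk, the size of the
whole file no longer determines how much memory a row needs.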

Mark

Re: how to store file in the cassandra?

Posted by Robert Coli <rc...@digg.com>.
On 4/26/10 2:44 AM, dir dir wrote:
> Suppose I have a 15 MB MPEG video file. To save this video file into the
> Cassandra database I will store it as an array of bytes. One day, I decide
> this video is no longer needed, so I delete it from the database. My
> question is: after I delete this video from the Cassandra database, should
> I perform a defragmentation operation on Cassandra's database files?
Yes. After you delete data from a Cassandra cluster, a compaction has to
run on the SSTables before the data is actually removed from disk.

http://wiki.apache.org/cassandra/DistributedDeletes
"
Thus, a delete operation can't just wipe out all traces of the data 
being removed immediately: if we did, and a replica did not receive the 
delete operation, when it becomes available again it will treat the 
replicas that did receive the delete as having missed a write update, 
and repair them! So, instead of wiping out data on delete, Cassandra 
replaces it with a special value called a tombstone. The tombstone can 
then be propagated to replicas that missed the initial remove request.
...
Here, we defined a constant, GCGraceSeconds, and had each node track 
tombstone age locally. Once it has aged past the constant, it can be 
GC'd during compaction (see MemtableSStable).
"

=Rob
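
A toy model of the tombstone behaviour described in the wiki excerpt
above, as a sketch only. It uses a dict of (value, timestamp) pairs rather
than real SSTables, and the delete/compact helpers and the 10-day grace
value are assumptions made for illustration:

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # assumed grace period for this sketch

TOMBSTONE = object()  # marker stored in place of deleted data


def delete(table, key, now):
    """A delete writes a timestamped tombstone instead of erasing the entry."""
    table[key] = (TOMBSTONE, now)


def compact(table, now, gc_grace=GC_GRACE_SECONDS):
    """Return a new table, dropping only tombstones older than gc_grace."""
    compacted = {}
    for key, (value, written_at) in table.items():
        if value is TOMBSTONE and now - written_at > gc_grace:
            continue  # aged past the grace period: safe to garbage-collect
        compacted[key] = (value, written_at)
    return compacted
```

A compaction that runs before the grace period has passed keeps the
tombstone, so the deleted entry still occupies space until a later
compaction.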

Re: how to store file in the cassandra?

Posted by dir dir <si...@gmail.com>.
Hi Jonathan,

Cassandra does not seem to have a Blob data type. To handle binary large
object data, we have to use an array of bytes. I have a question for you.
Suppose I have a 15 MB MPEG video file. To save this video file into the
Cassandra database I will store it as an array of bytes. One day, I decide
this video is no longer needed, so I delete it from the database. My
question is: after I delete this video from the Cassandra database, should
I perform a defragmentation operation on Cassandra's database files?

Thank you.


On Mon, Apr 26, 2010 at 8:28 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> Cassandra stores byte arrays.  You can certainly store file data in
> it, although if it is larger than a few MB you should chunk it into
> multiple columns.

Re: Re: how to store file in the cassandra?

Posted by Bingbing Liu <ru...@gmail.com>.
Thanks.


2010-04-26 



Bingbing Liu 



From: Jonathan Ellis
Sent: 2010-04-26  09:29:28
To: user
Cc:
Subject: Re: how to store file in the cassandra?
 
Cassandra stores byte arrays.  You can certainly store file data in
it, although if it is larger than a few MB you should chunk it into
multiple columns.

Re: how to store file in the cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
Cassandra stores byte arrays.  You can certainly store file data in
it, although if it is larger than a few MB you should chunk it into
multiple columns.
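
One way to chunk a byte array into multiple columns of a single row,
sketched with a plain dict in place of the row. The "chunk-NNNNNNNN"
column names are an assumption chosen so that sorting the names restores
the byte order; they are not a Cassandra convention:

```python
def to_columns(data, chunk_size):
    """Split data into columns named so that sorted order equals byte order."""
    return {
        f"chunk-{index:08d}": data[offset:offset + chunk_size]
        for index, offset in enumerate(range(0, len(data), chunk_size))
    }


def from_columns(columns):
    """Reassemble the file by concatenating columns in name order."""
    return b"".join(columns[name] for name in sorted(columns))
```

The zero padding matters: without it, "chunk-10" would sort before
"chunk-2" and the file would be reassembled out of order.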

On Sun, Apr 25, 2010 at 8:21 PM, Shuge Lee <sh...@gmail.com> wrote:
> Yes.
>
> Cassandra does save raw string data only, not a file, and shouldn't save a
> file.

Re: how to store file in the cassandra?

Posted by Shuge Lee <sh...@gmail.com>.
Yes.

Cassandra only stores raw string data, not files, and it shouldn't be
used to store files.

2010/4/26 刘兵兵 <ru...@gmail.com>

> sorry i'm not very familiar with python, are you meaning that the files are
> stored in the file system of the os?
>
> then , the cassandra just stores the path to access the files?



-- 
Shuge Lee | Lee Li | 李蠡

Re: how to store file in the cassandra?

Posted by 刘兵兵 <ru...@gmail.com>.
Sorry, I'm not very familiar with Python. Do you mean that the files are
stored in the OS file system?

Then Cassandra just stores the path used to access the files?


On Mon, Apr 26, 2010 at 8:57 AM, Shuge Lee <sh...@gmail.com> wrote:

> In Python:
>
> keyspace.columnfamily[key][column] = value
>
> files.video[uuid.uuid4()]['name'] = 'foo.flv'
> files.video[uuid.uuid4()]['path'] = '/var/files/foo.flv'
>
> create a mapping
> files.video =  {
>     uuid.uuid4() : {
>         'name' : 'foo.flv',
>         'path' : '/var/files/foo.flv',
>     }
> }
>
> if most of sizes >= 0.5MB, use sys-fs/reiser4progs, else use ext4.



-- 
Bingbing Liu

Web and Mobile Data Management lab

Renmin University  of  China

Re: how to store file in the cassandra?

Posted by Shuge Lee <sh...@gmail.com>.
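In Python:

import uuid

# General shape: keyspace.columnfamily[key][column] = value
# ("files" stands for a client object for the keyspace, not shown here.)
# Generate the row key once so that both columns land in the same row.
key = uuid.uuid4()
files.video[key]['name'] = 'foo.flv'
files.video[key]['path'] = '/var/files/foo.flv'

This creates a mapping:

files.video = {
    key: {
        'name': 'foo.flv',
        'path': '/var/files/foo.flv',
    }
}

If most of the files are >= 0.5 MB, use sys-fs/reiser4progs; otherwise use ext4.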


2010/4/26 Bingbing Liu <ru...@gmail.com>

>  any suggestion?
>
> 2010-04-26
> ------------------------------
> Bingbing Liu
>



-- 
Shuge Lee | Lee Li | 李蠡