Posted to user@cassandra.apache.org by Bingbing Liu <ru...@gmail.com> on 2010/04/26 02:50:26 UTC
how to store file in the cassandra?
Any suggestions?
2010-04-26
Bingbing Liu
Re: how to store file in the cassandra?
Posted by Tatu Saloranta <ts...@gmail.com>.
On Tue, Apr 27, 2010 at 10:49 PM, Jeff Zhang <zj...@gmail.com> wrote:
> Mark,
>
> Thanks for your suggestion. It's really not a good idea to store one
> file in multiple columns in one row: the heap space problem will still
> exist. I took your advice to store it in multiple rows, and it works;
> I can even store a single 2 GB file.
True. Unfortunately, splitting into multiple rows complicates things a
lot: the chunks are written separately, so an update to the file is no
longer atomic. But for write-and-forget cases it works (start reading at the
assumed first chunk and continue until you reach a chunk that does not exist).
-+ Tatu +-
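The read loop described above (start at the first chunk, stop at the first missing one) could look roughly like the following sketch. The `store` dict stands in for a Cassandra column family, and the "<file_id>:<chunk_no>" row-key scheme and the `chunk_key` and `read_file` helpers are assumptions made for illustration, not an API from this thread.

```python
def chunk_key(file_id, index):
    """Hypothetical row-key scheme: one row per chunk, numbered from 0."""
    return "%s:%08d" % (file_id, index)


def read_file(store, file_id):
    """Reassemble a file by reading chunk rows until one is missing.

    `store` is any mapping from row key to bytes; a plain dict stands in
    for the Cassandra client here.
    """
    parts = []
    index = 0
    while True:
        chunk = store.get(chunk_key(file_id, index))
        if chunk is None:  # first non-existent chunk marks the end of the file
            break
        parts.append(chunk)
        index += 1
    return b"".join(parts)


# Three chunk rows written earlier.
store = {
    chunk_key("foo.flv", 0): b"abc",
    chunk_key("foo.flv", 1): b"def",
    chunk_key("foo.flv", 2): b"g",
}
data = read_file(store, "foo.flv")
```

Because nothing ties the chunk rows together atomically, a reader that races with a writer may see a truncated file, which is why this approach suits files that are written once and never updated.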
Re: how to store file in the cassandra?
Posted by Jeff Zhang <zj...@gmail.com>.
Mark,
Thanks for your suggestion. It's really not a good idea to store one
file in multiple columns in one row: the heap space problem will still
exist. I took your advice to store it in multiple rows, and it works;
I can even store a single 2 GB file.
On Mon, Apr 26, 2010 at 6:12 PM, Mark Robson <ma...@gmail.com> wrote:
> However, as others have discussed, they should be split across multiple
> columns, or if very big, multiple rows.
> I prefer to split by row because this scales better to very large files.
> During compaction, as is well noted, Cassandra needs the entire row in
> memory, which will fail once you have files larger than a few gigabytes.
> Mark
--
Best Regards
Jeff Zhang
Re: how to store file in the cassandra?
Posted by Mark Robson <ma...@gmail.com>.
On 26 April 2010 00:57, Shuge Lee <sh...@gmail.com> wrote:
> In Python:
>
> keyspace.columnfamily[key][column] = value
>
> key = uuid.uuid4()
> files.video[key]['name'] = 'foo.flv'
> files.video[key]['path'] = '/var/files/foo.flv'
>
Hi.
Storing the filename in the database will not solve the file storage
problem. Cassandra is a distributed database, and a file stored locally will
not be available on other client nodes.
If you're using Cassandra at all, that probably implies that you have lots
of client nodes. A non-redundant NFS server (for example) would not offer
high availability, so would be inadequate for the OP's situation.
Storing files *IN* Cassandra is very useful because you can then retrieve
them from anywhere with high availability.
However, as others have discussed, they should be split across multiple
columns, or if very big, multiple rows.
I prefer to split by row because this scales better to very large files.
During compaction, as is well noted, Cassandra needs the entire row in
memory, which will fail once you have files larger than a few gigabytes.
Mark
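Splitting a file by row, as preferred above, might be sketched as follows. The `store` dict is a stand-in for a Cassandra column family, and the `store_file_by_rows` helper, its "<file_id>:<chunk_no>" row keys, and the chunk size are all assumptions chosen for illustration.

```python
import io


def store_file_by_rows(store, file_id, stream, chunk_size=1024 * 1024):
    """Write `stream` as one row per chunk and return the number of chunks.

    `store` is a dict standing in for a column family; each key is a
    hypothetical "<file_id>:<chunk_no>" row key holding one chunk of bytes.
    """
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of file
            break
        store["%s:%08d" % (file_id, index)] = chunk
        index += 1
    return index


store = {}
payload = b"x" * 2500  # stand-in for real file contents
count = store_file_by_rows(store, "foo.flv", io.BytesIO(payload), chunk_size=1000)
```

Only one chunk is held in memory at a time, and no single row grows with the size of the file, which is the property that makes the per-row layout scale to very large files.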
Re: how to store file in the cassandra?
Posted by Robert Coli <rc...@digg.com>.
On 4/26/10 2:44 AM, dir dir wrote:
> Suppose I have a 15 MB MPEG video file. To save it in Cassandra, I will
> store the file as a byte array. One day I decide the video is no longer
> needed, so I delete it from the database. My question is: after I delete
> the video from Cassandra, should I perform a defragmentation operation
> on Cassandra's data files?
Yes. After you delete data from a Cassandra cluster, the SSTables must be
compacted before the data is actually removed from disk.
http://wiki.apache.org/cassandra/DistributedDeletes
"
Thus, a delete operation can't just wipe out all traces of the data
being removed immediately: if we did, and a replica did not receive the
delete operation, when it becomes available again it will treat the
replicas that did receive the delete as having missed a write update,
and repair them! So, instead of wiping out data on delete, Cassandra
replaces it with a special value called a tombstone. The tombstone can
then be propagated to replicas that missed the initial remove request.
...
Here, we defined a constant, GCGraceSeconds, and had each node track
tombstone age locally. Once it has aged past the constant, it can be
GC'd during compaction (see MemtableSStable).
"
=Rob
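The tombstone behaviour quoted above can be illustrated with a toy, single-node model. This is only a sketch: the dict-based `table`, the `delete` and `compact` helpers, and the grace period value are assumptions for illustration and do not reflect Cassandra's actual storage code.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # assumed value for this sketch

TOMBSTONE = object()  # marker stored in place of deleted data


def delete(table, key, now):
    """Deleting writes a tombstone with a timestamp instead of erasing data."""
    table[key] = (TOMBSTONE, now)


def compact(table, now, gc_grace=GC_GRACE_SECONDS):
    """Return a new table without tombstones older than gc_grace seconds."""
    compacted = {}
    for key, (value, written_at) in table.items():
        expired = value is TOMBSTONE and now - written_at > gc_grace
        if not expired:
            compacted[key] = (value, written_at)
    return compacted


table = {"video1": (b"...video bytes...", 0)}
delete(table, "video1", now=100)

early = compact(table, now=200)                         # tombstone still kept
late = compact(table, now=100 + GC_GRACE_SECONDS + 1)   # tombstone dropped
```

In this model a compaction that runs before the grace period has elapsed keeps the tombstone, so that it can still reach replicas that missed the delete; only a later compaction drops it.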
Re: how to store file in the cassandra?
Posted by dir dir <si...@gmail.com>.
Hi Jonathan,
Cassandra does not seem to have a Blob data type. To handle binary large
object data, we have to use a byte array. I have a question for you.
Suppose I have a 15 MB MPEG video file. To save it in Cassandra, I will
store the file as a byte array. One day I decide the video is no longer
needed, so I delete it from the database. My question is: after I delete
the video from Cassandra, should I perform a defragmentation operation
on Cassandra's data files?
Thank you.
On Mon, Apr 26, 2010 at 8:28 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> Cassandra stores byte arrays. You can certainly store file data in
> it, although if it is larger than a few MB you should chunk it into
> multiple columns.
Re: Re: how to store file in the cassandra?
Posted by Bingbing Liu <ru...@gmail.com>.
Thanks.
2010-04-26
Bingbing Liu
From: Jonathan Ellis
Sent: 2010-04-26 09:29:28
To: user
Cc:
Subject: Re: how to store file in the cassandra?
Cassandra stores byte arrays. You can certainly store file data in
it, although if it is larger than a few MB you should chunk it into
multiple columns.
Re: how to store file in the cassandra?
Posted by Jonathan Ellis <jb...@gmail.com>.
Cassandra stores byte arrays. You can certainly store file data in
it, although if it is larger than a few MB you should chunk it into
multiple columns.
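As a rough sketch of chunking into multiple columns, the following splits a byte array into a dict of column name to chunk, which is the shape a single row's columns would take. The `file_to_columns` and `columns_to_file` helpers, the zero-padded column names, and the chunk size are assumptions for illustration; the actual insert call depends on the client library in use.

```python
def file_to_columns(data, chunk_size=1024 * 1024):
    """Split `data` into a {column_name: chunk} dict for a single row.

    Column names are zero-padded chunk numbers so that they sort in order.
    """
    columns = {}
    for offset in range(0, len(data), chunk_size):
        name = "%08d" % (offset // chunk_size)
        columns[name] = data[offset:offset + chunk_size]
    return columns


def columns_to_file(columns):
    """Rebuild the original bytes by joining chunks in column-name order."""
    return b"".join(columns[name] for name in sorted(columns))


data = bytes(range(256)) * 10  # 2560 bytes of sample data
row = file_to_columns(data, chunk_size=1000)
restored = columns_to_file(row)
```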
On Sun, Apr 25, 2010 at 8:21 PM, Shuge Lee <sh...@gmail.com> wrote:
> Yes.
>
> Cassandra stores only raw string data, not files, and shouldn't be used
> to store files.
Re: how to store file in the cassandra?
Posted by Shuge Lee <sh...@gmail.com>.
Yes.
Cassandra stores only raw string data, not files, and shouldn't be used
to store files.
2010/4/26 刘兵兵 <ru...@gmail.com>
> Sorry, I'm not very familiar with Python. Do you mean that the files are
> stored in the OS file system, and Cassandra just stores the paths used to
> access them?
--
Shuge Lee | Lee Li | 李蠡
Re: how to store file in the cassandra?
Posted by 刘兵兵 <ru...@gmail.com>.
Sorry, I'm not very familiar with Python. Do you mean that the files are
stored in the OS file system, and Cassandra just stores the paths used to
access them?
On Mon, Apr 26, 2010 at 8:57 AM, Shuge Lee <sh...@gmail.com> wrote:
> In Python:
>
> keyspace.columnfamily[key][column] = value
>
> key = uuid.uuid4()
> files.video[key]['name'] = 'foo.flv'
> files.video[key]['path'] = '/var/files/foo.flv'
--
Bingbing Liu
Web and Mobile Data Management lab
Renmin University of China
Re: how to store file in the cassandra?
Posted by Shuge Lee <sh...@gmail.com>.
In Python:
keyspace.columnfamily[key][column] = value
import uuid
key = uuid.uuid4()  # generate the row key once so both columns share a row
files.video[key]['name'] = 'foo.flv'
files.video[key]['path'] = '/var/files/foo.flv'
This creates the mapping:
files.video = {
    key: {
        'name': 'foo.flv',
        'path': '/var/files/foo.flv',
    }
}
If most files are >= 0.5 MB, store them on Reiser4 (sys-fs/reiser4progs); otherwise use ext4.
2010/4/26 Bingbing Liu <ru...@gmail.com>
> Any suggestions?
>
> 2010-04-26
> ------------------------------
> Bingbing Liu
>
--
Shuge Lee | Lee Li | 李蠡