Posted to user@cassandra.apache.org by Zhuguo Shi <bl...@gmail.com> on 2010/04/15 03:42:58 UTC

Is that possible to write a file system over Cassandra?

Hi,

Cassandra has a good distributed model: decentralized, auto-partitioning,
auto-recovery. I am evaluating writing a file system on top of Cassandra
(like CassFS: http://github.com/jdarcy/CassFS ), but I don't know whether
Cassandra is a good fit for such a use case.

Regards

Re: Is that possible to write a file system over Cassandra?

Posted by Miguel Verde <mi...@gmail.com>.

On Wed, Apr 14, 2010 at 9:15 PM, Ken Sandney <bl...@gmail.com> wrote:

> Large files can be split into small blocks, and the size of block can be
> tuned. It may increase the complexity of writing such a file system, but can
> be for general purpose (not only for relative small files)


 Right, this is the path that MongoDB has taken with GridFS:
http://www.mongodb.org/display/DOCS/GridFS+Specification

I don't have any use for such a filesystem, but if I were to design one I
would probably mostly follow Tatu's suggestions:


>  On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta <ts...@gmail.com>wrote:
>>
>> So I think it can definitely be a good use case, and I may use
>> Cassandra for this myself in future. Having range queries allows
>> implementing directory/path structures (list keys using path as
>> prefix). And you can split storage such that metadata could live in
>> OPP partition, raw data in RP.
>
>
but using OPP for all data, using prefixed metadata, and UUID_chunk# for
keys in the chunk CF.
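
A minimal sketch of the key layout described above, assuming an
order-preserving partitioner so that row keys sort lexicographically. The
"meta:" prefix, the zero-padded chunk number, and the function names are
illustrative assumptions, not taken from any existing project.

```python
import uuid

META_PREFIX = "meta:"  # hypothetical prefix marking metadata rows


def metadata_key(path: str) -> str:
    """Row key for a file's metadata: a fixed prefix plus the full path."""
    return META_PREFIX + path


def chunk_key(file_id: uuid.UUID, index: int) -> str:
    """Row key for one chunk in the chunk CF: <UUID>_<chunk number>.

    The chunk number is zero-padded (an assumption) so that lexicographic
    key order matches numeric chunk order.
    """
    return f"{file_id}_{index:08d}"


file_id = uuid.uuid4()
keys = [chunk_key(file_id, i) for i in (2, 0, 10, 1)]
print(metadata_key("/docs/report.txt"))
print(sorted(keys))  # chunk keys sort in the order 0, 1, 2, 10
```

Without the padding, chunk 10 would sort before chunk 2, so a range scan
over one file's chunks would return them out of order.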

Re: Is that possible to write a file system over Cassandra?

Posted by HubertChang <hu...@gmail.com>.

Note: there are also GlusterFS, Ceph, Btrfs and Lustre. There is also DRBD.

Re: Is that possible to write a file system over Cassandra?

Posted by Jeff Zhang <zj...@gmail.com>.

Thanks, Nathan.



On Fri, Apr 16, 2010 at 12:04 PM, Nathan McCall <na...@vervewireless.com>wrote:

> In regards to hector, please check all the available branches on
> github. We have supported 0.6 for a little while now.
>
> http://github.com/rantav/hector/tree/0.6.0
>
> The master is still based on 0.5, but that is changing in the next
> couple of days to match the 0.6 release.
>
> -Nate



-- 
Best Regards

Jeff Zhang

Re: Is that possible to write a file system over Cassandra?

Posted by Nathan McCall <na...@vervewireless.com>.

With regard to Hector, please check all the available branches on
GitHub. We have supported 0.6 for a little while now.

http://github.com/rantav/hector/tree/0.6.0

The master branch is still based on 0.5, but that will change in the
next couple of days to match the 0.6 release.

-Nate




On Thu, Apr 15, 2010 at 6:35 PM, Jeff Zhang <zj...@gmail.com> wrote:
> Jonathan,
>
> Previously we use the cassandra-0.6, but we'd like to leverage the hector
> java client since it has more advanced features. And hector currently only
> support cassandra-0.5.
> Why you think using casandra-0.5 is a stange way to do it ? Is cassandra-0.6
> incompatibility with cassandra-0.5 ? The migration to cassandra-0.6 will
> cost much ?

Re: Is that possible to write a file system over Cassandra?

Posted by Jeff Zhang <zj...@gmail.com>.

Yes, we were in a rush at the beginning of this prototype.
The code structure looks better now.

On Fri, Apr 16, 2010 at 5:46 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> The strange part is copying the entire cassandra source tree.



-- 
Best Regards

Jeff Zhang

Re: Is that possible to write a file system over Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.

The strange part is copying the entire cassandra source tree.

On Thu, Apr 15, 2010 at 8:35 PM, Jeff Zhang <zj...@gmail.com> wrote:
> Jonathan,
>
> Previously we use the cassandra-0.6, but we'd like to leverage the hector
> java client since it has more advanced features. And hector currently only
> support cassandra-0.5.
> Why you think using casandra-0.5 is a stange way to do it ? Is cassandra-0.6
> incompatibility with cassandra-0.5 ? The migration to cassandra-0.6 will
> cost much ?

Re: Is that possible to write a file system over Cassandra?

Posted by Jeff Zhang <zj...@gmail.com>.

Jonathan,

Previously we used cassandra-0.6, but we'd like to leverage the Hector
Java client since it has more advanced features, and Hector currently only
supports cassandra-0.5.
Why do you think using cassandra-0.5 is a strange way to do it? Is
cassandra-0.6 incompatible with cassandra-0.5? Would the migration to
cassandra-0.6 cost much?


On Thu, Apr 15, 2010 at 11:50 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> You forked Cassandra 0.5 for that?
>
> That's... a strange way to do it.



-- 
Best Regards

Jeff Zhang

Re: Is that possible to write a file system over Cassandra?

Posted by Ken Sandney <bl...@gmail.com>.

I tried CassFS; it is not stable yet, but it may be a good prototype to
start from.

On Thu, Apr 15, 2010 at 12:15 PM, Michael Greene
<mi...@gmail.com>wrote:

> On Wed, Apr 14, 2010 at 11:01 PM, Ken Sandney <bl...@gmail.com> wrote:
>
>>  a fuse based FS maybe better I guess
>
>
> This has been done, for better or worse, by jdarcy of http://pl.atyp.us/:
> http://github.com/jdarcy/CassFS
>

Re: Is that possible to write a file system over Cassandra?

Posted by Michael Greene <mi...@gmail.com>.

On Wed, Apr 14, 2010 at 11:01 PM, Ken Sandney <bl...@gmail.com> wrote:

>  a fuse based FS maybe better I guess


This has been done, for better or worse, by jdarcy of http://pl.atyp.us/:
http://github.com/jdarcy/CassFS

Re: Is that possible to write a file system over Cassandra?

Posted by Ken Sandney <bl...@gmail.com>.

A FUSE-based FS may be better, I guess.

On Thu, Apr 15, 2010 at 11:50 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> You forked Cassandra 0.5 for that?
>
> That's... a strange way to do it.

Re: Is that possible to write a file system over Cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.

You forked Cassandra 0.5 for that?

That's... a strange way to do it.

On Wed, Apr 14, 2010 at 9:36 PM, Jeff Zhang <zj...@gmail.com> wrote:
> We are currently doing such things, and now we are still at the start stage.
> Currently we only plan to store small files. For large files, splitting to
> small blocks is really one of our options.
> You can check out from here http://code.google.com/p/cassandra-fs/
>
> Document for this project is lack now, but still welcome any feedback and
> contribution.

Re: Is that possible to write a file system over Cassandra?

Posted by Jeff Zhang <zj...@gmail.com>.

We are currently doing this, though we are still at an early stage.

Currently we only plan to store small files. For large files, splitting
them into small blocks is one of our options.
You can check it out here: http://code.google.com/p/cassandra-fs/

Documentation for this project is lacking for now, but any feedback and
contributions are welcome.

On Wed, Apr 14, 2010 at 7:32 PM, Miguel Verde <mi...@gmail.com>wrote:

> On Wed, Apr 14, 2010 at 9:26 PM, Avinash Lakshman <
> avinash.lakshman@gmail.com> wrote:
>
>> OPP is not required here. You would be better off using a Random
>> partitioner because you want to get a random distribution of the metadata.
>
>
> Not required, certainly.  However, it strikes me that 1 cluster is better
> than 2, and most consumers of a filesystem would expect to be able to get an
> ordered listing or tree of the metadata which is easy using the OPP row key
> pattern listed previously.  You could still do this with the Random
> partitioner using column names in rows to describe the structure but the
> current compaction limitations could be an issue if a branch becomes too
> large, and you'd still have a root row hotspot (at least in the schema which
> comes to mind).
>

-- 
Best Regards

Jeff Zhang

Re: Is that possible to write a file system over Cassandra?

Posted by Miguel Verde <mi...@gmail.com>.

On Wed, Apr 14, 2010 at 9:26 PM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> OPP is not required here. You would be better off using a Random
> partitioner because you want to get a random distribution of the metadata.

Not required, certainly.  However, it strikes me that one cluster is better
than two, and most consumers of a filesystem would expect to be able to get
an ordered listing or tree of the metadata, which is easy using the OPP row
key pattern listed previously.  You could still do this with the Random
partitioner, using column names in rows to describe the structure, but the
current compaction limitations could be an issue if a branch becomes too
large, and you'd still have a root row hotspot (at least in the schema which
comes to mind).
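
The Random partitioner alternative described above can be sketched as
follows, assuming one row per directory whose column names are the child
names. Plain dictionaries stand in for rows here, and the function names
and the "dir"/"file" values are hypothetical; this is an illustration of
the schema idea, not a tested Cassandra data model.

```python
from collections import defaultdict

# One row per directory, keyed by its path; column names are child names.
# Cassandra keeps columns sorted within a row, which sorted() imitates here.
rows = defaultdict(dict)


def add_entry(path: str, kind: str) -> None:
    """Record each path component as a column in its parent directory's row."""
    parts = path.strip("/").split("/")
    parent = "/"
    for i, name in enumerate(parts):
        is_last = i == len(parts) - 1
        rows[parent][name] = kind if is_last else "dir"
        parent = parent.rstrip("/") + "/" + name


def list_dir(path: str) -> list:
    """Ordered listing of one directory: a column slice of a single row."""
    return sorted(rows.get(path, {}))


add_entry("/home/alice/notes.txt", "file")
add_entry("/home/bob/todo.txt", "file")
add_entry("/etc/hosts", "file")
print(list_dir("/"))      # ['etc', 'home']
print(list_dir("/home"))  # ['alice', 'bob']
```

Every traversal that starts from the top reads the "/" row, which is the
root row hotspot mentioned above, and a directory with very many children
becomes a single very wide row.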

Re: Is that possible to write a file system over Cassandra?

Posted by Vijay <vi...@gmail.com>.

What I did for one of our projects was similar: use a super column to store
file and directory metadata, and use another row (keyed by UUID) to store
the directory contents (files and subdirectories). We used UUIDs instead of
paths because there will be renames and moves. Small files are stored in
Cassandra.

We used an internally developed filesystem to store the big files, which are
larger than x bytes. Locking is done using ZooKeeper and queuing by ZeroMQ.
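
A rough sketch of why UUID keys make renames and moves cheap, under the
assumption that each directory row maps child names to child UUIDs. The
dictionaries and function names are hypothetical stand-ins for the column
families described above, not the actual schema.

```python
import uuid

metadata = {}  # node UUID -> attributes (the super column in the description)
dir_rows = {}  # directory UUID -> {child name: child UUID}


def create(parent_id, name, kind):
    """Create an entry under parent_id and return its new UUID."""
    node_id = uuid.uuid4()
    metadata[node_id] = {"kind": kind}
    if kind == "dir":
        dir_rows[node_id] = {}
    if parent_id is not None:
        dir_rows[parent_id][name] = node_id
    return node_id


def move(src_dir, name, dst_dir, new_name):
    """Rename or move: only directory rows change; the node's UUID,
    metadata and stored data are left untouched."""
    dir_rows[dst_dir][new_name] = dir_rows[src_dir].pop(name)


root = create(None, "", "dir")
docs = create(root, "docs", "dir")
f = create(docs, "a.txt", "file")
move(docs, "a.txt", root, "b.txt")
print(dir_rows[root]["b.txt"] == f)  # True
```

The two directory-row updates in move() are separate writes, so some
external coordination (the message mentions ZooKeeper for locking) would
still be needed to keep concurrent moves consistent.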

Regards,
</VJ>

On Wed, Apr 14, 2010 at 9:39 PM, Tatu Saloranta <ts...@gmail.com>wrote:

> On Wed, Apr 14, 2010 at 7:26 PM, Avinash Lakshman
> <av...@gmail.com> wrote:
> > OPP is not required here. You would be better off using a Random
> partitioner
> > because you want to get a random distribution of the metadata.
>
> Not for splitting, but for actual file system hierarchy it would. How
> else would you traverse hierarchy? (list sub-directiories, files)
>
> As to splitting files, yes, can be done, but I personally think that
> would be asking for trouble because of lack atomicity for operations.
> Exception being if only operations ever would be append.
>
> -+ Tatu +-
>

Re: Is that possible to write a file system over Cassandra?

Posted by Tatu Saloranta <ts...@gmail.com>.

On Wed, Apr 14, 2010 at 7:26 PM, Avinash Lakshman
<av...@gmail.com> wrote:
> OPP is not required here. You would be better off using a Random partitioner
> because you want to get a random distribution of the metadata.

Not for splitting, but for the actual file system hierarchy it would be.
How else would you traverse the hierarchy (list sub-directories, files)?

As to splitting files: yes, it can be done, but I personally think that
would be asking for trouble because of the lack of atomicity for operations.
The exception would be if the only operations were ever appends.

-+ Tatu +-

Re: Is that possible to write a file system over Cassandra?

Posted by Avinash Lakshman <av...@gmail.com>.

OPP is not required here. You would be better off using a Random partitioner
because you want to get a random distribution of the metadata.

Avinash

On Wed, Apr 14, 2010 at 7:25 PM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> Exactly. You can split a file into blocks of any size and you can actually
> distribute the metadata across a large set of machines. You wouldn't have
> the issue of having small files in this approach. The issue maybe the
> eventual consistency - not sure that is a paradigm that would be acceptable
> for a file system. But that is a discussion for another time/day.
>
> Avinash

Re: Is that possible to write a file system over Cassandra?

Posted by Avinash Lakshman <av...@gmail.com>.

Exactly. You can split a file into blocks of any size, and you can actually
distribute the metadata across a large set of machines. You wouldn't have
the issue of having small files with this approach. The issue may be
eventual consistency; I am not sure that is a paradigm that would be
acceptable for a file system. But that is a discussion for another time/day.

Avinash

On Wed, Apr 14, 2010 at 7:15 PM, Ken Sandney <bl...@gmail.com> wrote:

> Large files can be split into small blocks, and the size of block can be
> tuned. It may increase the complexity of writing such a file system, but can
> be for general purpose (not only for relative small files)

Re: Is that possible to write a file system over Cassandra?

Posted by Ken Sandney <bl...@gmail.com>.

Large files can be split into small blocks, and the block size can be
tuned. This may increase the complexity of writing such a file system, but
it would then be general purpose (not only for relatively small files).
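
As a small illustration of the block-splitting idea, here is a sketch with a
tunable block size. The function names and the 1024-byte size are
assumptions chosen for the example; how each block would be written to
Cassandra is left out.

```python
def split_blocks(data: bytes, block_size: int) -> list:
    """Split data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def join_blocks(blocks) -> bytes:
    """Reassemble a file from its blocks, given in order."""
    return b"".join(blocks)


payload = bytes(range(256)) * 10  # 2560 bytes of sample data
blocks = split_blocks(payload, block_size=1024)
print([len(b) for b in blocks])  # [1024, 1024, 512]
print(join_blocks(blocks) == payload)  # True
```

A smaller block size means more rows per file but smaller individual
writes; the right value would have to be found by measurement.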

On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta <ts...@gmail.com>wrote:

> On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi <bl...@gmail.com> wrote:
> > Hi,
> > Cassandra has a good distributed model: decentralized, auto-partition,
> > auto-recovery. I am evaluating about writing a file system over Cassandra
> > (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
> > Cassandra is good at such use case?
>
> It sort of depends on what you are looking for. From use case for
> which something like S3 is good, yes, except with one difference:
> Cassandra is more geared towards lots of small files, whereas S3 is
> more geared towards moderate number of files (possibly large).
>
> So I think it can definitely be a good use case, and I may use
> Cassandra for this myself in future. Having range queries allows
> implementing directory/path structures (list keys using path as
> prefix). And you can split storage such that metadata could live in
> OPP partition, raw data in RP.
>
> -+ Tatu +-
>

Re: Is that possible to write a file system over Cassandra?

Posted by Tatu Saloranta <ts...@gmail.com>.

On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi <bl...@gmail.com> wrote:
> Hi,
> Cassandra has a good distributed model: decentralized, auto-partition,
> auto-recovery. I am evaluating about writing a file system over Cassandra
> (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
> Cassandra is good at such use case?

It sort of depends on what you are looking for. For the kind of use case
that something like S3 is good for, yes, with one difference:
Cassandra is more geared towards lots of small files, whereas S3 is
more geared towards a moderate number of (possibly large) files.

So I think it can definitely be a good use case, and I may use
Cassandra for this myself in the future. Having range queries allows
implementing directory/path structures (list keys using the path as a
prefix). And you can split storage such that metadata lives in an
OPP partition and raw data in RP.
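
The "list keys using the path as a prefix" idea can be sketched with a
sorted list standing in for row keys under an order-preserving partitioner.
The sample paths and the list_prefix helper are hypothetical, and a real
implementation would issue a key range query instead of slicing a list.

```python
import bisect

# Sorted keys stand in for rows stored under an order-preserving partitioner.
keys = sorted([
    "/etc/hosts",
    "/home/alice/notes.txt",
    "/home/alice/pics/cat.png",
    "/home/bob/todo.txt",
    "/var/log/syslog",
])


def list_prefix(sorted_keys, prefix):
    """Return every key that starts with prefix, using a range scan."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # Upper bound: the prefix followed by the highest Unicode code point.
    end = bisect.bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[start:end]


print(list_prefix(keys, "/home/alice/"))
# ['/home/alice/notes.txt', '/home/alice/pics/cat.png']
```

Note that a plain prefix scan returns the whole subtree, not just the
immediate children, so a directory listing would still need to filter by
depth or use a key scheme that encodes it.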

-+ Tatu +-

Re: Is that possible to write a file system over Cassandra?

Posted by Tatu Saloranta <ts...@gmail.com>.

On Fri, Apr 16, 2010 at 4:08 AM, Mark Robson <ma...@gmail.com> wrote:
> On 15 April 2010 02:42, Zhuguo Shi <bl...@gmail.com> wrote:
>>
>> Hi,
>> Cassandra has a good distributed model: decentralized, auto-partition,
>> auto-recovery. I am evaluating about writing a file system over Cassandra
>> (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
>> Cassandra is good at such use case?
>
> I have considered this too.
> I think a FUSE-based filesystem could be made to work over Cassandra;
> initially it could be limited to storing small files (<500M for example) so
> that we could put the entire file contents in one row.
> However a lot of operations are difficult to do no matter how you design it,
> especially renames (e.g. what happens if two nodes rename different files to
> the same name).
> Also the filesystem would not have POSIX conformity, however, would probably
> be able to produce some behaviour which was useful to most applications in
> most cases (think of straightforward document management, uploaded image
> storage, quarantine storage etc).
> Eventual consistency would mean that things which are conventionally atomic
> in POSIX, would not be (e.g. rename) and the user (application) would need
> to tolerate this.
> Depending on how you constructed it, it could be easy to "lose" files which
> continued to be stored, but no longer appears in the filesystem (broken
> link) which then could not be efficiently garbage collected - the typical
> case would be where a file was not completely created (a client node failed)
> or where two files were renamed to the same name (one would be lost, but
> might not get marked as deleted in Cassandra). This would cause a resource
> leak.
> If you can work around these problems, it would be an attractive option for
> many types of application.

I think that even without solving these, it could be an attractive
option, the same way Amazon's S3 is an attractive option.
Operations beyond PUT are not atomic, and access is by full
read/replace; and yet it is enough for many use cases, because access
is fast, maintenance is very cheap (i.e. you are not the admin), and,
well, you use it for cases where regular file system properties are
not needed.

So perhaps a better way to phrase it would be whether you could build
a file-system-like thing on Cassandra, which could be used in lieu of
a traditional file system for some tasks.

-+ Tatu +-

Re: Is that possible to write a file system over Cassandra?

Posted by Mark Robson <ma...@gmail.com>.

On 15 April 2010 02:42, Zhuguo Shi <bl...@gmail.com> wrote:

> Hi,
>
> Cassandra has a good distributed model: decentralized, auto-partition,
> auto-recovery. I am evaluating about writing a file system over Cassandra
> (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
> Cassandra is good at such use case?
>

I have considered this too.

I think a FUSE-based filesystem could be made to work over Cassandra;
initially it could be limited to storing small files (<500M for example) so
that we could put the entire file contents in one row.

However a lot of operations are difficult to do no matter how you design it,
especially renames (e.g. what happens if two nodes rename different files to
the same name).

Also, the filesystem would not be POSIX-conformant; however, it would
probably be able to produce some behaviour which was useful to most
applications in most cases (think of straightforward document management,
uploaded image storage, quarantine storage etc).

Eventual consistency would mean that things which are conventionally atomic
in POSIX would not be (e.g. rename), and the user (application) would need
to tolerate this.

Depending on how you constructed it, it could be easy to "lose" files which
continued to be stored but no longer appeared in the filesystem (a broken
link), and which then could not be efficiently garbage collected. The
typical case would be where a file was not completely created (a client node
failed) or where two files were renamed to the same name (one would be lost,
but might not get marked as deleted in Cassandra). This would cause a
resource leak.
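
The rename collision described above can be illustrated with a toy model,
assuming a name table that maps file names to blob ids with last-write-wins
semantics. The dictionaries, ids and function names are hypothetical, and
the two renames run one after the other here, whereas in a real cluster
they would be concurrent.

```python
blobs = {"id-1": b"first file", "id-2": b"second file"}  # stored contents
names = {"a.txt": "id-1", "b.txt": "id-2"}               # name -> blob id


def rename(old: str, new: str) -> None:
    """Non-atomic rename as two independent writes; the last write to
    `new` wins and nothing checks for an existing entry."""
    names[new] = names[old]
    del names[old]


def find_orphans() -> set:
    """Blobs no name points to. This needs a full scan of both tables,
    which is why such garbage collection is expensive."""
    return set(blobs) - set(names.values())


rename("a.txt", "c.txt")  # client 1
rename("b.txt", "c.txt")  # client 2 silently replaces the mapping
print(names)              # {'c.txt': 'id-2'}
print(find_orphans())     # {'id-1'}
```

The blob "id-1" is still stored but unreachable by name, which is the
resource leak in question.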

If you can work around these problems, it would be an attractive option for
many types of application.

Mark