Posted to common-user@hadoop.apache.org by amit handa <am...@gmail.com> on 2009/01/08 11:56:37 UTC

Resend: Standalone HDFS based distributed storage

Resending after the holidays. Any feedback would be great.

On Mon, Dec 29, 2008 at 4:28 PM, amit handa <am...@gmail.com> wrote:
> Hi,
>
> We are evaluating the use of standalone HDFS for one of our projects.
> The file system would be used to store audio, video, image and text
> files for various types of batch processing applications hosted across
> multiple machines and multiple platforms.
>
> I would like some feedback on the best HDFS-based options
> (fuse-dfs, HBase or others) available, given the requirements
> below:
>
> 1.  The data to be stored consists of video, audio, image,
>     XML and text files.
> 2.  These files need to be created/accessed/deleted from Linux and
>     Windows machines.
> 3.  The data is transient: we store it for a configurable amount
>     of time (say 2 days) for processing across multiple machines
>     and then delete it once processing is complete.
> 4.  The data needs to be available as close as possible to the
>     processing machines (Linux or Windows) to reduce network I/O.
> 5.  The number of files that need to be stored per day is on the order
>     of millions. The number of folders that need to be created for
>     storing the images of individual videos will also be on the order
>     of millions.
> 6.  The number of files that need to be deleted per day will be on the
>     order of millions, as we would be cleaning up the files for which
>     processing has been completed.
> 7.  The file size for audio/video files can range from a few KB to a
>     few GB.
> 8.  The file permissions needed would, at most, restrict some hosts to
>     read-only rather than read-write access. This is good to have,
>     not a must-have.
> 9.  The setup can have 200-600 machines (a mix of Windows (30%) and
>     Linux (70%)), each with 250-500 GB of hard disk space.
> 10. The file system should be mountable from Linux and Windows
>     machines (via a mapped network drive).
>
> Please let me know if you need more details.
>
> Thanks in advance,
> Amit
>
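
To make requirements 3 and 6 concrete, here is a minimal sketch of the cleanup pass, assuming the store is mounted as an ordinary directory tree (requirement 10) and that a file's modification time is an acceptable stand-in for "processing is complete". The function name purge_expired and its parameters are hypothetical, made up for illustration; nothing here is an HDFS or fuse-dfs API.

```python
import os
import time
from pathlib import Path


def purge_expired(root, retention_seconds, now=None):
    """Delete files under root whose mtime is older than the retention window.

    Returns the list of deleted file paths. `now` can be passed in to make
    the cutoff deterministic (e.g. for testing).
    """
    now = time.time() if now is None else now
    cutoff = now - retention_seconds
    deleted = []
    # Walk bottom-up so a directory is visited after everything below it.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime < cutoff:
                    path.unlink()
                    deleted.append(path)
            except FileNotFoundError:
                # Another cleanup worker may have removed it already.
                pass
        if Path(dirpath) != Path(root):
            try:
                # rmdir only succeeds on an empty directory. Note that this
                # also removes directories that were empty to begin with.
                os.rmdir(dirpath)
            except OSError:
                pass
    return deleted
```

With the 2-day window from requirement 3 and a hypothetical mount point, a call would look like purge_expired("/mnt/dfs/scratch", 2 * 24 * 3600). At millions of files per day, a single walk like this may well be too slow, so in practice the tree would probably need to be split by date or by job so that whole directories can be dropped at once.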