Posted to common-user@hadoop.apache.org by Rasit OZDAS <ra...@gmail.com> on 2009/01/27 10:20:07 UTC

Using HDFS for common purpose

Hi,
I wanted to ask whether HDFS is a good solution purely as a distributed DB (no
running jobs, only get and put commands).
A review says that "HDFS is not designed for low latency" and, besides, it's
implemented in Java.
Do these disadvantages prevent us from using it?
Or could somebody suggest a better (faster) one?

Thanks in advance..
Rasit

Re: Using HDFS for common purpose

Posted by Rasit OZDAS <ra...@gmail.com>.
Today Nitesh answered a similar thread, and his answer was what I wanted to
know.
I'm posting it here to help others with the same question.

HDFS is a file system for distributed storage, typically for distributed
computing scenarios over Hadoop. For office purposes you will require a SAN
(Storage Area Network) - an architecture to attach remote computer storage
devices to servers in such a way that, to the operating system, the devices
appear as locally attached. Or you can even go for Amazon S3, if the data is
really authentic. For an open-source solution related to SAN, you can go with
any of the Linux server distributions (e.g. RHEL, SuSE) or Solaris (ZFS +
zones), or perhaps the best plug-n-play solution (non-open-source) would be a
Mac Server + Xsan.

--nitesh

Thanks,
Rasit

2009/1/28 Rasit OZDAS <ra...@gmail.com>

> Thanks for responses,
>
> Sorry, I made a mistake: what I wanted is actually not a DB. We need
> simple storage for files. Get and put commands alone are enough (no queries
> needed). We don't even need append, chmod, etc.
>
> Probably from a thread on this list, I came across a link to a KFS-HDFS
> comparison:
> http://deliberateambiguity.typepad.com/blog/2007/10/advantages-of-k.html
>
> It's good that KFS is written in C++, but handling errors in C++ is
> usually more difficult.
> I'd like your opinion on which one would fit best.
>
> Thanks,
> Rasit



-- 
M. Raşit ÖZDAŞ

Re: Using HDFS for common purpose

Posted by Rasit OZDAS <ra...@gmail.com>.
Thanks for responses,

Sorry, I made a mistake: what I wanted is actually not a DB. We need
simple storage for files. Get and put commands alone are enough (no queries
needed). We don't even need append, chmod, etc.

Probably from a thread on this list, I came across a link to a KFS-HDFS
comparison:
http://deliberateambiguity.typepad.com/blog/2007/10/advantages-of-k.html

It's good that KFS is written in C++, but handling errors in C++ is usually
more difficult.
I'd like your opinion on which one would fit best.

Thanks,
Rasit

2009/1/27 Jim Twensky <ji...@gmail.com>

> You may also want to have a look at this to reach a decision based on your
> needs:
>
> http://www.swaroopch.com/notes/Distributed_Storage_Systems
>
> Jim



-- 
M. Raşit ÖZDAŞ
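
[Editor's sketch] The get/put-only usage described in the message above could
look roughly like the following. This is a minimal sketch, not something
posted in the thread: it assumes the `hadoop` launcher is on the PATH and
simply wraps the standard `hadoop fs -put` and `hadoop fs -get` shell
commands. The helper names (`put_command`, `get_command`, `run`) and the
example paths are hypothetical.

```python
import subprocess

HADOOP_BIN = "hadoop"  # assumption: the hadoop launcher script is on PATH


def put_command(local_path, hdfs_path):
    """Build the argv that copies a local file into HDFS."""
    return [HADOOP_BIN, "fs", "-put", local_path, hdfs_path]


def get_command(hdfs_path, local_path):
    """Build the argv that copies a file from HDFS to local disk."""
    return [HADOOP_BIN, "fs", "-get", hdfs_path, local_path]


def run(command):
    """Execute a command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)


# Example usage (hypothetical paths; requires a configured Hadoop client):
#   run(put_command("report.pdf", "/store/report.pdf"))
#   run(get_command("/store/report.pdf", "report-copy.pdf"))
```

Each call starts a separate client process, so a wrapper like this suits
occasional file transfers rather than low-latency access.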

Re: Using HDFS for common purpose

Posted by Jim Twensky <ji...@gmail.com>.
You may also want to have a look at this to reach a decision based on your
needs:

http://www.swaroopch.com/notes/Distributed_Storage_Systems

Jim

On Tue, Jan 27, 2009 at 1:22 PM, Jim Twensky <ji...@gmail.com> wrote:

> Rasit,
>
> What kind of data will you be storing on HBase or directly on HDFS? Do you
> aim to use it as a data source to do some key/value lookups for small
> strings/numbers, or do you want to store larger files labeled with some sort
> of a key and retrieve them during a MapReduce run?
>
> Jim

Re: Using HDFS for common purpose

Posted by Jim Twensky <ji...@gmail.com>.
Rasit,

What kind of data will you be storing on HBase or directly on HDFS? Do you
aim to use it as a data source to do some key/value lookups for small
strings/numbers, or do you want to store larger files labeled with some sort
of a key and retrieve them during a MapReduce run?

Jim

On Tue, Jan 27, 2009 at 11:51 AM, Jonathan Gray <jl...@streamy.com> wrote:

> Perhaps what you are looking for is HBase?
>
> http://hbase.org
>
> HBase is a column-oriented, distributed store that sits on top of HDFS and
> provides random access.
>
> JG

RE: Using HDFS for common purpose

Posted by Jonathan Gray <jl...@streamy.com>.
Perhaps what you are looking for is HBase?

http://hbase.org

HBase is a column-oriented, distributed store that sits on top of HDFS and provides random access.

JG
