Posted to common-user@hadoop.apache.org by Rasit OZDAS <ra...@gmail.com> on 2009/01/27 10:20:07 UTC
Using HDFS for common purpose
Hi,
I wanted to ask whether HDFS is a good solution for use purely as a distributed DB
(no running jobs, only get and put commands).
A review says that "HDFS is not designed for low latency", and besides, it's
implemented in Java.
Do these disadvantages prevent us from using it?
Or could somebody suggest a better (faster) one?
Thanks in advance.
Rasit
Re: Using HDFS for common purpose
Posted by Rasit OZDAS <ra...@gmail.com>.
Today Nitesh answered a similar thread, and his answer was what I wanted
to learn.
I'm reposting it here to help others with the same question:
HDFS is a file system for distributed storage, typically for distributed
computing scenarios over Hadoop. For office purposes you will require a SAN
(Storage Area Network), an architecture to attach remote computer storage
devices to servers in such a way that, to the operating system, the devices
appear as locally attached. Or you can even go for Amazon S3, if the data is
really authentic. For an open-source SAN solution, you can go with
any of the Linux server distributions (e.g. RHEL, SuSE) or Solaris (ZFS +
zones), or perhaps the best plug-and-play solution (non-open-source) would be a Mac
Server + Xsan.
--nitesh
Thanks,
Rasit
--
M. Raşit ÖZDAŞ
Re: Using HDFS for common purpose
Posted by Rasit OZDAS <ra...@gmail.com>.
Thanks for the responses.
Sorry, I made a mistake: what I want is actually not a DB. We need
simple storage for files. Only get and put commands are enough (no queries
needed). We don't even need append, chmod, etc.
Probably from a thread on this list, I came across a link to a KFS-HDFS
comparison:
http://deliberateambiguity.typepad.com/blog/2007/10/advantages-of-k.html
It's good that KFS is written in C++, but error handling in C++ is usually
more difficult.
I'd like your opinion on which one would fit best.
Thanks,
Rasit
Re: Using HDFS for common purpose
Posted by Jim Twensky <ji...@gmail.com>.
You may also want to have a look at this to reach a decision based on your
needs:
http://www.swaroopch.com/notes/Distributed_Storage_Systems
Jim
Re: Using HDFS for common purpose
Posted by Jim Twensky <ji...@gmail.com>.
Rasit,
What kind of data will you be storing in HBase or directly on HDFS? Do you
aim to use it as a data source for key/value lookups of small
strings/numbers, or do you want to store larger files labeled with some sort
of key and retrieve them during a MapReduce run?
Jim
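Jim's second option, larger files labeled with a key and fetched by that key, fits a get/put-only store if the key is encoded in the file path, since HDFS addresses files only by path. The Java sketch below is a hypothetical illustration, not something proposed in the thread: the `KeyedPaths` class, the `/store` root, and the two-level SHA-1 bucketing are all assumptions, and keys are assumed to already be safe file names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: maps an application key to a file path in a flat file store. */
public class KeyedPaths {
    // Assumed root directory; not taken from the thread.
    private static final String ROOT = "/store";

    /** Returns ROOT/xx/yy/key, where xx and yy are the first two bytes of SHA-1(key) in hex. */
    public static String pathFor(String key) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(key.getBytes(StandardCharsets.UTF_8));
            // Two levels of 256 buckets each keep any single directory from growing huge.
            String level1 = String.format("%02x", digest[0] & 0xff);
            String level2 = String.format("%02x", digest[1] & 0xff);
            return ROOT + "/" + level1 + "/" + level2 + "/" + key;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String key = args.length > 0 ? args[0] : "report-2009.pdf";
        // The printed path could be used as the target of a put and the source of a get.
        System.out.println(pathFor(key));
    }
}
```

Assuming the parent directories exist, the printed path could serve as the destination of `hadoop fs -put` and the source of `hadoop fs -get`. Hashing spreads files across 65,536 directories instead of one; whether that matters depends on how many files are stored.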
RE: Using HDFS for common purpose
Posted by Jonathan Gray <jl...@streamy.com>.
Perhaps what you are looking for is HBase?
http://hbase.org
HBase is a column-oriented, distributed store that sits on top of HDFS and provides random access.
JG