Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/06/13 23:49:03 UTC

hdfsOpenFile() API

Hi

Can this only be done in read-only and write-only modes? How do I do
appends? If I am using this for writing logs, I would want to append to
the file rather than overwrite it, which is what write-only mode does.

Thanks
A

Re: hdfsOpenFile() API

Posted by Briggs <ac...@gmail.com>.
Yeah, you are right about the google fs.

I have also heard from this list that some people are planning on
adding the append functionality to Hadoop, but it's just not there
yet.  I am not sure why.

Perhaps my "inefficient" comment was premature.  The term "logging"
stuck in my head, and I had preconceived ideas about what you are doing.
I am thinking that continuously writing extremely small chunks to a
distributed file system would cause a lot of latency and would
probably slow your system down considerably. But again, I am not sure
of your situation.

As for the way Hadoop is now, you would have to "copyFromLocal", which
probably sucks in your situation.  I can understand your pain in this
area.

Anyone else have any ideas?


On 6/13/07, Phantom <gh...@gmail.com> wrote:
> Hmm I was under the impression that HDFS is like GFS optimized for appends
> although GFS supports random writes. So let's say I want to process logs
> using Hadoop. The only way I can do it is to move the entire log into Hadoop
> from some place else and then perhaps run Map/Reduce jobs against it. It
> seems to kind of defeat the purpose. Am I missing something?
>
> Thanks
> A
>
> On 6/13/07, Briggs <ac...@gmail.com> wrote:
> >
> > No appending, AFAIK.  Hadoop is not intended for writing in this way.
> > It's more of a write-few, read-many system. Such granular writes would
> > be inefficient.
> >
> > On 6/13/07, Phantom <gh...@gmail.com> wrote:
> > > Hi
> > >
> > > Can this only be done in read-only and write-only modes? How do I do
> > > appends? If I am using this for writing logs, I would want to append
> > > to the file rather than overwrite it, which is what write-only mode does.
> > >
> > > Thanks
> > > A
> > >
> >
> >
> > --
> > "Conscious decisions by conscious minds are what make reality real"
> >
>


-- 
"Conscious decisions by conscious minds are what make reality real"

Re: hdfsOpenFile() API

Posted by Doug Cutting <cu...@apache.org>.
Phantom wrote:
> Which would mean that if I want my logs to reside in HDFS I will have
> to move them using copyFromLocal or some version thereof and then run a
> Map/Reduce process against them? Am I right?

Yes.  HDFS is probably not currently suitable for directly storing log 
output as it is generated.  But I don't think append is actually the 
missing feature you need.  Rather, the problem is that, currently in 
HDFS, until a file is closed, it does not exist.  So if your server 
crashes and does not close its log, the log would disappear, which is 
probably not what you'd want.
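The "until a file is closed, it does not exist" behaviour can be imitated
with a write-to-temporary, rename-on-close pattern. The sketch below is a
general illustration of that semantics, not HDFS code; the class and
method names are made up for this example.

```python
import os
import tempfile

class VisibleOnClose:
    """Writer whose output appears under its final name only once close()
    succeeds -- mimicking the HDFS behaviour described above.  A crash
    before close() leaves only an anonymous temporary file behind, so the
    'log' is effectively lost, just as Doug describes."""

    def __init__(self, final_path):
        self.final_path = final_path
        # Write into a temp file in the same directory so the final
        # rename is a cheap same-filesystem move.
        fd, self.tmp_path = tempfile.mkstemp(
            dir=os.path.dirname(final_path) or ".")
        self.f = os.fdopen(fd, "w")

    def write(self, data):
        self.f.write(data)

    def close(self):
        self.f.close()
        # Only now does the file become visible under its real name.
        os.replace(self.tmp_path, self.final_path)
```

Until `close()` runs, nothing exists at `final_path`, which is exactly why
a crashed log writer would leave no usable log behind.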

If copying log files to HDFS is prohibitive, an alternative might be to 
make them available via HTTP and to write an HttpFileSystem where they 
could be accessed directly as MapReduce inputs (assuming that's what you want).
An HttpFileSystem should be easy to implement and would be useful for 
lots of things.  It need not implement things like 'delete' and 'rename' 
or even 'create', but rather just 'open' and 'list', so it could only be 
used for inputs.

Doug

Re: hdfsOpenFile() API

Posted by Phantom <gh...@gmail.com>.
Which would mean that if I want my logs to reside in HDFS I will have
to move them using copyFromLocal or some version thereof and then run a
Map/Reduce process against them? Am I right?

Thanks
Avinash

On 6/13/07, Owen O'Malley <oo...@yahoo-inc.com> wrote:
>
>
> On Jun 13, 2007, at 3:29 PM, Phantom wrote:
>
> > Hmm, I was under the impression that HDFS is like GFS, optimized for
> > appends, although GFS supports random writes.
>
> HDFS doesn't support appends. There has been discussion of
> implementing single-writer appends, but it hasn't reached the top of
> anyone's priority list. Some people (me included) aren't thrilled by
> the semantics of atomic append in GFS. To me, it seems like atomic
> append is basically a poor-man's map/reduce. *smile*
>
> -- Owen
>

Re: hdfsOpenFile() API

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Jun 13, 2007, at 3:29 PM, Phantom wrote:

> Hmm, I was under the impression that HDFS is like GFS, optimized for
> appends, although GFS supports random writes.

HDFS doesn't support appends. There has been discussion of  
implementing single-writer appends, but it hasn't reached the top of  
anyone's priority list. Some people (me included) aren't thrilled by  
the semantics of atomic append in GFS. To me, it seems like atomic  
append is basically a poor-man's map/reduce. *smile*

-- Owen

Re: hdfsOpenFile() API

Posted by Phantom <gh...@gmail.com>.
Hmm, I was under the impression that HDFS is like GFS, optimized for
appends, although GFS supports random writes. So let's say I want to
process logs using Hadoop. The only way I can do it is to move the entire
log into Hadoop from some place else and then perhaps run Map/Reduce jobs
against it. That seems to kind of defeat the purpose. Am I missing something?

Thanks
A

On 6/13/07, Briggs <ac...@gmail.com> wrote:
>
> No appending, AFAIK.  Hadoop is not intended for writing in this way.
> It's more of a write-few, read-many system. Such granular writes would
> be inefficient.
>
> On 6/13/07, Phantom <gh...@gmail.com> wrote:
> > Hi
> >
> > Can this only be done in read-only and write-only modes? How do I do
> > appends? If I am using this for writing logs, I would want to append
> > to the file rather than overwrite it, which is what write-only mode does.
> >
> > Thanks
> > A
> >
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"
>

Re: hdfsOpenFile() API

Posted by Briggs <ac...@gmail.com>.
No appending, AFAIK.  Hadoop is not intended for writing in this way.
It's more of a write-few, read-many system. Such granular writes would
be inefficient.

On 6/13/07, Phantom <gh...@gmail.com> wrote:
> Hi
>
> Can this only be done in read-only and write-only modes? How do I do
> appends? If I am using this for writing logs, I would want to append to
> the file rather than overwrite it, which is what write-only mode does.
>
> Thanks
> A
>


-- 
"Conscious decisions by conscious minds are what make reality real"