Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/06/13 23:49:03 UTC
hdfsOpenFile() API
Hi
Can this only be done in read-only and write-only mode? How do I do
appends? If I am using this for writing logs, then I would want to
append to the file rather than overwrite it, which is what the write-only
mode is doing.
Thanks
A
Re: hdfsOpenFile() API
Posted by Briggs <ac...@gmail.com>.
Yeah, you are right about the google fs.
I have also heard from this list that some people are planning on
adding the append functionality to Hadoop, but it's just not there
yet. I am not sure why.
Perhaps my "inefficient" comment was premature. The term logging
stuck in my head and I have preconceived ideas of what you are doing.
I am thinking that continuously writing extremely small chunks to a
distributed file system would cause a lot of latency that would
probably slow your system down considerably. But again, I am not sure
of your situation.
As for the way hadoop is now, you would have to "copyFromLocal", which
probably sucks in your situation. I can understand your pain in this
area.
Anyone else have any ideas?
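
A minimal sketch of the batching idea behind the small-chunks concern above: collect small log records in memory and hand them off only in large pieces. The class name BatchingLogBuffer, the flush_bytes threshold, and the sink callback are all hypothetical and not part of Hadoop; in practice the sink might write a local file that is later moved with copyFromLocal.

```python
from typing import Callable, List


class BatchingLogBuffer:
    """Collects small log records and hands them off in large batches.

    Hypothetical helper, not a Hadoop API. `sink` stands in for whatever
    eventually ships a batch to the distributed file system.
    """

    def __init__(self, flush_bytes: int, sink: Callable[[bytes], None]) -> None:
        self.flush_bytes = flush_bytes
        self.sink = sink
        self._parts: List[bytes] = []
        self._size = 0

    def write(self, record: str) -> None:
        # Normalize each record to exactly one trailing newline.
        data = (record.rstrip("\n") + "\n").encode("utf-8")
        self._parts.append(data)
        self._size += len(data)
        # Only touch the sink once enough bytes have accumulated.
        if self._size >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        # Hand over everything buffered so far as one batch.
        if not self._parts:
            return
        self.sink(b"".join(self._parts))
        self._parts = []
        self._size = 0


# Demo: ten 8-byte records with a 32-byte threshold.
batches: List[bytes] = []
buf = BatchingLogBuffer(flush_bytes=32, sink=batches.append)
for i in range(10):
    buf.write(f"event {i}")
buf.flush()  # ship whatever is left at shutdown
```

The trade-off is that records still in the buffer are lost if the process dies before a flush, so the threshold is a balance between write latency and how much log data one can afford to lose.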
On 6/13/07, Phantom <gh...@gmail.com> wrote:
> Hmm, I was under the impression that HDFS is, like GFS, optimized for appends,
> although GFS supports random writes. So let's say I want to process logs
> using Hadoop. The only way I can do it is to move the entire log into Hadoop
> from some place else and then perhaps run Map/Reduce jobs against it. It
> seems to kind of defeat the purpose. Am I missing something?
>
> Thanks
> A
--
"Conscious decisions by conscious minds are what make reality real"
Re: hdfsOpenFile() API
Posted by Doug Cutting <cu...@apache.org>.
Phantom wrote:
> Which would mean that if I want my logs to reside in HDFS, I will
> have to move them using copyFromLocal or some version thereof and then run
> a Map/Reduce process against them? Am I right?
Yes. HDFS is probably not currently suitable for directly storing log
output as it is generated. But I don't think append is actually the
missing feature you need. Rather, the problem is that, currently in
HDFS, until a file is closed, it does not exist. So if your server
crashes and does not close its log, the log would disappear, which is
probably not what you'd want.
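
One way to live with the "a file does not exist until it is closed" behavior is to roll the log locally into small closed segments, so a crash costs at most the one segment that is still open. The sketch below is only an illustration under that assumption; RollingLog, the ".open"/".done" suffixes, and max_records are hypothetical names, and the copy step into HDFS (for example via copyFromLocal) is left out.

```python
import os
import tempfile
from typing import List, Optional, TextIO


class RollingLog:
    """Writes records to a local segment and closes it every max_records.

    Hypothetical helper, not a Hadoop API. Closed segments are renamed
    from "*.open" to "*.done" so a separate copier can tell them apart.
    """

    def __init__(self, directory: str, max_records: int) -> None:
        self.directory = directory
        self.max_records = max_records
        self.closed_segments: List[str] = []
        self._index = 0
        self._count = 0
        self._file: Optional[TextIO] = None

    def _open_path(self) -> str:
        return os.path.join(self.directory, f"log.{self._index:05d}.open")

    def write(self, record: str) -> None:
        if self._file is None:
            self._file = open(self._open_path(), "w", encoding="utf-8")
        self._file.write(record + "\n")
        self._file.flush()
        self._count += 1
        if self._count >= self.max_records:
            self.roll()

    def roll(self) -> None:
        # Close the current segment and mark it as safe to copy.
        if self._file is None:
            return
        self._file.close()
        open_path = self._open_path()
        done_path = open_path[: -len(".open")] + ".done"
        os.rename(open_path, done_path)
        self.closed_segments.append(done_path)
        self._file = None
        self._index += 1
        self._count = 0


# Demo: seven records, three per segment.
workdir = tempfile.mkdtemp()
log = RollingLog(workdir, max_records=3)
for i in range(7):
    log.write(f"request {i}")
# Two segments are closed here; the seventh record is in a still-open one.
ready = [os.path.basename(p) for p in log.closed_segments]
log.roll()  # clean shutdown closes the last segment too
```

Only the ".done" files would be candidates for copying into HDFS, since each of them is a complete, closed file.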
If copying log files to HDFS is prohibitive, an alternative might be to
make them available via HTTP and to write an HttpFileSystem where they
could be accessed directly as MapReduce inputs (assuming that is the goal).
An HttpFileSystem should be easy to implement and would be useful for
lots of things. It need not implement things like 'delete' and 'rename'
or even 'create', but rather just 'open' and 'list', so it could only be
used for inputs.
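
To make the shape of such an input-only file system concrete, here is a rough sketch in Python rather than against Hadoop's actual Java FileSystem class. Everything in it is an assumption for illustration: the class name ReadOnlyHttpFileSystem, the injectable fetch function, and in particular the convention that each directory serves an "index.txt" with one entry per line, since plain HTTP has no standard directory listing.

```python
import io
import urllib.request
from typing import BinaryIO, Callable, List


def http_fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class ReadOnlyHttpFileSystem:
    """Input-only file system: just 'open' and 'list', nothing else.

    Hypothetical sketch, not Hadoop's FileSystem API.
    """

    def __init__(self, base_url: str,
                 fetch: Callable[[str], bytes] = http_fetch) -> None:
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch

    def _url(self, *parts: str) -> str:
        # Join path pieces onto the base URL without doubled slashes.
        pieces = [p.strip("/") for p in parts if p.strip("/")]
        return "/".join([self.base_url] + pieces)

    def open(self, path: str) -> BinaryIO:
        # Return the remote file's bytes as a readable stream.
        return io.BytesIO(self.fetch(self._url(path)))

    def list(self, path: str) -> List[str]:
        # ASSUMPTION: the server publishes "<dir>/index.txt", one name per line.
        listing = self.fetch(self._url(path, "index.txt")).decode("utf-8")
        return [line.strip() for line in listing.splitlines() if line.strip()]


# Demo against an in-memory stand-in for a web server (hypothetical URLs).
fake_server = {
    "http://logs.example/web/index.txt": b"access.0.log\naccess.1.log\n",
    "http://logs.example/web/access.0.log": b"GET /a\nGET /b\n",
    "http://logs.example/web/access.1.log": b"GET /c\n",
}
fs = ReadOnlyHttpFileSystem("http://logs.example/", fetch=fake_server.__getitem__)
names = fs.list("web")
first = fs.open("web/" + names[0]).read()
```

Because there is no 'create', 'delete', or 'rename', nothing here could be used for job output, which matches the inputs-only restriction described above.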
Doug
Re: hdfsOpenFile() API
Posted by Phantom <gh...@gmail.com>.
Which would mean that if I want my logs to reside in HDFS, I will
have to move them using copyFromLocal or some version thereof and then run
a Map/Reduce process against them? Am I right?
Thanks
Avinash
On 6/13/07, Owen O'Malley <oo...@yahoo-inc.com> wrote:
>
>
> On Jun 13, 2007, at 3:29 PM, Phantom wrote:
>
> > Hmm, I was under the impression that HDFS is, like GFS, optimized for
> > appends, although GFS supports random writes.
>
> HDFS doesn't support appends. There has been discussion of
> implementing single-writer appends, but it hasn't reached the top of
> anyone's priority list. Some people (me included) aren't thrilled by
> the semantics of atomic append in GFS. To me, it seems like atomic
> append is basically a poor-man's map/reduce. *smile*
>
> -- Owen
>
Re: hdfsOpenFile() API
Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Jun 13, 2007, at 3:29 PM, Phantom wrote:
> Hmm, I was under the impression that HDFS is, like GFS, optimized for
> appends, although GFS supports random writes.
HDFS doesn't support appends. There has been discussion of
implementing single-writer appends, but it hasn't reached the top of
anyone's priority list. Some people (me included) aren't thrilled by
the semantics of atomic append in GFS. To me, it seems like atomic
append is basically a poor-man's map/reduce. *smile*
-- Owen
Re: hdfsOpenFile() API
Posted by Phantom <gh...@gmail.com>.
Hmm, I was under the impression that HDFS is, like GFS, optimized for appends,
although GFS supports random writes. So let's say I want to process logs
using Hadoop. The only way I can do it is to move the entire log into Hadoop
from some place else and then perhaps run Map/Reduce jobs against it. It
seems to kind of defeat the purpose. Am I missing something?
Thanks
A
On 6/13/07, Briggs <ac...@gmail.com> wrote:
>
> No appending, AFAIK. Hadoop is not intended for writing in this way.
> It's more of a write-few, read-many system. Such granular writes would
> be inefficient.
Re: hdfsOpenFile() API
Posted by Briggs <ac...@gmail.com>.
No appending, AFAIK. Hadoop is not intended for writing in this way.
It's more of a write-few, read-many system. Such granular writes would
be inefficient.
On 6/13/07, Phantom <gh...@gmail.com> wrote:
> Hi
>
> Can this only be done in read-only and write-only mode? How do I do
> appends? If I am using this for writing logs, then I would want to
> append to the file rather than overwrite it, which is what the write-only
> mode is doing.
>
> Thanks
> A
>
--
"Conscious decisions by conscious minds are what make reality real"