Posted to mapreduce-user@hadoop.apache.org by Niv Mizrahi <ni...@taykey.com> on 2012/03/04 15:37:03 UTC

map-reduce on non-closed files

hi all,

 we are looking for a way to run map-reduce on *non-closed files*.
 we are currently able to run
hadoop fs -cat <non-closed-file>

*non-closed files* - files that are currently being written and have not
been closed yet.

is there any way to run map-reduce on *non-closed files*?


Thanks in advance for any answer
-- 
*Niv Mizrahi*
Taykey | www.taykey.com

Re: map-reduce on non-closed files

Posted by Niv Mizrahi <ni...@taykey.com>.
hi harsh,

yes, thank you. we are using the sync() API, and are still unable to read
unclosed files in mapreduce.
we are able to cat non-closed files; would that have been possible if we
hadn't used the sync() API call?

has anybody tried running an M/R job on non-closed files?
are we missing something?

Thanks
Niv



On Mon, Mar 5, 2012 at 3:42 PM, Harsh J <ha...@cloudera.com> wrote:

> Niv,
>
> Did you also try the sync() approach I mentioned? Did that not work?
> CDH3u2 does have the sync() API in it, so you can use it right away.
>
> On Sun, Mar 4, 2012 at 11:26 PM, Niv Mizrahi <ni...@taykey.com> wrote:
> > hi harsh,
> >
> > thank you for the quick response.
> > we are currently running with cdh3u2.
> >
> > i have run map-reduces in many forms on non-closed files:
> >  1. streaming -mapper /bin/cat
> >  2. run word count
> >  3. run our own java job.
> >
> > the output parts are always empty, though the jobs end successfully.
> >
> > running hadoop fs -cat on the same input returns results.
> >
> > am i doing something wrong?
> >
> > niv
> >
> >
> >
> > On Sun, Mar 4, 2012 at 6:49 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Technically, yes, you can run MR jobs on non-closed files (it'll run
> >> the reader in the same way as your -cat), but you would only be able
> >> to read until the last complete block, or until the point sync() was
> >> called on the output stream.
> >>
> >> It is better if your file-writer uses the sync() API judiciously to
> >> mark sync points after a considerable number of records, so that your
> >> MR readers in tasks read up to whole records and not just block
> >> boundaries.
> >>
> >> For a description on sync() API, read the section 'Coherency Model' in
> >> Tom White's "Hadoop: The Definitive Guide" (O'Reilly), Page 68.
> >>
> >> On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <ni...@taykey.com> wrote:
> >> > hi all,
> >> >
> >> >  we are looking for a way to run map-reduce on non-closed files.
> >> >  we are currently able to run
> >> > hadoop fs -cat <non-closed-file>
> >> >
> >> > non-closed files - files that are currently being written and have not
> >> > been closed yet.
> >> >
> >> > is there any way to run map-reduce on non-closed files?
> >> >
> >> >
> >> > Thanks in advance for any answer
> >> > --
> >> > Niv Mizrahi
> >> > Taykey | www.taykey.com
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
> >
> >
> > --
> > Niv Mizrahi
> > Taykey | www.taykey.com
> >
>
>
>
> --
> Harsh J
>



-- 
*Niv Mizrahi*
Taykey | www.taykey.com

Re: map-reduce on non-closed files

Posted by Harsh J <ha...@cloudera.com>.
Niv,

Did you also try the sync() approach I mentioned? Did that not work?
CDH3u2 does have the sync() API in it, so you can use it right away.

On Sun, Mar 4, 2012 at 11:26 PM, Niv Mizrahi <ni...@taykey.com> wrote:
> hi harsh,
>
> thank you for the quick response.
> we are currently running with cdh3u2.
>
> i have run map-reduces in many forms on non-closed files:
>  1. streaming -mapper /bin/cat
>  2. run word count
>  3. run our own java job.
>
> the output parts are always empty, though the jobs end successfully.
>
> running hadoop fs -cat on the same input returns results.
>
> am i doing something wrong?
>
> niv
>
>
>
> On Sun, Mar 4, 2012 at 6:49 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Technically, yes, you can run MR jobs on non-closed files (it'll run
>> the reader in the same way as your -cat), but you would only be able
>> to read until the last complete block, or until the point sync() was
>> called on the output stream.
>>
>> It is better if your file-writer uses the sync() API judiciously to
>> mark sync points after a considerable number of records, so that your
>> MR readers in tasks read up to whole records and not just block
>> boundaries.
>>
>> For a description on sync() API, read the section 'Coherency Model' in
>> Tom White's "Hadoop: The Definitive Guide" (O'Reilly), Page 68.
>>
>> On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <ni...@taykey.com> wrote:
>> > hi all,
>> >
>> >  we are looking for a way to run map-reduce on non-closed files.
>> >  we are currently able to run
>> > hadoop fs -cat <non-closed-file>
>> >
>> > non-closed files - files that are currently being written and have not
>> > been closed yet.
>> >
>> > is there any way to run map-reduce on non-closed files?
>> >
>> >
>> > Thanks in advance for any answer
>> > --
>> > Niv Mizrahi
>> > Taykey | www.taykey.com
>> >
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Niv Mizrahi
> Taykey | www.taykey.com
>



-- 
Harsh J

Re: map-reduce on non-closed files

Posted by Niv Mizrahi <ni...@taykey.com>.
hi harsh,

thank you for the quick response.
we are currently running with cdh3u2.

i have run map-reduce in several forms on non-closed files:
 1. streaming with -mapper /bin/cat
 2. word count
 3. our own java job.

the output parts are always empty, though the jobs end successfully.

running hadoop fs -cat on the same input returns results.

am i doing something wrong?

niv



On Sun, Mar 4, 2012 at 6:49 PM, Harsh J <ha...@cloudera.com> wrote:

> Technically, yes, you can run MR jobs on non-closed files (it'll run
> the reader in the same way as your -cat), but you would only be able
> to read until the last complete block, or until the point sync() was
> called on the output stream.
>
> It is better if your file-writer uses the sync() API judiciously to
> mark sync points after a considerable number of records, so that your
> MR readers in tasks read up to whole records and not just block
> boundaries.
>
> For a description on sync() API, read the section 'Coherency Model' in
> Tom White's "Hadoop: The Definitive Guide" (O'Reilly), Page 68.
>
> On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <ni...@taykey.com> wrote:
> > hi all,
> >
> >  we are looking for a way to run map-reduce on non-closed files.
> >  we are currently able to run
> > hadoop fs -cat <non-closed-file>
> >
> > non-closed files - files that are currently being written and have not
> > been closed yet.
> >
> > is there any way to run map-reduce on non-closed files?
> >
> >
> > Thanks in advance for any answer
> > --
> > Niv Mizrahi
> > Taykey | www.taykey.com
> >
>
>
>
> --
> Harsh J
>



-- 
*Niv Mizrahi*
Taykey | www.taykey.com

Re: map-reduce on non-closed files

Posted by Harsh J <ha...@cloudera.com>.
Technically, yes, you can run MR jobs on non-closed files (it'll run
the reader in the same way as your -cat), but you would only be able
to read until the last complete block, or until the point sync() was
called on the output stream.

It is better if your file-writer uses the sync() API judiciously to
mark sync points after a considerable number of records, so that your
MR readers in tasks read up to whole records and not just block
boundaries.

For a description of the sync() API, read the section 'Coherency Model' in
Tom White's "Hadoop: The Definitive Guide" (O'Reilly), page 68.
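A minimal sketch of the writer side, assuming a CDH3-era API where
FSDataOutputStream.sync() is available (later Hadoop releases rename it
hflush()); the output path, record source, and sync interval below are
hypothetical stand-ins, not anything from this thread:

```java
// Sketch: a long-lived HDFS writer that marks sync points so concurrent
// readers (including MR input readers) can see whole records before the
// file is closed.
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncingWriter {

    // Stand-in for the real record source.
    static List<String> fetchRecords() {
        return Arrays.asList("record-1", "record-2", "record-3");
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/data/live/events.log"));

        long count = 0;
        for (String record : fetchRecords()) {
            out.writeBytes(record + "\n");
            // Every N records, flush a sync point; everything written
            // before it becomes visible to readers of the open file.
            if (++count % 10000 == 0) {
                out.sync(); // hflush() on later Hadoop versions
            }
        }
        // The file is intentionally left open; readers still see data
        // up to the last sync point.
    }
}
```

Note this requires a live HDFS and the Hadoop jars on the classpath, so it
is a pattern sketch rather than something to run as-is.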

On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <ni...@taykey.com> wrote:
> hi all,
>
>  we are looking for a way to run map-reduce on non-closed files.
>  we are currently able to run
> hadoop fs -cat <non-closed-file>
>
> non-closed files - files that are currently being written and have not
> been closed yet.
>
> is there any way to run map-reduce on non-closed files?
>
>
> Thanks in advance for any answer
> --
> Niv Mizrahi
> Taykey | www.taykey.com
>



-- 
Harsh J