Posted to common-user@hadoop.apache.org by rab ra <ra...@gmail.com> on 2014/07/10 11:55:25 UTC

multiple map tasks writing to the same HDFS file - issue

Hello


I have a use case that spans multiple map tasks in a Hadoop environment. I
use Hadoop 1.2.1 with 6 task nodes. Each map task writes its output into a
file stored in HDFS, and this file is shared across all the map tasks. They
all compute their output, but some of the results are missing from the
output file.



The output file is an Excel file with 8 parameters (headings). Each map task
is supposed to compute all 8 values and save each one as soon as it is
computed. This means the logic of a map task opens the file, writes the
value, and closes it, 8 times.
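
For reference, a minimal sketch of what this per-task write pattern might
look like with the HDFS FileSystem API. The path, column names, and the use
of append() are illustrative assumptions, not the actual job code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SharedFileWriteSketch {

    // Called once per computed value, so each map task runs this 8 times.
    public static void writeValue(Configuration conf, String column, String value)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path shared = new Path("/output/results.csv");  // hypothetical shared file

        // Reopen the shared file for append, write one value, and close it.
        // (Append must be enabled on the cluster, e.g. dfs.support.append.)
        FSDataOutputStream out = fs.append(shared);
        out.writeBytes(column + "," + value + "\n");
        out.close();
    }
}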



Can someone give me a hint on what's going wrong here?



Is it possible for more than one map task to write to a shared file in
HDFS?

Regards
Rab

Re: multiple map tasks writing to the same HDFS file - issue

Posted by Arpit Agarwal <aa...@hortonworks.com>.
HDFS is single-writer, multiple-reader (see sec 8.3.1 of
http://aosabook.org/en/hdfs.html). You cannot have multiple writers for a
single file at a time.
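
A minimal standalone sketch of that constraint, assuming two writers contend
for the same path (the path is illustrative, and the exact exception depends
on the HDFS version, but it is typically a lease-related error such as
AlreadyBeingCreatedException):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleWriterDemo {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path shared = new Path("/tmp/shared-output.csv");  // illustrative path

        // First writer creates the file and holds the write lease.
        FSDataOutputStream first = fs.create(shared, true);
        first.writeBytes("heading1,heading2\n");

        try {
            // Second attempt to open the same file for writing while the
            // lease is held: the NameNode rejects it. Concurrent map tasks
            // appending to one shared file hit the same restriction.
            FSDataOutputStream second = fs.append(shared);
            second.close();
        } catch (IOException e) {
            System.err.println("Second writer rejected: " + e.getMessage());
        } finally {
            first.close();
        }
    }
}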


On Thu, Jul 10, 2014 at 2:55 AM, rab ra <ra...@gmail.com> wrote:

> Hello
>
>
> I have a use case that spans multiple map tasks in a Hadoop environment. I
> use Hadoop 1.2.1 with 6 task nodes. Each map task writes its output into a
> file stored in HDFS, and this file is shared across all the map tasks. They
> all compute their output, but some of the results are missing from the
> output file.
>
>
>
> The output file is an Excel file with 8 parameters (headings). Each map
> task is supposed to compute all 8 values and save each one as soon as it
> is computed. This means the logic of a map task opens the file, writes the
> value, and closes it, 8 times.
>
>
>
> Can someone give me a hint on what's going wrong here?
>
>
>
> Is it possible for more than one map task to write to a shared file in
> HDFS?
>
> Regards
> Rab
>

