Posted to common-user@hadoop.apache.org by Raghavendra Chandra <ra...@gmail.com> on 2015/04/02 05:59:14 UTC

How to append the contents to an output file

Dear Team,

I am trying to append content to a reducer output file using multiple
outputs.

My requirement is to write the reducer output to multiple folders, and the
data must be appended to the existing content.

I have written a custom output format by extending the TextOutputFormat
class, and I am able to write the data into multiple folders. The issue I am
facing is that it overwrites the data in the files, whereas I want it to
append the data to the output files.

Please let me know how to handle this situation.

Thanks and regards,

Raghav Chandra

Re: How to append the contents to an output file

Posted by Shahab Yunus <sh...@gmail.com>.

I hope I have understood your requirement correctly.

Your requirement is to write into multiple folders from the reducers AND,
in each folder, append the data to the file in that folder, right?

Reducer-output=
folder1/file1
folder2/file2
....

This can be done with the standard MultipleOutputFormat: the framework will
write data into each folder and make sure it is appended to that file. You
don't need to write your own output format.
Have you seen this?
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

If the issue is that in each folder you want ONE file for all the reducers,
then you have to do that yourself with a post-job merge. One option is to
call the FileUtil.copyMerge method (
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileUtil.html)
once the job is finished.
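
To make the post-job merge idea concrete, here is a minimal sketch of what
such a merge does, written against the local filesystem with plain java.nio
rather than the Hadoop API. It is only an illustration of the idea, not
FileUtil.copyMerge itself: the class name PartFileMerger, the method
mergeParts, the glob "file1-r-*" and the file name merged.txt are all
hypothetical names chosen for the example. It concatenates the per-reducer
files of one folder, in name order, onto the end of a target file, so
whatever the target already contains is kept.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: a local-filesystem stand-in for a post-job merge.
public class PartFileMerger {

    /**
     * Appends every file in srcDir whose name matches glob to target,
     * in sorted name order. Existing content of target is preserved.
     * Returns the number of files that were merged.
     */
    public static int mergeParts(Path srcDir, String glob, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir, glob)) {
            for (Path part : stream) {
                parts.add(part);
            }
        }
        // Directory listing order is unspecified, so sort by path name.
        Collections.sort(parts);

        // CREATE + APPEND: create the target if missing, otherwise add to its end.
        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        // Simulate one output folder holding two per-reducer files.
        Path folder = Files.createTempDirectory("folder1");
        Files.write(folder.resolve("file1-r-00000"),
                "from reducer 0\n".getBytes(StandardCharsets.UTF_8));
        Files.write(folder.resolve("file1-r-00001"),
                "from reducer 1\n".getBytes(StandardCharsets.UTF_8));

        // The merge target already holds data from an earlier run.
        Path merged = folder.resolve("merged.txt");
        Files.write(merged, "existing content\n".getBytes(StandardCharsets.UTF_8));

        int count = mergeParts(folder, "file1-r-*", merged);
        System.out.println("Merged " + count + " files into " + merged);
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

On a real cluster the same step would go through the Hadoop FileSystem API
against HDFS paths, and whether appending to an existing HDFS file is
possible depends on how the cluster is configured, so treat the above only
as a picture of the merge order and the append behaviour.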

Regards,
Shahab

