Posted to user@hadoop.apache.org by Maysam Hossein Yabandeh <my...@qf.org.qa> on 2013/06/12 17:45:30 UTC

Assigning the same partition number to the mapper output

Hi,

I was wondering whether it is possible in Hadoop to give the map outputs the same partition numbers as their inputs. I am running a map-only job (with zero reducers), and Hadoop shuffles the partitions in the output: input/part-m-0000X is processed by task number Y and hence generates output/part-m-0000Y (where X != Y).

Thanks
Maysam

CONFIDENTIALITY  NOTICE:
This email and any attachments transmitted with it are confidential and intended for the use of individual or entity to which it is addressed. If you have received this email in error, please delete it immediately and inform the sender. Unless you are the intended recipient, you may not use, disclose, copy or distribute this email or any attachments included. The contents of the emails including any attachments may be subjected to copyrights law, In such case the contents may not be copied, adapted, distributed or transmitted without the consent of the copyright owner.

RE: Assigning the same partition number to the mapper output

Posted by Devaraj k <de...@huawei.com>.
If you are using TextOutputFormat for your job, its getRecordWriter() method (i.e. RecordWriter<K, V> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException) calls FileOutputFormat.getDefaultWorkFile() to generate the file names. That method uses the format $output/_temporary/$taskid/part-[mr]-$id to build the output path name for the task in the temporary directory.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getDefaultWorkFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String)


If you want to change the output path name for the task, you need to override this method in the output format used by your job so that it returns whatever name you want.
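As a rough illustration of the naming logic such an override could use, here is a minimal sketch in plain Java with no Hadoop dependency. The class PartitionNames and its methods are hypothetical names, not part of Hadoop. It assumes the task already knows the name of its input file (in a mapper this is commonly obtained with ((FileSplit) context.getInputSplit()).getPath().getName()) and that input files follow the part-[mr]-$id pattern mentioned above. Wiring the result into a getDefaultWorkFile() override is left out.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds an output name that keeps the input partition number. */
final class PartitionNames {
    // Matches names such as part-m-00003 or part-r-00012.gz (extension optional).
    private static final Pattern PART_FILE =
            Pattern.compile("^(.+)-([mr])-(\\d+)(\\..*)?$");

    private PartitionNames() {}

    /** Returns the partition number encoded in the file name, or -1 if there is none. */
    static int partitionOf(String inputFileName) {
        Matcher m = PART_FILE.matcher(inputFileName);
        return m.matches() ? Integer.parseInt(m.group(3)) : -1;
    }

    /**
     * Builds a map-output file name (baseName-m-NNNNN plus extension) that
     * reuses the partition number of the given input file.
     */
    static String outputNameFor(String inputFileName, String baseName, String extension) {
        int partition = partitionOf(inputFileName);
        if (partition < 0) {
            throw new IllegalArgumentException(
                    "No partition number in file name: " + inputFileName);
        }
        // Five zero-padded digits, mirroring the part-m-0000X names in the question.
        return String.format(Locale.ROOT, "%s-m-%05d%s", baseName, partition, extension);
    }

    public static void main(String[] args) {
        // The input partition number 3 is carried over to the output name.
        System.out.println(outputNameFor("part-m-00003", "part", ""));
    }
}
```

With a helper like this, the output name depends only on the input file name rather than on the task number, so the X != Y mismatch from the question would not arise as long as each map task reads exactly one input file.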

If you want to change only the base name of the output file (the default value is "part"), you can set the "mapreduce.output.basename" configuration property.



Thanks&Regards
  Devaraj K

