Posted to dev@spark.apache.org by 萝卜丝炒饭 <14...@qq.com> on 2017/02/03 06:34:33 UTC

Re: No Reducer scenarios

Hi Nair,
Do you know which class it is? I tried to find it but could not. I know NewDirectOutputCollector is used to write temporary files.


---Original---
From: "☼ R Nair (रविशंकर नायर)"<ra...@gmail.com>
Date: 2017/1/30 13:32:04
To: "dev"<de...@spark.apache.org>;
Subject: No Reducer scenarios


Dear all,



1) When we don't set the reducer class in the driver program, IdentityReducer is invoked.

2) When we call setNumReduceTasks(0), no reducer is invoked, not even IdentityReducer.

Now, in the second scenario, we observed that the output files are named in the part-m-xx format (instead of the part-r-xx format), which indicates map output. But we know that the output of a map task is always written to the intermediate local file system. So which class is responsible for taking these intermediate map outputs from the local file system and writing them to HDFS? Does this class perform the write only when setNumReduceTasks is set to zero?


Best, Ravion
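
For readers following along, the file naming mentioned in the question can be sketched as below. This is a toy helper, not Hadoop code: the class PartFileNames and the method partFileName are hypothetical names, and the five-digit zero-padded partition number is an assumption based on names such as part-m-00000 that commonly appear in job output directories (the question abbreviates the number as "xx").

```java
import java.util.Locale;

// Toy illustration only; this is not Hadoop source code.
final class PartFileNames {
    // 'm' marks a file written by a map task, 'r' one written by a reduce task.
    // The five-digit padding is an assumption, see the note above.
    static String partFileName(boolean writtenByMapTask, int partition) {
        char kind = writtenByMapTask ? 'm' : 'r';
        return String.format(Locale.ROOT, "part-%c-%05d", kind, partition);
    }

    public static void main(String[] args) {
        System.out.println(partFileName(true, 0));  // a map-only job's first output file
        System.out.println(partFileName(false, 3)); // the fourth reducer's output file
    }
}
```

Under this model, seeing an "m" in the name only says that a map task wrote the final file; it says nothing about whether an intermediate local copy ever existed.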

RE: No Reducer scenarios

Posted by Praveen Mothkuri <Pr...@kpit.com>.
In this case the outputs of the map tasks go directly to the distributed file system, to the path set by FileOutputFormat.setOutputPath(JobConf, Path) <https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#setOutputPath(org.apache.hadoop.mapred.JobConf,%20org.apache.hadoop.fs.Path)>. Also, the framework does not sort the map outputs before writing them out to HDFS.
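
The behaviour described above can be illustrated with a small stand-alone model. It is only a sketch under stated assumptions, not Hadoop's implementation: MapOnlySketch, runMapper, and jobOutput are hypothetical names, the "mapper" simply emits (token, 1) pairs, and the sort-by-key step is a simplified stand-in for what the framework does between map and reduce when reducers are present.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names exist in Hadoop.
final class MapOnlySketch {
    // Hypothetical mapper: emits one (token, 1) record per input token, in input order.
    static List<Map.Entry<String, Integer>> runMapper(List<String> tokens) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String token : tokens) {
            emitted.add(new SimpleEntry<>(token, 1));
        }
        return emitted;
    }

    // Models the lines that end up in the job output for a single task.
    static List<String> jobOutput(List<String> tokens, int numReduceTasks) {
        List<Map.Entry<String, Integer>> records = runMapper(tokens);
        if (numReduceTasks > 0) {
            // With reducers, records are ordered by key before they are reduced.
            // An identity reduce is assumed here, so values pass through unchanged.
            records.sort(Map.Entry.comparingByKey());
        }
        // With zero reducers the records are written as emitted, with no sorting.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> record : records) {
            lines.add(record.getKey() + "\t" + record.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("b", "a", "c");
        System.out.println(jobOutput(tokens, 0)); // keys stay in emission order: b, a, c
        System.out.println(jobOutput(tokens, 1)); // keys come out sorted: a, b, c
    }
}
```

The only point of the model is the branch on numReduceTasks: with zero reducers nothing reorders the records between the mapper and the output.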
From: Praveen Mothkuri
Sent: Friday, February 03, 2017 5:14 PM
To: '萝卜丝炒饭'; ☼ R Nair (रविशंकर नायर); dev; user; user
Subject: RE: No Reducer scenarios


This message contains information that may be privileged or confidential and is the property of the KPIT Technologies Ltd. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. KPIT Technologies Ltd. does not accept any liability for virus infected mails.


