Posted to dev@spark.apache.org by Nan Zhu <zh...@gmail.com> on 2014/02/26 14:23:29 UTC

Discussion on SPARK-1139

Hi, all  

I just created a JIRA, https://spark-project.atlassian.net/browse/SPARK-1139 . The issue is the following:

the new-Hadoop-API-based Spark APIs are actually a mixture of the old and new Hadoop APIs.

Spark's APIs still take a JobConf (or Configuration) as one of the parameters, but Configuration has actually been replaced by mapreduce.Job in the new Hadoop API.

For example: http://codesfusion.blogspot.ca/2013/10/hadoop-wordcount-with-new-map-reduce-api.html

&  

http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api (p10)
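To make the mismatch concrete, here is a rough sketch (my own illustration, not code from Spark; it assumes an existing SparkContext sc and uses a placeholder input path): the current new-API entry point takes a bare Configuration, while the new mapreduce API itself is driven through a Job:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

// How Spark's new-API method is called today: a bare Configuration is passed in
val conf = new Configuration()
val rdd = sc.newAPIHadoopFile("/path/to/input",
  classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)

// How the new mapreduce API itself is driven: everything goes through a Job,
// and helpers such as FileInputFormat are configured against that Job
val job = new Job(conf, "word count")
job.setInputFormatClass(classOf[TextInputFormat])
FileInputFormat.addInputPath(job, new Path("/path/to/input"))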

Personally, I think it's better to fix this design, but it will introduce some compatibility issues.

Just bringing it up here for your advice.

Best,  

--  
Nan Zhu


Re: Discussion on SPARK-1139

Posted by ligq <wi...@qq.com>.
You can make the patch so that everyone can review it.


On Wednesday, February 26, 2014 at 8:23 AM, Nan Zhu wrote:

> Hi, all  
>  
> I just created a JIRA, https://spark-project.atlassian.net/browse/SPARK-1139 . The issue is the following:
>
> the new-Hadoop-API-based Spark APIs are actually a mixture of the old and new Hadoop APIs.
>
> Spark's APIs still take a JobConf (or Configuration) as one of the parameters, but Configuration has actually been replaced by mapreduce.Job in the new Hadoop API.
>
> For example: http://codesfusion.blogspot.ca/2013/10/hadoop-wordcount-with-new-map-reduce-api.html
>  
> &  
>  
> http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api (p10)
>  
> Personally, I think it's better to fix this design, but it will introduce some compatibility issues.
>
> Just bringing it up here for your advice.
>  
> Best,  
>  
> --  
> Nan Zhu
>

Re: Discussion on SPARK-1139

Posted by Nan Zhu <zh...@gmail.com>.
Any discussion on this?

I would like to hear more advice from the community before I create the PR.

An example is how a NewHadoopRDD is created:


In SparkContext, we get a Configuration back from the Job (which is a JobContext):

val updatedConf = job.getConfiguration
new NewHadoopRDD(this, fClass, kClass, vClass, updatedConf)


Then, in NewHadoopRDD, we create a JobContext based on this Configuration object:

NewHadoopRDD.scala (L74)
val jobContext = newJobContext(conf, jobId)
val rawSplits = inputFormat.getSplits(jobContext).toArray


Because inputFormat comes from the mapreduce package, its methods only accept a JobContext as the parameter.
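
To spell that constraint out (a minimal snippet of my own, not Spark code), the only way to ask a new-API InputFormat for its splits is through a JobContext:

import org.apache.hadoop.mapreduce.{InputFormat, InputSplit, JobContext}

// getSplits is declared on the new-API InputFormat as getSplits(JobContext),
// so a caller holding only a Configuration must wrap it in a JobContext first
def splitsOf[K, V](fmt: InputFormat[K, V], ctx: JobContext): java.util.List[InputSplit] =
  fmt.getSplits(ctx)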


I think we should avoid introducing Configuration as the parameter, but as before, this will change the APIs.
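
For illustration only, here is a hypothetical caller-side sketch of what the change could look like if the entry point accepted a mapreduce.Job instead of a Configuration (the method name newAPIHadoopRDDFromJob is made up and does not exist in Spark; sc is assumed to be an existing SparkContext):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

// Configure the input through a Job, exactly as the new mapreduce API intends
val job = new Job(sc.hadoopConfiguration, "read input")
job.setInputFormatClass(classOf[TextInputFormat])
FileInputFormat.addInputPath(job, new Path("/path/to/input"))

// Hypothetical Spark entry point taking the Job itself; internally it could still
// call job.getConfiguration, as the current code path does
val rdd = sc.newAPIHadoopRDDFromJob(job,
  classOf[TextInputFormat], classOf[LongWritable], classOf[Text])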


Best,  

--  
Nan Zhu


On Wednesday, February 26, 2014 at 8:23 AM, Nan Zhu wrote:

> Hi, all  
>  
> I just created a JIRA, https://spark-project.atlassian.net/browse/SPARK-1139 . The issue is the following:
>
> the new-Hadoop-API-based Spark APIs are actually a mixture of the old and new Hadoop APIs.
>
> Spark's APIs still take a JobConf (or Configuration) as one of the parameters, but Configuration has actually been replaced by mapreduce.Job in the new Hadoop API.
>
> For example: http://codesfusion.blogspot.ca/2013/10/hadoop-wordcount-with-new-map-reduce-api.html
>  
> &  
>  
> http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api (p10)
>  
> Personally, I think it's better to fix this design, but it will introduce some compatibility issues.
>
> Just bringing it up here for your advice.
>  
> Best,  
>  
> --  
> Nan Zhu
>