Posted to user@spark.apache.org by innowireless TaeYun Kim <ta...@innowireless.co.kr> on 2014/09/22 08:24:44 UTC

Possibly a dumb question: differences between saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset?

Hi,

I'm confused about saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset.

What is the difference between the two?

What are the individual use cases of the two APIs?

Could you briefly describe the internal flow of each API?

I've used Spark for several months, but I have no experience with MapReduce
programming.

(I've read a few book chapters on MapReduce, but have not actually written any
code myself.)

So maybe this confusion comes from my lack of experience with MapReduce
programming.

(I had hoped that experience wouldn't be necessary, since I could use Spark.)

Thanks.

 


RE: Possibly a dumb question: differences between saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset?

Posted by innowireless TaeYun Kim <ta...@innowireless.co.kr>.
Thank you.

 

I have now read part of PairRDDFunctions.scala and found that saveAsNewAPIHadoopFile is just a thin convenience wrapper around saveAsNewAPIHadoopDataset.
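
For readers following along, here is a rough sketch of what such a wrapper amounts to. It is not the actual Spark source: the object WrapperSketch and the helpers fileOutputConf and saveLikeFileVariant are hypothetical names made up for illustration, and the Text/IntWritable types are just an example. The idea is that the File variant records the key class, value class, output format and output path in a Hadoop Job, then hands the resulting Configuration to the Dataset variant.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.{Job, OutputFormat}
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark versions before 1.3
import org.apache.spark.rdd.RDD

object WrapperSketch {
  // Hypothetical helper: builds the Configuration that a file-based save
  // would pass on to saveAsNewAPIHadoopDataset.
  def fileOutputConf(
      path: String,
      keyClass: Class[_],
      valueClass: Class[_],
      outputFormatClass: Class[_ <: OutputFormat[_, _]],
      base: Configuration): Configuration = {
    val job = Job.getInstance(base) // works on a copy of `base`
    job.setOutputKeyClass(keyClass)
    job.setOutputValueClass(valueClass)
    job.setOutputFormatClass(outputFormatClass)
    FileOutputFormat.setOutputPath(job, new Path(path)) // the only file-specific step
    job.getConfiguration
  }

  // Hypothetical stand-in for the File variant, expressed via the Dataset variant.
  def saveLikeFileVariant(rdd: RDD[(Text, IntWritable)], path: String): Unit = {
    val conf = fileOutputConf(
      path,
      classOf[Text],
      classOf[IntWritable],
      classOf[TextOutputFormat[Text, IntWritable]],
      rdd.context.hadoopConfiguration)
    rdd.saveAsNewAPIHadoopDataset(conf)
  }
}
```

Everything the Dataset variant needs travels inside the Configuration, which is why it does not take a path argument of its own.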

 

From: Matei Zaharia [mailto:matei.zaharia@gmail.com] 
Sent: Monday, September 22, 2014 5:12 PM
To: user@spark.apache.org; innowireless TaeYun Kim
Subject: Re: Possibly a dumb question: differences between saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset?

 

File takes a filename to write to, while Dataset takes only a JobConf. This means that Dataset is more general (it can also save to storage systems that are not file systems, such as key-value stores), but is more annoying to use if you actually have a file.

 

Matei

 


 


Re: Possibly a dumb question: differences between saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset?

Posted by Matei Zaharia <ma...@gmail.com>.
File takes a filename to write to, while Dataset takes only a JobConf. This means that Dataset is more general (it can also save to storage systems that are not file systems, such as key-value stores), but is more annoying to use if you actually have a file.

Matei
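
To make the difference concrete, here is a minimal sketch that writes the same pair RDD both ways. It assumes a local Spark setup; the object name SaveVariants, the sample data and the output directories are made up for illustration. Note that in the Scala API the Dataset variant's argument is a Hadoop Configuration (JobConf is what the old-API saveAsHadoopDataset takes).

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark versions before 1.3

object SaveVariants {
  // args(0): base output directory (hypothetical); the "file" and "dataset"
  // subdirectories must not exist yet, or Hadoop refuses to write.
  def main(args: Array[String]): Unit = {
    val base = args(0)
    val sc = new SparkContext(
      new SparkConf().setAppName("save-variants").setMaster("local[2]"))
    try {
      val counts = sc.parallelize(Seq("a" -> 1, "b" -> 2))
        .map { case (k, v) => (new Text(k), new IntWritable(v)) }

      // File variant: path and classes are passed directly.
      counts.saveAsNewAPIHadoopFile(
        base + "/file",
        classOf[Text],
        classOf[IntWritable],
        classOf[TextOutputFormat[Text, IntWritable]])

      // Dataset variant: the same settings are packed into a Configuration first.
      val job = Job.getInstance(sc.hadoopConfiguration)
      job.setOutputKeyClass(classOf[Text])
      job.setOutputValueClass(classOf[IntWritable])
      job.setOutputFormatClass(classOf[TextOutputFormat[Text, IntWritable]])
      FileOutputFormat.setOutputPath(job, new Path(base + "/dataset"))
      counts.saveAsNewAPIHadoopDataset(job.getConfiguration)
    } finally {
      sc.stop()
    }
  }
}
```

For a store that is not a file system, only the Dataset form applies: the output format and its connection settings would go into the Configuration, and no path is involved.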
