Posted to dev@spark.apache.org by Deepak Sharma <de...@gmail.com> on 2016/10/07 19:20:56 UTC

Reading back hdfs files saved as case class

Hi
I am saving an RDD[Example] to HDFS from a Spark program, where Example is a
case class.
Now when I try to read it back, it returns an RDD[String] with content like:
*Example(1,name,value)*

The workaround would be to write the records out as plain strings and read
them back as strings for further processing. That way the case class name
wouldn't appear at all in the files written to HDFS.
But I am keen to know whether the data can be read back directly in Spark
when an RDD[Case_Class] is written to HDFS.
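
A minimal sketch of the round trip being described, assuming the RDD was
written with saveAsTextFile and read back with textFile; the Example fields
and HDFS paths here are illustrative placeholders, not taken from the thread:

import org.apache.spark.{SparkConf, SparkContext}

case class Example(id: Int, name: String, value: String)

object TextRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("text-round-trip"))

    val rdd = sc.parallelize(Seq(Example(1, "name", "value")))

    // saveAsTextFile calls toString on every record, so each line in HDFS
    // ends up as the literal string "Example(1,name,value)".
    rdd.saveAsTextFile("hdfs:///tmp/examples-text")

    // Reading the same path back with textFile therefore gives RDD[String],
    // not RDD[Example].
    val back = sc.textFile("hdfs:///tmp/examples-text")
    back.take(1).foreach(println)

    sc.stop()
  }
}

One alternative that keeps the records typed without string parsing is
saveAsObjectFile / sc.objectFile[Example], at the cost of Java serialization.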

-- 
Thanks
Deepak

Re: Reading back hdfs files saved as case class

Posted by Deepak Sharma <de...@gmail.com>.
Thanks for the answer, Reynold.
Yes, I could use the Dataset API, but it won't solve the purpose I am
supposed to use it for.
I am trying to work on a solution where I need to save the case class along
with the data in HDFS.
Further, this data will move to different folders corresponding to different
case classes.
The Spark programs reading these files are supposed to apply the case class
directly depending on the folder they are reading from.
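
A rough sketch of that per-folder layout, assuming each folder is written as
Parquet from a Dataset of the corresponding case class; the folder paths and
the second case class below are hypothetical:

import org.apache.spark.sql.SparkSession

case class Example(id: Int, name: String, value: String)
case class Other(id: Int, score: Double)

object PerFolderRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("per-folder-read").getOrCreate()
    import spark.implicits._

    // Each folder holds data written from a Dataset of one case class, so
    // reading it back is a matter of applying the matching encoder per path.
    val examples = spark.read.parquet("hdfs:///data/example").as[Example]
    val others   = spark.read.parquet("hdfs:///data/other").as[Other]

    examples.show()
    others.show()

    spark.stop()
  }
}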

Thanks
Deepak

On Oct 8, 2016 00:53, "Reynold Xin" <rx...@databricks.com> wrote:

> You can use the Dataset API -- it should solve this issue for case classes
> that are not very complex.

Re: Reading back hdfs files saved as case class

Posted by Reynold Xin <rx...@databricks.com>.
You can use the Dataset API -- it should solve this issue for case classes
that are not very complex.
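
A minimal sketch of the Dataset round trip Reynold suggests, assuming Parquet
as the storage format; the case class and path are placeholders:

import org.apache.spark.sql.SparkSession

case class Example(id: Int, name: String, value: String)

object DatasetRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataset-round-trip").getOrCreate()
    import spark.implicits._

    val ds = Seq(Example(1, "name", "value")).toDS()

    // Parquet stores the schema with the data, so nothing like the literal
    // string "Example(1,name,value)" ends up in the files.
    ds.write.mode("overwrite").parquet("hdfs:///tmp/examples-parquet")

    // Read back and re-apply the case class through its encoder.
    val back = spark.read.parquet("hdfs:///tmp/examples-parquet").as[Example]
    back.show()

    spark.stop()
  }
}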
