Posted to user@spark.apache.org by Raghav <ra...@gmail.com> on 2016/11/22 05:45:03 UTC

newbie question about RDD

Hi

I am extremely new to Spark. I have to read a file from HDFS and load it
into memory as an RDD.

I have a Java class as follows:

class Person {
    private long UUID;
    private String FirstName;
    private String LastName;
    private String zip;

   // public methods
}

The file in HDFS is as follows:

UUID    FirstName    LastName    Zip
7462    John         Doll         06903
5231    Brad         Finley       32820


Can someone point me to how to get a JavaRDD<Person> object by reading this
file from HDFS?

Thanks.

-- 
Raghav

Re: newbie question about RDD

Posted by Mohit Durgapal <du...@gmail.com>.
Hi Raghav,

Please refer to the following code:

SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("PersonApp");

// creating the Java Spark context
JavaSparkContext sc = new JavaSparkContext(sparkConf);

// reading the file from HDFS into a Spark RDD; the name node here is localhost
JavaRDD<String> personStringRDD =
        sc.textFile("hdfs://localhost:9000/custom/inputPersonFile.txt");

// Converting the String RDD to a Person RDD. This is just an example;
// you can replace the parsing with better exception-handled code.
// Note: the header row is filtered out first, so Long.parseLong
// doesn't fail on the "UUID ..." line.
JavaRDD<Person> personObjectRDD = personStringRDD
        .filter(personRow -> !personRow.startsWith("UUID"))
        .map(personRow -> {
            String[] personValues = personRow.split("\t");
            return new Person(Long.parseLong(personValues[0]),
                    personValues[1], personValues[2], personValues[3]);
        });

// finally, just printing the count of objects
System.out.println("Person count = " + personObjectRDD.count());
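The map above assumes Person has a matching four-argument constructor and, since Spark serializes RDD elements when it ships them between JVMs, that Person implements Serializable. A minimal sketch of such a class (the field names and getters here are illustrative, not from the original post):

```java
import java.io.Serializable;

// Minimal Person matching the constructor used in the map() above.
// Implementing Serializable is required so Spark can serialize RDD
// elements when shuffling or collecting them.
public class Person implements Serializable {
    private final long uuid;
    private final String firstName;
    private final String lastName;
    private final String zip;

    public Person(long uuid, String firstName, String lastName, String zip) {
        this.uuid = uuid;
        this.firstName = firstName;
        this.lastName = lastName;
        this.zip = zip;
    }

    public long getUuid() { return uuid; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public String getZip() { return zip; }
}
```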


Regards
Mohit


On Tue, Nov 22, 2016 at 11:17 AM, Raghav <ra...@gmail.com> wrote:

> Sorry I forgot to ask how can I use spark context here ? I have hdfs
> directory path of the files, as well as the name node of hdfs cluster.
>
> Thanks for your help.
>
> --
> Raghav
>

Re: newbie question about RDD

Posted by Raghav <ra...@gmail.com>.
Sorry, I forgot to ask: how can I use the Spark context here? I have the
HDFS directory path of the files, as well as the name node of the HDFS cluster.

Thanks for your help.
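Given a name-node host and a directory path, the URI that sc.textFile() expects can be assembled as below; host, port, and path are placeholders, so substitute your cluster's actual values:

```java
// Assembles the URI that JavaSparkContext.textFile() expects.
// Host, port, and path are placeholders -- use your cluster's values.
public class HdfsPath {
    static String hdfsUri(String nameNodeHost, int port, String dirPath) {
        return "hdfs://" + nameNodeHost + ":" + port + dirPath;
    }

    public static void main(String[] args) {
        // textFile() also accepts a directory, reading every file inside it
        String uri = hdfsUri("localhost", 9000, "/custom/input");
        System.out.println(uri);
        // With a live context: JavaRDD<String> lines = sc.textFile(uri);
    }
}
```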




-- 
Raghav